How to protect your website from being used for AI training

AI operators use bots called AI crawlers to acquire the data needed to build and develop AI models. These bots crawl publicly available websites and services at scale and copy their content.
The scope of model training and development changes over time, so the situation needs to be monitored and the methods for preventing the use of protected content adjusted accordingly.

 

Proposed solutions:

 

Robots.txt file

Protection against this type of unauthorized or unlicensed use can be accomplished in several ways:

  1. Configuring the robots.txt file

    The file is placed in the root directory of the main domain and of every subdomain, e.g. https://company.com.pl/robots.txt, and likewise for https://blog.company.com.pl, etc.
     
    It must be publicly available and contain relevant records.

    This method is ineffective against AI operators who do not respect the declarations written in robots.txt.
    In that case, we recommend solutions 2 and 3.

    On www.zaiks.org.pl/ai, you can find a sample robots.txt file that blocks the currently known AI crawlers from collecting data from the music and video folders.

    You can use this file as the basis for a version suited to the site you want to protect.

  2. Sites that use protection services from providers such as Cloudflare, Akamai, Imperva or Barracuda can enable the AI-bot protection offered by the service, an increasingly common feature. The configuration depends on the specific service or solution, so we recommend contacting your service provider.

  3. Sites maintained by a dedicated team of administrators can additionally block the IP addresses of the networks from which the bots operate. This approach also makes it possible to detect bots that do not follow the rules written in robots.txt.
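
Solutions 1 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the sample file published on www.zaiks.org.pl/ai: the crawler tokens, the /music/ and /video/ folder names, and the blocked networks (taken from the reserved documentation address ranges) are placeholders, and the helper names are hypothetical. The sketch builds a robots.txt, checks whether an IP address belongs to a blocked network, and flags logged requests in which a listed crawler fetched a disallowed path.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Illustrative crawler tokens only (an assumption, not an exhaustive list);
# check each operator's documentation for the names currently in use.
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]

# Hypothetical folder names standing in for the protected content.
PROTECTED_PATHS = ["/music/", "/video/"]

# Placeholder networks from the reserved documentation ranges, not real bot IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def build_robots_txt(crawlers, paths):
    """Return robots.txt text with one group per crawler disallowing the paths."""
    lines = []
    for agent in crawlers:
        lines.append(f"User-agent: {agent}")
        lines.extend(f"Disallow: {path}" for path in paths)
        lines.append("")  # a blank line separates groups
    return "\n".join(lines)


def is_blocked_ip(address):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


def find_violations(robots_txt, requests, crawlers=AI_CRAWLERS):
    """Return (ip, crawler_token, path) for requests that robots.txt disallows.

    `requests` is an iterable of (ip, user_agent, path) tuples, for example
    parsed from a server access log (log parsing is out of scope here).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for ip, user_agent, path in requests:
        for token in crawlers:
            # Match the token ourselves: RobotFileParser only compares the
            # part of the agent string that precedes the first "/".
            if token.lower() in user_agent.lower() and not parser.can_fetch(token, path):
                violations.append((ip, token, path))
                break
    return violations
```

The text returned by `build_robots_txt(AI_CRAWLERS, PROTECTED_PATHS)` can be saved as robots.txt in the root directory of each domain. Source addresses reported by `find_violations` are candidates for the block list, since those crawlers requested a path that robots.txt asked them to avoid.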

 

Reservation of rights for text and data mining (TDM)

In addition, you can place a reservation of rights on your site, which prevents the application of the copyright exception for text and data mining. The reservation can also be included in the website's terms and conditions. You can find a sample reservation on www.zaiks.org.pl/ai.

 
