
EU Regulator Adopts Restrictive GDPR Position on Data Scraping Impacting AI Technologies



May 23, 2024

The Dutch Data Protection Authority—the Autoriteit Persoonsgegevens (AP)—recently announced that it will in many cases regard scraping of personal data by private sector organizations as an infringement of the EU General Data Protection Regulation (GDPR). This position, if widely adopted across the European Union and United Kingdom, will have potentially far-reaching implications for “developers” and “deployers” of artificial intelligence (AI) models and systems, as it could result in a significant restriction on data scraping in connection with AI model training activities.

The AP’s position has drawn criticism from Dutch lawmakers as well as the European Commission, and there is a pending referral to the Court of Justice of the European Union on the issue.


The AP’s 1 May 2024 guidance adopts a broad understanding of scraping to mean any automated collection and recording of information from the Internet. In the AP’s view, scraping data from the Internet (such as from social media profiles) will inevitably result in the scraping of personal data, which will, in turn, trigger the application of the GDPR.

In the AP’s view, it is challenging to obtain GDPR-compliant “consent” from data subjects for the scraping of their data; notably, in practice the controller may find it difficult to identify the data subjects whose data will be scraped in order to seek their consent. Therefore, according to the AP, “legitimate interests” is likely to be the only basis under the GDPR that private sector organizations may be able to use to lawfully scrape personal data.

Critically, however, the AP’s view is that such organizations’ “commercial interests” are unlikely to qualify as a potential “legitimate interest” under the GDPR. (This AP position has been criticized by the European Commission.) This could, in turn, potentially exclude a large number of private sector use cases for data scraping, including the training of AI models. Overall, in practice, the AP is very likely to treat personal data scraping by private sector parties in many circumstances as not having a “lawful basis” under the GDPR.

The AP has also set out illustrative examples of use cases involving data scraping where, in its view, the data processing can and cannot satisfy GDPR requirements. It is unclear how the examples the AP provides of potentially acceptable use cases are consistent with the AP’s position on “legitimate interest” as described above. The AP provides the following examples of use cases which are very likely to be unlawful under the GDPR, namely scraping personal data from:

  • the internet to create profiles of data subjects to resell them;
  • protected social media accounts or private forums; and
  • public social media profiles to determine whether data subjects are eligible for insurance coverage.


Following its above analysis, the AP considers that developing Generative AI using scraped personal data would not qualify as a potential “legitimate interest” under the GDPR. This development, coupled with the imminent coming into force of the EU’s new AI Act and certain prominent copyright holders seeking to opt out of copyright-related “text and data mining,” may collectively represent a notable tightening of EU legal requirements with respect to AI-related data scraping, including with respect to Generative AI.

(We have also written about the position that other global data protection regulators—including the United Kingdom’s GDPR regulator, the Information Commissioner’s Office—and courts in the United States have adopted with respect to data scraping.)

We expect that the impact of all these EU legal developments (even if the AP’s position on scraping is subsequently not accepted more widely within the EU, or is softened in the Netherlands itself) is that organizations will have to be much more cognizant of what online personal data they are scraping, as well as of the GDPR and other legal requirements they must meet to do so lawfully.

Data license arrangements with certain types of data aggregators or mass data holders might be an available path, likely with the aggregators or data holders taking on responsibility for providing notice to, and obtaining consent from, data subjects. Additionally, there has been much discussion of the potential to use synthetic data in the training of AI models, taking the lead from use cases such as autonomous vehicles; synthetic data could be a powerful solution to this challenge.

In summary, the practical implications at an EU-wide level of the AP’s position are yet to be understood, along with the workarounds and solutions that will be required should this position be adopted.

Trainee solicitor Annabel Pahl assisted with this LawFlash.
