Trusq

factual analysis · traceable to primary sources

Explainer

AI sourcing: finding and scraping candidates without breaking the rules

Adopted 2026-06-21 · ≈ 2 min read · Dirk Baaijen

AI tools that find candidates by scraping public profiles mainly engage the GDPR: even public data needs a legal basis, transparency and data minimisation. Untargeted scraping of facial images is prohibited outright, and once the tool ranks candidates the AI Act's high-risk regime applies as well.

Short answer: Sourcing tools that use AI to find candidates by searching public profiles (LinkedIn, GitHub, social media) mainly engage the GDPR. That data is public does not mean you may freely collect and process it: you need a legal basis, transparency and data minimisation. The untargeted scraping of facial images is also prohibited outright, and once the tool scores or ranks candidates the AI Act's high-risk regime applies as well.

Public is not free

The biggest misconception: "it's public, so I may use it." The GDPR applies in full to publicly available personal data. That means:

  • A legal basis: usually legitimate interest, with a balancing test that weighs your interest against the candidate's interests and rights.
  • Transparency: in principle you must inform the data subject that you process their data (with a limited exception where informing them would involve disproportionate effort).
  • Data minimisation: collect only what is relevant to the job, not full profiles "just in case".
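The minimisation point can be made concrete. Below is a minimal Python sketch, assuming a hypothetical vacancy and made-up profile field names (`current_role`, `skills`, `photo_url` and the others are illustrative, not any platform's real schema): the tool applies an explicit allowlist of job-relevant fields at the moment of collection and discards everything else, so profile photos never enter the dataset.

```python
# Hypothetical allowlist for a hypothetical "backend developer" vacancy.
# Field names are illustrative assumptions, not any platform's real schema.
# "photo_url" is deliberately absent: photos are never kept.
JOB_RELEVANT_FIELDS = frozenset(
    {"name", "current_role", "skills", "public_repositories", "profile_url"}
)


def minimise(profile: dict) -> dict:
    """Keep only allowlisted, job-relevant fields; drop everything else."""
    return {key: value for key, value in profile.items() if key in JOB_RELEVANT_FIELDS}


raw = {
    "name": "A. Candidate",
    "current_role": "Backend developer",
    "skills": ["Go", "PostgreSQL"],
    "photo_url": "https://example.org/avatar.jpg",
    "date_of_birth": "1990-01-01",
    "hobbies": ["sailing"],
}
print(minimise(raw))
# {'name': 'A. Candidate', 'current_role': 'Backend developer', 'skills': ['Go', 'PostgreSQL']}
```

An allowlist is used rather than a blocklist so that any new field a platform starts exposing is dropped by default instead of being silently collected.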

The scraping ban

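Article 5 of the AI Act prohibits AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage. A sourcing tool that collects profile photos for facial recognition falls squarely under this ban, which carries the highest fine band (up to EUR 35 million or 7% of worldwide annual turnover, whichever is higher).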

Once it ranks: high-risk

Many sourcing tools not only find candidates but also rank them by suitability. At that point the tool performs selection and falls under Annex III (point 4, employment) as a high-risk system, with obligations such as human oversight and bias monitoring, just like CV screening. A ranking based on scraped data combines two risks: a questionable legal basis and possible discrimination.
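One element of human oversight can be sketched in code: the model's ranking is treated as a suggestion only, and no candidate is approached or rejected until a named reviewer has recorded a decision with their own reasons. This is a minimal Python sketch under stated assumptions (the class, function and field names are hypothetical); it illustrates a single safeguard and does not cover everything Article 14 of the AI Act or bias monitoring require.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RankedCandidate:
    """Hypothetical record for one candidate suggested by a ranking model."""
    candidate_id: str
    model_score: float               # output of the (hypothetical) ranking model
    reviewer: Optional[str] = None   # person who reviewed the suggestion
    decision: Optional[str] = None   # "shortlist" or "reject", set by a human
    rationale: Optional[str] = None  # reviewer's own reasons, kept for the audit trail


def record_decision(c: RankedCandidate, reviewer: str, decision: str, rationale: str) -> None:
    """Record a human decision; the model score alone never decides."""
    if decision not in {"shortlist", "reject"}:
        raise ValueError(f"unknown decision: {decision}")
    if not rationale.strip():
        raise ValueError("a human rationale is required")
    c.reviewer, c.decision, c.rationale = reviewer, decision, rationale


def may_contact(c: RankedCandidate) -> bool:
    """Only candidates a named reviewer has shortlisted may be approached."""
    return c.decision == "shortlist" and c.reviewer is not None


candidate = RankedCandidate("cand-042", model_score=0.91)
print(may_contact(candidate))  # False: a high score alone is not a decision
record_decision(candidate, reviewer="recruiter@example.org",
                decision="shortlist", rationale="Relevant Go experience")
print(may_contact(candidate))  # True
```

Requiring a written rationale keeps the reviewer's own judgement on record, which makes it easier to check later whether the human step was real or a rubber stamp.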

Platform terms

Beyond the law, the platforms' own terms of use apply, which often prohibit automated scraping. A breach of those is not an AI Act matter, but it is a legal and reputational risk.

What to do

  • Determine the legal basis before you scrape, and be honest about the balancing test.
  • Minimise: collect job-relevant data, not full profiles.
  • Do not process facial images for recognition.
  • Treat ranking as high-risk with human oversight.
  • Respect the platform terms.

AI sourcing makes the pond bigger, but it does not shift the responsibility. Collecting is already processing, and whoever processes the data is accountable for it.

Sources

  1. https://eur-lex.europa.eu/eli/reg/2016/679/oj
    Regulation (EU) 2016/679 (GDPR): basis, transparency and data minimisation, also when processing publicly available personal data.
  2. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
    Regulation (EU) 2024/1689 (AI Act): Art. 5 prohibits untargeted scraping of facial images; Annex III where the tool assesses or ranks candidates.


Read next


GDPR in the workplace: employee data and AI

HR AI runs on employee data, and that sits under the GDPR. Consent is rarely a valid basis given the power imbalance; you often fall back on legitimate interest or a legal obligation. Special-category data, transparency, data minimisation and a DPIA decide whether it is allowed.


AI sentiment analysis of employees: the thin line to the emotion ban

AI inferring employee mood from email, chat, surveys or speech brushes against the emotion-recognition ban (Art. 5 AI Act) and the GDPR. Aggregated and anonymous is sometimes possible; individual monitoring almost never.


Data processing agreement (GDPR art. 28): needed with an AI vendor?

If an AI vendor processes personal data on your behalf, Article 28 GDPR requires a written data processing agreement with fixed minimum content. This explainer sets out what it must contain and what to watch for with AI services.

Dirk Baaijen

About this knowledge base

Compiled and maintained by YRproject: programme and project direction at the intersection of digital transformation, AI and regulation. Every factual claim is traceable to its primary source. YRproject is led by Dirk Baaijen. About & method →

A project or programme? Work with YRproject โ†’
