AI sourcing: finding and scraping candidates without breaking the rules
AI tools that find candidates by scraping public profiles mainly engage the GDPR: even public data needs a legal basis, transparency and data minimisation. Untargeted scraping of facial images is prohibited outright, and once the tool ranks candidates the AI Act's high-risk regime applies as well.
Short answer: Sourcing tools that use AI to find candidates by searching public profiles (LinkedIn, GitHub, social media) mainly engage the GDPR. Data being public does not mean you may freely collect and process it: you need a legal basis, transparency and data minimisation. On top of that, the untargeted scraping of facial images is prohibited, and once the tool scores or ranks candidates the AI Act's high-risk regime applies as well.
Public is not free
The biggest misconception: "it's public, so I may use it." The GDPR applies in full to publicly available personal data. That means:
- A legal basis: usually legitimate interest, with a balancing test that weighs the candidate's interest.
- Transparency: in principle you must inform the data subject that you process their data (with a limited exception for disproportionate effort).
- Data minimisation: collect only what is relevant to the job, not full profiles "just in case".
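The data-minimisation point can be made concrete in code. What follows is a minimal Python sketch, not a compliance tool: the field names and the `JOB_RELEVANT_FIELDS` allowlist are hypothetical, and what counts as job-relevant has to be decided per vacancy.

```python
# Hypothetical allowlist: only fields judged relevant to the vacancy are kept.
JOB_RELEVANT_FIELDS = {"name", "headline", "skills", "current_role", "profile_url"}


def minimise(raw_profile: dict) -> dict:
    """Return a copy of the profile containing only allowlisted fields."""
    return {k: v for k, v in raw_profile.items() if k in JOB_RELEVANT_FIELDS}


# Illustrative input; every value here is made up.
raw = {
    "name": "A. Example",
    "headline": "Backend engineer",
    "skills": ["Go", "PostgreSQL"],
    "photo_url": "https://example.com/p.jpg",  # not needed to assess suitability
    "birthday": "1990-01-01",                  # not job-relevant
    "relationship_status": "married",          # not job-relevant
}

stored = minimise(raw)  # only name, headline and skills survive
```

An allowlist is used here rather than a blocklist, so any field the scraper starts returning later is dropped by default instead of stored by default.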
The scraping ban
Article 5 of the AI Act prohibits the untargeted scraping of facial images from the internet or CCTV footage to build or expand facial recognition databases. A sourcing tool that collects profile photos for facial recognition falls squarely under it, and the prohibition sits in the highest fine band: up to EUR 35 million or 7% of worldwide annual turnover, whichever is higher.
Once it ranks: high-risk
Many sourcing tools not only find but also rank candidates by suitability. At that point the tool performs selection and falls under Annex III, which brings obligations such as human oversight and bias monitoring, just like CV screening. A ranking based on scraped data combines two risks: a questionable legal basis and possible discrimination.
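One way to picture human oversight on a ranking is a gate where the score is only a suggestion and nothing moves forward without a recorded human decision. The Python sketch below illustrates that idea under stated assumptions: `RankedCandidate`, `record_decision` and `shortlist` are hypothetical names, and the scores are made up. It does not cover bias monitoring or the other high-risk obligations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RankedCandidate:
    candidate_id: str
    score: float                       # output of a (hypothetical) ranking model
    reviewed_by: Optional[str] = None  # identifier of the human reviewer, if any
    decision: Optional[str] = None     # "shortlist" or "reject", set by a human only


def record_decision(c: RankedCandidate, reviewer: str, decision: str) -> None:
    """A human reviewer confirms or overrides the model's suggestion."""
    if decision not in {"shortlist", "reject"}:
        raise ValueError("decision must be 'shortlist' or 'reject'")
    c.reviewed_by = reviewer
    c.decision = decision


def shortlist(candidates: list) -> list:
    """Only candidates a human has explicitly shortlisted move forward."""
    return [c for c in candidates if c.reviewed_by and c.decision == "shortlist"]


ranked = [RankedCandidate("c1", 0.91), RankedCandidate("c2", 0.40)]

# The reviewer overrides the ranking: c2 is shortlisted, c1 is not yet reviewed.
record_decision(ranked[1], reviewer="recruiter-1", decision="shortlist")

forwarded = shortlist(ranked)
```

In this example the high score of `c1` has no effect on its own: it stays out of `forwarded` until a reviewer records a decision.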
Platform terms
Beyond the law, the platforms' own terms of use apply, which often prohibit automated scraping. A breach of those is not an AI Act matter, but it is a legal and reputational risk.
What to do
- Determine the legal basis before you scrape, and be honest about the balancing test.
- Minimise: collect job-relevant data, not full profiles.
- Do not process facial images for recognition.
- Treat ranking as high-risk with human oversight.
- Respect the platform terms.
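The checklist above could be encoded as a pre-flight check that runs before any collection starts. This is a rough Python sketch under stated assumptions: `SourcingPlan`, its fields and `blocking_issues` are hypothetical, and a passing check is a reminder aid, not a legal assessment.

```python
from dataclasses import dataclass, field


@dataclass
class SourcingPlan:
    legal_basis: str = ""                        # e.g. "legitimate interest"
    balancing_test_documented: bool = False
    fields_to_collect: set = field(default_factory=set)
    job_relevant_fields: set = field(default_factory=set)
    faces_for_recognition: bool = False
    ranks_candidates: bool = False
    human_oversight: bool = False
    platform_terms_allow_automation: bool = False


def blocking_issues(plan: SourcingPlan) -> list:
    """Return one message per checklist item the plan does not satisfy."""
    issues = []
    if not plan.legal_basis or not plan.balancing_test_documented:
        issues.append("basis: no documented legal basis and balancing test")
    extra = plan.fields_to_collect - plan.job_relevant_fields
    if extra:
        issues.append("minimisation: not job-relevant: " + ", ".join(sorted(extra)))
    if plan.faces_for_recognition:
        issues.append("faces: facial images processed for recognition")
    if plan.ranks_candidates and not plan.human_oversight:
        issues.append("ranking: no human oversight")
    if not plan.platform_terms_allow_automation:
        issues.append("platform: terms do not permit automated collection")
    return issues


plan = SourcingPlan(
    legal_basis="legitimate interest",
    balancing_test_documented=True,
    fields_to_collect={"skills", "photo_url"},
    job_relevant_fields={"skills", "headline"},
    ranks_candidates=True,
)
issues = blocking_issues(plan)  # collection should not start while this is non-empty
```

For this example plan, three items remain open: the extra `photo_url` field, ranking without human oversight, and the unconfirmed platform terms.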
AI sourcing makes the pond bigger, but it does not shift the responsibility. Whoever collects data processes it, and whoever decides on that processing is responsible for it.
Sources
- https://eur-lex.europa.eu/eli/reg/2016/679/oj
Regulation (EU) 2016/679 (GDPR): basis, transparency and data minimisation, also when processing publicly available personal data.
- https://eur-lex.europa.eu/eli/reg/2024/1689/oj
Regulation (EU) 2024/1689 (AI Act): Art. 5 prohibits untargeted scraping of facial images; Annex III where the tool assesses or ranks candidates.