AI systems are, fundamentally, data systems. They are trained on vast collections of human-generated content, they learn from every interaction, and they make inferences about individuals that those individuals may not even be aware of. The relationship between AI and privacy is one of the most important and least understood aspects of modern technology.

What AI systems know about you

Modern AI systems collect, process, and infer information about users in ways that go far beyond what most people realise. Understanding the different types of data involved helps you make more informed choices.

Data you explicitly provide

The questions you ask, the documents you upload, the conversations you have. When you ask a health AI about symptoms, describe a financial situation to a chatbot, or upload a confidential document for summarisation — that data exists somewhere, processed by someone's infrastructure.

Data inferred from your behaviour

How long you pause before typing, which suggestions you accept, what topics you return to repeatedly, how you rephrase questions when you don't get what you want. These behavioural signals reveal preferences, emotional states, and decision patterns that users rarely consciously disclose.

Training data you never gave

Large language models were trained on vast amounts of internet content — including content that people posted publicly without knowing it would be scraped into AI training sets. Your old forum posts, social media content, or professional work may have contributed to training AI systems without your knowledge or consent.

The inferences problem

Modern AI can infer sensitive attributes — health conditions, political views, sexual orientation, financial distress — from seemingly innocuous data like purchase patterns, location history, or writing style. You may never disclose these things explicitly, but an AI system may infer them with surprising accuracy.
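The mechanics are easy to demonstrate. The following is a deliberately toy, fully synthetic sketch (the signal names and data are invented, and the "hidden attribute" is just a random label): three signals that never mention the attribute nonetheless let a tiny model recover it.

```python
import math
import random

random.seed(0)

# Fully synthetic toy: three "innocuous" behavioural signals (think
# late-night activity rate, spend volatility, topic-revisit rate; the
# names are invented) correlate with a hidden attribute the user never
# discloses. A tiny model recovers it from the pattern alone.

def make_sample():
    hidden = random.random() < 0.5             # the undisclosed attribute
    base = 0.7 if hidden else 0.3              # signals shift with it
    signals = [random.gauss(base, 0.15) for _ in range(3)]
    return signals, int(hidden)

data = [make_sample() for _ in range(400)]

# Minimal logistic regression, trained by batch gradient descent.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.5

def predict(x):
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

for _ in range(300):
    gw, gb = [0.0, 0.0, 0.0], 0.0
    for x, y in data:
        err = predict(x) - y                   # prediction error drives updates
        for i in range(3):
            gw[i] += err * x[i]
        gb += err
    for i in range(3):
        w[i] -= lr * gw[i] / len(data)
    b -= lr * gb / len(data)

accuracy = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
print(f"hidden attribute inferred with {accuracy:.0%} accuracy")
```

Nothing in the three inputs names the attribute; the correlation alone is enough, which is why "I never told the system that" offers little protection.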

The training data question

One of the most contested privacy questions in AI concerns what data was used to train these systems and whether consent was obtained. Large language models were predominantly trained on data scraped from the internet — Common Crawl, Wikipedia, GitHub, books, and more. Much of this content was created by people who had no expectation it would be used to train commercial AI systems.

Several major lawsuits are underway — from news organisations, authors, and artists — arguing that using copyrighted content in AI training without permission or compensation violates intellectual property law. These cases will shape the norms around training data for years to come.

Risks to individuals

The patterns above translate into concrete risks: sensitive attributes (health, politics, finances) inferred without your knowledge, explicit inputs such as symptoms or confidential documents retained on a provider's infrastructure, and content you published absorbed into training sets without consent.

What organisations must do — key regulations

Privacy regulation is evolving rapidly in response to AI. The key frameworks you should know:

GDPR (EU)
The General Data Protection Regulation applies to any organisation processing the data of EU residents. It requires a lawful basis for processing, data minimisation, and purpose limitation, and grants individuals rights to access, correct, and delete their data. Fines run up to €20 million or 4% of global annual revenue, whichever is higher. The GDPR has significant implications for AI training and automated decision-making.
DPDP Act (India)
India's Digital Personal Data Protection Act (2023) establishes rights for individuals in India over their personal data. It requires explicit consent for data processing, imposes obligations on "data fiduciaries" (organisations processing data), and creates a Data Protection Board for enforcement. Particularly relevant for the large Indian AI and tech sector.
EU AI Act
The first comprehensive AI-specific regulation, which includes significant privacy protections. Bans real-time biometric surveillance in public spaces (with narrow exceptions), requires transparency when AI is making decisions about individuals, and mandates human oversight for high-risk AI applications. Covered in detail in the Regulation module.
CCPA (California)
The California Consumer Privacy Act gives California residents rights over their personal data — access, deletion, and opt-out of sale. A patchwork of similar state laws is emerging across the US in the absence of federal privacy legislation.

Practical steps for individuals

You have more agency over your AI privacy than you might realise. Practical actions:

  • Review the privacy settings of the AI tools you use; many offer opt-outs from training on your conversations
  • Don't paste sensitive personal, medical, or financial details into consumer AI tools
  • For confidential work, prefer enterprise plans with contractual data protections, or run models locally
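One concrete habit for individuals: scrub obvious identifiers from a prompt before it leaves your machine. A minimal sketch follows; the three regex patterns are illustrative and nowhere near exhaustive, so treat this as the shape of the habit, not a guarantee.

```python
import re

# Illustrative, not exhaustive: mask obvious identifiers in a prompt
# before it is sent to a hosted model. Real PII detection needs far
# more than three regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Replace each matched identifier with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Email me at jane.doe@example.com or call +44 7700 900123."
print(redact(prompt))
```

Even this crude filter changes what a provider's logs can ever contain, which is the point: minimise at the source rather than trusting downstream deletion.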

Practical steps for organisations

Organisations deploying or permitting AI tools should, at minimum:

  • Practise data minimisation: collect and send only the data a task actually requires
  • Set clear policies on employee use of AI tools, especially with customer or confidential data
  • Conduct due diligence on AI vendors' data handling, retention, and training practices
  • Run privacy impact assessments before deploying AI on personal data
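Data minimisation, a core organisational step, is the most mechanical to start with: before a record reaches an external AI vendor, keep only an allowlist of the fields the task needs. A sketch, with entirely hypothetical field names:

```python
# Data minimisation sketch: before a support ticket reaches an external
# AI vendor for triage, keep only an allowlist of fields. All field
# names here are hypothetical.

ALLOWED_FIELDS = {"ticket_id", "subject", "product_area"}

def minimise(record: dict) -> dict:
    """Return a copy of record containing only allowlisted fields."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

ticket = {
    "ticket_id": "T-1042",
    "subject": "Cannot reset password",
    "product_area": "auth",
    "customer_email": "sam@example.com",  # not needed for triage
    "billing_address": "12 High Street",  # not needed at all
}

print(minimise(ticket))
```

An allowlist fails closed: a newly added sensitive field is dropped by default, whereas a blocklist silently forwards anything it was never taught about.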

Key takeaways

  • AI systems collect explicit inputs, infer from behaviour, and were trained on data people never knowingly contributed
  • AI can infer sensitive attributes — health, politics, financial state — from indirect behavioural signals
  • Key regulations: GDPR (EU), DPDP Act (India), EU AI Act, and state-level laws in the US
  • Individual steps: check privacy settings, don't share sensitive data carelessly, use enterprise plans or local models for sensitive work
  • Organisation steps: data minimisation, clear policies on employee AI use, vendor due diligence, privacy impact assessments
  • Training data consent remains legally and ethically contested — major cases are working their way through courts