Where Your AI Agent Sends Personal Data — A GDPR Data Flow Map
Most AI agents leak PII to 5+ third parties you haven't mapped. Trace every data flow from prompt to storage, and see exactly what GDPR requires at each hop.
You probably don't know where your data goes
Ask a developer "does your AI agent comply with GDPR?" and they'll say "yeah, we don't store personal data." Then you look at their setup and find PII in conversation logs, memory files, API request bodies sent to US-based model providers, error tracking services, and analytics platforms. The data is everywhere. They just never mapped it.
This isn't negligence. It's a symptom of how AI agents work. Agents are integrators by design. They take user input, process it through one or more language models, execute skills that call external services, and store context for future use. Each step in that chain is a potential place where personal data lands. And under GDPR, you need to account for every one of them.
The typical agent data flow
Let me trace a typical interaction through an OpenClaw agent and show where PII ends up.
Step 1: User input. A user in Hamburg types "My name is Lena Mueller, please check on order #4521 and email me at lena.m@example.de." Right there you've got a name, an email address, and an order ID that links to more personal data.
Step 2: Context assembly. Your agent loads its SOUL.md, any relevant memory files, and the conversation history. If Lena has chatted before, her previous messages (with whatever PII they contain) are now in the context window.
Step 3: LLM API call. The full context -- including Lena's name, email, and conversation history -- goes to an LLM provider's API. If that provider is Anthropic or OpenAI, the data just left the EU and crossed the Atlantic. This is a GDPR data transfer.
Step 4: Skill execution. The agent calls your order lookup service, your email service, maybe a CRM. Each of these receives some subset of Lena's personal data.
Step 5: Response and logging. The agent's response goes back to Lena, but it's also written to your conversation log, your analytics platform, your error tracker, and possibly your agent's memory store for future reference.
That's at least eight places where Lena's PII now lives: your app server, the LLM provider, the order service, the email service, the conversation log, the analytics platform, the error tracker, and the memory store. Possibly more if any of those services have their own sub-processors.
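The five steps above can be captured as an explicit data flow map, which is also the artifact GDPR auditors will ask for first. Here's a minimal sketch; every system name and field is illustrative, so substitute the hops your agent actually makes:

```python
# Hypothetical data-flow map for the interaction above. Every hop name,
# PII category, and region is illustrative -- map your real systems.
DATA_FLOW = [
    {"hop": "app server",     "pii": ["name", "email", "order_id"], "region": "EU"},
    {"hop": "LLM provider",   "pii": ["name", "email", "history"],  "region": "US"},
    {"hop": "order service",  "pii": ["order_id"],                  "region": "EU"},
    {"hop": "email service",  "pii": ["name", "email"],             "region": "US"},
    {"hop": "logging system", "pii": ["full conversation"],         "region": "EU"},
    {"hop": "memory store",   "pii": ["name", "preferences"],       "region": "EU"},
]

# Every hop where personal data leaves the EU needs a Chapter V
# transfer mechanism -- flag them up front.
transfers = [hop["hop"] for hop in DATA_FLOW if hop["region"] != "EU"]
print(transfers)  # ['LLM provider', 'email service']
```

Even this toy version makes the problem concrete: two of six hops are cross-border transfers before you've written a single line of compliance logic.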
What GDPR requires at each step
Lawful basis (Article 6)
You need a legal basis for processing personal data at each step. For most agent interactions, you're looking at one of three options:
- Contract: The processing is necessary to deliver a service the user asked for, like looking up their order. This covers the core interaction, but not extras like analytics.
- Consent: The user explicitly agreed to the processing. Consent must be specific, informed, and revocable.
- Legitimate interest: You have a reasonable business reason and the processing doesn't override the user's rights and freedoms. This requires a documented Legitimate Interest Assessment.
"We need the data to make the agent work" is not automatically a lawful basis. You need to pick one and document it.
Data minimization (Article 5)
You should only process the personal data that's strictly necessary. This is where things get uncomfortable for agent builders, because agents work better with more context. Sending the full conversation history to the LLM gives better responses. But GDPR says you should send the minimum necessary.
In practice, this means thinking carefully about what goes into each API call. Does the LLM need Lena's email address to check her order status? No. Strip it before the API call and add it back when the agent composes the email. Does your analytics service need the full conversation text? No. Send anonymized event data instead.
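Minimization can be mechanical. As a sketch of the "strip the email before the API call" idea, here's a redaction pass over the outgoing message; note that a regex only catches structured identifiers like email addresses, and real PII detection (names, addresses) needs NER or a dedicated detection layer:

```python
import re

# Matches common email shapes; a heuristic, not a complete validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def minimize_for_llm(message: str) -> str:
    """Drop data the LLM doesn't need for this task -- here, email addresses.
    The original value stays on your side and is re-attached after the call."""
    return EMAIL_RE.sub("[EMAIL REDACTED]", message)

msg = ("My name is Lena Mueller, please check on order #4521 "
       "and email me at lena.m@example.de.")
print(minimize_for_llm(msg))
# My name is Lena Mueller, please check on order #4521 and email me at [EMAIL REDACTED].
```

The LLM can still answer "check order #4521" perfectly well; it never needed the address.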
Data transfers (Chapter V)
If personal data leaves the EU -- and it probably does when you call an LLM API hosted in the US -- you need a transfer mechanism. The options are:
- Standard Contractual Clauses (SCCs): Most LLM providers include these in their data processing agreements. Read them. Make sure they're actually signed.
- Adequacy decision: If the provider is in a country the EU considers "adequate" for data protection, you're covered. The EU-US Data Privacy Framework covers US companies that have self-certified under it. Check whether your provider is certified.
- Binding Corporate Rules: Only relevant if the provider is part of your own corporate group.
The safest approach is to strip PII before it leaves the EU. If the LLM never sees the personal data, there's no transfer to worry about. ClawPine's compliance proxy does exactly this -- it replaces PII with tokens before the API call and restores them in the response.
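The tokenize-and-restore pattern is simple enough to sketch in a few lines. This is an illustrative toy, not ClawPine's actual implementation: PII values are swapped for opaque tokens before the text crosses the border, and the token-to-value map never leaves your infrastructure:

```python
import uuid

class PIITokenizer:
    """Toy tokenize-before-transfer layer. The vault (token -> value)
    stays on EU-controlled infrastructure; only tokens cross the border."""

    def __init__(self):
        self._vault = {}

    def tokenize(self, text: str, values: list[str]) -> str:
        """Replace each known PII value with an opaque, unique token."""
        for value in values:
            token = f"<pii:{uuid.uuid4().hex[:8]}>"
            self._vault[token] = value
            text = text.replace(value, token)
        return text

    def restore(self, text: str) -> str:
        """Swap tokens back to original values in the LLM's response."""
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text

tok = PIITokenizer()
safe = tok.tokenize("Email Lena Mueller at lena.m@example.de",
                    ["Lena Mueller", "lena.m@example.de"])
# `safe` contains no PII and can go to the LLM.
assert "lena.m@example.de" not in safe
assert tok.restore(safe) == "Email Lena Mueller at lena.m@example.de"
```

A production version also needs PII *detection* (the toy assumes you already know which values to tokenize) and persistence of the vault across the request/response round trip.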
Right of access and erasure (Articles 15 and 17)
Lena can ask "what data do you have about me?" and you need to provide a complete answer. She can also say "delete everything" and you need to comply without undue delay -- at the latest within one month under Article 12 -- with limited exceptions.
For AI agents, this is harder than it sounds. Lena's data might be in conversation logs, memory files, skill execution traces, analytics events, LLM provider logs, and any downstream service. You need a way to find all of it and delete all of it.
If your agent uses persistent memory -- files that carry context between sessions -- you need to be able to search by user identity and selectively delete. Corrupting the memory store while deleting one user's data is not an acceptable side effect. Build the deletion path before you need it, not when someone actually files a request.
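One way to make the deletion path concrete is a single entry point that fans out to every store holding user data. All the names below are hypothetical adapters you'd write for your own systems; the point is the shape, not the specifics:

```python
# Hypothetical erasure pipeline: one call fans out to every registered
# store and returns an audit trail of what was deleted where.
def erase_user(user_id: str, stores: list) -> dict:
    """Delete user_id from every store; return {store_name: records_deleted}."""
    return {store.name: store.delete_user(user_id) for store in stores}

class MemoryStore:
    """Toy in-memory stand-in for an agent memory store,
    keyed by user identity so selective deletion is possible."""
    name = "memory"

    def __init__(self, records):
        self.records = records  # list of (user_id, memory_text)

    def delete_user(self, user_id):
        before = len(self.records)
        # Selective delete: remove only this user's entries, leave the rest intact.
        self.records = [r for r in self.records if r[0] != user_id]
        return before - len(self.records)

store = MemoryStore([("lena", "prefers email"), ("omar", "prefers phone")])
print(erase_user("lena", [store]))  # {'memory': 1}
```

In practice you'd register one adapter per system from your data flow map (logs, analytics, memory, downstream services), and keep the returned audit trail as evidence that the erasure request was honored.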
Data Protection Impact Assessment (Article 35)
If your agent processes personal data at scale or handles sensitive categories (health data, biometric data, children's data), you need a DPIA. This is a documented assessment of the risks to individuals and the measures you've taken to mitigate them.
A DPIA for an AI agent should cover: what data is processed, where it flows, who has access, what could go wrong (data breach, unauthorized access, incorrect automated decisions), and what controls are in place. If your agent makes automated decisions that significantly affect people, Article 22 gives those people the right to human review.
The memory file problem
Agent memory is a GDPR headache that doesn't get enough attention. OpenClaw agents can maintain memory across sessions -- facts about the user, previous interactions, preferences. This is great for user experience and terrible for data management.
Memory files are unstructured. They might contain "User prefers email over phone" alongside "User mentioned they're being treated for condition X." There's no schema, no field-level access control, and no obvious way to audit what personal data they contain without reading them.
My recommendation: treat memory files as personal data by default. Apply retention limits (delete memory older than N days). Run PII detection on memory writes so you know what's being stored. And build the deletion tooling early, because manual memory cleanup across thousands of users is not something you want to do under a one-month deadline.
Practical steps
If you're running an OpenClaw agent that serves EU users, here's what I'd prioritize:
- Map the data flow. Draw every system that touches user data. Include logging, error tracking, analytics, and backups. You'll be surprised how many endpoints there are.
- Strip PII at the boundary. Before any data leaves your controlled infrastructure -- especially to LLM providers -- run it through a PII detection and stripping layer. ClawPine's compliance proxy handles this.
- Document your lawful basis. For each processing step, write down which Article 6 basis you're relying on. Keep it in a living document, not a one-time legal memo.
- Build the deletion pipeline. You will get erasure requests. Build the tooling to find and delete a user's data across all systems before the first request arrives.
- Set retention limits. Don't keep conversation logs and memory files forever. Define retention periods and enforce them automatically.
- Check your transfer mechanisms. If data crosses borders, confirm you have SCCs or another valid mechanism in place with every provider.
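The first, third, and last steps above converge on one artifact: a record of processing (what Article 30 asks for) that maps each system to its lawful basis and transfer mechanism. A minimal sketch, with all system names and values illustrative, plus a gap check you can run in CI:

```python
# Sketch of a living record of processing: one entry per system from the
# data-flow map. Every name and value here is illustrative.
PROCESSING_RECORD = {
    "llm_provider":  {"basis": "legitimate_interest", "region": "US", "transfer": "SCCs"},
    "order_service": {"basis": "contract",            "region": "EU", "transfer": None},
    "analytics":     {"basis": None,                  "region": "US", "transfer": None},
}

def compliance_gaps(record: dict) -> list[str]:
    """Flag systems missing a documented lawful basis or transfer mechanism."""
    gaps = []
    for system, entry in record.items():
        if entry["basis"] is None:
            gaps.append(f"{system}: no documented Article 6 basis")
        if entry["region"] != "EU" and entry["transfer"] is None:
            gaps.append(f"{system}: non-EU transfer with no Chapter V mechanism")
    return gaps

print(compliance_gaps(PROCESSING_RECORD))
# ['analytics: no documented Article 6 basis',
#  'analytics: non-EU transfer with no Chapter V mechanism']
```

Keeping this as data rather than a PDF means a new integration that lacks a basis or transfer mechanism fails the check the day it's added, not at the next annual review.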
None of this requires a team of lawyers or a six-month project. But it does require sitting down, mapping the data flow honestly, and closing the gaps. The regulation is not going away, and enforcement is getting more frequent, not less.
Stay compliant, automatically
ClawPine monitors your agents for GDPR, SOC 2, and HIPAA compliance in real time.