Privacy impact assessments are an established discipline. Mature programs have templates, methodologies, and review processes that have been refined over many years. The templates typically cover the categories of personal data processed, the lawful basis for processing, the controls in place, the cross-border transfer mechanisms, the data subject rights provisions, and the retention and deletion arrangements. These templates serve well for many forms of data processing.
AI systems introduce processing patterns that traditional templates address incompletely. The standard categories continue to apply, but the specific questions within each category require adjustment to capture the dimensions of AI processing that do not arise in traditional systems. This article outlines the additional considerations that warrant attention when privacy impact assessments are conducted for AI systems.
Mapping the processing paths
Traditional data processing has a relatively simple structure. Data enters the system, is processed according to defined logic, and produces an output that may be stored, transmitted, or discarded. AI systems often involve multiple processing paths that warrant separate analysis.
The processing paths in an AI system typically include:

- personal data passing through the system as inputs to the AI model
- personal data appearing in outputs the system generates
- personal data included in training, fine-tuning, or evaluation datasets
- personal data retrieved from indexed sources during system operation
- personal data captured in logs of the system's activity
- personal data included in the context provided to AI agents during their operation
An effective assessment maps each of these paths separately and analyzes its privacy implications. The standard template may treat all of them as a single processing activity; the more granular treatment surfaces considerations that the consolidated treatment can obscure.
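A minimal sketch of how the separate paths might be recorded, in Python with illustrative field names; a real register would carry whatever fields the program's existing template uses:

```python
from dataclasses import dataclass
from enum import Enum

class ProcessingPath(Enum):
    """Distinct paths personal data can take through an AI system."""
    MODEL_INPUT = "inputs to the AI model"
    MODEL_OUTPUT = "generated outputs"
    TRAINING_DATA = "training, fine-tuning, or evaluation datasets"
    RETRIEVAL_SOURCE = "indexed sources retrieved at runtime"
    OPERATIONAL_LOGS = "logs of system activity"
    AGENT_CONTEXT = "context provided to AI agents"

@dataclass
class PathAssessment:
    """One entry per path, so no path inherits another's analysis."""
    path: ProcessingPath
    data_categories: list[str]  # e.g. ["contact details", "support history"]
    recipients: list[str]       # who can access data on this path
    retention: str              # retention period specific to this path
    notes: str = ""
```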
Lawful basis for distinct processing paths
The lawful basis for one processing path may not extend to others. The lawful basis for processing a customer's data to provide them service may not extend to using that data to improve models that benefit other customers. The lawful basis for retrieval over a corpus of documents may not extend to all the personal data those documents incidentally contain.
The assessment should establish the lawful basis for each identified processing path separately. Where consent is the basis, the assessment should establish whether existing consent covers the AI processing or whether additional consent is needed. Where legitimate interest is the basis, the legitimate interest assessment should be specific to the AI use case rather than inherited from earlier assessments of related processing.
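In the same spirit, a per-path record of lawful bases makes gaps visible instead of letting one path silently inherit a basis established for another. The assignments below are illustrative, not a recommendation:

```python
from enum import Enum

class LawfulBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGITIMATE_INTEREST = "legitimate interest"
    LEGAL_OBLIGATION = "legal obligation"

# One entry per processing path identified in the mapping step;
# None marks a gap the assessment still has to resolve.
bases = {
    "model inputs": LawfulBasis.CONTRACT,
    "generated outputs": LawfulBasis.CONTRACT,
    "training and fine-tuning data": None,  # service basis may not extend to training
    "retrieval sources": None,              # incidental personal data in the corpus
    "operational logs": LawfulBasis.LEGAL_OBLIGATION,
    "agent context": LawfulBasis.CONTRACT,
}

gaps = sorted(path for path, basis in bases.items() if basis is None)
if gaps:
    print("Paths with no documented lawful basis:", ", ".join(gaps))
```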
Training data lineage and memorization
Where the AI system involves training or fine-tuning models on data that includes personal information, the assessment should establish the full data lineage. Considerations include the origin of the training data, the lawful basis under which it was collected, the compatibility of the original purpose specification with model training, and the technical possibility of memorization, in which a trained model reproduces specific training records in its outputs.
Memorization is a documented characteristic of language models trained on personal data. Mitigations include data deduplication, training-time differential privacy, and output filtering. The assessment should document which mitigations apply and the residual risk after mitigation. An assessment that does not address memorization risk for a model trained on personal data is incomplete.
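Of these mitigations, deduplication is the most straightforward to illustrate: repeated training sequences are disproportionately likely to be memorized, so removing duplicates before training reduces the risk. A minimal sketch using whitespace-normalized exact matching; production pipelines typically add near-duplicate detection on top:

```python
import hashlib

def dedupe_training_records(records: list[str]) -> list[str]:
    """Drop exact duplicates after normalizing case and whitespace.

    Repeated sequences are the most likely to be memorized, so exact
    deduplication is a standard pre-training mitigation.
    """
    seen: set[str] = set()
    unique = []
    for record in records:
        normalized = " ".join(record.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```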
Output accuracy
Personal data produced by AI systems must satisfy the same accuracy obligations as personal data captured through other means. AI systems can produce outputs that contain inaccurate information about identifiable individuals. Where the inaccurate information could cause harm, the controller's obligation to maintain accuracy applies.
The assessment should examine the controls supporting output accuracy. Considerations include measurement of output accuracy across relevant use cases, factual grounding mechanisms that connect outputs to source material, processes for individuals to contest inaccurate output and obtain correction, and validation gates for high-stakes uses of system output. The assessment should treat accuracy as a continuing obligation rather than a point-in-time check.
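As one illustration of a validation gate, high-stakes outputs that make claims about an identifiable individual without grounding citations can be held for human review. The fields and routing labels below are illustrative assumptions, not features of any particular product:

```python
from dataclasses import dataclass

@dataclass
class Output:
    text: str
    about_individual: bool       # makes claims about an identifiable person
    source_citations: list[str]  # grounding references, if any
    high_stakes: bool            # use case flagged as high-stakes in the assessment

def release_or_review(output: Output) -> str:
    """Route ungrounded, high-stakes claims about individuals to human review."""
    if output.about_individual and output.high_stakes and not output.source_citations:
        return "hold-for-review"
    return "release"
```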
Automated decision-making
Where the AI system makes decisions affecting individuals, the assessment should examine whether the decisions fall within the scope of automated decision-making provisions in applicable privacy law. The threshold typically involves whether the decision is solely automated and whether it produces legal or similarly significant effects. AI systems often sit at the boundary of these provisions, with decisions that involve some human review or that produce effects of varying significance.
The assessment should examine each category of decision the system can produce and determine whether automated decision-making provisions apply. Where they apply, the assessment should document the human review provisions, the legal or significant effects on data subjects, the mechanisms for individuals to contest decisions, and the provision of meaningful explanation. Meaningful explanation is more challenging to provide for AI systems than for rule-based systems, and the assessment should examine how it will be supported.
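The two-part threshold can be made explicit per decision category, which keeps the boundary cases visible. A sketch of an Article 22-style test, with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class DecisionCategory:
    name: str
    solely_automated: bool        # no meaningful human involvement before the effect
    significant_effect: bool      # legal or similarly significant effect on the person
    human_review_available: bool  # mechanism to contest and obtain review
    explanation_mechanism: str    # how meaningful explanation is provided

def adm_provisions_apply(category: DecisionCategory) -> bool:
    """Both limbs of the threshold must be met for the provisions to apply."""
    return category.solely_automated and category.significant_effect
```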
Cross-border transfers and AI service providers
Most AI systems involve transfers to AI service providers, often across borders. Standard cross-border transfer mechanisms apply. The assessment should additionally examine considerations specific to AI providers, including the provider's position on using the transferred data for model training, the subprocessor chain including any foundation model providers, and the contractual protections that go beyond the standard transfer mechanism.
Transfer impact assessments for high-risk transfers should address the AI-specific dimensions rather than rely on generic adequacy analysis. Where the transfer involves data flowing into AI training pipelines, the assessment should establish that the contractual and technical controls prevent this where it is not authorized.
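These AI-specific dimensions can sit alongside the standard transfer record. A sketch with illustrative field names that flags transfers where data could reach a training pipeline without authorization:

```python
from dataclasses import dataclass, field

@dataclass
class AIProviderTransfer:
    provider: str
    destination_country: str
    transfer_mechanism: str         # e.g. standard contractual clauses
    provider_trains_on_data: bool   # the provider's stated training position
    training_bar_contractual: bool  # contract bars unauthorized training use
    subprocessors: list[str] = field(default_factory=list)  # incl. foundation model providers

def unauthorized_training_risk(transfer: AIProviderTransfer) -> bool:
    """Flag transfers where data could enter a training pipeline without authorization."""
    return transfer.provider_trains_on_data and not transfer.training_bar_contractual
```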
Data subject rights for AI processing
Each data subject right has specific implications for AI systems. Access requests against an AI system raise questions about what is in scope. Data subjects may reasonably ask for the prompts processed about them, the outputs the system generated, and the logs of their interactions with the system.
Deletion requests raise more complex questions. Removing primary records is straightforward. Removing the influence of the data on a trained model is technically difficult and may not be fully possible without retraining. The assessment should document the deletion capabilities honestly. Where full deletion of training-data influence is not possible, the assessment should state this and document the available approaches such as model retraining, output filtering, or unlearning techniques where they apply.
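Where output filtering is the available approach, a suppression list applied at generation time is its simplest form. A sketch using naive substring matching; a production filter would need robust entity matching rather than names alone:

```python
# Subjects whose deletion requests cannot be fully honored at the model
# level; outputs mentioning them are withheld instead.
suppressed_subjects: set[str] = set()

def record_deletion_request(subject_name: str) -> None:
    """Primary records are deleted elsewhere; this adds the output-filter entry."""
    suppressed_subjects.add(subject_name.lower())

def filter_output(text: str) -> str:
    """Withhold generated text that mentions a suppressed subject."""
    lowered = text.lower()
    if any(name in lowered for name in suppressed_subjects):
        return "[output withheld: subject of a deletion request]"
    return text
```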
Logging as its own processing activity
AI systems typically generate substantial logs of their operation. These logs contain personal data and constitute processing in their own right. The assessment should treat logging infrastructure as a processing activity with its own lawful basis, retention period, access controls, and data subject rights implications.
The logging infrastructure often supports legitimate purposes including security incident investigation, audit obligations, and AI Act compliance for high-risk systems. These purposes can support a lawful basis for the log retention. The assessment should document these purposes explicitly rather than treating logs as an incidental byproduct of system operation.
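Treating logs as a processing activity in their own right means attaching a documented purpose and retention period to each entry and enforcing them. A minimal sketch, with illustrative fields and timezone-aware timestamps assumed:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class LogEntry:
    timestamp: datetime  # timezone-aware
    purpose: str         # e.g. "security incident investigation"
    retention_days: int  # retention tied to the documented purpose
    payload: str         # may contain personal data

def purge_expired(entries: list[LogEntry]) -> list[LogEntry]:
    """Keep only entries still within the retention period for their purpose."""
    now = datetime.now(timezone.utc)
    return [e for e in entries
            if now - e.timestamp < timedelta(days=e.retention_days)]
```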
Special category data inferences
AI systems can produce outputs that constitute inferences about special category data, including health information, sexual orientation, and other categories that warrant heightened protection. Producing inferences about special category data constitutes processing of special category data and requires the corresponding lawful basis.
The assessment should examine whether the system produces or could produce special category inferences, even where special category data is not directly provided as input. Where the system could produce such inferences, the lawful basis for that processing should be established or controls implemented to prevent the inference from being produced or used.
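One preventive control is a gate that screens outputs for special category inferences before release. In the sketch below, the keyword triggers are stand-in assumptions; a real control would use a trained classifier, since such inferences rarely announce themselves:

```python
# Illustrative keyword triggers only; a deployed control would use a
# trained classifier rather than keyword matching.
_TRIGGERS = {
    "health": ("diagnosis", "medication", "chronic condition"),
    "sexual orientation": ("sexual orientation",),
    "religious belief": ("religious belief",),
}

def detect_special_category_inferences(text: str) -> set[str]:
    """Return the special categories the output appears to infer."""
    lowered = text.lower()
    return {category for category, words in _TRIGGERS.items()
            if any(word in lowered for word in words)}

def gate_output(text: str) -> str:
    """Withhold outputs carrying special category inferences pending a lawful basis."""
    if detect_special_category_inferences(text):
        return "[output withheld: possible special category inference]"
    return text
```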
Coordination with AI Act assessments
Where the AI system is classified as high-risk under the EU AI Act, the privacy impact assessment overlaps with AI Act conformity work. The two frameworks have different scopes but related concerns regarding data governance, transparency, human oversight, and accuracy. Conducting privacy and AI Act assessments as separate workstreams creates inefficiencies and risks gaps where considerations span both frameworks.
The practical approach is to coordinate the two assessments: cross-reference them, share evidence where appropriate, and treat overlapping considerations consistently. The privacy impact assessment can serve as the integration point, ensuring both sets of obligations are addressed coherently.
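A simple cross-reference register can make that coordination concrete. The section labels below are illustrative placeholders, not citations of either framework's actual structure:

```python
# Shared concerns mapped to where each assessment evidences them.
cross_references = {
    "data governance": {
        "pia": "training data lineage and lawful basis",
        "ai_act": "data governance documentation",
    },
    "transparency": {
        "pia": "data subject information and explanation",
        "ai_act": "instructions for use and transparency measures",
    },
    "human oversight": {
        "pia": "automated decision-making review provisions",
        "ai_act": "human oversight measures",
    },
    "accuracy": {
        "pia": "output accuracy controls",
        "ai_act": "accuracy, robustness, and cybersecurity requirements",
    },
}
```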
Building the AI dimension into the program
For privacy programs that have been conducting privacy impact assessments using traditional templates, the practical step is to develop an AI-specific addendum that captures the considerations above and integrate it into the standard assessment workflow for AI systems. The existing program structure continues to apply. The substance of the assessment expands to address the additional dimensions. The investment is modest relative to the gap in coverage that the standard template otherwise leaves.