Redacting Isn’t Enough: The Hidden Risks of Using AI Tools With Student or Patient Data Without a Formal Agreement
Disclaimer
I am not a lawyer, and this blog should not be construed as legal advice. Practitioners should consult with their institution’s IT, legal, and/or compliance office before entering any potentially identifiable information into an AI platform or large language model (LLM).
The growing misconception
Across trainings and professional forums, it’s becoming increasingly common to hear that it’s acceptable to use general-purpose AI tools—like ChatGPT, Gemini, or Claude—for writing reports or summarizing case data as long as identifying details such as names, schools, or locations are redacted.
Unfortunately, this advice is risky under both FERPA and HIPAA. Even when obvious identifiers are removed, the underlying data may still be personally identifiable or re-identifiable once processed by a large language model.
Why LLM data are rarely anonymous
Every time a practitioner interacts with an online LLM, information such as IP address, user name, or email handle is likely transmitted to the model’s servers. These quasi-identifiers may be linked with the content of your prompt, making it possible to trace a dataset back to an individual student or client.
Kotsenas and colleagues (2021) concluded that medical data contain quasi-identifiers—unique combinations of features such as anatomy, device serial numbers, or timestamps—that can re-identify a person when linked with other digital traces. They wrote that, in today’s digital ecosystem, patient data is never “truly anonymous but rather more or less costly to reidentify.”
When applied to education or psychology, even innocuous combinations—student interests, disability classification, or a rare behavioral profile—may act like digital fingerprints.
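To make the idea concrete, here is a minimal sketch in Python using entirely invented records. Nothing in it comes from a real case; it simply shows how a “redacted” summary can be traced to one individual by joining on the quasi-identifiers that survived redaction.

# A minimal, hypothetical sketch: every record below is invented.
redacted_case = {"grade": 4, "disability": "deaf-blindness"}  # shared without name or school
# An auxiliary source an insider may already have, e.g., a district roster.
roster = [
    {"name": "Student A", "grade": 4, "disability": "specific learning disability"},
    {"name": "Student B", "grade": 4, "disability": "deaf-blindness"},
    {"name": "Student C", "grade": 5, "disability": "deaf-blindness"},
]
# Link on the quasi-identifiers that survived redaction.
matches = [r for r in roster
           if r["grade"] == redacted_case["grade"]
           and r["disability"] == redacted_case["disability"]]
if len(matches) == 1:
    print(f"Unique match: the 'anonymous' case is {matches[0]['name']}")

Only two surviving attributes are needed here because one of them is rare; in a small district, that is often all it takes.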
FERPA vs. HIPAA: Different standards, same principle
FERPA and HIPAA have slightly different definitions of what counts as personally identifiable information (PII) or protected health information (PHI), but the principle is the same: data can be considered de-identified only when there is no reasonable basis to believe an individual could be identified.
FERPA prohibits disclosure of student information unless a reasonable person in the school community could not identify the student—even when multiple data points are combined.
HIPAA’s Safe Harbor standard lists 18 identifiers that must be removed and requires that the remaining data cannot reasonably lead to re-identification.
When practitioners use consumer-grade AI tools that lack a signed Data Use Agreement (DUA) or Business Associate Agreement (BAA), there are no such safeguards. These systems may store, log, or train on user input. In practice, that means any redacted case summary or assessment note could still be tied back to a real individual.
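To illustrate the gap between redaction and de-identification, here is a deliberately simplistic sketch; the note, name, and scrubbing rules are hypothetical. It strips the obvious identifiers a practitioner might think to remove, yet the context that remains still points to one student.

import re
# Hypothetical note and simplistic scrubbing rules, for illustration only.
note = ("Jordan Smith, seen 03/12/2025, contact jordan.smith@example.org. "
        "Fourth grader with deaf-blindness receiving braille instruction "
        "in a rural district with one elementary school.")
note = note.replace("Jordan Smith", "[NAME]")                   # known name
note = re.sub(r"\b\d{2}/\d{2}/\d{4}\b", "[DATE]", note)         # dates
note = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", note)  # email addresses
print(note)
# Names, dates, and emails are gone, but the rare disability and the
# one-school district remain: enough for a familiar reader to identify the student.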
NASP’s position: De-identification may not protect students
The National Association of School Psychologists (NASP) has cautioned:
“It is important to note that de-identifying the data we provide to AI systems may be insufficient to adequately protect the students we’re working with. Even when names and identifying information are removed, advanced algorithms may be capable of re-identifying individuals by analyzing patterns in the data.”
This aligns with findings across medicine and data science showing that even “scrubbed” datasets remain vulnerable once machine learning models can infer patterns, reconstruct missing variables, or cross-link sources.
Why this matters for schools and clinics
When a practitioner pastes a student narrative or therapy note into a general AI tool:
The data may be stored on remote servers.
The system can capture metadata such as IP address or institution domain.
Model developers may legally reuse the data for “service improvement.”
These actions can create unintended disclosures of protected data under both FERPA and HIPAA—even if the user believed the information was de-identified.
Consider that a few contextual details (e.g., a student with a low incidence disability in a small district) may be enough to identify a specific individual to anyone familiar with the case.
Safer paths forward
Use compliant AI platforms.
Whenever possible, select AI tools that are explicitly FERPA/HIPAA-compliant and supported by a formal Data Use Agreement (DUA) or Business Associate Agreement (BAA) with your organization (for example, enterprise deployments of OpenAI, Gemini for Education, or Microsoft Azure OpenAI). The presence of a signed agreement, rather than the tool’s name alone, determines compliance.
Run local or offline models.
For drafting, summarizing, or brainstorming tasks, consider using local models such as LM Studio or Ollama that process data entirely on your own device, minimizing the risk of external storage or cloud transmission (a minimal sketch appears at the end of this section).
Train your teams.
Emphasize that redaction is not equivalent to de-identification. Provide professional learning opportunities focused on privacy thresholds, ethical data use, and current institutional or state guidance on AI tools.
Advocate for clear institutional policies.
Encourage your organization, district, or agency to develop written AI-use policies defining which tools are approved, how compliance is verified, and when AI may be used for report writing, documentation, or data analysis. Such policies should be reviewed by legal and compliance professionals.
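For the local-model option above, the sketch below shows one way a drafting prompt could be sent to a model served by Ollama on the practitioner’s own machine. It assumes Ollama is installed and running locally with a model such as llama3 already pulled, and it is an illustration of local processing rather than an endorsement of any particular tool.

import requests
# Minimal sketch: prompt a locally hosted model through Ollama's local API.
# Assumes Ollama is running on this machine with the "llama3" model pulled,
# so the text never leaves the device in this configuration.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Rewrite this paragraph in plain, parent-friendly language: ...",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])

Even with a fully local setup, institutional policy still applies; running a model on your own device reduces transmission risk, but it does not replace review by IT, legal, or compliance.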
The bottom line
De-identification is not foolproof, and redaction alone may not protect sensitive information in an AI context. The patterns themselves—test scores, writing style, behavioral profiles—can act as unique identifiers.
Until federal privacy laws are updated to account for AI’s re-identification capabilities, the safest rule remains clear:
Never input any student or patient data into a general LLM unless the system is explicitly covered by a signed compliance agreement.
References
Kotsenas, A. L., Balthazar, P., Andrews, D., Geis, J. R., & Cook, T. S. (2021). Rethinking patient consent in the era of artificial intelligence and big data. Journal of the American College of Radiology, 18(1), 180–184.
Kancherla, J. (2023). Re-identification of health data through machine learning. Georgia Institute of Technology.
National Association of School Psychologists. (2024). Guidance on artificial intelligence in school psychology practice. https://www.nasponline.org
Author’s Note: Portions of this article were drafted with assistance from OpenAI’s GPT technology to support clarity and organization. All content, interpretations, and conclusions reflect my own professional judgment and responsibility.