De-Identified and Still Exposed: How Clinical AI Memorises Individual Patients

Somewhere inside a foundation model trained on millions of supposedly de-identified electronic health records, a ghost lingers. Not a literal one, of course, but a data spectre: the clinical history of a patient whose records were stripped of names, addresses, and social security numbers before ever touching an algorithm. The model was never supposed to remember this person. It was supposed to learn medicine. Instead, it learned a patient.
This is the memorisation problem, and it is rapidly becoming one of the most consequential privacy challenges in clinical artificial intelligence. As healthcare systems worldwide rush to deploy foundation models trained on vast troves of electronic health record data, researchers are discovering that de-identification, the process long treated as the gold standard for protecting patient privacy, may not be enough. These models do not merely generalise medical knowledge from the populations they study. In some cases, they memorise individual patient records with enough fidelity that an adversary armed with the right prompts could extract sensitive clinical details about real people.
The implications are profound. A patient with a rare autoimmune disorder, an individual whose HIV status was recorded during a hospital visit, a person who sought treatment for substance use: these are precisely the kinds of patients whose records are most vulnerable to memorisation, because their clinical profiles are, by definition, unusual. And unusualness is exactly what makes data memorable to a machine learning model.
When Models Remember What They Should Forget
In October 2025, a team of researchers led by Sana Tonekaboni, a postdoctoral fellow at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, published a paper that would reframe how the clinical AI community thinks about privacy. The study, “An Investigation of Memorization Risk in Healthcare Foundation Models,” was presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS) in San Diego. Co-authored with Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, and Marzyeh Ghassemi, an associate professor of electrical engineering and computer science at MIT, the paper introduced a suite of black-box evaluation tests designed to probe whether foundation models trained on structured electronic health records were genuinely generalising medical knowledge or simply recalling individual patients.
The distinction matters enormously. A model that generalises has learned, say, that patients over 65 with elevated troponin levels and chest pain are at high risk of myocardial infarction. That knowledge draws on thousands of patient records and reflects a genuine population-level pattern. But a model that memorises has locked onto a singular patient record, and when prompted with the right combination of attributes, it can reproduce details about that specific individual. “Knowledge in these high-capacity models can be a resource for many communities,” Tonekaboni explained, “but adversarial attackers can prompt a model to extract information on training data.”
The framework the team developed includes methods for probing memorisation at both the embedding level, where models encode patient data as numerical representations, and the generative level, where models produce clinical outputs. Crucially, the researchers designed their tests to distinguish between benign generalisation and genuinely harmful memorisation. Not all information leakage is created equal. If a model reveals that a particular patient profile tends to involve elderly males, that reflects population statistics. If it reveals that a specific combination of laboratory values, timestamps, and diagnostic codes corresponds to a single identifiable individual, that constitutes a privacy breach.
The findings were sobering. The researchers demonstrated that the more prior knowledge an attacker possesses about a particular patient, the more likely the model is to leak additional information. Patients with rare conditions proved especially vulnerable, precisely because their clinical signatures are distinctive enough to be picked out from the broader training distribution. And while some categories of leaked information, such as a patient's age or gender, represent relatively low risk, others carry serious consequences. Diagnoses related to HIV, substance use disorders, or mental health conditions were flagged as potentially harmful disclosures that could damage a person's employment prospects, insurance coverage, or social standing.
Ghassemi, the paper's senior author, offered a practical framing of the threat. “We really tried to emphasise practicality here,” she noted. “If an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?” The question cuts to the heart of the adversarial calculus: how much prior knowledge makes an attack feasible, and at what point does memorisation cross from theoretical vulnerability to practical danger?
The Adversary's Toolkit
To understand the scale of the memorisation threat, it helps to look beyond healthcare-specific models to the broader landscape of large language model security research. The foundational work in this space comes from Nicholas Carlini and colleagues, whose research at Google DeepMind and collaborating institutions has systematically demonstrated that language models memorise and can be made to regurgitate their training data.
In a landmark 2021 paper published at USENIX Security, Carlini, along with Florian Tramer, Eric Wallace, and others, showed that an adversary could extract hundreds of verbatim text sequences from GPT-2, including personally identifiable information such as names, phone numbers, and email addresses. The attack required no access to the training data itself, only the ability to query the model. By 2023, the same research group, now including Milad Nasr, Daphne Ippolito, and Christopher Choquette-Choo, had scaled their methods dramatically. Their paper “Scalable Extraction of Training Data from (Production) Language Models” demonstrated that an adversary could extract gigabytes of training data from both open-source models and commercial systems including ChatGPT.
The 2023 work introduced a particularly concerning technique: the divergence attack. By crafting prompts that cause a model to diverge from its normal conversational behaviour, the researchers achieved training data emission rates up to 150 times higher than those observed during typical usage. The attack essentially tricks aligned models into reverting to their pre-alignment behaviour, at which point they begin outputting memorised sequences with alarming fidelity.
What does this mean for clinical AI? The attack surface is substantial. An electronic health record foundation model trained on millions of patient records contains, by design, sensitive clinical information. Even if the records have been de-identified according to HIPAA standards, the model itself may have encoded enough information to reconstruct individual patient profiles when queried with the right combination of clinical attributes. A rare diagnosis combined with a specific age range and a distinctive pattern of laboratory values could function as a fingerprint, allowing an attacker to extract additional details that the de-identification process was supposed to protect.
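To make the fingerprint idea concrete, here is a deliberately toy sketch, not the extraction attacks described above, but a miniature stand-in: a bigram "language model" that has seen one rare patient record. The token names and record contents are invented for illustration; the point is that a distinctive prefix is enough to regurgitate the rest of the memorised sequence verbatim.

```python
from collections import defaultdict, Counter

def train_bigram(sequences):
    """A deliberately tiny 'language model': bigram counts over tokens."""
    model = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            model[cur][nxt] += 1
    return model

def greedy_complete(model, prompt, max_tokens=5):
    """Greedy decoding: always emit the most frequent continuation."""
    out = list(prompt)
    for _ in range(max_tokens):
        candidates = model.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return out

# Training corpus: common patterns, plus one distinctive (rare) record.
corpus = [
    ["age:71", "dx:chest-pain", "lab:troponin-high"],
    ["age:68", "dx:chest-pain", "lab:troponin-high"],
    ["age:34", "dx:rare-autoimmune", "lab:ana-positive", "status:hiv-positive"],
]
model = train_bigram(corpus)

# A rare diagnosis code acts as a fingerprint: prompting with it
# reproduces the rest of that single patient's record verbatim.
leaked = greedy_complete(model, ["dx:rare-autoimmune"])
print(leaked)  # ['dx:rare-autoimmune', 'lab:ana-positive', 'status:hiv-positive']
```

Real foundation models are vastly larger, but the mechanism scales: the more singular a record is within the training distribution, the more a prefix of it behaves like a key that unlocks the remainder.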
The level of prior knowledge required for a successful attack varies depending on the model architecture, training methodology, and the patient population in question. Research on general-purpose language models suggests that model size strongly correlates with memorisation: larger models, with their greater capacity to store training data patterns, are more vulnerable to extraction attacks. Given that clinical foundation models are trending towards ever-larger architectures to capture the complexity of medical knowledge, this scaling relationship poses a direct tension between clinical utility and patient privacy.
De-identification Was Never Bulletproof
The memorisation problem does not exist in isolation. It builds upon decades of research demonstrating that de-identification of health data has always been more fragile than regulators and healthcare institutions have assumed.
The seminal work in this field belongs to Latanya Sweeney, now the Daniel Paul Professor of the Practice of Government and Technology at the Harvard Kennedy School. In 1997, while still a graduate student at MIT, Sweeney demonstrated that she could re-identify the medical records of then-Massachusetts Governor William Weld by cross-referencing publicly available voter registration data with de-identified hospital discharge records. The records had been stripped of names, addresses, and social security numbers, but they retained date of birth, gender, and ZIP code. Sweeney showed that just these three attributes were sufficient to uniquely identify an individual.
Her subsequent research revealed that 87 per cent of the United States population could be uniquely identified using only ZIP code, date of birth, and gender, a finding that helped shape the HIPAA Privacy Rule's Safe Harbor de-identification standard. Yet even with these protections in place, re-identification remains possible. A 2018 study demonstrated that patients could be re-identified from HIPAA-compliant de-identified datasets by cross-referencing them with publicly available newspaper articles about hospitalisations.
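Sweeney's uniqueness argument is easy to reproduce on any dataset. The sketch below, on an invented four-record cohort, counts what fraction of records are singled out by a chosen combination of quasi-identifiers; on real population data, the (ZIP, date of birth, sex) triple isolates the vast majority of individuals.

```python
from collections import Counter

def uniqueness_rate(records, quasi_ids):
    """Fraction of records whose quasi-identifier combination is unique."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Toy cohort: names removed, but ZIP, date of birth, and sex retained.
records = [
    {"zip": "02139", "dob": "1945-07-31", "sex": "M", "dx": "I21"},
    {"zip": "02139", "dob": "1945-07-31", "sex": "M", "dx": "E11"},
    {"zip": "02139", "dob": "1962-03-14", "sex": "F", "dx": "J45"},
    {"zip": "01002", "dob": "1980-11-02", "sex": "F", "dx": "B20"},
]

rate = uniqueness_rate(records, ["zip", "dob", "sex"])
print(f"{rate:.0%} of records are unique on (ZIP, DOB, sex)")  # 50%
```

Any record that is unique on the quasi-identifiers can be linked back to a named individual by joining against an external dataset, such as a voter roll, that carries the same three fields.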
A 2025 paper published in AI and Ethics highlighted the particular challenge of clinical free text. Structured data fields like diagnosis codes and laboratory values can be systematically scrubbed, but clinical notes contain narrative descriptions that may include identifying details embedded in the prose: references to a patient's occupation, family circumstances, or the name of a referring physician. De-identification tools, including those powered by natural language processing, struggle with the ambiguity and variability of clinical language.
The emergence of foundation models adds a new dimension to this longstanding vulnerability. Traditional re-identification attacks required an adversary to obtain and cross-reference multiple external datasets. Memorisation attacks against AI models require only the ability to query the model itself. The model becomes both the target and the pathway to the data it was trained on, collapsing what was previously a multi-step process into a single interaction. A 2025 study indexed in PubMed Central on contemporary threats to anonymised healthcare data warned that AI-based techniques can now infer identity from traditionally de-identified sources using data such as electrocardiograms or patterns of gait, data types that were never considered identifiers under existing privacy frameworks.
How AI Threats Compare with Conventional Cybersecurity Risks
The memorisation vulnerability exists within a broader landscape of healthcare cybersecurity threats that are already severe and worsening. Understanding how AI-specific risks compare with conventional attack vectors is essential for calibrating the response.
The numbers from conventional healthcare cybersecurity are staggering. In 2024, 259 million Americans had their protected health information compromised through hacking incidents, a figure driven overwhelmingly by the Change Healthcare ransomware attack. That single breach, perpetrated by the ALPHV/BlackCat ransomware group, affected approximately 190 million individuals after attackers exploited a Citrix remote access service that lacked multi-factor authentication. UnitedHealth Group, Change Healthcare's parent company, reported total cyberattack impacts of 2.457 billion dollars in the first nine months of 2024 alone.
The healthcare sector has become the most targeted industry for ransomware, accounting for 17 per cent of all ransomware attacks across sectors. Complete protected health information packages command prices of up to 1,200 dollars per record on criminal marketplaces, roughly 80 times the value of stolen credit card data. Over 80 per cent of stolen health records in 2024 were taken not from hospitals directly but from third-party vendors, software services, and business associates, highlighting the systemic nature of the vulnerability.
Against this backdrop, AI memorisation attacks represent a qualitatively different kind of threat. Conventional breaches involve exfiltrating stored data, breaking through perimeters, and exploiting network vulnerabilities. Memorisation attacks exploit the model itself as an unwitting data store. There is no firewall to breach, no database to penetrate. The sensitive information is encoded within the model's parameters, distributed across billions of numerical weights in ways that resist simple detection or removal. An attacker needs nothing more than API access to the model, which in many clinical deployment scenarios would be available to any authorised user of the system.
The two categories of threat also differ in their detectability. A ransomware attack produces obvious signs: encrypted systems, operational disruption, ransom demands. A memorisation extraction attack can be conducted through queries that resemble normal clinical usage, making it far harder to detect. Medical identity theft already takes an average of 24 months to discover, compared with four months for financial fraud. Memorisation-based data extraction could extend this detection timeline even further, because the data never technically leaves the system in the conventional sense.
Yet it would be a mistake to treat AI memorisation as the dominant threat. The scale of conventional breaches dwarfs anything that memorisation attacks have demonstrated in practice. The Change Healthcare incident compromised the records of roughly 190 million people in a single event. Memorisation attacks, by contrast, tend to target individual patients or small groups, requiring specific prior knowledge about each target. The threat from memorisation is more surgical than it is sweeping, but for the individuals affected, particularly those with rare conditions or stigmatising diagnoses, the consequences could be devastating.
The Regulatory Patchwork
The regulatory response to AI memorisation risks in healthcare remains fragmented and, in many respects, inadequate. Existing frameworks were designed for a world where privacy threats came from databases, not algorithms.
In the United States, HIPAA remains the foundational framework for protecting health information, but it was enacted in 1996, long before the emergence of clinical AI. The proposed update to the HIPAA Security Rule, published by the Department of Health and Human Services in January 2025, represents the first major revision in over a decade. The proposal eliminates the distinction between “required” and “addressable” security controls, mandates encryption for all electronic protected health information, and introduces multi-factor authentication requirements. Critically, it establishes that data used in AI training, prediction models, and algorithm development by regulated entities falls under HIPAA's protections.
However, the proposed rule does not specifically address memorisation risks. It treats AI systems primarily through the lens of conventional cybersecurity: access controls, encryption, audit logging. These measures are necessary but insufficient for a threat that is embedded within the model's learned representations rather than stored in a conventional database. The public comment period for the proposed rule closed in March 2025 with nearly 5,000 submissions, and the final rule is expected in late 2025 or 2026. Whether it will address the unique characteristics of AI memorisation remains uncertain.
The European Union's approach through the AI Act offers somewhat more specificity. The regulation classifies AI systems used in healthcare as high-risk, subjecting them to requirements for data governance, transparency, human oversight, and post-market monitoring. From August 2026, most obligations will apply, with full compliance for high-risk medical device AI required by August 2027. The Medical Device Coordination Group published guidance document MDCG 2025-6 to clarify how the AI Act interacts with existing medical device regulations under the MDR and IVDR frameworks.
The AI Act's data governance requirements are particularly relevant to memorisation. High-risk AI manufacturers must implement practices appropriate for the intended purpose, including attention to possible biases and privacy risks. The transparency obligations require that systems be designed to allow deployers to interpret outputs and use systems appropriately. These provisions create a regulatory foundation that could, in principle, require memorisation testing before deployment. But the specifics of implementation remain to be worked out through standards and guidance that have not yet been finalised.
At the state level in the United States, a patchwork of legislation is emerging. By 2025, over 250 AI-related bills had been introduced across more than 34 states. Texas enacted the Responsible Artificial Intelligence Governance Act in June 2025, requiring healthcare practitioners to provide patients with written disclosure of AI use in diagnosis or treatment. Colorado and Utah have enacted their own comprehensive AI laws. The result is a fragmented landscape that creates compliance challenges for healthcare organisations operating across jurisdictions whilst providing inconsistent protection for patients.
Technical Safeguards and Their Limits
The technical toolkit for mitigating memorisation risks is growing, though no single approach offers a complete solution.
Differential privacy, the mathematical framework developed by computer scientists including Cynthia Dwork of Harvard University, provides formal guarantees about information leakage during model training. By adding carefully calibrated statistical noise to the training process, differential privacy ensures that the model's outputs reveal almost nothing about any individual training example. Recent research has demonstrated that healthcare AI models can achieve 96.1 per cent accuracy with a privacy budget of epsilon equals 1.9, suggesting that strong privacy and high clinical performance can coexist.
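The mechanics behind the epsilon figure can be shown on a single statistic. The sketch below is not DP-SGD training (which is what the accuracy result above refers to) but the simpler Laplace mechanism on a counting query: noise scaled to sensitivity divided by epsilon, so a tighter privacy budget means a noisier answer. The cohort size and epsilon values are illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverting the CDF of a uniform draw."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, rng):
    """Release a patient count with epsilon-differential privacy.
    A counting query has sensitivity 1: adding or removing one patient
    changes the true answer by at most 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
true_n = 100  # e.g. patients in a cohort with a given diagnosis

tight = [abs(private_count(true_n, 0.1, rng) - true_n) for _ in range(2000)]
loose = [abs(private_count(true_n, 1.9, rng) - true_n) for _ in range(2000)]

# Smaller epsilon (stronger privacy) means more noise per released statistic.
print(f"mean error at epsilon 0.1: {sum(tight) / len(tight):.1f}")
print(f"mean error at epsilon 1.9: {sum(loose) / len(loose):.2f}")
```

The expected absolute error equals the noise scale, 1/epsilon, which is why the privacy-utility trade-off discussed next is unavoidable: every tightening of the budget degrades the released statistic by a predictable amount.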
Yet differential privacy has limitations. The privacy-utility trade-off is real: stronger privacy guarantees require more noise, which can degrade model performance on clinical tasks where accuracy directly affects patient outcomes. The United States Census Bureau's experience with differential privacy in the 2020 census provides a cautionary example. Research found that the technique introduced disproportionate discrepancies for rural and non-white populations, raising concerns about equity impacts that would be equally relevant in clinical settings where underrepresented populations already face disparities in care.
Federated learning offers another approach, keeping patient data decentralised across institutions whilst training a shared model. Rather than aggregating raw data on a central server, each participating hospital trains the model locally and shares only model updates. Yet research has shown that these model updates themselves can leak information. Gradient inversion attacks can reconstruct substantial portions of original training data from the mathematical updates exchanged during federated learning. A study titled “Two Models are Better than One: Federated Learning Is Not Private for Google GBoard Next Word Prediction” demonstrated that user sentences could be reconstructed from model updates alone.
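The protocol itself is simple to sketch. Below is a minimal federated averaging loop for a one-parameter linear model, with three invented "hospitals" each holding private (x, y) pairs: each trains locally and only the updated weight is shared. The gradient inversion results cited above show that even these shared updates can betray the underlying records.

```python
def local_update(w, data, lr=0.1):
    """One local pass of SGD for a 1-D linear model y = w * x (squared loss)."""
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def fed_avg_round(global_w, hospital_datasets, lr=0.1):
    """One round of federated averaging: hospitals train locally on private
    data and share only their updated weights, never the records themselves."""
    local_ws = [local_update(global_w, d, lr) for d in hospital_datasets]
    return sum(local_ws) / len(local_ws)

# Three hospitals, each holding private (x, y) pairs drawn from y = 2x.
hospitals = [
    [(0.5, 1.0), (1.0, 2.0)],
    [(0.8, 1.6), (0.3, 0.6)],
    [(0.9, 1.8), (0.6, 1.2)],
]

w = 0.0
for _ in range(30):
    w = fed_avg_round(w, hospitals)
print(f"learned slope: {w:.3f}")  # converges towards 2.0
```

The shared model converges to the population-level pattern without any raw data crossing institutional boundaries, which is the privacy promise; the attack literature shows that the weight deltas exchanged each round are nonetheless a function of the private data and can sometimes be inverted.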
Machine unlearning, the targeted removal of specific patient data from a trained model, has emerged as a conceptually appealing response to memorisation. The approach aligns with the General Data Protection Regulation's right to be forgotten, which allows individuals to request deletion of their personal data. Research presented at MICCAI 2025 introduced Forget-MI, a method for unlearning multimodal medical data from trained architectures. A December 2025 testbed called MedForget modelled hospital data as a nested hierarchy, enabling fine-grained unlearning assessment across multiple organisational levels.
But machine unlearning faces fundamental practical barriers. Retraining a model from scratch without specific patient data remains the only guaranteed path to complete unlearning, and for large foundation models, retraining can take weeks and cost millions of dollars. Approximate unlearning methods are faster but cannot guarantee that all traces of a patient's data have been removed. Moreover, if certain demographic groups are more likely to exercise their right to be forgotten, the resulting training data could become skewed, potentially worsening the very biases that clinical AI is supposed to help address. As a Health Affairs analysis noted, machine unlearning “is computationally intensive, scientifically immature, and potentially destabilising to models that must remain reliable across a wide range of clinical inputs.”
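The one guaranteed form of unlearning, retraining from scratch, is trivial for a toy model and prohibitive for a foundation model, which is exactly the barrier described above. The sketch uses an invented aggregation "model" (mean lab value per diagnosis code) to show what exact unlearning means: rebuild the model from a dataset with the patient filtered out.

```python
def train(records):
    """Toy 'model': mean lab value per diagnosis code (pure aggregation)."""
    sums, counts = {}, {}
    for r in records:
        dx = r["dx"]
        sums[dx] = sums.get(dx, 0.0) + r["lab"]
        counts[dx] = counts.get(dx, 0) + 1
    return {dx: sums[dx] / counts[dx] for dx in sums}

def unlearn_exact(records, patient_id):
    """Guaranteed unlearning: retrain from scratch without the patient.
    For a foundation model, this same operation can take weeks and cost
    millions of dollars, which is why approximate methods are attractive."""
    return train([r for r in records if r["pid"] != patient_id])

records = [
    {"pid": "p1", "dx": "E11", "lab": 7.2},
    {"pid": "p2", "dx": "E11", "lab": 6.8},
    {"pid": "p3", "dx": "B20", "lab": 5.1},
]

full = train(records)
forgotten = unlearn_exact(records, "p3")
print("B20" in full, "B20" in forgotten)  # True False
```

Note the side effect visible even here: removing the only patient with a rare code erases the model's knowledge of that code entirely, a miniature version of the skew that selective right-to-be-forgotten requests could introduce at scale.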
Data deduplication, the removal of repeated training examples, provides a simpler but partial mitigation. Research has consistently shown that models are more likely to memorise data that appears multiple times in training sets. Curating and deduplicating training data can reduce memorisation rates, though it cannot eliminate the risk entirely for patients whose clinical profiles are inherently distinctive.
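Exact deduplication is the most mechanical of the mitigations, and a hash-based sketch captures it. Canonicalising each record before hashing ensures that the same record serialised in a different field order is still recognised as a duplicate; the example records are invented.

```python
import hashlib
import json

def deduplicate(records):
    """Drop exact-duplicate records before training; repeated examples are
    disproportionately likely to be memorised verbatim."""
    seen, kept = set(), []
    for record in records:
        # Canonical JSON so field order does not affect the hash.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(record)
    return kept

records = [
    {"dx": "I21", "age": 71},
    {"age": 71, "dx": "I21"},  # same record, different field order
    {"dx": "J45", "age": 34},
]
print(len(deduplicate(records)))  # 2
```

Production pipelines also deduplicate near-matches (records that differ by a timestamp or a trailing field), which exact hashing misses; that is one reason deduplication reduces memorisation rates but cannot eliminate them.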
Building an Evaluation Framework That Works
The MIT team's work points towards what a comprehensive evaluation framework for clinical AI memorisation might look like. Their open-source toolkit, validated on a publicly available electronic health record foundation model, provides a starting point for systematic privacy assessment before model deployment.
The framework's key innovation is contextualising memorisation within healthcare. Not all information leakage constitutes a meaningful privacy risk. A model that reveals population-level patterns, such as the typical age distribution of patients with a particular condition, is doing exactly what it was designed to do. The danger arises when a model's outputs can be traced to a specific individual, particularly when the leaked information includes sensitive diagnoses or treatment histories.
Tonekaboni emphasised the importance of practical evaluation. “This work is a step towards ensuring there are practical evaluation steps our community can take before releasing models,” she said. The framework assesses both embedded memorisation, where patient information is encoded in the model's internal representations, and generative memorisation, where the model can be prompted to produce patient-specific outputs. By testing across both dimensions, the framework provides a more complete picture of privacy risk than either approach alone.
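One standard black-box probe in this family, not the team's specific method, but illustrative of the generative-side tests, is loss-based membership inference: a model that has memorised a record reconstructs it with anomalously low error. The "model" below is an invented worst case that stores training outcomes verbatim, making the signal easy to see.

```python
import statistics

# A trivially overfit "model": it stores each training patient's outcome
# exactly and falls back to the population mean for anyone unseen.
train_set = {"p1": 4.0, "p2": 7.0, "p3": 5.5}
population_mean = statistics.mean(train_set.values())

def predict(patient_id):
    return train_set.get(patient_id, population_mean)

def membership_score(patient_id, true_outcome):
    """Reconstruction error; anomalously low error hints at memorisation."""
    return (predict(patient_id) - true_outcome) ** 2

def likely_member(patient_id, true_outcome, threshold=0.25):
    return membership_score(patient_id, true_outcome) < threshold

print(likely_member("p1", 4.0))   # True: the record was memorised
print(likely_member("p9", 6.0))   # False: unseen patient, generic prediction
```

A well-generalising model would show similar error on members and non-members; a widening gap between the two distributions is the statistical fingerprint of memorisation that evaluations of this kind are designed to measure.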
For this kind of evaluation to become standard practice, it would need to be integrated into the regulatory approval process for clinical AI systems. Currently, most AI-enabled medical devices in the United States are cleared through the FDA's 510(k) pathway, which requires demonstration of substantial equivalence to a legally marketed predicate device but does not mandate independent clinical performance studies or privacy evaluation. A cross-sectional study of 903 FDA-approved AI devices found that clinical performance studies were reported for only approximately half at the time of regulatory approval. Memorisation testing is not part of the approval process at all.
The Coalition for Health AI (CHAI), on whose working group Ghassemi serves, represents one effort to establish industry-wide standards for trustworthy health AI. The NIST AI Risk Management Framework provides a complementary structure, addressing validity, reliability, safety, security, explainability, privacy, and fairness. Integrating memorisation evaluation into these existing frameworks would be more practical than creating entirely new regulatory apparatus, but it requires agreement on what constitutes acceptable levels of memorisation risk, a question that remains open.
Rare Conditions, Outsized Vulnerability
The memorisation problem falls hardest on the patients who can least afford it. Individuals with rare diseases, by definition, have clinical profiles that stand out from the broader population. Their diagnostic codes appear infrequently in training data. Their laboratory value patterns are unusual. Their treatment trajectories are distinctive. All of these characteristics make their records more memorable to a model and more extractable by an adversary.
The same is true for patients with stigmatising diagnoses. HIV status, substance use disorders, psychiatric conditions, and sexually transmitted infections all carry social consequences that extend far beyond the clinical encounter. Disclosure of these conditions can affect employment, housing, insurance, and personal relationships. De-identification was supposed to sever the link between these sensitive details and the individuals they describe. Memorisation threatens to re-forge that link through the model itself.
This disproportionate vulnerability raises equity concerns that mirror broader patterns in healthcare AI. Research has repeatedly shown that AI systems can perpetuate and amplify existing biases against marginalised populations. If memorisation risks are concentrated among patients with rare or stigmatising conditions, the privacy burden falls most heavily on those who are already underserved by the healthcare system.
Addressing this inequity requires targeted protections. Higher levels of differential privacy noise could be applied to training data involving sensitive diagnoses, at the cost of reduced model performance for those specific conditions. Rare disease patient records could be excluded from training sets entirely, though this would eliminate the clinical utility of foundation models for precisely the populations that stand to benefit most from AI-assisted care. Neither option is satisfactory, and the tension between privacy protection and clinical benefit for rare disease patients may prove to be one of the defining challenges of clinical AI governance.
What Genuine Protection Requires
The path from current vulnerability to genuine protection requires action across multiple domains simultaneously. No single technical safeguard, regulatory standard, or evaluation framework will suffice in isolation.
On the technical side, differential privacy during training should become the default rather than the exception for clinical foundation models. Memorisation evaluation, using frameworks like the one developed by Tonekaboni and colleagues, should be mandatory before any model is deployed in a clinical setting. Ongoing monitoring should be built into deployment infrastructure to detect potential memorisation-based extraction attempts in real time. And machine unlearning capabilities, however immature, should be developed and standardised so that patients can exercise meaningful control over the fate of their data within AI systems.
On the regulatory side, HIPAA needs to evolve beyond its current framework to address threats that are embedded within model architectures rather than stored in conventional databases. The EU AI Act's high-risk classification for healthcare AI provides a useful starting point, but implementation guidance must specifically address memorisation risks. Regulatory bodies including the FDA, the European Medicines Agency, and national health authorities need to incorporate memorisation testing into their approval and post-market surveillance processes.
On the institutional side, healthcare organisations deploying clinical AI must treat memorisation as a distinct category of risk requiring its own governance structures, audit procedures, and incident response plans. The conventional cybersecurity toolkit, with its emphasis on perimeter defence, encryption, and access control, is necessary but not sufficient for threats that live inside the model rather than outside the firewall.
The researchers behind the MIT study plan to expand their work to become more interdisciplinary, bringing in clinicians, privacy experts, and legal scholars. That instinct is exactly right. The memorisation problem sits at the intersection of computer science, medicine, law, and ethics, and solving it will require all four disciplines working in concert.
“There's a reason our health data is private,” Tonekaboni observed. “There's no reason for others to know about it.” That principle has guided health privacy law for decades. The question now is whether it can survive the age of foundation models trained on the very data it was designed to protect. The answer will depend on whether the clinical AI community treats memorisation as a fundamental design constraint rather than an afterthought, building privacy into the architecture of these systems from the ground up rather than bolting it on after deployment. The technology to do so exists. Whether the will and the regulatory momentum exist to mandate it remains the open question.
References and Sources
Tonekaboni, S., Stempfle, L., Fallahpour, A., Gerych, W., and Ghassemi, M. “An Investigation of Memorization Risk in Healthcare Foundation Models.” arXiv:2510.12950, presented at NeurIPS 2025. https://arxiv.org/abs/2510.12950
MIT News. “MIT scientists investigate memorization risk in the age of clinical AI.” January 5, 2026. https://news.mit.edu/2026/mit-scientists-investigate-memorization-risk-clinical-ai-0105
Carlini, N., Tramer, F., Wallace, E., et al. “Extracting Training Data from Large Language Models.” USENIX Security 2021. https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting
Nasr, M., Carlini, N., Hayase, J., et al. “Scalable Extraction of Training Data from (Production) Language Models.” arXiv:2311.17035, 2023. https://arxiv.org/abs/2311.17035
Sweeney, L. “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University, Data Privacy Working Paper 3, 2000. https://dataprivacylab.org/people/sweeney/work/index.html
Sweeney, L. “Risks to Patient Privacy: A Re-identification of Patients in Maine and Vermont Statewide Hospital Data.” Technology Science, 2018. https://techscience.org/a/2018100901/
PMC. “Addressing contemporary threats in anonymised healthcare data using privacy engineering.” 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC11885643/
Springer Nature. “What is the patient re-identification risk from using de-identified clinical free text data for health research?” AI and Ethics, 2025. https://link.springer.com/article/10.1007/s43681-025-00681-0
HIPAA Journal. “Healthcare Data Breach Statistics.” https://www.hipaajournal.com/healthcare-data-breach-statistics/
UnitedHealth Group. “UnitedHealth Group Updates on Change Healthcare Cyberattack.” April 22, 2024. https://www.unitedhealthgroup.com/newsroom/2024/2024-04-22-uhg-updates-on-change-healthcare-cyberattack.html
HHS. “Changes Proposed to Strengthen HIPAA Security Rule.” January 2025. https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html
Reed Smith. “The EU AI Act and Medical Devices: Navigating High-Risk Compliance.” 2025. https://www.reedsmith.com/our-insights/blogs/viewpoints/102kq35/the-eu-ai-act-and-medical-devices-navigating-high-risk-compliance/
European Commission. “Medical Devices Joint Artificial Intelligence Board, MDCG 2025-6.” 2025. https://health.ec.europa.eu/document/download/b78a17d7-e3cd-4943-851d-e02a2f22bbb4_en
Health Affairs Forefront. “Unlearning In Medical AI: A New Frontier For Privacy, Regulation, And Trust.” 2025. https://www.healthaffairs.org/content/forefront/unlearning-medical-ai-new-frontier-privacy-regulation-and-trust
MICCAI 2025. “Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings.” https://arxiv.org/html/2506.23145
MedForget. “MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI.” December 2025. https://arxiv.org/html/2512.09867v1
Nature Medicine. “Medical large language models are vulnerable to data-poisoning attacks.” January 2025. https://www.nature.com/articles/s41591-024-03445-1
Becker's Hospital Review. “EHR-trained AI could compromise patient privacy: MIT.” 2026. https://www.beckershospitalreview.com/healthcare-information-technology/ai/ehr-trained-ai-could-compromise-patient-privacy-mit/
Cobalt. “Healthcare Data Breach 2025 Statistics.” https://www.cobalt.io/blog/healthcare-data-breach-statistics
NIST AI Risk Management Framework. https://www.nist.gov/artificial-intelligence/ai-risk-management-framework

Tim Green UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk