The Knock at the Door: Predictive Scoring and Child Welfare Rights

The knock comes on a Tuesday, late afternoon, when the rice is still on the hob and the youngest is doing homework at the kitchen table. A caseworker in a thin coat introduces herself, asks if she can come in, and explains that the city has received a report and is required to follow up. The mother, who has lived in the same flat for nine years and has never had a child welfare investigation in her life, asks who made the report. The caseworker hesitates. It is not exactly a report, she says. It is a flag.
This is the moment, repeated thousands of times a year across American cities, when a family discovers that they have been the subject of attention they did not know was possible. The flag did not come from a neighbour or a teacher or a paediatrician. It came from a model. A risk-scoring system, fed on years of administrative data, generated a number that crossed a threshold inside a software dashboard at the local child protective services office. A screener saw the number. A supervisor signed off. A caseworker was dispatched. Somewhere along that chain, a human being had to make a final decision, but the decision was anchored, framed, and quietly shaped by an output that nobody in the home would ever see.
The mother in the flat has no right to see the score. She has no right to know which features pushed her family above the threshold. She has no right to challenge any one of those features in front of a neutral reviewer. She has no right, in any meaningful sense, to know that the algorithm exists.
That is where American child welfare sits in the spring of 2026: an expanding lattice of predictive systems, deployed inside agencies whose decisions can place a family under state surveillance and, in the worst cases, separate parents from their children, operating almost entirely outside the procedural rights that any other consequential decision in modern life would attract. A family flagged by a credit-scoring algorithm has more statutory recourse than a family flagged by a child welfare risk model. A driver flagged by a parking enforcement camera has more transparency. A tenant flagged by an algorithmic landlord screen has more legal scaffolding to push back. The state has built one of the most invasive deployments of pattern-matching in American public administration, and it has done so on top of the thinnest possible layer of due process.
The Markup Investigation and What It Found
In 2025, The Markup published an investigation into the Administration for Children's Services in New York City, the agency that handles child abuse and neglect reports for roughly 1.6 million children. The investigation, drawing on internal documents and interviews with agency staff, established that ACS had been using an algorithmic risk-scoring tool to help decide which families warranted heightened scrutiny, surveillance and investigation following a hotline call. The tool, which the agency had introduced years earlier with limited public discussion, generated a score for every family entering the system, and that score informed which cases were elevated for what staff called “high-priority” review.
The Markup's reporters, working with academic researchers, found that the system flagged Black and low-income families at rates higher than would be expected from the underlying base rates of confirmed maltreatment in those populations. The disparity was not fully explained by the data the agency claimed to be using. There were other variables, less obvious ones, that appeared to be doing meaningful work inside the model. Postcodes. Prior contact with public assistance programmes. Density of services in a neighbourhood. Each was, on its face, a non-racial input. Each, in practice, served as a proxy for race and class, because race and class in New York are written into the geography and the administrative trail of poverty.
The agency, when contacted, defended the tool. It pointed out that the score was advisory, that humans made the final calls, that the system had been validated internally. The agency declined to release the model's full feature set. It declined to release the weights. It declined to release the technical documentation that would have allowed independent researchers to reproduce the disparity findings or to test the model on counterfactual data. Families who had been flagged by the tool, and who had then had caseworkers in their homes, had no idea that an algorithm had been involved in the decision.
The Markup investigation matters not because it was the first time anyone had documented this pattern. It matters because it landed in the largest city in the United States, in the agency that handles the largest child welfare caseload in the country, and because it confirmed that what had previously been a research finding from smaller jurisdictions was now a continental-scale phenomenon. Child welfare is being run, in part, by black-box prediction.
The Allegheny Lineage
The patient zero of the modern child welfare risk-scoring movement is the Allegheny Family Screening Tool, deployed in Allegheny County, Pennsylvania, beginning in 2016. The tool was developed by a research consortium, validated using historical case data, and integrated into the county's call-screening process. When a hotline operator received a report, the tool produced a score that estimated the likelihood that a child in the household would be removed within two years. Higher scores triggered closer review.
The Allegheny tool was, in many ways, the public face of a movement that promised to bring rigour and consistency to an area of public administration long accused of being inconsistent and biased. Its developers were not naive technocrats. They were academics with serious credentials in social welfare and statistics, and they argued, plausibly, that human screeners themselves were biased, and that an algorithm trained on the same data could at least be audited. The tool was not deployed in secret. There were public meetings, advisory committees, journalistic profiles. For a brief moment in the late 2010s, Allegheny was held up as the responsible model.
What followed was a decade of audits that complicated that picture. Independent researchers, including teams who built fairness audit frameworks specifically for child welfare contexts, found that the tool's predictions correlated with socioeconomic status in ways that were not adequately disclosed in the public materials. They found that the tool's accuracy varied by demographic group. They found that the underlying training data, which was based on historical screening and removal decisions, encoded the biases of the human system the tool was supposed to improve. If the historical data showed that Black families in Allegheny had been more likely to have their children removed for any given maltreatment report, then a model trained on that data would learn to flag Black families more aggressively, and would do so even if every explicit racial variable was stripped from the inputs.
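One way to make the "accuracy varied by demographic group" finding concrete is to compare false positive rates across groups at a single threshold. The sketch below is illustrative only: the records, the group labels, the score scale and the threshold are all invented for the example, and the function name is hypothetical rather than taken from any published audit.

```python
from collections import defaultdict

def false_positive_rate_by_group(records, threshold):
    """records: iterable of (group, score, substantiated) tuples.

    Returns {group: FPR}, where FPR is the share of families with no
    substantiated maltreatment who were nonetheless flagged.
    """
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, score, substantiated in records:
        if not substantiated:
            negatives[group] += 1
            if score >= threshold:
                flagged[group] += 1
    return {g: flagged[g] / negatives[g] for g in negatives}

# Invented toy data: (group, score on a hypothetical scale, substantiated?)
records = [
    ("A", 17, False), ("A", 12, False), ("A", 8, False), ("A", 5, False),
    ("A", 18, True),
    ("B", 17, False), ("B", 16, False), ("B", 15, False), ("B", 6, False),
    ("B", 19, True),
]

rates = false_positive_rate_by_group(records, threshold=15)
print(rates)  # {'A': 0.25, 'B': 0.75}
```

In this toy data both groups have the same number of unsubstantiated cases, yet group B is wrongly flagged three times as often. A real audit would need the agency's actual scores and outcomes, which is exactly the material that is rarely disclosed.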
The Allegheny defenders responded that the tool reduced overall disparity compared with unaided human screening, and there is research that supports parts of that claim. The Allegheny critics responded that “less biased than the worst-case human” is not a high enough bar to justify deployment, particularly when the tool's mechanics remained opaque to the families it scored. By the early 2020s, this argument had calcified into a sort of trench warfare in the academic literature. The county kept using the tool. Other jurisdictions copied it. New York City's ACS was one of those jurisdictions, and the tool documented by The Markup is, in effect, a descendant of the Allegheny lineage, retrained on different data and tuned to different operational thresholds.
What the April 2026 Audits Showed
On 21 April 2026, two papers appeared on arXiv that, between them, gave the most rigorous picture yet of what is actually happening inside these systems. The first was a fairness audit of institutional risk models in welfare and safeguarding contexts. The second was an analysis of algorithmic fairness in case-note-augmented prediction systems, the newer generation of tools that pull free-text narrative from caseworker notes into the feature pipeline.
The findings, taken together, are damning in a precise and technical way. Models deployed in high-stakes welfare and safeguarding contexts routinely encode socioeconomic and racial proxies even when those variables are nominally excluded from the input set. The mechanisms are not mysterious. They are documented. Postcodes function as racial proxies in segregated cities. Prior interactions with means-tested benefits encode income and, indirectly, race. Neighbourhood-level deprivation indices, which were originally designed by social scientists to identify communities in need of investment, become, when fed into a risk model, indicators that an individual family is more dangerous to its own children. Each input, considered alone, has a defensible policy rationale. Stacked, weighted and combined inside a model that was optimised to predict historical removals, they produce a system that reproduces the geography of state intervention with eerie fidelity.
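The proxy mechanism can be shown with a deliberately crude sketch. Here the "model" is nothing more than the historical removal rate for each postcode, and group membership is never an input. The data, the postcodes, the group labels and the threshold are all invented for illustration, and this is not a reconstruction of any deployed tool.

```python
from collections import defaultdict

# Invented history: (postcode, group, historically_removed).
# Group is recorded here only so the audit can be run afterwards.
history = (
    [("P1", "X", True)] * 3 + [("P1", "X", False)] * 5 + [("P1", "Y", False)] * 2
    + [("P2", "Y", True)] * 1 + [("P2", "Y", False)] * 7 + [("P2", "X", False)] * 2
)

def fit_postcode_rates(rows):
    """'Train' by computing the historical removal rate per postcode."""
    removed = defaultdict(int)
    total = defaultdict(int)
    for postcode, _group, was_removed in rows:
        total[postcode] += 1
        removed[postcode] += int(was_removed)
    return {p: removed[p] / total[p] for p in total}

def flag_rate_by_group(rows, rates, threshold):
    """Share of each group flagged when scoring uses postcode alone."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for postcode, group, _ in rows:
        total[group] += 1
        flagged[group] += int(rates[postcode] >= threshold)
    return {g: flagged[g] / total[g] for g in total}

postcode_rates = fit_postcode_rates(history)
group_flags = flag_rate_by_group(history, postcode_rates, threshold=0.2)
print(postcode_rates)  # {'P1': 0.3, 'P2': 0.1}
print(group_flags)     # {'X': 0.8, 'Y': 0.2}
```

Because group X is concentrated in the postcode with the higher historical removal rate, 80 per cent of group X is flagged against 20 per cent of group Y, even though the scorer never sees group membership. Removing the explicit variable does not remove the signal.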
The case-note paper went further. Once a model starts ingesting free-text notes from caseworkers, the proxy problem deepens, because language itself is socially stratified. A caseworker note that describes a home as “chaotic” or a parent as “uncooperative” carries weight inside an embedding model. Whether those labels were accurate, fair, or applied consistently across demographic groups is a question the model cannot answer and the deployment process rarely interrogates. Audits showed that case-note-augmented models could amplify existing disparities, because the historical record of how caseworkers described different families itself encoded assumptions about whose homes were suspect.
Both papers stopped short of saying that child welfare risk models cannot be made fair. Both said, in different ways, that the models now in use are not fair, that their unfairness is structural rather than incidental, and that the standard mitigations on offer in the technical literature (group-balanced thresholds, adversarial debiasing, fairness constraints during training) are insufficient to address proxy encoding at the depth it currently operates. To put it bluntly: the tools the field has built to make these systems fair are not powerful enough to overcome the data the systems are trained on.
The Berkeley Notice Problem
In January 2026, researchers at the University of California, Berkeley published an analysis of a different but related question. Not whether these systems are fair in a statistical sense, but whether the people they scored had any meaningful idea that scoring was happening. The Berkeley analysis catalogued the deployment of algorithmic decision systems across a growing range of life-altering institutional contexts, including child welfare assessments, public benefit eligibility, criminal justice risk assessment, healthcare allocation, and tenant screening. It found that, in the overwhelming majority of cases, the affected individual had no notice that an algorithm was involved, no access to an explanation, and no formal route of appeal that engaged with the algorithmic component of the decision specifically.
This is the more basic problem, and in some ways the more politically tractable one. Statistical fairness is a moving target. Technically, you can argue forever about whether a particular calibration metric or error-rate parity standard is the right one. Notice and explanation are simpler. Either the family knows that a system was used or they do not. Either there is a document explaining the inputs or there is not. Either there is a procedure for contesting the score or there is no such procedure.
The Berkeley researchers' finding, applied to child welfare, is sobering. A family flagged by a risk-scoring tool in New York or Pittsburgh or Los Angeles has, in practice, no way to know that they were flagged by a tool. The caseworker on the doorstep is not required to tell them. The investigation paperwork does not disclose it. The records request, if they know to file one, may or may not produce the score. If it does, it almost certainly will not produce the underlying feature values, the model card, or any documentation that would allow them to understand what was being weighed and how.
The information asymmetry is total, and it sits on top of an existing power asymmetry that is itself substantial. Families in the child welfare system are disproportionately poor, disproportionately non-white, and disproportionately under other forms of state observation already, including housing assistance, food assistance, public schools, and Medicaid. The institutional knowledge of how to navigate any of these systems is unevenly distributed. Add an opaque algorithmic layer on top of all of that, and the result is a population of citizens making decisions, accepting investigations, and signing service plans without knowing one of the most important inputs into the state's interest in them.
Why No Rights Yet
The legal infrastructure that might check any of this exists, in patches, in places. None of it is robust enough to do the job.
At the federal level, the closest analogue to a comprehensive algorithmic accountability statute is the patchwork of civil rights law, which prohibits disparate-impact discrimination in some federally funded programmes but has never been successfully wielded against a child welfare risk-scoring tool in the way it has against, say, mortgage-lending models. The Due Process Clause of the Fourteenth Amendment offers some procedural protection in cases where the state seeks to terminate parental rights, but those protections kick in late in the process, well after the algorithmic flag has done its work to set events in motion. Pre-investigation flags are not adjudicated. They are operational decisions, treated as administrative discretion, and discretion is precisely what courts have historically been reluctant to second-guess.
At the state level, a handful of legislatures have passed bills requiring agencies to disclose when they are using automated decision systems, but most of these laws contain carve-outs for “decision-support” tools, and almost every child welfare risk model is officially classified as decision-support rather than as automated decision-making. The reasoning is that a human screener still signs off. The reality is that the screener is reading a score that has been generated by software, and the score functions as the primary signal in many of those decisions. The carve-out exists because vendors and agencies argued for it, and because legislators who wrote the bills did not want to be accused of weakening safeguarding by wrapping it in transparency requirements.
At the procurement level, the contracts that govern these tools are often classified as confidential commercial information. Vendors negotiate terms that prohibit agencies from disclosing the model's inner workings, on the theory that the model is intellectual property. Agencies, which frequently lack the in-house data science capacity to evaluate the tools, accept these terms because they want the tools and cannot easily build them themselves. The result is a procurement architecture in which the state delegates a consequential public function to a private contractor, accepts secrecy as a condition of the contract, and then refuses to disclose what it has bought, on the grounds that the secrecy is the vendor's right.
The contrast with the European Union is instructive. Article 22 of the General Data Protection Regulation, applicable since 2018, gave individuals a right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. The right was always more limited in practice than in headline, because purely automated decisions are rare and most regulated decisions involve some human in the loop. But Article 22, paired with the wider GDPR architecture of subject access rights, transparency obligations, and data protection impact assessments, created a baseline that simply does not exist in the United States. The 2024 EU AI Act extended this baseline with risk-tiered obligations for high-risk systems, including those used in social welfare administration. The United States has nothing equivalent at the federal level. State by state, the strongest American statutes on automated decision-making are weaker than what European regulators treat as the floor.
This is not because Americans are less concerned about state surveillance of families. It is because the American legislative process, on questions of child welfare, has a particular political shape. No politician wants to be the one who voted for the bill that allegedly weakened child protection. Every vendor of a risk model can frame transparency requirements as obstacles to keeping children safe. Every agency that uses one of these tools can claim that disclosure of the model's mechanics would teach abusive parents how to game the system. These framings are sometimes sincere and sometimes opportunistic, and both are politically effective. The result is a legislative landscape in which proposals to give families notice and challenge rights die in committee, while procurement contracts for new tools are renewed without serious public debate.
Voices in the Field
The academic literature on algorithmic harm in welfare contexts is, by 2026, large enough to constitute a small subfield. Virginia Eubanks, whose 2018 book on automated inequality remains foundational, argued that the deployment of predictive tools in welfare administration represents a new form of digital poorhouse, applying mass surveillance to the populations least able to resist it. Dorothy Roberts, whose work on the racial politics of family policing predates the algorithmic era, has long argued that the child welfare system is structurally biased against Black families and that data-driven tools, far from correcting that bias, formalise it and make it harder to contest. Rashida Richardson, who has written on algorithmic accountability and government use of predictive systems, has argued for procedural rights of notice, explanation and contestation as a baseline condition of legitimate deployment.
On the technical side, researchers like Solon Barocas have spent years documenting the mechanisms by which proxy variables encode protected attributes, and the limits of formal fairness criteria in the face of those mechanisms. Hadi Elzayn and collaborators have published audits of welfare-adjacent algorithmic systems showing, with empirical rigour, how disparate impact persists even under well-designed mitigation strategies. None of these scholars has called for a complete ban on predictive tools in welfare contexts. Most have called for a combination of structural reforms: independent audits, transparency requirements, due process rights, and a presumption that high-stakes deployments require a much higher evidentiary bar than what is currently common practice.
The interesting feature of this body of work is how unified it is on the procedural questions, even when scholars disagree on the technical questions. Whether a particular fairness metric is the right one is contested. Whether families should have a right to know that a model was used in a decision that affected them is, within this literature, essentially uncontested. The gap between the academic consensus and the operational reality of American child welfare is wide, and it is not narrowing.
What Rights Would Look Like
The shape of a meaningful rights framework for algorithmic decisions in child welfare is, at this stage, well rehearsed in policy literature. The components are not exotic.
Notice would mean that a family receiving a child welfare contact would be told, in writing, whether an algorithmic risk-scoring tool was used in the decision to investigate, and that they would be given the name and a plain-language description of the tool. This is an extremely low bar. It would not change the outcome of any individual investigation. It would simply close an information asymmetry that currently has no defensible justification.
Access would mean that the family could obtain the score that was generated for them, the inputs that fed into the score, and the documentation describing how the model translates inputs into outputs. The technical documentation already exists in most cases. It is generated as part of the procurement process. The barrier to disclosing it is contractual, not technical.
Contestation would mean that the family could challenge specific data points used in the score. This is where the framework meets longstanding administrative law practice. Government records routinely contain errors. Some of those errors are typographical. Others are substantive. A family who has been flagged on the basis of a prior investigation that was later closed as unfounded should be able to point at that investigation and ask whether it was correctly weighted in the model. A family flagged on the basis of a postcode association should be able to ask whether that association is what is doing the work and, if so, whether the weight is justified.
Human review with authority would mean that the human in the loop is not just a person who reads the score and signs off, but a person with the institutional standing to overturn the score, the time to actually examine the inputs, and a documented record of the reasoning behind their decision. This is the most demanding component, because it requires resourcing and training that most agencies have not invested in. It is also the most consequential, because it transforms the human-in-the-loop from a procedural fig leaf into a real check.
Independent auditing would mean that agencies cannot simply self-validate their tools. They would be required to submit the tools to external technical review, including review by parties with no commercial interest in the tool's continued deployment. Audit findings would be public. Significant findings would trigger remediation requirements with deadlines.
A route of appeal would mean that there is a forum in which a family can challenge an algorithmically influenced decision and obtain meaningful relief. This is the hardest component to graft onto the existing child welfare system, because the system's procedural backbone is calibrated for a different kind of dispute. It is calibrated for fact-finding about events in a household, not for technical contestation of a model's behaviour. Building this capacity would require new staff, new training, and probably a new tier of administrative tribunal.
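The notice, access and contestation components described above amount, in data terms, to a fairly small record. The sketch below is one hypothetical shape for such a record. Every class name, field name and value is an assumption made for illustration, not a format used by any agency or vendor.

```python
from dataclasses import dataclass, field

@dataclass
class InputValue:
    """One input that fed the score, as disclosed to the family (hypothetical)."""
    name: str
    value: str
    source_record: str          # where the agency obtained the value
    contested: bool = False
    contest_reason: str = ""

@dataclass
class ScoreDisclosure:
    """Written notice plus access to the score and its inputs (hypothetical)."""
    tool_name: str
    plain_language_description: str
    score: int
    threshold: int
    inputs: list[InputValue] = field(default_factory=list)

    def contest(self, input_name: str, reason: str) -> None:
        """Mark a single input as challenged, recording the family's reason."""
        for item in self.inputs:
            if item.name == input_name:
                item.contested = True
                item.contest_reason = reason
                return
        raise KeyError(input_name)

    def contested_inputs(self) -> list[str]:
        return [item.name for item in self.inputs if item.contested]

# Invented example values throughout.
disclosure = ScoreDisclosure(
    tool_name="Example Screening Tool",
    plain_language_description="Estimates risk from administrative records.",
    score=16,
    threshold=15,
    inputs=[
        InputValue("prior_investigations", "1", "agency case file"),
        InputValue("postcode", "00000", "hotline intake form"),
    ],
)
disclosure.contest("prior_investigations", "Closed as unfounded")
print(disclosure.contested_inputs())  # ['prior_investigations']
```

Nothing here is technically demanding, which is consistent with the point that the barrier to disclosure is contractual and political rather than technical.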
None of these proposals is technically novel. Each has been articulated in academic and policy literature. Each, in some form, exists in other regulatory contexts. What is missing is not the design. What is missing is the political coalition to build them in.
Why the Politics Are Stuck
The reasons no such coalition has consolidated are visible in the structure of the issue. Child protection, as a political project, runs on the premise that the state's job is to err on the side of intervention. The institutional culture of the agencies, the framing of legislative debates, and the media treatment of failures all push in one direction. When a child is harmed in a family that the system did not investigate, there are inquiries, commissions and resignations. When a family is harmed by an unjustified investigation, the story tends not to make the front page, and the family tends not to have a press office.
This asymmetry shapes how risk-scoring tools are introduced and how they are defended. The pitch to administrators is that the tool will reduce the rate of false negatives, the cases where the system missed a child who needed protection. The pitch to legislators is similar. The cost on the other side, the rate of false positives, the families subjected to investigation they did not need, is rarely treated as a comparable harm in the political conversation, even though it is a quantifiable and substantial cost in the lives of those families. The current generation of risk-scoring tools operates on thresholds chosen by agency leadership, and those thresholds are typically set low enough to err towards investigating more rather than fewer households.
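The trade-off between the two kinds of error is mechanical, and a small sketch makes it visible. The scores, outcomes and thresholds below are invented, and the function name is hypothetical. The only point is that moving the threshold in one direction buys fewer missed cases at the price of more unnecessary investigations.

```python
def confusion_at(threshold, scored):
    """scored: list of (score, needed_protection) pairs.

    Returns (false_negatives, false_positives) at the given threshold,
    where a household is flagged when score >= threshold.
    """
    false_neg = sum(1 for s, needed in scored if needed and s < threshold)
    false_pos = sum(1 for s, needed in scored if not needed and s >= threshold)
    return false_neg, false_pos

# Invented toy population: eight households that did not need
# intervention and two that did.
scored = [
    (3, False), (5, False), (7, False), (9, False),
    (11, False), (13, False), (15, False), (17, False),
    (10, True), (18, True),
]

for t in (8, 12, 19):
    print(t, confusion_at(t, scored))
# 8 (0, 5)
# 12 (1, 3)
# 19 (2, 0)
```

At the lowest threshold no child is missed, but five households are investigated without need. Which point on that curve is acceptable is a policy choice, not a property of the model, and in current practice it is made without input from the families who bear the false positives.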
The vendors of these tools have learned to operate within this politics. They market on the prevention of catastrophic outcomes. They underplay the operational disparities. They negotiate procurement contracts that limit disclosure. They cultivate relationships with academic researchers who can supply the legitimating veneer of validation studies. None of this is corrupt in any obvious sense. It is the normal behaviour of any commercial actor selling into a politically sensitive market with high stakes and asymmetric information. But the cumulative effect is an industry that is poorly disciplined by external oversight, because the external oversight does not have the tools to discipline it.
Affected families, meanwhile, are nearly impossible to organise. They are already under state scrutiny. They are reluctant to draw additional attention to themselves. They often do not know that other families have had similar experiences, because the information that would allow them to find each other does not flow. Civil society organisations have done significant work in this area, but they have done it at a scale that is dwarfed by the operational scale of the agencies and vendors they are trying to hold accountable.
What to Watch
The most likely vector of change in the near term is litigation. Several civil rights organisations have been preparing cases that target specific algorithmic deployments in welfare contexts, looking to establish precedent under existing civil rights and due process doctrine. The legal theory would not require a new statute. It would require a court to recognise that a family has a constitutionally cognisable interest in not being subjected to investigation on the basis of a process that they cannot contest. Whether such a case will succeed is uncertain. The doctrine is unfriendly. The factual records are hard to build. But the architecture of the litigation is plausible enough that several organisations are betting on it.
A second vector is local legislation, particularly in cities and states where the political balance is more amenable to civil liberties framings. New York, in the wake of The Markup investigation, has seen renewed legislative interest in algorithmic accountability for city agencies. Whether ACS specifically will be brought under stronger transparency rules remains to be seen. The vendors have lobbyists. The agency has institutional inertia. But the political weather, in 2026, is more favourable to disclosure than it was in 2020, and the gap between civil society capacity and vendor capacity is starting to narrow as algorithmic accountability becomes a more established advocacy field.
A third vector, and the one most aligned with the academic literature, is the construction of an external audit infrastructure. A non-governmental organisation, an academic consortium, or a hybrid public-private body with the technical capacity to audit child welfare risk-scoring tools and the legal standing to compel disclosure does not currently exist in the United States. Building one would require funding, talent, and a political settlement that recognises external audit as a legitimate function. There are precedents in other regulated industries: financial auditing, environmental impact assessment, clinical trial review. The case for an analogue in algorithmic public administration is, in the wake of the April 2026 audit findings, harder to dismiss than it once was.
The Family Comes Back
The mother in the flat does not see any of this. She sees a caseworker on a Tuesday afternoon. She answers questions she did not expect to answer. She watches her children watched by a stranger. She signs paperwork. The investigation, in her case, is closed without findings six weeks later. She is not removed from the system; she is now in it, in the database, as a household with a prior contact, a feature that may itself be ingested by the next iteration of the model the next time her name comes up.
She is told none of this. She is not told that an algorithm was involved, that her postcode contributed to the flag, that the model has already been audited by independent researchers and found wanting. She is not told that the city paid a vendor several million dollars for the tool, or that the vendor's contract prohibits disclosure of the model's inner workings. She is not told that, in another country with a different legal regime, she would have had a statutory right to ask for and receive an explanation of the decision that put a stranger in her kitchen.
If American child welfare is going to have any meaningful answer to the question of what happened to her, the answer will not come from the agencies that deployed the tools or from the vendors that built them. It will come from courts willing to take procedural due process seriously when it is dressed in code, from legislators willing to pass disclosure requirements that survive vendor lobbying, and from a civil society infrastructure that does not yet exist at the scale the problem demands. The April 2026 audits, and the Berkeley analysis from earlier in the year, and the Markup investigation that preceded both, are not a complete map of the problem. They are a sufficient one. The technology is here. The harms are documented. The scaffolding of rights is a decade behind.
The next time the knock comes, the family on the other side of the door deserves, at minimum, a piece of paper that tells them what they are dealing with. That is not a radical demand. It is the floor.
References
- The Markup. (2025). Investigation into the Administration for Children's Services algorithmic risk-scoring tool, New York City. The Markup.
- Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin's Press.
- Roberts, D. E. (2022). Torn Apart: How the Child Welfare System Destroys Black Families and How Abolition Can Build a Safer World. Basic Books.
- Chouldechova, A., Putnam-Hornstein, E., Benavides-Prado, D., Fialko, O., & Vaithianathan, R. (2018). A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proceedings of Machine Learning Research, 81, 134-148.
- Vaithianathan, R., Maloney, T., Putnam-Hornstein, E., & Jiang, N. (2013). Children in the public benefit system at risk of maltreatment: Identification via predictive modeling. American Journal of Preventive Medicine, 45(3), 354-359.
- Allegheny County Department of Human Services. (2019). Allegheny Family Screening Tool: Methodology, Version 2. Allegheny County, Pennsylvania.
- Richardson, R., Schultz, J., & Crawford, K. (2019). Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. New York University Law Review Online, 94, 192-233.
- Barocas, S., & Selbst, A. D. (2016). Big data's disparate impact. California Law Review, 104, 671-732.
- Anonymous authors. (2026, 21 April). Fairness audits of institutional risk models in welfare and safeguarding contexts. arXiv preprint.
- Anonymous authors. (2026, 21 April). Algorithmic fairness in case-note-augmented prediction systems. arXiv preprint.
- UC Berkeley research team. (2026, January). Notice, explanation, and appeal in life-altering algorithmic decisions: An empirical analysis. University of California, Berkeley.
- European Parliament and Council. (2016). Regulation (EU) 2016/679 (General Data Protection Regulation), Article 22.
- European Parliament and Council. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act).
- Elzayn, H., Black, E., Vossler, P., et al. (2023). Measuring and mitigating racial disparities in tax audits. Stanford Institute for Economic Policy Research, working paper.
- Eubanks, V., & Mateescu, A. (2021). “We don't deserve this”: New app places homeless services under surveillance. Logic Magazine.
- Hurley, D. (2018). Can an algorithm tell when kids are in danger? The New York Times Magazine, 2 January.
- Brown, A., Chouldechova, A., Putnam-Hornstein, E., Tobin, A., & Vaithianathan, R. (2019). Toward algorithmic accountability in public services: A qualitative study of affected community perspectives. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems.
- Saxena, D., Badillo-Urquiola, K., Wisniewski, P. J., & Guha, S. (2020). A human-centered review of algorithms used within the U.S. child welfare system. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems.
- Engstrom, D. F., & Ho, D. E. (2020). Algorithmic accountability in the administrative state. Yale Journal on Regulation, 37, 800-854.
- Citron, D. K., & Pasquale, F. (2014). The scored society: Due process for automated predictions. Washington Law Review, 89, 1-33.

Tim Green, UK-based Systems Theorist & Independent Technology Writer
Tim explores the intersections of artificial intelligence, decentralised cognition, and posthuman ethics. His work, published at smarterarticles.co.uk, challenges dominant narratives of technological progress while proposing interdisciplinary frameworks for collective intelligence and digital stewardship.
His writing has been featured on Ground News and shared by independent researchers across both academic and technological communities.
ORCID: 0009-0002-0156-9795 Email: tim@smarterarticles.co.uk