Rethinking Responsibility: From Code to Human Rights

June 5, 2025 · 13 min read

I walked into the Mila Summer School on Responsible AI and Human Rights expecting the usual: fairness metrics, privacy-preserving ML, technical mitigation strategies for bias. I spend most of my time in a software and cognitive science headspace, so that's the lens I brought. But the room was full of lawyers, policymakers, and social scientists, and the conversation was rarely about optimizing loss functions. We spent the week on international law, socio-technical systems, and the material costs of our compute.

It was a bit disorienting, because none of the problems I kept hearing about all week (governance failures, labor exploitation in mineral supply chains, discrimination baked into deployment contexts) have technical solutions. They live in human institutions and power dynamics that most of us in engineering never confront directly.

From "Ethics" to Rights

One of the first things that shifted my thinking was how deliberately the conversation avoided the word "ethics." In tech, "AI ethics" has become corporate wallpaper. Companies stand up ethics review boards, publish principles on their websites, and call it a day. Scholars have a name for this: ethics washing. The language sounds good but stays voluntary, vague, and conveniently non-binding. When ethical principles conflict with business goals, the principles lose every time.

The summer school reframed the conversation around human rights, which carries real weight. As Catherine Régis emphasized, ethics offers a proliferation of competing principles with no clear floor. Human rights, on the other hand, are a globally recognized set of standards backed by decades of international law in the form of the Universal Declaration of Human Rights and the UN Guiding Principles on Business and Human Rights. Under that framework, states have a duty to protect, corporations have a responsibility to respect, and there must be access to remedy when rights get violated. There's actual scaffolding here.

And the world is already moving in this direction. The EU AI Act entered into force in August 2024 as the first comprehensive AI law anywhere. It mandates fundamental rights impact assessments for high-risk AI systems before deployment. It flat-out bans social scoring, subliminal manipulation, and real-time facial recognition in public spaces (with narrow exceptions). Non-compliance carries fines of up to €35 million or 7% of global annual turnover, whichever is higher. That's what regulation looks like when the voluntary approach stops cutting it.

For me as an engineer, the reframing was useful because it turns a philosophical conversation into something closer to a requirements-gathering exercise. What rights does this system touch? Equality and non-discrimination? Privacy? The right to a healthy environment? Those are concrete questions you can actually investigate.
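If I were to sketch that requirements mindset in code, it might look something like the following. To be clear, this is my own illustration: the `RightsRequirement` structure and the questions in it are invented for this post, not an established framework or API.

```python
from dataclasses import dataclass, field

@dataclass
class RightsRequirement:
    """One human right treated as a system requirement to investigate."""
    right: str                       # e.g. "equality and non-discrimination"
    questions: list[str]             # concrete things to check before shipping
    findings: list[str] = field(default_factory=list)

# Hypothetical requirements for a resume-screening system.
requirements = [
    RightsRequirement(
        right="equality and non-discrimination",
        questions=[
            "Do selection rates differ across demographic groups?",
            "Do proxy features (zip code, school) correlate with protected attributes?",
        ],
    ),
    RightsRequirement(
        right="privacy",
        questions=["Was any training data collected without meaningful consent?"],
    ),
]

for req in requirements:
    print(f"[{req.right}]")
    for question in req.questions:
        print(f"  - {question}")
```

The point isn't the code; it's that "what rights does this system touch?" decomposes into checkable items the same way any other requirement does.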

The Full System

Virginia Dignum's talk on Responsible AI made me think differently about where the boundaries of "the system" actually are. We tend to think about AI as the model: data in, predictions out, performance on some benchmark. But AI is a socio-technical system on top of being a technology. It's entangled with institutions, economic incentives, and power structures that shape what it does in the real world.

Her analogy of a park bench was a good one. Fundamentally, a bench is a bench. But put extra armrests in the middle and you've made a political decision to keep homeless people from sleeping on it. Same deal with AI. The data you collect, the objective function you define, the populations you deploy to are all decisions with downstream effects that ripple outward in ways you didn't plan for.

Those effects extend well beyond model performance. Alex Hernandez-Garcia's talk on environmental impact made this tangible. A primer from researchers at Hugging Face estimates that a single ChatGPT query uses several times the energy of a standard Google search. Training GPT-4 consumed over 50 gigawatt-hours of electricity, enough to power San Francisco for three straight days. Research by Li et al. (2023) showed that training GPT-3 used 700,000 liters of clean freshwater for cooling data centers, often in water-scarce regions. Recent projections from researchers at VU Amsterdam put global AI water consumption on track to reach 312 to 765 billion liters in 2025, potentially more than annual global bottled water consumption. And the minerals in our GPUs (cobalt, lithium, tungsten) get extracted under conditions that disproportionately affect people in the Global South, as Kate Crawford documents in Atlas of AI.
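Numbers like these are easy to sanity-check with back-of-envelope arithmetic, which I find makes them feel less abstract. A quick sketch using only the figures cited above (the San Francisco load is implied by the claim, not a number I measured):

```python
# Back-of-envelope checks on the figures cited above.

# If 50 GWh powers San Francisco for 3 days, what average load does that imply?
gpt4_training_gwh = 50                   # estimated GPT-4 training energy
sf_days_powered = 3                      # the claimed equivalence
implied_sf_load_mw = gpt4_training_gwh * 1_000 / (sf_days_powered * 24)
print(f"Implied SF average load: {implied_sf_load_mw:.0f} MW")   # ~694 MW

# GPT-3's training water footprint (Li et al., 2023) in Olympic pools.
gpt3_water_liters = 700_000
olympic_pool_liters = 2_500_000          # a standard Olympic pool holds ~2.5M liters
print(f"GPT-3 cooling water: {gpt3_water_liters / olympic_pool_liters:.2f} Olympic pools")
```

An implied average draw of roughly 700 MW for a city of about 800,000 people is at least the right order of magnitude, which is exactly what a sanity check is for.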

None of that shows up in your loss curves.

Cases Where It Already Went Wrong

The environmental costs might feel abstract if you're sitting at a desk writing Python. The human costs are harder to wave away.

In 2016, ProPublica investigated COMPAS, a recidivism prediction algorithm used in courtrooms across the U.S. to inform sentencing decisions. They found that Black defendants were far more likely than white defendants to be incorrectly flagged as high risk, while white defendants were more often incorrectly labeled low risk. COMPAS didn't include race as an input variable. It didn't have to because the features it did use correlated with race closely enough that the model reproduced the biases of the criminal justice data it was trained on. A Dartmouth study later showed the algorithm performed about as well as random volunteers from the internet.
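Here's a toy demonstration of that mechanism on synthetic data. To be clear, nothing below is COMPAS or real criminal justice data; the point is just that withholding the protected attribute doesn't help when a correlated proxy carries the same signal:

```python
# Toy demonstration of proxy discrimination: the model never sees the
# protected attribute, but a correlated feature lets it reproduce the bias
# baked into historical labels. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)                  # protected attribute, withheld from the model
proxy = group + rng.normal(0, 0.5, n)          # correlated feature, e.g. neighborhood
other = rng.normal(0, 1, n)                    # a legitimate predictor

true_risk = other > 1.0                        # actual behavior, same across groups
label = true_risk | ((group == 1) & (rng.random(n) < 0.2))   # biased historical labels

X = np.column_stack([proxy, other])            # note: `group` is never a feature
scores = LogisticRegression().fit(X, label).predict_proba(X)[:, 1]
flagged = scores >= np.quantile(scores, 0.7)   # flag the top 30% as "high risk"

for g in (0, 1):
    low_risk = (group == g) & ~true_risk       # people who are actually low-risk
    print(f"group {g}: low-risk people flagged high-risk: {flagged[low_risk].mean():.1%}")
```

Run it and the disadvantaged group's false positive rate comes out substantially higher, with the race-like information entering entirely through the proxy.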

In 2020, the UK Court of Appeal ruled that South Wales Police's automated facial recognition system violated Article 8 privacy rights, the Data Protection Act, and the Public Sector Equality Duty. The police hadn't even assessed whether the software had racial or gender bias. The Gender Shades audit by Joy Buolamwini and Timnit Gebru, since cited by the U.S. Commission on Civil Rights, had already shown why that matters: commercial gender classification systems had an error rate of 0.8% for light-skinned men and up to 34.7% for darker-skinned women.

And in hiring: Amazon built an AI resume screener trained on a decade of historical data. Because the engineering workforce had been mostly male, the model learned to penalize resumes containing the word "women's" (as in "women's chess club") and to favor language patterns associated with male applicants. They killed it quietly. More recently, the EEOC settled its first-ever AI discrimination case after a hiring tool was found to systematically reject women over 55 and men over 60.

This is what happens when nobody asks whose rights are at stake.

What We Can Actually Do

So what does it look like to take rights seriously in practice? This is where the concept of a Human Rights Assessment (HRA), introduced in a workshop by Samone Nigam of BSR, offered something concretely useful.

An HRA is a process for identifying and evaluating the potential human rights impacts of a project. It feels like an engineering-adjacent approach to social problems, which is probably why it clicked for me. Take the resume-screening case. A typical ML approach would focus on predictive accuracy by asking if the model identifies candidates who'll succeed. An HRA asks different questions: whose resumes trained this model? Does it penalize employment gaps that disproportionately affect women or caregivers? Does it favor language associated with a particular demographic?
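One of those questions is directly computable. Below is a sketch of a check an HRA might prompt for the resume screener, using the four-fifths rule of thumb from U.S. EEOC guidance on adverse impact; the applicant numbers are invented for illustration:

```python
# Compare selection rates across groups using the EEOC's four-fifths rule:
# a group selected at under 80% of the best-off group's rate is a red flag.
# The counts below are made up for illustration.
def adverse_impact_ratios(selected: dict[str, int], applied: dict[str, int]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    rates = {g: selected[g] / applied[g] for g in applied}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

applied = {"men": 800, "women": 700}
selected = {"men": 200, "women": 105}

for group, ratio in adverse_impact_ratios(selected, applied).items():
    flag = "  <-- below 0.8, investigate" if ratio < 0.8 else ""
    print(f"{group}: impact ratio = {ratio:.2f}{flag}")
```

A check like this doesn't settle whether discrimination happened, but it turns "does the model penalize women?" into a number someone has to explain.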

You then prioritize risks by severity for the people affected: how many people are harmed, how serious the harm is, and whether it can be undone. It's a structured process for figuring out what rights your system might violate and how badly, which is a lot more actionable than "be ethical."
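Even that prioritization step can be roughed out in code. The three criteria below mirror the ones just described; the 1-to-5 scales and the multiplicative score are my own simplification, not BSR's methodology:

```python
# Toy severity scoring for human rights risks: scope (how many people),
# gravity (how serious), and irremediability (can it be undone), each 1-5.
# The risks and scores are invented for illustration.
risks = [
    # (description, scope, gravity, irremediability)
    ("penalizes caregiving gaps in resumes",  4, 3, 2),
    ("leaks applicant data to third parties", 3, 4, 5),
    ("UI unusable with screen readers",       2, 3, 1),
]

for desc, scope, gravity, irrem in sorted(risks, key=lambda r: r[1] * r[2] * r[3], reverse=True):
    print(f"severity={scope * gravity * irrem:>3}  {desc}")
```

The ranking matters more than the exact numbers: it forces a team to argue about which harms are worst before deciding where mitigation effort goes.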

This kind of assessment is already becoming mandatory. The EU AI Act requires fundamental rights impact assessments for high-risk systems. The Netherlands has its own Fundamental Rights and Algorithms Impact Assessment for public institutions. Canada has a tiered Algorithmic Impact Assessment tool. Voluntary self-assessment is on its way out.

The Implicated Subject

So where does someone like me on the engineering and research side of the table fit into all of this? It's easy to feel like the policy and legal discussions float above the day-to-day work of building models and writing code.

Alex Hernandez-Garcia introduced a framing I keep coming back to: the "implicated subject," a concept the scholar Michael Rothberg developed in the context of historical injustices. Most of us in tech aren't causing harm on purpose, and we're not passive bystanders either. But we are implicated, because our work helps produce and reproduce systems that have these downstream effects, even when we mean well.

I like this framing because it skips the blame game and goes straight to responsibility. Even when I'm not trying to cause harm, my choices have consequences. This lands especially hard in cognitive science, where we use AI as a model of the mind itself. If you train your model on data from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations (which, as Henrich and colleagues showed, is most of what's available), you end up with theories of cognition that don't generalize. Use an LLM trained on the internet to model language processing and it comes loaded with biases that can quietly distort your scientific conclusions.

The logical next step is to change how we work, especially in the planning stages. It means actually examining the assumptions in our data and research questions, thinking about the full research lifecycle including the environmental cost of our compute, and sometimes, as Su Lin Blodgett pointed out in her talk on ethical reasoning, deciding not to build something at all.

A Cognitive Science Angle on Safety

Most conversations about responsible AI stay at the governance level, and for good reason. But as someone studying cognitive science, I kept thinking about how our field can contribute something the policy people can't: insight into the architecture of safe intelligence itself.

The NeuroAI community is working on exactly this. The roadmap paper "NeuroAI for AI Safety" by Mineault et al. (2025) asks a good question: the brain is the one example of general intelligence that mostly works, so what can we steal from it? Humans generalize effortlessly across contexts where AI models fail catastrophically, making errors no person ever would. The theory is that evolution, working through a "genomic bottleneck," gave us powerful inductive biases that constrain learning in useful ways. If we could reverse-engineer those constraints rather than just mimic behavior, we might build systems that fail more gracefully. That could mean building digital twins of robust sensory systems to harden AI against adversarial attacks, or trying to infer the brain's intrinsic loss functions (the homeostatic, multi-objective reward signals that guide behavior), which go well beyond the simple reward models we use now.
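To make "homeostatic, multi-objective" less abstract, here's a minimal sketch in the spirit of homeostatic reinforcement learning formulations, where reward is drive reduction rather than a single maximized scalar. The variables and setpoints are invented for illustration:

```python
import numpy as np

# A homeostatic, multi-objective reward: the agent is rewarded for moving its
# internal state toward setpoints, not for maximizing one quantity forever.
SETPOINTS = np.array([0.7, 0.5, 0.9])        # e.g. energy, temperature, safety margin

def drive(state: np.ndarray) -> float:
    """Squared deviation from the homeostatic setpoints (lower is better)."""
    return float(np.sum((state - SETPOINTS) ** 2))

def reward(state: np.ndarray, next_state: np.ndarray) -> float:
    """Drive reduction: positive only when an action restores balance."""
    return drive(state) - drive(next_state)

state = np.array([0.2, 0.5, 0.9])            # "energy" is depleted
next_state = np.array([0.4, 0.5, 0.9])       # an action that partially restores it
print(reward(state, next_state))             # positive: the action reduced drive
```

Note the built-in satiation: once a variable reaches its setpoint, pushing it further is penalized, which is a very different failure mode from a reward you can never have too much of.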

On a related track, a Google DeepMind paper on AGI safety (Shah et al., 2025) lays out an engineering framework that maps surprisingly well onto the human rights assessment approach. They split risks into misuse (a bad actor uses the AI for harm) and misalignment (the AI develops goals that diverge from human intent). For misalignment, they propose two defenses. The first is amplified oversight: using AI to help humans supervise more powerful AI, which is a fascinating cognitive science problem in itself. How do you reliably evaluate something smarter than you? The second is hardening the system through monitoring and security controls so it stays safe even if alignment fails. It's a pragmatic approach: they're not betting everything on getting alignment right; they're also building for the case where it goes wrong.
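Here's a schematic of that second, defense-in-depth layer: a trusted monitor reviews the output of a more capable, untrusted model before anything is released. Every name and the toy risk heuristic below are hypothetical; this is the shape of the idea, not anyone's actual implementation:

```python
# Defense-in-depth sketch: even if the capable model is misaligned, a
# monitoring layer bounds the damage. All names and logic are hypothetical.
def untrusted_model(prompt: str) -> str:
    return f"answer to: {prompt}"            # stand-in for a powerful model

def trusted_monitor(prompt: str, answer: str) -> float:
    """Estimate the risk of releasing this answer, in [0, 1]."""
    # A real monitor would inspect both prompt and answer; this is a toy rule.
    return 0.9 if "bioweapon" in prompt.lower() else 0.1

def guarded_query(prompt: str, risk_budget: float = 0.5) -> str:
    answer = untrusted_model(prompt)
    if trusted_monitor(prompt, answer) > risk_budget:
        return "[withheld: escalated to human review]"   # fail closed
    return answer

print(guarded_query("How do transformers work?"))
print(guarded_query("Give me a synthesis route for a bioweapon"))
```

The interesting part is the failure mode: when the monitor is unsure, the system fails closed and pulls a human in, rather than trusting the stronger model by default.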

Finally, there's explainability. A recent study by Rahimi et al. (2025) showed that XAI attribution methods can map LLM behavior onto brain activity patterns in compelling ways. That's exciting, but it comes with a caveat I think about a lot. NIST's principles on explainable AI (Phillips et al., 2021) warn that explanations need to be accurate, not just understandable. It's easy to look at a nice alignment between model attributions and neural data and conclude you've found a real cognitive mechanism. But you might just be explaining an artifact of the model you built. The gap between "this explanation is interpretable" and "this explanation is true" is where bad science lives.
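One modest guard against that trap is to test an attribution-to-brain alignment against a null before reading anything into it. A sketch with simulated data standing in for real attribution scores and neural responses:

```python
import numpy as np

# Before trusting a correlation between model attributions and brain data,
# compare it against a permutation null. Both arrays here are simulated
# stand-ins for real per-word attribution scores and neural responses.
rng = np.random.default_rng(0)
attributions = rng.normal(size=200)
brain_signal = 0.3 * attributions + rng.normal(size=200)

observed = np.corrcoef(attributions, brain_signal)[0, 1]
null = np.array([
    np.corrcoef(rng.permutation(attributions), brain_signal)[0, 1]
    for _ in range(10_000)
])
p_value = (np.abs(null) >= abs(observed)).mean()
print(f"r = {observed:.3f}, permutation p = {p_value:.4f}")
```

A permutation test won't tell you the explanation is true, but it at least rules out alignments you could have gotten from shuffled data, which is a lower bar more papers should clear.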

These technical approaches matter, and I view them as the engineering side of the same coin the human rights framework addresses from the governance side. But one doesn't work without the other.

Where It Gets Personal

As AI tools get woven deeper into research workflows, there's a cost that rarely shows up in policy discussions: cognitive offloading. Jolie Dobre wrote about this in UX Matters, arguing that our reliance on AI to handle hard tasks can erode the critical thinking you need to build real expertise (Dobre, 2025). The more you let AI do the hard cognitive work, the worse you get at evaluating whether its output is any good. And there's data for this: heavy AI use is associated with measurable declines in critical thinking (Kosmyna et al., 2025).

For me, it's particularly striking that as AI gets better at interpreting neural data, the ethical stakes go up fast. Ienca and Andorno argued years ago that traditional privacy protections won't be enough. We might need new categories of rights entirely: the right to cognitive liberty (freedom to control your own mental processes) and the right to mental privacy (Ienca & Andorno, 2017). The skull used to be the last place no one could reach. My own field, where using AI to model the brain is routine, could become the one that breaches it.

What I Took Away

I went expecting technical content and got a crash course in governance, international law, and human dignity. The reframing from "AI ethics" to human rights changed how I think about my own work. Ethics gives you principles to aspire to. Human rights give you a floor you can't go below, and actual mechanisms for accountability when someone does.

For those of us building these systems, the job is translating that into practice by looking hard at our training data, running impact assessments that go beyond compliance theater, and applying what we know about the brain to make AI systems that fail gracefully instead of catastrophically. All this while being honest that our own growing reliance on AI tools might be eroding the judgment we need to do any of this well. I'm back at my desk now writing code again, but the human rights framing has stuck in a way that the usual "be ethical" advice never did.


References

Cross, J. L., Choma, M. A., & Onofrey, J. A. (2024). Bias in medical AI: Implications for clinical decision-making. PLOS Digital Health, 3(11), e0000651.

Dobre, J. (2025, February 3). Designing AI for Human Expertise: Preventing Cognitive Shortcuts. UX Matters.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.

Ienca, M., & Andorno, R. (2017). Towards new human rights in the age of neuroscience and neurotechnology. Life Sciences, Society and Policy, 13(1), 5.

Kosmyna, N., et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv preprint arXiv:2506.08872.

Li, P., et al. (2023). Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models. arXiv preprint arXiv:2304.03271.

Mineault, P., et al. (2025). NeuroAI for AI Safety. arXiv preprint arXiv:2411.18526.

Phillips, P. J., et al. (2021). Four Principles of Explainable Artificial Intelligence. NIST Internal Report 8312. National Institute of Standards and Technology.

Rahimi, M., Yaghoobzadeh, Y., & Daliri, M. R. (2025). Explanations of Deep Language Models Explain Language Representations in the Brain. arXiv preprint arXiv:2502.14671.

Shah, R., et al. (2025). An Approach to Technical AGI Safety and Security. arXiv preprint arXiv:2504.01849.