Rethinking Responsibility: From Code to Human Rights
In artificial intelligence, we are often trained to solve computational problems with far less attention to ethical questions. Last week, at the Mila Summer School on Responsible AI and Human Rights, I had the opportunity to widen my perspective and see the very human issues that other fields are grappling with around AI. I saw a thorough outline of problems at every level, from research and development all the way to legislation. I also came away, however, with optimism and a drive toward solutions.
As someone who spends most of his time thinking from a software and cognitive science perspective, I walked in expecting deep dives into technical mitigation strategies for bias, fairness metrics, or the latest in privacy-preserving machine learning. But surrounded by a mix of lawyers, policymakers, and social scientists, the discussion was rarely about optimizing loss functions. Instead, we grappled with frameworks that felt both more foundational and far more complex: international law, socio-technical systems, and the staggering, often invisible, material costs of our digital world. It was a powerful reminder that the most difficult problems in AI aren't solved with cleverer code; they must be confronted within the messy, human context they are part of.
From Vague "Ethics" to Concrete "Rights"
One of the first steps in confronting this human context is to move beyond the often-amorphous term "AI ethics." While it has become a common buzzword in the tech world, leading to ethics review boards and corporate principles, the term often feels voluntary and ill-defined. A major shift in perspective for me at the summer school was the deliberate framing of the conversation around a more rigorous concept: human rights.
This is not just a semantic difference. As speakers like Catherine Régis emphasized, ethics can sometimes feel like a "race to the top" with competing principles. A human rights framework, by contrast, provides a globally recognized, legally grounded floor of non-negotiable standards. It’s based on decades of international law, like the Universal Declaration of Human Rights, and comes with established concepts like the UN Guiding Principles on Business and Human Rights: states have a duty to Protect human rights, corporations have a responsibility to Respect them, and there must be access to Remedy when rights are violated.
This reframing moves the conversation from "what is the most ethical choice?" to "what are the fundamental rights at stake, and how do we ensure they are not violated?". For an engineer, this is a powerful shift. It turns a philosophical debate into a requirements-gathering exercise, albeit one of a profoundly different nature. It forces us to ask not just "is the model fair?" but "does this system impact the right to equality and non-discrimination, the right to privacy, or even the right to a healthy environment?".
AI as a Socio-Technical System
Another key takeaway came from Virginia Dignum’s talk on Responsible AI, which hammered home the point that AI is not just a technology, but a socio-technical system. As engineers, we often focus on the model itself - the data, the architecture, the performance metrics. But AI systems don't exist in a vacuum. They are, as Dignum put it, a "social construct," embedded in and interacting with a messy world of human institutions, economic incentives, and power dynamics.
She used a simple, powerful analogy: a park bench. A bench is just a bench, but if it’s designed with extra armrests to prevent homeless people from sleeping on it, that design choice is inherently political. The same is true for AI. Our design choices, from the data we select to the objective function we define, have downstream consequences that are shaped by and, in turn, shape the society they are deployed in.
This perspective forces us to zoom out from the model to its entire lifecycle. Alex Hernandez-Garcia's talk on the environmental impact of AI brought this into sharp focus. The computational cost of training a large model is something we can measure, but what about the material cost? A primer from researchers at Hugging Face estimates that a single ChatGPT query can use several times the energy of a standard Google search. Li et al. (2023) estimate that training a model like GPT-3 can consume around 700,000 liters of clean freshwater for cooling data centers, which are often located in water-scarce regions. And as Kate Crawford documents in her work Atlas of AI, the minerals in our GPUs—cobalt, lithium, tungsten—are extracted from the earth, often under exploitative labor conditions that disproportionately affect people in the Global South.
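To make the scale of these numbers more tangible, here is a minimal back-of-envelope sketch of the energy and cooling-water footprint of a hypothetical training run. Every constant in it is an assumption chosen purely for illustration (not a measurement of any real model or data center), and actual figures vary enormously with hardware, location, and cooling design.

```python
# Back-of-envelope footprint of a hypothetical training run.
# All constants below are illustrative assumptions, not measurements.

NUM_GPUS = 1024          # assumed size of the training cluster
GPU_POWER_KW = 0.4       # assumed average draw per GPU, in kW
TRAINING_DAYS = 30       # assumed wall-clock duration of the run
PUE = 1.2                # assumed data-center power usage effectiveness
WATER_L_PER_KWH = 1.8    # assumed cooling water per kWh (varies widely by site)

gpu_energy_kwh = NUM_GPUS * GPU_POWER_KW * TRAINING_DAYS * 24
total_energy_kwh = gpu_energy_kwh * PUE              # add cooling and overhead
water_litres = total_energy_kwh * WATER_L_PER_KWH

print(f"Estimated energy: {total_energy_kwh:,.0f} kWh")
print(f"Estimated cooling water: {water_litres:,.0f} litres")
```

Even with generous rounding, an exercise like this makes clear that the material cost of a single run is far from negligible, and that it is largely invisible from inside the training script.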
Operationalizing Responsibility: Human Rights Assessments
So, what do we do? If the problems are this big and interconnected, how can a developer or a research lab even begin to act responsibly? This is where the concept of a Human Rights Assessment (HRA), introduced in a workshop by Samone Nigam of BSR, offered a concrete path forward.
An HRA is a process for systematically identifying and evaluating the potential human rights impacts of a project. What stood out to me was how it provides a structured, almost engineering-like approach to these complex social challenges. Consider, for example, a company developing an AI tool to screen resumes. A standard technical approach might focus on whether the model accurately predicts which candidates will succeed. An HRA, however, asks different questions. It forces us to assess the entire value chain and map the technology's potential impacts against internationally recognized human rights.
The process doesn't just ask if the model is biased; it asks if the system could infringe on the right to equality and non-discrimination. It scrutinizes the training data: whose resumes were used? Does the model penalize gaps in employment that disproportionately affect women or caregivers? Does it favor language typically associated with a specific demographic? It then asks us to prioritize these risks based on their severity to people, measured by three criteria (sketched in code after the list):
- Scope: How many people are affected?
- Scale: How serious is the harm of being unjustly denied an opportunity?
- Remediability: Can this harm be undone?
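To make this concrete for an engineering audience, here is a minimal sketch of how those three criteria might be recorded and ranked for the resume-screening example. The risk descriptions, the 1-to-5 scales, and the additive score are all my own illustrative assumptions rather than BSR's actual methodology; a real assessment weighs these judgments qualitatively and with input from affected stakeholders.

```python
# Illustrative only: recording and ranking human rights risks by severity.
from dataclasses import dataclass

@dataclass
class HumanRightsRisk:
    description: str
    scope: int          # 1-5: how many people are affected
    scale: int          # 1-5: how serious the harm is for those affected
    remediability: int  # 1-5: 5 means the harm is effectively irreversible

    @property
    def severity(self) -> int:
        # A crude additive score; real assessments are qualitative.
        return self.scope + self.scale + self.remediability

risks = [
    HumanRightsRisk("Model penalizes employment gaps, disadvantaging caregivers", 4, 4, 3),
    HumanRightsRisk("Resumes retained and reused without informed consent", 5, 2, 2),
]

for risk in sorted(risks, key=lambda r: r.severity, reverse=True):
    print(f"severity {risk.severity:2d}  {risk.description}")
```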
This framework forces us to move beyond a narrow, model-centric view and consider the full, messy, human context of our work. The big takeaway for me after these first few days was that you can't just debug a human rights violation or optimize for social justice with a cleverer loss function. Responsibility isn't a feature you can add to a model; it's a process that has to be woven into every stage of development, from the first line of code to the last user interaction, and it requires listening to a much broader range of voices than we typically have in the room.
The Researcher as an "Implicated Subject"
So where does that leave someone like me, who sits on the engineering and research side of the table? It's easy to feel like these high-level discussions about policy, law, and global governance are disconnected from the day-to-day work of building models, writing code, or analyzing data. But the central lesson from the summer school was that these worlds aren't just connected; they're inseparable. The choices we make in our labs and on our laptops have direct lines to these bigger questions.
One of the most powerful framings for this came from Alex Hernandez-Garcia, who applied the concept of the "implicated subject," developed by scholar Michael Rothberg in the context of historical injustices, to those of us working within the technology ecosystem. The idea is that most of us are neither clear perpetrators of harm nor passive, uninvolved bystanders. We are implicated. Our work, our code, our research—even with the best intentions—helps produce and reproduce the very systems that generate these complex societal effects.
This framing resonated with me because it moves beyond blame and towards responsibility. It acknowledges that even when we aren't actively trying to cause harm, our actions and inactions have consequences. This is especially true in cognitive science and neuroscience, where we increasingly use AI not just as a tool, but as a model of the mind itself. If we train a model on data from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations, as Henrich and colleagues famously pointed out, it will inevitably produce theories of cognition that are not universally applicable. If we use an LLM trained on the internet to model language processing, it will inherit all the biases of its training data—biases that can subtly reinforce flawed scientific conclusions.
Recognizing this implication means we have to change how we work. It’s not enough to just build a model that has high predictive accuracy on a benchmark. We have to:
- Understand the full context: Critically examine the assumptions baked into our data, our models, and our own research questions. This includes the social context from which data is collected and, as highlighted by the Human Brain Project's work on dual-use research, the potential for military or other harmful applications of our work.
- Mitigate our own impact: This involves more than just technical debiasing. It means thinking about the entire research lifecycle, from ensuring data is sourced consensually and ethically to considering the environmental footprint of our computational experiments.
- Consider refusal: There are some research questions we probably shouldn't be asking, or technologies we shouldn't be building. As Su Lin Blodgett pointed out in her talk on ethical reasoning, deciding not to do something is a valid and sometimes necessary choice.
Building Responsibility into the Machine
While the summer school provided a fantastic high-level map of the legal and social terrain, a growing community of technical researchers is trying to build answers to these same challenges directly into the architecture of AI. These approaches don't replace the need for governance, but they represent the engineering response to it—translating principles into practice.
One of the most compelling frontiers is the field of NeuroAI, which seeks inspiration for AI safety from the one example of general intelligence we know works (most of the time): the human brain. The roadmap paper "NeuroAI for AI Safety" by Mineault et al. (2025) suggests moving beyond just mimicking human behavior and instead trying to reverse-engineer the brain's own mechanisms for robustness and specification. For instance, why are humans so good at generalizing, while AI models can fail catastrophically on out-of-distribution data, making "alien errors" that no person ever would? The theory is that our brains, shaped by evolution through a “genomic bottleneck”, have powerful inductive biases that constrain learning in safe and useful ways.
This isn't about naively copying the brain, flaws and all. As the NeuroAI roadmap points out, it’s a selective process of identifying and replicating beneficial properties while carefully avoiding known pitfalls. This could mean building "digital twins" of robust sensory systems to make AI less vulnerable to adversarial attacks, or trying to infer the brain’s intrinsic "loss functions"—the complex, homeostatic, and multi-objective reward systems that guide our behavior. This moves far beyond simple reward models and toward understanding the foundational drives that shape human values, a pursuit at the very heart of cognitive science.
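To give a flavor of what such an intrinsic, multi-objective reward might look like, here is a toy sketch in the spirit of homeostatically regulated reinforcement learning, where reward is the reduction in deviation from a set of internal setpoints. The variables, setpoints, and weights are my own illustrative choices, not a formulation taken from the NeuroAI roadmap.

```python
import numpy as np

# Toy homeostatic, multi-objective reward: the agent tracks several internal
# variables (say, energy, temperature, social contact), each with a setpoint,
# and is rewarded for moving its internal state closer to those setpoints.
SETPOINTS = np.array([1.0, 0.5, 0.8])  # hypothetical target levels
WEIGHTS = np.array([2.0, 1.0, 0.5])    # hypothetical importance of each drive

def drive(state: np.ndarray) -> float:
    """Aggregate 'discomfort': weighted distance from the setpoints."""
    return float(np.sqrt(np.sum(WEIGHTS * (SETPOINTS - state) ** 2)))

def homeostatic_reward(state: np.ndarray, next_state: np.ndarray) -> float:
    """Reward is the reduction in drive, as in drive-reduction accounts of motivation."""
    return drive(state) - drive(next_state)

# An action that restores a depleted 'energy' variable yields positive reward.
before = np.array([0.2, 0.5, 0.8])
after = np.array([0.9, 0.5, 0.8])
print(homeostatic_reward(before, after))  # positive
```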
On a parallel track, the AGI safety community is developing rigorous engineering frameworks to manage risks in highly capable models. The Google DeepMind paper, "An Approach to Technical AGI Safety and Security," outlines a strategy that feels like a direct, technical corollary to the Human Rights Assessment framework (Shah et al., 2025). It categorizes risks into Misuse (a bad actor uses the AI for harm) and Misalignment (the AI itself has goals that diverge from human intent). To address misalignment, they propose two key lines of defense. The first is building an "aligned model" through techniques like amplified oversight, where AI is used to help humans supervise more powerful AI—a fascinating cognitive science problem in its own right. How do you reliably oversee a system that is more capable than you are? The second line of defense is hardening the system against failure even if the model is misaligned, using robust monitoring and security controls. This is a pragmatic, defense-in-depth approach. It acknowledges that we can’t guarantee perfect alignment, so we must build systems that are safe even when they fail.
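As a caricature of that second line of defense, here is a minimal sketch of a defense-in-depth wrapper: the primary model's answer is released only if an independent monitor clears it, and flagged cases are escalated rather than trusted. The function names and the toy rule-based monitor are hypothetical placeholders, not the mechanisms described in the DeepMind paper.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("defense_in_depth")

def guarded_generate(
    model: Callable[[str], str],           # hypothetical primary model
    monitor: Callable[[str, str], bool],   # hypothetical independent checker
    prompt: str,
    fallback: str = "[response withheld pending human review]",
) -> str:
    """Release the model's answer only if an independent monitor approves it."""
    answer = model(prompt)
    if monitor(prompt, answer):
        log.info("monitor approved response")
        return answer
    # Second line of defense: contain the failure instead of trusting alignment.
    log.warning("monitor flagged response; escalating to human review")
    return fallback

# Toy stand-ins, for illustration only.
toy_model = lambda p: f"echo: {p}"
toy_monitor = lambda p, a: "credentials" not in a.lower()

print(guarded_generate(toy_model, toy_monitor, "summarize this document"))
```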
These engineering frameworks are crucial, but they also bring us face-to-face with even more profound challenges where technology, cognition, and rights become inextricably tangled. As AI tools become more integrated into our own research and creative workflows, we risk a subtle but significant erosion of human expertise through cognitive offloading. As Jolie Dobre highlights in a recent UX Matters article, our increasing reliance on AI to streamline tasks can diminish the very critical thinking and deep engagement required to develop true mastery (Dobre, 2025). This creates a dangerous paradox: the more we use AI to bypass difficult cognitive work, the less capable we may become of judging whether the AI's output is correct, creative, or even safe. This isn't just a theoretical risk; early studies suggest that heavy AI use can lead to measurable declines in users' critical thinking (Kosmyna et al., 2025).
This brings the question of responsibility to an intensely personal level. As AI systems become more adept at interpreting human data, including neural data, the ethical stakes skyrocket. As Ienca and Andorno argued years ago, we are rapidly approaching a reality where the traditional right to privacy is insufficient. We need to consider new, more fundamental protections like the right to cognitive liberty (the freedom to control one's own mental processes) and the right to mental privacy (Ienca & Andorno, 2017). The skull is no longer the final bastion of privacy. The work that I and my colleagues do - using AI to model the brain - could, without the right safeguards, become the very tools for violating it.
This forces us to critically examine our scientific methods. The problem of bias, for example, is not just a technical glitch in a dataset but a systemic issue that infects the entire research lifecycle. As described in a recent review by Cross et al. (2024), bias can be introduced at every stage: from the initial data collection—where cognitive science’s over-reliance on WEIRD (Western, Educated, Industrialized, Rich, and Democratic) populations creates a notoriously skewed foundation (Henrich et al., 2010)—to the subtler biases embedded in how we label data or evaluate models. Even our most sophisticated tools for understanding models, the methods of Explainable AI (XAI), present a fundamental paradox. While a recent study shows how XAI attribution methods can brilliantly align LLM behavior with brain activity (Rahimi et al., 2025), we must heed the warning from NIST's XAI principles: we need to ensure an explanation is not only meaningful and understandable, but also accurate (Phillips et al., 2021). We must constantly ask whether we are explaining a genuine cognitive mechanism or merely a clever artifact of the model we built.
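One practical way to probe whether an explanation is accurate rather than merely plausible is a simple deletion check: if removing the tokens an attribution method ranks as most important barely changes the model's prediction, the explanation probably isn't tracking what the model actually relies on. The sketch below assumes a generic predict_proba function and per-token attribution scores; it is an illustrative heuristic, not the evaluation protocol of any of the papers cited above.

```python
from typing import Callable, List, Sequence

def deletion_check(
    predict_proba: Callable[[List[str]], float],  # assumed: probability of the predicted class
    tokens: Sequence[str],
    attributions: Sequence[float],                # one importance score per token
    k: int = 3,
) -> float:
    """Drop in confidence after removing the k most-attributed tokens.

    A large drop is (weak) evidence that the attribution reflects what the model
    relies on; a near-zero drop suggests the explanation may be unfaithful.
    """
    baseline = predict_proba(list(tokens))
    top_k = set(sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)[:k])
    ablated = [tok for i, tok in enumerate(tokens) if i not in top_k]
    return baseline - predict_proba(ablated)
```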
Finally, this wave of generative technology even turns back on science itself. The ease with which GenAI can produce plausible-sounding text, data, and even scientific visualizations—as Resnik and Shamoo (2025) have warned—presents a fundamental threat to research integrity. The very tools we are developing to accelerate discovery could be used to fabricate it, undermining the public trust that science depends on. This moves the discussion beyond simply applying responsible AI principles to our research subjects and forces us to apply them to ourselves. The choices we make about transparency, disclosure, and verification are no longer just methodological footnotes; they are central to upholding the integrity of our own work and our responsibility as "implicated subjects" in this new technological ecosystem.
Conclusion: The Engineer as Steward
Leaving the summer school, the biggest takeaway wasn't a new algorithm, but a new lens. I had walked in expecting to talk about technology, and I left understanding that the most urgent conversations are about governance, law, and human dignity. The journey from the abstract idea of "AI ethics" to the concrete, legally-grounded framework of human rights was a fundamental re-orientation. It became clear that our responsibility doesn't end at the model's output; it extends across the entire socio-technical system—from the environmental and labor costs of the GPUs in our servers to the societal impact of the applications we enable.
But this isn't just a conversation for policymakers and lawyers. For those of us building these systems, the challenge is to translate these principles into practice. The technical roadmaps from communities like NeuroAI and AGI safety are not just about preventing misuse or misalignment; they are engineering's answer to the human rights framework. They are attempts to build systems that are robust not just in their performance, but in their respect for human values—whether by drawing inspiration from the brain's own safety mechanisms or by designing rigorous, multi-layered security controls.
Ultimately, this brings the responsibility back to us, the individual researchers and engineers. Recognizing ourselves as "implicated subjects" is a call to change not just what we build, but how we build it. It means questioning the data we use, scrutinizing the biases in our models, and critically evaluating the explanations they provide. It means confronting the dual risk of our own cognitive offloading and the potential for the very tools we create to undermine the integrity of science itself.
The work of a cognitive scientist or an AI engineer is no longer just about building a better model. It's about understanding how that model fits within a world of human rights, cognitive limits, and societal trust. It's about being not just a creator, but a steward. The hardest problems in AI, it turns out, are not computational; they are a profound and necessary reflection of ourselves.
References
Cross, J. L., Choma, M. A., & Onofrey, J. A. (2024). Bias in medical AI: Implications for clinical decision-making. PLOS Digital Health, 3(11), e0000651.
Dobre, J. (2025, February 3). Designing AI for Human Expertise: Preventing Cognitive Shortcuts. UX Matters.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.
Ienca, M., & Andorno, R. (2017). Towards new human rights in the age of neuroscience and neurotechnology. Life Sciences, Society and Policy, 13(1), 5.
Kosmyna, N., et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv preprint arXiv:2506.08872.
Mineault, P., et al. (2025). NeuroAI for AI Safety. arXiv preprint arXiv:2411.18526.
Phillips, P. J., et al. (2021). Four Principles of Explainable Artificial Intelligence. NIST Internal Report 8312. National Institute of Standards and Technology.
Rahimi, M., Yaghoobzadeh, Y., & Daliri, M. R. (2025). Explanations of Deep Language Models Explain Language Representations in the Brain. arXiv preprint arXiv:2502.14671.
Resnik, D. B., & Shamoo, A. E. (2025). GenAI synthetic data create ethical challenges for scientists. Here's how to address them. Proceedings of the National Academy of Sciences, 122(9), e2409182122.
Shah, R., et al. (2025). An Approach to Technical AGI Safety and Security. arXiv preprint arXiv:2504.01849.