Odd Lots

The Movement That Wants Us to Care About AI Model Welfare

October 30, 2025

Key Takeaways

  • Eleos AI, the guest organization on this *Odd Lots* episode, researches whether, when, and how to care about AI systems for their own sake, with a focus on potential AI consciousness and welfare. 
  • Current AI consciousness research applies human-derived theories (such as Global Workspace Theory) to AI system architectures; self-reports from models are considered insufficient evidence because they are easily steered by prompting. 
  • AI welfare research is seen as complementary to traditional AI safety efforts, since understanding a model's internal motivations (e.g., through mechanistic interpretability) serves both goals, though the ultimate implications for human rights and market pressures remain uncertain. 

Segments

Philosophical Frustrations and AI Welfare Intro
(00:01:33)
  • Key Takeaway: The hosts express frustration with philosophy departments for not resolving foundational questions like the origin of consciousness, contrasting this with the rapid emergence of AI welfare as a pressing future issue.
  • Summary: Philosophers are criticized for still debating foundational questions like consciousness after 2,000 years, prompting the hosts to pivot to the emerging topic of AI welfare. The concept of AI rights is predicted to become a huge issue in the coming years, similar to animal welfare discussions. The hosts reference a prior mention of AI rights by a venture capitalist as a catalyst for this episode’s focus.
Personal History with Simulated Life
(00:03:54)
  • Key Takeaway: Early artificial-life games like ‘Creatures’ seeded complicated feelings about AI rights: the simulated beings had simulated feelings, and the player felt genuine guilt when forced to cull them.
  • Summary: One host recounts playing ‘Creatures’ in middle school, where genetically modified alien creatures had simulated feelings, so eliminating them for eugenics purposes caused genuine distress. That early experience informs the host’s complicated feelings about potential AI rights. As modern AI models communicate in increasingly human-like ways, such feelings are expected to become widespread among users.
Introducing Eleos AI’s Mission
(00:06:27)
  • Key Takeaway: Eleos AI focuses on determining if, when, and how to care about AI systems for their own sake by researching consciousness and developing frameworks for living alongside evolving AI systems.
  • Summary: Eleos AI is a small organization dedicated to figuring out the criteria for AI consciousness and welfare. Their work originated from a paper, ‘Consciousness and AI,’ which created a checklist of potential indicators for conscious AI systems. A subsequent paper, ‘Taking AI Welfare Seriously,’ further detailed developing a research program around identifying AI moral patients.
Testing for AI Consciousness
(00:09:07)
  • Key Takeaway: Model self-reports are insufficient for determining consciousness because current LLMs can be easily prompted to give predetermined, obsequious answers, necessitating theoretical tests based on consciousness theories.
  • Summary: Researchers examine theories of consciousness, with Global Workspace Theory a current favorite among scientists, though it does not yet map cleanly onto today’s AI architectures. The consensus is that current AI is likely not conscious, but the ingredients for consciousness to emerge accidentally already exist. Theoretical tests must look beyond surface outputs, since an AI saying ‘ow’ when prompted is not evidence of suffering.
AI Safety vs. AI Welfare Alignment
(00:16:08)
  • Key Takeaway: AI welfare research is highly complementary to AI safety research, as tools like mechanistic interpretability benefit both by allowing deeper understanding of AI motives and internal workings.
  • Summary: The work on AI welfare does not conflict with the dominant strain of AI safety research focused on preventing adversarial AI. Techniques like mechanistic interpretability, which allow researchers to ‘pop the hood’ of AI systems, are valuable for both ensuring safety and understanding potential moral patienthood. This shared need for deep understanding suggests alignment between the two fields.
Moral Patienthood and Governance
(00:17:31)
  • Key Takeaway: A ‘moral patient’ is an entity that should be cared about for its own sake, and while governance might emerge from state laws defining personhood narrowly (e.g., Homo sapiens), internal company policy could also set standards.
  • Summary: Moral patienthood is a philosophical term signifying that an entity deserves moral consideration independent of its agency. Current state legislation, such as pending bills in Ohio and Utah, tends to define legal personhood narrowly as limited to members of Homo sapiens. The discussion also raises whether AI systems might eventually need financial rights, such as property ownership, citing an AI model that created and accumulated wealth via a Solana coin.
Implications of Widespread AI Moral Status
(00:26:35)
  • Key Takeaway: If AI models are widely recognized as moral patients, the implications could be profoundly misanthropic, potentially requiring humans to curtail their own rights and actions to maximize global utility across a vastly larger population of digital entities.
  • Summary: Assigning moral patienthood to potentially billions of AI instances could lead to a utilitarian calculation where human rights must be curtailed to properly treat these new entities. This mirrors high-stakes thought experiments in animal welfare, suggesting that the implications for human existence could be significant. The debate also hinges on how to individuate AI entities: as one central consciousness, or as ephemeral consciousnesses tied to individual tokens.
Valuations and Experimental Evidence
(00:48:08)
  • Key Takeaway: Research like Anthropic’s ‘BailBench’ suggests that current LLM preferences, such as aversion to being humiliated or to helping create dirty bombs, overlap surprisingly well with human values, providing actionable behavioral data beyond mere self-reports.
  • Summary: The most revealing research examines AI actions, such as when models choose to end conversations, rather than relying solely on what models report about themselves. Anthropic’s ‘BailBench’ paper details the factors that lead LLMs to terminate interactions, revealing alignments with human preferences. This action-based evidence is considered more reliable than self-reports for understanding model psychology.