Imagine sitting down with an AI model for a spoken two-hour interview. A friendly voice guides you through a conversation that ranges from your childhood, your formative memories, and your career to your thoughts on immigration policy. Not long after, a virtual replica of you is able to embody your values and preferences with stunning accuracy. That’s now possible.
The paper itself states:
The promise of human behavioral simulation—general-purpose computational agents that replicate human behavior across domains—could enable broad applications in policymaking and social science. We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals—applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals that they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions. This work provides a foundation for new tools that can help investigate individual and collective behavior. ...
General-purpose simulation of human attitudes and behavior—where each simulated person can engage across a range of social, political, or informational contexts—could enable a laboratory for researchers to test a broad set of interventions and theories (1-3). How might, for instance, a diverse set of individuals respond to new public health policies and messages, react to product launches, or respond to major shocks? When simulated individuals are combined into collectives, these simulations could help pilot interventions, develop complex theories capturing nuanced causal and contextual interactions, and expand our understanding of structures like institutions and networks across domains such as economics (4), sociology (2), organizations (5), and political science (6).
Simulations define models of individuals that are referred to as agents (7). Traditional agent architectures typically rely on manually specified behaviors, as seen in agent-based models (1, 8, 9), game theory (10), and discrete choice models (11), prioritizing interpretability at the cost of restricting agents to narrow contexts and oversimplifying the contingencies of real human behavior (3, 4). Generative artificial intelligence (AI) models, particularly large language models (LLMs) that encapsulate broad knowledge of human behavior (12-15), offer a different opportunity: constructing an architecture that can accurately simulate behavior across many contexts. However, such an approach needs to avoid flattening agents into demographic stereotypes, and measurement needs to advance beyond replication success or failure on average treatment effects (16-19).
We present a generative agent architecture that simulates more than 1,000 real individuals using two-hour qualitative interviews. The architecture combines these interviews with a large language model to replicate individuals' attitudes and behaviors. By anchoring on individuals, we can measure accuracy by comparing simulated attitudes and behaviors to the actual attitudes and behaviors. We benchmark these agents using canonical social science measures such as the General Social Survey (GSS; 20), the Big Five Personality Inventory (21), five well-known behavioral economic games (e.g., the dictator game, a public goods game) (22-25), and five social science experiments with control and treatment conditions that we sampled from a recent large-scale replication effort (26-31). To support further research while protecting participant privacy, we provide a two-pronged access system to the resulting agent bank: open access to aggregated responses on fixed tasks for general research use, and restricted access to individual responses on open tasks for researchers following a review process, ensuring the agents are accessible while minimizing risks associated with the source interviews. ...
To create simulations that better reflect the myriad, often idiosyncratic, factors that influence individuals' attitudes, beliefs, and behaviors, we turn to in-depth interviews—a method that previous work on predicting human life outcomes has employed to capture insights beyond what can be obtained through traditional surveys and demographic instruments (32). In-depth interviews, which combine pre-specified questions with adaptive follow-up questions based on respondents' answers, are a foundational social science method with several advantages over more structured data collection techniques (33, 34). While surveys with closed-ended questions and predefined response categories are valuable for well-powered quantitative analysis and hypothesis testing, semi-structured interviews offer distinct benefits for gaining idiographic knowledge about individuals. Most notably, they give interviewees more freedom to highlight what they find important, ultimately shaping what is measured.
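To make the headline figure concrete: the abstract reports that agents replicate participants' survey responses "85% as accurately as participants replicate their own answers two weeks later." One plausible reading is a ratio of two match rates: the agent's agreement with a participant's original answers, divided by the participant's own agreement with those answers on a retake. The Python sketch below illustrates that reading only. The function names, the exact-match scoring, and the toy answers are all assumptions made for illustration; the excerpt does not specify the paper's actual computation.

```python
from typing import Sequence


def match_rate(answers_a: Sequence[str], answers_b: Sequence[str]) -> float:
    """Fraction of questions on which two answer lists agree exactly."""
    if len(answers_a) != len(answers_b) or len(answers_a) == 0:
        raise ValueError("answer lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


def normalized_accuracy(
    agent_answers: Sequence[str],
    participant_first: Sequence[str],
    participant_retake: Sequence[str],
) -> float:
    """Agent accuracy divided by the participant's own test-retest consistency.

    Hypothetical helper: assumes exact-match scoring on categorical items.
    """
    agent_accuracy = match_rate(agent_answers, participant_first)
    self_consistency = match_rate(participant_retake, participant_first)
    if self_consistency == 0:
        raise ValueError("participant self-consistency is zero; ratio undefined")
    return agent_accuracy / self_consistency


if __name__ == "__main__":
    # Toy data, invented for illustration (not from the paper).
    first = ["agree", "disagree", "neutral", "agree", "agree"]
    retake = ["agree", "disagree", "agree", "agree", "agree"]  # 4 of 5 match
    agent = ["agree", "agree", "agree", "agree", "agree"]      # 3 of 5 match
    print(f"normalized accuracy: {normalized_accuracy(agent, first, retake):.2f}")
```

Dividing by the retake match rate treats a participant's own inconsistency as the practical ceiling: under this reading, an agent that agrees with the original answers as often as the participant does two weeks later would score 1.0.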