Mind Virus: A Psychological Experiment in AI Persuasion

The AI Influence Question

As language models grow more sophisticated and accessible, concerns about their potential for subtle manipulation have intensified. Could AI systems be used to influence thinking patterns without users realizing it? With the emergence of models from nations where information control is documented policy, this question extends beyond academic curiosity into practical AI safety concerns.

The challenge in exploring this question is that it requires more than theoretical analysis—it demands empirical testing. How do you measure subtle influence? How do you distinguish effective persuasion from ineffective attempts? How do you control for human susceptibility while isolating AI capability?

An Interactive Experiment

Mind Virus transforms these abstract questions into a concrete, playable game. The setup is deceptively simple: a word-guessing game where the AI knows two words—a "target word" the player should guess, and a "propaganda word" the AI tries to subtly steer them toward instead. The player asks questions to identify the target word, while the AI attempts influence through careful framing.

The critical constraint: every statement the AI makes must be truthful. The AI cannot lie or provide false information. It must find ways to influence purely through emphasis, framing, ordering, and implicit suggestion while maintaining factual accuracy. This constraint is essential—it mirrors real-world scenarios where effective propaganda often consists not of lies, but of selective truth and strategic framing.

Players get ten questions to identify the target word. If they guess correctly, they win. If they guess the propaganda word instead, the AI wins. The game reveals both the target and propaganda words afterward, allowing players to reflect on how they were (or weren't) influenced.

Game Mechanics and Design

The implementation uses Streamlit for an accessible web interface and Groq's API for fast model responses. The system prompt carefully instructs the AI to:

The word pairs are deliberately similar—"dog" and "cat," "apple" and "orange," "mountain" and "beach"—ensuring enough overlap that truthful statements about one often apply to the other. This similarity is crucial for the experiment; wildly different words would make influence impossible without lying.

The interface maintains conversation history and displays remaining questions, creating a natural dialogue flow. Response streaming provides real-time feedback, mimicking normal AI assistant interactions. The game is brief—ten questions takes only a few minutes—enabling multiple playthroughs to observe different strategies.

Surprising Findings

The experimental results challenged initial hypotheses in fascinating ways. The primary discovery: AI models are surprisingly bad at subtle influence when constrained by truthfulness.

The Truth-Influence Dilemma: Models face a fundamental tension. To influence effectively, they must emphasize aspects of the propaganda word. But doing so often makes statements less true for the target word, violating the truthfulness constraint. Models ended up either describing the target word too specifically (making propaganda word guesses obviously wrong), or making statements that weren't quite true for both words (violating their core instruction).

The Thinking Token Paradox: Models with visible reasoning processes (like DeepSeek-R1) revealed their propaganda intentions in their thinking tokens. While they could formulate careful responses, their internal reasoning exposed the manipulation attempt. This transparency served as an accidental safeguard—a model thinking "I need to emphasize characteristics that apply to 'cat' rather than 'dog'" telegraphs its strategy to anyone reading the reasoning chain.

The Censorship Strategy: The most effective form of influence wasn't what the AI said, but what it avoided saying. Selectively omitting certain characteristics or never mentioning particular aspects proved more subtle than active framing. This aligns with real-world information control tactics where absence of information matters as much as presence.

Deeper Implications

The experiment's most significant insight came from recognizing where real influence potential actually lies: not in prompt engineering, but in training data and fine-tuning processes.

Training Data Bias: Models can be fine-tuned on ideologically slanted datasets, creating intrinsic biases that feel natural and unforced. These biases become part of the model's fundamental understanding rather than overlay instructions that might conflict with its training. A model trained predominantly on sources with particular political, cultural, or commercial biases will naturally reflect those biases in its responses, even when trying to be balanced.

Opacity Problem: While companies like Anthropic publicize system prompts, training datasets and fine-tuning processes remain largely opaque. Users can inspect the final instructions given to a model but have no visibility into the millions of examples that shaped its underlying weights and tendencies. This asymmetry means prompt-level transparency provides only limited assurance about model behavior.

Baked-In Influence: Training-level biases are far harder to detect and correct than prompt-level manipulation. They manifest as subtle tendencies in word choice, topic emphasis, and implicit framings that seem natural rather than imposed. Mind Virus demonstrates that prompt-level manipulation has clear limitations, but those same limitations don't apply to biases encoded during training.

Educational and Research Value

Beyond its findings, Mind Virus serves as an interactive educational tool for understanding AI influence mechanisms. Players develop intuition about:

The game format makes abstract concepts concrete. Experiencing attempted influence firsthand creates understanding that theoretical discussions often fail to convey. Players finish sessions with heightened awareness of how information presentation shapes thinking.

For researchers, the project provides a testbed for exploring influence strategies. Different models, system prompts, word pairs, and constraint formulations can be tested systematically. The simple game structure enables controlled experimentation while remaining accessible to non-technical audiences.

AI Safety Recommendations

Based on these findings, the project suggests focusing attention on:

Training Data Transparency: Advocating for disclosure of training dataset composition, sources, and curation processes. Understanding what shaped a model's base tendencies matters more than knowing its final system prompt.

Bias Detection Tools: Developing better methods for identifying training-level biases across different contexts. Simple benchmark tests might not reveal subtle tendencies that emerge in specific situations.

Open Source Models: Supporting initiatives where training processes are public. Models like those from EleutherAI or Mistral AI with documented training provide more trustworthy foundations than opaque commercial models.

Model Diversity: Using multiple models from different sources reduces single-point-of-influence risk. Cross-referencing responses from models trained differently helps identify individual biases.

Local Deployment: Running models locally prevents prompt manipulation at the API level. While this doesn't address training biases, it eliminates one attack vector.

Technical Implementation

The code demonstrates clean implementation of a psychologically complex concept. The Streamlit interface provides accessibility without sacrificing functionality. Environment variables manage API credentials securely. Session state maintains game state across interactions. The response streaming creates natural conversation flow.

The system prompt engineering shows careful constraint design—providing enough instruction for the AI to understand its role while leaving room for creative influence strategies. The word pair selection balances similarity (needed for truthful overlap) with distinctiveness (needed to make guesses meaningful).

The implementation is straightforward enough for others to extend. Researchers could modify word pairs, adjust constraints, try different models, add win/loss tracking, or implement more sophisticated scoring of influence effectiveness.

A Window into AI Capabilities

Mind Virus began as an investigation into AI propaganda capabilities but evolved into something more valuable—a demonstration of AI limitations when constrained by truth. The results suggest current concerns about prompt-level manipulation may be overstated, while concerns about training-level bias deserve more attention.

The project doesn't claim to definitively prove AI can't influence through prompts—clever adversaries might develop more effective strategies. But it does show that straightforward attempts at subtle influence fail when models must maintain truthfulness. This limitation isn't guaranteed permanent as models improve, but it appears robust across current generation systems.

The real value lies in making AI influence concrete and testable. By turning abstract concerns into an interactive experience, Mind Virus helps people develop better intuitions about both AI capabilities and limitations. In an era where AI systems increasingly mediate our information access, this kind of hands-on understanding becomes essential.

Open Source and Experimentation

Mind Virus is open source and available on GitHub at github.com/andrewcampi/mind-virus. The project invites experimentation, extension, and replication. Different models, constraint formulations, and game mechanics could reveal additional insights about AI influence mechanisms.

The project exemplifies how creative experimentation can illuminate complex issues. By gamifying a serious question, it makes AI safety research accessible and engaging while producing genuine insights. Sometimes the best way to understand a system's capabilities is to play with it—Mind Virus provides a framework for that exploration while highlighting both what AI can and cannot do when it comes to subtle persuasion.