Animals vs Ghosts by Karpathy: A Realistic Path for Enterprise AI
- saurabhsarkar
- Oct 6
- 5 min read

Most AI systems today are ghosts: trained on human text, fine-tuned for alignment, and limited to observing and predicting. The future may lie in animals: AI systems that act, learn, and adapt from experience. But the path from ghost to animal is far more complex than most realize. This article explains why enterprises should focus on perfecting "ghosts with tools" while carefully experimenting with "animal instincts" in controlled environments, and why this transition will take years, not months.
1. The Big Idea: Ghosts vs Animals
Ghosts are today's LLMs. They learn from human data and imitate reasoning, language, and expertise. They don't learn continuously or interact with the world directly.
Animals, in contrast, learn through experience. They explore, make mistakes, and improve based on feedback. Think of AlphaZero learning chess or a robot learning to walk.
Karpathy's metaphor highlights a core truth: we've built powerful mimics, not learners. True intelligence demands interaction, feedback, and goals beyond text prediction.
But here's what Karpathy's piece doesn't emphasize: the gap between game-playing AI and business AI is enormous. Chess has perfect information and instant feedback. Business has ambiguous data, delayed outcomes, and human unpredictability.
2. Why Ghosts Still Rule (And Will for Years)
Despite the hype around AI agents, most enterprise value today, and for the foreseeable future, comes from ghosts. They are predictable, secure, auditable, and surprisingly effective for most business needs.
Where ghosts deliver immediate value:
Document Intelligence: Drafting contracts, summarizing reports, extracting insights from unstructured data
Decision Support: Analyzing options, identifying patterns, explaining regulatory requirements
Process Automation: Handling structured workflows through RPA and API orchestration
Knowledge Management: Making institutional knowledge searchable and actionable through RAG
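The RAG pattern behind that last item can be sketched in a few lines. This is a toy illustration, not a production pipeline: the `embed` function is a deterministic stand-in for a real embedding model, and `ToyVectorStore` stands in for a vector database such as pgvector or Qdrant.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic 'embedding': a stand-in for a real embedding model."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))  # stable across runs
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class ToyVectorStore:
    """In-memory stand-in for a vector database such as pgvector or Qdrant."""
    def __init__(self):
        self.docs: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, doc: str) -> None:
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        sims = np.stack(self.vecs) @ embed(query)  # cosine sim (unit vectors)
        return [self.docs[i] for i in np.argsort(sims)[::-1][:k]]

store = ToyVectorStore()
for doc in ["Refund policy: 30 days", "Security policy: rotate keys quarterly"]:
    store.add(doc)

# Retrieve context, then ground the LLM prompt in it
context = store.top_k("How long do customers have to request a refund?", k=1)
prompt = f"Answer using only this context: {context}\nQuestion: ..."
```

The whole value of the pattern is in that last line: the model answers from retrieved institutional knowledge rather than from its weights.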
The underappreciated truth:
A well-implemented ghost with good tools can handle 80% of enterprise AI use cases. The remaining 20% that could benefit from adaptive learning often isn't worth the 10x additional complexity and cost.
3. The Real Challenge of Building Animals
Moving from ghost to animal means teaching AI to learn from real-world outcomes. This is extraordinarily difficult because:
Technical Challenges:
Feedback is delayed and noisy: Did that pricing decision work? Check back in 6 months after the project completes
Credit assignment is nearly impossible: Was it the AI's recommendation, market conditions, or the sales team that closed the deal?
Exploration is expensive: Every experiment risks real money, customer relationships, or regulatory violations
Organizational Challenges:
Governance breaks down: How do you audit a system that changes its behavior daily?
Accountability becomes murky: Who's responsible when an adaptive system makes a decision no human explicitly programmed?
Skills gap widens: You need ML engineers who understand both reinforcement learning and your business domain, a rare combination
The Simulator Trap:
Many believe simulators solve these problems. They don't.
Business simulators are nothing like game environments:
Customer behavior isn't deterministic
Competitors adapt to your strategies
Regulations change unexpectedly
Black swan events can't be simulated
Even sophisticated simulators capture maybe 60% of real-world complexity. Models trained purely in simulation often fail catastrophically in production.
4. The Practical Path: Ghost Core, Cautious Evolution
At Phenx Machine Learning Technologies, we've learned that successful enterprise AI follows a patient evolution:
Phase 1: Perfect the Ghost (6-12 months)
Build robust RAG pipelines on your proprietary data
Create deterministic tool integrations (SQL, APIs, RPA)
Establish governance, monitoring, and rollback procedures
Measure everything: accuracy, latency, cost, user satisfaction
Phase 2: Add Smart Tools (3-6 months)
Introduce optimization algorithms for specific, bounded problems
Use traditional ML for predictive models with clear feedback loops
Implement A/B testing infrastructure for decision policies
Keep humans in the loop for all consequential decisions
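The A/B testing step can be as simple as a two-proportion z-test comparing conversion rates under the incumbent and candidate decision policies. A minimal sketch, with invented conversion counts:

```python
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for the difference in conversion rates
    between incumbent policy A and candidate policy B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided tail
    return z, p_value

# Hypothetical trial: 2,000 decisions routed to each policy
z, p = two_proportion_z(conv_a=120, n_a=2000, conv_b=156, n_b=2000)
ship_b = p < 0.05   # promote B only if the lift is statistically significant
```

Keeping promotion gated on a pre-registered significance threshold is what makes the policy change auditable rather than vibes-driven.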
Phase 3: Controlled Learning (12-24 months)
Identify narrow domains with fast, clear feedback
Build hybrid simulators that combine historical data with limited forward modeling
Run parallel tracks: production (ghost) and experimental (animal)
Gradually increase autonomy only where demonstrated safe and profitable
5. Architecture for the Real World
Data Layer:
Private vector databases (pgvector, Qdrant) for unstructured data
Traditional databases for structured business data
Immutable audit logs for all decisions and outcomes
Intelligence Layer:
Local or VPC-hosted LLMs for reasoning and planning
Specialized models for specific tasks (time series, vision, NLP)
Explainability tools (SHAP, LIME) for regulatory compliance
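Where a full SHAP or LIME integration is overkill, permutation importance offers a cheap, model-agnostic first pass at explainability: shuffle one feature and measure how much the metric degrades. A sketch with a hypothetical two-feature model:

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic importance: mean metric drop when one feature
    column is shuffled (a simpler cousin of SHAP/LIME attributions)."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # shuffle column j in place
            drops.append(base - metric(y, predict(Xp)))
        importances[j] = np.mean(drops)
    return importances

# Hypothetical model: output depends on feature 0, ignores feature 1
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
y = 3 * X[:, 0] + 0.1 * rng.standard_normal(500)
predict = lambda X: 3 * X[:, 0]
r2 = lambda y, p: 1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)
imp = permutation_importance(predict, X, y, r2)  # imp[0] large, imp[1] zero
```

The same loop works unchanged on any black-box `predict`, which is what regulators usually care about: evidence of which inputs drive the decision.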
Action Layer:
Read-only access by default
Write permissions through approved, versioned APIs
Human approval queues for material decisions
Automatic rollback on anomaly detection
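The rollback idea can be sketched as a guard that watches a rolling error rate and reverts to the last stable model version when a hard threshold is crossed. Version names and thresholds here are illustrative:

```python
from collections import deque

class RollbackGuard:
    """Monitors a rolling error rate and triggers automatic rollback to the
    last known-good model version when it exceeds a hard threshold."""
    def __init__(self, threshold: float = 0.10, window: int = 100):
        self.threshold = threshold
        self.window = deque(maxlen=window)
        self.active_version = "v2-candidate"   # hypothetical version labels
        self.stable_version = "v1-stable"

    def record(self, error: bool) -> str:
        self.window.append(1 if error else 0)
        rate = sum(self.window) / len(self.window)
        if len(self.window) == self.window.maxlen and rate > self.threshold:
            self.active_version = self.stable_version  # automatic rollback
        return self.active_version

guard = RollbackGuard(threshold=0.10, window=50)
for _ in range(45):
    guard.record(error=False)     # healthy traffic
for _ in range(10):
    version = guard.record(error=True)  # burst of errors triggers rollback
```

Requiring a full window before acting avoids rolling back on the first stray error while still reacting within tens of requests.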
Learning Layer (Experimental):
Offline reinforcement learning on historical decisions
Bandit algorithms for pricing and recommendation
Careful online learning with strict bounds and kill switches
Regular retraining schedules, not continuous learning
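A minimal sketch of a bounded bandit with a kill switch, in the spirit of the constraints above: the action set is fixed and pre-approved, exploration is capped by epsilon, and learning halts if mean reward collapses. The demand curve is invented:

```python
import random

class BoundedBandit:
    """Epsilon-greedy bandit restricted to a pre-approved action set, with a
    kill switch that freezes exploration if overall reward collapses."""
    def __init__(self, arms, epsilon=0.1, kill_floor=0.05, min_pulls=200):
        self.arms = list(arms)               # only pre-approved actions
        self.epsilon = epsilon
        self.kill_floor = kill_floor         # minimum acceptable mean reward
        self.min_pulls = min_pulls
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}
        self.total_reward = 0.0
        self.total_pulls = 0
        self.killed = False

    def select(self):
        if not self.killed and random.random() < self.epsilon:
            return random.choice(self.arms)  # bounded exploration
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.total_reward += reward
        self.total_pulls += 1
        mean = self.total_reward / self.total_pulls
        if self.total_pulls >= self.min_pulls and mean < self.kill_floor:
            self.killed = True               # kill switch: stop exploring

# Hypothetical demand curve: conversion probability per price point
random.seed(0)
demand = {9.99: 0.20, 10.99: 0.08, 11.99: 0.02}
bandit = BoundedBandit(arms=demand)
for _ in range(5000):
    price = bandit.select()
    revenue = price if random.random() < demand[price] else 0.0
    bandit.update(price, revenue)
```

Note what is deliberately missing: the bandit can never invent a price outside the approved set, which is most of the governance battle.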
Governance Layer:
Version control for all models and policies
Reproducible decisions (same input → same output)
Regular fairness and bias audits
Clear escalation paths for edge cases
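Reproducible decisions pair naturally with audit logging: a deterministic policy plus a canonical hash of input, output, and model version yields the same record for the same input every time. A sketch with a hypothetical scoring rule:

```python
import hashlib
import json

MODEL_VERSION = "credit-policy-1.4.2"   # hypothetical versioned policy

def decide(features: dict) -> dict:
    """Deterministic policy: same input always yields the same output."""
    score = 0.4 * features["income_band"] + 0.6 * features["repayment_band"]
    decision = "approve" if score >= 3.0 else "refer_to_human"
    # Canonical serialization (sorted keys) makes the audit hash reproducible
    payload = json.dumps(
        {"in": features, "out": decision, "model": MODEL_VERSION},
        sort_keys=True,
    )
    return {"decision": decision,
            "audit_hash": hashlib.sha256(payload.encode()).hexdigest()}

a = decide({"income_band": 4, "repayment_band": 3})
b = decide({"income_band": 4, "repayment_band": 3})
# identical input, identical record: the property an auditor can verify
```

Writing the hash to an append-only log lets anyone later replay the exact input against the exact model version and confirm the recorded outcome.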
6. Where This Actually Works (With Caveats)
Dynamic Pricing (Retail/E-commerce)
What works: Multi-armed bandits for price testing with clear conversion metrics
What doesn't: Fully autonomous pricing in complex B2B negotiations
Timeline: 6-9 months to meaningful results
Predictive Maintenance (Manufacturing)
What works: Anomaly detection and failure prediction from sensor data
What doesn't: Fully autonomous maintenance scheduling without human oversight
Timeline: 12-18 months including sensor deployment and data collection
Credit Decisioning (Financial Services)
What works: Improved risk scoring with explainable features
What doesn't: Adaptive models that change approval criteria without regulatory review
Timeline: 18-24 months including regulatory approval
7. What to Measure (And When to Stop)
Success Metrics:
Business impact: revenue lift, cost reduction, efficiency gains
Model performance: accuracy, precision, recall, calibration
Operational health: latency, uptime, error rates
Governance compliance: audit completeness, override rates, bias metrics
Warning Signs to Pause Development:
Degrading performance in production vs. testing
Increasing human override rates
Unexpected behavior that can't be explained
ROI below cost of capital for the complexity added
8. The Cost-Benefit Reality Check
Before pursuing animal capabilities, calculate:
Costs:
Engineering: 3-5 senior ML engineers for 12-24 months
Infrastructure: 2-10x compute costs for training and simulation
Governance: New processes, audits, and oversight requirements
Risk: Potential for costly mistakes during learning
Benefits:
Incremental improvement: Often 5-15% over well-tuned ghosts
Adaptation speed: Faster response to market changes (weeks vs. months)
Competitive advantage: Temporary, until competitors catch up
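A back-of-envelope check makes the point concrete. Using the article's own ranges (a 5-15% lift over a ghost, at roughly 10x the investment) with illustrative normalized figures:

```python
# Illustrative normalized figures; not benchmarks.
ghost_cost = 1.0                 # cost of a well-tuned ghost system
ghost_value = 5.0                # assumed annual value it generates

animal_cost = 10 * ghost_cost            # "10x the investment of static ones"
animal_value = ghost_value * 1.10        # midpoint of the 5-15% lift range

ghost_roi = (ghost_value - ghost_cost) / ghost_cost       # 4.0
animal_roi = (animal_value - animal_cost) / animal_cost   # negative
```

Under these assumptions the animal would need to more than double the ghost's value just to break even on its 10x cost, which is far outside the 5-15% lift actually observed.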
For most enterprises, optimizing ghost systems yields better ROI than building primitive animals.
9. Hard-Won Lessons
Perfect ghosts beat primitive animals: A well-tuned LLM with good tools outperforms a poorly trained RL system every time.
Simulators disappoint: They're useful for initial training but never capture real-world complexity. Budget for extensive real-world testing.
Humans stay in the loop longer than expected: Even successful animal systems need human oversight for edge cases and strategic decisions.
Governance is the bottleneck: Technical challenges are solvable; organizational and regulatory acceptance takes years.
Start narrow, stay narrow: Successful learning systems solve specific problems with clear feedback. General-purpose business agents remain science fiction.
10. The Bottom Line
The evolution from ghosts to animals in enterprise AI is not a 90-day sprint but a multi-year journey. Companies that succeed will:
Master ghost capabilities first: Build robust, explainable, tool-using LLM systems
Experiment carefully with learning: Focus on narrow domains with clear feedback
Accept the complexity costs: Understand that adaptive systems require 10x the investment of static ones
Maintain realistic expectations: Recognize that business environments resist the clean abstractions that make game-playing AI successful
The future of enterprise AI isn't choosing between ghosts and animals—it's accepting that we'll live in a ghost-dominated world for years to come, with small pockets of animal-like adaptation where the physics of the problem allows and the economics justify the complexity.
Companies that acknowledge this reality and plan accordingly will build practical, profitable AI systems. Those chasing the dream of fully autonomous business agents will likely waste resources on complexity that delivers marginal value.
The path forward is clear: Build exceptional ghosts, add tools thoughtfully, and reserve animal instincts for the few places where adaptation truly matters and can be safely achieved.