Tools for the era of experience

I recently read this beautiful chapter from the unpublished book “Designing an Intelligence” by David Silver and Rich Sutton. It’s titled “Welcome to the Era of Experience” and you can read it here. The basic gist is this:

What next? - Experiential learning

AI agents will learn by doing - continuously interacting with environments and learning from the results (similar to how AlphaProof generated millions of mathematical proofs to exceed human performance). They postulate that this new era of training AI agents will unfold along the following four phases:

| Phase | Signals | Predictions |
| --- | --- | --- |
| Synthetic-experience ramp-up | Labs release more computer-use, Operator, and Mariner-style agents that click, code, and call APIs continuously. | Compute spend shifts from one-shot inference to always-on interaction. GPU cost per trajectory becomes a new KPI. |
| Reward diversification | Vertical startups define composite metrics (e.g. reduced QA defects per deployment, lab assay yield per $, etc.). | Owning measurement pipelines (sensors, logs, BI dashboards) becomes more important than prompt engineering. |
| World-model renaissance | Papers on scalable planning in partially observed, non-stationary environments gain traction at conferences. | Toolchains for causal modelling, counterfactual simulation, and long-horizon credit assignment become the TensorFlow/PyTorch of the era. |
| Safety loop tension | Agents that can act for weeks raise new misalignment and interpretability worries, but also create richer feedback to correct them. | Expect regulation that requires continuous auditing hooks, agent “flight recorders”, and kill-switch APIs. |

The bitter lesson from decades of AI: over time, the winning models are those that scale best with compute and data, not the ones with the most clever human-designed heuristics. That’s the core of Tesla’s bet: no LiDAR, no human-engineered rules. Just YOLOing on vision + a learning model + fleet data. This also explains why OpenAI moved from fine-tuning with small curated datasets to huge amounts of online data + reinforcement learning on top. Same for Anthropic and other leading AI labs.

But before brute-force models got good enough, everyone needed high-quality, structured data: labeling, RLHF, SFT, safety tuning, evaluation, etc. This was the middle layer of the stack: operations-heavy and labor-dependent. If the vision is “we’re going to skip the human bottleneck and go fully self-supervised or reinforcement-style learning in a simulated or real environment, with scalable compute”, then data labeling companies look like bad long-term bets. (That is not to say they are bad short-term bets - the company I work for is/was an investor in Scale AI.) The gist of the point is that infinite labeling doesn’t scale past a certain model-quality threshold.

Part of my job description here at work is to figure out new paradigms in AI and where value creation will happen in the future. If OpenAI, Anthropic, and xAI can brute-force intelligence using their compute, what can any startup possibly do that they can’t just replicate better? What can we regular people build that Sam Altman and Sundar Pichai cannot? This blog post tries to answer that question.

What to build for this era then?

The Silver & Sutton paper paints a picture where compute + environment beats compute + labels. If that’s true, then the game isn’t about who has the most GPUs (although that too matters a lot). It’s about who controls the richest streams of experiential data.

This is how I am thinking about it: While OpenAI and Google can throw infinite compute at general intelligence, they can’t be everywhere at once. They can’t own every workflow, every sensor, every domain-specific feedback loop. That’s where the opportunity lies.

1. Own the Environment

Remember how Tesla’s real USP isn’t their AI team or their Dojo supercomputer but their fleet? Every Tesla on the road is a data collector. The winners in the era of experience won’t be building better foundation models - they’ll be building proprietary environments where agents can learn things nobody else can access.

Take construction sites. Imagine a startup that deploys cheap sensors across construction projects tracking worker movements, equipment usage, material flow, safety incidents, etc. An agent learning in this environment could discover patterns humans miss. Maybe it notices that accidents spike when certain equipment configurations appear together, or that productivity drops in specific weather + task combinations. The construction companies can’t build this themselves, and OpenAI can’t access this data without the physical deployment.

Or consider hospital emergency rooms. A startup could build the connective tissue between EMR systems, vital sign monitors, staff location trackers, and patient flow systems. The agent not only reads static records but also experiences the ER in real time, learning which triage decisions lead to better outcomes months later. It might discover that routing certain patients to specific doctors based on subtle vital sign patterns reduces readmission rates by 20%. The big labs can train GPT-6, but they can’t access your hospital’s real-time sensor feed.

For non-hardware domains, this could be an area where legacy tech companies have an edge. They already have a great wealth of simulation data from their users, and they are best positioned to train these agents. Here is an excerpt from the Pleias.fr blog on training LLM agents:

What else to build:

The key: You need proprietary access to environments where cause-and-effect cycles play out. No amount of compute at OpenAI can simulate what happens when you change fertilizer mix #47 in greenhouse #12. Start thinking about these environments for the vertical of your choice and build the RL layer on top.
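To make the “RL layer on top” concrete, here is a minimal sketch of what a proprietary environment could look like in code. Everything in it is hypothetical - the greenhouse setting, the state variables, the yield model - but it follows the standard Gym-style `reset()`/`step()` convention, so any off-the-shelf RL library could be pointed at it:

```python
import random

class GreenhouseEnv:
    """Toy stand-in for a proprietary environment (all details invented
    for illustration). The point: the cause-and-effect dynamics live in
    the environment, not in any lab's training corpus."""

    def __init__(self, n_mixes=50, seed=0):
        self.n_mixes = n_mixes
        self.rng = random.Random(seed)
        # Hidden "true" effectiveness of each fertilizer mix -- the thing
        # an agent can only discover through repeated interaction.
        self._mix_quality = [self.rng.random() for _ in range(n_mixes)]
        self.moisture = 0.5

    def reset(self):
        self.moisture = self.rng.uniform(0.3, 0.7)
        return (self.moisture,)

    def step(self, mix_id):
        # Reward is a crop-yield proxy that depends on the chosen mix
        # *and* current conditions, so the agent must learn the interaction.
        yield_proxy = self._mix_quality[mix_id] * self.moisture
        self.moisture = min(1.0, self.moisture + self.rng.uniform(-0.05, 0.05))
        obs, reward, done = (self.moisture,), yield_proxy, False
        return obs, reward, done
```

The interface is generic; the moat is the physical deployment and the hidden dynamics that only your sensors observe.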

2. Engineer Better Rewards (the alignment arbitrage)

Here’s where it gets really interesting. In the current paradigm, everyone’s obsessed with prompt engineering. In the Era of Experience, the money will be in reward engineering.

Let me explain with a concrete example. Say you’re building an AI sales assistant. OpenAI’s version might optimize for user engagement or satisfaction scores. But what if your reward function is composite - 30% close rate, 20% customer lifetime value, 20% time-to-close, 30% post-sale NPS score measured 6 months later? Suddenly your agent is learning completely different behaviors. It might discover that slowing down the sales process for enterprise clients actually increases LTV, or that certain objection-handling patterns correlate with lower churn a year later.

The beauty is that these reward functions become proprietary knowledge. You’re essentially programming business strategy directly into the agent’s learning process. A competitor can copy your UI, but they can’t copy the three years of refined reward engineering that makes your agent act like a senior enterprise sales rep instead of a chatbot.
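As a sketch of what reward engineering could look like in practice, here is the composite sales-assistant reward from the example above expressed as code. The metric names and the assumption that each metric is pre-normalized to [0, 1] are mine, purely for illustration:

```python
def composite_reward(metrics, weights=None):
    """Blend business KPIs into a single scalar reward (illustrative sketch).

    Metric names are hypothetical; each metric is assumed to be
    pre-normalized to [0, 1] before it reaches this function.
    """
    weights = weights or {
        "close_rate": 0.30,
        "customer_ltv": 0.20,
        "time_to_close": 0.20,  # already inverted: faster => closer to 1
        "nps_6mo": 0.30,        # delayed signal, logged months after the sale
    }
    return sum(w * metrics[k] for k, w in weights.items())
```

The weights themselves are the proprietary part: three years of tuning them against real outcomes is what a competitor can’t copy from your UI.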

More examples:

We’re already seeing early versions of this. Harvey (the legal AI) is building environments where the agent can practice contract negotiations, with rewards based on deal outcomes months later. The agent learns strategies that no human lawyer explicitly programmed. (While they claim this, there is a lot of chatter that it’s just ChatGPT with a legal system prompt.)

Chemistry VC also wrote a post on this about RLaaS companies which you can find here.

The real-world impact of this approach is exemplified by a case from Veris AI: an agent trained with RL to automate the complex, hours-long process of supplier negotiations. By training on realistic simulations of Slack and email conversations—complete with sensitive data—the agent learns optimal tone, questions to ask, and search strategies, dramatically outperforming prompt chaining or one-shot LLM attempts.

3. Distribution moats - Own the interface

This one’s subtle but crucial. The big labs will build general intelligence, but someone needs to own the last mile - the surfaces where agents actually interact with humans and systems.

Think Figma for the Era of Experience. Figma didn’t invent vector graphics or collaboration. But they owned the interface where designers actually work. Now imagine “Figma for AI Agents” - dashboards where product managers can spin up agents, define reward functions with visual tools, monitor long-running agent workflows, and intervene when needed.

Or take enterprise software. Salesforce won’t be disrupted by a smarter CRM. But they might be disrupted by someone who builds the connective tissue that lets agents actually operate across CRM, email, calendar, Slack, and analytics tools. The moat is owning the interface where humans supervise and direct these long-running agent processes.

What to build:

I’m seeing early versions of this in companies building “agent orchestration platforms.” They’re building the Kubernetes for agents - the infrastructure layer that lets you deploy, monitor, and manage hundreds of specialized agents working on month-long projects. Julep.ai is a good example of this, but they still haven’t figured out the monitoring and dashboard layer.

4. Synthetic Worlds (the AlphaGo for X)

I personally find this super interesting. The next generation of training data will come from synthetic environments designed specifically to train agents.

Imagine you’re trying to train an agent for supply chain optimization. Instead of waiting for real supply chain disruptions (expensive and rare), you build a high-fidelity simulation. Your simulation includes weather patterns, geopolitical events, port congestions, manufacturing delays - all interacting in complex ways. The agent can experience a thousand supply chain crises before breakfast, learning strategies that no human has ever tried. Lyric.tech, IIRC, is actually building this kind of causal model.
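To show the shape of such a “data factory,” here is a deliberately tiny synthetic-world sketch. All the dynamics are invented for illustration - real simulators would model lanes, weather, and geopolitics causally - but the structure is the same: stochastic disruptions, a policy that acts, and a reward signal, run thousands of times faster than reality:

```python
import random

def simulate_supply_chain(policy, n_days=30, seed=0):
    """Minimal synthetic-world sketch (all dynamics invented for
    illustration). Each simulated day, disruption hits three shipping
    lanes to varying degrees; the policy picks a lane and earns the
    throughput that survives the congestion."""
    rng = random.Random(seed)
    total_throughput = 0.0
    for _day in range(n_days):
        # Stochastic disruptions: congestion level per lane, 0.0 = clear.
        congestion = [rng.random() for _ in range(3)]
        lane = policy(congestion)
        total_throughput += 1.0 - congestion[lane]
    return total_throughput

def best_lane(congestion):
    """Naive baseline policy: always pick the least-congested lane."""
    return congestion.index(min(congestion))
```

An RL agent trained inside this loop would be evaluated against baselines like `best_lane`; because the simulator is seeded, every crisis is replayable for debugging and credit assignment.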

Or financial markets. While the big labs are training on historical data, you’re building synthetic markets where agents can practice trading strategies with realistic but accelerated market dynamics. Your agent experiences 50 years of market cycles in a week, discovering patterns in the interaction between news events, sentiment, and price movement that would take human traders decades to learn.

More synthetic world opportunities:

The key insight: OpenAI won’t build a supply chain simulator or a synthetic derivatives market. Too niche. But for a focused startup, that simulator becomes a data factory, churning out experience that’s worth more than all the text on the internet for that specific domain. Here is one more post from ChemistryVC on the topic of synthetic worlds

What’s remarkable is that these agents are beginning to converge on human behavior, in some cases more accurately than traditional surveys when compared to real-world outcomes. Corporations and governments routinely spend hundreds of thousands of dollars and wait months to gather data from human panels. Synthetic research offers a faster, cheaper alternative—with the potential for a continuous stream of insight rather than discrete snapshots in time. Imagine being able to assess how much people would pay for your product, anticipate the impact of a new tax, or even predict election outcomes—in real time. This could fundamentally reshape how decisions are made across industries.

5. The sketchy frontier

This last one’s controversial. The big labs are constrained by their visibility, regulation, and cultural commitments to safety. There’s a whole frontier of capabilities they won’t touch for years.

I’m not talking about building weapons or breaking laws. I’m talking about agents that operate in the grey zones of human interaction:

A startup building “uncensored therapy agents” for specific populations might discover interaction patterns that actually help people more than our current “always honest, always safe” approach. Maybe the agent learns that for certain personality types recovering from addiction, a small tactical lie about recovery statistics prevents relapse better than brutal honesty.

The Builder’s Playbook

Based on all this, here’s a guide:

1. Secure the telemetry monopoly now Find a niche where you can instrument everything. Every action, every outcome, every millisecond. This data exhaust becomes your moat. A friend’s startup put sensors in commercial kitchens - they now know more about restaurant operations than anyone. While they are still figuring out what to do with this data, I am pretty sure this will become their moat in the long run.

2. Build reward engineering tools The next TensorFlow but for reward function composition. Imagine Mixpanel + reinforcement learning. Let domain experts define complex objectives without coding. Think Airflow-grade orchestration for computing and updating rewards on live data streams.

3. Go full-stack on agents Don’t build thin wrappers around GPT-5. Build the entire loop: perception → action → outcome measurement → model update. Own the whole cycle. Back end-to-end agent stacks, not wrapper apps. The value migrates from prompt veneers to mechanisms that store memories, run planners, and retrain online.

4. Instrument for safety from day one When your agent runs for months unsupervised, you need audit trails, rollback mechanisms, and kill switches. The first startup to make “safe autonomous agents” easy will win enterprise contracts. Companies that operationalize real-time red-teaming and rollback will win enterprise trust fastest.
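As a sketch of what an agent “flight recorder” with a kill switch could look like, here is a minimal version. The class name and API are hypothetical; a production system would write to durable, tamper-evident storage rather than an in-memory list:

```python
import json
import time

class FlightRecorder:
    """Append-only audit trail for a long-running agent (illustrative
    sketch). Every action is logged with a timestamp, and a kill switch
    halts the agent -- the kind of hooks regulation may soon require."""

    def __init__(self):
        self.log = []        # production: durable, tamper-evident storage
        self.killed = False

    def record(self, action, outcome):
        """Log one agent action; refuse if the kill switch was thrown."""
        if self.killed:
            raise RuntimeError("agent halted by kill switch")
        self.log.append({"ts": time.time(), "action": action, "outcome": outcome})

    def kill(self):
        """Flip the kill switch: no further actions are recorded or allowed."""
        self.killed = True

    def export(self):
        """Serialize the trail for auditors or replay tooling."""
        return json.dumps(self.log)
```

The design choice that matters: the recorder sits between the agent and the world, so halting it halts the agent, and the export is the artifact you hand to an enterprise security review.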

The Bottom Line

The Era of Experience IMO is a redistribution of power from compute-rich to data-rich players. OpenAI has the GPUs, but you can own the environments. They have the models, but you can engineer the rewards. They have the general intelligence, but you can own the specific workflows.

If I had to place bets (and I do, it’s literally my job now and before I was staking horses for poker), here’s where I’d put chips:

The next step is clear: Find a domain where experience matters. Instrument the hell out of it. Define rewards that align with real-world outcomes. Deploy agents that learn continuously. Build interfaces that lock in users.

And remember - by the time OpenAI decides to compete in your specific niche, you’ll have months or years of proprietary experience data. In the Era of Experience, that’s the only moat that matters. Tesla won by having millions of cars generating edge cases. That’s the inspiration. The future isn’t about bigger models eating more data. It’s about smarter agents learning from richer experience. They’ll build the intelligence. We’ll build the world it lives in.

Now go build something Sam Altman can’t.
