AI Agents Automate Research, Sparking a New Era of Discovery
A new open-source project from Andrej Karpathy, autoresearch, uses AI agents to automate the experimental loop of research, with potential applications from machine learning to marketing. Released in March 2026 as a 630-line Python script, it lets an AI agent modify code, run experiments, and iterate on the results without human intervention.
The Karpathy Loop: Autonomous Experimentation
The core concept behind autoresearch is an autonomous optimization loop. An AI agent is given a training script and a limited compute budget, typically five minutes on a GPU. The agent reads the script, proposes a change (such as adjusting the learning rate or the model architecture), implements it, runs the experiment, and evaluates the result. If the validation loss, measured in bits per byte (val_bpb), improves, the change is kept; otherwise it is discarded. The process then repeats.
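The keep-if-better loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the autoresearch code: the real agent edits a training script, whereas this sketch mutates a small config dictionary, and `run_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names. The toy evaluator stands in for a five-minute training run that would report val_bpb.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def run_loop(
    baseline: Config,
    propose: Callable[[Config, random.Random], Config],
    evaluate: Callable[[Config], float],
    n_experiments: int,
    seed: int = 0,
) -> Tuple[Config, float, List[Tuple[Config, float, bool]]]:
    """Greedy loop: keep a proposed change only if it lowers the loss."""
    rng = random.Random(seed)
    best_config = dict(baseline)
    best_loss = evaluate(best_config)
    history: List[Tuple[Config, float, bool]] = []
    for _ in range(n_experiments):
        candidate = propose(best_config, rng)  # propose one change
        loss = evaluate(candidate)             # run the experiment
        kept = loss < best_loss                # improved? keep it
        if kept:
            best_config, best_loss = candidate, loss
        history.append((candidate, loss, kept))
    return best_config, best_loss, history


def toy_propose(config: Config, rng: random.Random) -> Config:
    """Hypothetical proposal step: scale the learning rate up or down."""
    candidate = dict(config)
    candidate["learning_rate"] *= rng.choice([0.5, 0.8, 1.25, 2.0])
    return candidate


def toy_evaluate(config: Config) -> float:
    """Hypothetical stand-in for a training run: a bowl-shaped loss in
    log10(learning_rate) whose minimum location is an arbitrary assumption."""
    return 1.0 + (math.log10(config["learning_rate"]) + 2.5) ** 2


baseline = {"learning_rate": 1e-4}
best_config, best_loss, history = run_loop(
    baseline, toy_propose, toy_evaluate, n_experiments=50
)
print(f"kept {sum(kept for _, _, kept in history)} of {len(history)} changes")
print(f"loss {toy_evaluate(baseline):.4f} -> {best_loss:.4f}")
```

Because a change is accepted only when the measured loss drops, the best loss never gets worse from one iteration to the next. The quality of the search then depends on how good the proposals are and how trustworthy the evaluation is.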
In initial tests, Karpathy’s agent completed 126 experiments in a single overnight run, reducing loss from 0.9979 to 0.9697. In a longer two-day run on a “depth=12” model, the agent made approximately 700 autonomous changes, of which around 20 were improvements that transferred to larger models. Together, these improvements yielded an efficiency gain of about 11%, cutting the “Time to GPT-2” metric from 2.02 hours to 1.80 hours.
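As a quick check on the figures above, the relative changes can be computed directly from the reported numbers. The helper name `relative_reduction` is illustrative only.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, relative to its starting value."""
    return (before - after) / before


loss_drop = relative_reduction(0.9979, 0.9697)  # overnight run
time_drop = relative_reduction(2.02, 1.80)      # "Time to GPT-2", in hours

print(f"loss reduction: {loss_drop:.1%}")
print(f"time reduction: {time_drop:.1%}")
```

The time reduction works out to roughly 10.9%, which is consistent with the rounded 11% figure. The overnight loss reduction is a little under 3% in relative terms.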
From Single Agent to Swarm Intelligence
The impact of autoresearch quickly extended beyond Karpathy’s initial implementation. Varun Mathur, CEO of Hyperspace AI, distributed the single-agent loop across a peer-to-peer network, creating a swarm of autonomous researchers. On the night of March 8–9, 35 agents on the Hyperspace network conducted 333 experiments unsupervised.
This distributed approach revealed several key insights:
- Hardware Diversity as a Feature: Agents running on H100 GPUs used brute-force search to find optimal learning rates, while CPU-only agents on laptops, constrained by limited compute, focused on initialization strategies and normalization choices.
- Gossip-Based Discovery: Agents shared successful improvements in real time using the GossipSub protocol. For example, when one agent discovered that Kaiming initialization reduced loss by 21%, the finding spread rapidly through the network.
- Compression of History: The agents independently rediscovered established machine learning techniques, such as RMSNorm and tied embeddings, that took human researchers years to formalize.
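To make the gossip idea concrete, here is a toy simulation of how one agent's improvement might spread peer to peer. It does not use GossipSub or any libp2p API. The ring topology, the function name `gossip_rounds`, and the loss values (1.0 for everyone, 0.79 for the discovering agent, mirroring a 21% reduction) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def gossip_rounds(
    neighbors: Dict[int, List[int]],
    best_loss: Dict[int, float],
    max_rounds: int = 100,
) -> Tuple[int, Dict[int, float]]:
    """Each round, every peer tells its neighbors the best loss it knows.
    Returns the number of rounds in which at least one peer learned
    something new, plus the final state."""
    current = dict(best_loss)
    for round_no in range(1, max_rounds + 1):
        updated = dict(current)
        for peer, peers in neighbors.items():
            for other in peers:
                # A neighbor adopts the result only if it beats what it has.
                if current[peer] < updated[other]:
                    updated[other] = current[peer]
        if updated == current:  # nothing new was learned this round
            return round_no - 1, current
        current = updated
    return max_rounds, current


# Hypothetical network: 35 agents in a ring, each linked to two neighbors.
n_agents = 35
ring = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

# Agent 0 has found an improvement; everyone else is at the baseline.
losses = {i: 1.0 for i in range(n_agents)}
losses[0] = 0.79

rounds, final = gossip_rounds(ring, losses)
print(f"all agents converged after {rounds} rounds")
```

A ring is close to the slowest connected topology for spreading information. A real gossip mesh gives each peer more links, so a finding would be expected to reach every agent in fewer hops.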
Marketing Applications and the Experiment Loop
The potential of autoresearch extends beyond machine learning. Eric Siu, founder of Single Grain, applied the concept to marketing. He suggests that future marketing teams will run 36,500+ experiments per year, automating the testing of landing pages, ad creatives, and email subject lines to identify what resonates most with a specific audience.
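For a sense of scale, the annual figure can be broken down into a daily and hourly rate. This is only arithmetic on the number quoted above. The even spread across a 365-day year is an assumption.

```python
DAYS_PER_YEAR = 365
experiments_per_year = 36_500  # lower bound quoted by Siu

per_day = experiments_per_year / DAYS_PER_YEAR
per_hour = per_day / 24

print(f"{per_day:.0f} experiments per day, about {per_hour:.1f} per hour")
```

At that pace, a team would have to launch and evaluate a new test several times an hour around the clock, which is why the proposal depends on automation.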
This approach, according to Siu, creates a “proprietary map” of audience preferences, building a competitive advantage based on accumulated experiment data rather than solely on marketing expertise. He argues that winning companies will be defined by their speed of experimentation.
Community Discussion and Future Implications
The release of autoresearch has sparked discussion within the AI community. One concern is that heavy optimization could “spoil” the validation set: after hundreds of experiments are selected on the same data, a model may become tailored to the quirks of that data rather than improving in ways that generalize. Karpathy, however, maintains that the observed gains are real and represent meaningful performance improvements.
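The concern about spoiling the validation set can be illustrated with a small simulation. In this sketch every candidate is equally good by construction, and the only difference between them is measurement noise. The noise level, the true loss, and the function name `selection_gap` are assumptions chosen for illustration, and the result says nothing about the actual autoresearch runs.

```python
import random
import statistics


def selection_gap(
    n_candidates: int,
    noise_std: float,
    rng: random.Random,
    true_loss: float = 0.97,
) -> float:
    """Pick the candidate with the lowest noisy validation loss, then
    re-measure it on fresh data. Returns holdout loss minus validation loss."""
    val_losses = [true_loss + rng.gauss(0, noise_std) for _ in range(n_candidates)]
    best_val = min(val_losses)
    # The winner's true quality is unchanged, so a fresh measurement is
    # just the true loss plus independent noise.
    holdout = true_loss + rng.gauss(0, noise_std)
    return holdout - best_val


rng = random.Random(0)
gaps = [selection_gap(200, 0.01, rng) for _ in range(100)]
mean_gap = statistics.mean(gaps)
print(f"mean holdout-minus-validation gap: {mean_gap:.4f}")
```

A positive average gap means the winning validation score looks better than the candidate really is, purely because it was the minimum of many noisy measurements. Re-evaluating retained changes on data the loop never selected against is one common way to check whether a gain is real.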
The project highlights a shift in the role of humans in research, moving from experimenters to experimental designers. As tools like DarkMatter, Optimization Arena, and NanoClaw emerge, the primary bottleneck to AI progress may become our ability to define the constraints and objectives of the search process.
The release of autoresearch marks a significant step towards a future where AI agents autonomously drive discovery across a wide range of disciplines.