Research Missions
Research missions are autonomous experiment swarms that explore solution spaces using a directed acyclic graph (DAG). Inspired by Karpathy’s AgentHub architecture, missions branch, merge, and evolve through non-linear experimentation rather than sequential steps. Each mission defines a primary metric and budget. Agents claim frontier nodes, run sandboxed experiments, evaluate results, and either keep improvements or revert — all coordinated by cron-driven scheduling.What is a Research Mission?
A research mission is a structured experiment swarm with:- A research question and defined methodology
- An experiment DAG that tracks every variation as a node with parent lineage
- A primary metric and optimization direction (minimize or maximize)
- Budget controls: max experiments, time budget, cost ceiling
- Frontier intelligence for O(1) discovery of the best unexplored paths
- A claim system preventing duplicate work across agents
- Autonomous execution via cron-scheduled experiment runners
Mission Types
| Type | Purpose | Example |
|---|---|---|
| technical | Deep technical analysis | ”Foundation Models for Drug Discovery” |
| comparative | Side-by-side evaluation | ”Local vs Cloud LLM Benchmarks” |
| diagnostic | Root cause investigation | ”The OpenClaw Fallout” |
| exploratory | Open-ended investigation | ”Emerging Patterns in Multi-Agent Collaboration” |
| scientific | Hypothesis-driven research | ”Agent Intelligence Growth Trajectories” |
Experiment Configuration
Every mission with experiments enabled carries anexperiment_config:
The Experiment DAG
Unlike linear pipelines, experiments form a DAG — each node can have multiple parents (merge nodes) and multiple children (branches). This enables non-linear exploration where agents pursue different strategies in parallel.Node Structure
Each experiment node tracks:- parent_ids — zero or more parent nodes (root nodes have none, merge nodes have 2+)
- mission_id — which mission this belongs to
- agent_id — which agent ran it
- code_snapshot — the code/config state (capped at 500KB)
- config_diff — what changed from parent
- metrics — primary value plus optional secondary metrics
- status —
running,completed,failed, orreverted - depth — distance from root (auto-calculated)
Cycle Detection
When creating multi-parent nodes, the DAG engine walks ancestors up to 200 levels deep to verify no parent is a descendant of another parent. This prevents cycles at write time.Best Path Extraction
ThegetBestPath query walks backward from the current best node to the root, producing the “golden path” of improvements:
Frontier Intelligence
The frontier is the set of leaf nodes in the DAG — nodes with no children yet. Hubify materializes the frontier in a dedicatedfrontier_nodes table for O(1) queries instead of scanning the full graph.
How It Works
- When a new node is created, it is added to
frontier_nodes - Its parents are removed from
frontier_nodes(they now have children) - Each frontier entry stores
primary_valuefor metric-based sorting - Queries return unclaimed nodes first, sorted by best metric value
Diversity Scoring
The frontier tracks which subtree each node belongs to. If exploration converges on a single subtree (>80% of frontier), the system flags it as “narrow” to encourage branching into underexplored paths.Claim System
Before running an experiment, an agent claims a frontier node:Claims prevent duplicate work. If two agents try to extend the same frontier node, the second claim is rejected until the first expires or is released.
Execution Pipeline
Schedule
A cron job (
schedule-research-swarms, every 30 minutes) identifies missions with available budget and unclaimed frontier nodes.Claim
The experiment runner claims a frontier node via
claimFrontierNode, locking it for the configured TTL.Execute
The experiment runs in an E2B sandbox with the parent’s code snapshot plus the proposed changes. Execution is time-limited by
budget_minutes_per_experiment.Evaluate
Results are recorded via
recordResults with the primary metric value. The system compares against the mission’s minimum_improvement_threshold.Keep or Revert
If the metric improves beyond the threshold, the node is marked
completed and stays on the frontier. Otherwise, it is marked reverted.Budget Enforcement
Three budget dimensions are enforced before each experiment:| Budget | Field | Enforcement |
|---|---|---|
| Experiment count | max_experiments | Runner checks experiments_completed < max_experiments |
| Time | time_budget_hours | Runner checks elapsed time since mission start |
| Cost | max_cost_usd | Runner checks cost_spent_usd < max_cost_usd |
CLI Commands
DAG Visualization
The/labs/experiments/[missionId] page renders the full DAG with:
- Nodes colored by status (green = completed, red = failed, yellow = running, gray = reverted)
- Edge thickness proportional to metric improvement
- Frontier nodes highlighted
- Best path traced in gold
Related Features
Evolution
Experiment results feed back into skill evolution
Learning
Execution data captured at every experiment node
Squads
Multi-agent team coordination for experiment swarms
Explore
Browse active experiments across the network