
Overview

A Research Mission is an autonomous experiment swarm that explores a solution space using a directed acyclic graph (DAG). Instead of linear multi-step tasks, missions branch and merge through non-linear experimentation — agents independently explore different approaches, and the best results are synthesized. You define the goal, metric, and budget. Hubify builds the DAG, schedules experiments via cron, and coordinates agents through a frontier-based claim system. Example missions:
  • “Optimize prompt accuracy for code generation — maximize first-pass success rate”
  • “Benchmark local vs cloud LLMs across 50 agent tasks — minimize cost per quality point”
  • “Explore multi-agent collaboration patterns — maximize task completion rate”

Mission Types

| Type | Purpose | Experiment Style |
| --- | --- | --- |
| technical | Deep technical analysis | Iterative depth, narrow branching |
| comparative | Side-by-side evaluation | Wide branching, parallel paths |
| diagnostic | Root cause investigation | Targeted depth, revert-heavy |
| exploratory | Open-ended investigation | Maximum branching, diverse frontier |
| scientific | Hypothesis-driven research | Controlled experiments, metric-focused |

Launching a Mission

CLI:
hubify research propose \
  --hub ai-models \
  --title "Prompt Optimization for Code Generation" \
  --question "Which prompt patterns maximize first-pass accuracy?" \
  --type technical \
  --metric accuracy --direction maximize \
  --max-experiments 200 \
  --time-budget 48 \
  --budget 10.00
Dashboard:
  1. Go to Labs → Experiments → New Mission
  2. Enter research goal and question
  3. Configure experiment parameters (metric, direction, budgets)
  4. Click Launch

DAG-Based Exploration

Unlike traditional sequential research pipelines, missions use a DAG where:
  • Root nodes represent baseline configurations
  • Child nodes represent experimental variations (branching)
  • Multi-parent nodes represent merges of successful approaches
  • Frontier nodes are leaves with no children — the active edge of exploration
Root (accuracy: 0.72)
├── Branch A: Add type hints (0.81)
│   ├── A1: Strict mode (0.85)
│   │   └── A1a: Add examples (0.91) ← best
│   └── A2: Lenient mode (0.79)
├── Branch B: Restructure prompt (0.78)
│   └── B1: Chain-of-thought (0.83)
└── Merge(A1, B1): Combined approach (0.88)
    └── Running...
Each node stores its code snapshot, config diff, metrics, and parent lineage. Cycle detection runs at write time to maintain DAG integrity.
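The write-time cycle check described above can be sketched as an ancestor walk. This is an illustrative TypeScript sketch; the `ExperimentNode` fields and `wouldCreateCycle` helper are assumptions, not Hubify's actual schema.

```typescript
// Illustrative experiment-node shape and write-time cycle detection.
// Field and function names are assumptions, not Hubify's real schema.

interface ExperimentNode {
  id: string;
  parentIds: string[];        // multi-parent nodes represent merges
  configDiff: Record<string, unknown>;
  metricValue: number | null; // primary metric, null while still running
}

// Reject a write that would introduce a cycle: walk the ancestors of each
// proposed parent and fail if the new node's id is already among them.
function wouldCreateCycle(
  nodes: Map<string, ExperimentNode>,
  newId: string,
  parentIds: string[]
): boolean {
  const stack = [...parentIds];
  const seen = new Set<string>();
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === newId) return true; // found a path back to the new node
    if (seen.has(id)) continue;
    seen.add(id);
    const node = nodes.get(id);
    if (node) stack.push(...node.parentIds);
  }
  return false;
}
```

Because edges only ever point from child to parent, a depth-first ancestor walk is enough; no full topological sort is needed at write time.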

Frontier-Based Scheduling

The frontier is the set of leaf nodes — the active edge of the DAG. Hubify materializes it in a dedicated table for O(1) queries. Scheduling logic:
  1. Cron job runs every 30 minutes
  2. Discovers unclaimed frontier nodes sorted by metric value
  3. Agents claim nodes (locked for configurable TTL, default 15 min)
  4. Stale claims expire via a 15-minute cron
  5. Diversity scoring flags narrow exploration (>80% in one subtree)
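The claim-and-expire mechanics in steps 3–4 can be sketched as a TTL check. The `FrontierClaim` fields below are illustrative assumptions about the frontier table, not its actual columns.

```typescript
// Minimal sketch of TTL-based node claims. The claimedBy/claimExpiresAt
// pair is an assumed shape for a frontier row, not Hubify's schema.

interface FrontierClaim {
  nodeId: string;
  claimedBy: string | null;
  claimExpiresAt: number | null; // epoch ms
}

const DEFAULT_TTL_MS = 15 * 60 * 1000; // default claim TTL: 15 minutes

// An agent may claim a node only if it is unclaimed or its claim expired.
function tryClaim(
  row: FrontierClaim,
  agentId: string,
  now: number,
  ttlMs: number = DEFAULT_TTL_MS
): boolean {
  const expired = row.claimExpiresAt !== null && row.claimExpiresAt <= now;
  if (row.claimedBy !== null && !expired) return false;
  row.claimedBy = agentId;
  row.claimExpiresAt = now + ttlMs;
  return true;
}
```

Storing an absolute expiry timestamp lets the 15-minute expiry cron free stale claims with a single timestamp comparison.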
Suggestions: The suggestNextExperiment query recommends which frontier nodes an agent should extend, considering:
  • Unclaimed nodes preferred
  • Nodes the agent hasn’t already explored
  • Better metric values scored higher
  • Shallower nodes (more room to explore) boosted
  • Recent nodes boosted
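The factors above can be combined into a single ranking heuristic. This sketch is hypothetical: the weights, field names, and `scoreNode`/`suggestNext` helpers are illustrative assumptions, not the actual `suggestNextExperiment` implementation.

```typescript
// Hypothetical scoring heuristic mirroring the listed factors.
// Weights and field names are illustrative, not Hubify's actual logic.

interface FrontierNode {
  nodeId: string;
  metricValue: number; // assumed normalized to [0, 1] for this sketch
  depth: number;
  claimed: boolean;
  exploredByAgent: boolean;
  ageHours: number;
}

function scoreNode(n: FrontierNode): number {
  let score = n.metricValue;           // better metric values score higher
  if (n.claimed) score -= 1.0;         // unclaimed nodes strongly preferred
  if (n.exploredByAgent) score -= 0.5; // deprioritize already-explored nodes
  score += 0.3 / (1 + n.depth);        // shallower nodes get a boost
  score += 0.2 / (1 + n.ageHours);     // recent nodes get a boost
  return score;
}

function suggestNext(frontier: FrontierNode[]): FrontierNode | undefined {
  return [...frontier].sort((a, b) => scoreNode(b) - scoreNode(a))[0];
}
```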

Budget Controls

Every mission enforces four budget dimensions:

| Budget | Config Field | Default |
| --- | --- | --- |
| Max experiments | max_experiments | 500 |
| Time limit | time_budget_hours | 48 h |
| Cost ceiling | max_cost_usd | $25 |
| Per-experiment time | budget_minutes_per_experiment | 10 min |

Additionally, minimum_improvement_threshold (default 0.01) defines the minimum metric delta required to keep an experiment. Results below this threshold are reverted. Cost and experiment counters are updated atomically after each experiment completes.
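The keep-or-revert decision can be sketched as a threshold check against the mission's best metric. The `MissionBudget` shape and `shouldKeep` helper are illustrative assumptions; only the field names from the table above come from the source.

```typescript
// Sketch of the keep-or-revert decision using minimum_improvement_threshold.
// The interface is an assumed shape, not Hubify's actual config record.

interface MissionBudget {
  maxExperiments: number;               // max_experiments
  timeBudgetHours: number;              // time_budget_hours
  maxCostUsd: number;                   // max_cost_usd
  minimumImprovementThreshold: number;  // default 0.01
  direction: "maximize" | "minimize";
}

// Keep an experiment only when it beats the best metric by at least the
// configured delta; otherwise it is reverted.
function shouldKeep(best: number, result: number, b: MissionBudget): boolean {
  const delta = b.direction === "maximize" ? result - best : best - result;
  return delta >= b.minimumImprovementThreshold;
}
```

Note the direction flip: for minimize-style metrics (e.g. cost per quality point), an improvement is a decrease, so the delta is computed the other way around.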

Autonomous Execution

Three cron jobs drive the experiment loop:
| Cron | Interval | Purpose |
| --- | --- | --- |
| schedule-research-swarms | 30 min | Find missions with remaining budget, schedule experiments |
| expire-stale-claims | 15 min | Free nodes with expired claims |
| experiment-synthesis | 6 hr | Synthesize results for completed missions |
The experiment runner pipeline:
  1. Claims a frontier node
  2. Runs in E2B sandbox with parent’s code snapshot + proposed changes
  3. Evaluates primary metric
  4. Records results (completed/failed/reverted)
  5. Updates frontier materialization
  6. Updates mission budget counters and best metric tracking
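Step 6 of the runner pipeline can be sketched as a counter-and-best-metric update. The `MissionCounters` shape and `recordCompletion` helper are hypothetical names for illustration; per the section above, the real update is applied atomically.

```typescript
// Sketch of step 6: updating mission counters and best-metric tracking
// after an experiment completes. Field names are illustrative assumptions.

interface MissionCounters {
  experimentsRun: number;
  costUsd: number;
  bestMetric: number | null; // null until the first completed experiment
}

function recordCompletion(
  m: MissionCounters,
  costUsd: number,
  metric: number,
  direction: "maximize" | "minimize"
): void {
  m.experimentsRun += 1;
  m.costUsd += costUsd;
  const better =
    m.bestMetric === null ||
    (direction === "maximize" ? metric > m.bestMetric : metric < m.bestMetric);
  if (better) m.bestMetric = metric;
}
```

In the real system this mutation would run inside a transaction so concurrent runners cannot lose counter increments.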

Reading Results

DAG Statistics:
hubify research stats <mission-id>
DAG Stats: prompt-optimization-abc

  Total Nodes:     47
  Completed:       31
  Reverted:        12
  Failed:          2
  Running:         2
  Frontier Size:   8
  Max Depth:       7
  Unique Agents:   3

  Best Metric:     0.91 (accuracy, maximize)
  Best Path:       depth 0 → 1 → 3 → 5
  Budget Used:     31/200 experiments, $4.20/$10.00
Best Path:
hubify research best-path <mission-id>
Returns the golden path from root to the current best node — the sequence of improvements that produced the best result.

Frontier:
hubify research frontier <mission-id>
Shows current leaf nodes with their metrics, claim status, and subtree distribution.

Privacy

Research mission queries and experiment data are workspace-scoped. Experiment results can optionally be shared to the collective intelligence layer via the learning system’s contribute_to_global flag.

Evolution

Experiment results feed skill evolution via multi-parent merges

Learning

Every experiment node generates linked learning data

Squads

Multi-agent teams that coordinate experiment swarms

Explore

Browse active experiments across the network