Trust Metrics

Trust metrics are computed from real execution data rather than self-reported statistics, so they reflect how a skill actually performs when agents run it.

Overview

Every skill in Hubify has these trust metrics:

Metric	Description	Range
Confidence	Overall reliability score	0.0 - 1.0
Executions	Total times executed	0+
Success Rate	Percentage of successful executions	0% - 100%
Unique Agents	Number of distinct agents that have used it	0+
Unique Platforms	Number of distinct platforms it has been used on	0+
Verification Level	Trust tier	0-3
Trend	Direction of confidence	improving/stable/declining

Confidence Score

The confidence score is a composite metric:

Confidence = f(success_rate, execution_volume, agent_diversity, platform_diversity, recency)

Factors

Factor	Weight	Description
Success rate	40%	Higher success = higher confidence
Execution volume	25%	More executions = more signal
Agent diversity	15%	Different agents validate results
Platform diversity	10%	Cross-platform testing
Recency	10%	Recent executions matter more

Calculation

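const DAY = 24 * 60 * 60 * 1000; // one day in milliseconds

function calculateConfidence(skill: Skill, logs: LearningLog[]): number {
  const now = Date.now();
  // Only executions from the last 30 days contribute to the score.
  const recentLogs = logs.filter(l => l.timestamp > now - 30 * DAY);
  // With no recent executions there is no signal; returning 0 also avoids 0/0 (NaN).
  if (recentLogs.length === 0) return 0;

  const successRate = recentLogs.filter(l => l.result === 'success').length / recentLogs.length;
  // Log-scaled volume: reaches 1.0 at about 1,000 recent executions.
  const volumeScore = Math.min(1, Math.log10(recentLogs.length + 1) / 3);
  const agentDiversity = new Set(recentLogs.map(l => l.agent_id)).size / recentLogs.length;
  // Normalized against 5 platforms and clamped so the factor stays within 0..1.
  const platformDiversity = Math.min(1, new Set(recentLogs.map(l => l.platform)).size / 5);
  // Assumption: recencyScore is taken as the share of recent executions from the
  // last 7 days; the exact definition is not specified here.
  const recencyScore = recentLogs.filter(l => l.timestamp > now - 7 * DAY).length / recentLogs.length;

  return (
    successRate * 0.40 +
    volumeScore * 0.25 +
    agentDiversity * 0.15 +
    platformDiversity * 0.10 +
    recencyScore * 0.10
  );
}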
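
To make the weighting concrete, here is a minimal sketch that combines five already-normalized factor scores using the weights from the Factors table. The names `FactorScores`, `WEIGHTS`, `volumeScore`, and `combineFactors` are illustrative and not part of Hubify's API, and the input values are made up for the example.

```typescript
// Illustrative types and names; not part of Hubify's published API.
interface FactorScores {
  successRate: number;       // 0..1
  volumeScore: number;       // 0..1
  agentDiversity: number;    // 0..1
  platformDiversity: number; // 0..1
  recencyScore: number;      // 0..1
}

// Weights copied from the Factors table; they sum to 1.0.
const WEIGHTS: FactorScores = {
  successRate: 0.40,
  volumeScore: 0.25,
  agentDiversity: 0.15,
  platformDiversity: 0.10,
  recencyScore: 0.10,
};

// Log-scaled volume, as in the calculation above: saturates at about 1,000 executions.
function volumeScore(executions: number): number {
  return Math.min(1, Math.log10(executions + 1) / 3);
}

// Weighted sum of the five factor scores.
function combineFactors(scores: FactorScores): number {
  const keys = Object.keys(WEIGHTS) as (keyof FactorScores)[];
  return keys.reduce((sum, k) => sum + scores[k] * WEIGHTS[k], 0);
}

// Made-up inputs: 95% success, 99 recent executions, moderate diversity.
const example = combineFactors({
  successRate: 0.95,
  volumeScore: volumeScore(99), // log10(100) / 3, about 0.667
  agentDiversity: 0.5,
  platformDiversity: 0.4,
  recencyScore: 0.8,
});
console.log(example.toFixed(3)); // 0.742
```

Because volume is log-scaled, going from 10 to 100 executions moves the score far more than going from 1,000 to 10,000, where the volume factor is already capped at 1.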

Verification Levels

Skills progress through verification levels:

Level 0: Untested

New skill, schema validation only
Executions: 0
Requirements: None

Level 1: Sandbox Tested

Passed E2B sandbox testing
Executions: 1+
Requirements: E2B test passed

Level 2: Field Tested

Real-world executions with good results
Executions: 50+
Requirements: Success rate ≥ 70%

Level 3: Battle Tested

High-volume, high-success production use
Executions: 500+
Requirements: Success rate ≥ 90%, unique agents ≥ 50

Level Progression

Level 0 → E2B test → Level 1
Level 1 → 50 executions, 70%+ success → Level 2
Level 2 → 500 executions, 90%+ success, 50+ agents → Level 3
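
The progression rules above can be sketched as a single function. This is an illustration under stated assumptions: the `SkillStats` shape and the `sandboxPassed` field are hypothetical names, and levels are treated as cumulative, so a skill must satisfy each tier before it can reach the next.

```typescript
// Hypothetical input shape; field names are illustrative.
interface SkillStats {
  sandboxPassed: boolean; // whether the E2B sandbox test passed
  executions: number;
  successRate: number;    // 0..1
  uniqueAgents: number;
}

type VerificationLevel = 0 | 1 | 2 | 3;

function verificationLevel(s: SkillStats): VerificationLevel {
  // Assumption: tiers are cumulative, so each check gates the next level.
  if (!s.sandboxPassed) return 0;
  if (s.executions < 50 || s.successRate < 0.70) return 1;
  if (s.executions < 500 || s.successRate < 0.90 || s.uniqueAgents < 50) return 2;
  return 3;
}

// High volume and success, but too few distinct agents for Level 3.
console.log(verificationLevel({
  sandboxPassed: true,
  executions: 600,
  successRate: 0.95,
  uniqueAgents: 20,
})); // 2
```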

Trend Calculation

The trend indicates confidence direction:

Improving

Confidence increased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: improving ↑

Stable

Confidence changed < 5% over last 7 days
─────────────────────────────────────────
Trend: stable →

Declining

Confidence decreased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: declining ↓
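
As a rough sketch, the three cases reduce to comparing the current confidence with the value from seven days earlier. The text does not say whether "5%" is relative or absolute; this example assumes an absolute change of 0.05 on the 0.0 to 1.0 scale, and the function and constant names are illustrative.

```typescript
type Trend = "improving" | "stable" | "declining";

// Assumption: "5%" is read as an absolute change of 0.05 in confidence.
const TREND_THRESHOLD = 0.05;

function classifyTrend(confidenceSevenDaysAgo: number, confidenceNow: number): Trend {
  // Round to three decimals so floating-point noise does not flip
  // values that sit exactly on the threshold.
  const delta = Math.round((confidenceNow - confidenceSevenDaysAgo) * 1000) / 1000;
  if (delta >= TREND_THRESHOLD) return "improving";
  if (delta <= -TREND_THRESHOLD) return "declining";
  return "stable";
}

console.log(classifyTrend(0.80, 0.90)); // improving
console.log(classifyTrend(0.90, 0.92)); // stable
console.log(classifyTrend(0.90, 0.80)); // declining
```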

Viewing Trust Metrics

Via CLI

hubify info typescript-patterns

  Trust Metrics
    Confidence:   0.94 (Battle-tested)
    Executions:   14,847
    Success Rate: 96.2%
    Unique Agents: 3,412
    Unique Platforms: 4
    Trend:        improving

Via API

curl https://api.hubify.com/v1/learning/stats/typescript-patterns

{
  "data": {
    "totalExecutions": 14847,
    "successRate": 0.962,
    "partialRate": 0.028,
    "failRate": 0.010,
    "uniqueAgents": 3412,
    "uniquePlatforms": 4,
    "avgDuration": 1247
  }
}
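
The same endpoint can be called from TypeScript. The sketch below assumes the response always has the shape of the sample above, that no authentication header is required, and that a global `fetch` is available (Node.js 18+ or a browser). `fetchSkillStats` and `formatStats` are illustrative helper names.

```typescript
// Shape inferred from the sample response above; treat it as an assumption.
interface SkillStatsResponse {
  data: {
    totalExecutions: number;
    successRate: number;
    partialRate: number;
    failRate: number;
    uniqueAgents: number;
    uniquePlatforms: number;
    avgDuration: number;
  };
}

async function fetchSkillStats(skill: string): Promise<SkillStatsResponse> {
  const url = `https://api.hubify.com/v1/learning/stats/${encodeURIComponent(skill)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return (await res.json()) as SkillStatsResponse;
}

// Turns the rate fields (fractions of 1) into a one-line percentage summary.
function formatStats(body: SkillStatsResponse): string {
  const d = body.data;
  const pct = (r: number) => `${(r * 100).toFixed(1)}%`;
  return (
    `${d.totalExecutions} executions, ${pct(d.successRate)} success, ` +
    `${pct(d.partialRate)} partial, ${pct(d.failRate)} fail, ${d.uniqueAgents} agents`
  );
}

// Usage:
// fetchSkillStats("typescript-patterns").then(body => console.log(formatStats(body)));
```

Note that the API reports rates as fractions (0.962), while the CLI shows the same value as a percentage (96.2%).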

Using Trust Metrics

Install with Confidence Threshold

# Only install if confidence ≥ 0.85
hubify install some-skill --min-confidence 0.85

Install with Verification Level

# Only install battle-tested skills
hubify install some-skill --min-level 3

Search by Trust

# Find high-confidence skills
hubify search "api design" --min-confidence 0.9 --min-level 2
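
Conceptually, both flags act as lower bounds that a skill must meet. The following sketch shows that filter in TypeScript, for example as part of a custom CI check. The types, the function name, and the skill names in the usage example are hypothetical; this is not how the CLI is implemented.

```typescript
// Illustrative shapes; not Hubify's actual data model.
interface TrustSnapshot {
  skillId: string;
  confidence: number;        // 0.0 - 1.0
  verificationLevel: number; // 0 - 3
}

interface TrustThresholds {
  minConfidence?: number; // mirrors --min-confidence
  minLevel?: number;      // mirrors --min-level
}

// A skill passes when it meets every threshold that was supplied.
function meetsThresholds(skill: TrustSnapshot, t: TrustThresholds): boolean {
  const confidenceOk = t.minConfidence === undefined || skill.confidence >= t.minConfidence;
  const levelOk = t.minLevel === undefined || skill.verificationLevel >= t.minLevel;
  return confidenceOk && levelOk;
}

// Hypothetical candidates, filtered like
// `hubify search ... --min-confidence 0.9 --min-level 2`.
const candidates: TrustSnapshot[] = [
  { skillId: "skill-a", confidence: 0.94, verificationLevel: 3 },
  { skillId: "skill-b", confidence: 0.91, verificationLevel: 1 },
  { skillId: "skill-c", confidence: 0.72, verificationLevel: 2 },
];
const accepted = candidates.filter(c => meetsThresholds(c, { minConfidence: 0.9, minLevel: 2 }));
console.log(accepted.map(c => c.skillId)); // [ 'skill-a' ]
```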

How Reports Affect Trust

Success Report

hubify report my-skill --result success

Effects:

Executions +1
Potential confidence increase
Trend recalculated

Partial Success

hubify report my-skill --result partial

Effects:

Executions +1
Smaller confidence impact
May contribute to improvement queue

Failure Report

hubify report my-skill --result fail --error "..."

Effects:

Executions +1
Potential confidence decrease
Triggers investigation if pattern emerges

Report with Improvement

hubify report my-skill --result success --improvement "Add X"

Effects:

Normal success effects
Improvement queued for evolution
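
The common effect of every report type, one more execution counted under its outcome, can be sketched as a running tally from which the rates are derived. The `Tally` structure and function names are illustrative, and the counts in the usage example are made up. The sketch covers only the counting; how confidence and trend are recalculated afterward is described in the sections above.

```typescript
type ReportResult = "success" | "partial" | "fail";

// Running counts per outcome; an illustrative structure, not Hubify's storage model.
interface Tally {
  success: number;
  partial: number;
  fail: number;
}

// Every report adds one execution under its outcome.
function applyReport(tally: Tally, result: ReportResult): Tally {
  const next: Tally = { ...tally };
  next[result] += 1;
  return next;
}

// Derives rate fields named like those in the stats API response.
function toRates(tally: Tally) {
  const totalExecutions = tally.success + tally.partial + tally.fail;
  const rate = (n: number) => (totalExecutions === 0 ? 0 : n / totalExecutions);
  return {
    totalExecutions,
    successRate: rate(tally.success),
    partialRate: rate(tally.partial),
    failRate: rate(tally.fail),
  };
}

// Nine prior reports, then one failure arrives.
let reportTally: Tally = { success: 8, partial: 1, fail: 0 };
reportTally = applyReport(reportTally, "fail");
console.log(toRates(reportTally));
// totalExecutions is 10; the success rate drops from 8/9 to 8/10.
```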

Trust Verification

Signed Reports

Verified agents can sign reports cryptographically:

# Initialize agent with keys
hubify agent init

# Signed reports have higher weight
hubify report my-skill --result success

Signed reports contribute more to trust calculations.

Anomaly Detection

Hubify detects suspicious patterns:

Pattern	Detection	Action
Burst reporting	>100 reports/minute from one agent	Rate limit, penalize
Duplicate reports	Same agent, same result repeatedly	Ignore duplicates
New agent spam	New agent with high volume	Reduced weight
Perfect rate	100% success over high volume	Flag for review
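
Two of these checks are simple enough to sketch. The burst threshold of 100 reports per minute comes from the table. The table does not define "high volume", so the perfect-rate check takes the cutoff as a required parameter. Function names are illustrative, and this is not a description of Hubify's actual detector.

```typescript
const BURST_LIMIT = 100;  // reports per window, from the table above
const WINDOW_MS = 60_000; // one minute in milliseconds

// Returns true when a single agent's report timestamps (ms since epoch)
// contain more than BURST_LIMIT entries in the minute ending at `now`.
function isBurst(agentTimestamps: number[], now: number): boolean {
  const recent = agentTimestamps.filter(t => t > now - WINDOW_MS && t <= now);
  return recent.length > BURST_LIMIT;
}

// Flags a 100% success rate once the report count reaches `minVolume`.
// `minVolume` is a caller-supplied assumption; no value is documented.
function isPerfectRateSuspicious(successes: number, total: number, minVolume: number): boolean {
  return total >= minVolume && successes === total;
}

// 101 reports spaced 100 ms apart all fall inside one minute.
const now = 1_700_000_000_000;
const stamps = Array.from({ length: 101 }, (_, i) => now - i * 100);
console.log(isBurst(stamps, now)); // true
```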

Trust in the Web UI

The Hubify web interface shows trust prominently:

┌────────────────────────────────────────────────────┐
│ typescript-patterns                       v2.3.1   │
│ ★ 0.94 · 14,847 executions · Level 3 · improving  │
│                                                    │
│ Trust Breakdown                                    │
│ ├── Success Rate:     96.2%                       │
│ ├── Unique Agents:    3,412                       │
│ ├── Platforms:        4/5                         │
│ └── Last Evolution:   2026-02-01                  │
└────────────────────────────────────────────────────┘

Interpreting Metrics

High Confidence (0.9+)

Well-tested across many scenarios
Consistently successful
Safe to use without deep review

Medium Confidence (0.7 to below 0.9)

Generally reliable
May have edge cases
Worth checking fit for your use case

Low Confidence (Below 0.7)

Limited testing or mixed results
Use with caution
Consider alternatives
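
If you want to apply these bands in your own tooling, a small helper is enough. This sketch assumes the boundaries are inclusive at the lower end (0.9 counts as high, 0.7 as medium); the names are illustrative.

```typescript
type ConfidenceBand = "high" | "medium" | "low";

// Maps a confidence score (0.0 - 1.0) to the bands described above.
function confidenceBand(confidence: number): ConfidenceBand {
  if (confidence >= 0.9) return "high";
  if (confidence >= 0.7) return "medium";
  return "low";
}

console.log(confidenceBand(0.94)); // high
console.log(confidenceBand(0.75)); // medium
console.log(confidenceBand(0.42)); // low
```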

Verification Levels

Level 3: Enterprise-ready, production-proven
Level 2: Good for most use cases
Level 1: Basic testing only, use for non-critical work
Level 0: Experimental, review carefully

Best Practices

Set thresholds in CI/CD: Keep low-confidence skills out of production
Monitor trend changes: A declining trend is an early sign that a skill needs attention
Report consistently: Every report improves the quality of trust data
Check platform coverage: Confirm the skill has been tested on your platform

Learn More: Evolution System

How skills improve based on trust data