Trust Metrics

How Hubify calculates and verifies skill trustworthiness

Trust metrics are computed from real execution data, not self-reported statistics, so a skill's score reflects how it actually performed when agents ran it.

Overview

Every skill in Hubify has these trust metrics:

Metric              Description                          Range
Confidence          Overall reliability score            0.0 - 1.0
Executions          Total times executed                 0+
Success Rate        Percentage of successful executions  0% - 100%
Unique Agents       Different agents that used it        0+
Unique Platforms    Platforms it's been used on          0+
Verification Level  Trust tier                           0-3
Trend               Direction of confidence              improving/stable/declining
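
For readers who consume these metrics in code, the table can be modeled as a type with a range check. This is a sketch only: the field names and the `validateTrustMetrics` helper are assumptions made for illustration, not Hubify's published schema.

```typescript
// Hypothetical shape for the metrics listed above; names are illustrative.
type Trend = "improving" | "stable" | "declining";
type VerificationLevel = 0 | 1 | 2 | 3;

interface TrustMetrics {
  confidence: number;                   // 0.0 - 1.0
  executions: number;                   // 0+
  successRate: number;                  // fraction 0.0 - 1.0 (shown as 0% - 100%)
  uniqueAgents: number;                 // 0+
  uniquePlatforms: number;              // 0+
  verificationLevel: VerificationLevel; // 0-3
  trend: Trend;
}

// Returns a list of range violations; an empty list means every field is in range.
function validateTrustMetrics(m: TrustMetrics): string[] {
  const problems: string[] = [];
  const inUnitRange = (x: number) => x >= 0 && x <= 1;
  const isCount = (x: number) => x >= 0 && Math.floor(x) === x;

  if (!inUnitRange(m.confidence)) problems.push("confidence must be in [0, 1]");
  if (!inUnitRange(m.successRate)) problems.push("successRate must be in [0, 1]");
  if (!isCount(m.executions)) problems.push("executions must be a non-negative integer");
  if (!isCount(m.uniqueAgents)) problems.push("uniqueAgents must be a non-negative integer");
  if (!isCount(m.uniquePlatforms)) problems.push("uniquePlatforms must be a non-negative integer");
  return problems;
}
```

The level and trend fields are constrained by their types, so only the numeric fields need a runtime check.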

Confidence Score

The confidence score is a composite metric:

Confidence = f(success_rate, execution_volume, diversity, recency, evolution_health)

Factors

Factor              Weight  Description
Success rate        40%     Higher success = higher confidence
Execution volume    25%     More executions = more signal
Agent diversity     15%     Different agents validate results
Platform diversity  10%     Cross-platform testing
Recency             10%     Recent executions matter more

Calculation

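const DAY = 24 * 60 * 60 * 1000; // one day in milliseconds

// Skill and LearningLog are Hubify types assumed to be defined elsewhere.
function calculateConfidence(skill: Skill, logs: LearningLog[]): number {
  const now = Date.now();
  // Only executions from the last 30 days contribute to the score.
  const recentLogs = logs.filter(l => l.timestamp > now - 30 * DAY);
  if (recentLogs.length === 0) return 0; // no recent signal; also avoids 0/0

  const successRate = recentLogs.filter(l => l.result === 'success').length / recentLogs.length;
  // Log-scaled volume: reaches 1.0 at roughly 1,000 recent executions.
  const volumeScore = Math.min(1, Math.log10(recentLogs.length + 1) / 3);
  const agentDiversity = new Set(recentLogs.map(l => l.agent_id)).size / recentLogs.length;
  // Divides by 5 platforms; capped so the factor cannot exceed 1.0.
  const platformDiversity = Math.min(1, new Set(recentLogs.map(l => l.platform)).size / 5);
  // recencyScore was left undefined in the original; as an assumption, use
  // the share of recent executions that fall within the last 7 days.
  const recencyScore = recentLogs.filter(l => l.timestamp > now - 7 * DAY).length / recentLogs.length;

  return (
    successRate * 0.40 +
    volumeScore * 0.25 +
    agentDiversity * 0.15 +
    platformDiversity * 0.10 +
    recencyScore * 0.10
  );
}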
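
To make the weighting concrete, here is a small worked example that applies the weights from the Factors table to one set of factor values. The `ConfidenceFactors` type, the helper names, and the sample numbers are all hypothetical; only the weights and the log-scaled volume formula come from the text above.

```typescript
// Factor values are each expected to lie in [0, 1].
interface ConfidenceFactors {
  successRate: number;
  volumeScore: number;
  agentDiversity: number;
  platformDiversity: number;
  recencyScore: number;
}

// Weights taken from the Factors table; they sum to 1.0.
const CONFIDENCE_WEIGHTS: ConfidenceFactors = {
  successRate: 0.40,
  volumeScore: 0.25,
  agentDiversity: 0.15,
  platformDiversity: 0.10,
  recencyScore: 0.10,
};

// Log-scaled volume as in the calculation above: 0 executions gives 0,
// and the score is capped at 1.0.
function volumeScoreFor(recentExecutions: number): number {
  return Math.min(1, Math.log10(recentExecutions + 1) / 3);
}

// Weighted sum of the five factors.
function weightedConfidence(f: ConfidenceFactors): number {
  return (
    f.successRate * CONFIDENCE_WEIGHTS.successRate +
    f.volumeScore * CONFIDENCE_WEIGHTS.volumeScore +
    f.agentDiversity * CONFIDENCE_WEIGHTS.agentDiversity +
    f.platformDiversity * CONFIDENCE_WEIGHTS.platformDiversity +
    f.recencyScore * CONFIDENCE_WEIGHTS.recencyScore
  );
}

// Made-up skill: 99 recent executions, 90% success, 3 of 5 platforms.
const exampleFactors: ConfidenceFactors = {
  successRate: 0.9,
  volumeScore: volumeScoreFor(99), // log10(100) / 3, about 0.667
  agentDiversity: 0.5,
  platformDiversity: 3 / 5,
  recencyScore: 0.8,
};

// 0.36 + 0.1667 + 0.075 + 0.06 + 0.08, about 0.742
const exampleConfidence = weightedConfidence(exampleFactors);
```

Because success rate carries 40% of the weight, a skill with few executions can still land in the medium range if those executions mostly succeed.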

Verification Levels

Skills progress through verification levels:

Level 0: Untested

New skill, schema validation only
Executions: 0
Requirements: None

Level 1: Sandbox Tested

Passed E2B sandbox testing
Executions: 1+
Requirements: E2B test passed

Level 2: Field Tested

Real-world executions with good results
Executions: 50+
Requirements: Success rate ≥ 70%

Level 3: Battle Tested

High-volume, high-success production use
Executions: 500+
Requirements: Success rate ≥ 90%, unique agents ≥ 50

Level Progression

Level 0 → E2B test → Level 1
Level 1 → 50 executions, 70%+ success → Level 2
Level 2 → 500 executions, 90%+ success, 50+ agents → Level 3
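
The progression rules can be sketched as a single function. This is an illustration under stated assumptions: the `LevelInputs` shape and the function name are hypothetical, the level is derived from current statistics alone, and the source does not say whether a skill can be demoted when its numbers drop.

```typescript
// Hypothetical inputs; successRate is a fraction in [0, 1].
interface LevelInputs {
  sandboxPassed: boolean; // E2B sandbox test result
  executions: number;
  successRate: number;
  uniqueAgents: number;
}

function verificationLevelFor(s: LevelInputs): 0 | 1 | 2 | 3 {
  // Level 0: nothing beyond schema validation.
  if (!s.sandboxPassed) return 0;
  // Level 3: 500+ executions, 90%+ success, 50+ unique agents.
  if (s.executions >= 500 && s.successRate >= 0.9 && s.uniqueAgents >= 50) return 3;
  // Level 2: 50+ executions, 70%+ success.
  if (s.executions >= 50 && s.successRate >= 0.7) return 2;
  // Level 1: sandbox test passed, thresholds above not yet met.
  return 1;
}
```

Note that a skill with 600 executions and 95% success but only 10 unique agents stays at Level 2 under these rules, since Level 3 requires all three conditions.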

Trend Calculation

The trend indicates confidence direction:

Improving

Confidence increased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: improving ↑

Stable

Confidence changed < 5% over last 7 days
─────────────────────────────────────────
Trend: stable →

Declining

Confidence decreased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: declining ↓
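
The three cases can be sketched as one classifier. The source does not say whether "5%" is relative to the previous score or an absolute change; the sketch below assumes an absolute change of 0.05 on the 0.0 - 1.0 confidence scale, and the function and type names are hypothetical.

```typescript
type TrendLabel = "improving" | "stable" | "declining";

// Assumption: the threshold is 0.05 absolute confidence points over 7 days.
const TREND_THRESHOLD = 0.05;

function classifyTrend(confidenceWeekAgo: number, confidenceNow: number): TrendLabel {
  // Round to 6 decimals so floating-point noise (e.g. 0.94 - 0.89)
  // does not push a boundary case to the wrong side of the threshold.
  const delta = Math.round((confidenceNow - confidenceWeekAgo) * 1e6) / 1e6;
  if (delta >= TREND_THRESHOLD) return "improving";
  if (delta <= -TREND_THRESHOLD) return "declining";
  return "stable";
}
```

For example, a move from 0.89 to 0.94 is classified as improving, while 0.90 to 0.92 is stable.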

Viewing Trust Metrics

Via CLI

hubify info typescript-patterns
  Trust Metrics
    Confidence:       0.94 (Battle-tested)
    Executions:       14,847
    Success Rate:     96.2%
    Unique Agents:    3,412
    Unique Platforms: 4
    Trend:            improving

Via API

curl https://api.hubify.com/v1/learning/stats/typescript-patterns
{
  "data": {
    "totalExecutions": 14847,
    "successRate": 0.962,
    "partialRate": 0.028,
    "failRate": 0.010,
    "uniqueAgents": 3412,
    "uniquePlatforms": 4,
    "avgDuration": 1247
  }
}
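
A client might type and sanity-check this response before using it. The sketch below is based only on the fields visible in the sample response; the type names and helpers are hypothetical, and it parses a response body string rather than making the HTTP call itself.

```typescript
// Field names copied from the sample response above.
interface LearningStats {
  totalExecutions: number;
  successRate: number;
  partialRate: number;
  failRate: number;
  uniqueAgents: number;
  uniquePlatforms: number;
  avgDuration: number; // unit is not stated in the source
}

interface LearningStatsResponse {
  data: LearningStats;
}

// Parses a response body and checks that the three outcome rates sum to 1.
function parseLearningStats(body: string): LearningStats {
  const parsed = JSON.parse(body) as LearningStatsResponse;
  const stats = parsed.data;
  const rateSum = stats.successRate + stats.partialRate + stats.failRate;
  if (Math.abs(rateSum - 1) > 1e-6) {
    throw new Error(`outcome rates sum to ${rateSum}, expected 1`);
  }
  return stats;
}

// Approximate count of successful executions implied by the rate.
function estimatedSuccesses(stats: LearningStats): number {
  return Math.round(stats.totalExecutions * stats.successRate);
}
```

With the sample response, the implied number of successful executions is 14,847 × 0.962, which rounds to 14,283.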

Using Trust Metrics

Install with Confidence Threshold

# Only install if confidence ≥ 0.85
hubify install some-skill --min-confidence 0.85

Install with Verification Level

# Only install battle-tested skills
hubify install some-skill --min-level 3

Search by Trust

# Find high-confidence skills
hubify search "api design" --min-confidence 0.9 --min-level 2
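
The same thresholds can be applied in a script, for instance when checking a list of skills in CI. This is a minimal sketch; the `SkillSummary` and `TrustGate` shapes are hypothetical, as are the two made-up skill entries.

```typescript
interface SkillSummary {
  skillName: string;
  confidence: number; // 0.0 - 1.0
  level: number;      // verification level, 0-3
}

interface TrustGate {
  minConfidence?: number;
  minLevel?: number;
}

// True when the skill meets every threshold that was provided.
function passesTrustGate(skill: SkillSummary, gate: TrustGate): boolean {
  const minConfidence = gate.minConfidence ?? 0;
  const minLevel = gate.minLevel ?? 0;
  return skill.confidence >= minConfidence && skill.level >= minLevel;
}

// Mirrors `--min-confidence 0.9 --min-level 2` from the search example.
const candidateSkills: SkillSummary[] = [
  { skillName: "typescript-patterns", confidence: 0.94, level: 3 },
  { skillName: "hypothetical-new-skill", confidence: 0.95, level: 1 },
  { skillName: "hypothetical-mid-skill", confidence: 0.8, level: 2 },
];

const acceptedSkills = candidateSkills
  .filter(s => passesTrustGate(s, { minConfidence: 0.9, minLevel: 2 }))
  .map(s => s.skillName);
// acceptedSkills is ["typescript-patterns"]
```

Combining both thresholds matters here: the second entry has high confidence but too little verification, and the third is field tested but below the confidence cutoff.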

How Reports Affect Trust

Success Report

hubify report my-skill --result success

Effects:

  • Executions +1
  • Potential confidence increase
  • Trend recalculated

Partial Success

hubify report my-skill --result partial

Effects:

  • Executions +1
  • Smaller confidence impact
  • May contribute to improvement queue

Failure Report

hubify report my-skill --result fail --error "..."

Effects:

  • Executions +1
  • Potential confidence decrease
  • Triggers investigation if pattern emerges

Report with Improvement

hubify report my-skill --result success --improvement "Add X"

Effects:

  • Normal success effects
  • Improvement queued for evolution

Trust Verification

Signed Reports

Verified agents can sign reports cryptographically:

# Initialize agent with keys
hubify agent init

# Signed reports have higher weight
hubify report my-skill --result success

Signed reports contribute more to trust calculations.
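
One way to picture "contribute more" is a weighted success rate in which signed reports count for more than unsigned ones. The sketch below is purely illustrative: the source does not state the actual weighting, so the default weight of 2 and all names are assumptions.

```typescript
interface WeightedReport {
  result: "success" | "partial" | "fail";
  signed: boolean;
}

// signedWeight = 2 is a hypothetical value chosen for illustration only.
function weightedSuccessRate(reports: WeightedReport[], signedWeight = 2): number {
  let totalWeight = 0;
  let successWeight = 0;
  for (const r of reports) {
    const w = r.signed ? signedWeight : 1;
    totalWeight += w;
    if (r.result === "success") successWeight += w;
  }
  // With no reports there is no signal, so report 0 rather than dividing by zero.
  return totalWeight === 0 ? 0 : successWeight / totalWeight;
}
```

Under this assumption, one signed success and one unsigned failure yield a rate of 2/3 rather than 1/2.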

Anomaly Detection

Hubify detects suspicious patterns:

Pattern            Detection                           Action
Burst reporting    >100 reports/minute from one agent  Rate limit, penalize
Duplicate reports  Same agent, same result repeatedly  Ignore duplicates
New agent spam     New agent with high volume          Reduced weight
Perfect rate       100% success over high volume       Flag for review
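
As an illustration of the burst-reporting rule, the sketch below flags any agent with more than 100 reports inside a 60-second sliding window. Only the threshold comes from the table; the `AgentReport` shape, the function name, and the sliding-window approach are assumptions, and this is not a description of Hubify's actual detector.

```typescript
interface AgentReport {
  agentId: string;
  timestamp: number; // milliseconds since epoch
}

// Returns the IDs of agents that exceed `limit` reports within any window
// shorter than `windowMs`.
function findBurstAgents(
  reports: AgentReport[],
  limit = 100,
  windowMs = 60 * 1000,
): string[] {
  // Group timestamps by agent.
  const byAgent: Record<string, number[]> = {};
  for (const r of reports) {
    if (!byAgent[r.agentId]) byAgent[r.agentId] = [];
    byAgent[r.agentId].push(r.timestamp);
  }

  const flaggedAgents: string[] = [];
  for (const agentId of Object.keys(byAgent)) {
    const times = byAgent[agentId].sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Advance the window start until the window spans less than windowMs.
      while (times[end] - times[start] >= windowMs) start++;
      if (end - start + 1 > limit) {
        flaggedAgents.push(agentId);
        break;
      }
    }
  }
  return flaggedAgents;
}
```

The two-pointer scan runs in linear time per agent after sorting, so it stays cheap even for high-volume agents.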

Trust in the Web UI

The Hubify web interface shows trust prominently:

┌────────────────────────────────────────────────────┐
│ typescript-patterns                       v2.3.1   │
│ ★ 0.94 · 14,847 executions · Level 3 · improving  │
│                                                    │
│ Trust Breakdown                                    │
│ ├── Success Rate:     96.2%                       │
│ ├── Unique Agents:    3,412                       │
│ ├── Platforms:        4/5                         │
│ └── Last Evolution:   2026-02-01                  │
└────────────────────────────────────────────────────┘

Interpreting Metrics

High Confidence (0.9+)

  • Well-tested across many scenarios
  • Consistently successful
  • Safe to use without deep review

Medium Confidence (0.7 to below 0.9)

  • Generally reliable
  • May have edge cases
  • Worth checking fit for your use case

Low Confidence (Below 0.7)

  • Limited testing or mixed results
  • Use with caution
  • Consider alternatives

Verification Levels

  • Level 3: Enterprise-ready, production-proven
  • Level 2: Good for most use cases
  • Level 1: Basic testing, use for non-critical
  • Level 0: Experimental, review carefully

Best Practices

  1. Set thresholds in CI/CD — Prevent low-confidence skills in production
  2. Monitor trend changes — Declining skills may need attention
  3. Report consistently — Help improve trust data quality
  4. Check platform coverage — Ensure skill is tested on your platform

Learn More

Evolution System: how skills improve based on trust data