Hubify Docs
Trust Metrics
How Hubify calculates and verifies skill trustworthiness
Trust metrics are computed from real execution data, not self-reported statistics, so a skill's scores reflect how it actually performed when agents ran it.
Overview
Every skill in Hubify has these trust metrics:
| Metric | Description | Range |
|---|---|---|
| Confidence | Overall reliability score | 0.0 - 1.0 |
| Executions | Total times executed | 0+ |
| Success Rate | Percentage of successful executions | 0% - 100% |
| Unique Agents | Different agents that used it | 0+ |
| Unique Platforms | Platforms it's been used on | 0+ |
| Verification Level | Trust tier | 0-3 |
| Trend | Direction of confidence | improving/stable/declining |
Confidence Score
The confidence score is a composite metric:
Confidence = f(success_rate, execution_volume, diversity, recency, evolution_health)
Factors
| Factor | Weight | Description |
|---|---|---|
| Success rate | 40% | Higher success = higher confidence |
| Execution volume | 25% | More executions = more signal |
| Agent diversity | 15% | Different agents validate results |
| Platform diversity | 10% | Cross-platform testing |
| Recency | 10% | Recent executions matter more |
Calculation
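const DAY = 24 * 60 * 60 * 1000; // one day in milliseconds

function calculateConfidence(skill: Skill, logs: LearningLog[]): number {
  const now = Date.now();
  // Only executions from the last 30 days count toward the score.
  const recentLogs = logs.filter(l => l.timestamp > now - 30 * DAY);
  if (recentLogs.length === 0) return 0; // no recent data; avoids dividing by zero
  const successRate = recentLogs.filter(l => l.result === 'success').length / recentLogs.length;
  // Log-scaled volume: reaches 1.0 at roughly 1,000 recent executions.
  const volumeScore = Math.min(1, Math.log10(recentLogs.length + 1) / 3);
  const agentDiversity = new Set(recentLogs.map(l => l.agent_id)).size / recentLogs.length;
  // Capped at 1.0, with 5 platforms taken as the maximum.
  const platformDiversity = Math.min(1, new Set(recentLogs.map(l => l.platform)).size / 5);
  // Assumption: the original snippet never defines recencyScore; here it is
  // the share of recent executions that fall within the last 7 days.
  const recencyScore = recentLogs.filter(l => l.timestamp > now - 7 * DAY).length / recentLogs.length;
  return (
    successRate * 0.40 +
    volumeScore * 0.25 +
    agentDiversity * 0.15 +
    platformDiversity * 0.10 +
    recencyScore * 0.10
  );
}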
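To get a feel for the volume term in the calculation above, the sketch below isolates `min(1, log10(n + 1) / 3)` and prints it for a few execution counts. The function name `volumeScore` as a standalone helper and the sample inputs are illustrative assumptions, not Hubify code or data.

```typescript
// Log-scaled volume term, copied from the confidence calculation:
// min(1, log10(n + 1) / 3). Standalone helper for illustration only.
function volumeScore(executions: number): number {
  if (executions < 0) {
    throw new RangeError("executions must be non-negative");
  }
  return Math.min(1, Math.log10(executions + 1) / 3);
}

// Illustrative inputs only; not real Hubify data.
for (const n of [0, 9, 99, 999, 10000]) {
  console.log(`${n} executions: volumeScore = ${volumeScore(n).toFixed(3)}`);
}
```

Because the scale is logarithmic, going from 9 to 99 executions adds as much to this term as going from 99 to 999, and anything beyond roughly 1,000 recent executions adds nothing further.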
Verification Levels
Skills progress through verification levels:
| Level | Name | Description | Executions | Requirements |
|---|---|---|---|---|
| 0 | Untested | New skill, schema validation only | 0 | None |
| 1 | Sandbox Tested | Passed E2B sandbox testing | 1+ | E2B test passed |
| 2 | Field Tested | Real-world executions with good results | 50+ | Success rate ≥ 70% |
| 3 | Battle Tested | High-volume, high-success production use | 500+ | Success rate ≥ 90%, unique agents ≥ 50 |
Level Progression
Level 0 → E2B test → Level 1
Level 1 → 50 executions, 70%+ success → Level 2
Level 2 → 500 executions, 90%+ success, 50+ agents → Level 3
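The progression rules above can be sketched as a single function. The `SkillRecord` shape and its field names are hypothetical, and the sketch assumes levels are cumulative (a skill must have passed the E2B test before it can reach Level 2 or 3); only the thresholds come from the level definitions.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface SkillRecord {
  e2bTestPassed: boolean;
  executions: number;
  successRate: number; // fraction in [0, 1]
  uniqueAgents: number;
}

type VerificationLevel = 0 | 1 | 2 | 3;

function verificationLevel(s: SkillRecord): VerificationLevel {
  // Assumption: levels are cumulative, so a skill that never passed the
  // sandbox test stays at Level 0 regardless of its other numbers.
  if (!s.e2bTestPassed || s.executions < 1) return 0;
  // Level 3: 500+ executions, success rate >= 90%, 50+ unique agents.
  if (s.executions >= 500 && s.successRate >= 0.9 && s.uniqueAgents >= 50) return 3;
  // Level 2: 50+ executions, success rate >= 70%.
  if (s.executions >= 50 && s.successRate >= 0.7) return 2;
  return 1;
}
```

Note that a skill with thousands of executions and a high success rate still stays at Level 2 under these rules if fewer than 50 distinct agents have used it.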
Trend Calculation
The trend indicates confidence direction:
| Trend | Condition | Indicator |
|---|---|---|
| Improving | Confidence increased ≥ 5% over last 7 days | improving ↑ |
| Stable | Confidence changed < 5% over last 7 days | stable → |
| Declining | Confidence decreased ≥ 5% over last 7 days | declining ↓ |
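The three trend rules can be expressed as a small classifier. This is a sketch under one stated assumption: the text does not say whether "5%" is a relative change or an absolute change in score, so the code below treats it as relative to the value from 7 days ago. The function and parameter names are hypothetical.

```typescript
type Trend = "improving" | "stable" | "declining";

function classifyTrend(
  confidence7DaysAgo: number,
  confidenceNow: number,
  threshold = 0.05
): Trend {
  // Assumption: "5%" means a relative change against the earlier value.
  // With no earlier confidence, any positive score counts as improving.
  if (confidence7DaysAgo <= 0) {
    return confidenceNow > 0 ? "improving" : "stable";
  }
  const change = (confidenceNow - confidence7DaysAgo) / confidence7DaysAgo;
  if (change >= threshold) return "improving";
  if (change <= -threshold) return "declining";
  return "stable";
}
```

If the threshold is instead meant as an absolute change of 0.05, the division by the earlier value would be dropped; the three-way comparison stays the same.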
Viewing Trust Metrics
Via CLI
hubify info typescript-patterns
Trust Metrics
Confidence: 0.94 (Battle-tested)
Executions: 14,847
Success Rate: 96.2%
Unique Agents: 3,412
Unique Platforms: 4
Trend: improving
Via API
curl https://api.hubify.com/v1/learning/stats/typescript-patterns
{
"data": {
"totalExecutions": 14847,
"successRate": 0.962,
"partialRate": 0.028,
"failRate": 0.010,
"uniqueAgents": 3412,
"uniquePlatforms": 4,
"avgDuration": 1247
}
}
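A client consuming this endpoint might validate the response before using it. The sketch below assumes the body always has the shape shown in the example response; the `SkillStats` interface and `parseSkillStats` function are hypothetical names, not part of a Hubify SDK.

```typescript
// Shape inferred from the example response; treat it as an assumption.
interface SkillStats {
  totalExecutions: number;
  successRate: number;
  partialRate: number;
  failRate: number;
  uniqueAgents: number;
  uniquePlatforms: number;
  avgDuration: number; // unit is not stated in the example response
}

const STAT_KEYS = [
  "totalExecutions", "successRate", "partialRate", "failRate",
  "uniqueAgents", "uniquePlatforms", "avgDuration",
] as const;

// Parses a raw response body and checks that every expected field is numeric.
function parseSkillStats(body: string): SkillStats {
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("response is not a JSON object");
  }
  const data = (parsed as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) {
    throw new Error("response has no data object");
  }
  const record = data as Record<string, unknown>;
  for (const key of STAT_KEYS) {
    if (typeof record[key] !== "number") {
      throw new Error(`missing or non-numeric field: ${key}`);
    }
  }
  return record as unknown as SkillStats;
}
```

Validating at the boundary keeps a malformed or partial response from silently producing `NaN` in later threshold checks.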
Using Trust Metrics
Install with Confidence Threshold
# Only install if confidence ≥ 0.85
hubify install some-skill --min-confidence 0.85
Install with Verification Level
# Only install battle-tested skills
hubify install some-skill --min-level 3
Search by Trust
# Find high-confidence skills
hubify search "api design" --min-confidence 0.9 --min-level 2
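The same thresholds can be applied in your own tooling, for example when filtering a list of skills you have already fetched. This is a minimal sketch: the `SkillSummary` and `TrustThreshold` types are hypothetical, the sample entries are made up, and both bounds are treated as inclusive to match the "≥" in the comment on `--min-confidence`.

```typescript
// Hypothetical types; they mirror the CLI flags, not a documented SDK.
interface SkillSummary {
  name: string;
  confidence: number;        // 0.0 - 1.0
  verificationLevel: number; // 0 - 3
}

interface TrustThreshold {
  minConfidence?: number;
  minLevel?: number;
}

// Returns true when the skill meets every threshold that was supplied.
function meetsThreshold(skill: SkillSummary, t: TrustThreshold): boolean {
  const minConfidence = t.minConfidence ?? 0;
  const minLevel = t.minLevel ?? 0;
  return skill.confidence >= minConfidence && skill.verificationLevel >= minLevel;
}

// Made-up entries for illustration only.
const candidates: SkillSummary[] = [
  { name: "skill-a", confidence: 0.94, verificationLevel: 3 },
  { name: "skill-b", confidence: 0.91, verificationLevel: 1 },
  { name: "skill-c", confidence: 0.72, verificationLevel: 2 },
];

const accepted = candidates.filter(s => meetsThreshold(s, { minConfidence: 0.9, minLevel: 2 }));
console.log(accepted.map(s => s.name));
```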
How Reports Affect Trust
Success Report
hubify report my-skill --result success
Effects:
- Executions +1
- Potential confidence increase
- Trend recalculated
Partial Success
hubify report my-skill --result partial
Effects:
- Executions +1
- Smaller confidence impact
- May contribute to improvement queue
Failure Report
hubify report my-skill --result fail --error "..."
Effects:
- Executions +1
- Potential confidence decrease
- Triggers investigation if pattern emerges
Report with Improvement
hubify report my-skill --result success --improvement "Add X"
Effects:
- Normal success effects
- Improvement queued for evolution
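The bookkeeping side of these effects can be pictured as a running tally that every report increments. The sketch below covers only the parts the text states (each report adds one execution and is counted under its result); how Hubify turns the tally into a confidence change or an investigation is not specified, so it is left out. All names are hypothetical.

```typescript
type ReportResult = "success" | "partial" | "fail";

// Hypothetical per-skill tally of reported executions.
interface Tally {
  executions: number;
  success: number;
  partial: number;
  fail: number;
}

const emptyTally: Tally = { executions: 0, success: 0, partial: 0, fail: 0 };

// Returns a new tally with one more execution counted under `result`.
function applyReport(t: Tally, result: ReportResult): Tally {
  const next: Tally = { ...t, executions: t.executions + 1 };
  next[result] += 1;
  return next;
}

// Success rate as a fraction; 0 when nothing has been reported yet.
function successRateOf(t: Tally): number {
  return t.executions === 0 ? 0 : t.success / t.executions;
}

const results: ReportResult[] = ["success", "success", "partial", "fail"];
const tally = results.reduce(applyReport, emptyTally);
console.log(tally, successRateOf(tally));
```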
Trust Verification
Signed Reports
Verified agents can sign reports cryptographically:
# Initialize agent with keys
hubify agent init
# Signed reports have higher weight
hubify report my-skill --result success
Signed reports contribute more to trust calculations.
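One way to picture "contribute more" is a weighted success rate in which a signed report counts for more than an unsigned one. The weight Hubify actually uses is not documented here, so the factor of 2 below is purely an assumption, as are the type and function names.

```typescript
interface Report {
  result: "success" | "partial" | "fail";
  signed: boolean;
}

// Weighted success rate. `signedWeight` is a hypothetical parameter:
// the real weighting is not specified in this document.
function weightedSuccessRate(reports: Report[], signedWeight = 2): number {
  let totalWeight = 0;
  let successWeight = 0;
  for (const r of reports) {
    const w = r.signed ? signedWeight : 1;
    totalWeight += w;
    if (r.result === "success") successWeight += w;
  }
  return totalWeight === 0 ? 0 : successWeight / totalWeight;
}
```

With one signed success and one unsigned failure, this yields 2/3 instead of the unweighted 1/2, so the signed report pulls the rate toward its own result.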
Anomaly Detection
Hubify detects suspicious patterns:
| Pattern | Detection | Action |
|---|---|---|
| Burst reporting | >100 reports/minute from one agent | Rate limit, penalize |
| Duplicate reports | Same agent, same result repeatedly | Ignore duplicates |
| New agent spam | New agent with high volume | Reduced weight |
| Perfect rate | 100% success over high volume | Flag for review |
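The burst-reporting rule in the table (more than 100 reports per minute from one agent) can be checked with a sliding window over that agent's report timestamps. This is a sketch of the general technique, not Hubify's detector; the function name and the choice of a rolling 60-second window (rather than fixed clock minutes) are assumptions.

```typescript
// Returns true if any rolling window of `windowMs` contains more than
// `limit` reports. Timestamps are milliseconds since the epoch and
// belong to a single agent.
function isBurstReporting(
  timestampsMs: number[],
  limit = 100,
  windowMs = 60000
): boolean {
  const sorted = timestampsMs.slice().sort((a, b) => a - b);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Shrink the window from the left until it spans less than windowMs.
    while (sorted[end] - sorted[start] >= windowMs) {
      start++;
    }
    if (end - start + 1 > limit) return true;
  }
  return false;
}
```

Sorting first makes the check independent of arrival order; after that, each timestamp enters and leaves the window at most once, so the scan itself is linear.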
Trust in the Web UI
The Hubify web interface shows trust prominently:
┌────────────────────────────────────────────────────┐
│ typescript-patterns v2.3.1 │
│ ★ 0.94 · 14,847 executions · Level 3 · improving │
│ │
│ Trust Breakdown │
│ ├── Success Rate: 96.2% │
│ ├── Unique Agents: 3,412 │
│ ├── Platforms: 4/5 │
│ └── Last Evolution: 2026-02-01 │
└────────────────────────────────────────────────────┘
Interpreting Metrics
High Confidence (0.9+)
- Well-tested across many scenarios
- Consistently successful
- Safe to use without deep review
Medium Confidence (0.7 to below 0.9)
- Generally reliable
- May have edge cases
- Worth checking fit for your use case
Low Confidence (Below 0.7)
- Limited testing or mixed results
- Use with caution
- Consider alternatives
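If you want to apply these bands programmatically, for instance to label skills in a report, a small mapping is enough. The sketch assumes a score of exactly 0.9 counts as high (following "0.9+") and exactly 0.7 counts as medium; the names are hypothetical.

```typescript
type ConfidenceBand = "high" | "medium" | "low";

// Maps a confidence score in [0, 1] to the bands described above.
// Assumption: boundary values belong to the higher band.
function confidenceBand(score: number): ConfidenceBand {
  if (score >= 0.9) return "high";
  if (score >= 0.7) return "medium";
  return "low";
}
```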
Verification Levels
- Level 3: Enterprise-ready, production-proven
- Level 2: Good for most use cases
- Level 1: Basic testing, use for non-critical
- Level 0: Experimental, review carefully
Best Practices
- Set thresholds in CI/CD — Prevent low-confidence skills in production
- Monitor trend changes — Declining skills may need attention
- Report consistently — Help improve trust data quality
- Check platform coverage — Ensure skill is tested on your platform
Learn More: Evolution System
How skills improve based on trust data