Research SDK

hubify-research is the Python toolkit that powers research workspaces on Hubify. It provides multi-model LLM routing, unified literature search, math verification, GPU pod management, archive data access, and dataset streaming — all from a single pip install. The toolkit runs on your workspace VPS (auto-installed on boot) and integrates with the CLI, MCP server, and Convex backend for a seamless research pipeline.

Architecture

┌──────────────────────────────────────────────────────────┐
│                     Your Workspace                       │
│                                                          │
│  hubify-research (Python)                                │
│  ├── router         — Multi-model LLM routing            │
│  ├── literature     — arXiv, ADS, S2, Perplexity         │
│  ├── computation    — Wolfram Alpha + DeepSeek R1         │
│  ├── data_access    — MAST, Gaia, VizieR, NED            │
│  ├── dataset_loader — HuggingFace streaming               │
│  ├── env_check      — API key validation                  │
│  └── gpu/                                                │
│      ├── runpod     — Pod lifecycle (GraphQL API)         │
│      └── session    — Env detection, key loading          │
│                                                          │
│  hubify CLI (TypeScript)                                 │
│  └── research subcommands orchestrate Python toolkit     │
│                                                          │
│  @hubify/mcp (TypeScript)                                │
│  └── 7 research tools exposed to AI editors              │
│                                                          │
├──────────────────────────────────────────────────────────┤
│  Convex (backend)                                        │
│  ├── researchLabs.ts  — findings, labs, toolkit status   │
│  ├── research.ts      — missions, updates, publishing    │
│  └── experimentDag.ts — DAG nodes, frontier, claims      │
└──────────────────────────────────────────────────────────┘

Installation

# Standard install
pip install hubify-research

# With all optional dependencies
pip install hubify-research[all]

# Development mode (from repo)
pip install -e packages/sdk/python/

# Via CLI
hubify research tools install
Workspace VPS images auto-install hubify-research on boot. You only need manual installation for local development.

Optional dependency groups

| Group | Packages | Use case |
|---|---|---|
| `llm` | anthropic, openai | Native SDK connectivity tests |
| `science` | astroquery, astropy | MAST, Gaia, VizieR, NED queries |
| `datasets` | datasets, huggingface-hub | HuggingFace dataset streaming |
| `gpu` | torch | GPU detection and CUDA checks |
| `all` | Everything above | Full research stack |
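To see which optional groups are usable in the current environment, you can probe for each group's modules with `importlib`. This is an illustrative check (the group-to-module mapping mirrors the table above; it is not a toolkit API):

```python
import importlib.util

# Modules provided by each optional dependency group (from the table above).
GROUPS = {
    "llm": ["anthropic", "openai"],
    "science": ["astroquery", "astropy"],
    "datasets": ["datasets", "huggingface_hub"],
    "gpu": ["torch"],
}

def installed_groups() -> dict[str, bool]:
    """True for a group if every one of its modules can be found locally."""
    return {
        group: all(importlib.util.find_spec(m) is not None for m in mods)
        for group, mods in GROUPS.items()
    }
```

Running `installed_groups()` before importing a module like `data_access` avoids a mid-session ImportError on a machine missing the `science` extras.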

Modules

1. Router — Multi-model LLM routing

Routes research questions to the best model based on task type. Seven providers, three API protocols (OpenAI-compatible, Anthropic Messages, Google Gemini).
from hubify_research.router import query, multi_query, available_models

# Route by task type (auto-selects best model)
result = query("Verify this tensor contraction: ...", task="math_rigor")
# → Uses DeepSeek R1

result = query("Summarize recent advances in...", task="literature")
# → Uses Perplexity sonar-pro

result = query("Write an abstract for...", task="writing")
# → Uses Claude Opus

# Compare across multiple models
results = multi_query("Is P = NP?", models=["math_rigor", "reasoning", "writing"])

# List configured models
for key, info in available_models().items():
    print(f"  [{key}] {info['name']}: {info['description']}")

Available models

| Task Key | Model | Provider | Best for |
|---|---|---|---|
| `math_rigor` | DeepSeek R1 | DeepSeek | Math verification, sign errors, derivations |
| `multimodal` | Gemini 2.5 Pro | Google | Images, plots, long-form math |
| `writing` | Claude Opus | Anthropic | Academic writing, argument structure |
| `reasoning` | GPT-4o | OpenAI | General reasoning |
| `literature` | Sonar Pro | Perplexity | Web-grounded paper search |
| `fast` | Grok 3 | xAI | Quick reasoning, alternative perspectives |
| `multi` | Claude Sonnet (via OpenRouter) | OpenRouter | Multi-model routing, comparison runs |

Task routing table

| Task alias | Routes to |
|---|---|
| `math_rigor`, `tensor_check`, `sign_error`, `derivation` | DeepSeek R1 |
| `multimodal`, `plot_analysis`, `image` | Gemini 2.5 Pro |
| `writing`, `abstract`, `paper_edit` | Claude Opus |
| `reasoning`, `general` | GPT-4o |
| `literature`, `search`, `recent_papers` | Perplexity |
| `fast`, `quick` | Grok 3 |
| `compare` | OpenRouter |
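The alias table above amounts to a dictionary lookup with a fallback. A minimal sketch of that behavior (the model identifiers and function name here are illustrative, not the router's actual internals):

```python
# Hypothetical alias-to-model mapping, mirroring the routing table above.
TASK_ROUTES = {
    "math_rigor": "deepseek-r1", "tensor_check": "deepseek-r1",
    "sign_error": "deepseek-r1", "derivation": "deepseek-r1",
    "multimodal": "gemini-2.5-pro", "plot_analysis": "gemini-2.5-pro",
    "image": "gemini-2.5-pro",
    "writing": "claude-opus", "abstract": "claude-opus",
    "paper_edit": "claude-opus",
    "reasoning": "gpt-4o", "general": "gpt-4o",
    "literature": "sonar-pro", "search": "sonar-pro",
    "recent_papers": "sonar-pro",
    "fast": "grok-3", "quick": "grok-3",
    "compare": "openrouter",
}

def resolve_model(task: str, default: str = "gpt-4o") -> str:
    """Map a task alias to a model key, falling back to general reasoning."""
    return TASK_ROUTES.get(task, default)
```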
2. Literature — Unified paper search

Search across NASA ADS, Semantic Scholar, arXiv, and Perplexity with a single call.
from hubify_research.literature import search, search_ads, search_arxiv, search_s2

# Unified search (queries all configured sources)
results = search("transformer architectures 2025", max_results=5, category="cs.AI")
# Returns: {"arxiv": [...], "ads": [...], "s2": [...], "perplexity": {...}}

# Source-specific queries
ads_papers = search_ads("large language models", rows=20, sort="citation_count desc")
arxiv_papers = search_arxiv("diffusion models", category="cs.CV", max_results=10)
s2_papers = search_s2("attention mechanism", limit=10)

# Citation graph
from hubify_research.literature import s2_citation_graph
graph = s2_citation_graph("arxiv:2106.09685")  # LoRA paper

| Source | API Key Required | Features |
|---|---|---|
| arXiv | No | Full-text search, category filters, date sorting |
| NASA ADS | `NASA_ADS_API_KEY` | Citation counts, year ranges, ADS query syntax |
| Semantic Scholar | `SEMANTIC_SCHOLAR_API_KEY` (optional) | Citation graphs, paper details, author search |
| Perplexity | `PERPLEXITY_API_KEY` | Web-grounded synthesis with citations |

3. Computation — Math verification

Dual-engine verification using Wolfram Alpha for numerical checks and DeepSeek R1 for logical rigor.
from hubify_research.computation import verify_equation, wolfram, cross_check

# Combined Wolfram + DeepSeek verification
report = verify_equation("integrate x^2 sin(x) dx", expected="-x^2 cos(x) + 2x sin(x) + 2 cos(x)")

# Wolfram Alpha directly
result = wolfram("mass of the Sun in kg", format="short")
result = wolfram("solve x^2 + 3x - 4 = 0", format="full")
result = wolfram("speed of light in m/s", format="llm")

# Cross-check across multiple reasoning models
report = cross_check("Verify: [alpha/M] has dimensions of -1 and [F] has dimensions of +2")
# Returns consensus verdict from 3 models

4. Data Access — Archive queries

Query MAST (JWST/HST/TESS), Gaia DR3, VizieR catalogs, and NED. No API keys required.
from hubify_research.data_access import search_mast, search_gaia, query_catalog, search_ned

# JWST observations
jwst = search_mast(target="NGC 1365", collection="JWST", radius_arcmin=5)

# Gaia DR3 astrometry
gaia = search_gaia(ra=53.23, dec=-36.14, radius_deg=0.1)

# VizieR catalog (e.g., 2MASS)
catalog = query_catalog("II/246", target="M31", radius_arcmin=10)

# NED extragalactic database
ned = search_ned("NGC 1365")

# S3 direct access for JWST data
from hubify_research.data_access import mast_s3_uri
uri = mast_s3_uri("jw02107-o001_t003_nircam_f200w", collection="jwst")
Data access functions require astroquery and astropy. Install with: pip install hubify-research[science]

5. Dataset Loader — HuggingFace streaming

Stream datasets from HuggingFace without full downloads. Includes a configurable registry for shortnames.
from hubify_research.dataset_loader import load_hf, register_dataset, dataset_info

# Load directly by HF ID
ds = load_hf("imdb", split="train", streaming=True, max_samples=100)

# Register shortnames for your project
register_dataset("my-data", "username/my-custom-dataset")
ds = load_hf("my-data", split="train", streaming=True)

# Get metadata without downloading
info = dataset_info("imdb")
print(info["splits"])  # {"train": 25000, "test": 25000}

6. Environment Check — API key validation

Validates which API keys are configured and optionally tests connectivity.
from hubify_research.env_check import check_keys

configured, missing_required, missing_optional = check_keys()

for env_var, name, category, masked in configured:
    print(f"  [{category}] {name}: {masked}")
# CLI
python -m hubify_research check
python -m hubify_research check --test
hubify research tools check
hubify research tools check --test

7. GPU / RunPod — Pod lifecycle management

Full pod lifecycle via the RunPod GraphQL API: create, stop, start, terminate, SSH, and remote command execution.
from hubify_research.gpu.runpod import (
    list_pods, create_pod, stop_pod, start_pod,
    get_ssh_command, setup_pod, list_gpu_types
)

# List pods
pods = list_pods()

# Create a pod
pod = create_pod(
    gpu_type="NVIDIA RTX 4090",
    name="my-research",
    volume_gb=50,
)

# Full environment setup (install deps, validate, check GPU)
setup_pod(pod["id"])

# SSH command
print(get_ssh_command(pod["id"]))
# → ssh root@IP -p PORT -i ~/.ssh/id_ed25519

# Available GPUs with pricing
gpus = list_gpu_types()

8. GPU / Session — Environment detection

Detects the runtime platform (RunPod, Lambda, Colab, or local) and loads API keys from the appropriate source.
from hubify_research.gpu.session import detect_environment, load_api_keys, run_session

# Detect platform
env = detect_environment()
print(env["platform"])   # "runpod" | "lambda" | "colab" | "local"
print(env["gpu_name"])   # "NVIDIA RTX 4090" or None
print(env["gpu_vram_gb"])  # 24.0

# Load all API keys
keys = load_api_keys(env)

# Run full session validation
results = run_session(checks=["env", "keys", "gpu"])

API Keys

The toolkit tracks 13 keys across 6 categories:
| Category | Key | Service |
|---|---|---|
| LLM | `ANTHROPIC_API_KEY` | Anthropic Claude |
| LLM | `OPENAI_API_KEY` | OpenAI GPT |
| LLM | `GOOGLE_AI_API_KEY` | Google Gemini |
| LLM | `DEEPSEEK_API_KEY` | DeepSeek R1 |
| LLM | `XAI_API_KEY` | xAI Grok |
| LLM | `OPENROUTER_API_KEY` | OpenRouter |
| Science | `NASA_ADS_API_KEY` | NASA ADS |
| Science | `SEMANTIC_SCHOLAR_API_KEY` | Semantic Scholar |
| Compute | `PERPLEXITY_API_KEY` | Perplexity |
| Compute | `WOLFRAM_ALPHA_APP_ID` | Wolfram Alpha |
| Data | `HUGGINGFACE_TOKEN` | Hugging Face |
| GPU | `RUNPOD_API_KEY` | RunPod |
| Web | `FIRECRAWL_API_KEY` | Firecrawl |
Keys are loaded from (in order of priority):
  1. .env or .env.local in the current directory
  2. ~/.hubify/.env
  3. Colab Secrets (on Google Colab)
  4. Environment variables
None of the keys are strictly required. The toolkit degrades gracefully — modules that need a missing key will raise a clear error, while other modules continue to work.

Output Directory

Research outputs (search results, session reports, data exports) are saved to:
  • /workspace/outputs/ on RunPod
  • $HUBIFY_RESEARCH_OUTPUT_DIR if set
  • ~/.hubify/research/outputs/ on local machines
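A minimal sketch of that precedence, assuming RunPod is detected via its `RUNPOD_POD_ID` environment variable (the function name is illustrative, not a toolkit API):

```python
import os
from pathlib import Path

def resolve_output_dir() -> Path:
    """Pick the output directory per the documented precedence (sketch)."""
    if os.environ.get("RUNPOD_POD_ID"):        # assumed RunPod marker
        return Path("/workspace/outputs")
    custom = os.environ.get("HUBIFY_RESEARCH_OUTPUT_DIR")
    if custom:
        return Path(custom)
    return Path.home() / ".hubify" / "research" / "outputs"
```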

Research Labs

Dedicated research workspaces with budget controls

Research MCP

7 research tools for AI editors via MCP

CLI Reference

Full CLI command reference for research

Quickstart Guide

Get started with the research toolkit in 5 minutes