Research SDK
hubify-research is the Python toolkit that powers research workspaces on Hubify. It provides multi-model LLM routing, unified literature search, math verification, GPU pod management, archive data access, and dataset streaming — all from a single pip install.
The toolkit runs on your workspace VPS (auto-installed on boot) and integrates with the CLI, MCP server, and Convex backend for a seamless research pipeline.
Architecture
┌──────────────────────────────────────────────────────────┐
│ Your Workspace                                           │
│                                                          │
│  hubify-research (Python)                                │
│  ├── router          — Multi-model LLM routing           │
│  ├── literature      — arXiv, ADS, S2, Perplexity        │
│  ├── computation     — Wolfram Alpha + DeepSeek R1       │
│  ├── data_access     — MAST, Gaia, VizieR, NED           │
│  ├── dataset_loader  — HuggingFace streaming             │
│  ├── env_check       — API key validation                │
│  └── gpu/                                                │
│      ├── runpod      — Pod lifecycle (GraphQL API)       │
│      └── session     — Env detection, key loading        │
│                                                          │
│  hubify CLI (TypeScript)                                 │
│  └── research subcommands orchestrate Python toolkit     │
│                                                          │
│  @hubify/mcp (TypeScript)                                │
│  └── 7 research tools exposed to AI editors              │
│                                                          │
├──────────────────────────────────────────────────────────┤
│ Convex (backend)                                         │
│  ├── researchLabs.ts  — findings, labs, toolkit status   │
│  ├── research.ts      — missions, updates, publishing    │
│  └── experimentDag.ts — DAG nodes, frontier, claims      │
└──────────────────────────────────────────────────────────┘
Installation
# Standard install
pip install hubify-research
# With all optional dependencies
pip install hubify-research[all]
# Development mode (from repo)
pip install -e packages/sdk/python/
# Via CLI
hubify research tools install
Workspace VPS images auto-install hubify-research on boot. You only need manual installation for local development.
Optional dependency groups
| Group | Packages | Use case |
|---|---|---|
| llm | anthropic, openai | Native SDK connectivity tests |
| science | astroquery, astropy | MAST, Gaia, VizieR, NED queries |
| datasets | datasets, huggingface-hub | HuggingFace dataset streaming |
| gpu | torch | GPU detection and CUDA checks |
| all | Everything above | Full research stack |
Modules
1. Router — Multi-model LLM routing
Routes research questions to the best model based on task type. Seven providers, three API protocols (OpenAI-compatible, Anthropic Messages, Google Gemini).
from hubify_research.router import query, multi_query, available_models

# Route by task type (auto-selects best model)
result = query("Verify this tensor contraction: ...", task="math_rigor")
# → Uses DeepSeek R1
result = query("Summarize recent advances in...", task="literature")
# → Uses Perplexity sonar-pro
result = query("Write an abstract for...", task="writing")
# → Uses Claude Opus

# Compare across multiple models
results = multi_query("Is P = NP?", models=["math_rigor", "reasoning", "writing"])

# List configured models
for key, info in available_models().items():
    print(f"[{key}] {info['name']} — {info['description']}")
Available models
| Task Key | Model | Provider | Best for |
|---|---|---|---|
| math_rigor | DeepSeek R1 | DeepSeek | Math verification, sign errors, derivations |
| multimodal | Gemini 2.5 Pro | Google | Images, plots, long-form math |
| writing | Claude Opus | Anthropic | Academic writing, argument structure |
| reasoning | GPT-4o | OpenAI | General reasoning |
| literature | Sonar Pro | Perplexity | Web-grounded paper search |
| fast | Grok 3 | xAI | Quick reasoning, alternative perspectives |
| multi | Claude Sonnet (via OpenRouter) | OpenRouter | Multi-model routing, comparison runs |
Task routing table
| Task alias | Routes to |
|---|---|
| math_rigor, tensor_check, sign_error, derivation | DeepSeek R1 |
| multimodal, plot_analysis, image | Gemini 2.5 Pro |
| writing, abstract, paper_edit | Claude Opus |
| reasoning, general | GPT-4o |
| literature, search, recent_papers | Perplexity |
| fast, quick | Grok 3 |
| compare | OpenRouter |
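The alias table above amounts to a dictionary lookup with a fallback. A minimal sketch of the idea; the names `TASK_ROUTES` and `resolve_model` are illustrative, not the toolkit's internal API:

```python
# Hypothetical sketch of task-alias routing, mirroring the table above.
# Only the alias → model mapping is taken from the documentation.
TASK_ROUTES = {
    ("math_rigor", "tensor_check", "sign_error", "derivation"): "DeepSeek R1",
    ("multimodal", "plot_analysis", "image"): "Gemini 2.5 Pro",
    ("writing", "abstract", "paper_edit"): "Claude Opus",
    ("reasoning", "general"): "GPT-4o",
    ("literature", "search", "recent_papers"): "Perplexity",
    ("fast", "quick"): "Grok 3",
    ("compare",): "OpenRouter",
}

def resolve_model(task: str) -> str:
    """Return the model a task alias routes to; unknown aliases fall back to general reasoning."""
    for aliases, model in TASK_ROUTES.items():
        if task in aliases:
            return model
    return "GPT-4o"  # default for unrecognized task aliases
```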
2. Literature — Unified search
Search across NASA ADS, Semantic Scholar, arXiv, and Perplexity with a single call.
from hubify_research.literature import search, search_ads, search_arxiv, search_s2

# Unified search (queries all configured sources)
results = search("transformer architectures 2025", max_results=5, category="cs.AI")
# Returns: {"arxiv": [...], "ads": [...], "s2": [...], "perplexity": {...}}

# Source-specific queries
ads_papers = search_ads("large language models", rows=20, sort="citation_count desc")
arxiv_papers = search_arxiv("diffusion models", category="cs.CV", max_results=10)
s2_papers = search_s2("attention mechanism", limit=10)

# Citation graph
from hubify_research.literature import s2_citation_graph
graph = s2_citation_graph("arxiv:2106.09685")  # LoRA paper
| Source | API Key Required | Features |
|---|---|---|
| arXiv | No | Full-text search, category filters, date sorting |
| NASA ADS | NASA_ADS_API_KEY | Citation counts, year ranges, ADS query syntax |
| Semantic Scholar | SEMANTIC_SCHOLAR_API_KEY (optional) | Citation graphs, paper details, author search |
| Perplexity | PERPLEXITY_API_KEY | Web-grounded synthesis with citations |
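When working with hits from several sources at once, a common follow-up step is deduplicating by arXiv ID or normalized title. A sketch under the assumption that each hit is a dict with optional `arxiv_id` and `title` keys; the actual result schema may differ:

```python
def merge_results(*source_lists):
    """Merge paper lists from multiple sources, deduplicating by
    arXiv ID when present, else by case-insensitive title."""
    seen, merged = set(), []
    for papers in source_lists:
        for paper in papers:
            key = paper.get("arxiv_id") or paper.get("title", "").strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged
```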
3. Computation — Math verification
Dual-engine verification using Wolfram Alpha for numerical checks and DeepSeek R1 for logical rigor.
from hubify_research.computation import verify_equation, wolfram, cross_check

# Combined Wolfram + DeepSeek verification
report = verify_equation("integrate x^2 sin(x) dx", expected="-x^2 cos(x) + 2x sin(x) + 2 cos(x)")

# Wolfram Alpha directly
result = wolfram("mass of the Sun in kg", format="short")
result = wolfram("solve x^2 + 3x - 4 = 0", format="full")
result = wolfram("speed of light in m/s", format="llm")

# Cross-check across multiple reasoning models
report = cross_check("Verify: [alpha/M] has dimensions of -1 and [F] has dimensions of +2")
# Returns consensus verdict from 3 models
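A consensus verdict across models can be thought of as a majority vote. This is a hypothetical sketch of that aggregation idea, not the SDK's actual logic:

```python
from collections import Counter

def consensus(verdicts):
    """Majority vote across per-model verdicts; 'inconclusive' on a tie."""
    counts = Counter(verdicts)
    top, runner_up = (counts.most_common(2) + [(None, 0)])[:2]
    return top[0] if top[1] > runner_up[1] else "inconclusive"
```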
4. Data Access — Archive queries
Query MAST (JWST/HST/TESS), Gaia DR3, VizieR catalogs, and NED. No API keys required.
from hubify_research.data_access import search_mast, search_gaia, query_catalog, search_ned

# JWST observations
jwst = search_mast(target="NGC 1365", collection="JWST", radius_arcmin=5)

# Gaia DR3 astrometry
gaia = search_gaia(ra=53.23, dec=-36.14, radius_deg=0.1)

# VizieR catalog (e.g., 2MASS)
catalog = query_catalog("II/246", target="M31", radius_arcmin=10)

# NED extragalactic database
ned = search_ned("NGC 1365")

# S3 direct access for JWST data
from hubify_research.data_access import mast_s3_uri
uri = mast_s3_uri("jw02107-o001_t003_nircam_f200w", collection="jwst")
Data access functions require astroquery and astropy. Install with: pip install hubify-research[science]
5. Dataset Loader — HuggingFace streaming
Stream datasets from HuggingFace without full downloads. Includes a configurable registry for shortnames.
from hubify_research.dataset_loader import load_hf, register_dataset, dataset_info

# Load directly by HF ID
ds = load_hf("imdb", split="train", streaming=True, max_samples=100)

# Register shortnames for your project
register_dataset("my-data", "username/my-custom-dataset")
ds = load_hf("my-data", split="train", streaming=True)

# Get metadata without downloading
info = dataset_info("imdb")
print(info["splits"])  # {"train": 25000, "test": 25000}
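The shortname registry boils down to a name-to-ID mapping that passes unregistered names through unchanged. A minimal sketch of that pattern; the `_REGISTRY` dict and `resolve` helper are illustrative, not the SDK's internals:

```python
# Hypothetical sketch of a dataset shortname registry. The SDK's real
# storage and validation may differ.
_REGISTRY = {}

def register_dataset(shortname, hf_id):
    """Map a project-local shortname to a full HuggingFace dataset ID."""
    _REGISTRY[shortname] = hf_id

def resolve(name):
    """Return the HF ID for a registered shortname; pass raw IDs through."""
    return _REGISTRY.get(name, name)
```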
6. Environment Check — API key validation
Validates which API keys are configured and optionally tests connectivity.
from hubify_research.env_check import check_keys

configured, missing_required, missing_optional = check_keys()
for env_var, name, category, masked in configured:
    print(f"[{category}] {name}: {masked}")
# CLI
python -m hubify_research check
python -m hubify_research check --test
hubify research tools check
hubify research tools check --test
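The masked values that check_keys reports can be produced with a small helper. This masking scheme (first four and last two characters) is illustrative, not necessarily the SDK's exact format:

```python
def mask_key(value):
    """Mask a secret, keeping only the first 4 and last 2 characters."""
    if len(value) <= 8:
        return "*" * len(value)
    return f"{value[:4]}…{value[-2:]}"
```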
7. GPU / RunPod — Pod lifecycle management
Full pod lifecycle via the RunPod GraphQL API: create, stop, start, terminate, SSH, and remote command execution.
from hubify_research.gpu.runpod import (
    list_pods, create_pod, stop_pod, start_pod,
    get_ssh_command, setup_pod, list_gpu_types
)

# List pods
pods = list_pods()

# Create a pod
pod = create_pod(
    gpu_type="NVIDIA RTX 4090",
    name="my-research",
    volume_gb=50,
)

# Full environment setup (install deps, validate, check GPU)
setup_pod(pod["id"])

# SSH command
print(get_ssh_command(pod["id"]))
# → ssh root@IP -p PORT -i ~/.ssh/id_ed25519

# Available GPUs with pricing
gpus = list_gpu_types()
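Under the hood, calls like list_pods translate to GraphQL requests against RunPod's API. A sketch of what building (not sending) such a request might look like; the query shape follows RunPod's public GraphQL API, but treat the exact field names as assumptions rather than the toolkit's actual implementation:

```python
import json
import os

RUNPOD_GRAPHQL_URL = "https://api.runpod.io/graphql"

def build_list_pods_request():
    """Build (but do not send) a GraphQL request payload to list pods."""
    query = """
    query Pods {
      myself {
        pods { id name desiredStatus }
      }
    }
    """
    return {
        # RunPod authenticates via an api_key query parameter
        "url": f"{RUNPOD_GRAPHQL_URL}?api_key={os.environ.get('RUNPOD_API_KEY', '')}",
        "body": json.dumps({"query": query}),
    }
```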
8. GPU / Session — Environment detection
Detects the runtime platform (RunPod, Lambda, Colab, or local) and loads API keys from the appropriate source.
from hubify_research.gpu.session import detect_environment, load_api_keys, run_session

# Detect platform
env = detect_environment()
print(env["platform"])    # "runpod" | "lambda" | "colab" | "local"
print(env["gpu_name"])    # "NVIDIA RTX 4090" or None
print(env["gpu_vram_gb"]) # 24.0

# Load all API keys
keys = load_api_keys(env)

# Run full session validation
results = run_session(checks=["env", "keys", "gpu"])
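Platform detection typically keys off environment markers. A minimal sketch, assuming RunPod pods set RUNPOD_POD_ID and Colab runtimes set COLAB_RELEASE_TAG (Lambda detection is omitted here); this is not the SDK's actual implementation:

```python
import os

def detect_platform(environ=os.environ):
    """Guess the runtime platform from environment variable markers."""
    if "RUNPOD_POD_ID" in environ:      # set on RunPod pods
        return "runpod"
    if "COLAB_RELEASE_TAG" in environ:  # set in Google Colab runtimes
        return "colab"
    return "local"
```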
API Keys
The toolkit tracks 13 keys across 6 categories:
| Category | Key | Service |
|---|---|---|
| LLM | ANTHROPIC_API_KEY | Anthropic Claude |
| LLM | OPENAI_API_KEY | OpenAI GPT |
| LLM | GOOGLE_AI_API_KEY | Google Gemini |
| LLM | DEEPSEEK_API_KEY | DeepSeek R1 |
| LLM | XAI_API_KEY | xAI Grok |
| LLM | OPENROUTER_API_KEY | OpenRouter |
| Science | NASA_ADS_API_KEY | NASA ADS |
| Science | SEMANTIC_SCHOLAR_API_KEY | Semantic Scholar |
| Compute | PERPLEXITY_API_KEY | Perplexity |
| Compute | WOLFRAM_ALPHA_APP_ID | Wolfram Alpha |
| Data | HUGGINGFACE_TOKEN | Hugging Face |
| GPU | RUNPOD_API_KEY | RunPod |
| Web | FIRECRAWL_API_KEY | Firecrawl |
Keys are loaded from (in order of priority):
1. .env or .env.local in the current directory
2. ~/.hubify/.env
3. Colab Secrets (on Google Colab)
4. Environment variables
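This precedence amounts to a first-wins merge across sources. A sketch of the idea with plain dicts; the function name is illustrative, not the SDK's API:

```python
def merge_key_sources(*sources):
    """Merge key/value sources where earlier sources win
    (pass sources in priority order, highest first)."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            merged.setdefault(key, value)  # keep the higher-priority value
    return merged
```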
None of the keys are strictly required. The toolkit degrades gracefully — modules that need a missing key will raise a clear error, while other modules continue to work.
Output Directory
Research outputs (search results, session reports, data exports) are saved to:
/workspace/outputs/ on RunPod
$HUBIFY_RESEARCH_OUTPUT_DIR if set
~/.hubify/research/outputs/ on local machines
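The directory choice above can be sketched as a short resolution function. The RUNPOD_POD_ID marker used to detect RunPod is an assumption, and the order (RunPod, then the override variable, then the local default) follows the listing above:

```python
import os
from pathlib import Path

def resolve_output_dir(environ=os.environ):
    """Pick the research output directory per the precedence above."""
    if "RUNPOD_POD_ID" in environ:  # assumed RunPod marker
        return Path("/workspace/outputs")
    if environ.get("HUBIFY_RESEARCH_OUTPUT_DIR"):
        return Path(environ["HUBIFY_RESEARCH_OUTPUT_DIR"])
    return Path.home() / ".hubify" / "research" / "outputs"
```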
Research Labs Dedicated research workspaces with budget controls
Research MCP 7 research tools for AI editors via MCP
CLI Reference Full CLI command reference for research
Quickstart Guide Get started with the research toolkit in 5 minutes