Research SDK
hubify-research is the Python toolkit that powers research workspaces on Hubify. It provides multi-model LLM routing, unified literature search, math verification, GPU pod management, archive data access, and dataset streaming — all from a single pip install.
The toolkit runs on your workspace VPS (auto-installed on boot) and integrates with the CLI, MCP server, and Convex backend for a seamless research pipeline.
Architecture
┌──────────────────────────────────────────────────────────┐
│ Your Workspace                                           │
│                                                          │
│  hubify-research (Python)                                │
│  ├── router          — Multi-model LLM routing           │
│  ├── literature      — arXiv, ADS, S2, Perplexity        │
│  ├── computation     — Wolfram Alpha + DeepSeek R1       │
│  ├── data_access     — MAST, Gaia, VizieR, NED           │
│  ├── dataset_loader  — HuggingFace streaming             │
│  ├── env_check       — API key validation                │
│  └── gpu/                                                │
│      ├── runpod      — Pod lifecycle (GraphQL API)       │
│      └── session     — Env detection, key loading        │
│                                                          │
│  hubify CLI (TypeScript)                                 │
│  └── research subcommands orchestrate Python toolkit     │
│                                                          │
│  @hubify/mcp (TypeScript)                                │
│  └── 7 research tools exposed to AI editors              │
│                                                          │
├──────────────────────────────────────────────────────────┤
│ Convex (backend)                                         │
│  ├── researchLabs.ts  — findings, labs, toolkit status   │
│  ├── research.ts      — missions, updates, publishing    │
│  └── experimentDag.ts — DAG nodes, frontier, claims      │
└──────────────────────────────────────────────────────────┘
Installation
# Standard install
pip install hubify-research
# With all optional dependencies
pip install hubify-research[all]
# Development mode (from repo)
pip install -e packages/sdk/python/
# Via CLI
hubify research tools install
Workspace VPS images auto-install hubify-research on boot. You only need manual installation for local development.
Optional dependency groups
| Group | Packages | Use case |
|---|---|---|
| llm | anthropic, openai | Native SDK connectivity tests |
| science | astroquery, astropy | MAST, Gaia, VizieR, NED queries |
| datasets | datasets, huggingface-hub | HuggingFace dataset streaming |
| gpu | torch | GPU detection and CUDA checks |
| all | Everything above | Full research stack |
Modules
1. Router — Multi-model LLM routing
Routes research questions to the best model based on task type. Seven providers, three API protocols (OpenAI-compatible, Anthropic Messages, Google Gemini).
from hubify_research.router import query, multi_query, available_models

# Route by task type (auto-selects best model)
result = query("Verify this tensor contraction: ...", task="math_rigor")
# → Uses DeepSeek R1
result = query("Summarize recent advances in...", task="literature")
# → Uses Perplexity sonar-pro
result = query("Write an abstract for...", task="writing")
# → Uses Claude Opus

# Compare across multiple models
results = multi_query("Is P = NP?", models=["math_rigor", "reasoning", "writing"])

# List configured models
for key, info in available_models().items():
    print(f"[{key}] {info['name']} — {info['description']}")
Available models
| Task Key | Model | Provider | Best for |
|---|---|---|---|
| math_rigor | DeepSeek R1 | DeepSeek | Math verification, sign errors, derivations |
| multimodal | Gemini 2.5 Pro | Google | Images, plots, long-form math |
| writing | Claude Opus | Anthropic | Academic writing, argument structure |
| reasoning | GPT-4o | OpenAI | General reasoning |
| literature | Sonar Pro | Perplexity | Web-grounded paper search |
| fast | Grok 3 | xAI | Quick reasoning, alternative perspectives |
| multi | Claude Sonnet (via OpenRouter) | OpenRouter | Multi-model routing, comparison runs |
Task routing table
| Task alias | Routes to |
|---|---|
| math_rigor, tensor_check, sign_error, derivation | DeepSeek R1 |
| multimodal, plot_analysis, image | Gemini 2.5 Pro |
| writing, abstract, paper_edit | Claude Opus |
| reasoning, general | GPT-4o |
| literature, search, recent_papers | Perplexity |
| fast, quick | Grok 3 |
| compare | OpenRouter |
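The alias table above amounts to a dictionary lookup with a fallback. A minimal sketch of the idea; the names `TASK_ROUTES` and `resolve_model` are illustrative, not the toolkit's internal API:

```python
# Hypothetical sketch of task-alias routing, mirroring the table above.
# Only the alias → model mapping is taken from the documentation.
TASK_ROUTES = {
    ("math_rigor", "tensor_check", "sign_error", "derivation"): "DeepSeek R1",
    ("multimodal", "plot_analysis", "image"): "Gemini 2.5 Pro",
    ("writing", "abstract", "paper_edit"): "Claude Opus",
    ("reasoning", "general"): "GPT-4o",
    ("literature", "search", "recent_papers"): "Perplexity",
    ("fast", "quick"): "Grok 3",
    ("compare",): "OpenRouter",
}

def resolve_model(task: str) -> str:
    """Return the model a task alias routes to; unknown aliases fall back to general reasoning."""
    for aliases, model in TASK_ROUTES.items():
        if task in aliases:
            return model
    return "GPT-4o"  # default for unrecognized task aliases
```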
2. Literature — Unified search
Search across NASA ADS, Semantic Scholar, arXiv, and Perplexity with a single call.
from hubify_research.literature import search, search_ads, search_arxiv, search_s2

# Unified search (queries all configured sources)
results = search("transformer architectures 2025", max_results=5, category="cs.AI")
# Returns: {"arxiv": [...], "ads": [...], "s2": [...], "perplexity": {...}}

# Source-specific queries
ads_papers = search_ads("large language models", rows=20, sort="citation_count desc")
arxiv_papers = search_arxiv("diffusion models", category="cs.CV", max_results=10)
s2_papers = search_s2("attention mechanism", limit=10)

# Citation graph
from hubify_research.literature import s2_citation_graph
graph = s2_citation_graph("arxiv:2106.09685")  # LoRA paper
| Source | API Key Required | Features |
|---|---|---|
| arXiv | No | Full-text search, category filters, date sorting |
| NASA ADS | NASA_ADS_API_KEY | Citation counts, year ranges, ADS query syntax |
| Semantic Scholar | SEMANTIC_SCHOLAR_API_KEY (optional) | Citation graphs, paper details, author search |
| Perplexity | PERPLEXITY_API_KEY | Web-grounded synthesis with citations |
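When working with hits from several sources at once, a common follow-up step is deduplicating by arXiv ID or normalized title. A sketch under the assumption that each hit is a dict with optional `arxiv_id` and `title` keys; the actual result schema may differ:

```python
def merge_results(*source_lists):
    """Merge paper lists from multiple sources, deduplicating by
    arXiv ID when present, else by case-insensitive title."""
    seen, merged = set(), []
    for papers in source_lists:
        for paper in papers:
            key = paper.get("arxiv_id") or paper.get("title", "").strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged
```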
3. Computation — Math verification
Dual-engine verification using Wolfram Alpha for numerical checks and DeepSeek R1 for logical rigor.
from hubify_research.computation import verify_equation, wolfram, cross_check

# Combined Wolfram + DeepSeek verification
report = verify_equation("integrate x^2 sin(x) dx", expected="-x^2 cos(x) + 2x sin(x) + 2 cos(x)")

# Wolfram Alpha directly
result = wolfram("mass of the Sun in kg", format="short")
result = wolfram("solve x^2 + 3x - 4 = 0", format="full")
result = wolfram("speed of light in m/s", format="llm")

# Cross-check across multiple reasoning models
report = cross_check("Verify: [alpha/M] has dimensions of -1 and [F] has dimensions of +2")
# Returns consensus verdict from 3 models
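A consensus verdict across models can be thought of as a majority vote. This is a hypothetical sketch of that aggregation idea, not the SDK's actual logic:

```python
from collections import Counter

def consensus(verdicts):
    """Majority vote across per-model verdicts; 'inconclusive' on a tie."""
    counts = Counter(verdicts)
    top, runner_up = (counts.most_common(2) + [(None, 0)])[:2]
    return top[0] if top[1] > runner_up[1] else "inconclusive"
```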
4. Data Access — Archive queries
Query MAST (JWST/HST/TESS), Gaia DR3, VizieR catalogs, and NED. No API keys required.
from hubify_research.data_access import search_mast, search_gaia, query_catalog, search_ned

# JWST observations
jwst = search_mast(target="NGC 1365", collection="JWST", radius_arcmin=5)

# Gaia DR3 astrometry
gaia = search_gaia(ra=53.23, dec=-36.14, radius_deg=0.1)

# VizieR catalog (e.g., 2MASS)
catalog = query_catalog("II/246", target="M31", radius_arcmin=10)

# NED extragalactic database
ned = search_ned("NGC 1365")

# S3 direct access for JWST data
from hubify_research.data_access import mast_s3_uri
uri = mast_s3_uri("jw02107-o001_t003_nircam_f200w", collection="jwst")
Data access functions require astroquery and astropy. Install with: pip install hubify-research[science]
5. Dataset Loader — HuggingFace streaming
Stream datasets from HuggingFace without full downloads. Includes a configurable registry for shortnames.
from hubify_research.dataset_loader import load_hf, register_dataset, dataset_info

# Load directly by HF ID
ds = load_hf("imdb", split="train", streaming=True, max_samples=100)

# Register shortnames for your project
register_dataset("my-data", "username/my-custom-dataset")
ds = load_hf("my-data", split="train", streaming=True)

# Get metadata without downloading
info = dataset_info("imdb")
print(info["splits"])  # {"train": 25000, "test": 25000}
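The shortname registry boils down to a name-to-ID mapping that passes unregistered names through unchanged. A minimal sketch of that pattern; the `_REGISTRY` dict and `resolve` helper are illustrative, not the SDK's internals:

```python
# Hypothetical sketch of a dataset shortname registry. The SDK's real
# storage and validation may differ.
_REGISTRY = {}

def register_dataset(shortname, hf_id):
    """Map a project-local shortname to a full HuggingFace dataset ID."""
    _REGISTRY[shortname] = hf_id

def resolve(name):
    """Return the HF ID for a registered shortname; pass raw IDs through."""
    return _REGISTRY.get(name, name)
```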
6. Environment Check — API key validation
Validates which API keys are configured and optionally tests connectivity.
from hubify_research.env_check import check_keys

configured, missing_required, missing_optional = check_keys()
for env_var, name, category, masked in configured:
    print(f"[{category}] {name}: {masked}")
# CLI
python -m hubify_research check
python -m hubify_research check --test
hubify research tools check
hubify research tools check --test
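The masked values that check_keys reports can be produced with a small helper. This masking scheme (first four and last two characters) is illustrative, not necessarily the SDK's exact format:

```python
def mask_key(value):
    """Mask a secret, keeping only the first 4 and last 2 characters."""
    if len(value) <= 8:
        return "*" * len(value)
    return f"{value[:4]}…{value[-2:]}"
```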
7. GPU / RunPod — Pod lifecycle management
Full pod lifecycle via the RunPod GraphQL API: create, stop, start, terminate, SSH, and remote command execution.
from hubify_research.gpu.runpod import (
    list_pods, create_pod, stop_pod, start_pod,
    get_ssh_command, setup_pod, list_gpu_types
)

# List pods
pods = list_pods()

# Create a pod
pod = create_pod(
    gpu_type="NVIDIA RTX 4090",
    name="my-research",
    volume_gb=50,
)

# Full environment setup (install deps, validate, check GPU)
setup_pod(pod["id"])

# SSH command
print(get_ssh_command(pod["id"]))
# → ssh root@IP -p PORT -i ~/.ssh/id_ed25519

# Available GPUs with pricing
gpus = list_gpu_types()
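Under the hood, calls like list_pods translate to GraphQL requests against RunPod's API. A sketch of what building (not sending) such a request might look like; the query shape follows RunPod's public GraphQL API, but treat the exact field names as assumptions rather than the toolkit's actual implementation:

```python
import json
import os

RUNPOD_GRAPHQL_URL = "https://api.runpod.io/graphql"

def build_list_pods_request():
    """Build (but do not send) a GraphQL request payload to list pods."""
    query = """
    query Pods {
      myself {
        pods { id name desiredStatus }
      }
    }
    """
    return {
        # RunPod authenticates via an api_key query parameter
        "url": f"{RUNPOD_GRAPHQL_URL}?api_key={os.environ.get('RUNPOD_API_KEY', '')}",
        "body": json.dumps({"query": query}),
    }
```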
8. GPU / Session — Environment detection
Detects the runtime platform (RunPod, Lambda, Colab, or local) and loads API keys from the appropriate source.
from hubify_research.gpu.session import detect_environment, load_api_keys, run_session

# Detect platform
env = detect_environment()
print(env["platform"])    # "runpod" | "lambda" | "colab" | "local"
print(env["gpu_name"])    # "NVIDIA RTX 4090" or None
print(env["gpu_vram_gb"]) # 24.0

# Load all API keys
keys = load_api_keys(env)

# Run full session validation
results = run_session(checks=["env", "keys", "gpu"])
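Platform detection typically keys off environment markers. A minimal sketch, assuming RunPod pods set RUNPOD_POD_ID and Colab runtimes set COLAB_RELEASE_TAG (Lambda detection is omitted here); this is not the SDK's actual implementation:

```python
import os

def detect_platform(environ=os.environ):
    """Guess the runtime platform from environment variable markers."""
    if "RUNPOD_POD_ID" in environ:      # set on RunPod pods
        return "runpod"
    if "COLAB_RELEASE_TAG" in environ:  # set in Google Colab runtimes
        return "colab"
    return "local"
```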
API Keys
The toolkit tracks 13 keys across 6 categories:
| Category | Key | Service |
|---|---|---|
| LLM | ANTHROPIC_API_KEY | Anthropic Claude |
| LLM | OPENAI_API_KEY | OpenAI GPT |
| LLM | GOOGLE_AI_API_KEY | Google Gemini |
| LLM | DEEPSEEK_API_KEY | DeepSeek R1 |
| LLM | XAI_API_KEY | xAI Grok |
| LLM | OPENROUTER_API_KEY | OpenRouter |
| Science | NASA_ADS_API_KEY | NASA ADS |
| Science | SEMANTIC_SCHOLAR_API_KEY | Semantic Scholar |
| Compute | PERPLEXITY_API_KEY | Perplexity |
| Compute | WOLFRAM_ALPHA_APP_ID | Wolfram Alpha |
| Data | HUGGINGFACE_TOKEN | Hugging Face |
| GPU | RUNPOD_API_KEY | RunPod |
| Web | FIRECRAWL_API_KEY | Firecrawl |
Keys are loaded from (in order of priority):
1. .env or .env.local in the current directory
2. ~/.hubify/.env
3. Colab Secrets (on Google Colab)
4. Environment variables
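This precedence amounts to a first-wins merge across sources. A sketch of the idea with plain dicts; the function name is illustrative, not the SDK's API:

```python
def merge_key_sources(*sources):
    """Merge key/value sources where earlier sources win
    (pass sources in priority order, highest first)."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            merged.setdefault(key, value)  # keep the higher-priority value
    return merged
```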
None of the keys are strictly required. The toolkit degrades gracefully — modules that need a missing key will raise a clear error, while other modules continue to work.
Output Directory
Research outputs (search results, session reports, data exports) are saved to:
/workspace/outputs/ on RunPod
$HUBIFY_RESEARCH_OUTPUT_DIR if set
~/.hubify/research/outputs/ on local machines
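The directory choice above can be sketched as a short resolution function. The RUNPOD_POD_ID marker used to detect RunPod is an assumption, and the order (RunPod, then the override variable, then the local default) follows the listing above:

```python
import os
from pathlib import Path

def resolve_output_dir(environ=os.environ):
    """Pick the research output directory per the precedence above."""
    if "RUNPOD_POD_ID" in environ:  # assumed RunPod marker
        return Path("/workspace/outputs")
    if environ.get("HUBIFY_RESEARCH_OUTPUT_DIR"):
        return Path(environ["HUBIFY_RESEARCH_OUTPUT_DIR"])
    return Path.home() / ".hubify" / "research" / "outputs"
```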
Research Labs Dedicated research workspaces with budget controls
Research MCP 7 research tools for AI editors via MCP
CLI Reference Full CLI command reference for research
Quickstart Guide Get started with the research toolkit in 5 minutes