RunPod Integration

RunPod is the primary GPU compute provider for Hubify Labs. This guide covers connecting your RunPod account, configuring pods, and optimizing for cost.

Connecting RunPod

Create a RunPod account

Generate an API key

Go to RunPod Settings > API Keys and create a key with full access.

Add to Hubify

hubify pod config --provider runpod --api-key "your-runpod-api-key"

Verify

hubify pod config --test

RunPod connection: OK
Available GPUs: H200, H100, A100, A40, RTX 4090
Account balance: $245.00

Available GPU Types

GPU	VRAM	Best For	Approx. Cost/hr
H200	141 GB	Large models, full-dataset anomaly detection	$3.89
H100	80 GB	MCMC chains, training, most experiments	$2.49
A100	80 GB	General GPU compute	$1.64
A40	48 GB	Medium workloads, figure generation	$0.79
RTX 4090	24 GB	Small models, prototyping	$0.44

Pricing varies by availability and region. Spot instances can be up to 80% cheaper.
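As a rough illustration of how the hourly rates above translate into experiment cost, the Python sketch below compares on-demand cost with the best-case spot cost. The rates are copied from the table. The `estimate_cost` helper and the flat 80% discount are assumptions for illustration only, since real spot pricing varies by availability and region.

```python
# Approximate on-demand rates (USD/hr) copied from the GPU table above.
HOURLY_RATE_USD = {
    "H200": 3.89,
    "H100": 2.49,
    "A100": 1.64,
    "A40": 0.79,
    "RTX 4090": 0.44,
}

# Best-case spot discount quoted above ("up to 80% cheaper"); real discounts vary.
MAX_SPOT_DISCOUNT = 0.80


def estimate_cost(gpu: str, hours: float, spot_discount: float = 0.0) -> float:
    """Return the estimated cost in USD of running one GPU for `hours`."""
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    return round(HOURLY_RATE_USD[gpu] * hours * (1.0 - spot_discount), 2)


if __name__ == "__main__":
    hours = 12  # hypothetical overnight run
    print("on-demand:     ", estimate_cost("H100", hours))
    print("best-case spot:", estimate_cost("H100", hours, MAX_SPOT_DISCOUNT))
```

Treat the spot figure as a lower bound: it assumes the maximum discount and no preemptions.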

Pod Configuration

Default Settings

# Set defaults for all new pods
hubify pod config --default-gpu h100
hubify pod config --default-region us-east
hubify pod config --idle-timeout 15m

Docker Images

Hubify provides pre-built images with common scientific packages:

Image	Contents
`hubify/base:latest`	Python 3.11, CUDA 12, PyTorch 2.1
`hubify/cosmo:latest`	Base + Cobaya, GetDist, Astropy, HEALPy
`hubify/ml:latest`	Base + Transformers, Accelerate, Datasets
`hubify/astro:latest`	Base + Astropy, Photutils, SEP, Source Extractor

hubify pod config --default-image hubify/cosmo:latest

SSH Access

# Add your SSH key
hubify pod ssh-key add --file ~/.ssh/id_ed25519.pub

# SSH into a running pod
hubify pod ssh pod-abc123

Performance Tips

Use a PyTorch DataLoader for GPU inference: num_workers=16, pin_memory=True, and prefetch_factor=4 can give up to a 32x speedup over serial processing
Pre-stage large datasets on persistent storage so pods do not re-download them at startup
Use spot instances for non-urgent experiments (pass the --spot flag)
Match the GPU to the workload: do not use an H200 for figure generation

# Run on a spot instance
hubify experiment run --name "overnight-chain" --pod h100 --spot
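The DataLoader settings from the first tip can be applied roughly as follows. This is a minimal sketch, assuming a PyTorch dataset whose items are plain tensors. The `build_loader` and `run_inference` helpers and the `batch_size=256` default are hypothetical and do not come from Hubify.

```python
# DataLoader settings quoted in the performance tips above.
LOADER_KWARGS = {
    "num_workers": 16,     # worker processes that load batches in parallel
    "pin_memory": True,    # page-locked host memory for faster host-to-GPU copies
    "prefetch_factor": 4,  # batches each worker loads ahead of time
}


def build_loader(dataset, batch_size=256):
    """Wrap `dataset` in a DataLoader using the settings above.

    `batch_size=256` is a placeholder, not a value from this guide.
    """
    # Imported lazily so the settings can be inspected without PyTorch installed.
    from torch.utils.data import DataLoader

    return DataLoader(dataset, batch_size=batch_size, shuffle=False, **LOADER_KWARGS)


def run_inference(model, loader, device="cuda"):
    """Run `model` over every batch in `loader` and return the outputs on the CPU."""
    import torch

    model = model.to(device).eval()
    outputs = []
    with torch.no_grad():
        for batch in loader:
            # non_blocking=True lets the copy overlap with compute when memory is pinned.
            batch = batch.to(device, non_blocking=True)
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)
```

The best `num_workers` value depends on how many CPU cores the pod exposes, so it is worth benchmarking a few values rather than assuming 16 is optimal everywhere.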

Cost Management

# Set monthly budget
hubify pod budget --monthly 500

# Set per-experiment cap
hubify pod budget --per-experiment 50

# View current spend
hubify pod budget --show

# Alert at 80% of budget
hubify pod budget --alert-threshold 0.8
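To make the threshold arithmetic concrete: with a $500 monthly budget and a threshold of 0.8, the alert level is $400. The sketch below classifies a spend figure against those two settings. The `budget_status` function is hypothetical, and how Hubify itself evaluates the threshold is an assumption here.

```python
def budget_status(spend: float, monthly_budget: float, alert_threshold: float) -> str:
    """Classify current spend as 'ok', 'alert', or 'over'."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    if spend >= monthly_budget:
        return "over"
    if spend >= monthly_budget * alert_threshold:
        return "alert"
    return "ok"


if __name__ == "__main__":
    # Budget and threshold taken from the commands above; spend values are made up.
    for spend in (255.0, 410.0, 520.0):
        print(spend, budget_status(spend, monthly_budget=500, alert_threshold=0.8))
```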

Persistent Storage

Upload datasets to RunPod persistent storage so they survive pod restarts:

# Upload a dataset
hubify pod storage upload ./planck_likelihood.tar.gz

# Mount in experiments
hubify experiment run --name "my-chain" --storage planck_likelihood.tar.gz

Troubleshooting

Pod stuck in provisioning

The requested GPU type may be sold out. Try a different GPU or region:

hubify pod list --available

Out of memory (OOM)

Upgrade to a GPU with more VRAM, or reduce the batch size. The H200 (141 GB) has the most VRAM of the available GPUs and handles the largest workloads.
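One way to choose the upgrade target is to pick the cheapest GPU whose VRAM covers the estimated memory need plus some headroom. The sketch below does that using the VRAM and price figures from the GPU table in this guide. The `cheapest_gpu_for` helper and the 1.2x headroom factor are assumptions for illustration.

```python
# (name, VRAM in GB, approx. USD/hr) copied from the GPU table in this guide.
GPUS = [
    ("H200", 141, 3.89),
    ("H100", 80, 2.49),
    ("A100", 80, 1.64),
    ("A40", 48, 0.79),
    ("RTX 4090", 24, 0.44),
]


def cheapest_gpu_for(required_vram_gb: float, headroom: float = 1.2):
    """Return the cheapest GPU whose VRAM covers the requirement plus headroom.

    Returns None when no listed GPU is large enough.
    """
    needed = required_vram_gb * headroom
    candidates = [gpu for gpu in GPUS if gpu[1] >= needed]
    if not candidates:
        return None
    return min(candidates, key=lambda gpu: gpu[2])[0]


if __name__ == "__main__":
    for need_gb in (18, 30, 60, 100, 130):
        print(need_gb, "GB ->", cheapest_gpu_for(need_gb))
```

Price is only a proxy here: the A100 and H100 both have 80 GB, so the sketch always prefers the A100 even where the H100 would finish sooner.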

Spot instance preempted

Spot instances can be reclaimed at any time. Checkpoint long experiments so they can resume from the latest checkpoint:

hubify experiment resume EXP-051 --from-checkpoint latest
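For resume to be useful, the experiment code has to write checkpoints as it runs. The sketch below shows one generic pattern: save state atomically at intervals, and load it on startup. The file name, the state layout, and the `run` loop are all hypothetical. Where Hubify expects checkpoints to be stored is not specified in this guide.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write `state` atomically so a preemption mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "total": 0.0}


def run(path: str, n_steps: int, checkpoint_every: int = 100) -> dict:
    """Toy loop that resumes from the last checkpoint and saves periodically."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for real work, e.g. one chain step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Writing to a temporary file and renaming it means a pod reclaimed mid-write leaves the previous checkpoint intact. The checkpoint file should live on persistent storage so it survives the pod.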
