Self-hosting Llama 3 70B: cheapest providers per million tokens
Eight providers, the same prompt distribution, ranked by cost per million output tokens.
Tags: guide · gpu · llama · inference
If you’re running Llama 3 70B yourself rather than paying a managed API, the question is which provider gets you the lowest cost per million tokens. We measured.
The setup
A single H100 80GB where the provider offers one (alternate hardware noted in the ranking), vLLM, batch size tuned per host (shared hosts vary in contention), and 4-bit AWQ quantisation. The prompt mix was derived from a real production traffic sample: ~512 input tokens and ~256 output tokens on average.
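For concreteness, here's a minimal sketch of that configuration using vLLM's offline `LLM` entry point. The checkpoint name is a placeholder (use whichever 4-bit AWQ Llama 3 70B weights you trust), and `max_num_seqs` is the knob we mean by "batch size tuned per host":

```python
from vllm import LLM, SamplingParams

# Sketch of the benchmark configuration. The model ID is a placeholder,
# not the exact checkpoint we benchmarked.
llm = LLM(
    model="your-org/llama-3-70b-instruct-awq",  # placeholder AWQ checkpoint
    quantization="awq",
    dtype="float16",
    max_model_len=4096,           # comfortably fits the ~512-in / ~256-out mix
    gpu_memory_utilization=0.92,  # fraction of the 80GB vLLM may claim for weights + KV cache
    max_num_seqs=64,              # "batch size tuned per host": raise or lower per machine
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain KV caching in two sentences."], params)
print(outputs[0].outputs[0].text)
```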
The ranking
All figures are cost per million output tokens; a sketch of how an hourly price converts into that number follows the list.
- Runpod Community Cloud H100: $0.41 / 1M output tokens
- Vast.ai 3× RTX 4090 (tensor parallel): $0.49
- Hetzner GPU box (RTX 6000 Ada): $0.62
- Lambda H100 on-demand: $0.71
- Modal Serverless H100: $0.84
- Paperspace A100 80GB: $1.04
- AWS p5 H100 on-demand: $1.83
- Replicate (managed): $2.40+ depending on quota
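Each figure above comes from a single conversion: effective hourly price divided by sustained output throughput. A sketch with illustrative inputs (these are not the measured numbers behind the table, though they happen to land near the Runpod figure):

```python
def usd_per_million_output_tokens(usd_per_hour: float, output_tok_per_s: float) -> float:
    """Convert an hourly GPU price and sustained output throughput
    into cost per million output tokens."""
    tokens_per_hour = output_tok_per_s * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

# Illustrative: a $2.79/hr GPU sustaining ~1,900 output tok/s across the batch
print(f"${usd_per_million_output_tokens(2.79, 1900):.2f} per 1M output tokens")  # ~$0.41
```

Input tokens cost compute too, but since the prompt mix is fixed (~512 in / ~256 out), folding everything into cost per output token keeps the comparison fair across providers.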
The takeaway: rolling your own on Runpod is roughly 6× cheaper than the cheapest managed option here ($0.41 vs $2.40+) at meaningful volume. Below ~50M tokens/month it’s probably not worth your engineering time; above that, it’s an obvious move.
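To see where that ~50M tokens/month threshold comes from, treat break-even as the volume at which gross savings cover your self-hosting overhead. The overhead figure below is an assumption, not something we measured:

```python
RUNPOD = 0.41     # $/1M output tokens, from the ranking above
MANAGED = 2.40    # $/1M output tokens, the cheapest managed floor above
OVERHEAD = 100.0  # assumed $/month of amortised ops/engineering time

for millions in (10, 50, 100, 500):
    net = (MANAGED - RUNPOD) * millions - OVERHEAD
    print(f"{millions:>4}M tok/mo: net ${net:,.2f}/mo")
```

At these prices, ~50M tokens/month corresponds to roughly $100/month of amortised overhead; if you value the engineering time higher, the break-even volume scales up proportionally.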