
Self-hosting Llama 3 70B: cheapest providers per million tokens

Eight providers, the same prompt distribution, ranked by cost per million output tokens.

Tobias · 10 min read
  • guide
  • gpu
  • llama
  • inference

If you’re running Llama 3 70B yourself rather than paying a managed API, the question is which provider gets you the lowest cost per million tokens. We measured.

The setup

vLLM with 4-bit AWQ quantisation, targeting a single H100 80GB where the provider offers one (the ranking notes the exceptions), with batch size tuned per host, since shared hosts vary. The prompt mix is derived from a real production traffic sample: ~512 input tokens and ~256 output tokens per request on average.

The ranking

  1. Runpod Community Cloud H100: $0.41 / 1M output tokens
  2. Vast.ai 4090 (3 in tensor-parallel): $0.49
  3. Hetzner GPU box (RTX 6000 Ada): $0.62
  4. Lambda H100 on-demand: $0.71
  5. Modal Serverless H100: $0.84
  6. Paperspace A100 80GB: $1.04
  7. AWS p5 H100 on-demand: $1.83
  8. Replicate (managed): $2.40+ depending on quota
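Since GPU providers bill by the hour, each figure above comes down to one conversion: hourly price divided by output-token throughput. A minimal sketch, where the $2.50/hr price and 1,700 output tokens/sec are hypothetical inputs, not our measured numbers:

```python
def cost_per_million_output(hourly_price_usd: float, output_tok_per_s: float) -> float:
    """Convert an hourly GPU price into $ per 1M output tokens."""
    tokens_per_hour = output_tok_per_s * 3600
    return hourly_price_usd / tokens_per_hour * 1e6

# Hypothetical example: $2.50/hr instance sustaining 1,700 output tok/s
print(round(cost_per_million_output(2.50, 1700), 2))  # → 0.41
```

The throughput term is what makes batch-size tuning matter: doubling sustained tokens/sec on the same instance halves the cost per million.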

The takeaway: rolling your own on Runpod is roughly 5–6× cheaper than the managed options at meaningful volume. Below ~50M tokens/month it’s probably not worth your engineering time. Above that, it’s an obvious move.
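The ~50M figure falls out of a simple break-even: savings per million tokens versus the ongoing engineering overhead. A sketch using the Replicate and Runpod rates from the ranking, where the $100/month overhead (roughly one engineer-hour) is an assumption, not something we measured:

```python
def break_even_tokens_per_month(api_rate: float, self_rate: float,
                                monthly_overhead_usd: float) -> float:
    """Tokens/month at which self-hosting savings cover the engineering overhead.

    api_rate, self_rate: $ per 1M output tokens.
    """
    savings_per_million = api_rate - self_rate
    return monthly_overhead_usd / savings_per_million * 1e6

# Replicate at $2.40 vs Runpod at $0.41, assumed $100/month of upkeep
print(break_even_tokens_per_month(2.40, 0.41, 100.0))  # ≈ 50M tokens/month
```

Budget more upkeep time and the break-even point scales linearly, which is why the threshold is a rough order of magnitude rather than a hard line.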