
Cold start times: Runpod Serverless vs Modal vs Replicate

Three serverless GPU platforms, 1,000 cold-start invocations each, the same Llama-3 8B container.

Tobias · 8 min read
  • gpu
  • comparison
  • serverless
  • runpod
  • modal
  • replicate

Cold start is the difference between “we can run inference serverlessly” and “we can’t.” We benchmarked the same container — quantised Llama-3 8B, 4.2GB image — across three serverless GPU platforms.
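The harness is straightforward: hit each endpoint after it has scaled to zero, time the round trip, and repeat. Here's a minimal sketch of the measurement loop; the endpoint URL and the scale-to-zero wait are placeholders, and the real runs obviously added auth headers and retries per platform.

```python
import time
import statistics
import urllib.request

def cold_start_latency(url: str) -> float:
    """Time one request against an endpoint assumed to have scaled to zero,
    so the measured latency is dominated by cold start."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.monotonic() - start

def summarize(samples: list[float]) -> dict[str, float]:
    """Compute the p50/p99 figures quoted below."""
    qs = statistics.quantiles(samples, n=100)  # 99 cut points
    return {"p50": qs[49], "p99": qs[98]}

if __name__ == "__main__":
    # Hypothetical endpoint; COOLDOWN must exceed the platform's
    # scale-to-zero window or you're measuring warm starts.
    ENDPOINT = "https://example.invalid/llama3-8b/infer"
    COOLDOWN = 600  # seconds
    samples = []
    for _ in range(1000):
        samples.append(cold_start_latency(ENDPOINT))
        time.sleep(COOLDOWN)
    print(summarize(samples))
```

The sleep between invocations is what makes every sample a cold start; skip it and the percentiles collapse to warm-path latency.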

The numbers

  • Runpod Serverless: p50 2.4s, p99 2.8s
  • Modal: p50 3.1s, p99 4.6s
  • Replicate: p50 7.8s, p99 22.4s (yes, really)

Why the spread

Runpod warms the host's image cache aggressively. Modal does too, but its scheduler is more willing to evict. Replicate optimises for cost over latency by default: it does offer a tier whose cold starts match the others, but you have to opt in.

If cold start is your headline metric, Runpod Serverless is the pick. If you want a Python-native SDK with deployment idioms baked in, Modal is worth the extra second.