Cold start times: Runpod Serverless vs Modal vs Replicate
Three serverless GPU platforms, 1,000 cold-start invocations each, the same Llama-3 8B container.
- gpu
- comparison
- serverless
- runpod
- modal
- replicate
Cold start is the difference between “we can run inference serverlessly” and “we can’t.” We benchmarked the same container — quantised Llama-3 8B, 4.2GB image — across three serverless GPU platforms.
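For reference, the measurement loop looked roughly like the sketch below. It is a minimal sketch, not our exact harness: the endpoint URL and payload are placeholders, and the idle wait assumes you sleep past each provider's scale-to-zero timeout so every timed request hits a cold container.

```python
import statistics
import time

import requests  # third-party: pip install requests

# Placeholder invoke URL; each platform exposes its own.
ENDPOINT = "https://example.invalid/llama3-8b/infer"
IDLE_WAIT_S = 900      # assumed: longer than the provider's scale-to-zero window
N_INVOCATIONS = 1000

def cold_start_once() -> float:
    """Let the deployment scale to zero, then time a single request."""
    time.sleep(IDLE_WAIT_S)  # ensure the next request lands on a cold container
    t0 = time.perf_counter()
    resp = requests.post(ENDPOINT, json={"prompt": "ping"}, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - t0

samples = [cold_start_once() for _ in range(N_INVOCATIONS)]

# statistics.quantiles with n=100 returns the 1st..99th percentile cut points.
pcts = statistics.quantiles(samples, n=100)
print(f"p50 {pcts[49]:.1f}s  p99 {pcts[98]:.1f}s")
```

Note that a loop like this times the full round trip, network included, so it slightly overstates pure container boot for all three platforms equally. That is the latency a caller actually sees, which is the number we care about.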
The numbers
- Runpod Serverless: p50 2.4s, p99 2.8s
- Modal: p50 3.1s, p99 4.6s
- Replicate: p50 7.8s, p99 22.4s (yes, really)
Why the spread
Runpod warms the host's image cache aggressively. Modal does too, but its scheduler is more willing to evict idle containers. Replicate optimises for cost over latency: it does ship a tier that matches the others on cold start, it just isn't the default.
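If you're stuck on a platform that evicts aggressively, the blunt workaround is a keep-warm pinger: a small scheduled job that invokes the endpoint just inside the idle window so there is always a warm container. A minimal sketch, assuming a 5-minute scale-to-zero window; the URL is a placeholder, and remember you pay for every warm second this buys you.

```python
import time

import requests

ENDPOINT = "https://example.invalid/llama3-8b/infer"  # placeholder
IDLE_TIMEOUT_S = 300   # assumed: the provider's scale-to-zero window
MARGIN_S = 30          # ping comfortably before the window closes

while True:
    # A trivial request resets the idle clock and keeps one container warm.
    try:
        requests.post(ENDPOINT, json={"prompt": "ping"}, timeout=60)
    except requests.RequestException:
        pass  # a failed ping just means the next real request may be cold
    time.sleep(IDLE_TIMEOUT_S - MARGIN_S)
```

Most platforms expose a first-class version of this (minimum warm workers or keep-warm settings), which is usually cheaper and cleaner than pinging from the outside; check each provider's docs before reaching for the hack.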
If cold start is your headline metric, Runpod Serverless is the pick. If you want Python idioms baked into the developer experience, Modal is worth the extra second.