Nvidia L40S: Does Vultr, Runpod, or Lambda Labs justify the cost?
We rented L40S instances from three providers for a week to see where your inference dollars go farthest.
- gpu
- comparison
- l40s
- runpod
- vultr
- lambda
The Nvidia L40S GPU, launched in 2023 with little fanfare, has been pitched as a versatile workhorse: not quite an H100 for heavy training, not quite an RTX 4090 for desktop gaming, but a strong contender for inference, VDI, and moderate training workloads in the datacenter. With 48GB of GDDR6 memory, 18,176 CUDA cores, and a 350W TDP, it’s a significant piece of silicon. We’ve seen it pop up across various providers, often sitting in the awkward middle ground between budget-friendly consumer cards and top-tier A100/H100 offerings. The question, as always, is whether the price reflects its utility.
To find out, we rented L40S instances from Vultr, Runpod, and Lambda Labs. Our objective was simple: put them through a consistent set of benchmarks for a week, compare the raw performance, and, more importantly, figure out the actual cost-per-unit-of-work. Because, ultimately, that’s what matters when the monthly bill lands.
Our Testing Methodology
For consistency, we followed our standard benchmarking playbook, which readers can review at /blog/benchmark-playbook/. It involves a series of synthetic and real-world workloads designed to stress different aspects of the GPU: memory bandwidth, FP16/FP32 compute, and inference latency. Each instance was rented for a full seven days to account for provider-specific quirks and performance variability over time. We focused on three key tasks:
- Stable Diffusion XL 1.0 Inference: Generating 1024x1024 images with the refiner at 50 steps, measuring iterations per second (it/s).
- Llama 3 70B Inference: Running a quantized (4-bit) version of Llama 3 70B, generating 1024 tokens, measuring tokens per second (tokens/s).
- PyTorch ResNet-50 Training: A basic training loop on ImageNet (synthetic data), measuring samples per second (samples/s) for FP16 training.
All environments were set up with identical Docker containers and CUDA versions to minimize environmental discrepancies. Our goal was to isolate the hardware and provider overhead as much as possible.
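If you want to sanity-check our SDXL numbers on your own instance, the sketch below shows roughly what the timing loop in our harness looks like. It’s a simplified stand-in, not the playbook itself: it skips the refiner stage for brevity, and the model ID and prompt are only illustrative.

```python
# Minimal SDXL throughput probe (a simplified stand-in for our playbook's
# harness). Assumes a CUDA GPU and
# `pip install torch diffusers transformers accelerate`.
import time

import torch
from diffusers import StableDiffusionXLPipeline

STEPS = 50  # matches the 50-step setting from our runs

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a lighthouse at dusk"  # illustrative prompt

# Warm-up pass so one-time CUDA initialization doesn't pollute the timing.
pipe(prompt=prompt, num_inference_steps=STEPS, height=1024, width=1024)

torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt=prompt, num_inference_steps=STEPS, height=1024, width=1024)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"{STEPS / elapsed:.2f} it/s")  # denoising iterations per second
```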
Vultr’s Offering: Steady, Predictable
Vultr has been steadily expanding its GPU offerings, and the L40S is a relatively recent addition. We provisioned an instance at an advertised rate of $1.95 per hour. The setup process was straightforward, typical of any general-purpose cloud provider. The instance spun up within minutes, and we had our Docker environment running without issue.
Performance on Vultr’s L40S was consistent:
- SDXL 1.0 Inference: We observed an average of 4.2 iterations per second (it/s).
- Llama 3 70B Inference (4-bit): The model processed approximately 65 tokens per second.
- ResNet-50 Training (FP16): It managed about 120 samples per second.
Vultr’s egress policies are fairly generous, which can be a relief for those moving large datasets, though we weren’t stressing that aspect much in this particular test. The overall experience was solid, if unremarkable. No surprises, which, for some, is a feature.
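For anyone wanting to reproduce a tokens-per-second figure like the one above, here’s a minimal sketch of the measurement. Our playbook’s harness is more elaborate; the model ID here is the gated Llama 3 70B Instruct repo on Hugging Face (you’ll need to have requested access to the weights), and the prompt is illustrative.

```python
# Rough tokens/s probe for a 4-bit quantized model (illustrative sketch).
# Assumes `pip install torch transformers bitsandbytes accelerate` and
# approved access to the gated Llama 3 weights on Hugging Face.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"  # gated repo

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=bnb, device_map="auto"
)

inputs = tok("Explain GDDR6 memory in one paragraph.",
             return_tensors="pt").to(model.device)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, not the prompt.
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```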
Runpod’s L40S: The Performance Contender
Runpod, a platform we’ve examined in depth before (see our general impressions in /blog/runpod-review/), typically offers competitive pricing, especially for raw GPU power. We opted for an L40S instance on their Secure Cloud, priced at $1.70 per hour. Provisioning was quick, as expected, leveraging their Docker-first approach to instance creation. It’s a system that works well if you’re comfortable with containers, which, frankly, most GPU renters should be by now. If you’re looking for flexible GPU rentals, their platform is often worth a look: https://runpod.io/?ref=8vbo5oc9.
Runpod’s L40S instances consistently edged out Vultr in our benchmarks:
- SDXL 1.0 Inference: It achieved an average of 4.5 iterations per second (it/s).
- Llama 3 70B Inference (4-bit): We saw it push out 70 tokens per second.
- ResNet-50 Training (FP16): The instance processed around 128 samples per second.
The slight performance bump, combined with the lower hourly rate, immediately made Runpod an attractive option for purely cost-conscious workloads. We didn’t encounter any stability issues or unexpected throttling during our week-long test.
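The training number is just as easy to approximate on your own rental. Below is a simplified sketch of a synthetic-data ResNet-50 probe in the spirit of our playbook’s test; the batch size and iteration count are illustrative choices, not the playbook’s exact settings.

```python
# Synthetic-data ResNet-50 FP16 training probe (simplified sketch).
# Assumes a CUDA GPU and `pip install torch torchvision`.
import time

import torch
import torchvision

BATCH, ITERS = 64, 50  # illustrative settings

model = torchvision.models.resnet50().cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
loss_fn = torch.nn.CrossEntropyLoss()

# Synthetic ImageNet-shaped batch: 224x224 RGB images, 1000 classes.
x = torch.randn(BATCH, 3, 224, 224, device="cuda")
y = torch.randint(0, 1000, (BATCH,), device="cuda")

def step():
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():  # FP16 forward/backward
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()

step()  # warm-up iteration before timing
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(ITERS):
    step()
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(f"{BATCH * ITERS / elapsed:.0f} samples/s")
```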
Lambda Labs: The Premium, Predictable Choice
Lambda Labs positions itself as a more enterprise-focused provider, often catering to ML teams looking for reliability and dedicated resources. Our experience with them has often been positive, albeit sometimes involving a wait for popular hardware, as noted in our /blog/lambda-labs-review/. An L40S instance from Lambda cost us $2.20 per hour. Provisioning was not instantaneous; we experienced a short queue, but once the instance was available, it was rock-solid.
Lambda Labs’ L40S performance was right in line with expectations:
- SDXL 1.0 Inference: It delivered an average of 4.4 iterations per second (it/s).
- Llama 3 70B Inference (4-bit): The throughput was 68 tokens per second.
- ResNet-50 Training (FP16): It managed approximately 125 samples per second.
The numbers are competitive, falling between Vultr and Runpod for raw speed. The premium price point for Lambda often comes with a higher level of support, clearer billing, and a generally more polished, if sometimes less flexible, platform experience. For teams that prioritize these aspects over pinching every penny, it’s a valid trade-off.
The Numbers: A Direct Comparison
Let’s put the raw data into perspective:
| Provider | Hourly Price | SDXL (it/s) | Llama 3 70B (tok/s) | ResNet-50 (samples/s) | $/hr per SDXL it/s | $/hr per Llama tok/s |
|---|---|---|---|---|---|---|
| Vultr | $1.95 | 4.2 | 65 | 120 | $0.46 | $0.030 |
| Runpod | $1.70 | 4.5 | 70 | 128 | $0.38 | $0.024 |
| Lambda Labs | $2.20 | 4.4 | 68 | 125 | $0.50 | $0.032 |
From a purely performance-per-dollar perspective, Runpod clearly takes the lead across all our benchmarks. Its lower hourly rate combined with marginally better throughput translates into a significant saving for high-volume workloads. For instance, the cost per SDXL iteration on Runpod is roughly 18% lower than on Vultr and about 24% lower than on Lambda Labs (put the other way, Lambda’s cost per iteration runs about 32% higher than Runpod’s). The Llama 3 inference figures show a similar trend.
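The derived cost columns are nothing more exotic than hourly price divided by measured throughput. Here’s the arithmetic in a few lines of Python, with a 24/7 monthly projection (720 hours, roughly a 30-day month) tacked on for scale:

```python
# Reproduce the table's cost-efficiency math from the raw numbers.
providers = {
    # name: (hourly $, SDXL it/s, Llama tokens/s)
    "Vultr":       (1.95, 4.2, 65),
    "Runpod":      (1.70, 4.5, 70),
    "Lambda Labs": (2.20, 4.4, 68),
}

for name, (price, sdxl, llama) in providers.items():
    print(f"{name:12s}  ${price / sdxl:.2f}/hr per SDXL it/s"
          f"  ${price / llama:.3f}/hr per Llama tok/s"
          f"  ${price * 720:,.0f}/mo if run 24/7")  # 720 hrs per month
```

At full 24/7 utilization, that works out to roughly $1,404, $1,224, and $1,584 per month for Vultr, Runpod, and Lambda Labs respectively.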
Vultr holds its own as a respectable second place, offering solid performance at a reasonable price. It’s a good general-purpose cloud GPU option, especially if you’re already integrated into their ecosystem or prefer a less specialized interface.
Lambda Labs, while not winning on raw cost efficiency for these specific benchmarks, still offers a compelling package for those who value predictable, stable infrastructure and a more hands-off management experience. Their pricing often reflects a different value proposition—one of reliability and enterprise-grade service, which for some production environments, is money well spent.
Final Thoughts on the L40S Landscape
The Nvidia L40S occupies a peculiar niche. It’s too expensive for casual experimentation when RTX 4090s are available, and not quite powerful enough to displace H100s for cutting-edge training. However, for continuous inference workloads, VDI, or even some mid-range training where the 48GB VRAM is beneficial, it presents a compelling case. Our testing indicates that for raw, unadulterated performance-per-dollar, Runpod’s L40S instances are difficult to beat. If your workload can tolerate a container-centric workflow and you’re optimizing for cost, they should be high on your list. Vultr offers a perfectly acceptable, consistent experience that won’t break the bank, while Lambda Labs remains the choice for those who prefer predictability and a premium service, even if it comes with a higher hourly rate. The L40S is a decent card, but as always, the provider makes all the difference to your bottom line.