How to actually benchmark a server: our standard playbook

Every review on this site uses the same benchmark suite. We get asked what’s in it often enough that here’s the whole playbook.

CPU

sysbench cpu --threads=N --time=120 run, where N is the box’s vCPU count. We report events-per-second and the standard deviation across five runs.
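
A minimal sketch of how the five-run summary could be scripted. It assumes sysbench 1.x, which prints an "events per second:" line, and it uses the sample standard deviation; both are assumptions for illustration, and the helper names are hypothetical rather than our actual harness.

```python
import re
import statistics
import subprocess

# Assumption: sysbench 1.x text output contains "events per second:  <number>".
EVENTS_RE = re.compile(r"events per second:\s*([0-9.]+)")


def sysbench_cmd(vcpus: int, seconds: int = 120) -> list[str]:
    """Build the sysbench invocation for a box with `vcpus` vCPUs."""
    return ["sysbench", "cpu", f"--threads={vcpus}", f"--time={seconds}", "run"]


def parse_events_per_second(output: str) -> float:
    """Extract the events-per-second figure from one run's text output."""
    match = EVENTS_RE.search(output)
    if match is None:
        raise ValueError("no 'events per second' line found")
    return float(match.group(1))


def run_once(vcpus: int) -> float:
    """Run sysbench once and return its events-per-second figure."""
    result = subprocess.run(
        sysbench_cmd(vcpus), capture_output=True, text=True, check=True
    )
    return parse_events_per_second(result.stdout)


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) across runs."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Five runs on a 4-vCPU box; requires sysbench on PATH.
    runs = [run_once(4) for _ in range(5)]
    print(summarize(runs))
```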

Disk

fio with two profiles: 4k random read at QD32, and 1M sequential write. Both run against the boot volume as the provider mounts it by default, and we record the filesystem mount options because nobody else seems to.
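
A rough sketch of the two profiles as fio command lines, plus a helper that reads mount options from /proc/mounts-formatted text. Only the block size, access pattern and QD32 come from the playbook. The I/O engine, direct I/O, file size, runtime, the sequential-write queue depth and the target path are assumptions picked for illustration.

```python
def fio_cmd(name: str, rw: str, bs: str, iodepth: int, target: str) -> list[str]:
    """Build one fio invocation that writes its results as JSON."""
    return [
        "fio",
        f"--name={name}",
        f"--filename={target}",
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        "--ioengine=libaio",  # assumption: Linux async I/O engine
        "--direct=1",         # assumption: bypass the page cache
        "--size=4G",          # assumption: test file size
        "--runtime=60",       # assumption: seconds per profile
        "--time_based",
        "--output-format=json",
    ]


# Hypothetical test file on the boot volume.
TARGET = "/var/tmp/fio.test"

PROFILES = {
    "randread-4k-qd32": fio_cmd("randread", "randread", "4k", 32, TARGET),
    # Assumption: queue depth 1 for the sequential write profile.
    "seqwrite-1m": fio_cmd("seqwrite", "write", "1M", 1, TARGET),
}


def mount_options(mountpoint: str, mounts_text: str) -> list[str]:
    """Return the options for `mountpoint` from /proc/mounts-formatted text.

    Each line is: device mountpoint fstype options dump pass.
    The last matching line wins, since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")
    if found is None:
        raise KeyError(mountpoint)
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        print(mount_options("/", fh.read()))
    for name, cmd in PROFILES.items():
        print(name, " ".join(cmd))
```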

Network

iperf3 to a fixed Hetzner endpoint in FSN1, three samples in each direction. We use the same endpoint for every provider so the numbers are comparable.
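
One way the sampling could be scripted, as a sketch rather than our exact tooling. The hostname is a placeholder, the 10-second test length is an assumption, and the parser assumes a TCP test whose JSON report carries end.sum_received.bits_per_second.

```python
import json
import statistics
import subprocess

# Placeholder hostname, not the real endpoint.
ENDPOINT = "iperf.example.invalid"


def iperf3_cmd(host: str, reverse: bool = False, seconds: int = 10) -> list[str]:
    """Build an iperf3 client invocation with JSON output.

    reverse=True adds -R, so the server sends and the client receives.
    """
    cmd = ["iperf3", "-c", host, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")
    return cmd


def received_mbps(report_json: str) -> float:
    """Return receiver-side throughput in Mbit/s from one JSON report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def sample(host: str, reverse: bool, count: int = 3) -> list[float]:
    """Run `count` tests in one direction and return each throughput."""
    results = []
    for _ in range(count):
        proc = subprocess.run(
            iperf3_cmd(host, reverse), capture_output=True, text=True, check=True
        )
        results.append(received_mbps(proc.stdout))
    return results


if __name__ == "__main__":
    # Requires iperf3 on PATH and a reachable server.
    for label, reverse in (("upload", False), ("download", True)):
        runs = sample(ENDPOINT, reverse)
        print(label, runs, statistics.mean(runs))
```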

Uptime

A 7-day external watch with 1-minute granularity. We report any minute the box failed to respond, even if it was the network’s fault.
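
A small sketch of how a week of one-minute probe results could be turned into that report. The input shape (one boolean per minute) and the names are hypothetical, and the uptime percentage is an extra convenience figure, not something the playbook says we publish.

```python
from dataclasses import dataclass

# 7 days at 1-minute granularity.
MINUTES_PER_WEEK = 7 * 24 * 60


@dataclass
class WatchReport:
    total_minutes: int
    failed_minutes: list[int]  # zero-based minute offsets from the start

    @property
    def uptime_percent(self) -> float:
        """Share of minutes with a response, as a percentage."""
        ok = self.total_minutes - len(self.failed_minutes)
        return 100.0 * ok / self.total_minutes


def summarize_watch(responded: list[bool]) -> WatchReport:
    """Collect every minute with no response, whatever the cause."""
    if not responded:
        raise ValueError("no probe results")
    failed = [minute for minute, ok in enumerate(responded) if not ok]
    return WatchReport(total_minutes=len(responded), failed_minutes=failed)


if __name__ == "__main__":
    # Synthetic example data: a full week with two missed minutes.
    probes = [True] * MINUTES_PER_WEEK
    probes[5] = False
    probes[100] = False
    report = summarize_watch(probes)
    print(report.failed_minutes, round(report.uptime_percent, 3))
```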

GPU (when applicable)

Llama-3 70B 4-bit inference at batch size 1, SDXL 1024² with default settings, and a fixed fine-tune of a 7B model. Same prompt, same seed, same image set every time.

That’s it. No magic. Reproducibility beats sophistication.
