Measure algorithm performance across CPU, CUDA, TPU, and Metal backends.
```bash
# Quick smoke test (TRG, small size, 1 trial)
python -m benchmarks.run --backend cpu --algorithm trg --size small --trials 1

# Full CPU baseline
python -m benchmarks.run --backend cpu -o benchmarks/results/cpu_baseline.json

# GPU comparison
python -m benchmarks.run --backend cuda -o benchmarks/results/cuda.json

# Show available backends
python -m benchmarks.run --list-backends
```
```bash
python -m benchmarks.run [OPTIONS]
```
| Flag | Description | Values |
|---|---|---|
| `-b, --backend` | Hardware backend | `cpu`, `cuda`, `tpu`, `metal` |
| `-a, --algorithm` | Algorithm(s) to benchmark | `dmrg`, `idmrg`, `trg`, `hotrg`, `ipeps`, `all` |
| `-s, --size` | Problem size(s) | `small`, `medium`, `large`, `all` |
| `-n, --trials` | Number of trials per config | integer (default varies) |
| `-o, --output` | Save results to JSON | file path |
| `--csv` | Save results to CSV | file path |
| `--list-backends` | Show available backends | — |
Each algorithm defines three problem sizes. Larger sizes stress-test bond dimension scaling and hardware throughput.
| Algorithm | Size | Parameters |
|---|---|---|
| DMRG | small | L=10, chi=20, 5 sweeps |
| DMRG | medium | L=20, chi=50, 10 sweeps |
| DMRG | large | L=40, chi=100, 10 sweeps |
| iDMRG | small | chi=16, 50 iterations |
| iDMRG | medium | chi=32, 100 iterations |
| iDMRG | large | chi=64, 200 iterations |
| TRG | small | chi=8, 16 steps |
| TRG | medium | chi=16, 20 steps |
| TRG | large | chi=32, 24 steps |
| HOTRG | small | chi=8, 12 steps |
| HOTRG | medium | chi=16, 16 steps |
| HOTRG | large | chi=32, 20 steps |
| iPEPS | small | D=2, chi=8, 100 SU steps |
| iPEPS | medium | D=3, chi=16, 200 SU steps |
| iPEPS | large | D=4, chi=24, 300 SU steps |
| Backend | Flag | Requirements |
|---|---|---|
| CPU | `cpu` | Default; works everywhere |
| NVIDIA GPU | `cuda` | `pip install tenax-tn[cuda13]` or `tenax-tn[cuda12]` |
| Google Cloud TPU | `tpu` | `pip install tenax-tn[tpu]` |
| Apple Silicon GPU | `metal` | `pip install tenax-tn[metal]` (macOS, experimental) |
Check what’s available on your machine:
```bash
python -m benchmarks.run --list-backends
```
Full structured output with timings, parameters, and device info:
```bash
python -m benchmarks.run -b cpu -a dmrg -s medium -n 3 -o results.json
```
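The saved JSON can then be post-processed in Python. The exact schema is not documented here, so the record fields below (`results`, `algorithm`, `size`, `trial_times_s`) are assumptions; adjust the keys to match your file. A minimal sketch:

```python
import json
import statistics

def mean_times(payload: str) -> dict:
    """Return {'algo/size': mean seconds} from a results JSON string.

    Assumes a hypothetical schema with a top-level "results" list whose
    entries carry "algorithm", "size", and per-trial "trial_times_s".
    """
    data = json.loads(payload)
    return {
        f'{r["algorithm"]}/{r["size"]}': statistics.mean(r["trial_times_s"])
        for r in data["results"]
    }

# Stand-in for the contents of results.json; real numbers will differ.
sample = '{"results": [{"algorithm": "dmrg", "size": "medium", "trial_times_s": [1.92, 1.88, 1.90]}]}'
print(mean_times(sample))
```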
Flat table for analysis in pandas, Excel, or plotting tools:
```bash
python -m benchmarks.run -b cpu -a all -s all --csv results.csv
```
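The flat CSV can be aggregated with nothing but the standard library. The column names used here (`algorithm`, `size`, `trial`, `time_s`) are hypothetical placeholders for whatever header `--csv` actually writes:

```python
import csv
import io
from collections import defaultdict

# Stand-in for results.csv; the real header and values will differ.
sample_csv = """algorithm,size,trial,time_s
trg,small,1,0.41
trg,small,2,0.39
trg,medium,1,2.10
"""

# Group per-trial timings by (algorithm, size).
times = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    times[(row["algorithm"], row["size"])].append(float(row["time_s"]))

# Report the best (minimum) time per configuration.
for (algo, size), ts in sorted(times.items()):
    print(f"{algo}/{size}: min {min(ts):.2f} s over {len(ts)} trials")
```

The same file loads directly into pandas via `pd.read_csv("results.csv")` if you prefer dataframes.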
Run the same configuration on both backends to compare:

```bash
python -m benchmarks.run -b cpu -a dmrg -s medium -n 3 -o cpu.json
python -m benchmarks.run -b cuda -a dmrg -s medium -n 3 -o gpu.json
```
GPU typically helps when chi >= 64. For smaller bond dimensions, CPU may be faster due to transfer overhead.
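Given the two JSON files, a per-configuration speedup is just the ratio of mean CPU time to mean GPU time. The payloads and field names below are invented for illustration (the real `cpu.json`/`gpu.json` schema may differ):

```python
import json

def mean(xs):
    return sum(xs) / len(xs)

# Stand-ins for cpu.json and gpu.json; real timings will differ.
cpu = json.loads('{"results": [{"algorithm": "dmrg", "size": "medium", "trial_times_s": [3.2, 3.1, 3.3]}]}')
gpu = json.loads('{"results": [{"algorithm": "dmrg", "size": "medium", "trial_times_s": [1.6, 1.5, 1.7]}]}')

# Index mean times by (algorithm, size) so matching configs line up.
cpu_t = {(r["algorithm"], r["size"]): mean(r["trial_times_s"]) for r in cpu["results"]}
gpu_t = {(r["algorithm"], r["size"]): mean(r["trial_times_s"]) for r in gpu["results"]}

for key in sorted(cpu_t.keys() & gpu_t.keys()):
    # Speedup > 1 means the GPU run was faster for that configuration.
    print(f"{key}: {cpu_t[key] / gpu_t[key]:.2f}x speedup")
```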
Fix the backend and vary size to study computational scaling:
```bash
python -m benchmarks.run -b cpu -a dmrg -s small medium large -n 5 --csv dmrg_scaling.csv
```
Use `--trials 5` or more for stable numbers.
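One way to read the resulting data is to fit a power law, time ~ chi^alpha, with a least-squares slope in log-log space. The chi values below come from the DMRG size table above; the timings are invented placeholders standing in for your `dmrg_scaling.csv` measurements:

```python
import math

# size label -> (chi, mean seconds); timings here are made up.
points = {"small": (20, 0.12), "medium": (50, 1.4), "large": (100, 10.5)}

# Least-squares slope of log(time) vs log(chi) estimates the exponent alpha.
xs = [math.log(chi) for chi, _ in points.values()]
ys = [math.log(t) for _, t in points.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(f"estimated scaling exponent: {alpha:.2f}")
```

For DMRG you would expect an exponent near 3, since the dominant contractions cost O(chi^3).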