# nanodrr
A performance-oriented reimplementation of DiffDRR with the following improvements:

- Optimized, pure PyTorch implementation (~5× faster than `DiffDRR` at baseline)
- Modular design (freely swap subjects, extrinsics, and intrinsics during rendering)
- Compatibility with `torch.compile` and mixed precision
- Extensive type hints with `jaxtyping`
- Standard Python package structure managed with `uv`
All changes to DiffDRR are summarized here.
## Installation
> **PyTorch version:** On `pytorch<2.9`, `torch.compile` with `bfloat16` is slower than eager mode due to a CUDA graph capture issue (see Benchmarks). Use `pytorch>=2.9` (`triton>=3.5`) for best results.
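If you want to guard against the slow path programmatically, a minimal version gate can be written as below. The helper name and the bare tuple comparison are a sketch of my own (in real code, a library such as `packaging` handles pre-release and local version segments more robustly); the `(2, 9)` threshold comes from the note above.

```python
def compile_bf16_is_fast(torch_version: str) -> bool:
    """Heuristic sketch: torch.compile + bfloat16 avoids the CUDA graph
    capture slowdown only on pytorch>=2.9 (triton>=3.5).

    Accepts version strings like "2.9.0" or "2.8.1+cu118".
    """
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (2, 9)
```

In practice you would feed this `torch.__version__` and fall back to eager rendering when it returns `False`.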
To install only the renderer:

To additionally install the optional 3D visualization module:
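The install commands were not captured above. As a sketch, assuming the package is published under the name `nanodrr` and exposes the visualization module as an extra (the extra name below is hypothetical), installation with `uv` would look like:

```shell
# Core renderer only (package name assumed to match the project name)
uv add nanodrr

# With the optional 3D visualization module (extra name hypothetical)
uv add "nanodrr[viz]"
```

Plain `pip install nanodrr` should work equivalently outside a `uv`-managed project.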
## Benchmarks
**Highlights**

- ~5× faster than `DiffDRR` out of the box, without compilation (946 FPS vs 213 FPS)
- ~8× faster with `torch.compile` and `bfloat16` on `pytorch>=2.9` (1,650 FPS vs 213 FPS)
- ~2.5× less memory than `DiffDRR` (516 MB vs 1,344 MB peak reserved with `bfloat16` + compile)

Mean ± standard deviation of 10 runs, 100 loops each.
Benchmarked by rendering 200×200 DRRs on an NVIDIA RTX 6000 Ada (48 GB) with Python 3.12. *Compile* denotes `torch.compile(mode="reduce-overhead", fullgraph=True)`. The full experiment lives at `tests/benchmark/`.
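The timing protocol above (10 runs of 100 loops, reported as mean ± standard deviation, converted to FPS) can be sketched with the standard library's `timeit`. The `measure_fps` helper and the assumption of one rendered frame per call of `render_fn` are mine, not part of the package:

```python
import statistics
import timeit


def measure_fps(render_fn, loops: int = 100, runs: int = 10) -> tuple[float, float]:
    """Return (mean, stdev) FPS for `render_fn`, assuming one frame per call.

    Mirrors the protocol above: `runs` repetitions, each timing `loops` calls.
    """
    # timeit.repeat accepts a zero-argument callable directly; each entry in
    # `totals` is the wall-clock time for `loops` consecutive calls.
    totals = timeit.repeat(render_fn, repeat=runs, number=loops)
    fps_per_run = [loops / t for t in totals]
    return statistics.mean(fps_per_run), statistics.stdev(fps_per_run)
```

For a GPU renderer, the real harness would also need a warm-up phase and `torch.cuda.synchronize()` around each timed region so that asynchronous kernel launches are actually measured.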