Skip to content

nanodrr

tests docs pypi

A performance-oriented reimplementation of DiffDRR with the following improvements:

  • Optimized, pure PyTorch implementation (~5× faster than DiffDRR at baseline)
  • Modular design (freely swap subjects, extrinsics, and intrinsics during rendering)
  • Compatibility with torch.compile and mixed precision
  • Extensive type hints with jaxtyping
  • Standard Python package structure managed with uv

All changes to DiffDRR are summarized here.

Installation

PyTorch version

On pytorch<2.9, torch.compile with bfloat16 is slower than eager due to a CUDA graph capture issue (see Benchmarks). Use pytorch>=2.9 (triton>=3.5) for best results.

To strictly install the renderer:

pip install nanodrr

To install the optional 3D visualization module:

pip install "nanodrr[scene]"

Benchmarks

Highlights

  • ~5× faster than DiffDRR out of the box, without compilation (946 FPS vs 213 FPS)
  • ~8× faster with torch.compile and bfloat16 on pytorch>=2.9 (1,650 FPS vs 213 FPS)
  • ~2.5× less memory than DiffDRR (516 MB vs 1,344 MB peak reserved with bfloat16 + compile)

Benchmarking runtime, FPS, and memory usage. Benchmarking runtime, FPS, and memory usage.

Mean ± standard deviation of 10 runs, 100 loops each.

Benchmarked by rendering 200×200 DRRs on an NVIDIA RTX 6000 Ada (48 GB) with Python 3.12. Compile represents torch.compile(mode="reduce-overhead", fullgraph=True). Full experiment at tests/benchmark/.

Roadmap

  • Implement a fully optimized renderer
  • Port strictly necessary modules from DiffDRR
  • Migrate 3D plotting functions to an optional module
  • Integrate with xvr to speed up network training and registration
  • Integrate with polypose to speed up registration
  • Release as v1.0.0 of DiffDRR!