Timing versus DRR size

Along with tips for rendering DRRs that don’t fit in memory
import numpy as np
import torch

from diffdrr.data import load_example_ct
from diffdrr.drr import DRR
from diffdrr.visualization import plot_drr
from diffdrr.pose import convert
# Read in the volume
subject = load_example_ct()
device = "cuda" if torch.cuda.is_available() else "cpu"

# Get parameters for the detector
rotations = torch.tensor([[0.0, 0.0, 0.0]], device=device)
translations = torch.tensor([[0.0, 850.0, 0.0]], device=device)
pose = convert(rotations, translations, parameterization="euler_angles", convention="ZXY")
height = 100

drr = DRR(subject, sdd=1020, height=height, delx=2.0).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
6.64 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 200

drr = DRR(subject, sdd=1020, height=height, delx=2.0).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
24.6 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 300

drr = DRR(subject, sdd=1020, height=height, delx=2.0).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
51.1 ms ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 400

drr = DRR(subject, sdd=1020, height=height, delx=2.0).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
88 ms ± 79.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
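
To see how render time scales with DRR size, the reported means can be divided by the number of rays (one ray per pixel). The sketch below only re-uses the timings printed above and assumes a square detector (width equal to height); the helper name ns_per_ray is illustrative and not part of DiffDRR.

```python
# Mean render times reported above, in milliseconds, keyed by DRR height.
# Assumption: the detector is square, so a DRR of height h has h * h rays.
timings_ms = {100: 6.64, 200: 24.6, 300: 51.1, 400: 88.0}


def ns_per_ray(height: int, mean_ms: float) -> float:
    """Average time per ray in nanoseconds for a height x height DRR."""
    n_rays = height * height
    return mean_ms * 1e6 / n_rays  # 1 ms = 1e6 ns


per_ray = {h: ns_per_ray(h, t) for h, t in timings_ms.items()}
for h, value in per_ray.items():
    print(f"{h} x {h}: {h * h:>7,d} rays, {value:.0f} ns per ray")
```

On these numbers the cost per ray stays between roughly 550 and 665 ns, so total render time grows approximately in proportion to the pixel count, that is, quadratically in the height.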

Memory constraints

So far, every ray in the DRR could be computed in a single pass on the GPU. As the DRRs get bigger, however, that single pass quickly exhausts GPU memory. For example, on a 12 GB GPU, computing a 500 by 500 DRR raises a CUDA out-of-memory error.

Tip

To render DRRs whose computation won’t fit in memory, we can compute the DRR one patch at a time. Pass patch_size to the DRR module to specify the size of each patch. Note that patch_size must evenly divide both height and width so that the patches tile the detector exactly.
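
The tiling requirement can be checked before constructing a DRR. The sketch below uses a hypothetical helper, count_patches, which is not part of DiffDRR; it assumes square patches of side patch_size and simply verifies that they tile the detector, then reports how many patches are needed for the configurations used in this section.

```python
def count_patches(height: int, width: int, patch_size: int) -> int:
    """Hypothetical helper (not a DiffDRR function).

    Returns the number of square patches of side `patch_size` needed to
    tile a (height, width) detector, or raises if they do not tile it.
    """
    if height % patch_size != 0 or width % patch_size != 0:
        raise ValueError(
            f"patch_size={patch_size} does not evenly tile ({height}, {width})"
        )
    return (height // patch_size) * (width // patch_size)


# (height, patch_size) pairs used in this section; width is assumed equal to height.
for height, patch_size in [(500, 250), (750, 150), (1000, 250), (1500, 250)]:
    n = count_patches(height, height, patch_size)
    print(f"{height} x {height} with patch_size={patch_size}: {n} patches")
```

Smaller patches lower the number of rays processed at once at the cost of more sequential passes, so a reasonable choice is the largest patch_size that both tiles the detector and fits on the GPU.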

height = 500
patch_size = 250

drr = DRR(subject, sdd=1020, height=height, delx=2.0, patch_size=patch_size).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
105 ms ± 83.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 750
patch_size = 150

drr = DRR(subject, sdd=1020, height=height, delx=2.0, patch_size=patch_size).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
217 ms ± 68.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 1000
patch_size = 250

drr = DRR(subject, sdd=1020, height=height, delx=2.0, patch_size=patch_size).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
341 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 1500
patch_size = 250

drr = DRR(subject, sdd=1020, height=height, delx=2.0, patch_size=patch_size).to(device=device, dtype=torch.float32)
%timeit drr(pose)
del drr
717 ms ± 794 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

With patch_size, the limiting factor is no longer the memory needed to compute the DRR, only the memory needed to store the finished image.
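
A rough back-of-the-envelope comparison illustrates the difference. The sketch below assumes the rendered DRR is stored as one float32 value per pixel for a single pose; it does not measure the renderer's actual GPU memory use, and the helper name output_megabytes is illustrative only.

```python
BYTES_PER_FLOAT32 = 4


def output_megabytes(height: int, width: int) -> float:
    """Size of a height x width image stored as float32, in MB (1e6 bytes).

    Assumption: one value per pixel, single pose, no extra channels.
    """
    return height * width * BYTES_PER_FLOAT32 / 1e6


# (height, patch_size) pairs from the patched renders above; width == height.
for height, patch_size in [(500, 250), (750, 150), (1000, 250), (1500, 250)]:
    rays_total = height * height
    rays_per_pass = patch_size * patch_size
    print(
        f"{height} x {height}: {rays_total:,d} rays in total, "
        f"{rays_per_pass:,d} rays per pass, "
        f"{output_megabytes(height, height):.2f} MB stored"
    )
```

Under these assumptions even the 1500 by 1500 image occupies only about 9 MB once rendered, while each pass with patch_size = 250 handles 62,500 rays, fewer than the 160,000 rays of the 400 by 400 DRR that rendered in a single pass earlier.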