Timing versus DRR size

Along with tips for rendering DRRs that don’t fit in memory

import numpy as np
import torch

from diffdrr.data import load_example_ct
from diffdrr.drr import DRR
from diffdrr.visualization import plot_drr
# Read in the volume
volume, spacing = load_example_ct()
device = "cuda" if torch.cuda.is_available() else "cpu"

# Get parameters for the detector
bx, by, bz = np.array(volume.shape) * np.array(spacing) / 2
translations = torch.tensor([[bx, by, bz]]).to(device)
rotations = torch.tensor([[np.pi, 0, np.pi / 2]]).to(device)
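plot_drr is imported above but never called in the timing cells below, so before benchmarking it can be worth rendering one DRR and plotting it to confirm the pose looks sensible. A minimal sketch, assuming the drr(rotations, translations) forward call used throughout this page and that plot_drr accepts the rendered image tensor:

```python
import matplotlib.pyplot as plt

# Render a single small DRR and display it as a sanity check before timing
drr = DRR(volume, spacing, sdr=300.0, height=200, delx=4.0).to(device)
img = drr(rotations, translations)  # assumed forward signature
plot_drr(img)
plt.show()
del drr
```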
height = 100
drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
%timeit drr(rotations, translations)
del drr
9.28 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 200
drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
%timeit drr(rotations, translations)
del drr
33.3 ms ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 300
drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
%timeit drr(rotations, translations)
del drr
72.1 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 400
drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
%timeit drr(rotations, translations)
del drr
123 ms ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 500
drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
%timeit drr(rotations, translations)
del drr
187 ms ± 96.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
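The timings above come from the notebook’s %timeit magic. To reproduce the sweep in a plain script, the forward pass can be timed manually; a rough sketch, assuming the same drr(rotations, translations) call, with explicit synchronization because CUDA kernels launch asynchronously:

```python
import time

def time_drr(height, n_runs=10):
    # Build the renderer, run one warm-up pass, then average n_runs timed passes
    drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
    drr(rotations, translations)  # warm-up
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        drr(rotations, translations)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / n_runs
    del drr
    return elapsed

for h in [100, 200, 300, 400, 500]:
    print(f"height={h}: {time_drr(h) * 1e3:.1f} ms")
```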
Memory constraints
Up until this point, we could compute every ray in the DRR in one go on the GPU. However, as the DRRs get bigger, we will quickly run out of memory. For example, on a 12 GB GPU, computing a 600 by 600 DRR will raise a CUDA memory error.
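Whether a given size fits depends on the GPU, so it can be useful to probe a size before committing to it. A small sketch that tries one render and catches the out-of-memory error (torch.cuda.OutOfMemoryError exists in recent PyTorch releases; older versions raise a plain RuntimeError):

```python
def fits_in_memory(height):
    # Try one forward pass at this size and report whether it exhausts GPU memory
    drr = DRR(volume, spacing, sdr=300.0, height=height, delx=4.0).to(device)
    try:
        drr(rotations, translations)
        return True
    except torch.cuda.OutOfMemoryError:  # use RuntimeError on older PyTorch
        return False
    finally:
        del drr
        torch.cuda.empty_cache()

print(fits_in_memory(500))  # expected True on a 12 GB GPU
print(fits_in_memory(600))  # expected False, per the example above
```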
Tip

To render DRRs whose computation won’t fit in memory, we can render the DRR one patch at a time. Pass patch_size to the DRR module to specify the size of each patch. Note that the patch size must evenly tile (height, width).
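Because of that tiling requirement, a quick divisibility check helps pick a workable patch size up front. A small illustrative helper (not part of DiffDRR; it just enumerates the common divisors):

```python
def valid_patch_sizes(height, width=None):
    # Patch sizes that evenly tile a (height, width) detector, i.e. common divisors
    width = height if width is None else width
    return [p for p in range(1, min(height, width) + 1) if height % p == 0 and width % p == 0]

print(valid_patch_sizes(600))  # 150, used below, is among the divisors of 600
print(750 % 150 == 0)          # 150 also tiles the 750-pixel DRR used below
```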
height = 600
patch_size = 150
drr = DRR(
    volume, spacing, sdr=300.0, height=height, delx=4.0, patch_size=patch_size
).to(device)
%timeit drr(rotations, translations)
del drr
183 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
height = 750
patch_size = 150
drr = DRR(
    volume, spacing, sdr=300.0, height=height, delx=4.0, patch_size=patch_size
).to(device)
%timeit drr(rotations, translations)
del drr
259 ms ± 92.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 1000
patch_size = 250
drr = DRR(
    volume, spacing, sdr=300.0, height=height, delx=4.0, patch_size=patch_size
).to(device)
%timeit drr(rotations, translations)
del drr
417 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
height = 1500
patch_size = 250
drr = DRR(
    volume, spacing, sdr=300.0, height=height, delx=4.0, patch_size=patch_size
).to(device)
%timeit drr(rotations, translations)
del drr
826 ms ± 268 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
With patch_size, the only limitation is DRR storage, not computation.
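One way to picture why this works: instead of tracing every detector ray at once, the rays can be processed in fixed-size chunks and the partial images concatenated afterwards, so peak memory scales with the chunk rather than the full detector. A schematic sketch of that idea in plain PyTorch, not DiffDRR’s actual implementation (render_rays is a hypothetical stand-in for the per-ray projection):

```python
def render_in_chunks(render_rays, rays, chunk_size):
    # Trace a large set of rays chunk by chunk to bound peak memory (conceptual sketch)
    # render_rays: hypothetical callable mapping (n, ...) rays to (n,) pixel intensities
    # rays:        all detector rays, flattened to shape (height * width, ...)
    outputs = []
    for start in range(0, rays.shape[0], chunk_size):
        outputs.append(render_rays(rays[start : start + chunk_size]))
    return torch.cat(outputs)
```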