Image similarity metrics and geodesic distances for camera poses
Image similarity metrics
These metrics quantify the similarity between ground truth X-rays (\(\mathbf I\)) and synthetic X-rays generated from estimated camera poses (\(\hat{\mathbf I}\)). If a metric is differentiable, it can be used to optimize camera poses with DiffDRR.
NCC and GradNCC are originally implemented in diffdrr.metrics. DiffPose provides torchmetrics wrappers for these functions.
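To make the idea concrete, here is a minimal NumPy sketch of normalized cross-correlation between two images. It is not the `diffdrr.metrics` implementation or the DiffPose wrapper; the function name `normalized_cross_correlation` and the `eps` stabilizer are assumptions made for illustration. A version usable for pose optimization would apply the same arithmetic to PyTorch tensors so that gradients can flow.

```python
import numpy as np


def normalized_cross_correlation(img_a, img_b, eps=1e-8):
    """Illustrative NCC between two images of the same shape.

    Hypothetical helper, not the diffdrr.metrics API. Each image is
    z-scored (zero mean, unit standard deviation), and the mean of the
    elementwise product is returned. The result lies in [-1, 1].
    """
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    # eps guards against division by zero for constant images
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return float((a * b).mean())


# Usage: NCC is invariant to affine intensity changes (gain and offset)
rng = np.random.default_rng(0)
ground_truth = rng.random((8, 8))      # stand-in for a real X-ray
rescaled = 2.0 * ground_truth + 3.0    # same image, different intensity scale
score = normalized_cross_correlation(ground_truth, rescaled)
```

Because both images are standardized before comparison, the score depends on the pattern of intensities rather than their absolute scale, which is useful when real and synthetic X-rays differ in brightness or contrast.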
One can define geodesic pseudo-distances on SO(3) and SE(3).1 This lets us measure registration error (in radians and millimeters, respectively) directly on poses, rather than needing to compute the projection of fiducials.
For SO(3), the geodesic distance between two rotation matrices \(\mathbf R_A\) and \(\mathbf R_B\) is \[\begin{equation}
d_\theta(\mathbf R_A, \mathbf R_B; r) = r \left| \arccos \left( \frac{\mathrm{trace}(\mathbf R_A^* \mathbf R_B) - 1}{2} \right ) \right| \,,
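\end{equation}\] where \(\mathbf R_A^*\) is the transpose of \(\mathbf R_A\) (its adjoint, since rotation matrices are real) and \(r\), the source-to-detector radius, is used to convert radians to millimeters.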
For SE(3), we decompose the transformation matrix into a rotation and a translation, i.e., \(\mathbf T = (\mathbf R, \mathbf t)\). Then, we compute the geodesic on translations (just Euclidean distance), \[\begin{equation}
d_t(\mathbf t_A, \mathbf t_B) = \| \mathbf t_A - \mathbf t_B \|_2 \,.
\end{equation}\] Finally, we compute the double geodesic on the rotations and translations: \[\begin{equation}
d(\mathbf T_A, \mathbf T_B) = \sqrt{d_\theta(\mathbf R_A, \mathbf R_B; r)^2 + d_t(\mathbf t_A, \mathbf t_B)^2} \,.
\end{equation}\]
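The three formulas above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not DiffPose's implementation: the function names, the nested-list matrix representation, and the example radius of 510 mm are all assumptions chosen for the demonstration.

```python
import math


def rotation_z(angle):
    """Rotation by `angle` radians about the z-axis (helper for the demo)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def so3_geodesic(R_a, R_b, radius):
    """Geodesic distance on SO(3), scaled by `radius` to give millimeters."""
    # trace(R_a^T R_b) equals the elementwise inner product of the matrices
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so floating-point round-off cannot break acos
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return radius * abs(math.acos(cos_angle))


def translation_distance(t_a, t_b):
    """Geodesic on translations, i.e. the Euclidean distance."""
    return math.dist(t_a, t_b)


def se3_double_geodesic(R_a, t_a, R_b, t_b, radius):
    """Double geodesic: root sum of squares of the two distances."""
    return math.hypot(
        so3_geodesic(R_a, R_b, radius),
        translation_distance(t_a, t_b),
    )


# Usage: a pose that is off by 0.1 rad about z and by (3, 4, 0) mm
identity = rotation_z(0.0)
R_est = rotation_z(0.1)
radius_mm = 510.0  # hypothetical source-to-detector radius
rot_err = so3_geodesic(identity, R_est, radius_mm)
trans_err = translation_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
total_err = se3_double_geodesic(
    identity, (0.0, 0.0, 0.0), R_est, (3.0, 4.0, 0.0), radius_mm
)
```

Scaling the rotation angle by the radius puts the rotational and translational errors in the same units (millimeters), so combining them under a single square root is meaningful. The clamp before `acos` is a numerical safeguard added for this sketch; it is not part of the formula.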