The ability to accurately monitor the position of an object or group of objects in a video sequence can be of high value when making decisions with respect to security, maneuverability, ecology, and infrastructure. Our previous work [1, 2] provided evidence that, in overhead video, a system’s tracking ability could be greatly enhanced through effective, segmented calculation of frame-to-frame spatial correspondences. We now continue the investigations begun in that work (wherein frames and their detections are co-mapped) through a case study in which hand-crafted and machine learning (ML) approaches for co-registration of video frames are reviewed and compared. First, the merits and shortcomings of hand-crafted algorithms, ML models, and hybrid approaches are discussed for key-point detection and key-point description. Modifications to feature matching and homography estimation are discussed as well, after which a more recently published class of co-registration methods based on “detector-less” correspondence is outlined. These approaches are applied to and evaluated on four overhead image sequences whose ground truth consists of object detection locations only. Because no ground control points are available for evaluation, co-mapped centroids of stationary objects are used to generate an accuracy metric for the various mapping approaches. Further, given the value of mapping and tracking in real-time contexts, this accuracy metric is compared against computation time so that trade-offs between temporal and spatial performance can be better understood.
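As an illustration of the stationary-centroid idea described above, the following minimal sketch maps the centroids of stationary objects from one frame into another through an estimated homography and reports the mean residual distance. The exact metric is not specified here, so the use of mean Euclidean distance in pixels is an assumption, and the function names `apply_homography` and `stationary_centroid_error` are hypothetical.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def stationary_centroid_error(H, centroids_src, centroids_dst):
    """Mean Euclidean distance between mapped source centroids and
    the matching destination centroids (assumed metric, in pixels)."""
    if not centroids_src or len(centroids_src) != len(centroids_dst):
        raise ValueError("centroid lists must be non-empty and equal in length")
    total = 0.0
    for p, q in zip(centroids_src, centroids_dst):
        mx, my = apply_homography(H, p)
        total += math.hypot(mx - q[0], my - q[1])
    return total / len(centroids_src)


# Toy data: the second frame is the first shifted by (3, 4) pixels.
H_true = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
H_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src = [(0.0, 0.0), (10.0, 5.0)]
dst = [(3.0, 4.0), (13.0, 9.0)]

error_true = stationary_centroid_error(H_true, src, dst)
error_identity = stationary_centroid_error(H_identity, src, dst)
```

With these toy inputs, the correct mapping leaves no residual, while the identity mapping leaves each centroid 5 pixels from its counterpart. Because a stationary object should land on itself after a perfect co-registration, a lower value suggests a better mapping.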