This project processes video frames with YOLO object detection and computes a homography matrix to map detections onto a reference aerial image (e.g., a Google Maps image). It includes functionality for image warping, bounding box transformations, and frame-by-frame video processing.
- Input Video Parsing: Handles image frames and YOLO detections together.
- Homography Calculation: Computes a transformation matrix between a source and destination image.
- Warping: Warps image frames and bounding boxes using the homography matrix.
- Exporting: Exports processed frames and detection data for further analysis.
- Single Camera: This pipeline assumes the use of a single, static camera. This simplifies the computation of the homography matrix, as the scene and reference frame remain consistent throughout the video.
- Static Camera: A static camera ensures that the relationship between the video frames and the reference aerial image does not change, which is critical for accurate homography calculations.
- No Feature Detection: This pipeline does not perform feature detection. Instead, it focuses on applying precomputed keypoint matches to compute the homography matrix.
- No OpenCV: The pipeline is implemented without using OpenCV, leveraging libraries like NumPy and SciPy for matrix operations and image transformations.
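Since the homography is estimated from precomputed keypoint matches rather than detected features, the core computation reduces to a direct linear transform (DLT) solved with NumPy. The following is a minimal sketch of that idea under the no-OpenCV constraint; the function and variable names are illustrative and not taken from the project's code.

```python
import numpy as np

def compute_homography(src_pts, dst_pts):
    """Estimate H such that dst ~ H @ src from (N, 2) matched point arrays, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Each correspondence contributes two rows to the DLT system A h = 0.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(A, dtype=float)
    # h is the right singular vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize, assuming H[2, 2] != 0
```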
- main.py: Entry point for the pipeline. Parses command-line arguments and coordinates the video processing workflow.
- video.py: Contains classes for enriched video and frame processing. Handles loading, exporting, and visualization of frames and their associated YOLO detections.
- vision.py: Provides functions for homography computation, image warping, and bounding box transformation.
- files.py: Utility functions for handling file operations, such as extracting numbers from filenames to match frames with YOLO detections.
- LICENSE: Licensing details for the project.
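As a rough illustration of the filename-matching utility described for files.py above, the sketch below pairs each img_<n>.jpg with its yolo_<n>.mat by the number embedded in the filename. The function names are hypothetical, not the project's API; the paired .mat files could then be read with scipy.io.loadmat.

```python
import re
from pathlib import Path

def extract_number(path):
    """Return the first integer embedded in a filename, or None if there is none."""
    match = re.search(r"\d+", Path(path).stem)
    return int(match.group()) if match else None

def pair_frames_with_detections(input_dir):
    """Map frame number -> (img_<n>.jpg path, yolo_<n>.mat path) for matched pairs."""
    input_dir = Path(input_dir)
    frames = {extract_number(p): p for p in input_dir.glob("img_*.jpg")}
    detections = {extract_number(p): p for p in input_dir.glob("yolo_*.mat")}
    return {n: (frames[n], detections[n]) for n in sorted(frames) if n in detections}
```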
- Python 3.11 or higher
- NumPy
- SciPy
- Matplotlib
To install the required dependencies, run:
```bash
pip install numpy scipy matplotlib
```

You can also run this project using Docker for a more isolated and consistent environment.
- Clone the repository:

  ```bash
  git clone https://github.com/guilherme-marcello/video-stitching-pipeline.git
  cd video-stitching-pipeline
  ```

- Build the Docker image:

  ```bash
  docker build -t video-processing-pipeline .
  ```

  Note: If you don't want to build the Docker image yourself, you can use the prebuilt image available on Docker Hub:

  ```bash
  docker pull guilhermemarcelo/video-stitching-pipeline:latest
  ```

- Prepare your input data and ensure it is located in a directory accessible from your system.

- Run the Docker container:

  ```bash
  docker run -v /path/to/input/data:/data -v /path/to/output:/output video-processing-pipeline \
    -kp /data/keypoint_matches.mat \
    -map /data/google_maps_image.png \
    -i /data \
    -o /output
  ```

  If using the prebuilt image from Docker Hub:

  ```bash
  docker run -v /path/to/input/data:/data -v /path/to/output:/output guilhermemarcelo/video-stitching-pipeline:latest \
    -kp /data/keypoint_matches.mat \
    -map /data/google_maps_image.png \
    -i /data \
    -o /output
  ```

  - Replace /path/to/input/data with the path to your input directory.
  - Replace /path/to/output with the path where you want the output files to be saved.
  - Adjust the paths for the keypoint matches file and Google Maps image as needed.

- Output files will be saved in the specified output directory.
- Prepare Input Data:

  - Ensure all input frames are named img_<frame_number>.jpg.
  - Ensure YOLO detection outputs are named yolo_<frame_number>.mat.
  - Place these files in a directory.

- Run the Pipeline: Use the following command to process the video frames:

  ```bash
  python main.py -kp <keypoint_matches_file> -map <google_maps_image> -i <input_directory>
  ```

  - -kp: Path to the keypoint matches file (default: kp_gmaps.mat).
  - -map: Path to the Google Maps image (default: gmaps.png).
  - -i: Input directory containing frames and YOLO detections (default: .).

- Output: Warped frames and detection data will be exported to the output directory.
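For illustration only, the flags documented above could be declared with argparse roughly as follows. This is a sketch of the documented interface, not the project's actual main.py; the -o flag is an assumption carried over from the Docker example.

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Warp video frames and YOLO detections onto a reference aerial image")
    parser.add_argument("-kp", default="kp_gmaps.mat",
                        help="Path to the keypoint matches file")
    parser.add_argument("-map", default="gmaps.png",
                        help="Path to the Google Maps reference image")
    parser.add_argument("-i", default=".",
                        help="Directory containing frames and YOLO detections")
    # Assumed from the Docker example; not listed among the documented defaults.
    parser.add_argument("-o", help="Output directory for warped frames and detections")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args.kp, args.map, args.i, args.o)
```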
- Load Video Frames: Frames and YOLO detection outputs are matched by their filenames and loaded into EnrichedFrame objects.
- Compute Homography: A homography matrix is calculated using keypoint matches between the first frame and a reference aerial image.
- Warp Frames: Frames and bounding boxes are transformed using the computed homography matrix.
- Export Results: Processed frames and detection outputs are saved to the specified directory for further analysis.
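To make the warping step concrete, here is a minimal sketch of how frames and bounding boxes might be transformed with the homography using only NumPy and SciPy, in line with the no-OpenCV constraint. It assumes grayscale frames and axis-aligned boxes; the function names are illustrative rather than the project's API.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_frame(image, H, out_shape):
    """Warp a 2-D (grayscale) frame into out_shape = (rows, cols) using homography H."""
    H_inv = np.linalg.inv(H)
    rows, cols = out_shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    ones = np.ones_like(xs)
    # Inverse mapping: for every output pixel, find where it comes from in the source frame.
    src = H_inv @ np.stack([xs.ravel(), ys.ravel(), ones.ravel()]).astype(float)
    src_x = src[0] / src[2]
    src_y = src[1] / src[2]
    # map_coordinates samples in (row, col) order; bilinear interpolation, zeros outside.
    warped = map_coordinates(image, [src_y, src_x], order=1, cval=0.0)
    return warped.reshape(out_shape)

def warp_bbox(bbox, H):
    """Map an axis-aligned box (x_min, y_min, x_max, y_max) through H and re-fit a box."""
    x0, y0, x1, y1 = bbox
    corners = np.array([[x0, y0, 1.0], [x1, y0, 1.0], [x1, y1, 1.0], [x0, y1, 1.0]]).T
    mapped = H @ corners
    xs, ys = mapped[0] / mapped[2], mapped[1] / mapped[2]
    return xs.min(), ys.min(), xs.max(), ys.max()
```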
This project is licensed under the terms specified in the LICENSE file.