
Code Map Localization is an indoor 2D localization system that decodes floor-printed binary patterns to achieve robust and accurate global positioning. The algorithm is designed to run in real time under tight resource constraints (e.g. embedded micro-controllers), and works with image sensors with resolutions as low as 20x20 pixels.

[Image: rover.png]

Currently supports:

  • USB cameras
  • ESP32 camera
  • PC simulation
  • ROS2 interface
  • Python binding

Use cases

Although many applications exist, the primary objective is to provide a reliable and extremely low-cost localization system for large fleets of autonomous indoor ground vehicles, since existing systems usually become prohibitively expensive as the coverage area and fleet size scale up.

Case Examples:

  • For research purposes, real-time “ground truth” localization data is often needed to evaluate various control and estimation algorithms on mobile robots.
  • For industrial purposes, autonomous warehouse robots often require accurate real-time localization data in order to maneuver in tight spaces and arrive at exact locations for loading/unloading/docking.
  • For hobbyist purposes, small robotics demos often require a localization system that is robust and portable with minimal setup (the floor pattern can be printed on a portable surface).

Compared to other methods

See Comparing Indoor Localization Systems; in brief:

Advantages:

  • Millimeter-accurate global positioning everywhere. It is literally “ground truth”.
  • High update frequency, limited only by image sensor framerate.
  • Scales to any number of trackers without penalty.
  • Ultra-low cost.
    • Each module requires as much hardware as an optical mouse, which is a few dollars at scale.
      • The current development prototypes cost less than $10 each.
    • Graphic flooring might be an investment depending on material. For reference, custom printed vinyl flooring stickers are about $5 per square meter at scale.
      • For demo purposes even a large format poster would suffice.
  • Can use and share existing imaging systems without requiring dedicated hardware (any floor-pointing camera works).

Drawbacks:

  • 2D (x, y, theta) only.
  • Requires a floor-printed positioning pattern.
  • Loses localization during high-speed motion.
    • Can be compensated for by printing the pattern larger and viewing it from a longer distance, at the cost of positioning resolution.
    • The threshold depends mostly on camera configuration; beyond some speed the image is too smeared to decode, especially with rolling-shutter sensors.
    • Visual odometry requires some overlap between successive frames, which decreases with movement speed.

How it works

See Localization Pattern for implementation details; in brief:

Code Sequence Generation

  1. Long 1D binary sequences, in which every fixed-length sub-string is unique in both reading directions, are pre-generated using an incremental bit sampling algorithm.
  2. A compact reverse lookup index is generated from the sequence for fast decoding of positions from sub-strings.
  3. The floor pattern consists of the XOR outer product of the binary vector/sequence with itself (see the sketch below).
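
As a rough illustration of all three steps, the sketch below greedily grows a short directionally unique sequence, builds the reverse lookup index, and forms the floor pattern as an XOR outer product. This is a minimal Python stand-in, not the project's actual generator; the window length K, the greedy sampling loop, and the function names are illustrative assumptions.

```python
import numpy as np

K = 10  # window length in bits (illustrative; the real system uses larger blocks)

def gen_sequence(n_bits, k=K, seed=0):
    """Greedily grow a bit sequence in which every k-bit window is unique
    and distinguishable from its own reversal (directional uniqueness)."""
    rng = np.random.default_rng(seed)
    seq = [int(b) for b in rng.integers(0, 2, k)]
    while seq == seq[::-1]:                       # avoid a palindromic start
        seq = [int(b) for b in rng.integers(0, 2, k)]
    seen = {tuple(seq), tuple(reversed(seq))}
    while len(seq) < n_bits:
        for bit in rng.permutation([0, 1]):
            w = tuple(seq[-(k - 1):]) + (int(bit),)
            if w not in seen and w != w[::-1]:
                seen.update({w, w[::-1]})
                seq.append(int(bit))
                break
        else:
            break  # dead end; a real generator would backtrack or restart
    return np.array(seq, dtype=np.uint8)

def build_index(seq, k=K):
    """Compact reverse lookup: k-bit window -> (position, direction)."""
    index = {}
    for i in range(len(seq) - k + 1):
        w = tuple(seq[i:i + k])
        index[w] = (i, +1)
        index[w[::-1]] = (i, -1)
    return index

seq = gen_sequence(150)
index = build_index(seq)
pattern = seq[:, None] ^ seq[None, :]     # XOR outer product -> 2D floor pattern
p = len(seq) - K
print(index[tuple(seq[p:p + K])])         # -> (p, 1): position p, forward reading
```

Any K-bit window of seq (or its reversal) then maps back to a unique position and reading direction through index, which is the property the decoding pipeline below relies on.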

Decoding Pipeline

  1. The image is de-rotated to align rows and columns to the nearest 90 degrees using a fast gradient averaging algorithm.
  2. A hyper-sharpening convolutional filter is applied to reduce blur and uneven lighting effects, followed by Otsu thresholding to produce a binary matrix.
  3. A noise-resilient clustering algorithm performs binary matrix factorization to extract the 1D row and column codes (a simplified version is sketched after this list).
  4. The row and column codes are individually scaled/down-sampled to various bit lengths, then scanned for the location with the longest matching code using the reverse lookup index.
  5. The match location is filtered/rejected based on match size, bit errors during factorization and down-sampling, and distance from the last confirmed position.
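
To make step 3 concrete, here is a minimal sketch of XOR binary matrix factorization with majority voting per row and column; the project's noise-resilient clustering is more involved, and the block size, noise level, and function name here are illustrative assumptions.

```python
import numpy as np

def factorize_xor(B):
    """Recover row/column codes r, c from a noisy binary block
    B ~= r XOR c (outer XOR product), up to a shared global bit flip.
    Majority voting across rows and columns absorbs scattered bit errors."""
    c = B[0, :].copy()                    # reference column code (= c ^ r[0])
    # rows of B ^ c are ideally constant at r[i] ^ r[0]; vote per row
    r = ((B ^ c[None, :]).mean(axis=1) > 0.5).astype(np.uint8)
    # re-vote each column against the recovered rows to clean up c
    c = ((B ^ r[:, None]).mean(axis=0) > 0.5).astype(np.uint8)
    return r, c

# toy check: corrupt a 20x20 block, then factor it
rng = np.random.default_rng(1)
r_true = rng.integers(0, 2, 20, dtype=np.uint8)
c_true = rng.integers(0, 2, 20, dtype=np.uint8)
B = (r_true[:, None] ^ c_true[None, :]) ^ \
    (rng.random((20, 20)) < 0.10).astype(np.uint8)   # ~10% random bit errors
r, c = factorize_xor(B)
clean = r_true[:, None] ^ c_true[None, :]
print(((r[:, None] ^ c[None, :]) != clean).mean())   # typically 0.0
```

The recovered codes are only determined up to a shared bit flip, since (r, c) and (r ^ 1, c ^ 1) produce the same XOR outer product; a decoder can simply try both polarities against the reverse lookup index.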

Visual Odometry In Parallel

  1. FFT-based phase correlation is performed on the stream of de-rotated frames to obtain the pixel translation between successive frames.
  2. Subpixel registration based on the neighbors of the correlation peak is used to refine the translation estimate (steps 1 and 2 are sketched after this list).
  3. Translations are rotated and scaled using estimates from the current frame and accumulated until a new location match is confirmed.
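
A minimal numpy sketch of steps 1 and 2, assuming a three-point parabolic fit around the correlation peak for the subpixel refinement (the project's exact neighboring-peaks method may differ):

```python
import numpy as np

def phase_correlate(prev, curr):
    """Estimate the (dy, dx) translation of curr relative to prev via
    FFT phase correlation, refined to subpixel accuracy using the two
    neighbors of the correlation peak along each axis."""
    F = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real   # keep phase only
    peak = np.unravel_index(np.argmax(corr), corr.shape)

    shift = []
    for axis, p in enumerate(peak):
        n = corr.shape[axis]
        idx = list(peak)
        idx[axis] = (p - 1) % n; c_m = corr[tuple(idx)]  # left neighbor
        idx[axis] = p;           c_0 = corr[tuple(idx)]  # peak
        idx[axis] = (p + 1) % n; c_p = corr[tuple(idx)]  # right neighbor
        denom = c_m - 2.0 * c_0 + c_p                    # parabola vertex
        d = p + (0.5 * (c_m - c_p) / denom if denom else 0.0)
        shift.append(d - n if d > n / 2 else d)          # wrap to signed shift
    return tuple(shift)

a = np.random.default_rng(0).random((64, 64))
b = np.roll(a, (3, -5), axis=(0, 1))      # shift content by (+3, -5)
print(phase_correlate(a, b))              # ~ (3.0, -5.0)
```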

[Image: code_map_sim.png, example of the decoding pipeline in simulation]

Frequently Asked Questions

How large of an area can the system cover?

Coverage area depends on the block size of the code used, which is configurable according to camera sensor quality and image-processing resources. For reference, a block of 20x20 bits (tolerating up to ~20% random bit errors) can currently decode about 50,000 positions per axis, which translates to 50m by 50m if every bit is printed at 1mm (i.e. millimeter accuracy).
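
The arithmetic scales linearly with print pitch, which is the coverage/accuracy trade-off mentioned under Drawbacks; a quick restatement of the figures above:

```python
positions_per_axis = 50_000     # decodable by a 20x20-bit block (see above)

print(positions_per_axis * 0.001)   # 1 mm bits -> 50.0 m per axis, 1 mm accuracy
print(positions_per_axis * 0.002)   # 2 mm bits -> 100.0 m per axis, 2 mm accuracy
```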

What makes this better than similar visual-pattern-based systems/algorithms that already exist?

  1. It is free and open-source.
  2. It is a complete localization system that works out of the box.
  3. It is designed for practical application, running in real time on common low-cost hardware with meager resources.
  4. The core algorithm is portable to different systems, currently supporting simulation, USB cameras, and embedded platforms.
  5. The binary code sequence embeds both position and direction data, which allows global heading/orientation to be decoded along with x and y position coordinates.

How is this different from putting QR codes on the floor?

  1. Each QR code occupies a distinct, non-overlapping area. Code Map is a continuous positioning pattern in which every sliding-window block of bits decodes to a unique position, which allows much higher positional resolution (see the toy comparison after this list).
  2. QR codes have dedicated locator patterns in the corners to distinguish themselves within an image. Code Map assumes the majority of the image is the code and only ignores sections that fail to decode.
  3. Code Map was designed to be decoded many times per second (e.g. 50Hz) on a micro-controller. Hence the image processing pipeline is optimized to extract codes with very low image resolution requirements.
  4. QR codes use Reed-Solomon for error correction; Code Map uses XOR binary matrix factorization, which is much more lightweight but is specific to extracting 1D vectors from 2D blocks.
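
The resolution difference in point 1 comes down to simple counting; the sequence length below is hypothetical:

```python
K = 20             # tile / window size in bits
L = 50_000         # code sequence length (hypothetical)

tiled = L // K           # QR-style tiles: one position every K bits -> 2500
sliding = L - K + 1      # Code Map windows: one position every bit -> 49981
print(tiled, sliding)    # positional resolution differs by a factor of ~K
```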

How far does the camera have to be from the floor pattern?

It depends on how large the floor pattern is printed and the lens configuration of the image sensor. Ideally the length of one bit in the floor pattern should project onto 1.5 to 3 pixels in the image. As a crude reference, if the pattern is printed at 1mm resolution, the camera distance should be between 4cm and 7cm for a typical 60-degree-FOV webcam.
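
A back-of-the-envelope check of that range, assuming a pinhole camera model and frames downsampled to 128 pixels across (both are illustrative assumptions, not project defaults):

```python
import math

def pixels_per_bit(dist_mm, fov_deg, sensor_px, bit_mm=1.0):
    """Ground strip imaged at distance d spans 2*d*tan(FOV/2); divide the
    pixel count by that footprint to get the sampling rate per bit."""
    footprint_mm = 2 * dist_mm * math.tan(math.radians(fov_deg) / 2)
    return sensor_px * bit_mm / footprint_mm

for d in (40, 70):                              # camera heights in mm
    print(d, round(pixels_per_bit(d, 60, 128), 2))
# -> 2.77 at 40 mm and 1.58 at 70 mm, both inside the 1.5-3 pixels-per-bit range
```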

Does the camera have to point perpendicular to the floor?

The short answer is YES. This is a design choice: estimating and undoing image perspective distortion every frame in software requires more computation and image resolution, which is not worth the trade-off given micro-controller resources and the primary use case of a fixed-mounted, floor-pointing camera for 2D navigation. However, there is usually some leeway by virtue of the bit error correction mechanisms and code redundancy across the frame. See Perspective and Homography for details.