A 3D Christmas Tree: Sparse Bundle Adjustment of Addressable LEDs

Code on Github: https://github.com/evanfletcher42/3dBlinkenlights

It’s the holiday season, and what better way to celebrate than to dig out an old project & massively over-complicate decorating? This is a Christmas tree, with addressable LEDs, where I’ve reconstructed all the 3D positions of every LED & am using that to drive a volumetric effect:

I’ve seen this done before, both commercially (Twinkly) and by Stand-up Maths on Youtube. I wanted to take a more generic computer-vision approach to the problem, solving this like a structure-from-motion scan on video data. Specifically, as a monocular sparse bundle adjustment problem, where we solve for 3D points + all the camera poses + camera intrinsic parameters.

This post talks through all the steps at a high level – the code is up on Github here.

Hardware

For this, the setup could not be simpler:

  • One set of 500 12V WS2811 addressable RGB tree lights
  • A big 12V power supply
  • One Teensy LC (Arduino-compatible)
  • Bit of wire & heatshrink
  • A Christmas tree


… And that’s it. A Teensy LC was chosen mostly because I had one, but the board also comes with a nice level shifter on pin 17, meant for driving addressable LEDs like these. Didn’t even need a breadboard!
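For illustration, here’s a minimal sketch of what the firmware side of that looks like with FastLED, assuming the data line on pin 17 as described above – the strand length is from the parts list, but the color order is an assumption you’d want to verify against your own lights:

```cpp
// Minimal FastLED sketch for driving the string from a Teensy LC.
// Pin 17 is the Teensy LC's 5V-buffered output; the RGB color order
// is an assumption -- check your strand.
#include <FastLED.h>

#define DATA_PIN 17
#define NUM_LEDS 500

CRGB leds[NUM_LEDS];

void setup() {
  FastLED.addLeds<WS2811, DATA_PIN, RGB>(leds, NUM_LEDS);
  FastLED.setBrightness(128);
}

void loop() {
  // Sanity check: chase a single white pixel down the strand.
  static int i = 0;
  fill_solid(leds, NUM_LEDS, CRGB::Black);
  leds[i] = CRGB::White;
  i = (i + 1) % NUM_LEDS;
  FastLED.show();
  delay(16);
}
```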

Tracking and Labeling LEDs

I wanted to make it possible to reconstruct the tree via a not-careful, non-synchronized video capture.

I’ve seen approaches with stationary or near-stationary cameras, like Stand-up Maths’ approach that blinks LEDs one at a time, or Twinkly’s commercial product, where a near-stationary capture of coded LED blinks yields a single labeled perspective. Both of these simplify the labeling problem, but slow down capture. Twinkly’s approach in particular, with some nice UX to guide the user into capturing a few high-quality frames with good LED coverage, is probably quite easy on their server backend, computationally!

In my case, I’d prefer to support a completely arbitrary walk all around the tree, using a regular camera with no synchronization or feedback. Also, because I think big computational problems are fun, I’d like to use every frame in what’s likely to be a 2-or-3-minute 60-FPS video. While the accuracy benefits of using ~10k frames are probably marginal, and certainly not worth the massive additional computational cost, I’m doing it anyway for the extra challenge – and, again, since part of the point here is to demonstrate solving a general structure-from-motion problem.

In practice this means 1) every LED would need to be on all the time / detectable in every frame, and 2) I’d need to separate tracking LEDs and labeling them, since multiple frames are required to ID a point. Both fun challenges, IMO.

For per-frame LED detection, I went with a simple bright-blob finder. I set a camera to take a rather dark video w/ short exposure, so that LEDs showed up as obvious threshold-able and separate bright spots on a dark background. (There are better ways, but I’m lazy.)

Behold, blobs! Also pine needles.
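A minimal sketch of that kind of detector (the actual blob finder in the repo may differ – this is just threshold plus connected components, with the area gates tuned by eye):

```cpp
// Hypothetical bright-blob finder: threshold a dark frame, then take
// the centroid of each connected component as an LED detection.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point2f> detectLeds(const cv::Mat &frameGray, double thresh = 128) {
  cv::Mat binary;
  cv::threshold(frameGray, binary, thresh, 255, cv::THRESH_BINARY);

  cv::Mat labels, stats, centroids;
  int n = cv::connectedComponentsWithStats(binary, labels, stats, centroids);

  std::vector<cv::Point2f> detections;
  for (int i = 1; i < n; ++i) {  // label 0 is the background
    int area = stats.at<int>(i, cv::CC_STAT_AREA);
    if (area < 2 || area > 400) continue;  // reject noise and huge glows
    detections.emplace_back(centroids.at<double>(i, 0), centroids.at<double>(i, 1));
  }
  return detections;
}
```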

For labeling: I elected to split the required 9 bits (for 500 LEDs) evenly across the r/g/b channels, encoding each 3-bit group as a 4-bit Hadamard codeword for a little error detection, and blink them out in a short sequence. Since these LEDs’ intensities vary strongly depending on how they’re viewed, and how many pine needles are in the way, bits are encoded as positive/negative changes in brightness. Sequence alignment is done by flashing all the LEDs bright white for a bit between each codeword.

When you need a prototype, anything can be a tree. Especially things that used to be trees.
Dog for scale. Yes, I gave her the treat after this.
Three sequences of decoded data for one LED over ~2.6s of video. Yellow sections mark where we search for edges.
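To make the encoding concrete, here’s a sketch of how a 9-bit ID can be split across channels and mapped to 4-bit codewords. The codeword table below is one valid length-4 Hadamard construction (minimum distance 2), not necessarily the exact table or framing used in the repo:

```cpp
// Sketch of the ID-to-blink encoding: split a 9-bit LED index into three
// 3-bit groups (one per color channel) and map each group to a 4-bit
// codeword.  A 1 means "step brighter", a 0 means "step darker".
#include <cstdint>

// 3 bits -> 4-bit codeword (augmented Hadamard construction, d = 2).
static const uint8_t kCodebook[8] = {
    0b0000, 0b0101, 0b0011, 0b0110,
    0b1111, 0b1010, 0b1100, 0b1001};

struct LedBlinkCode {
  uint8_t r, g, b;  // 4 bits each
};

LedBlinkCode encodeLedId(uint16_t id) {  // id in [0, 511]
  LedBlinkCode c;
  c.r = kCodebook[(id >> 6) & 0x7];
  c.g = kCodebook[(id >> 3) & 0x7];
  c.b = kCodebook[id & 0x7];
  return c;
}
```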

This does require tracking LEDs over several frames in order to interpret their IDs, but that’s done easily enough in image space, associating detections with their nearest neighbors frame to frame. Video is captured at 60FPS to help with this, and also to keep the blink sequence short in real time.

Detected and decoded LEDs + tracks over several frames. Also rejected points, usually from tree branches getting in the way.
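The association itself can be as simple as a gated nearest-neighbor match. The repo’s tracker may be more careful than this sketch (which ignores things like mutual consistency and new-track spawning), but at 60 FPS the inter-frame motion is small enough that something like this gets you most of the way:

```cpp
// Greedy frame-to-frame association: match each detection to the nearest
// existing track from the previous frame, within a pixel gate.
#include <opencv2/core.hpp>
#include <vector>

struct Track { int id; cv::Point2f lastPos; };

void associate(std::vector<Track> &tracks,
               const std::vector<cv::Point2f> &detections,
               std::vector<int> &detectionToTrack,  // -1 if unmatched
               float maxDistPx = 10.0f) {
  detectionToTrack.assign(detections.size(), -1);
  for (size_t d = 0; d < detections.size(); ++d) {
    float best = maxDistPx * maxDistPx;
    for (size_t t = 0; t < tracks.size(); ++t) {
      cv::Point2f diff = detections[d] - tracks[t].lastPos;
      float d2 = diff.dot(diff);
      if (d2 < best) { best = d2; detectionToTrack[d] = static_cast<int>(t); }
    }
    if (detectionToTrack[d] >= 0) tracks[detectionToTrack[d]].lastPos = detections[d];
  }
}
```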

Camera Calibration

While not strictly required for a bundle-adjustment problem like this, having a decent initial set of camera intrinsic parameters can simplify the process a bit. There is code in the repo for calibrating a pinhole-like camera with a ChArUco target, but any method that yields an approximately-correct calibration will do. The camera calibration will be refined by the solver anyway.
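A rough sketch of such a ChArUco calibration loop is below, using the opencv_contrib aruco module with its pre-OpenCV-4.7 API (newer versions restructure these calls). The board geometry here is an assumption – match whatever target you actually printed:

```cpp
// ChArUco calibration sketch: detect markers per frame, interpolate the
// chessboard corners, then calibrate over all collected views.
#include <opencv2/aruco/charuco.hpp>
#include <opencv2/opencv.hpp>
#include <vector>

void calibrateFromFrames(const std::vector<cv::Mat> &frames,
                         cv::Mat &cameraMatrix, cv::Mat &distCoeffs) {
  auto dict = cv::aruco::getPredefinedDictionary(cv::aruco::DICT_5X5_100);
  auto board = cv::aruco::CharucoBoard::create(7, 5, 0.04f, 0.03f, dict);

  std::vector<std::vector<cv::Point2f>> allCorners;
  std::vector<std::vector<int>> allIds;

  for (const cv::Mat &img : frames) {
    std::vector<std::vector<cv::Point2f>> markerCorners;
    std::vector<int> markerIds;
    cv::aruco::detectMarkers(img, dict, markerCorners, markerIds);
    if (markerIds.empty()) continue;

    std::vector<cv::Point2f> charucoCorners;
    std::vector<int> charucoIds;
    cv::aruco::interpolateCornersCharuco(markerCorners, markerIds, img, board,
                                         charucoCorners, charucoIds);
    if (charucoIds.size() >= 8) {
      allCorners.push_back(charucoCorners);
      allIds.push_back(charucoIds);
    }
  }

  cv::aruco::calibrateCameraCharuco(allCorners, allIds, board,
                                    frames.front().size(),
                                    cameraMatrix, distCoeffs);
}
```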

Initialization / Bootstrapping

Now that we have ~10k frames worth of LED observations, we need to start doing a 3D reconstruction – and to start doing any sort of optimization there, we need to initialize a few LEDs and viewpoints in a sane configuration.

Since we only have a monocular video, a vague belief in the continuity of time, and no gyro/accelerometer to give us other hints about motion, we must rely on optical data only – and we need to be a bit picky about it, else our results will be ambiguous or otherwise useless.

Here, the code:

  1. Looks for a pair of frames where motion of the points isn’t explainable by a planar homography. Specifically: cv::findHomography(…) should result in a lot of outliers, more than 50% of the observed points.
    This guarantees we are observing parallax effects between two “distant” captures, which is a necessary condition for getting a meaningful transform. It’s also the truth – Christmas trees are generally not flat.

  2. Estimates the essential matrix via Nistér’s 5-point method.
    OpenCV has a nice implementation of this, including RANSAC validation, in cv::findEssentialMat(…).

  3. Decomposes this essential matrix into a rotation and unit-magnitude translation, plus triangulations of observed points, via cv::recoverPose(…) – which, under the hood, decomposes the essential matrix via SVD, then applies some sanity checks to confirm which hypothesis is correct.


At this point, we have a reasonable initialization for the relative pose between two camera frames, plus a handful of points in 3D space (with arbitrary scale – assumed unit translation between cameras). We arbitrarily declare that the first camera is the origin of the world coordinate system.
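Strung together, the three steps above look roughly like this (the thresholds are illustrative, not the values used in the repo; K is the intrinsics matrix from calibration, and pts1/pts2 are pixel coordinates of the same LED IDs seen in both frames):

```cpp
// Two-view bootstrap sketch: accept an image pair only if a planar
// homography explains less than half the matches, then recover relative
// pose from the essential matrix.
#include <opencv2/opencv.hpp>
#include <vector>

bool bootstrapPair(const std::vector<cv::Point2f> &pts1,
                   const std::vector<cv::Point2f> &pts2,
                   const cv::Mat &K, cv::Mat &R, cv::Mat &t) {
  // 1. Reject pairs whose motion is well explained by a homography.
  cv::Mat hMask;
  cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, hMask);
  double inlierFrac = cv::countNonZero(hMask) / static_cast<double>(pts1.size());
  if (inlierFrac > 0.5) return false;  // too planar / too little parallax

  // 2. Essential matrix via the 5-point method, with RANSAC.
  cv::Mat eMask;
  cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0, eMask);

  // 3. Decompose into R and a unit-norm t, keeping the hypothesis that puts
  //    points in front of both cameras.  (The repo also triangulates here.)
  int nGood = cv::recoverPose(E, pts1, pts2, K, R, t, eMask);
  return nGood > 20;  // arbitrary "enough points survived" threshold
}
```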

Reconstruction Process

This process is more or less standard photogrammetry / bundle adjustment in the style of Bundler (Snavely et al, 2006/2007). Given our dataset of LED observations in frames, we do the following, at each step minimizing reprojection error of each observation:

  1. Initialize new viewpoints using the current mapped set of LEDs (via cv::solvePnP(…) ).

  2. Jointly solve all camera poses + LED points, including these new observations.

  3. Triangulate new LED points using these newly-added camera viewpoints.

  4. Remove any impossible points from the map (e.g. points that are behind cameras that claim to observe them).

  5. Jointly solve all camera poses and LED points, including any new LEDs we just added in step 3, minus the nonsense points we removed in step 4.


This process is repeated, levering up how many LEDs and camera frames we can consider in the problem, until we can’t add any more. At the end, we unlock the camera’s intrinsic calibration parameters and do a final solve over everything. The final output is a set of 3D points for every LED (known except for scale – this is a monocular problem). We also get the poses of every camera frame used, and a refined set of intrinsics for the camera, both of which are thrown away for this application.

This optimization is done using ceres-solver. One residual functor (projecting 3D LED coordinates into images & comparing vs measurements), plus some flags to freeze/unfreeze certain terms, is enough to run all of the above.
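The functor itself has the standard Ceres bundle-adjustment shape. The parameterization below (camera as angle-axis rotation plus translation, simple pinhole intrinsics with one focal length) is an assumption – the repo’s actual distortion model and block layout may differ – but it shows the idea:

```cpp
// Reprojection-error functor: rotate/translate the LED point into the
// camera frame, project it, and compare against the measured pixel.
#include <ceres/ceres.h>
#include <ceres/rotation.h>

struct ReprojectionError {
  ReprojectionError(double u, double v) : u_(u), v_(v) {}

  template <typename T>
  bool operator()(const T *const camera,      // [0-2] angle-axis, [3-5] translation
                  const T *const intrinsics,  // [0] f, [1] cx, [2] cy
                  const T *const point,       // 3D LED position
                  T *residuals) const {
    T p[3];
    ceres::AngleAxisRotatePoint(camera, point, p);
    p[0] += camera[3]; p[1] += camera[4]; p[2] += camera[5];

    // Pinhole projection.
    const T xp = p[0] / p[2];
    const T yp = p[1] / p[2];
    residuals[0] = intrinsics[0] * xp + intrinsics[1] - T(u_);
    residuals[1] = intrinsics[0] * yp + intrinsics[2] - T(v_);
    return true;
  }

  static ceres::CostFunction *Create(double u, double v) {
    return new ceres::AutoDiffCostFunction<ReprojectionError, 2, 6, 3, 3>(
        new ReprojectionError(u, v));
  }

  double u_, v_;
};
```

Each observation adds one residual block tying a camera pose block, the shared intrinsics block, and that LED’s point block together; keeping the intrinsics frozen until the final solve is just a matter of calling problem.SetParameterBlockConstant(…) on that block until the last pass.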

A 3D plot showing the coordinates of all 500 reconstructed LEDs.
Yeah, it’s a bit of a Charlie Brown tree. Not bad though – even got the LEDs trailing down towards the power supply.

Fudging Gaps & LED-Strand Assumptions

Given the nature of the capture, it’s likely that a few LEDs won’t be seen by the camera, and therefore won’t be reconstructed. We could leave these lights off, but that’s no fun.

For the sake of mathematical purity, I’ve avoided making any assumptions about the construction of the LED strand up to this point – but we can leverage its uniform spacing to paper over gaps, by linearly interpolating the position of missing LEDs between their reconstructed neighbors.

This is a rough approximation, but it works well enough in practice and is much better than leaving lights off everywhere.
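A minimal sketch of that interpolation, indexing LEDs by their position along the strand (clamping at the ends is a simplification of my own, not necessarily what the repo does):

```cpp
// Gap filling: for each unreconstructed LED, linearly interpolate between
// the nearest reconstructed neighbors along the strand.
#include <array>
#include <vector>

using Vec3 = std::array<float, 3>;

void fillGaps(std::vector<Vec3> &pos, const std::vector<bool> &known) {
  const int n = static_cast<int>(pos.size());
  for (int i = 0; i < n; ++i) {
    if (known[i]) continue;
    int lo = i - 1, hi = i + 1;
    while (lo >= 0 && !known[lo]) --lo;
    while (hi < n && !known[hi]) ++hi;
    if (lo < 0 && hi >= n) continue;             // nothing to interpolate from
    if (lo < 0)  { pos[i] = pos[hi]; continue; } // clamp at the ends
    if (hi >= n) { pos[i] = pos[lo]; continue; }
    float a = float(i - lo) / float(hi - lo);
    for (int k = 0; k < 3; ++k)
      pos[i][k] = (1.0f - a) * pos[lo][k] + a * pos[hi][k];
  }
}
```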

A similar constraint could also be used to deal with any fly-away outliers during the reconstruction, since the wire length does place a maximum on the distance between any two neighboring LEDs. I didn’t find this necessary in my implementation, but it could work well as a sanity check or regularization. It might also be handy for applying real scale to the tree.

Alignment

This reconstructed tree is in an arbitrary coordinate system – the orientation of the first camera image, whatever that was. This is not particularly convenient for animations.

After reconstruction, we solve for a rotation/translation that:

  1. Minimizes distance from every point to the origin, and
  2. Minimizes distance from every point to the Z axis.


While no tree is perfect, this centers and aligns the vaguely-conical point cloud with an up/down axis.
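One concrete way to realize this – an assumption on my part; the repo may set it up as an explicit least-squares solve instead – is to translate by the centroid (which minimizes the summed squared distance to the origin) and then rotate the cloud’s dominant principal axis onto Z (which concentrates the largest spread along the up/down axis, minimizing distance to the Z axis):

```cpp
// Align the point cloud: centroid to the origin, principal axis to +Z.
#include <Eigen/Dense>
#include <vector>

void alignToAxis(std::vector<Eigen::Vector3d> &pts) {
  // Centroid -> origin.
  Eigen::Vector3d centroid = Eigen::Vector3d::Zero();
  for (const auto &p : pts) centroid += p;
  centroid /= static_cast<double>(pts.size());

  // Covariance of the centered cloud.
  Eigen::Matrix3d cov = Eigen::Matrix3d::Zero();
  for (const auto &p : pts) {
    Eigen::Vector3d d = p - centroid;
    cov += d * d.transpose();
  }

  // Rotate the largest-eigenvalue direction onto +Z
  // (SelfAdjointEigenSolver sorts eigenvalues in ascending order).
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> eig(cov);
  Eigen::Vector3d axis = eig.eigenvectors().col(2);
  Eigen::Quaterniond R =
      Eigen::Quaterniond::FromTwoVectors(axis, Eigen::Vector3d::UnitZ());

  for (auto &p : pts) p = R * (p - centroid);
}
```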

Firmware & Animations

For convenient access in firmware, I wrote a script that converts the output into a header with LED coordinates, both as floats and as 16p16 fixed-point values handy for libraries like FastLED.

For example: FastLED has a nice 16p16 fixed-point Perlin noise implementation, which I used to make an easy “snowfall”-like animation by sliding the point cloud through a noise volume, then applying a gradient on the result. This does make for a lovely subtle effect that still shows off the 3D – but, since that translates poorly to 2D video, I also added some plane sweeps to really sell the effect.
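In code, the idea looks roughly like this, building on the setup sketch earlier. The generated header name, the coordinate scaling, and the palette are all assumptions – only the sample-noise-at-each-LED structure is the point:

```cpp
// "Snowfall" sketch: sample 3D noise at each LED's coordinate, slide the
// cloud through the noise volume over time, and map through a palette.
#include <FastLED.h>
#include "led_coords.h"  // hypothetical generated header: int32_t led_coords_fp[NUM_LEDS][3]

extern CRGB leds[NUM_LEDS];  // defined in the setup sketch above

CRGBPalette16 snowPalette(CRGB::Black, CRGB(16, 24, 48), CRGB::White);

void snowfallFrame(uint32_t t) {
  for (int i = 0; i < NUM_LEDS; ++i) {
    // Scale the 16.16 coordinates into noise space, drifting the cloud
    // through the volume along Z as time advances.
    uint32_t x = led_coords_fp[i][0] >> 8;
    uint32_t y = led_coords_fp[i][1] >> 8;
    uint32_t z = (led_coords_fp[i][2] >> 8) + t * 40;
    uint16_t n = inoise16(x, y, z);
    leds[i] = ColorFromPalette(snowPalette, n >> 8);
  }
  FastLED.show();
}
```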

Fun fact: I actually did most of this project last year, and got it working more or less on Christmas Eve. Can’t exactly make a blog post about this on New Year’s, right? I’m looking forward to playing with the animations a bit more this holiday season. 🎄
