Physical Time of Flight Sensing with Airsoft and HTC Vive

Inspired by a completely serious discussion about time of flight ranging in general:

The idea is to get 3D point clouds of objects by measuring a physical projectile’s time of flight, as opposed to existing methods that are practical or precise.  The original idea was to use a potato cannon – either using a spud as the projectile or, for maximum mockery, launching a real precision ToF imaging system instead.

In this post, that concept is expanded to produce full 3D point clouds of objects, using an Airsoft gun, an HTC Vive, and a microphone.

Physical Setup

The “depth sensor” here is a cheap electric Airsoft gun, with a Vive controller attached to the barrel, and a microphone attached nearby.

Painter’s tape, for precision assembly.

The general idea is:

  1. A user aims at a target and pulls the trigger.
  2. The microphone picks up the sound of the airsoft pellet firing (spring snap, etc).  We record:
    1. The pose of the Vive controller at the time of launch
    2. The timestamp of the “launch” sound
  3. The pellet impacts the target some time later, making a sound that is picked up by the microphone.  This impact timestamp is also recorded.
  4. Time of flight of the pellet is calculated, which gives us the linear distance the pellet flew.
  5. The impact point in 3D space is drawn by:
    1. Getting the “launch” pose of the Airsoft gun (rigid body transform on top of the recorded Vive controller pose)
    2. Extending a line from the recorded “launch” pose, with length equal to projectile_velocity * time_of_flight

Do that a bunch of times, and in theory we get a point cloud of the object.
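
In code form, step 5 boils down to a single line.  Here’s a minimal sketch using Unity’s Vector3, where launchPosition, launchDirection, and projectileVelocity are placeholder names for the recorded Vive pose and the assumed (constant) pellet speed; the actual rendering code appears later in the post.

// Sketch of step 5: extend a ray from the launch pose by the distance flown.
// launchPosition, launchDirection, and projectileVelocity are placeholder names.
Vector3 impactPoint = launchPosition + launchDirection * (projectileVelocity * timeOfFlight);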

Implementation

This project was done in Unity – primarily for easy access to the SteamVR plugin, but also for the microphone APIs, and the fact that the point clouds could be easily visualized in a VR headset.

The relevant source code is all up on Github, but given that this project involves fast-moving projectiles and a mess in the living room – not to mention potential difficulty in replicating the precision tape-and-Vive-controller assembly, or differences in audio setup – I don’t plan to upload a fully working project.  Use this at your own risk, etc.

When capturing data, we record raw information about each shot – namely, the pose of the Vive controller when the shot happened, plus the projectile’s time of flight.  This allows us to manually adjust the “calibration” (transform between barrel + controller, plus shot velocity) after the fact.  More on this later.

// Data for the reconstructed point cloud
class AirsoftPoint
{
    public Vector3 shotPosition;  // Controller position at the moment of launch
    public Vector3 shotDirection; // Barrel (controller forward) direction at launch
    public float timeOfFlight;    // Seconds between barrel exit and impact
}

List<AirsoftPoint> pointCloud; // Contains info about shots/impacts

The fundamental detection code is a simple peak-amplitude detector, which operates on the absolute value of the audio samples.  In a given sample buffer, find amplitudes > 0.5 that are separated by some small amount of time (NONMAX_SUPPRESS_TIME_S, here 0.025 seconds), and add their timestamps to a list for later processing with a state machine.

// Iteratively find peaks separated by at least NONMAX_SUPPRESS_TIME_S
List<float> peakTimes = new List<float>();

while(true)
{
    float peakVal = 0.0f;
    int peakIdx = -1;
    for(int i = 0; i < numSamples; i++)
    {
        if(peakVal < dSamples[i])
        {
            peakVal = dSamples[i];
            peakIdx = i;
        }
    }

    if (peakVal > 0.5)
    {
        // convert sample index to time & keep track of this event
        float peakTime = tCurrSample - 
                         ((float)(numSamples - peakIdx)) / (float)(MIC_SAMPLE_RATE);
        peakTimes.Add(peakTime);

        // enforce NONMAX_SUPPRESS_TIME_S
        int nSuppressSamples = (int)(NONMAX_SUPPRESS_TIME_S * MIC_SAMPLE_RATE + 0.5);
        int startSuppressWindow = peakIdx - nSuppressSamples;
        int endSuppressWindow = peakIdx + nSuppressSamples;

        startSuppressWindow = startSuppressWindow < 0 ? 0 : startSuppressWindow;
        endSuppressWindow = endSuppressWindow >= numSamples ? numSamples - 1 : endSuppressWindow;

        for (int i = startSuppressWindow; i <= endSuppressWindow; i++)
            dSamples[i] = 0;
    }
    else
    {
        // Nothing else interesting here
        break;
    }
}

One of the first observations I made is that each airsoft pellet seems to make three sounds: two with constant timing when the trigger is pulled, then a third with variable timing depending on distance.

I’m not entirely sure what causes the first two noises. I expected one – perhaps a spring releasing, then air coming out the barrel as the pellet exits? – but the good news is their timing is consistent, so a simple state machine can handle detecting the pair of sounds for each launch.
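
For reference, the shot-tracking states used below look roughly like this (the enum is one of the types omitted from these snippets, so this is a reconstruction rather than the project’s exact definition):

// Shot-tracking states, inferred from the state machine below
enum ShotTrackState
{
    Idle,         // Waiting for the "spring fired" sound
    SpringFired,  // Heard the spring; waiting for the barrel-exit sound
    BarrelExited, // Pellet in flight; waiting for the impact sound
    Impacted      // Impact heard; shot recorded
}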

if(peakTimes.Count >= 3)
{
    peakTimes.Sort();

    foreach(float eventTime in peakTimes)
    {
        switch(currState)
        {
            case ShotTrackState.Idle:
                lastSpringFiredTime = eventTime;
                Debug.Log("Detected spring fired at " + lastSpringFiredTime);
                currState = ShotTrackState.SpringFired;
                break;

            case ShotTrackState.SpringFired:
                Debug.Log("Detected barrel exit at T+" + 
                          1000*(eventTime - lastSpringFiredTime));
                lastBarrelExitedTime = eventTime;
                currState = ShotTrackState.BarrelExited;

                break;

            case ShotTrackState.BarrelExited:
                lastImpactedTime = eventTime;
                Debug.Log("Detected shot: npeaks=" + peakTimes.Count +
                    " lastSpringFiredTime " + lastSpringFiredTime + 
                    " +barrel " + (1000*(lastBarrelExitedTime - lastSpringFiredTime)) + 
                    " +impact " + (1000*(lastImpactedTime - lastBarrelExitedTime)));
                currState = ShotTrackState.Impacted;

                // Record pose.
                AirsoftPoint pt = new AirsoftPoint();
                pt.shotPosition = trackedDevice.transform.position;
                pt.shotDirection = trackedDevice.transform.forward;
                pt.timeOfFlight = lastImpactedTime - lastBarrelExitedTime;
                pointCloud.Add(pt);
                renderedCloud.Add(makeNewPoint());
                break;
        }
    }
}

(The full code is here, including types missing from the above snippets.)

“Calibration”

The exact 6-DoF transformation between two objects held together with painter’s tape is unknown, other than vague axis alignment.  (This would also be true of a real precision assembly, but I digress.)

To allow the calibration to be tweaked after the fact, the state machine that tracks shots stores only timing information and the Vive controller’s pose at the time of the shot – leaving the velocity of the pellet as an adjustable parameter, applied when the point cloud is rendered.

In the Update() loop for the rendering object:

// Place points at impact locations
for(int i = 0; i < pointCloud.Count; i++)
{
    renderedCloud[i].transform.position = pointCloud[i].shotPosition +
        shotVelocity * pointCloud[i].timeOfFlight * pointCloud[i].shotDirection;
}

Here, renderedCloud is a list of GameObjects (sphere primitives), one created each time a new impact is registered.
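
The makeNewPoint() helper isn’t shown above; a minimal version (my sketch – the project’s actual helper may differ) just spawns a small sphere primitive:

// Hypothetical makeNewPoint(): spawn a small sphere to mark one impact
GameObject makeNewPoint()
{
    GameObject sphere = GameObject.CreatePrimitive(PrimitiveType.Sphere);
    sphere.transform.localScale = Vector3.one * 0.02f; // ~2 cm marker
    return sphere;
}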

Then, to calibrate, one simply shoots at the same point from a variety of distances and locations, then regresses a shotVelocity that makes the grouping of impact points as tight as possible.  A numerical minimization of the variance of the impact positions would likely work well.  I just hooked up shotVelocity to a slider and eyeballed it.
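
If one did want to automate that, a brute-force sweep over shotVelocity would be plenty.  Here’s a hedged sketch – the function name, the velocity range, and the assumption that every shot in the list was aimed at the same spot are all mine, not the project’s:

// Sketch: find the shotVelocity that minimizes the spread of reconstructed
// impact points, assuming all shots in the list were aimed at the same spot.
// The function name and velocity range are invented for illustration.
float CalibrateShotVelocity(List<AirsoftPoint> shotsAtOneTarget,
                            float minVel = 20.0f, float maxVel = 80.0f, float step = 0.1f)
{
    float bestVel = minVel;
    float bestVariance = float.MaxValue;

    for(float v = minVel; v <= maxVel; v += step)
    {
        // Reconstruct the impact points for this candidate velocity
        List<Vector3> impacts = new List<Vector3>();
        Vector3 mean = Vector3.zero;
        foreach(AirsoftPoint pt in shotsAtOneTarget)
        {
            Vector3 impact = pt.shotPosition + v * pt.timeOfFlight * pt.shotDirection;
            impacts.Add(impact);
            mean += impact;
        }
        mean /= impacts.Count;

        // Variance = mean squared distance from the centroid
        float variance = 0.0f;
        foreach(Vector3 impact in impacts)
            variance += (impact - mean).sqrMagnitude;
        variance /= impacts.Count;

        if(variance < bestVariance)
        {
            bestVariance = variance;
            bestVel = v;
        }
    }

    return bestVel;
}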

In the spirit of eyeballing things, I did not correct for the 6-DoF offset between the controller and the end of the barrel.  In Unity, the Vive controller’s (0,0,0) point is inside the circle at the top of the controller, so I opted to just fiddle with the tape until things lined up well.  That said, it would be straightforward to expand the correction above to add rotations/translations of the origin pose, as sketched below.
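
For example (a sketch only – it assumes AirsoftPoint also stores the controller’s rotation at launch as a Quaternion shotRotation, which the class above doesn’t currently have, and the offset values are made up):

// Hypothetical extension of the render loop: apply a hand-tuned barrel
// offset/rotation on top of the recorded controller pose
Vector3 barrelLocalOffset = new Vector3(0.0f, -0.05f, 0.15f);        // made-up offset
Quaternion barrelLocalRotation = Quaternion.Euler(2.0f, 0.0f, 0.0f); // made-up tilt

for(int i = 0; i < pointCloud.Count; i++)
{
    AirsoftPoint pt = pointCloud[i];
    Vector3 correctedPosition = pt.shotPosition + pt.shotRotation * barrelLocalOffset;
    Vector3 correctedDirection = pt.shotRotation * barrelLocalRotation * Vector3.forward;
    renderedCloud[i].transform.position =
        correctedPosition + shotVelocity * pt.timeOfFlight * correctedDirection;
}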

Helpfully, impact points found via this method are registered to real 3D space – so comparing points against the physical location of the target worked as a good hint when aligning.

Results

I set up a few cardboard boxes in the living room for scanning tests.  While this system won’t be replacing real depth sensors anytime soon, it did do better than expected, with roughly +/- 5 cm of error.  Not enough for an accurate reconstruction, but enough to tell the difference between foreground and background.

I can only assume that changes to the target are due to quantum effects of the observations. Specifically, 0.2 grams per observation.

Improvements

This was implemented quickly, for a joke; there’s lots of opportunity for improvement.  I may revisit the project just as soon as I stop finding airsoft pellets all over the living room.

Error in launch angle and velocity

The Airsoft gun used here was selected because it was cheap on Amazon.  It would seem considerably more effort went into making it look “tactical” than making it functional, accurate, or consistent.

Based on the audio signal and a few tests, I’m pretty sure that pellets are randomly impacting the inside of the barrel, causing random variation in shot velocity and angle that is not accounted for when drawing the point cloud.  This adds randomness to the impact point.  In a test where the Airsoft gun was clamped in place, shots grouped within a ~2 cm radius at ~1.5 m distance – not all that great.

Error in calibration

I did not devise any numerical optimization method for determining the rigid body transform between the Vive controller and the Airsoft gun.  Instead, launch angle and offsets were tweaked manually (by fiddling with tape) and velocity was tweaked manually in the editor, to yield a point cloud in VR that looked decent.

Error in ToF measurement

It’s also true that the microphone used here is on an old Logitech headset that I got for free with a mouse once; its delays – while probably small relative to the pellet’s flight time – aren’t accounted for.  More generally, I don’t have a precise timing relationship between the trigger pull, the audio samples, and the Vive HMD; I simply use the controller pose from the “current frame” (i.e. the Update() call that detects the launch), and try to hold reasonably still when shooting.  All of this adds error to the launch pose that couples directly into point cloud error, even if the timing between launch and impact audio samples is assumed to be consistent.

Speaking of which – the detection of launch and impact sounds is very simplistic, finding peaks directly on audio samples.  This is expected to add some error to the time of flight of each projectile.  It might be possible to create a better timing reference by placing an optical break sensor across the muzzle, then using it to short out the audio signal when a pellet leaves the barrel, etc.

FAQ

…Why?

Why not?

Why didn’t you use [more precise ranging method/device] instead?

Yes, there are many, many better ways to get point clouds or 3D geometry of things than shooting little plastic balls at them – especially if one of those precision devices were attached to a precision tracking system like Lighthouse.  But those methods aren’t as funny.

Can I use this to 3D scan people / [thing]?

In theory anything that makes a loud noise with predictable timing after being hit with an airsoft pellet, while remaining still, should work.  Inanimate, inexpensive objects are strongly recommended.

On most person-sized objects I figure you’ll need >1000 samples on a completely stationary target to get a recognizable point cloud.  I’ll leave it to the reader to convince their friends to participate.

*I accept no responsibility for any injuries, damage, or loss of friendship resulting from the reader’s choice of target.

Can we talk about commercial applications for this technology?

…If you actually find a meritful use for this, or are willing to fund it regardless, I’m all ears.


Disclaimer: This whole thing was done as a joke.  I make real optical depth sensors at work – Structure Sensor and Structure Core.  To match the performance of one of those, this system would need to shoot an object with ~1.2 million pellets per “frame” at 30+ frames per second, which sounds unpleasant, expensive, and environmentally unfriendly. 

That said, if somebody did make something that could accurately launch ~7400kg of compostable airsoft pellets per second, it might revolutionize the landscaping industry.  
