Working with depth cameras in interactive installations often requires detecting when people enter specific 3D regions. While 2D bounding boxes are common, true 3D zone detection unlocks more sophisticated spatial interactions. In this post, I'll share my experience building a complete zone detection and visualization system using OAK-D cameras and TouchDesigner's Script CHOP.
Understanding the OAK-D Coordinate System
The first challenge was understanding how OAK-D reports 3D positions. Unlike simple 2D tracking, the OAK-D uses an IMU-stabilized world coordinate system:
- X axis: Horizontal, +X = right (millimeters)
- Y axis: Vertical, +Y = up, -Y = down (millimeters)
- Z axis: Depth, +Z = away from camera (millimeters)
A critical insight: camera mounting orientation only affects the video feed, not the 3D coordinates. The IMU ensures coordinates remain world-referenced regardless of how the camera is physically mounted. This means your zone definitions stay consistent even if you rotate or flip the camera.
# Example detection from OAK-D
# Person at x=95mm, y=-802mm, z=2778mm
# → 2.78m from camera, slightly right, below optical axis (standing)
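Because the coordinates are world-referenced, zone membership reduces to an axis-aligned bounds check. A minimal sketch, using the xmin/xmax/ymin/ymax/zmin/zmax bounds convention and an illustrative zone (values in millimeters, not from a real installation):

```python
# Axis-aligned 3D zone test in OAK-D world coordinates (millimeters).
def point_in_zone(x, y, z, zone):
    return (zone['xmin'] <= x <= zone['xmax'] and
            zone['ymin'] <= y <= zone['ymax'] and
            zone['zmin'] <= z <= zone['zmax'])

# Hypothetical zone: 1m wide, floor-to-chest height, 1.5m deep slab
entry_zone = {'xmin': -500, 'xmax': 500,
              'ymin': -1500, 'ymax': 500,
              'zmin': 2000, 'zmax': 3500}

# The example detection above falls inside this zone
print(point_in_zone(95, -802, 2778, entry_zone))  # → True
```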
The Script CHOP Parameter Gotcha
TouchDesigner's Script CHOP is powerful for custom data processing, but there's a subtle API detail that cost me debugging time. When creating custom parameters, the appendX() functions return a ParGroup, not a Parameter directly:
# ❌ WRONG - fails silently or raises an error
p = page.appendMenu('Preset', label='Camera Preset')
p.menuNames = ['option_a', 'option_b'] # ParGroup doesn't have menuNames!
# ✅ CORRECT - Access the first parameter in the ParGroup
p = page.appendMenu('Preset', label='Camera Preset')
p[0].menuNames = ['option_a', 'option_b']
p[0].menuLabels = ['Option A', 'Option B']
p[0].default = 'option_a'
This pattern applies to all parameter types: appendInt, appendFloat, appendMenu, appendToggle, and operator references like appendDAT, appendCHOP, appendTOP.
Calculating Camera Intrinsics from FOV
Rather than hardcoding camera intrinsics or requiring manual calibration, I built a preset system that calculates intrinsics from the camera's field of view specifications:
import math

OAK_CAMERA_PRESETS = {
    'oak_d_pro_af': {'name': 'OAK-D Pro (Auto-Focus)', 'hfov': 66, 'vfov': 54},
    'oak_d_pro_ff': {'name': 'OAK-D Pro (Fixed-Focus)', 'hfov': 69, 'vfov': 55},
    'oak_d_sr': {'name': 'OAK-D SR (Short Range)', 'hfov': 80, 'vfov': 55},
}

def calculate_intrinsics(preset_key, width, height):
    """Derive pinhole intrinsics (in pixels) from FOV specs at a given resolution."""
    preset = OAK_CAMERA_PRESETS[preset_key]
    hfov_rad = math.radians(preset['hfov'])
    vfov_rad = math.radians(preset['vfov'])
    fx = (width / 2.0) / math.tan(hfov_rad / 2.0)
    fy = (height / 2.0) / math.tan(vfov_rad / 2.0)
    cx, cy = width / 2.0, height / 2.0  # principal point at image center
    return {'fx': fx, 'fy': fy, 'cx': cx, 'cy': cy}
This approach lets users simply select their camera model and resolution from dropdowns, with the math handled automatically.
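As a quick sanity check of the FOV math, here is the same computation inlined for a hypothetical 1280x800 stream with the Pro AF preset (hfov=66, vfov=54):

```python
import math

# Inline the intrinsics math for one assumed case: 1280x800, hfov=66, vfov=54
width, height, hfov, vfov = 1280, 800, 66, 54
fx = (width / 2.0) / math.tan(math.radians(hfov) / 2.0)
fy = (height / 2.0) / math.tan(math.radians(vfov) / 2.0)
print(round(fx, 1), round(fy, 1))  # → 985.5 785.0
```

A focal length of roughly 985 pixels for a 66° horizontal FOV at 1280 pixels wide is in the expected ballpark, which is a useful gut check before trusting the projection downstream.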
3D to 2D Projection: The Y-Axis Trap
Projecting 3D zone boundaries onto the 2D camera feed required careful attention to coordinate system conventions. The key insight is that image space has Y+ pointing down, while the OAK-D world space has Y+ pointing up:
def project_to_screen(point3d, camera):
    x, y, z = point3d
    if z <= 0:
        return None  # Behind camera
    # Project to pixel coordinates
    # Note: Y is inverted between world and image space
    px = (x * camera['fx'] / z) + camera['cx']
    py = camera['cy'] - (y * camera['fy'] / z)  # Subtract, not add!
    # Normalize to TouchDesigner's -0.5 to 0.5 range
    nx = px / camera['width'] - 0.5
    ny = 0.5 - py / camera['height']
    return (nx, ny)
Getting this wrong results in mirrored wireframes that don't align with detected people—a frustrating bug to track down.
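A quick way to verify the convention: a point above the optical axis must land in the upper half of the normalized image (ny > 0). This sketch reproduces the projection with illustrative intrinsics (1280x800, fx/fy values are assumed, not measured):

```python
# Assumed intrinsics for a 1280x800 image; values are illustrative
camera = {'fx': 985.5, 'fy': 785.0, 'cx': 640.0, 'cy': 400.0,
          'width': 1280, 'height': 800}

def project_to_screen(point3d, camera):
    x, y, z = point3d
    if z <= 0:
        return None  # Behind camera
    px = (x * camera['fx'] / z) + camera['cx']
    py = camera['cy'] - (y * camera['fy'] / z)  # Y inverted: world up = image up
    nx = px / camera['width'] - 0.5
    ny = 0.5 - py / camera['height']
    return (nx, ny)

# A point 500mm ABOVE the optical axis, 2.778m out, on the center line
nx, ny = project_to_screen((0, 500, 2778), camera)
print(nx, ny > 0)  # → 0.0 True (centered horizontally, upper half vertically)
```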
CHOP to GLSL: Channel Naming Matters
Passing data from Script CHOP to GLSL via CHOP To TOP revealed another gotcha: channels are mapped to RGBA in alphabetical order. Using descriptive names like x1, y1, x2, y2 can result in unexpected channel ordering.
# ❌ Alphabetical order: a, b, x1, x2, y1, y2 — not what we want!
scriptOp.appendChan('x1')
scriptOp.appendChan('y1')
scriptOp.appendChan('x2')
scriptOp.appendChan('y2')
# ✅ Use r,g,b,a to guarantee RGBA mapping
rChan = scriptOp.appendChan('r') # → R channel (x1)
gChan = scriptOp.appendChan('g') # → G channel (y1)
bChan = scriptOp.appendChan('b') # → B channel (x2)
aChan = scriptOp.appendChan('a') # → A channel (y2)
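Outside of TouchDesigner, the packing scheme can be sketched in plain Python: one CHOP sample per wireframe segment, with the four channel arrays carrying (x1, y1, x2, y2). The segment values here are illustrative:

```python
# Pack one wireframe segment per CHOP sample; after CHOP To TOP, each
# texel's RGBA holds (x1, y1, x2, y2). Segment endpoints are illustrative.
segments = [((-0.2, -0.1), (0.2, -0.1)),
            ((0.2, -0.1), (0.2, 0.3))]

r = [p1[0] for p1, p2 in segments]  # x1 → R channel
g = [p1[1] for p1, p2 in segments]  # y1 → G channel
b = [p2[0] for p1, p2 in segments]  # x2 → B channel
a = [p2[1] for p1, p2 in segments]  # y2 → A channel

# Inside the Script CHOP's onCook, these would be written roughly as:
#   scriptOp.numSamples = len(segments)
#   scriptOp.appendChan('r').vals = r   # and likewise for g, b, a
print(len(r))  # → 2 (one sample per segment)
```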
In the GLSL shader, sample with proper texel addressing:
// CHOP To TOP: width = numSamples, height = 1
int numLines = textureSize(sTD2DInputs[1], 0).x;
float texX = (float(i) + 0.5) / float(numLines);
vec4 lineData = texture(sTD2DInputs[1], vec2(texX, 0.5));
vec2 p1 = lineData.rg; // x1, y1
vec2 p2 = lineData.ba; // x2, y2
GLSL Uniform Defaults
GLSL uniforms default to zero, which can cause invisible rendering if you forget to set them in TouchDesigner:
uniform float uLineWidth; // Defaults to 0 = invisible lines!
uniform float uOpacity;

void main() {
    // Always provide fallback defaults
    float lineWidth = uLineWidth > 0.0 ? uLineWidth : 2.0;
    float opacity = uOpacity > 0.0 ? uOpacity : 0.8;
    // ...
}
This defensive pattern saves time when testing shaders before all parameters are properly connected.
Texture-Based Digit Rendering
For zone labels, I initially considered SDF-based procedural digit rendering. However, a texture atlas approach proved simpler and more flexible:
// Digit texture: 50x500 PNG with 0-9 stacked vertically
float sampleDigit(vec2 localUV, int digit) {
    digit = clamp(digit, 0, 9);
    float digitHeight = 0.1; // Each digit = 1/10th of texture
    vec2 texUV;
    texUV.x = localUV.x;
    texUV.y = float(digit) * digitHeight + localUV.y * digitHeight;
    return texture(sTD2DInputs[4], texUV).r;
}
This allows using any font by simply swapping the texture, without touching shader code.
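On the CPU side, the labels Script CHOP needs to hand the shader one digit index per glyph. A hypothetical helper (the function name and two-digit limit are my assumptions, not from the project code) for splitting a zone number into digit indices, most significant first:

```python
# Split a zone number into per-glyph digit indices for the shader,
# clamped to a fixed digit count (assumed two digits here).
def digits_of(n, max_digits=2):
    n = max(0, min(n, 10 ** max_digits - 1))  # clamp to displayable range
    return [(n // 10 ** i) % 10 for i in range(max_digits - 1, -1, -1)]

print(digits_of(7))   # → [0, 7]
print(digits_of(42))  # → [4, 2]
```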
Architecture Overview
The final system uses three Script CHOP instances feeding into a single GLSL TOP:
┌─────────────┐
│ Zones Table │ (DAT: name, zone, xmin, xmax, ymin, ymax, zmin, zmax)
└──────┬──────┘
│
├────────────────┬────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Zone Detect │ │ Zone Lines │ │ Zone Labels │
│ Script CHOP │ │ Script CHOP │ │ Script CHOP │
│ (triggers) │ │ (wireframe) │ │ (numbers) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
To Logic CHOP To TOP CHOP To TOP
│ │
└───────┬────────┘
▼
┌─────────────┐
│ GLSL TOP │ ← Camera + Digits Texture
└─────────────┘
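For reference, the Zones Table DAT driving all three Script CHOPs might look like this (bounds in millimeters; the zones shown are illustrative, not from a real installation):

```
name    zone  xmin   xmax  ymin   ymax  zmin  zmax
entry   1     -500   500   -1500  500   1000  2500
stage   2     -1500  1500  -1500  500   2500  4500
```

Keeping the zone definitions in a single table means adding or resizing a zone updates the trigger logic, wireframes, and labels simultaneously.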
Conclusion
Building this system reinforced several lessons: always verify coordinate system conventions early, defensive defaults save debugging time, and TouchDesigner's flexibility comes with API subtleties worth documenting. The combination of Script CHOPs for data processing and GLSL for visualization provides a powerful pattern for custom computer vision overlays.
The complete code handles variable zone counts, camera presets with automatic intrinsics calculation, and real-time active zone highlighting. For installations requiring spatial awareness beyond simple presence detection, this architecture provides a solid foundation.