Archive for the ‘Research’ Category

Evaluation of Logitech C910 webcam for Computer Vision use

Friday, April 1st, 2011

I’ve recently been using a pair of PlayStation 3 Eyes for reading structured light patterns projected onto objects. These particular cameras have had a lot of attention from hackers due to their price/performance ratio.

The PS3Eye is a camera built for machine vision, and can provide ‘lossless’ 640×480 RGB frames at 60 frames per second with low latency, which makes it particularly relevant for realtime tracking applications (e.g. multi-touch, 3-phase scanning). For OSX there is the macam driver, and for Windows there is the fully featured CL-Eye driver from AlexP, which supports programmatic control of multiple cameras (including all camera features and a GUID identity for each camera) and is free for 1 or 2 cameras per system.

But for a recent project, it became apparent that I needed resolution rather than framerate. My first instinct was to move to DSLRs, and I began developing a libgphoto2 addon for openFrameworks called ofxDSLR. This route had the following issues:

  • Relatively expensive (compact cameras do not support remote capture, meaning I would have to use DSLRs; the cheapest compatible option is around £350 with a lens – a Canon 1000D with 18-55mm lens)
  • Requires external power supply / recharging
  • Heavier than machine vision cameras
  • Flaky libraries (libgphoto2 isn’t really built with CV in mind, so it was taking a lot of time to get results, and there is a lack of solutions for Windows)
  • Slow capture (several seconds between sending the capture command and receiving the full result)

A DSLR offers:

  • Fantastic resolution
  • Great optics
  • Programmatic control of ISO, Focus, Shutter, Aperture
  • More than 8 bits per colour channel

Due to the above issues, I decided to explore other options. This led me to the Logitech C910, which supports continuous capture at roughly 20x as many pixels as the PS3Eye, but at around 1/120th of the frame rate.

Without further ado, here’s the video documentation (I recommend you choose either 720p or 1080p for viewing).

[Embedded YouTube video]

Notes:

Capture

Driver

  • UVC device (capture supported on all major desktop OSes)
  • As of 1st April 2011, there is no way to programmatically control the C910 from OSX, but this is likely to come soon (see here and here)
  • Programmatic control from Windows through DirectShow. I recommend Theo Watson’s videoInput class for C++, which is included with openFrameworks or available as a standalone library (see the sketch after this list).
  • I haven’t yet seen a way to uniquely identify a camera (each PS3Eye can report its GUID identity, which is useful for recognising individual cameras in a multicam system)
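Here’s a minimal capture sketch with videoInput to show what the DirectShow route looks like. It assumes the C910 enumerates as device 0 and that the driver will honour a 1920×1080 request; both are assumptions, and error handling is omitted.

    // Minimal videoInput capture sketch for Windows (DirectShow).
    // Assumes the C910 appears as device 0 -- check the listDevices() output.
    #include "videoInput.h"
    #include <vector>

    int main() {
        videoInput VI;
        VI.listDevices();                        // print the attached capture devices

        const int device = 0;                    // assumed index for the C910
        VI.setupDevice(device, 1920, 1080);      // request full HD; the driver may negotiate down

        std::vector<unsigned char> pixels(VI.getSize(device));  // width * height * 3 bytes (RGB)

        for (int frame = 0; frame < 300; ++frame) {
            if (VI.isFrameNew(device)) {
                VI.getPixels(device, pixels.data(), true, true); // flip red/blue and flip vertically
                // ... hand the RGB buffer to your CV code here ...
            }
        }

        VI.stopDevice(device);
        return 0;
    }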

Compression options

  • YUY2 (YUV 4:2:2) = lossless luminance, half-resolution colour
  • MJPG = lossy, but supports higher frame rates than YUY2, since it requires less bandwidth (see the sketch below)
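For illustration, here’s how one might request the MJPG stream at 1080p. The post uses videoInput, but this sketch uses OpenCV’s VideoCapture (not mentioned in the post) purely because it exposes the FOURCC request in a couple of lines; the device index and 30fps figure are assumptions.

    // Request MJPG so the USB bandwidth allows 1080p at a usable frame rate.
    // OpenCV sketch for illustration only; device 0 and 30fps are assumptions.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0, cv::CAP_DSHOW);                 // DirectShow backend on Windows
        cap.set(cv::CAP_PROP_FOURCC,
                cv::VideoWriter::fourcc('M', 'J', 'P', 'G'));   // ask for the compressed stream
        cap.set(cv::CAP_PROP_FRAME_WIDTH, 1920);
        cap.set(cv::CAP_PROP_FRAME_HEIGHT, 1080);
        cap.set(cv::CAP_PROP_FPS, 30);                          // YUY2 at this size caps out much lower

        cv::Mat frame;
        while (cap.read(frame)) {
            cv::imshow("C910", frame);
            if (cv::waitKey(1) == 27) break;                    // Esc to quit
        }
        return 0;
    }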

Programmatic control of

  • Motorised focus
  • Shutter speed (aka exposure)
  • Gain (aka brightness)
  • ‘Hacky’ Region of Interest (ROI) through digital Zoom, Pan and Tilt (a control sketch follows this list)
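Here’s what that control looks like through videoInput, assuming its setVideoSettingCamera / setVideoSettingFilter calls and prop* members; the values passed in are placeholders, not ranges confirmed against the C910 driver (the focus value matches the table below).

    // Programmatic control of the C910 through videoInput / DirectShow.
    // Property names follow videoInput's public members; the values are
    // placeholders -- query the driver for the real ranges before using them.
    #include "videoInput.h"

    void configureC910(videoInput& VI, int device) {
        // Camera-control properties: focus, exposure, and the digital
        // zoom/pan/tilt that give a crude region of interest
        VI.setVideoSettingCamera(device, VI.propFocus,    68);   // ~51cm, see the focus table below
        VI.setVideoSettingCamera(device, VI.propExposure, -5);   // shutter speed (placeholder value)
        VI.setVideoSettingCamera(device, VI.propZoom,     150);  // digital zoom in
        VI.setVideoSettingCamera(device, VI.propPan,       10);  // shift the zoomed region horizontally
        VI.setVideoSettingCamera(device, VI.propTilt,      -5);  // shift the zoomed region vertically

        // Filter properties: the 'gain' / brightness side of things
        VI.setVideoSettingFilter(device, VI.propBrightness, 128); // placeholder value
    }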

Focus

  • ~12 discrete focus steps (i.e. focus control is NOT continuous)
  • Furthest focus point is ~70cm; beyond this, everything is classed as ‘infinity’
  • With sharpening turned off (i.e. getting more of the ‘raw’ image), we see a general lack of focus on surfaces other than at discrete steps
  • Closest macro focus at 3.5cm

Focus table [control value 0-255 / distance (cm)] (a nearest-step lookup sketch follows the table)

  • 255 / 3.5
  • 238 / 3.8
  • 221 / 4
  • 204 / 4.3
  • 187 / 5.3
  • 170 / 6.4
  • 153 / 8
  • 136 / 10.5
  • 119 / 15
  • 102 / 25
  • 85 / 40
  • 68 / 51
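Since the focus control is stepped rather than continuous, about all you can do is snap a target distance to the nearest measured step. A minimal sketch using the table above (the function name is hypothetical):

    // Pick the focus control value whose measured distance is closest to the target.
    #include <cmath>
    #include <cstdio>

    struct FocusStep { int controlValue; float distanceCm; };

    static const FocusStep kFocusTable[] = {
        {255, 3.5f}, {238, 3.8f}, {221, 4.0f}, {204, 4.3f},
        {187, 5.3f}, {170, 6.4f}, {153, 8.0f}, {136, 10.5f},
        {119, 15.0f}, {102, 25.0f}, {85, 40.0f}, {68, 51.0f}
    };

    int focusValueForDistance(float targetCm) {
        int   best      = kFocusTable[0].controlValue;
        float bestError = std::fabs(kFocusTable[0].distanceCm - targetCm);
        for (const FocusStep& step : kFocusTable) {
            float error = std::fabs(step.distanceCm - targetCm);
            if (error < bestError) { bestError = error; best = step.controlValue; }
        }
        return best;
    }

    int main() {
        printf("focus value for a 30cm subject: %d\n", focusValueForDistance(30.0f)); // -> 102
        return 0;
    }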


Kinect + Projector experiments

Wednesday, January 12th, 2011

Using Padé projection mapping to calibrate Kinect’s 3D world with a projector.

  1. Using the Kinect camera, we can scan a 3D scene in realtime.
  2. Using a video projector, we can project onto a 3D scene in realtime.

Combining these, we re-project images onto the geometry to create a new technique for augmented reality.

Previous videos (for process)

[Three embedded YouTube videos]

The pipeline is (steps 2 and 4 are sketched in code after the list):

  1. Capture Depth at CameraXY (OpenNI)
  2. Convert to an image of WorldXYZ
  3. Padé transformation to create the WorldXYZ map in ProjectorXY
  4. Calculate the NormalXYZ map in ProjectorXY
  5. Gaussian Blur X of NormalXYZ in ProjectorXY
  6. Gaussian Blur Y of NormalXYZ in ProjectorXY
  7. Light calculations on the NormalXYZ and WorldXYZ maps in ProjectorXY
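Here’s that sketch of steps 2 and 4, assuming placeholder Kinect intrinsics rather than calibrated values; in the actual pipeline these maps live in ProjectorXY after the Padé transform, which isn’t shown.

    // CPU sketch of pipeline steps 2 and 4: back-project the depth image into a
    // WorldXYZ map, then estimate a NormalXYZ map from neighbouring world points.
    // The intrinsics below are typical placeholder values, not calibrated ones.
    #include <cmath>
    #include <vector>

    struct Vec3 { float x, y, z; };

    static Vec3 sub(const Vec3& a, const Vec3& b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    }
    static Vec3 normalise(const Vec3& v) {
        float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        return len > 0 ? Vec3{ v.x / len, v.y / len, v.z / len } : Vec3{ 0, 0, 0 };
    }

    // Step 2: depth at CameraXY (metres) -> WorldXYZ map via pinhole back-projection.
    std::vector<Vec3> depthToWorld(const std::vector<float>& depth, int w, int h) {
        const float fx = 525.0f, fy = 525.0f, cx = w * 0.5f, cy = h * 0.5f; // assumed intrinsics
        std::vector<Vec3> world(w * h);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float z = depth[y * w + x];
                world[y * w + x] = { (x - cx) * z / fx, (y - cy) * z / fy, z };
            }
        return world;
    }

    // Step 4: NormalXYZ map from the cross product of neighbouring world-space differences.
    std::vector<Vec3> worldToNormals(const std::vector<Vec3>& world, int w, int h) {
        std::vector<Vec3> normals(w * h, Vec3{ 0, 0, 0 });
        for (int y = 0; y + 1 < h; ++y)
            for (int x = 0; x + 1 < w; ++x) {
                Vec3 dx = sub(world[y * w + x + 1], world[y * w + x]);
                Vec3 dy = sub(world[(y + 1) * w + x], world[y * w + x]);
                normals[y * w + x] = normalise(cross(dx, dy));
            }
        return normals;
    }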

Found the error!

Saturday, November 6th, 2010

It turns out that the 0.35m dataset was broken.
Without it, we get pretty much a perfect fit even at low orders.
[Embedded YouTube video]

Here’s the dirty scan:

It should look something like this:

It seems to have missed a data frame. This should have come up in the error checking…

Hmm. Anyway…

Structured light 3D scanning of projector pixels (stage 1: calibration)

Saturday, November 6th, 2010

I’ve been working on this method for a while now…

The concept is:

  1. Make something like a litescape/wiremap/lumarca, but instead of ordered thin (1px) vertical elements, use any material in any arrangement
  2. Use a scattering field of ‘stuff’ to project onto (e.g. lots of ribbon)
  3. Use a projector to shine loads of pixels into the stuff
  4. Scan where all the pixels land in 3D
  5. Reimagine the pixels as 3D pixels, since they now have a 3D location in space
  6. Project 3D content constructed from these 3D pixels (see the sketch after this list)
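To make step 5 concrete, here’s the kind of structure this gives you: each projector pixel carries the 3D position the scan found for it, and the projector frame is filled by sampling 3D content at those positions. All names are illustrative, not from the actual codebase.

    // Sketch of '3D pixels': scanned world positions per projector pixel, and a
    // trivial piece of 3D content (a sphere) rendered through them.
    #include <cmath>
    #include <vector>

    struct Pixel3D {
        int   projectorX, projectorY;  // where the pixel lives in the projector image
        float worldX, worldY, worldZ;  // where the scan found the pixel landing in space
        bool  found;                   // some pixels never hit the scattering field
    };

    // Light a pixel if its scanned 3D position falls inside a sphere of radius r
    // centred at (cx, cy, cz); write the result into a greyscale projector frame.
    void renderSphere(const std::vector<Pixel3D>& pixels,
                      std::vector<unsigned char>& frame, int projectorWidth,
                      float cx, float cy, float cz, float r) {
        for (const Pixel3D& p : pixels) {
            if (!p.found) continue;
            float dx = p.worldX - cx, dy = p.worldY - cy, dz = p.worldZ - cz;
            bool inside = std::sqrt(dx * dx + dy * dy + dz * dz) < r;
            frame[p.projectorY * projectorWidth + p.projectorX] = inside ? 255 : 0;
        }
    }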

Since then I’ve thought of a few other decent uses for having scannable projection fields.

Early prototypes were in VVVV, then I moved to openFrameworks for its speed with pixelwise operations and its accuracy with framewise operations. I started writing the scanning program on the bus between Bergamo airport and studio dotdotdot in Milan (which was just over a year ago). After lots of procrastinating and working on other projects, I’m finally getting some progress with this.

Also along the way I realised that a lot of people were doing similar things. When I started to project out the patterns, I realised I was doing something similar to Johnny Chung Lee with his projection calibration work, which is when I found out about ‘Structured Light’. There’s also Kyle McDonald’s work on democratising 3D scanning (particularly with super-fast 3-phase projection methods). Then more recently some things hit closer to home, such as Brett Jones’ interactive projection system.

So the first stage is to calibrate the cameras:

[Embedded YouTube video]

Here we have 2 cameras at one end of the rails, and the monitor is on a ‘train’ which can move forwards and backwards. Each point on the screen then has a 3D position (2D on the screen, 1D on the rail). We use Gray-code XY structured light to scan where each screen pixel appears within each camera image.
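Gray-code structured light boils down to projecting the bits of each pixel coordinate, one bit per frame, encoded so that neighbouring pixels differ in only one frame. A minimal encode/decode sketch (the frame count and example column are arbitrary):

    // Binary-reflected Gray code: adjacent indices differ by one bit, so adjacent
    // screen pixels differ in exactly one projected frame.
    #include <cstdint>
    #include <cstdio>

    uint32_t grayEncode(uint32_t index) {           // which frames a given column is lit in
        return index ^ (index >> 1);
    }

    uint32_t grayDecode(uint32_t gray) {            // recover the column from a camera pixel's bit sequence
        uint32_t value = gray;
        while (gray >>= 1) value ^= gray;
        return value;
    }

    int main() {
        uint32_t column = 724;                      // an example screen column
        uint32_t code   = grayEncode(column);
        for (int bit = 10; bit >= 0; --bit)         // 11 frames cover columns 0..2047
            printf("%u", (code >> bit) & 1u);       // on/off for this column in each X frame
        printf("\ndecoded: %u\n", grayDecode(code)); // -> 724
        return 0;
    }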

Then if we run a correlation on this, we can fit a relationship between the 4D position (2x2D) on the cameras and the 3D position in the real world. This gives us a stereo camera, specifically built to scan in the location of projector pixels. Here’s what the correlation looks like with a 4th-order power-series polynomial using triangular bases (a sketch of the polynomial form follows the video).

[Embedded YouTube video]
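Here’s that sketch: each world coordinate is modelled as a linear combination of monomials in the 4D camera observation, and the coefficients come from a least-squares fit against the scanned rail data (the solver isn’t shown). The basis ordering is illustrative, not necessarily the one used in the actual code.

    // The camera->world correlation as a power series in (u1, v1, u2, v2).
    #include <cstddef>
    #include <vector>

    // All monomials u1^a * v1^b * u2^c * v2^d with a + b + c + d <= order
    // (the 'triangular' arrangement of terms).
    std::vector<double> monomials(double u1, double v1, double u2, double v2, int order) {
        std::vector<double> basis;
        for (int a = 0; a <= order; ++a)
            for (int b = 0; a + b <= order; ++b)
                for (int c = 0; a + b + c <= order; ++c)
                    for (int d = 0; a + b + c + d <= order; ++d) {
                        double term = 1.0;
                        for (int i = 0; i < a; ++i) term *= u1;
                        for (int i = 0; i < b; ++i) term *= v1;
                        for (int i = 0; i < c; ++i) term *= u2;
                        for (int i = 0; i < d; ++i) term *= v2;
                        basis.push_back(term);
                    }
        return basis;
    }

    // Evaluate one fitted world coordinate (X, Y or Z) from its coefficient vector.
    double evaluate(const std::vector<double>& coefficients,
                    double u1, double v1, double u2, double v2, int order) {
        std::vector<double> basis = monomials(u1, v1, u2, v2, order);
        double result = 0.0;
        for (std::size_t i = 0; i < basis.size() && i < coefficients.size(); ++i)
            result += coefficients[i] * basis[i];
        return result;
    }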

The next steps are to:

  1. Implement a Padé polynomial for accuracy at low orders
  2. Scan in a 3D scene
  3. Test different arrangements of scattering fields for aesthetic quality and ‘projectability’

The code for all this is available on our Google Code:

http://code.kimchiandchips.com

Please get in touch if you’re planning to use this for your projects! The code there is released under a modified version of the MIT license. See the Google Code page for details (opinions on that license are also very welcome).

General thanks to Dan Tang.