Tracking Pose in Three Dimensions

Sample reconstructions from a video clip of a dancer.

People can easily interpret the 3D motion of a dancer from a 2D video clip, even though many different 3D motions could give rise to the same 2D projection. The key is to resolve the ambiguity by applying background knowledge about how people move in the everyday world. By giving a computer experience with typical human motions, one can create an automated system that tracks a person's motion in 3D from a single camera.

My approach to pose tracking draws heavily on my experience in image retrieval: each frame of a video is treated as a query image, used to locate similar synthetic images whose pose is known. By stringing together a plausible series of such retrieved poses over time, the entire motion sequence can be reconstructed.
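The retrieve-then-link idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not a description of the actual system: frames and synthetic images are reduced to plain feature vectors, poses are tuples of joint angles, retrieval is a brute-force nearest-neighbor search, and the "plausible series" is chosen by Viterbi-style dynamic programming that trades match quality against pose jumps between frames. The names `retrieve_candidates`, `track`, and `smoothness` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_candidates(query, library, k):
    """Return the k library entries whose features best match the query.

    library: list of (features, pose) pairs for synthetic images
             (assumed representation; the pose is known for each entry).
    Returns a list of (match_cost, pose) pairs, best match first.
    """
    scored = [(distance(query, feats), pose) for feats, pose in library]
    scored.sort(key=lambda item: item[0])
    return scored[:k]


def track(candidates_per_frame, smoothness=1.0):
    """Choose one candidate pose per frame.

    Minimizes the sum of match costs plus `smoothness` times the pose
    distance between consecutive frames (an assumed plausibility measure).
    candidates_per_frame: list over frames of (match_cost, pose) lists.
    """
    # cost[i]: cheapest total cost of any path ending at candidate i
    cost = [match for match, _ in candidates_per_frame[0]]
    back = []  # back[t-1][i]: best predecessor of candidate i in frame t

    for t in range(1, len(candidates_per_frame)):
        prev = candidates_per_frame[t - 1]
        new_cost, pointers = [], []
        for match, pose in candidates_per_frame[t]:
            step = [cost[j] + smoothness * distance(prev[j][1], pose)
                    for j in range(len(prev))]
            best_j = min(range(len(prev)), key=step.__getitem__)
            new_cost.append(step[best_j] + match)
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last frame to the first.
    i = min(range(len(cost)), key=cost.__getitem__)
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates_per_frame[t][i][1] for t, i in enumerate(path)]


if __name__ == "__main__":
    # Toy data: the middle frame has two candidates. The better image
    # match (pose 5.0) implies a large jump; the slightly worse match
    # (pose 1.0) fits the neighboring frames.
    frames = [
        [(0.0, (0.0,))],
        [(0.0, (5.0,)), (0.1, (1.0,))],
        [(0.0, (2.0,))],
    ]
    print(track(frames, smoothness=1.0))
    print(track(frames, smoothness=0.0))
```

In the toy data, a nonzero `smoothness` makes the tracker prefer the middle-frame candidate that is consistent with its neighbors, while `smoothness=0.0` reduces to picking the best image match in each frame independently. A real system would need image features and a motion model far richer than these placeholders.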

See a list of papers on monocular markerless human motion capture.

This material is based upon work supported by the National Science Foundation under Grant No. 0328741. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.