The purpose of this project is to design a simple program that recognizes gestures from a human hand in an image. This requires identifying certain parts of the hand, i.e., the fingers, palm, and wrist areas, and also requires that the hand itself be segmented apart from other appendages in the image, such as the arm and torso. The resulting gesture recognition system tracks the palm, wrist, and fingertip regions of an imaged hand, and recognizes and categorizes the distinctive movements of these regions when certain gestures are performed.
In order to recognize the individual parts of the hand separate from other appendages within the image, I devised a simple scheme for finding the palm centroid with the intent of working outward from the palm to segment fingertips, the wrist area, etc. Thus, the first step in building this gesture recognition system was to compute the axis of elongation for the imaged arm and then find the widest point on this object, which would be labeled as the beginning of the wrist. However, since the orientation of the arm was always changing within the image, finding the widest point on the arm required a search along lines perpendicular to the arm's axis of elongation.
The palm centroid was labeled as the point at which the widest line perpendicular to the arm's axis of elongation intersects the arm's axis of elongation. Using this elongation-based approach worked well for still or rapidly moving hands, and managed to locate the palm centroid at almost any orientation, frame placement, or size (perspective) in relation to the camera.
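The two steps above can be sketched in code. This is a minimal reconstruction, not the original implementation: it assumes a binary foreground mask of the arm is already available, takes the axis of elongation as the principal eigenvector of the foreground pixel coordinates, and approximates "widest perpendicular line" by counting foreground pixels in thin slices along that axis.

```python
import numpy as np

def palm_centroid(mask):
    """Estimate the palm centroid of a binary arm/hand mask.

    The axis of elongation is the principal eigenvector of the covariance
    of the foreground pixel coordinates; the palm centroid is the point on
    that axis where the perpendicular cross-section of the mask is widest.
    (Illustrative sketch; function and variable names are my own.)
    """
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    mean = pts.mean(axis=0)
    centered = pts - mean
    # Principal axis = eigenvector with the largest eigenvalue.
    evals, evecs = np.linalg.eigh(np.cov(centered.T))
    axis = evecs[:, np.argmax(evals)]          # unit vector along the arm
    # Project every foreground pixel onto the axis; the pixel count per
    # 1-pixel-wide slice approximates the perpendicular width there.
    proj = centered @ axis
    offsets, counts = np.unique(np.round(proj).astype(int),
                                return_counts=True)
    widest = offsets[np.argmax(counts)]
    return mean + widest * axis                # (x, y) palm centroid
```

The eigenvector's sign is arbitrary, but since the widest-slice offset is computed in the same projected coordinate, the returned point is the same either way.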
After finding the palm centroid, I search out from the palm along the arm's axis of elongation looking for the greatest amount of directional blackspace (non-human-regions). I mark the direction of the highest amount of blackspace as the fingertip direction. This approach allows the fingertip direction to be found regardless of hand orientation, frame placement, size (perspective), or number of fingers visible to the camera.
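The blackspace search can be sketched as follows, assuming the mask, palm centroid, and elongation axis are already available from the previous step. This is an illustrative reconstruction: it simply walks from the centroid toward both ends of the axis and counts background pixels on each side.

```python
import numpy as np

def fingertip_direction(mask, centroid, axis, step=1.0):
    """Pick the fingertip direction along the elongation axis.

    Walks from the palm centroid toward both ends of the axis and counts
    background ("blackspace") pixels; the side with more blackspace is
    labeled the fingertip side, since the arm fills the other side.
    (Sketch of the approach described above, not the original code.)
    """
    h, w = mask.shape
    best_dir, best_black = None, -1
    for sign in (1.0, -1.0):
        black = 0
        pos = np.asarray(centroid, dtype=float)
        while True:
            pos = pos + sign * step * axis
            x, y = int(round(pos[0])), int(round(pos[1]))
            if not (0 <= x < w and 0 <= y < h):
                break
            if not mask[y, x]:
                black += 1                     # non-human region
        if black > best_black:
            best_black, best_dir = black, sign * axis
    return best_dir                            # unit vector toward fingertips
```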
Observe the video below. The black line shows the direction of the fingertips in the image, and this direction changes as the fingertips move. Also note the green palm centroid in the video, which sticks well to open and closed palms and also tracks sideways palms successfully.
After finding the direction of the fingertips in the image, I search outward to the point at which the arm's elongation axis intersects the image border in the fingertip direction. I record this point as the finger goal; this goal point is used to find the furthest extended fingertip in the image. The Euclidean distance to the goal point is computed for every hand pixel, and the hand pixel with the shortest Euclidean distance to the goal point is labeled as the current fingertip.
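A sketch of this step, assuming the mask, palm centroid, and fingertip direction from the previous steps (names are my own, not the original program's):

```python
import numpy as np

def furthest_fingertip(mask, centroid, tip_dir):
    """Find the furthest extended fingertip.

    Marches from the palm centroid in the fingertip direction until the
    image border to obtain the "finger goal", then returns the hand pixel
    with the smallest Euclidean distance to that goal point.
    (Illustrative reconstruction of the procedure described above.)
    """
    h, w = mask.shape
    pos = np.asarray(centroid, dtype=float)
    d = np.asarray(tip_dir, dtype=float)
    # March along the elongation axis until the next step leaves the image.
    while True:
        nxt = pos + d
        if not (0 <= nxt[0] < w and 0 <= nxt[1] < h):
            break
        pos = nxt
    goal = pos                                 # finger goal on the border
    ys, xs = np.nonzero(mask)
    dists = (xs - goal[0]) ** 2 + (ys - goal[1]) ** 2
    i = np.argmin(dists)
    return float(xs[i]), float(ys[i])          # current fingertip (x, y)
```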
Observe the following videos. Note how the fingertip crosshair sticks to the furthest extended fingertip, regardless of hand orientation, size (perspective), or speed of movement:
With the ability to recognize the furthest extended fingertip in an image, I fulfill one gesture recognition requirement: being able to draw with a finger in midair. I translate this into a drawing system later on. In order to classify other gestures, it is helpful to segment the hand from the arm -- that is, to cut off the arm at the wrist and concern ourselves only with the actions of the hand itself.
To accomplish this, I label all pixels within fingertip distance of the palm centroid as belonging to the hand region. This measurement works very well regardless of perspective or orientation due to the unchanging proportions of the human hand. Observe the hand pixels, labeled pink, and the arm pixels, labeled black:
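The labeling rule above amounts to a radius test: a pixel is "hand" if it is foreground and lies within the centroid-to-fingertip distance of the palm centroid. A minimal sketch, assuming the centroid and current fingertip are already known:

```python
import numpy as np

def segment_hand(mask, centroid, fingertip):
    """Label pixels within fingertip distance of the palm centroid as hand.

    Every foreground pixel whose Euclidean distance to the palm centroid
    is at most the centroid-to-fingertip distance is marked as hand; the
    remaining foreground (the arm) is excluded.
    (Illustrative sketch of the labeling rule described above.)
    """
    cx, cy = centroid
    fx, fy = fingertip
    reach2 = (fx - cx) ** 2 + (fy - cy) ** 2   # squared fingertip distance
    yy, xx = np.indices(mask.shape)
    within = (xx - cx) ** 2 + (yy - cy) ** 2 <= reach2
    return mask & within                       # boolean hand mask
```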
From here, it is simply a matter of measuring the hand's area with respect to its surroundings in order to gauge whether the hand is making a fist, wide open, or somewhere in between. To make this comparison of hand-to-environment area easier, I needed to establish a comparison region that changed size and area as the hand became larger or smaller due to perspective. Thus, I enclosed the hand in a white circle, the pixels of which were compared to the current pink hand area in order to obtain a dynamic ratio of hand pixels to surrounding environment. Observe how the white area expands and contracts with the hand's movement, and how the shrinking of the circular comparison area during fist-clenching makes it easy to recognize a fist via the hand-to-circle ratio:
| Gesture | Hand-to-Environ Ratio | Hand Circularity |
| --- | --- | --- |
| Closed Fist | Range [1 - 3], Avg 2.6 | 0.83 < circularity ≤ 1 |
| Open Hand (Wave) | Range [0.6 - 1.5], Avg 1.1 | 0.76 < circularity < 0.83 |
| Pointing Hand | Range [0.7 - 2], Avg 1.2 | 0.57 < circularity < 0.78 |
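The ratio measurement can be sketched as follows. The writeup does not state the exact formula, so this assumes the ratio compares hand pixels inside the comparison circle to the remaining "environment" pixels of the circle, which is consistent with the values above 1 in the table; the circle radius here is a hypothetical parameter (e.g. the centroid-to-fingertip distance):

```python
import numpy as np

def hand_to_environ_ratio(hand_mask, centroid, radius):
    """Ratio of hand pixels to surrounding environment inside the circle.

    The comparison circle is scaled with the hand (its radius is passed
    in), so the ratio stays comparable as the hand moves toward or away
    from the camera. (Assumed formula; the original may differ.)
    """
    yy, xx = np.indices(hand_mask.shape)
    circle = (xx - centroid[0]) ** 2 + (yy - centroid[1]) ** 2 <= radius ** 2
    hand = np.count_nonzero(hand_mask & circle)
    env = np.count_nonzero(circle & ~hand_mask)
    return hand / max(env, 1)                  # guard against empty environ
```

A tight circle around a clenched fist yields a high ratio (mostly hand pixels), while an open hand leaves far more environment inside its larger circle, pushing the ratio down.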
A high circularity score immediately triggers a closed-fist classification, while a low circularity score combined with a low ratio score immediately triggers a pointing-hand classification. Open-hand classifications are scored by how closely the current hand's measurements match the average open-hand ratio and the open-hand circularity range.
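This decision rule can be sketched with thresholds lifted from the table; the exact cutoffs and open-hand scoring in the original program may be weighted differently:

```python
def classify_gesture(ratio, circularity):
    """Classify a gesture from the two measurements above.

    Illustrative thresholds taken from the table: high circularity means
    a closed fist; low circularity with a low ratio means a pointing
    hand; everything else is treated as an open hand. (The original
    scoring against the open-hand averages is simplified here.)
    """
    if circularity > 0.83:                 # fists are nearly circular
        return "fist"
    if circularity < 0.76 and ratio < 1.2: # elongated, sparse hand shape
        return "point"
    return "open"
```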
My program does well in identifying hand segments and distinguishing between gestures, achieving near-perfect accuracy in testing at submission time. Due to this success, I added "fist pulsing" as an extra recognizable gesture; it is triggered when a fist, open hand, fist pattern is detected over several consecutive frames. Given the high accuracy of my scoring system, my true-positive rate approached one and my false-positive rate approached zero across all tests. I am very pleased with these results.
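Fist-pulse detection can be sketched as a small state tracker over per-frame gesture labels. The window size here is an assumption, since the writeup does not specify how many frames the pattern may span:

```python
from collections import deque

def make_pulse_detector(window=6):
    """Detect the fist -> open hand -> fist "pulsing" pattern.

    Keeps the last few per-frame gesture labels and fires when a
    fist/open/fist sequence appears within the window. (Illustrative
    sketch; the window size is a hypothetical parameter.)
    """
    history = deque(maxlen=window)

    def update(label):
        history.append(label)
        # Collapse consecutive repeats: fist,fist,open,fist -> fist,open,fist
        seq = [g for g, prev in zip(history, [None] + list(history))
               if g != prev]
        return any(seq[i:i + 3] == ["fist", "open", "fist"]
                   for i in range(len(seq) - 2))

    return update
```

Feeding the detector one classified label per frame returns True on the frame that completes the pulse.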
My final output is a drawing program that allows users to mark colors on a background using a pointed finger. Making a fist changes the color of the marker, waving erases and resets the current drawn image, and fist pulsing changes the color of the background canvas. During testing, my program successfully recognized and responded to each of these gestures, given uniform scene lighting and input images of Caucasian hands. Due to time constraints the program has not yet been tested on other skin tones, although this functionality may be developed further in future implementations.