Master Thesis: Sign language translator using Microsoft Kinect XBOX 360
Methodology

Figure 2 shows the flow diagram for each frame captured by the camera. For each frame, the joints of interest are obtained and normalized, and then the frame descriptor is created. The current working mode (TESTING/TRAINING) defines the dataset in which the sign that the current frame belongs to is stored. If the current mode is TESTING, then once the last frame of the sign is added to the test gesture, the classifier outputs the corresponding word on the display.
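The per-frame flow can be sketched as follows. This is a minimal, runnable sketch under assumed interfaces: the frame format, the stub functions, and their names are placeholders, not the thesis implementation.

```python
# Sketch of the per-frame pipeline: extract joints, normalize them,
# append the frame descriptor to the current gesture, and either store
# the gesture (TRAINING) or classify it (TESTING). All names are
# hypothetical stand-ins for the actual Kinect-based implementation.

def extract_joints(frame):
    return frame["joints"]            # A) joints of interest (stub)

def normalize(joints):
    return joints                     # B) position/size invariance (stub)

def descriptor(joints):
    return tuple(joints.values())     # C) per-frame descriptor (stub)

def process_frame(frame, mode, gesture, training_set, classify):
    """Run one captured frame through the pipeline; in TESTING mode,
    return the recognized word once the sign's last frame arrives."""
    gesture.append(descriptor(normalize(extract_joints(frame))))
    if mode == "TRAINING" and frame["last"]:
        training_set.append(list(gesture))
    elif mode == "TESTING" and frame["last"]:
        return classify(gesture)      # word shown on the output display
    return None
```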
A) OBTAIN JOINTS OF INTEREST

The system uses the six joints of interest shown in Figure 3: both hands (LH, RH), both elbows (LE, RE), the torso (T), and the head (H), where the last two are used only for normalization. A weight is applied to each joint to give more importance to the joints that are more meaningful.
B) NORMALIZE DATA

Invariance to the user's position:
All joints are expressed with respect to the torso (T) joint, making the system robust to changes in the user's position.
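Position invariance amounts to subtracting the torso coordinates from every joint. A minimal sketch, where the joint names follow the figure and the coordinate values are made-up placeholders:

```python
import numpy as np

# Hypothetical joint positions in camera space (metres); the values are
# placeholders, not real Kinect skeleton data.
joints = {
    "H":  np.array([0.05, 0.60, 2.00]),   # head
    "T":  np.array([0.00, 0.10, 2.05]),   # torso
    "LH": np.array([-0.30, 0.00, 1.90]),  # left hand
    "RH": np.array([0.35, 0.05, 1.85]),   # right hand
    "LE": np.array([-0.25, 0.20, 2.00]),  # left elbow
    "RE": np.array([0.28, 0.22, 1.98]),   # right elbow
}

def center_on_torso(joints):
    """Express every joint relative to the torso so the descriptor
    does not depend on where the user stands in the room."""
    torso = joints["T"]
    return {name: pos - torso for name, pos in joints.items()}

centered = center_on_torso(joints)
# After centering, the torso itself sits at the origin.
```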
Invariance to the user's size:
The joints are expressed in spherical coordinates (Figure 4), and the distances d are divided by the head-torso distance dHT (Figure 5), making the system robust to differences in user size.
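The conversion to spherical coordinates and the size normalization can be sketched as below; the angle conventions (θ measured from the z-axis, φ as the azimuth) are an assumption, since Figure 4 is not reproduced here.

```python
import math

def to_spherical(v):
    """Convert a torso-centred (x, y, z) vector to spherical
    coordinates (d, theta, phi)."""
    x, y, z = v
    d = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / d) if d > 0 else 0.0  # polar angle from z-axis
    phi = math.atan2(y, x)                      # azimuth in the xy-plane
    return d, theta, phi

def normalize_size(d, d_ht):
    """Divide a joint distance by the head-torso distance d_HT so that
    users of different sizes produce comparable descriptors."""
    return d / d_ht
```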
C) DESCRIPTOR

After evaluating the importance of each one of the features d, θ, and ϕ, only d and ϕ turn out to be meaningful. Hence, the 8-dimensional descriptor from Figure 6 is built by storing the values of these two features for the four arm joints in every frame.
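Assembling the descriptor then reduces to collecting d and ϕ for the four moving joints (2 features × 4 joints = 8 dimensions). A sketch under assumed interfaces; the `weights` argument is a hypothetical stand-in for the per-joint weighting mentioned in section A, with no claim about the actual values used.

```python
def frame_descriptor(spherical, weights=None):
    """Build the 8-dimensional per-frame descriptor from the two
    meaningful features d and phi of the four moving joints.
    `spherical` maps joint name -> (d, theta, phi); `weights` maps
    joint name -> importance factor (hypothetical, defaults to 1.0)."""
    moving_joints = ("LH", "RH", "LE", "RE")
    weights = weights or {j: 1.0 for j in moving_joints}
    desc = []
    for name in moving_joints:
        d, _theta, phi = spherical[name]  # theta is discarded
        desc.extend([weights[name] * d, weights[name] * phi])
    return desc
```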
D) CLASSIFIERS USED