Master's Thesis: Sign Language Translator using Microsoft Kinect XBOX 360
Introduction

Unlike other animals, humans are endowed by nature with a voice that allows them to interact and communicate with one another; spoken language is thus one of the main attributes of humanity. Unfortunately, not everybody possesses this capability, owing to the lack of one sense: hearing. Sign language is the basic alternative communication method between deaf people, and several dictionaries of words and single letters have been defined to make this communication possible.
The goal of this thesis is to develop an automatic Sign Language Translator using the data provided by the Microsoft Kinect XBOX 360™ camera. An input sign performed by the user is recorded by the camera and, after the raw image is processed, the translator provides the corresponding word or letter of the spoken language as output. Fig. 1 illustrates this goal.
Several Sign Language Translators have been introduced before, and gesture recognition has long been an active research area. Many authors have tried to find new ways to solve this problem, and they almost always end up with complex implementations based on statistical descriptors that increase the computational cost.
In a project with a time frame of only three months, such constraints are an issue, so a suitable and feasible goal had to be set from the beginning. The aim of the project is to make the Sign Language Translator work in the simplest possible way and to leave it open for future improvements: start from a basic implementation and improve it as much as possible until the best achievable accuracy of the system is reached.
The Sign Language translation task is highly influenced by its linguistics. The syntax and morphology of the sign language play a very important role, and the order of the words or the non-manual components (i.e. lip movements, facial expressions, etc.) can drastically change the meaning of a sign. These facts make the translation process even more complex. The Sign Language Translator will satisfy the following goals:
- Use data provided by the Microsoft Kinect XBOX 360 camera.
- Recognize a list of basic signs. This list will contain key words such as the ones from Table I. Using these words, the deaf user will be able to express what he or she needs, making communication between deaf and hearing users possible (see again Fig. 1).
- Considering the data that the Microsoft Kinect XBOX 360™ provides, the signs are homemade rather than belonging to an official sign language, because the main goal of this project is to build a system capable of working with a wide number of meaningful words. If the work were focused on a specific official sign language, the selection of these basic meaningful words would be hard, since the difference between them sometimes lies in characteristics that this project does not take into account (e.g. finger positions, lip movements, etc.).
- Design an interactive user interface so that the user will be able to run the application without any previous knowledge.
- The system must work in real time and give an instantaneous output once the gesture is completed.
- Allow the user to auto-train the dictionary (training dataset) by adding new words.
By using the tracking capability of the Kinect camera, a meaningful 8-dimensional descriptor for every frame is introduced here. In addition, efficient Nearest Neighbor Dynamic Time Warping (DTW) and Nearest Group DTW classifiers are developed for fast comparison between signs. With the proposed descriptors and classifiers, combined with the use of the Microsoft Kinect XBOX 360™, the system has the potential to provide a computationally efficient design without sacrificing recognition accuracy compared to similar projects. Due to the limited time, the project does not focus on a particular official dictionary of signs; it is intended as a feasibility study where proof of concept is the main goal. A default dictionary of 14 homemade signs is defined. This dictionary contains basic words such as "are", "doctor", "have", "hello", "hot", "hungry", "I", "love", "phone", "play", "question", "sick", "want", "you", and "sleep". By combining these words, a wide list of basic sentences can express the user's needs (e.g. "I want to see a doctor", "I am sick", etc.; see Table 3.3).
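The Nearest Neighbor DTW idea above can be sketched in a few lines. The following is a minimal illustration, not the thesis implementation: a sign is a sequence of per-frame descriptor vectors (8-dimensional in this work; the toy data below uses shorter vectors), classic dynamic time warping aligns two sequences of different lengths, and a 1-nearest-neighbor rule picks the dictionary sign with the smallest DTW distance. The Euclidean local cost and the function names are assumptions for illustration; the Nearest Group DTW variant is not shown.

```python
# Sketch of Nearest Neighbor DTW classification over a sign dictionary.
# Assumed for illustration: Euclidean local cost, one template per sign.
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of equal-length descriptor vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])          # local frame cost
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m]


def classify(sign, dictionary):
    """Return the label of the dictionary sign nearest to `sign` under DTW."""
    return min(dictionary, key=lambda label: dtw_distance(sign, dictionary[label]))


# Toy usage: two hypothetical sign templates with 2-D descriptors.
dictionary = {
    "hello": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "sick":  [(0.0, 2.0), (0.0, 1.0), (0.0, 0.0)],
}
query = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
print(classify(query, dictionary))  # prints "hello"
```

Because DTW fills an n×m cost table, a single comparison costs O(nm) in the sequence lengths; keeping one (or few) templates per sign keeps the nearest-neighbor search fast enough for real-time use, which motivates the efficient classifier design described above.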