Multi-task GPT model for Sign Language Processing, using spoken language text.
- Spoken language text is left as is.
- SignWriting is represented as SignWriting in Unicode (SWU).
- HamNoSys is left as is.
- Poses are tokenized using the MediaPipe vector quantizer.
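As a rough illustration of the last item, the sketch below renders a sequence of vector-quantizer code indices as the `codeN` tokens that appear in the task table below, and parses them back. The function names `codes_to_tokens` and `tokens_to_codes` are hypothetical and chosen for illustration only; the actual quantizer may expose a different interface.

```python
import re

# Matches tokens of the form "code<integer>", e.g. "code435".
CODE_PATTERN = re.compile(r"code(\d+)")


def codes_to_tokens(codes):
    """Render integer codebook indices as space-separated `codeN` tokens."""
    return " ".join(f"code{index}" for index in codes)


def tokens_to_codes(text):
    """Recover integer codebook indices from a `codeN` token string."""
    return [int(digits) for digits in CODE_PATTERN.findall(text)]


if __name__ == "__main__":
    tokens = codes_to_tokens([435, 325, 492])
    print(tokens)                   # code435 code325 code492
    print(tokens_to_codes(tokens))  # [435, 325, 492]
```

Keeping pose codes as plain text tokens means they can share a prompt with spoken language text, SignWriting, and HamNoSys.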
There are many books and articles on Sign Language Processing, but they are not easily or widely accessible. Collecting a large dataset of text about sign language can be beneficial for large-scale pretraining.
This GPT model is designed to be large and to encode a wide variety of tasks in Sign Language Processing. The tasks we currently cover are:
| Task Name | Prompt | Response | Data | 
|---|---|---|---|
| Unconstrained SignWriting Generation | Generate a SignWriting sequence in American Sign Language: | 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 𝠃𝤘𝤧𝣻𝤕𝣴𝣼𝤎𝤂𝤆𝣦 𝣢𝤂 | sign-language-processing/signbank-plus |
| SignWriting to Text Translation | Translate the following American Sign Language SignWriting sequence: 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 𝠃𝤘𝤧𝣻𝤕𝣴𝣼𝤎𝤂𝤆𝣦 𝣢𝤂 to English text: | Hello World | sign-language-processing/signbank-plus |
| Text to SignWriting Translation | Translate the following English text: Hello World to American Sign Language SignWriting: | 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 𝠃𝤘𝤧𝣻𝤕𝣴𝣼𝤎𝤂𝤆𝣦 𝣢𝤂 | sign-language-processing/signbank-plus |
| Single Sign Description | Describe how to sign the following English text: Hello in American Sign Language: | With your dominant hand open, touch your forehead and move your hand away, palm facing out. | sign-language-processing/signwriting-description |
| SignWriting Description | Describe the sign represented by this American Sign Language SignWriting: 𝠃𝤘𝤣𝣳𝣩𝤉𝣻. | With your dominant hand open, touch your forehead and move your hand away, palm facing out. | sign-language-processing/signwriting-description |
| Description to SignWriting | Generate American Sign Language SignWriting from the description of a sign: With your dominant hand open, touch your forehead and move your hand away, palm facing out. | 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 | sign-language-processing/signwriting-description |
| Gloss to Text Translation | Translate the following American Sign Language gloss sequence: HELLO WORLD to English text: | Hello World | DGS Corpus, PHOENIX |
| Text to Gloss Translation | Translate the following English text: Hello World to American Sign Language glosses: | HELLO WORLD | DGS Corpus, PHOENIX |
| Unconstrained Pose Generation | Generate a pose sequence in American Sign Language: | code435 code325 ... code492 | sign-language-processing/sign-vq |
| Pose to SignWriting Transcription | Transcribe the following American Sign Language pose sequence: code435 code325 ... code492 into SignWriting: | 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 | sign-language-processing/signwriting-transcription |
| SignWriting to Pose Animation | Animate the following American Sign Language SignWriting sequence 𝠃𝤘𝤣𝣳𝣩𝤉𝣻 into poses: | code435 code325 ... code492 | sign-language-processing/signwriting-transcription |
| Pose to Text Translation | Translate the following American Sign Language pose sequence: code435 code325 ... code492 into English: | Hello World | SignTube |
| Text to Pose Animation | Animate the following English text Hello World into American Sign Language poses: | code435 code325 ... code492 | SignTube |
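
The prompts in the table follow a fill-in-the-blank pattern: a fixed instruction with slots for the signed language, the spoken language, and the input. A minimal sketch of that pattern is shown below. The dictionary keys, the `build_prompt` helper, and the exact whitespace around each slot are assumptions made for illustration, not the project's confirmed implementation.

```python
# Hypothetical prompt templates for three of the tasks in the table.
# Slot names and spacing are assumptions, not taken from the code base.
TASK_TEMPLATES = {
    "signwriting_to_text": (
        "Translate the following {signed_language} SignWriting sequence: "
        "{signwriting} to {spoken_language} text:"
    ),
    "text_to_gloss": (
        "Translate the following {spoken_language} text: "
        "{text} to {signed_language} glosses:"
    ),
    "pose_to_text": (
        "Translate the following {signed_language} pose sequence: "
        "{poses} into {spoken_language}:"
    ),
}


def build_prompt(task, **fields):
    """Fill the template for `task` with the given slot values."""
    return TASK_TEMPLATES[task].format(**fields)


if __name__ == "__main__":
    print(build_prompt(
        "text_to_gloss",
        spoken_language="English",
        signed_language="American Sign Language",
        text="Hello World",
    ))
```

Because every task is reduced to a text prompt and a text response, a single model can be trained on all of them jointly.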
Set up the environment:
conda create --name sign_gpt python=3.11 -y
conda activate sign_gpt
pip install git+https://github.com/sign-language-processing/datasets.git
pip install mediapipe gdown lxml
Generate the data:
python -m sign_gpt.custom_datasets.rwth_phoenix2014_t
python -m sign_gpt.custom_datasets.dicta_sign
python -m sign_gpt.custom_datasets.dgs_types
python -m sign_gpt.custom_datasets.dgs_corpus
python -m sign_gpt.custom_datasets.dgs_corpus_document
python -m sign_gpt.custom_datasets.signbank_plus
python -m sign_gpt.custom_datasets.signwriting_hamnosys
We have very large crawls (such as CommonCrawl) that can be used to collect data from websites and books. We can vectorize all videos. We also have strong, capable language models that can help us create data. The idea is to crawl the web (or any other source), feed each document to a language model that generates a system prompt together with the relevant inputs and outputs from that document, and then compile the results into "CrawlInstruct".
Videos that include captions are always converted to a translation task.
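The CrawlInstruct idea can be sketched roughly as follows. Everything here is hypothetical: the `CrawledDocument` and `InstructionExample` types, the `document_to_examples` function, the system prompt wording, and the `generate` callable (which stands in for a language model call) are illustrative assumptions, not an existing implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrawledDocument:
    """A crawled item; all field names are illustrative assumptions."""
    url: str
    text: str = ""
    pose_tokens: Optional[str] = None  # vectorized video, if the item is a video
    captions: Optional[str] = None     # captions attached to the video, if any


@dataclass
class InstructionExample:
    system: str
    prompt: str
    response: str


def document_to_examples(
    doc: CrawledDocument,
    generate: Callable[[str], list[InstructionExample]],
) -> list[InstructionExample]:
    """Turn one crawled document into instruction examples."""
    examples = []
    # Videos that include captions always become a translation task.
    if doc.pose_tokens is not None and doc.captions is not None:
        examples.append(InstructionExample(
            system="Translate the signed video into text.",
            prompt=doc.pose_tokens,
            response=doc.captions,
        ))
    # Any text content is handed to the language model (here: `generate`),
    # which proposes a system prompt plus inputs and outputs.
    if doc.text:
        examples.extend(generate(doc.text))
    return examples


if __name__ == "__main__":
    video = CrawledDocument(
        url="https://example.com/video",
        pose_tokens="code435 code325",
        captions="Hello World",
    )
    # A stub generator that produces nothing, standing in for a real model.
    print(document_to_examples(video, generate=lambda text: []))
```

Passing the language model as a callable keeps the sketch independent of any particular model or API.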
pip install ".[huggingface]"
python -m sign_gpt.models.huggingface.train_lora
sbatch sign_gpt/models/huggingface/train.sh
pip install ".[keras]"
python -m sign_gpt.models.keras.train_gemma