As a kind of visual language, sign language serves as the primary communication tool of the deaf community. It conveys meaning mainly through manual features, including hand shapes, orientations and movements, assisted by non-manual features such as facial expressions and lip motions. To build barrier-free communication between hearing and deaf people, intelligent sign language understanding (SLU) has emerged. It is a multidisciplinary research topic that aims to convert sign language videos into the corresponding text, or to perform the reverse conversion, thus creating an interactive closed loop. The former corresponds to sign language recognition (SLR), while the latter is sign language production (SLP). Despite the promising progress of deep-learning-based sign language understanding, significant challenges remain. Since sign data annotation is time-consuming and requires expert involvement, the scale of currently annotated sign data is limited. Under the data-driven learning paradigm, existing methods therefore often suffer from overfitting and limited interpretability. To address these issues, this thesis incorporates the hand prior and explores the combined potential of data-driven and model-driven methods, in pursuit of stronger visual representations and better performance on downstream tasks. Specifically, this thesis investigates the following four aspects.

Firstly, this thesis proposes a hand-model-aware framework for isolated SLR. It consists of three components: a visual encoder, a hand-model-aware decoder and an inference module. It leverages the hand model for better optimization and uses the modeled hand as an intermediate representation, thereby guiding the framework to learn discriminative features. During training, additional loss functions are imposed on the intermediate representation to constrain its spatial and temporal consistency. Extensive experiments demonstrate that the proposed framework achieved state-of-the-art performance at the time of publication.

Secondly, this thesis leverages both the hand prior and unlabeled sign language data to build the first hand-prior-aware self-supervised pre-training framework for sign language, which can be applied to a wider range of SLR subtasks. Pre-training is conducted via masking and reconstruction: centered on hand pose, this work designs several masked modeling strategies and introduces the hand prior as a regularizer in the decoding stage. These techniques help the framework learn hierarchical context in the sign language domain. For downstream tasks, task-specific prediction heads are designed and fine-tuned together with the pre-trained encoder. Extensive experiments show that the proposed framework not only broadens task applicability but also achieves state-of-the-art performance on three SLR tasks with notable gains.
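The abstract describes the masking-reconstruction objective only at a high level. As a rough illustration of the general idea (not the thesis' actual masking strategies or hand-prior regularizers), one can mask a subset of hand joints in a frame and train a transformer encoder to reconstruct them; all class and parameter names below are hypothetical.

```python
# Minimal sketch of masked hand-pose modeling (illustrative only).
import torch
import torch.nn as nn


class MaskedPoseReconstructor(nn.Module):
    """Mask a fraction of hand joints per frame and reconstruct their coordinates."""

    def __init__(self, num_joints=21, dim=2, hidden=256, layers=4, heads=8):
        super().__init__()
        self.embed = nn.Linear(dim, hidden)                    # per-joint coordinate embedding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, hidden))
        self.joint_pos = nn.Parameter(torch.zeros(1, num_joints, hidden))
        layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(hidden, dim)                     # reconstruct joint coordinates

    def forward(self, joints, mask_ratio=0.4):
        # joints: (batch, num_joints, dim) 2D keypoints of one frame
        b, j, _ = joints.shape
        tokens = self.embed(joints) + self.joint_pos
        # randomly replace a subset of joint tokens with the learnable mask token
        mask = torch.rand(b, j, device=joints.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand(b, j, -1), tokens)
        recon = self.head(self.encoder(tokens))
        # reconstruction loss is computed only on the masked joints
        loss = ((recon - joints) ** 2).mean(-1)[mask].mean()
        return loss, recon
```

In the actual framework, such an encoder would be pre-trained on unlabeled sign data and then fine-tuned with task-specific heads; the sketch omits temporal context and the hand-prior regularization for simplicity.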
Thirdly, this thesis proposes a gesture-to-gesture translation framework that incorporates hand topology. As a key technology in SLP, gesture-to-gesture translation requires fine-grained understanding of hand structure. In response to the insufficient representation capability of the sparse 2D keypoints used in existing works, this work proposes a hand-topology-aware framework for this task. The framework uses the model-aware hand mesh as the gesture-state representation and leverages the topology inherent in the hand model to strengthen its understanding of hand structure. Specifically, the framework unfolds the surface of the hand model into a topology space, assigns fine-grained position embeddings aligned with the target image plane in this space, and thereby builds a topology map. It then employs a spatially-adaptive scheme and an attention mechanism to exploit the information in the topology map for better generation. Experiments validate the effectiveness of the proposed method, which achieves the best performance on this task.

Fourthly, this thesis proposes a hand-prior-based framework for interacting-hand generation. Hand interactions occur frequently in sign language, and compared with the single-hand case, generating images of interacting hands is considerably more complex; the difficulty mainly arises from occlusions caused by complex spatial configurations. For this new task, this work builds baselines from the related single-hand gesture-to-gesture translation task and establishes an evaluation protocol covering multiple perspectives, including image quality and hand-structure preservation. To tackle the challenges posed by this task, this work proposes a model-based, occlusion-aware framework. By incorporating the hand prior, the proposed framework can effectively handle complex occlusions between hands. Extensive experiments demonstrate that the proposed framework outperforms the baseline methods.
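The abstract does not detail how occlusion awareness is realized. As a generic, hypothetical sketch of one model-based way to expose occlusion structure to a generator, assuming per-hand depth maps rendered from the hand model, one can derive a per-pixel ordering label and an overlap mask; the function and argument names below are illustrative only.

```python
# Illustrative sketch: per-pixel hand ordering and occlusion mask from depth maps.
import torch


def occlusion_layout(depth_left, depth_right, background=float("inf")):
    """depth_*: (H, W) rendered depth of each hand, `background` where the hand is absent.

    Returns a label map (0 = background, 1 = left hand on top, 2 = right hand on top)
    and a mask of pixels where the two hands overlap.
    """
    left_visible = depth_left < background
    right_visible = depth_right < background
    overlap = left_visible & right_visible                 # both hands project here
    left_on_top = depth_left <= depth_right                # nearer surface wins
    label = torch.zeros_like(depth_left, dtype=torch.long)
    label[left_visible & (~right_visible | left_on_top)] = 1
    label[right_visible & (~left_visible | ~left_on_top)] = 2
    return label, overlap
```

Such a layout could serve as an extra conditioning input so that the generator knows which hand is visible at each pixel; the thesis' actual mechanism may differ.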