| With the rapid development of AI technology, deep learning (DL) algorithms have been applied in many fields. With the proliferation of Internet of Things (IoT) applications, running DL algorithms on IoT devices has become an important design goal in many IoT scenarios. To meet the constraints on computing resources, storage resources, and privacy that IoT devices impose on DL algorithms, specialized DL accelerators have been proposed. Face recognition is one of the most mature and widely used DL technologies. Its accuracy has improved continuously in recent years, but the large number of model parameters poses a great challenge for resource-limited embedded systems. To address the problem of deploying face recognition on IoT devices, this work designs a technically sound scheme that realizes fast, low-power face recognition on a dedicated accelerator. The work makes three contributions. (1) Implementation and optimization of a lightweight face recognition algorithm. A face detection algorithm based on MTCNN and a feature recognition algorithm based on MobileNet are selected to reduce the amount of computation. The model is then further compressed and optimized to increase speed and reduce hardware resource consumption. (2) Mapping of the lightweight face recognition algorithm onto specialized IoT hardware with high efficiency and utility. Based on an in-house general-purpose AI accelerator for IoT applications, this work designs a dedicated compiler that maps the lightweight face recognition algorithm onto the accelerator, providing efficient mapping of neural network parameters and generation of accelerator instructions. (3) Hardware implementation and co-design of the lightweight face recognition algorithm and the accelerator architecture. Given the limitations of the hardware, this work first splits and modifies the special operators at the offline compilation stage, and then modifies the hardware architecture to complete the implementation and optimization of these operators. In conclusion, this work presents a technical scheme for realizing fast, low-power face recognition on a dedicated AI accelerator. Specifically, it introduces a lightweight face recognition framework, compresses the model, maps the algorithms onto the hardware, and further improves the hardware, yielding a complete face recognition solution on the accelerator. Although the scheme targets a dedicated AI accelerator, its techniques can be adapted to other IoT AI applications according to the characteristics of the algorithm and hardware. The research is expected to offer useful implications and reference value for implementing other DL algorithms on IoT accelerators and devices. |