An adaptive teacher–student learning algorithm with decomposed knowledge distillation for on-edge intelligence

Title	An adaptive teacher–student learning algorithm with decomposed knowledge distillation for on-edge intelligence
Type	JournalPaper
Keywords	Deep learning, Knowledge distillation, On-edge intelligence, Feature representation, Tensor decomposition
Year	2023
Journal	Engineering Applications of Artificial Intelligence
DOI
Researchers	Majid Sepahvand, Fardin Abdali Mohammadi, Amir Taherkordi

Abstract

In feature-based knowledge distillation (KD), when the spatial shape of the teacher's feature maps is significantly larger than that of the student's, two problems arise. First, the feature maps cannot be compared directly. Second, the knowledge in these complex feature maps is difficult for the student to absorb. This paper proposes a new KD method in which Tucker decomposition is used to decompose the teacher's large feature maps into core tensors. Because of their low complexity, the knowledge in these core tensors can be easily learned by the student. Furthermore, an adaptor function is proposed that balances the spatial shapes of the core tensors of the teacher and student and makes them comparable using a convolution regressor. Finally, a hybrid loss based on the adaptor function is proposed to distill the knowledge of the teacher's core tensors into the student. Both teacher and student models were implemented on smartphones used as edge devices, and the experiments were evaluated in terms of recognition rate and complexity. According to the results, the student model, built on the ResNet-18 architecture, has 65.44 million fewer parameters, 6.45 GFLOPs less computational complexity, 1.12 G less GPU memory use, and a 265.67 times greater compression rate than its teacher model, built on the ResNet-50 architecture, while its recognition rate dropped by only 1.5% on the benchmark dataset.
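
As a rough illustration of the core-tensor idea in the abstract, the following NumPy sketch compresses a feature map with a truncated higher-order SVD (HOSVD), which is one common way to compute a Tucker decomposition. The abstract does not state which Tucker algorithm, ranks, or feature-map shapes the authors used, so the function names, the (64, 14, 14) shape, and the ranks (16, 7, 7) are all hypothetical choices made for this example only.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (n-mode product)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) of a truncated HOSVD, a form of Tucker decomposition."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        # Project every mode onto its subspace to obtain the small core tensor.
        core = mode_product(core, factor.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Expand a core tensor back to the original shape using the factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out

# Hypothetical teacher feature map laid out as (channels, height, width).
rng = np.random.default_rng(0)
teacher_map = rng.standard_normal((64, 14, 14))

# Assumed target ranks; the paper's actual settings are not given in the abstract.
core, factors = truncated_hosvd(teacher_map, ranks=(16, 7, 7))
print(core.shape)
```

In this sketch the core tensor has the smaller shape chosen through `ranks`, which is the kind of low-complexity representation the abstract says is compared against the student's features. The adaptor function and hybrid loss are not reproduced here because the abstract does not define them.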
