Model Compression for End-to-end Speech Recognition
Model compression is important for on-device automatic speech recognition. In this study, we propose three weight-sharing-based model compression methods that compress a well-trained Conformer-based end-to-end speech recognition system without retraining: pruning without retraining, submatrix weight sharing, and full-range sensitivity analysis. On the LibriSpeech corpus, the proposed methods together achieve 9-fold model compression with negligible performance degradation. In addition, the proposed methods are compatible with 8-bit weight quantization. With model retraining, the proposed techniques achieve 20-fold or 40-fold model compression if some increase in word error rate can be tolerated.
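As a concrete illustration of the submatrix weight sharing idea, the sketch below partitions a weight matrix into fixed-size tiles, clusters the tiles with k-means, and rebuilds the matrix from the shared cluster centroids. This is a minimal sketch under stated assumptions, not the paper's implementation: the block size, codebook size, and the use of scikit-learn's KMeans are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans  # any k-means implementation would do


def share_submatrices(W, block=8, n_codes=256, seed=0):
    """Replace similar block x block tiles of W with shared codebook tiles.

    Hypothetical illustration: block size and codebook size are
    arbitrary choices, not values from the paper.
    """
    rows, cols = W.shape
    assert rows % block == 0 and cols % block == 0, "pad W to a tile multiple"

    # Cut W into (rows/block * cols/block) flattened tiles.
    tiles = (W.reshape(rows // block, block, cols // block, block)
              .transpose(0, 2, 1, 3)
              .reshape(-1, block * block))

    # Cluster the tiles; each cluster centroid becomes one shared submatrix.
    km = KMeans(n_clusters=n_codes, n_init=4, random_state=seed).fit(tiles)
    codebook = km.cluster_centers_.astype(W.dtype)
    idx = km.labels_.astype(np.uint8)  # one byte per tile when n_codes <= 256

    # Rebuild the weight matrix from the shared tiles.
    W_shared = (codebook[idx]
                .reshape(rows // block, cols // block, block, block)
                .transpose(0, 2, 1, 3)
                .reshape(rows, cols))
    return W_shared, codebook, idx


# Toy usage: a random 256x256 "weight matrix" stands in for a Conformer layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
W_hat, codebook, idx = share_submatrices(W)
print("reconstruction MSE:", float(np.mean((W - W_hat) ** 2)))
```

Under these assumed settings the compression source is clear: each 8x8 float32 tile (256 bytes) is replaced by a 1-byte index into a codebook shared across all tiles, so storage approaches the codebook size plus one byte per tile. A scheme of this kind also composes naturally with 8-bit weight quantization, since the codebook entries themselves can be quantized.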