DDESE is an efficient end-to-end automatic speech recognition (ASR) engine built on DeePhi's deep learning acceleration solution: algorithm, software and hardware co-design (covering pruning, quantization, compilation and FPGA inference). We use the Baidu DeepSpeech2 framework and the 1000-hour LibriSpeech dataset for model training and compression. Users can run the test scripts both to compare CPU/FPGA performance and to recognize single sentences.
Innovative full-stack accelerating solution for deep learning in acoustic speech recognition (ESE: Best Paper of FPGA 2017)
Our solution combines algorithm, software and hardware co-design, covering pruning, quantization, compilation and FPGA inference.
Pruning reduces the model to a sparse one (15%~20% density) with little loss of accuracy. The weights and activations are then quantized to 16 bits, so the whole model is compressed by more than 10X; the sparse weights can then be encoded in CSC (Compressed Sparse Column) format and deployed on the Descartes platform for efficient FPGA inference.
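To illustrate the storage format, here is a minimal sketch of CSC encoding for a pruned weight matrix. This is illustrative only; the actual on-device layout used by the Descartes platform packs 16-bit values and indices differently, and the `to_csc` helper below is not part of the DDESE tooling.

```python
# Minimal CSC (Compressed Sparse Column) encoder for a pruned weight matrix.
# Only the nonzero values are stored, column by column, together with their
# row indices and per-column offsets into the value array.

def to_csc(dense):
    """Encode a dense row-major matrix as (values, row_indices, col_pointers)."""
    n_rows, n_cols = len(dense), len(dense[0])
    values, row_indices, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            if dense[r][c] != 0:
                values.append(dense[r][c])
                row_indices.append(r)
        col_ptr.append(len(values))  # offset where the next column starts
    return values, row_indices, col_ptr

# A toy pruned 3x4 weight matrix (4 of 12 entries survive pruning):
W = [[0, 5, 0, 0],
     [3, 0, 0, 2],
     [0, 0, 0, 4]]
vals, rows, ptrs = to_csc(W)
print(vals)  # [3, 5, 2, 4]
print(rows)  # [1, 0, 1, 2]
print(ptrs)  # [0, 1, 2, 2, 4]
```

The storage cost is one value and one row index per nonzero plus one pointer per column, which is why a 15%~20% density model shrinks so much in this format.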
Our ASR system and model structure are as follows:
Our achievements are as follows:
If we accelerate only the LSTM layers of the model, we achieve about 2.87X and 2.56X speedup for the unidirectional and bi-directional LSTM models respectively.
If we target both the CNN and LSTM layers for further acceleration, we achieve about 2.06X speedup for the whole end-to-end speech recognition process.
The detailed performance comparison for the bi-directional LSTM model is as follows:
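The speedup figures above are simply ratios of wall-clock time: total CPU time divided by total time with the FPGA accelerator. The timings below are hypothetical placeholders chosen for illustration, not measured DDESE numbers.

```python
# Speedup = CPU wall-clock time / accelerated wall-clock time.

def speedup(t_cpu_s, t_fpga_s):
    return t_cpu_s / t_fpga_s

# e.g. if a batch of sentences takes 512 s on CPU and 200 s with the
# LSTM layers offloaded to the FPGA (hypothetical numbers):
print(f"{speedup(512.0, 200.0):.2f}X")  # 2.56X
```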
We assume you are familiar with AWS F1 instances. If you are not, please refer to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html. You should launch and log in to a DDESE instance before the test.
$ sudo bash (make sure you are in a root environment)
# source /opt/Xilinx/SDx/2017.1.rte/setup.sh (set up the SDAccel platform)
# cd ASR_Accelerator/deepspeech2 (where the test tools are placed)
# source activate test_py3 (activate the Python 3.6 environment)
After the above steps are done, you are free to test the ASR process.
The following command deploys a model on the CPU and transcribes the same sentence 1000 times.
# python aws_test.py --audio_path data/middle_audio/wav/middle1.wav --single_test
The following command deploys a model on the FPGA and transcribes the same sentence 1000 times.
# python aws_test.py --fpga_config deephi/config/fpga_cnnblstm_0.15.json --audio_path data/middle_audio/wav/middle1.wav --no_cpu --single_test
From these tests, you can compare the performance of the same acoustic speech recognition task on CPU and FPGA.
In this part, we list more commands you can use to test DeePhi_ASRAcc. You can also change some parameters according to the parameter descriptions below.
# python aws_test.py (multi-sentence test for showing the performance of FPGA over CPU)
By default, this command deploys a model on the CPU, transcribes all the sentences (wav format) under data/short_audio/wav/ and prints the output logs.
# python transcribe.py (single-sentence test for showing the accuracy of the model)
By default, this command deploys the model on the CPU, transcribes data/short_audio/wav/short_audio1.wav and prints the output logs.
By default both commands deploy the model only on the CPU; you can add an FPGA configuration to deploy the model on the FPGA, like below:
# python aws_test.py --fpga_config deephi/config/fpga_bilstm_0.15.json
(deploy the model on both CPU and FPGA and run the test)
By running this command, models are deployed on both CPU AND FPGA, and the ASR process is tested on each one in turn.
# python transcribe.py --fpga_config deephi/config/fpga_bilstm_0.15.json
(deploy the model on FPGA and do the ASR)
By running this command, the model is deployed on the FPGA INSTEAD of on the CPU, and the ASR process runs there.
A. for command aws_test.py:
--no_cpu: set this parameter to skip running the ASR process on the CPU.
: specify ROOTDIR_OF_YOUR_WAV_FILE as the folder where your wav files are saved; the command will then transcribe every .wav file under that folder. This parameter SHOULD NOT be used together with the --single_test parameter.
--audio_path PATH_TO_YOUR_WAV_FILE: specify the wav file that you want to transcribe; the command will then transcribe the specified sentence 1000 times. This parameter SHOULD be used together with the --single_test parameter.
--single_test: set this parameter to run single-test mode, i.e. transcribe the same sentence 1000 times on the specified models. Otherwise, every sentence under the specified folder is transcribed once.
B. for command transcribe.py:
--audio_path PATH_TO_YOUR_WAV_FILE: specify the wav file that you want to transcribe.
Note: The data library consists of short, middle and long audios, which are located in the data directory.
Please upload a wav file (16 kHz sample rate, recorded in a clean environment, shorter than 3 seconds). Then use the following command to transcribe the uploaded sentence:
# python transcribe.py --audio_path PATH_TO_YOUR_WAV_FILE
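Before uploading, you can check that a recording meets the stated requirements (16 kHz sample rate, shorter than 3 seconds) with Python's standard-library `wave` module. `check_wav` is a hypothetical helper for pre-flight validation, not part of transcribe.py.

```python
# Pre-flight check for an uploaded recording: verify the sample rate is
# 16 kHz and the clip is shorter than 3 seconds, using only the stdlib.
import wave

def check_wav(path):
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate
    assert rate == 16000, f"expected 16 kHz, got {rate} Hz"
    assert duration < 3.0, f"clip is {duration:.2f} s, must be under 3 s"
    return rate, duration
```

If the check fails, resample or trim the recording before running transcribe.py on it.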