Applications of Pruning Methods in Natural Language Processing
Published in: IEEE Access, 2024, Vol. 12, pp. 89418-89438
Format: Article
Language: English
Summary: Deep neural networks (DNNs) are in high demand because of their widespread applications in natural language processing, image processing, and many other domains. However, due to their computational expense, over-parameterization, and large memory requirements, DNN applications often demand substantial model resources. Strict latency requirements and limited memory availability are hurdles to deploying these technologies on devices. A common remedy is therefore to reduce the size of DNN-based models, without degrading performance, using compression techniques. Over the last few years, a great deal of progress has been made in Natural Language Processing (NLP) using deep learning approaches. The objective of this research is to offer a thorough overview of the various pruning methods applied in the context of NLP. In this paper, we review several recent pruning-based schemes for converting standard networks into compact, accelerated versions. Pruning is a well-established technique for improving latency and reducing model size and computational complexity, making it a viable approach to the challenges mentioned above. These techniques are generally divided into two main categories: structured and unstructured pruning. Structured pruning methods are further classified into filter, channel, layer, block, and movement pruning, whereas neuron, magnitude-based, and iterative pruning fall into the unstructured category. For each method, we discuss the related metrics and benchmarks, then review recent work in detail, providing insightful analysis of performance, related applications, and pros and cons. A comparative analysis then examines the differences among the approaches. Finally, the paper concludes with possible future directions and some open technical challenges.
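The summary lists magnitude-based pruning among the unstructured methods: weights whose absolute values are smallest are zeroed out, leaving the layer's shape unchanged. A minimal sketch of that idea, using NumPy (the function name and threshold scheme are illustrative, not taken from the paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero out the fraction `sparsity`
    of entries with the smallest absolute values."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only strictly larger magnitudes
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)  # half of the 16 weights are set to zero
```

Because pruning here only masks individual entries, the resulting matrix is sparse but keeps its dense shape; structured variants (filter, channel, layer pruning) instead remove whole slices so the tensor itself shrinks.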
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3411776