
Learning-Based Memory Allocation Optimization for Delay-Sensitive Big Data Processing

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, June 2018, Vol. 29, No. 6, pp. 1332-1341
Main Authors: Tsai, Linjiun; Franke, Hubertus; Li, Chung-Sheng; Liao, Wanjiun
Format: Article
Language:English
Description
Summary: Optimal resource provisioning is essential for scalable big data analytics. However, it has been difficult to accurately forecast the resource requirements of these applications before they are actually deployed, as those requirements are heavily application and data dependent. This paper identifies the existence of an effective memory resource requirement for most big data analytic applications running inside JVMs in distributed Spark environments. Provisioning less memory than the effective requirement may result in rapid deterioration of the application's total execution time. A machine learning-based prediction model is proposed in this paper to forecast the effective memory requirement of an application given its service level agreement. The model captures the memory consumption behavior of big data applications and the dynamics of memory utilization in a distributed cluster environment. With an accurate prediction of the effective memory requirement, it is shown that savings of up to 60 percent of the memory resource are feasible if an execution time penalty of 10 percent is acceptable. The accuracy of the model is evaluated on a physical Spark cluster with 128 cores and 1 TB of total memory. The experimental results show that the proposed solution can predict the minimum required memory size for a given acceptable delay with high accuracy, even when the behavior of the target applications is unknown during model training.
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2018.2800011
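
Illustrative sketch: The summary above describes learning a mapping from an application's runtime profile and its acceptable execution-time penalty (from the SLA) to the effective memory requirement. The record does not specify the learning algorithm or feature set, so the Python sketch below is a hypothetical illustration only, assuming a gradient-boosted regressor, made-up profile features (input size, shuffle volume, cached data size, SLA penalty), and synthetic training data.

# Hypothetical sketch: predict the effective (minimum acceptable) memory of a
# Spark application from profiled features and the tolerated slowdown.
# The model choice, feature names, and data below are illustrative assumptions,
# not the paper's actual method.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic training set: each row profiles one (application, input size) run.
# Assumed features: input size (GB), shuffle volume (GB), cached data size (GB),
# and the acceptable execution-time penalty from the SLA (e.g. 0.10 = 10%).
n = 500
input_gb   = rng.uniform(1, 200, n)
shuffle_gb = input_gb * rng.uniform(0.1, 0.8, n)
cache_gb   = input_gb * rng.uniform(0.05, 0.5, n)
penalty    = rng.choice([0.0, 0.05, 0.10, 0.20], n)

# Assumed target relationship: the effective memory requirement shrinks as a
# larger execution-time penalty is tolerated.
effective_mem_gb = (0.6 * input_gb + 0.9 * shuffle_gb + cache_gb) \
                   * (1.0 - 1.5 * penalty) + rng.normal(0, 2, n)

X = np.column_stack([input_gb, shuffle_gb, cache_gb, penalty])
X_train, X_test, y_train, y_test = train_test_split(
    X, effective_mem_gb, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Predict the minimum memory to provision for an unseen application profile
# that tolerates a 10 percent slowdown.
query = np.array([[120.0, 60.0, 30.0, 0.10]])
print(f"predicted effective memory: {model.predict(query)[0]:.1f} GB")
print(f"held-out R^2: {model.score(X_test, y_test):.2f}")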