Predicting construction cost overruns using text mining, numerical data and ensemble classifiers

Bibliographic Details
Published in: Automation in Construction, 2014-07, Vol. 43, p. 23-29
Main Authors: Williams, Trefor P.; Gong, Jie
Format: Article
Language: eng
Description
Summary: This paper discusses how text describing a construction project can be combined with numerical data to produce a prediction of the level of cost overrun using data mining classification algorithms. Modeling results found that a stacking model that combined the results from several classifiers produced the best results. The stacking ensemble model had an average accuracy of 43.72% for five model runs. The model performed best in predicting projects completed with large cost overruns and projects near the original low bid amount. It was found that a stacking model that used only numerical data produced predictions with lower precision and recall. A potential application of this research is as an aid in budgeting sufficient funds to complete a construction project. Additionally, during the planning stages of a project the research can be used to identify a project that requires increased scrutiny during construction to avoid cost overruns.

• Some words and phrases are associated with the level of project cost increase.
• Text mining and SVD algorithms were used to convert text to numeric data.
• The text information was combined with numerical data describing the project.
• The combined data were used to predict cost overruns using data mining algorithms.
• Best results were obtained using the stacking ensemble method.
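
The summary compares models by accuracy, precision and recall across levels of cost overrun. As a rough illustration of how such per-class figures are computed for a multi-class classifier, the sketch below uses only the Python standard library. The function name `per_class_metrics`, the class labels (`near_bid`, `moderate`, `large`) and the sample predictions are all hypothetical and invented for this example; they are not the paper's classes, data or results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return (accuracy, {label: (precision, recall)}) for paired label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    true_pos = Counter()   # correct predictions, per class
    predicted = Counter()  # how often each class was predicted
    actual = Counter()     # how often each class truly occurred

    for truth, guess in zip(y_true, y_pred):
        actual[truth] += 1
        predicted[guess] += 1
        if truth == guess:
            true_pos[truth] += 1

    metrics = {}
    for label in sorted(set(actual) | set(predicted)):
        # Precision: of the projects predicted as this class, the share that were right.
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        # Recall: of the projects truly in this class, the share that were found.
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        metrics[label] = (precision, recall)

    accuracy = sum(true_pos.values()) / len(y_true) if y_true else 0.0
    return accuracy, metrics


# Hypothetical labels, invented purely for illustration (not data from the paper).
y_true = ["near_bid", "near_bid", "moderate", "large", "large", "moderate"]
y_pred = ["near_bid", "moderate", "moderate", "large", "large", "near_bid"]

accuracy, metrics = per_class_metrics(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reporting precision and recall per class, rather than accuracy alone, is what makes it possible to say that a model does better on some overrun levels than on others.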
ISSN: 0926-5805, 1872-7891