Data Mining and the Enterprise BI Long Game

Data mining provides the foundational work for higher-order predictive and prescriptive analytics. Enterprise data warehouses are a trove of useful information, and data mining methods help to separate what is useful from what is not (Sharma, Sharma, & Sharma, 2013). Data mining is itself an analysis method; that is,

  • “the analysis of data that was collected for other purposes but not the questions to be answered through the data mining process” (Maaß, Spruit, & de Waal, 2014, p.
Data mining takes on the unknown unknowns of the dataset and begins to make sense of the vast number of data points available. It involves both data transformation and data reduction. Both are necessary because “prediction algorithms have no control over the quality of the features and must accept it as a source of error” (Maaß, Spruit, & de Waal, 2014, p. 6).

What is produced from these data mining efforts is a set of relevant data points that can be used for aggregate, predictive, and prescriptive analysis in the enterprise’s business intelligence platform(s). This is no different from avoiding the “garbage-in, garbage-out” mistake in simple reporting and visualization. Data mining reduces the noise and keeps irrelevant covariates from diluting the relevant data. It provides the business intelligence framework with usable data and a minimum of error.

For example, if I were to embark on a predictive modeling project to determine which factors influenced employee attrition at a large manufacturing company over the last five years, I would first want to do extensive data mining on the raw dataset. With over 20,000 employees on all continents and hundreds of data points per employee, a rigorous data mining phase eliminates the variables that would introduce error into a predictive model such as a decision tree or multiple regression.
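
To make that screening phase concrete, here is a minimal sketch in plain Python of one way it might look. It is an illustration under assumptions, not a prescribed method: the column names, the sample values, and the three thresholds (`max_missing`, `min_variance`, `max_corr`) are all hypothetical, and a real project would tune them and likely use a dataframe library instead of lists.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns, max_missing=0.4, min_variance=1e-8, max_corr=0.95):
    """Return the column names that survive three simple screening rules.

    columns maps a column name to a list of numbers, with None for missing.
    The default thresholds are illustrative assumptions only.
    """
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Rule 1: drop columns that are empty or mostly missing.
        if not present or (len(values) - len(present)) / len(values) > max_missing:
            continue
        # Rule 2: drop columns that barely vary across employees.
        if pvariance(present) <= min_variance:
            continue
        # Fill remaining gaps with the column mean so correlations can be computed.
        fill = mean(present)
        kept[name] = [fill if v is None else v for v in values]

    # Rule 3: of any highly correlated pair, keep only the first column seen.
    selected = []
    for name in kept:
        if all(abs(pearson(kept[name], kept[other])) < max_corr for other in selected):
            selected.append(name)
    return selected


# Hypothetical employee attributes, one list entry per employee.
employees = {
    "tenure_years": [1, 3, 5, 7, 9, 11],
    "tenure_months": [12, 36, 60, 84, 108, 132],         # same information as tenure_years
    "site_code": [4, 4, 4, 4, 4, 4],                     # constant, carries no signal
    "exit_survey": [None, None, None, 2, None, None],    # mostly missing
    "overtime_hours": [10, 2, 8, 1, 7, 3],
}

print(screen_features(employees))
```

In this toy dataset only `tenure_years` and `overtime_hours` survive: the redundant, constant, and mostly empty columns are removed before any decision tree or regression sees them.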

References

Maaß, D., Spruit, M., & de Waal, P. (2014). Improving short-term demand forecasting for short-lifecycle consumer products with data mining techniques. Decision Analytics, 1(1), 1–17.
Sharma, S. A., Sharma, A. K., & Sharma, D. M. (2013). Using Data Mining for Prediction: A Conceptual Analysis. I-Manager’s Journal on Information Technology, 2(1), 1–9.

Jonathan Fowler
About the author

Data-Centric Culture evangelist and BI leader with a demonstrated history of working in a variety of environments. Skilled in Big Data, statistical methods, machine learning, database design, and research methods. Clemson University alumnus ('04, '07) and doctoral student at Colorado Technical University ('21).
