Investigating the Effect of Feature Engineering on Structured and Unstructured Data for Predictive Modelling and Machine Learning in Enterprises of Developing Countries: A Case Study of the Republic of Namibia

This study investigates the effect of feature engineering on structured and unstructured data for predictive modelling and machine learning in Namibian enterprises. Although structured and unstructured data are increasingly available, enterprises struggle to use them effectively because of limitations in infrastructure, data quality, and expertise in data-driven approaches. The study addresses a critical gap in the literature: the lack of context-specific feature engineering techniques tailored to low-resource environments, where data inconsistencies and limited computational resources are common. By examining and adapting feature engineering methods such as feature selection, transformation, and construction, the study aims to enhance model accuracy and decision-making effectiveness in Namibian enterprises. Methodologically, the study employs both supervised and unsupervised machine learning techniques, including K-Means clustering, K-Nearest Neighbours, Decision Trees, Random Forests, Gradient Boosting and Artificial Neural Networks. Data preprocessing, dimensionality reduction, and other feature engineering methods are applied to structured and unstructured data derived from operational datasets, questionnaires, and interviews. Key techniques such as SelectKBest and ANOVA F-value scoring are used to optimise model performance and to address limitations in data quality and accessibility. Findings indicate that context-specific feature engineering significantly improves model accuracy and predictive reliability, especially in data-scarce environments. Enhanced model performance is shown to aid strategic decision-making, enabling Namibian enterprises to optimise resource allocation, streamline operations, and improve customer service. Despite these benefits, challenges such as limited infrastructure, inconsistent data quality, and low ML literacy hinder widespread adoption.
The study recommends collaborative data governance frameworks and increased ML training initiatives to address these barriers. Future research directions include the development of hybrid feature selection techniques, automation of feature engineering, and integration of blockchain technology to improve data integrity, scalability, and robustness.
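
The abstract names SelectKBest and the ANOVA F-value as feature selection techniques. The following is a minimal, dependency-free sketch of the idea behind that combination (as offered, for example, by scikit-learn's SelectKBest with the f_classif scoring function): score each feature by its one-way ANOVA F-statistic across class labels and keep the k highest-scoring features. It is not the thesis's implementation. The function names `anova_f` and `select_k_best` and the toy data are hypothetical and exist only for illustration.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature grouped by class label.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class variance: perfectly separating or constant feature.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(rows, labels, k):
    """Return (sorted indices of the k best features, all F-scores)."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data invented for illustration: 6 samples, 3 features, 2 classes.
# Only feature 0 differs between the classes.
rows = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.1, 4.5, 0.2],
    [2.9, 3.5, 0.1],
    [3.0, 4.0, 0.3],
]
labels = [0, 0, 0, 1, 1, 1]

selected, scores = select_k_best(rows, labels, k=1)
print(selected)
```

In this toy example only the first feature separates the two classes, so it is the one retained. A filter of this kind is cheap to compute, which is one reason it may suit the limited computational resources the abstract describes.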

Item Type: Doctoral Thesis
Subjects: Information Technology
Divisions: No Keywords
Depositing User: Joseph Adewale Iyanda
Date Deposited: 2025-02-04 00:00:00