Insurance Claim Analysis Using Extreme Gradient Boosting Trees-A Machine Learning Approach

KOLLONGEI, Naomi

Publication Date

2024

Abstract/Overview

The emergence of big data has changed how insurance companies handle the data they receive in the course of their business. Big data involves huge volumes of data of many varieties. As a result, the methods currently used for analysis in the insurance sector, such as statistical methods and actuarial formulas, are becoming inadequate for the problems and opportunities that arise from advances in technology. Moreover, the data may be prone to missing values. The Extreme Gradient Boosting algorithm (XGBoost) is an ensemble learning method with the capacity to address both of these characteristics of the data. This research used XGBoost to process insurance claim data in order to model claim frequency and claim severity for claim prediction. XGBoost builds tree-based models by iteratively fitting decision trees to the residuals of the previous predictions, reducing the error at each iteration. With this algorithm we aim to improve the accuracy of predictions, yielding better estimates for risk assessment and pricing of insurance products within the insurance sector. The XGBoost models were evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and R-squared (RSQ). The claim frequency model had an RMSE of 0.949, an MAE of 0.7741 and an RSQ of 0.781; the claim severity model had an RMSE of 899.12, an MAE of 736.77 and an RSQ of 0.9625. We also compared the performance of the XGBoost models with a zero-inflated Poisson model, multiple linear regression and a generalized Pareto model. The XGBoost models had the best metrics (RMSE, MAE and RSQ); we therefore concluded that the Extreme Gradient Boosting model was the optimal model.

Key words: Big data, Frequency, Severity, machine learning, gradient boost, XGBoost
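The residual-fitting idea and the three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The code below is not the thesis's implementation and does not use the XGBoost library: it is a plain-Python boosting loop with squared-error loss and depth-1 trees (stumps), and it omits XGBoost's regularization, second-order terms and missing-value handling. The feature values, claim amounts, `n_rounds` and `learning_rate` are hypothetical and chosen only for illustration.

```python
import math


def fit_stump(x, residuals):
    """Fit a depth-1 regression tree to the residuals on a single feature."""
    best = None
    values = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    preds = [sum(y) / len(y)] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return preds


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def r_squared(y, p):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: one risk feature and a claim amount per policy.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [120.0, 150.0, 200.0, 260.0, 400.0, 520.0, 700.0, 950.0]

baseline = [sum(y) / len(y)] * len(y)  # predict the mean for every policy
boosted = boost(x, y)

baseline_rmse = rmse(y, baseline)
model_rmse = rmse(y, boosted)
model_mae = mae(y, boosted)
model_r2 = r_squared(y, boosted)

print("baseline RMSE:", baseline_rmse)
print("boosted RMSE:", model_rmse, "MAE:", model_mae, "R-squared:", model_r2)
```

These are training-set metrics on toy data; a real comparison such as the one reported in the abstract would be made on held-out data.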

Permalink

https://repository.maseno.ac.ke/handle/123456789/6305

Collections

Statistics and Actuarial Science [31]