This project uses an insurance cost data set from Kaggle (https://www.kaggle.com/datasets/mirichoi0218/insurance). It contains information on individual medical insurance bills. Each bill is associated with some characteristics of the person who received it:
- age: age of primary beneficiary
- sex: gender of the insurance contractor (female, male)
- bmi: body mass index, an objective measure of whether body weight is high or low relative to height, computed as weight divided by height squared (kg / m^2); ideally 18.5 to 24.9
- children: Number of children covered by health insurance / Number of dependents
- smoker: whether the person smokes (yes/no)
- region: the beneficiary's residential area in the US (northeast, southeast, southwest, northwest)
- charges: Individual medical costs billed by health insurance
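
To make the bmi definition concrete, here is a minimal sketch of how the index is computed from weight and height. The function names `bmi` and `in_ideal_range` are hypothetical helpers written for this illustration; they are not part of the dataset or the notebook. The 18.5 to 24.9 bounds are the ideal range quoted in the column description.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """Check whether a BMI value lies in the ideal range from the column description."""
    return low <= value <= high


# Illustrative inputs only (not rows from the dataset): 70 kg at 1.75 m.
example = bmi(70.0, 1.75)
print(round(example, 2), in_ideal_range(example))
```

The dataset already ships the computed bmi value per person, so this calculation is only needed to interpret the column, not to prepare the data.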
We are interested in how these characteristics relate to the total medical cost. Because charges is a continuous, positive quantity, linear regression is a reasonable first model for it.
The procedure is described in the attached notebook linear_regression_insurance_costs.ipynb.
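
Independently of the notebook, the following is a minimal sketch of what fitting a linear regression on such data can look like. It is not the notebook's code. It assumes NumPy is installed, uses only the age and smoker columns, and runs on a small set of made-up records whose charges follow an invented linear rule, so that the fitted coefficients can be checked against known values. The helper names (`make_toy_records`, `design_matrix`, `fit_ols`) are hypothetical.

```python
import numpy as np


def make_toy_records():
    """Build made-up records shaped like the dataset's columns.

    These values are NOT taken from the Kaggle data. Charges follow an
    invented rule: 1000 + 200 * age, plus 15000 for smokers.
    """
    rows = []
    for age in (20, 35, 50, 65):
        for smoker in ("yes", "no"):
            surcharge = 15000.0 if smoker == "yes" else 0.0
            rows.append({"age": age, "smoker": smoker,
                         "charges": 1000.0 + 200.0 * age + surcharge})
    return rows


def design_matrix(records):
    """Columns: intercept, age, smoker encoded as 1 (yes) or 0 (no)."""
    return np.array([[1.0, float(r["age"]), 1.0 if r["smoker"] == "yes" else 0.0]
                     for r in records])


def fit_ols(records):
    """Ordinary least squares fit of charges on the design matrix."""
    X = design_matrix(records)
    y = np.array([r["charges"] for r in records])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


intercept, age_coef, smoker_coef = fit_ols(make_toy_records())
print(f"intercept={intercept:.1f}, age={age_coef:.1f}, smoker={smoker_coef:.1f}")
```

Because the toy charges are exactly linear, the fit recovers the invented rule up to floating-point error. On the real data the relationship is noisy, and the remaining categorical columns (sex, region) would need the same kind of numeric encoding before fitting.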