Context matters - Text classification of insurance claims using BERT

This thesis investigates how BERT performs in classifying the type of damage in insurance claims, compared to a FastText classifier. Additionally, the impact of text cleaning on the performance of BERT is examined. To this end, BERT and DistilBERT models are fine-tuned on claims data for the task of classifying the type of damage. For comparison, FastText models are trained on the same task. Two preprocessing pipelines are applied to the document corpus before training to investigate the resulting differences in model performance. The results indicate that BERT outperforms FastText on this domain-specific text classification task. A comparison of BERT models fine-tuned on differently preprocessed texts from the same document corpus shows that text cleaning has only a small impact on model performance. Overall, it can be concluded that transfer learning using BERT is a promising technique in the insurance domain.


Michelle Reiners