Bringing leading edge predictive modelling to insurance companies for improved outcomes.

Applying Text Mining to Improve Segmentation of Workers Compensation Claims

Introduction

With the proliferation of Web, smartphone and other applications for creating and exchanging text notes and documents, rapidly growing volumes of unstructured text are now available in electronic form to support business-critical data mining and other analytic activities. The challenge is to extract useful information from text such as claim notes, Web blogs and customer service center call notes without anyone having to read the text or manually classify its contents.

Claim adjusters are a very rich source of information and their notes document their observations, actions and opinions. Is it possible to use this golden resource of information to improve the accuracy of predictive models? Text mining provides an automated methodology to review adjuster notes and uncover valuable information that can be applied to improve a predictive model’s accuracy in measuring claim severity.

This case study demonstrates two things. First, a model using only the information in adjuster notes can predict claim severity just as accurately as a model using all of the structured claim data (e.g., age, SIC code, injury type, medical bills, prescriptions). Second, combining text mining with structured data significantly improves the predictions of claim severity.

Why Predict Severity of Workers Compensation Claims?

Claim Analytics has been working with the State Fund of Minnesota (SFM Insurance), analyzing historic claims data to build predictive models that enable new claims to be scored on their severity at an early stage in the claim’s lifecycle. “This enables us to understand and quantify claim severity without lengthy discussion or debate and take appropriate action quickly,” says Meg Kasting, Vice President of Claims at SFM Insurance.

There are many reasons why text mining can improve model accuracy.

  • Expert opinions and qualitative assessments: Text mining can pick up on the sentiments of the claim adjuster and account for their expert opinion within the predictive model. For example, a word such as investigate within the claim adjuster notes is likely to indicate that the adjuster has some concerns about the claim.
  • Early indicator: There can be delays in some medical services being provided as well as lags in the bills being processed. For example, hernia surgery may be delayed by a month to allow the injured worker an opportunity to lose weight, and it may be another two weeks after that before the bill is processed and entered into the claim system. In that case, text mining can surface the planned surgery roughly six weeks before it appears in the medical bill data.
  • Case management plans: The claim adjuster notes may also contain information about the case management strategy that is not available in the structured data. Case management plans are another indicator about the claim adjuster’s opinion of a claim. In addition, case management plans can have impact on the claim outcome. Advance information about the case management plan can be very valuable to a model’s predictions.
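The idea behind these points, turning words and phrases in adjuster notes into model inputs, can be sketched as simple keyword flags. This is a minimal illustration in Python, not the method used in the study: the keyword list, the function name keyword_flags and the sample note are all assumptions made for the example.

```python
import re

# Hypothetical keyword list; "investigate" and "surgery" echo the examples
# in the text, "case manager" is invented for illustration.
KEYWORDS = ["investigate", "surgery", "case manager"]

def keyword_flags(note, keywords=KEYWORDS):
    """Return a 0/1 flag per keyword, matched case-insensitively."""
    text = note.lower()
    flags = {}
    for kw in keywords:
        # \b keeps matches on word boundaries; the optional suffix also
        # accepts simple endings such as "investigates" or "investigated".
        pattern = r"\b" + re.escape(kw) + r"(?:s|d|ed)?\b"
        flags[kw] = 1 if re.search(pattern, text) else 0
    return flags

# Made-up adjuster note
note = "Hernia surgery scheduled next month. Will investigate prior injury."
print(keyword_flags(note))
```

Each flag can then be fed to a model alongside, or instead of, structured fields. A production system would likely need stemming, negation handling and phrase detection beyond this sketch.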

Model Comparisons

For this case study we predict, at fifteen days from first notice of loss (FNOL), the likelihood that the cost of a medical-only claim will exceed $20,000.
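The prediction target can be written down concretely. The sketch below is an assumption-laden illustration: the 15-day lag and the $20,000 threshold come from the text, while the function names, the note layout (date, text) and the rule that only notes dated on or before the scoring date are used are hypothetical.

```python
from datetime import date, timedelta

SCORING_LAG_DAYS = 15        # days after FNOL, from the case study
SEVERITY_THRESHOLD = 20_000  # dollars, from the case study

def scoring_date(fnol_date):
    """Date on which the claim is scored."""
    return fnol_date + timedelta(days=SCORING_LAG_DAYS)

def usable_notes(notes, fnol_date):
    """Keep only the text of notes written on or before the scoring date
    (an assumed rule, so the model sees nothing from the future)."""
    cutoff = scoring_date(fnol_date)
    return [text for note_date, text in notes if note_date <= cutoff]

def severity_label(final_cost):
    """1 if the claim cost ended up above the threshold, else 0."""
    return 1 if final_cost > SEVERITY_THRESHOLD else 0

# Made-up example claim
fnol = date(2012, 3, 1)
notes = [(date(2012, 3, 5), "Initial contact with worker."),
         (date(2012, 4, 2), "Surgery completed.")]
print(scoring_date(fnol))
print(usable_notes(notes, fnol))
print(severity_label(25_000))
```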

We initially trained two independent predictive models:

  • The first model (the ‘non-text model’) used only structured data as predictor variables. Several hundred such variables were analyzed, for example: age, industry, nature of injury, body part, medical bill amounts and number of visits by type of provider, prescriptions, distance to employer and distance to medical providers. 28 of these variables were included as predictors in this model.
  • The second model (the ‘text only model’) used only unstructured text data as input. Several hundred key words and phrases were analyzed, for example: body parts, medical procedures, case management actions and time references. 15 of these words and phrases were included as predictors in this model.

Here is a summary of the number and type of predictors used for each model:

Type of Information            Non-text Model    Text Only Model
Claimant Info                         3                 0
Injury Description                    6                 4
Medical Info                          9                 4
Prescription Meds                     3                 0
Location                              3                 0
Industry Description                  2                 0
Time References                       1                 1
Reserve Estimate                      1                 0
Claims Mgmt Actions                   0                 6
Total # Predictor Variables          28                15


It is important to stress that model evaluation should always be measured on out-of-sample data. That is to say that a subset of claims should be withheld from the modeling process so that model evaluation can be performed on a sample of claims that were not part of the model building. This protects against the risk of overfitting the data and ensures the final model is robust and able to accurately measure the risk of new claims.
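A minimal sketch of withholding an out-of-sample set is shown below. The 30% holdout fraction, the fixed seed and the function name holdout_split are assumptions for illustration; the case study does not state how its sample was divided.

```python
import random

def holdout_split(claim_ids, holdout_fraction=0.3, seed=42):
    """Shuffle claim ids reproducibly and withhold a fraction for evaluation.

    Returns (training_ids, holdout_ids); the two lists never overlap.
    """
    ids = list(claim_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = int(round(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]

# 1,000 made-up claim ids
train_ids, test_ids = holdout_split(range(1000))
print(len(train_ids), len(test_ids))
```

The model is fitted on the training ids only, and every accuracy figure is then computed on the withheld ids.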

The two models predict claim severity using very different information. For example, the non-text model considers the reserve estimate, claimant information (e.g., age), industry and prescription medications, none of which are inputs for the text only model. Similarly, the text only model gives a lot of weight to claims management actions, which are not inputs for the non-text model.

Even though the two models predict claim severity using different information, their accuracy is very similar. The text only model is a little better at finding the most severe claims but a little worse at finding the least severe claims. The gains chart below shows, for each model, the percentage of all severe claims found within a given percentage of all claims, taken in order from highest to lowest scored risk.
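For readers unfamiliar with gains charts, the quantity being plotted can be computed as in the following sketch. The scores and severity flags are made-up toy data, and the function name cumulative_gains is an assumption; none of the study's actual figures are reproduced here.

```python
def cumulative_gains(scores, is_severe, fractions=(0.1, 0.2, 0.5, 1.0)):
    """Share of all severe claims found in the top-scored fraction of claims."""
    # Rank claims from highest to lowest predicted risk.
    ranked = sorted(zip(scores, is_severe), key=lambda p: p[0], reverse=True)
    total_severe = sum(flag for _, flag in ranked)
    gains = {}
    for frac in fractions:
        top_n = int(round(len(ranked) * frac))
        found = sum(flag for _, flag in ranked[:top_n])
        gains[frac] = found / total_severe if total_severe else 0.0
    return gains

# Toy data: 10 claims, 3 of which turned out to be severe
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
severe = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(cumulative_gains(scores, severe))
```

A model that ranks claims well captures most severe claims in the first small fractions, while a random ordering would capture roughly the same share of severe claims as the share of claims reviewed.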

Our final step was to bring together the text and non-text data to build a combined model. Our combined model included 37 predictor variables: the 28 predictors from the non-text model and 9 of the 15 predictors from the text only model. Our analysis indicated some redundancy in the information provided by some of the text variables, so these could be dropped without degrading model performance. Not surprisingly, the redundant variables from the text mining analysis were mainly those conveying information where the non-text data was already quite rich, such as injury descriptions and medical information.

The gains chart below demonstrates that adding the text mining analysis to the claim severity model greatly improves the accuracy of the model’s risk assessment. In practical terms, this helps claims organizations to further optimize their allocation of resources.

Project Experience

SFM Insurance, working with Claim Analytics, is using text mining within their predictive models to enhance their claim handling process with some surprising results. Models have been built to indicate likelihood of return to work and to identify exploding medical expense claims.

“One of the surprises for us has been the value of using text mining on our claim case notes; this has really added benefit to our claim handling process,” says Scott Brener, Senior Vice President and General Counsel at SFM Insurance.

“This predictive modeling project with Claim Analytics has been a real eye-opening experience for us; it is amazing what is hidden inside our data!” Brener went on to say.

Text mining is a technique used in advanced predictive models to enable data in free-text fields to be used in data analysis, something that has previously been hard if not impossible. “We are using leading edge techniques here at Claim Analytics which are producing great results for workers compensation insurers,” says Jonathan Polon, Chief Analytics Officer at Claim Analytics.

Conclusion

With a wealth of information locked in adjuster and claim notes, text mining finally offers a way to analyze this valuable data and increase the accuracy of predictive modeling. Claim Analytics is pioneering this combined model approach, and its customers are benefiting from data that was previously assumed to be redundant or unusable. In industries such as insurance, where risk is a component of everyday business, text mining cannot produce exact results, but it brings a company’s predictions a step closer to actual outcomes, and therefore a step ahead of the rest of the industry.
