In the good old days (20 years ago, or so) most long-term disability (LTD) claims were straightforward. The path to return to work (RTW) was based on physical factors, and seemed almost predictable. No longer.
Psycho-social factors now play an increasingly large role in long-term disability; during the 1990s, the incidence rate of mental-nervous claims almost tripled. New diagnoses such as fibromyalgia have appeared on the scene. Clinical depression has become an accepted, treatable disease. Work absences no longer carry the stigma they once did. Today, co-morbid conditions are more prevalent, and diagnoses are more complex.
The management of disability claims continues to tax and challenge every claim operation. Instilling a disciplined focus and process on return-to-work opportunities is more important than ever before. Deploying the right staff at the right time on the right claim continues to prove itself as the best means to increase recovery, maximize return to work, and get claimants back to a productive life.
The increasing number and complexity of diagnoses, together with more challenging economic times, have created a world in which early intervention has become more and more important to claims management.
Early intervention offers claimants the attention and assistance they need, at the time they need it, to make a successful return to work. It can also prevent the onset of a disability mindset: the claimant’s feeling that staying away from work is inevitable, and return to work is impossible.
One of the major difficulties in practicing early intervention is the heavy demand it places on the claims management department. In a perfect world, every claims professional would have infinite time and resources and 100 percent objectivity. The reality, of course, is different.
LTD claims files are bulky. Some, in hard copy, are several inches thick. They tend to thicken over time. Few, if any, are easy to peruse or review.
When a claims file is first presented to a claims professional, it may already contain a number of documents.
Claims professionals have to plow through a great deal of information before deciding which of their several hundred claims will receive priority and attention that day. Should they suggest a payout for Martha B.? Order an independent medical examination for Jerry T.? Call Jacob Z. again to monitor his progress? Hire an investigator to check out Brenda B.’s suspicious-sounding bad back? The possible scenarios are endless.
Predictive modeling technology can be used to build scoring tools that help claims professionals, a growing science that is being applied successfully to the art of claims management. Such tools support better choices about how to use limited time and resources. The predictive model scores each LTD claim with a number from one to 10, based on the likelihood of return to work within a given timeframe, usually six, 12, or 24 months.
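As a rough illustration of how a model's output might become a one-to-10 score, the sketch below assumes the model emits a probability of return to work within the chosen timeframe and maps it to ten equal-width bands. The function name and the equal-width banding are assumptions made for illustration; the scoring model described here may derive its scores differently.

```python
def probability_to_score(p_rtw: float) -> int:
    """Map an estimated probability of return to work (0.0-1.0) to a 1-10 score.

    Hypothetical banding: ten equal-width bands, so probabilities below 0.10
    score 1 and probabilities of 0.90 or more score 10.
    """
    if not 0.0 <= p_rtw <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # int() truncates, so 0.55 -> band 5 -> score 6; cap at 10 for p == 1.0
    return min(10, int(p_rtw * 10) + 1)
```

A claim with an estimated 55 percent chance of return to work within 12 months would score six under this banding.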
This score gives claims managers a quick, initial assessment of each claim file they manage. It helps them to optimize resource allocation and to decide where to focus their time, energy, and available support. By offering a classification for each claim, it also helps the claims supervisor to allocate caseloads effectively among claims staff.
Most statistical tools used in business pre-date the invention not only of computers, but of electric light and automobiles. These tools were designed with a now-antiquated constraint: they could employ only calculations that could be done by humans in a reasonable amount of time, with the amount of data available.
With the advent of modern computing, two changes occurred to break open these boundaries. First, computers could compute with incredible speed. Second, they began to gather and store vast amounts of data. The setting was ripe for the birth of predictive modeling.
Predictive modeling tools are uniquely suited to dealing with sixth-sense problems, problems that defy the capabilities of traditional if-then-else programming. What predictive modeling does is to rumble, wide-eyed and open-minded, through the data. By drawing empirical observations, based on historical behavior, predictive modeling succeeds at accurately predicting human behavior.
A neural network is an optimization tool that works to develop a weighted algorithm. The neural network starts by assigning random weights to a random series of inputs. It predicts a result, and then compares its predicted result to the actual result. If the predicted result differs from the actual result, the neural network adjusts its weights to better approximate the actual result.
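The predict, compare, adjust cycle can be sketched with the simplest possible network: a single neuron with a sigmoid output. This is a minimal illustration, not the architecture of any particular claims model; the function names, learning rate, and number of passes are all assumptions.

```python
import math
import random

def predict(weights, bias, inputs):
    """Weighted sum of the inputs squashed to a value between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def train_single_neuron(rows, outcomes, epochs=500, learning_rate=0.5, seed=0):
    """Train one sigmoid neuron by repeatedly comparing predictions to outcomes."""
    rng = random.Random(seed)
    # Start from random weights, as described in the text
    weights = [rng.uniform(-0.5, 0.5) for _ in rows[0]]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for inputs, actual in zip(rows, outcomes):
            predicted = predict(weights, bias, inputs)
            error = actual - predicted  # how far off was the prediction?
            # Nudge each weight in the direction that reduces the error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias
```

On a toy dataset where the outcome simply follows the first input, the trained neuron learns to give that input a large positive weight and to largely ignore the second.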
In this way, neural networks are similar to humans; they “learn” by comparing known inputs to known outcomes, over and over again, until they can detect a pattern. Unlike humans, they can consider and weight many factors simultaneously.
This process gives neural networks the ability to deal with an object as complex as a claims file. The neural network doesn’t need to be programmed to “understand” the file, and to go through a logical series of if-then-else steps to place it in a particular category. It simply compares groups of known inputs (age, gender, diagnosis, occupation) to the known outcome, in order to detect patterns, some of which are very subtle indeed.
As with any ground-breaking new technology, predictive modeling has attracted a coterie of detractors who say that the answers it delivers are too complex, that the language of delivery is arcane, and that the solutions are impractical.
Many of these criticisms are representative of poor applications of the new techniques, rather than of the techniques themselves. Others are simply wrong. Certainly, in terms of economic payoff, predictive modeling is posting some head-turning returns. A study reported in ZDNet Australia found that: “Among North American and European companies, the successful introduction and use of analytics has delivered returns on investment of anywhere from 17 percent to 2000 percent, with a median ROI of 112 percent. These kinds of numbers make a compelling argument for business analytics, and predictive modeling is a significant part of those analytics.” 1
One article on predictive modeling, published in a newsletter of the Association for Computing Machinery, shows just how much data mining techniques have already penetrated the industrialized world. It describes a day in the life of John, an average American:
… John starts his day with the morning paper, which has improved over a few years ago; it uses a data mining process control method to maintain ink quality while maximizing print speed and minimizing cost. John pops his morning hair-loss pill as he reads. Although John’s pill was first developed to treat prostate enlargement, it was recently approved by the FDA for treatment of hair loss. Major drug companies have been applying data mining techniques to patient information to discover beneficial side effects of drugs.
As he drives to work, John’s car uses an onboard neural net to monitor information from his engine, exhaust, and fuel systems. Once he arrives, his neural net-based software helps him find opportunities for cross-selling his products. Neural nets even help him find a birthday present for his mother, by suggesting books she may like. And when he uses his credit card to buy online, a neural net fraud detection program runs silently in the background, checking that he is, indeed, the owner of the card. 2
The summary below gives an overview of how one claims scoring model was built. A paper offering a more detailed examination of the process can be found on the Society of Actuaries website.
One of the greatest challenges in creating a disability claims scoring model is to prepare the data in a way that allows the neural network to actually compare cases on an apples-to-apples basis. More than 400 different diagnoses were involved in the dataset used, some of which were very rare and offered few opportunities for comparison.
Creating one field for diagnosis, with 400 different possible inputs, would not have allowed the neural network adequate opportunities for comparison.
Accordingly, the “diagnosis” field was broken down into a number of sub-fields. This gave the neural network a far greater number of opportunities to compare.
Similarly, repeated experimentation revealed that the “age” field was better classified as a categorical (non-numeric) variable, with only three age categories: 18-35, 36-50, 51-65. The neural network derived much greater meaning from the absolute differences found in the three age categories than it had from comparing the relative values of the ages in just one age category.
1 There’s gold in them thar databases, David Braue, ZDNet Australia
2 Behind-the-scenes Data Mining, George H. John, SIGKDD Explorations

The proof is in the pudding. The most interesting discovery the model made was that it could accurately predict the likelihood of recovery. Upon being declared ready for final testing, the model was fed the input fields for a dataset of historical cases whose outcome was known. The rising line of blue diamonds in the chart above shows the results of the blind test. A higher score (score shown on x axis, rate of recovery on y) was clearly linked to a higher rate of recovery. The model’s predictions aligned very closely with real-life outcomes.
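A blind test of this kind amounts to grouping the historical cases by score and computing the observed recovery rate in each group. The sketch below shows one way to do that; the function names are hypothetical, and any data passed in would be the tester's own.

```python
from collections import defaultdict

def recovery_rate_by_score(scored_cases):
    """scored_cases: iterable of (score, recovered) pairs with known outcomes.

    Returns {score: observed recovery rate}, ordered by score.
    """
    totals = defaultdict(int)
    recoveries = defaultdict(int)
    for score, recovered in scored_cases:
        totals[score] += 1
        recoveries[score] += int(recovered)
    return {s: recoveries[s] / totals[s] for s in sorted(totals)}

def rates_rise_with_score(rates):
    """True if the observed recovery rate never falls as the score increases."""
    ordered = [rates[s] for s in sorted(rates)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))
```

Plotting the returned rates against the scores gives the kind of rising line the chart describes.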
The score can be used as a factor in creating an overall policy for managing claims. For example, claims scored in the four-to-seven range might be afforded the highest level of time, attention, and resources. Claims scored between eight and ten would simply require a certain level of monitoring to ensure that nothing stood in the way of the claimant’s recovery and return to work. Claims scored in the one-to-three range could receive a similar level of attention to those scored between eight and ten.
One of the most powerful features of the scoring model is its ability to distinguish between grey-area claims, those claims that are neither particularly promising nor particularly unpromising. This is often a very difficult task for claims professionals.
Yet it can be seen in the chart above that the model differentiates, clearly and accurately, in the four-to-seven range. “Fives” were more likely to recover than “fours,” “sixes” more likely than “fives,” and so on. This differentiation may offer the claims professional assistance in deciding which claims in the four-to-seven range would most benefit from extra time and attention.
One route that can be (and is being) taken by claims departments is to focus most attention on claims scored between four and seven and, within this group, to start with the claims scored with a six or seven. These claims have a good potential for return to work, but may not do so without the proper assistance, resources, and encouragement. This is where an intensity of intervention could offer the greatest rewards.
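The policy outlined above (intensive attention for scores of four to seven, monitoring for the rest, and sixes and sevens first within the grey area) could be expressed as a simple work queue. This is only a sketch of that one policy; the tier labels and function names are assumptions.

```python
def attention_tier(score: int) -> str:
    """Classify a 1-10 claim score into a hypothetical attention tier."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    # Grey-area claims (4-7) get the most time; 1-3 and 8-10 are monitored
    return "intensive" if 4 <= score <= 7 else "monitor"

def work_queue(claims):
    """Order (claim_id, score) pairs: grey-area claims first, highest score first."""
    def sort_key(claim):
        _, score = claim
        return (0 if attention_tier(score) == "intensive" else 1, -score)
    return sorted(claims, key=sort_key)
```

For example, claims scored 9, 5, 7, 2 and 6 would be worked in the order 7, 6, 5, and then 9 and 2.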
The score provides a quick, objective assessment of approved claims files. Because each claim has a score, claims staff can view, compare and prioritize their portfolio of claims. This helps to decide where to focus time, energy and available resources. It offers the claims manager assistance in allocating caseloads effectively among claims professionals.
The predictive model’s main reports summarize each claim, including key characteristics (age, gender, occupational type, diagnosis, location, etc.) and the scores. For the claims professional, this revealing picture of the entire portfolio facilitates the process of deciding where to focus time and energy. Below is an example of this report.
Additional reports reveal trends in severity of diagnosis, age of claimants, lengths of elimination periods, etc. This ability to track trends offers management an early warning of changes in aggregate claim quality and thereby time to take appropriate measures.
We decided that our model would predict the likelihood of recovery within a given time span rather than time to recovery. At the same stage, we decided how we would evaluate the completed model.
We started by trying to get a general idea of which factors influence recovery. Then we had to precisely define recovery, a surprisingly tricky task. We had to decide how many records were needed to create the model, and ascertain that the necessary data fields and records were available.
We first split the data into three parts. The largest, 80 percent of the data, was used for training the model. We also set aside 10 percent for testing and 10 percent for final validation.
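A three-way split of this kind might look like the sketch below. It assumes the records are shuffled at random before being divided, which the description above does not specify; the function name and fixed seed are likewise illustrative.

```python
import random

def split_records(records, seed=42):
    """Split records into 80% training, 10% testing, 10% validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n = len(shuffled)
    n_train = n * 8 // 10
    n_test = n // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, roughly 10%
    return train, test, validation
```

Keeping the validation records untouched until the very end is what allows the final check to act as a blind test.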
We then checked data quality, and reviewed the data fields with a knowledgeable user. Finally, we had to devise a way of transforming raw medical data into quantifiable terms the model could understand and evaluate.
We used Salford Systems CART© (a decision-tree tool that automatically sifts large, complex databases, searching for and isolating significant patterns and relationships) as an initial filter to identify which data factors affect recovery most.
A neural network requires much more data preparation than CART©. For example, we modelled “age” in three ranges (18-35, 36-50 and 51-65) and used fuzzy logic at the range boundaries, so that the model would recognize the similarity between a 35-year-old and a 36-year-old claimant.
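One common way to make bucket boundaries fuzzy is to give each claimant a degree of membership in each age range, with a gradual hand-over near the boundary instead of a hard cut-off. The sketch below uses linear ramps; the ramp shape, the five-year blend width, and the function name are assumptions, not the settings used in the model described here.

```python
def fuzzy_age_memberships(age, blend_years=5.0):
    """Return (young, middle, older) memberships for the ranges
    18-35, 36-50 and 51-65. The three values always sum to 1.
    """
    def ramp(x, boundary):
        # 0 well below the boundary, 1 well above it, linear in between
        t = (x - (boundary - blend_years)) / (2 * blend_years)
        return min(1.0, max(0.0, t))

    above_first = ramp(age, 35.5)   # boundary between 18-35 and 36-50
    above_second = ramp(age, 50.5)  # boundary between 36-50 and 51-65
    young = 1.0 - above_first
    middle = above_first - above_second
    older = above_second
    return young, middle, older
```

With these settings a 35-year-old is roughly 55 percent "young" and 45 percent "middle", and a 36-year-old roughly the reverse, so the two claimants look similar to the network, while a 25-year-old remains entirely in the youngest range.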
Choosing neural network training settings is akin to making design decisions for the model. Choosing the settings is an iterative process; it takes time and a great deal of trial-and-error testing.
The experimentation involved in choosing training settings required training a neural network for each combination of settings selected. We then determined the best settings by comparing the various networks we trained, using measures such as percentage of records correctly scored, R-squared, etc.
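The two measures named above can be computed as follows. This sketch assumes each network outputs a value between 0 and 1 and that a prediction counts as correct when it falls on the right side of a 0.5 threshold; the threshold and function names are illustrative.

```python
def percent_correct(predicted, actual, threshold=0.5):
    """Percentage of records where the thresholded prediction matches the outcome."""
    hits = sum((p >= threshold) == bool(a) for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

def r_squared(predicted, actual):
    """Share of the variance in the actual outcomes explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Each candidate network would be scored this way on the same held-out records, and the settings behind the best-scoring network carried forward.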
When we had finished training our network, we decided the results were not adequately precise; too many scores were lumped in the middle. We decided to try the “Genetic Training Option” of our neural network software.
Genetic algorithms, while similar in function to neural networks, use different methods: neural networks “learn” from data, while genetic algorithms “evolve” to a solution. They can be used together, combining the strengths of both. (Genetic algorithms are capable of global search and not easily fooled by local minima; neural networks can apply gradient descent to find the minimum point on the error curve.) Using genetic algorithms, we produced a model we felt was ready for final validation.
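To show what "evolving" a solution means, the sketch below evolves the weights of a single sigmoid neuron with a basic genetic algorithm: keep the fittest half of the population, breed children by mixing parents' weights, and mutate them slightly. It is a generic illustration, not a description of how the "Genetic Training Option" works internally; population size, mutation scale, and all function names are assumptions.

```python
import math
import random

def fitness(weights, rows, outcomes):
    """Negative mean squared error of a sigmoid neuron; higher is better.

    weights[0] is the bias, the rest are one weight per input.
    """
    total = 0.0
    for inputs, actual in zip(rows, outcomes):
        z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
        predicted = 1.0 / (1.0 + math.exp(-z))
        total += (actual - predicted) ** 2
    return -total / len(rows)

def evolve_weights(rows, outcomes, population_size=30, generations=60, seed=1):
    rng = random.Random(seed)
    n_weights = len(rows[0]) + 1
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: rank by fitness and keep the best half unchanged
        population.sort(key=lambda w: fitness(w, rows, outcomes), reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: each weight is inherited from one parent at random
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: small random changes keep the search exploring
            children.append([w + rng.gauss(0, 0.3) for w in child])
        population = parents + children
    return max(population, key=lambda w: fitness(w, rows, outcomes))
```

Because many candidate weight sets are explored at once, a search of this kind is less likely to get stuck in a single poor region of the error surface than one that follows the gradient from one starting point.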
We validated the completed model by comparing the model’s predictions for the validation data to the real outcomes. The results were excellent. The model’s predictions of return-to-work behavior aligned closely with actual outcomes.
Claims management is neither a simple nor a routine task. Yet it forms an appreciable portion of activity for most large insurance organizations, and many smaller ones. A means of assisting claims managers to improve return to work could be highly beneficial. The powers of predictive modeling can now offer assistance in this complex, critical area.
Phil Porter is Claims Director at the Principal Financial Group. Barry Senensky is president and Jonathan Polon is chief modeling officer of Claim Analytics, a predictive modeling company in Toronto, Ont.