Machine Learning Models and Big Data

by Maurik van den Heuvel

In our previous blog post, we explored the concept of Big Data, briefly discussed the phases of a Big Data solution, and explained how an association algorithm can help you advise a customer on the next product he or she might want to purchase. In this blog post I will briefly discuss different types of Big Data algorithms and machine learning models. And in case you’re not very familiar with models and algorithms, I have added a practical example: a fraud detection solution in healthcare that we developed last year.

What is machine learning?

Machine learning is a form of artificial intelligence (AI) in which the “machine” learns automatically and keeps improving itself without being explicitly programmed to do so. The term was coined in the 1950s by Arthur Samuel, who tried a number of different methods to teach a computer how to win a game of checkers. Samuel distinguished two types of learning. The first one is rote learning. In this case, the computer saves each move in the game and the score of that move in order to make the best choice later when the same situation occurs. The second is learning with the help of generalization. The computer does not store all possible outcomes, but only generalized rules. By repeating this in an iterative process, those rules become better and better.
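To make the difference between the two types of learning concrete, here is a minimal Python sketch. It is not Samuel's actual program: the position features (own pieces, opponent pieces), the scores, and all function names are assumptions made up for illustration. Rote learning is a lookup table of positions seen before; generalization keeps only two weights and adjusts them a little after every example.

```python
# Rote learning: store the score of every exact position that was seen.
rote_memory = {}

def rote_learn(position, score):
    rote_memory[position] = score

def rote_lookup(position):
    # Returns None when this exact position was never stored.
    return rote_memory.get(position)

# Generalization: keep only one weight per feature and refine the
# weights step by step, instead of storing every position.
weights = [0.0, 0.0]

def predict(position):
    return sum(w * x for w, x in zip(weights, position))

def generalize(position, score, rate=0.005):
    # Nudge each weight in the direction that shrinks the prediction error.
    error = score - predict(position)
    for i, x in enumerate(position):
        weights[i] += rate * error * x

# Made-up training data: (own pieces, opponent pieces) -> score.
examples = [((8, 8), 0), ((8, 6), 2), ((5, 7), -2), ((10, 4), 6), ((3, 9), -6)]

for position, score in examples:
    rote_learn(position, score)

for _ in range(500):  # the iterative process: repeat and improve
    for position, score in examples:
        generalize(position, score)

unseen = (7, 2)
print(rote_lookup(unseen))        # None: rote learning has no answer here
print(round(predict(unseen), 2))  # close to 5, the piece difference
```

The lookup table can only answer for positions it has stored, while the two learned weights also give a sensible score for a position that never occurred during training.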

As a result, it has become possible to make statements about situations we know little or nothing about, based on what we know from the past. And I think that’s a nice definition of machine learning: “To say something meaningful about things we do not know, based on things we do know.”

The Azure Machine Learning Platform

Nowadays there are many different tools and platforms to implement machine learning projects. A platform that I am happy with is Azure Machine Learning from Microsoft. And that is mainly because it has a visually attractive and easy-to-use interface. Moreover, it is relatively easy to publish trained machine learning models as a web service so that you can call them from other applications. Very cool!

Azure Machine Learning distinguishes four families of machine learning algorithms. Not exhaustive, but a good start! Each family contains different methods that can be used to achieve a certain goal:

  • Anomaly detection: identifying unusual data points; for example, for fraud detection.
  • Clustering: the discovery of structure in data; for example, to be able to divide consumers into different segments and thus define separate marketing strategies.
  • Classification: predicting which of two or more categories something belongs to; for example, whether a client of a bank will or will not pay back a loan.
  • Regression: predicting a numeric value; for example, how many more bags of chips you will sell if you lower the price by 10%.
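The four families can each be illustrated with a few lines of plain Python. The sketch below does not use Azure Machine Learning; the methods chosen (a median-based outlier test, a two-group k-means, a nearest-neighbor classifier, and a least-squares line) are just simple stand-ins for each family, and all numbers and function names are invented for illustration.

```python
from statistics import mean, median

def find_anomalies(values, factor=3.0):
    """Anomaly detection: flag values far from the median of the data."""
    mid = median(values)
    spread = median([abs(v - mid) for v in values]) or 1.0
    return [v for v in values if abs(v - mid) > factor * spread]

def two_means(values, rounds=10):
    """Clustering: split 1-D values into two groups (k-means with k=2).
    Assumes the data contains at least two distinct values."""
    low, high = min(values), max(values)
    for _ in range(rounds):
        near_low, near_high = [], []
        for v in values:
            if abs(v - high) < abs(v - low):
                near_high.append(v)
            else:
                near_low.append(v)
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high

def classify(examples, x):
    """Classification: copy the label of the nearest labelled example."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]

def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by least squares."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up data for each family.
anomalies = find_anomalies([100, 110, 95, 105, 900])
segments = two_means([10, 12, 11, 95, 100, 98])
loans = [(720, "repays"), (680, "repays"), (450, "defaults"), (500, "defaults")]
label = classify(loans, 690)
slope, intercept = fit_line([2.0, 1.8, 1.6], [100, 120, 140])
extra_bags = (slope * 1.8 + intercept) - (slope * 2.0 + intercept)

print(anomalies)          # the claim of 900 stands out
print(segments)           # low spenders and high spenders
print(label)              # label of the closest known client
print(round(extra_bags))  # extra bags when the price drops from 2.00 to 1.80
```

In a real project each of these would be replaced by a properly trained and validated model, but the goal of each family stays the same.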

[Figure: machine learning algorithms]

A practical example of machine learning models: Fraud detection in healthcare

Last year we developed an application for a health insurance company that uses machine learning methods to identify care providers whose claim behavior appears fraudulent. What I find particularly interesting about this application is that machine learning is used here as part of a complete system. As a result, no human needs to be involved between the moment a claim is submitted and the moment an appointment is made with the care provider who submitted it to discuss the conclusions. Of course, a human can still step in if desired.

The phases of this solution are as follows:

[Figure: phases of machine learning solution]

1) Data sources: Data about the care provider and continuously updated claim data. In this case, the care providers are dentists.

2) Integration: The phase in which the data is moved to the servers where further analysis can take place.

3) Data stores: The databases where the analysis is performed. We use databases that are specially designed for analysis. This makes the analysis more efficient, and our analytical queries do not burden the source systems.

4) Analytical methods and techniques:

a) Pre-clustering dentists into groups with a similar profile so that they can be compared with one another. Think of the size of the practice, but also of different focus profiles such as “Orthodontics,” “Children,” or “Prostheses.”

b) Identifying unusual claims as soon as they arrive.

c) Categorizing health care providers into different fraud-risk groups.
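The three analytical steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the application we built: the claim records are invented, step a uses a known profile label instead of a real clustering algorithm, step b uses a simple median-based test, and the risk thresholds in step c are arbitrary.

```python
from collections import defaultdict
from statistics import median

# Hypothetical claims: (dentist id, practice profile, claim amount).
claims = [
    ("D1", "Orthodontics", 400), ("D1", "Orthodontics", 420),
    ("D2", "Orthodontics", 410), ("D2", "Orthodontics", 390),
    ("D3", "Orthodontics", 405), ("D3", "Orthodontics", 1500),
    ("D4", "Children", 80), ("D4", "Children", 90),
    ("D5", "Children", 85), ("D5", "Children", 95),
    ("D6", "Children", 88), ("D6", "Children", 400), ("D6", "Children", 92),
]

def group_by_profile(records):
    """Step a (simplified): collect the claim amounts per peer group."""
    groups = defaultdict(list)
    for _, profile, amount in records:
        groups[profile].append(amount)
    return groups

def is_unusual(amount, peer_amounts, factor=3.0):
    """Step b: compare one claim with the claims of its peer group."""
    mid = median(peer_amounts)
    spread = median([abs(a - mid) for a in peer_amounts]) or 1.0
    return abs(amount - mid) > factor * spread

def risk_category(unusual_count, total):
    """Step c: map the share of unusual claims to a risk group.
    The thresholds are assumptions for this example."""
    share = unusual_count / total
    if share >= 0.5:
        return "high"
    if share > 0:
        return "medium"
    return "low"

def assess(records):
    groups = group_by_profile(records)
    flagged = defaultdict(int)
    totals = defaultdict(int)
    for dentist, profile, amount in records:
        totals[dentist] += 1
        if is_unusual(amount, groups[profile]):
            flagged[dentist] += 1
    return {d: risk_category(flagged[d], totals[d]) for d in totals}

risks = assess(claims)
print(risks)
```

Comparing each dentist only with peers who have a similar profile matters here: a claim that is normal for an orthodontist could look extreme next to claims from a practice that mainly treats children.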

5) Data visualization: The results of the various analyses appear in a report indicating the reasons why a claim is considered to be fraudulent or wrong.

6) Integration into the business process: The final step in the application is the automatic generation of a letter to the care provider in which the conclusions are presented, and an appointment is made to discuss the case at the office of the insurance company.

Thanks to this application, the total value of claims has already been reduced by more than $3 million within a year.

How can machine learning models help your business?

With these models you can, of course, detect much more than just fraud. There are countless other possible applications. And it is not only big tech companies that are working on this; every sector can benefit a great deal from machine learning. I wonder what applications could be thought up for your business. Do you have questions or ideas? Or is an idea brewing that you cannot quite put your finger on yet? Get in touch, and let us think along with you.