Though it’s regarded as a new idea, the phrase “big data” actually became popular in the 1990s. At that time, the term was mainly used to designate datasets that were so large and complex that the databases and analysis tools of the day could not handle them properly. But what was big in the 90s may no longer be considered big or complex today. Even though storing and processing large datasets has become simpler and cheaper, we have kept using the term big data, though now with a different meaning. Today, when we say “big data” or “big data applications”, we actually mean any type of (predictive) analysis that can create information from one or more raw datasets. So, it is no longer about the size of the dataset, but more about the methods we use to analyze the data. A typical big data application consists of the following six steps:
1. Data sources: Here, the data is generated. Usually this is transactional data (such as sales), click behavior on websites, e-mails, data collected by sensors, GPS trackers, and so on.
2. Integration: The phase in which the source data is moved, and sometimes transformed, so that it can be stored, updated, or edited more easily.
3. Data stores: The databases where the analyses are performed. We use databases specially designed for analysis, which makes querying more efficient and means we do not have to disrupt the source systems with analytical queries.
4. Analytical methods and techniques: Structured analysis, which can be done in Excel, but also with specialized analysis tools and platforms that support advanced analytical methods. This is also where artificial intelligence and machine learning models are developed.
5. Visualization: Data visualization, reporting, or interactive sharing of information from databases or the outcomes of analyses.
6. Integration of results: Integrating results and models into applications that can be used in daily business.
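To make the six steps above concrete, here is a minimal sketch of them in Python. All names and data are invented for illustration; a real pipeline would of course use proper databases and tooling rather than in-memory structures.

```python
# 1. Data sources: raw transactional records, as they might come from a web shop.
raw_sales = [
    {"customer": "A", "item": "book", "amount": "12.50"},
    {"customer": "B", "item": "pen", "amount": "2.00"},
    {"customer": "A", "item": "pen", "amount": "2.00"},
]

# 2. Integration: move and transform the data (here: parse amounts into numbers).
integrated = [{**row, "amount": float(row["amount"])} for row in raw_sales]

# 3. Data store: a structure optimized for analysis (here: records per customer),
#    so the source system is not queried during analysis.
store = {}
for row in integrated:
    store.setdefault(row["customer"], []).append(row)

# 4. Analysis: a simple structured analysis -- total spend per customer.
totals = {cust: sum(r["amount"] for r in rows) for cust, rows in store.items()}

# 5. Visualization/reporting: share the outcome, here as a plain text report.
for cust, total in sorted(totals.items()):
    print(f"Customer {cust}: {total:.2f}")

# 6. Integration of results: use the outcome in a daily-business application,
#    e.g. flagging big spenders for a (hypothetical) loyalty programme.
big_spenders = [cust for cust, total in totals.items() if total > 10]
```

In practice each step would be a separate system (source databases, an ETL tool, an analytical data store, an analysis platform, a dashboard, and the business application), but the flow of data through the six steps is the same.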
In the final step, integration, the results of analyses and models are put into practice. A good, simple example of a machine learning model is the advice that online stores give you as a customer, based on past purchases and search behavior of you and thousands of other visitors. If you are looking for a book in a store like Amazon, you are advised to buy, say, two more books under a heading like “Frequently bought together” or “Others also viewed”. You no doubt recognize this common big data application. The algorithm behind it is an association algorithm, usually called “shopping-basket analysis”. So now you know what it is called, too. We will explain another time how that algorithm works in detail.
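To give a first impression of how such a shopping-basket analysis works, here is a deliberately simplified sketch: we count how often pairs of items appear together in the same basket and recommend the most frequent companions. The baskets and item names are invented for illustration; real association algorithms (such as Apriori) add support and confidence thresholds on top of this idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; each set is one customer's order.
baskets = [
    {"python_book", "sql_book", "notebook"},
    {"python_book", "sql_book"},
    {"python_book", "notebook"},
    {"sql_book", "notebook"},
]

# Count how often each pair of items was bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def frequently_bought_together(item, top_n=2):
    """Return the items most often bought together with `item`."""
    companions = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            companions[b] += count
        elif b == item:
            companions[a] += count
    return [other for other, _ in companions.most_common(top_n)]

print(frequently_bought_together("python_book"))
```

The result of `frequently_bought_together` is exactly the kind of output that ends up under a “Frequently bought together” heading in step 6.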
These models do not necessarily have to be about giving good advice in a bookstore. For example, with a comparable model, we can discover insights about medication use in patients with diabetes. Based on medication history, we can predict the next medication that a patient will probably need. Other analysis projects we are involved in include detecting fraud in submitted insurance claims, identifying customers who may be about to churn, and predicting the consumption of goods or services over a certain period of time.
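The next-medication prediction can be sketched in the same spirit. The example below is a hypothetical, heavily simplified first-order frequency model (count which medication most often follows which), not the actual model used in such projects; the medication histories are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented medication histories, one sequence per patient.
histories = [
    ["metformin", "sulfonylurea", "insulin"],
    ["metformin", "sulfonylurea"],
    ["metformin", "insulin"],
]

# Count which medication tends to follow which one.
successors = defaultdict(Counter)
for history in histories:
    for current, nxt in zip(history, history[1:]):
        successors[current][nxt] += 1

def predict_next(medication):
    """Predict the most frequent next medication, or None if unseen."""
    if medication not in successors:
        return None
    return successors[medication].most_common(1)[0][0]

print(predict_next("metformin"))
```

The same counting idea carries over to churn or consumption prediction: learn frequencies or probabilities from history, then apply them to new cases.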
If you are not yet involved in big data projects on a regular basis, why not start today? Take the first step: keep your data. Then think about how you can use that data. Try to discover trends or patterns, and experiment. Start small and build it up slowly. Excel is really great to start with; you can always apply complex algorithms later. If you are already working on data projects, we would be curious to hear what you have achieved and what obstacles you run into. Call us or email us, and we will gladly think it through with you!