Deriving Value from Your Data Lake Implementation

Once a Data Lake has been implemented, the challenge organizations face is to derive value from that investment!

This is where Machine Learning (ML) models help you extract meaningful information from the platform you have implemented. They essentially convert data into information, and information into knowledge that can be used in decision making. The two types of ML models we see most commonly are Supervised and Unsupervised.

Supervised – Used when you have a target variable to predict, e.g., determining whether a customer will churn or not.

Unsupervised – Used when you have no target variable to predict, e.g., clustering (grouping) customers or building Association rules.

Let’s look at an interesting case study in which we helped a large distributor of home appliances sell more effectively to their retailers. Through an ML model developed for them, they were able to make proper recommendations for the products they sold.

This was an Unsupervised ML model that used Clustering to group customers with similar purchasing habits. Using historical data, it identified the products each group stocks and recommended them to other retailers in a similar category, thereby generating more revenue. Let us now look in detail at how this was implemented.
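To make the clustering idea concrete, here is a minimal sketch in plain Python. It is not the model built for the distributor: the retailer names, product names, quantities, the choice of k-means, and the helper functions `kmeans` and `recommend` are all assumptions made for illustration. It groups retailers by purchase quantities and then suggests products that a retailer's cluster peers stock but the retailer does not.

```python
from math import dist

# Hypothetical historical purchases: units bought per product, per retailer.
PRODUCTS = ["washing_machine", "microwave", "television", "air_cooler"]
PURCHASES = {
    "hyper_1":   [40, 35, 5, 10],
    "hyper_2":   [45, 30, 0, 12],
    "hyper_3":   [38, 0, 4, 9],
    "electro_1": [2, 5, 50, 0],
    "electro_2": [0, 6, 55, 3],
}

def kmeans(vectors, k, max_iter=50):
    """Basic k-means with deterministic farthest-point seeding (illustrative)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        farthest = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assign every vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def recommend(retailer, purchases, cluster_of):
    """Products the retailer's cluster peers buy but the retailer does not."""
    own = purchases[retailer]
    peers = [purchases[r] for r, lab in cluster_of.items()
             if lab == cluster_of[retailer] and r != retailer]
    if not peers:
        return []
    peer_avg = [sum(col) / len(peers) for col in zip(*peers)]
    gaps = [(avg, product)
            for product, qty, avg in zip(PRODUCTS, own, peer_avg)
            if qty == 0 and avg > 0]
    # Strongest peer demand first.
    return [product for avg, product in sorted(gaps, reverse=True)]

names = list(PURCHASES)
CLUSTER_OF = dict(zip(names, kmeans([PURCHASES[n] for n in names], k=2)))

for name in names:
    print(name, "cluster", CLUSTER_OF[name], "->",
          recommend(name, PURCHASES, CLUSTER_OF))
```

In practice the features would likely be scaled and enriched with attributes such as retailer size and location, and the number of clusters would be chosen from the data rather than fixed at two.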

Existing Process:

A distributor's salesperson makes monthly visits to the retailers who stock these home appliances, which are in turn bought by end users like us. The existing process was completely manual, resulting in recommendations based on gut feeling rather than a systematic approach. This was not helping the organization, as its revenues were decreasing consistently. Hence, they decided to implement a proper recommendation system that could help them make informed decisions.


Let us understand what exactly a recommendation system is and what is meant by associations.


Considering the distributor's current business pain, a well-developed recommendation system should take the following into consideration:
  1. Recommending larger quantities of products the same retailer has purchased earlier
  2. Recommendations based on the historical purchase behavior of other retailers
  3. Recommendations targeted at retailers of similar size and location, covering products in higher demand and seasonal products that a retailer stocks only on certain occasions
  4. Recommendations based on the product categories a retailer stocks more than others, e.g., Hypermarkets stock more household appliances than electronics retailers, who stock fewer or sometimes none at all
  5. Recommendations based on Association rules that identify which products are normally purchased together
  6. Recommendations based on the monthly consumption of different retailers, e.g., fast-moving items that go off the shelf quickly and have a higher chance of repurchase
  7. Finally, if the recommendation system is linked to the Inventory Management system, recommendations based on stock availability in the warehouse
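
The Association rules consideration in the list above can be illustrated with a small sketch. It assumes a handful of made-up retailer orders and a hypothetical helper, `pair_rules`, that scores two-product rules by support (how often both products appear together), confidence (how often the second product appears when the first does), and lift (confidence relative to how common the second product is overall). The thresholds are arbitrary, and a production system would more likely use an established algorithm such as Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: the set of products in each retailer order.
ORDERS = [
    {"mixer", "toaster"},
    {"mixer", "toaster", "kettle"},
    {"mixer", "toaster"},
    {"kettle", "iron"},
    {"iron", "kettle"},
]

def pair_rules(orders, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence, lift) for two-product rules."""
    n = len(orders)
    item_counts = Counter(item for order in orders for item in order)
    pair_counts = Counter(
        pair for order in orders for pair in combinations(sorted(order), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to base a recommendation on
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for lhs, rhs, support, confidence, lift in pair_rules(ORDERS):
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two products are bought together more often than chance would explain, which is the kind of rule a salesperson could act on.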

We studied the requirements, analyzed sample data sets from the past few months, and developed an ML model based on the considerations given above. Finally, the output was displayed on a mobile app to help the salesperson make correct and timely recommendations and increase revenues.

The process followed for building and training the model was as follows:

In conclusion, AI-ML models built on top of your Data Lakes / Data Platforms help you justify the investments made from a business perspective. In earlier days, analytics was all about visualizations; today, advanced analytics is additionally about deriving intelligence from the stored data to make meaningful inferences that benefit organizations immensely!

Parag Penker

Global Vice President - Technology (Big Data, Data Warehousing & Business Intelligence)

Parag heads the Big Data, Data Warehousing and BI initiatives across the USA, India, Middle East and Africa. He has more than 25 years of experience in the IT industry and holds PMP, Oracle Hyperion and SAP Supply Chain certifications.