Understanding Data Science: Spectrum and Tools

Data Science is another buzzword that we come across quite frequently. At first glance, it looks like a whole new universe of data, humongous and difficult to relate to. But do we realize that it already helps us with routine tasks every day, and that we will probably keep using it for years to come?

When we use our mobile phone to get directions to a place, or search for something on the Internet, we are using Data Science. Earlier, large organizations like Google, IBM and Yahoo were the only players in this field, but with the democratization of data and access to Cloud Computing, we can now analyze huge amounts of data in much shorter periods of time. Today the gap between these organizations and the rest of the world is shrinking, as using data to make decisions has become practically possible for anyone!

Data Science is therefore an approach for deriving actionable insights from data. It is all about applying scientific methods to draw inferences, and it has benefited many organizations. Without data, beliefs are uninformed, and decisions are often based purely on best practices or intuition, which may not yield the right results.

Data Science essentially helps us in the four areas below:

Probing Reality:

Gathering data through various means and analyzing it to decide on actions, e.g., determining the best tool for creating dashboards by collecting and analyzing data on how tools are actually used

Pattern Discovery:

Detecting patterns and forming clusters can greatly simplify solutions, e.g., profiling users to sell them a particular product
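
To make cluster formation a little more concrete, here is a minimal sketch using plain k-means (Lloyd's algorithm) on hypothetical user profiles described by age and monthly spend. The article does not name a clustering method; k-means is assumed here only because it is a common starting point, and the `kmeans` function and the sample users are illustrative.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's algorithm (plain k-means)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments no longer change: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical users: (age in years, monthly spend).
users = [(22, 30), (25, 35), (23, 28), (48, 120), (52, 110), (50, 130)]
centroids, clusters = kmeans(users, k=2)
for centre, members in zip(centroids, clusters):
    print(f"profile centred at {centre}: {members}")
```

Each resulting cluster is a candidate user profile; in practice the number of clusters and the features used would be chosen from the business problem rather than fixed in advance.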

Predicting Future Events:

Predictive analytics can be used to make decisions in response to future events, e.g., predicting sales in a retail store based on historical data, seasonality, demand, etc.
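
One simple way to combine historical data with seasonality is a growth-adjusted seasonal naive forecast: each upcoming period repeats the same period from the last cycle, scaled by the average cycle-over-cycle growth. The sketch below assumes quarterly data; the `seasonal_forecast` function and the sales figures are hypothetical, and it ignores other demand drivers.

```python
def seasonal_forecast(history, season_length):
    """Forecast the next `season_length` periods from a sales history.

    Each forecast repeats the same period from the most recent cycle,
    scaled by the average growth between consecutive cycles.
    """
    if len(history) < 2 * season_length:
        raise ValueError("need at least two full cycles of history")
    # Growth of every period relative to the same period one cycle earlier.
    ratios = [history[i] / history[i - season_length]
              for i in range(season_length, len(history))]
    growth = sum(ratios) / len(ratios)
    last_cycle = history[-season_length:]
    return [value * growth for value in last_cycle]


# Hypothetical quarterly sales (in thousands) for two years of one store.
sales = [100, 120, 150, 200,   # year 1: Q1..Q4
         110, 132, 165, 220]   # year 2: Q1..Q4
forecast = seasonal_forecast(sales, season_length=4)
print("Next year's quarterly forecast:", [round(v, 1) for v in forecast])
```

Averaging the growth ratios smooths out noise from any single quarter, while reusing the latest cycle's shape preserves the seasonal peak (here, Q4).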

Understanding People & The World:

Applying scientific understanding of natural language, computer vision, psychology and neuroscience to understand the processes that drive people’s decisions and behaviors, e.g., using deep learning methods for object identification or defect detection

The Data Science Spectrum:

Reporting & Business Intelligence:

Covers the tools commonly used to create reports and dashboards.

Statistical Modeling:

Involves using statistics to build a representation of the data and then analyzing it to infer relationships between its variables, e.g., from a population of children of different ages, we can estimate a child’s height from their age, with some error. This tells us that a relationship exists between the variables age and height.
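
The age and height example can be sketched as an ordinary least-squares line. The ages and heights below are made-up illustrative values, not real measurements, and `fit_line` and `rms_error` are hypothetical helpers; the root-mean-square of the residuals plays the role of the "some error" in the estimate.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rms_error(xs, ys, intercept, slope):
    """Root-mean-square of the residuals: the typical size of the estimation error."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical sample: ages in years and heights in cm (illustrative values only).
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 128, 138, 149]
intercept, slope = fit_line(ages, heights)
error = rms_error(ages, heights, intercept, slope)
print(f"height ~ {intercept:.1f} + {slope:.2f} * age  (RMS error {error:.2f} cm)")
```

On this made-up sample the fitted slope is about 5.85 cm per year of age, with an RMS error of roughly 0.93 cm.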

Machine Learning (ML):

As the name indicates, a system that learns from past data and improves itself without being explicitly programmed. Machine learning typically uses mathematical or statistical models to obtain a general understanding of the data and make predictions.

The foundations of machine learning derive from statistical theory and statistical learning. Hence the tools used for statistical modeling and ML overlap: the tasks are similar in nature and are often performed by people in the same roles.
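
To illustrate both "learning from past data" and the overlap with statistics, the sketch below fits a straight-line model by gradient descent. The model starts with arbitrary parameters and repeatedly adjusts them to reduce its squared error on past examples, converging to the same line that an ordinary least-squares fit gives. The `train_linear_model` function, the data, the learning rate and the number of epochs are hypothetical choices for this toy example.

```python
def train_linear_model(xs, ys, learning_rate=0.01, epochs=20_000):
    """Learn y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the data
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors of the current model on the training data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient: the model "improves itself" each epoch.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical past observations: ages (years) and heights (cm).
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 128, 138, 149]
w, b = train_linear_model(ages, heights)
print(f"learned: height ~ {b:.1f} + {w:.2f} * age")
print(f"prediction for age 9: {w * 9 + b:.2f} cm")
```

Nobody writes the rule "add about 5.85 cm per year" into the program; it emerges from the data, which is the sense in which the system is not explicitly programmed.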


Artificial Intelligence (AI):

When a system can perform cognitive functions such as perceiving, learning, reasoning and problem solving, it is deemed to have artificial intelligence. AI is commonly described at the following levels:

  • Narrow AI: When it can perform a specific, narrowly defined task, sometimes better than a human
  • General AI: When it can perform a wide range of intellectual tasks at the level of a human
  • Superhuman AI: When it can outperform humans in most tasks

Some of the tools used for AI are given below:

Now that we understand the Data Science spectrum and its tools, it is important to see how organizations benefit from them. So let’s talk about a few ML and AI case studies that we successfully implemented for our customers.

ML Case Study: Churn Prediction for a Telecom Customer

Business Problem:

The customer was losing revenue to customer attrition (churn). They wanted to proactively identify customers who were likely to churn and make them direct marketing offers to retain them, thereby minimizing potential revenue loss.

High-level Objectives:

  • Determine the reasons that contribute to customer attrition and act on them
  • Run marketing campaigns that make offers to customers likely to churn, in order to retain them
  • Minimize loss of revenue due to customer churn

Approach:

  • Formed customer clusters based on demographics
  • Analyzed the historical churn data shared by the customer
  • Trained an ML model to identify the attributes affecting churn and to understand the correlations between those attributes
  • Applied the model to a new dataset to flag customers likely to churn, using a threshold on the predicted probability
  • Used the resulting list to target specific customers with marketing offers, with a view to retaining them and minimizing the likely loss of revenue
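
The article does not say which ML algorithm was used. As one common choice, the sketch below trains a logistic regression by gradient descent on hypothetical churn history, then scores new customers and flags those whose churn probability meets a threshold. The two features (support calls last month, tenure in years), the customer IDs, the data and the 0.5 threshold are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(weights, bias, x):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, learning_rate=0.1, epochs=2_000):
    """Fit a logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = churn_probability(weights, bias, x) - y  # predicted minus actual
            for j, xj in enumerate(x):
                grad_w[j] += error * xj / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical history: (support calls last month, tenure in years) and churned (1) or stayed (0).
history = [(5, 0.5), (4, 1.0), (6, 0.2), (3, 0.8),
           (0, 4.0), (1, 3.0), (0, 6.0), (1, 5.0), (2, 2.5)]
churned = [1, 1, 1, 1, 0, 0, 0, 0, 0]
weights, bias = train_logistic(history, churned)

# Score new customers and flag those at or above the threshold for a retention offer.
THRESHOLD = 0.5
new_customers = {"C-101": (6, 0.1), "C-102": (0, 5.5), "C-103": (1, 4.0)}
flagged = [cid for cid, x in new_customers.items()
           if churn_probability(weights, bias, x) >= THRESHOLD]
print("Learned weights (calls, tenure):", [round(w, 2) for w in weights])
print("Target with retention offers:", flagged)
```

The sign of each learned weight hints at whether an attribute raises or lowers churn risk: in this toy data, more support calls raise it and longer tenure lowers it. Raising or lowering the threshold trades off campaign cost against the number of at-risk customers reached.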

AI Case Study: Conversational AI for a Banking Customer

Business Problem:

The customer wanted to implement a conversational AI system to improve the bank’s service portfolio. The bank had multiple business domains and, for ease of maintenance, wanted a unified chatbot platform that each domain could leverage. The chatbot should be able to resolve the customer’s queries and, in case it could not, transfer the conversation to a human customer representative for further action. Importantly, the chatbot should be intelligent enough to detect context switching and remember previous transactions.

High-level Objectives:

  • Build a conversation processing platform for all of the bank’s needs
  • Allow the platform to be customized to the needs of each business domain
  • Support both text and voice conversations on the platform

Approach:

  • The system had four main components: a chat interface, an orchestration component, an ML-based dialogue processing component and an admin console for configuring dialogues
  • The orchestration component acts as the central unit and connects to all the bank’s domains to fetch the necessary information (e.g., account details, balance details, statements, etc.)
  • The ML model processes each chat message and identifies the user’s intent
  • The bot is intelligent enough to handle spelling mistakes, detect context switching and remember previous parts of the conversation
  • We implemented this for 5 different domains (functions) within the bank
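
The article does not describe the internals of the dialogue processing model, so the sketch below is only a rule-based stand-in that illustrates three of the behaviors listed above: identifying an intent, tolerating spelling mistakes (via fuzzy matching with Python's `difflib.get_close_matches`), and remembering the current topic so that follow-up messages and context switches are recognized. Messages with no recognizable intent are handed off to a human agent, as the business problem requires. The intents, keywords and the `Session` class are hypothetical.

```python
import difflib

# Hypothetical intents and trigger words; a production bot would use a trained ML intent model.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "much", "funds"},
    "get_statement": {"statement", "transactions", "history"},
    "block_card": {"block", "lost", "stolen", "card"},
}
VOCABULARY = sorted(set().union(*INTENT_KEYWORDS.values()))


def correct_spelling(word):
    """Replace a word with the closest known keyword if it is a near miss."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word


def detect_intent(message):
    """Return the intent whose keywords best match the message, or None."""
    words = [correct_spelling(w.strip("?.,!").lower()) for w in message.split()]
    scores = {intent: sum(w in keywords for w in words)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class Session:
    """Per-user conversation memory used to resolve follow-ups and topic switches."""

    def __init__(self):
        self.history = []  # (message, intent) pairs seen so far
        self.current_intent = None

    def handle(self, message):
        """Return (intent or 'handoff_to_agent', whether the topic switched)."""
        intent = detect_intent(message) or self.current_intent  # follow-ups keep the topic
        switched = self.current_intent is not None and intent != self.current_intent
        self.current_intent = intent
        self.history.append((message, intent))
        if intent is None:
            return "handoff_to_agent", False
        return intent, switched


session = Session()
for text in ["What is my balence?", "and the statement?", "for last month please"]:
    print(text, "->", session.handle(text))
```

In the architecture described above, the detected intent would then go to the orchestration component, which fetches the answer from the relevant bank domain.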

Conclusion:

Data Science is a complex, multi-faceted field that can be approached from different perspectives, e.g., methodology, business models, big data, data engineering and data governance, but it essentially focuses on the analytical techniques and tools used to demystify challenging real-world problems. The novelty of data science is rooted not in the latest scientific knowledge, but in a disruptive change in our society brought about by the evolution of technology!

Parag Penker

Global Vice President - Technology (Big Data, Data Warehousing & Business Intelligence)

Parag heads the Big Data, Data Warehousing and BI initiatives across the USA, India, the Middle East and Africa. He has more than 25 years of experience in the IT industry and holds PMP, Oracle Hyperion and SAP Supply Chain certifications.