Data Science Concepts Every Analyst Should Know: Applicability of ML/AI



The practical applications of data science are multiplying. From predicting if a delivery will arrive late to recommending how much herbicide to use to save money and protect the ecosystem, there are endless examples of organizations harnessing data science solutions to improve the efficiency and quality of business decisions.

Naturally, the increasing adoption of advanced analytics is going to affect the business analyst role. In the first article of this "Data Science Concepts Every Analyst Should Know" series, we talked about some core data science concepts that business analysts can use to add value to data science projects:

  • Evaluation metrics for machine learning models. BAs who understand the applicability of performance measures to specific business problems can prevent poor decisions by helping validate the conclusions presented by a data science team. They can also facilitate the buy-in of good models by articulating the team’s conclusions in relevant business terms.
  • Sampling bias. Data quality in data science involves not only completeness, consistency of format, cleanliness, and accuracy of data points but also the notion of representativeness. Business analysts with a good understanding of the business context and sampling techniques can play a key role in ensuring that the data used to train models doesn’t differ in some meaningful, systematic way from the larger population it is meant to represent.
  • False discoveries in experimentation. When testing multiple hypotheses, such as which of several landing page versions will increase sales, a BA with a solid grounding in statistics can recommend corrections for multiple comparisons to keep false discoveries under control.
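To make the last point concrete, here is a minimal sketch of the Bonferroni correction, the simplest adjustment for multiple comparisons. The p-values are made up for illustration; real experiments would produce them from significance tests on each landing-page variant.

```python
# Hypothetical p-values from four landing-page experiments (illustrative only).
p_values = [0.04, 0.01, 0.03, 0.20]
alpha = 0.05  # desired family-wise error rate

# Bonferroni correction: compare each p-value against alpha / number of tests.
adjusted_alpha = alpha / len(p_values)
significant = [p < adjusted_alpha for p in p_values]

print(adjusted_alpha)  # 0.0125
print(significant)     # [False, True, False, False]
```

Note that a p-value of 0.04, which would pass an uncorrected 0.05 threshold, is no longer significant once the correction accounts for the four tests being run at once.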

Another value-adding contribution BAs can make in projects leveraging data to support business decisions is to assess whether ML/AI is a valid alternative to solve a business problem. Not all business problems and opportunities require or will benefit from machine learning. BAs who know how to tell the difference can help their organizations make better decisions in the ML/AI space.

Questions to ask to evaluate the applicability of ML/AI to a business problem

In business environments, machine learning is particularly useful where the right decision may depend on a large number of variables that follow hidden, complex patterns.

For example, a bank trying to predict which customers will default on their credit card payments may implement a model that uses dozens of input variables like the customer’s credit card limit, education level, marital status, gender, age group, repayment status of previous periods, etc. In that scenario, machine learning is better equipped to handle input and model complexity, dramatically accelerating the “time to answer.”

In order to help determine whether machine learning can be considered a superior alternative to solve a business problem, business analysts can ask the following questions:

1) Do we have a well-defined business question requiring modeling and optimization?

Good ML applications start from a well-defined business question that business analysts can help formulate. Examples of valid questions involving modeling and optimization include “Is this transaction legitimate or fraudulent?” and “What price will yield the highest sales volume?”

In contrast, the question “What makes our customers feel satisfied?” is too vague to serve as a starting point.

2) Can the problem be solved using standard business logic?

Machine learning shouldn’t be used when standard business logic will suffice.

In many business optimization problems, it’s possible to predict outcomes using a rules-based system that provides the correct answer each time with comparable or better performance than complex models. When that’s the case, there is no point in adopting ML/AI, even if there is a high volume of predictions or decisions to make.

Gerd Gigerenzer and Wolfgang Gaissmaier cite several examples of studies that illustrate this less-is-more effect empirically. One of the studies tested different models to classify a customer as inactive or active. For retailers, it’s valuable to know which customers will return and which have taken their business elsewhere for good. Compared to complex models that used extensive estimations and computations, the following simple hiatus heuristic made fewer errors:

If a customer has not purchased within a certain number of months (the hiatus), the customer is classified as inactive; otherwise, the customer is classified as active.


According to the authors, a simple rules-based model is more likely to outshine complex machine learning algorithms in environments with moderate to high uncertainty and moderate to high redundancy (that is, the different data series available are correlated with each other). In the case of customer inactivity, the time since last purchase is likely to be closely associated with every other available metric of past customer behavior (frequency and spacing of purchases over a time frame, time elapsed since the customer last opened a promotional email, etc.). In the end, the complex model with more information was outperformed by the hiatus rule.
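The hiatus heuristic is simple enough to express in a few lines of code. The sketch below is illustrative only: the nine-month hiatus and the sample dates are assumptions, not values from the study, and a retailer would tune the cutoff to its own purchase cycles.

```python
from datetime import date, timedelta

HIATUS_MONTHS = 9  # assumed cutoff; the right hiatus depends on the business

def classify_customer(last_purchase: date, today: date) -> str:
    """Classify a customer as 'active' or 'inactive' using the hiatus rule:
    inactive if the time since the last purchase exceeds the hiatus."""
    hiatus = timedelta(days=HIATUS_MONTHS * 30)  # approximate a month as 30 days
    return "inactive" if today - last_purchase > hiatus else "active"

today = date(2021, 6, 1)
print(classify_customer(date(2021, 3, 15), today))  # active
print(classify_customer(date(2020, 5, 1), today))   # inactive
```

The entire “model” is one comparison against one threshold, which is exactly why it is robust in uncertain, redundant environments: there are no parameters to overfit.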

3) Do we have enough data to feed a machine learning model?

Another important consideration in determining whether machine learning is a valid alternative is whether there is enough data to learn from.

In the example of predicting which customers will default on their credit card payments, we’d need a large number of transactions that exemplify the information necessary to make a decision, including both the predictor variables (customer’s credit card limit, demographic information, repayment status, etc.) and the actual outcome (default vs. not).

The number of examples needed will vary by business problem and type of ML algorithm, but as a rule of thumb, expect to need thousands of examples — ideally, tens or hundreds of thousands for “average” modeling problems and millions or tens of millions for “hard” problems like those addressed by deep learning (e.g., image processing, language translation). For more on this topic, read this article by Jason Brownlee.

4) Does the data have sufficient quality?

Having a large volume of data is not enough. The data must be of sufficient quality as well.

For example, imagine that your company wants to predict which new sales opportunities it will win or lose. If the only source of information is the CRM system, and the sales team doesn’t do a good job updating the system as opportunities move through the sales funnel, it may be impossible to produce a model that performs well on new deals, even with hundreds of thousands of historical records. Problems like missing or inaccurate values and inconsistent use of data fields to capture information will get in the way of producing an accurate model.
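A quick way to surface this kind of problem before modeling is to profile missing values per field. The sketch below is a hypothetical example; the field names and records are invented to illustrate the idea, not drawn from a real CRM export.

```python
# Hypothetical CRM records exported for modeling (illustrative data only).
records = [
    {"stage": "closed-won", "deal_size": 50000, "industry": "retail"},
    {"stage": None, "deal_size": 12000, "industry": "finance"},
    {"stage": None, "deal_size": None, "industry": "retail"},
    {"stage": "closed-lost", "deal_size": 8000, "industry": "manufacturing"},
]

fields = ["stage", "deal_size", "industry"]

# Fraction of records where each field was left empty.
missing_rate = {
    f: sum(r.get(f) is None for r in records) / len(records) for f in fields
}
print(missing_rate)  # {'stage': 0.5, 'deal_size': 0.25, 'industry': 0.0}
```

A field like “stage” missing in half the records is a red flag: either the sales process needs to change so the data gets captured, or the field can’t be relied on as a model input.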

# # #

A recent article published by MIT Management, 6 trends in data and artificial intelligence for 2021 and beyond, talks about the widening gap between leaders and laggards in advanced analytics.

As with any other technology, success using ML/AI is heavily dependent on business alignment. ML/AI initiatives that don’t start from a solid business case may result in expensive data science experiments with dismal success rates and low ROI. By asking the right questions, business analysts can help organizations better compete using strategies shaped by machine learning and other advanced analytics techniques.

Author: Adriana Beal

Adriana Beal has graduate degrees in Electrical Engineering and Strategic Management of Information obtained from top schools in her native country, Brazil. In 2016, she got a certificate in Big Data and Data Analytics from the University of Texas. Since then she has been working on machine learning and data science projects in healthcare, mobility, IoT, customer science, and human services. Prior to that she worked for more than a decade in business analysis and product management, helping U.S. Fortune 500 companies and high-tech startups make better software decisions. Adriana has two IT strategy books published in Brazil and work published internationally by IEEE and IGI Global. You can find more of her useful advice for business analysts at




Copyright 2006-2024 by Modern Analyst Media LLC