If I were to pick out the single most common slide presented at analytics and data science conferences, it would be Gartner’s analytics ascendancy model shown below.
Not to be confused with the capability maturity model from Carnegie Mellon, the diagram has been variously called a maturity model, a continuum, and yes, even an escalator. Sometimes companies flip the order too.
Semantics aside, I will call this the analytics maturity model for the purposes of this article, in line with common industry parlance.
The Analytics Maturity Model Is A Compelling Idea…
This model captivates our imagination for three reasons:
- Its format closely mirrors the classic 5W1H journalist technique, which immediately sets our synapses firing. The typical pitch can be delivered by anyone from the feisty startup data geek to the pinstripe-suited enterprise analytics salesperson: “we start with ‘what happened’, intuitively progress to ‘why did it happen’, traverse to ‘what will happen’, and close with the satisfying ‘how can we make it happen’.”
- Its companion analogies are appealing and relatable. “Oh, it is just like a child growing up. First you learn to crawl, then you learn to walk before you can run.” Cue nods around the room.
- It makes for good business. A common consulting engagement pattern starts with an assessment of where a company sits on a maturity model. The company then uses the level above to prioritize which capabilities to build next. The model thus provides clarity by both imposing structure on a capability and offering a clear road map for improvement.
…But Contains Flawed Assumptions That Can Derail Data Science
Representing the model this way visually introduces a number of subtle assumptions. Unfortunately many of these assumptions are flawed, and can leave data science teams severely handicapped.
The irony is that a model meant to help companies make better data-driven decisions is presented in a way that prompts bad decisions about building data science teams.
Maturity models are structured as a series of levels of effectiveness. But the danger comes when we make the following assumptions:
- You start at the bottom, advancing through the levels in sequence
- Each higher level brings more value than the lower level before it
- The way you manage these capabilities lies on the same spectrum
None of these assumptions are true.
Deconstructing them one at a time:
There is no need to ‘complete’ building out descriptive analytics before moving on to advanced analytics.
Firstly, how exactly does one ‘complete’ building out reporting, business intelligence and analytics capability? Data is a dynamic representation of a changing world, and as long as the world keeps changing (forever, and at an accelerating speed), there will be new requirements for descriptive analytics.
And I get it — mature data management is important. Data platforms done well are firm friends of data science. It is a rare joy to have all the data you need in one place to do modeling. Having nice (data) warehouses and lakes makes for fertile ground where random forests can grow.
But waiting for multi-year data warehousing projects to complete, and deploying data science teams to SQL and documentation duty in the meantime, leaves value on the table and is a recipe for sending your data science team job hunting.
At its core, unless you are building product features, the source of value of data science and analytics comes from one thing — and that is the decision.
If the data scientist is able to affect the decision towards a better outcome through data, value is created. If there is no change to the decision, then you have wasted your time. This is true no matter how robust your secure-high-performance-cloud-hosted-explainable-deep-learning model is.
And it is exceedingly possible for entire teams to exist and be rewarded for their work while creating absolutely no value for years.
There is no need to wait at the lower levels of the model while advanced analytics opportunities languish. Infrequent but major business decisions come up all the time, and they are places where data scientists can add value immediately.
A much better strategy is almost laughable in its simplicity:
Set your data scientists to work on the most important decisions of the most senior person you can get access to.
Sit next to this person. Get into his or her brain and decision making process. Start from where they are and work your way forward from there. Look for local Access databases. Look for Excel spreadsheets. Look for the management accountant.
And use every technique in your toolkit to improve decisions.
There is no certainty that higher levels of analytics bring more value
There are well-established ways to calculate the value or ‘uplift’ of predictive or prescriptive models. For example, one may use statistical techniques to forecast the state of the world without the model’s intervention, compare that forecast with the ground truth after time has passed, and attribute the difference to the model as value created.
As a simplified example, prior to starting a data science project to increase retail product sales, one may forecast that without any intervention, revenue for next month will be $10,000. Having implemented a pricing and promotional model, revenue comes in at $12,000, making the model uplift $2,000.
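The arithmetic behind this uplift estimate is simple enough to sketch in a few lines. The figures below are the hypothetical numbers from the example above, not real data, and the function name is illustrative:

```python
# Sketch of the uplift calculation described above: compare actual revenue
# after deploying a model against a counterfactual forecast without it.

def uplift(forecast_without_model: float, actual_with_model: float) -> float:
    """Value attributed to the intervention: actual minus counterfactual."""
    return actual_with_model - forecast_without_model

baseline_forecast = 10_000.0  # forecast revenue assuming no intervention
actual_revenue = 12_000.0     # observed revenue after the pricing model

print(uplift(baseline_forecast, actual_revenue))  # -> 2000.0
```

In practice the hard part is not this subtraction but producing a credible counterfactual forecast, which is where the statistical techniques mentioned above come in.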
But in an odd reversal, calculating the value of descriptive or diagnostic work may be a lot trickier. Because how exactly does one quantify the value of awareness? If one were to walk around blindfolded, how might one estimate the value of taking off the blindfold?
If you are starting to think that the above two ideas are not comparable, you are absolutely right.
The different types of work described thrive under starkly different management methods.
We have established that the different levels can work in parallel, and measure value differently. But that is not all.
A strong reason why teams get bogged down at the lower end of the maturity model is that management paradigms that make descriptive and diagnostic analytics effective may be a death knell for predictive and prescriptive work.
This will be covered in more detail in a dedicated future post, but in short: the former thrives under a strong ‘engineering’ mindset, with IT-style requirements, strong project management, and robust processes, while the latter works best outside the bounds of projects with defined start and end points.
The big difference is in data uncertainty. The distinctive risk of predictive and prescriptive analytics is this: there is no guarantee that there is enough information in the data to make the application of predictive and prescriptive analytics valuable.
To compound the situation, there are also multiple techniques — often equally valid — that can be utilized for a given problem. And thus there must be sufficient room to experiment, try, and fail early with little repercussions. As an example, if I am building a machine learning model for predictive maintenance, and find that the available data carries no useful signals, failing after two weeks of experimentation on a laptop is much better than failing with a six month budgeted project and a team of ten.
To recap: a primary way maturity models damage teams is when companies take the methods of management that worked for delivering descriptive analytics solutions, and impose them on advanced analytics work without modifying the approach to account for data uncertainty.
Towards a better model of data science team maturity
How then should we think of maturing data science teams?
For a start, ditch the descriptive-diagnostic-predictive-prescriptive spectrum. In the trenches, work often transitions seamlessly between the four. Analytics and data science professionals across the board do diagnostic work all the time. And imposing major company processes whenever someone switches from building a visualization to building a machine learning model, or vice versa, as part of their daily work is both painful and unnecessary.
One should not think of analytics maturity and value like the height of a growing child, with serial increments along a single dimension. A more accurate starting point is to think of maturity across two distinct dimensions — the dimensions that actually deliver value: decision support and production systems.
Mature both decision science and data science in production
If you are supporting business decisions, the maturity you want is really the maturity of decision science. ‘Engineering’ here is secondary. Instead look into data literacy and interpretation, mitigating cognitive bias, and setting up the right metrics and incentives that actually reward data driven decisions.
Building data science products or putting models in production is a very different activity. It requires mature processes that acknowledge data uncertainty, safe spaces to experiment to de-risk advanced analytics work, proper model operations post go-live and financial models that are tailored for products instead of projects.
In this article, we have glossed over some of the complexities of real life data science teams. Are the sub-disciplines of AI considered science or engineering? Where are the most useful places for someone with a PhD? Am I a data scientist if I only call pre-trained models? Should data engineering be a separate team?
These are all fair questions. In a future article, we will cover distinct career tracks, and distinctive approaches to managing analytics, data science and AI teams that will help each type of data scientist thrive.
All images displayed above are solely for non-commercial illustrative purposes. This article is written in a personal capacity and does not represent the views of the organizations I work for or am affiliated with.