Building a data-driven channel attribution model using Markov chains

What is the impact of Paid Search on the customer journey?
Arben Kqiku, Account Manager at comtogether


In the rapidly evolving world of digital marketing, you should keep a reliable finger on the data pulse of your customers’ journeys. This allows you to consistently choose the most effective ways of spending your precious advertising budget and maximizing ROI. To make sense of the data, however, you need to be aware of the models shaping the way it is presented to you. One of the central aims of anyone in digital marketing should be the ability to monitor precisely how effective investments are, in particular which channels lead to conversions. In the interest of creating data-driven models that serve our clients’ needs, we at comtogether are constantly exploring new ways of burrowing into the numbers to extract useful insights and novel perspectives on what exactly is going on within our digital marketing portfolios.

Using Markov chains for optimal channel attribution

Markov chains are one of several modeling options for multi-channel attribution. They are a powerful tool for estimating probabilities from a series of sequential events. Their range of application is enormous and has long fueled research in fields as far apart as predictive inventory management, airport queuing, and the design of search-engine page-ranking algorithms. Within the context of digital marketing, performing a Markovian analysis on a dataset of customer journeys and their conversion paths allows us to put weighted values on the effectiveness of each channel, particularly as it relates to the desired outcome, a conversion. We do this in two stages: first by calculating the transition probabilities for each channel and, second, by removing channels from the journeys in order to calculate the impact of their absence and, therefore, their relative importance to the conversion. Make sense? Let’s have a look at this in a little more detail.

Transition probabilities

Each customer journey has a number of steps that can involve any number of channels. Imagine that within a campaign you monitor three such channels: C1) Social Media, C2) Paid Search, and C3) Organic Search. Before converting, a customer will go through any number of these three channels, creating what is in effect a unique journey with a not-so-unique statistical footprint. By aggregating the data of all these journeys, we can calculate what are called transition probabilities or, in other words, how likely the customer is to go from C1 to C2, C3 to C1, C2 to C1, and so on.
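To make the idea concrete, here is a minimal sketch in Python (for illustration only; it is not the R code used in the analysis described in this article). The journeys are made-up toy data, and the state names "(start)", "(conversion)", and "(null)" are assumptions introduced here to mark where a journey begins and how it ends.

```python
from collections import Counter, defaultdict

# Hypothetical toy journeys: (ordered list of channels, did the customer convert?)
journeys = [
    (["Social Media", "Paid Search"], True),
    (["Paid Search", "Organic Search"], False),
    (["Organic Search"], True),
    (["Social Media", "Organic Search", "Paid Search"], True),
]


def transition_probabilities(journeys):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for channels, converted in journeys:
        # Wrap each journey in an assumed start state and an end state.
        end_state = "(conversion)" if converted else "(null)"
        path = ["(start)"] + list(channels) + [end_state]
        for source, target in zip(path, path[1:]):
            counts[source][target] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        source: {target: n / sum(targets.values()) for target, n in targets.items()}
        for source, targets in counts.items()
    }


probs = transition_probabilities(journeys)
for source, row in probs.items():
    print(source, row)
```

Each row of the result answers the question from the paragraph above: given that a customer is currently in one channel, how likely is each possible next step?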

Removal effects

In the second stage of this process, we apply a method called the removal effect which, unsurprisingly, shows us the overall effect of removing a channel from the customer journey. We remove each channel from the dataset in turn and calculate the impact on conversions: how many conversions do we lose when we remove the Social Media, Paid Search, or Organic Search channel? Taken together, these numbers give us a comprehensive, accurate picture of the customer journeys, and a properly weighted value for each channel’s impact on conversions.
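The removal effect can be sketched as follows, again in Python and purely for illustration. The transition probabilities in P are hypothetical numbers, not results from a real dataset, and the "(start)", "(conversion)", and "(null)" states are assumed names. The sketch treats a removed channel as a dead end (anyone who reaches it never converts) and normalizes the removal effects so that they sum to 1, which is one common way to turn them into attribution shares.

```python
# Hypothetical transition probabilities; each row sums to 1.
P = {
    "(start)": {"Social Media": 0.5, "Paid Search": 0.25, "Organic Search": 0.25},
    "Social Media": {"Paid Search": 0.5, "Organic Search": 0.5},
    "Paid Search": {"(conversion)": 0.5, "Organic Search": 0.25, "(null)": 0.25},
    "Organic Search": {"(conversion)": 0.5, "Paid Search": 0.25, "(null)": 0.25},
}


def conversion_probability(P, removed=None, iterations=200):
    """Probability of eventually converting from "(start)".

    If `removed` names a channel, its value is held at 0, so every path
    that passes through it counts as a lost conversion.
    """
    value = {state: 0.0 for state in P}
    value["(conversion)"] = 1.0
    value["(null)"] = 0.0
    # Repeatedly apply value(s) = sum over t of P[s][t] * value(t) until it settles.
    for _ in range(iterations):
        for state, row in P.items():
            if state == removed:
                continue  # removed channel stays at 0
            value[state] = sum(p * value[nxt] for nxt, p in row.items())
    return value["(start)"]


channels = ["Social Media", "Paid Search", "Organic Search"]
baseline = conversion_probability(P)

# Removal effect: the share of conversions lost when a channel is removed.
removal_effects = {
    c: 1 - conversion_probability(P, removed=c) / baseline for c in channels
}

# Normalize the removal effects into attribution shares that sum to 1.
total = sum(removal_effects.values())
attribution = {c: effect / total for c, effect in removal_effects.items()}

for c in channels:
    print(f"{c}: removal effect {removal_effects[c]:.3f}, share {attribution[c]:.3f}")
```

With these toy numbers, removing Paid Search or Organic Search loses more conversions than removing Social Media, so the two search channels receive the larger attribution shares.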

Our instruments and methodologies

We recently performed a Markovian analysis in this way on a dataset from our portfolio that included over 40,000 unique customer journeys. The desired conversion for this particular client was a reservation of a logistics service. We built a pipeline that moved the raw data from Google Analytics 360 to Google BigQuery, where we created a table showing each unique client ID, the date on which each client visited the website, and whether they converted. Using R, a programming language for statistical computing, we calculated and piped in the transition probabilities and importance values. Finally, we plotted all this data together using R’s ggplot2 data visualization package.
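As a rough illustration of the step between such a table and the model, the sketch below groups visit rows into one ordered journey per client. It is written in Python rather than R, the rows are invented, and the channel column is an assumption: the table described above lists client ID, visit date, and conversion, and some channel field is needed to build the paths.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (client_id, visit_date, channel, converted_on_this_visit)
rows = [
    ("a", date(2020, 1, 3), "Paid Search", True),
    ("a", date(2020, 1, 1), "Social Media", False),
    ("b", date(2020, 1, 2), "Organic Search", False),
]


def build_journeys(rows):
    """Group visits by client and order them by date to form journeys."""
    visits = defaultdict(list)
    for client_id, visit_date, channel, converted in rows:
        visits[client_id].append((visit_date, channel, converted))

    journeys = {}
    for client_id, client_visits in visits.items():
        client_visits.sort(key=lambda visit: visit[0])  # oldest visit first
        channels = [channel for _, channel, _ in client_visits]
        # Simplification: a client counts as converted if any visit converted.
        converted = any(flag for _, _, flag in client_visits)
        journeys[client_id] = (channels, converted)
    return journeys


journeys = build_journeys(rows)
for client_id, (channels, converted) in journeys.items():
    print(client_id, " > ".join(channels), "converted" if converted else "no conversion")
```

A production pipeline would also need to decide how to handle visits after a conversion and repeated conversions by the same client; this sketch leaves those choices out.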

What you can expect as results

With the results, we were able to show in detail how effective each channel had been. In this case, 35.1%, 38.5%, and 23.6% of conversions could be attributed to the Organic Search, Paid Search, and Direct (which includes unclassifiable sources and direct entry of the website’s URL) channels, respectively.

Within these numbers we can further illustrate, for example, that Paid Search reinforced other top-performing channels by contributing directly to 13.39% of Organic Search conversions, and 7.29% of Direct channel conversions.

Get a broader perspective

So, when you’re deciding on how to attribute channels and resources within your digital marketing strategy, be aware that there are hidden stories within the data that your current model may not be picking up on. Markov chains are just one of the methods you can use to get a broader perspective.

Would you too like to identify what journey your customers take before converting on your site? Contact us today for a free strategy session.

About the author

Arben graduated with a Master’s degree in Psychology. Apart from keeping us in good mental shape, he brings his passion and creativity to data analytics and programs the automations that save our customers valuable time.