What is cohort analysis and everything you need to know to build it

Cohort analysis is a method that lets you break your data set into specific groups, examine the lifecycle of each group, and compare the groups to each other.

It is a great way to find critical points in the customer lifecycle. You can also use cohort analysis to understand whether your service is improving over time.

In the SaaS world, we usually use cohort analysis to get a deeper understanding of churn or customer retention.

How to build a cohort

Let's take a SaaS company that already has a customer base. The first step is to split the whole customer base into separate groups – cohorts. In SaaS, we usually group customers by the month they joined the service.

For example, in May 2019 our imaginary SaaS company had 120 new customers. These 120 customers form the May 2019 cohort. Now, of these 120 customers, how many were still using the service at the end of June 2019? Let’s say 100. Of these 100, how many were left at the end of July 2019? And so on. You keep asking this question up to the current date.

Here is the progression of the May 2019 cohort.

What can you understand by examining this one cohort? One thing is that within just 13 months the company loses 100 of the 120 customers it started with. Certainly not the best performance. The biggest drops happen in Month 1 and Month 13. This is also very useful information. Perhaps this is because the company sells monthly and yearly contracts.

The next step is to add newer cohorts. Customers who joined in June 2019 form the June 2019 cohort, and so on.
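The grouping described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the post: each customer is a hypothetical `(signup_date, churn_date)` pair, `as_of` is the last day of the current month, and a customer counts for a month if they were still active at the end of it. The function names are made up for this example.

```python
from datetime import date


def month_index(d):
    """Number months consecutively so that they can be subtracted."""
    return d.year * 12 + (d.month - 1)


def build_cohorts(customers, as_of):
    """Return {(year, month): [customers at Month 0, Month 1, ...]}.

    customers: iterable of (signup_date, churn_date or None).
    All signups are assumed to be on or before as_of.
    """
    last = month_index(as_of)
    table = {}
    for signup, churned in customers:
        start = month_index(signup)
        # Last month at whose end the customer was still active.
        end = last if churned is None else min(last, month_index(churned) - 1)
        row = table.setdefault((signup.year, signup.month), [0] * (last - start + 1))
        # Month 0 counts every signup, even one that churned in the same month.
        for k in range(max(end, start) - start + 1):
            row[k] += 1
    return table


# Hypothetical records, not data from the post.
customers = [
    (date(2019, 5, 3), None),               # still active
    (date(2019, 5, 10), date(2019, 6, 15)),  # churned in June
    (date(2019, 5, 20), date(2019, 7, 2)),   # churned in July
    (date(2019, 6, 1), None),                # June cohort, still active
]
cohorts = build_cohorts(customers, as_of=date(2019, 7, 31))
for cohort, counts in sorted(cohorts.items()):
    print(cohort, counts)
```

Each row is shorter than the one above it because younger cohorts have lived through fewer months, which gives the table its usual triangular shape.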

Here is an example of customer cohorts for our SaaS company:

Once you have split your data set into cohorts and counted the customers in each, you can use this data to calculate almost any SaaS metric and understand it in more depth. Let’s see how to do that for customer churn, MRR churn, and customer retention.

Using cohort analysis to understand customer churn

Having cohorts with customer counts is nice, but it is not very visual, and it still takes a lot of digging through the numbers to understand what is going on.

Let’s use this data and examine customer churn. For each month of the cohort’s life, we have to calculate what percentage of customers was lost.

For example, in May 2019 we had 120 customers and only 100 of them were left after a month. So the churn for Month 1 will be:

(120 - 100) / 120 ≈ 16.7%

When calculating churn for Month 2 we can do two things. We can either calculate it relative to the previous month like so:

(100 - 89) / 100 = 11%

Or we can calculate it relative to Month 0:

(120 - 89) / 120 ≈ 25.8%

These two approaches give you slightly different pictures. The first tells you the churn in any given month of the cohort's life. The second tells you what percentage of all customers in the cohort (the Month 0 number) had been lost by a particular month.

For example, if we started with 120 customers and churn relative to Month 0 is 35% in Month 3, it means that we have already lost 42 customers.

120 * .35 = 42
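Both calculations can be written as small functions. The sketch below is illustrative only; the function names are made up, and the results are returned as fractions rather than rounded percentages. The input is the May 2019 cohort from the text (120, 100, 89).

```python
def churn_vs_previous(counts):
    """Churn in each month relative to the month before it."""
    return [
        (prev - cur) / prev if prev else 0.0
        for prev, cur in zip(counts, counts[1:])
    ]


def churn_vs_month0(counts):
    """Share of the original cohort lost by each month."""
    start = counts[0]
    return [(start - cur) / start if start else 0.0 for cur in counts[1:]]


may_2019 = [120, 100, 89]  # Month 0, Month 1, Month 2

print(churn_vs_previous(may_2019))  # Month 1 and Month 2, each vs. the prior month
print(churn_vs_month0(may_2019))    # Month 1 and Month 2, each vs. Month 0

# Converting a Month 0 percentage back into customers lost:
lost_by_month_3 = round(120 * 0.35)
```

Note that the two lists always agree on Month 1, because for that month the previous month is Month 0.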

Let’s calculate churn relative to the previous month for all cohorts.

We have added conditional formatting to highlight cells with higher churn and draw attention to them. This feature is available in Google Sheets and Excel and is very useful for visualizing cohorts.

Now you can clearly see in which month your churn is the highest.
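Outside a spreadsheet, the same idea can be approximated in code. The sketch below assumes a row of churn values where index 0 is Month 1; the churn values and the 10% threshold are hypothetical and chosen only for illustration.

```python
def peak_churn_month(churn_row):
    """Return (month, rate) for the highest churn; churn_row[0] is Month 1."""
    i = max(range(len(churn_row)), key=lambda j: churn_row[j])
    return i + 1, churn_row[i]


def highlight(churn_row, threshold=0.10):
    """Mark the cells a spreadsheet would colour: True where churn >= threshold."""
    return [rate >= threshold for rate in churn_row]


# Hypothetical churn relative to the previous month, Month 1 to Month 4.
row = [0.17, 0.11, 0.05, 0.04]
print(peak_churn_month(row))
print(highlight(row))
```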

A few words on calculating relative to Month 0 vs. the previous month

When looking at cohorts where the percentage is calculated relative to the previous month, it is important to understand that a higher percentage does not necessarily mean more lost customers. For example, losing 20 customers out of 100 results in 20% churn, but losing 40 out of 300 results in about 13% churn. The churn is lower, but the number of lost customers is higher.

When presenting cohorts with percentages calculated relative to the previous month, it is always useful to pair them with cohorts showing the raw numbers, so that a reader can easily see how many customers a given percentage represents.
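One simple way to pair the two is to put the percentage and the raw number in the same cell. The helper below is a hypothetical sketch, using the two examples from the paragraph above.

```python
def loss_cell(previous, current):
    """Format churn vs. the previous month together with the customers lost."""
    lost = previous - current
    rate = lost / previous if previous else 0.0
    return f"{rate:.0%} ({lost} lost)"


print(loss_cell(100, 80))   # 20 customers lost out of 100
print(loss_cell(300, 260))  # 40 customers lost out of 300
```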

How to read cohort analysis

Keep in mind that time passes in two directions in a cohort table. From left to right, as each cohort progresses through its life. From top to bottom, as each new cohort is added at the bottom.

This means two things:

  1. You would expect younger cohorts to perform better than the older ones. As time passes, hopefully, you improve the service you provide. It becomes more mature. You sell to customers that are a better fit for your product.
  2. You would expect a cohort to stabilize as more months pass. Hopefully, once customers have adopted your product, they will stay with it for longer and will not churn at such a high rate as at the beginning of the lifecycle.
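These two expectations correspond to reading the table down a column and along a row. The sketch below assumes a table of churn relative to the previous month, with the oldest cohort first and index 0 of each row meaning Month 1. All values and function names are hypothetical.

```python
def column(table, k):
    """Values at position k, oldest cohort first, skipping cohorts too young to have one."""
    return [row[k] for row in table if len(row) > k]


def non_increasing(values):
    """True if each value is no higher than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Hypothetical churn vs. previous month; rows are cohorts, oldest first.
churn = [
    [0.17, 0.11, 0.08],
    [0.15, 0.10],
    [0.12],
]

# Top to bottom: is Month 1 churn falling for younger cohorts?
younger_cohorts_improve = non_increasing(column(churn, 0))

# Left to right: does the oldest cohort's churn fall as it ages?
oldest_cohort_stabilizes = non_increasing(churn[0])
```

Real data is noisier than this, so a strict check like `non_increasing` is better treated as a starting point than as a verdict.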

Here is how we can apply this to churn cohorts:

Using cohort analysis to understand MRR churn

You can use quite a few different metrics as the value for your cohort. MRR is a very good example. By calculating the MRR retention rate for each cohort, you can easily see not only where you lose the most money but also whether you manage to grow revenue over the lifespan of a specific cohort.

Let’s take a look at the example below:

You can see that in the younger cohorts, specifically Feb 2020, Mar 2020, and Apr 2020, our company managed to cover all MRR churn by expanding existing customers’ MRR, thus achieving MRR retention above 100%. These cohorts have more MRR in Months 2 to 4 than in Month 0. This is a great sign that you are doing something right and can expand your existing customer base.
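As a rough sketch, MRR retention for a cohort can be computed as each month's MRR divided by the cohort's Month 0 MRR. The function names and the MRR figures below are hypothetical and are not the numbers behind the example in this post.

```python
def mrr_retention(mrr_by_month):
    """MRR in each lifecycle month as a fraction of Month 0 MRR."""
    start = mrr_by_month[0]
    return [mrr / start for mrr in mrr_by_month]


def months_above_100(mrr_by_month):
    """Lifecycle months (after Month 0) where expansion outweighs churn."""
    retention = mrr_retention(mrr_by_month)
    return [month for month, r in enumerate(retention) if month > 0 and r > 1.0]


# Hypothetical MRR of one cohort for Month 0 to Month 3.
cohort_mrr = [10_000, 9_500, 10_400, 11_000]

print(mrr_retention(cohort_mrr))
print(months_above_100(cohort_mrr))
```

A value above 1.0 means the cohort is paying more than it did in Month 0, even though some of its customers may have left.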

A different way to visualize cohorts

I want to describe another way to visualize your cohort data. It gives you a good visual clue to the underlying patterns. The approach is to draw your cohorts as a multi-line chart.

Let’s take customer retention this time. We don’t really need to do any additional calculations here. It is the initial data we started the post with: of all the customers that signed up in May 2019, how many were left in June, and so on.

Instead of putting this data in a table, we can draw it as lines. The X-axis represents the lifecycle month of the cohort. The Y-axis represents the number of customers left in a given month of the lifecycle.
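A chart like this can be drawn with any plotting tool. The sketch below assumes the third-party matplotlib library is installed and imports it only inside the plotting function. The May 2019 numbers come from the text; the June 2019 numbers, the function names, and the output file name are hypothetical.

```python
def to_series(cohorts):
    """Turn {label: [customers at Month 0, 1, ...]} into (label, xs, ys) tuples."""
    return [
        (label, list(range(len(counts))), list(counts))
        for label, counts in cohorts.items()
    ]


def plot_cohorts(cohorts, path="cohorts.png"):
    """Draw one line per cohort and save the chart to a file."""
    import matplotlib  # third-party; assumed to be installed
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, xs, ys in to_series(cohorts):
        ax.plot(xs, ys, marker="o", label=label)
    ax.set_xlabel("Lifecycle month")
    ax.set_ylabel("Customers left")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


cohorts = {
    "May 2019": [120, 100, 89],  # from the text
    "Jun 2019": [150, 132],      # hypothetical
}
series = to_series(cohorts)
# plot_cohorts(cohorts)  # uncomment to write cohorts.png
```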

Here is an example of such a visualization for our imaginary company:

When examining such a visualization, you want to look for three main things:

  1. Newer cohort lines should stay above the older cohort lines. That means that the new cohort has more customers and you are able to retain them better.
  2. Hopefully, after the initial drop, the line will stabilize and run almost parallel to the X-axis. It means that once customers adopt your product they stay with it and churn at a much lower rate than initially.
  3. New lines should have a shallower initial drop than the older ones. You can see in the example above how the lines on top have much shallower drops. This means that you are improving your service, and customers need less time to see its value.
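The three checks above can also be expressed as simple comparisons on the raw numbers. This is a sketch with made-up function names; the older cohort uses the May 2019 numbers from the text, and the newer cohort is hypothetical.

```python
def stays_above(newer, older):
    """Check 1: the newer line is at or above the older one in every shared month."""
    return all(n >= o for n, o in zip(newer, older))


def step_churn(counts):
    """Fraction of customers lost in each month relative to the previous one."""
    return [(prev - cur) / prev for prev, cur in zip(counts, counts[1:])]


def flattens(counts):
    """Check 2: the latest monthly drop is smaller than the initial one."""
    steps = step_churn(counts)
    return len(steps) > 1 and steps[-1] < steps[0]


def shallower_start(newer, older):
    """Check 3: the newer cohort loses a smaller share in Month 1."""
    return step_churn(newer)[0] < step_churn(older)[0]


older = [120, 100, 89]   # May 2019, from the text
newer = [150, 132, 124]  # hypothetical newer cohort

checks = (stays_above(newer, older), flattens(older), shallower_start(newer, older))
print(checks)
```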

There are many different metrics you can use as a cohort value. We presented some of the most common ones.

I hope it was useful and if you have any questions or comments please feel free to reach out.

by Alex, co-founder of Probe

References:

  1. Here is the Google Spreadsheet with all of the analysis I’ve shown in this post, and more. You can create a copy of it (File –> Make a Copy) and make it your own.
  2. This post was inspired by Christoph Janz's blog post "Excel template for cohort analysis in SaaS". I’ve been following him for a while and he writes a lot of great content on analytics in SaaS. Please check him out.
  3. Huge thanks to our friend Aleksandra who helped us with proofreading and shared her experience.
  4. Illustration by Sara Maese from Icons8