Cohort Analysis: Retention Tables, LTV, and Product Health

What Is Cohort Analysis?

Cohort analysis is a technique for segmenting users or customers into groups — cohorts — based on a shared characteristic or experience at a specific point in time, then tracking those groups' behavior over subsequent time periods. The most common cohort definition is the acquisition cohort: all users who signed up, made their first purchase, or Were acquired in the same week or month. By comparing cohorts across time, analysts can distinguish genuine behavioral trends from compositional changes in the user base.

Cohort analysis answers questions like: Are users acquired in Q3 retaining better than those acquired in Q1? Is lifetime value improving for newer cohorts? How does engagement in week 2 predict long-term retention?

Why Cohort Analysis Matters

Aggregate metrics like "monthly active users" or "average revenue" hide the underlying dynamics of a product. A growing MAU number can mask the fact that churn is accelerating — you're simply acquiring new users faster than you're losing old ones. Cohort analysis exposes these dynamics by keeping each group of users separate and measuring them at the same relative time points (e.g., Day 1, Day 7, Day 30 after signup — not calendar dates).

This relative-time perspective is what makes cohort analysis powerful: it controls for the maturity of the user relationship, enabling apples-to-apples comparisons across cohorts acquired at different times.

Types of Cohorts

Cohort Type	Definition	Example Use Case
Acquisition cohort	Grouped by when they first joined or made a purchase	Retention curves by signup month
Behavioral cohort	Grouped by an action they took (e.g., used a feature)	Do users who complete onboarding retain better?
Size cohort	Grouped by a size attribute (e.g., company size in B2B)	Do enterprise accounts expand faster than SMB?
Channel cohort	Grouped by acquisition channel (paid, organic, referral)	Which channel yields highest LTV users?

The Cohort Retention Table

The standard output of cohort analysis is a retention table (also called a cohort matrix). Rows represent cohorts (e.g., the month of first purchase), columns represent time periods after the cohort event (Month 0, Month 1, Month 2...), and each cell contains the percentage of users from that cohort who were still active in that period.

Cohort	Month 0	Month 1	Month 2	Month 3
Jan 2024	100%	62%	48%	41%
Feb 2024	100%	65%	51%	44%
Mar 2024	100%	68%	54%	—
Apr 2024	100%	70%	—	—

Reading down a column shows how the same time period compares across cohorts — an improving Month 1 column (62% → 65% → 68% → 70%) indicates that recent cohorts are retaining better at that stage, suggesting a product improvement or better-qualified acquisition.

Building a Cohort Analysis in SQL

The SQL pattern for cohort analysis involves three steps: assign each user to a cohort, compute the activity period for each event, and aggregate by cohort and period.

Step 1 — Assign cohort (first activity date):

SELECT user_id, DATE_TRUNC('month', MIN(created_at)) AS cohort_month FROM orders GROUP BY user_id

Step 2 — Compute period index (months since cohort):

SELECT o.user_id, c.cohort_month, DATE_TRUNC('month', o.created_at) AS activity_month, DATEDIFF('month', c.cohort_month, DATE_TRUNC('month', o.created_at)) AS period FROM orders o JOIN cohorts c ON o.user_id = c.user_id

Step 3 — Aggregate into retention table:

SELECT cohort_month, period, COUNT(DISTINCT user_id) AS users FROM cohort_activity GROUP BY 1, 2 ORDER BY 1, 2

Divide each cell by the cohort size (period = 0 count) to get retention percentages. Then pivot by period to get the matrix format.

Building a Cohort Analysis in pandas

In Python, the cohort analysis pattern typically involves: assigning cohort labels, computing a period index, grouping and counting, then pivoting.

df['cohort'] = df.groupby('user_id')['order_date'].transform('min').dt.to_period('M')

df['period'] = (df['order_date'].dt.to_period('M') - df['cohort']).apply(lambda x: x.n)

cohort_data = df.groupby(['cohort', 'period'])['user_id'].nunique().reset_index()

cohort_pivot = cohort_data.pivot(index='cohort', columns='period', values='user_id')

retention = cohort_pivot.divide(cohort_pivot[0], axis=0) — divide by cohort size to get percentages.

Interpreting the Retention Curve

The retention curve plots the percentage of users active at each period (Day 1, Day 7, Day 30...). Key patterns to look for:

Pattern	What It Means	Implication
Steep early drop, then flat	Many users churn quickly but a loyal core stays	Focus on improving early activation
Gradual continuous decline	Users are slowly disengaging over time	Product may lack long-term value hooks
Improving across cohorts	Newer cohorts retain better — product is improving	Positive signal; identify what changed
Smiling curve (retention rises after initial drop)	Seasonal or periodic usage pattern	Normal for seasonal products

Cohort Analysis for Revenue: LTV

Cohort analysis extends naturally to revenue. Instead of tracking active users, track cumulative revenue per user over time. This builds a customer lifetime value (LTV) curve by cohort:

LTV at Month N = (total revenue from cohort through Month N) / (number of users in cohort)

Comparing LTV curves across cohorts — and across acquisition channels — reveals which cohorts are most valuable and whether LTV is improving over time. LTV curves are also essential for computing the LTV:CAC ratio, which determines whether customer acquisition spending is economically justified.

Common Pitfalls

Pitfall	Description	Fix
Immature cohorts	Recent cohorts have fewer periods — their curve looks better simply because we haven't seen their full churn yet	Only compare cohorts at periods where all have data
Survivorship bias	Analyzing only users who completed a certain action excludes those who churned before reaching it	Define cohort at the earliest possible event
Wrong cohort definition	Using a non-first event (e.g., last purchase) as the cohort anchor mixes up cohort timing	Always use the user's first event as the cohort date
Calendar vs. relative time	Comparing activity in January vs. February mixes cohort age with seasonality	Always use period index (0, 1, 2...) not calendar dates

Summary

Cohort analysis is one of the most valuable analytical techniques for understanding product health, customer retention, and lifetime value. By grouping users by acquisition time and tracking them through relative time periods, analysts can separate compositional effects from behavioral trends, detect whether product improvements are actually improving retention, and build robust LTV models by acquisition channel. The SQL and pandas patterns are consistent across industries: assign cohorts, compute period indices, aggregate, and pivot — then look for trends across the diagonal of the retention matrix.