How to Measure Cohort Retention

Cohort retention analysis helps you understand how well you’re retaining users over time. With BigQuery’s powerful data processing capabilities, you can analyze retention at scale and gain actionable insights.

1. What is Cohort Retention?

Cohort retention measures how many users from a particular group (cohort) continue to engage with your product over time. For example, you may group users by signup month and track how many of them are still active after 1, 2, or 3 months.

2. Prepare Your Data in BigQuery

Ensure your dataset contains at least the following columns:

  • user_id — unique identifier for each user
  • signup_date — when the user first joined
  • activity_date — when the user was active

3. Write the SQL Query

Here’s an example SQL query to calculate cohort retention by signup month:


WITH cohorts AS (
  SELECT 
    user_id,
    DATE_TRUNC(signup_date, MONTH) AS cohort_month
  FROM `project.dataset.users`
),
activity AS (
  SELECT
    user_id,
    DATE_TRUNC(activity_date, MONTH) AS activity_month
  FROM `project.dataset.user_activity`
),
cohort_activity AS (
  SELECT
    c.cohort_month,
    a.activity_month,
    COUNT(DISTINCT a.user_id) AS active_users
  FROM cohorts c
  JOIN activity a
  ON c.user_id = a.user_id
  WHERE a.activity_month >= c.cohort_month
  GROUP BY c.cohort_month, a.activity_month
)
SELECT 
  cohort_month,
  FORMAT_DATE('%Y-%m', activity_month) AS activity_month,
  active_users
FROM cohort_activity
ORDER BY cohort_month, activity_month;
                                

4. Calculate Retention Rates

To turn counts into retention rates, divide the number of active users by the number of users in the cohort:


WITH cohort_sizes AS (
  SELECT 
    cohort_month,
    COUNT(DISTINCT user_id) AS cohort_size
  FROM cohorts
  GROUP BY cohort_month
)
SELECT 
  ca.cohort_month,
  ca.activity_month,
  ca.active_users,
  ROUND(ca.active_users / cs.cohort_size, 2) AS retention_rate
FROM cohort_activity ca
JOIN cohort_sizes cs
ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.activity_month;
                                

5. Visualize the Results

Export your results to a tool like Google Sheets, Looker Studio, or Tableau. Heatmaps are a popular choice, where rows represent cohorts, columns represent months since signup, and the cells are shaded by retention rate.

6. Best Practices

  • Define clear cohort criteria (e.g., signup date, first purchase).
  • Exclude test or inactive users for cleaner analysis.
  • Segment by user type or geography to uncover deeper insights.

Conclusion

Measuring cohort retention in BigQuery provides valuable insights into user engagement and product health. By following these steps, you can track performance, identify trends, and improve retention strategies over time.