How to Calculate Percentiles in BigQuery

Percentiles are a powerful way to understand the distribution of your data. BigQuery has no single percentile aggregate, but it provides the window functions PERCENTILE_CONT and PERCENTILE_DISC, plus the aggregate function APPROX_QUANTILES for approximate results.

Why Percentiles Matter

Percentiles help you understand the spread and shape of your data. For example, the 90th percentile tells you the value below which 90% of the data falls. This is useful in performance monitoring, sales analysis, and customer behavior insights.

BigQuery Functions for Percentiles

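  • PERCENTILE_CONT(value, percentile) OVER (...): Returns the percentile using linear interpolation between adjacent values.
  • PERCENTILE_DISC(value, percentile) OVER (...): Returns the percentile as a discrete value from the dataset.
  • APPROX_QUANTILES(value, number): An aggregate function that returns an array of approximate quantile boundaries.

PERCENTILE_CONT and PERCENTILE_DISC are window (navigation) functions, so each call needs an OVER clause. Neither accepts an array of percentiles; the percentile argument is a single value between 0 and 1.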

Basic Example

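SELECT DISTINCT
  PERCENTILE_CONT(sales_amount, 0.9) OVER () AS sales_90th_percentile
FROM
  `my_dataset.sales_data`;

This query calculates the 90th percentile of the sales_amount column. Because PERCENTILE_CONT is a window function, it needs the OVER () clause and returns the same value on every row; SELECT DISTINCT collapses the result to a single row.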

Multiple Percentiles

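SELECT DISTINCT
  PERCENTILE_CONT(sales_amount, 0.25) OVER () AS sales_p25,
  PERCENTILE_CONT(sales_amount, 0.5) OVER () AS sales_median,
  PERCENTILE_CONT(sales_amount, 0.75) OVER () AS sales_p75
FROM
  `my_dataset.sales_data`;

PERCENTILE_CONT takes one percentile per call, not an array, so it is called once for each percentile. This query returns the 25th, 50th (median), and 75th percentiles as separate columns.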
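An array of several percentiles can also be obtained with the aggregate function APPROX_QUANTILES. The following is a minimal sketch under assumptions: the inline numbers are hypothetical stand-ins for the sales_amount column, and the values returned are approximate quantile boundaries rather than interpolated percentiles.

```sql
-- Hypothetical inline values standing in for the sales_amount column.
SELECT
  -- APPROX_QUANTILES(x, 4) returns 5 boundaries:
  -- [minimum, ~25th, ~50th, ~75th, maximum].
  APPROX_QUANTILES(sales_amount, 4) AS approx_sales_quartiles
FROM
  UNNEST([120, 80, 310, 45, 200, 95, 150, 60]) AS sales_amount;
```

Individual elements can be read with OFFSET, for example approx_sales_quartiles[OFFSET(2)] for the approximate median.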

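Using PARTITION BY

SELECT DISTINCT
  region,
  PERCENTILE_CONT(sales_amount, 0.9) OVER (PARTITION BY region) AS sales_90th_percentile
FROM
  `my_dataset.sales_data`;

This query calculates the 90th percentile of sales for each region. PERCENTILE_CONT is not an aggregate function, so it cannot be combined with GROUP BY; the grouping goes in PARTITION BY inside the OVER clause, and SELECT DISTINCT reduces the output to one row per region.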
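Where an aggregate query with GROUP BY is preferred, APPROX_QUANTILES can fill that role at the cost of an approximate result. The sketch below is illustrative only: the inline sales_data rows are hypothetical and stand in for `my_dataset.sales_data`.

```sql
-- Hypothetical inline rows standing in for `my_dataset.sales_data`.
WITH sales_data AS (
  SELECT * FROM UNNEST([
    STRUCT('east' AS region, 100 AS sales_amount),
    ('east', 200),
    ('east', 300),
    ('west', 50),
    ('west', 150)
  ])
)
SELECT
  region,
  -- APPROX_QUANTILES(x, 100) returns 101 boundaries;
  -- offset 90 approximates the 90th percentile.
  APPROX_QUANTILES(sales_amount, 100)[OFFSET(90)] AS approx_sales_90th_percentile
FROM
  sales_data
GROUP BY
  region;
```

This form returns one row per region without SELECT DISTINCT, which can be convenient on large tables where an approximate answer is acceptable.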

Understanding Continuous vs Discrete

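Continuous (PERCENTILE_CONT) interpolates linearly between adjacent values, so the result may be a number that does not appear in the dataset. Discrete (PERCENTILE_DISC) returns an actual value from the dataset: the first value, in sorted order, whose cumulative distribution is greater than or equal to the requested percentile. It is useful when you need real data points.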
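The difference is easiest to see on a tiny input with an even number of rows. The query below is a small sketch using four hypothetical inline values rather than a real table.

```sql
-- Four hypothetical values; the median falls between 20 and 30.
SELECT DISTINCT
  PERCENTILE_CONT(x, 0.5) OVER () AS median_cont,  -- 25.0, interpolated between 20 and 30
  PERCENTILE_DISC(x, 0.5) OVER () AS median_disc   -- 20, an actual value from the input
FROM
  UNNEST([10, 20, 30, 40]) AS x;
```

The continuous result, 25.0, is not one of the input values, while the discrete result, 20, is.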

Final Tips

  • Always check if your dataset has enough rows for meaningful percentiles.
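  • Do not rely on SAFE.PERCENTILE_CONT: the SAFE. prefix is not supported for window functions. An empty input does not raise an error; the query simply returns no rows. NULL values are ignored by default.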
  • Test both continuous and discrete methods to see which works best for your use case.

With these techniques, you can confidently summarize and analyze your data in BigQuery!