How to Calculate Percentiles in BigQuery
Percentiles are a powerful way to understand the distribution of your data. In BigQuery, calculating percentiles is straightforward thanks to built-in functions and SQL constructs.
Why Percentiles Matter
Percentiles help you understand the spread and shape of your data. For example, the 90th percentile tells you the value below which 90% of the data falls. This is useful in performance monitoring, sales analysis, and customer behavior insights.
BigQuery Functions for Percentiles
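PERCENTILE_CONT(value, percentile) OVER (...)
: Returns the percentile by linearly interpolating between adjacent values.
PERCENTILE_DISC(value, percentile) OVER (...)
: Returns the percentile as a discrete value from the dataset.
APPROX_QUANTILES(value, number)
: Aggregate function that returns an array of approximate quantile boundaries. PERCENTILE_CONT itself takes one percentile per call, not an array.
PERCENTILE_CONT and PERCENTILE_DISC are window functions in BigQuery, so each call needs an OVER clause.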
Basic Example
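SELECT DISTINCT
  PERCENTILE_CONT(sales_amount, 0.9) OVER () AS sales_90th_percentile
FROM
  `my_dataset.sales_data`;
This query calculates the 90th percentile of the sales_amount column. Because PERCENTILE_CONT is a window function, it needs an OVER clause and repeats the same value on every row; SELECT DISTINCT collapses the output to a single row.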
Multiple Percentiles
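SELECT DISTINCT
  PERCENTILE_CONT(sales_amount, 0.25) OVER () AS sales_p25,
  PERCENTILE_CONT(sales_amount, 0.5) OVER () AS sales_median,
  PERCENTILE_CONT(sales_amount, 0.75) OVER () AS sales_p75
FROM
  `my_dataset.sales_data`;
This query returns the 25th, 50th (median), and 75th percentiles as separate columns. PERCENTILE_CONT accepts a single percentile per call, so each percentile needs its own expression.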
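When a single array of several quantiles is preferred, APPROX_QUANTILES is one option. The following is a minimal sketch that reuses the article's table and column names; note that the results are approximate, not exact, and that the array also includes the minimum and maximum.

```sql
-- Sketch: approximate quartiles as one array.
-- APPROX_QUANTILES(expr, 4) returns 5 boundaries:
-- [approx. min, 25th, 50th (median), 75th, approx. max].
SELECT
  APPROX_QUANTILES(sales_amount, 4) AS sales_quartiles
FROM
  `my_dataset.sales_data`;
```

An individual boundary can be read with an offset, for example sales_quartiles[OFFSET(2)] for the approximate median.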
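Percentiles by Group
SELECT DISTINCT
  region,
  PERCENTILE_CONT(sales_amount, 0.9) OVER (PARTITION BY region) AS sales_90th_percentile
FROM
  `my_dataset.sales_data`;
This query calculates the 90th percentile of sales for each region. Since PERCENTILE_CONT is a window function rather than an aggregate, PARTITION BY region takes the place of GROUP BY, and SELECT DISTINCT reduces the output to one row per region.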
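If a true GROUP BY aggregation is preferred, APPROX_QUANTILES is an aggregate function and can be grouped directly. This sketch assumes the same table and columns as above and trades exactness for an approximate result.

```sql
-- Sketch: approximate 90th percentile per region using an aggregate.
-- APPROX_QUANTILES(expr, 100) returns 101 boundaries (offsets 0 to 100),
-- so OFFSET(90) picks the approximate 90th percentile.
SELECT
  region,
  APPROX_QUANTILES(sales_amount, 100)[OFFSET(90)] AS approx_sales_90th_percentile
FROM
  `my_dataset.sales_data`
GROUP BY
  region;
```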
Understanding Continuous vs Discrete
Continuous (PERCENTILE_CONT) interpolates linearly between adjacent values, so the result may be a number that does not appear in the dataset. Discrete (PERCENTILE_DISC) returns an actual value from the dataset, which is useful when you need real data points.
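The difference is easiest to see on a tiny inline dataset. The sketch below uses four made-up values (not from any real table) and asks for the median both ways.

```sql
-- Sketch: continuous vs discrete median on illustrative inline data.
WITH sample AS (
  SELECT x FROM UNNEST([10, 20, 30, 40]) AS x
)
SELECT DISTINCT
  -- Interpolates halfway between 20 and 30.
  PERCENTILE_CONT(x, 0.5) OVER () AS median_cont,
  -- Returns the first sorted value whose cumulative distribution is >= 0.5.
  PERCENTILE_DISC(x, 0.5) OVER () AS median_disc
FROM
  sample;
```

Here median_cont should come out as 25.0, a value that is not in the data, while median_disc should be 20, one of the original values.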
Final Tips
- Always check if your dataset has enough rows for meaningful percentiles.
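- Do not rely on SAFE.PERCENTILE_CONT: BigQuery does not support the SAFE. prefix on aggregate or window functions. An empty input does not raise an error; the query simply returns no rows, and NULL values are ignored by default.
- Test both continuous and discrete methods to see which works best for your use case.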
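One way to keep the row count in view is to return it next to the percentile. This is only a sketch using the article's table and column names; what counts as "enough" rows is left to the reader.

```sql
-- Sketch: report how many non-NULL values back the percentile.
SELECT DISTINCT
  COUNT(sales_amount) OVER () AS non_null_rows,
  PERCENTILE_CONT(sales_amount, 0.9) OVER () AS sales_90th_percentile
FROM
  `my_dataset.sales_data`;
```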
With these techniques, you can confidently summarize and analyze your data in BigQuery!