How to Calculate Percentiles in PostgreSQL

Percentiles are crucial in data analysis to understand the distribution of values in a dataset. PostgreSQL provides several built-in ways to calculate percentiles efficiently.

What is a Percentile?

A percentile indicates the value below which a given percentage of observations fall. For example, the 90th percentile means 90% of the data is below this value.

Using percentile_cont and percentile_disc

PostgreSQL offers two primary functions:

  • percentile_cont: Returns a continuous percentile, interpolating between values if necessary.
  • percentile_disc: Returns a discrete percentile, selecting the closest actual value.

Example: Continuous Percentile

SELECT percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;

This returns the 90th percentile salary, interpolating between salaries if needed.

Example: Discrete Percentile

SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;

This returns the first salary value that reaches the 90th percentile.

Calculating Multiple Percentiles at Once

SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY salary) AS quartiles FROM employees;

This returns the 25th, 50th (median), and 75th percentiles in a single query.

Using Percentiles in Groups

SELECT department, 
       percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90
FROM employees
GROUP BY department;

This calculates the 90th percentile salary per department.

Things to Keep in Mind

  • Percentile values range from 0 to 1 (not 0 to 100).
  • For performance, ensure you have indexes on the ORDER BY columns when working with large tables.
  • If you need a simpler estimate, you can approximate percentiles using NTILE or window functions.

Conclusion

PostgreSQL’s percentile functions are powerful tools for analyzing data distributions. With percentile_cont and percentile_disc, you can perform advanced statistical queries with ease. Incorporate these into your analytics workflows to gain deeper insights from your data.