Basics
- How to Insert
- How to Update
- How to Delete
- How to Trim Strings
- How to Use substring()
- How to Use substring() with RegEx to Extract a String
- How to Replace Substrings
- How to Modify Arrays
- How to Compare Arrays
- How to Concatenate Strings
- How to Convert the Case of a String
- How to Create an Array
- How to Insert Data Into an Array
- How to Query Arrays
- How to Use string_agg()
Database Management
- How to Create a Table
- How to Drop a Table
- How to Rename a Table
- How to Truncate a Table
- How to Duplicate a Table
- How to Add a Column
- How to Change a Column Name
- How to Add a Default Value
- How to Remove a Default Value From a Column
- How to Add a Not Null Constraint
- How to Remove a Not Null Constraint
- How to Add an Index
- How to Drop an Index
- How to Create a View
- How to Drop a View
- How to Reset Sequence
- How to Drop a Column
Dates and Times
Analysis
- How to Use nullif()
- How to Use Lateral Joins
- How to Calculate Percentiles
- How to Get the First Row per Group
- How to Use generate_series to Avoid Gaps in Data
- How to Do Type Casting
- How to Write a Common Table Expression
- How to Import a CSV using Copy
- How to Compare Two Values When One Is Null
- How to Use Coalesce
- How to Write a Case Statement
- How to Use Filter to Have Multiple Counts
- How to Calculate Cumulative Sum-Running Total
- How to Query a JSON Column
How to Calculate Percentiles in PostgreSQL
Percentiles are crucial in data analysis to understand the distribution of values in a dataset. PostgreSQL provides several built-in ways to calculate percentiles efficiently.
What is a Percentile?
A percentile indicates the value below which a given percentage of observations fall. For example, the 90th percentile means 90% of the data is below this value.
Using percentile_cont
and percentile_disc
PostgreSQL offers two primary functions:
percentile_cont
: Returns a continuous percentile, interpolating between values if necessary.percentile_disc
: Returns a discrete percentile, selecting the closest actual value.
Example: Continuous Percentile
SELECT percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;
This returns the 90th percentile salary, interpolating between salaries if needed.
Example: Discrete Percentile
SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;
This returns the first salary value that reaches the 90th percentile.
Calculating Multiple Percentiles at Once
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY salary) AS quartiles FROM employees;
This returns the 25th, 50th (median), and 75th percentiles in a single query.
Using Percentiles in Groups
SELECT department,
percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90
FROM employees
GROUP BY department;
This calculates the 90th percentile salary per department.
Things to Keep in Mind
- Percentile values range from 0 to 1 (not 0 to 100).
- For performance, ensure you have indexes on the
ORDER BY
columns when working with large tables. - If you need a simpler estimate, you can approximate percentiles using
NTILE
or window functions.
Conclusion
PostgreSQL’s percentile functions are powerful tools for analyzing data distributions. With percentile_cont
and percentile_disc
, you can perform advanced statistical queries with ease. Incorporate these into your analytics workflows to gain deeper insights from your data.