How to Use generate_series to Avoid Gaps in Data

PostgreSQL’s generate_series function is a powerful tool for creating continuous ranges of values, especially useful when working with time-series or sequential data. This tutorial will show you how to use it to avoid gaps in your datasets.

Why Gaps in Data Matter

Missing data points can break reports, dashboards, and analysis. For example, if you’re tracking daily sales but no sales occurred on some days, those dates might be missing from your reports — leading to misleading trends.

What is generate_series?

The generate_series function creates a set of numbers, timestamps, or dates based on a start, end, and step value. For example:

SELECT generate_series('2023-01-01'::date, '2023-01-10', '1 day');

This will return every date from January 1 to January 10, 2023.

Example Use Case: Filling Missing Dates in Sales Data

Suppose you have a sales table:


CREATE TABLE sales (
    sale_date DATE,
    amount NUMERIC
);
                                

And your data looks like this:


sale_date   | amount
------------|--------
2023-01-01 | 100
2023-01-03 | 150
2023-01-05 | 200
                                

To generate a full date range and join it to your sales data:


SELECT gs.date, COALESCE(s.amount, 0) AS amount
FROM generate_series('2023-01-01'::date, '2023-01-05', '1 day') AS gs(date)
LEFT JOIN sales s ON s.sale_date = gs.date;
                                

This will give you:


date       | amount
-----------|--------
2023-01-01 | 100
2023-01-02 | 0
2023-01-03 | 150
2023-01-04 | 0
2023-01-05 | 200
                                

Handling Numeric Gaps

You can also fill gaps in numeric sequences:


SELECT gs.id, COALESCE(d.value, 0) AS value
FROM generate_series(1, 5) AS gs(id)
LEFT JOIN data d ON d.id = gs.id;
                                

Best Practices

  • Always use LEFT JOIN to preserve the full series.
  • Use COALESCE to replace NULL with default values like 0.
  • For large datasets, limit the series to a reasonable range to avoid performance issues.

Conclusion

By using generate_series, you can ensure that your reports and visualizations always show complete ranges, making your data clearer and more reliable. It’s a simple but essential tool for anyone working with PostgreSQL.

Happy querying!