Re: Query planner instability on large append-only table with heavy same-day inserts - Mailing list pgsql-general

From Greg Sabino Mullane
Subject Re: Query planner instability on large append-only table with heavy same-day inserts
Date
Msg-id CAKAnmmLQGwqtD+3kn+sgAspV4dhtYYoE6DrE7uFEhn04ab6S3w@mail.gmail.com
In response to Query planner instability on large append-only table with heavy same-day inserts  (Giriraj Sharma <giriraj.sharma27@gmail.com>)
List pgsql-general
On Tue, Oct 28, 2025 at 2:36 AM Giriraj Sharma <giriraj.sharma27@gmail.com> wrote:
Would daily range partitioning by effective_date (≈ 1000 partitions for 3 years) be the right long-term approach here?

If you are querying based on dates, yes. It does not need to be daily; you could do monthly, for example. A retention policy of 3 years is a great candidate for partitioning.
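A minimal sketch of what that looks like, assuming a hypothetical table named big_table partitioned by the effective_date column from the thread:

    -- "big_table" is a stand-in name; effective_date is the column
    -- discussed in the thread.
    CREATE TABLE big_table (
        id             bigserial,
        effective_date date NOT NULL,
        payload        text
    ) PARTITION BY RANGE (effective_date);

    -- One partition per month. Queries that filter on effective_date
    -- are pruned to the matching partition, and each partition carries
    -- its own statistics.
    CREATE TABLE big_table_2025_10 PARTITION OF big_table
        FOR VALUES FROM ('2025-10-01') TO ('2025-11-01');
    CREATE TABLE big_table_2025_11 PARTITION OF big_table
        FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');

    -- Retention becomes cheap: dropping a month is instant next to a
    -- DELETE of ~300 million rows.
    -- DROP TABLE big_table_2022_10;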
 
Given that inserts and queries almost always target today’s partition, will partition pruning and per-partition statistics fully eliminate the stale-statistics problem?
Are there known performance implications of maintaining ~1000 daily partitions at this scale (10 M/day)?
We occasionally receive backdated or future-dated inserts — can such out-of-range values be handled efficiently (e.g., by creating partitions on the fly)?

It will help; whether it eliminates the problem entirely, I don't know. I would not do daily unless it is really needed; that's a lot of partitions. Start with monthly. You cannot create partitions on the fly, but you can have out-of-range rows go to a default partition and sort them out later. Or pre-create a bunch of partitions.
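The default-partition approach is a one-liner; a sketch, using the same hypothetical big_table:

    -- Rows whose effective_date falls outside every defined range land
    -- here instead of failing the INSERT.
    CREATE TABLE big_table_default PARTITION OF big_table DEFAULT;

One caveat worth knowing: creating or attaching a new partition later will fail if the default partition still holds rows that belong in the new range, so sweep the strays out first.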

If we stay with a single table, what are practical per-table autovacuum settings to ensure frequent ANALYZE even as total row count grows into billions?

Turn off autovacuum for that table. Use cron to run vacuum and analyze hourly (or some frequency that ensures good plans). If analyze is taking too long, it can be done per-column as well, although that won't help much if your sample table is representative. But if you have a lot of other columns with stable values, you could analyze those less often. Measure and see.
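A hedged sketch of that setup; the table name is a stand-in, and this applies to the single-table (non-partitioned) case being asked about:

    -- Stop autovacuum/autoanalyze from competing with the scheduled job.
    ALTER TABLE big_table SET (autovacuum_enabled = false);

    -- crontab entry: hourly vacuum + analyze (tune the frequency until
    -- plans stay good):
    -- 0 * * * * vacuumdb --analyze --table=big_table mydb

    -- Per-column analyze, if a full ANALYZE takes too long:
    ANALYZE big_table (effective_date);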

Should we also experiment with raising the statistics target for effective_date from 100 to 1000 using:

I don't think that will matter if your analyze is already giving you index-only scans.
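(For reference, the command elided from the quote above is presumably the standard per-column target change; the table name here is illustrative:)

    -- Raise the sample size for this column's statistics, then
    -- re-analyze so the new target takes effect.
    ALTER TABLE big_table ALTER COLUMN effective_date SET STATISTICS 1000;
    ANALYZE big_table (effective_date);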

Are there best practices or proven patterns for append-only, time-series–like workloads that insert heavily into one day and read from the same day?

Partitioning. Ensure fillfactor is 100%. Minimal indexes.
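In SQL terms, roughly (same hypothetical names; note that 100 is already the default fillfactor for tables, so this only matters if it was lowered earlier, while btree indexes default to 90):

    -- Append-only rows are never updated, so no page space needs to be
    -- reserved for HOT updates. On a partitioned setup, set this on
    -- each leaf partition.
    ALTER TABLE big_table SET (fillfactor = 100);

    -- Pack index pages fully as well (btree default is 90).
    CREATE INDEX ON big_table (effective_date) WITH (fillfactor = 100);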

Is there a known best way to make Postgres’s planner more resilient to temporary statistic drift for parameterized queries like ours?

Not really, other than giving Postgres updated stats via frequent analyze.

PostgreSQL 14 / 15 tested.

Moving to a newer version is always helpful too. 


Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support
