Query planner instability on large append-only table with heavy same-day inserts - Mailing list pgsql-general
| From | Giriraj Sharma |
|---|---|
| Subject | Query planner instability on large append-only table with heavy same-day inserts |
| Date | |
| Msg-id | CAB5vuUwa3QBFvJKCWY9PYWM8CM+Qk0XSq1R84zMjuBibVW8ccw@mail.gmail.com |
| Responses | Re: Query planner instability on large append-only table with heavy same-day inserts |
| | Re: Query planner instability on large append-only table with heavy same-day inserts |
| List | pgsql-general |
Environment
PostgreSQL 14 / 15 tested.
RDS / Aurora Postgres in production.
Primary key index (effective_date, idempotency_id).
Autovacuum enabled.
SQLC (Go) for query generation.
________________________________
Context
We are running into query planner instability on an append-only table that records daily commit information.
The workload is heavily skewed toward the current day’s data, and query performance deteriorates periodically until we run ANALYZE.
________________________________
Table Definition
CREATE TABLE sample_table (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    CONSTRAINT commits_v3_pk PRIMARY KEY (effective_date, idempotency_id)
);
________________________________
Query Pattern
All our lookups are parameterized (SQLC, Go):
SELECT idempotency_id
FROM sample_table
WHERE effective_date = @effective_date
  AND idempotency_id IN (
      SELECT unnest(@idempotency_ids::transactions.string[])
  );
Typical characteristics:
@effective_date is almost always the current date.
@idempotency_ids varies between 2–50 items off-peak and 800–1000 items at peak hours.
Query is latency-sensitive (expected 5–10 ms).
________________________________
Workload Characteristics
~10–12 million inserts per weekday, almost entirely for the current date.
Practically append-only — no updates or deletes.
No weekend inserts.
Occasional rare inserts for past or future dates (late or early trades).
Retention: ~3 years of data (~1000 days × 10 M = 10+ billion rows).
PostgreSQL 14+.
________________________________
Observed Behavior
Immediately after ANALYZE, this query uses an Index Only Scan on the primary key (effective_date, idempotency_id) with stable latency around 5–10 ms.
After several days (5–7 typically), the query planner flips to a Sequential Scan, and latency jumps to 2–30 seconds.
Running ANALYZE transactions.commits_v3; restores performance instantly.
We currently run manual ANALYZE twice a day via pg_cron, which helps but doesn’t fully guarantee stability (especially in test environments where insert patterns are more random).
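For reference, the pg_cron job is essentially the following (the job name and schedule shown here are illustrative, not our exact values):
-- Illustrative pg_cron job; actual name/schedule may differ.
SELECT cron.schedule(
    'analyze-commits-v3',    -- hypothetical job name
    '0 6,18 * * *',          -- twice a day
    $$ANALYZE transactions.commits_v3$$
);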
________________________________
What We’ve Tried
Manual and scheduled ANALYZE runs (twice a day, each run takes ~30s) → improves performance but isn’t sustainable long-term.
Verified query execution plans before and after ANALYZE — planner switches from Index Only Scan to Seq Scan as statistics become stale.
Confirmed the table’s autovacuum is running (the last run was 15 days ago), but its frequency isn’t sufficient to keep stats current during high-insert periods. Could this be an issue at all, given that we run ANALYZE at least twice a day?
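For what it’s worth, we check statistics freshness with a query along these lines against pg_stat_user_tables:
-- How stale are the stats, and how many rows have changed
-- since the last (auto)analyze?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'transactions'
  AND relname = 'commits_v3';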
________________________________
Problem Summary
The planner’s row-count estimates for effective_date and idempotency_id become inaccurate as we continuously append to “today’s” date.
The result is plan instability (index scan ↔ sequential scan) until statistics are refreshed.
We’re looking for a solution that keeps plans stable without manual ANALYZE as data volume scales.
________________________________
Questions for the Community
1. Partitioning
Would daily range partitioning by effective_date (≈ 1000 partitions for 3 years) be the right long-term approach here?
Given that inserts and queries almost always target today’s partition, will partition pruning and per-partition statistics fully eliminate the stale-statistics problem?
Are there known performance implications of maintaining ~1000 daily partitions at this scale (10 M rows/day)?
We occasionally receive backdated or future-dated inserts; can such out-of-range values be handled efficiently (e.g., by creating partitions on the fly)? A sketch of the layout we have in mind follows below.
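For concreteness (partition names here are illustrative), a DEFAULT partition would absorb the rare out-of-range dates:
-- Sketch of daily range partitioning (names hypothetical).
CREATE TABLE sample_table (
    idempotency_id transactions.string NOT NULL,
    effective_date date NOT NULL,
    created_at timestamptz DEFAULT now() NOT NULL,
    CONSTRAINT commits_v3_pk PRIMARY KEY (effective_date, idempotency_id)
) PARTITION BY RANGE (effective_date);

-- One partition per day, created ahead of time by a scheduled job.
CREATE TABLE sample_table_2025_01_06
    PARTITION OF sample_table
    FOR VALUES FROM ('2025-01-06') TO ('2025-01-07');

-- Catch-all for the occasional backdated or future-dated insert.
CREATE TABLE sample_table_default
    PARTITION OF sample_table DEFAULT;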
2. Autovacuum / Analyze Tuning
If we stay with a single table, what are practical per-table autovacuum settings to ensure frequent ANALYZE even as total row count grows into billions?
Would it make sense to use:
ALTER TABLE transactions.commits_v3 SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold = 50000,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 10000
);
to decouple analyze frequency from table size? (At ~10–12 M inserts per weekday, a fixed 50,000-row threshold would trigger autoanalyze roughly 200+ times a day.)
Should we also experiment with raising the statistics target for effective_date from 100 to 1000 using:
ALTER TABLE transactions.commits_v3
    ALTER COLUMN effective_date SET STATISTICS 1000;
to improve the planner’s histogram accuracy for the date distribution? (A fresh ANALYZE is needed afterwards for the new target to take effect.)
3. Best Practices
Are there best practices or proven patterns for append-only, time-series–like workloads that insert heavily into one day and read from the same day?
Is there a known best way to make Postgres’s planner more resilient to temporary statistics drift for parameterized queries like ours?
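One knob we’ve come across but not yet tested is plan_cache_mode (PostgreSQL 12+), which would make prepared statements replan with the actual parameter values instead of falling back to a generic plan. Would something like this be advisable here?
-- Untested idea: force per-execution planning for our
-- latency-sensitive sessions so parameter values are always
-- visible to the planner.
SET plan_cache_mode = force_custom_plan;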
________________________________
Goal
We’d like a “set it and forget it” architecture — either through partitioning or robust autovacuum tuning — where:
The planner always chooses the index scan for same-day queries.
We no longer need the manual ANALYZE jobs currently run via pg_cron.
Query latency remains in the 5–10 ms range, even as total data volume grows into billions of rows.
________________________________
Any recommendations, benchmark references, or production-proven strategies for this workload would be highly appreciated.
--
Giriraj Sharma
about.me/girirajsharma