Hi hackers,
(I know, pg_stat_statements is already too wide, but bear with me, there is a really important use case that the proposed column solves)
When managing many clusters with thousands of query patterns, we need automated ways to detect missing indexes. Currently, if we want to find candidate queries that potentially can be improved using new indexes, looking at pg_stat_statements we need to use us total_exec_time, mean_exec_time, shared_blks_(hit|read) -- useful, but noisy for index detection since heavy aggregate queries rank high yet often can't benefit from additional indexes.
Rows removed by filter is a strong signal that an index would help. EXPLAIN shows this per-execution, but for fleet-wide automated analysis we need it aggregated in pg_stat_statements.
The proposed patch adds:
- rows_filtered column: sum of nfiltered1 + nfiltered2 from all plan nodes
- GUC pg_stat_statements.track_rows_filtered (default: off)
- Uses INSTRUMENT_ROWS only (no timing overhead)
Implementation: sums nfiltered1+nfiltered2 at ExecutorEnd. Uses INSTRUMENT_ROWS (not INSTRUMENT_ALL) — just counter increments, no timing overhead. Controlled by new GUC track_rows_filtered (default: off, PGC_SUSET). Stats file header bumped for upgrade compatibility. Includes regression tests and docs.