Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA - Mailing list pgsql-hackers
| From | CharSyam |
|---|---|
| Subject | Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA |
| Date | |
| Msg-id | CAMrLSE7mjAgWGykBdRM1Etiw6h=YopAp_9hrf4qOumverqab-Q@mail.gmail.com Whole thread |
| In response to | Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |
I ran a benchmark on the patch *Reduce pg_class scans in GRANT/REVOKE
ON ALL TABLES IN SCHEMA* (collapses 5 per-relkind pg_class heap scans
into 1 scan distributed into per-relkind buckets). Summary below.(It took much time to tests)
## Assumptions
- Two builds of PostgreSQL 19devel from the same source tree (one
patched, one at master tip), identical compile flags, separate
--prefix.
- Separate data directories, run sequentially on an otherwise idle
host.
- GUCs: shared_buffers=2GB, max_locks_per_transaction=100000,
fsync=off, synchronous_commit=off, full_page_writes=off,
autovacuum=off.
- bench_s schema contains N empty tables (CREATE TABLE t_i()).
- Measured operations:
GRANT SELECT ON ALL TABLES IN SCHEMA bench_s TO bench_role
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA bench_s FROM bench_role
- Best of 3 runs reported (seconds).
- Two scenarios:
A. Clean catalog — VACUUM FULL pg_class; VACUUM FULL pg_attribute
immediately before measurement. pg_class
densely packed.
B. Bloated catalog — pre-bloat phase: GRANT+REVOKE on all N tables
repeated C cycles, no VACUUM. Both patched
and master operate on catalogs with identical
relpages and n_dead_tup.
## Results — Scenario A (Clean catalog)
macOS (Apple Silicon), best of 3, seconds:
ntables | GRANT patched | GRANT master | REVOKE patched | REVOKE master
--------+---------------+--------------+----------------+--------------
20,000 | 0.116 | 0.115 | 0.231 | 0.226
40,000 | 0.250 | 0.250 | 0.460 | 0.460
100,000 | 0.730 | 0.678 | 1.193 | 1.193
Honestly, there is no measurable performance difference in the clean
state. Patched and master are statistically indistinguishable within
run-to-run noise. This matches the design of the patch: when pg_class
is densely packed, repeating a small seq scan five times is cheap, so
collapsing it into one has nothing meaningful to save. The patch adds
no overhead either — worst case is a tie.
## Results — Scenario B (Bloated catalog)
### Linux x86_64, C=20, best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE patched | REVOKE master | Δ
----------+-----------+---------------+--------------+---------+----------------+---------------+---------
10,000 | 0 | 0.0924 | 0.0935 | −1.2 % | 0.1668 | 0.1696 | −1.6 %
20,000 | 109,825 | 0.2027 | 0.2069 | −2.0 % | 0.3381 | 0.3533 | −4.3 %
50,000 | 329,468 | 0.5555 | 0.5895 | −5.8 % | 0.8901 | 0.9371 | −5.0 %
100,000 | 879,311 | 1.1732 | 1.1968 | −2.0 % | 1.8808 | 1.9555 | −3.8 %
200,000 | 1,978,925 | 2.2188 | 2.3470 | −5.5 % | 3.7290 | 3.9064 | −4.5 %
500,000 | 4,178,604 | 6.0260 | 6.6663 | −9.6 % | 9.8162 | 10.2169 | −3.9 %
1,000,000 | 9,678,399 | 12.9241 | 14.7657 | −12.5 % | 24.8893 | 28.7566 | −13.4 %
### macOS (Apple Silicon), C=20 (C=10 at 1M), best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE patched | REVOKE master | Δ
----------+-----------+---------------+--------------+--------+----------------+---------------+--------
20,000 | 299,960 | 0.168 | 0.163 | +3 % | 0.260 | 0.278 | −6 %
40,000 | 519,601 | 0.307 | 0.307 | 0 % | 0.552 | 0.564 | −2 %
100,000 | 959,268 | 0.784 | 0.934 | −16 % | 1.405 | 1.419 | ~0 %
200,000 | 2,058,886 | 1.787 | 1.878 | −5 % | 2.745 | 2.849 | −4 %
500,000 | 4,258,565 | 4.727 | 5.197 | −9 % | 7.126 | 7.908 | −10 %
1,000,000 | 9,758,364 | 10.977 | 11.126 | −1 % | 19.473 | 20.759 | −6 %
Negative Δ = patched faster. Under catalog bloat the patch produces a
consistent, reproducible improvement on both operating systems.
Happy to share the bench scripts and raw logs on request.
Thanks,
charsyam
ON ALL TABLES IN SCHEMA* (collapses 5 per-relkind pg_class heap scans
into 1 scan distributed into per-relkind buckets). Summary below.(It took much time to tests)
## Assumptions
- Two builds of PostgreSQL 19devel from the same source tree (one
patched, one at master tip), identical compile flags, separate
--prefix.
- Separate data directories, run sequentially on an otherwise idle
host.
- GUCs: shared_buffers=2GB, max_locks_per_transaction=100000,
fsync=off, synchronous_commit=off, full_page_writes=off,
autovacuum=off.
- bench_s schema contains N empty tables (CREATE TABLE t_i()).
- Measured operations:
GRANT SELECT ON ALL TABLES IN SCHEMA bench_s TO bench_role
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA bench_s FROM bench_role
- Best of 3 runs reported (seconds).
- Two scenarios:
A. Clean catalog — VACUUM FULL pg_class; VACUUM FULL pg_attribute
immediately before measurement. pg_class
densely packed.
B. Bloated catalog — pre-bloat phase: GRANT+REVOKE on all N tables
repeated C cycles, no VACUUM. Both patched
and master operate on catalogs with identical
relpages and n_dead_tup.
## Results — Scenario A (Clean catalog)
macOS (Apple Silicon), best of 3, seconds:
ntables | GRANT patched | GRANT master | REVOKE patched | REVOKE master
--------+---------------+--------------+----------------+--------------
20,000 | 0.116 | 0.115 | 0.231 | 0.226
40,000 | 0.250 | 0.250 | 0.460 | 0.460
100,000 | 0.730 | 0.678 | 1.193 | 1.193
Honestly, there is no measurable performance difference in the clean
state. Patched and master are statistically indistinguishable within
run-to-run noise. This matches the design of the patch: when pg_class
is densely packed, repeating a small seq scan five times is cheap, so
collapsing it into one has nothing meaningful to save. The patch adds
no overhead either — worst case is a tie.
## Results — Scenario B (Bloated catalog)
### Linux x86_64, C=20, best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE patched | REVOKE master | Δ
----------+-----------+---------------+--------------+---------+----------------+---------------+---------
10,000 | 0 | 0.0924 | 0.0935 | −1.2 % | 0.1668 | 0.1696 | −1.6 %
20,000 | 109,825 | 0.2027 | 0.2069 | −2.0 % | 0.3381 | 0.3533 | −4.3 %
50,000 | 329,468 | 0.5555 | 0.5895 | −5.8 % | 0.8901 | 0.9371 | −5.0 %
100,000 | 879,311 | 1.1732 | 1.1968 | −2.0 % | 1.8808 | 1.9555 | −3.8 %
200,000 | 1,978,925 | 2.2188 | 2.3470 | −5.5 % | 3.7290 | 3.9064 | −4.5 %
500,000 | 4,178,604 | 6.0260 | 6.6663 | −9.6 % | 9.8162 | 10.2169 | −3.9 %
1,000,000 | 9,678,399 | 12.9241 | 14.7657 | −12.5 % | 24.8893 | 28.7566 | −13.4 %
### macOS (Apple Silicon), C=20 (C=10 at 1M), best of 3, seconds
ntables | dead_tup | GRANT patched | GRANT master | Δ | REVOKE patched | REVOKE master | Δ
----------+-----------+---------------+--------------+--------+----------------+---------------+--------
20,000 | 299,960 | 0.168 | 0.163 | +3 % | 0.260 | 0.278 | −6 %
40,000 | 519,601 | 0.307 | 0.307 | 0 % | 0.552 | 0.564 | −2 %
100,000 | 959,268 | 0.784 | 0.934 | −16 % | 1.405 | 1.419 | ~0 %
200,000 | 2,058,886 | 1.787 | 1.878 | −5 % | 2.745 | 2.849 | −4 %
500,000 | 4,258,565 | 4.727 | 5.197 | −9 % | 7.126 | 7.908 | −10 %
1,000,000 | 9,758,364 | 10.977 | 11.126 | −1 % | 19.473 | 20.759 | −6 %
Negative Δ = patched faster. Under catalog bloat the patch produces a
consistent, reproducible improvement on both operating systems.
Happy to share the bench scripts and raw logs on request.
Thanks,
charsyam
2026년 4월 13일 (월) 오전 9:43, Michael Paquier <michael@paquier.xyz>님이 작성:
On Sun, Apr 12, 2026 at 04:22:24PM +0900, CharSyam wrote:
> Benchmark
> ---------
> This is a targeted micro-optimization, not a dramatic speedup.
> With 10,000 tables in a single schema (pg_class ~10,452 rows),
> running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
> (6 iterations, first dropped as warmup), I measured a consistent
> ~15% reduction in end-to-end time:
>
> baseline patched delta
> GRANT (avg) 88.2 ms 75.9 ms -14%
> REVOKE (avg) 134.9 ms 115.7 ms -14%
I am pretty sure that there are users with millions of relations in a
single schema that could benefit from that. At least that would not
be surprising with partitioning these days, and foreign tables. What
kind of numbers do you get if you bump up the number of digits for
these tests. Let's say a comparison based on a few million relations
at least?
The change you are proposing looks simple enough, quickly skimming
through the patch. There may be more optimizations doable here, I
have not looked at that, still I tend to like such micro-optimization
proposals as they provide a silent benefit.
--
Michael
pgsql-hackers by date: