Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | rp6int5sqs5bn7g35brqz7w3t7il6tn2dgpm7tr3pev626wnc4@fi7a3enofhct Whole thread Raw |
In response to | Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>) |
Responses |
Re: Statistics Import and Export
Re: Statistics Import and Export |
List | pgsql-hackers |
Hi, On 2025-02-25 21:29:56 -0500, Corey Huinker wrote: > On Tue, Feb 25, 2025 at 9:00 PM Jeff Davis <pgsql@j-davis.com> wrote: > > > On Mon, 2025-02-24 at 09:54 -0500, Andres Freund wrote: > > > Have you compared performance of with/without stats after these > > > optimizations? > > > > On unoptimized build with asserts enabled, dumping the regression > > database: > > > > --no-statistics: 1.0s > > master: 3.6s > > v3j-0001: 3.0s > > v3j-0002: 1.7s > > > > I plan to commit the patches soon. I think these have all been committed, but I still see a larger performance difference than what you observed. I just checked because I was noticing that the tests are still considerably slower than they used to be. An optimized pg_dump against an unoptimized assert-enabled server: time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync regression >/dev/null real 0m2.778s user 0m0.167s sys 0m0.057s $ time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync --no-statisticsregression > /dev/null real 0m1.290s user 0m0.097s sys 0m0.026s I thought it might be interesting to look at the set of queries arriving on the server side, so I enabled pg-stat_statements and ran a dump: regression[4041753][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM pg_stat_statementsORDER BY calls DESC LIMIT 15; ┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐ │ total_exec_time │ total_plan_time │ calls │ plans │ substring │ ├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤ │ 239.9672189999998 │ 12.5725 │ 981 │ 6 │ PREPARE getAttributeStats(pg_c │ │ 15.330405000000004 │ 1.836712 │ 282 │ 6 │ PREPARE dumpFunc(pg_catalog.oi │ │ 10.129114000000003 │ 0.39834800000000004 │ 199 │ 6 │ PREPARE dumpTableAttach(pg_cat │ │ 9.887489000000002 │ 0.9332620000000001 │ 84 │ 84 │ SELECT pg_get_partkeydef($1) │ │ 14.350725000000006 │ 0.691071 │ 60 │ 60 │ SELECT pg_catalog.pg_get_viewd │ │ 5.1174219999999995 │ 1.4604219999999999 │ 47 │ 6 │ PREPARE dumpAgg(pg_catalog.oid │ │ 0.24036199999999996 │ 0.545125 │ 41 │ 41 │ SELECT pg_catalog.format_type( │ │ 7.099635000000002 │ 0.47031800000000007 │ 39 │ 39 │ SELECT pg_catalog.pg_get_ruled │ │ 0.672752 │ 1.9036320000000002 │ 21 │ 6 │ PREPARE dumpDomain(pg_catalog. │ │ 1.6519299999999997 │ 3.1480380000000006 │ 21 │ 22 │ PREPARE getDomainConstraints(p │ │ 1.085548 │ 3.9647630000000005 │ 16 │ 6 │ PREPARE dumpCompositeType(pg_c │ │ 0.196259 │ 0.602291 │ 11 │ 6 │ PREPARE dumpOpr(pg_catalog.oid │ │ 0.265461 │ 4.428914 │ 10 │ 10 │ SELECT amprocnum, amproc::pg_c │ │ 0.39591399999999993 │ 9.345471 │ 10 │ 10 │ SELECT amopstrategy, amopopr:: │ │ 0.35752100000000003 │ 2.128437 │ 9 │ 9 │ SELECT nspname, tmplname FROM │ └─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘ It looks a lot less bad with an optimized build: regression[4042057][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM pg_stat_statementsORDER BY calls DESC LIMIT 15; ┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐ │ total_exec_time │ total_plan_time │ calls │ plans │ substring │ ├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤ │ 50.63764299999999 │ 2.503585 │ 981 │ 6 │ PREPARE getAttributeStats(pg_c │ │ 3.5241990000000007 │ 0.478541 │ 282 │ 6 │ PREPARE dumpFunc(pg_catalog.oi │ │ 2.3170359999999985 │ 0.126379 │ 199 │ 6 │ PREPARE dumpTableAttach(pg_cat │ │ 2.291331 │ 0.25360400000000005 │ 84 │ 84 │ SELECT pg_get_partkeydef($1) │ │ 4.678433000000003 │ 0.202578 │ 60 │ 60 │ SELECT pg_catalog.pg_get_viewd │ │ 1.1288440000000004 │ 0.30976200000000004 │ 47 │ 6 │ PREPARE dumpAgg(pg_catalog.oid │ │ 0.06619 │ 0.16813600000000004 │ 41 │ 41 │ SELECT pg_catalog.format_type( │ │ 2.102865 │ 0.115169 │ 39 │ 39 │ SELECT pg_catalog.pg_get_ruled │ │ 0.16163 │ 0.439991 │ 21 │ 6 │ PREPARE dumpDomain(pg_catalog. │ │ 0.5335120000000001 │ 0.727573 │ 21 │ 22 │ PREPARE getDomainConstraints(p │ │ 0.28177 │ 0.894156 │ 16 │ 6 │ PREPARE dumpCompositeType(pg_c │ │ 0.038558 │ 0.140807 │ 11 │ 6 │ PREPARE dumpOpr(pg_catalog.oid │ │ 0.082078 │ 0.9654280000000001 │ 10 │ 10 │ SELECT amprocnum, amproc::pg_c │ │ 0.136964 │ 2.1140120000000002 │ 10 │ 10 │ SELECT amopstrategy, amopopr:: │ │ 0.11634699999999999 │ 0.48550499999999996 │ 9 │ 9 │ SELECT nspname, tmplname FROM │ └─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘ (15 rows) This isn't even *remotely* an adversarial case, there are lots of workloads with folks have a handful of indexes on each table and many many tables. Right now --statistics more than doubles the number of queries that pg_dump issues. That's oviously noticeable locally, but it's going to be really noticeable when dumping across the network. I think we need to do more to lessen the impact. Even leaving regression test performance aside, the time increase for the default pg_dump invocation will be painful for folks, particularly due to this being enabled by default. One fairly easy win would be to stop issuing getAttributeStats() for non-expression indexes. In most cases that'll already drastically cut down on the extra queries. Greetings, Andres Freund
pgsql-hackers by date: