Re: Statistics Import and Export - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Statistics Import and Export
Date:
Msg-id: x5yo5ly5u3dhzfaq3hrzeqhju3io5zmwk4yqrkge3ywcflffck@ghbbgsfv56yb
In response to: Re: Statistics Import and Export (Andres Freund <andres@anarazel.de>)
Responses: Re: Statistics Import and Export
List: pgsql-hackers
Hi,

On 2025-02-21 15:23:00 -0500, Andres Freund wrote:
> On 2025-02-20 01:39:34 -0800, Jeff Davis wrote:
> > Committed with some revisions on top of v48:
>
> This made the pg_upgrade tests considerably slower.
>
> In an assert build without optimization (since that's what I use for normal
> dev work):
>
> 1fd1bd87101^  65.03s
> 1fd1bd87101   86.84s
>
> Looking at the times in the regress_log, I see:
> [...]
> Which to me rather strongly suggests pg_dump has gotten a *lot* slower with
> this change.

Indeed. While the slowdown is worse with assertions and without compiler
optimizations, it's pretty bad otherwise too.

Optimized, non-cassert pg_dump and server, with the regression database
contents:

$ time ./src/bin/pg_dump/pg_dump regression > /dev/null
real    0m1.314s
user    0m0.189s
sys     0m0.059s

$ time ./src/bin/pg_dump/pg_dump --no-statistics regression > /dev/null
real    0m0.472s
user    0m0.179s
sys     0m0.035s

Unoptimized, cassert server and pg_dump:

$ time ./src/bin/pg_dump/pg_dump regression > /dev/null
real    0m9.008s
user    0m0.396s
sys     0m0.108s

$ time ./src/bin/pg_dump/pg_dump --no-statistics regression > /dev/null
real    0m2.590s
user    0m0.347s
sys     0m0.037s

Looking at the query log, the biggest culprit is a *lot* of additional
queries, I think primarily these two:

SELECT c.oid::regclass AS relation,
       current_setting('server_version_num') AS version,
       c.relpages, c.reltuples, c.relallvisible
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relname = 'alpha_neg_p2'

SELECT c.oid::regclass AS relation, s.attname, s.inherited,
       current_setting('server_version_num') AS version,
       s.null_frac, s.avg_width, s.n_distinct,
       s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
       s.correlation, s.most_common_elems, s.most_common_elem_freqs,
       s.elem_count_histogram, s.range_length_histogram,
       s.range_empty_frac, s.range_bounds_histogram
FROM pg_stats s
JOIN pg_namespace n ON n.nspname = s.schemaname
JOIN pg_class c ON c.relname = s.tablename AND c.relnamespace = n.oid
WHERE s.schemaname = 'public' AND s.tablename = 'alpha_neg_p2'
ORDER BY s.attname, s.inherited

I think there are a few things wrong here:

1) Why do we need to plan this over and over? Tom a while ago put in a fair
   bit of work to make frequent queries use prepared statements. In this case
   we spend more time replanning the query than executing it. (First sketch
   below.)

2) Querying this one-by-one makes this much more expensive than if it were
   queried in a batched fashion, for multiple tables at once. This is
   especially true if actually executed over a network, rather than locally.
   (Second sketch below.)

3) The query is unnecessarily expensive due to repeated joins gathering the
   same information. pg_stats has a join to pg_namespace and pg_class, but
   then the query above joins to both *again*. And afaict the joins in the
   pg_stats query are pretty useless? Isn't all that information already
   available in pg_stats? I guess you did that to get it as a ::regclass, but
   isn't that already known? Why requery it? (Third sketch below.)

4) Why do we need to fetch the version twice for every table? That can't be
   right - it won't change while pg_dump is running. (Fourth sketch below.)

Greetings,

Andres Freund
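For (1), a minimal sketch of the prepared-statement approach, expressed at
the SQL level (pg_dump would presumably use the equivalent libpq calls,
PQprepare()/PQexecPrepared(); the statement name and the second table name
here are illustrative):

-- Parse and plan once per connection.
PREPARE relstats(name, name) AS
    SELECT c.oid::regclass AS relation,
           c.relpages, c.reltuples, c.relallvisible
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = $1 AND c.relname = $2;

-- Each further table then costs only an execute, not a fresh parse + plan.
EXECUTE relstats('public', 'alpha_neg_p2');
EXECUTE relstats('public', 'alpha_neg_p3');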
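For (2), one way to batch would be to fetch the relation-level stats for many
tables in a single query and split the result client-side. A sketch, with an
illustrative table list:

SELECT c.oid::regclass AS relation,
       c.relpages, c.reltuples, c.relallvisible
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relname = ANY (ARRAY['alpha_neg_p1', 'alpha_neg_p2', 'alpha_neg_p3']);

That is one round trip instead of one per table, which matters all the more
when the dump runs over a network connection.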
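For (3), the re-joins look simply droppable: pg_stats already exposes
schemaname and tablename, and pg_dump already knows which relation it is
asking about, so nothing needs to be looked up again. A sketch, with a
trimmed column list:

SELECT s.attname, s.inherited,
       s.null_frac, s.avg_width, s.n_distinct,
       s.most_common_vals, s.most_common_freqs,
       s.histogram_bounds, s.correlation
FROM pg_stats s
WHERE s.schemaname = 'public' AND s.tablename = 'alpha_neg_p2'
ORDER BY s.attname, s.inherited;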
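For (4), the server version is a property of the connection, so it can be
fetched once up front and cached client-side - libpq even exposes it without
any query, via PQserverVersion(). At the SQL level:

-- Once per connection, not twice per table:
SELECT current_setting('server_version_num');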