Re: Statistics Import and Export - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Statistics Import and Export
Date
Msg-id blezfpeafycxlizyynwvwzh2vywmklhvfqudhicjrccu4raqpx@4ttfufowc2vo
Whole thread Raw
In response to Re: Statistics Import and Export  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Statistics Import and Export
List pgsql-hackers
Hi,

On 2025-02-21 16:24:38 -0500, Tom Lane wrote:
> Oy.  Those are outright horrid, even without any consideration of
> pre-preparing them.  We know the OID of the table we want to dump,
> we should be doing "FROM pg_class WHERE oid = whatever" and lose
> the join to pg_namespace altogether.  The explicit casts to regclass
> are quite expensive too to fetch information that pg_dump already
> has.  It already knows the server version, too.

> Moreover, the first of these shouldn't be a separate query at all.
> I objected to fetching pg_statistic content for all tables at once,
> but relpages/reltuples/relallvisible is a pretty small amount of
> new info.  We should just collect those fields as part of getTables'
> main query of pg_class (which, indeed, is already fetching relpages).

> On the second one, if we want to go through the pg_stats view then
> we can't rely on table OID, but I don't see why we need the joins
> to anything else.  "WHERE s.schemaname = 'x' AND s.tablename = 'y'"
> seems sufficient.

Agreed on all those.


> I wonder whether we ought to issue different queries depending on
> whether we're superuser.  The pg_stats view is rather expensive
> because of its security restrictions, and if we're superuser we
> could just look directly at pg_statistic.  Maybe those checks are
> fast enough not to matter, but ...

It doesn't seem to make much of a difference, from what I can tell.

At execution time most of the time is is in
a) the joins to pg_attribute and pg_class (the ones in pg_stats)
b) array_out().


The times get way worse if you dump stats for catalog tables, because there
some of arrays are regproc and regprocout calls FuncnameGetCandidates(), which
then ends up iterating over a long cached list... I think that's basically
O(N^2)?

Of course that's nothing we should encounter frequently, but ugh.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Greg Sabino Mullane
Date:
Subject: PATCH: warn about, and deprecate, clear text passwords
Next
From: Tom Lane
Date:
Subject: Re: Statistics Import and Export