Hi,
On 2025-02-21 16:24:38 -0500, Tom Lane wrote:
> Oy. Those are outright horrid, even without any consideration of
> pre-preparing them. We know the OID of the table we want to dump,
> we should be doing "FROM pg_class WHERE oid = whatever" and lose
> the join to pg_namespace altogether. The explicit casts to regclass
> are quite expensive too to fetch information that pg_dump already
> has. It already knows the server version, too.
> Moreover, the first of these shouldn't be a separate query at all.
> I objected to fetching pg_statistic content for all tables at once,
> but relpages/reltuples/relallvisible is a pretty small amount of
> new info. We should just collect those fields as part of getTables'
> main query of pg_class (which, indeed, is already fetching relpages).
> On the second one, if we want to go through the pg_stats view then
> we can't rely on table OID, but I don't see why we need the joins
> to anything else. "WHERE s.schemaname = 'x' AND s.tablename = 'y'"
> seems sufficient.
Agreed on all those.
> I wonder whether we ought to issue different queries depending on
> whether we're superuser. The pg_stats view is rather expensive
> because of its security restrictions, and if we're superuser we
> could just look directly at pg_statistic. Maybe those checks are
> fast enough not to matter, but ...
It doesn't seem to make much of a difference, from what I can tell.
At execution time most of the time is is in
a) the joins to pg_attribute and pg_class (the ones in pg_stats)
b) array_out().
The times get way worse if you dump stats for catalog tables, because there
some of arrays are regproc and regprocout calls FuncnameGetCandidates(), which
then ends up iterating over a long cached list... I think that's basically
O(N^2)?
Of course that's nothing we should encounter frequently, but ugh.
Greetings,
Andres Freund