Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Msg-id CADkLM=f1n2_Vomq0gKab7xdxDHmJGgn=DE48P8fzQOp3Mrs1Qg@mail.gmail.com
In response to Re: Statistics Import and Export  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

> > I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
> > suggesting we back the joins from pg_stats to pg_namespace and pg_class and
> > then filter by oids?

> I was thinking of one query per schema or something like that. But yea, a
> query to pg_namespace and pg_class wouldn't be a problem if we did it far
> fewer times than before. Or you could put the list of catalogs / tables to
> be queried into an unnest() with two arrays or such.
>
> Not sure how good the query plan for that would be, but it may be worth
> looking at.

Ok, so we're willing to take the pg_class/pg_namespace join hit for one or a handful of queries. Good to know.
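
To make sure I understand the unnest() idea, here's roughly what I picture (untested sketch; the pg_stats column list and the array literals are illustrative only):

/*
 * Untested sketch: fetch pg_stats for a batch of relations in one round
 * trip by passing the schema and table names as parallel text arrays.
 */
#include <libpq-fe.h>

static PGresult *
fetch_stats_batch(PGconn *conn)
{
    const char *query =
        "SELECT s.schemaname, s.tablename, s.attname, s.null_frac "
        "FROM unnest($1::text[], $2::text[]) AS u(schemaname, tablename) "
        "JOIN pg_catalog.pg_stats s USING (schemaname, tablename) "
        "ORDER BY s.schemaname, s.tablename";
    const char *params[2] = {
        "{public,public}",      /* schemaname for each relation */
        "{foo,bar}"             /* matching tablename values */
    };

    return PQexecParams(conn, query, 2, NULL, params, NULL, NULL, 0);
}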
 
> > Each call to getAttributeStats() fetches the pg_stats for one and only one
> > relation and then writes the SQL call to fout, then discards the result set
> > once all the attributes of the relation are done.

> I don't think that's true. For one, my example demonstrated that it increases
> the peak memory usage substantially. That'd not be the case if the data was
> just written out to stdout or such.
>
> Looking at the code confirms that. The ArchiveEntry() in dumpRelationStats()
> is never freed, afaict. And ArchiveEntry() strdups ->createStmt, which
> contains the "SELECT pg_restore_attribute_stats(...)".
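
If I'm reading that right, the pattern is roughly this (simplified sketch of my understanding, not the actual pg_dump code):

/*
 * Simplified sketch, not the actual pg_dump code: each relation's restore
 * SQL is strdup'd into a TOC entry that lives until the archive is closed,
 * so peak memory grows with the number of relations dumped.
 */
#include <stdlib.h>
#include <string.h>

typedef struct TocEntrySketch
{
    char *createStmt;               /* "SELECT pg_restore_attribute_stats(...)" */
    struct TocEntrySketch *next;
} TocEntrySketch;

static TocEntrySketch *toc_head = NULL;

static void
archive_entry_sketch(const char *createStmt)
{
    TocEntrySketch *te = malloc(sizeof(TocEntrySketch));

    te->createStmt = strdup(createStmt);    /* retained until program exit */
    te->next = toc_head;
    toc_head = te;
}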

Pardon my inexperience, but aren't the ArchiveEntry records needed right up until the end of the program's run? If there's value in freeing them, why isn't that being done already? What else would consume the freed memory?

> > > I don't think the query itself would be a problem, a query querying all the
> > > required stats should probably use PQsetSingleRowMode() or
> > > PQsetChunkedRowsMode().
> >
> > That makes sense if we get the attribute stats from the result set in the
> > order that we need them, and I don't know how we could possibly do that.
> > We'd still need a table to bsearch() and that would be huge.

> I'm not following - what would be the problem with a bsearch()? Compared to
> the stats data, an array to map from oid to an index in an array of stats
> data would be very small.

If we can do oid bsearch lookups, then we might be in business, but even then we have to maintain an in-memory data structure of all the pg_stats records relevant to this dump: either in PGresult form, in an intermediate data structure like tblinfo/indxinfo, or as the resolved strings of pg_restore_attribute_stats() calls for each relation...which then get strdup'd into the ArchiveEntry that we have to maintain anyway.
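
For the sake of argument, the map itself is easy enough (untested sketch; the struct and function names are hypothetical, not pg_dump's):

/*
 * Untested sketch: map a relation OID to an index into a parallel array
 * of per-relation stats data, using the libc qsort()/bsearch() pair.
 */
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct OidIndexEntry
{
    Oid reloid;     /* relation OID */
    int statsIdx;   /* index into the stats array */
} OidIndexEntry;

static int
oid_index_cmp(const void *a, const void *b)
{
    Oid oa = ((const OidIndexEntry *) a)->reloid;
    Oid ob = ((const OidIndexEntry *) b)->reloid;

    return (oa > ob) - (oa < ob);
}

/* map must be sorted once with
 * qsort(map, nrels, sizeof(OidIndexEntry), oid_index_cmp)
 * before any lookups */
static int
lookup_stats_index(const OidIndexEntry *map, size_t nrels, Oid target)
{
    OidIndexEntry key = { target, 0 };
    const OidIndexEntry *hit;

    hit = bsearch(&key, map, nrels, sizeof(OidIndexEntry), oid_index_cmp);
    return hit ? hit->statsIdx : -1;
}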
 


> But with the unnest() idea from above it wouldn't even be needed, you could
> use
>
> SELECT ...
> FROM unnest(schema_array, table_array) WITH ORDINALITY AS src(schemaname, tablename)
> ...
> ORDER BY ordinality
>
> or something along those lines.

This still assumes that we have some ability to generate a batch of these rows, discard them, and then go on to the next logical batch (perhaps by schema, as you suggested earlier), and I don't know that we have that freedom. Perhaps we would if stats were the absolute last thing loaded in a dump.
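
If we did have that freedom, I imagine the consuming side would look something like this (untested sketch; assumes the v17 libpq chunked mode you mentioned, and emit_stats_row() is a hypothetical consumer):

/*
 * Untested sketch: stream the ordered stats result in chunks so the whole
 * result set is never held in memory at once.  Requires PostgreSQL 17
 * libpq for PQsetChunkedRowsMode().
 */
#include <libpq-fe.h>

extern void emit_stats_row(PGresult *res, int row);    /* hypothetical */

static void
stream_stats(PGconn *conn, const char *query)
{
    PGresult *res;

    if (!PQsendQuery(conn, query))
        return;
    if (!PQsetChunkedRowsMode(conn, 1000))  /* up to 1000 rows per PGresult */
        return;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
        {
            for (int i = 0; i < PQntuples(res); i++)
                emit_stats_row(res, i);
        }
        PQclear(res);                       /* discard the batch, move on */
    }
}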


Anyway, here's a rebased set of the existing up-for-consideration patches, plus the optimization of avoiding querying stats for non-expression indexes.

I should add that this set presently doesn't include a patch that reverts the setlocale()/strtof() calls in favor of storing reltuples as a string. As far as I know that idea is still on the table.
