Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=eWSv2z_DVB=psfwHitq7DbFJQaP9A46R27eARC-4OYtQ@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Statistics Import and Export
Re: Statistics Import and Export
List pgsql-hackers
On Mon, Feb 24, 2025 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> On 2025-02-24 05:11:48 -0500, Corey Huinker wrote:
>> * relpages/reltuples/relallvisible are now char[32] buffers in RelStatsInfo
>> and nowhere else (existing relpages conversion remains, however)

> I don't see the point. This will use more memory and if we can't get
> conversions between integers and strings right we have much bigger
> problems. The same code was used in the backend too!

I don't like that either.  But there's a bigger problem with 0002:
it's still got mostly table-driven output.  I've been working on
fixing the problem discussed over in the -committers thread about how
we need to identify index-expression columns by number not name [1].

There doesn't seem to be any way around it, but it will slightly complicate the dump-ing side of things, in that we need to either:

a) switch to attnums for index expressions and keep attname calls for everything else.

b) track what the attnum will be on the destination side, which will be different when we're not doing a binary upgrade and there are any preceding dropped columns.

The patch Tom provided opens the door for option "a", and I'm inclined to take it.

 
It's not too awful in the backend (WIP patch attached), but
getting appendAttStatsImport to do it seems like a complete disaster,
and this patch fails to make that any easier.  It'd be much better
if you gave up on that table-driven business and just open-coded the
handling of the successive output values as was discussed upthread.

Can do.


I don't think the table-driven approach has anything to recommend it
anyway.  It requires keeping att_stats_arginfo[] in sync with the
query in getAttStatsExportQuery, an extremely nonobvious (and
undocumented) connection.  Personally I would nuke the separate
getAttStatsExportQuery and appendAttStatsImport functions altogether,
and have one function that executes a query and immediately interprets
the PGresult.

+1, though that comes at the cost of shutting off the possibility of a mass fetch from pg_stats without also rendering the pg_restore_attribute_stats calls at the same time.
 

Also, while working on the attached, I couldn't help forming the
opinion that we'd be better off to nuke pg_set_attribute_stats()
from orbit and require people to use pg_restore_attribute_stats().
pg_set_attribute_stats() would be fine if we had a way to force
people to call it with only named-argument notation, but we don't.
So I'm afraid that its existence will encourage people to rely
on a specific parameter order, and then they'll whine if we
add/remove/reorder parameters, as indeed I had to do below.

They've always had split goals. To recap for people just joining the show, the "set" family had the following properties:

1. transactional, even for pg_class
2. assumes all stats given are relevant and correct for current db version
3. guaranteed to ERROR if any parameter doesn't check out
4. unstable call signature, can and will change to match the realities of the current version
5. intended for planner experiments and fuzzing

and the "restore" family has the following properties:

1. will inplace update pg_class to avoid table bloat
2. states the version from whence the stats came, so that adjustments can be made to suit the current db version, up to and including rejecting that particular statistic
3. attempts to sidestep errors with WARNINGs so as not to kill a restore
4. stable but highly fluid kwargs-ish call signature
5. intended to be machine generated and used only in restore/upgrade

The attnum change certainly throws a wrench into that, and if we get rid of the setter functions then we will need to (re)introduce parameters to indicate our choice for properties 1 and 3. I suppose we could use the existence or non-existence of the "version" parameter as an indicator of which mode we want (if it exists, we want WARNINGS and inplace updates, if not we want pure transactional and ERROR at the first problem), but I'm not certain that proxy will hold true in the future.
 
BTW, I pushed the 0003 patch with minor adjustments.

Thanks! 

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Statistics Import and Export
Next
From: Jeff Davis
Date:
Subject: Re: Statistics Import and Export