Re: Extended Statistics set/restore/clear functions. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Extended Statistics set/restore/clear functions.
Date
Msg-id CADkLM=ei_07zDAyTrabZRc1tbRwzWLBd5yaP1zzqBTXxpvS-SA@mail.gmail.com
Whole thread Raw
In response to Re: Extended Statistics set/restore/clear functions.  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Extended Statistics set/restore/clear functions.
List pgsql-hackers
On Mon, Nov 17, 2025 at 1:56 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Nov 14, 2025 at 03:25:27PM +0900, Michael Paquier wrote:
> Thanks for the new versions, I'll also look at all these across the
> next couple of days.  Probably not at 0005~ for now.

0001 and 0002 from series v13 have been applied to change the output
functions.

Thanks!

Though, I was thinking some more about the output format. Using jsonb_pretty() makes it readable in one way, and very clumsy in other ways. Instead, I'm going to try doing the following:

replace (ndist_string_value, '},', E'}\n,')

This will result in the output value being formatted in exactly the way described in the commit messages.

Of course, we could make the the actual default format by changing appendStringInfoString(&str, ", ") instead.

One might argue that the output shouldn't get too flowery, but we're already adding spaces between items and array elements, and we've already made extensive changes favoring readability over compactness.


Review of changes to input function patches:

I'm curious about the re-parameterization of error messages involving PG_NDISTINCT_KEY_ATTRIBUTES, PG_NDISTINCT_KEY_NDISTINCT, and similar keys in dependencies. I like the parameterized version better, and was confused as to why it was removed. Did you change your mind, or was it done for ease of translation?

Wording changes all look good. Use of pg_input_error_info() makes sense.

 

There is an extra thing that bugs me as incorrect for the pg_ndistinct
input, something I have not tackled myself yet.  Your patch checks
that subsets of attributes are included in the longest set found, but
it does not match the guarantees we have in mvndistinct.c: we have to
check that *all* the combinations generated by generator_init() are
satisfied based on the longest of attributes detected.  For example,
this is thought as correct in the input function:
SELECT '[{"attributes" : [-1,2], "ndistinct" : 1},
         {"attributes" : [-1,2,3], "ndistinct" : 3}]'::pg_ndistinct;

However it is obviously not correct as we are missing an element for
the attributes [-1, 3].  The simplest solution would be to export the
routines that generate the groups now in mvndistinct.c.  Also we
should make sure that the number of elements in the arrays match with
the number of groups we expect, not only the elements.  I don't think
that we need to care much about the values, but we ought to provide
stronger guarantees for the attributes listed in these elements.

I had a feeling that was going to be requested. My question would be if that we want to stick to modeling the other combinations after the first longest combination, last longest, or if we want to defer those checks altogether until we have to validate against an actual stats object?
 

Except for this argument, the input of pg_ndistinct feels OK in terms
of the guarantees that we'd want to enforce on an import.  The same
argument applies in terms of attribute number guarantees for
pg_dependencies, based on DependencyGenerator_init() & friends in
dependencies.c.  Could you look at that?

Yes. I had already looked at it to verify that _all_ combinations were always generated (they are), because I had some vague memory of the generator dropping combinations that were statistically insignificant. In retrospect, I have no idea where I got that idea.

 

For pg_dependencies, we also require some checks on the value for
"dependency", of course, making sure that this matches with what's
expected with the "largest" sets of attributes.  In this case, we need
to track the union of "dependency" and "attributes", with "attributes"
having at least one element.

This is fairly simple to do. The dependency attnum is just appended to the list of attnums, and the combinations are generated the same as ndistinct, though obviously there are no single elements.

There's probably some common code between the lists to be shared, differing only in how they report missing combinations.

 
The tests of pg_dependencies need also to be extended more (begun that
a bit, far from being complete and I'm lacking of time this week due
to a conference).  One thing that I would add are nested JSON objects
in the paths where we expect values, for example.  Please note that I
have done a brush of 0004, while on it, cleaning up typos,
inconsistencies and making the error codes consistent with the
ndistinct case where possible.  This is not ready, but that's at least
it's a start to rely on.

 

In terms of committable bits, it would be better to apply the input
functions once both parts are ready to go.  For now I am attached a
v14 with the work I've put into them.  0005~ are not reviewed yet, as
mentioned previously.  The changes in pg_dependencies are actually
straight-forward to figure out (well, mostly) once the pg_ndistinct
changes are OK in shape.
--
Michael

+1 

pgsql-hackers by date:

Previous
From: ocean_li_996
Date:
Subject: minor improvement in snapbuild: use existing interface and remove fake code
Next
From: Tomas Vondra
Date:
Subject: Re: Adding basic NUMA awareness