Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Add expressions to pg_restore_extended_stats()
Date
Msg-id CADkLM=fOmCRxb-ssN+e1ZGTmsL-wTzHDD4a012rL=fUYAz-cfg@mail.gmail.com
Whole thread Raw
In response to Re: Add expressions to pg_restore_extended_stats()  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add expressions to pg_restore_extended_stats()
List pgsql-hackers

Not including a trace regarding to which expression a row refers to
sounds like a design mistake to me, particularly because JSON is, by
design, JSON, and we don't have ordering requirements.

I'd have no issue including the expr definition in the existing object container - an optional parameter that allow (no warning) but which has no effect. We could then compare that optional parameter against the text representation of the node and issue a non-failing warning on any differences.

But in a way we're already getting this type checking if the expressions have different datatypes and either of them have most_common_values, histogram_bounds, or most_common_elems - as all of those require input coersion of the array values to datatypes of the expressions, and any failure in any of those dooms the whole exprs array.

It's true that json jumbles orders of keys, but it does not jumble the order of elements inside an array. The 2-D text[] export has exactly this same problem, btw.
 
  If we don't
include an expression text, I'm OK to give up on this idea.  But let's
at least include a negative attribute number with an "attribute"
field.

That feels like the same thing as including the expr definition as a parameter, and I'm as amenable to this as I am to the expr node text.
 
  We could cross-check it with the number of expressions defined
in the statext object.

Yep.
 

On second though, as we already use negative attribute numbers for
ndistinct and dependencies, perhaps it's not a bad choice to use a
negative number anyway.  As the attribute number assigned depends on
the order of the elements in pg_stats_ext.exprs, I'd suggest to tweak
the pg_dump query to rely on that rather than ORDINALITY and the order
where the rows of pg_stats_ext_exprs are scanned.  Using the order of
the elements in the definition of the stats object is predictable.  A
sequential scan of a catalog offers no real guarantees.

This has come up before. pg_stats, infuriatingly, has an attname but not an attnum.

pg_stats_ext_exprs, equally infuriatingly, does not expose pg_statistic_ext.oid, which means to join back from pg_stats_ext_exprs to pg_statistic_ext, we'd have to re-join the namespace and...yeah, it gets yucky.

But instead of doing that, we could fetch the pg_get_statisticsobjdef_expressions() when we first fetch the extstats definitions, use that as a parameter in the dump query, and unnest that with ordinality and join back to pg_stats_ext_expr. I think that's the best way of enforcing the order of the expressions and not just trusting the unnest() in the view definition.

In other news, pg_get_statisticsobjdef_expressions appears to be undocumented, though it exists in v14 when extended stats expressions were introduced, so I think we're safe there.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: IO wait events for COPY FROM/TO PROGRAM or file
Next
From: Chao Li
Date:
Subject: Re: walsender: Assert MyReplicationSlot is set before use