Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Add expressions to pg_restore_extended_stats()
Date
Msg-id CADkLM=dWG-1oYFPYAXpL9UZ-C2VOsQxrtwsgmvL_vuLzAkOedA@mail.gmail.com
Whole thread Raw
In response to Re: Add expressions to pg_restore_extended_stats()  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add expressions to pg_restore_extended_stats()
List pgsql-hackers
>> For this warning in the regression tests, you should only need one
>> element, reducing the number of input lines?
>
> That's true IF we decide that missing expression keys are a thing that we
> allow, and per conversation above it's not clear yet that we do.

If ANALYZE can generate some partial data for a set of expression (aka
generating some data for a portion of the expressions defined but skip
some of them because of reason x_y_z), the restore function needs to
cope with this policy (I'd need to double-check the code to be sure,
not sure on top of my mind now and I cannot do that today,
unfortunately).

In this latest patch, I decided that we can tolerate missing exprs parameters, and as you'll see from the 0002 patch, it does result in considerable decluttering.

Hmm.  I would not rely on the ordering of the items as they are
scanned, that seems like a recipe for disaster.  We have a text
representation of the expression, as of pg_stats_ext_exprs.expr.  This
could be mapped with the elements of hte text[] in pg_stats_ext.exprs.
Here is a wilder idea: why not just putting the expression text itself
in the data given in input of the restore function rather than a
guessed argument number?  For the case of manual stats injections,
that kinds of makes things simpler.

The patch I submitted here hasn't yet implemented using the expression text as a key, and honestly I'm not too hyped on the prospect of trying.

I think that:

1. I don't think we can guarantee that the expression node text is stable across major versions, and that would break upgrades, the primary function of this code.
2. Anyone wanting to modify/hack the exprs values has almost certainly extracted it using the jsonb_build_object() code in pg_dump, so they already have all expressions before editing.
3. Array unnest() has proven to give a stable order in all tests so far.
4. We don't decompose mcv into it's parts, so why do that for exprs?

Yeah, it sounds to me that we should just set ok=false and give up
rather than have a semi-filled set of numbers for a single extended
stats object.  There is an argument in favor of that: it can simplify
the detection of missing stats for a single extended stats definition.
I understand that you'd want to keep going with loading the data even
if it's partial.  My question is: is it possible for ANALYZE to fill
in only a portion of the expressions and can these be partially
skipped?  If the answer to my question is yes, the restore function
should do the same and my idea of the matter is wrong.  If the answer
to my question is no, then your idea on this matter is the right one.

In which case I think you'll like the latest patchset.
 

> 4. Does casting the numeric scalar values (null_frac, correlation,
> avg_width, n_distinct) to text make sense, since we have to put them
> through type-specific input functions anyway?

In terms of JSON, it makes the use of a representation simpler.  I
don't think that we need to apply a strict casting when fetching them.

Everything is jbvStrings now.
 
@Tomas (now added in CC for confirmation): would you see a problem
against applying a JSONB data type to the argument for the restore of
extended stats?  This level of data serialization would be required
when inserting data for what would show up into pg_stats_ext_exprs.

Very interested to hear his thoughts as well.
Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Wake up backends immediately when sync standbys decrease
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: pg_resetwal: Fix wrong directory in log output