Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Add expressions to pg_restore_extended_stats()
Date
Msg-id CADkLM=cuvJBNCQ7Ge7zrc52s_ci=g=FbXiOum3i86MCUpSNEDQ@mail.gmail.com
Whole thread Raw
In response to Re: Add expressions to pg_restore_extended_stats()  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add expressions to pg_restore_extended_stats()
List pgsql-hackers
My initial read of statext_expressions_load() was actually a bit
wrong, examine_variable() will always go through the load(), but if
serialize_expr_stats() has skipped one expression due to invalid data,
we just store a "(Datum) 0" aka NULL in the array.  So there is some
data, and that also means that we will always have a number of
elements equal to the number of expressions (accumArrayResult adds the
value).  Sorry for the confusion!

That is an enormous relief.
 

It also means that it should be possible to write
import_pg_statistic() so as it does the same thing as the
serialization.  How about just allowing a NULL in the JSON array as an
equivalent of no data found for an expression?  If you do that, your
initial idea about requiring the same number of elements in the array
would be consistent with the backend, without having to include any
information about the expression text or its attnum in the elements of
the JSON array.

Whew.
 

That would require a couple of extra lines in
import_pg_statistic(), so as accumArrayResult() could be called with
disnull=true to do the same as the backend, with "ok" set to true to
count for the fact that it is a valid pattern.

+1 
 
> - An element in the input array should have all its key/value pairs
>> set, if set for an expression.
>>
>
> That won't fly, because there will be pg_dumps from prior versions that
> won't know about future keys which would then be required.

Okay, point taken.

We also get *much* shorter regression tests if allow values irrelevant to the test to be excluded.
 

> It does, but I'm not seeing how we line up the right stats to the right
> element in the event of an undercount. I've been digging around in the
> planner, and it seems like it just takes the
> Anum_pg_statistic_ext_data_stxdexpr, expands it, and then
> statext_expressions_load just assumes that the array will have something at
> that index. I hope I'm misreading this, because we may have uncovered
> something broken.

It looks like your take is right and that my first impression was
wrong: if an expression has invalid data accumArrayResult() stores a
NULL value, so we will never be short.  So there is some data, and
your patch could be tweaked so as an expression in a set is marked as
invalid by storing a NULL in the result array at its location.

I'll switch to adding the nulls to the array result, and add tests for both leading and trailing expr missing.

Mild change of subject, it seems that we can't get the expression fake attnum context into the errors we re-throw in statatt_build_stavalues - it might make sense to to bring a version of that function into extended_stats_funcs where we can add the extra parameters for context (and avoid the need for a text datum version of some longish strings that we've already just converted from converting json-string to c-string. If I did make a new function, then that'd be 2 statatt_* functions that no longer need to be visible outside of attribute_stats.c. Thoughts on both making the new function, and maybe sending a few of these statatts back to static-land?


 

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: refactor architecture-specific popcount code
Next
From: Manni Wood
Date:
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD