Re: Re: Extended Statistics set/restore/clear functions. - Mailing list pgsql-hackers
| From | Corey Huinker |
|---|---|
| Subject | Re: Re: Extended Statistics set/restore/clear functions. |
| Date | |
| Msg-id | CADkLM=csMd52i39Ye8-PUUHyzBb3546eSCUTh-FBQ7bzT2uZ4Q@mail.gmail.com Whole thread Raw |
| In response to | Re: Re: Extended Statistics set/restore/clear functions. (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |
Indeed. These exact typos existed in some of the previous iterations
of the patch posted on this thread.
+1
1) statatt_get_type(), that retrieves a set of information from a
relation and one of its attributes. This is lightly documented in
extended_statistics_update() from 0002, with an argument claiming that
we do the same as attribute statistics. If one then looks at
attribute_stats.c, we have a not-really-helpful "derive information
from attribute". The trick is that we have an implied dependency
here: statatt_get_type() needs to be called *before*
statatt_get_elem_type() for attribute *and* extended stats,
statatt_get_elem_type() being fed data from statatt_get_type().
I've added comments to that effect.
2) statatt_get_elem_type(), interesting piece of the puzzle that acts
as a wrapper for the type cache. An important part here, it seems to
me, would be to at least tell that this relies on data retrieved from
statatt_get_type(), initially. Once I've noticed this dependency the
logic felt a bit cleaner to understand, but we have no docs explaining
any of that..
I've added comments that will hopefully connect those dots.
3) statatt_init_empty_tuple(), that initializes a set of data to be
used for updates and/or insertion. My first thought here is about
...
not apply to extended stats. But one would have to guess this fact
after knowing that this relates to the catalog pg_statistic..
The comments didn't seem quite so illuminating, but I added them just the same.
4) statatt_set_slot is slightly simpler. Still here, it is really easy
to miss that this routine should be fed data retrieved by
statatt_get_type(). staop can also be an eq or an lt operator.
stakind is one of the STATISTIC_KIND_* values, etc, etc.
Some here too.
As far as I can see, there is nothing in these functions that require
them to be located in attribute_stats.c anymore. Let's move that to
stats_utils.c, common place for all the shared facilities used by
these stats modules. Or it could also be a new, separate patch, for
the manipulation and extraction of the tuple data related to
the pg_statistic tuples.
They've been moved to stats_utils.c, and impact was fairly minimal. I was worried about an explosion of #includes, but that ended up not being the case.
Perhaps you know what these imply because you have written this code
originally in ce207d2a7901, but exposing them also means that it is
very important to document at least some concepts and assumptions
around them so as it is possible to guide somebody that may want to
use this code:
- What are the input arguments from?
- And what are the results?
- In which circumstances should these be used?
An attempt was made.
Then about 0002.
There are a bunch of assumptions embedded inside import_expressions()
that are interdependent with the calls of these attribute routines
...
each STATISTIC_KIND_*. Too much is let to the reader to guess, making
the code complicated to maintain as written, at least IMHO.
I tried to expand on these things, but I haven't yet moved them to another file for reasons I'll explain later.
Is there any need to locate these new functions and code in
extended_stats.c, actually? A separation into a new file seems like
it would be a cleaner result, leaving extended_stats.c to deal with
the import of the new data, including the fetch and deletion of any
existing data in pg_stat_ext that would be updated or inserted.
Perhaps name that extended_stats_funcs.c, as these are about the
direct SQL functions, import and deletion.
We can move them to another file, but I'm not sure if that will actually make things clearer. What is extended_stats.c if not functions about extended stats? Perhaps it makes more sense to try harder to find the commonality in the building of the pg_statistic tuples, move those functions to stats_utils, and get things decluttered that way?
For the tests, it looks like it would be better to have everything in
a new file, like a stats_ext_import.sql for the clear and restore
bits.
Those tests very much build upon the tables and data already created for regular relation/attribute imports, so we save quite a bit of boilerplate in keeping them there.
A lot of the tests are copy-pastes of surrounding queries. Perhaps it
would be better to use a SQL function wrapper or an IN clause with
multiple relations?
I'd say "no", because we're doing a lot of things that *could* generate ERRORs, and we're verifying that they generate WARNINGs instead. If we did batch up those commands, and any one of them generated an error, we'd be at loss for which one was the culprit, and we'd also lose any insight into what else got busted in the other rows of the batch.
The patch has a lot of bloat with these repeated
queries.. Perhaps we should use a split for the elements in the
extended stats array instead of jsonb_pretty(). The results get quite
long here.
I replaced '}, ' with '},\n' and removed the jsonb_pretty entirely, which is something I had proposed doing earlier, as it's actually more pretty than jsonb_pretty(). It's still in the same file, though.
I like the net result of this, and it might make sense to search other regression tests for places where we can employ this mini-pretty pattern.
As of 0002, there are actually two independent pieces dealt with: the
deletion of extended stats with pg_clear_extended_stats(), and the
insert/update of extended stats with pg_restore_extended_stats(). I'd
suggest to split that into two parts, with the clear being first in
rank. The deletion is simpler, and getting that in first simplifies
the review of the import part with less input to deal with.
The pg_clear function is quite trivial, as it's basically just marshalling and locking before a call to delete_pg_statistic_ext_data(). So while I can split them out, I'd also have to split the pg_proc.dat changes into two separate ones, which makes it hard to see how similar they are. I have a similar feeling about the changes to func-admin.sgml.
All of that, plus a rebase.
Attachment
pgsql-hackers by date: