Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=c9gnYMkQCyh1+R56VrCkVWbu8-SCpmHepkbAJHMcVuuA@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Statistics Import and Export
List pgsql-hackers

* Doc build error and malformatting.

Looking into it.
 
* I'm not certain that we want all changes to relation stats to be non-
transactional. Are there transactional use cases? Should it be an
option? Should it be transactional for pg_set_relation_stats() but non-
transactional for pg_restore_relation_stats()?

It's non-transactional because that's how ANALYZE does it to avoid bloating pg_class. We _could_ do it transactionally, but on restore we'd immediately have a pg_class that was 50% bloat.
 

* The documentation for the pg_set_attribute_stats() still refers to
upgrade scenarios -- shouldn't that be in the
pg_restore_attribute_stats() docs? I imagine the pg_set variant to be
used for ad-hoc planner stuff rather than upgrades.

Noted.
 

* For the "WARNING: stat names must be of type text" I think we need an
ERROR instead. The calling convention of name/value pairs is broken and
we can't safely continue.

They can't be errors, because any one error fails the whole pg_upgrade.
 
* The huge list of "else if (strcmp(statname, mc_freqs_name) == 0) ..."
seems wasteful and hard to read. I think we already discussed this,
what was the reason we can't just use an array to map the arg name to
an arg position type OID?

That was my overreaction to the dislike that the P_argname enum got in previous reviews.

We'd need an array of struct like

argname  (ex. "mc_vals")
argtypeoid  (one of: int, text, real, rea[])
argtypename (name we want to call the argtypeoid (integer, text. real, real[] about covers it).
argpos (position in the arg list of the corresponding pg_set_ function
 

* How much error checking did we decide is appropriate? Do we need to
check that range_length_hist is always specified with range_empty_frac,
or should we just call that the planner's problem if one is specified
and the other not? Similarly, range stats for a non-range type.

I suppose we can let that go, and leave incomplete stat pairs in there.

The big risk is that somebody packs the call with more than 5 statkinds, which would overflow the struct.
 
* I think most of the tests should be of pg_set_*_stats(). For
pg_restore_, we just want to know that it's translating the name/value
pairs reasonably well and throwing WARNINGs when appropriate. Then, for
pg_dump tests, it should exercise pg_restore_*_stats() more completely.

I was afraid you'd suggest that, in which case I'd break up the patch into the pg_sets and the pg_restores.
 
* It might help to clarify which arguments are important (like
n_distinct) vs not. I assume the difference is that it's a non-NULLable
column in pg_statistic.

There are NOT NULL stats...now. They might not be in the future. Does that change your opinion?
 

* Some arguments, like the relid, just seem absolutely required, and
it's weird to just emit a WARNING and return false in that case.

Again, we can't fail.Any one failure breaks pg_upgrade.
 
* To clarify: a return of "true" means all settings were successfully
applied, whereas "false" means that some were applied and some were
unrecognized, correct? Or does it also mean that some recognized
options may not have been applied?

True means "at least some stats were applied. False means "nothing was modified".
 
* pg_set_attribute_stats(): why initialize the output tuple nulls array
to false? It seems like initializing it to true would be safer.
 
+1
 

* please use a better name for "k" and add some error checking to make
sure it doesn't overrun the available slots.

k was an inheritance from analzye.c, from whence the very first version was cribbed. No objection to renaming.

 
* the pg_statistic tuple is always completely replaced, but the way you
can call pg_set_attribute_stats() doesn't imply that -- calling
pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
=> ...) looks like it would just replace the most_common_vals+freqs and
leave histogram_bounds as it was, but it actually clears
histogram_bounds, right? Should we make that work or should we document
better that it doesn't?

That would complicate things. How would we intentionally null-out one stat, while leaving others unchanged? However, this points out that I didn't re-instate the re-definition that applied the NULL defaults.

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: documentation structure
Next
From: Andres Freund
Date:
Subject: Re: Evaluate arguments of correlated SubPlans in the referencing ExprState