Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=dJ3grfy=PuUWgmCAfpGOyoXTJQr2UWo+FqYTimeYm3dg@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
I have attached version 28j as one giant patch covering what was
previously 0001-0003. It's a bit rough (tests in particular need some
work), but it implelements the logic to replace only those values
specified rather than the whole tuple.

I like what you did restoring the parameter enums, especially now that they can be leveraged for the expected type oids data structure.
 
At least for the interactive "set" variants of the functions, I think
it's an improvement. It feels more natural to just change one stat
without wiping out all the others. I realize a lot of the statistics
depend on each other, but the point is not to replace ANALYZE, the
point is to experiment with planner scenarios. What do others think?

When I first heard that was what you wanted to do, I was very uneasy about it. The way you implemented it (one function to wipe out/reset all existing stats, and then the _set_ function works as an upsert) puts my mind at ease. The things I really wanted to avoid were gaps in the stakind array (which can't happen as you wrote it) and getting the stakinds out of order (admittedly that's more a tidiness issue, but pg_statistic before/after fidelity is kept, so I'm happy).

For the "restore" variants, I'm not sure it matters a lot because the
stats will already be empty. If it does matter, we could pretty easily
define the "restore" variants to wipe out existing stats when loading
the table, though I'm not sure if that's a good thing or not.

I agree, and I'm leaning towards doing the clear, because "restore" to me implies that what resides there exactly matches what was in the function call, regardless of what might have been there before. But you're also right, "restore" is expected to be used on default/missing stats, and the restore_* call generated is supposed to be comprehensive of all stats that were there at time of dump/upgrade, so impact would be minimal.
 
I also made more use of FunctionCallInfo structures to communicate
between functions rather than huge parameter lists. I believe that
reduced the line count substantially, and made it easier to transform
the argument pairs in the "restore" variants into the positional
arguments for the "set" variants.

You certainly did, and I see where it pays off given that _set_ / _restore_ functions are just different ways of ordering the shared internal function call.

Observation: there is currently no way to delete a stakind, keeping the rest of the record. It's certainly possible to compose a SQL query that gets the current values, invokes pg_clear_* and then pg_set_* using the values that are meant to be kept, and in fact that pattern is how I imagined the pg_set_* functions would be used when they overwrote everything in the tuple. So I am fine with going forward with this paradigm.

The code mentions that more explanation should be given for the special cases (tsvector, etc) and that explanation is basically "this code follows what the corresponding custom typanalyze() function does". In the future, it may make sense to have custom typimport() functions for datatypes that have a custom typanalzye(), which would solve the issue of handling custom stakinds.

I'll continue to work on this.

p.s. dropping invalid email address from the thread

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: Make query cancellation keys longer
Next
From: Tomas Vondra
Date:
Subject: Re: scalability bottlenecks with (many) partitions (and more)