Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=dB-qbU81LuAQVSQxAabYXghXe6ptaatTYmQAzVT44V_g@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Statistics Import and Export
Re: Statistics Import and Export
Re: Statistics Import and Export
List pgsql-hackers


Having some discussion around that would be useful.  Is it better to
have a situation where there are stats for some columns but no stats for
other columns?  There would be a good chance that this would lead to a
set of queries that were properly planned out and a set which end up
with unexpected and likely poor query plans due to lack of stats.
Arguably that's better overall, but either way an ANALYZE needs to be
done to address the lack of stats for those columns and then that
ANALYZE is going to blow away whatever stats got loaded previously
anyway and all we did with a partial stats load was maybe have a subset
of queries have better plans in the interim, after having expended the
cost to try and individually load the stats and dealing with the case of
some of them succeeding and some failing.

It is my (incomplete and entirely second-hand) understanding is that pg_upgrade doesn't STOP autovacuum, but sets a delay to a very long value and then resets it on completion, presumably because analyzing a table before its data is loaded and indexes are created would just be a waste of time.

 

Overall, I'd suggest we wait to see what Corey comes up with in terms of
doing the stats load for all attributes in a single function call,
perhaps using the VALUES construct as you suggested up-thread, and then
we can contemplate if that's clean enough to work or if it's so grotty
that the better plan would be to do per-attribute function calls.  If it
ends up being the latter, then we can revisit this discussion and try to
answer some of the questions raised above.

In the patch below, I ended up doing per-attribute function calls, mostly because it allowed me to avoid creating a custom data type for the portable version of pg_statistic. This comes at the cost of a very high number of parameters, but that's the breaks.

I am a bit concerned about the number of locks on pg_statistic and the relation itself, doing CatalogOpenIndexes/CatalogCloseIndexes once per attribute rather than once per relation. But I also see that this will mostly get used at a time when no other traffic is on the machine, and whatever it costs, it's still faster than the smallest table sample (insert joke about "don't have to be faster than the bear" here).

This raises questions about whether a failure in one attribute update statement should cause the others in that relation to roll back or not, and I can see situations where both would be desirable.

I'm putting this out there ahead of the pg_dump / fe_utils work, mostly because what I do there heavily depends on how this is received.

Also, I'm still seeking confirmation that I can create a pg_dump TOC entry with a chain of commands (e.g. BEGIN; ...  COMMIT; ) or if I have to fan them out into multiple entries.

Anyway, here's v7. Eagerly awaiting feedback.
Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: meson: Specify -Wformat as a common warning flag for extensions
Next
From: Shlok Kyal
Date:
Subject: Re: speed up a logical replica setup