Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | Zei1xMGH8SJS9yoc@tamriel.snowman.net Whole thread Raw |
In response to | Re: Statistics Import and Export (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Statistics Import and Export
|
List | pgsql-hackers |
Greetings, * Matthias van de Meent (boekewurm+postgres@gmail.com) wrote: > On Wed, 6 Mar 2024 at 11:33, Stephen Frost <sfrost@snowman.net> wrote: > > On Wed, Mar 6, 2024 at 11:07 Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > >> Or even just one VALUES for the whole statistics loading? > > I don’t think we’d want to go beyond one relation at a time as then it can be parallelized, we won’t be trying to locka whole bunch of objects at once, and any failures would only impact that one relation’s stats load. > > That also makes sense. Great, thanks. > >> I suspect the main issue with combining this into one statement > >> (transaction) is that failure to load one column's statistics implies > >> you'll have to redo all the other statistics (or fail to load the > >> statistics at all), which may be problematic at the scale of thousands > >> of relations with tens of columns each. > > > > > > I’m pretty skeptical that “stats fail to load and lead to a failed transaction” is a likely scenario that we have tospend a lot of effort on. > > Agreed on the "don't have to spend a lot of time on it", but I'm not > so sure on the "unlikely" part while the autovacuum deamon is > involved, specifically for non-upgrade pg_restore. I imagine (haven't > checked) that autoanalyze is disabled during pg_upgrade, but > pg_restore doesn't do that, while it would have to be able to restore > statistics of a table if it is included in the dump (and the version > matches). Even if autovacuum was running and it kicked off an auto-analyze of the relation at just the time that we were trying to load the stats, there would be appropriate locking happening to keep them from causing an outright ERROR and transaction failure, or if not, that's a lack of locking and should be fixed. With the per-attribute-function-call approach, that could lead to a situation where some stats are from the auto-analyze and some are from the stats being loaded but I'm not sure if that's a big concern or not. For users of this, I would think we'd generally encourage them to disable autovacuum on the tables they're loading as otherwise they'll end up with the stats going back to whatever an auto-analyze ends up finding. That may be fine in some cases, but not in others. A couple questions to think about though: Should pg_dump explicitly ask autovacuum to ignore these tables while we're loading them? Should these functions only perform a load when there aren't any existing stats? Should the latter be an argument to the functions to allow the caller to decide? > > What are the cases where we would be seeing stats reloads failing where it would make sense to re-try on a subset ofcolumns, or just generally, if we know that the pg_dump version matches the target server version? > > Last time I checked, pg_restore's default is to load data on a > row-by-row basis without --single-transaction or --exit-on-error. Of > course, pg_upgrade uses it's own set of flags, but if a user is > restoring stats with pg_restore, I suspect they'd rather have some > column's stats loaded than no stats at all; so I would assume this > requires one separate pg_import_pg_statistic()-transaction for every > column. Having some discussion around that would be useful. Is it better to have a situation where there are stats for some columns but no stats for other columns? There would be a good chance that this would lead to a set of queries that were properly planned out and a set which end up with unexpected and likely poor query plans due to lack of stats. Arguably that's better overall, but either way an ANALYZE needs to be done to address the lack of stats for those columns and then that ANALYZE is going to blow away whatever stats got loaded previously anyway and all we did with a partial stats load was maybe have a subset of queries have better plans in the interim, after having expended the cost to try and individually load the stats and dealing with the case of some of them succeeding and some failing. Overall, I'd suggest we wait to see what Corey comes up with in terms of doing the stats load for all attributes in a single function call, perhaps using the VALUES construct as you suggested up-thread, and then we can contemplate if that's clean enough to work or if it's so grotty that the better plan would be to do per-attribute function calls. If it ends up being the latter, then we can revisit this discussion and try to answer some of the questions raised above. Thanks! Stephen
Attachment
pgsql-hackers by date: