On Wed, 6 Mar 2024 at 11:33, Stephen Frost <sfrost@snowman.net> wrote:
> On Wed, Mar 6, 2024 at 11:07 Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>> Or even just one VALUES for the whole statistics loading?
>
>
> I don’t think we’d want to go beyond one relation at a time as then it can be parallelized, we won’t be trying to
locka whole bunch of objects at once, and any failures would only impact that one relation’s stats load.
That also makes sense.
>> I suspect the main issue with combining this into one statement
>> (transaction) is that failure to load one column's statistics implies
>> you'll have to redo all the other statistics (or fail to load the
>> statistics at all), which may be problematic at the scale of thousands
>> of relations with tens of columns each.
>
>
> I’m pretty skeptical that “stats fail to load and lead to a failed transaction” is a likely scenario that we have to
spenda lot of effort on.
Agreed on the "don't have to spend a lot of time on it", but I'm not
so sure on the "unlikely" part while the autovacuum deamon is
involved, specifically for non-upgrade pg_restore. I imagine (haven't
checked) that autoanalyze is disabled during pg_upgrade, but
pg_restore doesn't do that, while it would have to be able to restore
statistics of a table if it is included in the dump (and the version
matches).
> What are the cases where we would be seeing stats reloads failing where it would make sense to re-try on a subset of
columns,or just generally, if we know that the pg_dump version matches the target server version?
Last time I checked, pg_restore's default is to load data on a
row-by-row basis without --single-transaction or --exit-on-error. Of
course, pg_upgrade uses it's own set of flags, but if a user is
restoring stats with pg_restore, I suspect they'd rather have some
column's stats loaded than no stats at all; so I would assume this
requires one separate pg_import_pg_statistic()-transaction for every
column.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)