Re: pg_upgrade: Pass -j down to vacuumdb - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: pg_upgrade: Pass -j down to vacuumdb
Date
Msg-id CAMkU=1w2Yuvz+xbCU+42pT2U3GCE0xmkaSDXCgM9CR3bt=5Crw@mail.gmail.com
Whole thread Raw
In response to Re: pg_upgrade: Pass -j down to vacuumdb  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: pg_upgrade: Pass -j down to vacuumdb
List pgsql-hackers
On Fri, Mar 29, 2019 at 5:58 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-28 02:43, Jeff Janes wrote:
> At first blush I thought it was obvious that you would not want to run
> analyze-in-stages in parallel.  But after thinking about it some more
> and reflecting on experience doing some troublesome upgrades, I would
> reverse that and say it is now obvious you do want at least the first
> stage of analyze-in-stages, and probably the first two, to run in
> parallel.  That is not currently an option it supports, so we can't
> really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

For 12, I think we should not pass it down so that it runs automatically, and just go with a doc-only patch instead.  Where the text written into the analyze_new_cluster.sh recommending to hit Ctrl-C and do something else is counted as documentation.

I agree with you that the value of -j that was used for pg_ugrade might not be suitable for vacuumdb, but anyone who tests things thoroughly enough to know exactly what value to use for each is not going to be the target audience of analyze_new_cluster.sh anyway.  So I think the -j used in pg_upgrade can just be written directly into the suggestion, and that would be good enough for the intended use.

Ideally I think the analyze-in-stages should have the ability to parallelize the first stage and not the last one, but that is not v12 material.

Even more ideally it should only have two stages, not three.  One stage runs to generate minimally-tolerable stats before the database is opened to other users, and one runs after it is open (but hopefully before the big end-of-month reports get run, or whatever the big statistics-sensitive queries are on your system).   But we don't really have a concept of "open to other users" in the first place, and doing it yourself by juggling pg_hba files is annoying and error prone.  So maybe the first stage could be run by pg_upgrade itself, while the new server is still running on a linux socket in a private directory.

The defense of the current three-stage method for analyze-in-stages is that very few people are likely to know just what the level of "minimally-tolerable stats" for their system actually are, because upgrades are rare enough not to learn by experience, and realistic load-generators are rare enough not to learn from test/QA environments.  If the default_statistics_target=1 stats are enough, then you got them quickly.  If they aren't, at least you didn't waste too much time collecting them before moving on to the second-stage default_statistics_target=10 and then the final stage.  So the three stages are amortizing over our ignorance.
 
Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Strange coding in _mdfd_openseg()
Next
From: Andres Freund
Date:
Subject: Re: Strange coding in _mdfd_openseg()