Re: pg_upgrade: Pass -j down to vacuumdb - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: pg_upgrade: Pass -j down to vacuumdb
Date
Msg-id CAMkU=1zcOfThON4p00VTWEO1xdY6R8Xbh9KSt9pZkLHq0gdE=A@mail.gmail.com
In response to Re: pg_upgrade: Pass -j down to vacuumdb  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-03-25 22:57, Tom Lane wrote:
>> +     fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
>> +                     ECHO_QUOTE, ECHO_QUOTE);

> But then you get that information after you have already started the script.

True, but the same goes for all the other information there, and the script sleeps to give you a chance to break out of it.  I also make it a habit to glance through any script someone suggests that I run, so I would notice the embedded advice without running it at all.

> I don't find any information about this analyze business on the
> pg_upgrade reference page.  Maybe a discussion there could explain the
> different paths better than making the output script extra complicated.
>
> Essentially: If you want a slow and gentle analyze, use the supplied
> script.  If you want a fast analyze, use vacuumdb, perhaps with an
> appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
> --jobs are resource-bound in different ways, so the same value might not
> be appropriate for both.
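The two paths contrasted above could be sketched roughly like this.  This is a dry-run sketch, not a recommendation from the thread: the generated script name and the job count of 4 are placeholders, and DRY_RUN guards the commands so nothing touches a live cluster unless you ask.

```shell
#!/bin/sh
# Sketch of the two post-upgrade analyze paths; job count is arbitrary.
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually run against a cluster

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Slow and gentle: the staged analyze script that pg_upgrade emits.
run ./analyze_new_cluster.sh

# Fast: one big-bang analyze across all databases, parallelized.
run vacuumdb --all --analyze-only --jobs=4
```

With DRY_RUN left at its default, the script only prints the commands it would run.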


To me, analyze-in-stages is not about gentleness at all.  For example, it does nothing to move vacuum_cost_delay away from its default of 0.  Rather, it is about slamming the bare minimum statistics in there as fast as possible, so your database doesn't keel over from horrible query plans on even simple queries as soon as you reopen it.  I want the database to survive long enough for the more complete statistics to be gathered.  If you have quickly accumulated max_connections processes all running horrible query plans that never finish, your database might as well still be closed for all the good it does the users.  And all the load generated by those is going to make the critical ANALYZE all that much slower.

At first blush I thought it was obvious that you would not want to run analyze-in-stages in parallel.  But after thinking about it some more and reflecting on experience doing some troublesome upgrades, I would reverse that and say it is now obvious you do want at least the first stage of analyze-in-stages, and probably the first two, to run in parallel.  That is not currently an option it supports, so we can't really recommend it in the script or the docs.
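For context, --analyze-in-stages works by analyzing repeatedly with increasing statistics targets (starting from 1).  Since a parallel staged analyze isn't supported, a first stage could be approximated by hand; a hedged sketch, where the job count and the specific PGOPTIONS settings are my own illustrative assumptions:

```shell
#!/bin/sh
# Hedged sketch: approximate a parallel "stage 1" by forcing a minimal
# statistics target for the session, then doing a full parallel analyze later.
DRY_RUN=${DRY_RUN:-1}   # dry-run by default; set DRY_RUN=0 to execute
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

# Stage-1 equivalent: minimal statistics, in parallel, so simple query
# plans stop being horrible as quickly as possible.
PGOPTIONS='-c default_statistics_target=1 -c vacuum_cost_delay=0' \
    run vacuumdb --all --analyze-only --jobs=4

# Later: full statistics, also in parallel.
run vacuumdb --all --analyze-only --jobs=4
```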

But we could at least adopt the more straightforward patch and suggest that, if users don't want analyze-in-stages, they consider running the big-bang analyze in parallel.

Cheers,

Jeff
