Thread: pg_upgrade: Pass -j down to vacuumdb

pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi Hackers,

Here is a patch that passes the -j option from pg_upgrade down to 
vacuumdb if supported.

I'll add it to the January 'Fest.

Thanks for considering !

Best regards,
  Jesper

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 21/12/2018 11:12, Jesper Pedersen wrote:
> Here is a patch that passes the -j option from pg_upgrade down to 
> vacuumdb if supported.

It's not clear to me that that is what is usually wanted.

The point of the post-upgrade analyze script is specifically to do the
analyze in a gentle fashion.  Mixing in parallelism would seem to defeat
that a bit.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

On 12/29/18 10:03 AM, Peter Eisentraut wrote:
> On 21/12/2018 11:12, Jesper Pedersen wrote:
>> Here is a patch that passes the -j option from pg_upgrade down to
>> vacuumdb if supported.
> 
> It's not clear to me that that is what is usually wanted.
> 
> The point of the post-upgrade analyze script is specifically to do the
> analyze in a gentle fashion.  Mixing in parallelism would seem to defeat
> that a bit.
> 

Well, that really depends. The user passed -j to pg_upgrade in order for 
the upgrade to happen faster, so maybe they would expect, as I would, 
that the ANALYZE phase would happen in parallel too.

At least there should be a "notice" about what the script will do in 
this case.

Best regards,
  Jesper


Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 02/01/2019 20:47, Jesper Pedersen wrote:
> Well, that really depends. The user passed -j to pg_upgrade in order for 
> the upgrade to happen faster, so maybe they would expect, as I would, 
> that the ANALYZE phase would happen in parallel too.

pg_upgrade -j reduces the *downtime* caused by pg_upgrade during the
upgrade process.  Also, during said downtime, nothing else is happening,
so you can use all the resources of the machine.

Once the system is back up, you don't necessarily want to use all the
resources.  The analyze script is specifically written to run while
production traffic is active.  If you just want to run the analyze as
fast as possible, you can just run vacuumdb -j directly, without using
the script.

Moreover, it's not clear that pg_upgrade and vacuumdb are bound the same
way, so it's not a given that the same -j number should be used.

Perhaps more documentation would be useful here.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jerry Sievers
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

> On 02/01/2019 20:47, Jesper Pedersen wrote:
>
>> Well, that really depends. The user passed -j to pg_upgrade in order for 
>> the upgrade to happen faster, so maybe they would expect, as I would, 
>> that the ANALYZE phase would happen in parallel too.
>
> pg_upgrade -j reduces the *downtime* caused by pg_upgrade during the
> upgrade process.  Also, during said downtime, nothing else is happening,
> so you can use all the resources of the machine.
>
> Once the system is back up, you don't necessarily want to use all the
> resources.  The analyze script is specifically written to run while
> production traffic is active.  If you just want to run the analyze as
> fast as possible, you can just run vacuumdb -j directly, without using
> the script.

Peter, I'm skeptical here.

I might permit a connection to a just pg_upgraded DB prior to any
analyze being known finished only for the most trivial case.  At my site
however, *trivial* systems are a small minority.

In fact, our automated upgrade workflow uses our home-built parallel
analyzer which predates vacuumdb -j.  Apps are not allowed into the DB
until a fast 1st pass has been done.

We run it in 2 phases...

$all preceeding upgrade steps w/system locked out
analyze-lite (reduced stats target)
open DB for application traffic
analyze-full

Of course we are increasing downtime by disallowing app traffic till
finish of analyze-lite however the assumption is that many queries would
be too slow to attempt without full analyzer coverage, albiet at a
reduced stats target.

IMO this is certainly a case of no 1-size-fits-all solution so perhaps a
--analyze-jobs option :-)

FWIW

Thanks

> Moreover, it's not clear that pg_upgrade and vacuumdb are bound the same
> way, so it's not a given that the same -j number should be used.
>
> Perhaps more documentation would be useful here.

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi Peter,

On 1/3/19 7:03 AM, Peter Eisentraut wrote:
> Perhaps more documentation would be useful here.
> 

Here is v2 that just notes that the -j option isn't passed down.

Best regards,
  Jesper

Attachment

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi,

According to CF app, this patch needs review so I took a look at it.
Currently, your patch applies and builds cleanly, and all tests are also successful
based from the CF bot patch tester.

I'm not sure if I have understood the argument raised by Peter correctly. 
Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, so it's not a given that the same
-jnumber should be used."
 
I think it's whether the # jobs for pg_upgrade is used the same way for parallel vacuum. 

According to the official docs, the recommended setting for pg_upgrade --j option is the maximum of the number of CPU
coresand tablespaces. [1]
 
As for vacuumdb, parallel vacuum's (-j) recommended setting is based from your max_connections [2], which is the max #
ofconcurrent connections to your db server.
 

[1] https://www.postgresql.org/docs/current/pgupgrade.html
[2] https://www.postgresql.org/docs/current/app-vacuumdb.html


Regards,
Kirk Jamison

Re: pg_upgrade: Pass -j down to vacuumdb

From
Michael Paquier
Date:
On Fri, Jan 25, 2019 at 02:31:57AM +0000, Jamison, Kirk wrote:
> I'm not sure if I have understood the argument raised by Peter
> correctly.  Quoting Peter, "it's not clear that pg_upgrade and
> vacuumdb are bound the same way, so it's not a given that the same
> -j number should be used."  I think it's whether the # jobs for
> pg_upgrade is used the same way for parallel vacuum.

vacuumdb and pg_upgrade are designed for different purposes and have
different properties, hence using a value of -j for one does not
necessarily mean that the same value should be used for the other.
An ANALYZE is nice because it is non-intrusive for a live production
system, and if you begin to pass down a -j argument you may finish by
eating more resources that would have been necessary for the same
job.  For some users perhaps that matters, for some it does not.  So
based on that linking the value used by pg_upgrade and vacuumdb is a
bad concept in my opinion, and the patch should be rejected.  More
documentation on pg_upgrade side to explain that a bit better could be
a good idea though, as it is perfectly possible to use your own
post-upgrade script or rewrite partially the generated one.
--
Michael

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 25/01/2019 11:28, Michael Paquier wrote:
> based on that linking the value used by pg_upgrade and vacuumdb is a
> bad concept in my opinion, and the patch should be rejected.  More
> documentation on pg_upgrade side to explain that a bit better could be
> a good idea though, as it is perfectly possible to use your own
> post-upgrade script or rewrite partially the generated one.

Right.  pg_upgrade doesn't actually call vacuumdb.  It creates a script
that you may use.  The script itself contains a comment that says, if
you want to do this as fast as possible, don't use this script.  That
comment could be enhanced to suggest the use of the -j option.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi Kirk,

On 1/24/19 9:31 PM, Jamison, Kirk wrote:
> According to CF app, this patch needs review so I took a look at it.
> Currently, your patch applies and builds cleanly, and all tests are also successful
> based from the CF bot patch tester.
> 
> I'm not sure if I have understood the argument raised by Peter correctly.
> Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, so it's not a given that the same
-jnumber should be used."
 
> I think it's whether the # jobs for pg_upgrade is used the same way for parallel vacuum.
> 
> According to the official docs, the recommended setting for pg_upgrade --j option is the maximum of the number of CPU
coresand tablespaces. [1]
 
> As for vacuumdb, parallel vacuum's (-j) recommended setting is based from your max_connections [2], which is the max
#of concurrent connections to your db server.
 
>

Thanks for your feedback !

As per Peter's comments I have changed the patch (v2) to not pass down 
the -j option to vacuumdb.

Only an update to the documentation and console output is made in order 
to make it more clear.

Best regards,
  Jesper


Re: pg_upgrade: Pass -j down to vacuumdb

From
Alvaro Herrera
Date:
On 2019-Jan-25, Peter Eisentraut wrote:

> On 25/01/2019 11:28, Michael Paquier wrote:
> > based on that linking the value used by pg_upgrade and vacuumdb is a
> > bad concept in my opinion, and the patch should be rejected.  More
> > documentation on pg_upgrade side to explain that a bit better could be
> > a good idea though, as it is perfectly possible to use your own
> > post-upgrade script or rewrite partially the generated one.
> 
> Right.  pg_upgrade doesn't actually call vacuumdb.  It creates a script
> that you may use.  The script itself contains a comment that says, if
> you want to do this as fast as possible, don't use this script.  That
> comment could be enhanced to suggest the use of the -j option.

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option.  This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the two
vacuumdb lines.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> So let's have it write with a $VACUUMDB_OPTS variable, which is by
> default defined as empty but with a comment suggesting that maybe the
> user wants to add the -j option.  This way, if they have to edit it,
> they only have to edit the VACUUMDB_OPTS line instead of each of the two
> vacuumdb lines.

Even better, allow the script to absorb a value from the environment,
so that it needn't be edited at all.

            regards, tom lane


Re: pg_upgrade: Pass -j down to vacuumdb

From
Michael Paquier
Date:
On Fri, Jan 25, 2019 at 12:16:49PM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> So let's have it write with a $VACUUMDB_OPTS variable, which is by
>> default defined as empty but with a comment suggesting that maybe the
>> user wants to add the -j option.  This way, if they have to edit it,
>> they only have to edit the VACUUMDB_OPTS line instead of each of the two
>> vacuumdb lines.
>
> Even better, allow the script to absorb a value from the environment,
> so that it needn't be edited at all.

+1.
--
Michael

Attachment

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi Jesper,

On Friday, January 25, 2019, Jesper Pedersen <jesper.pedersen@redhat.com> 
> Thanks for your feedback !
>
> As per Peter's comments I have changed the patch (v2) to not pass down the -j option to vacuumdb.
>
> Only an update to the documentation and console output is made in order to make it more clear.

Yeah, I understood from the references that the parallel option use
for vacuumdb and pg_upgrade are different. I was also getting
clarification whether pg_upgrade gets to pass it down to vacuumdb.
Based from other Michael's and Peter's replies, I now understand it.

There were also helpful comments from the developers above,
pointing it to the right direction.
In addition to Peter's comment, quoting "...if you want to do this
as fast as possible, don't use this script.  That comment could be
enhanced to suggest the use of the -j option.", so I think you 
should also update the following script in create_script_for_cluster_analyze():
     fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
             ECHO_QUOTE, ECHO_QUOTE);

There were also advice from Tom and Alvaro that you can incorporate
further in your next patch.

>Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> So let's have it write with a $VACUUMDB_OPTS variable, which is by 
>> default defined as empty but with a comment suggesting that maybe the 
>> user wants to add the -j option.  This way, if they have to edit it, 
>> they only have to edit the VACUUMDB_OPTS line instead of each of the 
>> two vacuumdb lines.
>
Tom Lane wrote:
>Even better, allow the script to absorb a value from the environment, so that it needn't be edited at all.

Regards,
Kirk Jamison

Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

On 1/27/19 7:50 PM, Jamison, Kirk wrote:
> There were also helpful comments from the developers above,
> pointing it to the right direction.
> In addition to Peter's comment, quoting "...if you want to do this
> as fast as possible, don't use this script.  That comment could be
> enhanced to suggest the use of the -j option.", so I think you
> should also update the following script in create_script_for_cluster_analyze():
>       fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
>               ECHO_QUOTE, ECHO_QUOTE);
>

Yes, it will echo the -j option if passed, and the server supports it.

>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> So let's have it write with a $VACUUMDB_OPTS variable, which is by
>>> default defined as empty but with a comment suggesting that maybe the
>>> user wants to add the -j option.  This way, if they have to edit it,
>>> they only have to edit the VACUUMDB_OPTS line instead of each of the
>>> two vacuumdb lines.
>>
> Tom Lane wrote:
>> Even better, allow the script to absorb a value from the environment, so that it needn't be edited at all.
> 

Done.

Attached is v3.

Best regards,
  Jesper

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Fabrízio de Royes Mello
Date:

On Mon, Jan 28, 2019 at 1:15 PM Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
>
> Done.
>
> Attached is v3.
>

IMHO you should use long option name '--jobs' like others.

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:
> IMHO you should use long option name '--jobs' like others.
>

Thanks for your feedback !

Done, and v4 attached.

Best regards,
  Jesper

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 28/01/2019 17:47, Jesper Pedersen wrote:
> On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:
>> IMHO you should use long option name '--jobs' like others.
>>
> 
> Thanks for your feedback !
> 
> Done, and v4 attached.

I would drop the changes in pgupgrade.sgml.  We don't need to explain
what doesn't happen, when nobody before now ever thought that it would
happen.

Also, we don't need the versioning stuff.  The new cluster in pg_upgrade
is always of the current version, and we know what that one supports.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
On 1/28/19 3:50 PM, Peter Eisentraut wrote:
>> Done, and v4 attached.
> 
> I would drop the changes in pgupgrade.sgml.  We don't need to explain
> what doesn't happen, when nobody before now ever thought that it would
> happen.
> 
> Also, we don't need the versioning stuff.  The new cluster in pg_upgrade
> is always of the current version, and we know what that one supports.
> 

Done.

Best regards,
  Jesper

Attachment

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi Jesper,

Thanks for updating your patches quickly.

>On 1/28/19 3:50 PM, Peter Eisentraut wrote:
>>> Done, and v4 attached.
>> 
>> I would drop the changes in pgupgrade.sgml.  We don't need to explain 
>> what doesn't happen, when nobody before now ever thought that it would 
>> happen.
>> 
>> Also, we don't need the versioning stuff.  The new cluster in 
>> pg_upgrade is always of the current version, and we know what that one supports.
>> 

>Done.

I just checked the patch.
As per advice, you removed the versioning and specified --jobs. 
The patch still applies, builds and passed the tests successfully.

I'd argue though that it's helpful to find a short documentation to clarify
that it's not pass down by to vacuumdb default. I think the initial doc 
change from the previous patch would be good.
+     machine. Note, that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
Just remove the comma symbol from "Note, that..."


Regards,
Kirk Jamison

Re: pg_upgrade: Pass -j down to vacuumdb

From
Michael Paquier
Date:
On Tue, Jan 29, 2019 at 12:35:30AM +0000, Jamison, Kirk wrote:
> I just checked the patch.
> As per advice, you removed the versioning and specified --jobs.
> The patch still applies, builds and passed the tests successfully.

I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no?  What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses.  So there is no actual need for the if/else
complication business.
2) Perhaps we need to worry about the second vacuumdb --all command,
which may want custom options, which are not necessarily the same as
the options of the first command?  I don't think we need to care as it
applies only to an upgraded cluster using something older 8.4, just
wondering..
--
Michael

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

On 1/29/19 12:08 AM, Michael Paquier wrote:
> On Tue, Jan 29, 2019 at 12:35:30AM +0000, Jamison, Kirk wrote:
>> I just checked the patch.
>> As per advice, you removed the versioning and specified --jobs.
>> The patch still applies, builds and passed the tests successfully.
> 
> I would document the optional VACUUM_OPTS on the page of pg_upgrade.
> If Peter thinks it is fine to not do so, that's fine for me as well.
> 

I added most of the documentation back, as requested by Kirk.

> It seems to me that the latest patch sent is incorrect for multiple
> reasons:
> 1) You still enforce -j to use the number of jobs that the caller of
> pg_upgrade provides, and we agreed that both things are separate
> concepts upthread, no?  What has been suggested by Alvaro is to add a
> comment so as one can use VACUUM_OPTS with -j optionally, instead of
> suggesting a full-fledged vacuumdb command which depends on what
> pg_upgrade uses.  So there is no actual need for the if/else
> complication business.

I think it is ok for the echo command to highlight to the user that 
running --analyze-only using the same amount of jobs will give a faster 
result.

> 2) Perhaps we need to worry about the second vacuumdb --all command,
> which may want custom options, which are not necessarily the same as
> the options of the first command?  I don't think we need to care as it
> applies only to an upgraded cluster using something older 8.4, just
> wondering..

I think that --all --analyze-in-stages is what most people want. And 
with the $VACUUMDB_OPTS variable people have an option to put in more, 
such as -j X.

Best regards,
  Jesper



Attachment

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
On January 29, 2019 8:19 PM +0000, Jesper Pedersen wrote:
>On 1/29/19 12:08 AM, Michael Paquier wrote:
>> I would document the optional VACUUM_OPTS on the page of pg_upgrade.
>> If Peter thinks it is fine to not do so, that's fine for me as well.
>> 
..
>I added most of the documentation back, as requested by Kirk.

Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of experience,
soI think it's alright with me not to include the doc part if they finally say so.
 

>> It seems to me that the latest patch sent is incorrect for multiple
>> reasons:
>> 1) You still enforce -j to use the number of jobs that the caller of 
>> pg_upgrade provides, and we agreed that both things are separate 
>> concepts upthread, no?  What has been suggested by Alvaro is to add a 
>> comment so as one can use VACUUM_OPTS with -j optionally, instead of 
>> suggesting a full-fledged vacuumdb command which depends on what 
>> pg_upgrade uses.  So there is no actual need for the if/else 
>> complication business.

> I think it is ok for the echo command to highlight to the user that running --analyze-only using the same amount of
jobswill give a faster result.
 

I missed that part. 
IIUC, using the $VACUUMDB_OPTS variable would simplify and correct the usage of jobs for vacuumdb. (pg_upgrade jobs is
notequal to vacuumdb jobs) Plus, it might simplify and reduce the number of additional lines.
 
Tom Lane also suggested above to make the script absorb the value from env.
Would that address your concern of getting a faster result, yes?


Regards,
Kirk Jamison



Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

On 1/30/19 7:59 PM, Jamison, Kirk wrote:
>> I added most of the documentation back, as requested by Kirk.
> 
> Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of experience,
soI think it's alright with me not to include the doc part if they finally say so.
 
> 

I think we need to leave it to the Committer to decide, as both Peter  
and Michael are committers; provided the patch reaches RfC.

>>> It seems to me that the latest patch sent is incorrect for multiple
>>> reasons:
>>> 1) You still enforce -j to use the number of jobs that the caller of
>>> pg_upgrade provides, and we agreed that both things are separate
>>> concepts upthread, no?  What has been suggested by Alvaro is to add a
>>> comment so as one can use VACUUM_OPTS with -j optionally, instead of
>>> suggesting a full-fledged vacuumdb command which depends on what
>>> pg_upgrade uses.  So there is no actual need for the if/else
>>> complication business.
> 
>> I think it is ok for the echo command to highlight to the user that running --analyze-only using the same amount of
jobswill give a faster result.
 
> 
> I missed that part.
> IIUC, using the $VACUUMDB_OPTS variable would simplify and correct the usage of jobs for vacuumdb. (pg_upgrade jobs
isnot equal to vacuumdb jobs) Plus, it might simplify and reduce the number of additional lines.
 
> Tom Lane also suggested above to make the script absorb the value from env.
> Would that address your concern of getting a faster result, yes?
> 

The actual line in the script executed is using $VACUUMDB_OPTS at the  
moment, so could you expand on the above ? The 'if' could be removed if  
we assume that pg_upgrade would only be used with server > 9.5, but to  
be safer I would leave it in, as it is only console output.

Best regards,
  Jesper


RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
On January 31, 2019, 9:29PM +0000, Jesper Pedersen wrote:

>>> I added most of the documentation back, as requested by Kirk.
>> 
>> Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of
experience,so I think it's alright with me not to include the doc part if they finally say so. 
 
>
> I think we need to leave it to the Committer to decide, as both Peter and Michael are committers; provided the patch
reachesRfC.
 

Agreed.

>>> 1) You still enforce -j to use the number of jobs that the caller of 
>>> pg_upgrade provides, and we agreed that both things are separate 
>>> concepts upthread, no?  What has been suggested by Alvaro is to add 
>>> a comment so as one can use VACUUM_OPTS with -j optionally, instead 
>>> of suggesting a full-fledged vacuumdb command which depends on what 
>>> pg_upgrade uses.  So there is no actual need for the if/else 
>>> complication business.
> 
>> I think it is ok for the echo command to highlight to the user that 
>> running --analyze-only using the same amount of jobs will give a faster result.

Since you used user_opts.jobs (which is coming from pg_upgrade, is it not?),
could you clarify more the statement above? Or did you mean somehow that
it won't be a problem for vacuumdb to use the same?
Though correctness-wise is arguable, if the committers can let it pass
from your answer, then I think it's alright.

I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and just based from user-specified option $VACUUMDB_OPTS.
By which it would not utilize the amount of jobs used for pg_upgrade.
To address your need of needing a parallel, the script would just then
suggest if the user wants a --jobs option. The if/else for number of jobs is
not needed in that case, maybe only for ifndef WIN32 else case.

Regards,
Kirk Jamison



Re: pg_upgrade: Pass -j down to vacuumdb

From
Alvaro Herrera
Date:
On 2019-Feb-01, Jamison, Kirk wrote:

> I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
> the other developers suggested is to utilize it so that --jobs for vacuumdb
> would be optional and just based from user-specified option $VACUUMDB_OPTS.
> By which it would not utilize the amount of jobs used for pg_upgrade.
> To address your need of needing a parallel, the script would just then
> suggest if the user wants a --jobs option. The if/else for number of jobs is
> not needed in that case, maybe only for ifndef WIN32 else case.

No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed.  I do think some
extra whitespace in the format string is needed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

On 2/1/19 4:58 AM, Alvaro Herrera wrote:
> On 2019-Feb-01, Jamison, Kirk wrote:
> 
>> I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
>> the other developers suggested is to utilize it so that --jobs for vacuumdb
>> would be optional and just based from user-specified option $VACUUMDB_OPTS.
>> By which it would not utilize the amount of jobs used for pg_upgrade.
>> To address your need of needing a parallel, the script would just then
>> suggest if the user wants a --jobs option. The if/else for number of jobs is
>> not needed in that case, maybe only for ifndef WIN32 else case.
> 
> No Kirk, you got it right -- (some of) those ifdefs are not needed, as
> adding the --jobs in the command line is not needed.  I do think some
> extra whitespace in the format string is needed.
> 

I'm confused by this.

The point of the patch is to highlight to the user that even though 
he/she does "pg_upgrade -j 8" the "-j 8" option isn't passed down to 
vacuumdb.

Default value is 1, so in that case the echo command doesn't highlight 
that fact, otherwise it is. The user can then cancel the script and use 
the suggested command line.

The script then executes the vaccumdb command with the $VACUUMDB_OPTS 
argument as noted in the documentation.

Sample script attached as well.

I added extra space in the --analyze-in-stages part.

Kirk, can you provide a delta patch to match what you are thinking ?

Best regards,
  Jesper


Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Andres Freund
Date:
Hi,

On 2019-02-01 14:14:27 -0500, Jesper Pedersen wrote:

> From 4b5856725af5d971a26a2ba150c1067ee21bb242 Mon Sep 17 00:00:00 2001
> From: jesperpedersen <jesper.pedersen@redhat.com>
> Date: Fri, 21 Dec 2018 05:05:31 -0500
> Subject: [PATCH] Highlight that the --jobs option isn't passed down to
>  vacuumdb by default.

This is currently marked as waiting for author, which doesn't seem
accurate. I've set it to needs review, and moved it to the next
commitfest - please revert if that's not accurate.

- Andres


RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi, 

On February 1, 2019 8:14 PM +0000, Jesper Pedersen wrote:

> On 2/1/19 4:58 AM, Alvaro Herrera wrote:
>> On 2019-Feb-01, Jamison, Kirk wrote:
>>> I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I 
>>> thought what the other developers suggested is to utilize it so that 
>>> --jobs for vacuumdb would be optional and just based from user-specified option $VACUUMDB_OPTS.
>>> By which it would not utilize the amount of jobs used for pg_upgrade.
>>> To address your need of needing a parallel, the script would just 
>>> then suggest if the user wants a --jobs option. The if/else for 
>>> number of jobs is not needed in that case, maybe only for ifndef WIN32 else case.
>> 
>> No Kirk, you got it right -- (some of) those ifdefs are not needed, as 
>> adding the --jobs in the command line is not needed.  I do think some 
>> extra whitespace in the format string is needed.


> The point of the patch is to highlight to the user that even though he/she does "pg_upgrade -j 8" the "-j 8" option
isn'tpassed down to vacuumdb.
 
> Default value is 1, so in that case the echo command doesn't highlight that fact, otherwise it is. The user can then
cancelthe script and use the suggested command line.
 
> The script then executes the vaccumdb command with the $VACUUMDB_OPTS argument as noted in the documentation.
> Sample script attached as well.

Sorry I think I might have understood now where you are coming from,
where your script clarifies that it's not necessarily passed down.
If committers can let it pass, maybe it's necessary to highlight in the script,
Something like:
"If you would like default statistics as quickly as possible (e.g. run in parallel)..."

The difference is still perhaps your use of --jobs option
as somehow "explicitly" embedded in the code, compared to the suggestion of
making it optional by using the $VACUUMDB_OPTS variable, so that
jobs need not to be explicitly written, unless the user wants to.
(which I also think it's a better improvement because it's optional)

Some quick code check. sorry, don't have much time to write a patch for it today.
Perhaps you could write it similar to what you did in the --analyze-in-stages part.
Maybe something like (not sure if correct):
+    fprintf(script, "echo %s    \"%s/vacuumdb\" %s $VACUUMDB_OPTS --all %s%s\n", ECHO_QUOTE,
+            new_cluster.bindir, user_specification.data,
+            /* Did we copy the free space files? */
+            (GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+            "--analyze-only" : "--analyze", ECHO_QUOTE);

I'll continue to review though from time to time.

Regards,
Kirk Jamison

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi Jesper,

Sorry I almost completely forgot to get back to you on this.
Actually your patch works when I tested it before,
and I understand the intention.
. 
Although a point was raised by other developers by making
--jobs optional in the suggested line by using the env variable instead.

> > Kirk, can you provide a delta patch to match what you are thinking ?

I was thinking of something like the attached, 
but the user needs to edit the variable value though. 
I am not sure how to implement the suggestion to allow the 
script to absorb a value from the environment,
so that it doesn't need to be edited at all.

Regards,
Kirk Jamison

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Robert Haas
Date:
On Mon, Mar 11, 2019 at 10:05 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
> I was thinking of something like the attached,

+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application by default.

The words 'by default' should be removed here, because there is also
no non-default way to get that behavior, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jesper Pedersen
Date:
Hi,

Thanks Kirk !

On 3/12/19 2:20 PM, Robert Haas wrote:
> The words 'by default' should be removed here, because there is also
> no non-default way to get that behavior, either.
> 

Here is v9 based on Kirk's and your input.

Best regards,
  Jesper


Attachment

RE: pg_upgrade: Pass -j down to vacuumdb

From
"Jamison, Kirk"
Date:
Hi Jesper,

> Thanks Kirk !
>
> > On 3/12/19 2:20 PM, Robert Haas wrote:
> > The words 'by default' should be removed here, because there is also 
> > no non-default way to get that behavior, either.
> > 
> 
> Here is v9 based on Kirk's and your input.

Thanks! Although there were trailing whitespaces.
After fixing that, it works as intended (built & passed tests successfully).
That aside, the v9 patch looks good and comment was addressed.


Regards,
Kirk Jamison

Re: pg_upgrade: Pass -j down to vacuumdb

From
Tom Lane
Date:
Jesper Pedersen <jesper.pedersen@redhat.com> writes:
> [ v9_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patch ]

Now that I've looked at this, it seems like it's fairly confused about
what the proposed VACUUMDB_OPTS variable would actually do for you.
If you're going to run vacuumdb directly, it hardly seems like you'd
bother with setting such a variable; you'd just enter the options you
want directly on the command line.

Conversely, if what you plan to do is set VACUUMDB_OPTS and then
run this script, the fact that --analyze-in-stages is hard-wired
into the script's invocation doesn't seem right either: you probably
don't want that if your goal is to get done as fast as possible.

In short, I'm not convinced that most of this patch is an improvement
on the status quo.  I think we'd be best off to just take the idea
of explicitly mentioning adding --jobs to a manual run, ie roughly

    fprintf(script, "echo %sthis script and run:%s\n",
            ECHO_QUOTE, ECHO_QUOTE);
    fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
            new_cluster.bindir, user_specification.data,
    /* Did we copy the free space files? */
            (GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
            "--analyze-only" : "--analyze", ECHO_QUOTE);
+    fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
+            ECHO_QUOTE, ECHO_QUOTE);
    fprintf(script, "echo%s\n\n", ECHO_BLANK);

    fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",


            regards, tom lane


Re: pg_upgrade: Pass -j down to vacuumdb

From
Michael Paquier
Date:
On Mon, Mar 25, 2019 at 05:57:50PM -0400, Tom Lane wrote:
> In short, I'm not convinced that most of this patch is an improvement
> on the status quo.  I think we'd be best off to just take the idea
> of explicitly mentioning adding --jobs to a manual run, ie roughly

Yeah, no objections from here to keep that stuff the simpler the
better.  So I am on board with your suggestion.
--
Michael

Attachment

Re: pg_upgrade: Pass -j down to vacuumdb

From
Daniel Gustafsson
Date:
On Tuesday, March 26, 2019 3:20 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Mar 25, 2019 at 05:57:50PM -0400, Tom Lane wrote:
>
> > In short, I'm not convinced that most of this patch is an improvement
> > on the status quo. I think we'd be best off to just take the idea
> > of explicitly mentioning adding --jobs to a manual run, ie roughly
>
> Yeah, no objections from here to keep that stuff the simpler the
> better. So I am on board with your suggestion.

+1, I'd rather have that as "documentation" after an upgrade.

cheers ./daniel


Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 2019-03-25 22:57, Tom Lane wrote:
> +    fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
> +            ECHO_QUOTE, ECHO_QUOTE);

But then you get that information after you have already started the script.

I don't find any information about this analyze business on the
pg_upgrade reference page.  Maybe a discussion there could explain the
different paths better than making the output script extra complicated.

Essentially: If you want a slow and gentle analyze, use the supplied
script.  If you want a fast analyze, use vacuumdb, perhaps with an
appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
--jobs are resource-bound in different ways, so the same value might not
be appropriate for both.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_upgrade: Pass -j down to vacuumdb

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-03-25 22:57, Tom Lane wrote:
>> +    fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
>> +            ECHO_QUOTE, ECHO_QUOTE);

> But then you get that information after you have already started the script.

Yes, but that's already true of all the info that this script prints out.

> I don't find any information about this analyze business on the
> pg_upgrade reference page.  Maybe a discussion there could explain the
> different paths better than making the output script extra complicated.

> Essentially: If you want a slow and gentle analyze, use the supplied
> script.  If you want a fast analyze, use vacuumdb, perhaps with an
> appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
> --jobs are resource-bound in different ways, so the same value might not
> be appropriate for both.

I have no objection to handling it that way (i.e., just as a doc fix).
Or we could do both.

            regards, tom lane


Re: pg_upgrade: Pass -j down to vacuumdb

From
Jeff Janes
Date:
On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-25 22:57, Tom Lane wrote:
> +     fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
> +                     ECHO_QUOTE, ECHO_QUOTE);

But then you get that information after you have already started the script.

True, but the same goes for all the other information there, and it sleeps to let you break out of it.  And I make it a habit to glance through any scripts someone suggests that I run, so would notice the embedded advice without running it at all.

I don't find any information about this analyze business on the
pg_upgrade reference page.  Maybe a discussion there could explain the
different paths better than making the output script extra complicated.

Essentially: If you want a slow and gentle analyze, use the supplied
script.  If you want a fast analyze, use vacuumdb, perhaps with an
appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
--jobs are resource-bound in different ways, so the same value might not
be appropriate for both.


To me, analyze-in-stages is not about gentleness at all.  For example, it does nothing to move vacuum_cost_delay away from its default of 0.  Rather, it is about slamming the bare minimum statistics in there as fast as possible, so your database doesn't keel over from horrible query plans on even simple queries as soon as you reopen.  I want the database to survive long enough for the more complete statistics to be gathered.  If you have quickly accumulated max_connection processes all running horrible query plans that never finish, your database might as well still be closed for all the good it does the users. And all the load generated by those is going to make the critical ANALYZE all that much slower.

At first blush I thought it was obvious that you would not want to run analyze-in-stages in parallel.  But after thinking about it some more and reflecting on experience doing some troublesome upgrades, I would reverse that and say it is now obvious you do want at least the first stage of analyze-in-stages, and probably the first two, to run in parallel.  That is not currently an option it supports, so we can't really recommend it in the script or the docs.

But we could at least adopt the more straightforward patch, suggest that if they don't want analyze-in-stages they should consider doing the big-bang analyze in parallel.

Cheers,

Jeff

Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 2019-03-28 02:43, Jeff Janes wrote:
> At first blush I thought it was obvious that you would not want to run
> analyze-in-stages in parallel.  But after thinking about it some more
> and reflecting on experience doing some troublesome upgrades, I would
> reverse that and say it is now obvious you do want at least the first
> stage of analyze-in-stages, and probably the first two, to run in
> parallel.  That is not currently an option it supports, so we can't
> really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade: Pass -j down to vacuumdb

From
Jeff Janes
Date:
On Fri, Mar 29, 2019 at 5:58 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-03-28 02:43, Jeff Janes wrote:
> At first blush I thought it was obvious that you would not want to run
> analyze-in-stages in parallel.  But after thinking about it some more
> and reflecting on experience doing some troublesome upgrades, I would
> reverse that and say it is now obvious you do want at least the first
> stage of analyze-in-stages, and probably the first two, to run in
> parallel.  That is not currently an option it supports, so we can't
> really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

For 12, I think we should not pass it down so that it runs automatically, and just go with a doc-only patch instead.  Where the text written into the analyze_new_cluster.sh recommending to hit Ctrl-C and do something else is counted as documentation.

I agree with you that the value of -j that was used for pg_ugrade might not be suitable for vacuumdb, but anyone who tests things thoroughly enough to know exactly what value to use for each is not going to be the target audience of analyze_new_cluster.sh anyway.  So I think the -j used in pg_upgrade can just be written directly into the suggestion, and that would be good enough for the intended use.

Ideally I think the analyze-in-stages should have the ability to parallelize the first stage and not the last one, but that is not v12 material.

Even more ideally it should only have two stages, not three.  One stage runs to generate minimally-tolerable stats before the database is opened to other users, and one runs after it is open (but hopefully before the big end-of-month reports get run, or whatever the big statistics-sensitive queries are on your system).   But we don't really have a concept of "open to other users" in the first place, and doing it yourself by juggling pg_hba files is annoying and error prone.  So maybe the first stage could be run by pg_upgrade itself, while the new server is still running on a linux socket in a private directory.

The defense of the current three-stage method for analyze-in-stages is that very few people are likely to know just what the level of "minimally-tolerable stats" for their system actually are, because upgrades are rare enough not to learn by experience, and realistic load-generators are rare enough not to learn from test/QA environments.  If the default_statistics_target=1 stats are enough, then you got them quickly.  If they aren't, at least you didn't waste too much time collecting them before moving on to the second-stage default_statistics_target=10 and then the final stage.  So the three stages are amortizing over our ignorance.
 
Cheers,

Jeff

Re: pg_upgrade: Pass -j down to vacuumdb

From
Justin Pryzby
Date:
On Wed, Apr 03, 2019 at 04:42:14PM -0400, Jeff Janes wrote:
> So maybe the first stage could
> be run by pg_upgrade itself, while the new server is still running on a
> linux socket in a private directory.

I think that would take too long.  It would be less of an issue if there was
feedback/progress from pg_upgrade during the analyze.

For our upgrades (which typically take ~15min but several customers take up to
~60min), I only analyze base tables (essentially, those which are neither
parents nor children), then start services, then ANALYZE with default stats
target.  I would want to avoid delaying services restart for more than another
(say) 5 minutes, and I would want to avoid even that unless there was a
progress report indicating that it's projected to take only a few more minutes.

I just did a test on one of our large-but-not-huge customers.  With
stats_target=1, analyzing a 145GB partitioned table looks like it'll take
perhaps an hour; they have ~1TB data, so delaying services during ANALYZE would
nullify the utility of pg_upgrade.  I can restore the essential tables from
backup in 15-30 minutes.

It might be fine if pg_upgrade took an option which enabled analyze, perhaps
instead of outputting analyze_new_cluster.sh.  But actually, a problem with
*that* is that currently pg_upgrade avoids starting the new cluster.  That
seems to be deliberate, since, with --link, that's an irreversible operation:
it's unsafe to start the old cluster afterwards.

Tangent: I have a queued mail from ~15 months ago wherein I proposed adding to
pg_upgrade an option to remove the old data dir (or probably only the files
associated with known relations).  I realized at the time that would be pretty
scary without having first verified that the new cluster at least starts.  I'm
not sure how good an idea that is, but --startnewcluster would be needed there,
too.

Justin



Re: pg_upgrade: Pass -j down to vacuumdb

From
Peter Eisentraut
Date:
On 2019-04-03 23:24, Justin Pryzby wrote:
> I just did a test on one of our large-but-not-huge customers.  With
> stats_target=1, analyzing a 145GB partitioned table looks like it'll take
> perhaps an hour; they have ~1TB data, so delaying services during ANALYZE would
> nullify the utility of pg_upgrade.  I can restore the essential tables from
> backup in 15-30 minutes.

I think the real fix is to have pg_upgrade copy over the statistics.
They are right there after all, just take them.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade: Pass -j down to vacuumdb

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I think the real fix is to have pg_upgrade copy over the statistics.
> They are right there after all, just take them.

Well, it's not "just take them", we need to develop some infrastructure
that will permit coping with cross-version differences in the stats
storage.  This isn't insoluble, but it will take some work.  We had
a prior discussion that trailed off without much result:

https://www.postgresql.org/message-id/flat/724322880.K8vzik8zPz%40abook

            regards, tom lane