Thread: Is analyze_new_cluster.sh still useful?

Is analyze_new_cluster.sh still useful?

From

Christoph Berg

Date:

18 June 2014, 15:41:12

Hi,

now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
it make sense to get rid of the analyze_new_cluster.sh file which
pg_upgrade writes? The net content is a single line which could as
well be printed by pg_upgrade itself. Instead of an lengthy
explanation how to invoke that manually, there should be a short note
and a pointer to some manual section. I think the chances of people
reading that would even be increased.

Similary, I don't really see the usefulness of delete_old_cluster.sh
as a file, when "rm -rf" could just be presented on the console for
the admin to execute by cut-and-paste.

Christoph
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Re: Is analyze_new_cluster.sh still useful?

From

Tom Lane

Date:

18 June 2014, 16:51:50

Christoph Berg <christoph.berg@credativ.de> writes:
> now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> it make sense to get rid of the analyze_new_cluster.sh file which
> pg_upgrade writes? The net content is a single line which could as
> well be printed by pg_upgrade itself. Instead of an lengthy
> explanation how to invoke that manually, there should be a short note
> and a pointer to some manual section. I think the chances of people
> reading that would even be increased.

> Similary, I don't really see the usefulness of delete_old_cluster.sh
> as a file, when "rm -rf" could just be presented on the console for
> the admin to execute by cut-and-paste.

There are contexts where pg_upgrade is executed by some wrapper script
and the user doesn't normally see its output directly.  This is the
case in the Red Hat packaging (unless Honza changed it since I left ;-))
and I think Debian might be similar.

I generally don't like the amount of cruft pg_upgrade leaves lying
around, so I'd be glad to see these script files go away if possible;
but we need to think about how this will play when there's a wrapper
script between pg_upgrade and the human user.

In the Red Hat wrapper script, the pg_upgrade output is dumped into a
log file, which the user can look at if he wants, but I'd bet the
average user doesn't read it --- that was certainly the expectation.
Of course, said user probably never notices the separate shell
scripts either, so maybe it's a wash.

Another angle is that some folks might have tried to automate things
even more, with a wrapper script that starts up the new postmaster
and runs analyze_new_cluster.sh all by itself.  I guess they could
make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
though, so maybe that's not a fatal objection either.
        regards, tom lane

Re: Is analyze_new_cluster.sh still useful?

From

Andres Freund

Date:

18 June 2014, 16:56:28

On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
> Another angle is that some folks might have tried to automate things
> even more, with a wrapper script that starts up the new postmaster
> and runs analyze_new_cluster.sh all by itself.  I guess they could
> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> though, so maybe that's not a fatal objection either.

Wouldn't that be quite counterproductive? The reason we don't normally
do that and why --analyze-in-stages exists is that the cluster should be
started up as fast as possible. Restarting it after ANALYZE went through
would be defeating that purpose, no?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Is analyze_new_cluster.sh still useful?

From

Tom Lane

Date:

18 June 2014, 17:24:22

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
>> Another angle is that some folks might have tried to automate things
>> even more, with a wrapper script that starts up the new postmaster
>> and runs analyze_new_cluster.sh all by itself.  I guess they could
>> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
>> though, so maybe that's not a fatal objection either.

> Wouldn't that be quite counterproductive? The reason we don't normally
> do that and why --analyze-in-stages exists is that the cluster should be
> started up as fast as possible. Restarting it after ANALYZE went through
> would be defeating that purpose, no?

How so?  Once you've started the postmaster, you're open for business,
no?
        regards, tom lane

Re: Is analyze_new_cluster.sh still useful?

From

Andres Freund

Date:

18 June 2014, 17:28:02

On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
> >> Another angle is that some folks might have tried to automate things
> >> even more, with a wrapper script that starts up the new postmaster
> >> and runs analyze_new_cluster.sh all by itself.  I guess they could
> >> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> >> though, so maybe that's not a fatal objection either.
> 
> > Wouldn't that be quite counterproductive? The reason we don't normally
> > do that and why --analyze-in-stages exists is that the cluster should be
> > started up as fast as possible. Restarting it after ANALYZE went through
> > would be defeating that purpose, no?
> 
> How so?  Once you've started the postmaster, you're open for business,
> no?

Wasn't there lots of talk about making the server inaccessible while
pg_upgrade is doing its thing? Also, many people are desparately unhappy
if postgres has to be restarted (to return to be being OS managed) after
their application already has connected.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Is analyze_new_cluster.sh still useful?

From

Tom Lane

Date:

18 June 2014, 17:54:27

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
>>>> Another angle is that some folks might have tried to automate things
>>>> even more, with a wrapper script that starts up the new postmaster
>>>> and runs analyze_new_cluster.sh all by itself.  I guess they could
>>>> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
>>>> though, so maybe that's not a fatal objection either.

>>> Wouldn't that be quite counterproductive? The reason we don't normally
>>> do that and why --analyze-in-stages exists is that the cluster should be
>>> started up as fast as possible. Restarting it after ANALYZE went through
>>> would be defeating that purpose, no?

>> How so?  Once you've started the postmaster, you're open for business,
>> no?

> Wasn't there lots of talk about making the server inaccessible while
> pg_upgrade is doing its thing? Also, many people are desparately unhappy
> if postgres has to be restarted (to return to be being OS managed) after
> their application already has connected.

I think we're not on the same page.  My point is that someone might want
to automate the whole sequence: stop old postmaster, run pg_upgrade, start
the updated postmaster normally (hence it *is* open for business), kick
off the analyze runs.  If you're concerned about minimal downtime you
would not want to be waiting around for the admin to issue a perfectly
predictable series of commands.
        regards, tom lane

Re: Is analyze_new_cluster.sh still useful?

From

Andres Freund

Date:

18 June 2014, 17:58:49

On 2014-06-18 13:54:16 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-18 13:24:14 -0400, Tom Lane wrote:
> >> Andres Freund <andres@2ndquadrant.com> writes:
> >>> On 2014-06-18 12:51:43 -0400, Tom Lane wrote:
> >>>> Another angle is that some folks might have tried to automate things
> >>>> even more, with a wrapper script that starts up the new postmaster
> >>>> and runs analyze_new_cluster.sh all by itself.  I guess they could
> >>>> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> >>>> though, so maybe that's not a fatal objection either.
> 
> >>> Wouldn't that be quite counterproductive? The reason we don't normally
> >>> do that and why --analyze-in-stages exists is that the cluster should be
> >>> started up as fast as possible. Restarting it after ANALYZE went through
> >>> would be defeating that purpose, no?
> 
> >> How so?  Once you've started the postmaster, you're open for business,
> >> no?
> 
> > Wasn't there lots of talk about making the server inaccessible while
> > pg_upgrade is doing its thing? Also, many people are desparately unhappy
> > if postgres has to be restarted (to return to be being OS managed) after
> > their application already has connected.
> 
> I think we're not on the same page.  My point is that someone might want
> to automate the whole sequence: stop old postmaster, run pg_upgrade, start
> the updated postmaster normally (hence it *is* open for business), kick
> off the analyze runs.  If you're concerned about minimal downtime you
> would not want to be waiting around for the admin to issue a perfectly
> predictable series of commands.

Oh, yea. Definitely. I think that's what I've seen happen in pretty much
*all* usages of pg_upgrade. I somehow misread that you wanted to add
that into pg_upgrade. Not really sure how, sorry.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Is analyze_new_cluster.sh still useful?

From

Jeff Janes

Date:

18 June 2014, 19:51:58

On Wed, Jun 18, 2014 at 10:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-18 13:54:16 -0400, Tom Lane wrote:

>> I think we're not on the same page.  My point is that someone might want
>> to automate the whole sequence: stop old postmaster, run pg_upgrade, start
>> the updated postmaster normally (hence it *is* open for business), kick
>> off the analyze runs.  If you're concerned about minimal downtime you
>> would not want to be waiting around for the admin to issue a perfectly
>> predictable series of commands.
>
> Oh, yea. Definitely. I think that's what I've seen happen in pretty much
> *all* usages of pg_upgrade.

I think it is a popular way to do it not because it is a particularly
good way, but because the better alternatives are not readily
available.

If your database needs statistics badly enough that you want to do a
coarse pre-pass with default_statistics_target=1, why would you want
that pass to be done on an open database?  Surely you don't want 100
open connections all doing giant seq scans (that should be single-row
look up, but without stats they are not) competing with the analyze.

Having a database which is "open" to queries but they have such
deranged execution plans that they never actually finish is not truly
open, and the attempts to service those futile queries just delays the
true opening even further.

If you really need a multi pass ANALYZE, you probably need the first
pass to be before the database opens because otherwise the open will
be a disaster, and the 2nd pass to be after the database opens but
before your bulk queries (mass deletes, EOM reports, etc.) kick in.
Having both passes be on the same "side" of the opening seems unlikely
to do much good for most use cases.  Fortunately it probably doesn't
do much harm to most people, either, simple because most databases are
not terribly sensitive to the issue.

Cheers,

Jeff

Re: Is analyze_new_cluster.sh still useful?

From

Christoph Berg

Date:

20 June 2014, 12:26:12

Re: Tom Lane 2014-06-18 <21034.1403110303@sss.pgh.pa.us>
> Christoph Berg <christoph.berg@credativ.de> writes:
> > now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> > it make sense to get rid of the analyze_new_cluster.sh file which
> > pg_upgrade writes? The net content is a single line which could as
> > well be printed by pg_upgrade itself. Instead of an lengthy
> > explanation how to invoke that manually, there should be a short note
> > and a pointer to some manual section. I think the chances of people
> > reading that would even be increased.
> 
> > Similary, I don't really see the usefulness of delete_old_cluster.sh
> > as a file, when "rm -rf" could just be presented on the console for
> > the admin to execute by cut-and-paste.
> 
> There are contexts where pg_upgrade is executed by some wrapper script
> and the user doesn't normally see its output directly.  This is the
> case in the Red Hat packaging (unless Honza changed it since I left ;-))
> and I think Debian might be similar.

pg_upgradecluster shows the full pg_upgrade output in Debian. (But
pg_createcluster hides the initdb output, so it the other way round
here... It'd be nice if initdb would just output the interesting parts
and omit most of the chatter.)

> I generally don't like the amount of cruft pg_upgrade leaves lying
> around, so I'd be glad to see these script files go away if possible;
> but we need to think about how this will play when there's a wrapper
> script between pg_upgrade and the human user.
> 
> In the Red Hat wrapper script, the pg_upgrade output is dumped into a
> log file, which the user can look at if he wants, but I'd bet the
> average user doesn't read it --- that was certainly the expectation.
> Of course, said user probably never notices the separate shell
> scripts either, so maybe it's a wash.
> 
> Another angle is that some folks might have tried to automate things
> even more, with a wrapper script that starts up the new postmaster
> and runs analyze_new_cluster.sh all by itself.  I guess they could
> make the wrapper do "vacuumdb --all --analyze-in-stages" directly,
> though, so maybe that's not a fatal objection either.

Yeah that was my point - that's a single static command, that could be
executed by the wrapper, and it would be much less opaque. (Same for
the delete script - before looking into the file you'd think it would
do all sorts of cleanup, but then issues a simple rm -rf.)

Mit freundlichen Grüßen,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Re: Is analyze_new_cluster.sh still useful?

From

Jeff Janes

Date:

20 June 2014, 14:58:17

On Wed, Jun 18, 2014 at 8:41 AM, Christoph Berg
<christoph.berg@credativ.de> wrote:
> Hi,
>
> now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> it make sense to get rid of the analyze_new_cluster.sh file which
> pg_upgrade writes? The net content is a single line which could as
> well be printed by pg_upgrade itself. Instead of an lengthy
> explanation how to invoke that manually, there should be a short note
> and a pointer to some manual section. I think the chances of people
> reading that would even be increased.

That one line was longer in the past, it could become longer again in
the future.  I don't think we should toggle the presentation back and
forth from version to version depending how long it happens to be.

> Similary, I don't really see the usefulness of delete_old_cluster.sh
> as a file, when "rm -rf" could just be presented on the console for
> the admin to execute by cut-and-paste.

I certainly would not want to run rm -rf commands copied off the
console window.  A slip of the mouse (or the paste buffer) and
suddenly you are removing entirely the wrong level of the directory
tree.

But I wouldn't mind an option to suppress the creation of those files.

Cheers,

Jeff

Re: Is analyze_new_cluster.sh still useful?

From

Christoph Berg

Date:

20 June 2014, 15:15:12

Re: Jeff Janes 2014-06-20 <CAMkU=1z3Edq+CNRo4F=jBEzXNMidSskdm=cPcAZnOgdY2sivXw@mail.gmail.com>
> On Wed, Jun 18, 2014 at 8:41 AM, Christoph Berg
> <christoph.berg@credativ.de> wrote:
> > Hi,
> >
> > now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> > it make sense to get rid of the analyze_new_cluster.sh file which
> > pg_upgrade writes? The net content is a single line which could as
> > well be printed by pg_upgrade itself. Instead of an lengthy
> > explanation how to invoke that manually, there should be a short note
> > and a pointer to some manual section. I think the chances of people
> > reading that would even be increased.
> 
> That one line was longer in the past, it could become longer again in
> the future.  I don't think we should toggle the presentation back and
> forth from version to version depending how long it happens to be.

I doubt that would happen. If there's more than analyze to do after
the upgrade, the file would get a new name, so the "API" would change
anyway.

> > Similary, I don't really see the usefulness of delete_old_cluster.sh
> > as a file, when "rm -rf" could just be presented on the console for
> > the admin to execute by cut-and-paste.
> 
> I certainly would not want to run rm -rf commands copied off the
> console window.  A slip of the mouse (or the paste buffer) and
> suddenly you are removing entirely the wrong level of the directory
> tree.

Well I don't like shell scripts containing rm -rf commands sitting in
the filesystem either.

> But I wouldn't mind an option to suppress the creation of those files.

Another nitpick here: What pg_upgrade outputs doesn't even work on
most systems, you need to ./analyze_new_cluster.sh or "sh
analyze_new_cluster.sh".

Mit freundlichen Grüßen,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Re: Is analyze_new_cluster.sh still useful?

From

Bruce Momjian

Date:

29 July 2014, 09:39:07

On Wed, Jun 18, 2014 at 05:41:06PM +0200, Christoph Berg wrote:
> Hi,
> 
> now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't
> it make sense to get rid of the analyze_new_cluster.sh file which
> pg_upgrade writes? The net content is a single line which could as
> well be printed by pg_upgrade itself. Instead of an lengthy
> explanation how to invoke that manually, there should be a short note
> and a pointer to some manual section. I think the chances of people
> reading that would even be increased.

I was not a big fan of keeping analyze_new_cluster.sh with one command
in it, but it does maintain the same user API, so I guess that is why
people wanted it kept.

> Similary, I don't really see the usefulness of delete_old_cluster.sh
> as a file, when "rm -rf" could just be presented on the console for
> the admin to execute by cut-and-paste.

Uh, that could be hard because delete_old_cluster.sh also can also
delete the old major-version-specific subdirectories in tablespaces, so
I do think we need to keep that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Is analyze_new_cluster.sh still useful?

From

Bruce Momjian

Date:

29 July 2014, 09:42:39

On Fri, Jun 20, 2014 at 05:15:05PM +0200, Christoph Berg wrote:
> Another nitpick here: What pg_upgrade outputs doesn't even work on
> most systems, you need to ./analyze_new_cluster.sh or "sh
> analyze_new_cluster.sh".

Well, the output is:
Optimizer statistics are not transferred by pg_upgrade so,once you start the new server, consider running:
analyze_new_cluster.shRunningthis script will delete the old cluster's data files:    delete_old_cluster.sh
 

It is not really telling you _how_ to run them.  Would adding a ./
prefix help?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Is analyze_new_cluster.sh still useful?

From

Christoph Berg

Date:

04 August 2014, 08:17:46

Re: Bruce Momjian 2014-07-29 <20140729094234.GC13958@momjian.us>
> On Fri, Jun 20, 2014 at 05:15:05PM +0200, Christoph Berg wrote:
> > Another nitpick here: What pg_upgrade outputs doesn't even work on
> > most systems, you need to ./analyze_new_cluster.sh or "sh
> > analyze_new_cluster.sh".
> 
> Well, the output is:
> 
>     Optimizer statistics are not transferred by pg_upgrade so,
>     once you start the new server, consider running:
>         analyze_new_cluster.sh
>     
>     Running this script will delete the old cluster's data files:
>         delete_old_cluster.sh
> 
> It is not really telling you _how_ to run them.  Would adding a ./
> prefix help?

I think it would help in hinting the user that these are not
system-wide commands in $PATH but rather scripts in the current
directory.

There's also a case for prefixing them with the full path, Debian's
pg_upgradecluster wrapper drops these scripts in a new subdirectory in
/var/log/postgresql/ along with the log files. Possibly other
automation frameworks do likewise. (Though the frameworks will likely
make delete_old_cluster.sh redundant.)

Christoph
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Re: Is analyze_new_cluster.sh still useful?

From

Bruce Momjian

Date:

11 October 2014, 22:40:07

On Mon, Aug  4, 2014 at 10:17:40AM +0200, Christoph Berg wrote:
> Re: Bruce Momjian 2014-07-29 <20140729094234.GC13958@momjian.us>
> > On Fri, Jun 20, 2014 at 05:15:05PM +0200, Christoph Berg wrote:
> > > Another nitpick here: What pg_upgrade outputs doesn't even work on
> > > most systems, you need to ./analyze_new_cluster.sh or "sh
> > > analyze_new_cluster.sh".
> > 
> > Well, the output is:
> > 
> >     Optimizer statistics are not transferred by pg_upgrade so,
> >     once you start the new server, consider running:
> >         analyze_new_cluster.sh
> >     
> >     Running this script will delete the old cluster's data files:
> >         delete_old_cluster.sh
> > 
> > It is not really telling you _how_ to run them.  Would adding a ./
> > prefix help?
> 
> I think it would help in hinting the user that these are not
> system-wide commands in $PATH but rather scripts in the current
> directory.
> 
> There's also a case for prefixing them with the full path, Debian's
> pg_upgradecluster wrapper drops these scripts in a new subdirectory in
> /var/log/postgresql/ along with the log files. Possibly other
> automation frameworks do likewise. (Though the frameworks will likely
> make delete_old_cluster.sh redundant.)

I have applied a patch to 9.5 to output "./" as a prefix for Unix script
file names.  While this also works on Windows, it is likely to be
confusing.  The new Unix output is:
Upgrade Complete----------------Optimizer statistics are not transferred by pg_upgrade so,once you start the new
server,consider running:    ./analyze_new_cluster.shRunning this script will delete the old cluster's data files:
./delete_old_cluster.sh

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Is analyze_new_cluster.sh still useful?

From

Christoph Berg

Date:

13 October 2014, 09:51:48

Re: Bruce Momjian 2014-10-12 <20141011224002.GM21267@momjian.us>
> I have applied a patch to 9.5 to output "./" as a prefix for Unix script
> file names.  While this also works on Windows, it is likely to be
> confusing.  The new Unix output is:
> 
>     Upgrade Complete
>     ----------------
>     Optimizer statistics are not transferred by pg_upgrade so,
>     once you start the new server, consider running:
>         ./analyze_new_cluster.sh
>     
>     Running this script will delete the old cluster's data files:
>         ./delete_old_cluster.sh

I like this, thanks!

Mit freundlichen Grüßen,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE