Thread: ANALYZE after restore

ANALYZE after restore

From
"Christopher Kings-Lynne"
Date:
Hi,

Would it be an idea to have pg_dump append an ANALYZE; command to the end of
its dumps to assist newbies / inexperienced admins?

Reason being is that I noticed that when I just restored a 50MB dump that
the pg_statistic table had no contents...

I think it'd be an idea...

Chris



Re: ANALYZE after restore

From
Neil Conway
Date:
On Wed, 3 Apr 2002 09:40:13 +0800
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> wrote:
> Hi,
> 
> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> its dumps to assist newbies / inexperienced admins?

That strikes me as a good idea; a lot of the questions we get on
-general and on IRC are solved by suggesting "have you run ANALYZE?"
And that is only the sub-section of the user community that takes the
time to track down the problem and posts about it to the mailing
list -- I shudder to think how many people have never taken the time
to tune their database at all.

Given that ANALYZE is now a separate command, so there is no need to
run a VACUUM (which could be much more expensive); furthermore, since
ANALYZE now only takes a statistical sampling of the full table, it
shouldn't take very long, even on large tables. However, I'd say we
should make this behavior optional, controlled by a command-line
switch, but it should be enabled by default.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: ANALYZE after restore

From
Gavin Sherry
Date:
On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:

> Hi,
> 
> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> its dumps to assist newbies / inexperienced admins?

I do not think this is desired behaviour. Firstly, pg_dump is not just for
restoring data to the system. Presumably another flag would need to be
added to pg_dump to prevent an ANALYZE being appended. This is messing
and, in my opinion, it goes against the 'does what it says it does' nature
of Postgres. Secondly, in experienced admins are not going to get
experienced with database management unless they see that their database
runs like a dog and they have to read the manual.

Gavin



Re: ANALYZE after restore

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
>> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
>> its dumps to assist newbies / inexperienced admins?

> I do not think this is desired behaviour.

I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
will just get in the way of people who know what they're doing, and it's
not at all clear that it will help people who do not.
        regards, tom lane


Re: ANALYZE after restore

From
Justin Clift
Date:
Tom Lane wrote:
> 
> Gavin Sherry <swm@linuxworld.com.au> writes:
> > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> >> its dumps to assist newbies / inexperienced admins?
> 
> > I do not think this is desired behaviour.
> 
> I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
> will just get in the way of people who know what they're doing, and it's
> not at all clear that it will help people who do not.

Sorry Tom and Gavin, but I feel it really comes down to our idea of what
we're
trying to do here :

a) A database which is very self-maintaining, so people DON'T HAVE to  learn it's intricacies in order to be getting
decentperformance.  (They'll have to learn the intricacies if they want *better*
 
performance)

b) A database which works.  But if you want decent performance, you'd
better  take the time and effort to learn it.  (This is the approach the commercial vendors take)

I feel we should always target a) where it's possible to without it
seriously
getting in the way of people who've take the time to learn the skills.

The far majority of people who use PostgreSQL are in the category which
will
benefit from a) so they can put their time to other uses instead of
having to
learn and keep-up-to-date-with PostgreSQL.  This will *always* be the
case.

Having decent performance by default should definitely be an important
objective, so having an ANALYZE command run at the end of a restore - by
default only - is a good idea.

Regards and best wishes,

Justin Clift

>                         regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: ANALYZE after restore

From
Jan Wieck
Date:
Justin Clift wrote:
> Tom Lane wrote:
> >
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> > > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> > >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > >> its dumps to assist newbies / inexperienced admins?
> >
> > > I do not think this is desired behaviour.
> >
> > I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
> > will just get in the way of people who know what they're doing, and it's
> > not at all clear that it will help people who do not.
>
> Sorry Tom and Gavin, but I feel it really comes down to our idea of what
> we're
> trying to do here :
>
> a) A database which is very self-maintaining, so people DON'T HAVE to
>    learn it's intricacies in order to be getting decent performance.
>    (They'll have to learn the intricacies if they want *better*
> performance)
   The  defaults  after  a  restore should result in index scans   most  of  the  time,  resulting   in   some   medium
 decent   performance.   And  PostgreSQL  needs  some  frequent  VACUUM   anyway, so after a while this problem solves
itself for  the   average user.
 
   A  database  wide  forced  VACUUM  on the other hand can make   things worse.  I have  seen  scenarios,  where  you
have to   explicitly  leave out ANALYZE for specific tables in order to   keep them index-scanned. So what you're
proposingis to force   professional  PostgreSQL  users  to  wait after restore for a   useless ANALYZE to complete,
before they  can  reset  things   with  a normal VACUUM to get their required performance back?   And all that just to
makedummies happier?
 


Jan

> b) A database which works.  But if you want decent performance, you'd
> better
>    take the time and effort to learn it.
>    (This is the approach the commercial vendors take)
>
> I feel we should always target a) where it's possible to without it
> seriously
> getting in the way of people who've take the time to learn the skills.
>
> The far majority of people who use PostgreSQL are in the category which
> will
> benefit from a) so they can put their time to other uses instead of
> having to
> learn and keep-up-to-date-with PostgreSQL.  This will *always* be the
> case.
>
> Having decent performance by default should definitely be an important
> objective, so having an ANALYZE command run at the end of a restore - by
> default only - is a good idea.
>
> Regards and best wishes,
>
> Justin Clift
>
> >                         regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
> --
> "My grandfather once told me that there are two kinds of people: those
> who work and those who take the credit. He told me to try to be in the
> first group; there was less competition there."
>    - Indira Gandhi
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: ANALYZE after restore

From
Justin Clift
Date:
Hi Jan,

Jan Wieck wrote:
> 
<snip>
>     The  defaults  after  a  restore should result in index scans
>     most  of  the  time,  resulting   in   some   medium   decent
>     performance.   And  PostgreSQL  needs  some  frequent  VACUUM
>     anyway, so after a while this problem solves itself  for  the
>     average user.
> 
>     A  database  wide  forced  VACUUM  on the other hand can make
>     things worse.  I have  seen  scenarios,  where  you  have  to
>     explicitly  leave out ANALYZE for specific tables in order to
>     keep them index-scanned. So what you're proposing is to force
>     professional  PostgreSQL  users  to  wait after restore for a
>     useless ANALYZE to complete, before  they  can  reset  things
>     with  a normal VACUUM to get their required performance back?
>     And all that just to make dummies happier?
> 
> Jan

Nope, I'm figuring that if it's an option, and the option is on by
default, then for the majority of people that will be a good thing.

Anyone that's a professional PostgreSQL user will know about to turn the
option off i.e. pg_dump --something (etc).  Sure, we all make mistakes
and will forget now and again, but I don't think that should stop us
from taking into account that the majority of users out there are fairly
PostgreSQL clue-less.

If we can make it easy without much inconvenience and without
sacrificing the power of the database, we should.

:-)

Regards and best wishes,

Justin Clift

<snip>

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: ANALYZE after restore

From
Tom Lane
Date:
Jan Wieck <janwieck@yahoo.com> writes:
>     ... And  PostgreSQL  needs  some  frequent  VACUUM
>     anyway, so after a while this problem solves itself  for  the
>     average user.

Yes, that's the key point for me too.  Anyone who doesn't set up for
routine vacuums/analyzes is going to have performance problems anyway.
Attacking that by making pg_dump force a vacuum is attacking the wrong
place.

There's been discussion of adding automatic background vacuums to
Postgres; that seems like a more useful response to the issue.
        regards, tom lane


Re: ANALYZE after restore

From
Justin Clift
Date:
Tom Lane wrote:
> 
> Jan Wieck <janwieck@yahoo.com> writes:
> >     ... And  PostgreSQL  needs  some  frequent  VACUUM
> >     anyway, so after a while this problem solves itself  for  the
> >     average user.
> 
> Yes, that's the key point for me too.  Anyone who doesn't set up for
> routine vacuums/analyzes is going to have performance problems anyway.
> Attacking that by making pg_dump force a vacuum is attacking the wrong
> place.

Hi Tom,

Good point.  Although I also think we're talking about two different
things here.

No-one is proposing running a VACCUM after the load, but instead getting
some accurate statistics about the data which was loaded.

I agree adding an automatic background vacuum thread/process/something
will be really, really useful too.  
Should we instead have this proposed automatic background something also
update the statistics every now and again?

If so, I think this will all be a moot point.

:-)

Regards and best wishes,

Justin Clift

> There's been discussion of adding automatic background vacuums to
> Postgres; that seems like a more useful response to the issue.
> 
>                         regards, tom lane

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: ANALYZE after restore

From
Tom Lane
Date:
Justin Clift <justin@postgresql.org> writes:
> I agree adding an automatic background vacuum thread/process/something
> will be really, really useful too.  
> Should we instead have this proposed automatic background something also
> update the statistics every now and again?

Yes, I had always assumed that would be part of the feature ...
        regards, tom lane


Re: ANALYZE after restore

From
Justin Clift
Date:
Tom Lane wrote:
> 
> Justin Clift <justin@postgresql.org> writes:
> > I agree adding an automatic background vacuum thread/process/something
> > will be really, really useful too.
> > Should we instead have this proposed automatic background something also
> > update the statistics every now and again?
>
> Yes, I had always assumed that would be part of the feature ...

Hi Tom,

Cool.  I wasn't sure of that (probably haven't been following the
correct threads).

That makes way more sense then.

:-)

Regards and best wishes,

Justin Clift


> 
>                         regards, tom lane

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: ANALYZE after restore

From
Hannu Krosing
Date:
On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote:
> On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> 
> > Hi,
> > 
> > Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > its dumps to assist newbies / inexperienced admins?
> 
> I do not think this is desired behaviour. Firstly, pg_dump is not just for
> restoring data to the system. Presumably another flag would need to be
> added to pg_dump to prevent an ANALYZE being appended.

Yes.

> This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres.

What does pg_dump say it does ?

Or should pg_dump append ANALYZE only if it determines that ANALYZE has
been run on the database being dumped ?

Do you have any tools that will break when ANALYZE is added, (and which
don't break on the weird way of dumping foreign keys ;) ?

> Secondly, in experienced admins are not going to get
> experienced with database management unless they see that their database
> runs like a dog and they have to read the manual.

Rather they think that the database is indeed designed to run like a
dog.

For _forcing_ them newbies to learn we could append a new UNANALYZE
command that inserts delibarately bogus info into pg_statistic to make
it perform even worse by default ;)

In general, I'd prefer a database that has no need to be explicitly
maintained. How many experienced file-system managers do you know ?

---------------------
Hannu




Re: ANALYZE after restore

From
Scott Marlowe
Date:
On 3 Apr 2002, Hannu Krosing wrote:

> On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote:
> > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> > 
> > > Hi,
> > > 
> > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > > its dumps to assist newbies / inexperienced admins?
> > 
> > I do not think this is desired behaviour. Firstly, pg_dump is not just for
> > restoring data to the system. Presumably another flag would need to be
> > added to pg_dump to prevent an ANALYZE being appended.
> 
> Yes.
> 
> > This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres.
> 
> What does pg_dump say it does ?

from man pg_dump:

pg_dump - extract a PostgreSQL database into a script file or other 
archive file

Pretty simple really.

I've been using postgresql for about three years now, and it only took me 
about 15  minutes of reading the docs to find the vacuum and vacuum 
analyze command.  It was far harder to figure out subselects, 
transactions, outer joins, unions, and a dozen other things than vacuum.  
I was a total database newbie back then, by the way.

One of the things I liked about postgresql was that it wasn't stuffed full 
of marketing fluff to try and impress the PHBs at the top of the corporate 
ladder, but was full of useful extensibility and was very much a "do what 
it said it would" database.

while I agree that postgresql could do with some automated housekeeping 
routines that would allow joe sixpack to grab it and go, no database that 
has real power is going to run very well without some administration, 
period.

The last place to put house keeping is in the end of my data dumps.  
pg_dump's job is to dump the data from my database in a format that is as 
transportable as possible.  not to hold my hand the next time I need to 
load data into my own database.  

While I fully support a switch like -z on pg_dump that puts an analyze on 
the end of my dumps if I so choose, I don't want them showing up 
automatically and me wondering if the data feeds I make for other will 
work.  

I can see junior dbas who don't understand vacuum and analyze recommending 
to people that they need to dump / restore their whole database once a 
week to get good performance if we add aht analyze switch to the end of 
the pg_dump file.  NOT a good thing.  :-)

anywho, I don't post much here, cause I don't hack postgresql that much, 
but I love this database, and I don't want it filled up with useless 
marketing cruft like analyze being haphazardly tacked onto the pg_dump 
output, so my vote is a great big NO.