Thread: ANALYZE after restore
Hi, Would it be an idea to have pg_dump append an ANALYZE; command to the end of its dumps to assist newbies / inexperienced admins? Reason being is that I noticed that when I just restored a 50MB dump that the pg_statistic table had no contents... I think it'd be an idea... Chris
On Wed, 3 Apr 2002 09:40:13 +0800 "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> wrote: > Hi, > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of > its dumps to assist newbies / inexperienced admins? That strikes me as a good idea; a lot of the questions we get on -general and on IRC are solved by suggesting "have you run ANALYZE?" And that is only the sub-section of the user community that takes the time to track down the problem and posts about it to the mailing list -- I shudder to think how many people have never taken the time to tune their database at all. Given that ANALYZE is now a separate command, so there is no need to run a VACUUM (which could be much more expensive); furthermore, since ANALYZE now only takes a statistical sampling of the full table, it shouldn't take very long, even on large tables. However, I'd say we should make this behavior optional, controlled by a command-line switch, but it should be enabled by default. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: > Hi, > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of > its dumps to assist newbies / inexperienced admins? I do not think this is desired behaviour. Firstly, pg_dump is not just for restoring data to the system. Presumably another flag would need to be added to pg_dump to prevent an ANALYZE being appended. This is messing and, in my opinion, it goes against the 'does what it says it does' nature of Postgres. Secondly, in experienced admins are not going to get experienced with database management unless they see that their database runs like a dog and they have to read the manual. Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of >> its dumps to assist newbies / inexperienced admins? > I do not think this is desired behaviour. I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore will just get in the way of people who know what they're doing, and it's not at all clear that it will help people who do not. regards, tom lane
Tom Lane wrote: > > Gavin Sherry <swm@linuxworld.com.au> writes: > > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: > >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of > >> its dumps to assist newbies / inexperienced admins? > > > I do not think this is desired behaviour. > > I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore > will just get in the way of people who know what they're doing, and it's > not at all clear that it will help people who do not. Sorry Tom and Gavin, but I feel it really comes down to our idea of what we're trying to do here : a) A database which is very self-maintaining, so people DON'T HAVE to learn it's intricacies in order to be getting decentperformance. (They'll have to learn the intricacies if they want *better* performance) b) A database which works. But if you want decent performance, you'd better take the time and effort to learn it. (This is the approach the commercial vendors take) I feel we should always target a) where it's possible to without it seriously getting in the way of people who've take the time to learn the skills. The far majority of people who use PostgreSQL are in the category which will benefit from a) so they can put their time to other uses instead of having to learn and keep-up-to-date-with PostgreSQL. This will *always* be the case. Having decent performance by default should definitely be an important objective, so having an ANALYZE command run at the end of a restore - by default only - is a good idea. Regards and best wishes, Justin Clift > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
Justin Clift wrote: > Tom Lane wrote: > > > > Gavin Sherry <swm@linuxworld.com.au> writes: > > > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: > > >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of > > >> its dumps to assist newbies / inexperienced admins? > > > > > I do not think this is desired behaviour. > > > > I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore > > will just get in the way of people who know what they're doing, and it's > > not at all clear that it will help people who do not. > > Sorry Tom and Gavin, but I feel it really comes down to our idea of what > we're > trying to do here : > > a) A database which is very self-maintaining, so people DON'T HAVE to > learn it's intricacies in order to be getting decent performance. > (They'll have to learn the intricacies if they want *better* > performance) The defaults after a restore should result in index scans most of the time, resulting in some medium decent performance. And PostgreSQL needs some frequent VACUUM anyway, so after a while this problem solves itself for the average user. A database wide forced VACUUM on the other hand can make things worse. I have seen scenarios, where you have to explicitly leave out ANALYZE for specific tables in order to keep them index-scanned. So what you're proposingis to force professional PostgreSQL users to wait after restore for a useless ANALYZE to complete, before they can reset things with a normal VACUUM to get their required performance back? And all that just to makedummies happier? Jan > b) A database which works. But if you want decent performance, you'd > better > take the time and effort to learn it. > (This is the approach the commercial vendors take) > > I feel we should always target a) where it's possible to without it > seriously > getting in the way of people who've take the time to learn the skills. > > The far majority of people who use PostgreSQL are in the category which > will > benefit from a) so they can put their time to other uses instead of > having to > learn and keep-up-to-date-with PostgreSQL. This will *always* be the > case. > > Having decent performance by default should definitely be an important > objective, so having an ANALYZE command run at the end of a restore - by > default only - is a good idea. > > Regards and best wishes, > > Justin Clift > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > -- > "My grandfather once told me that there are two kinds of people: those > who work and those who take the credit. He told me to try to be in the > first group; there was less competition there." > - Indira Gandhi > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Hi Jan, Jan Wieck wrote: > <snip> > The defaults after a restore should result in index scans > most of the time, resulting in some medium decent > performance. And PostgreSQL needs some frequent VACUUM > anyway, so after a while this problem solves itself for the > average user. > > A database wide forced VACUUM on the other hand can make > things worse. I have seen scenarios, where you have to > explicitly leave out ANALYZE for specific tables in order to > keep them index-scanned. So what you're proposing is to force > professional PostgreSQL users to wait after restore for a > useless ANALYZE to complete, before they can reset things > with a normal VACUUM to get their required performance back? > And all that just to make dummies happier? > > Jan Nope, I'm figuring that if it's an option, and the option is on by default, then for the majority of people that will be a good thing. Anyone that's a professional PostgreSQL user will know about to turn the option off i.e. pg_dump --something (etc). Sure, we all make mistakes and will forget now and again, but I don't think that should stop us from taking into account that the majority of users out there are fairly PostgreSQL clue-less. If we can make it easy without much inconvenience and without sacrificing the power of the database, we should. :-) Regards and best wishes, Justin Clift <snip> -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
Jan Wieck <janwieck@yahoo.com> writes: > ... And PostgreSQL needs some frequent VACUUM > anyway, so after a while this problem solves itself for the > average user. Yes, that's the key point for me too. Anyone who doesn't set up for routine vacuums/analyzes is going to have performance problems anyway. Attacking that by making pg_dump force a vacuum is attacking the wrong place. There's been discussion of adding automatic background vacuums to Postgres; that seems like a more useful response to the issue. regards, tom lane
Tom Lane wrote: > > Jan Wieck <janwieck@yahoo.com> writes: > > ... And PostgreSQL needs some frequent VACUUM > > anyway, so after a while this problem solves itself for the > > average user. > > Yes, that's the key point for me too. Anyone who doesn't set up for > routine vacuums/analyzes is going to have performance problems anyway. > Attacking that by making pg_dump force a vacuum is attacking the wrong > place. Hi Tom, Good point. Although I also think we're talking about two different things here. No-one is proposing running a VACCUM after the load, but instead getting some accurate statistics about the data which was loaded. I agree adding an automatic background vacuum thread/process/something will be really, really useful too. Should we instead have this proposed automatic background something also update the statistics every now and again? If so, I think this will all be a moot point. :-) Regards and best wishes, Justin Clift > There's been discussion of adding automatic background vacuums to > Postgres; that seems like a more useful response to the issue. > > regards, tom lane -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
Justin Clift <justin@postgresql.org> writes: > I agree adding an automatic background vacuum thread/process/something > will be really, really useful too. > Should we instead have this proposed automatic background something also > update the statistics every now and again? Yes, I had always assumed that would be part of the feature ... regards, tom lane
Tom Lane wrote: > > Justin Clift <justin@postgresql.org> writes: > > I agree adding an automatic background vacuum thread/process/something > > will be really, really useful too. > > Should we instead have this proposed automatic background something also > > update the statistics every now and again? > > Yes, I had always assumed that would be part of the feature ... Hi Tom, Cool. I wasn't sure of that (probably haven't been following the correct threads). That makes way more sense then. :-) Regards and best wishes, Justin Clift > > regards, tom lane -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote: > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: > > > Hi, > > > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of > > its dumps to assist newbies / inexperienced admins? > > I do not think this is desired behaviour. Firstly, pg_dump is not just for > restoring data to the system. Presumably another flag would need to be > added to pg_dump to prevent an ANALYZE being appended. Yes. > This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres. What does pg_dump say it does ? Or should pg_dump append ANALYZE only if it determines that ANALYZE has been run on the database being dumped ? Do you have any tools that will break when ANALYZE is added, (and which don't break on the weird way of dumping foreign keys ;) ? > Secondly, in experienced admins are not going to get > experienced with database management unless they see that their database > runs like a dog and they have to read the manual. Rather they think that the database is indeed designed to run like a dog. For _forcing_ them newbies to learn we could append a new UNANALYZE command that inserts delibarately bogus info into pg_statistic to make it perform even worse by default ;) In general, I'd prefer a database that has no need to be explicitly maintained. How many experienced file-system managers do you know ? --------------------- Hannu
On 3 Apr 2002, Hannu Krosing wrote: > On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote: > > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote: > > > > > Hi, > > > > > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of > > > its dumps to assist newbies / inexperienced admins? > > > > I do not think this is desired behaviour. Firstly, pg_dump is not just for > > restoring data to the system. Presumably another flag would need to be > > added to pg_dump to prevent an ANALYZE being appended. > > Yes. > > > This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres. > > What does pg_dump say it does ? from man pg_dump: pg_dump - extract a PostgreSQL database into a script file or other archive file Pretty simple really. I've been using postgresql for about three years now, and it only took me about 15 minutes of reading the docs to find the vacuum and vacuum analyze command. It was far harder to figure out subselects, transactions, outer joins, unions, and a dozen other things than vacuum. I was a total database newbie back then, by the way. One of the things I liked about postgresql was that it wasn't stuffed full of marketing fluff to try and impress the PHBs at the top of the corporate ladder, but was full of useful extensibility and was very much a "do what it said it would" database. while I agree that postgresql could do with some automated housekeeping routines that would allow joe sixpack to grab it and go, no database that has real power is going to run very well without some administration, period. The last place to put house keeping is in the end of my data dumps. pg_dump's job is to dump the data from my database in a format that is as transportable as possible. not to hold my hand the next time I need to load data into my own database. While I fully support a switch like -z on pg_dump that puts an analyze on the end of my dumps if I so choose, I don't want them showing up automatically and me wondering if the data feeds I make for other will work. I can see junior dbas who don't understand vacuum and analyze recommending to people that they need to dump / restore their whole database once a week to get good performance if we add aht analyze switch to the end of the pg_dump file. NOT a good thing. :-) anywho, I don't post much here, cause I don't hack postgresql that much, but I love this database, and I don't want it filled up with useless marketing cruft like analyze being haphazardly tacked onto the pg_dump output, so my vote is a great big NO.