Thread: Contrib reindex script:
I modeled this after the vacuumdb script that is distributed in the postgresql-base distribution. It can reindex a whole database, specific tables, or just a single index. Being as we've had so much trouble with things growing out of control, and the group seems to have tracked it down to indexes, here it is. I don't understand why we have a vacuumdb script, but not a reindex script, considering the fact that vacuum doesn't clean up indexes. Anyway, I didn't bother to comment it, I just copied vacuumdb and changed relevant parts. It's not pretty, but it works. Maybe later I'll produce one that doesn't look like it was coughed up by a robot. -- +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ | Shaun M. Thomas INN Database Administrator | | Phone: (309) 743-0812 Fax : (309) 743-0830 | | Email: sthomas@townnews.com AIM : trifthen | | Web : www.townnews.com | | | | "Most of our lives are about proving something, either to | | ourselves or to someone else." | | -- Anonymous | +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Attachment
Shaun Thomas <sthomas@townnews.com> writes: > I modeled this after the vacuumdb script that is distributed in the > postgresql-base distribution. It can reindex a whole database, specific > tables, or just a single index. Being as we've had so much trouble with > things growing out of control, and the group seems to have tracked it > down to indexes, here it is. > > I don't understand why we have a vacuumdb script, but not a reindex > script, considering the fact that vacuum doesn't clean up indexes. > > Anyway, I didn't bother to comment it, I just copied vacuumdb and > changed relevant parts. It's not pretty, but it works. Maybe later > I'll produce one that doesn't look like it was coughed up by a robot. Hey thanks, I was just wondering if such a thing were available. How has your migration to 7.2? Are you still have problems with the database growing out of control? Jason
On 7 May 2002, Jason Earl wrote: > Hey thanks, I was just wondering if such a thing were available. Here also is an updated version of the script. I've cleaned up some of the code, heavily commented it, and fixed a bug or two. Remember, this is basically just vacuumdb, so it'll take all the same parameters (except the obvious ones like -z and -f) and you can put it in your postgres bin directory. I have a line in cron that runs it every two hours with the -a option with the same user that owns the install. Works great! > > How has your migration to 7.2? Are you still have problems with the > database growing out of control? Actually, now that I have this script to basically be a vacuum script for indexes to go along with vacuumdb, I've arrested the database growth. The database I was groaning about before is standing firm around 87MB instead of the 300MB it would normally be by now. So far, 7.2 is fine. Database load is a non issue, growth is gone thanks to the reindex script, and I couldn't be happier. Now the real question is, why can't Postgres have a monitor that does these two things (vacuum, reindex) automatically throughout the day? Something that just trawls the tables doing a continuous partial vacuum, and triggers on deletes and updates to keep the indexes consistant. If what they say is true about row reuse, then I could get rid of both my vacuumdb and reindexdb scripts at that point. 7.3? -- +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ | Shaun M. Thomas INN Database Administrator | | Phone: (309) 743-0812 Fax : (309) 743-0830 | | Email: sthomas@townnews.com AIM : trifthen | | Web : www.townnews.com | | | | "Most of our lives are about proving something, either to | | ourselves or to someone else." | | -- Anonymous | +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
Attachment
Shaun Thomas <sthomas@townnews.com> writes: > On 7 May 2002, Jason Earl wrote: > > > Hey thanks, I was just wondering if such a thing were available. > > Here also is an updated version of the script. I've cleaned up some > of the code, heavily commented it, and fixed a bug or two. > Remember, this is basically just vacuumdb, so it'll take all the > same parameters (except the obvious ones like -z and -f) and you can > put it in your postgres bin directory. I have a line in cron that > runs it every two hours with the -a option with the same user that > owns the install. > > Works great! Perfect. Most of my tables don't get too much turnover, my application is basically insert-heavy (which is why I went with PostgreSQL in the first place), so I don't really have problems with indexes growing out of control. But with automated scripts to maintain the database PostgreSQL can be used in a lot more applications. > > How has your migration to 7.2? Are you still have problems with > > the database growing out of control? > > Actually, now that I have this script to basically be a vacuum > script for indexes to go along with vacuumdb, I've arrested the > database growth. The database I was groaning about before is > standing firm around 87MB instead of the 300MB it would normally be > by now. That is good to know. My own tests have shown that PostgreSQL is now to the point where I consider it to be more than acceptable for use in 24/7 applications, but I don't have the real-world experience with high turnover tables that you have. > So far, 7.2 is fine. Database load is a non issue, growth is gone > thanks to the reindex script, and I couldn't be happier. Now the > real question is, why can't Postgres have a monitor that does these > two things (vacuum, reindex) automatically throughout the day? > Something that just trawls the tables doing a continuous partial > vacuum, and triggers on deletes and updates to keep the indexes > consistant. I believe that this particular issue has come up on hackers a couple of times. However, with workable vacuum and reindex scripts this sort of thing shouldn't be too hard to automate. There is something to be said for a more flexible manual solution as it allows the DBA to vacuum and reindex according to their own needs. Then again, perhaps I am just biased due to the fact that I like software with lots of buttons and levers to play with. Worse comes to worse a few cron jobs and you pretty much have a monitor that automatically reindexes and vacuums throughout the day :). > If what they say is true about row reuse, then I could get rid of > both my vacuumdb and reindexdb scripts at that point. > > 7.3? Take Care, Jason
Just want to say thanks for the reindex script. Today I used it as the basis of a perl script which I've cron'ed and produces a daily report of table/index name and size for all tables/indexes in my db so I can see if things are growing out of control. Thanks to this new tool, I found several out of control indexes. One example: I had a 55 megabyte index which reduced to 44k. Overall, I reduced my db size from 550 megs to 300. Additionally, the web interface which this database is powering is much more responsive. We have several high-turnover tables. We have one of only about 1500 rows but which experiences nearly 100% turnover every 5 minutes. As a first-time PostgreSQL administrator, I've been nagging myself to figure out a way to monitor db growth in this manner. Of course, there was always something else to work on at the time. Thanks again, Fran
I noticed that it is possible to build an index on a function to speed retrieval of data. The example in the docs was a create index on a name field forced to lower case. Would it be possible to make an index on an aggregate to speed the return of aggregate info, while keeping it up to date as the data changes? I know that index maintenance would be HUGE overhead, but on mostly static data that gets occasional updates, it could be good.
On Wed, 8 May 2002, grant wrote: > I noticed that it is possible to build an index on a function to speed > retrieval of data. The example in the docs was a create index on a name > field forced to lower case. Would it be possible to make an index on an > aggregate to speed the return of aggregate info, while keeping it up to > date as the data changes? I know that index maintenance would be HUGE > overhead, but on mostly static data that gets occasional updates, it could > be good. No, it isn't possible to create an index on a function. -- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
On Wed, 8 May 2002, grant wrote: > I noticed that it is possible to build an index on a function to speed > retrieval of data. The example in the docs was a create index on a name > field forced to lower case. Would it be possible to make an index on an > aggregate to speed the return of aggregate info, while keeping it up to > date as the data changes? I know that index maintenance would be HUGE > overhead, but on mostly static data that gets occasional updates, it could > be good. You can't currently do that with a functional index, but you should be able to do that via triggers.
Given the real file growth problems we have analyzed over the past month, I think we need to rethink the tools we give to users. First, I think we need a section in the admin/maintenance manual talking about reindex. (We already have vacuum there, and I think reindex is something that periodically also should be done.) Second, reindex on an entire database only works for stand-alone databases. Now, I know we can't reindex system tables on a live database, but should we enable REINDEX all non-system tables for a database, similar to how we use vacuum? Do we need the reindex script like vacuumdb? --------------------------------------------------------------------------- Shaun Thomas wrote: > I modeled this after the vacuumdb script that is distributed in the > postgresql-base distribution. It can reindex a whole database, specific > tables, or just a single index. Being as we've had so much trouble with > things growing out of control, and the group seems to have tracked it > down to indexes, here it is. > > I don't understand why we have a vacuumdb script, but not a reindex > script, considering the fact that vacuum doesn't clean up indexes. > > Anyway, I didn't bother to comment it, I just copied vacuumdb and > changed relevant parts. It's not pretty, but it works. Maybe later > I'll produce one that doesn't look like it was coughed up by a robot. > > -- > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > | Shaun M. Thomas INN Database Administrator | > | Phone: (309) 743-0812 Fax : (309) 743-0830 | > | Email: sthomas@townnews.com AIM : trifthen | > | Web : www.townnews.com | > | | > | "Most of our lives are about proving something, either to | > | ourselves or to someone else." | > | -- Anonymous | > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
If you are going to implement this is there a way to do it so that it doesn't lock the tables? We have had to reindex on a regular basis and it locks up the tables when we do. The same problem with the old vacuum David Blood -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bruce Momjian Sent: Thursday, June 13, 2002 11:13 AM To: Shaun Thomas Cc: PostgreSQL general list Subject: Re: [GENERAL] Contrib reindex script: Given the real file growth problems we have analyzed over the past month, I think we need to rethink the tools we give to users. First, I think we need a section in the admin/maintenance manual talking about reindex. (We already have vacuum there, and I think reindex is something that periodically also should be done.) Second, reindex on an entire database only works for stand-alone databases. Now, I know we can't reindex system tables on a live database, but should we enable REINDEX all non-system tables for a database, similar to how we use vacuum? Do we need the reindex script like vacuumdb? ------------------------------------------------------------------------ --- Shaun Thomas wrote: > I modeled this after the vacuumdb script that is distributed in the > postgresql-base distribution. It can reindex a whole database, specific > tables, or just a single index. Being as we've had so much trouble with > things growing out of control, and the group seems to have tracked it > down to indexes, here it is. > > I don't understand why we have a vacuumdb script, but not a reindex > script, considering the fact that vacuum doesn't clean up indexes. > > Anyway, I didn't bother to comment it, I just copied vacuumdb and > changed relevant parts. It's not pretty, but it works. Maybe later > I'll produce one that doesn't look like it was coughed up by a robot. > > -- > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > | Shaun M. Thomas INN Database Administrator | > | Phone: (309) 743-0812 Fax : (309) 743-0830 | > | Email: sthomas@townnews.com AIM : trifthen | > | Web : www.townnews.com | > | | > | "Most of our lives are about proving something, either to | > | ourselves or to someone else." | > | -- Anonymous | > +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+ > Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Thu, 13 Jun 2002 13:13:29 -0400 (EDT) "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > First, I think we need a section in the admin/maintenance manual talking > about reindex. (We already have vacuum there, and I think reindex is > something that periodically also should be done.) One simple improvement would be to change the documentation's view of REINDEX -- at the moment, it says that it's only useful for rebuilding "corrupted indexes". If it's a legitimate maintainence activity for certain query loads, it should get a more positive spin. Also, could we enhance REINDEX to not require an exclusive table lock? Without looking at the code, I would think that we could at least allow for SELECTs (using seqscans or other indexes) while a REINDEX is running. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway <nconway@klamath.dyndns.org> writes: > Also, could we enhance REINDEX to not require an exclusive table lock? > Without looking at the code, I would think that we could at least allow > for SELECTs (using seqscans or other indexes) while a REINDEX is running. Not unless you see a way to prevent those SELECTs from trying to use the index... Personally I think putting development effort into REINDEX is a waste of manpower. Study how to make btree collapse out empty pages, instead. regards, tom lane
Tom Lane wrote: > Neil Conway <nconway@klamath.dyndns.org> writes: > > Also, could we enhance REINDEX to not require an exclusive table lock? > > Without looking at the code, I would think that we could at least allow > > for SELECTs (using seqscans or other indexes) while a REINDEX is running. > > Not unless you see a way to prevent those SELECTs from trying to use the > index... Well, we should recreate the index as a separate file, then switch over to the new file use relfilenode. > Personally I think putting development effort into REINDEX is a waste of > manpower. Study how to make btree collapse out empty pages, instead. Agreed, but what plans do we have to fix it? I have heard no proposals, and it has been a known problem for years. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> Personally I think putting development effort into REINDEX is a waste of >> manpower. Study how to make btree collapse out empty pages, instead. > Agreed, but what plans do we have to fix it? I have heard no proposals, > and it has been a known problem for years. Lots of important things have been on the TODO list for years. Diverting effort into third-best substitutes isn't going to help get them done faster. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> Personally I think putting development effort into REINDEX is a waste of > >> manpower. Study how to make btree collapse out empty pages, instead. > > > Agreed, but what plans do we have to fix it? I have heard no proposals, > > and it has been a known problem for years. > > Lots of important things have been on the TODO list for years. > Diverting effort into third-best substitutes isn't going to help get > them done faster. Well, not providing tools for people to work around the problem seems wrong too. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> Diverting effort into third-best substitutes isn't going to help get >> them done faster. > Well, not providing tools for people to work around the problem seems > wrong too. We *have* such tools: REINDEX is certainly usable as-is. The improvements proposed in this thread are just polishing a buggy whip that will be obsolete as soon as someone tackles the real problem. I don't object to adding the already-submitted reindex script into contrib; that costs nothing. But the time of people who would be able to improve the backend implementation of REINDEX is valuable, and would be better spent elsewhere. IMHO anyway. regards, tom lane
Shaun Thomas wrote: > On 7 May 2002, Jason Earl wrote: > > > Hey thanks, I was just wondering if such a thing were available. > > Here also is an updated version of the script. I've cleaned up some of > the code, heavily commented it, and fixed a bug or two. Remember, this > is basically just vacuumdb, so it'll take all the same parameters > (except the obvious ones like -z and -f) and you can put it in your > postgres bin directory. I have a line in cron that runs it every two > hours with the -a option with the same user that owns the install. > > Works great! OK, I have added your reindex script to CVS /contrib. > > How has your migration to 7.2? Are you still have problems with the > > database growing out of control? > > Actually, now that I have this script to basically be a vacuum script > for indexes to go along with vacuumdb, I've arrested the database > growth. The database I was groaning about before is standing firm > around 87MB instead of the 300MB it would normally be by now. > > So far, 7.2 is fine. Database load is a non issue, growth is gone > thanks to the reindex script, and I couldn't be happier. Now the real > question is, why can't Postgres have a monitor that does these two > things (vacuum, reindex) automatically throughout the day? Something > that just trawls the tables doing a continuous partial vacuum, and > triggers on deletes and updates to keep the indexes consistant. Yep, it would be nice. I now realize there isn't even a way to see index wastage. I think Tatsuo was working on such a script for contrib. I have also added information to the SGML manual under maintenance: <para> <productname>PostgreSQL</productname> is unable to reuse btree index pages in certain cases. The problem is that if indexed rows are deleted, those index pages can only be reused by rows with similar values. For example, if indexed rows are deleted and newly inserted/updated rows have much higher values, the new rows can't use the index space made available by the deleted rows. Instead, such new rows must be placed on new index pages. In such cases, disk space used by the index will grow indefinately, even if <command>VACUUM</> is run frequently. </para> <para> As a solution, you can use the <command>REINDEX</> command periodically to discard pages used by deleted rows. There is also <filename>contrib/reindex</> which can reindex an entire database. </para> -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Neil Conway wrote: > On Thu, 13 Jun 2002 13:13:29 -0400 (EDT) > "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > > First, I think we need a section in the admin/maintenance manual talking > > about reindex. (We already have vacuum there, and I think reindex is > > something that periodically also should be done.) > > One simple improvement would be to change the documentation's view of > REINDEX -- at the moment, it says that it's only useful for rebuilding > "corrupted indexes". If it's a legitimate maintainence activity for > certain query loads, it should get a more positive spin. Done. > Also, could we enhance REINDEX to not require an exclusive table lock? > Without looking at the code, I would think that we could at least allow > for SELECTs (using seqscans or other indexes) while a REINDEX is running. Yes, that would be nice. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026