Thread: Contrib reindex script:

Contrib reindex script:

From
Shaun Thomas
Date:
I modeled this after the vacuumdb script that is distributed in the
postgresql-base distribution.  It can reindex a whole database, specific
tables, or just a single index.  Being as we've had so much trouble with
things growing out of control, and the group seems to have tracked it
down to indexes, here it is.

I don't understand why we have a vacuumdb script, but not a reindex
script, considering the fact that vacuum doesn't clean up indexes.

Anyway, I didn't bother to comment it, I just copied vacuumdb and
changed relevant parts.  It's not pretty, but it works.  Maybe later
I'll produce one that doesn't look like it was coughed up by a robot.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+


Attachment

Re: Contrib reindex script:

From
Jason Earl
Date:
Shaun Thomas <sthomas@townnews.com> writes:

> I modeled this after the vacuumdb script that is distributed in the
> postgresql-base distribution.  It can reindex a whole database, specific
> tables, or just a single index.  Being as we've had so much trouble with
> things growing out of control, and the group seems to have tracked it
> down to indexes, here it is.
>
> I don't understand why we have a vacuumdb script, but not a reindex
> script, considering the fact that vacuum doesn't clean up indexes.
>
> Anyway, I didn't bother to comment it, I just copied vacuumdb and
> changed relevant parts.  It's not pretty, but it works.  Maybe later
> I'll produce one that doesn't look like it was coughed up by a robot.

Hey thanks, I was just wondering if such a thing were available.

How has your migration to 7.2?  Are you still have problems with the
database growing out of control?

Jason

Re: Contrib reindex script:

From
Shaun Thomas
Date:
On 7 May 2002, Jason Earl wrote:

> Hey thanks, I was just wondering if such a thing were available.

Here also is an updated version of the script.  I've cleaned up some of
the code, heavily commented it, and fixed a bug or two.  Remember, this
is basically just vacuumdb, so it'll take all the same parameters
(except the obvious ones like -z and -f) and you can put it in your
postgres bin directory.  I have a line in cron that runs it every two
hours with the -a option with the same user that owns the install.

Works great!

>
> How has your migration to 7.2?  Are you still have problems with the
> database growing out of control?

Actually, now that I have this script to basically be a vacuum script
for indexes to go along with vacuumdb, I've arrested the database
growth.  The database I was groaning about before is standing firm
around 87MB instead of the 300MB it would normally be by now.

So far, 7.2 is fine.  Database load is a non issue, growth is gone
thanks to the reindex script, and I couldn't be happier.  Now the real
question is, why can't Postgres have a monitor that does these two
things (vacuum, reindex) automatically throughout the day?  Something
that just trawls the tables doing a continuous partial vacuum, and
triggers on deletes and updates to keep the indexes consistant.

If what they say is true about row reuse, then I could get rid of both
my vacuumdb and reindexdb scripts at that point.

7.3?

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+


Attachment

Re: Contrib reindex script:

From
Jason Earl
Date:
Shaun Thomas <sthomas@townnews.com> writes:

> On 7 May 2002, Jason Earl wrote:
>
> > Hey thanks, I was just wondering if such a thing were available.
>
> Here also is an updated version of the script.  I've cleaned up some
> of the code, heavily commented it, and fixed a bug or two.
> Remember, this is basically just vacuumdb, so it'll take all the
> same parameters (except the obvious ones like -z and -f) and you can
> put it in your postgres bin directory.  I have a line in cron that
> runs it every two hours with the -a option with the same user that
> owns the install.
>
> Works great!

Perfect.  Most of my tables don't get too much turnover, my
application is basically insert-heavy (which is why I went with
PostgreSQL in the first place), so I don't really have problems with
indexes growing out of control.  But with automated scripts to
maintain the database PostgreSQL can be used in a lot more
applications.

> > How has your migration to 7.2?  Are you still have problems with
> > the database growing out of control?
>
> Actually, now that I have this script to basically be a vacuum
> script for indexes to go along with vacuumdb, I've arrested the
> database growth.  The database I was groaning about before is
> standing firm around 87MB instead of the 300MB it would normally be
> by now.

That is good to know.  My own tests have shown that PostgreSQL is now
to the point where I consider it to be more than acceptable for use in
24/7 applications, but I don't have the real-world experience with
high turnover tables that you have.

> So far, 7.2 is fine.  Database load is a non issue, growth is gone
> thanks to the reindex script, and I couldn't be happier.  Now the
> real question is, why can't Postgres have a monitor that does these
> two things (vacuum, reindex) automatically throughout the day?
> Something that just trawls the tables doing a continuous partial
> vacuum, and triggers on deletes and updates to keep the indexes
> consistant.

I believe that this particular issue has come up on hackers a couple
of times.  However, with workable vacuum and reindex scripts this sort
of thing shouldn't be too hard to automate.  There is something to be
said for a more flexible manual solution as it allows the DBA to
vacuum and reindex according to their own needs.  Then again, perhaps
I am just biased due to the fact that I like software with lots of
buttons and levers to play with.

Worse comes to worse a few cron jobs and you pretty much have a
monitor that automatically reindexes and vacuums throughout the day
:).

> If what they say is true about row reuse, then I could get rid of
> both my vacuumdb and reindexdb scripts at that point.
>
> 7.3?

Take Care,
Jason

Re: Contrib reindex script:

From
Fran Fabrizio
Date:
Just want to say thanks for the reindex script.  Today I used it as the
basis of a perl script which I've cron'ed and produces a daily report of
table/index name and size for all tables/indexes in my db so I can see
if things are growing out of control.  Thanks to this new tool, I found
several out of control indexes.  One example:  I had a 55 megabyte index
which reduced to 44k.  Overall, I reduced my db size from 550 megs to
300.  Additionally, the  web interface which this database is powering
is much more responsive.

We have several high-turnover tables.  We have one of only about 1500
rows but which experiences nearly 100% turnover every 5 minutes.

As a first-time PostgreSQL administrator, I've been nagging myself to
figure out a way to monitor db growth in this manner.  Of course, there
was always something else to work on at the time.

Thanks again,
Fran


OLAP idea

From
grant
Date:
I noticed that it is possible to build an index on a function to speed
retrieval of data.  The example in the docs was a create index on a name
field forced to lower case.  Would it be possible to make an index on an
aggregate to speed the return of aggregate info, while keeping it up to
date as the data changes?  I know that index maintenance would be HUGE
overhead, but on mostly static data that gets occasional updates, it could
be good.


Re: OLAP idea

From
"Nigel J. Andrews"
Date:
On Wed, 8 May 2002, grant wrote:

> I noticed that it is possible to build an index on a function to speed
> retrieval of data.  The example in the docs was a create index on a name
> field forced to lower case.  Would it be possible to make an index on an
> aggregate to speed the return of aggregate info, while keeping it up to
> date as the data changes?  I know that index maintenance would be HUGE
> overhead, but on mostly static data that gets occasional updates, it could
> be good.

No, it isn't possible to create an index on a function.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


Re: OLAP idea

From
Stephan Szabo
Date:
On Wed, 8 May 2002, grant wrote:

> I noticed that it is possible to build an index on a function to speed
> retrieval of data.  The example in the docs was a create index on a name
> field forced to lower case.  Would it be possible to make an index on an
> aggregate to speed the return of aggregate info, while keeping it up to
> date as the data changes?  I know that index maintenance would be HUGE
> overhead, but on mostly static data that gets occasional updates, it could
> be good.

You can't currently do that with a functional index, but you should be
able to do that via triggers.



Re: Contrib reindex script:

From
Bruce Momjian
Date:
Given the real file growth problems we have analyzed over the past
month, I think we need to rethink the tools we give to users.

First, I think we need a section in the admin/maintenance manual talking
about reindex.  (We already have vacuum there, and I think reindex is
something that periodically also should be done.)

Second, reindex on an entire database only works for stand-alone
databases.  Now, I know we can't reindex system tables on a live
database, but should we enable REINDEX all non-system tables  for a
database, similar to how we use vacuum?  Do we need the reindex script
like vacuumdb?

---------------------------------------------------------------------------

Shaun Thomas wrote:
> I modeled this after the vacuumdb script that is distributed in the
> postgresql-base distribution.  It can reindex a whole database, specific
> tables, or just a single index.  Being as we've had so much trouble with
> things growing out of control, and the group seems to have tracked it
> down to indexes, here it is.
>
> I don't understand why we have a vacuumdb script, but not a reindex
> script, considering the fact that vacuum doesn't clean up indexes.
>
> Anyway, I didn't bother to comment it, I just copied vacuumdb and
> changed relevant parts.  It's not pretty, but it works.  Maybe later
> I'll produce one that doesn't look like it was coughed up by a robot.
>
> --
> +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
> | Shaun M. Thomas                INN Database Administrator           |
> | Phone: (309) 743-0812          Fax  : (309) 743-0830                |
> | Email: sthomas@townnews.com    AIM  : trifthen                      |
> | Web  : www.townnews.com                                             |
> |                                                                     |
> |     "Most of our lives are about proving something, either to       |
> |      ourselves or to someone else."                                 |
> |                                           -- Anonymous              |
> +-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
>

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Contrib reindex script:

From
"David Blood"
Date:
If you are going to implement this is there a way to do it so that it
doesn't lock the tables?  We have had to reindex on a regular basis and
it locks up the tables when we do. The same problem with the old vacuum

David Blood

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bruce Momjian
Sent: Thursday, June 13, 2002 11:13 AM
To: Shaun Thomas
Cc: PostgreSQL general list
Subject: Re: [GENERAL] Contrib reindex script:


Given the real file growth problems we have analyzed over the past
month, I think we need to rethink the tools we give to users.

First, I think we need a section in the admin/maintenance manual talking
about reindex.  (We already have vacuum there, and I think reindex is
something that periodically also should be done.)

Second, reindex on an entire database only works for stand-alone
databases.  Now, I know we can't reindex system tables on a live
database, but should we enable REINDEX all non-system tables  for a
database, similar to how we use vacuum?  Do we need the reindex script
like vacuumdb?

------------------------------------------------------------------------
---

Shaun Thomas wrote:
> I modeled this after the vacuumdb script that is distributed in the
> postgresql-base distribution.  It can reindex a whole database,
specific
> tables, or just a single index.  Being as we've had so much trouble
with
> things growing out of control, and the group seems to have tracked it
> down to indexes, here it is.
>
> I don't understand why we have a vacuumdb script, but not a reindex
> script, considering the fact that vacuum doesn't clean up indexes.
>
> Anyway, I didn't bother to comment it, I just copied vacuumdb and
> changed relevant parts.  It's not pretty, but it works.  Maybe later
> I'll produce one that doesn't look like it was coughed up by a robot.
>
> --
>
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
> | Shaun M. Thomas                INN Database Administrator
|
> | Phone: (309) 743-0812          Fax  : (309) 743-0830
|
> | Email: sthomas@townnews.com    AIM  : trifthen
|
> | Web  : www.townnews.com
|
> |
|
> |     "Most of our lives are about proving something, either to
|
> |      ourselves or to someone else."
|
> |                                           -- Anonymous
|
>
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
>

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of
broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania
19026

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)




Re: Contrib reindex script:

From
Neil Conway
Date:
On Thu, 13 Jun 2002 13:13:29 -0400 (EDT)
"Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> First, I think we need a section in the admin/maintenance manual talking
> about reindex.  (We already have vacuum there, and I think reindex is
> something that periodically also should be done.)

One simple improvement would be to change the documentation's view of
REINDEX -- at the moment, it says that it's only useful for rebuilding
"corrupted indexes". If it's a legitimate maintainence activity for
certain query loads, it should get a more positive spin.

Also, could we enhance REINDEX to not require an exclusive table lock?
Without looking at the code, I would think that we could at least allow
for SELECTs (using seqscans or other indexes) while a REINDEX is running.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

Re: Contrib reindex script:

From
Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> Also, could we enhance REINDEX to not require an exclusive table lock?
> Without looking at the code, I would think that we could at least allow
> for SELECTs (using seqscans or other indexes) while a REINDEX is running.

Not unless you see a way to prevent those SELECTs from trying to use the
index...

Personally I think putting development effort into REINDEX is a waste of
manpower.  Study how to make btree collapse out empty pages, instead.

            regards, tom lane

Re: Contrib reindex script:

From
Bruce Momjian
Date:
Tom Lane wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > Also, could we enhance REINDEX to not require an exclusive table lock?
> > Without looking at the code, I would think that we could at least allow
> > for SELECTs (using seqscans or other indexes) while a REINDEX is running.
>
> Not unless you see a way to prevent those SELECTs from trying to use the
> index...

Well, we should recreate the index as a separate file, then switch over
to the new file use relfilenode.

> Personally I think putting development effort into REINDEX is a waste of
> manpower.  Study how to make btree collapse out empty pages, instead.

Agreed, but what plans do we have to fix it?  I have heard no proposals,
and it has been a known problem for years.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Contrib reindex script:

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Personally I think putting development effort into REINDEX is a waste of
>> manpower.  Study how to make btree collapse out empty pages, instead.

> Agreed, but what plans do we have to fix it?  I have heard no proposals,
> and it has been a known problem for years.

Lots of important things have been on the TODO list for years.
Diverting effort into third-best substitutes isn't going to help get
them done faster.

            regards, tom lane

Re: Contrib reindex script:

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Personally I think putting development effort into REINDEX is a waste of
> >> manpower.  Study how to make btree collapse out empty pages, instead.
>
> > Agreed, but what plans do we have to fix it?  I have heard no proposals,
> > and it has been a known problem for years.
>
> Lots of important things have been on the TODO list for years.
> Diverting effort into third-best substitutes isn't going to help get
> them done faster.

Well, not providing tools for people to work around the problem seems
wrong too.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Contrib reindex script:

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Diverting effort into third-best substitutes isn't going to help get
>> them done faster.

> Well, not providing tools for people to work around the problem seems
> wrong too.

We *have* such tools: REINDEX is certainly usable as-is.  The
improvements proposed in this thread are just polishing a buggy whip
that will be obsolete as soon as someone tackles the real problem.

I don't object to adding the already-submitted reindex script into
contrib; that costs nothing.  But the time of people who would be able
to improve the backend implementation of REINDEX is valuable, and
would be better spent elsewhere.  IMHO anyway.

            regards, tom lane

Re: Contrib reindex script:

From
Bruce Momjian
Date:
Shaun Thomas wrote:
> On 7 May 2002, Jason Earl wrote:
>
> > Hey thanks, I was just wondering if such a thing were available.
>
> Here also is an updated version of the script.  I've cleaned up some of
> the code, heavily commented it, and fixed a bug or two.  Remember, this
> is basically just vacuumdb, so it'll take all the same parameters
> (except the obvious ones like -z and -f) and you can put it in your
> postgres bin directory.  I have a line in cron that runs it every two
> hours with the -a option with the same user that owns the install.
>
> Works great!

OK, I have added your reindex script to CVS /contrib.

> > How has your migration to 7.2?  Are you still have problems with the
> > database growing out of control?
>
> Actually, now that I have this script to basically be a vacuum script
> for indexes to go along with vacuumdb, I've arrested the database
> growth.  The database I was groaning about before is standing firm
> around 87MB instead of the 300MB it would normally be by now.
>
> So far, 7.2 is fine.  Database load is a non issue, growth is gone
> thanks to the reindex script, and I couldn't be happier.  Now the real
> question is, why can't Postgres have a monitor that does these two
> things (vacuum, reindex) automatically throughout the day?  Something
> that just trawls the tables doing a continuous partial vacuum, and
> triggers on deletes and updates to keep the indexes consistant.

Yep, it would be nice.  I now realize there isn't even a way to see
index wastage.  I think Tatsuo was working on such a script for contrib.

I have also added information to the SGML manual under maintenance:

  <para>
   <productname>PostgreSQL</productname> is unable to reuse btree index
   pages in certain cases. The problem is that if indexed rows are
   deleted, those index pages can only be reused by rows with similar
   values. For example, if indexed rows are deleted and newly
   inserted/updated rows have much higher values, the new rows can't use
   the index space made available by the deleted rows. Instead, such
   new rows must be placed on new index pages. In such cases, disk
   space used by the index will grow indefinately, even if
   <command>VACUUM</> is run frequently.
  </para>
  <para>
   As a solution, you can use the <command>REINDEX</> command
   periodically to discard pages used by deleted rows. There is also
   <filename>contrib/reindex</> which can reindex an entire database.
  </para>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Contrib reindex script:

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Thu, 13 Jun 2002 13:13:29 -0400 (EDT)
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> > First, I think we need a section in the admin/maintenance manual talking
> > about reindex.  (We already have vacuum there, and I think reindex is
> > something that periodically also should be done.)
>
> One simple improvement would be to change the documentation's view of
> REINDEX -- at the moment, it says that it's only useful for rebuilding
> "corrupted indexes". If it's a legitimate maintainence activity for
> certain query loads, it should get a more positive spin.

Done.

> Also, could we enhance REINDEX to not require an exclusive table lock?
> Without looking at the code, I would think that we could at least allow
> for SELECTs (using seqscans or other indexes) while a REINDEX is running.

Yes, that would be nice.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026