Thread: How often do I need to reindex tables?
I am planning to use 8.2, and the average rate of inserts/deletes/updates across all tables is moderate. That is, it is a moderate-sized database with moderate usage of its tables. Given that, how often do I need to reindex the tables? Do I need to do it every day? Also, with 8.2 my understanding is that I do not have to run vacuum manually anymore. Does it do auto-vacuum? Thanks in advance, Dhaval Shah
On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > I am planning to use 8.2 and the average inserts/deletes and updates > across all tables is moderate. That is, it is a moderate sized > database with moderate usage of tables. > > Given that, how often do I need to reindex the tables? Do I need to do > it everyday? No, you should very rarely if ever need to do it. If you're really concerned, I suggest monitoring average tuples per index page; something like SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND relpages > 1000; That'll show tuples/page for all indexes over 8MB in size. > Also with 8.2, I do not have to do vacuum anymore or that is what I > understand. Does it do auto-vacuum? You still need to enable autovacuum; see the autovacuum setting in postgresql.conf. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
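For 8.2, turning autovacuum on is a postgresql.conf change. A minimal sketch, with parameter names taken from the 8.2 documentation and the tuning values shown purely as illustration:

    # postgresql.conf (8.2): autovacuum needs the row-level statistics collector
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on
    # optional tuning; the defaults are usually fine for a moderate workload
    autovacuum_naptime = 1min
    autovacuum_vacuum_scale_factor = 0.2

Note that changing stats_start_collector requires a server restart.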
On 02/27/07 13:26, Dhaval Shah wrote: > I am planning to use 8.2 and the average inserts/deletes and updates > across all tables is moderate. That is, it is a moderate sized > database with moderate usage of tables. Moderate?
This select doesn't return any rows. What does it mean? Ezequias. 2007/2/27, Jim C. Nasby <jim@nasby.net>: > On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > > I am planning to use 8.2 and the average inserts/deletes and updates > > across all tables is moderate. That is, it is a moderate sized > > database with moderate usage of tables. > > > > Given that, how often do I need to reindex the tables? Do I need to do > > it everyday? > > No, you should very rarely if ever need to do it. > > If you're really concerned, I suggest monitoring average tuples per > index page; something like > > SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND > relpages > 1000; > > That'll show tuples/page for all indexes over 8MB in size. > > > Also with 8.2, I do not have to do vacuum anymore or that is what I > > understand. Does it do auto-vacuum? > > You still need to enable autovacuum. See autovacuum_enable. > -- > Jim Nasby jim@nasby.net > EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Atenciosamente (Sincerely) Ezequias Rodrigues da Rocha =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- A pior das democracias ainda é melhor do que a melhor das ditaduras The worst of democracies is still better than the better of dictatorships http://ezequiasrocha.blogspot.com/
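For what it's worth, the usual reason for an empty result there is simply that no index in the database is larger than 1000 pages (about 8 MB at the default 8 kB block size), so the relpages > 1000 filter excludes everything. A variant without the size cutoff, shown here only as a sketch, lists every index bigger than one page:

    SELECT relname, relpages, round(reltuples/relpages) AS tuples_per_page
    FROM pg_class
    WHERE relkind = 'i' AND relpages > 1
    ORDER BY relpages DESC;

Bear in mind that relpages and reltuples are only as current as the last VACUUM or ANALYZE of each table.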
In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>: > 2007/2/27, Jim C. Nasby <jim@nasby.net>: > > On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > > > I am planning to use 8.2 and the average inserts/deletes and updates > > > across all tables is moderate. That is, it is a moderate sized > > > database with moderate usage of tables. > > > > > > Given that, how often do I need to reindex the tables? Do I need to do > > > it everyday? > > > > No, you should very rarely if ever need to do it. I don't agree. I think that regular indexing is mandatory under some workloads. Example: bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname; relname | relpages -------------------------------+---------- basefiles_pkey | 1 cdimages_pkey | 1 client_name_idx | 2 client_pkey | 2 counters_pkey | 1 device_pkey | 1 file_fp_idx | 41212 [...] bacula=# reindex database bacula; [...] relname | relpages -------------------------------+---------- basefiles_pkey | 1 cdimages_pkey | 1 client_name_idx | 2 client_pkey | 2 counters_pkey | 1 device_pkey | 1 file_fp_idx | 21367 [...] There are some additional indexes that I've snipped from the output that also saw some benefit from reindexing, but let's just focus on file_fp_idx. Please note that the database you're looking at is reindexed _weekly_ by a cron job, which means the index bloat you're seeing in the above example is the result of normal activity since last Saturday. I've brought this up before, and I want to point it out again. I really think there are certain workloads that require reindexing. Luckily for this particular workload, it's easy to schedule a job to do so, since I know when the backups aren't running :) -- Bill Moran Collaborative Fusion Inc.
Bill Moran wrote:
> In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>:
>> 2007/2/27, Jim C. Nasby <jim@nasby.net>:
>>> On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
>>>> I am planning to use 8.2 and the average inserts/deletes and updates
>>>> across all tables is moderate. That is, it is a moderate sized
>>>> database with moderate usage of tables.
>>>>
>>>> Given that, how often do I need to reindex the tables? Do I need to do
>>>> it everyday?
>>> No, you should very rarely if ever need to do it.
>
> I don't agree. I think that regular indexing is mandatory under some
> workloads.

Bill, you are right but I believe Jim was speaking from a general perspective. Generally speaking you should not have to reindex, or if you do very rarely.

I too have a couple of databases we manage that require a reindex more often than what would be considered normal, but a reindex is far from the norm itself.

Joshua D. Drake

Example:
> bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname;
>             relname            | relpages
> -------------------------------+----------
>  basefiles_pkey                |        1
>  cdimages_pkey                 |        1
>  client_name_idx               |        2
>  client_pkey                   |        2
>  counters_pkey                 |        1
>  device_pkey                   |        1
>  file_fp_idx                   |    41212
> [...]
>
> bacula=# reindex database bacula;
> [...]
>
>             relname            | relpages
> -------------------------------+----------
>  basefiles_pkey                |        1
>  cdimages_pkey                 |        1
>  client_name_idx               |        2
>  client_pkey                   |        2
>  counters_pkey                 |        1
>  device_pkey                   |        1
>  file_fp_idx                   |    21367
> [...]
>
> There are some additional indexes that I've snipped from the output that also
> saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Please note that the database you're looking at is reindexed _weekly_ by a
> cron job, which means the index bloat you're seeing in the above example is
> the result of normal activity since last Saturday.
>
> I've brought this up before, and I want to point it out again. I really
> think there are certain workloads that require reindexing. Luckily for
> this particular workload, it's easy to schedule a job to do so, since I
> know when the backups aren't running :)

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Bill Moran <wmoran@collaborativefusion.com> writes: > I don't agree. I think that regular indexing is mandatory under some > workloads. Example: > ... > There are some additional indexes that I've snipped from the output that also > saw some benefit from reindexing, but let's just focus on file_fp_idx. Can you describe the usage pattern of that index? I'm curious why it doesn't maintain reasonably static size. How often is the underlying table vacuumed? regards, tom lane
In response to "Joshua D. Drake" <jd@commandprompt.com>: > Bill Moran wrote: > > In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>: > >> 2007/2/27, Jim C. Nasby <jim@nasby.net>: > >>> On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > >>>> I am planning to use 8.2 and the average inserts/deletes and updates > >>>> across all tables is moderate. That is, it is a moderate sized > >>>> database with moderate usage of tables. > >>>> > >>>> Given that, how often do I need to reindex the tables? Do I need to do > >>>> it everyday? > >>> No, you should very rarely if ever need to do it. > > > > I don't agree. I think that regular indexing is mandatory under some > > workloads. > > Bill, you are right but I believe Jim was speaking from a general > perspective. Generally speaking you should not have to reindex, or if > you do very rarely. > > I too have a couple of databases we manage that require a reindex more > often than what would be considered normal, but a reindex is far from > the norm itself. Well, I hope I didn't come across as confrontation or anything, as that wasn't my intent. The only point I was trying to make is that the need to reindex probably shouldn't be written off lightly until one has monitored the indexes for a spell to see if they need it or not. > Example: > > bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname; > > relname | relpages > > -------------------------------+---------- > > basefiles_pkey | 1 > > cdimages_pkey | 1 > > client_name_idx | 2 > > client_pkey | 2 > > counters_pkey | 1 > > device_pkey | 1 > > file_fp_idx | 41212 > > [...] > > > > bacula=# reindex database bacula; > > [...] > > > > relname | relpages > > -------------------------------+---------- > > basefiles_pkey | 1 > > cdimages_pkey | 1 > > client_name_idx | 2 > > client_pkey | 2 > > counters_pkey | 1 > > device_pkey | 1 > > file_fp_idx | 21367 > > [...] > > > > There are some additional indexes that I've snipped from the output that also > > saw some benefit from reindexing, but let's just focus on file_fp_idx. > > > > Please note that the database you're looking at is reindexed _weekly_ by a > > cron job, which means the index bloat you're seeing in the above example is > > the result of normal activity since last Saturday. > > > > I've brought this up before, and I want to point it out again. I really > > think there are certain workloads that require reindexing. Luckily for > > this particular workload, it's easy to schedule a job to do so, since I > > know when the backups aren't running :) > > > > > -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 > Providing the most comprehensive PostgreSQL solutions since 1997 > http://www.commandprompt.com/ > > Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate > PostgreSQL Replication: http://www.commandprompt.com/products/ > > > > > > > -- Bill Moran Collaborative Fusion Inc. wmoran@collaborativefusion.com Phone: 412-422-3463x4023 **************************************************************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. 
Hi, does anyone have a report or benchmark on running PostgreSQL with a modified block size? What difference does it make? Thanks. -- Ivo Nascimento Iann tech - Developing solutions with performance and security http://www.ianntech.com.br
In response to Tom Lane <tgl@sss.pgh.pa.us>:
> Bill Moran <wmoran@collaborativefusion.com> writes:
> > I don't agree. I think that regular indexing is mandatory under some
> > workloads. Example:
> > ...
> > There are some additional indexes that I've snipped from the output that also
> > saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Can you describe the usage pattern of that index? I'm curious why it
> doesn't maintain reasonably static size. How often is the underlying
> table vacuumed?

bacula=# \d file
                                Table "public.file"
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's a unique tuple for each time a file is backed up). To save space in the database, the file name and file path are stored in separate tables and referenced by an ID. This particular server has the following characteristics:

bacula=# select count(*) from file;
  count
---------
 8068956
(1 row)

bacula=# select count(*) from filename;
 count
--------
 151954
(1 row)

bacula=# select count(*) from path;
 count
-------
 49642
(1 row)

There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job runs twice a day. So you're looking at about 60,000 new rows at midnight and 60,000 new rows at noon each day. With the purge cycle, about the same number of rows are being deleted as are being added, so the table size stays pretty constant.

Because I know exactly when database activity is occurring on this system, I have autovacuum disabled, and I manually run a vacuum analyze on this database twice a day: once at 8:00 AM and again at 4:00 PM. I had to bump max_fsm_pages up to 60000 to keep vacuum effective.

Note that the index under discussion is the only one in this database that shows significant bloat. I could probably just reindex that one on a regular schedule, but since I know when the database is quiescent, there's no reason I can think of not to reindex the whole thing.

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.
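Spelled out, that maintenance schedule amounts to a crontab entry plus a couple of postgresql.conf settings; roughly as follows, with the times taken from the description above and the paths and logging purely illustrative:

    # twice-daily manual vacuum at 08:00 and 16:00
    0 8,16 * * *  psql -d bacula -c "VACUUM ANALYZE;" >> /var/log/bacula-vacuum.log 2>&1

    # postgresql.conf
    autovacuum = off
    max_fsm_pages = 60000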
Bill Moran <wmoran@collaborativefusion.com> writes: > In response to Tom Lane <tgl@sss.pgh.pa.us>: >> Can you describe the usage pattern of that index? I'm curious why it >> doesn't maintain reasonably static size. How often is the underlying >> table vacuumed? > ... > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > runs twice a day. So you're looking at about 60,000 new rows at midnight > and 60,000 new rows at noon each day. With the purge cycle, about the > same number of rows are being deleted as are being added, so the table > size stays pretty constant. > ... > Note that the index under discussion is the only one in this database that > shows significant bloat. Yeah, and there's no obvious reason in what you say why this one should bloat either. Can you say anything about the distribution of the index columns --- are you working with a fairly static set of filenameids, or does that change over time? How about the pathids? How does the combination of filenameid x pathid behave? A bit of quick arithmetic says that the minimum possible size of that index (at 100% fill factor) would be about 20K pages. What you showed us was that it had expanded to 40-some K pages, or a bit under 50% fill factor. This is low but not totally out of line; the traditional rule of thumb is that the steady state fill factor will be about 2/3rds for a heavily updated btree. If you leave it go, does it continue to get larger, or stay around 40K? regards, tom lane
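To spell out that arithmetic as a rough sketch: with two int4 key columns, each leaf entry costs about 8 bytes of key data plus roughly 8 bytes of index tuple header and a 4 byte line pointer, call it 20 bytes, so an 8192 byte page holds on the order of 400 entries at 100% fill. A query along these lines compares that floor against reality (the 400 entries/page figure is the assumption here):

    SELECT c.relname,
           c.reltuples,
           ceil(c.reltuples / 400.0)  AS approx_min_pages,   -- 100% fill factor
           c.relpages                 AS actual_pages,
           round(100.0 * ceil(c.reltuples / 400.0) / c.relpages) AS approx_fill_pct
    FROM pg_class c
    WHERE c.relname = 'file_fp_idx';

For roughly 8.07 million rows that works out to about 20K pages minimum, matching the estimate above, and the observed 41212 pages is therefore just under 50% fill.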
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Can you describe the usage pattern of that index? I'm curious why it > >> doesn't maintain reasonably static size. How often is the underlying > >> table vacuumed? > > ... > > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > > runs twice a day. So you're looking at about 60,000 new rows at midnight > > and 60,000 new rows at noon each day. With the purge cycle, about the > > same number of rows are being deleted as are being added, so the table > > size stays pretty constant. > > ... > > Note that the index under discussion is the only one in this database that > > shows significant bloat. > > Yeah, and there's no obvious reason in what you say why this one should > bloat either. Can you say anything about the distribution of the index > columns --- are you working with a fairly static set of filenameids, or > does that change over time? How about the pathids? How does the > combination of filenameid x pathid behave? My expectation would be that the distribution stays fairly constant and that filenameids and pathids don't get added in any great number. Most of the servers that are being backed up are not going to see the file names or paths change very much, just the contents of those files. > A bit of quick arithmetic says that the minimum possible size of that > index (at 100% fill factor) would be about 20K pages. What you showed > us was that it had expanded to 40-some K pages, or a bit under 50% fill > factor. This is low but not totally out of line; the traditional rule > of thumb is that the steady state fill factor will be about 2/3rds for a > heavily updated btree. If you leave it go, does it continue to get > larger, or stay around 40K? I don't remember how big it was getting before I added that cron job. I'll remove the cron job and replace it with one that emails me the page size of that index every week. I'll let it go for a few weeks and see how it manages. -- Bill Moran Collaborative Fusion Inc.
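The monitoring job can be as simple as something like this, with the recipient address and schedule being illustrative details:

    # mail the current size of file_fp_idx (relpages is counted in 8 kB blocks)
    0 6 * * *  psql -At -d bacula -c "select now()::date, relname, relpages from pg_class where relname = 'file_fp_idx';" | mail -s "file_fp_idx size" wmoran@collaborativefusion.com

Remember that relpages is only refreshed by VACUUM and ANALYZE, so with the twice-daily vacuum schedule described earlier the reported number can be up to half a day stale.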
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Can you describe the usage pattern of that index? I'm curious why it > >> doesn't maintain reasonably static size. How often is the underlying > >> table vacuumed? > > ... > > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > > runs twice a day. So you're looking at about 60,000 new rows at midnight > > and 60,000 new rows at noon each day. With the purge cycle, about the > > same number of rows are being deleted as are being added, so the table > > size stays pretty constant. > > ... > > Note that the index under discussion is the only one in this database that > > shows significant bloat. > > Yeah, and there's no obvious reason in what you say why this one should > bloat either. Can you say anything about the distribution of the index > columns --- are you working with a fairly static set of filenameids, or > does that change over time? How about the pathids? How does the > combination of filenameid x pathid behave? > > A bit of quick arithmetic says that the minimum possible size of that > index (at 100% fill factor) would be about 20K pages. What you showed > us was that it had expanded to 40-some K pages, or a bit under 50% fill > factor. This is low but not totally out of line; the traditional rule > of thumb is that the steady state fill factor will be about 2/3rds for a > heavily updated btree. If you leave it go, does it continue to get > larger, or stay around 40K? Just an FYI ... I remembered what prompted the cron job. We were seeing significant performance degradation. I never did actual measurements, but it was on the order of "Bill, why is restoring taking such a long time?" from other systems people. At the time, I poked around and tried some stuff here and there and found that reindex restored performance. I didn't look at actual size at that time. Anyway, I'll report back in a few weeks as to what the numbers look like. -- Bill Moran Collaborative Fusion Inc.
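A quick before/after check in psql would turn that kind of anecdote into numbers; the query below is only a placeholder for whatever the restore path actually runs, and the literal IDs are invented:

    \timing
    EXPLAIN ANALYZE SELECT * FROM file WHERE filenameid = 1234 AND pathid = 56;   -- before
    REINDEX INDEX file_fp_idx;
    EXPLAIN ANALYZE SELECT * FROM file WHERE filenameid = 1234 AND pathid = 56;   -- same query, after

Keep in mind that REINDEX INDEX blocks writes to the table, and readers that want that particular index, for the duration of the rebuild.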
On Feb 28, 2007, at 5:35 PM, Bill Moran wrote: > Just an FYI ... I remembered what prompted the cron job. > > We were seeing significant performance degradation. I never did > actual > measurements, but it was on the order of "Bill, why is restoring > taking > such a long time?" from other systems people. At the time, I poked > around > and tried some stuff here and there and found that reindex restored > performance. I didn't look at actual size at that time. I have two huge tables (one tracks messages sent, one tracks URL click-throughs from said messages) from which I purge old data every few weeks. The primary key indexes on these get bloated after a few months and performance goes way down like you observe. A reindex fixes up the performance issues pretty well on those tables, and often shaves off a few gigs of disk space too. We have to manually run the reindex because it has to be timed such that the service is not impacted (ie, run on major holiday weekends) and we have to take down part of the service and point other parts to backup servers, etc. Not an easy chore... This is on Pg 8.1. Don't even ask me how it was in the 7.4 days when we had maybe 10% of the data! :-)
Bill Moran <wmoran@collaborativefusion.com> writes: > Just an FYI ... I remembered what prompted the cron job. > We were seeing significant performance degradation. I never did actual > measurements, but it was on the order of "Bill, why is restoring taking > such a long time?" from other systems people. At the time, I poked around > and tried some stuff here and there and found that reindex restored > performance. I didn't look at actual size at that time. A reindex might improve performance for reasons other than bloat --- to wit, that a freshly-built index is in perfect physical order, which tends to get degraded over time by page splits. How important that is depends on your usage patterns. If this is what the story is for your situation, then what might fix it (in 8.2) is to create the index with FILLFACTOR 50 or so, so that it's already at the steady state density and won't need many page splits. > Anyway, I'll report back in a few weeks as to what the numbers look like. Yeah, please for the moment just watch what happens with the default behavior. regards, tom lane
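Concretely, in 8.2 the fill factor is a storage parameter on the index. A sketch using the index from earlier in the thread, with the value taken from the suggestion above rather than from any testing:

    -- rebuild the index at ~50% leaf density so steady-state inserts cause fewer page splits
    DROP INDEX file_fp_idx;
    CREATE INDEX file_fp_idx ON file (filenameid, pathid) WITH (fillfactor = 50);

An existing index can also be changed in place with ALTER INDEX file_fp_idx SET (fillfactor = 50); the new setting takes effect the next time the index is built, i.e. at the next REINDEX.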
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > Just an FYI ... I remembered what prompted the cron job. > > > We were seeing significant performance degradation. I never did actual > > measurements, but it was on the order of "Bill, why is restoring taking > > such a long time?" from other systems people. At the time, I poked around > > and tried some stuff here and there and found that reindex restored > > performance. I didn't look at actual size at that time. > > A reindex might improve performance for reasons other than bloat --- to > wit, that a freshly-built index is in perfect physical order, which > tends to get degraded over time by page splits. How important that is > depends on your usage patterns. This goes back to the heart of the original question, which was: "how often do I need to reindex." The answer was "rarely, if ever" and "if you're really worried about this, you can monitor _size_ via these queries ..." I guess I focused too much on size in my response. As you point out, bloat isn't the only indicator that an index would benefit from being rebuilt. > If this is what the story is for your > situation, then what might fix it (in 8.2) is to create the index with > FILLFACTOR 50 or so, so that it's already at the steady state density > and won't need many page splits. Interesting. So a major factor in performance degradation is when the index has to split pages. I read about FILLFACTOR in the docs, but it didn't click as to what use it was until your statement. We're still evaluating 8.2. We've hit a few issues with our application and plpgsql, but I think those are minor. As far as deploying it for our Bacula systems -- I just need to find the time to be sure that it doesn't introduce any problems, but I suspect there's a low chance of that with Bacula. > > Anyway, I'll report back in a few weeks as to what the numbers look like. > > Yeah, please for the moment just watch what happens with the default > behavior. Yup. -- Bill Moran http://www.potentialtech.com
On Wed, 2007-02-28 at 09:17 -0800, Joshua D. Drake wrote: > Bill, you are right but I believe Jim was speaking from a general > perspective. Generally speaking you should not have to reindex, or if > you do very rarely. > > I too have a couple of databases we manage that require a reindex more > often than what would be considered normal, but a reindex is far from > the norm itself. > Isn't a REINDEX still needed in the case of monotonically increasing keys, such as in a sequence or timestamp index? I also delete tuples, so that results in a forward-shifting range of keys. If this is not normal, I need to re-evaluate my autovacuum settings. Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Isn't a REINDEX still needed in the case of monotonically increasing > keys, such as in a sequence or timestamp index? I also delete tuples, so > that results in a forward-shifting range of keys. No, that shouldn't be a problem, if you're maintaining a constant key range width (that is, *all* the old entries get deleted). The only pattern I'm aware of that causes a problem is if you leave a small subset of the keys behind, for instance insert every few minutes and then later delete all but one entry per day. In this situation you may end up with an index containing as few as one entry per page. We don't have any mechanism short of REINDEX to collapse nonempty index pages together, so that way lies bloat. But if you delete all the old entries then the pages get recycled and there shouldn't be a problem. regards, tom lane
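A small self-contained illustration of that pattern, with the table and row counts invented purely for demonstration: load a month of timestamped rows, delete all but one row per day, and note that while VACUUM can recycle leaf pages that end up completely empty, any page left holding even a single live entry stays allocated at nearly zero density. The index never shrinks on disk, and only REINDEX packs the leftovers back together.

    CREATE TABLE events (
        id      serial PRIMARY KEY,
        created timestamptz NOT NULL DEFAULT now()
    );
    CREATE INDEX events_created_idx ON events (created);

    -- one row per minute going back ~28 days
    INSERT INTO events (created)
    SELECT now() - (i || ' minutes')::interval
    FROM generate_series(1, 40000) AS g(i);

    -- the "sparse leftover" deletion: keep a single row per day
    DELETE FROM events
    WHERE id NOT IN (SELECT min(id) FROM events GROUP BY date_trunc('day', created));

    VACUUM events;
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'events_created_idx';

Repeating the insert/delete cycle scatters more of these stuck, nearly-empty pages through the key range, which is the slow bloat described above.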
On Fri, 2007-03-02 at 16:39 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Isn't a REINDEX still needed in the case of monotonically increasing > > keys, such as in a sequence or timestamp index? I also delete tuples, so > > that results in a forward-shifting range of keys. > > No, that shouldn't be a problem, if you're maintaining a constant key > range width (that is, *all* the old entries get deleted). The only > pattern I'm aware of that causes a problem is if you leave a small > subset of the keys behind, for instance insert every few minutes and > then later delete all but one entry per day. In this situation you may > end up with an index containing as few as one entry per page. We > don't have any mechanism short of REINDEX to collapse nonempty index > pages together, so that way lies bloat. But if you delete all the old > entries then the pages get recycled and there shouldn't be a problem. > You just described this particular table, so I will need to continue REINDEXing. It's getting maybe 10-50 inserts per second, and most expire in a day. However, a small percentage hang around for much longer. REINDEX isn't a problem for me, because there are periods of low usage. I think if I really wanted to eliminate REINDEX I could move the few remaining records into another table and have a view across them. Regards, Jeff Davis
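A rough sketch of that kind of split, reusing the hypothetical events table from the earlier example (all names invented): the long-lived stragglers get moved to an archive table on some schedule, the hot table keeps only a narrow, fully-expiring key range, and a view papers over the two for readers.

    CREATE TABLE events_archive (LIKE events INCLUDING DEFAULTS);

    CREATE VIEW events_all AS
        SELECT * FROM events
        UNION ALL
        SELECT * FROM events_archive;

    -- run periodically, during a quiet window: move anything older than a week
    BEGIN;
    INSERT INTO events_archive
        SELECT * FROM events WHERE created < now() - interval '7 days';
    DELETE FROM events
        WHERE created < now() - interval '7 days';
    COMMIT;

With that arrangement the hot table's indexes see the "delete everything old" pattern that recycles cleanly, and the archive table's indexes mostly just grow, so neither should need routine REINDEXing.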
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > Just an FYI ... I remembered what prompted the cron job. > > > We were seeing significant performance degradation. I never did actual > > measurements, but it was on the order of "Bill, why is restoring taking > > such a long time?" from other systems people. At the time, I poked around > > and tried some stuff here and there and found that reindex restored > > performance. I didn't look at actual size at that time. > > A reindex might improve performance for reasons other than bloat --- to > wit, that a freshly-built index is in perfect physical order, which > tends to get degraded over time by page splits. How important that is > depends on your usage patterns. If this is what the story is for your > situation, then what might fix it (in 8.2) is to create the index with > FILLFACTOR 50 or so, so that it's already at the steady state density > and won't need many page splits. > > > Anyway, I'll report back in a few weeks as to what the numbers look like. > > Yeah, please for the moment just watch what happens with the default > behavior. Remember this discussion? To recap, I had scheduled a weekly reindex of this database because I was seeing performance issues otherwise. In order to see if this was actually helping, I disabled the reindex job, ran a few timing experiments, then scheduled a job to email me the size of the indexes in the database on a daily basis. At this point, I have daily records of index size since March 6th. The behaviour is like this: A freshly created index is about 21,000 pages in size. Under normal usage, the index size balloons to about 38,000 pages immediately after the first backup job is run. From there it grows slowly (but fairly consistently) by about 100 pages each day. As of today, it is 44304 pages. When I first brought up this discussion, the table contained 8068956 rows. It now has 7451381, which means it has dropped by about 8%. The important part is that I can't reproduce the performance problems that I originally thought were the result of this. It's entirely possible that something else was changed since then that actually fixed the problem, and that the index bloat was a red herring. Not sure what (if any) conclusions can be drawn from this. Is there any other data I should gather? Have I just proved my previous rantings about the necessity of reindexing to be wrong? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill and Tom, Best to find out what kind of index you want to create beforehand. If your data is evenly distributed and exhibits high cardinality (2 entries each for A, B, C ... Z) then I would recommend a BTREE index; if not (low-cardinality scenarios such as gender) then create a bitmap index. I can't speak for Postgres, but index creation will necessitate scheduling time when you can bring the DB offline (such as a weekend), as most DBs will not allow a unique index to be created on a table while the table is in use. Also, I find the Oracle books and online documentation very helpful, specifically http://otn.oracle.com; books are available from Oracle Press. HTH Martin ----- Original Message ----- From: "Bill Moran" <wmoran@collaborativefusion.com> To: "Tom Lane" <tgl@sss.pgh.pa.us> Cc: <pgsql-general@postgresql.org> Sent: Thursday, April 19, 2007 9:33 AM Subject: Re: [GENERAL] How often do I need to reindex tables? > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Bill Moran <wmoran@collaborativefusion.com> writes: >> > Just an FYI ... I remembered what prompted the cron job. >> >> > We were seeing significant performance degradation. I never did actual >> > measurements, but it was on the order of "Bill, why is restoring taking >> > such a long time?" from other systems people. At the time, I poked >> > around >> > and tried some stuff here and there and found that reindex restored >> > performance. I didn't look at actual size at that time. >> >> A reindex might improve performance for reasons other than bloat --- to >> wit, that a freshly-built index is in perfect physical order, which >> tends to get degraded over time by page splits. How important that is >> depends on your usage patterns. If this is what the story is for your >> situation, then what might fix it (in 8.2) is to create the index with >> FILLFACTOR 50 or so, so that it's already at the steady state density >> and won't need many page splits. >> >> > Anyway, I'll report back in a few weeks as to what the numbers look >> > like. >> >> Yeah, please for the moment just watch what happens with the default >> behavior. > > Remember this discussion? > > To recap, I had scheduled a weekly reindex of this database because I > was seeing performance issues otherwise. In order to see if this was > actually helping, I disabled the redindex job, ran a few timing > experiments, then scheduled a job to email me the size of the indexes > in the database on a daily basis. > > At this point, I have daily records of index size since March 6th. > > The behaviour is like this: A freshly created index is about 21,000 > pages in size. Under normal usage, the index size balloons to about > 38,000 pages immediately after the first backup job is run. From there > it grows slowly (but fairly consistently) by about 100 pages each day. > As of today, it is 44304 pages. > > When I first brought up this discussion, the table contained 8068956 > rows. It now has 7451381, which means it's dropped by 7% > > The important part is that I can't reproduce the performance problems > that I originally thought were the result of this.
It's entirely > possible that something else was changed since then that actually > fixed the problem, and that the index bloat was a red herring. > > Not sure what (if any) conclusions can be drawn from this. Is there > any other data I should gather? Have I just proved my previous > rantings about the necessity of reindexing to be wrong? > > -- > Bill Moran > Collaborative Fusion Inc. > http://people.collaborativefusion.com/~wmoran/ > > wmoran@collaborativefusion.com > Phone: 412-422-3463x4023 > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >