Re: How often do I need to reindex tables? - Mailing list pgsql-general

From Bill Moran
Subject Re: How often do I need to reindex tables?
Date 2007-02-28 14:08:58
Msg-id 20070228140858.492efaf2.wmoran@collaborativefusion.com
In response to Re: How often do I need to reindex tables?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: How often do I need to reindex tables?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
In response to Tom Lane <tgl@sss.pgh.pa.us>:

> Bill Moran <wmoran@collaborativefusion.com> writes:
> > I don't agree.  I think that regular reindexing is mandatory under some
> > workloads.  Example:
> > ...
> > There are some additional indexes that I've snipped from the output that also
> > saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Can you describe the usage pattern of that index?  I'm curious why it
> doesn't maintain reasonably static size.  How often is the underlying
> table vacuumed?

bacula=# \d file
                             Table "public.file"
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's
a unique tuple for each time a file is backed up).  To save space in the
database, the file name and file path are stored in separate tables and
referenced by an ID.
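
Pulling a full file name back out is just a two-way join.  A sketch, assuming
the stock Bacula column names (filename.name, path.path) and a made-up jobid:

SELECT p.path || f.name AS full_name
  FROM file fl
  JOIN filename f ON f.filenameid = fl.filenameid
  JOIN path p ON p.pathid = fl.pathid
 WHERE fl.jobid = 42;  -- hypothetical job id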

This particular server has the following characteristics:
bacula=# select count(*) from file;
  count
---------
 8068956
(1 row)

bacula=# select count(*) from filename;
 count
--------
 151954
(1 row)

bacula=# select count(*) from path;
 count
-------
 49642
(1 row)

There are 21 jobs, each ranging in size from 2,000 to 5,000 files.  Each job
runs twice a day.  So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day.  With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.
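
If it helps, the per-job churn behind those numbers is easy to eyeball with:

SELECT jobid, count(*) AS files
  FROM file
 GROUP BY jobid
 ORDER BY jobid;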

Because I know exactly when database activity is occurring on this system,
I have autovacuum disabled, and I manually run a vacuum analyze on this
database twice a day: once at 8:00 AM and again at 4:00 PM.  I had to bump
max_fsm_pages up to 60000 to keep vacuum effective.
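
For completeness, the twice-daily pass is nothing fancier than:

VACUUM ANALYZE;

-- and a quick sanity check that the fsm bump took effect:
SHOW max_fsm_pages;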

Note that the index under discussion is the only one in this database that
shows significant bloat.  I could probably just reindex that one on a
regular schedule, but since I know when the database is quiescent, there's
no reason I can think of not to reindex the whole thing.
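
In other words, the choice is between a single statement run on a schedule
and the whole-database form (which, on 8.1 and later, covers user indexes
as well as the system catalogs):

REINDEX INDEX file_fp_idx;

-- or, during the quiescent window:
REINDEX DATABASE bacula;

Either way, REINDEX blocks writes to the affected table while it runs,
which is why knowing when the database is idle matters here.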

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.
