Thread: How often do I need to reindex tables?
I am planning to use 8.2, and the average rate of inserts/deletes/updates across all tables is moderate. That is, it is a moderate-sized database with moderate usage of its tables. Given that, how often do I need to reindex the tables? Do I need to do it every day? Also, with 8.2 my understanding is that I do not have to run vacuum manually anymore. Does it do auto-vacuum? Thanks in advance, Dhaval Shah
On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > I am planning to use 8.2 and the average inserts/deletes and updates > across all tables is moderate. That is, it is a moderate sized > database with moderate usage of tables. > > Given that, how often do I need to reindex the tables? Do I need to do > it everyday? No, you should very rarely if ever need to do it. If you're really concerned, I suggest monitoring average tuples per index page; something like SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND relpages > 1000; That'll show tuples/page for all indexes over 8MB in size. > Also with 8.2, I do not have to do vacuum anymore or that is what I > understand. Does it do auto-vacuum? You still need to enable autovacuum; see the autovacuum setting in postgresql.conf. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
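For 8.2, turning autovacuum on is a postgresql.conf change. A minimal sketch, with parameter names taken from the 8.2 documentation and the tuning values shown purely as illustration:

    # postgresql.conf (8.2): autovacuum needs the row-level statistics collector
    stats_start_collector = on
    stats_row_level = on
    autovacuum = on
    # optional tuning; the defaults are usually fine for a moderate workload
    autovacuum_naptime = 1min
    autovacuum_vacuum_scale_factor = 0.2

Note that changing stats_start_collector requires a server restart.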
On 02/27/07 13:26, Dhaval Shah wrote: > I am planning to use 8.2 and the average inserts/deletes and updates > across all tables is moderate. That is, it is a moderate sized > database with moderate usage of tables. Moderate?
This select doesn't return any rows. What does it mean? Ezequias. 2007/2/27, Jim C. Nasby <jim@nasby.net>: > On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > > I am planning to use 8.2 and the average inserts/deletes and updates > > across all tables is moderate. That is, it is a moderate sized > > database with moderate usage of tables. > > > > Given that, how often do I need to reindex the tables? Do I need to do > > it everyday? > > No, you should very rarely if ever need to do it. > > If you're really concerned, I suggest monitoring average tuples per > index page; something like > > SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND > relpages > 1000; > > That'll show tuples/page for all indexes over 8MB in size. > > > Also with 8.2, I do not have to do vacuum anymore or that is what I > > understand. Does it do auto-vacuum? > > You still need to enable autovacuum. See autovacuum_enable. > -- > Jim Nasby jim@nasby.net > EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Atenciosamente (Sincerely) Ezequias Rodrigues da Rocha =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- A pior das democracias ainda é melhor do que a melhor das ditaduras The worst of democracies is still better than the better of dictatorships http://ezequiasrocha.blogspot.com/
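For what it's worth, the usual reason for an empty result there is simply that no index in the database is larger than 1000 pages (about 8 MB at the default 8 kB block size), so the relpages > 1000 filter excludes everything. A variant without the size cutoff, shown here only as a sketch, lists every index bigger than one page:

    SELECT relname, relpages, round(reltuples/relpages) AS tuples_per_page
    FROM pg_class
    WHERE relkind = 'i' AND relpages > 1
    ORDER BY relpages DESC;

Bear in mind that relpages and reltuples are only as current as the last VACUUM or ANALYZE of each table.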
In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>: > 2007/2/27, Jim C. Nasby <jim@nasby.net>: > > On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > > > I am planning to use 8.2 and the average inserts/deletes and updates > > > across all tables is moderate. That is, it is a moderate sized > > > database with moderate usage of tables. > > > > > > Given that, how often do I need to reindex the tables? Do I need to do > > > it everyday? > > > > No, you should very rarely if ever need to do it. I don't agree. I think that regular indexing is mandatory under some workloads. Example: bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname; relname | relpages -------------------------------+---------- basefiles_pkey | 1 cdimages_pkey | 1 client_name_idx | 2 client_pkey | 2 counters_pkey | 1 device_pkey | 1 file_fp_idx | 41212 [...] bacula=# reindex database bacula; [...] relname | relpages -------------------------------+---------- basefiles_pkey | 1 cdimages_pkey | 1 client_name_idx | 2 client_pkey | 2 counters_pkey | 1 device_pkey | 1 file_fp_idx | 21367 [...] There are some additional indexes that I've snipped from the output that also saw some benefit from reindexing, but let's just focus on file_fp_idx. Please note that the database you're looking at is reindexed _weekly_ by a cron job, which means the index bloat you're seeing in the above example is the result of normal activity since last Saturday. I've brought this up before, and I want to point it out again. I really think there are certain workloads that require reindexing. Luckily for this particular workload, it's easy to schedule a job to do so, since I know when the backups aren't running :) -- Bill Moran Collaborative Fusion Inc.
Bill Moran wrote:
> In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>:
>> 2007/2/27, Jim C. Nasby <jim@nasby.net>:
>>> On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:
>>>> I am planning to use 8.2 and the average inserts/deletes and updates
>>>> across all tables is moderate. That is, it is a moderate sized
>>>> database with moderate usage of tables.
>>>>
>>>> Given that, how often do I need to reindex the tables? Do I need to do
>>>> it everyday?
>>> No, you should very rarely if ever need to do it.
>
> I don't agree. I think that regular indexing is mandatory under some
> workloads.

Bill, you are right but I believe Jim was speaking from a general perspective. Generally speaking you should not have to reindex, or if you do very rarely.

I too have a couple of databases we manage that require a reindex more often than what would be considered normal, but a reindex is far from the norm itself.

Joshua D. Drake

Example:
> bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname;
>             relname            | relpages
> -------------------------------+----------
>  basefiles_pkey                |        1
>  cdimages_pkey                 |        1
>  client_name_idx               |        2
>  client_pkey                   |        2
>  counters_pkey                 |        1
>  device_pkey                   |        1
>  file_fp_idx                   |    41212
> [...]
>
> bacula=# reindex database bacula;
> [...]
>
>             relname            | relpages
> -------------------------------+----------
>  basefiles_pkey                |        1
>  cdimages_pkey                 |        1
>  client_name_idx               |        2
>  client_pkey                   |        2
>  counters_pkey                 |        1
>  device_pkey                   |        1
>  file_fp_idx                   |    21367
> [...]
>
> There are some additional indexes that I've snipped from the output that also
> saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Please note that the database you're looking at is reindexed _weekly_ by a
> cron job, which means the index bloat you're seeing in the above example is
> the result of normal activity since last Saturday.
>
> I've brought this up before, and I want to point it out again. I really
> think there are certain workloads that require reindexing. Luckily for
> this particular workload, it's easy to schedule a job to do so, since I
> know when the backups aren't running :)

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Bill Moran <wmoran@collaborativefusion.com> writes: > I don't agree. I think that regular indexing is mandatory under some > workloads. Example: > ... > There are some additional indexes that I've snipped from the output that also > saw some benefit from reindexing, but let's just focus on file_fp_idx. Can you describe the usage pattern of that index? I'm curious why it doesn't maintain reasonably static size. How often is the underlying table vacuumed? regards, tom lane
In response to "Joshua D. Drake" <jd@commandprompt.com>: > Bill Moran wrote: > > In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>: > >> 2007/2/27, Jim C. Nasby <jim@nasby.net>: > >>> On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote: > >>>> I am planning to use 8.2 and the average inserts/deletes and updates > >>>> across all tables is moderate. That is, it is a moderate sized > >>>> database with moderate usage of tables. > >>>> > >>>> Given that, how often do I need to reindex the tables? Do I need to do > >>>> it everyday? > >>> No, you should very rarely if ever need to do it. > > > > I don't agree. I think that regular indexing is mandatory under some > > workloads. > > Bill, you are right but I believe Jim was speaking from a general > perspective. Generally speaking you should not have to reindex, or if > you do very rarely. > > I too have a couple of databases we manage that require a reindex more > often than what would be considered normal, but a reindex is far from > the norm itself. Well, I hope I didn't come across as confrontation or anything, as that wasn't my intent. The only point I was trying to make is that the need to reindex probably shouldn't be written off lightly until one has monitored the indexes for a spell to see if they need it or not. > Example: > > bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname; > > relname | relpages > > -------------------------------+---------- > > basefiles_pkey | 1 > > cdimages_pkey | 1 > > client_name_idx | 2 > > client_pkey | 2 > > counters_pkey | 1 > > device_pkey | 1 > > file_fp_idx | 41212 > > [...] > > > > bacula=# reindex database bacula; > > [...] > > > > relname | relpages > > -------------------------------+---------- > > basefiles_pkey | 1 > > cdimages_pkey | 1 > > client_name_idx | 2 > > client_pkey | 2 > > counters_pkey | 1 > > device_pkey | 1 > > file_fp_idx | 21367 > > [...] > > > > There are some additional indexes that I've snipped from the output that also > > saw some benefit from reindexing, but let's just focus on file_fp_idx. > > > > Please note that the database you're looking at is reindexed _weekly_ by a > > cron job, which means the index bloat you're seeing in the above example is > > the result of normal activity since last Saturday. > > > > I've brought this up before, and I want to point it out again. I really > > think there are certain workloads that require reindexing. Luckily for > > this particular workload, it's easy to schedule a job to do so, since I > > know when the backups aren't running :) > > > > > -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 > Providing the most comprehensive PostgreSQL solutions since 1997 > http://www.commandprompt.com/ > > Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate > PostgreSQL Replication: http://www.commandprompt.com/products/ > > > > > > > -- Bill Moran Collaborative Fusion Inc. wmoran@collaborativefusion.com Phone: 412-422-3463x4023 **************************************************************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. 
Hi, does anyone have a report or benchmark on running PostgreSQL with a modified block size? What difference does it make? Thanks. -- Ivo Nascimento Iann tech - Developing solutions with performance and security http://www.ianntech.com.br
In response to Tom Lane <tgl@sss.pgh.pa.us>:
> Bill Moran <wmoran@collaborativefusion.com> writes:
> > I don't agree. I think that regular indexing is mandatory under some
> > workloads. Example:
> > ...
> > There are some additional indexes that I've snipped from the output that also
> > saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Can you describe the usage pattern of that index? I'm curious why it
> doesn't maintain reasonably static size. How often is the underlying
> table vacuumed?

bacula=# \d file
                                Table "public.file"
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's a unique tuple for each time a file is backed up). To save space in the database, the file name and file path are stored in separate tables and referenced by an ID. This particular server has the following characteristics:

bacula=# select count(*) from file;
  count
---------
 8068956
(1 row)

bacula=# select count(*) from filename;
 count
--------
 151954
(1 row)

bacula=# select count(*) from path;
 count
-------
 49642
(1 row)

There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job runs twice a day. So you're looking at about 60,000 new rows at midnight and 60,000 new rows at noon each day. With the purge cycle, about the same number of rows are being deleted as are being added, so the table size stays pretty constant.

Because I know exactly when database activity is occurring on this system, I have autovacuum disabled, and I manually run a vacuum analyze on this database twice a day: once at 8:00 AM and again at 4:00 PM. I had to bump max_fsm_pages up to 60000 to keep vacuum effective.

Note that the index under discussion is the only one in this database that shows significant bloat. I could probably just reindex that one on a regular schedule, but since I know when the database is quiescent, there's no reason I can think of not to reindex the whole thing.

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.
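Spelled out, that maintenance schedule amounts to a crontab entry plus a couple of postgresql.conf settings; roughly as follows, with the times taken from the description above and the paths and logging purely illustrative:

    # twice-daily manual vacuum at 08:00 and 16:00
    0 8,16 * * *  psql -d bacula -c "VACUUM ANALYZE;" >> /var/log/bacula-vacuum.log 2>&1

    # postgresql.conf
    autovacuum = off
    max_fsm_pages = 60000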
Bill Moran <wmoran@collaborativefusion.com> writes: > In response to Tom Lane <tgl@sss.pgh.pa.us>: >> Can you describe the usage pattern of that index? I'm curious why it >> doesn't maintain reasonably static size. How often is the underlying >> table vacuumed? > ... > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > runs twice a day. So you're looking at about 60,000 new rows at midnight > and 60,000 new rows at noon each day. With the purge cycle, about the > same number of rows are being deleted as are being added, so the table > size stays pretty constant. > ... > Note that the index under discussion is the only one in this database that > shows significant bloat. Yeah, and there's no obvious reason in what you say why this one should bloat either. Can you say anything about the distribution of the index columns --- are you working with a fairly static set of filenameids, or does that change over time? How about the pathids? How does the combination of filenameid x pathid behave? A bit of quick arithmetic says that the minimum possible size of that index (at 100% fill factor) would be about 20K pages. What you showed us was that it had expanded to 40-some K pages, or a bit under 50% fill factor. This is low but not totally out of line; the traditional rule of thumb is that the steady state fill factor will be about 2/3rds for a heavily updated btree. If you leave it go, does it continue to get larger, or stay around 40K? regards, tom lane
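To spell out that arithmetic as a rough sketch: with two int4 key columns, each leaf entry costs about 8 bytes of key data plus roughly 8 bytes of index tuple header and a 4 byte line pointer, call it 20 bytes, so an 8192 byte page holds on the order of 400 entries at 100% fill. A query along these lines compares that floor against reality (the 400 entries/page figure is the assumption here):

    SELECT c.relname,
           c.reltuples,
           ceil(c.reltuples / 400.0)  AS approx_min_pages,   -- 100% fill factor
           c.relpages                 AS actual_pages,
           round(100.0 * ceil(c.reltuples / 400.0) / c.relpages) AS approx_fill_pct
    FROM pg_class c
    WHERE c.relname = 'file_fp_idx';

For roughly 8.07 million rows that works out to about 20K pages minimum, matching the estimate above, and the observed 41212 pages is therefore just under 50% fill.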
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Can you describe the usage pattern of that index? I'm curious why it > >> doesn't maintain reasonably static size. How often is the underlying > >> table vacuumed? > > ... > > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > > runs twice a day. So you're looking at about 60,000 new rows at midnight > > and 60,000 new rows at noon each day. With the purge cycle, about the > > same number of rows are being deleted as are being added, so the table > > size stays pretty constant. > > ... > > Note that the index under discussion is the only one in this database that > > shows significant bloat. > > Yeah, and there's no obvious reason in what you say why this one should > bloat either. Can you say anything about the distribution of the index > columns --- are you working with a fairly static set of filenameids, or > does that change over time? How about the pathids? How does the > combination of filenameid x pathid behave? My expectation would be that the distribution stays fairly constant and that filenameids and pathids don't get added in any great number. Most of the servers that are being backed up are not going to see the file names or paths change very much, just the contents of those files. > A bit of quick arithmetic says that the minimum possible size of that > index (at 100% fill factor) would be about 20K pages. What you showed > us was that it had expanded to 40-some K pages, or a bit under 50% fill > factor. This is low but not totally out of line; the traditional rule > of thumb is that the steady state fill factor will be about 2/3rds for a > heavily updated btree. If you leave it go, does it continue to get > larger, or stay around 40K? I don't remember how big it was getting before I added that cron job. I'll remove the cron job and replace it with one that emails me the page size of that index every week. I'll let it go for a few weeks and see how it manages. -- Bill Moran Collaborative Fusion Inc.
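The monitoring job can be as simple as something like this, with the recipient address and schedule being illustrative details:

    # mail the current size of file_fp_idx (relpages is counted in 8 kB blocks)
    0 6 * * *  psql -At -d bacula -c "select now()::date, relname, relpages from pg_class where relname = 'file_fp_idx';" | mail -s "file_fp_idx size" wmoran@collaborativefusion.com

Remember that relpages is only refreshed by VACUUM and ANALYZE, so with the twice-daily vacuum schedule described earlier the reported number can be up to half a day stale.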
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Can you describe the usage pattern of that index? I'm curious why it > >> doesn't maintain reasonably static size. How often is the underlying > >> table vacuumed? > > ... > > There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job > > runs twice a day. So you're looking at about 60,000 new rows at midnight > > and 60,000 new rows at noon each day. With the purge cycle, about the > > same number of rows are being deleted as are being added, so the table > > size stays pretty constant. > > ... > > Note that the index under discussion is the only one in this database that > > shows significant bloat. > > Yeah, and there's no obvious reason in what you say why this one should > bloat either. Can you say anything about the distribution of the index > columns --- are you working with a fairly static set of filenameids, or > does that change over time? How about the pathids? How does the > combination of filenameid x pathid behave? > > A bit of quick arithmetic says that the minimum possible size of that > index (at 100% fill factor) would be about 20K pages. What you showed > us was that it had expanded to 40-some K pages, or a bit under 50% fill > factor. This is low but not totally out of line; the traditional rule > of thumb is that the steady state fill factor will be about 2/3rds for a > heavily updated btree. If you leave it go, does it continue to get > larger, or stay around 40K? Just an FYI ... I remembered what prompted the cron job. We were seeing significant performance degradation. I never did actual measurements, but it was on the order of "Bill, why is restoring taking such a long time?" from other systems people. At the time, I poked around and tried some stuff here and there and found that reindex restored performance. I didn't look at actual size at that time. Anyway, I'll report back in a few weeks as to what the numbers look like. -- Bill Moran Collaborative Fusion Inc.
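A quick before/after check in psql would turn that kind of anecdote into numbers; the query below is only a placeholder for whatever the restore path actually runs, and the literal IDs are invented:

    \timing
    EXPLAIN ANALYZE SELECT * FROM file WHERE filenameid = 1234 AND pathid = 56;   -- before
    REINDEX INDEX file_fp_idx;
    EXPLAIN ANALYZE SELECT * FROM file WHERE filenameid = 1234 AND pathid = 56;   -- same query, after

Keep in mind that REINDEX INDEX blocks writes to the table, and readers that want that particular index, for the duration of the rebuild.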
On Feb 28, 2007, at 5:35 PM, Bill Moran wrote: > Just an FYI ... I remembered what prompted the cron job. > > We were seeing significant performance degradation. I never did > actual > measurements, but it was on the order of "Bill, why is restoring > taking > such a long time?" from other systems people. At the time, I poked > around > and tried some stuff here and there and found that reindex restored > performance. I didn't look at actual size at that time. I have two huge tables (one tracks messages sent, one tracks URL click-throughs from said messages) from which I purge old data every few weeks. The primary key indexes on these get bloated after a few months and performance goes way down like you observe. A reindex fixes up the performance issues pretty well on those tables, and often shaves off a few gigs of disk space too. We have to manually run the reindex because it has to be timed such that the service is not impacted (ie, run on major holiday weekends) and we have to take down part of the service and point other parts to backup servers, etc. Not an easy chore... This is on Pg 8.1. Don't even ask me how it was in the 7.4 days when we had maybe 10% of the data! :-)
Bill Moran <wmoran@collaborativefusion.com> writes: > Just an FYI ... I remembered what prompted the cron job. > We were seeing significant performance degradation. I never did actual > measurements, but it was on the order of "Bill, why is restoring taking > such a long time?" from other systems people. At the time, I poked around > and tried some stuff here and there and found that reindex restored > performance. I didn't look at actual size at that time. A reindex might improve performance for reasons other than bloat --- to wit, that a freshly-built index is in perfect physical order, which tends to get degraded over time by page splits. How important that is depends on your usage patterns. If this is what the story is for your situation, then what might fix it (in 8.2) is to create the index with FILLFACTOR 50 or so, so that it's already at the steady state density and won't need many page splits. > Anyway, I'll report back in a few weeks as to what the numbers look like. Yeah, please for the moment just watch what happens with the default behavior. regards, tom lane
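Concretely, in 8.2 the fill factor is a storage parameter on the index. A sketch using the index from earlier in the thread, with the value taken from the suggestion above rather than from any testing:

    -- rebuild the index at ~50% leaf density so steady-state inserts cause fewer page splits
    DROP INDEX file_fp_idx;
    CREATE INDEX file_fp_idx ON file (filenameid, pathid) WITH (fillfactor = 50);

An existing index can also be changed in place with ALTER INDEX file_fp_idx SET (fillfactor = 50); the new setting takes effect the next time the index is built, i.e. at the next REINDEX.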
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > Just an FYI ... I remembered what prompted the cron job. > > > We were seeing significant performance degradation. I never did actual > > measurements, but it was on the order of "Bill, why is restoring taking > > such a long time?" from other systems people. At the time, I poked around > > and tried some stuff here and there and found that reindex restored > > performance. I didn't look at actual size at that time. > > A reindex might improve performance for reasons other than bloat --- to > wit, that a freshly-built index is in perfect physical order, which > tends to get degraded over time by page splits. How important that is > depends on your usage patterns. This goes back to the heart of the original question, which was: "how often do I need to reindex." The answer was "rarely, if ever" and "if you're really worried about this, you can monitor _size_ via these queries ..." I guess I focused too much on size in my response. As you point out, bloat isn't the only indicator that an index would benefit from being rebuilt. > If this is what the story is for your > situation, then what might fix it (in 8.2) is to create the index with > FILLFACTOR 50 or so, so that it's already at the steady state density > and won't need many page splits. Interesting. So a major factor in performance degradation is when the index has to split pages. I read about FILLFACTOR in the docs, but it didn't click as to what use it was until your statement. We're still evaluating 8.2. We've hit a few issues with our application and plpgsql, but I think those are minor. As far as deploying it for our Bacula systems -- I just need to find the time to be sure that it doesn't introduce any problems, but I suspect there's a low chance of that with Bacula. > > Anyway, I'll report back in a few weeks as to what the numbers look like. > > Yeah, please for the moment just watch what happens with the default > behavior. Yup. -- Bill Moran http://www.potentialtech.com
On Wed, 2007-02-28 at 09:17 -0800, Joshua D. Drake wrote: > Bill, you are right but I believe Jim was speaking from a general > perspective. Generally speaking you should not have to reindex, or if > you do very rarely. > > I too have a couple of databases we manage that require a reindex more > often than what would be considered normal, but a reindex is far from > the norm itself. > Isn't a REINDEX still needed in the case of monotonically increasing keys, such as in a sequence or timestamp index? I also delete tuples, so that results in a forward-shifting range of keys. If this is not normal, I need to re-evaluate my autovacuum settings. Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Isn't a REINDEX still needed in the case of monotonically increasing > keys, such as in a sequence or timestamp index? I also delete tuples, so > that results in a forward-shifting range of keys. No, that shouldn't be a problem, if you're maintaining a constant key range width (that is, *all* the old entries get deleted). The only pattern I'm aware of that causes a problem is if you leave a small subset of the keys behind, for instance insert every few minutes and then later delete all but one entry per day. In this situation you may end up with an index containing as few as one entry per page. We don't have any mechanism short of REINDEX to collapse nonempty index pages together, so that way lies bloat. But if you delete all the old entries then the pages get recycled and there shouldn't be a problem. regards, tom lane
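A small self-contained illustration of that pattern, with the table and row counts invented purely for demonstration: load a month of timestamped rows, delete all but one row per day, and note that while VACUUM can recycle leaf pages that end up completely empty, any page left holding even a single live entry stays allocated at nearly zero density. The index never shrinks on disk, and only REINDEX packs the leftovers back together.

    CREATE TABLE events (
        id      serial PRIMARY KEY,
        created timestamptz NOT NULL DEFAULT now()
    );
    CREATE INDEX events_created_idx ON events (created);

    -- one row per minute going back ~28 days
    INSERT INTO events (created)
    SELECT now() - (i || ' minutes')::interval
    FROM generate_series(1, 40000) AS g(i);

    -- the "sparse leftover" deletion: keep a single row per day
    DELETE FROM events
    WHERE id NOT IN (SELECT min(id) FROM events GROUP BY date_trunc('day', created));

    VACUUM events;
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'events_created_idx';

Repeating the insert/delete cycle scatters more of these stuck, nearly-empty pages through the key range, which is the slow bloat described above.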
On Fri, 2007-03-02 at 16:39 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Isn't a REINDEX still needed in the case of monotonically increasing > > keys, such as in a sequence or timestamp index? I also delete tuples, so > > that results in a forward-shifting range of keys. > > No, that shouldn't be a problem, if you're maintaining a constant key > range width (that is, *all* the old entries get deleted). The only > pattern I'm aware of that causes a problem is if you leave a small > subset of the keys behind, for instance insert every few minutes and > then later delete all but one entry per day. In this situation you may > end up with an index containing as few as one entry per page. We > don't have any mechanism short of REINDEX to collapse nonempty index > pages together, so that way lies bloat. But if you delete all the old > entries then the pages get recycled and there shouldn't be a problem. > You just described this particular table, so I will need to continue REINDEXing. It's getting maybe 10-50 inserts per second, and most expire in a day. However, a small percentage hang around for much longer. REINDEX isn't a problem for me, because there are periods of low usage. I think if I really wanted to eliminate REINDEX I could move the few remaining records into another table and have a view across them. Regards, Jeff Davis
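A rough sketch of that kind of split, reusing the hypothetical events table from the earlier example (all names invented): the long-lived stragglers get moved to an archive table on some schedule, the hot table keeps only a narrow, fully-expiring key range, and a view papers over the two for readers.

    CREATE TABLE events_archive (LIKE events INCLUDING DEFAULTS);

    CREATE VIEW events_all AS
        SELECT * FROM events
        UNION ALL
        SELECT * FROM events_archive;

    -- run periodically, during a quiet window: move anything older than a week
    BEGIN;
    INSERT INTO events_archive
        SELECT * FROM events WHERE created < now() - interval '7 days';
    DELETE FROM events
        WHERE created < now() - interval '7 days';
    COMMIT;

With that arrangement the hot table's indexes see the "delete everything old" pattern that recycles cleanly, and the archive table's indexes mostly just grow, so neither should need routine REINDEXing.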
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > Just an FYI ... I remembered what prompted the cron job. > > > We were seeing significant performance degradation. I never did actual > > measurements, but it was on the order of "Bill, why is restoring taking > > such a long time?" from other systems people. At the time, I poked around > > and tried some stuff here and there and found that reindex restored > > performance. I didn't look at actual size at that time. > > A reindex might improve performance for reasons other than bloat --- to > wit, that a freshly-built index is in perfect physical order, which > tends to get degraded over time by page splits. How important that is > depends on your usage patterns. If this is what the story is for your > situation, then what might fix it (in 8.2) is to create the index with > FILLFACTOR 50 or so, so that it's already at the steady state density > and won't need many page splits. > > > Anyway, I'll report back in a few weeks as to what the numbers look like. > > Yeah, please for the moment just watch what happens with the default > behavior. Remember this discussion? To recap, I had scheduled a weekly reindex of this database because I was seeing performance issues otherwise. In order to see if this was actually helping, I disabled the reindex job, ran a few timing experiments, then scheduled a job to email me the size of the indexes in the database on a daily basis. At this point, I have daily records of index size since March 6th. The behaviour is like this: A freshly created index is about 21,000 pages in size. Under normal usage, the index size balloons to about 38,000 pages immediately after the first backup job is run. From there it grows slowly (but fairly consistently) by about 100 pages each day. As of today, it is 44304 pages. When I first brought up this discussion, the table contained 8068956 rows. It now has 7451381, which means it has dropped by about 8%. The important part is that I can't reproduce the performance problems that I originally thought were the result of this. It's entirely possible that something else was changed since then that actually fixed the problem, and that the index bloat was a red herring. Not sure what (if any) conclusions can be drawn from this. Is there any other data I should gather? Have I just proved my previous rantings about the necessity of reindexing to be wrong? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill and Tom, Best to find out what kind of index you want to create beforehand. If your data is evenly distributed and exhibits high cardinality (2 entries each for A, B, C ... Z) then I would recommend a BTREE index; if not (low-cardinality scenarios such as gender) then create a bitmap index. I can't speak for Postgres, but index creation will necessitate scheduling time when you can bring the DB offline (such as a weekend), as most DBs will not allow a unique index to be created on a table while the table is in use. Also, I find the Oracle books and online documentation very helpful, specifically http://otn.oracle.com; books are available from Oracle Press. HTH Martin ----- Original Message ----- From: "Bill Moran" <wmoran@collaborativefusion.com> To: "Tom Lane" <tgl@sss.pgh.pa.us> Cc: <pgsql-general@postgresql.org> Sent: Thursday, April 19, 2007 9:33 AM Subject: Re: [GENERAL] How often do I need to reindex tables? > In response to Tom Lane <tgl@sss.pgh.pa.us>: > >> Bill Moran <wmoran@collaborativefusion.com> writes: >> > Just an FYI ... I remembered what prompted the cron job. >> >> > We were seeing significant performance degradation. I never did actual >> > measurements, but it was on the order of "Bill, why is restoring taking >> > such a long time?" from other systems people. At the time, I poked >> > around >> > and tried some stuff here and there and found that reindex restored >> > performance. I didn't look at actual size at that time. >> >> A reindex might improve performance for reasons other than bloat --- to >> wit, that a freshly-built index is in perfect physical order, which >> tends to get degraded over time by page splits. How important that is >> depends on your usage patterns. If this is what the story is for your >> situation, then what might fix it (in 8.2) is to create the index with >> FILLFACTOR 50 or so, so that it's already at the steady state density >> and won't need many page splits. >> >> > Anyway, I'll report back in a few weeks as to what the numbers look >> > like. >> >> Yeah, please for the moment just watch what happens with the default >> behavior. > > Remember this discussion? > > To recap, I had scheduled a weekly reindex of this database because I > was seeing performance issues otherwise. In order to see if this was > actually helping, I disabled the redindex job, ran a few timing > experiments, then scheduled a job to email me the size of the indexes > in the database on a daily basis. > > At this point, I have daily records of index size since March 6th. > > The behaviour is like this: A freshly created index is about 21,000 > pages in size. Under normal usage, the index size balloons to about > 38,000 pages immediately after the first backup job is run. From there > it grows slowly (but fairly consistently) by about 100 pages each day. > As of today, it is 44304 pages. > > When I first brought up this discussion, the table contained 8068956 > rows. It now has 7451381, which means it's dropped by 7% > > The important part is that I can't reproduce the performance problems > that I originally thought were the result of this.
It's entirely > possible that something else was changed since then that actually > fixed the problem, and that the index bloat was a red herring. > > Not sure what (if any) conclusions can be drawn from this. Is there > any other data I should gather? Have I just proved my previous > rantings about the necessity of reindexing to be wrong? > > -- > Bill Moran > Collaborative Fusion Inc. > http://people.collaborativefusion.com/~wmoran/ > > wmoran@collaborativefusion.com > Phone: 412-422-3463x4023 > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >