Thread: automatic REINDEX-ing
Hello all

[[[ while dealing with a disk size problem I realised my REINDEX cron script was not really being called every week :( so... ]]]

I executed REINDEX by hand and the disk occupation immediately dropped 6 GB...!!!

Is there a way to configure Postgres to automatically execute the needed REINDEXING (on indexes and tables) for a given database... something similar to autovacuum, I guess...

thx
joao
you can use a cron job

--- On Tue, 8/12/08, Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com> wrote:
> is there a way to configure postgres to automatically execute the needed
> REINDEXING (on indexes and tables) for a given database....
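A minimal sketch of the kind of weekly maintenance such a cron job could drive (the database, file and table names are made up; plain VACUUM is the part worth doing routinely, as the rest of the thread goes on to explain):

    -- weekly_maintenance.sql (hypothetical), run from cron with e.g.
    --   psql -d mydb -f weekly_maintenance.sql
    VACUUM ANALYZE;              -- whole database; does not block normal reads/writes
    REINDEX TABLE my_big_table;  -- only for tables whose indexes are known to bloat;
                                 -- takes locks, so keep it in a quiet window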
Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com> writes:
> I executed REINDEX by hand and the disk occupation immediately dropped 6 GB...!!!
> is there a way to configure postgres to automatically execute the needed
> REINDEXING (on indexes and tables) for a given database....

Generally speaking, there shouldn't be a need for "automatic reindexing". What the above suggests is that you need more aggressive routine vacuuming, so that you don't get into this situation in the first place.

BTW, "more aggressive routine vacuuming" does NOT mean "use vacuum full". Vacuum full tends to make index bloat worse, not better.

			regards, tom lane
On Tue, 2008-08-12 at 11:53 -0400, Tom Lane wrote:
> BTW, "more aggressive routine vacuuming" does NOT mean "use vacuum full".
> Vacuum full tends to make index bloat worse, not better.

Ok, so what does it mean?

I'm a bit lost here. I'm currently executing VACUUM FULL _and_ REINDEX (tbls & idxs) every week.

Should I keep the REINDEX and drop VACUUM FULL?

How do I iterate to a better approach?

thanks,
joao
On Tue, Aug 12, 2008 at 10:04 AM, Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com> wrote:
> I'm a bit lost here. I'm currently executing VACUUM FULL _and_ REINDEX
> (tbls & idxs) every week.
>
> Should I keep the REINDEX and drop VACUUM FULL?
>
> How do I iterate to a better approach?

It's better to run REGULAR vacuums more often than to vacuum full OR reindex OR both. If your machine doesn't have the I/O bandwidth to withstand being vacuumed during the day, then you either have to have a fairly large free space map and vacuum off hours, or buy a machine with more I/O bandwidth. With the sleep settings in vacuum and autovacuum you can usually get away with autovacuum running during the day.
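As a concrete illustration of the "sleep settings" mentioned above, a manual off-hours vacuum can be throttled with the cost-based delay parameters (the table name is hypothetical; suitable values depend on the hardware):

    -- Make this session's VACUUM pause periodically so it yields I/O to
    -- other queries; the default for manual VACUUM is 0 (no throttling).
    SET vacuum_cost_delay = 20;   -- milliseconds to sleep each time the cost limit is hit
    VACUUM ANALYZE my_big_table;  -- hypothetical table name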
Tom Lane wrote:
> Generally speaking, there shouldn't be a need for "automatic
> reindexing". What the above suggests is that you need more aggressive
> routine vacuuming, so that you don't get into this situation in the
> first place.
>
> BTW, "more aggressive routine vacuuming" does NOT mean "use vacuum full".
> Vacuum full tends to make index bloat worse, not better.

So now that we know what that term does not mean, what does it mean? Just doing it more often by adjusting the autovacuum parameters?
On Tue, Aug 12, 2008 at 10:09 AM, William Garrison <postgres@mobydisk.com> wrote:
> So now that we know what that term does not mean, what does it mean? Just
> doing it more often by adjusting the autovacuum parameters?

exactly. Or running cronned vacuums on particular tables if they need it more often.
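For the "adjusting the autovacuum parameters" part: releases newer than the ones in this thread (8.4 and later) let autovacuum be tuned per table through storage parameters; a rough sketch with a made-up table name and deliberately aggressive values:

    -- Fire autovacuum on "events" after roughly 2% of the rows have changed
    -- instead of the default 20%, and throttle it a bit less.
    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor = 0.02,
        autovacuum_vacuum_threshold    = 200,
        autovacuum_vacuum_cost_delay   = 10
    );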
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:
> I'm a bit lost here. I'm currently executing VACUUM FULL _and_ REINDEX
> (tbls & idxs) every week.
>
> Should I keep the REINDEX and drop VACUUM FULL?

Don't "vacuum full" as part of regular maintenance. Do plain vacuum. If that's unable to keep up with the database bloat, then do it more often. Whether you use autovacuum or cron isn't as important as whether you're vacuuming often enough.

Personally, I like to put explicit VACUUM commands in my applications after operations that are known to bloat tables. This isn't always possible, as it sometimes introduces a performance issue, but I use it where it doesn't cause problems, since it solves the bloat problem at the point of creation.

REINDEX is normally not needed, although there _are_ some corner cases that seem to require it. One particular corner case is VACUUM FULL, which tends to bloat indexes. If you're vacuuming on a schedule appropriate to your database activity, you'll probably not need REINDEX. If you do find that your use is one of those corner cases where REINDEX is necessary, then go ahead and do it.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
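A sketch of the "explicit VACUUM in the application" idea, assuming a hypothetical sessions table that gets a large periodic purge:

    -- Purge expired rows, then immediately reclaim the dead space and
    -- refresh planner statistics rather than waiting for the next
    -- scheduled or automatic vacuum to notice the bloat.
    -- (VACUUM cannot be issued inside a transaction block.)
    DELETE FROM sessions WHERE expires_at < now() - interval '7 days';
    VACUUM ANALYZE sessions;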
On Tue, 2008-08-12 at 08:38 -0700, Lennin Caro wrote:
> you can use a cron job

I have my cron set up to do database-wide vacuums each night and it usually takes between ~4-6 hours on a ~200G DB. On days where there is huge activity, it can drag on for 15+ hours.

I've recently dropped all my indexes and started to only rebuild _some_ needed ones.

What's the method for looking at index bloat anyway?
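Nobody in the thread answers the bloat question directly; one crude starting point is to watch raw index sizes relative to their tables (proper bloat estimation needs something like the pgstattuple contrib module or the check_postgres bloat query):

    -- Largest indexes and their parent tables.  An index that keeps
    -- growing while its table stays flat is a candidate for REINDEX.
    SELECT c.relname                               AS index_name,
           t.relname                               AS table_name,
           pg_size_pretty(pg_relation_size(c.oid)) AS index_size,
           pg_size_pretty(pg_relation_size(t.oid)) AS table_size
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_class t ON t.oid = i.indrelid
    ORDER BY pg_relation_size(c.oid) DESC
    LIMIT 20;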
At 12:04p -0400 on Tue, 12 Aug 2008, Joao Ferreira wrote:
> I'm a bit lost here. I'm currently executing VACUUM FULL _and_ REINDEX
> (tbls & idxs) every week.
>
> Should I keep the REINDEX and drop VACUUM FULL?
>
> How do I iterate to a better approach?

It might behoove you to read the Notes section of the docs about vacuuming:

http://www.postgresql.org/docs/current/static/sql-vacuum.html

Roughly, VACUUM simply reclaims/frees disk space, while VACUUM FULL additionally reorganizes disk usage. I still don't know *why* this leads to index bloat, however, just that it does. I must defer that explication to a Postgres guru.

Kevin
On Wed, Aug 13, 2008 at 12:41:41PM -0400, Kevin Hunter wrote:
> Roughly, VACUUM simply reclaims/frees disk space, while VACUUM FULL
> additionally reorganizes disk usage. I still don't know *why* this
> leads to index bloat, however, just that it does. I must defer that
> explication to a Postgres guru.

Because VACUUM FULL needs to move stuff around in the table, which means it needs to mess around with the indexes (adding new entries). Ordinary VACUUM only needs to delete stuff, so it doesn't cause anywhere near as many problems.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
> Because VACUUM FULL needs to move stuff around in the table, which means it
> needs to mess around with the indexes (adding new entries). Ordinary
> VACUUM only needs to delete stuff, so it doesn't cause anywhere near as
> many problems.

so.... in the event that I really end up running VACUUM FULL once in a while, it seems to me it is a good idea to run REINDEX. Is this correct?

thx
joao
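If VACUUM FULL really is needed once in a while (say, to hand a large amount of space back to the OS after a one-off purge), the pattern being discussed looks roughly like this, with a made-up table name. (VACUUM FULL was later rewritten in 9.0 to rebuild the table and its indexes wholesale, which makes the extra REINDEX unnecessary on modern versions.)

    -- Both commands take exclusive locks, so run them in a maintenance window.
    VACUUM FULL my_big_table;    -- compacts the heap, but (pre-9.0) tends to bloat the indexes
    REINDEX TABLE my_big_table;  -- rebuilds all of the table's indexes from scratch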
At 12:53p -0400 on Wed, 13 Aug 2008, Martijn van Oosterhout wrote:
> Because VACUUM FULL needs to move stuff around in the table, which means it
> needs to mess around with the indexes (adding new entries). Ordinary
> VACUUM only needs to delete stuff, so it doesn't cause anywhere near as
> many problems.

Hmm. I get the reorganization bit, but so what? Since VACUUM FULL already has an exclusive lock, what prevents it from updating the indexes in-place to point to the new physical disk location? Why does it need to create extra bloat?

Or, failing that, what's the reason to not issue a REINDEX CONCURRENTLY automatically after a VACUUM FULL (or something to that effect)?

Kevin
On Wed, Aug 13, 2008 at 01:16:03PM -0400, Kevin Hunter wrote:
> Hmm. I get the reorganization bit, but so what? Since VACUUM FULL
> already has an exclusive lock, what prevents it from updating the
> indexes in-place to point to the new physical disk location? Why does
> it need to create extra bloat?

AIUI, people know VACUUM FULL sucks and that in the cases where it really helps, CLUSTER is faster anyway and doesn't have the index problems. The TODO list references several discussions on the topic.

> Or, failing that, what's the reason to not issue a REINDEX CONCURRENTLY
> automatically after a VACUUM FULL (or something to that effect)?

Or how about not doing VACUUM FULL at all. It's not a command that should be run regularly in most situations.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
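A sketch of the CLUSTER alternative, with hypothetical table and index names; CLUSTER takes an exclusive lock and, as noted later in the thread, needs roughly double the disk space while it runs:

    -- Rewrite the table in the physical order of the chosen index.
    -- The rewrite also rebuilds every index on the table, so there is no
    -- leftover index bloat the way there is with the old VACUUM FULL.
    CLUSTER my_big_table USING my_big_table_pkey;
    ANALYZE my_big_table;   -- CLUSTER does not update planner statistics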
At 2:44p -0400 on Wed, 13 Aug 2008, Martijn van Oosterhout wrote:
> Or how about not doing VACUUM FULL at all. It's not a command that
> should be run regularly in most situations.

Heh, fair enough. My questions were rather academic anyway. :-)

I appreciate it, Martijn!

Kevin
Martijn van Oosterhout wrote:
> AIUI, people know VACUUM FULL sucks and that in the cases where it
> really helps, CLUSTER is faster anyway and doesn't have the index
> problems. The TODO list references several discussions on the topic.
>
> Or how about not doing VACUUM FULL at all. It's not a command that
> should be run regularly in most situations.

Which makes me wonder whether the solution would be to just run CLUSTER under the hood when VACUUM FULL is requested. Would that introduce any other problems?

Cheers
Tino
Tino Wildenhain wrote:
> Which makes me wonder whether the solution would be to just run CLUSTER
> under the hood when VACUUM FULL is requested. Would that introduce any
> other problems?

The difference is that CLUSTER requires double the disk space in table + indexes. VACUUM FULL has no such requirement. A possible approach would be to do an ANALYZE (to have fresh stats about dead tuple density), and do a CLUSTER if the density is too high.

There has been talk about rewriting VACUUM FULL anyway; it's complex code and it introduces extra complications in other parts of the code that would be otherwise unneeded, e.g. HOT. I have no idea what a rewritten VACUUM FULL would look like, though.

Another thing we should do in this area is rewrite CLUSTER to use a seqscan + sort instead of an indexscan when the heap/index order correlation is low.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
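The "dead tuple density" Alvaro mentions can already be eyeballed from the statistics collector (the n_live_tup/n_dead_tup columns exist in 8.3 and later); a rough sketch:

    -- Tables with the most dead rows; a persistently high dead percentage
    -- usually means vacuum is not keeping up (or a huge delete just ran).
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;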
Kevin Hunter <hunteke@earlham.edu> writes:
> Hmm. I get the reorganization bit, but so what? Since VACUUM FULL
> already has an exclusive lock, what prevents it from updating the
> indexes in-place to point to the new physical disk location? Why does
> it need to create extra bloat?

Crash safety.

The basic process in vacuum full is:

* detect that there is room in a lower-numbered page to move the physically last tuple to.

* put a copy of the last tuple there. Mark the last tuple itself as deleted by the VACUUM.

* make new index entries for that copy.

* repeat until there's no longer any free space that can hold the last undeleted tuple.

At this point there are two copies of every moved-down tuple, and two sets of index entries for it. The inserted copies are marked as inserted by the VACUUM FULL. Therefore, a crash anywhere along this part of the process will leave them having the same status as tuples inserted by any other failed transaction, ie, not good. The original copies are deleted by the vacuum, but again, since it crashed, the deletion is a no-op. Result: you still have one and only one good copy of each tuple.

Next: commit the VACUUM FULL transaction and start a new one to continue working in. The commit is atomic and instantaneous via the same WAL magic that makes every other commit atomic and instantaneous. If we crash at some point after the commit, the state is that all the moved-down tuple copies are good, all the original ones are dead. Therefore we still have one and only one good copy of each row, and there's no data corruption.

Next, vacuum full runs around and deletes the index entries for the original copies of the moved tuples. Crash partway through here doesn't matter, because it doesn't actually matter whether a dead tuple has all, some, or none of its expected index entries. It's not going to be returned by any indexscan anyway. The next vacuum will finish up the job of removing index entries for any dead tuples it finds.

Once we know there are no index entries left for the dead tuples, we can physically remove the tuples. Once that's done, we can truncate away any wholly-free pages at the end of the table.

So in short, at no point in a vacuum sequence can you lose data due to a crash. That wouldn't be true if we overwrote valid data or index entries.

Plain vacuum just removes index entries for dead tuples and then the tuples themselves. It doesn't try to move anything across page boundaries. So while it frees space that can be re-used for subsequent insertions, it will very seldom be able to shrink the table length --- usually there will still be some live tuples near the end of the table.

Vacuum full's problem with indexes is comparable: the structure of the index is largely dictated by the index access method, and there's no very good way to force pages at the physical end of the index to become empty. So the excess index entries that it temporarily adds tend to lead to long-term index bloat. The longer you wait between vacuum fulls, the more tuples have to be moved to compact the table, and the worse the index bloat gets.

			regards, tom lane
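A rough, throwaway way to see the "plain vacuum seldom shrinks the table" behaviour for yourself (exact sizes will vary, and truncation only happens when the trailing pages end up completely empty):

    -- Two identical throwaway tables, filled in physical order.
    CREATE TABLE vac_head AS
        SELECT g AS id, repeat('x', 100) AS pad FROM generate_series(1, 100000) g;
    CREATE TABLE vac_tail AS
        SELECT g AS id, repeat('x', 100) AS pad FROM generate_series(1, 100000) g;

    -- Delete rows near the *start* of the file: live rows remain at the end,
    -- so plain VACUUM frees the space for reuse but cannot shrink the file.
    DELETE FROM vac_head WHERE id <= 50000;
    VACUUM vac_head;
    SELECT pg_size_pretty(pg_relation_size('vac_head'));   -- roughly unchanged

    -- Delete rows near the *end* of the file: the trailing pages become
    -- empty and VACUUM can truncate them away.
    DELETE FROM vac_tail WHERE id > 50000;
    VACUUM vac_tail;
    SELECT pg_size_pretty(pg_relation_size('vac_tail'));   -- roughly halved

    DROP TABLE vac_head, vac_tail;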
Tom Lane wrote:
> Crash safety. The basic process in vacuum full is:
>
> * detect that there is room in a lower-numbered page to move the
> physically last tuple to.
>
> * put a copy of the last tuple there. Mark the last tuple itself
> as deleted by the VACUUM.

Hmm, I've seen people do piecemeal table shrinking with no exclusive locking by looking up the live tuples in the last page, doing no-op UPDATEs until they move those tuples to an earlier page with free space, and then hoping that plain VACUUM will detect that the page is empty and truncate it.

I wonder if we could write some (semi-)automatic way of doing this. The benefit is that the table ends up truncated to a reasonable number of pages, reducing bloat, without requiring the table to be exclusive-locked.

The main problem with this approach is that it is fairly painful to deal with CTID values. The operator support for them is pretty poor.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
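A hand-rolled sketch of that piecemeal-shrink trick, assuming a much newer PostgreSQL in which tid comparison and sort operators exist (as Alvaro notes, the operator support was poor when this was written); the table and column names are made up:

    -- Rewrite a batch of rows that currently sit at the end of the file.
    -- The no-op UPDATE writes each new row version somewhere with free
    -- space, ideally an earlier page, after which plain VACUUM can
    -- truncate the empty tail.  With HOT the new version may stay on the
    -- same page, so this generally has to be repeated and interleaved
    -- with VACUUM.
    UPDATE bloated_tab
       SET some_col = some_col
     WHERE ctid IN (SELECT ctid
                      FROM bloated_tab
                     ORDER BY ctid DESC
                     LIMIT 1000);

    VACUUM bloated_tab;   -- tries to truncate the now-empty trailing pages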