Re: index bloat - Mailing list pgsql-general
From | David Esposito |
---|---|
Subject | Re: index bloat |
Date | |
Msg-id | 200507132053.j6DKrOs8003872@relay2.nnco.com Whole thread Raw |
In response to | Re: index bloat (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: index bloat
|
List | pgsql-general |
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, July 13, 2005 3:45 PM > > Hmm, this is preferentially touching stuff near the right end of the > index, ie, it's going to bloat the pages associated with higher keys. > As I understand your usage of these indexes, pages generally only get > reclaimed off the left end (as records get old enough to be moved to > archival storage). So if you were to repeat this test for long enough > for the data to cycle all the way through the table and out again > (50 days in your real-world usage) then the extra space would be > evenly distributed and the usage would reach a steady state. So I guess the test of this would be whether the index usage increases after the 50 day mark passes ... Our last REINDEX was on June 5th so that should mean that things would level off on July 25th ... What you're saying is that when VACUUM runs nightly, it clears out all of the dead row versions from the indexes ... but since the bulk of my dead row versions are all clustered around the right side of the index, that means that those pages become very sparsely populated with data ... as the index ages (assuming time-zero is the day I did my REINDEX) the index becomes exclusively populated with those sparse pages ... and the way new keys are inserted into the index is to always add them to a new page (where the 'new' page is either a truly new page, or a page that is completely empty), rather than using up some of the fragmented space within existing pages? and this must differ from the way that space is allocated for the actual table record otherwise i would be witnessing the same bloat with the table entity itself? out of curiosity, why is it necessary for MVCC to create copies of the index keys if the value of the indexed field doesn't change with an UPDATE to the underlying record? (perhaps answering my own question: because the entry in the index is the key and the rowid that it's pointing to .. so when you UPDATE the row, you have to create a new index entry to point at the new rowid) > > The testing I've been doing so far involves UPDATEs that touch a > uniformly distributed subset of the table --- maybe that's the aspect > that is failing to match your reality. Do you mostly update > recently-added rows? Can you quantify the effect at all? This is probably a good point ... The histogram of UPDATE activity to the table probably looks something like 90% - records created < 24 hours ago 7% - records created 1 - 2 days ago 2% - records created 2 - 7 days ago 1% - records older than 7 days I don't have a way to easily quantitatively confirm this, but based on what we see in our reports and what we recommend to customers (as far as when they can consider a mailing "closed"), we usually recommend considering the 48-hour mark as the end of the vast majority of the activity .. Thanks again, Dave
pgsql-general by date: