Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers
From | Christopher Browne
Subject | Re: vacuum, performance, and MVCC
Date |
Msg-id | 87ejxeb4jj.fsf@wolfe.cbbrowne.com
In response to | vacuum, performance, and MVCC ("Mark Woodward" <pgsql@mohawksoft.com>)
List | pgsql-hackers
Martha Stewart called it a Good Thing when JanWieck@Yahoo.com (Jan Wieck) wrote:
> On 6/22/2006 2:37 PM, Alvaro Herrera wrote:
>
>> Adding back pgsql-hackers.
>> Mark Woodward wrote:
>>> > Mark Woodward wrote:
>>> >
>>> >> Hmm, OK, then the problem is more serious than I suspected.
>>> >> This means that every index on a row has to be updated on every
>>> >> transaction that modifies that row. Is that correct?
>>> >
>>> > Add an index entry, yes.
>>> >
>>> >> I am attaching some code that shows the problem with regard to
>>> >> applications such as web server session management; when run, each
>>> >> second the system can handle fewer and fewer connections. Here is
>>> >> a brief output:
>>> >> [...]
>>> >> There has to be a more linear way of handling this scenario.
>>> >
>>> > So vacuum the table often.
>>>
>>> That fixes the symptom, not the problem. The problem is that
>>> performance steadily degrades over time.
>>
>> No, you got it backwards. The performance degradation is the symptom.
>> The problem is that there are too many dead tuples in the table. There
>> is one way to solve that problem -- remove them, which is done by
>> running vacuum.
>
> Precisely.
>
>> There are some problems with vacuum itself, that I agree with. For
>> example, it would be good if a long-running vacuum wouldn't affect a
>> vacuum running on another table because of the long-running
>> transaction effect it has. It would be good if vacuum could be run
>> partially over a table. It would be good if there was a way to speed
>> up vacuum by using a dead space map or something.
>
> It would be good if vacuum wouldn't waste time on blocks that don't
> have any possible work in them. Vacuum has two main purposes:
> A) remove dead rows, and B) freeze xids. Once a block has zero deleted
> rows and all xids are frozen, there is nothing to do with this block,
> and vacuum should skip it until a transaction updates that block.
>
> This requires 2 bits per block, which is 32K per 1G segment of a
> heap. Clearing the bits is done when the block is marked dirty. This
> way vacuum would not waste any time and IO on huge, slow-changing
> tables. That part -- sequentially scanning huge tables that didn't
> change much -- is what keeps us from running vacuum every couple of
> seconds.

This is, in effect, the "VACUUM Space Map."

I see one unfortunate thing about that representation of it, namely
that it would in effect require that non-frozen pages be kept on the
VSM for potentially a long time. Based on *present* VACUUM strategy,
at least.

Would it not be the case, here, that any time a page could be
"frozen," it would have to be? In effect, we are always trying to run
VACUUM FREEZE?
--
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/finances.html
Rules of the Evil Overlord #72. "If all the heroes are standing
together around a strange device and begin to taunt me, I will pull
out a conventional weapon instead of using my unstoppable superweapon
on them." <http://www.eviloverlord.com/>
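[Editor's note: the arithmetic behind the proposal -- 2 bits per 8K heap
block works out to 32K of map per 1G heap segment -- can be sketched as
below. This is a hypothetical illustration in Python, not PostgreSQL
code; the class and method names are invented for the example, and the
semantics follow only what the post describes: one "no dead rows" bit
and one "all xids frozen" bit per block, both cleared whenever the block
is dirtied, with vacuum skipping blocks where both bits are set.]

```python
BLOCK_SIZE = 8192        # default PostgreSQL heap block size
SEGMENT_SIZE = 1 << 30   # 1 GB heap segment
BITS_PER_BLOCK = 2       # "no dead rows" bit + "all xids frozen" bit

blocks_per_segment = SEGMENT_SIZE // BLOCK_SIZE            # 131072 blocks
map_bytes = blocks_per_segment * BITS_PER_BLOCK // 8       # 32768 = 32K


class VacuumSpaceMap:
    """Illustrative per-block bitmap telling vacuum which blocks to skip."""

    NO_DEAD = 0b01      # block contains no dead rows
    ALL_FROZEN = 0b10   # all xids in the block are frozen

    def __init__(self, nblocks):
        self.bits = bytearray((nblocks * BITS_PER_BLOCK + 7) // 8)

    def _get(self, blk):
        byte, shift = divmod(blk * BITS_PER_BLOCK, 8)
        return (self.bits[byte] >> shift) & 0b11

    def set_clean(self, blk):
        """Vacuum finished the block: no dead rows left, all xids frozen."""
        byte, shift = divmod(blk * BITS_PER_BLOCK, 8)
        self.bits[byte] |= (self.NO_DEAD | self.ALL_FROZEN) << shift

    def mark_dirty(self, blk):
        """A transaction modified the block: clear both bits, per the post."""
        byte, shift = divmod(blk * BITS_PER_BLOCK, 8)
        self.bits[byte] &= ~(0b11 << shift) & 0xFF

    def can_skip(self, blk):
        """Vacuum may skip the block only when both bits are set."""
        return self._get(blk) == (self.NO_DEAD | self.ALL_FROZEN)
```

A vacuum pass over a mostly static table would then touch only blocks
for which `can_skip` returns False, rather than scanning the whole heap.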