Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering) - Mailing list pgsql-performance
From: Simon Riggs
Subject: Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering)
Date:
Msg-id: 1106556097.31592.139.camel@localhost.localdomain
In response to: Re: PostgreSQL clustering VS MySQL clustering (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Sun, 2005-01-23 at 15:40 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Changing the idea slightly might be better: if a row update would cause
> > a block split, then if there is more than one row version then we vacuum
> > the whole block first, then re-attempt the update.
>
> "Block split"? I think you are confusing tables with indexes.

Terminologically loose, as ever. :( I meant both tables and indexes, and was
referring to the part of the algorithm that is entered when we have a
block-full situation.

> Chasing down prior versions of the same row is not very practical
> anyway, since there is no direct way to find them.
>
> One possibility is, if you tried to insert a row on a given page but
> there's not room, to look through the other rows on the same page to see
> if any are deletable (xmax below the GlobalXmin event horizon). This
> strikes me as a fairly expensive operation though, especially when you
> take into account the need to get rid of their index entries first.

That's what I was suggesting: vacuum the whole page, not just those rows.
Doing it immediately greatly increases the chance that the index blocks
would be in cache also.

> Moreover, the check would often be unproductive.
>
> The real issue with any such scheme is that you are putting maintenance
> costs into the critical paths of foreground processes that are executing
> user queries. I think that one of the primary advantages of the
> Postgres storage design is that we keep that work outside the critical
> path and delegate it to maintenance processes that can run in the
> background. We shouldn't lightly toss away that advantage.

Completely agree. ...which is why I was trying to find a place for such an
operation in front of another expensive operation that is also currently on
the critical path. That way there might be benefit rather than just
additional overhead.

> There was some discussion in Toronto this week about storing bitmaps
> that would tell VACUUM whether or not there was any need to visit
> individual pages of each table. Getting rid of useless scans through
> not-recently-changed areas of large tables would make for a significant
> reduction in the cost of VACUUM.

ISTM there are two issues here, which are only somewhat related:
- speed of VACUUM on large tables
- ability to run VACUUM very frequently on very frequently updated tables

The needs-maintenance bitmap idea hits both, whilst the on-the-spot idea
only hits the second one, even if it does that somewhat better. Gut feel
says we would implement only one idea... so on balance that indicates the
needs-maintenance bitmap is the better idea, and one for which we already
have existing code.

A few questions...
- Wouldn't we need a bitmap per relation?
- Wouldn't all the extra bitmaps need to be cached in shared_buffers, which
  could use up a good proportion of buffer cache space? Maybe we should use
  a smaller block size and a separate cache for them.
- How would we update the bitmap without creating a new LWLock that needs
  to be acquired for every block write, thereby reducing scalability?
- Would this be implemented as an option for each table, so that we could
  avoid the implementation overhead? (Or perhaps don't have a bitmap if the
  table is less than 16 blocks?)

--
Best Regards, Simon Riggs
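For concreteness, here is a minimal C sketch of the on-the-spot idea
discussed above: when an insert finds its target page full, reclaim every
tuple on that page whose xmax is below GlobalXmin, then retry the insert, so
the cost is paid only on the block-full path. This is not PostgreSQL source;
every struct and function name is a hypothetical, simplified stand-in for
the real heap machinery, and index-entry cleanup (the expensive part Tom
Lane points out) is deliberately elided.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct {
    TransactionId xmin;   /* inserting transaction */
    TransactionId xmax;   /* deleting transaction, 0 if still live */
    bool          dead;   /* already reclaimed */
} TupleHeader;

typedef struct {
    TupleHeader tuples[64];
    int         ntuples;
    int         free_space;   /* bytes available on the page */
} Page;

/* A tuple version is reclaimable if it was deleted by a transaction
 * older than the oldest snapshot still running (GlobalXmin). */
static bool
tuple_is_reclaimable(const TupleHeader *tup, TransactionId global_xmin)
{
    return !tup->dead && tup->xmax != 0 && tup->xmax < global_xmin;
}

/* Vacuum the whole page, not just the prior version of the row being
 * updated. Returns bytes recovered. A real system would also have to
 * remove the matching index entries first. */
static int
vacuum_page(Page *page, TransactionId global_xmin, int tuple_size)
{
    int recovered = 0;
    for (int i = 0; i < page->ntuples; i++) {
        if (tuple_is_reclaimable(&page->tuples[i], global_xmin)) {
            page->tuples[i].dead = true;
            recovered += tuple_size;
        }
    }
    page->free_space += recovered;
    return recovered;
}

/* Critical path: pay the vacuum cost only when the insert would
 * otherwise fail for lack of space. */
static bool
insert_with_retry(Page *page, TransactionId global_xmin, int tuple_size)
{
    if (page->free_space < tuple_size)
        vacuum_page(page, global_xmin, tuple_size);
    if (page->free_space < tuple_size)
        return false;              /* still full: caller extends relation */
    page->free_space -= tuple_size;
    return true;                   /* actual tuple placement elided */
}

int main(void)
{
    Page page = { .ntuples = 2, .free_space = 0 };
    page.tuples[0] = (TupleHeader){ .xmin = 90, .xmax = 95, .dead = false };
    page.tuples[1] = (TupleHeader){ .xmin = 96, .xmax = 0,  .dead = false };
    /* GlobalXmin = 100: tuple 0 is reclaimable, so the insert succeeds. */
    return insert_with_retry(&page, 100, 32) ? 0 : 1;
}
```

In this sketch an update that finds the page full pays roughly one extra
page scan before falling back to extending the relation, which is exactly
the foreground cost the thread is weighing.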
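And a similarly hedged sketch of the needs-maintenance bitmap: one bit per
heap block, set lock-free from the block-write path (an atomic OR rather
than a new LWLock, which is one possible answer to the scalability question
above) and consumed by VACUUM so that unchanged regions of the table are
never visited. Again, every name here is a hypothetical simplification, not
existing PostgreSQL code.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;

typedef struct {
    atomic_uint *words;     /* 32 blocks tracked per word */
    BlockNumber  nblocks;
} MaintenanceBitmap;

static MaintenanceBitmap *
bitmap_create(BlockNumber nblocks)
{
    MaintenanceBitmap *bm = malloc(sizeof(*bm));
    bm->nblocks = nblocks;
    /* zero-initialised: no block needs maintenance yet */
    bm->words = calloc((nblocks + 31) / 32, sizeof(atomic_uint));
    return bm;
}

/* Called from the block-write path: a lock-free atomic OR, so it adds
 * no contended lock to every write. */
static void
bitmap_mark_dirty(MaintenanceBitmap *bm, BlockNumber blk)
{
    atomic_fetch_or(&bm->words[blk / 32], 1u << (blk % 32));
}

/* Called by VACUUM: visit only blocks whose bit is set, clearing each
 * bit before processing. At 8 KB blocks, a 1 GB relation (131072
 * blocks) needs only 16 KB of bitmap; with one bitmap per relation,
 * the aggregate footprint drives the shared_buffers question above. */
static void
bitmap_vacuum_pass(MaintenanceBitmap *bm,
                   void (*vacuum_block)(BlockNumber))
{
    for (BlockNumber blk = 0; blk < bm->nblocks; blk++) {
        unsigned mask = 1u << (blk % 32);
        if (atomic_load(&bm->words[blk / 32]) & mask) {
            atomic_fetch_and(&bm->words[blk / 32], ~mask);
            vacuum_block(blk);
        }
    }
}

static void report(BlockNumber blk) { printf("vacuuming block %u\n", blk); }

int main(void)
{
    MaintenanceBitmap *bm = bitmap_create(1024);
    bitmap_mark_dirty(bm, 7);
    bitmap_mark_dirty(bm, 500);
    bitmap_vacuum_pass(bm, report);   /* visits only blocks 7 and 500 */
    return 0;
}
```

With the atomic OR, writers never block one another on bitmap maintenance;
the trade-off is that a writer may re-set a bit while VACUUM is processing
that block, which is harmless since the block is simply revisited on the
next pass.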