Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Plans for solving the VACUUM problem |
Date | |
Msg-id | 14061.990166059@sss.pgh.pa.us |
In response to | Re: Plans for solving the VACUUM problem (Hiroshi Inoue <Inoue@tpf.co.jp>) |
List | pgsql-hackers |
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> 1. A new lock type that allows concurrent reads and writes
>> (AccessShareLock, RowShareLock, RowExclusiveLock) but not
>> anything else.

> What's different from RowExclusiveLock ?

I wanted something that *does* conflict with itself, thereby ensuring
that only one instance of VACUUM can be running on a table at a time.
This is not absolutely necessary, perhaps, but without it matters would
be more complex for little gain that I can see. (For example, the tuple
TIDs that lazy VACUUM gathers in step A might already be deleted and
compacted out of existence by another VACUUM, and then reused as new
tuples, before the first VACUUM gets back to the page to delete them.
There would need to be a defense against that if we allow concurrent
VACUUMs.)

> And would the truncation occur that often in reality under
> the scheme(without tuple movement) ?

Probably not, per my comments to someone else. I'm not very concerned
about that, as long as we are able to recycle freed space within the
relation.

We could in fact move tuples if we wanted to --- it's not fundamentally
different from an UPDATE --- but then VACUUM becomes a transaction and
we have the WAL-log-traffic problem back again. I'd like to try it
without that and see if it gets the job done.

> ???? Isn't current implementation "bulk delete" ?

No, the index AM is called separately for each index tuple to be
deleted; more to the point, the search for deletable index tuples
should be moved inside the index AM for performance reasons.

> Fast access to individual index tuples seems to be
> also needed in case a few dead tuples.

Yes, that is the approach Vadim used in the LAZY VACUUM code he did
last summer for Alfred Perlstein. I had a couple of reasons for not
duplicating that method:

1. Searching for individual index tuples requires hanging onto (or
refetching) the index key data for each target tuple. A bulk delete
based only on tuple TIDs is simpler and requires little memory.

2. The main reason for that form of lazy VACUUM is to minimize the time
spent holding the exclusive lock on a table. With this approach we
don't need to worry so much about that. A background task trawling
through an index won't really bother anyone, since it won't lock out
updates.

3. If we're concerned about the speed of lazy VACUUM when there's few
tuples to be recycled, there's a really easy answer: don't recycle 'em.
Leave 'em be until there are enough to make it worth going after 'em.
Remember this is a "fast and good enough" approach, not a "get every
possible byte every time" approach.

Using a key-driven delete when we have only a few deletable tuples
might be a useful improvement later on, but I don't think it's
necessary to bother with it in the first go-round. This is a big
project already...

			regards, tom lane
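
The self-conflicting lock described in the message can be illustrated with a small model. This is only a sketch under stated assumptions: the mode name `LazyVacuumLock` and the helpers `conflicts` and `can_grant` are hypothetical, and only conflicts that involve the new lock are modeled. The three compatible modes are the ones named in the quoted proposal.

```python
# Modes the proposed lock is meant to coexist with (taken from the message).
ALLOWED_WITH_VACUUM = frozenset(
    {"AccessShareLock", "RowShareLock", "RowExclusiveLock"}
)

# Hypothetical name for the proposed lock type; the message does not name it.
VACUUM_LOCK = "LazyVacuumLock"


def conflicts(a: str, b: str) -> bool:
    """Return True if lock modes a and b conflict.

    Only conflicts involving the proposed vacuum lock are modeled;
    every other pair is treated as compatible for simplicity.
    """
    if a == VACUUM_LOCK:
        return b not in ALLOWED_WITH_VACUUM
    if b == VACUUM_LOCK:
        return a not in ALLOWED_WITH_VACUUM
    return False  # conflicts among the ordinary modes are not modeled


def can_grant(requested: str, held: list) -> bool:
    """A request is grantable if it conflicts with no lock already held."""
    return not any(conflicts(requested, h) for h in held)


if __name__ == "__main__":
    # Readers and writers do not block a lazy VACUUM ...
    print(can_grant(VACUUM_LOCK, ["AccessShareLock", "RowExclusiveLock"]))
    # ... but a second VACUUM on the same table does.
    print(can_grant(VACUUM_LOCK, [VACUUM_LOCK]))
```

Because the vacuum lock is absent from its own compatibility set, a second VACUUM has to wait. That is the property RowExclusiveLock lacks, since RowExclusiveLock does not conflict with itself.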
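
The TID-only bulk delete and the "leave 'em be" rule might look roughly like the following. This is a minimal sketch, not the actual index AM interface: the in-memory index representation, the function names, and the `min_dead` threshold are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

TID = Tuple[int, int]         # (block number, line offset) of a heap tuple
IndexEntry = Tuple[str, TID]  # (index key, heap TID it points to)


def bulk_delete(index_entries: List[IndexEntry],
                dead_tids: Set[TID]) -> Tuple[List[IndexEntry], int]:
    """Scan the index once and drop every entry pointing at a dead TID.

    Only the set of TIDs is needed; no index key data is kept or refetched.
    Returns the surviving entries and the number of entries removed.
    """
    kept = [entry for entry in index_entries if entry[1] not in dead_tids]
    return kept, len(index_entries) - len(kept)


def lazy_vacuum_index(index_entries: List[IndexEntry],
                      dead_tids: Set[TID],
                      min_dead: int = 2) -> Tuple[List[IndexEntry], int]:
    """Skip the pass when too few dead tuples exist to be worth the scan.

    min_dead is a hypothetical tuning knob, not something from the message.
    """
    if len(dead_tids) < min_dead:
        return index_entries, 0
    return bulk_delete(index_entries, dead_tids)


if __name__ == "__main__":
    index = [("a", (0, 1)), ("b", (0, 2)), ("c", (1, 1))]
    print(lazy_vacuum_index(index, {(0, 2), (1, 1)}))
    print(lazy_vacuum_index(index, {(0, 2)}))
```

A key-driven delete would instead look up each dead tuple by its key, which is cheaper when only a few tuples are dead but requires holding the key data for each target tuple.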