Thread: Non-blocking vacuum full
Hello, I was wondering whether any thought has previously been given to having a non-blocking "vacuum full", in the sense of space reclamation and table compactation. The motivation is that it is useful to be able to assume that operations that span a table will *roughtly* scale linearly with the size of the table. But when you have a table that over an extended period of time begins small, grows large, and grows small again (where "large" might be, say, 200 GB), that assumption is most definitely not correct when you're on the downward slope of that graph. Having this assumption remain true simplifies things a lot for certain workloads (= my particular work load ;)). I have only looked very very briefly at the PG code so I don't know how far fetched it is, but my thought was that it should be possible to have a slow background process (similar to normal non-full vacuums nows) that would, instead of registering dead tuples in the FSM, move live tuples around. Combine that slow moving operations with a policy to a new tuple space allocation policy that prefers earlier locations on-disk, it should in time result in a situation where the physical on-disk file contains only dead tuples after a certain percentage location. At this point the file can be truncated, giving space back to the OS as well as eliminating all that dead space from having to be covered by sequential scans on the table. This does of course increase the total cost of all updates and deletes, but would be very useful in some senarios. It also has the interesting property that the scan for live tuples to move need not touch the entire table to be effective; it could by design be applied to the last <n> percentage of the table, where <n> would be scaled appropriately with the frequency of the checks relative to update/insert frequency. Other benefits: * Never vacuum full - EVER. Not even after discovering too small max_fsm_pages or too infrequent vacuums and needing to retroactively shrink the table. * Increased locality in general; even if one does not care about the diskspace or sequential scanning. Particularly relevant for low-update frequency tables suffering from sudden shrinkage, where a blocking VACUUM FULL Is not acceptable. * Non-blocking CLUSTER is perhaps suddently more trivial to implement? Or at least SORTOFCLUSTER when you want it for reasons other than perfect order ("mostly sorted"). Opinions/thoughts? -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org
Attachment
Peter Schuller wrote: > I have only looked very very briefly at the PG code so I don't know > how far fetched it is, but my thought was that it should be possible > to have a slow background process (similar to normal non-full vacuums > nows) that would, instead of registering dead tuples in the FSM, move > live tuples around. What you've described is actually very close to VACUUM FULL. VACUUM FULL needs to take an exclusive lock to lock out concurrent scanners that might miss or see a tuple twice, when a live tuple is moved. That's the fundamental problem you need to solve. I think it's doable, if you take a copy of the tuple, and set the ctid pointer on the old one like an UPDATE, and wait until the old tuple is no longer visible to anyone before removing it. It does require some changes to tuple visibility code. For example, a transaction running in serializable mode shouldn't throw a serialization error when it tries to update an old, moved row version, but follow the ctid pointer instead. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Peter Schuller wrote: >> to have a slow background process (similar to normal non-full vacuums > ... > I think it's doable, if you take a copy of the tuple, and set the ctid > pointer on the old one like an UPDATE, and wait until the old tuple is > no longer visible to anyone before removing it. It does require some > changes to tuple visibility code. Wouldn't just having this slow background process repeatedly alternating between update table set anycol=anycol where ctid > [some ctid near the end] and running normal VACUUM statements do what the original poster was asking? And with 8.3, I guess also avoiding HOT?
Ron Mayer wrote: > Heikki Linnakangas wrote: >> Peter Schuller wrote: >>> to have a slow background process (similar to normal non-full vacuums >> ... >> I think it's doable, if you take a copy of the tuple, and set the ctid >> pointer on the old one like an UPDATE, and wait until the old tuple is >> no longer visible to anyone before removing it. It does require some >> changes to tuple visibility code. > > Wouldn't just having this slow background process > repeatedly alternating between > update table set anycol=anycol where ctid > [some ctid near the end] > and running normal VACUUM statements do what the original poster > was asking? Almost. Updaters would block waiting for the UPDATE, and updaters in serializable mode would throw serialization errors. And the "WHERE ctid > ?" would actually result in a seq scan scanning the whole table, since our tid scans don't support inequality searches. > And with 8.3, I guess also avoiding HOT? HOT shouldn't cause any complications here AFAICS. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com