Re: Single pass vacuum - take 1 - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | Re: Single pass vacuum - take 1 |
Date | |
Msg-id | CABOikdNRfZzkN74CvWLpedp1SrJdsuLki=30rm+mpqyNYUJXjw@mail.gmail.com Whole thread Raw |
In response to | Re: Single pass vacuum - take 1 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: Single pass vacuum - take 1
|
List | pgsql-hackers |
On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 14.07.2011 18:57, Pavan Deolasee wrote:On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs<simon@2ndquadrant.com> wrote:I'd say that seems way too complex for such a small use case and we've
only just fixed the bugs from 8.4 vacuum map complexity. The code's
looking very robust now and I'm uneasy that such changes are really
worth it.Thanks Simon for looking at the patch.
I am not sure if the use case is really narrow. Today, we dirty the pages in
both the passes and also emit WAL records. Just the heap scan can take a
very long time for large tables, blocking the autovacuum worker threads from
doing useful work on other tables. If I am not wrong, we use ring buffers
for vacuum which would most-likely force those buffers to be written/read
twice to the disk.
Seems worthwhile to me. What bothers me a bit is the need for the new 64-bit LSN value on each heap page. Also, note that temporary tables are not WAL-logged, so there's no LSNs.
Yeah, the good thing is we store it only when its needed. Temp and unlogged tables don't need any special handling because we don't rely on the WAL logging for the table itself. As you have noted down the thread, any counter which is guaranteed to not wrap-around would have worked. LSN is just very handy for the purpose (let me think more if we can just do with a flag).
How does this interact with the visibility map? If you set the visibility map bit after vacuuming indexes, a subsequent vacuum will not visit the page. The second vacuum will update relindxvacxlogid/off, but it will not clean up the dead line pointers left behind by the first vacuum. Now the LSN on the page differs from the one stored in pg_class, so subsequent pruning will not remove the dead line pointers either. I think you can sidestep that if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead of equality.
Hmm. We need to carefully see that. As the patch stands, we don't set the visibility map bit when there are any dead line pointers on the page. I'm not sure if its clear, but the ItemIdIsDead() test accounts for dead as well as dead-vacuum line pointers, so the test in lazy_heap_scan() won't let VM bit to be set early. The next vacuum cycle would still look at the page and set the bit if the page appears all-visible at that time. Note that in the next vacuum cycle, we would first clear the dead-vacuum line pointers if its not already done by some intermediate HOT-prune operation.
Ignoring the issue stated in previous paragraph, I think you wouldn't actually need an 64-bit LSN. A smaller counter is enough, as wrap-around doesn't matter. In fact, a single bit would be enough. After a successful vacuum, the counter on each heap page (with dead line pointers) is N, and the value in pg_class is N. There are no other values on the heap, because vacuum will have cleaned them up. When you begin the next vacuum, it will stamp pages with N+1. So at any stage, there is only one of two values on any page, so a single bit is enough. (But as I said, that doesn't hold if vacuum skips some pages thanks to the visibility map)
I am not sure if that can take care of aborted vacuum case with a single bit or a counter that can wrap-around. Let me think more about it.
Is there something in place to make sure that pruning uses an up-to-date relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date, you'll just miss the opportunity to remove some dead tuples.
Yeah, exactly. We just rely on the value that was read when the pg_class tuple was last read. If we don't have the up-to-date value, we might miss some opportunity to clean up those dead-vauum line pointers.
Seems odd to store relindxvacxlogid/off as two int32 columns. Store it in one uint64 column, or invent a new datatype for LSNs, or store it as text in %X/%X format.
Thanks,
Pavan
pgsql-hackers by date: