Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CANWCAZbJ5f66O+tE3U4pbgphmqZoUiPSSV-=21ZRsEu2Od+mZw@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Fri, Jan 19, 2024 at 2:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Jan 18, 2024 at 1:30 PM John Naylor <johncnaylorls@gmail.com> wrote: > > I'm not quite sure what the point of "num_items" is anymore, because > > it was really tied to the array in VacDeadItems. dead_items->num_items > > is essential to reading/writing the array correctly. If this number is > > wrong, the array is corrupt. There is no such requirement for the > > radix tree. We don't need to know the number of tids to add to it or > > do a lookup, or anything. > > True. Sorry I wanted to say "num_tids" of TidStore. I'm still thinking > we need to have the number of TIDs in a tidstore, especially in the > tidstore's control object. Hmm, it would be kind of sad to require explicit locking in tidstore.c is only for maintaining that one number at all times. Aside from the two ereports after an index scan / second heap pass, the only non-assert place where it's used is @@ -1258,7 +1265,7 @@ lazy_scan_heap(LVRelState *vacrel) * Do index vacuuming (call each index's ambulkdelete routine), then do * related heap vacuuming */ - if (dead_items->num_items > 0) + if (TidStoreNumTids(dead_items) > 0) lazy_vacuum(vacrel); ...and that condition can be checked by doing a single step of iteration to see if it shows anything. But for the ereport, my idea for iteration + popcount is probably quite slow. > IIUC lpdead_items is the total number of LP_DEAD items vacuumed during > the whole lazy vacuum operation whereas num_items is the number of > LP_DEAD items vacuumed within one index vacuum and heap vacuum cycle. > That is, after heap vacuum, the latter counter is reset while the > former counter is not. > > The latter counter is used in lazyvacuum.c as well as the ereport in > vac_bulkdel_one_index(). Ah, of course. > Putting a copy of the key in BlocktableEntry's header is an > interesting idea. But the current debug code in the tidstore also > makes sure that the tidstore returns TIDs in the correct order during > an iterate operation. I think it still has a value and you can disable > it by removing the "#define TIDSTORE_DEBUG" line. Fair enough. I just thought it'd be less work to leave this out in case we change how locking is called. > > This week I tried an idea to use a callback there so that after > > internal unlocking, the caller received the value (or whatever else > > needs to happen, such as lookup an offset in the tid bitmap). I've > > attached a draft for that that passes radix tree tests. It's a bit > > awkward, but I'm guessing this would more closely match future > > internal atomic locking. Let me know what you think of the concept, > > and then do whichever way you think is best. (using v53 as the basis) > > Thank you for verifying this idea! Interesting. While it's promising > in terms of future atomic locking, I'm concerned it might not be easy > to use if radix tree APIs supports only such callback style. Yeah, it's quite awkward. It could be helped by only exposing it for varlen types. For simply returning "present or not" (used a lot in the regression tests), we could skip the callback if the data is null. That is all also extra stuff. > I believe > the caller would like to pass one more data along with val_data. For That's trivial, however, if I understand you correctly. With "void *", a callback can receive anything, including a struct containing additional pointers to elsewhere. > example, considering tidstore that has num_tids internally, it wants > to pass both a pointer to BlocktableEntry and a pointer to TidStore > itself so that it increments the counter while holding a lock. Hmm, so a callback to RT_SET also. That's interesting! Anyway, I agree it needs to be simple, since the first use doesn't even have multiple writers. > BTW in radixtree.h pg_attribute_unused() is used for some functions, > but is it for debugging purposes? I don't see why it's used only for > some functions. It was there to silence warnings about unused functions. I only see one remaining, and it's already behind a debug symbol, so we might not need this attribute anymore.
pgsql-hackers by date: