Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoCcSZSj7O0ObwYXwKBfcs3W+Nd+fOLq2EmRFoOZrkdvbw@mail.gmail.com |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum |
List | pgsql-hackers |
On Sat, Oct 28, 2023 at 5:56 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> I wrote:
>
> > Seems fine at a glance, thanks. I will build on this to implement variable-length values. I have already finished one prerequisite which is: public APIs passing pointers to values.
>
> Since my publishing schedule has not kept up, I'm just going to share
> something similar to what I mentioned earlier, just to get things
> moving again.

Thanks for sharing the updates. I've returned to work today and will
resume working on this feature.

>
> 0001-0009 are from earlier versions, except for 0007 which makes a
> bunch of superficial naming updates, similar to those done in a recent
> other version. Somewhere along the way I fixed long-standing git
> whitespace warnings, but I don't remember if that's new here. In any
> case, let's try to preserve that.
>
> 0010 is some minor refactoring to reduce duplication
>
> 0011-0014 add public functions that give the caller more control over
> the input and responsibility for locking. They are not named well, but
> I plan these to be temporary: They are currently used for the tidstore
> only, since that has much simpler tests than the standard radix tree
> tests. One thing to note: since the tidstore has always done its own
> locking within a larger structure, these patches don't bother to do
> locking at the radix tree level. Locking twice seems...not great.
> These patches are the main prerequisite for variable-length values.
> Once that is working well, we can switch the standard tests to the new
> APIs.

Since variable-length value support is a big deal and is tied to the
API design, I'd like to discuss the API design first. Currently, we
have the following APIs:

---
RT_VALUE_TYPE
RT_GET(RT_RADIX_TREE *tree, uint64 key, bool *found);

or for variable-length value support,

RT_GET(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);

If an entry already exists, return its pointer and set "found" to
true. Otherwise, insert an empty value with sz bytes, return its
pointer, and set "found" to false.

---
RT_VALUE_TYPE
RT_FIND(RT_RADIX_TREE *tree, uint64 key);

If an entry exists, return the pointer to the value, otherwise return
NULL.

(I omitted RT_SEARCH() as it's essentially the same as RT_FIND() and
will probably get removed.)

---
bool
RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p);

or for variable-length value support,

RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p, size_t sz);

If an entry already exists, update its value to 'value_p' and return
true. Otherwise set the value and return false.

With variable-length values, RT_GET() would have to do repalloc() when
the existing value is not big enough for the new value, but it cannot,
because the radix tree doesn't know the size of each stored value.
Another idea is that the radix tree returns the pointer to the slot
and the caller updates the value accordingly. But that means the
caller has to update the slot properly while considering how the value
is stored (embedded vs. single-value leaf), which does not seem like a
good idea.

To deal with this problem, I think we can change the RT_GET() API
somewhat, as follows:

RT_VALUE_TYPE
RT_INSERT(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);

If the entry already exists, replace the value with a new empty value
with sz bytes and set "found" to true. Otherwise, insert an empty
value, return its pointer, and set "found" to false.

We will probably find a better name, but I use RT_INSERT() for
discussion. RT_INSERT() returns an empty slot regardless of existing
values. It can be used to insert a new value or to replace the value
with a larger value.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
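To make the RT_INSERT() semantics proposed above concrete, here is a minimal sketch in C. It is not taken from the patch set: toy_tree, toy_insert(), toy_find() and toy_insert_demo() are hypothetical names, and a small fixed array with calloc() stands in for the radix tree and its allocator. The only point it illustrates is that insert always hands back a fresh, empty value of sz bytes, so the container never needs to know the size of the value it replaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the radix tree: a small fixed array of entries. */
#define TOY_MAX_ENTRIES 16

typedef struct toy_entry
{
	bool		used;
	uint64_t	key;
	void	   *value;		/* zero-filled block handed out by toy_insert() */
} toy_entry;

typedef struct toy_tree
{
	toy_entry	entries[TOY_MAX_ENTRIES];
} toy_tree;

/*
 * Hypothetical analogue of the proposed RT_INSERT(): always return a fresh,
 * zero-filled value of sz bytes for "key". An existing value is discarded
 * rather than resized, so its old size is never needed. Returns NULL if the
 * toy array is full or the allocation fails.
 */
void *
toy_insert(toy_tree *tree, uint64_t key, size_t sz, bool *found)
{
	toy_entry  *slot = NULL;
	void	   *fresh;

	*found = false;
	for (int i = 0; i < TOY_MAX_ENTRIES; i++)
	{
		toy_entry  *e = &tree->entries[i];

		if (e->used && e->key == key)
		{
			slot = e;
			*found = true;
			break;
		}
		if (!e->used && slot == NULL)
			slot = e;			/* remember the first free entry */
	}
	if (slot == NULL)
		return NULL;			/* no matching entry and no free entry */

	fresh = calloc(1, sz > 0 ? sz : 1);
	if (fresh == NULL)
		return NULL;			/* any existing value is left untouched */

	if (*found)
		free(slot->value);		/* drop the old value, whatever its size */
	slot->used = true;
	slot->key = key;
	slot->value = fresh;
	return fresh;
}

/* Analogue of RT_FIND(): return the stored value, or NULL if absent. */
void *
toy_find(toy_tree *tree, uint64_t key)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++)
	{
		if (tree->entries[i].used && tree->entries[i].key == key)
			return tree->entries[i].value;
	}
	return NULL;
}

/* Usage: insert a small value, then replace it with a larger one. */
int
toy_insert_demo(void)
{
	toy_tree	tree = {0};
	bool		found;
	char	   *v;

	v = toy_insert(&tree, 42, 4, &found);
	assert(v != NULL && !found);
	memcpy(v, "abc", 4);

	/* Same key, larger size: the caller gets an empty slot to refill. */
	v = toy_insert(&tree, 42, 16, &found);
	assert(v != NULL && found);
	assert(v[0] == '\0');		/* the old contents are gone */
	memcpy(v, "a longer value", 15);

	assert(toy_find(&tree, 42) == (void *) v);
	free(v);
	return 0;
}
```

In this sketch the caller is responsible for writing the full new contents after every call, which mirrors the statement that RT_INSERT() "returns an empty slot regardless of existing values". A get-or-create call like RT_GET() would instead have to preserve the old bytes, and that is where the missing size information becomes a problem.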