Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:
Msg-id: CAD21AoCe1Ch+y1_nDzTXF5w6VT=nnP33QXM5V9z-PE3-x0f_xw@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Mon, Jan 9, 2023 at 5:59 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> > [working on templating]
>
> In the end, I decided to base my effort on v8, and not v12 (based on one of my less-well-thought-out ideas). The latter was a good experiment, but it did not lead to an increase in readability as I had hoped. The attached v17 is still rough, but it's in good enough shape to evaluate a mostly-complete templating implementation.

I really appreciate your work!

> v13-0007 had some changes to the regression tests, but I haven't included those. The tests from v13-0003 do pass, both locally and shared. I quickly hacked together changing shared/local tests by hand (need to recompile), but it would be good for maintainability if tests could run once each with local and shmem, but use the same "expected" test output.

Agreed.

> Also, I didn't look to see if there were any changes in v14/15 that didn't have to do with precise memory accounting.
>
> At this point, Masahiko, I'd appreciate your feedback on whether this is an improvement at all (or at least a good base for improvement), especially for integrating with the TID store. I think there are some advantages to the template approach. One possible disadvantage is needing separate functions for each of local and shared memory.
>
> If we go this route, I do think the TID store should invoke the template as static functions. I'm not quite comfortable with a global function that may not fit well with future use cases.

It seems there is no problem in terms of vacuum integration, although I've not fully tested it yet. The TID store uses the radix tree as its main storage, and with the templated radix tree the data types for the shared and non-shared cases will differ. The TID store can hold a union of the two radix tree types, and the structure would look like the following:

/* Per-backend state for a TidStore */
struct TidStore
{
    /*
     * Control object. This is allocated in DSA area 'area' in the shared
     * case, otherwise in backend-local memory.
     */
    TidStoreControl *control;

    /* Storage for Tids */
    union
    {
        local_radix_tree  *local;
        shared_radix_tree *shared;
    } tree;

    /* DSA area for TidStore if used */
    dsa_area *area;
};

In the TID store functions, we then need to call either the local or the shared radix tree functions depending on whether the TID store is shared. That requires an if-branch for each key-value pair insertion, but I think it would not be a big performance problem in TID store use cases, since vacuum is an I/O-intensive operation in many cases. Overall, I think there is no problem, and I'll investigate it in depth.

Apart from that, I've been considering lock support for the shared radix tree. As we discussed before, the current usage (i.e., only parallel index vacuum) doesn't require locking support at all, so it would be enough to have a single lock for simplicity. If we want to use the shared radix tree for other use cases, such as parallel heap vacuum or replacing the hash table for shared buffers, we would need better lock support. For example, if we want to support Optimistic Lock Coupling[1], we would need to change not only the node structure but also the logic, which would probably widen the gap between the code for the non-shared and shared radix trees. In that case, once we have a better radix tree optimized for the shared case, perhaps we can replace the templated shared radix tree with it. I'd like to hear your opinion on this.

> One review point I'll mention: Somehow I didn't notice there is no use for the "chunk" field in the rt_node type -- it's only set to zero and copied when growing. What is the purpose? Removing it would allow the smallest node to take up only 32 bytes with a fanout of 3, by eliminating padding.

Oh, I didn't notice that. The chunk field was originally used when redirecting the child pointer in the parent node from the old to the new (grown) node. When redirecting the pointer, since the corresponding chunk surely exists in the parent, we can skip existence checks.
Currently we use RT_NODE_UPDATE_INNER() for that redirection (see RT_REPLACE_NODE()), but having a dedicated function to update the existing chunk and child pointer might improve performance. Alternatively, reducing the node size by getting rid of the chunk field might be better.

> Also, v17-0005 has an optimization/simplification for growing into node125 (my version needs an assertion or fallback, but works well now), found by another reading of Andres' prototype. There is a lot of good engineering there, we should try to preserve it.

Agreed.

Regards,

[1] https://db.in.tum.de/~leis/papers/artsync.pdf

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com