Re: Improve eviction algorithm in ReorderBuffer - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Improve eviction algorithm in ReorderBuffer |
Date | |
Msg-id | CAD21AoAtf12e9Z9NLBuaO1GjHMMo16_8R-yBu9Q9jrk2QLqMEA@mail.gmail.com Whole thread Raw |
In response to | Re: Improve eviction algorithm in ReorderBuffer (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Improve eviction algorithm in ReorderBuffer
|
List | pgsql-hackers |
On Sat, Apr 6, 2024 at 5:44 AM Jeff Davis <pgsql@j-davis.com> wrote: > > > > > It sounds like a data structure that mixes the hash table and the > > binary heap and we use it as the main storage (e.g. for > > ReorderBufferTXN) instead of using the binary heap as the secondary > > data structure. IIUC with your idea, the indexed binary heap has a > > hash table to store elements each of which has its index within the > > heap node array. I guess it's better to create it as a new data > > structure rather than extending the existing binaryheap, since APIs > > could be very different. I might be missing something, though. > > You are right that this approach starts to feel like a new data > structure and is not v17 material. > > I am interested in this for v18 though -- we could make the API more > like simplehash to be more flexible when using values (rather than > pointers) and to be able to inline the comparator. Interesting project. It would be great if we could support increasing and decreasing the key as APIs. The current binaryheap_update_{up|down} APIs are not very user-friendly. > > > > * remove the hash table from binaryheap.c > > > > > > * supply a new callback to the binary heap with type like: > > > > > > typedef void (*binaryheap_update_index)( > > > bh_node_type node, > > > int new_element_index); > > > > > > * make the remove, update_up, and update_down methods take the > > > element > > > index rather than the pointer > > ... > > > This shows that the current indexed binaryheap is much slower than > > the > > other two implementations, and the xx_binaryheap has a good number in > > spite of also being indexed. > > xx_binaryheap isn't indexed though, right? Well, yes. To be xact, xx_binaryheap isn't indexed but the element indexes are stored in the element itself (see test_elem struct) so the caller still can update the key using xx_binaryheap_update_{up|down}. > > > When it comes to > > implementing the above idea (i.e. changing binaryheap to > > xx_binaryheap), it was simple since we just replace the code where we > > update the hash table with the code where we call the callback, if we > > get the consensus on API change. > > That seems reasonable to me. > > > The fact that we use simplehash for the internal hash table might > > make > > this idea complex. If I understand your suggestion correctly, the > > caller needs to tell the hash table the hash function when creating a > > binaryheap but the hash function needs to be specified at a compile > > time. We can use a dynahash instead but it would make the binaryheap > > slow further. > > simplehash.h supports private_data, which makes it easier to track a > callback. > > In binaryheap.c, that would look something like: > > static inline uint32 > binaryheap_hash(bh_nodeidx_hash *tab, uint32 key) > { > binaryheap_hashfunc hashfunc = tab->private_data; > return hashfunc(key); > } > > ... > #define SH_HASH_KEY(tb, key) binaryheap_hash(tb, key) > ... > > binaryheap_allocate(int num_nodes, binaryheap_comparator compare, > void *arg, binaryheap_hashfunc hashfunc) > { > ... > if (hashfunc != NULL) > { > /* could have a new structure, but we only need to > * store one pointer, so don't bother with palloc/pfree */ > void *private_data = (void *)hashfunc; > heap->bh_nodeidx = bh_nodeidx_create(..., private_data); > ... > > > And in reorderbuffer.c, define the callback like: > > static uint32 > reorderbuffer_xid_hash(TransactionId xid) > { > /* fasthash32 is 'static inline' so may > * be faster than hash_bytes()? */ > return fasthash32(&xid, sizeof(TransactionId), 0); > } > Thanks, that's a good idea. > > > In summary, there are two viable approaches for addressing the concerns > in v17: > > 1. Provide callback to update ReorderBufferTXN->heap_element_index, and > use that index (rather than the pointer) for updating the heap key > (transaction size) or removing elements from the heap. > > 2. Provide callback for hashing, so that binaryheap.c can hash the xid > value rather than the pointer. > > I don't have a strong opinion about which one to use. I prefer > something closer to #1 for v18, but for v17 I suggest whichever one > comes out simpler. I've implemented prototypes of both ideas, and attached the draft patches. I agree with you that something closer to #1 is for v18. Probably we can implement the #1 idea while making binaryheap codes template like simplehash.h. For v17, changes for #2 are smaller, but I'm concerned that the new API that requires a hash function to be able to use binaryheap_update_{up|down} might not be user friendly. In terms of APIs, I prefer #1 idea. And changes for #1 can make the binaryheap code simple, although it requires adding a variable in ReorderBufferTXN instead. But overall, it can remove the hash table and some functions so it looks better to me. When it comes to performance overhead, I mentioned that there is some regression in the current binaryheap even without indexing. Since function calling contributed to the regression, inlining some functions reduced some overheads. For example, inlining set_node() and replace_node(), the same benchmark test I used showed: postgres(1:88476)=# select * from generate_series(1,3) x(x), lateral (select * from bench_load(false, 10000000 * (1+x-x))); x | cnt | load_ms | xx_load_ms | old_load_ms ---+----------+---------+------------+------------- 1 | 10000000 | 502 | 624 | 427 2 | 10000000 | 503 | 622 | 428 3 | 10000000 | 502 | 621 | 427 (3 rows) Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: