Re: Improve eviction algorithm in ReorderBuffer - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Improve eviction algorithm in ReorderBuffer
Date:
Msg-id: CAD21AoAtR2A4f1-JFUE9KRkiPqMRvxS+3W=ZR7FpuXm8VSVASQ@mail.gmail.com
In response to: Re: Improve eviction algorithm in ReorderBuffer (Jeff Davis <pgsql@j-davis.com>)
Responses:
  Re: Improve eviction algorithm in ReorderBuffer
  Re: Improve eviction algorithm in ReorderBuffer
List: pgsql-hackers
On Fri, Apr 5, 2024 at 2:55 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2024-04-04 at 17:28 +0900, Masahiko Sawada wrote:
> > > Perhaps it's not worth the effort though, if performance is already
> > > good enough?
> >
> > Yeah, it would be better to measure the overhead first. I'll do that.
>
> I have some further comments and I believe changes are required for
> v17.
>
> An indexed binary heap API requires both a comparator and a hash
> function to be specified, and has two different kinds of keys: the heap
> key (mutable) and the hash key (immutable). It provides heap methods
> and hashtable methods, and keeps the two internal structures (heap and
> HT) in sync.

IIUC, in ReorderBuffer for example, the heap key is the transaction
size and the hash key is the xid.

> The implementation in b840508644 uses the bh_node_type as the hash key,
> which is just a Datum, and it just hashes the bytes. I believe the
> implicit assumption is that the Datum is a pointer -- I'm not sure how
> one would use that API if the Datum were a value. Hashing a pointer
> seems strange to me and, while I see why you did it that way, I think
> it reflects that the API boundaries are not quite right.

I see your point. It assumes that bh_node_type is a pointer, or at
least unique. So it cannot work with a Datum that is a value.

> One consequence of using the pointer as the hash key is that you need
> to find the pointer first: you can't change or remove elements based on
> the transaction ID, you have to get the ReorderBufferTXN pointer by
> finding it in another structure, first. Currently, that's being done by
> searching ReorderBuffer->by_txn. So we actually have two hash tables
> for essentially the same purpose: one with xid as the key, and the
> other with the pointer as the key. That makes no sense -- let's have a
> proper indexed binary heap to look things up by xid (the internal HT)
> or by transaction size (using the internal heap).
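To restate the two-hash-table situation, here is a deliberately
simplified sketch (the names and structures are illustrative stand-ins,
not the real reorderbuffer.c code; flat arrays stand in for the hash
tables):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; the real types live in reorderbuffer.c. */
typedef uint32_t TransactionId;

typedef struct MockTXN
{
    TransactionId xid;          /* hash key: immutable */
    size_t        size;         /* heap key: mutable */
} MockTXN;

#define MAX_TXNS 8

/* Stand-in for ReorderBuffer->by_txn: xid -> txn pointer. */
static MockTXN *by_txn[MAX_TXNS];

/* Stand-in for the heap's internal table: txn pointer -> heap index. */
static MockTXN *heap_nodes[MAX_TXNS];
static int      heap_count = 0;

static MockTXN txn = {.xid = 7, .size = 1024};

static void
setup(void)
{
    by_txn[txn.xid % MAX_TXNS] = &txn;  /* table 1, keyed by xid */
    heap_nodes[heap_count++] = &txn;    /* heap node array */
}

/*
 * Adjusting a transaction's heap position by xid takes two lookups:
 * xid -> pointer (by_txn), then pointer -> heap index (internal HT).
 */
static int
heap_index_by_xid(TransactionId xid)
{
    MockTXN    *ptr = by_txn[xid % MAX_TXNS];   /* lookup 1 */

    for (int i = 0; i < heap_count; i++)        /* lookup 2 */
        if (heap_nodes[i] == ptr)
            return i;
    return -1;
}
```

With a proper indexed binary heap keyed by xid, the first structure
would disappear and a single lookup would suffice.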
> I suggest:
>
> * Make a proper indexed binary heap API that accepts a hash function
> and provides both heap methods and HT methods that operate based on
> values (transaction size and transaction ID, respectively).
> * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> instead.
>
> This will be a net simplification in reorderbuffer.c, which is good,
> because that file makes use of a *lot* of data structures.

It sounds like a data structure that combines the hash table and the
binary heap, used as the main storage (e.g. for ReorderBufferTXN)
instead of using the binary heap as a secondary data structure. IIUC
with your idea, the indexed binary heap has a hash table to store
elements, each of which carries its index within the heap's node array.
I guess it's better to create it as a new data structure rather than
extending the existing binaryheap, since the APIs could be very
different. I might be missing something, though.

On Fri, Apr 5, 2024 at 3:55 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2024-04-04 at 10:55 -0700, Jeff Davis wrote:
> > * Make a proper indexed binary heap API that accepts a hash function
> > and provides both heap methods and HT methods that operate based on
> > values (transaction size and transaction ID, respectively).
> > * Get rid of ReorderBuffer->by_txn and use the indexed binary heap
> > instead.
>
> An alternative idea:
>
> * remove the hash table from binaryheap.c
>
> * supply a new callback to the binary heap with a type like:
>
>     typedef void (*binaryheap_update_index)(
>         bh_node_type node,
>         int new_element_index);
>
> * make the remove, update_up, and update_down methods take the element
> index rather than the pointer
>
> reorderbuffer.c would then do something like:
>
>     void
>     txn_update_heap_index(ReorderBufferTXN *txn, int new_element_index)
>     {
>         txn->heap_element_index = new_element_index;
>     }
>
>     ...
>     txn_heap = binaryheap_allocate(..., txn_update_heap_index, ...);
>
> and then binaryheap.c would effectively maintain
> txn->heap_element_index, so reorderbuffer.c can pass that to the APIs
> that require the element index.

Thank you for the idea. I was thinking of the same idea when
considering your previous comment. With this idea, we still use the
binaryheap for ReorderBuffer as the secondary data structure. Since we
can implement this idea with relatively small changes to the current
binaryheap, I've implemented it and measured its performance.

I've attached a patch that adds an extension for benchmarking
binaryheap implementations. binaryheap_bench.c is the main test module.
To allow comparisons between different binaryheap implementations, the
extension includes two additional implementations. Therefore,
binaryheap_bench.c uses three different binaryheap implementations in
total, as the comment at the top of the file says:

/*
 * This benchmark tool uses three binary heap implementations.
 *
 * "binaryheap" is the current binaryheap implementation in PostgreSQL.
 * That is, it internally has a hash table to track each node's index
 * within the node array.
 *
 * "xx_binaryheap" is based on "binaryheap" but removes the hash table.
 * Instead, each element carries its own index within the node array.
 * The element's index is updated by the callback function,
 * xx_binaryheap_update_index_fn, specified at xx_binaryheap_allocate()
 * time.
 *
 * "old_binaryheap" is the binaryheap implementation from before the
 * "indexed" binary heap changes were made. It neither has an internal
 * hash table nor tracks nodes' indexes.
 */

That is, xx_binaryheap is the binaryheap implementation suggested
above. The bench_load() function measures the time for adding elements
(i.e. using binaryheap_add() and similar).
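For illustration, a minimal, self-contained sketch of the xx_binaryheap
idea (the names here are hypothetical, not the patch's actual code):
the heap reports every position change through a caller-supplied
callback, so each element always knows its own index and no internal
hash table is needed.

```c
#include <assert.h>

/* Illustrative sketch of the callback-maintained-index idea. */
typedef struct Elem
{
    int         size;               /* heap key */
    int         heap_element_index; /* maintained via the callback */
} Elem;

typedef void (*update_index_fn) (Elem *node, int new_element_index);

typedef struct Heap
{
    Elem       *nodes[16];
    int         count;
    update_index_fn update_index;
} Heap;

static void
set_node(Heap *heap, int i, Elem *e)
{
    heap->nodes[i] = e;
    heap->update_index(e, i);       /* tell the element where it lives */
}

/* Min-heap on size; sift the node at index i up toward the root. */
static void
sift_up(Heap *heap, int i)
{
    while (i > 0)
    {
        int         parent = (i - 1) / 2;
        Elem       *tmp;

        if (heap->nodes[parent]->size <= heap->nodes[i]->size)
            break;
        tmp = heap->nodes[parent];
        set_node(heap, parent, heap->nodes[i]);
        set_node(heap, i, tmp);
        i = parent;
    }
}

static void
heap_add(Heap *heap, Elem *e)
{
    set_node(heap, heap->count, e);
    heap->count++;
    sift_up(heap, heap->count - 1);
}

/* What the caller (e.g. reorderbuffer.c) would supply. */
static void
elem_update_index(Elem *e, int new_element_index)
{
    e->heap_element_index = new_element_index;
}
```

After any sequence of adds, each element's heap_element_index matches
its actual position in the node array, which is the invariant that lets
remove/update take an index instead of a pointer.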
Here are the results:

postgres(1:3882886)=# select * from generate_series(1,3) x(x),
  lateral (select * from bench_load(true, 10000000 * (1+x-x)));
 x |   cnt    | load_ms | xx_load_ms | old_load_ms
---+----------+---------+------------+-------------
 1 | 10000000 |    4372 |        582 |         429
 2 | 10000000 |    4371 |        582 |         429
 3 | 10000000 |    4373 |        582 |         429
(3 rows)

This shows that the current indexed binaryheap is much slower than the
other two implementations, and that xx_binaryheap posts a good number
despite also being indexed.

Here is another run that disables indexing in the current binaryheap:

postgres(1:3882886)=# select * from generate_series(1,3) x(x),
  lateral (select * from bench_load(false, 10000000 * (1+x-x)));
 x |   cnt    | load_ms | xx_load_ms | old_load_ms
---+----------+---------+------------+-------------
 1 | 10000000 |     697 |        579 |         430
 2 | 10000000 |     704 |        582 |         430
 3 | 10000000 |     698 |        581 |         429
(3 rows)

This shows that there is still a performance regression in the current
binaryheap even with indexing disabled. xx_binaryheap also has some
regression. I haven't investigated the root cause yet, though. Overall,
we can say there is large room to improve the current binaryheap's
performance, as you pointed out.

As for implementing the above idea (i.e. changing binaryheap to
xx_binaryheap), it was simple, since we just replace the code that
updates the hash table with a call to the callback -- provided we get
consensus on the API change.

> Another alternative is to keep the hash table in binaryheap.c, and
> supply a hash function that hashes the xid. That leaves us with two
> hash tables still, but it would be cleaner than hashing the pointer.
> That might be best for right now, and we can consider these other
> ideas later.

The fact that we use simplehash for the internal hash table might make
this idea complex.
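The distinction at issue, sketched in a self-contained way (this is an
illustration of the pattern only, not the actual simplehash.h or
dynahash code): a simplehash-style template fixes the hash function via
a macro when the template is instantiated, whereas a dynahash-style API
accepts it as a function pointer at runtime.

```c
#include <assert.h>
#include <stdint.h>

/*
 * simplehash-style: the hash function is a macro expanded at compile
 * time, so every call site can be inlined by the compiler.
 */
#define SH_HASH_KEY(key) ((uint32_t) ((key) * 2654435761u))

static inline uint32_t
hash_compile_time(uint32_t xid)
{
    return SH_HASH_KEY(xid);    /* direct, inlinable */
}

/*
 * dynahash-style: the hash function arrives at runtime as a pointer,
 * costing an indirect call on every lookup.
 */
typedef uint32_t (*hash_fn) (uint32_t key);

static uint32_t
hash_xid(uint32_t xid)
{
    return xid * 2654435761u;
}

static uint32_t
hash_runtime(hash_fn fn, uint32_t xid)
{
    return fn(xid);             /* indirect, harder to inline */
}
```

Both compute the same value; the difference is purely when the function
is bound, which is why a caller-supplied hash function does not fit a
simplehash-backed binaryheap without extra indirection.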
If I understand your suggestion correctly, the caller needs to give the
hash table a hash function when creating a binaryheap, but with
simplehash the hash function must be specified at compile time. We
could use dynahash instead, but that would make the binaryheap slower
still.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com