Attached is PoC v2, which fully replaces the hash table with a radix
tree and passes regression tests.
The architecture is still that tidbitmap.c has its own tree of
(fixed-length) PagetableEntries. The eventual goal is for vacuum and
tidbitmap to both use TID store in some form, but there are some
architectural issues to work out first.
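To give a concrete picture up front, the instantiation in tidbitmap.c
looks roughly like this (the prefix and helper names below are
simplified for illustration and may not match the patch exactly):

/* Illustrative sketch, in tidbitmap.c: a local radix tree keyed by
 * block number, storing fixed-length PagetableEntry values. */
#define RT_PREFIX tbm_local
#define RT_SCOPE static
#define RT_DECLARE
#define RT_DEFINE
#define RT_USE_DELETE
#define RT_VALUE_TYPE PagetableEntry
#include "lib/radixtree.h"

/* lookup of a page's entry, keyed by block number */
static PagetableEntry *
tbm_find_pageentry_rt(tbm_local_radix_tree *tree, BlockNumber pageno)
{
    return tbm_local_find(tree, (uint64) pageno);
}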
0001 temporarily gets rid of the tidbitmap one-page cache so I could
always count on the hash table and radix tree having matching contents
during development, barring bugs. I'm curious how important that cache
really is, so this patch set doesn't put it back yet.
0002 temporarily turns off page lossification, for the same reason of
keeping contents matching. (It's still not reimplemented.)
0003 makes some radix tree APIs more similar to simplehash, in a
backward-compatible way.
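To give a flavor of what "similar to simplehash" means (this particular
helper is only an illustration, not necessarily something 0003 adds):
simplehash's insert hands back a pointer to the entry plus a "found"
flag, and an analogous find-or-insert convenience can be built on top
of the existing find/set calls:

/* Illustrative only: a find-or-insert in the spirit of simplehash's
 * SH_INSERT, built on the existing find/set calls.  Not necessarily
 * what 0003 actually adds. */
static PagetableEntry *
tbm_local_find_or_insert(tbm_local_radix_tree *tree, BlockNumber pageno,
                         bool *found)
{
    PagetableEntry *entry = tbm_local_find(tree, (uint64) pageno);

    *found = (entry != NULL);
    if (entry == NULL)
    {
        PagetableEntry newentry = {0};

        newentry.blockno = pageno;
        tbm_local_set(tree, (uint64) pageno, &newentry);
        entry = tbm_local_find(tree, (uint64) pageno);
    }
    return entry;
}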
0004 is a step towards putting allocation of tree values in the
caller's hands. This will be even more necessary when doing unions and
intersections with variable-length bitmaps (not yet implemented).
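To illustrate why (the function name and the final comment below are
hypothetical, describing a direction rather than 0004's API): with
variable-length bitmaps the caller would size and build the value
itself, and the tree would ideally adopt that allocation, or hand back
a slot to fill, instead of copying a fixed-length struct.

/* Hypothetical illustration of caller-side allocation for a
 * variable-length entry covering nwords bitmap words. */
static void
tbm_add_page_varlen(tbm_local_radix_tree *tree, BlockNumber pageno,
                    int nwords)
{
    Size        len = offsetof(PagetableEntry, words) +
        nwords * sizeof(bitmapword);
    PagetableEntry *entry = palloc0(len);

    entry->blockno = pageno;
    /* ... caller sets bits in entry->words ... */

    /* Today this copies sizeof(PagetableEntry) into the tree; with
     * allocation in the caller's hands, the tree could instead keep
     * the pointer (or let the caller fill a returned slot). */
    tbm_local_set(tree, (uint64) pageno, entry);
}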
0005 makes it easier to add new tree APIs that aren't used everywhere.
0006 adds RT_BEGIN_ITERATE_AT, similar to simplehash. This is a
prerequisite for 0007, and could also be used for shared iteration.
This implementation is naive and may slow down iteration, but that can
easily be fixed.
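Usage would look something like this (assuming the signature simply
mirrors RT_BEGIN_ITERATE plus a start key; names follow the
illustrative prefix above):

/* Walk the bitmap starting at start_blkno.  The _begin_iterate_at
 * name and signature are assumed, not exact. */
static void
tbm_walk_from(tbm_local_radix_tree *tree, BlockNumber start_blkno)
{
    uint64      key;
    PagetableEntry *entry;
    tbm_local_iter *iter;

    iter = tbm_local_begin_iterate_at(tree, (uint64) start_blkno);
    while ((entry = tbm_local_iterate_next(iter, &key)) != NULL)
    {
        /* process entries with block number >= start_blkno ... */
    }
    tbm_local_end_iterate(iter);
}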
On Thu, Jul 3, 2025 at 3:43 PM I wrote:
> Second issue: TBM intersection
> 1) Radix tree iteration assumes that no one will delete keys out from
> under it. That includes the caller itself! This is not sufficient for
> intersecting bitmaps. I kluged around it by saving
> blocks-to-be-deleted in a stack array, but a proper solution could be
> to (vigorous handwaving)
> 1) save per-leaf-node keys in the iterator
> 2) delete the keys when iteration leaves that node, possibly with a
> bulk-delete mechanism
> 3) restart iteration from the top, at the next part of the key space
0007 implements the above (but no bulk deletion). It's so far only
used for intersection, but will also be the basis for page
lossification. Because the latter requires inserts as well as deletes,
it could be tricky to do nicely. Compared to the hash table, retail
deletion is likely slower in general, but since deletions are batched per
leaf node it might be okay. 0006 and 0007 were tricky to get right,
and may contain bugs.
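For intersection, the control flow is roughly the following (a
simplified sketch using the illustrative names above, with a fixed-size
array standing in for "one leaf node's worth" of keys; the real patch
differs in detail): pages missing from the other bitmap are queued
while iterating, flushed once iteration stops, and iteration restarts
just past the last key processed.

/* Simplified sketch of intersecting 'a' with 'b' when deleting during
 * iteration isn't allowed: batch deletions and restart iteration past
 * the highest key processed so far.  Names and batching granularity
 * are illustrative. */
static void
tbm_rt_intersect(tbm_local_radix_tree *a, tbm_local_radix_tree *b)
{
    uint64      next_start = 0;
    bool        done = false;

    while (!done)
    {
        uint64      to_delete[64];  /* stand-in for one leaf's worth */
        int         ndelete = 0;
        uint64      key;
        PagetableEntry *entry;
        tbm_local_iter *iter;

        done = true;
        iter = tbm_local_begin_iterate_at(a, next_start);
        while ((entry = tbm_local_iterate_next(iter, &key)) != NULL)
        {
            PagetableEntry *other = tbm_local_find(b, key);

            if (other == NULL)
            {
                /* page is absent from b, so remove it from a */
                to_delete[ndelete++] = key;
            }
            else
            {
                /* ... AND the bitmap words, and queue the page for
                 * deletion if nothing is left ... */
            }

            if (ndelete == (int) lengthof(to_delete))
            {
                /* batch full: stop, flush deletions, restart past here */
                next_start = key + 1;
                done = false;
                break;
            }
        }
        tbm_local_end_iterate(iter);

        for (int i = 0; i < ndelete; i++)
            tbm_local_delete(a, to_delete[i]);
    }
}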
0008 is the main event. It's sloppy about memory management and
contains a kluge for the shared memory case: I was unable to quickly
adapt the existing relative pointer logic, so I gave up and
allocated a new DSA array, since it works and gets the point across.
There are new radix tree tests (separately, some of the existing ones
should be simplified to look more like the new bitmapset testing
module), but no new bitmap heap scan tests.
With some cleanups, restoring lossification, and maybe performance
work, the above could be the basis of a viable commit, since it has a
fairly small blast radius and gets rid of the qsort steps. However, I
believe it won't quite be enough on its own for a major release. For
that, we'd want
* variable-length values to save memory, which would add some complexity
to union and intersection steps
* probably direct iteration over the tree, rather than via reference
arrays, which use extra memory and are no longer as useful because we don't
need a separate sorting step
* integration with TID store in some form
* adjustments to costing?
--
John Naylor
Amazon Web Services