Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | 20220704211822.kfxtzpcdmslzm2dy@awork3.anarazel.de Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
Hi, On 2022-06-16 13:56:55 +0900, Masahiko Sawada wrote: > diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c > new file mode 100644 > index 0000000000..bf87f932fd > --- /dev/null > +++ b/src/backend/lib/radixtree.c > @@ -0,0 +1,1763 @@ > +/*------------------------------------------------------------------------- > + * > + * radixtree.c > + * Implementation for adaptive radix tree. > + * > + * This module employs the idea from the paper "The Adaptive Radix Tree: ARTful > + * Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, and Thomas > + * Neumann, 2013. > + * > + * There are some differences from the proposed implementation. For instance, > + * this radix tree module utilizes AVX2 instruction, enabling us to use 256-bit > + * width SIMD vector, whereas 128-bit width SIMD vector is used in the paper. > + * Also, there is no support for path compression and lazy path expansion. The > + * radix tree supports fixed length of the key so we don't expect the tree level > + * wouldn't be high. I think we're going to need path compression at some point, fwiw. I'd bet on it being beneficial even for the tid case. > + * The key is a 64-bit unsigned integer and the value is a Datum. I don't think it's a good idea to define the value type to be a datum. > +/* > + * As we descend a radix tree, we push the node to the stack. The stack is used > + * at deletion. > + */ > +typedef struct radix_tree_stack_data > +{ > + radix_tree_node *node; > + struct radix_tree_stack_data *parent; > +} radix_tree_stack_data; > +typedef radix_tree_stack_data *radix_tree_stack; I think it's a very bad idea for traversal to need allocations. I really want to eventually use this for shared structures (eventually with lock-free searches at least), and needing to do allocations while traversing the tree is a no-go for that. Particularly given that the tree currently has a fixed depth, can't you just allocate this on the stack once? > +/* > + * Allocate a new node with the given node kind. > + */ > +static radix_tree_node * > +radix_tree_alloc_node(radix_tree *tree, radix_tree_node_kind kind) > +{ > + radix_tree_node *newnode; > + > + newnode = (radix_tree_node *) MemoryContextAllocZero(tree->slabs[kind], > + radix_tree_node_info[kind].size); > + newnode->kind = kind; > + > + /* update the statistics */ > + tree->mem_used += GetMemoryChunkSpace(newnode); > + tree->cnt[kind]++; > + > + return newnode; > +} Why are you tracking the memory usage at this level of detail? It's *much* cheaper to track memory usage via the memory contexts? Since they're dedicated for the radix tree, that ought to be sufficient? > + else if (idx != n4->n.count) > + { > + /* > + * the key needs to be inserted in the middle of the > + * array, make space for the new key. > + */ > + memmove(&(n4->chunks[idx + 1]), &(n4->chunks[idx]), > + sizeof(uint8) * (n4->n.count - idx)); > + memmove(&(n4->slots[idx + 1]), &(n4->slots[idx]), > + sizeof(radix_tree_node *) * (n4->n.count - idx)); > + } Maybe we could add a static inline helper for these memmoves? Both because it's repetitive (for different node types) and because the last time I looked gcc was generating quite bad code for this. And having to put workarounds into multiple places is obviously worse than having to do it in one place. > +/* > + * Insert the key with the val. > + * > + * found_p is set to true if the key already present, otherwise false, if > + * it's not NULL. > + * > + * XXX: do we need to support update_if_exists behavior? > + */ Yes, I think that's needed - hence using bfm_set() instead of insert() in the prototype. > +void > +radix_tree_insert(radix_tree *tree, uint64 key, Datum val, bool *found_p) > +{ > + int shift; > + bool replaced; > + radix_tree_node *node; > + radix_tree_node *parent = tree->root; > + > + /* Empty tree, create the root */ > + if (!tree->root) > + radix_tree_new_root(tree, key, val); > + > + /* Extend the tree if necessary */ > + if (key > tree->max_val) > + radix_tree_extend(tree, key); FWIW, the reason I used separate functions for these in the prototype is that it turns out to generate a lot better code, because it allows non-inlined function calls to be sibling calls - thereby avoiding the need for a dedicated stack frame. That's not possible once you need a palloc or such, so splitting off those call paths into dedicated functions is useful. Greetings, Andres Freund
pgsql-hackers by date: