Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAFBsxsGV9ssgP-itGa_TVux0v=_B16-28ff+q_G2WLCgBstSLg@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Mon, Oct 31, 2022 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached v8 patches. 0001, 0002, and 0003 patches incorporated
> the comments I got so far. 0004 patch is a DSA support patch for PoC.

Thanks for the new patchset. This is not a full review, but I have some comments:

0001 and 0002 look okay on a quick scan -- I will use this as a base for further work that we discussed. However, before I do so I'd like to request another revision regarding the following:

> In 0004 patch, the basic idea is to use rt_node_ptr in all inner nodes
> to point its children, and we use rt_node_ptr as either rt_node* or
> dsa_pointer depending on whether the radix tree is shared or not (ie,
> by checking radix_tree->dsa == NULL).

0004: Looks like a good start, but this patch has a large number of changes like these, making it hard to read:

- if (found && child_p)
- *child_p = child;
+ if (found && childp_p)
+ *childp_p = childp;
...
  rt_node_inner_32 *new32;
+ rt_node_ptr new32p;
 
  /* grow node from 4 to 32 */
- new32 = (rt_node_inner_32 *) rt_copy_node(tree, (rt_node *) n4,
-  RT_NODE_KIND_32);
+ new32p = rt_copy_node(tree, (rt_node *) n4, RT_NODE_KIND_32);
+ new32 = (rt_node_inner_32 *) node_ptr_get_local(tree, new32p);

It's difficult to keep in my head what all the variables refer to. I thought a bit about how to split this patch up to make this easier to read. Here's what I came up with:

typedef struct rt_node_ptr
{
  uintptr_t encoded;
  rt_node * decoded;
}

Note that there is nothing about "dsa or local". That's deliberate. That way, we can use the "encoded" field for a tagged pointer as well, as I hope we can do (at least for local pointers) in the future. So an intermediate patch would have "static inline void" functions  node_ptr_encode() and  node_ptr_decode(), which would only copy from one member to another. I suspect that: 1. The actual DSA changes will be *much* smaller and easier to reason about. 2. Experimenting with tagged pointers will be easier.

Also, quick question: 0004 has a new function rt_node_update_inner() -- is that necessary because of DSA?, or does this ideally belong in 0002? What's the reason for it?

Regarding the performance, I've
> added another boolean argument to bench_seq/shuffle_search(),
> specifying whether to use the shared radix tree or not. Here are
> benchmark results in my environment,

> [...]

> In non-shared radix tree cases (the forth argument is false), I don't
> see a visible performance degradation. On the other hand, in shared
> radix tree cases (the forth argument is true), I see visible overheads
> because of dsa_get_address().

Thanks, this is useful.

> Please note that the current shared radix tree implementation doesn't
> support any locking, so it cannot be read while written by someone.

I think at the very least we need a global lock to enforce this.

> Also, only one process can iterate over the shared radix tree. When it
> comes to parallel vacuum, these don't become restriction as the leader
> process writes the radix tree while scanning heap and the radix tree
> is read by multiple processes while vacuuming indexes. And only the
> leader process can do heap vacuum by iterating the key-value pairs in
> the radix tree. If we want to use it for other cases too, we would
> need to support locking, RCU or something.

A useful exercise here is to think about what we'd need to do parallel heap pruning. We don't need to go that far for v16 of course, but what's the simplest thing we can do to make that possible? Other use cases can change to more sophisticated schemes if need be.

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [BUG] Logical replica crash if there was an error in a function.
Next
From: John Naylor
Date:
Subject: Re: Incorrect include file order in guc-file.l