Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoAnTo4SWrdjXA7xWuEmGtAKG=M1-hB+jjhJGQdMLEDjGA@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Sun, Mar 3, 2024 at 2:43 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Fri, Mar 1, 2024 at 3:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Feb 29, 2024 at 8:43 PM John Naylor <johncnaylorls@gmail.com> wrote: > > > > + ts->rt_context = AllocSetContextCreate(CurrentMemoryContext, > > > + "tidstore storage", > > > > > > "tidstore storage" sounds a bit strange -- maybe look at some other > > > context names for ideas. > > > > Agreed. How about "tidstore's radix tree"? > > That might be okay. I'm now thinking "TID storage". On that note, one > improvement needed when we polish tidstore.c is to make sure it's > spelled "TID" in comments, like other files do already. > > > > - leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx, allocsize); > > > + leaf.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->leaf_ctx != NULL > > > + ? tree->leaf_ctx > > > + : tree->context, allocsize); > > > > > > Instead of branching here, can we copy "context" to "leaf_ctx" when > > > necessary (those names should look more like eachother, btw)? I think > > > that means anything not covered by this case: > > > > > > +#ifndef RT_VARLEN_VALUE_SIZE > > > + if (sizeof(RT_VALUE_TYPE) > sizeof(RT_PTR_ALLOC)) > > > + tree->leaf_ctx = SlabContextCreate(ctx, > > > + RT_STR(RT_PREFIX) "radix_tree leaf contex", > > > + RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)), > > > + sizeof(RT_VALUE_TYPE)); > > > +#endif /* !RT_VARLEN_VALUE_SIZE */ > > > > > > ...also, we should document why we're using slab here. On that, I > > > don't recall why we are? We've never had a fixed-length type test case > > > on 64-bit, so it wasn't because it won through benchmarking. It seems > > > a hold-over from the days of "multi-value leaves". Is it to avoid the > > > possibility of space wastage with non-power-of-two size types? > > > > Yes, it matches my understanding. > > There are two issues quoted here, so not sure if you mean both or only > the last one... I meant only the last one. > > For the latter, I'm not sure it makes sense to have code and #ifdef's > to force slab for large-enough fixed-length values just because we > can. There may never be such a use-case anyway. I'm also not against > it, either, but it seems like a premature optimization. Reading the old threads, the fact that using a slab context for leaves originally came from Andres's prototype patch, was to avoid rounding up the bytes to a power of 2 number by aset.c. It makes sense to me to use a slab context for this case. To measure the effect of using a slab, I've updated bench_radix_tree so it uses a large fixed-length value. The struct I used is: typedef struct mytype { uint64 a; uint64 b; uint64 c; uint64 d; char e[100]; } mytype; The struct size is 136 bytes with padding, just above a power-of-2. The simple benchmark test showed using a slab context for leaves is more space efficient. The results are: slab: = #select * from bench_load_random_int(1000000); mem_allocated | load_ms ---------------+--------- 405643264 | 560 (1 row) aset: =# select * from bench_load_random_int(1000000); mem_allocated | load_ms ---------------+--------- 527777792 | 576 (1 row) > > > > For this stanza that remains unchanged: > > > > > > for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++) > > > { > > > MemoryContextDelete(tree->node_slabs[i]); > > > } > > > > > > if (tree->leaf_ctx) > > > { > > > MemoryContextDelete(tree->leaf_ctx); > > > } > > > > > > ...is there a reason we can't just delete tree->ctx, and let that > > > recursively delete child contexts? > > > > I thought that considering the RT_CREATE doesn't create its own memory > > context but just uses the passed context, it might be a bit unusable > > to delete the passed context in the radix tree code. For example, if a > > caller creates a radix tree (or tidstore) on a memory context and > > wants to recreate it again and again, he also needs to re-create the > > memory context together. It might be okay if we leave comments on > > RT_CREATE as a side effect, though. This is the same reason why we > > don't destroy tree->dsa in RT_FREE(). And, as for RT_FREE_RECURSE(), > > Right, I should have said "reset". Resetting a context will delete > it's children as well, and seems like it should work to reset the tree > context, and we don't have to know whether that context actually > contains leaves at all. That should allow copying "tree context" to > "leaf context" in the case where we have no special context for > leaves. Resetting the tree->context seems to work. But I think we should note for callers that the dsa_area passed to RT_CREATE should be created in a different context than the context passed to RT_CREATE because otherwise RT_FREE() will also free the dsa_area. For example, the following code in test_radixtree.c will no longer work: dsa = dsa_create(tranche_id); radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id); : rt_free(radixtree); dsa_detach(dsa); // dsa is already freed. So I think that a practical usage of the radix tree will be that the caller creates a memory context for a radix tree and passes it to RT_CREATE(). I've attached an update patch set: - 0008 updates RT_FREE_RECURSE(). - 0009 patch is an updated version of cleanup radix tree memory handling. - 0010 updates comments in tidstore.c such as replacing "Tid" with "TID". - 0011 rename TidStore to TIDSTORE all places. - 0012 update bench_radix_tree so it uses a (possibly large) struct instead of uint64. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: