Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAD21AoD-sOZY5KFTQw1PQDvRExFGhda4SrW8C1tFr7jgJ_6ByQ@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On Tue, Feb 14, 2023 at 8:24 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Feb 11, 2023 at 2:33 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
> > >
> > > I didn't get any closer to radix-tree regression,
> >
> > Me neither. It seems that in v26, inserting chunks into node-32 is
> > slow but needs more analysis. I'll share if I found something
> > interesting.
>
> If that were the case, then the other benchmarks I ran would likely have slowed down as well, but they are the same
> or faster. There is one microbenchmark I didn't run before: "select * from bench_fixed_height_search(15)" (15 to
> reduce noise from growing size class, and despite the name it measures load time as well). Trying this now shows no
> difference: a few runs range from 19 to 21ms in each version. That also reinforces that update_inner is fine and
> that the move to the value pointer API didn't regress.
>
> Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of 5, perf run separate from measurements):
>
> v15 + v26 store:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202152 |     553
>
>   19.71%  postgres  postgres             [.] tidstore_add_tids
> + 31.47%  postgres  postgres             [.] rt_set
> = 51.18%
>
>   20.62%  postgres  postgres             [.] rt_node_insert_leaf
>    6.05%  postgres  postgres             [.] AllocSetAlloc
>    4.74%  postgres  postgres             [.] AllocSetFree
>    4.62%  postgres  postgres             [.] palloc
>    2.23%  postgres  postgres             [.] SlabAlloc
>
> v26:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |     617
>
>   57.45%  postgres  postgres             [.] tidstore_add_tids
>
>   20.67%  postgres  postgres             [.] local_rt_node_insert_leaf
>    5.99%  postgres  postgres             [.] AllocSetAlloc
>    3.55%  postgres  postgres             [.] palloc
>    3.05%  postgres  postgres             [.] AllocSetFree
>    2.05%  postgres  postgres             [.] SlabAlloc
>
> So it seems the store itself got faster when we removed shared memory paths from the v26 store to test it against
> v15.
>
> I thought to favor the local memory case in the tidstore by controlling inlining -- it's smaller and will be called
> much more often, so I tried the following (done in 0007):
>
>  #define RT_PREFIX shared_rt
>  #define RT_SHMEM
> -#define RT_SCOPE static
> +#define RT_SCOPE static pg_noinline
>
> That brings it down to
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |     590

The improvement makes sense to me. I've also run the same test (with
TIDS_PER_BLOCK_FOR_LOAD changed to 1):

w/o 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     334 |     445
(1 row)

w/ 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     316 |     434
(1 row)

On the other hand, with TIDS_PER_BLOCK_FOR_LOAD set to 30, the load
performance didn't improve:

w/o 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     601 |     608
(1 row)

w/ 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     610 |     606
(1 row)

That being said, it might be within the noise level, so I agree with the 0007 patch.

> Perhaps some slowdown is unavoidable, but it would be nice to understand why.

True.

>
> > I can think that something like traversing a HOT chain could visit
> > offsets out of order. But fortunately we prune such collected TIDs
> > before heap vacuum in heap case.
>
> Further, currently we *already* assume we populate the tid array in order (for binary search), so we can just
> continue assuming that (with an assert added since it's more public in this form). I'm not sure why such basic
> common sense evaded me a few versions ago...

Right. That said, TidStore is not intended only for heap, so loading
out-of-order TIDs might become important in the future.

> > > If these are acceptable, I can incorporate them into a later patchset.
> >
> > These are nice improvements! I agree with all changes.
>
> Great, I've squashed these into the tidstore patch (0004). Also added 0005, which is just a simplification.
>

I've attached some small patches to improve the radix tree and tidstore:

We have the following WIP comment in test_radixtree:

// WIP: compiles with warnings because rt_attach is defined but not used
// #define RT_SHMEM

How about unsetting RT_SCOPE to suppress warnings for unused rt_attach
and friends?

FYI I've briefly tested the TidStore with blocksize = 32kb, and it
seems to work fine.

> I squashed the earlier dead code removal into the radix tree patch.

Thanks!

>
> v27-0008 measures tid store iteration performance and adds a stub function to prevent spurious warnings, so the
> benchmarking module can always be built.
>
> Getting the list of offsets from the old array for a given block is always trivial, but tidstore_iter_extract_tids()
> is doing a huge amount of unnecessary work when TIDS_PER_BLOCK_FOR_LOAD is 1, enough to exceed the load time:
>
>  mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
>       98202032 |     589 |     915
>
> Fortunately, it's an easy fix, done in 0009.
>
>  mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
>       98202032 |     589 |     153

Cool!

>
> I'll soon resume more cosmetic review of the tid store, but this is enough to post.

Thanks!

You removed the vacuum integration patch from v27; is there any reason for that?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

