Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:
Msg-id: CAD21AoBffs5FpV=5WGU3v0jYY8R_AD8qDHgrBwrqak4ZzHwf-A@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On Tue, Dec 20, 2022 at 3:09 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Mon, Dec 19, 2022 at 2:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Dec 13, 2022 at 1:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > Looking at other code using DSA such as tidbitmap.c and nodeHash.c, it
> > > seems that they look only at memory that is actually dsa_allocate'd.
> > > To be exact, we estimate the number of hash buckets based on work_mem
> > > (and hash_mem_multiplier) and use it as the upper limit. So I've
> > > confirmed that the result of dsa_get_total_size() could exceed the
> > > limit. I'm not sure it's a known and legitimate usage. If we can
> > > follow such usage, we can probably track how much dsa_allocate'd
> > > memory is used in the radix tree.
> >
> > I've experimented with this idea. The newly added 0008 patch changes
> > the radix tree so that it counts the memory usage for both local and
> > shared cases. As shown below, there is an overhead for that:
> >
> > w/o 0008 patch
> >      298453544 |     282
>
> > w/ 0008 patch
> >      293603184 |     297
>
> This adds about as much overhead as the improvement I measured in the v4 slab allocator patch.

Oh, yes, that's bad.

> https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
>
> I'm guessing the hash join case can afford to be precise about memory because it must spill to disk when exceeding
> work_mem. We don't have that design constraint.

You mean that the memory used by the radix tree should be limited not
by the amount of memory actually used, but by the amount of memory
allocated? In other words, we would check it with
MemoryContextMemAllocated() in the local case and with
dsa_get_total_size() in the shared case.
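
To illustrate what I mean, a minimal sketch (rt_memory_usage() and the
dsa/context fields are names I made up here, not necessarily what the
patch uses):

/* Sketch only: report allocated (not in-use) memory of the radix tree. */
static Size
rt_memory_usage(radix_tree *tree)
{
    if (tree->dsa != NULL)
        return dsa_get_total_size(tree->dsa);   /* shared case */
    else
        return MemoryContextMemAllocated(tree->context, true); /* local case */
}

Vacuum would then start an index-vacuum cycle once this value exceeds
the limit, e.g. if (rt_memory_usage(tree) > limit).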

The idea of using up to half of maintenance_work_mem might be a good
one compared to the current flat-array solution. But since it uses
only half, I'm concerned that some users will double their
maintenance_work_mem, and once this limitation is later lifted they
will need to restore maintenance_work_mem again.

A better solution would be a slab-like DSA that allocates dynamic
shared memory in fixed-length large segments. The downside is that,
because the segments are large, we would need to increase
maintenance_work_mem as well. Also, this patch set is already getting
bigger and more complicated, so I don't think it's a good idea to add
even more to it.
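
Just to sketch what I have in mind (all names and the 16MB segment
size are made up for illustration; this ignores freeing, alignment of
the segment header, and concurrency entirely):

#include "postgres.h"
#include "storage/dsm.h"

#define SLABDSA_SEGMENT_SIZE    (16 * 1024 * 1024)  /* assumed fixed size */

typedef struct SlabDsaState
{
    dsm_segment *cur_seg;   /* current fixed-length segment */
    Size         used;      /* bytes handed out from cur_seg */
    Size         total;     /* total size of all segments created */
} SlabDsaState;

static void *
slabdsa_alloc(SlabDsaState *state, Size size)
{
    char   *base;
    void   *chunk;

    size = MAXALIGN(size);

    /* Add another fixed-length segment once the current one is full. */
    if (state->cur_seg == NULL ||
        state->used + size > SLABDSA_SEGMENT_SIZE)
    {
        state->cur_seg = dsm_create(SLABDSA_SEGMENT_SIZE, 0);
        state->used = 0;
        state->total += SLABDSA_SEGMENT_SIZE;
    }

    base = (char *) dsm_segment_address(state->cur_seg);
    chunk = base + state->used;
    state->used += size;

    return chunk;
}

The memory check would then only look at state->total, but every step
is a whole segment, which is why maintenance_work_mem might have to be
increased.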

If we limit the memory usage by checking the amount of memory actually
used, we can use SlabStats() in the local case. Since DSA doesn't have
such functionality at the moment, we would need to add it, or we could
track the usage in the radix tree only in the shared case.
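
For the last option, it could be as simple as the following sketch
(rt_alloc_shared() and total_mem are hypothetical names; dsa_allocate()
is the real API):

/* Sketch: count dsa_allocate'd bytes inside the radix tree itself. */
static dsa_pointer
rt_alloc_shared(radix_tree *tree, Size size)
{
    dsa_pointer ptr = dsa_allocate(tree->dsa, size);

    tree->total_mem += size;    /* requested bytes, not DSA's own overhead */
    return ptr;
}

Though doing this on every allocation is presumably where the overhead
in the 0008 experiment comes from.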

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


