Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAD21AoAnQnZPJ63-vd_sUJgnc7L818x1oCnaf7vupt7J3JLo+w@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On Thu, Dec 22, 2022 at 7:24 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
>
> > > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about
> > > memory because it must spill to disk when exceeding work_mem. We
> > > don't have that design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it checks by MemoryContextMemAllocated()
> > in the local case and by dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always,
> but easy to find cases where it does), and if it's also expensive to
> track memory use precisely, then we don't have an incentive to track
> memory precisely. Even if we did, we don't want to assume that every
> future caller of radix tree is willing to incur that cost.

Understood.
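
For what it's worth, a rough sketch of what such a coarse check could
look like (the radix_tree fields and the rt_mem_limit parameter here
are hypothetical names for illustration, not from the patch):

/*
 * Sketch only: return true if the radix tree's allocated memory has
 * reached the given limit. It counts allocated (not used) bytes, so
 * the check is cheap but coarse.
 */
static bool
rt_memory_exceeded(radix_tree *rt, Size rt_mem_limit)
{
    Size        total;

    if (rt->area != NULL)
        total = dsa_get_total_size(rt->area);   /* shared case */
    else
        total = MemoryContextMemAllocated(rt->context, true);  /* local */

    return total >= rt_mem_limit;
}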

>
> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenance_work_mem. When it is improved, the user needs to restore
> > maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important,
> since there are multiple workers and they can run at any time
> (possibly most of the time). This case will not use parallel index
> vacuum, so will use slab, where the quick estimation of memory taken
> by the context is not terribly far off, so we can afford to be more
> optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as
> soon as possible (vacuum_cost_delay is zero). Parallel index vacuum
> can be used. My experience leads me to believe users are willing to
> use a lot of memory to make manual vacuum finish as quickly as
> possible, and are disappointed to learn that even if maintenance work
> mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem
> after upgrading (even with pessimistic accounting) because we'll be
> both
> - much more efficient with memory on average
> - free from the 1GB cap

Makes sense.

>
> That said, it's possible 50% is too pessimistic -- a 75% threshold
> will bring us very close to powers of two, for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause going over the limit, or how common that would be.
>

If the value is a power of 2, it seems to work perfectly fine. But if,
for example, it's 700MB, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but this exceeds the limit.

In a bigger case, if it's 11000MB:

2*(1+2+...+2048) = 8190MB (74.4%)
8190 + 4096 = 12286MB

That being said, I don't think such cases are common. So the 75%
threshold seems to work fine in most cases.
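
To sanity-check the arithmetic above, here is a small standalone
simulation of the doubling segment sizes (two segments of each size
before doubling, as in the examples; this just models the growth
pattern in MB, it is not the actual DSA code):

#include <stdio.h>

int
main(void)
{
    long limits[] = {1024, 700, 11000}; /* limits in MB */

    for (int i = 0; i < 3; i++)
    {
        long limit = limits[i];
        long total = 0;
        long seg = 1;       /* current segment size in MB */
        int  nsame = 0;     /* segments allocated at this size */

        /* keep allocating until we reach 75% of the limit */
        while (total < limit * 3 / 4)
        {
            total += seg;
            if (++nsame == 2)
            {
                seg *= 2;   /* double after two segments of a size */
                nsame = 0;
            }
        }
        printf("limit=%ldMB -> stop at %ldMB (%.1f%%)\n",
               limit, total, 100.0 * total / limit);
    }
    return 0;
}

This stops at 1022MB (99.8%) for a 1024MB limit, but overshoots to
766MB (109.4%) for 700MB and to 12286MB (111.7%) for 11000MB.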

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


