Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAFBsxsG9gVMptcbYS+_Z49xz5iHxnKWYLqB73XcaDFVNoWEOfA@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:

> > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> >
> > I'm guessing the hash join case can afford to be precise about memory because it must spill to disk when exceeding work_mem. We don't have that design constraint.
>
> You mean that the memory used by the radix tree should be limited not
> by the amount of memory actually used, but by the amount of memory
> allocated? In other words, it would check with MemoryContextMemAllocated()
> in the local case and dsa_get_total_size() in the shared case.

I mean, if this patch set uses 10x less memory than v15 (not always, but easy to find cases where it does), and if it's also expensive to track memory use precisely, then we don't have an incentive to track memory precisely. Even if we did, we don't want to assume that every future caller of radix tree is willing to incur that cost.

> The idea of using up to half of maintenance_work_mem might be a good
> idea compared to the current flat-array solution. But since it only
> uses half, I'm concerned that there will be users who double their
> maintenance_work_mem. When it is improved, the user needs to restore
> maintenance_work_mem again.

I find it useful to step back and look at the usage patterns:

Autovacuum: Limiting the memory allocated by vacuum is important, since there are multiple workers and they can run at any time (possibly most of the time). This case will not use parallel index vacuum, so it will use slab, where a quick estimate of the memory taken by the context is not terribly far off; we can afford to be more optimistic here.

Manual vacuum: The default configuration assumes we want to finish as soon as possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My experience leads me to believe users are willing to use a lot of memory to make manual vacuum finish as quickly as possible, and are disappointed to learn that even if maintenance_work_mem is 10GB, vacuum can only use 1GB.

So I don't believe anyone will have to double maintenance_work_mem after upgrading (even with pessimistic accounting), because we'll be both
- much more efficient with memory on average
- free from the 1GB cap

That said, it's possible 50% is too pessimistic -- for example, a 75% threshold would bring us very close to powers of two:

2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
766 + 256 = 1022MB -> stop

I'm not sure if that calculation could cause going over the limit, or how common that would be.
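For concreteness, the arithmetic above can be checked with a small sketch. This is only an illustration of the stopping rule being discussed (check allocated-so-far against a fractional threshold of the limit before each new doubling allocation), not the actual patch logic; the 256MB cap on allocation size and the 75% threshold are taken from the example figures above.

```python
LIMIT_MB = 1024      # maintenance_work_mem = 1GB, as in the example
THRESHOLD = 0.75     # hypothetical 75% stopping threshold

# Allocated so far: two doubling sequences 1+2+...+128, plus one 256MB block
allocated = 2 * sum(2**i for i in range(8)) + 256
print(allocated)                            # 766 MB (74.8% of 1GB)
print(allocated < LIMIT_MB * THRESHOLD)     # True -> below 768MB, keep going

# After the next 256MB allocation:
allocated += 256
print(allocated)                            # 1022 MB
print(allocated < LIMIT_MB * THRESHOLD)     # False -> stop
```

Note that because the check happens before the allocation that crosses the threshold, the final total (1022MB) can land above the threshold but still under the 1GB limit in this example; whether that holds in general depends on the maximum allocation size relative to the headroom the threshold leaves.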

--
John Naylor
EDB: http://www.enterprisedb.com
