Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCPXZ3ziQTt=nU2vMyMsskqOExWrs9ypp96YW3BPvEPSQ@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Mar 17, 2023 at 4:03 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> >
> > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> > > <john.naylor@enterprisedb.com> wrote:
> > > >
> > > > I wrote:
> > > >
> > > > > > > Since the block-level measurement is likely overestimating
> > > > > > > quite a bit, I propose to simply reverse the order of the
> > > > > > > actions here, effectively reporting progress for the *last
> > > > > > > page* and not the current one: First update progress with
> > > > > > > the current memory usage, then add tids for this page. If
> > > > > > > this allocated a new block, only a small bit of that will be
> > > > > > > written to. If this block pushes it over the limit, we will
> > > > > > > detect that up at the top of the loop. It's kind of like our
> > > > > > > earlier attempts at a "fudge factor", but simpler and less
> > > > > > > brittle. And, as far as OS pages we have actually written
> > > > > > > to, I think it'll effectively respect the memory limit, at
> > > > > > > least in the local mem case. And the numbers will make sense.
> >
> > > > I still like my idea at the top of the page -- at least for
> > > > vacuum and m_w_m. It's still not completely clear if it's right
> > > > but I've got nothing better. It also ignores the work_mem issue,
> > > > but I've given up anticipating all future cases at the moment.
> >
> > > IIUC you suggested measuring memory usage by tracking how much
> > > memory is allocated in chunks within a block. If your idea at the
> > > top of the page follows this method, it still doesn't deal with
> > > the point Andres mentioned.
> >
> > Right, but that idea was orthogonal to how we measure memory use,
> > and in fact mentions blocks specifically. The re-ordering was just
> > to make sure that progress reporting didn't show current-use >
> > max-use.
>
> Right. I still like your re-ordering idea. It's true that most of the
> last allocated block is not actually used yet when heap scanning
> stops. I'm guessing we can just check whether the context memory has
> gone over the limit, but I'm concerned that might not work well on
> systems where memory overcommit is disabled.
>
> >
> > However, the big question remains DSA, since a new segment can be as
> > large as the entire previous set of allocations. It seems it just
> > wasn't designed for things where memory growth is unpredictable.

aset.c also has a similar characteristic: it allocates an 8kB block
upon the first allocation in a context, and doubles that size for each
successive block request. But there we can specify both the initial
block size and the maximum block size. This made me think of another
idea: specify both to DSA as well, with both values calculated from
m_w_m. For example, we can create a DSA in parallel_vacuum_init() as
follows:

initial segment size = min(m_w_m / 4, 1MB)
max segment size = max(m_w_m / 8, 8MB)
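
To make that concrete, here is a rough sketch in C (the helper name is
made up for illustration; Min/Max are the usual PostgreSQL macros):

/*
 * Hypothetical helper: derive DSA segment sizes from
 * maintenance_work_mem (given in kB, as the GUC is).
 */
static void
choose_dsa_segment_sizes(int maintenance_work_mem_kb,
                         size_t *init_seg_size, size_t *max_seg_size)
{
    size_t      mwm = (size_t) maintenance_work_mem_kb * 1024;

    /* initial segment size = min(m_w_m / 4, 1MB) */
    *init_seg_size = Min(mwm / 4, 1024 * 1024);

    /* max segment size = max(m_w_m / 8, 8MB) */
    *max_seg_size = Max(mwm / 8, 8 * 1024 * 1024);
}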

In most cases, we can start with a 1MB initial segment, the same as
before. For small memory cases, say m_w_m = 1MB, we start with a 256kB
initial segment, and heap scanning stops after DSA has allocated 1.5MB
(= 256kB + 256kB + 512kB + 512kB). For larger memory, heap scanning
stops after DSA has allocated at most about 1.125 times m_w_m, since
the overshoot is bounded by one maximum-size segment (m_w_m / 8). For
example, if m_w_m = 1GB, the initial and maximum segment sizes are 1MB
and 128MB respectively, and DSA allocates segments as follows until
heap scanning stops:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) + (128 * 5) = 1150MB
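
To double-check that figure, here is a self-contained toy simulation.
It assumes dsa.c's current behavior of doubling the segment size every
two segments (DSA_NUM_SEGMENTS_AT_EACH_SIZE) and capping it at the
maximum:

#include <stdio.h>

int
main(void)
{
    long    seg = 1;            /* initial segment size, in MB */
    long    max_seg = 128;      /* maximum segment size, in MB */
    long    mwm = 1024;         /* m_w_m, in MB */
    long    total = 0;
    int     at_this_size = 0;

    while (total <= mwm)
    {
        total += seg;           /* "allocate" one more segment */
        if (seg < max_seg && ++at_this_size == 2)
        {
            seg *= 2;           /* double every two segments */
            at_this_size = 0;
        }
    }
    printf("%ld MB\n", total);  /* prints 1150 */
    return 0;
}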

dsa_create() would be extended to accept the initial and maximum
segment sizes, like AllocSetContextCreate().
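
For illustration, the creation-side change might look like the
following (a hypothetical signature, mirroring AllocSetContextCreate()'s
initBlockSize/maxBlockSize parameters; it doesn't exist today):

/* Hypothetical variant of dsa_create() with explicit segment sizes */
extern dsa_area *dsa_create_ext(int tranche_id,
                                size_t init_segment_size,
                                size_t max_segment_size);

parallel_vacuum_init() would then call it with the two values computed
from m_w_m above.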

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


