On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Mar 14, 2023 at 8:27 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> >
> > I wrote:
> >
> > > > > Since the block-level measurement is likely overestimating quite a bit, I propose to simply reverse the order of the actions here, effectively reporting progress for the *last page* and not the current one: First update progress with the current memory usage, then add tids for this page. If this allocated a new block, only a small bit of that will be written to. If this block pushes it over the limit, we will detect that up at the top of the loop. It's kind of like our earlier attempts at a "fudge factor", but simpler and less brittle. And, as far as OS pages we have actually written to, I think it'll effectively respect the memory limit, at least in the local mem case. And the numbers will make sense.
> > I still like my idea at the top of the page -- at least for vacuum and m_w_m. It's still not completely clear if it's right but I've got nothing better. It also ignores the work_mem issue, but I've given up anticipating all future cases at the moment.
> IIUC you suggested measuring memory usage by tracking how much memory is
> allocated as chunks within a block. If your idea at the top of the page
> follows this method, it still doesn't deal with the point Andres
> mentioned.
Right, but that idea was orthogonal to how we measure memory use, and in fact mentions blocks specifically. The re-ordering was just to make sure that progress reporting didn't show current-use > max-use.
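In loop form, the reordering looks roughly like the toy sketch below. This is only an illustration of the ordering, not the patch's actual code, and every name in it is invented:

/*
 * Toy illustration: report progress for the *last* page by updating the
 * counter with memory used so far, and only then adding the current page's
 * tids, which may allocate a new block. Any overshoot is caught at the top
 * of the next iteration. All names are invented for this example.
 */
#include <stdio.h>

static long bytes_used = 0;

static void
report_progress(long bytes)
{
	printf("reported dead-tuple bytes: %ld\n", bytes);
}

static void
add_page_tids(int blkno)
{
	(void) blkno;		/* unused in this toy */
	/* pretend each page's tids cost 100 bytes, possibly in a new block */
	bytes_used += 100;
}

int
main(void)
{
	for (int blkno = 0; blkno < 5; blkno++)
	{
		/* first report what has actually been used, i.e. through the previous page */
		report_progress(bytes_used);
		/* then add this page's tids */
		add_page_tids(blkno);
	}
	return 0;
}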
However, the big question remains DSA, since a new segment can be as large as the entire previous set of allocations. It seems it just wasn't designed for things where memory growth is unpredictable.
I'm starting to wonder if we need to give DSA a bit more info at the start. Imagine a "soft" limit given to the DSA area when it is initialized. If the total segment usage exceeds this, it stops doubling and instead new segments get smaller. Modifying an example we used for the fudge-factor idea some time ago:
m_w_m = 1GB, so calculate the soft limit to be 512MB and pass it to the DSA area.
2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> hit soft limit, so "stairstep down" the new segment sizes:
766 + 2*(128) + 64 = 1086MB -> stop
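Very roughly, that could behave like the sketch below. This is only meant to reproduce the arithmetic above; it is not dsa.c code, the soft limit is hypothetical, and the "two segments at each size, then double" growth just matches the doubling assumed in the numbers:

/*
 * Hypothetical soft limit: grow segments as in the example (two at each
 * size, then double) until total usage passes the soft limit, then step
 * the new segment sizes back down until the hard limit is reached.
 */
#include <stdio.h>

int
main(void)
{
	int		limit = 1024;			/* m_w_m, in MB */
	int		soft_limit = limit / 2;	/* 512MB */
	int		seg_size = 1;			/* first segment size, in MB */
	int		at_this_size = 0;
	int		total = 0;
	int		shrinking = 0;

	while (total < limit)
	{
		total += seg_size;
		printf("segment %4dMB, total %4dMB\n", seg_size, total);

		if (!shrinking && total > soft_limit)
		{
			/* just crossed the soft limit: start stepping sizes down */
			shrinking = 1;
			seg_size /= 2;
			at_this_size = 0;
		}
		else if (++at_this_size >= 2)
		{
			/* two segments at each size, then double (or halve) */
			seg_size = shrinking ? seg_size / 2 : seg_size * 2;
			at_this_size = 0;
		}
		if (seg_size < 1)
			seg_size = 1;
	}

	printf("stopped at %dMB (m_w_m = %dMB)\n", total, limit);
	return 0;
}

With those inputs it matches the example: 766MB when the soft limit is crossed, then 128 + 128 + 64 more before stopping at 1086MB.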
That's just an undeveloped idea, however, so it's likely v17 development, even assuming it's not a bad idea (it could be).
And sadly, unless we find some other, simpler answer soon for tracking and limiting shared memory, the tid store is looking like v17 material.
--
John Naylor
EDB: http://www.enterprisedb.com