Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoAednwbvpLNeQM-U9Vr7-Lg_bMj_8NVunRb7qWeknJ+zQ@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Tue, Mar 14, 2023 at 8:27 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> I wrote:
>
> > > > Since the block-level measurement is likely overestimating quite a bit, I propose to simply reverse the order of the actions here, effectively reporting progress for the *last page* and not the current one: First update progress with the current memory usage, then add tids for this page. If this allocated a new block, only a small bit of that will be written to. If this block pushes it over the limit, we will detect that up at the top of the loop. It's kind of like our earlier attempts at a "fudge factor", but simpler and less brittle. And, as far as OS pages we have actually written to, I think it'll effectively respect the memory limit, at least in the local mem case. And the numbers will make sense.
> > > >
> > > > Thoughts?
> > >
> > > It looks to work but it still doesn't work in a case where a shared
> > > tidstore is created with a 64kB memory limit, right?
> > > TidStoreMemoryUsage() returns 1MB and TidStoreIsFull() returns true
> > > from the beginning.
> >
> > I have two ideas:
> >
> > 1. Make it optional to track chunk memory space by a template parameter. It might be tiny compared to everything else that vacuum does. That would allow other users to avoid that overhead.
> > 2. When context block usage exceeds the limit (rare), make the additional effort to get the precise usage -- I'm not sure such a top-down facility exists, and I'm not feeling well enough today to study this further.
>
> Since then, Masahiko incorporated #1 into v31, and that's what I'm looking at now. Unfortunately, if I had spent five minutes reminding myself what the original objections were to this approach, I could have saved us some effort. Back in July(!), Andres raised two points: GetMemoryChunkSpace() is slow [1], and fragmentation [2] (leading to underestimation).
>
> In v31, in the local case at least, the underestimation is actually worse than tracking chunk space, since it ignores chunk header and alignment. I'm not sure about the DSA case. This doesn't seem great.

Right.

>
> It shouldn't be a surprise why a simple increment of raw allocation size is comparable in speed -- GetMemoryChunkSpace() calls the right function through a pointer, which is slower. If we were willing to underestimate for the sake of speed, that takes away the reason for making memory tracking optional.
>
> Further, if the option is not specified, in v31 there is no way to get the memory use at all, which seems odd. Surely the caller should be able to ask the context/area, if it wants to.

There are precedents that don't provide a way to return memory usage,
such as simplehash.h and dshash.c.

>
> I still like my idea at the top of the page -- at least for vacuum and m_w_m. It's still not completely clear if it's right but I've got nothing better. It also ignores the work_mem issue, but I've given up anticipating all future cases at the moment.
>

What do you mean by "the precise usage" in your idea? Quoting from
the email you referred to, Andres said:

---
One thing I was wondering about is trying to choose node types in
roughly-power-of-two struct sizes. It's pretty easy to end up with significant
fragmentation in the slabs right now when inserting as you go, because some of
the smaller node types will be freed but not enough to actually free blocks of
memory. If we instead have ~power-of-two sizes we could just use a single slab
of the max size, and carve out the smaller node types out of that largest
allocation.

Btw, that fragmentation is another reason why I think it's better to track
memory usage via memory contexts, rather than doing so based on
GetMemoryChunkSpace().
---

IIUC he suggested measuring memory usage at the block level in order
to count blocks that are not actually freed even though some of their
chunks are freed. That's why we used MemoryContextMemAllocated(). On
the other hand, you recently pointed out[1]:

---
I think we're trying to solve the wrong problem here. I need to study
this more, but it seems that code that needs to stay within a memory
limit only needs to track what's been allocated in chunks within a
block, since writing there is what invokes a page fault.
---

IIUC you suggested measuring memory usage by tracking how much memory
is allocated in chunks within a block. If your idea at the top of the
page follows this method, it still doesn't deal with the point Andres
mentioned.

> I'll put this item and a couple other things together in a separate email tomorrow.

Thanks!

Regards,

[1] https://www.postgresql.org/message-id/CAFBsxsEnzivaJ13iCGdDoUMsXJVGOaahuBe_y%3Dq6ow%3DLTzyDvA%40mail.gmail.com


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


