Thread: maintenance_work_mem = 64kB doesn't work for vacuum

maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
Hi,

Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum
maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum
cases since the minimum dsa segment size (DSA_MIN_SEGMENT_SIZE) is
256kB. As soon as the radix tree allocates its control object and the
root node, the memory usage exceeds the maintenance_work_mem limit and
vacuum fails with the following assertion failure:

TRAP: failed Assert("vacrel->lpdead_item_pages > 0"), File:
"vacuumlazy.c", Line: 2447, PID: 3208575

On builds without --enable-cassert, vacuum never ends.

I've tried lowering DSA_MIN_SEGMENT_SIZE to 64kB, but no luck.
Investigating further, dsa creates a 64kB superblock for each size
class, and when creating a new shared radix tree we need to create two
superblocks: one for the radix tree control data (64 bytes) and
another for the root node (40 bytes). Also, each superblock requires
span data, which uses one page (4096 bytes). Therefore, we need at
least 136kB for a shared radix tree, even when it's empty.
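
To spell out the arithmetic behind that floor:

  2 superblocks (control data + root node):  2 x 64kB = 128kB
  2 span pages (one per superblock):         2 x  4kB =   8kB
                                             ----------------
                                                        136kB

so nothing smaller than 136kB can hold even an empty shared radix tree.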

A simple fix is to bump the minimum maintenance_work_mem to 256kB. That
would break compatibility on a back-branch (i.e., v17), but I guess
it's unlikely that existing v17 users are running with less than 1MB
of maintenance_work_mem (the release notes don't mention that we
lowered the minimum value).

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley
On Mon, 10 Mar 2025 at 07:46, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> A simple fix is to bump the minimum maintenance_work_mem to 256kB. That
> would break compatibility on a back-branch (i.e., v17), but I guess
> it's unlikely that existing v17 users are running with less than 1MB
> of maintenance_work_mem (the release notes don't mention that we
> lowered the minimum value).

Could you do something similar to what's in hash_agg_check_limits(),
where we check that we've got at least 1 item before bailing, even
when we've used up all the prescribed memory?  That seems like a safer
coding practice, as if the minimum usage for a DSM segment ever goes
above 256KB in the future, the bug comes back again.
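
For illustration, the shape of that guard in toy C form (the names are
hypothetical stand-ins, not the real vacuumlazy.c code):

#include <stdbool.h>
#include <stddef.h>

typedef struct DeadItemsInfo
{
    size_t max_bytes;   /* memory budget from maintenance_work_mem */
    long   num_items;   /* dead items collected since the last cycle */
} DeadItemsInfo;

/*
 * Run an index-vacuum cycle only once the dead-item store has exceeded
 * its budget AND at least one item has been collected.  The second
 * condition keeps a budget below the allocator's floor (such as DSA's
 * 256kB minimum segment size) from forcing empty cycles forever.
 */
static bool
should_run_index_vacuum(size_t mem_usage, const DeadItemsInfo *info)
{
    return mem_usage > info->max_bytes && info->num_items > 0;
}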

David



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: John Naylor
On Mon, Mar 10, 2025 at 1:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum
> maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum

That was done in the first place to make a regression test for a bug
fix easier, but that test never got committed. In any case I found it
worked back in July:

https://www.postgresql.org/message-id/CANWCAZZb7wd403wHQQUJZjkF%2BRWKAAa%2BWARP0Rj0EyMcfcdN9Q%40mail.gmail.com

--
John Naylor
Amazon Web Services



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Melanie Plageman
On Sun, Mar 9, 2025 at 9:24 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Mon, Mar 10, 2025 at 1:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum
> > maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum
>
> That was done in the first place to make a regression test for a bug
> fix easier, but that test never got committed. In any case I found it
> worked back in July:

Yes, I would like to keep the lower minimum. I really do have every
intention of committing that test; apologies for taking so long.
Raising the limit to 256kB might make the test take too long, and I
think it's nice to have that coverage (not just of the vacuum bug but
of a vacuum with multiple index-vacuum passes in a natural setting [as
opposed to the tidstore test module]). I don't recall if we have that
elsewhere.

- Melanie



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley
On Mon, 10 Mar 2025 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote:
> Could you do something similar to what's in hash_agg_check_limits(),
> where we check that we've got at least 1 item before bailing, even
> when we've used up all the prescribed memory?  That seems like a safer
> coding practice, as if the minimum usage for a DSM segment ever goes
> above 256KB in the future, the bug comes back again.

FWIW, I had something like the attached in mind.

David

Attachment

Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
On Sun, Mar 9, 2025 at 7:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote:
> > Could you do something similar to what's in hash_agg_check_limits(),
> > where we check that we've got at least 1 item before bailing, even
> > when we've used up all the prescribed memory?  That seems like a safer
> > coding practice, as if the minimum usage for a DSM segment ever goes
> > above 256KB in the future, the bug comes back again.
>
> FWIW, I had something like the attached in mind.
>

Thank you for the patch! I like your idea. This means that even if we
set maintenance_work_mem to 64kB, the memory usage would not actually
be limited to 64kB, but that's probably fine since such a low setting
is primarily for testing purposes.

Regarding the patch, note that lpdead_items is a counter that is
never reset during the entire vacuum. Therefore, with
maintenance_work_mem = 64kB, once we collect at least one LP_DEAD
item, we would perform a cycle of index vacuuming and heap vacuuming
for every subsequent block, even for blocks that have no LP_DEAD
items. I think we should use vacrel->dead_items_info->num_items
instead.
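
To make the difference concrete, here is a toy C contrast of the two
counters (a hypothetical struct mirroring the relevant fields, not the
actual code):

#include <stdbool.h>

typedef struct ToyVacState
{
    long lpdead_items;  /* LP_DEAD items seen over the whole vacuum;
                         * never reset */
    long num_items;     /* stand-in for dead_items_info->num_items;
                         * reset when the dead-item store is emptied
                         * after each index-vacuum cycle */
} ToyVacState;

/* Wrong guard: permanently true once a single item has been seen,
 * forcing an index-vacuum cycle for every subsequent block. */
static bool
guard_wrong(const ToyVacState *v)
{
    return v->lpdead_items > 0;
}

/* Right guard: true only while the current batch actually contains
 * something to vacuum. */
static bool
guard_right(const ToyVacState *v)
{
    return v->num_items > 0;
}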

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley
On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Regarding the patch, note that lpdead_items is a counter that is
> never reset during the entire vacuum. Therefore, with
> maintenance_work_mem = 64kB, once we collect at least one LP_DEAD
> item, we would perform a cycle of index vacuuming and heap vacuuming
> for every subsequent block, even for blocks that have no LP_DEAD
> items. I think we should use vacrel->dead_items_info->num_items
> instead.

OK, I didn't study the code enough to realise that. My patch was only
intended as an indication of what I thought. Please feel free to
proceed with your own patch using the correct field.

When playing with parallel vacuum, I also wondered if there should be
some heuristic that avoids parallel vacuum unless the user
specifically asked for it in the command when maintenance_work_mem is
set to something far too low.

Take the following case as an example:
set maintenance_work_mem=64;
create table aa(a int primary key, b int unique);
insert into aa select a,a from generate_Series(1,1000000) a;
delete from aa;

-- try a vacuum with no parallelism
vacuum (verbose, parallel 0) aa;

system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s

If I did the following instead:

vacuum (verbose) aa;

The vacuum goes parallel and takes a very long time due to launching
a parallel worker to do one page's worth of tuples. I see the
following message 4425 times:

INFO:  launched 1 parallel vacuum worker for index vacuuming (planned: 1)

and takes about 30 seconds to complete: system usage: CPU: user: 14.00
s, system: 0.81 s, elapsed: 30.86 s

Shouldn't the code in parallel_vacuum_compute_workers() try and pick a
good value for the workers based on the available memory and table
size when the user does not explicitly specify how many workers they
want?

David



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
On Mon, Mar 10, 2025 at 2:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Regarding the patch, note that lpdead_items is a counter that is
> > never reset during the entire vacuum. Therefore, with
> > maintenance_work_mem = 64kB, once we collect at least one LP_DEAD
> > item, we would perform a cycle of index vacuuming and heap vacuuming
> > for every subsequent block, even for blocks that have no LP_DEAD
> > items. I think we should use vacrel->dead_items_info->num_items
> > instead.
>
> OK, I didn't study the code enough to realise that. My patch was only
> intended as an indication of what I thought. Please feel free to
> proceed with your own patch using the correct field.
>
> When playing with parallel vacuum, I also wondered if there should be
> some heuristic that avoids parallel vacuum unless the user
> specifically asked for it in the command when maintenance_work_mem is
> set to something far too low.
>
> Take the following case as an example:
> set maintenance_work_mem=64;
> create table aa(a int primary key, b int unique);
> insert into aa select a,a from generate_Series(1,1000000) a;
> delete from aa;
>
> -- try a vacuum with no parallelism
> vacuum (verbose, parallel 0) aa;
>
> system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s
>
> If I did the following instead:
>
> vacuum (verbose) aa;
>
> The vacuum goes parallel and takes a very long time due to launching
> a parallel worker to do one page's worth of tuples. I see the
> following message 4425 times:
>
> INFO:  launched 1 parallel vacuum worker for index vacuuming (planned: 1)
>
> and takes about 30 seconds to complete: system usage: CPU: user: 14.00
> s, system: 0.81 s, elapsed: 30.86 s
>
> Shouldn't the code in parallel_vacuum_compute_workers() try and pick a
> good value for the workers based on the available memory and table
> size when the user does not explicitly specify how many workers they
> want?

I think in your case the min_parallel_index_scan_size threshold
didn't work well. Given that one worker is assigned to one index and
index vacuum time mostly depends on the index size, index size would
be a good criterion for deciding the parallel degree. For example,
even if the table has only one dead item, index vacuuming would take a
long time if the indexes are large, since in common cases (e.g., btree
indexes) we scan the whole index; there we would like to use parallel
index vacuuming. Conversely, even if the table has many dead items,
if its indexes are small (e.g., expression indexes) it would be better
not to use parallel index vacuuming.
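
For illustration, a decision rule along those lines could look like
this toy sketch, where min_parallel_index_scan_size stands in for the
real GUC (measured in blocks) and everything else is hypothetical, not
the actual parallel_vacuum_compute_workers() logic:

/* Count indexes large enough to be worth a dedicated worker; the
 * leader processes one of them itself, hence the "- 1". */
static int
toy_compute_workers(const int *index_nblocks, int nindexes,
                    int min_parallel_index_scan_size)
{
    int nlarge = 0;

    for (int i = 0; i < nindexes; i++)
        if (index_nblocks[i] >= min_parallel_index_scan_size)
            nlarge++;

    return (nlarge > 1) ? nlarge - 1 : 0;
}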

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
On Mon, Mar 10, 2025 at 2:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Regarding the patch, note that lpdead_items is a counter that is
> > never reset during the entire vacuum. Therefore, with
> > maintenance_work_mem = 64kB, once we collect at least one LP_DEAD
> > item, we would perform a cycle of index vacuuming and heap vacuuming
> > for every subsequent block, even for blocks that have no LP_DEAD
> > items. I think we should use vacrel->dead_items_info->num_items
> > instead.
>
> OK, I didn't study the code enough to realise that. My patch was only
> intended as an indication of what I thought. Please feel free to
> proceed with your own patch using the correct field.
>

I've attached the patch. I added minimal regression tests for this.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley
On Tue, 18 Mar 2025 at 05:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached the patch. I added minimal regression tests for this.

I think the change to vacuumlazy.c is ok. The new test you've added
creates a table called pvactst2 but then adds a test that uses the
pvactst table.

Did you mean to skip the DROP TABLE pvactst2;?

Is there a reason to keep the maintenance_work_mem=64 for the
subsequent existing test?

David



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
On Mon, Mar 17, 2025 at 7:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 18 Mar 2025 at 05:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached the patch. I added minimal regression tests for this.
>
> I think the change to vacuumlazy.c is ok. The new test you've added
> creates a table called pvactst2 but then adds a test that uses the
> pvactst table.

Fixed.

> Did you mean to skip the DROP TABLE pvactst2;?

Yes, added DROP TABLE pvactst2.

> Is there a reason to keep the maintenance_work_mem=64 for the
> subsequent existing test?

No, I reset it immediately after the tests for pvactst2.

I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: David Rowley
On Tue, 18 Mar 2025 at 19:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached the updated patch.

Looks good to me.

David



Re: maintenance_work_mem = 64kB doesn't work for vacuum

From: Masahiko Sawada
On Mon, Mar 17, 2025 at 11:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 18 Mar 2025 at 19:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached the updated patch.
>
> Looks good to me.

Thank you for reviewing the patch. Pushed (backpatched to v17).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com