On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Regarding that patch, we need to note that lpdead_items is a counter
> that is never reset during the entire vacuum. Therefore, with
> maintenance_work_mem = 64kB, once we collect at least one lpdead item,
> we perform a cycle of index vacuuming and heap vacuuming for every
> subsequent block, even if it doesn't have any lpdead items. I think we
> should use vacrel->dead_items_info->num_items instead.
OK, I didn't study the code enough to realise that. My patch was only
intended as a rough indication of what I had in mind. Please feel free
to proceed with your own patch using the correct field.
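Just so we're talking about the same thing, I read the correction as
testing the per-cycle count rather than the cumulative one, i.e.
something roughly like the below. This is a sketch from memory only,
not a revised patch, so the exact condition and where it lives may not
match the current code:

/*
 * Sketch only: vacrel->dead_items_info->num_items counts the dead items
 * collected since the last index/heap vacuuming cycle, whereas
 * vacrel->lpdead_items keeps accumulating for the whole VACUUM and so
 * stays non-zero forever once the first LP_DEAD item is seen.
 */
if (vacrel->dead_items_info->num_items > 0 &&
    TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
    lazy_vacuum(vacrel);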
When playing with parallel vacuum, I also wondered if there should be
some heuristic that avoids parallel vacuum when maintenance_work_mem is
set far too low, unless the user specifically asked for it in the
VACUUM command.
Take the following case as an example:
set maintenance_work_mem=64;
create table aa(a int primary key, b int unique);
insert into aa select a,a from generate_series(1,1000000) a;
delete from aa;
-- try a vacuum with no parallelism
vacuum (verbose, parallel 0) aa;
system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s
If I did the following instead:
vacuum (verbose) aa;
The vacuum goes parallel and takes a very long time because it launches
a parallel worker for each index vacuuming cycle, and each cycle covers
only about one page worth of tuples (the table is roughly 4425 heap
pages, so that works out to about one cycle per heap page). I see the
following message 4425 times:
INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)
and the whole vacuum takes about 30 seconds to complete:
system usage: CPU: user: 14.00 s, system: 0.81 s, elapsed: 30.86 s
Shouldn't the code in parallel_vacuum_compute_workers() try to pick a
good number of workers based on the available memory and the table size
when the user does not explicitly specify how many workers they want?
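To give a rough idea of the sort of gate I mean, something like the
below. This is entirely untested and not written against the actual
parallel_vacuum_compute_workers() signature; the helper name, the way
the dead-item capacity is estimated and the 1024-page threshold are all
made up for illustration:

static bool
parallel_vacuum_is_worthwhile(Relation rel, int nrequested_workers)
{
    BlockNumber relpages = RelationGetNumberOfBlocks(rel);
    uint64      max_dead_items;

    /* If the user explicitly asked for workers, don't second-guess them. */
    if (nrequested_workers > 0)
        return true;

    /*
     * Estimate how many dead item pointers fit in maintenance_work_mem
     * (which is in kB).  This ignores TidStore overheads, so it's only a
     * ballpark figure.
     */
    max_dead_items = (uint64) maintenance_work_mem * 1024 / sizeof(ItemPointerData);

    /*
     * Only go parallel if a single index vacuuming cycle can, even in the
     * worst case of every line pointer being dead, cover a reasonable chunk
     * of the table.  The 1024-page threshold is plucked out of the air.
     */
    return max_dead_items / MaxHeapTuplesPerPage >= Min(relpages, 1024);
}

With the example above, 64kB only holds a few thousand TIDs, so a single
cycle can never cover more than a few dozen heap pages, and a gate like
this would keep the vacuum non-parallel unless the user asked for
workers explicitly.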
David