Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Block level parallel vacuum
Msg-id CAA4eK1JwdUemQauSB-e==aGNAN7pr73GL=Pc9xpyhbAVm+K51w@mail.gmail.com
In response to Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Block level parallel vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: [HACKERS] Block level parallel vacuum  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I think that the two approaches make parallel vacuum workers wait in
> different ways: in approach (a) the vacuum delay works as if vacuum is
> performed by a single process, whereas in approach (b) the
> vacuum delay works for each worker independently.
>
> Suppose that the total number of blocks to vacuum is 10,000,
> the cost per block is 10, the cost limit is 200 and the sleep time is 5
> ms. In a single-process vacuum the total sleep time is 2,500 ms (=
> (10,000 * 10 / 200) * 5). Approach (a) gives the same 2,500 ms,
> because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach (b), since the cost limit is divided evenly, the limit for each
> worker is 40 (e.g. with a parallel degree of 5). Supposing each worker
> processes blocks evenly, the total sleep time of all workers is
> 12,500 ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach (b) by dividing the total value by
> the number of parallel workers.
>
> IOW, approach (b) makes parallel vacuum delay much more than normal
> vacuum and parallel vacuum with approach (a), even with the same
> settings. Which behavior do we expect?
>
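
To make the arithmetic in the quoted example easy to check, here is a
minimal sketch in Python. It is not PostgreSQL code: the helper name
total_sleep_ms and the constants are hypothetical, and it assumes one
sleep of the full delay each time the accumulated cost reaches the limit.

```python
def total_sleep_ms(blocks, cost_per_block, cost_limit, delay_ms):
    """Sleep time accumulated by one process, assuming one nap of
    delay_ms each time the accumulated cost reaches cost_limit."""
    naps = (blocks * cost_per_block) // cost_limit
    return naps * delay_ms


# Numbers taken from the quoted example.
BLOCKS = 10_000
COST_PER_BLOCK = 10
COST_LIMIT = 200
DELAY_MS = 5
WORKERS = 5

# Single-process vacuum.
single = total_sleep_ms(BLOCKS, COST_PER_BLOCK, COST_LIMIT, DELAY_MS)

# Approach (a): workers share one balance, so the number of naps is
# the same as in the single-process case.
approach_a = single

# Approach (b): the limit is split evenly and each worker is assumed
# to process an equal share of the blocks.
per_worker_b = total_sleep_ms(
    BLOCKS // WORKERS, COST_PER_BLOCK, COST_LIMIT // WORKERS, DELAY_MS
)
approach_b_summed = per_worker_b * WORKERS

print(single, approach_a, per_worker_b, approach_b_summed)
```

Under these assumptions the summed figure for approach (b) is five times
the single-process figure, while each individual worker sleeps for the
same 2,500 ms as the single process.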

Yeah, this is an important thing to decide.  I don't think that the
conclusion you are drawing is correct, because if that is true then the
same applies to the current autovacuum work division, where we divide
the cost_limit among workers but the cost_delay is the same (see
autovac_balance_cost).  Basically, if we consider the delay time of
each worker independently, then it would appear that the parallel vacuum
delay with approach (b) is larger, but that is true only if the workers
run serially, which is not the case.
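
The difference between summing the per-worker delays and the delay that
actually elapses can be sketched as follows. This is only an
illustration in Python with hypothetical names, assuming that all
workers start together, that each processes an equal share of blocks,
and that the elapsed delay is that of the slowest worker.

```python
# Approach (b) with the numbers from the example above: 5 workers,
# 2,000 blocks each, cost 10 per block, per-worker limit 200 / 5 = 40,
# and a 5 ms nap each time a worker's own balance reaches its limit.
WORKERS = 5
BLOCKS_PER_WORKER = 2_000
COST_PER_BLOCK = 10
PER_WORKER_LIMIT = 200 // WORKERS
DELAY_MS = 5

per_worker_sleep_ms = [
    (BLOCKS_PER_WORKER * COST_PER_BLOCK) // PER_WORKER_LIMIT * DELAY_MS
    for _ in range(WORKERS)
]

# Adding the delays up treats the workers as if they ran one after
# another.
summed_ms = sum(per_worker_sleep_ms)

# If the workers run concurrently, their naps overlap, so the delay
# seen on the wall clock is that of the slowest worker.
elapsed_ms = max(per_worker_sleep_ms)

print(summed_ms, elapsed_ms)
```

With these inputs the summed value is 12,500 ms but the elapsed value is
2,500 ms, the same as the single-process case.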

> I thought the vacuum delay for
> parallel vacuum should work as if it's a single-process vacuum, as we
> did for memory usage. I might be missing something. If we prefer
> approach (b), I should change the patch so that the leader process
> divides the cost limit evenly.
>

I am also not completely sure which approach is better, but I slightly
lean towards approach (b).  I think we need input from some other
people as well.  I will start a separate thread to discuss this and
see if that helps to get input from others.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


