Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: cost based vacuum (parallel)
Date
Msg-id CAA4eK1Jft_zLKJ92yK+mgkME2ar-9fnFcO7cdV3C6TLWb6Zpng@mail.gmail.com
In response to Re: cost based vacuum (parallel)  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: cost based vacuum (parallel)  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Fri, Nov 8, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have done some experiments along this line.  I have first produced a
> case where we can show the problem with the existing shared costing
> patch (a worker which is doing less I/O might pay the penalty on
> behalf of a worker who is doing more I/O).  I have also hacked
> Sawada-san's shared costing patch so that a worker only goes to sleep
> if the shared balance has crossed the limit and its local balance has
> crossed some threshold[1].
>
> Test setup:  I have created 4 indexes on the table.  Three of the
> indexes have a lot of pages to process but need to dirty only a few
> pages, whereas the 4th index has very few pages to process but needs
> to dirty all of them.  I have attached the test script along with the
> mail.  I have shown the delay time of each worker, the total I/O[1]
> of each worker, and the page hit, page miss, and page dirty counts.
> [1] total I/O = _nhit * VacuumCostPageHit + _nmiss *
> VacuumCostPageMiss + _ndirty * VacuumCostPageDirty
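
For reference, the quoted total-I/O formula, evaluated with PostgreSQL's
default cost parameters at the time (vacuum_cost_page_hit = 1,
vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20), reproduces the
per-worker totals in the tables below:

```python
# The quoted total-I/O formula with PostgreSQL's default cost parameters
# (vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10,
# vacuum_cost_page_dirty = 20).

VACUUM_COST_PAGE_HIT = 1
VACUUM_COST_PAGE_MISS = 10
VACUUM_COST_PAGE_DIRTY = 20

def total_io(nhit, nmiss, ndirty):
    return (nhit * VACUUM_COST_PAGE_HIT
            + nmiss * VACUUM_COST_PAGE_MISS
            + ndirty * VACUUM_COST_PAGE_DIRTY)

# Workers 0-2 below: 17891 hits + 2 dirtied pages  -> 17931
# Worker 3:           4318 hits + 603 dirtied pages -> 16378
```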
>
> patch 1: Shared costing patch:  (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit)
> worker 0 delay=80.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=40.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=120.98 total I/O=16378 hit=4318   miss=0 dirty=603
>
> Observation1:  I think here it's clearly visible that worker 3 is
> doing the least total I/O but is delaying for the maximum amount of
> time.  OTOH, worker 1 is delaying for very little time compared to
> how much I/O it is doing.  To solve this problem, I have added a
> small tweak to the patch, wherein a worker will only sleep if its
> local balance has also crossed some threshold.  And we can see that
> with that change the problem is solved to quite an extent.
>
> patch 2: Shared costing patch: (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
> VacuumCostLimit/number of workers)
> worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=90.00   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=80.06   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=80.72   total I/O=16378 hit=4318 miss=0 dirty=603
>
> Observation2:  This patch solves the problem discussed with patch1,
> but in some extreme cases there is a possibility that the shared
> balance can become twice as much as the limit and still no worker
> goes for the delay.  For solving that there could be multiple ideas:
> a) Set a max limit on the shared balance, e.g. 1.5 * VacuumCostLimit,
> after which we delay whoever tries to do I/O irrespective of its
> local balance.
> b) Set a slightly lower value for the local threshold, e.g. 50% of
> the local limit.
>
> Here I have changed patch2 as per (b): if the local balance reaches
> 50% of the local limit and the shared balance hits the vacuum cost
> limit, then go for the delay.
>
> patch 3: Shared costing patch: (delay condition ->
> VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
> * VacuumCostLimit/number of workers)
> worker 0 delay=70.03   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
> worker 2 delay=80.01   total I/O=17931 hit=17891 miss=0 dirty=2
> worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603
>
> Observation3:  I think patch3 doesn't completely solve the issue
> discussed with patch1, but it is far better than patch1.
>
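
For clarity, the three delay predicates being compared can be written as
a small sketch (Python for illustration; the names mirror the quoted
conditions, not the actual patch code):

```python
# Sketch of the three delay predicates described in the quoted mail.
# Variable names mirror the quoted conditions (shared balance, local
# balance, cost limit), not any actual patch code.

def should_delay(patch, shared_balance, local_balance, cost_limit, nworkers):
    """Return True if a worker should sleep under the given variant."""
    if patch == 1:
        # patch 1: the shared balance alone decides.
        return shared_balance > cost_limit
    if patch == 2:
        # patch 2: also require the worker's own balance to exceed its
        # equal share of the limit.
        return (shared_balance > cost_limit
                and local_balance > cost_limit / nworkers)
    if patch == 3:
        # patch 3: same, but with the local threshold halved.
        return (shared_balance > cost_limit
                and local_balance > 0.5 * cost_limit / nworkers)
    raise ValueError(patch)
```

With the default vacuum_cost_limit of 200 and 4 workers, a worker that
has accumulated only 10 units locally sleeps under patch 1 but not under
patch 2 or patch 3, which is the behavior change Observation1 asks for.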

Yeah, I think it is difficult to get the exact balance, but we can try
to be as close as possible.  We can try to play with the threshold, and
another possibility is to sleep in proportion to the amount of I/O done
by each worker.
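
One way to read the proportional-sleep idea (purely illustrative; the
function name and the scaling are assumptions for discussion, not code
from any posted patch):

```python
# Illustrative sketch of "sleep in proportion to the I/O done by the
# worker".  The name and scaling are assumptions, not posted patch code.

def proportional_sleep_ms(shared_balance, local_balance,
                          cost_limit, cost_delay_ms):
    """Once the shared balance crosses the limit, split the total sleep
    among workers according to each one's share of the I/O."""
    if shared_balance <= cost_limit:
        return 0.0
    # Total sleep the group owes, as in the serial case: the balance
    # expressed in multiples of the limit, times the base delay.
    group_sleep = cost_delay_ms * (shared_balance / cost_limit)
    # This worker pays in proportion to the I/O it actually did.
    return group_sleep * (local_balance / shared_balance)
```

Under this scheme a worker that contributed 75% of the I/O sleeps three
times as long as one that contributed 25%, instead of paying a penalty
unrelated to its own activity.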

Thanks for doing these experiments, but I think it is better if you
can share the modified patches so that others can also reproduce what
you are seeing.  There is no need to post the entire parallel vacuum
patch-set; the costing-related patch can be posted with a reference to
which patches are required from the parallel vacuum thread.  Another
option is to move this discussion to the parallel vacuum thread, but I
think it is better to decide the costing model here.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


