Re: [HACKERS] Block level parallel vacuum

From: Dilip Kumar
Subject: Re: [HACKERS] Block level parallel vacuum
Msg-id: CAFiTN-u4nMk-q81aL9w4Rara09KWOrsaZk-wwCjMGM5QFdgDPA@mail.gmail.com
In response to: Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers

On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > >
> > > > > > > > I finally got your point and now understand the need.  And the idea I
> > > > > > > > proposed doesn't work well.
> > > > > > > >
> > > > > > > > So you meant that all workers share the cost count, and if a parallel
> > > > > > > > vacuum worker increases the cost and it reaches the limit, only that
> > > > > > > > one worker sleeps? Is that okay even though other parallel workers
> > > > > > > > are still running, so the sleep might not help?
> > > > > > > >
> > > > > >
> > > > > > Remember that the other running workers will also increase
> > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > make sure that overall throttling works the same?
> > > > > >
> > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > workers who are doing heavy I/O continue to work, while OTOH other
> > > > > > > workers who are doing very little I/O might become the victims and
> > > > > > > unnecessarily delay their operations.
> > > > > > >
> > > > > >
> > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > limit we want to provide for the overall vacuum operation will still
> > > > > > be the same.  Also, doesn't a similar thing happen now as well, where
> > > > > > the heap might have done a major portion of the I/O but, soon after
> > > > > > we start vacuuming the index, we hit the limit and sleep?
> > > > >
> > > > > Actually, what I meant is that the worker who is performing actual I/O
> > > > > might not go for the delay, and another worker which has done only CPU
> > > > > work might pay the penalty.  So basically the worker doing the
> > > > > CPU-intensive work might go for the delay and pay the penalty, while
> > > > > the worker performing the actual I/O continues to work and does
> > > > > further I/O.  Do you think this is not a practical problem?
> > > > >
> > > >
> > > > I don't know.  Generally, we try to delay (if required) before
> > > > processing (read/write) one page which means it will happen for I/O
> > > > intensive operations, so I am not sure if the point you are making is
> > > > completely correct.
> > >
> > > Ok, I agree with the point that we are checking it only when we are
> > > doing the I/O operation.  But we also need to consider that each I/O
> > > operation has a different weight.  So even if we have a delay point at
> > > the I/O operation, there is a possibility that we might delay the
> > > worker which is just reading a buffer with a page hit
> > > (VacuumCostPageHit), while the other worker, who is actually
> > > dirtying pages (VacuumCostPageDirty = 20), continues to work and does
> > > more I/O.
> > >
> > > >
> > > > > Stepping back a bit, OTOH, I think that we cannot guarantee that the
> > > > > one worker who has done more I/O will continue to do further I/O, or
> > > > > that the one which has not done much I/O will not perform more I/O in
> > > > > the future.  So it might not be too bad if we compute shared costs as
> > > > > you suggested above.
> > > > >
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea.  I can come up with a POC patch for approach
> > > (b).  Meanwhile, if someone is interested in quickly hacking on
> > > approach (a), then we can do some testing and compare.  Sawada-san,
> > > by any chance would you be interested in writing a POC for approach
> > > (a)?  Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (b).
> >
> > The idea is:
> > 1) Before launching the workers, divide the current VacuumCostBalance
> > among the workers so that each worker starts accumulating its balance
> > from that point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the workers are done with the index vacuum, they send back
> > their remaining balance to the leader.
> > 4) The leader sums all the balances and adds that to its current
> > VacuumCostBalance, and starts accumulating its balance from this
> > point.
> >
> > I was trying to test the behaviour of the vacuum I/O limit, but I
> > could not find an easy way to test that, so I just put a tracepoint
> > in the code and checked at what point we are introducing the delay.
> > I also printed the cost balance at various points to see after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
> > I have printed these logs for the parallel vacuum patch (v30) vs. v30 +
> > the patch for dividing the I/O limit (attached with this mail).
> >
> > Note: Patch and the test results are attached.
> >
>
> Thank you!
>
> For approach (a), the basic idea I've come up with is that we have a
> shared balance value on DSM, and each worker, including the leader
> process, adds its local balance value to it in vacuum_delay_point;
> workers then sleep based on the shared value. I'll submit that patch
> with other updates.

IMHO, if we only add the local balance to the shared balance in
vacuum_delay_point, and each worker otherwise works against the full
limit, then there will be a problem, right?  Suppose VacuumCostLimit is
2000: each worker will only reach its delay in vacuum_delay_point once
its own local balance is 2000, so in most cases the first delay will be
hit when the gross I/O cost is around 6000 (if there are 3 workers), as
the toy model below illustrates.
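
Here is a toy, standalone C model of that (purely illustrative, not from
any posted patch): three workers each keep a private balance and compare
it against the full limit, so nobody sleeps until one private balance
reaches 2000, by which point the combined cost is already close to 6000.

#include <stdio.h>

int
main(void)
{
    const int   nworkers = 3;
    const int   limit = 2000;           /* full VacuumCostLimit, not divided */
    int         balance[] = {0, 0, 0};  /* one private balance per worker */
    int         gross = 0;              /* total cost accrued by everybody */

    /* Round-robin: every worker keeps dirtying pages (cost 20 per page). */
    for (;;)
    {
        for (int w = 0; w < nworkers; w++)
        {
            balance[w] += 20;
            gross += 20;
            if (balance[w] >= limit)
            {
                /* The first sleep happens only here, at gross cost 5960. */
                printf("first sleep by worker %d at gross cost %d\n", w, gross);
                return 0;
            }
        }
    }
}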

I think if we want to have shared accounting then we must always
accumulate the balance in a shared variable, so that as soon as the
gross balance hits VacuumCostLimit we can take the delay.
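
A minimal sketch of what the delay check could then look like, assuming
every cost increment already goes straight into one gross counter in
shared memory (roughly what points 1-3 below set up).  The name
VacuumSharedCostBalance, the reset policy, and the simplified sleep are
mine, purely illustrative:

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

/* Illustrative: one gross counter for the whole vacuum, living in DSM. */
extern pg_atomic_uint32 *VacuumSharedCostBalance;

void
vacuum_delay_point(void)
{
    uint32      gross;

    if (!VacuumCostActive || InterruptPending)
        return;

    /* Every participant compares the gross balance against the full limit. */
    gross = pg_atomic_read_u32(VacuumSharedCostBalance);
    if (gross >= (uint32) VacuumCostLimit)
    {
        /* Whoever notices the overrun pays the sleep (the real code scales
         * the sleep by the amount of overrun; that is elided here) ... */
        pg_usleep((long) (VacuumCostDelay * 1000));

        /* ... and takes only what it saw back out of the counter, so cost
         * added concurrently by other workers is not thrown away. */
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, gross);
    }
}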

Maybe we can do this:
1. Change VacuumCostBalance from an integer to a pg_atomic_uint32 *.
2. In the heap_parallel_vacuum_main function, make this pointer point to
a shared memory location.  Basically, for the non-parallel case it will
point to a process-specific global variable, whereas in the parallel
case it will point to a shared memory variable.
3. Wherever we are using VacuumCostBalance in the code (5-6 occurrences,
I think), change those places to use atomic operations.  A rough sketch
of 1-3 is below.
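
A rough, uncompiled sketch of those three points; the LVShared layout,
the TOC key, and the helper names here are placeholders for whatever the
parallel vacuum patch actually uses:

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"
#include "storage/dsm.h"
#include "storage/shm_toc.h"

/* Placeholder for the patch's shared state in DSM. */
typedef struct LVShared
{
    pg_atomic_uint32 cost_balance;      /* the gross balance */
    /* ... other shared fields ... */
} LVShared;

#define PARALLEL_VACUUM_KEY_SHARED  UINT64CONST(0xFFFF0001) /* placeholder */

/* 1. The balance becomes a pointer to an atomic counter ... */
pg_atomic_uint32 *VacuumCostBalance;

/* ... which, outside parallel vacuum, points at process-local storage. */
static pg_atomic_uint32 LocalVacuumCostBalance;

void
VacuumCostBalanceInit(void)
{
    pg_atomic_init_u32(&LocalVacuumCostBalance, 0);
    VacuumCostBalance = &LocalVacuumCostBalance;
}

/* 2. In the parallel worker entry point, retarget it at the DSM copy. */
void
heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
    LVShared   *lvshared;

    lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
                                           false);
    VacuumCostBalance = &lvshared->cost_balance;

    /* ... then do the index-vacuum work as the patch already does ... */
}

/* 3. The handful of places that currently do "VacuumCostBalance += x"
 * (mostly in bufmgr.c) become atomic additions, for example: */
static void
account_page_dirty(void)
{
    pg_atomic_add_fetch_u32(VacuumCostBalance, VacuumCostPageDirty);
}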

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


