Re: cost based vacuum (parallel) - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: cost based vacuum (parallel) |
Date | |
Msg-id | CAFiTN-vxbWLU-R73uE9eb12QxqkiE1hZCQLjmb_Ob3iPY1Fr-w@mail.gmail.com |
In response to | Re: cost based vacuum (parallel) (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: cost based vacuum (parallel) |
List | pgsql-hackers |
On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 8, 2019 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have done some experiments on this line.  I have first produced a
> > case where we can show the problem with the existing shared costing
> > patch (a worker which is doing less I/O might pay the penalty on behalf
> > of a worker which is doing more I/O).  I have also hacked the shared
> > costing patch of Sawada-san so that a worker only goes for the sleep if
> > the shared balance has crossed the limit and its local balance has
> > crossed some threshold[1].
> >
> > Test setup: I have created 4 indexes on the table.  Out of these, 3
> > indexes have a lot of pages to process but need to dirty only a few
> > pages, whereas the 4th index has very few pages to process but needs to
> > dirty all of them.  I have attached the test script along with the mail.
> > I have shown the delay time for each worker, the total I/O[1] for each
> > worker, and the page hit, page miss and page dirty counts.
> > [1] total I/O = _nhit * VacuumCostPageHit + _nmiss * VacuumCostPageMiss
> > + _ndirty * VacuumCostPageDirty
> >
> > patch 1: Shared costing patch: (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit)
> > worker 0 delay=80.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=40.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=110.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=120.98 total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > Observation1: I think it is clearly visible here that worker 3 is doing
> > the least total I/O but delaying for the maximum amount of time.  OTOH,
> > worker 1 is delaying for very little time compared to how much I/O it
> > is doing.  For solving this problem, I have added a small tweak to the
> > patch, wherein the worker will only sleep if its local balance has
> > crossed some threshold.  And we can see that with that change the
> > problem is solved to quite an extent.
> >
> > patch 2: Shared costing patch: (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance >
> > VacuumCostLimit/number of workers)
> > worker 0 delay=100.12 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=90.00 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=80.06 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=80.72 total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > Observation2: This patch solves the problem discussed with patch1, but
> > in some extreme cases the shared balance can become twice as much as
> > the limit and still no worker goes for the delay.  For solving that
> > there could be multiple ideas:
> > a) Set a max limit on the shared balance, e.g. 1.5 * VacuumCostLimit;
> > after that we will give the delay to whoever tries to do the I/O,
> > irrespective of its local balance.
> > b) Set a somewhat lower value for the local threshold, e.g. 50% of the
> > local limit.
> >
> > Here I have changed patch2 as per (b): if the local balance reaches 50%
> > of the local limit and the shared balance hits the vacuum cost limit,
> > then go for the delay.
> >
> > patch 3: Shared costing patch: (delay condition ->
> > VacuumSharedCostBalance > VacuumCostLimit && VacuumLocalBalance > 0.5
> > * VacuumCostLimit/number of workers)
> > worker 0 delay=70.03 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 1 delay=100.14 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 2 delay=80.01 total I/O=17931 hit=17891 miss=0 dirty=2
> > worker 3 delay=101.03 total I/O=16378 hit=4318 miss=0 dirty=603
> >
> > Observation3: I think patch3 doesn't completely solve the issue
> > discussed in patch1, but it's far better than patch1.
> >
>
> Yeah, I think it is difficult to get the exact balance, but we can try
> to be as close as possible.  We can try to play with the threshold and
> another possibility is to try to sleep in proportion to the amount of
> I/O done by the worker.

I have done another experiment where I have made another 2 changes on top
of patch3:
a) Only reduce the local balance from the total shared balance when the
worker is applying the delay.
b) Compute the delay based on the local balance.

patch4:
worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603

I think with this approach the delay is divided among the workers quite
well compared to the other approaches.

> Thanks for doing these experiments, but I think it is better if you
> can share the modified patches so that others can also reproduce what
> you are seeing.  There is no need to post the entire parallel vacuum
> patch-set, but the costing related patch can be posted with a
> reference to what all patches are required from the parallel vacuum
> thread.  Another option is to move this discussion to the parallel
> vacuum thread, but I think it is better to decide the costing model
> here.

I have attached the POC patches I have for testing.

Steps for testing:
1. First, apply the parallel vacuum base patch and the shared costing patch[1].
2. Apply 0001-vacuum_costing_test.patch attached with the mail.
3. Run the script shared in the previous mail [2] --> this will give the
results for patch 1 shared upthread[2].
4. Apply shared_costing_plus_patch[2] or [3] or [4] to see the results with
the different approaches explained in the mail.

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFiTN-tFLN%3Dvdu5Ra-23E9_7Z1JXkk5MkRY3Bkj2zAoWK7fULA%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
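For reference, the "total I/O" figures above are consistent with the default cost
parameters (vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10,
vacuum_cost_page_dirty = 20): for worker 3, 4318 * 1 + 0 * 10 + 603 * 20 = 16378.
The following is a minimal sketch, not taken from the posted patches, of how a
worker's delay decision could combine the shared and the local balance along the
lines of patch 3, with the patch 4 refinements of delaying in proportion to, and
charging back only, the worker's own balance.  VacuumSharedCostBalance,
VacuumCostBalanceLocal and nworkers are assumed names here, not necessarily those
used in the patches.

/*
 * Sketch under the assumptions stated above -- not the actual patch code.
 * VacuumCostActive, VacuumCostLimit and VacuumCostDelay are the existing
 * cost-based vacuum globals; the three extern declarations below are
 * assumed names for the shared and per-worker state.
 */
#include "postgres.h"

#include "miscadmin.h"
#include "port/atomics.h"

extern pg_atomic_uint32 *VacuumSharedCostBalance;  /* assumed: lives in DSM */
extern int  VacuumCostBalanceLocal;                /* assumed: per-worker balance */
extern int  nworkers;                              /* assumed: participating workers */

static void
parallel_vacuum_delay_point(void)
{
    double  local_threshold = 0.5 * VacuumCostLimit / nworkers;
    uint32  shared_balance;

    if (!VacuumCostActive)
        return;

    shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);

    /* Sleep only if the group and this worker have both crossed their thresholds. */
    if (shared_balance > (uint32) VacuumCostLimit &&
        VacuumCostBalanceLocal > local_threshold)
    {
        /* Delay in proportion to this worker's own I/O (patch 4, item b). */
        double  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;

        msec = Min(msec, VacuumCostDelay * 4);
        pg_usleep((long) (msec * 1000));

        /*
         * Charge back only this worker's contribution to the shared balance
         * (patch 4, item a) and reset the local balance.
         */
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
        VacuumCostBalanceLocal = 0;
    }
}

The 0.5 factor is the 50% local threshold chosen for patch 3; the alternative
mentioned under Observation2 is to instead cap the shared balance (e.g. at
1.5 * VacuumCostLimit) and delay whichever worker does I/O beyond that.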