Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From: Mahendra Singh
Subject: Re: [HACKERS] Block level parallel vacuum
Msg-id: CAKYtNAo4SeCYSRHNZZJDorro4p-BSFFjZkRueeqh+RdsB=PSCA@mail.gmail.com
In response to: Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi All,
I did some performance testing, with Dilip's help, to compare normal vacuum and parallel vacuum. Below is the test summary.

Configuration settings:
autovacuum = off
shared_buffers = 2GB
max_parallel_maintenance_workers = 6
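
For reference, here is a minimal psql sketch of how each case can be timed (this assumes \timing is used to capture the elapsed time and that the table is dropped, re-created, and re-populated before each case; the exact method may differ):

\timing on
vacuum test;                  -- case 1: normal vacuum
vacuum (parallel 1) test;     -- case 2: parallel degree 1
vacuum (parallel 3) test;     -- case 3: parallel degree 3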

Test 1: (table has 4 indexes on all tuples; alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a);
create index i2 on test (b);
create index i3 on test (c);
create index i4 on test (d);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a % 2 = 0;

case 1: (run normal vacuum)
vacuum test;
1019.453 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
765.366 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
555.227 ms

From the above results, we can conclude that parallel vacuum improves performance when the indexes are large.
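
To get a sense of how large these indexes are, their sizes can be checked with a query like the following (a minimal sketch; i1..i4 are the indexes created above):

select relname, pg_size_pretty(pg_relation_size(oid)) as index_size
from pg_class
where relname in ('i1', 'i2', 'i3', 'i4')
order by relname;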

Test 2: (table has 16 indexes and the indexes are small; alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a) where a < 100000;
create index i2 on test (a) where a > 100000 and a < 200000;
create index i3 on test (a) where a > 200000 and a < 300000;
create index i4 on test (a) where a > 300000 and a < 400000;
create index i5 on test (a) where a > 400000 and a < 500000;
create index i6 on test (a) where a > 500000 and a < 600000;
create index i7 on test (b) where a < 100000;
create index i8 on test (c) where a < 100000;
create index i9 on test (d) where a < 100000;
create index i10 on test (d) where a < 100000;
create index i11 on test (d) where a < 100000;
create index i12 on test (d) where a < 100000;
create index i13 on test (d) where a < 100000;
create index i14 on test (d) where a < 100000;
create index i15 on test (d) where a < 100000;
create index i16 on test (d) where a < 100000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a % 2 = 0;

case 1: (run normal vacuum)
vacuum test;
649.187 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
492.075 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
435.581 ms

For small indexes also, we gained some performance with parallel vacuum.

I will continue my testing for stats collection.
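
As a rough sketch of what that stats check could look like (this assumes the standard pg_stat_user_tables and pg_stat_user_indexes views and the same test table as above; the actual checks may differ), the idea is to compare the collected statistics after a normal vacuum and after a parallel vacuum:

vacuum (parallel 3) test;
select relname, n_live_tup, n_dead_tup, vacuum_count
from pg_stat_user_tables
where relname = 'test';
select indexrelname, idx_scan, idx_tup_read
from pg_stat_user_indexes
where relname = 'test';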

Please let me know if anybody has any suggestions for other testing (what else should be tested).
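
One possible extra scenario, in line with Dilip's note below about testing different types of indexes, would be a table with a mix of index types; a rough sketch (table, index names, and data here are arbitrary):

create table test2(a int, b int[], c int);
create index t2_btree on test2 using btree (a);
create index t2_hash on test2 using hash (a);
create index t2_gin on test2 using gin (b);
create index t2_brin on test2 using brin (c);
insert into test2 select i, array[i], i from generate_series(1,1000000) as i;
delete from test2 where a % 2 = 0;
vacuum (parallel 3) test2;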

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker.  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with same-size indexes and the
> second with different-size indexes, and measured the total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                     Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay (ms)    1784             1398               1938
>
>
> Test 2 (different numbers of dead tuples per index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                     Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay (ms)    1438             1029               1529
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide patch the problem is solved, but the delay is a
> bit more compared to the non-parallel version.  The reason could be the
> problem discussed at [2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that the v31-0006 patch doesn't work correctly, so I've attached
an updated version that also incorporates some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.

Regards,

--
Masahiko Sawada
