Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: cost based vacuum (parallel)
Date
Msg-id CAFiTN-snOhg0HfENP6Zw0QM6k-HmoL-esgj7HiZo2BfQjA2NvA@mail.gmail.com
In response to Re: cost based vacuum (parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: cost based vacuum (parallel)
List pgsql-hackers
On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > For parallel vacuum [1], we were discussing the best way to divide
> > > > the cost among parallel workers, but we didn't get much input apart
> > > > from the people who are very actively involved in the patch's
> > > > development. I feel that we need some more input before we finalize
> > > > anything, so I am starting a new thread.
> > > >
> > >
> > > Maybe I just don't have experience with the type of system that
> > > parallel vacuum is needed for, but if there is any meaningful IO
> > > throttling active, then what is the point of doing the vacuum in
> > > parallel in the first place?
> >
> > I am wondering the same - but to be fair, it's pretty easy to run into
> > cases where VACUUM is CPU bound, e.g. because most pages are in
> > shared_buffers and, compared to the size of the indexes, the number of
> > tids that need to be pruned is fairly small (also [1]). That means a
> > lot of pages need to be scanned without a whole lot of IO going on.
> > The problem with that is that the defaults for vacuum throttling will
> > also apply here; I've never seen anybody tune vacuum_cost_page_hit = 0,
> > vacuum_cost_page_dirty = 0 or such (in contrast, the latter is
> > currently the highest cost).  Nor do we reduce the cost of
> > vacuum_cost_page_dirty for unlogged tables.
> >
> > So while it doesn't seem unreasonable to want to use cost limiting to
> > protect against vacuum unexpectedly causing too much IO, especially
> > read IO, I'm doubtful it currently has much practical relevance.
> >
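
(To make the settings Andres mentions concrete: these are ordinary GUCs,
so the tuning he says nobody does would look roughly like the sketch
below; the zero values are only illustrative, not a recommendation.)

set vacuum_cost_page_hit = 0;    -- default 1: cost of a page found in shared_buffers
set vacuum_cost_page_dirty = 0;  -- default 20: currently the highest page cost
-- with a non-zero vacuum_cost_delay (or autovacuum_vacuum_cost_delay),
-- these per-page costs are what throttle vacuum; zeroing them stops
-- buffer hits and dirtied pages from counting toward vacuum_cost_limit
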
>
> IIUC, you mean to say that there is not much practical use in doing a
> parallel vacuum if I/O throttling is enabled for the operation, is
> that right?
>
>
> > I'm wondering how much of the benefit of parallel vacuum really is just
> > to work around vacuum ringbuffers often massively hurting performance
> > (see e.g. [2]).
> >
>
> Yeah, it is a good thing to check, but if anything, I think a parallel
> vacuum will further improve the performance with larger ring buffers
> as it will make it more CPU bound.
I have tested this, and the results show that increasing the ring
buffer size gives a performance gain, and the gain is much larger with
a parallel vacuum.

Test case:
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index idx1 on test(a);
create index idx2 on test(b);
create index idx3 on test(c);
create index idx4 on test(d);
create index idx5 on test(e);
create index idx6 on test(f);
create index idx7 on test(g);
create index idx8 on test(h);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a < 300000;

(I have tested the parallel vacuum and the non-parallel vacuum with
different ring buffer sizes.)
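
Concretely, the invocations were along these lines (a sketch, assuming
the VACUUM (PARALLEL n) syntax from the patch under discussion, where
PARALLEL 0 disables parallelism):

vacuum (parallel 2) test;   -- index vacuuming with 2 parallel workers
vacuum (parallel 0) test;   -- non-parallel baseline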

8 indexes
ring buffer size 246kB -> non-parallel: 7.6 seconds   parallel (2 workers): 3.9 seconds
ring buffer size 256MB -> non-parallel: 6.1 seconds   parallel (2 workers): 3.2 seconds

4 indexes
ring buffer size 246kB -> non-parallel: 4.8 seconds   parallel (2 workers): 3.2 seconds
ring buffer size 256MB -> non-parallel: 3.8 seconds   parallel (2 workers): 2.6 seconds
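
(Note for anyone reproducing this: at the time of this test the ring
buffer size used by VACUUM is fixed inside the backend, so the 256MB
runs needed a source-level tweak. On PostgreSQL versions that have the
BUFFER_USAGE_LIMIT vacuum option (16 and later), the same comparison
could be expressed per command, roughly as below; this is only a sketch
of the idea, not what was run here.)

vacuum (parallel 2, buffer_usage_limit '256kB') test;  -- small ring
vacuum (parallel 2, buffer_usage_limit '256MB') test;  -- large ring
vacuum (parallel 0, buffer_usage_limit '256MB') test;  -- non-parallel comparison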

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


