Re: cost based vacuum (parallel) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: cost based vacuum (parallel)
Date
Msg-id 05A5B57C-015B-448F-8FA6-1966EC60263E@anarazel.de
Whole thread Raw
In response to Re: cost based vacuum (parallel)  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: cost based vacuum (parallel)
List pgsql-hackers
Hi,

On November 5, 2019 7:16:41 AM PST, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>>
>> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de>
>wrote:
>> >
>> > Hi,
>> >
>> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
>> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila
><amit.kapila16@gmail.com> wrote:
>> > >
>> > > > For parallel vacuum [1], we were discussing what is the best
>way to
>> > > > divide the cost among parallel workers but we didn't get many
>inputs
>> > > > apart from people who are very actively involved in patch
>development.
>> > > > I feel that we need some more inputs before we finalize
>anything, so
>> > > > starting a new thread.
>> > > >
>> > >
>> > > Maybe a I just don't have experience in the type of system that
>parallel
>> > > vacuum is needed for, but if there is any meaningful IO
>throttling which is
>> > > active, then what is the point of doing the vacuum in parallel in
>the first
>> > > place?
>> >
>> > I am wondering the same - but to be fair, it's pretty easy to run
>into
>> > cases where VACUUM is CPU bound. E.g. because most pages are in
>> > shared_buffers, and compared to the size of the indexes number of
>tids
>> > that need to be pruned is fairly small (also [1]). That means a lot
>of
>> > pages need to be scanned, without a whole lot of IO going on. The
>> > problem with that is just that the defaults for vacuum throttling
>will
>> > also apply here, I've never seen anybody tune vacuum_cost_page_hit
>= 0,
>> > vacuum_cost_page_dirty=0 or such (in contrast, the latter is the
>highest
>> > cost currently).  Nor do we reduce the cost of
>vacuum_cost_page_dirty
>> > for unlogged tables.
>> >
>> > So while it doesn't seem unreasonable to want to use cost limiting
>to
>> > protect against vacuum unexpectedly causing too much, especially
>read,
>> > IO, I'm doubtful it has current practical relevance.
>> >
>>
>> IIUC, you mean to say that it is of not much practical use to do
>> parallel vacuum if I/O throttling is enabled for an operation, is
>that
>> right?
>>
>>
>> > I'm wondering how much of the benefit of parallel vacuum really is
>just
>> > to work around vacuum ringbuffers often massively hurting
>performance
>> > (see e.g. [2]).
>> >
>>
>> Yeah, it is a good thing to check, but if anything, I think a
>parallel
>> vacuum will further improve the performance with larger ring buffers
>> as it will make it more CPU bound.
>I have tested the same and the results prove that increasing the ring
>buffer size we can see the performance gain.  And, the gain is much
>more with the parallel vacuum.
>
>Test case:
>create table test(a int, b int, c int, d int, e int, f int, g int, h
>int);
>create index idx1 on test(a);
>create index idx2 on test(b);
>create index idx3 on test(c);
>create index idx4 on test(d);
>create index idx5 on test(e);
>create index idx6 on test(f);
>create index idx7 on test(g);
>create index idx8 on test(h);
>insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000)
>as i;
>delete from test where a < 300000;
>
>( I have tested the parallel vacuum and non-parallel vacuum with
>different ring buffer size)

Thanks!

>8 index
>ring buffer size 246kb-> non-parallel: 7.6 seconds   parallel (2
>worker): 3.9 seconds
>ring buffer size 256mb-> non-parallel: 6.1 seconds   parallel (2
>worker): 3.2 seconds
>
>4 index
>ring buffer size 246kb -> non-parallel: 4.8 seconds   parallel (2
>worker): 3.2 seconds
>ring buffer size 256mb -> non-parallel: 3.8 seconds   parallel (2
>worker): 2.6 seconds

What about the case of just disabling the ring buffer logic?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: cost based vacuum (parallel)
Next
From: Tomas Vondra
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist