Re: cost based vacuum (parallel) - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: cost based vacuum (parallel)
Date |
Msg-id | 05A5B57C-015B-448F-8FA6-1966EC60263E@anarazel.de
In response to | Re: cost based vacuum (parallel) (Dilip Kumar <dilipbalaut@gmail.com>)
Responses | Re: cost based vacuum (parallel)
List | pgsql-hackers
Hi,

On November 5, 2019 7:16:41 AM PST, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>On Tue, Nov 5, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres@anarazel.de> wrote:
>> >
>> > Hi,
>> >
>> > On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
>> > > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > >
>> > > > For parallel vacuum [1], we were discussing what is the best way to
>> > > > divide the cost among parallel workers, but we didn't get many inputs
>> > > > apart from people who are very actively involved in patch development.
>> > > > I feel that we need some more inputs before we finalize anything, so
>> > > > I'm starting a new thread.
>> > > >
>> > >
>> > > Maybe I just don't have experience in the type of system that parallel
>> > > vacuum is needed for, but if there is any meaningful IO throttling
>> > > active, then what is the point of doing the vacuum in parallel in the
>> > > first place?
>> >
>> > I am wondering the same - but to be fair, it's pretty easy to run into
>> > cases where VACUUM is CPU bound: e.g. because most pages are in
>> > shared_buffers, and, compared to the size of the indexes, the number of
>> > tids that need to be pruned is fairly small (also [1]). That means a lot
>> > of pages need to be scanned without a whole lot of IO going on. The
>> > problem with that is just that the defaults for vacuum throttling will
>> > also apply here; I've never seen anybody tune vacuum_cost_page_hit = 0,
>> > vacuum_cost_page_dirty = 0, or such (in contrast, the latter is currently
>> > the highest cost). Nor do we reduce the cost of vacuum_cost_page_dirty
>> > for unlogged tables.
>> >
>> > So while it doesn't seem unreasonable to want to use cost limiting to
>> > protect against vacuum unexpectedly causing too much (especially read)
>> > IO, I'm doubtful it has much current practical relevance.
>> >
>>
>> IIUC, you mean to say that it is of not much practical use to do a
>> parallel vacuum if I/O throttling is enabled for an operation, is that
>> right?
>>
>>
>> > I'm wondering how much of the benefit of parallel vacuum really is just
>> > to work around vacuum ring buffers often massively hurting performance
>> > (see e.g. [2]).
>> >
>>
>> Yeah, it is a good thing to check, but if anything, I think a parallel
>> vacuum will further improve the performance with larger ring buffers,
>> as it will make it more CPU bound.
>
>I have tested the same, and the results show that by increasing the ring
>buffer size we can see a performance gain. And the gain is much bigger
>with the parallel vacuum.
>
>Test case:
>create table test(a int, b int, c int, d int, e int, f int, g int, h int);
>create index idx1 on test(a);
>create index idx2 on test(b);
>create index idx3 on test(c);
>create index idx4 on test(d);
>create index idx5 on test(e);
>create index idx6 on test(f);
>create index idx7 on test(g);
>create index idx8 on test(h);
>insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
>delete from test where a < 300000;
>
>(I have tested the parallel vacuum and non-parallel vacuum with
>different ring buffer sizes.)

Thanks!
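In case it's useful for anyone re-running this with throttling active: the kind of tuning I mean above, i.e. charging only for real I/O, would look roughly like the following. This is a sketch, not a recommendation; the GUC names are the stock cost parameters, and the PARALLEL syntax assumes the parallel vacuum patch is applied.

    SET vacuum_cost_delay = '20ms';     -- turn cost-based throttling on for manual VACUUM (default is 0 = off)
    SET vacuum_cost_page_hit = 0;       -- default 1: don't charge for pages already in shared_buffers
    SET vacuum_cost_page_dirty = 0;     -- default 20, currently the highest per-page cost
    VACUUM (VERBOSE) test;              -- non-parallel run
    -- VACUUM (VERBOSE, PARALLEL 2) test;   -- parallel run, with the patch applied

With those settings, pages that are already resident don't accumulate cost, so the delay only kicks in when vacuum actually has to read pages.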
>8 indexes:
>ring buffer size 246kB -> non-parallel: 7.6 seconds, parallel (2 workers): 3.9 seconds
>ring buffer size 256MB -> non-parallel: 6.1 seconds, parallel (2 workers): 3.2 seconds
>
>4 indexes:
>ring buffer size 246kB -> non-parallel: 4.8 seconds, parallel (2 workers): 3.2 seconds
>ring buffer size 256MB -> non-parallel: 3.8 seconds, parallel (2 workers): 2.6 seconds

What about the case of just disabling the ring buffer logic?

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.