Re: [HACKERS] Increase Vacuum ring buffer. - Mailing list pgsql-hackers
From | Sokolov Yura |
---|---|
Subject | Re: [HACKERS] Increase Vacuum ring buffer. |
Date | |
Msg-id | 20170815180038.1f1953b2@falcon-work |
In response to | Re: [HACKERS] Increase Vacuum ring buffer. (Sokolov Yura <funny.falcon@postgrespro.ru>) |
List | pgsql-hackers |
On Mon, 31 Jul 2017 20:11:25 +0300
Sokolov Yura <funny.falcon@postgrespro.ru> wrote:

> On 2017-07-27 11:53, Sokolov Yura wrote:
> > On 2017-07-26 20:28, Sokolov Yura wrote:
> >> On 2017-07-26 19:46, Claudio Freire wrote:
> >>> On Wed, Jul 26, 2017 at 1:39 PM, Sokolov Yura
> >>> <funny.falcon@postgrespro.ru> wrote:
> >>>> On 2017-07-24 12:41, Sokolov Yura wrote:
> >>>> test_master_1/pretty.log
> >>> ...
> >>>> time   activity    tps  latency   stddev      min      max
> >>>> 11130     av+ch    198    198ms    374ms      7ms   1956ms
> >>>> 11160     av+ch    248    163ms    401ms      7ms   2601ms
> >>>> 11190     av+ch    321    125ms    363ms      7ms   2722ms
> >>>> 11220     av+ch   1155     35ms    123ms      7ms   2668ms
> >>>> 11250     av+ch   1390     29ms     79ms      7ms   1422ms
> >>>
> >>> vs
> >>>
> >>>> test_master_ring16_1/pretty.log
> >>>> time   activity    tps  latency   stddev      min      max
> >>>> 11130     av+ch     26   1575ms    635ms    101ms   2536ms
> >>>> 11160     av+ch     25   1552ms    648ms     58ms   2376ms
> >>>> 11190     av+ch     32   1275ms    726ms     16ms   2493ms
> >>>> 11220     av+ch     23   1584ms    674ms     48ms   2454ms
> >>>> 11250     av+ch     35   1235ms    777ms     22ms   3627ms
> >>>
> >>> That's a very huge change in latency for the worse
> >>>
> >>> Are you sure that's the ring buffer's doing and not some
> >>> methodology snafu?
> >>
> >> Well, I tuned postgresql.conf so that there is no such
> >> catastrophic slows down on master branch. (with default
> >> settings such slowdown happens quite frequently).
> >> bgwriter_lru_maxpages = 10 (instead of default 200) were one
> >> of such tuning.
> >>
> >> Probably there were some magic "border" that triggers this
> >> behavior. Tuning postgresql.conf shifted master branch on
> >> "good side" of this border, and faster autovacuum crossed it
> >> to "bad side" again.
> >>
> >> Probably, backend_flush_after = 2MB (instead of default 0) is
> >> also part of this border. I didn't try to bench without this
> >> option yet.
> >>
> >> Any way, given checkpoint and autovacuum interference could be
> >> such noticeable, checkpoint clearly should affect autovacuum
> >> cost mechanism, imho.
> >>
> >> With regards,
> >
> > I'll run two times with default postgresql.conf (except
> > shared_buffers and maintence_work_mem) to find out behavior on
> > default setting.
> >
> I've accidentally lost results of this run, so I will rerun it.
>
> This I remembered:
> - even with default settings, autovacuum runs 3 times faster:
>   9000s on master, 3000s with increased ring buffer.
>   So xlog-fsync really slows down autovacuum.
> - but concurrent transactions slows down (not so extremely as in
>   previous test, but still significantly).
>   I could not draw pretty table now, cause I lost results. I'll do
>   it after re-run completes.
>
> With regards,

Sorry for the long delay.

I ran the test with the default postgresql.conf.

First, the query was a bit different: instead of updating 5 close but random points using `aid in (:aid1, :aid2, :aid3, :aid4, :aid5)`, the condition was `aid between (:aid1 and :aid1+9)`. TPS is much lower (330 tps on master vs 540 tps for the previous version), but it is hard to tell whether that is due to the query difference or to the config change. I'm sorry for the inconvenience :-( I will not repeat this mistake in the future.

Overview:
master : 339 tps, average autovacuum 6000sec.
ring16 : 275 tps, average autovacuum 1100sec, first autovacuum 2500sec.
ring16 + `vacuum_cost_page_dirty = 40` : 293 tps, average autovacuum 2100sec, first autovacuum 4226sec.

Running with the default postgresql.conf doesn't show the catastrophic tps decline when a checkpoint starts during autovacuum (seen in previous test runs with `autovacuum_cost_delay = 2ms`). The overall average tps over the 8-hour test is also much closer to "master".

Still, the increased ring buffer significantly improves autovacuum performance, which means fsync consumes a lot of time, comparable with autovacuum_cost_delay.

Runs with ring16 have occasional jumps in minimum and average response latency:

test_master_ring16_2/pretty.log
8475      av+ch     15   2689ms    659ms   1867ms   3575ms
27420        av     15   2674ms    170ms   2393ms   2926ms

Usually it happens close to the end of autovacuum.
What could it be? It is clearly bad behavior that the current small ring buffer hides.

Runs with ring16 + `vacuum_cost_page_dirty = 40` are much more stable in terms of the performance of concurrent transactions. Only the first autovacuum has such a "latency jump"; later ones run smoothly.

So, increasing the ring buffer certainly improves autovacuum performance. Its negative effects can be compensated for with configuration. It also exposes some bad behavior in the current implementation that should be investigated more closely.

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
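
For reference, the trade-off in the overview figures (lower tps versus faster autovacuum) can be put into relative terms with a few lines of Python. This is only a sketch: the dictionary layout and the function name `relative_to_master` are illustrative and not part of the benchmark tooling. The tps and average autovacuum numbers are copied from the overview in the message above.

```python
# Figures copied from the overview in the message above.
# The structure and names below are illustrative assumptions.
RUNS = {
    "master": {"tps": 339, "avg_autovacuum_s": 6000},
    "ring16": {"tps": 275, "avg_autovacuum_s": 1100},
    "ring16 + vacuum_cost_page_dirty=40": {"tps": 293, "avg_autovacuum_s": 2100},
}


def relative_to_master(runs, baseline="master"):
    """Return tps change (percent) and autovacuum speedup versus the baseline run."""
    base = runs[baseline]
    result = {}
    for name, run in runs.items():
        if name == baseline:
            continue
        result[name] = {
            # Negative value means fewer transactions per second than baseline.
            "tps_change_pct": 100.0 * (run["tps"] - base["tps"]) / base["tps"],
            # Ratio > 1 means autovacuum finishes faster than on baseline.
            "autovacuum_speedup": base["avg_autovacuum_s"] / run["avg_autovacuum_s"],
        }
    return result


if __name__ == "__main__":
    for name, stats in relative_to_master(RUNS).items():
        print(
            f"{name}: tps {stats['tps_change_pct']:+.1f}%, "
            f"autovacuum {stats['autovacuum_speedup']:.1f}x faster"
        )
    # ring16: tps -18.9%, autovacuum 5.5x faster
    # ring16 + vacuum_cost_page_dirty=40: tps -13.6%, autovacuum 2.9x faster
```

These ratios only restate the averages reported above; they say nothing about the latency jumps, which need the per-interval logs.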