Re: postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: postgresql latency & bgwriter not doing its job |
Date | |
Msg-id | 20140827083026.GB21544@awork2.anarazel.de |
In response to | Re: postgresql latency & bgwriter not doing its job (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: postgresql latency & bgwriter not doing its job |
List | pgsql-hackers |

On 2014-08-27 09:32:16 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> >[...]
> >I think you're misunderstanding how spread checkpoints work.
>
> Yep, definitely:-) On the other hand I thought I was seeking something
> "simple", namely correct latency under small load, that I would expect out
> of the box.

Yea. The current situation *sucks*. Both from the utterly borked
behaviour of ext4 and other filesystems and the lack of workaround from
postgres.

> >When the checkpointer process starts a spread checkpoint it first writes
> >all buffers to the kernel in a paced manner.
> >That pace is determined by checkpoint_completion_target and
> >checkpoint_timeout.
>
> This pacing does not seem to work, even at slow pace.

It definitely does in some cases. What's your evidence the pacing
doesn't work? Afaik it's the fsync that causes the problem, not the
writes themselves.

> >If you have a stall of roughly the same magnitude (say a factor
> >of two different), the smaller once a minute, the larger once an
> >hour. Obviously the once-an-hour one will have a better latency in many,
> >many more transactions.
>
> I do not believe in delaying as much as possible writing to disk to handle a
> small load as a viable strategy. However, to show my good will, I have
> tried to follow your advice: I've launched a 5000 seconds test with 50
> segments, 30 min timeout, 0.9 completion target, at 25 tps, which is less
> than 1/10 of the maximum throughput.
>
> There are only two time-triggered checkpoints:
>
>   LOG:  checkpoint starting: time
>   LOG:  checkpoint complete: wrote 48725 buffers (47.6%);
>     1 transaction log file(s) added, 0 removed, 0 recycled;
>     write=1619.750 s, sync=27.675 s, total=1647.932 s;
>     sync files=14, longest=27.593 s, average=1.976 s
>
>   LOG:  checkpoint starting: time
>   LOG:  checkpoint complete: wrote 22533 buffers (22.0%);
>     0 transaction log file(s) added, 0 removed, 23 recycled;
>     write=826.919 s, sync=9.989 s, total=837.023 s;
>     sync files=8, longest=6.742 s, average=1.248 s

The write pacing itself doesn't seem to be bad. The bad thing is the
'sync' times here. Those are *NOT* paced, and the kernel probably has
delayed flushing out much of the writes...

> (1) the ability to put checkpoint_timeout to values smaller than 30s could
> help, although obviously there would be other consequences. But the ability
> to avoid periodic offline time looks like a desirable objective.

I'd rather not do that. It's an utterly horrible hack to go this route.

> (2) I still think that a parameter to force bgwriter to write more stuff
> could help, but this is not tested.

It's going to be random writes. That's not going to be helpful.

> (3) Any other effective idea to configure for responsiveness is welcome!

I've a couple of ideas how to improve the situation, but so far I've not
had the time to investigate them properly. Would you be willing to test
a couple of simple patches?

Did you test xfs already?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
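
As a rough cross-check of the numbers quoted in this message, the sketch below recomputes the write-phase budget from the stated settings (30 min timeout, 0.9 completion target) and the average write rate from the two logged checkpoints. It is a simplified model, not the checkpointer's actual algorithm: it ignores the WAL-volume side of the pacing, the function names are hypothetical, and the 8 kB block size is an assumption (the PostgreSQL default), not something stated in the thread.

```python
# Figures are copied from the two "checkpoint complete" log lines quoted above.
# Function and variable names are hypothetical, for illustration only.

BLOCK_SIZE_BYTES = 8192  # assumption: default PostgreSQL block size (8 kB)


def target_write_seconds(checkpoint_timeout_s: float, completion_target: float) -> float:
    """Time budget the paced write phase aims for (simplified model)."""
    return checkpoint_timeout_s * completion_target


def write_rate(buffers: int, write_s: float) -> float:
    """Average number of buffers handed to the kernel per second."""
    return buffers / write_s


def sync_share(sync_s: float, total_s: float) -> float:
    """Fraction of the whole checkpoint spent in the unpaced fsync phase."""
    return sync_s / total_s


checkpoints = [
    {"buffers": 48725, "write_s": 1619.750, "sync_s": 27.675, "total_s": 1647.932},
    {"buffers": 22533, "write_s": 826.919, "sync_s": 9.989, "total_s": 837.023},
]

budget = target_write_seconds(30 * 60, 0.9)
print(f"write-phase budget: {budget:.0f} s")

for cp in checkpoints:
    rate = write_rate(cp["buffers"], cp["write_s"])
    kib_per_s = rate * BLOCK_SIZE_BYTES / 1024
    share = sync_share(cp["sync_s"], cp["total_s"])
    print(
        f"write={cp['write_s']:.0f} s, {rate:.1f} buffers/s "
        f"(~{kib_per_s:.0f} KiB/s), sync share={share:.1%}"
    )
```

Under these assumptions the budget comes out at 1620 s, which is close to the logged write=1619.750 s of the first checkpoint, and the average write rate is only a few hundred KiB per second. That is consistent with the reading that the write phase is paced as intended, while the sync phase, short as it is relative to the total, is where the stall occurs.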