Re: postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers

From Andres Freund
Subject Re: postgresql latency & bgwriter not doing its job
Date
Msg-id 20140827083026.GB21544@awork2.anarazel.de
In response to Re: postgresql latency & bgwriter not doing its job  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: postgresql latency & bgwriter not doing its job
List pgsql-hackers
On 2014-08-27 09:32:16 +0200, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> >[...]
> >I think you're misunderstanding how spread checkpoints work.
> 
> Yep, definitely:-) On the other hand I thought I was seeking something
> "simple", namely correct latency under small load, that I would expect out
> of the box.

Yea. The current situation *sucks*. Both because of the utterly borked
behaviour of ext4 and other filesystems and because of the lack of
workarounds in postgres.

> >When the checkpointer process starts a spread checkpoint it first writes
> >all buffers to the kernel in a paced manner.
> >That pace is determined by checkpoint_completion_target and
> >checkpoint_timeout.
> 
> This pacing does not seem to work, even at a slow pace.

It definitely does in some cases. What's your evidence that the pacing
doesn't work? Afaik it's the fsync that causes the problem, not the
writes themselves.
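
For anyone following along, the write phase of a spread checkpoint can be
modelled roughly as below. This is a simplified sketch, not the
checkpointer's actual code: after each buffer is handed to the kernel, the
writer naps while its progress, scaled by checkpoint_completion_target, is
ahead of the elapsed fraction of checkpoint_timeout. Assumptions: writes are
instantaneous, the WAL-based (checkpoint_segments) schedule is ignored, the
0.1 s nap length is illustrative, and simulate_spread_write is a
hypothetical name.

```python
def simulate_spread_write(num_buffers: int,
                          checkpoint_timeout_s: float,
                          completion_target: float,
                          nap_s: float = 0.1) -> float:
    """Return the simulated duration (s) of a spread checkpoint's write phase.

    Simplified model (assumption, not PostgreSQL's implementation): each
    buffer write takes no time, only the time-based schedule is checked,
    and the writer naps nap_s seconds whenever it is ahead of schedule.
    """
    if num_buffers <= 0:
        return 0.0
    elapsed = 0.0
    written = 0
    while written < num_buffers:
        written += 1  # hand one dirty buffer to the kernel
        progress = written / num_buffers
        # Ahead of schedule: scaled progress exceeds the elapsed fraction of
        # checkpoint_timeout, so nap until the clock catches up. There is no
        # nap after the last buffer; the (unpaced) sync phase would start here.
        while (written < num_buffers
               and progress * completion_target > elapsed / checkpoint_timeout_s):
            elapsed += nap_s
    return elapsed


if __name__ == "__main__":
    # 48725 buffers, 30 min timeout, 0.9 completion target
    print(round(simulate_spread_write(48725, 1800, 0.9)))  # prints 1620
```

With the numbers from the first checkpoint in Fabien's test the model gives
about 1620 s, i.e. 0.9 x 1800 s, which lines up with the logged
write=1619.750 s. The model says nothing about the sync phase, which only
starts after the last write and is not paced.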

> >Say you have two stalls of roughly the same magnitude (within a factor
> >of two of each other), the smaller one once a minute, the larger one once
> >an hour. Obviously the once-an-hour one will give good latency to many,
> >many more transactions.
> 
> I do not believe that delaying writes to disk as long as possible is a
> viable strategy for handling a small load. However, to show my good will, I
> have tried to follow your advice: I've launched a 5000-second test with 50
> checkpoint segments, a 30 min timeout and a 0.9 completion target, at 25 tps,
> which is less than 1/10 of the maximum throughput.
> 
> There are only two time-triggered checkpoints:
> 
>   LOG:  checkpoint starting: time
>   LOG:  checkpoint complete: wrote 48725 buffers (47.6%);
>       1 transaction log file(s) added, 0 removed, 0 recycled;
>       write=1619.750 s, sync=27.675 s, total=1647.932 s;
>       sync files=14, longest=27.593 s, average=1.976 s
> 
>   LOG:  checkpoint starting: time
>   LOG:  checkpoint complete: wrote 22533 buffers (22.0%);
>       0 transaction log file(s) added, 0 removed, 23 recycled;
>       write=826.919 s, sync=9.989 s, total=837.023 s;
>       sync files=8, longest=6.742 s, average=1.248 s

The write pacing itself doesn't seem to be bad. The bad thing is the
'sync' times here. Those are *NOT* paced, and the kernel has probably
delayed flushing out most of the writes...
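
To put numbers on that, here is a small sketch that extracts the relevant
fields from a "checkpoint complete" entry like the ones above. Assumptions:
the regular expression only matches the log format quoted in this thread
(other versions may print different fields), pages are the default 8 kB
(BLCKSZ), and summarize_checkpoint is a hypothetical helper name.

```python
import re

# Matches the "checkpoint complete" format quoted in this thread (assumption:
# other PostgreSQL versions may word or order these fields differently).
CHECKPOINT_RE = re.compile(
    r"wrote (?P<buffers>\d+) buffers.*?"
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s;.*?"
    r"sync files=(?P<files>\d+), longest=(?P<longest>[\d.]+) s",
    re.DOTALL,  # the entry spans several lines
)


def summarize_checkpoint(log_text: str, block_size: int = 8192) -> dict:
    """Compute write rate and how concentrated the sync phase is."""
    m = CHECKPOINT_RE.search(log_text)
    if m is None:
        raise ValueError("no 'checkpoint complete' entry found")
    buffers = int(m["buffers"])
    write_s = float(m["write"])
    sync_s = float(m["sync"])
    total_s = float(m["total"])
    longest_s = float(m["longest"])
    return {
        "buffers_per_s": buffers / write_s,
        "kib_per_s": buffers * block_size / 1024 / write_s,
        "sync_share_of_total": sync_s / total_s,
        # fraction of the sync phase spent in the single slowest fsync
        "longest_fsync_share": longest_s / sync_s,
    }


SAMPLE = """LOG:  checkpoint complete: wrote 48725 buffers (47.6%);
      1 transaction log file(s) added, 0 removed, 0 recycled;
      write=1619.750 s, sync=27.675 s, total=1647.932 s;
      sync files=14, longest=27.593 s, average=1.976 s"""

if __name__ == "__main__":
    print(summarize_checkpoint(SAMPLE))
```

For the first checkpoint this yields about 30 buffers/s (roughly 240 KiB/s)
during the write phase, while the single longest fsync accounts for over 99%
of the 27.7 s sync phase: the sync cost arrives as one long stall rather than
being spread out.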

> (1) the ability to set checkpoint_timeout to values smaller than 30s could
> help, although obviously there would be other consequences. But the ability
> to avoid periodic offline time looks like a desirable objective.

I'd rather not do that. It's an utterly horrible hack to go this way.

> (2) I still think that a parameter to force the bgwriter to write more stuff
> could help, but I have not tested this.

It's going to be random writes. That's not going to be helpful.

> (3) Any other effective idea to configure for responsiveness is welcome!

I've a couple of ideas on how to improve the situation, but so far I've not
had the time to investigate them properly. Would you be willing to test
a couple of simple patches?

Did you test xfs already?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


