Re: postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers

From Andres Freund
Subject Re: postgresql latency & bgwriter not doing its job
Date
Msg-id 20140830180405.GB25523@awork2.anarazel.de
Whole thread Raw
In response to Re: postgresql latency & bgwriter not doing its job  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: postgresql latency & bgwriter not doing its job
Re: postgresql latency & bgwriter not doing its job
List pgsql-hackers
On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-08-27 19:23:04 +0300, Heikki Linnakangas wrote:
> >> A long time ago, Itagaki Takahiro wrote a patch sort the buffers and write
> >> them out in order (http://www.postgresql.org/message-id/flat/20070614153758.6A62.ITAGAKI.TAKAHIRO@oss.ntt.co.jp).
> >> The performance impact of that was inconclusive, but one thing that it
> >> allows nicely is to interleave the fsyncs, so that you write all the buffers
> >> for one file, then fsync it, then next file and so on.
> 
> > ...
> > So, *very* clearly sorting is a benefit.
> 
> pg_bench alone doesn't convince me on this.  The original thread found
> cases where it was a loss, IIRC; you will need to test many more than
> one scenario to prove the point.

Sure. And I'm not claiming Itagaki/your old patch is immediately going
to be ready for commit. But our checkpoint performance has sucked for
years in the field. I don't think we can wave that away.

I think the primary reason it wasn't easily visible as being beneficial
back then was that only the throughput, not the latency and such were
looked at.

> Also, it does not matter how good it looks in test cases if it causes
> outright failures due to OOM; unlike you, I am not prepared to just "wave
> away" that risk.

I'm not "waving away" any risks.

If the sort buffer is allocated when the checkpointer is started, not
everytime we sort, as you've done in your version of the patch I think
that risk is pretty manageable. If we really want to be sure nothing is
happening at runtime, even if the checkpointer was restarted, we can put
the sort array in shared memory.

We're talking about (sizeof(BufferTag) + sizeof(int))/8192 ~= 0.3 %
overhead over shared_buffers here. If we decide to got that way, it's a
pretty darn small to price not to regularly have stalls that last
minutes.

> A possible compromise is to sort a limited number of
> buffers ---- say, collect a few thousand dirty buffers then sort, dump and
> fsync them, repeat as needed.

Yea, that's what I suggested nearby. But I don't really like it, because
it robs us of the the chance to fsync() a relfilenode immediately after
having synced all its buffers. Doing so is rather beneficial because
then fewer independently dirtied pages have to be flushed out - reducing
the impact of the fsync().

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: postgresql latency & bgwriter not doing its job
Next
From: Tom Lane
Date:
Subject: Re: postgresql latency & bgwriter not doing its job