Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpointer continuous flushing
Date
Msg-id 20160109134634.mdo6ma2xgpk5loya@alap3.anarazel.de
Whole thread Raw
In response to Re: checkpointer continuous flushing  (Andres Freund <andres@anarazel.de>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On 2016-01-07 21:17:32 +0100, Andres Freund wrote:
> On 2016-01-07 21:08:10 +0100, Fabien COELHO wrote:
> > Hmmm. What I understood is that the workloads that have some performance
> > regressions (regressions that I have *not* seen in the many tests I ran) are
> > not due to checkpointer IOs, but rather in settings where most of the writes
> > is done by backends or bgwriter.
> 
> As far as I can see you've not run many tests where the hot/warm data
> set is larger than memory (the full machine's memory, not
> shared_buffers). That quite drastically alters the performance
> characteristics here, because you suddenly have lots of synchronous read
> IO thrown into the mix.
> 
> Whether it's bgwriter or not I've not fully been able to establish, but
> it's a working theory.

Hm. New theory: The current flush interface does the flushing inside
FlushBuffer()->smgrwrite()->mdwrite()->FileWrite()->FlushContextSchedule(). The
problem with that is that at that point we (need to) hold a content lock
on the buffer!

Especially on a system that's bottlenecked on IO that means we'll
frequently hold content locks for a noticeable amount of time, while
flushing blocks, without any need to.

Even if that's not the reason for the slowdowns I observed, I think this
fact gives further credence to the current "pending flushes" tracking
residing on the wrong level.


Andres



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: checkpointer continuous flushing
Next
From: Simon Riggs
Date:
Subject: Re: Speedup twophase transactions