On Wed, Sep 9, 2015 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... How often such a workload actually has to replace a *dirty* clog
>> buffer obviously depends on how often you checkpoint, but if you're
>> getting ~28k TPS you can completely fill 32 clog buffers (1 million
>> transactions) in less than 40 seconds, and you're probably not
>> checkpointing nearly that often.
>
> But by the same token, at that kind of transaction rate, no clog page is
> actively getting dirtied for more than a couple of seconds. So while it
> might get swapped in and out of the SLRU arena pretty often after that,
> this scenario seems unconvincing as a source of repeated fsyncs.
Well, if you're filling ~1 clog page per second, you're doing ~1 fsync
per second too. Or if you are not, then you are thrashing a
progressively smaller pool of clean slots ever harder until no clean
pages remain and you're forced to fsync something - probably, a bunch
of things all at once.
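To spell out the arithmetic (a rough sketch, assuming the usual constants: 8192-byte pages, 2 status bits per transaction, and 32 clog buffers, i.e. NUM_CLOG_BUFFERS at the time):

```python
# Back-of-the-envelope check of the fill-rate numbers above.
# Constants are assumptions based on the stock build defaults.
BLCKSZ = 8192                     # clog page size in bytes
CLOG_XACTS_PER_BYTE = 4           # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768
NUM_CLOG_BUFFERS = 32
TPS = 28_000                      # the workload rate under discussion

xacts_in_arena = CLOG_XACTS_PER_PAGE * NUM_CLOG_BUFFERS  # ~1 million
seconds_to_fill_arena = xacts_in_arena / TPS             # ~37.4 s, i.e. < 40 s
seconds_per_page = CLOG_XACTS_PER_PAGE / TPS             # ~1.17 s per page

print(f"{xacts_in_arena} xacts fill all {NUM_CLOG_BUFFERS} buffers "
      f"in {seconds_to_fill_arena:.1f} s")
print(f"one clog page fills in {seconds_per_page:.2f} s")
```

which is where the "~1 clog page per second" figure comes from.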
> Like Andres, I'd want to see a more realistic problem case before
> expending a lot of work here.
I think the question here isn't whether the problem case is realistic
- I am quite sure that a pgbench workload is - but rather how much of
a problem it actually causes. I'm very sure that individual pgbench
backends experience multi-second stalls as a result of this. What
I'm not sure about is how frequently it happens, and how much of an
effect it has on overall latency. I think it would be worth someone's
time to try to write some good instrumentation code here and figure
that out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company