Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Msg-id alpine.DEB.2.10.1506250632170.3535@sto
In response to Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hello Amit,

>> [...]
>> Ok, I misunderstood your question. I thought you meant a dip between 1
>> client and 4 clients. I meant that when flush is turned on tps goes down by
>> 8% (743 to 681 tps) on this particular run.
>
> This 8% might matter if the dip is bigger with more clients and
> more aggressive workload.  Do you know what could lead to this
> dip, because if we know what is the reason then it will be more
> predictable to know if this is the max dip that could happen or it
> could lead to bigger dip in other cases.

I do not know the cause of the dip, or whether it would increase with
more clients. I do not have a box for such tests. If someone can provide
the box, I can provide test scripts:-)

The first measure, although higher, is really very unstable, with pg
totally unresponsive (offline, really) at times.

I think that the flush option may always have a risk of (small) 
detrimental effects on tps, because there are two steady states: one with 
pg only doing wal-logged transactions with great tps, and one with pg 
doing the checkpoint at nought tps. If both are on the same disk, even at
best the combination probably means that each operation will hamper the
other one a little bit, so the combined tps performance would/could be
lower than doing one after the other and having pg offline 50% of the
time...
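
The argument above can be sketched as a toy model in Python. This is only an
illustration under stated assumptions, not a measurement: the peak tps, the
share of disk time spent on transactions, and the contention factor are all
hypothetical values, and the function names are made up for this example.

```python
def serialized_tps(peak_tps, tx_share):
    """Average tps when transactions and the checkpoint take turns on the disk.

    tx_share is the fraction of time spent on transactions at peak_tps;
    the rest of the time pg is checkpointing at zero tps.
    """
    return peak_tps * tx_share


def overlapped_tps(peak_tps, tx_share, contention):
    """Average tps when both run concurrently on the same disk.

    Assumption: interference stretches the total disk time needed for the
    same work by a factor (1 + contention), with contention >= 0.
    """
    return peak_tps * tx_share / (1.0 + contention)


# Hypothetical numbers: 1000 tps peak, offline 50% of the time,
# 10% extra disk time lost to the two activities hampering each other.
alternating = serialized_tps(1000, 0.5)
combined = overlapped_tps(1000, 0.5, 0.1)
print(f"alternating: {alternating:.1f} tps, combined: {combined:.1f} tps")
```

In this model the combined average can never exceed the alternating one, but
the combined case spreads the work out instead of going offline, which is the
latency and stability trade-off discussed in this thread.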

Please also note that this 8% "dip" still leaves 681 tps (with the dip) vs
198 tps (no options at all), a 3.4x improvement compared to pg's current
behavior.
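
As a quick check of the figures quoted in this message, here is a small
Python sketch. Only the three tps values come from this thread; the variable
names are illustrative.

```python
# tps figures quoted in this thread
tps_sort_only = 743       # flush turned off
tps_sort_and_flush = 681  # flush turned on (the run with the "dip")
tps_no_options = 198      # no options at all (current pg behavior)

# Relative drop when flush is turned on, in percent
dip_pct = (tps_sort_only - tps_sort_and_flush) / tps_sort_only * 100

# Improvement of the run with the dip over current behavior
speedup = tps_sort_and_flush / tps_no_options

print(f"dip: {dip_pct:.1f}%  speedup: x{speedup:.1f}")
```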

>> Basically tps improvements mostly come from "sort", and "flush" has
>> uncertain effects on tps (throughput), but much more on latency and
>> performance stability (lower late rate, lower standard deviation).
>
> I agree that performance stability is important, but not sure if it
> is a good idea to sacrifice the throughput for it.

See discussion above. I think better stability may imply slightly lower
throughput on some loads. That is why there are options, and DBAs to choose
them:-)

> If sort + flush always gives better results, then isn't it better to 
> perform these actions together under one option.

Sure, but that is not currently the case. Also what is done is very 
orthogonal, so I would tend to keep these separate. If one is always 
beneficial and it is wished that it should be always activated, then the 
option could be removed.

>> Hmmm. My point of view is still that the logical priority is to optimize
>> for disk IO first, then look for compatible RAM optimisations later.
>
> It is not only about RAM optimisation which we can do later, but also
> about avoiding regression in existing use-cases.

Hmmm. So far I have not seen really significant regressions. I have
seen some options have a less favorable impact on some loads.

-- 
Fabien.


