Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: checkpointer continuous flushing
Date	June 25, 2015 04:22:09
Msg-id	CAA4eK1KAXpAp3JBCNVmkihp1q7hbRg_j+ofSYj953F6S9N2_OA@mail.gmail.com
In response to	Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses	Re: checkpointer continuous flushing
List	pgsql-hackers

On Wed, Jun 24, 2015 at 9:50 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

flsh | full speed tps              | percent of late tx, 4 clients
/srt | 1 client     | 4 clients    |   100 |   200 |   400 |
N/N  | 173 +- 289*  | 198 +- 531*  | 27.61 | 43.92 | 61.16 |
N/Y  | 458 +- 327*  | 743 +- 920*  |  7.05 | 14.24 | 24.07 |
Y/N  | 169 +- 166*  | 187 +- 302*  |  4.01 | 39.84 | 65.70 |
Y/Y  | 546 +- 143   | 681 +- 459   |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to 270% tps). On this run,
flushing has a positive (+20% with 1 client) or negative (-8% with 4
clients) effect on throughput, and late transactions are reduced by 92-95%
when both options are activated.

Why there is dip in performance with multiple clients,

I'm not sure I see the "dip". Performance is better with 4 clients
than with 1 client?

What do you mean by "negative (-8 % with 4 clients) on throughput" in the above sentence? I thought you meant that there is a dip in TPS with the patch as compared to HEAD at 4 clients.

Ok, I misunderstood your question. I thought you meant a dip between 1 client and 4 clients. I meant that when flush is turned on tps goes down by 8% (743 to 681 tps) on this particular run.
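
As a quick check of the percentages discussed above, the following sketch recomputes them from the figures in the table. The helper name pct_change and the dictionary layout are illustrative assumptions, not anything from the patch or from pgbench.

```python
# Figures copied from the table in this thread; the names below are
# illustrative and not part of the patch or of pgbench.

def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# (flush, sort) -> full speed tps for (1 client, 4 clients)
tps = {
    ("N", "N"): (173, 198),
    ("N", "Y"): (458, 743),
    ("Y", "N"): (169, 187),
    ("Y", "Y"): (546, 681),
}

# (flush, sort) -> percent of late transactions for the three
# columns labelled 100, 200 and 400 in the table
late = {
    ("N", "N"): (27.61, 43.92, 61.16),
    ("Y", "Y"): (1.55, 3.51, 2.84),
}

# Effect of turning flush on while sort stays on.
flush_1_client = pct_change(tps[("N", "Y")][0], tps[("Y", "Y")][0])
flush_4_clients = pct_change(tps[("N", "Y")][1], tps[("Y", "Y")][1])

# Reduction in late transactions with both options on versus both off.
late_reduction = [
    -pct_change(off, on) for off, on in zip(late[("N", "N")], late[("Y", "Y")])
]

print(f"flush, 1 client:  {flush_1_client:+.1f}%")
print(f"flush, 4 clients: {flush_4_clients:+.1f}%")
print("late tx reduction:", [f"{r:.1f}%" for r in late_reduction])
```

On these figures the flush effect comes out at about +19% with 1 client and about -8% with 4 clients (743 to 681 tps), and the late transaction reduction falls between 92% and 95%, in line with the numbers quoted in the thread.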

This 8% might matter if the dip is bigger with more clients and a
more aggressive workload. Do you know what could lead to this
dip? If we know the reason, then it will be easier to predict
whether this is the maximum dip that could happen or whether it
could lead to a bigger dip in other cases.

Basically tps improvements mostly come from "sort", and "flush" has uncertain effects on tps (throughput), but much more on latency and performance stability (lower late rate, lower standard deviation).

I agree that performance stability is important, but I am not sure it
is a good idea to sacrifice throughput for it. If sort + flush always
gives better results, then isn't it better to perform these actions
together under one option?

Note that I'm not comparing to HEAD in the above tests, but with the new options deactivated, which should be more or less comparable to current HEAD, i.e. there is no sorting nor flushing done, but this is not strictly speaking HEAD behavior. Probably I should get some figures with HEAD as well to check the "more or less" assumption.

Also I am not completely sure what +- means in your data above?

The first figure before "+-" is the tps, the second after is its standard deviation computed on per-second traces. Some runs are very bad, with pgbench stuck at times, and result in a stddev larger than the average; they are noted with "*".
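
To make the "tps +- stddev" notation concrete, here is a minimal sketch of how such a pair could be derived from a per-second tps trace, with the "*" condition (stddev larger than the average) as a flag. The function name summarize and the sample traces are hypothetical, and the use of the population standard deviation is an assumption; the thread does not say which estimator was used.

```python
import statistics

def summarize(per_second_tps):
    """Return (mean tps, stddev, starred) for a per-second tps trace.

    `starred` mirrors the "*" marker in the table: it is True when the
    standard deviation exceeds the average. Population stddev is an
    assumption made for this sketch.
    """
    mean = statistics.fmean(per_second_tps)
    stddev = statistics.pstdev(per_second_tps)
    return mean, stddev, stddev > mean

# Hypothetical traces: one with stalls (seconds at 0 tps), one steady.
stalled = [0, 0, 0, 800]
steady = [540, 550, 560]

for name, trace in (("stalled", stalled), ("steady", steady)):
    mean, stddev, starred = summarize(trace)
    marker = "*" if starred else ""
    print(f"{name}: {mean:.0f} +- {stddev:.0f}{marker}")
```

A trace in which the benchmark is stuck for several seconds pulls the average down while widening the spread, which is how a run can end up with a stddev above its mean.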

I understand your point and I also don't have any specific answer
for it at this moment. The point of worry is that it should not lead
to degradation in certain cases as compared to the current algorithm.
The workload that could be affected is one where your data doesn't fit
in shared buffers, but can fit in RAM.

Hmmm. My point of view is still that the logical priority is to optimize for disk IO first, then look for compatible RAM optimisations later.

It is not only about RAM optimisation, which we can do later, but also
about avoiding regression in existing use-cases.

With Regards,
Amit Kapila.

EnterpriseDB: http://www.enterprisedb.com
