Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: checkpointer continuous flushing
Msg-id 34db4b9b-6bcb-8633-df87-064df76065e6@2ndquadrant.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hi,

On 03/17/2016 06:36 PM, Fabien COELHO wrote:
>
> Hello Tomas,
>
> Thanks for these great measures.
>
>> * 4 x CPU E5-4620 (2.2GHz)
>
> 4*8 = 32 cores / 64 threads.

Yep. I only used 32 clients, though, to keep some CPU capacity available 
for the rest of the system (also, hyper-threading does not really double 
the number of usable cores).

>
>> * 256GB of RAM
>
> Wow!
>
>> * 24x SSD on LSI 2208 controller (with 1GB BBWC)
>
> Wow! What RAID configuration? The patch is designed to fix very big
> issues on HDDs, but it is good to see that it helps on SSDs as well.

Yep, RAID-10. I agree that testing on an HDD-based system would be 
useful; however, (a) I don't have a comparable system at hand at the 
moment, and (b) I was a bit worried that the patch would hurt performance 
on SSDs, but thankfully that's not the case.

I will do the test on a much smaller system with HDDs in a few days.

>
> Is it possible to run tests with distinct tablespaces spread across all
> those disks?

Nope, that'd require reconfiguring the system (and then back), and I 
only have SSH access to it, not physical access. Also, I don't quite see 
what that would tell us.

>> * shared_buffers=64GB
>
> 1/4 of the available memory.
>
>> The pgbench was scale 60000, so ~750GB of data on disk,
>
> About 3x the available memory, so mostly on disk.
>
>> or like this ("throttled"):
>>
>> pgbench -c 32 -j 8 -T 86400 -R 5000 -l --aggregate-interval=1 pgbench
>>
>> The reason for the throttling is that people generally don't run
>> production databases 100% saturated, so it'd be sad to improve the
>> 100% saturated case and hurt the common case by increasing latency.
>
> Sure.
>
>> The machine does ~8000 tps, so 5000 tps is ~60% of that.
>
> Ok.
>
> I would have suggested using the --latency-limit option to filter out
> very slow queries; otherwise, if the system gets stuck it may catch up
> later, but that is not representative of "sustainable" performance.
>
> When pgbench is running under a target rate, the transaction
> distribution is expected to be the same in both runs, around 5000 tps,
> and the green run looks pretty ok in that respect. The magenta one
> shows that about 25% of the time things are not good at all, and the
> higher figures just show the catching up, which is not really
> interesting if you asked for a web page and it is finally delivered a
> minute later.

Maybe. But wouldn't that only increase the stress on the system, 
possibly causing more issues? And the magenta line is the old code, so 
it would only make the improvement from the new code look larger.
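
For reference, I guess the run you're suggesting would look roughly like 
this (the 5000 ms limit is just an arbitrary value for illustration, not 
something taken from this thread):

    pgbench -c 32 -j 8 -T 86400 -R 5000 --latency-limit=5000 \
        -l --aggregate-interval=1 pgbench

i.e. transactions that are already more than 5 seconds behind schedule 
would be skipped and counted separately, instead of being executed late.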

Note that the max latency is logged by pgbench in microseconds, so 
according to the "max latency" charts the latencies stay below 10 seconds 
(old code) and 1 second (new code) about 99% of the time. So I don't 
think this would make any measurable difference in practice.
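
FWIW, a check along these lines over the aggregated logs is how those 
percentages can be verified - just a sketch, assuming the aggregate log 
format where the sixth column is the per-interval max latency in 
microseconds, and the default pgbench_log.* file names:

    # fraction of 1-second intervals whose max latency stays below 1 second
    awk '$6 < 1000000 { ok++ } END { printf "%.2f%%\n", 100*ok/NR }' \
        pgbench_log.*

Replacing the 1000000 with 10000000 gives the same check against the 
10-second threshold for the old code.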


regards


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


