Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1Km7smEJfFfXsHtNpdj2+jUqUk+6b91QQsomQEA0d4x=g@mail.gmail.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Mon, Aug 31, 2015 at 12:40 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Amit,
>
>> IBM POWER-8 24 cores, 192 hardware threads
>> RAM = 492GB
>
>
> Wow! Thanks for trying the patch on such high-end hardware!
>
> About the disks: what kind of HDD (RAID? speed?)? HDD write cache?
>

Speed of Reads -
Timing cached reads:   27790 MB in  1.98 seconds = 14001.86 MB/sec
Timing buffered disk reads: 3830 MB in  3.00 seconds = 1276.55 MB/sec

Copy speed - 

dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.30993 s, 1.6 GB/s
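For reference, the read timings above look like `hdparm -Tt` output; a sketch of how both measurements could be reproduced (the device path `/dev/sda` and file location are assumptions, and `conv=fdatasync` is added here so dd reports disk throughput rather than page-cache speed, which the run above, lacking it, mostly measured):

```shell
# Cached vs. buffered read timing (needs root; /dev/sda is a placeholder):
#   hdparm -Tt /dev/sda
# Sequential write throughput; fdatasync forces data to disk before dd
# reports a rate (a small count keeps the run quick):
dd if=/dev/zero of=/tmp/ddtest.img bs=8k count=4k conv=fdatasync 2>&1 | tail -n 1
rm -f /tmp/ddtest.img
```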


> What is the OS? The FS?
>

OS info -
Linux <m/c addr> 3.10.0-123.1.2.el7.ppc64 #1 SMP Wed Jun 4 15:23:17 EDT 2014 ppc64 ppc64 ppc64 GNU/Linux

FS - ext4


>> shared_buffers=8GB
>
>
> This is small wrt hardware, but given the scale setup I think that it should not matter much.
>

Yes, I was testing the case for Read-Write transactions when all the data
fits in shared_buffers, so this is okay.
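As a rough sanity check (the ~15 MB-per-scale-unit figure is an approximation for pgbench's tables, not an exact number):

```shell
# Approximate pgbench database size: each scale unit adds roughly 15 MB,
# dominated by pgbench_accounts (100,000 rows per unit).
SCALE=300
APPROX_DB_MB=$((SCALE * 15))        # ~4500 MB of data
SHARED_BUFFERS_MB=$((8 * 1024))     # shared_buffers = 8GB
echo "db ~${APPROX_DB_MB} MB vs shared_buffers ${SHARED_BUFFERS_MB} MB"
```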

>> max_wal_size=5GB
>
>
> Hmmm... Maybe quite small given the average performance?
>

We can check with a larger value, but do you expect different
results, and if so, why?

>> checkpoint_timeout=2min
>
>
> This seems rather small. Are the checkpoints xlog or time triggered?
>

I wanted to test by triggering more checkpoints, but I can test with a
larger checkpoint interval as well, like 5 or 10 minutes. Any suggestions?


> You did not update checkpoint_completion_target, which means 0.5, so that the checkpoint is scheduled to run in at most 1 minute, which suggests at least 130 MB/s write performance for the checkpoint.
>

Your script set checkpoint_completion_target to 0.8, and I have not
changed that during the tests.

>> parallelism - 128 clients, 128 threads
>
>
> Given 192 hw threads, I would have tried using 128 clients & 64 threads, so that each pgbench client has its own dedicated postgres in a thread, and the postgres processes are not competing with pgbench. Now as pgbench is mostly sleeping, probably that does not matter much... I may also be totally wrong:-)
>

In the next run, I can use 64 threads. Let's first settle on the other
parameters for which you expect a clear win with the first patch.
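Fabien's suggestion would translate into something like the following pgbench invocation (the duration, progress interval, query mode, and database name are placeholders, not values from the thread):

```shell
CLIENTS=128
THREADS=64   # one pgbench thread drives two clients, leaving hw threads
             # free for the postgres backends on this 192-thread machine
echo "pgbench -c ${CLIENTS} -j ${THREADS} -M prepared -T 600 -P 10 postgres"
```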

>
>
> Given the hardware, I would suggest to raise checkpoint_timeout, shared_buffers and max_wal_size, and use checkpoint_completion_target=0.8. I would expect that it should improve performance both with and without sorting.
>

I don't think increasing shared_buffers would have any impact, because
8GB is sufficient for scale factor 300 data, and checkpoint_completion_target
was already 0.8 in my previous tests.  Let's try checkpoint_timeout = 10min
and max_wal_size = 15GB; do you have any other suggestions?
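A sketch of the settings for the next run, restating the numbers from this thread (log_checkpoints is an addition suggested here, not something already in the test setup, so that checkpoint timing and trigger information gets recorded):

```
shared_buffers = 8GB                  # unchanged; scale 300 data fits
max_wal_size = 15GB
checkpoint_timeout = 10min
checkpoint_completion_target = 0.8
log_checkpoints = on                  # logs buffers written, duration, and
                                      # whether time- or xlog-triggered
```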

> It would be interesting to have informations from checkpoint logs (especially how many buffers written in how long, whether checkpoints are time or xlog triggered, ...).
>
>> The results of sorting patch for the tests done indicate that the win is not big enough with just doing sorting during checkpoints,
>
>
> ISTM that you do too much generalization: The win is not big "under this configuration and hardware".
>

Hmm, nothing like that; this was based on a couple of tests done by
me, and I am open to doing more if you or anybody else feels that the
first patch (checkpoint-continuous-flush-10-a) alone can give a benefit.
In fact, I started these tests with the intention of seeing whether the
first patch gives a benefit, so that it could be evaluated and eventually
committed separately.

> I think that the patch may have very small influence under some conditions, but should not degrade performance significantly, and on the other hand it should provide great improvements under some (other) conditions.
>

True, let us try to find conditions/scenarios where you think it can give
a big boost; suggestions are welcome.

>>
>> What if tablespaces are not on separate disks
>
>
> I would expect that it might very slightly degrade performance, but only marginally.
>
>
> If you want to be able to deactivate balancing, it could be done with a GUC, but I cannot see good reasons to want to do that: it would complicate the code, and it does not make much sense to use many tablespaces on one disk, while anyone who uses several tablespaces on several disks is probably expecting to see her expensive disks actually used in parallel.
>

I think we can leave this for the committer to take a call, or for anybody
else who has an opinion; there is nothing wrong with what you have done,
but I am not clear whether there is a real need for it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
