Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1Km7smEJfFfXsHtNpdj2+jUqUk+6b91QQsomQEA0d4x=g@mail.gmail.com
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Mon, Aug 31, 2015 at 12:40 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Amit,
>
>> IBM POWER-8 24 cores, 192 hardware threads
>> RAM = 492GB
>
>
> Wow! Thanks for trying the patch on such high-end hardware!
>
> About the disks: what kind of HDD (RAID? speed?)? HDD write cache?
>

Speed of Reads -
Timing cached reads:   27790 MB in  1.98 seconds = 14001.86 MB/sec
Timing buffered disk reads: 3830 MB in  3.00 seconds = 1276.55 MB/sec

Copy speed - 

dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 1.30993 s, 1.6 GB/s
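For reference, the read timings above look like `hdparm -Tt` output; a sketch of how both measurements could be reproduced (the device path `/dev/sda` and file location are assumptions, and `conv=fdatasync` is added here so dd reports disk throughput rather than page-cache speed, which the run above, lacking it, mostly measured):

```shell
# Cached vs. buffered read timing (needs root; /dev/sda is a placeholder):
#   hdparm -Tt /dev/sda
# Sequential write throughput; fdatasync forces data to disk before dd
# reports a rate (a small count keeps the run quick):
dd if=/dev/zero of=/tmp/ddtest.img bs=8k count=4k conv=fdatasync 2>&1 | tail -n 1
rm -f /tmp/ddtest.img
```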


> What is the OS? The FS?
>

OS info -
Linux <m/c addr> 3.10.0-123.1.2.el7.ppc64 #1 SMP Wed Jun 4 15:23:17 EDT 2014 ppc64 ppc64 ppc64 GNU/Linux

FS - ext4


>> shared_buffers=8GB
>
>
> This is small wrt hardware, but given the scale setup I think that it should not matter much.
>

Yes, I was testing the case for Read-Write transactions when all the data
fits in shared_buffers, so this is okay.
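As a rough sanity check (the ~15 MB-per-scale-unit figure is an approximation for pgbench's tables, not an exact number):

```shell
# Approximate pgbench database size: each scale unit adds roughly 15 MB,
# dominated by pgbench_accounts (100,000 rows per unit).
SCALE=300
APPROX_DB_MB=$((SCALE * 15))        # ~4500 MB of data
SHARED_BUFFERS_MB=$((8 * 1024))     # shared_buffers = 8GB
echo "db ~${APPROX_DB_MB} MB vs shared_buffers ${SHARED_BUFFERS_MB} MB"
```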

>> max_wal_size=5GB
>
>
> Hmmm... Maybe quite small given the average performance?
>

We can check with a larger value, but do you expect different
results, and if so, why?

>> checkpoint_timeout=2min
>
>
> This seems rather small. Are the checkpoints xlog or time triggered?
>

I wanted to test by triggering more checkpoints, but I can test with a
larger checkpoint interval as well, like 5 or 10 minutes. Any suggestions?


> You did not update checkpoint_completion_target, which means 0.5, so that the checkpoint is scheduled to run in at most 1 minute, which suggests at least 130 MB/s write performance for the checkpoint.
>

Your script set checkpoint_completion_target to 0.8, and I have not
changed that during the tests.

>> parallelism - 128 clients, 128 threads
>
>
> Given 192 hw threads, I would have tried using 128 clients & 64 threads, so that each pgbench client has its own dedicated postgres in a thread, and the postgres processes are not competing with pgbench. Now as pgbench is mostly sleeping, probably that does not matter much... I may also be totally wrong:-)
>

In the next run, I can use 64 threads. Let's first settle on the other
parameters for which you expect a clear win with the first patch.
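Fabien's suggestion would translate into something like the following pgbench invocation (the duration, progress interval, query mode, and database name are placeholders, not values from the thread):

```shell
CLIENTS=128
THREADS=64   # one pgbench thread drives two clients, leaving hw threads
             # free for the postgres backends on this 192-thread machine
echo "pgbench -c ${CLIENTS} -j ${THREADS} -M prepared -T 600 -P 10 postgres"
```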

>
>
> Given the hardware, I would suggest to raise checkpoint_timeout, shared_buffers and max_wal_size, and use checkpoint_completion_target=0.8. I would expect that it should improve performance both with and without sorting.
>

I don't think increasing shared_buffers would have any impact, because
8GB is sufficient for scale factor 300 data, and checkpoint_completion_target
was already 0.8 in my previous tests.  Let's try checkpoint_timeout = 10min
and max_wal_size = 15GB; do you have any other suggestions?
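A sketch of the settings for the next run, restating the numbers from this thread (log_checkpoints is an addition suggested here, not something already in the test setup, so that checkpoint timing and trigger information gets recorded):

```
shared_buffers = 8GB                  # unchanged; scale 300 data fits
max_wal_size = 15GB
checkpoint_timeout = 10min
checkpoint_completion_target = 0.8
log_checkpoints = on                  # logs buffers written, duration, and
                                      # whether time- or xlog-triggered
```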

> It would be interesting to have informations from checkpoint logs (especially how many buffers written in how long, whether checkpoints are time or xlog triggered, ...).
>
>> The results of sorting patch for the tests done indicate that the win is not big enough with just doing sorting during checkpoints,
>
>
> ISTM that you do too much generalization: The win is not big "under this configuration and hardware".
>

Hmm, nothing like that; this was based on a couple of tests done by
me, and I am open to doing more if you or anybody else feels that the
first patch (checkpoint-continuous-flush-10-a) alone can give a benefit.
In fact, I started these tests with the intention of seeing whether the
first patch gives a benefit, so that it could be evaluated and eventually
committed separately.

> I think that the patch may have very small influence under some conditions, but should not degrade performance significantly, and on the other hand it should provide great improvements under some (other) conditions.
>

True, let us try to find conditions/scenarios where you think it can give
a big boost; suggestions are welcome.

>>
>> What if tablespaces are not on separate disks
>
>
> I would expect that it might very slightly degrade performance, but only marginally.
>
>
> If you want to be able to deactivate balancing, it could be done with a GUC, but I cannot see good reasons to want to do that: it would complicate the code, and it does not make much sense to use many tablespaces on one disk, while anyone who uses several tablespaces on several disks is probably expecting to see her expensive disks actually used in parallel.
>

I think we can leave this for the committer to take a call, or for anybody
else who has an opinion; there is nothing wrong with what you have done,
but I am not clear whether there is a real need for it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
