Re: Partitioned checkpointing - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: Partitioned checkpointing
Date:
Msg-id: alpine.DEB.2.10.1509160850240.14201@sto
In response to: Re: Partitioned checkpointing (Takashi Horikawa <t-horikawa@aj.jp.nec.com>)
List: pgsql-hackers
Hello Takashi-san,

> I've noticed that the behavior in 'checkpoint_partitions = 1' is not
> the same as that of original 9.5alpha2. Attached
> 'partitioned-checkpointing-v3.patch' fixed the bug, thus please use it.

I've done two sets of runs on an old box (16 GB, 8 cores, RAID1 HDD) with
"pgbench -M prepared -N -j 2 -c 4 ..." and analysed the per-second traces
(-P 1; see the analysis sketch below) for 4 versions: sort+flush patch
fully on, sort+flush patch fully off (should be equivalent to head),
partition patch v3 with 1 partition (should also be equivalent to head),
and partition patch v3 with 16 partitions.

I ran two configurations:

 small:
   shared_buffers = 2GB
   checkpoint_timeout = 300s
   checkpoint_completion_target = 0.8
   pgbench's scale = 120, time = 4000

 medium:
   shared_buffers = 4GB
   max_wal_size = 5GB
   checkpoint_timeout = 30min
   checkpoint_completion_target = 0.8
   pgbench's scale = 300, time = 7500

* full speed run performance

  average tps +- standard deviation (percent of seconds below 10 tps)

                          small                medium
  1. flush+sort   :  751 +- 415 ( 0.2)   984 +- 500 ( 0.3)
  2. no flush/sort:  188 +- 508 (83.6)   252 +- 551 (77.0)
  3. 1 partition  :  183 +- 518 (85.6)   232 +- 535 (78.3)
  4. 16 partitions:  179 +- 462 (81.1)   196 +- 492 (80.9)

Although 2 & 3 should be equivalent, performance seems slightly lower with
1 partition, but it is pretty close and may not be significant.

The 16-partition runs show significantly lower tps, especially in the
medium case. The stddev is a little better in the small case, as suggested
by the lower off-line figure, but relatively higher in the medium case
(stddev = 2.5 * average). There is no comparison with flush & sort
activated.

* throttled performance (-R 100/200 -L 100)

  percent of late transactions, i.e. above 100 ms or not even started
  because the system is too far behind schedule

                     small-100   small-200   medium-100
  1. flush+sort   :     1.0         1.9         1.9
  2. no flush/sort:    31.5        49.8        27.1
  3. 1 partition  :    32.3        49.0        27.0
  4. 16 partitions:    32.9        48.0        31.5

2 & 3 seem pretty equivalent, as expected. The 16 partitions seem to
slightly degrade availability on average. Yet again, no comparison with
flush & sort activated.

From these runs, I would advise against applying the checkpoint
partitioning patch: there is no consistent benefit on the basic hardware
I'm using for this test. I think that makes sense, because fsyncing random
I/O several times instead of once has little impact. Once the I/O is no
longer random, that is with some kind of combined patch, it is another
question. I would rather go with Andres's suggestion to fsync once per
file, when writing to a file is completed, because partitioning as such
would reduce the effectiveness of sorting buffers.

I think it would be interesting if you could test the sort/flush patch on
the same high-end system that you used for testing your partition patch.

--
Fabien.
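A minimal sketch (not from the original mail) of how the -P 1 per-second
traces could be reduced to the "average tps +- stddev (percent of seconds
below 10 tps)" figures above. It assumes pgbench progress lines of the form
"progress: 12.0 s, 745.3 tps, ..." captured from stderr; the regexp and the
script name used below are illustrative, not taken from the thread.

  #!/usr/bin/env python3
  # Hypothetical trace reducer: read a pgbench -P 1 trace on stdin and
  # print "avg +- stddev (percent of seconds below 10 tps)".
  import re
  import statistics
  import sys

  # One tps sample per progress line, e.g. "progress: 12.0 s, 745.3 tps, ..."
  PROGRESS = re.compile(r'^progress:\s+[\d.]+\s+s,\s+([\d.]+)\s+tps')

  tps = []
  for line in sys.stdin:
      m = PROGRESS.match(line)
      if m:
          tps.append(float(m.group(1)))

  if not tps:
      sys.exit("no progress lines found")

  avg = statistics.mean(tps)
  std = statistics.pstdev(tps)  # stddev of the per-second tps samples
  low = 100.0 * sum(1 for t in tps if t < 10.0) / len(tps)

  # prints e.g. "751 +- 415 ( 0.2)", the format used in the tables above
  print("%.0f +- %.0f (%4.1f)" % (avg, std, low))

For instance, a run such as "pgbench -M prepared -N -j 2 -c 4 -P 1 ...
2> trace.log" followed by "python3 reduce_trace.py < trace.log" (file and
script names hypothetical) would print one such line per run.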