Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: WAL format and API changes (9.5) |
Date | |
Msg-id | 546A4D1D.4010207@vmware.com Whole thread Raw |
In response to | Re: WAL format and API changes (9.5) (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: WAL format and API changes (9.5)
(Heikki Linnakangas <hlinnakangas@vmware.com>)
|
List | pgsql-hackers |
On 11/13/2014 05:04 PM, Andres Freund wrote: > On 2014-11-13 15:33:44 +0200, Heikki Linnakangas wrote: >> That also seems to have more than bought back the performance >> regression I saw earlier. > > I think we really need to do the performance test with a different CRC > implementation. So far all the tests in this thread seem to be > correlating pretty linearly to the size of the record. Ok, I ran some tests comparing this patch, plus a quick & dirty patch to use the Intel CRC instruction for CRC calculation. The quick & dirty is attached. It's based on code you sent me over IM some weeks ago, and needs -msse4.2 to compile. The point of the CRC patch is to just "hide" the effect of simply reducing the WAL volume, to concentrate on the possible overhead on constructing and replaying the new format. WAL insertion performance ========================= To measure the performance of generating WAL, I ran the wal-update-testsuite.sh that Amit also ran earlier. The cluster was initialized with: shared_buffers=512MB checkpoint_segments=30 fsync=off autovacuum=off full_page_writes=off master + intel-crc.patch ------------------------ testname | wal_generated | duration -----------------------------------------+---------------+------------------ two short fields, no change | 388523496 | 5.98275089263916 two short fields, no change | 384400536 | 5.81247496604919 two short fields, no change | 394318384 | 5.83137106895447 two short fields, one changed | 424519456 | 5.94383692741394 two short fields, one changed | 427103960 | 6.0239098072052 two short fields, one changed | 431826936 | 5.99587488174438 two short fields, both changed | 424524232 | 6.24820590019226 two short fields, both changed | 424520616 | 6.27542090415955 two short fields, both changed | 429219024 | 6.24310803413391 one short and one long field, no change | 78215672 | 1.69482898712158 one short and one long field, no change | 79275744 | 1.70809984207153 one short and one long field, no change | 76444464 | 1.6900041103363 ten tiny fields, all changed | 468263328 | 7.33222103118896 ten tiny fields, all changed | 475685120 | 7.44711399078369 ten tiny fields, all changed | 466329808 | 7.44984602928162 hundred tiny fields, all changed | 181802520 | 3.48464608192444 hundred tiny fields, all changed | 169941080 | 3.57416915893555 hundred tiny fields, all changed | 170178472 | 3.51270413398743 hundred tiny fields, half changed | 184539560 | 3.50752401351929 hundred tiny fields, half changed | 184539488 | 3.51076292991638 hundred tiny fields, half changed | 182090680 | 3.50708913803101 hundred tiny fields, half nulled | 104020856 | 3.43046092987061 hundred tiny fields, half nulled | 100607288 | 3.2797839641571 hundred tiny fields, half nulled | 97194600 | 3.26747107505798 9 short and 1 long, short changed | 99243976 | 1.94056487083435 9 short and 1 long, short changed | 95552912 | 1.94364786148071 9 short and 1 long, short changed | 100634736 | 1.94284915924072 (27 rows) wal-format-and-api-changes-12.patch + crc-intel.patch ----------------------------------------------------- testname | wal_generated | duration -----------------------------------------+---------------+------------------ two short fields, no change | 344717720 | 6.01146483421326 two short fields, no change | 351999048 | 5.94603586196899 two short fields, no change | 359280088 | 5.92386198043823 two short fields, one changed | 349783736 | 5.98722791671753 two short fields, one changed | 357064632 | 5.95086097717285 two short fields, one changed | 347568640 | 5.92789196968079 two short fields, both changed | 388404496 | 5.9715678691864 two short fields, both changed | 389122792 | 5.99439883232117 two short fields, both changed | 389840936 | 5.98658204078674 one short and one long field, no change | 56844552 | 1.69335913658142 one short and one long field, no change | 67099160 | 1.66509008407593 one short and one long field, no change | 65557168 | 1.67773389816284 ten tiny fields, all changed | 387205416 | 7.0555100440979 ten tiny fields, all changed | 396668040 | 7.05817604064941 ten tiny fields, all changed | 389353592 | 6.98230600357056 hundred tiny fields, all changed | 163675416 | 3.4382529258728 hundred tiny fields, all changed | 173227296 | 3.4594509601593 hundred tiny fields, all changed | 172800520 | 3.44062900543213 hundred tiny fields, half changed | 174070216 | 3.51688480377197 hundred tiny fields, half changed | 169965336 | 3.47681999206543 hundred tiny fields, half changed | 175331128 | 3.45153713226318 hundred tiny fields, half nulled | 84784272 | 3.26745700836182 hundred tiny fields, half nulled | 83732928 | 3.22160506248474 hundred tiny fields, half nulled | 80339600 | 3.22326803207397 9 short and 1 long, short changed | 98087384 | 2.00620484352112 9 short and 1 long, short changed | 101273968 | 1.99480700492859 9 short and 1 long, short changed | 93123424 | 1.99506211280823 (27 rows) Summary: No meaningful difference in runtime. WAL replay performance ====================== To test WAL replay performance, I ran pgbench with WAL archiving enabled, and timed the replay of the generated WAL. I used the attached script, replay-perf-test.sh for that. full_page_writes were disabled, because replaying full page images is quite different from replaying other records. (Performance of full-page images is interesting too, but it's not expected that these WAL format changes make much difference to that). (This is the same test setup I used in 54611209.705@vmware.com.) I repeated the replay three times, and grepped for the "redo starts" and "redo done" messages in the log. The results: master + crc-intel.patch ------------------------ 2014-11-17 20:40:10.175 EET LOG: redo starts at 0/2000090 2014-11-17 20:41:04.327 EET LOG: redo done at 0/7A0000C8 2014-11-17 20:42:46.319 EET LOG: redo starts at 0/2000090 2014-11-17 20:43:37.973 EET LOG: redo done at 0/7A0000C8 2014-11-17 20:44:14.889 EET LOG: redo starts at 0/2000090 2014-11-17 20:45:07.161 EET LOG: redo done at 0/7A0000C8 wal-format-and-api-changes-12.patch + crc-intel.patch ----------------------------------------------------- 2014-11-17 20:20:54.198 EET LOG: redo starts at 0/2000090 2014-11-17 20:21:49.073 EET LOG: redo done at 0/6A0000C8 2014-11-17 20:23:09.438 EET LOG: redo starts at 0/2000090 2014-11-17 20:24:04.744 EET LOG: redo done at 0/6A0000C8 2014-11-17 20:25:24.076 EET LOG: redo starts at 0/2000090 2014-11-17 20:26:18.830 EET LOG: redo done at 0/6A0000C8 In summary, there is no significant difference in replay performance. The amount of WAL generated is much smaller with the patch. For completeness, attached is the version of the patch that I used for these tests. It's the same as patch version 11, but I fixed one trivial bug in a b-tree redo function. This concludes my performance testing, until someone wants to see some other scenario being tested. I'm happy with the results. - Heikki
Attachment
pgsql-hackers by date: