Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers
From | Heikki Linnakangas
Subject | Re: WAL format and API changes (9.5)
Msg-id | 5416DE72.7030005@vmware.com
In response to | Re: WAL format and API changes (9.5) (Michael Paquier <michael.paquier@gmail.com>)
List | pgsql-hackers |
On 09/04/2014 03:39 AM, Michael Paquier wrote:
> On Tue, Sep 2, 2014 at 9:23 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I committed the redo-routine refactoring patch. I kept the XLog prefix in
>> the XLogReadBufferForRedo name; it's redundant, but all the other similar
>> functions in xlogutils.c use the XLog prefix, so it would seem inconsistent
>> to not have it here.
> Thanks! Even that will be helpful for a potential patch doing
> consistency comparisons of FPWs with current pages after the WAL of a
> record has been applied.
>
>> I'll post a new version of the main patch shortly...
> Looking forward to seeing it.

Here we go. I've split this again into two patches. The first patch is just refactoring of the current code: it moves XLogInsert into a new file, xloginsert.c, and the definition of XLogRecord to a new xlogrecord.h header file. As a result, there is a lot of churn in the #includes of the C files that generate WAL records or contain redo routines. The number of files that pull in xlog.h - directly or indirectly through other headers - is greatly reduced.

The second patch contains the interesting changes.

I wrote a little benchmark kit to performance-test this. I'm trying to find out two things:

1) How much CPU overhead do the new XLogBeginInsert and XLogRegister* functions add, compared to the current approach of building XLogRecData chains by hand?

2) How much extra WAL is generated with the patch? This affects the CPU time spent in the tests, but it's also interesting to measure directly, because WAL size affects many other things, like WAL archiving, streaming replication, etc.

Attached is the test kit I'm using. To run the battery of tests, use "psql -f run.sql". To answer the WAL-volume question, it runs a bunch of tests that exercise heap insert, update and delete, as well as B-tree and GIN insertions. To answer the CPU-overhead question, it runs a heap insertion test with a tiny record size, chosen so that it generates exactly the same amount of WAL after alignment with and without the patch. The test is repeated many times, and the median of the runtimes is printed out.

Here are the results, comparing the unpatched and patched versions.
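To make it clearer what the two approaches being compared look like, here's a simplified sketch of how a heap-insert record is built today with a hand-assembled XLogRecData chain, versus the registration calls in the patch. This is an illustration only - the variable names (xlrec, heaptup, buffer) stand in for the usual locals in heap_insert(), and the exact function names, signatures and flags may differ in detail from what's in the attached patch:

/*
 * Illustration only - not meant to compile as-is (the two XLogInsert
 * signatures can't coexist). The real code for this example lives in
 * heap_insert() in src/backend/access/heap/heapam.c.
 */

/* Current style: the caller builds an XLogRecData chain by hand. */
XLogRecData rdata[2];
XLogRecPtr  recptr;

rdata[0].data = (char *) &xlrec;            /* fixed-size part of the record */
rdata[0].len = SizeOfHeapInsert;
rdata[0].buffer = InvalidBuffer;
rdata[0].next = &rdata[1];

rdata[1].data = (char *) heaptup->t_data;   /* tuple data, tied to the buffer */
rdata[1].len = heaptup->t_len;
rdata[1].buffer = buffer;                   /* may be replaced by a full-page image */
rdata[1].buffer_std = true;
rdata[1].next = NULL;

recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INSERT, rdata);

/* Patched style: XLogBeginInsert() and the XLogRegister* calls collect the
 * same pieces, and XLogInsert() assembles the record from them. */
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapInsert);
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) heaptup->t_data, heaptup->t_len);

recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INSERT);

The point of the new style is that the caller no longer lays out the block references and full-page-image handling itself; that is done centrally in xloginsert.c.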
First, the WAL sizes:

postgres=# \i compare.sql
          description           | wal_per_op (orig) | wal_per_op (patched) |   %
--------------------------------+-------------------+----------------------+--------
 heap insert 26                 |                64 |                   64 | 100.00
 heap insert 27                 |                64 |                   72 | 112.50
 heap insert 28                 |                64 |                   72 | 112.50
 heap insert 29                 |                64 |                   72 | 112.50
 heap insert 30                 |                72 |                   72 | 100.00
 heap insert 31                 |                72 |                   72 | 100.00
 heap insert 32                 |                72 |                   72 | 100.00
 heap insert 33                 |                72 |                   72 | 100.00
 heap insert 34                 |                72 |                   72 | 100.00
 heap insert 35                 |                72 |                   80 | 111.11
 heap update 26                 |                80 |                   80 | 100.00
 heap update 27                 |                80 |                   88 | 110.00
 heap update 28                 |               107 |                   88 |  82.24
 heap update 29                 |                88 |                   88 | 100.00
 heap update 30                 |                88 |                  108 | 122.73
 heap update 31                 |                88 |                   88 | 100.00
 heap update 32                 |               105 |                   88 |  83.81
 heap update 33                 |                88 |                   88 | 100.00
 heap update 34                 |                88 |                  102 | 115.91
 heap update 35                 |                88 |                   96 | 109.09
 hot update 26                  |               112 |                   80 |  71.43
 hot update 27                  |                80 |                   88 | 110.00
 hot update 28                  |                80 |                   94 | 117.50
 hot update 29                  |                88 |                   88 | 100.00
 hot update 30                  |               105 |                   88 |  83.81
 hot update 31                  |                88 |                  105 | 119.32
 hot update 32                  |                88 |                   88 | 100.00
 hot update 33                  |                88 |                   88 | 100.00
 hot update 34                  |               124 |                   88 |  70.97
 hot update 35                  |                88 |                  111 | 126.14
 heap + btree insert 26         |               149 |                  157 | 105.37
 heap + btree insert 27         |               161 |                  161 | 100.00
 heap + btree insert 28         |               177 |                  178 | 100.56
 heap + btree insert 29         |               177 |                  185 | 104.52
 heap + btree insert 30         |               178 |                  185 | 103.93
 heap + btree insert 31         |               185 |                  188 | 101.62
 heap + btree insert 32         |               202 |                  202 | 100.00
 heap + btree insert 33         |               205 |                  211 | 102.93
 heap + btree insert 34         |               202 |                  210 | 103.96
 heap + btree insert 35         |               211 |                  210 |  99.53
 heap + gin insert (fastupdate) |             12479 |                13182 | 105.63
 heap + gin insert              |            232547 |               236677 | 101.78
(42 rows)

Heap insertion records are 2 bytes larger with the patch. Due to alignment, that makes for a 0 or 8 byte difference in the record size. Other WAL records show a similar story: a few extra bytes, but no big regressions. There are a few outliers above where it appears that the patched version takes less space. I'm not sure why that would be; probably just a glitch in the test - autovacuum kicking in or something.

Now, for the CPU overhead:

  description   | dur_us (orig) | dur_us (patched) |   %
----------------+---------------+------------------+--------
 heap insert 30 |     0.7752835 |         0.831883 | 107.30
(1 row)

So, the patched version runs 7.3% slower. That's disappointing :-(.

That's the result I got on my laptop today; previously, the typical result has been about 5%, so today's number is a bit high. Nevertheless, even a 5% slowdown is probably not acceptable.

While trying to nail down where that difference comes from, I've seen a lot of strange phenomena. At one point, the patched version was 10% slower, but I was able to bring the difference down to 5% by adding a certain function to xloginsert.c without ever calling it. That was very repeatable at the time - I tried adding and removing the function many times and always got the same result - but I don't see the effect with the current HEAD and patch versions anymore. So I think 5% is pretty close to the margin of error that arises from different compiler optimizations, data/instruction cache effects, etc.

Looking at the 'perf' profile, the new function calls only amount to about 2% of the overhead, so I'm not sure where the slowdown is coming from.
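As an aside on the WAL-size numbers above: the 0-or-8-byte behaviour is just MAXALIGN rounding, since each WAL record is padded to an 8-byte boundary on typical 64-bit platforms. A tiny standalone sketch of the arithmetic, using a hand-rolled stand-in for the real MAXALIGN macro in c.h and assuming 8-byte maximum alignment:

#include <stdio.h>

/* Round len up to the next multiple of 8, like PostgreSQL's MAXALIGN does
 * on a platform with 8-byte maximum alignment. */
#define MAXALIGN8(len) (((len) + 7) & ~(size_t) 7)

int
main(void)
{
    /* A 62-byte record and a 64-byte one both align to 64, so adding 2
     * bytes costs nothing; at 63 bytes the 2-byte-larger record spills
     * into the next 8-byte bucket and costs a full 8 bytes. */
    size_t  sizes[] = {62, 63};

    for (int i = 0; i < 2; i++)
        printf("unaligned %zu -> aligned %zu; +2 bytes -> aligned %zu\n",
               sizes[i], MAXALIGN8(sizes[i]), MAXALIGN8(sizes[i] + 2));
    return 0;
}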
Here are the explanations I've considered, but I haven't been able to prove any of them:

* Function call overhead of the new functions. I've tried inlining them, but found no big difference.

* The relation and block information are included as a separate XLogRecData entry, so the chain that needs to be memcpy'd and CRC'd is one entry longer (a rough sketch of that chain walk is at the end of this mail). I've tried hacking away the extra entry, but haven't seen much difference.

* Even though the record size is the same after alignment, it's 2 bytes longer before alignment, which happens to be about 5% of the total record size. I've tried modifying the record to be 2 bytes smaller for test purposes, but found no difference.

I'm out of ideas at the moment. Anyone else?

- Heikki
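To illustrate the second point above, this is roughly the loop in today's XLogInsert that walks the XLogRecData chain to compute the record CRC; a later pass copies the same entries into the WAL buffers. It's heavily simplified from xlog.c (backup-block handling is omitted), so treat it as a sketch rather than the exact code:

/*
 * Sketch of the per-record CRC pass over the XLogRecData chain in the
 * current code. One extra chain entry per record means one more iteration
 * here and one more memcpy later, which is one of the suspects for the
 * slowdown.
 */
pg_crc32        rdata_crc;
XLogRecData    *rdt;

INIT_CRC32(rdata_crc);
for (rdt = rdata; rdt != NULL; rdt = rdt->next)
    COMP_CRC32(rdata_crc, rdt->data, rdt->len);
/* ... the record header is then added to the CRC, and a second pass
 * memcpy's the chain into the WAL buffers ... */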