Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WAL format and API changes (9.5)
Msg-id 5464B338.8070805@vmware.com
In response to Re: WAL format and API changes (9.5)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: WAL format and API changes (9.5)  (Andres Freund <andres@2ndquadrant.com>)
Re: WAL format and API changes (9.5)  (Amit Kapila <amit.kapila16@gmail.com>)
Re: WAL format and API changes (9.5)  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 11/11/2014 04:42 PM, Amit Kapila wrote:
> I have done some performance testing of this patch using attached
> script and data is as below:
>
> ...
>
> It seems to me that there is a regression of (4 ~ 8%) for small records,
> refer two short fields tests.

Thanks for the testing!

Here's a new version, with big changes again to the record format. Have
a look at xlogrecord.h for the details, but in a nutshell:

1. The overall format is now: XLogRecord, per-block headers, header for
main data portion, per-block data, main data.

2. I removed the xl_len field from XLogRecord and rearranged the fields,
to shrink the XLogRecord struct from 32 to 24 bytes. (Instead, there's a
new 2- or 5-byte header for the "main data", after the block headers.)

3. No alignment padding. (The data chunks are copied to aligned buffers
at replay, so redo functions can still assume aligned access.)
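
To make the size arithmetic in point 2 concrete, here is a minimal sketch in Python. It assumes the main-data header is a one-byte marker followed by either a one-byte or a four-byte length, which is one plausible way to arrive at "2 or 5 bytes". The marker values and function names are hypothetical; the actual definitions are in xlogrecord.h in the patch.

```python
import struct

# Hypothetical marker bytes (assumption); see xlogrecord.h for the real ones.
MAIN_DATA_SHORT = 0xFF
MAIN_DATA_LONG = 0xFE

OLD_XLOGRECORD_SIZE = 32  # struct size before the patch, per the text above
NEW_XLOGRECORD_SIZE = 24  # struct size after removing xl_len


def encode_main_data_header(length: int) -> bytes:
    """Return a 2-byte header for short main data, 5 bytes otherwise."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 0xFF:
        # 1-byte marker + 1-byte length, no padding
        return struct.pack("<BB", MAIN_DATA_SHORT, length)
    # 1-byte marker + 4-byte length, no padding
    return struct.pack("<BI", MAIN_DATA_LONG, length)


def new_fixed_overhead(main_data_len: int) -> int:
    """Fixed header bytes for a record with no block references."""
    return NEW_XLOGRECORD_SIZE + len(encode_main_data_header(main_data_len))
```

Under these assumptions a record with a small main-data payload and no block references carries 26 bytes of fixed header instead of 32, before counting the padding that point 3 removes.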

In quick testing, this new WAL format is somewhat more compact than the
9.4 format. That also seems to have more than bought back the
performance regression I saw earlier. Here are results from my laptop,
using the wal-update-testsuite.sh script:

master:

                 testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
  two short fields, no change             |     396982984 | 7.73713994026184
  two short fields, no change             |     398531152 | 7.72360110282898
  two short fields, no change             |     397228552 | 7.90237998962402
  two short fields, one changed           |     437108464 | 8.03014206886292
  two short fields, one changed           |     438368456 | 8.17672896385193
  two short fields, one changed           |     437105232 | 7.89896702766418
  two short fields, both changed          |     437100544 | 7.98763203620911
  two short fields, both changed          |     437107032 |  8.0971851348877
  two short fields, both changed          |     437105368 |  8.1279079914093
  one short and one long field, no change |      76552752 | 2.47367906570435
  one short and one long field, no change |      76043608 | 2.54243588447571
  one short and one long field, no change |      76042576 |  2.6014678478241
  ten tiny fields, all changed            |     477221488 | 9.41646003723145
  ten tiny fields, all changed            |     477224080 | 9.37260103225708
  ten tiny fields, all changed            |     477220944 | 9.41951704025269
  hundred tiny fields, all changed        |     180889992 | 4.72576093673706
  hundred tiny fields, all changed        |     180348224 | 4.50496411323547
  hundred tiny fields, all changed        |     181347504 | 4.78004717826843
  hundred tiny fields, half changed       |     180379760 | 4.53589606285095
  hundred tiny fields, half changed       |     181773832 | 4.85075807571411
  hundred tiny fields, half changed       |     180348160 | 4.65349197387695
  hundred tiny fields, half nulled        |     100114832 | 3.70726609230042
  hundred tiny fields, half nulled        |     100116840 | 3.88224697113037
  hundred tiny fields, half nulled        |     100118848 | 4.00612688064575
  9 short and 1 long, short changed       |     108140640 | 2.63146805763245
  9 short and 1 long, short changed       |     108508784 | 2.76349496841431
  9 short and 1 long, short changed       |     108137144 | 2.79056811332703
(27 rows)

wal-format-and-api-changes-9.patch:

                 testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
  two short fields, no change             |     356865216 | 6.81889986991882
  two short fields, no change             |     356871304 |  7.0333080291748
  two short fields, no change             |     356869520 | 6.62423706054688
  two short fields, one changed           |     356867824 | 7.09969711303711
  two short fields, one changed           |     356866480 | 7.07576990127563
  two short fields, one changed           |     357987080 | 7.25394797325134
  two short fields, both changed          |     396996096 | 7.13484597206116
  two short fields, both changed          |     396990184 | 7.08063006401062
  two short fields, both changed          |     396987192 | 7.04641604423523
  one short and one long field, no change |      70858376 |  2.2726149559021
  one short and one long field, no change |      68024232 | 2.21982789039612
  one short and one long field, no change |      69258192 |  2.4696249961853
  ten tiny fields, all changed            |     396987896 | 8.25723004341125
  ten tiny fields, all changed            |     396983768 | 8.24221706390381
  ten tiny fields, all changed            |     397012600 | 8.60816693305969
  hundred tiny fields, all changed        |     172327416 | 4.57576704025269
  hundred tiny fields, all changed        |     174669320 | 4.52080512046814
  hundred tiny fields, all changed        |     172696944 | 4.65672993659973
  hundred tiny fields, half changed       |     172323720 | 4.57278800010681
  hundred tiny fields, half changed       |     172330232 | 4.63164114952087
  hundred tiny fields, half changed       |     172326864 | 4.74219608306885
  hundred tiny fields, half nulled        |      85597408 | 3.78670310974121
  hundred tiny fields, half nulled        |      84742808 | 3.82968688011169
  hundred tiny fields, half nulled        |      84066936 | 3.86192607879639
  9 short and 1 long, short changed       |     100113080 | 2.54274320602417
  9 short and 1 long, short changed       |     100119440 |  2.4966151714325
  9 short and 1 long, short changed       |     100115960 | 2.63230085372925
(27 rows)
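
For readers who want the two tables side by side, the following Python sketch averages the three runs of each "two short fields" test (the case where the regression was reported) and prints the percentage reduction in WAL volume and duration. The numbers are copied from the tables above; the helper names are illustrative only, and the remaining tests can be added to the dictionaries in the same way.

```python
from statistics import mean

# (wal_generated in bytes, duration in seconds) per run, from the tables above.
MASTER = {
    "two short fields, no change": [
        (396982984, 7.73713994026184),
        (398531152, 7.72360110282898),
        (397228552, 7.90237998962402),
    ],
    "two short fields, one changed": [
        (437108464, 8.03014206886292),
        (438368456, 8.17672896385193),
        (437105232, 7.89896702766418),
    ],
    "two short fields, both changed": [
        (437100544, 7.98763203620911),
        (437107032, 8.0971851348877),
        (437105368, 8.1279079914093),
    ],
}
PATCHED = {
    "two short fields, no change": [
        (356865216, 6.81889986991882),
        (356871304, 7.0333080291748),
        (356869520, 6.62423706054688),
    ],
    "two short fields, one changed": [
        (356867824, 7.09969711303711),
        (356866480, 7.07576990127563),
        (357987080, 7.25394797325134),
    ],
    "two short fields, both changed": [
        (396996096, 7.13484597206116),
        (396990184, 7.08063006401062),
        (396987192, 7.04641604423523),
    ],
}


def percent_reduction(before: float, after: float) -> float:
    """Positive result means the patched build used less."""
    return 100.0 * (before - after) / before


def compare(master: dict, patched: dict) -> dict:
    """Map test name -> (WAL reduction %, duration reduction %)."""
    result = {}
    for name, runs in master.items():
        wal_m = mean(w for w, _ in runs)
        dur_m = mean(d for _, d in runs)
        wal_p = mean(w for w, _ in patched[name])
        dur_p = mean(d for _, d in patched[name])
        result[name] = (
            percent_reduction(wal_m, wal_p),
            percent_reduction(dur_m, dur_p),
        )
    return result


for test, (wal_pct, dur_pct) in compare(MASTER, PATCHED).items():
    print(f"{test:32s} WAL -{wal_pct:.1f}%  duration -{dur_pct:.1f}%")
```

With only three runs per test on a laptop, the duration figures should be read as rough indications rather than precise measurements.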

Aside from the WAL record format changes, this patch adds the "decoded
WAL record" infrastructure that we talked about with Andres. XLogReader
now has a new function, DecodeXLogRecord, which parses the block headers
etc. from the WAL record, and copies the data chunks to aligned buffers.
The redo routines are passed a pointer to the XLogReaderState, instead
of the plain XLogRecord, and the redo routines can use macros and
functions defined in xlogreader.h to access the already-decoded WAL record.
The new WAL record format is difficult to parse in a piecemeal fashion,
so it really needs this separate decoding pass to be efficient.
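
The following Python sketch illustrates why a separate decoding pass is natural for a layout where all headers come first and all data chunks follow. It uses a deliberately simplified toy format, not the one in the patch: it omits the fixed XLogRecord header, gives each block header only a one-byte id and a two-byte data length, and uses hypothetical marker bytes for the main-data header. All names here are assumptions made for illustration; the real decoder is DecodeXLogRecord in C.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical markers for the main-data header (assumption, toy format).
MAIN_SHORT = 0xFF
MAIN_LONG = 0xFE


@dataclass
class DecodedRecord:
    block_data: Dict[int, bytes] = field(default_factory=dict)
    main_data: bytes = b""


def encode(blocks: Dict[int, bytes], main: bytes) -> bytes:
    """Toy layout: block headers, main-data header, block data, main data."""
    out = bytearray()
    for block_id, data in blocks.items():
        if not 0 <= block_id < MAIN_LONG:
            raise ValueError("block id collides with a marker byte")
        out += struct.pack("<BH", block_id, len(data))
    if len(main) <= 0xFF:
        out += struct.pack("<BB", MAIN_SHORT, len(main))
    else:
        out += struct.pack("<BI", MAIN_LONG, len(main))
    for data in blocks.values():
        out += data
    out += main
    return bytes(out)


def decode(buf: bytes) -> DecodedRecord:
    """Single pass: read every header first, then slice out the data chunks."""
    pos = 0
    block_lens = []
    while True:
        tag = buf[pos]
        if tag == MAIN_SHORT:
            main_len = buf[pos + 1]
            pos += 2
            break
        if tag == MAIN_LONG:
            (main_len,) = struct.unpack_from("<I", buf, pos + 1)
            pos += 5
            break
        block_id, data_len = struct.unpack_from("<BH", buf, pos)
        block_lens.append((block_id, data_len))
        pos += 3
    rec = DecodedRecord()
    # A chunk's offset depends on the lengths in all preceding headers, so
    # nothing can be located until the header section has been read in full.
    for block_id, data_len in block_lens:
        rec.block_data[block_id] = bytes(buf[pos:pos + data_len])  # own copy
        pos += data_len
    rec.main_data = bytes(buf[pos:pos + main_len])
    pos += main_len
    if pos != len(buf):
        raise ValueError("trailing or missing bytes in record")
    return rec
```

In this toy format, finding the data for any one block means walking the whole header section anyway, so doing that once up front and handing the result to the consumer is cheaper than repeating it on each access.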

Thoughts on this new WAL record format? I've attached the xlogrecord.h
file here separately for easy reading, if you want to take a quick look
at just that without applying the whole patch.

- Heikki

