Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: WAL format and API changes (9.5)
Date
Msg-id CAB7nPqR-TUdK+-Uc5-Aodjpdxw3=-hjeV7y-1nKj+qo+O1WQ+A@mail.gmail.com
Whole thread Raw
In response to Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: WAL format and API changes (9.5)  (Michael Paquier <michael.paquier@gmail.com>)
Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Thu, Nov 13, 2014 at 10:33 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> In quick testing, this new WAL format is somewhat more compact than the 9.4
> format. That also seems to have more than bought back the performance
> regression I saw earlier. Here are results from my laptop, using the
> wal-update-testsuite.sh script:
> master:
> [results]
> (27 rows)
> wal-format-and-api-changes-9.patch:
> [results]
> (27 rows)
So based on your series of tests, you are saving 6% up to 10%.
That's... Really cool!

> Aside from the WAL record format changes, this patch adds the "decoded WAL
> record" infrastructure that we talked about with Andres. XLogReader now has
> a new function, DecodeXLogRecord, which parses the block headers etc. from
> the WAL record, and copies the data chunks to aligned buffers. The redo
> routines are passed a pointer to the XLogReaderState, instead of the plain
> XLogRecord, and the redo routines can use macros and functions defined
> xlogreader.h to access the already-decoded WAL record. The new WAL record
> format is difficult to parse in a piece-meal fashion, so it really needs
> this separate decoding pass to be efficient.
>
> Thoughts on this new WAL record format? I've attached the xlogrecord.h file
> here separately for easy reading, if you want to take a quick look at just
> that without applying the whole patch.
The new format is neat, and that's a lot of code.. Grouping together
the blocks of data is a good thing for the FPW compression patch as
well.

Note that this patch conflicts with the recent commits 81c4508 and
34402ae. installcheck-world is passing, and standbys are able to
replay records, at least without crashes.

Here are some more comments:
1) with assertions enabled this does not compile because of a small
typo in xlogreader.h here:
+#define XLogRecHasAnyBlockRefs(decoder) ((decoder)->max_block id >= 0)
2) In xlogreader.c, XLogRecGetBlockData should return char * but a
boolean is returned here:
+XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len)
+{
+       DecodedBkpBlock *bkpb;
+
+       if (!record->blocks[block_id].in_use)
+               return false;
As no blocks are in use in this case this should be NULL.
3) pg_xlogdump does not seem to work:
$ pg_xlogdump 00000001000000000000000D
pg_xlogdump: FATAL:  could not find a valid record after 0/D000000
4) A couple of NOT_USED blocks could be removed, no?
+#ifdef NOT_USED       BlockNumber leftChildBlkno = InvalidBlockNumber;
+#endif
5) Here why not using the 2nd block instead of the 3rd (@_bt_getroot)?
+                       XLogBeginInsert();
+                       XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
+                       XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
6) There are FIXME blocks:
+               // FIXME
+               //if ((bkpb->fork_flags & BKPBLOCK_WILL_INIT) != 0 &&
mode != RBM_ZERO)
+               //      elog(PANIC, "block with WILL_INIT flag in WAL
record must be zeroed by redo routine");]
And that:
+               /* FIXME: translation? Although this shouldn't happen.. */
+               ereport(ERROR,
+                               (errmsg("error decoding WAL record"),
+                                errdetail("%s", errormsg)));
Regards,
-- 
Michael



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Change in HEAP_NEWPAGE logging makes diagnosis harder
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: alter user/role CURRENT_USER