Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: WAL format and API changes (9.5)
Date
Msg-id CAB7nPqQreciciXVLpQ8Sk_HZM4_6twEtb79TCXCbK00CF423Eg@mail.gmail.com
Whole thread Raw
In response to Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: WAL format and API changes (9.5)
List pgsql-hackers


On Mon, Aug 18, 2014 at 10:55 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> All rightey.
>
> Here's the next version of this work. It now comes as two patches. The first
> one refactors adds the XLogReplayBuffer() function and refactors all the
> redo functions to use it. It doesn't change the WAL record format in any
> way. The second patch applies over the first one, and changes the WAL
> format, and all the WAL-creating functions to use the new API for
> constructing WAL records. The second patch removes the relfilenode and block
> number arguments from XLogReplayBuffer, because they're no longer needed
> when that information is included in the record format.
>
> Todo:
>
> * Performance testing. Do the new WAL construction functions add overhead to
> WAL insertion?
> * Compare WAL record sizes before and after. I've tried to keep it as
> compact as possible, but I know that some records have gotten bigger. Need
> to do a more thorough analysis.
> * Rename XLogReplayBuffer. I don't particularly like any of the suggestions
> so far, XLogReplayBuffer included. Discussion continues..
> * Anything else?

Patch 1 looks good. The main difference with the v1 submitted a couple of days back is that the global variable approach is replaced with additional arguments for the LSN position and record pointer in XLogReplayBuffer. I have as well run a couple of tests with the page comparison tool, done some tests based on installcheck-world with a slave replaying WAL behind, and found no issues with it.
Perhaps we could consider pushing it to facilitate the next work? Even if the second patch is dropped it is still a win IMO to have backup block replay managed within a single function (being it named XLogReplayBuffer in latest patch), and having it return a status flag.

Regarding patch 2:
- The main differences with the latest version are the modifications for XLogReplayBuffer having new arguments (LSN position and record pointer). XLogRecGetBlockRefIds has been changed to return a palloc'd array of block IDs. xloginsert.h, containing all the functions for xlog record construction is introduced as well.
- Tiny thing, be aware of tab padding. Here is heapam.c:
                        page = BufferGetPage(buffer);
                        PageSetAllVisible(page);
                        MarkBufferDirty(buffer);
- XLogRecGetBlockRefIds is not described in src/backend/access/transam/README. Btw, pg_xlogdump drops a core dump when using it:
--Output:
Assertion failed: (n == *num_refs), function XLogRecGetBlockRefIds, file xlogreader.c, line 1157.
rmgr: Heap        len (rec/tot):     14/ 12912, tx:          3, lsn: 0/01000148, prev 0/01000120, Abort trap: 6 (core dumped)
-- Backtrace:
    frame #4: 0x0000000103870363 pg_xlogdump`XLogRecGetBlockRefIds(record=0x00007ff38a003200, num_refs=0x00007fff5c394464) + 435 at xlogreader.c:1157
    frame #5: 0x000000010386d610 pg_xlogdump`XLogDumpDisplayRecord(config=0x00007fff5c3945c8, ReadRecPtr=16777544, record=0x00007ff38a003200) + 272 at pg_xlogdump.c:357
    frame #6: 0x000000010386cad8 pg_xlogdump`main(argc=2, argv=0x00007fff5c394658) + 3160 at pg_xlogdump.c:749
In order to reproduce that, simply run regression tests, followed by pg_xlogdump on one of the WAL files generated.
- This patch includes some diffs from pg_receivexlog.c taken from 52bffe3.
- I have run installcheck-world and compared the size of the WAL generated on HEAD and the patch (any hints to improve such analysis are of course welcome)
      name      |   start   |    stop    |   diff  
----------------+-----------+------------+-----------
 HEAD (8605bc7) | 0/16C6808 | 0/11A2C670 | 271998568
 Patch 1+2      | 0/16D45D8 | 0/1267A4B0 | 284843736
(2 rows)
So that's a diff of more or less 13MB for this test set.

Looking forward for some performance numbers as well as more precise comparison of WAL record length.
--
Michael

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: 9.5: Memory-bounded HashAgg
Next
From: Jeff Davis
Date:
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg