Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: WAL format and API changes (9.5)
Date
Msg-id CAB7nPqQO57-nEXgT2zkvmsYwXov9kzVzEBio05=SSH_ghe8+=Q@mail.gmail.com
Whole thread Raw
In response to Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers



On Sun, Jun 15, 2014 at 6:11 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 06/14/2014 09:43 PM, Alvaro Herrera wrote:
Heikki Linnakangas wrote:

Here's a new version, rebased against master. No other changes.

This is missing xlogrecord.h ...

Oops, here you are.
Looking at this patch, it does not compile when assertions are enabled because of this assertion in heapam.c:
Assert(recdata == data + datalen);
It seems like a thinko as removing it makes the code compile.

Here is a review of this patch:
- Compilation without warnings, regression tests pass
- That's not a small patch, but the changes mainly done are xlog record insertion in accordance to the new API format.
$ git diff master --stat | tail -n1
52 files changed, 3601 insertions(+), 4371 deletions(-)

Some comments about the code:
1) In src/backend/access/transam/README:
's/no further action is no required/no further action is required/g'
2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and XLogGetBlockRefIds are missing in src/backend/access/transam/README.
3) There are a couple of code blocks marked as FIXME and BROKEN. There are as well some NOT_USED blocks.
4) Current patch crashes when running make check in contrib/test_decoding. Looking closer, wal_level = logical is broken:
=# show wal_level;
 wal_level
-----------
 logical
(1 row)
=# create database foo;
The connection to the server was lost. Attempting reset: Failed.

Looking even closer, log_heap_new_cid is broken:
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff9369db1a libsystem_c.dylib`abort + 125
    frame #3: 0x0000000104428819 postgres`ExceptionalCondition(conditionName=0x00000001044a711c, errorType=0x000000010448b19f, fileName=0x00000001044a2cfd, lineNumber=6879) + 137 at assert.c:54
    frame #4: 0x0000000103f08dbe postgres`log_heap_new_cid(relation=0x00007fae4482afb8, tup=0x00007fae438062e8) + 126 at heapam.c:6879
    frame #5: 0x0000000103f080cf postgres`heap_insert(relation=0x00007fae4482afb8, tup=0x00007fae438062e8, cid=0, options=0, bistate=0x0000000000000000) + 383 at heapam.c:2095
    frame #6: 0x0000000103f09b6a postgres`simple_heap_insert(relation=0x00007fae4482afb8, tup=0x00007fae438062e8) + 74 at heapam.c:2533
    frame #7: 0x000000010406aacf postgres`createdb(stmt=0x00007fae44843690) + 6335 at dbcommands.c:511
    frame #8: 0x000000010429def8 postgres`standard_ProcessUtility(parsetree=0x00007fae44843690, queryString=0x00007fae44842c38, context=PROCESS_UTILITY_TOPLEVEL, params=0x0000000000000000, dest=0x00007fae44843a18, completionTag=0x00007fff5bd4ee30) + 1720 at utility.c:56

5) Yes,there are some WAL records that have only data related to buffers, XLOG_GIN_VACUUM_DATA_LEAF_PAGE being one, with nothing registered with buffer_id = -d, but using a dummy entry seems counter-productive:
+       rdata = rdata_head;
+       if (rdata == NULL)
+       {
+               /*
+                * the rest of this function assumes that there is at least some
+                * rdata entries, so fake an empty rdata entry
+                */
+               dummy_rdata.data = NULL;
+               dummy_rdata.len = 0;
+               dummy_rdata.next = NULL;
+               rdata = &dummy_rdata;
+       }
Could this be removed and replaced by a couple of checks on rdata being NULL in some cases? XLogInsert should be updated in consequence.
6) XLogRegisterData creates a XLogRecData entry and appends it in one of the static XLogRecData entries if buffer_id >= 0 or to rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the WAL record). XLogRegisterXLogRecData does something similar, but with an entry of XLogRecData already built. What do you think about removing XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes the interface simpler, and XLogRecData could be kept private in xlog.c. This would need more work especially on gin side where I am seeing for example constructLeafRecompressWALData directly building a XLogRecData entry.
By doing so, we could as remove the while loop at the bottom of XLogRegisterXLogRecData as we would insert in the tail only the latest record created, aka replacing that:
while ((*tail)->next)
        *tail = (*tail)->next;
By simply that:
*tail = (*tail)->next;
7) New structures at the top of xlog.c should have more comments about their role, particularly rdata_head and rdata_tail that store main WAL data (id of -1)
8) What do you think about adding in the README a description about how to retrieve the block list modified by a WAL record, for a flow similar to that:
record = XLogReadRecord(...);
nblocks = XLogGetBlockRefIds(record, blockids);
for (i = 0; i < nblocks; i++)
{
    bkpb = XLogGetBlockRef(record, id, NULL);
    // stuff
}
pg_xlogdump is a good example of what can be done as well.
9) In ginxlog.c, shouldn't this block of code use XLogGetPayload?
+               char       *payload = XLogRecGetData(record) + sizeof(ginxlogInsert);
+#ifdef NOT_USED
                leftChildBlkno = BlockIdGetBlockNumber((BlockId) payload);
+#endif
10) Shouldn't the call to XLogBeginInsert come first here?
@@ -1646,7 +1647,16 @@ spgAddNodeAction(Relation index, SpGistState *state,
                {
                        XLogRecPtr      recptr;

-                       recptr = XLogInsert(RM_SPGIST_ID, XLOG_SPGIST_ADD_NODE, rdata);
+                       XLogRegisterBuffer(0, saveCurrent.buffer, REGBUF_STANDARD);     /* orig page */
+                       XLogRegisterBuffer(1, current->buffer, REGBUF_STANDARD);                /* new page */
+                       if (xlrec.parentBlk == 2)
+                               XLogRegisterBuffer(2, parent->buffer, REGBUF_STANDARD);
+
+                       XLogBeginInsert();
+                       XLogRegisterData(-1, (char *) &xlrec, sizeof(xlrec));
+                       XLogRegisterData(-1, (char *) newInnerTuple, newInnerTuple->size);
+
+                       recptr = XLogInsert(RM_SPGIST_ID, XLOG_SPGIST_ADD_NODE);
11) Any reason for raising a PANIC here?
@@ -1126,7 +1126,7 @@ LWLockRelease(LWLock *l)
                        break;
        }
        if (i < 0)
-               elog(ERROR, "lock %s %d is not held", T_NAME(l), T_ID(l));
+               elog(PANIC, "lock %s %d is not held", T_NAME(l), T_ID(l));
        num_held_lwlocks--;
        for (; i < num_held_lwlocks; i++)
14) SPgist redo is crashing (during make installcheck run on a master, redo done on a standby at recovery).
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff9369db1a libsystem_c.dylib`abort + 125
    frame #3: 0x000000010ba8d819 postgres`ExceptionalCondition(conditionName=0x000000010bb21ea9, errorType=0x000000010baf019f, fileName=0x000000010bb212dd, lineNumber=65) + 137 at assert.c:54
    frame #4: 0x000000010b5c3a50 postgres`addOrReplaceTuple(page=0x000000010bfee980, tuple=0x00007ffd41822902, size=770059, offset=18) + 688 at spgxlog.c:65
    frame #5: 0x000000010b5c031f postgres`spgRedoMoveLeafs(lsn=71708608, record=0x00007ffd41822800) + 719 at spgxlog.c:263
    frame #6: 0x000000010b5bf492 postgres`spg_redo(lsn=71708608, record=0x00007ffd41822800) + 402 at spgxlog.c:988
    frame #7: 0x000000010b5e774a postgres`StartupXLOG + 8474 at xlog.c:6780
    frame #8: 0x000000010b86a5fe postgres`StartupProcessMain + 430 at startup.c:224
    frame #9: 0x000000010b600409 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007fff546ea190) + 1897 at bootstrap.c:416
    frame #10: 0x000000010b864998 postgres`StartChildProcess(type=StartupProcess) + 328 at postmaster.c:5090
    frame #11: 0x000000010b862aa9 postgres`PostmasterMain(argc=3, argv=0x00007ffd40c04470) + 5401 at postmaster.c:1212
    frame #12: 0x000000010b7a9335 postgres`main(argc=3, argv=0x00007ffd40c04470) + 773 at main.c:227

Also, I wanted to run the WAL replay facility to compare before and after buffer images, but code crashed before... I am still planning to do so once this code gets more stable though.
Regards,
--
Michael

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: WAL replay bugs
Next
From: Pavel Stehule
Date:
Subject: Re: review: Built-in binning functions