Thread: nbtdesc.c and nbtpage.c are inconsistent withXLOG_BTREE_META_CLEANUP (11~)

nbtdesc.c and nbtpage.c are inconsistent withXLOG_BTREE_META_CLEANUP (11~)

From
Michael Paquier
Date:
Hi all,
(Adding in CC relevant committer and author, Teodor and Alexander)

I have been looking today at a crash of pg_waldump on one of the test
builds keeping running in some internal environment.  Luckily, I have
been able to put my hands on a core file with 11.2 running.  The
backtrace is not that interesting:
(gdb) bt
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
#1 0x000000000040419f in XLogDumpDisplayRecord (config=0x7fff1ccfd360,
record=0x22ce590) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:558
#2 main (argc=<optimized out>, argv=<optimized out>) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:1170
(gdb) down 1
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
103 nbtdesc.c: No such file or directory.
(gdb) p record
$1 = (XLogReaderState *) 0x22ce590
(gdb) p *record
$2 = {wal_segment_size = 16777216, read_page = 0x405220
<XLogDumpReadPage>, system_identifier = 0, private_data =
0x7fff1ccfd380, ReadRecPtr = 67109592, EndRecPtr = 67109672,
decoded_record = 0x22cf178, main_data = 0x0, main_data_len = 0,
main_data_bufsz = 0, record_origin = 0, blocks = {{in_use = true,
rnode = {spcNode = 16399, dbNode = 16386, relNode = 19907}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 96 '`', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0,  hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = true, data =
0x22db2c0 "\003", data_len = 32,  data_bufsz = 8192}, {in_use = false,
rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 0 '\000', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0, hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = false, data = 0x0,
data_len = 0, data_bufsz = 0} <repeats 32 times>}, max_block_id = 0,
readBuf = 0x22ceea0 "\230\320\a", readLen = 8192, readSegNo = 4,
readOff = 0, readPageTLI = 0, latestPagePtr = 67108864, latestPageTLI
= 1, currRecPtr = 67109592, currTLI = 0, currTLIValidUntil = 0,
nextTLI = 0, readRecordBuf = 0x22d12b0 "L", readRecordBufSize = 40960,
errormsg_buf = 0x22d0eb0 ""}
(gdb) p xlrec
$5 = (xl_btree_metadata *) 0x0

Anyway, after looking at the code relevant to XLOG_BTREE_META_CLEANUP,
I have noticed that the meta-data associated to the first buffer is
registered via XLogRegisterBufData() (this is correct because we want
to associate this data to the metadata buffer).  However, nbtdesc.c
assumes that xl_btree_metadata is from the main record data, causing a
crash because we have nothing in this case.

I think that we could just patch nbtpage.c so as we fetch the
metadata using XLogRecGetBlockData(), and log its data.  The error
comes from 3d92796, which is one year-old and has been visibly
committed untested.  I am surprised that we  have not seen that
complain yet.  Attached is a patch, which looks right to me and should
be back-patched down to v11.  I have not taken the time to actually
test it though.

Thoughts?
--
Michael

Attachment

Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP(11~)

From
Peter Geoghegan
Date:
On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> I think that we could just patch nbtpage.c so as we fetch the
> metadata using XLogRecGetBlockData(), and log its data.

Don't you mean nbtdesc.c?

> The error
> comes from 3d92796, which is one year-old and has been visibly
> committed untested.  I am surprised that we  have not seen that
> complain yet.

Why is that surprising?

https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

-- 
Peter Geoghegan



Re: nbtdesc.c and nbtpage.c are inconsistent withXLOG_BTREE_META_CLEANUP (11~)

From
Michael Paquier
Date:
On Sun, Jun 16, 2019 at 06:54:57PM -0700, Peter Geoghegan wrote:
> On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>> I think that we could just patch nbtpage.c so as we fetch the
>> metadata using XLogRecGetBlockData(), and log its data.
>
> Don't you mean nbtdesc.c?

Yeah I meant nbtdesc.c, sorry.  This will have to wait after this
week's release for a fix by the way...

> Why is that surprising?
>
> https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

I would have supposed that more people scan WAL records using the
description callbacks.
--
Michael

Attachment

Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP(11~)

From
Peter Geoghegan
Date:
On Sun, Jun 16, 2019 at 7:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> I would have supposed that more people scan WAL records using the
> description callbacks.

The WAL record in question, XLOG_BTREE_META_CLEANUP, is certainly one
of the less common record types used by nbtree. I agree that this
should have been tested when it went in, but I'm not surprised that
the bug remained undetected for a year. Not that many people use
pg_waldump.

-- 
Peter Geoghegan



Re: nbtdesc.c and nbtpage.c are inconsistent withXLOG_BTREE_META_CLEANUP (11~)

From
Michael Paquier
Date:
On Sun, Jun 16, 2019 at 07:14:05PM -0700, Peter Geoghegan wrote:
> The WAL record in question, XLOG_BTREE_META_CLEANUP, is certainly one
> of the less common record types used by nbtree. I agree that this
> should have been tested when it went in, but I'm not surprised that
> the bug remained undetected for a year. Not that many people use
> pg_waldump.

Actually, a simple installcheck generates a handful of them.  I have
not actually run into a crash, but this causes pg_waldump to describe
the record incorrectly.  Committed down to 11 after cross-checking
that the data inserted in the WAL record and what gets described are
both consistent.

_bt_restore_meta() does the right thing by the way when restoring the
page.
--
Michael

Attachment