Home > mailing lists

Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: WIP: WAL prefetch (another approach)
Date	April 25, 2022 18:11:49
Msg-id	111657.1650910309@sss.pgh.pa.us Whole thread Raw
In response to	Re: WIP: WAL prefetch (another approach) (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: WIP: WAL prefetch (another approach)
List	pgsql-hackers

Tree view

I believe that the WAL prefetch patch probably accounts for the
intermittent errors that buildfarm member topminnow has shown
since it went in, eg [1]:

diff -U3 /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out
/home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out
--- /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out    2022-04-10 03:05:15.972622440 +0200
+++ /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out    2022-04-25 05:09:49.861642059
+0200
@@ -34,11 +34,7 @@
 (1 row)

 SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal(:'wal_lsn1');
- ok
-----
- t
-(1 row)
-
+ERROR:  could not read WAL at 0/1903E40
 SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'wal_lsn1', :'wal_lsn2');
  ok
 ----
@@ -46,11 +42,7 @@
 (1 row)

 SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal(:'wal_lsn1');
- ok
-----
- t
-(1 row)
-
+ERROR:  could not read WAL at 0/1903E40
 -- ===================================================================
 -- Test for filtering out WAL records of a particular table
 -- ===================================================================


I've reproduced this manually on that machine, and confirmed that the
proximate cause is that XLogNextRecord() is returning NULL because
state->decode_queue_head == NULL, without bothering to provide an errormsg
(which doesn't seem very well thought out in itself).  I obtained the
contents of the xlogreader struct at failure:

(gdb) p *xlogreader
$1 = {routine = {page_read = 0x594270 <read_local_xlog_page_no_wait>,
    segment_open = 0x593b44 <wal_segment_open>,
    segment_close = 0x593d38 <wal_segment_close>}, system_identifier = 0,
  private_data = 0x0, ReadRecPtr = 26230672, EndRecPtr = 26230752,
  abortedRecPtr = 26230752, missingContrecPtr = 26230784,
  overwrittenRecPtr = 0, DecodeRecPtr = 26230672, NextRecPtr = 26230752,
  PrevRecPtr = 0, record = 0x0, decode_buffer = 0xf25428 "\240",
  decode_buffer_size = 65536, free_decode_buffer = true,
  decode_buffer_head = 0xf25428 "\240", decode_buffer_tail = 0xf25428 "\240",
  decode_queue_head = 0x0, decode_queue_tail = 0x0,
  readBuf = 0xf173f0 "\020\321\005", readLen = 0, segcxt = {
    ws_dir = '\000' <repeats 1023 times>, ws_segsize = 16777216}, seg = {
    ws_file = 25, ws_segno = 0, ws_tli = 1}, segoff = 0,
  latestPagePtr = 26222592, latestPageTLI = 1, currRecPtr = 26230752,
  currTLI = 1, currTLIValidUntil = 0, nextTLI = 0,
  readRecordBuf = 0xf1b3f8 "<", readRecordBufSize = 40960,
  errormsg_buf = 0xef3270 "", errormsg_deferred = false, nonblocking = false}

I don't have an intuition about where to look beyond that, any
suggestions?

What I do know so far is that while the failure reproduces fairly
reliably under "make check" (more than half the time, which squares
with topminnow's history), it doesn't reproduce at all under "make
installcheck" (after removing NO_INSTALLCHECK), which seems odd.
Maybe it's dependent on how much WAL history the installation has
accumulated?

It could be that this is a bug in pg_walinspect or a fault in its
test case; hard to tell since that got committed at about the same
time as the prefetch changes.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2022-04-25%2001%3A48%3A47

pgsql-hackers by date:

From: Pavel Stehule
Date: 25 April 2022, 17:39:58
Subject: Re: proposal: possibility to read dumped table's name from file

From: Tom Lane
Date: 25 April 2022, 18:31:48
Subject: Re: WIP: WAL prefetch (another approach)

Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers

Previous

Next