Re: XLogReadRecord() error in XlogReadTwoPhaseData() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: XLogReadRecord() error in XlogReadTwoPhaseData()
Date
Msg-id 2782601.1637189230@sss.pgh.pa.us
In response to Re: XLogReadRecord() error in XlogReadTwoPhaseData()  (Noah Misch <noah@leadboat.com>)
Responses Re: XLogReadRecord() error in XlogReadTwoPhaseData()
List pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> Tom Lane reported another instance today:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-11%2013%3A29%3A58

> Each of the three failures happened on a sparc64 Debian+gcc machine.  I had
> tried ~8000 iterations on thorntail, another sparc64 Debian+gcc animal,
> without reproducing this.

>>> As a first step, let's report the actual XLogReadRecord() error message.
>>> Attached.

>> Good catch!  This looks good.

> Pushed.

Well, we didn't have to wait too long [1]:

#   at t/003_cic_2pc.pl line 143.
#                   'pgbench: error: client 0 script 1 aborted in command 4 query 0: ERROR:  could not read two-phase state from WAL at 0/159EF88: unexpected pageaddr 0/0 in log segment 000000010000000000000001, offset 5890048
# pgbench: error: client 2 script 3 aborted in command 2 query 0: ERROR:  canceling statement due to lock timeout
# pgbench: fatal: Run was aborted; the above results are incomplete.

I suppose "unexpected pageaddr 0/0" is most easily explained by supposing
that XlogReadTwoPhaseData tried to read a WAL page that hadn't been
written out yet.  Have we got any synchronization around that?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-17%2013%3A01%3A24


