Thread: Assert in heapgettup_pagemode() fails due to underlying buffer change

Assert in heapgettup_pagemode() fails due to underlying buffer change

From
Alexander Lakhin
Date:
Hello hackers,

I tried to investigate a recent buildfarm test failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-06-04%2003%3A27%3A47

  29/295 postgresql:recovery / recovery/026_overwrite_contrecord ERROR            39.55s   exit status 32

log/026_overwrite_contrecord_standby.log
TRAP: failed Assert("ItemIdIsNormal(lpp)"), File: "../pgsql/src/backend/access/heap/heapam.c", Line: 1002, PID:
3740958
postgres: standby: bf postgres [local] startup(ExceptionalCondition+0x81)[0x56c60bf9]
postgres: standby: bf postgres [local] startup(+0xf776e)[0x5667276e]
postgres: standby: bf postgres [local] startup(heap_getnextslot+0x40)[0x56672ee1]
postgres: standby: bf postgres [local] startup(+0x11c218)[0x56697218]
postgres: standby: bf postgres [local] startup(systable_getnext+0xfa)[0x56697c1a]
postgres: standby: bf postgres [local] startup(+0x6d29c7)[0x56c4d9c7]
postgres: standby: bf postgres [local] startup(+0x6d372c)[0x56c4e72c]
postgres: standby: bf postgres [local] startup(+0x6d8288)[0x56c53288]
postgres: standby: bf postgres [local] startup(RelationCacheInitializePhase3+0x149)[0x56c52d71]

(It's not the only failure of that ilk in the buildfarm.)

and managed to reproduce the failure by running many
026_overwrite_contrecord tests in parallel (with fsync=on).

Analyzing the core dump added some info:
...
#3  0x0000000000bb43cc in ExceptionalCondition (conditionName=0xc45c77 "ItemIdIsNormal(lpp)",
     fileName=0xc45aa8 "heapam.c", lineNumber=1002) at assert.c:66
#4  0x00000000004f7f13 in heapgettup_pagemode (scan=0x19f5660, dir=ForwardScanDirection, nkeys=2, key=0x19f61d0)
     at heapam.c:1002
#5  0x00000000004f86d1 in heap_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0)
     at heapam.c:1307
#6  0x000000000051d028 in table_scan_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0)
     at ../../../../src/include/access/tableam.h:1081
#7  0x000000000051da80 in systable_getnext (sysscan=0x19f5470) at genam.c:530
#8  0x0000000000ba0937 in RelationBuildTupleDesc (relation=0x7fa004feea88) at relcache.c:572
#9  0x0000000000ba17b9 in RelationBuildDesc (targetRelId=2679, insertIt=true) at relcache.c:1184
#10 0x0000000000ba6520 in load_critical_index (indexoid=2679, heapoid=2610) at relcache.c:4353
#11 0x0000000000ba607d in RelationCacheInitializePhase3 () at relcache.c:4132
#12 0x0000000000bcb704 in InitPostgres (in_dbname=0x196ca30 "postgres", dboid=5, username=0x19a91b8 "law", useroid=0,
     flags=1, out_dbname=0x0) at postinit.c:1193
...
(gdb) frame 4
(gdb) p lpp->lp_flags
$2 = 1
(gdb) p ItemIdIsNormal(lpp)
$12 = 1

So it looks like the Assert had failed when lpp->lp_flags had some other
contents...

I added the following debugging code:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -995,10 +995,14 @@ continue_page:
                 for (; linesleft > 0; linesleft--, lineindex += dir)
                 {
                         ItemId          lpp;
+                       ItemIdData      iid;
                         OffsetNumber lineoff;

                         lineoff = scan->rs_vistuples[lineindex];
                         lpp = PageGetItemId(page, lineoff);
+                       iid = *((ItemIdData *)lpp);
+
+                       Assert(ItemIdIsNormal(&iid));
                         Assert(ItemIdIsNormal(lpp));

and got:
...
#2  0x000055b68dc6998c in ExceptionalCondition (conditionName=0x55b68dcfe5f7 "ItemIdIsNormal(&iid)",
     fileName=0x55b68dcfe428 "heapam.c", lineNumber=1010) at assert.c:66
#3  0x000055b68d588a78 in heapgettup_pagemode (scan=0x55b68f0905e0, dir=ForwardScanDirection, nkeys=2,
     key=0x55b68f091150) at heapam.c:1010
#4  0x000055b68d58930e in heap_getnextslot (sscan=0x55b68f0905e0, direction=ForwardScanDirection, slot=0x55b68f090d20)
     at heapam.c:1322
...
(gdb) frame 3
#3  0x000055b68d588a78 in heapgettup_pagemode (...) at heapam.c:1010
1010                            Assert(ItemIdIsNormal(&iid));
(gdb) info locals
lpp = 0x7f615c34b0ec
iid = {lp_off = 0, lp_flags = 0, lp_len = 0}
lineoff = 54
tuple = 0x55b68f090638
page = 0x7f615c34b000 ""

(gdb) p *lpp
$1 = {lp_off = 3160, lp_flags = 1, lp_len = 136}

It seemingly confirms that the underlying memory was changed while being
processed in heapgettup_pagemode().

I've tried to add checks for the page buffer content as below:
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -953,11 +953,15 @@ heapgettup_pagemode(HeapScanDesc scan,
      Page        page;
      int         lineindex;
      int         linesleft;
+char    page_copy[BLCKSZ];

      if (likely(scan->rs_inited))
      {
          /* continue from previously returned page/tuple */
          page = BufferGetPage(scan->rs_cbuf);
+memcpy(page_copy, page, BLCKSZ);
+for (int i = 0; i < 100; i++)
+Assert(memcmp(page_copy, page, BLCKSZ) == 0);

          lineindex = scan->rs_cindex + dir;
          if (ScanDirectionIsForward(dir))
@@ -986,6 +990,10 @@ heapgettup_pagemode(HeapScanDesc scan,
          /* prune the page and determine visible tuple offsets */
          heap_prepare_pagescan((TableScanDesc) scan);
          page = BufferGetPage(scan->rs_cbuf);
+memcpy(page_copy, page, BLCKSZ);
+for (int i = 0; i < 100; i++)
+Assert(memcmp(page_copy, page, BLCKSZ) == 0);
+
          linesleft = scan->rs_ntuples;
          lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;

and got the assertion failures even during `make check`:
...
#5  0x00005577f29e0bc4 in ExceptionalCondition (
     conditionName=conditionName@entry=0x5577f2a4a5d0 "memcmp(page_copy, page, BLCKSZ) == 0",
     fileName=fileName@entry=0x5577f2a4aa38 "heapam.c", lineNumber=lineNumber@entry=966) at assert.c:66
#6  0x00005577f24faa68 in heapgettup_pagemode (scan=scan@entry=0x5577f46574e8, dir=ForwardScanDirection, nkeys=0,
     key=0x0) at heapam.c:966
...
(gdb) frame 6
#6  0x00005577f24faa68 in heapgettup_pagemode (...) at heapam.c:966
966     Assert(memcmp(page_copy, page, BLCKSZ) == 0);
(gdb) p i
$1 = 25

Am I missing something or the the page buffer indeed lacks locking there?

Best regards,
Alexander



On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> Am I missing something or the the page buffer indeed lacks locking there?

I don't know, but if the locks are really missing now, I feel like the
first question is "which commit got rid of them?". It's a little hard
to believe that they've never been there and somehow nobody has
noticed.

Then again, maybe we have; see Noah's thread about in-place updates
breaking stuff and some of the surprising discoveries there. But it
seems worth investigating.

--
Robert Haas
EDB: http://www.enterprisedb.com



Hello Robert,

06.06.2024 19:36, Robert Haas wrote:
> On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> Am I missing something or the the page buffer indeed lacks locking there?
> I don't know, but if the locks are really missing now, I feel like the
> first question is "which commit got rid of them?". It's a little hard
> to believe that they've never been there and somehow nobody has
> noticed.
>
> Then again, maybe we have; see Noah's thread about in-place updates
> breaking stuff and some of the surprising discoveries there. But it
> seems worth investigating.

Yes, my last experiment with memcmp for the whole buffer was wrong,
given the comment above heapgettup_pagemode(). I think the correct
check would be:
              ItemId      lpp;
              OffsetNumber lineoff;
+ItemIdData      iid;

              lineoff = scan->rs_vistuples[lineindex];
              lpp = PageGetItemId(page, lineoff);
+iid = *((ItemIdData *)lpp);
+for (int i = 0; i < 1000; i++)
+Assert(memcmp(&iid, lpp, sizeof(iid)) == 0);

It significantly alleviates reproducing of the test failure for me.
Will try to bisect this anomaly tomorrow.

Best regards,
Alexander



On Thu, Jun 06, 2024 at 12:36:32PM -0400, Robert Haas wrote:
> On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > Am I missing something or the the page buffer indeed lacks locking there?
> 
> I don't know, but if the locks are really missing now, I feel like the
> first question is "which commit got rid of them?". It's a little hard
> to believe that they've never been there and somehow nobody has
> noticed.
> 
> Then again, maybe we have; see Noah's thread about in-place updates
> breaking stuff and some of the surprising discoveries there. But it
> seems worth investigating.

$SUBJECT looks more like a duplicate of
postgr.es/m/flat/20240512171658.7e.nmisch@google.com (Hot standby queries see
transient all-zeros pages).



Hello Noah,

06.06.2024 22:07, Noah Misch wrote:
>
>> I don't know, but if the locks are really missing now, I feel like the
>> first question is "which commit got rid of them?". It's a little hard
>> to believe that they've never been there and somehow nobody has
>> noticed.
>>
>> Then again, maybe we have; see Noah's thread about in-place updates
>> breaking stuff and some of the surprising discoveries there. But it
>> seems worth investigating.
> $SUBJECT looks more like a duplicate of
> postgr.es/m/flat/20240512171658.7e.nmisch@google.com (Hot standby queries see
> transient all-zeros pages).

Thank you for the reference! Yes, it looks very similar. Though I can't
say the sleep you proposed helps the failure reproduction (I've tried
026_overwrite_contrecord.pl and saw no more frequent failures or so).

My bisect run ended with:
210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit

Author: Thomas Munro <tmunro@postgresql.org>
Date:   Wed Apr 3 00:03:08 2024 +1300

     Provide vectored variant of ReadBuffer().

Other buildfarm failures with this Assert I could find kind of confirm this:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18
(presumably a first failure of this sort)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08

Best regards,
Alexander



On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> My bisect run ended with:
> 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit
>
> Author: Thomas Munro <tmunro@postgresql.org>
> Date:   Wed Apr 3 00:03:08 2024 +1300
>
>      Provide vectored variant of ReadBuffer().
>
> Other buildfarm failures with this Assert I could find kind of confirm this:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18
> (presumably a first failure of this sort)
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08

Looking...



On Fri, Jun 7, 2024 at 3:06 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> > My bisect run ended with:
> > 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit
> >
> > Author: Thomas Munro <tmunro@postgresql.org>
> > Date:   Wed Apr 3 00:03:08 2024 +1300
> >
> >      Provide vectored variant of ReadBuffer().
> >
> > Other buildfarm failures with this Assert I could find kind of confirm this:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18
> > (presumably a first failure of this sort)
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08
>
> Looking...

What Noah described[1] is what should be happening already, I think,
but 210622c6 unconditionally zeroed the page.  Oops.  The attached
seems to cure his repro for me.  Does it also cure your test?  I
couldn't see that variant myself for some reason, but it seems to make
sense as the explanation.  I would probably adjust the function name
or perhaps consider refactoring slightly, but first let's confirm that
this is the same issue and fix.

[1] https://www.postgresql.org/message-id/flat/20240512171658.7e.nmisch@google.com

Attachment
On 2024-Jun-07, Thomas Munro wrote:

>  static void
> -ZeroBuffer(Buffer buffer, ReadBufferMode mode)
> +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero)

This change makes the API very strange.  Should the function be called
ZeroAndLockBuffer() instead?  Then the addition of a "bool zero"
argument makes a lot more sense.

In passing, I noticed that WaitReadBuffers has zero comments, which
seems an insufficient number of them.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Hello Thomas,

07.06.2024 09:06, Thomas Munro wrote:
> On Fri, Jun 7, 2024 at 3:06 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>>> My bisect run ended with:
>>> 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit
>>>
>>> Author: Thomas Munro <tmunro@postgresql.org>
>>> Date:   Wed Apr 3 00:03:08 2024 +1300
>>>
>>>       Provide vectored variant of ReadBuffer().
>>>
>>> Other buildfarm failures with this Assert I could find kind of confirm this:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18
>>> (presumably a first failure of this sort)
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08
>> Looking...
> What Noah described[1] is what should be happening already, I think,
> but 210622c6 unconditionally zeroed the page.  Oops.  The attached
> seems to cure his repro for me.  Does it also cure your test?  I
> couldn't see that variant myself for some reason, but it seems to make
> sense as the explanation.  I would probably adjust the function name
> or perhaps consider refactoring slightly, but first let's confirm that
> this is the same issue and fix.

Thank you for looking and for the fix!

Using the same testing procedure (applying patch for checking lpp,
multiplying 026_overwrite_contrecord.pl tests and running 30 tests in
parallel, with fsync=on) which I used for bisecting, I got failures on
iterations 8, 19, 4 without the fix, but with the fix applied, 125
iterations passed. I think The Cure is sound.

Best regards,
Alexander



On Fri, Jun 7, 2024 at 4:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >  static void
> > -ZeroBuffer(Buffer buffer, ReadBufferMode mode)
> > +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero)
>
> This change makes the API very strange.  Should the function be called
> ZeroAndLockBuffer() instead?  Then the addition of a "bool zero"
> argument makes a lot more sense.

I agree that's better, but it still looks a bit weird. You have to
realize that 'bool zero' means 'is already zeroed' here -- or at
least, I guess that's the intention. But then I wonder why you'd call
a function called ZeroAndLockBuffer if all you need to do is
LockBuffer.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Sat, Jun 8, 2024 at 12:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 7, 2024 at 4:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >  static void
> > > -ZeroBuffer(Buffer buffer, ReadBufferMode mode)
> > > +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero)
> >
> > This change makes the API very strange.  Should the function be called
> > ZeroAndLockBuffer() instead?  Then the addition of a "bool zero"
> > argument makes a lot more sense.
>
> I agree that's better, but it still looks a bit weird. You have to
> realize that 'bool zero' means 'is already zeroed' here -- or at
> least, I guess that's the intention. But then I wonder why you'd call
> a function called ZeroAndLockBuffer if all you need to do is
> LockBuffer.

The name weirdness comes directly from RBM_ZERO_AND_LOCK (the fact
that it doesn't always zero despite shouting ZERO is probably what
temporarily confused me).  But coming up with a better name is hard
and I certainly don't propose to change it now.  I think it's
reasonable for this internal helper function to have that matching
name as Alvaro suggested, with a good comment about that.

Even though that quick-demonstration change fixed the two reported
repros, I think it is still probably racy (or if it isn't, it relies
on higher level interlocking that I don't want to rely on).  This case
really should be using the standard StartBufferIO/TerminateBufferIO
infrastructure as it was before.  I had moved that around to deal with
multi-block I/O, but dropped the ball on the zero case... sorry.

Here's a version like that.  The "zero" argument (yeah that was not a
good name) is now inverted and called "already_valid", but it's only a
sort of optimisation for the case where we already know for sure that
it's valid.  If it isn't, we do the standard
BM_IO_IN_PROGRESS/BM_VALID/CV dance, for correct interaction with any
concurrent read or zero operation.

Attachment
New version.  Same code as v2, but comments improved to explain the
reasoning, with reference to README's buffer access rules.  I'm
planning to push this soon if there are no objections.

Attachment
On Fri, Jun 7, 2024 at 8:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> In passing, I noticed that WaitReadBuffers has zero comments, which
> seems an insufficient number of them.

Ack.  Here is a patch for that.  I guess I hadn't put a comment there
because it's hard to write anything without sort of duplicating what
is already said by the StartReadBuffers() comment and doubling down on
descriptions of future plans... but, in for a penny, in for a pound as
they say.

Attachment
I adjusted the code a bit more to look like the 16 coding including
restoring some very useful comments that had been lost, and pushed.

Thanks very much to Alexander and Noah for (independently) chasing
this down and reporting, testing etc, and Alvaro and Robert for review
comments.

(Passing thought: it's a bit weird that we need to zero pages at all
before restoring FPIs or initialising them.  Perhaps there is some way
to defer marking the buffer valid until after the caller gets a chance
to initialise?  Or something like that...)