Thread: Assert in heapgettup_pagemode() fails due to underlying buffer change
Hello hackers, I tried to investigate a recent buildfarm test failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-06-04%2003%3A27%3A47 29/295 postgresql:recovery / recovery/026_overwrite_contrecord ERROR 39.55s exit status 32 log/026_overwrite_contrecord_standby.log TRAP: failed Assert("ItemIdIsNormal(lpp)"), File: "../pgsql/src/backend/access/heap/heapam.c", Line: 1002, PID: 3740958 postgres: standby: bf postgres [local] startup(ExceptionalCondition+0x81)[0x56c60bf9] postgres: standby: bf postgres [local] startup(+0xf776e)[0x5667276e] postgres: standby: bf postgres [local] startup(heap_getnextslot+0x40)[0x56672ee1] postgres: standby: bf postgres [local] startup(+0x11c218)[0x56697218] postgres: standby: bf postgres [local] startup(systable_getnext+0xfa)[0x56697c1a] postgres: standby: bf postgres [local] startup(+0x6d29c7)[0x56c4d9c7] postgres: standby: bf postgres [local] startup(+0x6d372c)[0x56c4e72c] postgres: standby: bf postgres [local] startup(+0x6d8288)[0x56c53288] postgres: standby: bf postgres [local] startup(RelationCacheInitializePhase3+0x149)[0x56c52d71] (It's not the only failure of that ilk in the buildfarm.) and managed to reproduce the failure by running many 026_overwrite_contrecord tests in parallel (with fsync=on). Analyzing the core dump added some info: ... #3 0x0000000000bb43cc in ExceptionalCondition (conditionName=0xc45c77 "ItemIdIsNormal(lpp)", fileName=0xc45aa8 "heapam.c", lineNumber=1002) at assert.c:66 #4 0x00000000004f7f13 in heapgettup_pagemode (scan=0x19f5660, dir=ForwardScanDirection, nkeys=2, key=0x19f61d0) at heapam.c:1002 #5 0x00000000004f86d1 in heap_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0) at heapam.c:1307 #6 0x000000000051d028 in table_scan_getnextslot (sscan=0x19f5660, direction=ForwardScanDirection, slot=0x19f5da0) at ../../../../src/include/access/tableam.h:1081 #7 0x000000000051da80 in systable_getnext (sysscan=0x19f5470) at genam.c:530 #8 0x0000000000ba0937 in RelationBuildTupleDesc (relation=0x7fa004feea88) at relcache.c:572 #9 0x0000000000ba17b9 in RelationBuildDesc (targetRelId=2679, insertIt=true) at relcache.c:1184 #10 0x0000000000ba6520 in load_critical_index (indexoid=2679, heapoid=2610) at relcache.c:4353 #11 0x0000000000ba607d in RelationCacheInitializePhase3 () at relcache.c:4132 #12 0x0000000000bcb704 in InitPostgres (in_dbname=0x196ca30 "postgres", dboid=5, username=0x19a91b8 "law", useroid=0, flags=1, out_dbname=0x0) at postinit.c:1193 ... (gdb) frame 4 (gdb) p lpp->lp_flags $2 = 1 (gdb) p ItemIdIsNormal(lpp) $12 = 1 So it looks like the Assert had failed when lpp->lp_flags had some other contents... I added the following debugging code: --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -995,10 +995,14 @@ continue_page: for (; linesleft > 0; linesleft--, lineindex += dir) { ItemId lpp; + ItemIdData iid; OffsetNumber lineoff; lineoff = scan->rs_vistuples[lineindex]; lpp = PageGetItemId(page, lineoff); + iid = *((ItemIdData *)lpp); + + Assert(ItemIdIsNormal(&iid)); Assert(ItemIdIsNormal(lpp)); and got: ... #2 0x000055b68dc6998c in ExceptionalCondition (conditionName=0x55b68dcfe5f7 "ItemIdIsNormal(&iid)", fileName=0x55b68dcfe428 "heapam.c", lineNumber=1010) at assert.c:66 #3 0x000055b68d588a78 in heapgettup_pagemode (scan=0x55b68f0905e0, dir=ForwardScanDirection, nkeys=2, key=0x55b68f091150) at heapam.c:1010 #4 0x000055b68d58930e in heap_getnextslot (sscan=0x55b68f0905e0, direction=ForwardScanDirection, slot=0x55b68f090d20) at heapam.c:1322 ... (gdb) frame 3 #3 0x000055b68d588a78 in heapgettup_pagemode (...) at heapam.c:1010 1010 Assert(ItemIdIsNormal(&iid)); (gdb) info locals lpp = 0x7f615c34b0ec iid = {lp_off = 0, lp_flags = 0, lp_len = 0} lineoff = 54 tuple = 0x55b68f090638 page = 0x7f615c34b000 "" (gdb) p *lpp $1 = {lp_off = 3160, lp_flags = 1, lp_len = 136} It seemingly confirms that the underlying memory was changed while being processed in heapgettup_pagemode(). I've tried to add checks for the page buffer content as below: --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -953,11 +953,15 @@ heapgettup_pagemode(HeapScanDesc scan, Page page; int lineindex; int linesleft; +char page_copy[BLCKSZ]; if (likely(scan->rs_inited)) { /* continue from previously returned page/tuple */ page = BufferGetPage(scan->rs_cbuf); +memcpy(page_copy, page, BLCKSZ); +for (int i = 0; i < 100; i++) +Assert(memcmp(page_copy, page, BLCKSZ) == 0); lineindex = scan->rs_cindex + dir; if (ScanDirectionIsForward(dir)) @@ -986,6 +990,10 @@ heapgettup_pagemode(HeapScanDesc scan, /* prune the page and determine visible tuple offsets */ heap_prepare_pagescan((TableScanDesc) scan); page = BufferGetPage(scan->rs_cbuf); +memcpy(page_copy, page, BLCKSZ); +for (int i = 0; i < 100; i++) +Assert(memcmp(page_copy, page, BLCKSZ) == 0); + linesleft = scan->rs_ntuples; lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1; and got the assertion failures even during `make check`: ... #5 0x00005577f29e0bc4 in ExceptionalCondition ( conditionName=conditionName@entry=0x5577f2a4a5d0 "memcmp(page_copy, page, BLCKSZ) == 0", fileName=fileName@entry=0x5577f2a4aa38 "heapam.c", lineNumber=lineNumber@entry=966) at assert.c:66 #6 0x00005577f24faa68 in heapgettup_pagemode (scan=scan@entry=0x5577f46574e8, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:966 ... (gdb) frame 6 #6 0x00005577f24faa68 in heapgettup_pagemode (...) at heapam.c:966 966 Assert(memcmp(page_copy, page, BLCKSZ) == 0); (gdb) p i $1 = 25 Am I missing something or the the page buffer indeed lacks locking there? Best regards, Alexander
On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > Am I missing something or the the page buffer indeed lacks locking there? I don't know, but if the locks are really missing now, I feel like the first question is "which commit got rid of them?". It's a little hard to believe that they've never been there and somehow nobody has noticed. Then again, maybe we have; see Noah's thread about in-place updates breaking stuff and some of the surprising discoveries there. But it seems worth investigating. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Assert in heapgettup_pagemode() fails due to underlying buffer change
From
Alexander Lakhin
Date:
Hello Robert, 06.06.2024 19:36, Robert Haas wrote: > On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: >> Am I missing something or the the page buffer indeed lacks locking there? > I don't know, but if the locks are really missing now, I feel like the > first question is "which commit got rid of them?". It's a little hard > to believe that they've never been there and somehow nobody has > noticed. > > Then again, maybe we have; see Noah's thread about in-place updates > breaking stuff and some of the surprising discoveries there. But it > seems worth investigating. Yes, my last experiment with memcmp for the whole buffer was wrong, given the comment above heapgettup_pagemode(). I think the correct check would be: ItemId lpp; OffsetNumber lineoff; +ItemIdData iid; lineoff = scan->rs_vistuples[lineindex]; lpp = PageGetItemId(page, lineoff); +iid = *((ItemIdData *)lpp); +for (int i = 0; i < 1000; i++) +Assert(memcmp(&iid, lpp, sizeof(iid)) == 0); It significantly alleviates reproducing of the test failure for me. Will try to bisect this anomaly tomorrow. Best regards, Alexander
On Thu, Jun 06, 2024 at 12:36:32PM -0400, Robert Haas wrote: > On Thu, Jun 6, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > > Am I missing something or the the page buffer indeed lacks locking there? > > I don't know, but if the locks are really missing now, I feel like the > first question is "which commit got rid of them?". It's a little hard > to believe that they've never been there and somehow nobody has > noticed. > > Then again, maybe we have; see Noah's thread about in-place updates > breaking stuff and some of the surprising discoveries there. But it > seems worth investigating. $SUBJECT looks more like a duplicate of postgr.es/m/flat/20240512171658.7e.nmisch@google.com (Hot standby queries see transient all-zeros pages).
Re: Assert in heapgettup_pagemode() fails due to underlying buffer change
From
Alexander Lakhin
Date:
Hello Noah, 06.06.2024 22:07, Noah Misch wrote: > >> I don't know, but if the locks are really missing now, I feel like the >> first question is "which commit got rid of them?". It's a little hard >> to believe that they've never been there and somehow nobody has >> noticed. >> >> Then again, maybe we have; see Noah's thread about in-place updates >> breaking stuff and some of the surprising discoveries there. But it >> seems worth investigating. > $SUBJECT looks more like a duplicate of > postgr.es/m/flat/20240512171658.7e.nmisch@google.com (Hot standby queries see > transient all-zeros pages). Thank you for the reference! Yes, it looks very similar. Though I can't say the sleep you proposed helps the failure reproduction (I've tried 026_overwrite_contrecord.pl and saw no more frequent failures or so). My bisect run ended with: 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit Author: Thomas Munro <tmunro@postgresql.org> Date: Wed Apr 3 00:03:08 2024 +1300 Provide vectored variant of ReadBuffer(). Other buildfarm failures with this Assert I could find kind of confirm this: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18 (presumably a first failure of this sort) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08 Best regards, Alexander
On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > My bisect run ended with: > 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit > > Author: Thomas Munro <tmunro@postgresql.org> > Date: Wed Apr 3 00:03:08 2024 +1300 > > Provide vectored variant of ReadBuffer(). > > Other buildfarm failures with this Assert I could find kind of confirm this: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18 > (presumably a first failure of this sort) > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08 Looking...
On Fri, Jun 7, 2024 at 3:06 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > My bisect run ended with: > > 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit > > > > Author: Thomas Munro <tmunro@postgresql.org> > > Date: Wed Apr 3 00:03:08 2024 +1300 > > > > Provide vectored variant of ReadBuffer(). > > > > Other buildfarm failures with this Assert I could find kind of confirm this: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18 > > (presumably a first failure of this sort) > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08 > > Looking... What Noah described[1] is what should be happening already, I think, but 210622c6 unconditionally zeroed the page. Oops. The attached seems to cure his repro for me. Does it also cure your test? I couldn't see that variant myself for some reason, but it seems to make sense as the explanation. I would probably adjust the function name or perhaps consider refactoring slightly, but first let's confirm that this is the same issue and fix. [1] https://www.postgresql.org/message-id/flat/20240512171658.7e.nmisch@google.com
Attachment
On 2024-Jun-07, Thomas Munro wrote: > static void > -ZeroBuffer(Buffer buffer, ReadBufferMode mode) > +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero) This change makes the API very strange. Should the function be called ZeroAndLockBuffer() instead? Then the addition of a "bool zero" argument makes a lot more sense. In passing, I noticed that WaitReadBuffers has zero comments, which seems an insufficient number of them. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Assert in heapgettup_pagemode() fails due to underlying buffer change
From
Alexander Lakhin
Date:
Hello Thomas, 07.06.2024 09:06, Thomas Munro wrote: > On Fri, Jun 7, 2024 at 3:06 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> On Fri, Jun 7, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: >>> My bisect run ended with: >>> 210622c60e1a9db2e2730140b8106ab57d259d15 is the first bad commit >>> >>> Author: Thomas Munro <tmunro@postgresql.org> >>> Date: Wed Apr 3 00:03:08 2024 +1300 >>> >>> Provide vectored variant of ReadBuffer(). >>> >>> Other buildfarm failures with this Assert I could find kind of confirm this: >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2003%3A32%3A18 >>> (presumably a first failure of this sort) >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-04-04%2015%3A38%3A16 >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2024-05-07%2004%3A00%3A08 >> Looking... > What Noah described[1] is what should be happening already, I think, > but 210622c6 unconditionally zeroed the page. Oops. The attached > seems to cure his repro for me. Does it also cure your test? I > couldn't see that variant myself for some reason, but it seems to make > sense as the explanation. I would probably adjust the function name > or perhaps consider refactoring slightly, but first let's confirm that > this is the same issue and fix. Thank you for looking and for the fix! Using the same testing procedure (applying patch for checking lpp, multiplying 026_overwrite_contrecord.pl tests and running 30 tests in parallel, with fsync=on) which I used for bisecting, I got failures on iterations 8, 19, 4 without the fix, but with the fix applied, 125 iterations passed. I think The Cure is sound. Best regards, Alexander
On Fri, Jun 7, 2024 at 4:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > static void > > -ZeroBuffer(Buffer buffer, ReadBufferMode mode) > > +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero) > > This change makes the API very strange. Should the function be called > ZeroAndLockBuffer() instead? Then the addition of a "bool zero" > argument makes a lot more sense. I agree that's better, but it still looks a bit weird. You have to realize that 'bool zero' means 'is already zeroed' here -- or at least, I guess that's the intention. But then I wonder why you'd call a function called ZeroAndLockBuffer if all you need to do is LockBuffer. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, Jun 8, 2024 at 12:47 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Jun 7, 2024 at 4:05 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > static void > > > -ZeroBuffer(Buffer buffer, ReadBufferMode mode) > > > +ZeroBuffer(Buffer buffer, ReadBufferMode mode, bool zero) > > > > This change makes the API very strange. Should the function be called > > ZeroAndLockBuffer() instead? Then the addition of a "bool zero" > > argument makes a lot more sense. > > I agree that's better, but it still looks a bit weird. You have to > realize that 'bool zero' means 'is already zeroed' here -- or at > least, I guess that's the intention. But then I wonder why you'd call > a function called ZeroAndLockBuffer if all you need to do is > LockBuffer. The name weirdness comes directly from RBM_ZERO_AND_LOCK (the fact that it doesn't always zero despite shouting ZERO is probably what temporarily confused me). But coming up with a better name is hard and I certainly don't propose to change it now. I think it's reasonable for this internal helper function to have that matching name as Alvaro suggested, with a good comment about that. Even though that quick-demonstration change fixed the two reported repros, I think it is still probably racy (or if it isn't, it relies on higher level interlocking that I don't want to rely on). This case really should be using the standard StartBufferIO/TerminateBufferIO infrastructure as it was before. I had moved that around to deal with multi-block I/O, but dropped the ball on the zero case... sorry. Here's a version like that. The "zero" argument (yeah that was not a good name) is now inverted and called "already_valid", but it's only a sort of optimisation for the case where we already know for sure that it's valid. If it isn't, we do the standard BM_IO_IN_PROGRESS/BM_VALID/CV dance, for correct interaction with any concurrent read or zero operation.
Attachment
New version. Same code as v2, but comments improved to explain the reasoning, with reference to README's buffer access rules. I'm planning to push this soon if there are no objections.
Attachment
On Fri, Jun 7, 2024 at 8:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > In passing, I noticed that WaitReadBuffers has zero comments, which > seems an insufficient number of them. Ack. Here is a patch for that. I guess I hadn't put a comment there because it's hard to write anything without sort of duplicating what is already said by the StartReadBuffers() comment and doubling down on descriptions of future plans... but, in for a penny, in for a pound as they say.
Attachment
I adjusted the code a bit more to look like the 16 coding including restoring some very useful comments that had been lost, and pushed. Thanks very much to Alexander and Noah for (independently) chasing this down and reporting, testing etc, and Alvaro and Robert for review comments. (Passing thought: it's a bit weird that we need to zero pages at all before restoring FPIs or initialising them. Perhaps there is some way to defer marking the buffer valid until after the caller gets a chance to initialise? Or something like that...)