Thread: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
From
Thomas Munro
Date:
Hi, I don't know what happened here, but I my little script that scrapes for build farm assertion failures hasn't seen this one before now. It's on a 15 year old macOS release and compiled with 15 year old GCC version, for some added mystery. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-12%2022%3A07%3A21 2022-02-13 00:09:52.859 CET [92423:92] pg_regress/replorigin STATEMENT: SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences', '0'); TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, PID: 92423) 0 postgres 0x00592dd0 ExceptionalCondition + 160\\0 1 postgres 0x00391c40 ReorderBufferGetRelids + 304\\0 2 postgres 0x00394794 ReorderBufferAllocate + 1044\\0 3 postgres 0x003834dc heap_decode + 76\\0 4 postgres 0x003829c8 LogicalDecodingProcessRecord + 168\\0 5 postgres 0x0038a1d4 pg_logical_emit_message_text + 1876\\0 6 postgres 0x00241810 ExecMakeTableFunctionResult + 576\\0 7 postgres 0x002576c4 ExecInitFunctionScan + 1444\\0 8 postgres 0x00242140 ExecScan + 896\\0 9 postgres 0x002397cc standard_ExecutorRun + 540\\0 10 postgres 0x00427940 EnsurePortalSnapshotExists + 832\\0 11 postgres 0x00428e84 PortalRun + 644\\0 12 postgres 0x004252e0 pg_plan_queries + 3232\\0 13 postgres 0x00426270 PostgresMain + 3424\\0 14 postgres 0x0036b9ec PostmasterMain + 8060\\0 15 postgres 0x0029f24c main + 1356\\0 16 postgres 0x00002524 start + 68\\0
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
From
Andres Freund
Date:
Hi, On February 12, 2022 3:57:28 PM PST, Thomas Munro <thomas.munro@gmail.com> wrote: >Hi, > >I don't know what happened here, but I my little script that scrapes >for build farm assertion failures hasn't seen this one before now. >It's on a 15 year old macOS release and compiled with 15 year old GCC >version, for some added mystery. > >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-12%2022%3A07%3A21 > >2022-02-13 00:09:52.859 CET [92423:92] pg_regress/replorigin >STATEMENT: SELECT data FROM >pg_logical_slot_get_changes('regression_slot', NULL, NULL, >'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences', >'0'); >TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: >"reorderbuffer.c", Line: 1173, PID: 92423) >0 postgres 0x00592dd0 ExceptionalCondition + 160\\0 >1 postgres 0x00391c40 >ReorderBufferGetRelids + 304\\0 >2 postgres 0x00394794 >ReorderBufferAllocate + 1044\\0 >3 postgres 0x003834dc heap_decode + 76\\0 >4 postgres 0x003829c8 >LogicalDecodingProcessRecord + 168\\0 >5 postgres 0x0038a1d4 >pg_logical_emit_message_text + 1876\\0 >6 postgres 0x00241810 >ExecMakeTableFunctionResult + 576\\0 >7 postgres 0x002576c4 >ExecInitFunctionScan + 1444\\0 >8 postgres 0x00242140 ExecScan + 896\\0 >9 postgres 0x002397cc standard_ExecutorRun + 540\\0 >10 postgres 0x00427940 >EnsurePortalSnapshotExists + 832\\0 >11 postgres 0x00428e84 PortalRun + 644\\0 >12 postgres 0x004252e0 pg_plan_queries + 3232\\0 >13 postgres 0x00426270 PostgresMain + 3424\\0 >14 postgres 0x0036b9ec PostmasterMain + 8060\\0 >15 postgres 0x0029f24c main + 1356\\0 >16 postgres 0x00002524 start + 68\\0 I think that may be the bug that Tomas fixed a short while ago. He said the skip empty xacts stuff didn't yet work for sequences.That's only likely to be hit in slower machines... Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
From
Tomas Vondra
Date:
On 2/13/22 01:28, Andres Freund wrote: > Hi, > > On February 12, 2022 3:57:28 PM PST, Thomas Munro <thomas.munro@gmail.com> wrote: >> Hi, >> >> I don't know what happened here, but I my little script that scrapes >> for build farm assertion failures hasn't seen this one before now. >> It's on a 15 year old macOS release and compiled with 15 year old GCC >> version, for some added mystery. >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-12%2022%3A07%3A21 >> >> 2022-02-13 00:09:52.859 CET [92423:92] pg_regress/replorigin >> STATEMENT: SELECT data FROM >> pg_logical_slot_get_changes('regression_slot', NULL, NULL, >> 'include-xids', '0', 'skip-empty-xacts', '1', 'include-sequences', >> '0'); >> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: >> "reorderbuffer.c", Line: 1173, PID: 92423) >> 0 postgres 0x00592dd0 ExceptionalCondition + 160\\0 >> 1 postgres 0x00391c40 >> ReorderBufferGetRelids + 304\\0 >> 2 postgres 0x00394794 >> ReorderBufferAllocate + 1044\\0 >> 3 postgres 0x003834dc heap_decode + 76\\0 >> 4 postgres 0x003829c8 >> LogicalDecodingProcessRecord + 168\\0 >> 5 postgres 0x0038a1d4 >> pg_logical_emit_message_text + 1876\\0 >> 6 postgres 0x00241810 >> ExecMakeTableFunctionResult + 576\\0 >> 7 postgres 0x002576c4 >> ExecInitFunctionScan + 1444\\0 >> 8 postgres 0x00242140 ExecScan + 896\\0 >> 9 postgres 0x002397cc standard_ExecutorRun + 540\\0 >> 10 postgres 0x00427940 >> EnsurePortalSnapshotExists + 832\\0 >> 11 postgres 0x00428e84 PortalRun + 644\\0 >> 12 postgres 0x004252e0 pg_plan_queries + 3232\\0 >> 13 postgres 0x00426270 PostgresMain + 3424\\0 >> 14 postgres 0x0036b9ec PostmasterMain + 8060\\0 >> 15 postgres 0x0029f24c main + 1356\\0 >> 16 postgres 0x00002524 start + 68\\0 > > I think that may be the bug that Tomas fixed a short while ago. He > said the skip empty xacts stuff didn't yet work for sequences. That's > only likely to be hit in slower machines... > I'm not sure how could this be related to the issue fixed by b779d7d8fd. That's merely about handling of empty xacts in the output plugin, it has little (nothing) to do with reorderbuffer - the failures was simply about (not) printing BEGIN/COMMIT for empty xacts. So why would it be triggering an Assert in reorderbuffer? Also, there was a successful run with the test_decoding changes 80901b3291 (and also with sequence decoding infrastructure committed a couple days ago) ... Do we have access to the machine? If yes, it'd be interesting to run the tests before the fix, and see how often it fails. Also, printing the LSN would be interesting, so that we can correlate it to WAL. BTW it's probably just a coincidence, but the cfbot does show a strange failure on the macOS machine [1], as if we did not wait for all changes to be applied by replica. I thought it's because of the issue with tracking LSN for sequences [2], but this is with a reworked test that does not do just nextval(). So I'm wondering if these two things may be related ... both things happened on macOS only. [1] https://cirrus-ci.com/task/6299472241098752 [2] https://www.postgresql.org/message-id/712cad46-a9c8-1389-aef8-faf0203c9be9%40enterprisedb.com -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
From
Tomas Vondra
Date:
On 2/13/22 13:27, Tomas Vondra wrote: > > ... > > BTW it's probably just a coincidence, but the cfbot does show a strange > failure on the macOS machine [1], as if we did not wait for all changes > to be applied by replica. I thought it's because of the issue with > tracking LSN for sequences [2], but this is with a reworked test that > does not do just nextval(). So I'm wondering if these two things may be > related ... both things happened on macOS only. > I take this back - this failure was just a thinko in the ROLLBACK test, I've posted an updated patch fixing this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 1173, ...
From
Andres Freund
Date:
Hi, On 2022-02-13 13:27:08 +0100, Tomas Vondra wrote: > I'm not sure how could this be related to the issue fixed by b779d7d8fd. > That's merely about handling of empty xacts in the output plugin, it has > little (nothing) to do with reorderbuffer - the failures was simply > about (not) printing BEGIN/COMMIT for empty xacts. I'd only read your commit message - which didn't go into that much detail. I just saw that the run didn't yet include that commit and that the failed command specified skip-empty-xacts 1, which your commit described as fixing... > So why would it be triggering an Assert in reorderbuffer? Also, there > was a successful run with the test_decoding changes 80901b3291 (and also > with sequence decoding infrastructure committed a couple days ago) ... It's clearly a bug only happen at a lower likelihood... > Do we have access to the machine? If yes, it'd be interesting to run the > tests before the fix, and see how often it fails. Also, printing the LSN > would be interesting, so that we can correlate it to WAL. We really should provide assert infrastructure that can do comparisons while also printing values... If C just had proper generics :(. Greetings, Andres Freund