Thread: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erik Rijkers
Date:
Hi, I get this crash running the attached test program. On my slow-disked and old desktop it occurs once in 20 or so runs (it is yet another installment of an old test that runs pgbench with logical replication). 15devel compiled from d3609dd25. (The bash deletes stuff, and without my environment it will need some tweaking) Thanks, Erik Rijkers
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Michael Paquier
Date:
On Sat, Apr 16, 2022 at 09:37:55AM +0200, Erik Rijkers wrote: > I get this crash running the attached test program. On my slow-disked and > old desktop it occurs once in 20 or so runs (it is yet another installment > of an old test that runs pgbench with logical replication). > > 15devel compiled from d3609dd25. > > (The bash deletes stuff, and without my environment it will need some > tweaking) Thanks for the report, Erik. This one is new, likely related to the move of the stats to shared memory. I have added an open item. -- Michael
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Andres Freund
Date:
Hi, On 2022-04-16 09:37:55 +0200, Erik Rijkers wrote: > I get this crash running the attached test program. On my slow-disked and > old desktop it occurs once in 20 or so runs (it is yet another installment > of an old test that runs pgbench with logical replication). > > 15devel compiled from d3609dd25. > > (The bash deletes stuff, and without my environment it will need some > tweaking) Any chance for a backtrace? I'll otherwise try to adjust the script, but ... Greetings, Andres Freund
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Kyotaro Horiguchi
Date:
At Sat, 16 Apr 2022 11:23:23 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-04-16 09:37:55 +0200, Erik Rijkers wrote: > > I get this crash running the attached test program. On my slow-disked and > > old desktop it occurs once in 20 or so runs (it is yet another installment > > of an old test that runs pgbench with logical replication). > > > > 15devel compiled from d3609dd25. > > > > (The bash deletes stuff, and without my environment it will need some > > tweaking) > > Any chance for a backtrace? I'll otherwise try to adjust the script, but ... FWIW, the script keep succussfully running more than 140 times for me. (on master, Cent8) And I haven't find a hypothesis for the cause of the symptom. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erikjan Rijkers
Date:
Op 18-04-2022 om 09:13 schreef Kyotaro Horiguchi: > At Sat, 16 Apr 2022 11:23:23 -0700, Andres Freund <andres@anarazel.de> wrote in >> Hi, >> >> On 2022-04-16 09:37:55 +0200, Erik Rijkers wrote: >>> I get this crash running the attached test program. On my slow-disked and >>> old desktop it occurs once in 20 or so runs (it is yet another installment >>> of an old test that runs pgbench with logical replication). >>> >>> 15devel compiled from d3609dd25. >>> >>> (The bash deletes stuff, and without my environment it will need some >>> tweaking) >> >> Any chance for a backtrace? I'll otherwise try to adjust the script, but ... > > FWIW, the script keep succussfully running more than 140 times for me. > (on master, Cent8) And I haven't find a hypothesis for the cause of > the symptom. Hm. Just now I've recompiled and retried and after 5 runs got the same crash. Then tried on another machine (also old, I'm afraid), and built 1a8b11053 and ran the same thing. That failed on the first try, and made core dump from which I extracted: gdb ~/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres core-postgres-6-500-500-8289-1650269886 -ex bt -ex q GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-unknown-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres...done. [New LWP 8289] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `postgres: logical replication worker for subscription 16411 '. Program terminated with signal 6, Aborted. #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 #1 0x000000357d633cd5 in abort () from /lib64/libc.so.6 #2 0x0000000000973fcb in ExceptionalCondition (conditionName=conditionName@entry=0xb20d76 "tabstat->trans == trans", errorType=errorType@entry=0x9c7c2b "FailedAssertion", fileName=fileName@entry=0xb20d0b "pgstat_relation.c", lineNumber=lineNumber@entry=508) at assert.c:69 #3 0x000000000086b77f in AtEOXact_PgStat_Relations (xact_state=xact_state@entry=0x26f0b50, isCommit=isCommit@entry=true) at pgstat_relation.c:508 #4 0x000000000086ec0f in AtEOXact_PgStat (isCommit=isCommit@entry=true, parallel=parallel@entry=false) at pgstat_xact.c:54 #5 0x00000000005bd2a3 in CommitTransaction () at xact.c:2360 #6 0x00000000005be5d5 in CommitTransactionCommand () at xact.c:3048 #7 0x00000000007ee72b in apply_handle_commit_internal (commit_data=commit_data@entry=0x7ffe4606a7a0) at worker.c:1532 #8 0x00000000007efac9 in apply_handle_commit (s=0x7ffe4606a940) at worker.c:845 #9 apply_dispatch () at worker.c:2473 #10 0x00000000007f11a7 in LogicalRepApplyLoop (last_received=74454600) at worker.c:2757 #11 start_apply () at worker.c:3526 #12 0x00000000007f175f in ApplyWorkerMain () at worker.c:3782 #13 0x00000000007bdba3 in StartBackgroundWorker () at bgworker.c:858 #14 0x00000000007c3241 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5802 #15 maybe_start_bgworkers () at postmaster.c:6026 #16 0x00000000007c3b65 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5191 #17 <signal handler called> #18 0x000000357d6e1683 in __select_nocancel () from /lib64/libc.so.6 #19 0x00000000007c41d6 in ServerLoop () at postmaster.c:1757 #20 0x00000000007c5c3b in PostmasterMain () at postmaster.c:1465 #21 0x0000000000720cfe in main (argc=11, argv=0x2615590) at main.c:202 I'm not sure that helps. > > regards. >
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Kyotaro Horiguchi
Date:
At Mon, 18 Apr 2022 10:57:02 +0200, Erikjan Rijkers <er@xs4all.nl> wrote in > Hm. Just now I've recompiled and retried and after 5 runs got the > same crash. Then tried on another machine (also old, I'm afraid), > and built 1a8b11053 and ran the same thing. That failed on the first > try, and made core dump from which I extracted: Thanks! > gdb ~/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres > core-postgres-6-500-500-8289-1650269886 -ex bt -ex q > > #2 0x0000000000973fcb in ExceptionalCondition > #(conditionName=conditionName@entry=0xb20d76 "tabstat->trans == trans", > #errorType=errorType@entry=0x9c7c2b "FailedAssertion", > fileName=fileName@entry=0xb20d0b "pgstat_relation.c", > lineNumber=lineNumber@entry=508) at assert.c:69 > #3 0x000000000086b77f in AtEOXact_PgStat_Relations > #(xact_state=xact_state@entry=0x26f0b50, isCommit=isCommit@entry=true) > #at pgstat_relation.c:508 Could you read tabstat, *tabstat, trans, *trans here? > #4 0x000000000086ec0f in AtEOXact_PgStat (isCommit=isCommit@entry=true, > #parallel=parallel@entry=false) at pgstat_xact.c:54 > #5 0x00000000005bd2a3 in CommitTransaction () at xact.c:2360 > #6 0x00000000005be5d5 in CommitTransactionCommand () at xact.c:3048 > #7 0x00000000007ee72b in apply_handle_commit_internal > #(commit_data=commit_data@entry=0x7ffe4606a7a0) at worker.c:1532 > #8 0x00000000007efac9 in apply_handle_commit (s=0x7ffe4606a940) at > #worker.c:845 > #9 apply_dispatch () at worker.c:2473 > #10 0x00000000007f11a7 in LogicalRepApplyLoop (last_received=74454600) > #at worker.c:2757 > #11 start_apply () at worker.c:3526 > #12 0x00000000007f175f in ApplyWorkerMain () at worker.c:3782 regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erik Rijkers
Date:
Op 19-04-2022 om 02:15 schreef Kyotaro Horiguchi: > At Mon, 18 Apr 2022 10:57:02 +0200, Erikjan Rijkers <er@xs4all.nl> wrote in >> Hm. Just now I've recompiled and retried and after 5 runs got the >> same crash. Then tried on another machine (also old, I'm afraid), >> and built 1a8b11053 and ran the same thing. That failed on the first >> try, and made core dump from which I extracted: > > Thanks! > >> gdb ~/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres >> core-postgres-6-500-500-8289-1650269886 -ex bt -ex q >> >> #2 0x0000000000973fcb in ExceptionalCondition >> #(conditionName=conditionName@entry=0xb20d76 "tabstat->trans == trans", >> #errorType=errorType@entry=0x9c7c2b "FailedAssertion", >> fileName=fileName@entry=0xb20d0b "pgstat_relation.c", >> lineNumber=lineNumber@entry=508) at assert.c:69 >> #3 0x000000000086b77f in AtEOXact_PgStat_Relations >> #(xact_state=xact_state@entry=0x26f0b50, isCommit=isCommit@entry=true) >> #at pgstat_relation.c:508 > > Could you read tabstat, *tabstat, trans, *trans here? To be honest I'm not sure how to, but I gave it a try: GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-unknown-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres...done. [New LWP 21839] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `postgres: logical replication worker for subscription 16411 '. Program terminated with signal 6, Aborted. #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 #1 0x000000357d633cd5 in abort () from /lib64/libc.so.6 #2 0x000000000097400b in ExceptionalCondition (conditionName=conditionName@entry=0xb20df6 "tabstat->trans == trans", errorType=errorType@entry=0x9c7cab "FailedAssertion", fileName=fileName@entry=0xb20d8b "pgstat_relation.c", lineNumber=lineNumber@entry=508) at assert.c:69 #3 0x000000000086b7bf in AtEOXact_PgStat_Relations (xact_state=xact_state@entry=0x2d9ab50, isCommit=isCommit@entry=true) at pgstat_relation.c:508 #4 0x000000000086ec4f in AtEOXact_PgStat (isCommit=isCommit@entry=true, parallel=parallel@entry=false) at pgstat_xact.c:54 #5 0x00000000005bd2a3 in CommitTransaction () at xact.c:2360 #6 0x00000000005be5d5 in CommitTransactionCommand () at xact.c:3048 #7 0x00000000007ee76b in apply_handle_commit_internal (commit_data=commit_data@entry=0x7fffb90aa8e0) at worker.c:1532 #8 0x00000000007efb09 in apply_handle_commit (s=0x7fffb90aaa80) at worker.c:845 #9 apply_dispatch () at worker.c:2473 #10 0x00000000007f11e7 in LogicalRepApplyLoop (last_received=74695984) at worker.c:2757 #11 start_apply () at worker.c:3526 #12 0x00000000007f179f in ApplyWorkerMain () at worker.c:3782 #13 0x00000000007bdbb3 in StartBackgroundWorker () at bgworker.c:858 #14 0x00000000007c3251 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5802 #15 maybe_start_bgworkers () at postmaster.c:6026 #16 0x00000000007c3b75 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5191 #17 <signal handler called> #18 0x000000357d6e1683 in __select_nocancel () from /lib64/libc.so.6 #19 0x00000000007c41e6 in ServerLoop () at postmaster.c:1757 #20 0x00000000007c5c4b in PostmasterMain () at postmaster.c:1465 #21 0x0000000000720d0e in main (argc=11, argv=0x2cbf590) at main.c:202 (gdb) f 3 #3 0x000000000086b7bf in AtEOXact_PgStat_Relations (xact_state=xact_state@entry=0x2d9ab50, isCommit=isCommit@entry=true) at pgstat_relation.c:508 508 Assert(tabstat->trans == trans); (gdb) p tabstat $1 = <optimized out> (gdb) p *tabstat value has been optimized out (gdb) p trans $2 = <optimized out> (gdb) p *trans value has been optimized out (gdb) > >> #4 0x000000000086ec0f in AtEOXact_PgStat (isCommit=isCommit@entry=true, >> #parallel=parallel@entry=false) at pgstat_xact.c:54 >> #5 0x00000000005bd2a3 in CommitTransaction () at xact.c:2360 >> #6 0x00000000005be5d5 in CommitTransactionCommand () at xact.c:3048 >> #7 0x00000000007ee72b in apply_handle_commit_internal >> #(commit_data=commit_data@entry=0x7ffe4606a7a0) at worker.c:1532 >> #8 0x00000000007efac9 in apply_handle_commit (s=0x7ffe4606a940) at >> #worker.c:845 >> #9 apply_dispatch () at worker.c:2473 >> #10 0x00000000007f11a7 in LogicalRepApplyLoop (last_received=74454600) >> #at worker.c:2757 >> #11 start_apply () at worker.c:3526 >> #12 0x00000000007f175f in ApplyWorkerMain () at worker.c:3782 > > regards. >
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Kyotaro Horiguchi
Date:
Thaks Erik. At Tue, 19 Apr 2022 07:00:30 +0200, Erik Rijkers <er@xs4all.nl> wrote in > Op 19-04-2022 om 02:15 schreef Kyotaro Horiguchi: > > Could you read tabstat, *tabstat, trans, *trans here? > > To be honest I'm not sure how to, but I gave it a try: > > (gdb) p tabstat > $1 = <optimized out> Great! It is that. But unfortunately they are optimized out.. Could you cause the crash with -O0 binary? You will see the variable with it. You can rebuild with the option as follows. $ make clean; make install CUSTOM_COPT="-O0 -g" You can dump only the whole xact_state chain from the current core file but the result will give a bit obscure hint for diagnosis. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erik Rijkers
Date:
Op 19-04-2022 om 11:25 schreef Kyotaro Horiguchi: > Thaks Erik. > > At Tue, 19 Apr 2022 07:00:30 +0200, Erik Rijkers <er@xs4all.nl> wrote in >> Op 19-04-2022 om 02:15 schreef Kyotaro Horiguchi: >>> Could you read tabstat, *tabstat, trans, *trans here? >> >> To be honest I'm not sure how to, but I gave it a try: >> I rebuilt newest master (a62bff74b135) with export CUSTOM_COPT="-O0 -g" The 12th run of statbug.sh crashed and gave a corefile. GNU gdb (GDB) 7.6 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-unknown-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin/postgres...done. [New LWP 25058] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `postgres: logical replication worker for subscription 16411 '. Program terminated with signal 6, Aborted. #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 #1 0x000000357d633cd5 in abort () from /lib64/libc.so.6 #2 0x0000000000b3bada in ExceptionalCondition (conditionName=0xd389a1 "tabstat->trans == trans", errorType=0xd388b2 "FailedAssertion", fileName=0xd388a0 "pgstat_relation.c", lineNumber=508) at assert.c:69 #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, isCommit=true) at pgstat_relation.c:508 #4 0x00000000009c4107 in AtEOXact_PgStat (isCommit=true, parallel=false) at pgstat_xact.c:54 #5 0x0000000000583764 in CommitTransaction () at xact.c:2360 #6 0x0000000000584354 in CommitTransactionCommand () at xact.c:3048 #7 0x000000000090b34e in apply_handle_commit_internal (commit_data=0x7ffd024b5940) at worker.c:1532 #8 0x000000000090a287 in apply_handle_commit (s=0x7ffd024b59b0) at worker.c:845 #9 0x000000000090ce3a in apply_dispatch (s=0x7ffd024b59b0) at worker.c:2473 #10 0x000000000090d41c in LogicalRepApplyLoop (last_received=74680880) at worker.c:2757 #11 0x000000000090e974 in start_apply (origin_startpos=0) at worker.c:3526 #12 0x000000000090f156 in ApplyWorkerMain (main_arg=0) at worker.c:3782 #13 0x00000000008c7623 in StartBackgroundWorker () at bgworker.c:858 #14 0x00000000008d1557 in do_start_bgworker (rw=0x30ff0a0) at postmaster.c:5802 #15 0x00000000008d1903 in maybe_start_bgworkers () at postmaster.c:6026 #16 0x00000000008d09ba in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5191 #17 <signal handler called> #18 0x000000357d6e1683 in __select_nocancel () from /lib64/libc.so.6 #19 0x00000000008cc6c1 in ServerLoop () at postmaster.c:1757 #20 0x00000000008cc0aa in PostmasterMain (argc=11, argv=0x30d6590) at postmaster.c:1465 #21 0x00000000007c9256 in main (argc=11, argv=0x30d6590) at main.c:202 (gdb) f 3 #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, isCommit=true) at pgstat_relation.c:508 508 Assert(tabstat->trans == trans); (gdb) p tabstat $1 = (PgStat_TableStatus *) 0x319e630 (gdb) p *tabstat $2 = {t_id = 2139062143, t_shared = 127, trans = 0x7f7f7f7f7f7f7f7f, t_counts = {t_numscans = 9187201950435737471, t_tuples_returned = 9187201950435737471, t_tuples_fetched = 9187201950435737471, t_tuples_inserted = 9187201950435737471, t_tuples_updated = 9187201950435737471, t_tuples_deleted = 9187201950435737471, t_tuples_hot_updated = 9187201950435737471, t_truncdropped = 127, t_delta_live_tuples = 9187201950435737471, t_delta_dead_tuples = 9187201950435737471, t_changed_tuples = 9187201950435737471, t_blocks_fetched = 9187201950435737471, t_blocks_hit = 9187201950435737471}, relation = 0x7f7f7f7f7f7f7f7f} (gdb) p trans $3 = (PgStat_TableXactStatus *) 0x31b1ba8 (gdb) p *trans $4 = {tuples_inserted = 1, tuples_updated = 0, tuples_deleted = 0, truncdropped = false, inserted_pre_truncdrop = 0, updated_pre_truncdrop = 0, deleted_pre_truncdrop = 0, nest_level = 1, upper = 0x0, parent = 0x319e630, next = 0x31b1ab8} (gdb) Looks like we're one step further, no? Erik >> (gdb) p tabstat >> $1 = <optimized out> > > Great! It is that. But unfortunately they are optimized out.. Could > you cause the crash with -O0 binary? You will see the variable with > it. You can rebuild with the option as follows. > > $ make clean; make install CUSTOM_COPT="-O0 -g" > > You can dump only the whole xact_state chain from the current core > file but the result will give a bit obscure hint for diagnosis. > > regards. >
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Alvaro Herrera
Date:
On 2022-Apr-19, Erik Rijkers wrote: > (gdb) p tabstat > $1 = (PgStat_TableStatus *) 0x319e630 > (gdb) p *tabstat > $2 = {t_id = 2139062143, t_shared = 127, trans = 0x7f7f7f7f7f7f7f7f, > t_counts = {t_numscans = 9187201950435737471, t_tuples_returned = > 9187201950435737471, t_tuples_fetched = 9187201950435737471, > t_tuples_inserted = 9187201950435737471, t_tuples_updated = > 9187201950435737471, t_tuples_deleted = 9187201950435737471, > t_tuples_hot_updated = 9187201950435737471, t_truncdropped = 127, > t_delta_live_tuples = 9187201950435737471, t_delta_dead_tuples = > 9187201950435737471, t_changed_tuples = 9187201950435737471, > t_blocks_fetched = 9187201950435737471, t_blocks_hit = 9187201950435737471}, > relation = 0x7f7f7f7f7f7f7f7f} It looks like this struct is freed or is in a memory context that was reset. Perhaps its lifetime wasn't carefully considered in the logical replication code, which takes some shortcuts. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El que vive para el futuro es un iluso, y el que vive para el pasado, un imbécil" (Luis Adler, "Los tripulantes de la noche")
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Andres Freund
Date:
Hi, On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote: > The 12th run of statbug.sh crashed and gave a corefile. I ran through quite a few iterations by now, without reproducing :( I guess there's some timing issue and you're hitting on your system due to the slower disks. > Program terminated with signal 6, Aborted. > #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 > (gdb) bt > #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 > #1 0x000000357d633cd5 in abort () from /lib64/libc.so.6 > #2 0x0000000000b3bada in ExceptionalCondition (conditionName=0xd389a1 > "tabstat->trans == trans", errorType=0xd388b2 "FailedAssertion", > fileName=0xd388a0 "pgstat_relation.c", lineNumber=508) at assert.c:69 > #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, > isCommit=true) at pgstat_relation.c:508 > #4 0x00000000009c4107 in AtEOXact_PgStat (isCommit=true, parallel=false) at > pgstat_xact.c:54 > #5 0x0000000000583764 in CommitTransaction () at xact.c:2360 > #6 0x0000000000584354 in CommitTransactionCommand () at xact.c:3048 > #7 0x000000000090b34e in apply_handle_commit_internal > (commit_data=0x7ffd024b5940) at worker.c:1532 > #8 0x000000000090a287 in apply_handle_commit (s=0x7ffd024b59b0) at > worker.c:845 > #9 0x000000000090ce3a in apply_dispatch (s=0x7ffd024b59b0) at worker.c:2473 > #10 0x000000000090d41c in LogicalRepApplyLoop (last_received=74680880) at > worker.c:2757 > #11 0x000000000090e974 in start_apply (origin_startpos=0) at worker.c:3526 > #12 0x000000000090f156 in ApplyWorkerMain (main_arg=0) at worker.c:3782 > #13 0x00000000008c7623 in StartBackgroundWorker () at bgworker.c:858 > #14 0x00000000008d1557 in do_start_bgworker (rw=0x30ff0a0) at > postmaster.c:5802 > #15 0x00000000008d1903 in maybe_start_bgworkers () at postmaster.c:6026 > #16 0x00000000008d09ba in sigusr1_handler (postgres_signal_arg=10) at > postmaster.c:5191 > #17 <signal handler called> > #18 0x000000357d6e1683 in __select_nocancel () from /lib64/libc.so.6 > #19 0x00000000008cc6c1 in ServerLoop () at postmaster.c:1757 > #20 0x00000000008cc0aa in PostmasterMain (argc=11, argv=0x30d6590) at > postmaster.c:1465 > #21 0x00000000007c9256 in main (argc=11, argv=0x30d6590) at main.c:202 > (gdb) f 3 > #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, > isCommit=true) at pgstat_relation.c:508 > 508 Assert(tabstat->trans == trans); > (gdb) p tabstat > $1 = (PgStat_TableStatus *) 0x319e630 > (gdb) p *tabstat > $2 = {t_id = 2139062143, t_shared = 127, trans = 0x7f7f7f7f7f7f7f7f, > t_counts = {t_numscans = 9187201950435737471, t_tuples_returned = > 9187201950435737471, t_tuples_fetched = 9187201950435737471, > t_tuples_inserted = 9187201950435737471, t_tuples_updated = > 9187201950435737471, t_tuples_deleted = 9187201950435737471, > t_tuples_hot_updated = 9187201950435737471, t_truncdropped = 127, > t_delta_live_tuples = 9187201950435737471, t_delta_dead_tuples = > 9187201950435737471, t_changed_tuples = 9187201950435737471, > t_blocks_fetched = 9187201950435737471, t_blocks_hit = 9187201950435737471}, > relation = 0x7f7f7f7f7f7f7f7f} > (gdb) p trans > $3 = (PgStat_TableXactStatus *) 0x31b1ba8 > (gdb) p *trans > $4 = {tuples_inserted = 1, tuples_updated = 0, tuples_deleted = 0, > truncdropped = false, inserted_pre_truncdrop = 0, updated_pre_truncdrop = 0, > deleted_pre_truncdrop = 0, nest_level = 1, upper = 0x0, > parent = 0x319e630, next = 0x31b1ab8} > (gdb) Could you print out p xact_state p *xact_state p xact_state->first p *xact_state->first Do you have the server log file for the failed run / instance? Greetings, Andres Freund
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Andres Freund
Date:
Hi, On 2022-04-19 10:36:24 -0700, Andres Freund wrote: > On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote: > > The 12th run of statbug.sh crashed and gave a corefile. > > I ran through quite a few iterations by now, without reproducing :( > > I guess there's some timing issue and you're hitting on your system > due to the slower disks. Ah. I found the issue. The new pgstat_report_stat(true) call in LogicalRepApplyLoop()'s "timeout" section doesn't check if we're in a transaction. And the transactional stats code doesn't handle that (never has). I think all that's needed is a if (IsTransactionState()) around that pgstat_report_stat(). It might be possible to put an assertion into pgstat_report_stat(), but I need to look at the process exit code to see if it is. Greetings, Andres Freund
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erik Rijkers
Date:
Op 19-04-2022 om 19:36 schreef Andres Freund: > Hi, > > On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote: >> The 12th run of statbug.sh crashed and gave a corefile. > > I ran through quite a few iterations by now, without reproducing :( > > I guess there's some timing issue and you're hitting on your system > due to the slower disks. > > >> Program terminated with signal 6, Aborted. >> #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 >> (gdb) bt >> #0 0x000000357d6324f5 in raise () from /lib64/libc.so.6 >> #1 0x000000357d633cd5 in abort () from /lib64/libc.so.6 >> #2 0x0000000000b3bada in ExceptionalCondition (conditionName=0xd389a1 >> "tabstat->trans == trans", errorType=0xd388b2 "FailedAssertion", >> fileName=0xd388a0 "pgstat_relation.c", lineNumber=508) at assert.c:69 >> #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, >> isCommit=true) at pgstat_relation.c:508 >> #4 0x00000000009c4107 in AtEOXact_PgStat (isCommit=true, parallel=false) at >> pgstat_xact.c:54 >> #5 0x0000000000583764 in CommitTransaction () at xact.c:2360 >> #6 0x0000000000584354 in CommitTransactionCommand () at xact.c:3048 >> #7 0x000000000090b34e in apply_handle_commit_internal >> (commit_data=0x7ffd024b5940) at worker.c:1532 >> #8 0x000000000090a287 in apply_handle_commit (s=0x7ffd024b59b0) at >> worker.c:845 >> #9 0x000000000090ce3a in apply_dispatch (s=0x7ffd024b59b0) at worker.c:2473 >> #10 0x000000000090d41c in LogicalRepApplyLoop (last_received=74680880) at >> worker.c:2757 >> #11 0x000000000090e974 in start_apply (origin_startpos=0) at worker.c:3526 >> #12 0x000000000090f156 in ApplyWorkerMain (main_arg=0) at worker.c:3782 >> #13 0x00000000008c7623 in StartBackgroundWorker () at bgworker.c:858 >> #14 0x00000000008d1557 in do_start_bgworker (rw=0x30ff0a0) at >> postmaster.c:5802 >> #15 0x00000000008d1903 in maybe_start_bgworkers () at postmaster.c:6026 >> #16 0x00000000008d09ba in sigusr1_handler (postgres_signal_arg=10) at >> postmaster.c:5191 >> #17 <signal handler called> >> #18 0x000000357d6e1683 in __select_nocancel () from /lib64/libc.so.6 >> #19 0x00000000008cc6c1 in ServerLoop () at postmaster.c:1757 >> #20 0x00000000008cc0aa in PostmasterMain (argc=11, argv=0x30d6590) at >> postmaster.c:1465 >> #21 0x00000000007c9256 in main (argc=11, argv=0x30d6590) at main.c:202 >> (gdb) f 3 >> #3 0x00000000009bf5dc in AtEOXact_PgStat_Relations (xact_state=0x31b1b50, >> isCommit=true) at pgstat_relation.c:508 >> 508 Assert(tabstat->trans == trans); >> (gdb) p tabstat >> $1 = (PgStat_TableStatus *) 0x319e630 >> (gdb) p *tabstat >> $2 = {t_id = 2139062143, t_shared = 127, trans = 0x7f7f7f7f7f7f7f7f, >> t_counts = {t_numscans = 9187201950435737471, t_tuples_returned = >> 9187201950435737471, t_tuples_fetched = 9187201950435737471, >> t_tuples_inserted = 9187201950435737471, t_tuples_updated = >> 9187201950435737471, t_tuples_deleted = 9187201950435737471, >> t_tuples_hot_updated = 9187201950435737471, t_truncdropped = 127, >> t_delta_live_tuples = 9187201950435737471, t_delta_dead_tuples = >> 9187201950435737471, t_changed_tuples = 9187201950435737471, >> t_blocks_fetched = 9187201950435737471, t_blocks_hit = 9187201950435737471}, >> relation = 0x7f7f7f7f7f7f7f7f} >> (gdb) p trans >> $3 = (PgStat_TableXactStatus *) 0x31b1ba8 >> (gdb) p *trans >> $4 = {tuples_inserted = 1, tuples_updated = 0, tuples_deleted = 0, >> truncdropped = false, inserted_pre_truncdrop = 0, updated_pre_truncdrop = 0, >> deleted_pre_truncdrop = 0, nest_level = 1, upper = 0x0, >> parent = 0x319e630, next = 0x31b1ab8} >> (gdb) > > Could you print out > p xact_state > p *xact_state > p xact_state->first > p *xact_state->first > > Do you have the server log file for the failed run / instance? (gdb) p xact_state $5 = (PgStat_SubXactStatus *) 0x31b1b50 (gdb) p *xact_state $6 = {nest_level = 1, prev = 0x0, pending_drops = {head = {prev = 0x31b1b60, next = 0x31b1b60}}, pending_drops_count = 0, first = 0x31b1ba8} (gdb) p xact_state->first $7 = (PgStat_TableXactStatus *) 0x31b1ba8 (gdb) p *xact_state->first $8 = {tuples_inserted = 1, tuples_updated = 0, tuples_deleted = 0, truncdropped = false, inserted_pre_truncdrop = 0, updated_pre_truncdrop = 0, deleted_pre_truncdrop = 0, nest_level = 1, upper = 0x0, parent = 0x319e630, next = 0x31b1ab8} (gdb) The logfile is attached. Erik > Greetings, > > Andres Freund
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Kyotaro Horiguchi
Date:
At Tue, 19 Apr 2022 10:55:26 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-04-19 10:36:24 -0700, Andres Freund wrote: > > On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote: > > > The 12th run of statbug.sh crashed and gave a corefile. > > > > I ran through quite a few iterations by now, without reproducing :( > > > > I guess there's some timing issue and you're hitting on your system > > due to the slower disks. > > Ah. I found the issue. The new pgstat_report_stat(true) call in > LogicalRepApplyLoop()'s "timeout" section doesn't check if we're in a > transaction. And the transactional stats code doesn't handle that (never > has). > > I think all that's needed is a if (IsTransactionState()) around that > pgstat_report_stat(). if (!IsTransactinoState()) ? > It might be possible to put an assertion into pgstat_report_stat(), but > I need to look at the process exit code to see if it is. Inserting a sleep in pgoutput_commit_txn reproduced this. Crashes with the same stack trace with the similar variable state. diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index b197bfd565..def4d751d3 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -568,6 +568,7 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, return; } + sleep(2); OutputPluginPrepareWrite(ctx, true); logicalrep_write_commit(ctx->out, txn, commit_lsn); OutputPluginWrite(ctx, true); The following actuall works for this. diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 4171371296..f4e5359513 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -2882,10 +2882,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received) send_feedback(last_received, requestReply, requestReply); /* - * Force reporting to ensure long idle periods don't lead to - * arbitrarily delayed stats. + * Force reporting to ensure long out-of-transaction idle periods + * don't lead to arbitrarily delayed stats. */ - pgstat_report_stat(true); + if (!IsTransactionState()) + pgstat_report_stat(true); } } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Erik Rijkers
Date:
Op 20-04-2022 om 06:54 schreef Kyotaro Horiguchi: > At Tue, 19 Apr 2022 10:55:26 -0700, Andres Freund <andres@anarazel.de> wrote in >> Hi, >> >> On 2022-04-19 10:36:24 -0700, Andres Freund wrote: >>> On 2022-04-19 13:50:25 +0200, Erik Rijkers wrote: >>>> The 12th run of statbug.sh crashed and gave a corefile. >>> >>> I ran through quite a few iterations by now, without reproducing :( >>> >>> I guess there's some timing issue and you're hitting on your system >>> due to the slower disks. >> >> Ah. I found the issue. The new pgstat_report_stat(true) call in >> LogicalRepApplyLoop()'s "timeout" section doesn't check if we're in a >> transaction. And the transactional stats code doesn't handle that (never >> has). >> >> I think all that's needed is a if (IsTransactionState()) around that >> pgstat_report_stat(). > > if (!IsTransactinoState()) ? > >> It might be possible to put an assertion into pgstat_report_stat(), but >> I need to look at the process exit code to see if it is. > > Inserting a sleep in pgoutput_commit_txn reproduced this. Crashes with > the same stack trace with the similar variable state. > > diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c > index b197bfd565..def4d751d3 100644 > --- a/src/backend/replication/pgoutput/pgoutput.c > +++ b/src/backend/replication/pgoutput/pgoutput.c > @@ -568,6 +568,7 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > return; > } > > + sleep(2); > OutputPluginPrepareWrite(ctx, true); > logicalrep_write_commit(ctx->out, txn, commit_lsn); > OutputPluginWrite(ctx, true); > > The following actuall works for this. > > diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c > index 4171371296..f4e5359513 100644 > --- a/src/backend/replication/logical/worker.c > +++ b/src/backend/replication/logical/worker.c > @@ -2882,10 +2882,11 @@ LogicalRepApplyLoop(XLogRecPtr last_received) > send_feedback(last_received, requestReply, requestReply); > > /* > - * Force reporting to ensure long idle periods don't lead to > - * arbitrarily delayed stats. > + * Force reporting to ensure long out-of-transaction idle periods > + * don't lead to arbitrarily delayed stats. > */ > - pgstat_report_stat(true); > + if (!IsTransactionState()) > + pgstat_report_stat(true); > } > } > Yes, that seems to fix it: I applied that latter patch, and ran my program 250x without errors. Then I removed it again an it gave the error within 15x. thanks! Erik > regards. >
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Michael Paquier
Date:
On Wed, Apr 20, 2022 at 01:03:20PM +0200, Erik Rijkers wrote: > Yes, that seems to fix it: I applied that latter patch, and ran my program > 250x without errors. Then I removed it again an it gave the error within > 15x. That looks simple enough, indeed. Andres, are you planning to address this issue? -- Michael
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Michael Paquier
Date:
On Mon, Apr 25, 2022 at 03:18:52PM +0900, Michael Paquier wrote: > On Wed, Apr 20, 2022 at 01:03:20PM +0200, Erik Rijkers wrote: >> Yes, that seems to fix it: I applied that latter patch, and ran my program >> 250x without errors. Then I removed it again an it gave the error within >> 15x. > > That looks simple enough, indeed. Andres, are you planning to address > this issue? Ping. It looks annoying to release beta1 with that, as assertions are likely going to be enabled in a lot of test builds. -- Michael
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Andres Freund
Date:
Hi, On 2022-05-11 15:48:40 +0900, Michael Paquier wrote: > On Mon, Apr 25, 2022 at 03:18:52PM +0900, Michael Paquier wrote: > > On Wed, Apr 20, 2022 at 01:03:20PM +0200, Erik Rijkers wrote: > >> Yes, that seems to fix it: I applied that latter patch, and ran my program > >> 250x without errors. Then I removed it again an it gave the error within > >> 15x. > > > > That looks simple enough, indeed. Andres, are you planning to address > > this issue? > > Ping. It looks annoying to release beta1 with that, as assertions are > likely going to be enabled in a lot of test builds. I'll try to fix it tomorrow... Sorry for the delay. Greetings, Andres Freund
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Michael Paquier
Date:
On Wed, May 11, 2022 at 08:32:14PM -0700, Andres Freund wrote: > I'll try to fix it tomorrow... Sorry for the delay. Thanks, Andres. -- Michael
Attachment
Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
From
Andres Freund
Date:
Hi, I finally pushed the fix for this. Erik, thanks for the report! And thanks Michael for the ping... On 2022-05-11 20:32:14 -0700, Andres Freund wrote: > On 2022-05-11 15:48:40 +0900, Michael Paquier wrote: > > Ping. It looks annoying to release beta1 with that, as assertions are > > likely going to be enabled in a lot of test builds. FWIW, it's somewhat hard to hit (basically the sender needs to stall while sending out a transaction / network being really slow), so it'd not have been likely to be hit by all that many people. Greetings, Andres Freund