Thread: A failure in t/001_rep_changes.pl
Hi,

I recently observed an assertion failure twice in t/001_rep_changes.pl
on HEAD with the backtrace [1] on my dev EC2 c5.4xlarge instance [2].
Unfortunately, I have not been able to reproduce it since. I haven't
had a chance to dive deep into it. However, I'm posting it here for the
record, and in case something can be derived from the backtrace.

[1] t/001_rep_changes.pl

2024-01-31 12:24:38.474 UTC [840166] pg_16435_sync_16393_7330237333761601891 STATEMENT: DROP_REPLICATION_SLOT pg_16435_sync_16393_7330237333761601891 WAIT
TRAP: failed Assert("list->head != INVALID_PGPROCNO"), File: "../../../../src/include/storage/proclist.h", Line: 101, PID: 840166
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ExceptionalCondition+0xbb)[0x55c8edf6b8f9]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x6637de)[0x55c8edd517de]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ConditionVariablePrepareToSleep+0x85)[0x55c8edd51b91]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ReplicationSlotAcquire+0x142)[0x55c8edcead6b]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ReplicationSlotDrop+0x51)[0x55c8edceb47f]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x60da71)[0x55c8edcfba71]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(exec_replication_command+0x47e)[0x55c8edcfc96a]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(PostgresMain+0x7df)[0x55c8edd7d644]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5ab50c)[0x55c8edc9950c]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5aab21)[0x55c8edc98b21]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5a70de)[0x55c8edc950de]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(PostmasterMain+0x1534)[0x55c8edc949db]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x459c47)[0x55c8edb47c47]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f19fe629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f19fe629e40]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(_start+0x25)[0x55c8ed7c4565]
2024-01-31 12:24:38.476 UTC [840168] pg_16435_sync_16390_7330237333761601891 LOG: statement: SELECT a.attnum, a.attname, a.atttypid, a.attnum = ANY(i.indkey) FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_index i ON (i.indexrelid = pg_get_replica_identity_index(16391)) WHERE a.attnum > 0::pg_catalog.int2 AND NOT a.attisdropped AND a.attgenerated = '' AND a.attrelid = 16391 ORDER BY a.attnum

[2] Linux ip-000-00-0-000 6.2.0-1018-aws #18~22.04.1-Ubuntu SMP Wed Jan 10 22:54:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

On Wed, 14 Feb 2024 at 13:19, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> I recently observed an assertion failure twice in t/001_rep_changes.pl
> on HEAD with the backtrace [1] on my dev EC2 c5.4xlarge instance [2].
> Unfortunately, I have not been able to reproduce it since. I haven't
> had a chance to dive deep into it. However, I'm posting it here for the
> record, and in case something can be derived from the backtrace.

By any chance, do you have the log files from when this failure occurred? If so, please share them.

Regards,
Vignesh

At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C <vignesh21@gmail.com> wrote in
> By any chance, do you have the log files from when this failure
> occurred? If so, please share them.

In my understanding, within a single instance, no two proclists can simultaneously share the same waitlink member of PGPROC.

On the other hand, a publisher uses two condition variables, one for slots and one for WAL waiting, and both work on the same PGPROC member, cvWaitLink. I suspect this issue arises from this configuration. However, although it is unlikely to be related to this specific issue, a similar problem can arise in instances that function as both logical publisher and physical primary.

Regardless of this issue, I think we should provide separate waitlink members for condition variables that can possibly be used simultaneously.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
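
To illustrate the invariant described above (one link member can belong to only one list at a time), here is a minimal standalone sketch. It is a toy model, not PostgreSQL's proclist.h or condition variable code, and it makes no claim about the actual cause of the assertion failure. All names in it (Proc, sharedLink, slotLink, walLink, list_a_consistent) are hypothetical; only the idea of a per-process link shared by two wait lists is taken from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INVALID_PROCNO (-1)
#define NPROCS 4

typedef struct { int next; int prev; } ListNode;
typedef struct { int head; int tail; } ListHead;

/* Hypothetical stand-in for a per-process struct. */
typedef struct
{
    ListNode sharedLink;    /* one link reused by every list (the hazard) */
    ListNode slotLink;      /* dedicated link for list A (assumed fix) */
    ListNode walLink;       /* dedicated link for list B (assumed fix) */
} Proc;

static Proc procs[NPROCS];

/* Locate the link node of a process by its offset inside Proc. */
static ListNode *node_at(int procno, size_t off)
{
    assert(procno >= 0 && procno < NPROCS);
    return (ListNode *) ((char *) &procs[procno] + off);
}

static void list_init(ListHead *l)
{
    l->head = l->tail = INVALID_PROCNO;
}

/* Append a process to a list, overwriting whatever its link node held. */
static void push_tail(ListHead *l, int procno, size_t off)
{
    ListNode *n = node_at(procno, off);

    n->next = INVALID_PROCNO;
    n->prev = l->tail;
    if (l->tail == INVALID_PROCNO)
        l->head = procno;
    else
        node_at(l->tail, off)->next = procno;
    l->tail = procno;
}

/* Walk head-to-tail and check that prev links and the tail agree. */
static int list_is_consistent(const ListHead *l, size_t off)
{
    int prev = INVALID_PROCNO;
    int steps = 0;

    for (int cur = l->head; cur != INVALID_PROCNO; cur = node_at(cur, off)->next)
    {
        if (node_at(cur, off)->prev != prev || ++steps > NPROCS)
            return 0;
        prev = cur;
    }
    return prev == l->tail;
}

/*
 * Put process 1 on two lists at once and report whether list A survives.
 * separate_links == 0: both lists use sharedLink.
 * separate_links != 0: each list uses its own link member.
 */
int list_a_consistent(int separate_links)
{
    size_t off_a = separate_links ? offsetof(Proc, slotLink) : offsetof(Proc, sharedLink);
    size_t off_b = separate_links ? offsetof(Proc, walLink) : offsetof(Proc, sharedLink);
    ListHead a, b;

    memset(procs, 0, sizeof(procs));
    list_init(&a);
    list_init(&b);

    push_tail(&a, 0, off_a);
    push_tail(&a, 1, off_a);    /* process 1 waits on list A */
    push_tail(&b, 2, off_b);
    push_tail(&b, 1, off_b);    /* ... and is also added to list B */

    return list_is_consistent(&a, off_a);
}
```

In this model, list_a_consistent(0) returns 0 because adding process 1 to list B overwrites the prev link that list A relies on, while list_a_consistent(1) returns 1 because each list has its own link member. That is the shape of the "separate waitlink members" suggestion, under the assumption that the two lists really can hold the same process at the same time.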