Thread: A failure in t/001_rep_changes.pl

From: Bharath Rupireddy
Hi,

I recently observed an assertion failure twice in t/001_rep_changes.pl
on HEAD, with the backtrace [1], on my dev EC2 c5.4xlarge instance [2].
Unfortunately, I haven't been able to reproduce it since, and I haven't
had a chance to dig into it. I'm posting it here for the record, and in
case something can be inferred from the backtrace.

[1] t/001_rep_changes.pl

2024-01-31 12:24:38.474 UTC [840166] pg_16435_sync_16393_7330237333761601891 STATEMENT: DROP_REPLICATION_SLOT pg_16435_sync_16393_7330237333761601891 WAIT
TRAP: failed Assert("list->head != INVALID_PGPROCNO"), File: "../../../../src/include/storage/proclist.h", Line: 101, PID: 840166
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ExceptionalCondition+0xbb)[0x55c8edf6b8f9]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x6637de)[0x55c8edd517de]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ConditionVariablePrepareToSleep+0x85)[0x55c8edd51b91]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ReplicationSlotAcquire+0x142)[0x55c8edcead6b]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(ReplicationSlotDrop+0x51)[0x55c8edceb47f]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x60da71)[0x55c8edcfba71]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(exec_replication_command+0x47e)[0x55c8edcfc96a]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(PostgresMain+0x7df)[0x55c8edd7d644]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5ab50c)[0x55c8edc9950c]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5aab21)[0x55c8edc98b21]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x5a70de)[0x55c8edc950de]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(PostmasterMain+0x1534)[0x55c8edc949db]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(+0x459c47)[0x55c8edb47c47]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f19fe629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f19fe629e40]
postgres: publisher: walsender ubuntu postgres [local] DROP_REPLICATION_SLOT(_start+0x25)[0x55c8ed7c4565]
2024-01-31 12:24:38.476 UTC [840168] pg_16435_sync_16390_7330237333761601891 LOG:  statement: SELECT a.attnum,       a.attname,       a.atttypid,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i       ON (i.indexrelid = pg_get_replica_identity_index(16391)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.attisdropped AND a.attgenerated = '' AND a.attrelid = 16391 ORDER BY a.attnum

[2] Linux ip-000-00-0-000 6.2.0-1018-aws #18~22.04.1-Ubuntu SMP Wed Jan 10 22:54:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: A failure in t/001_rep_changes.pl

From: vignesh C
On Wed, 14 Feb 2024 at 13:19, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I recently observed an assertion failure twice in t/001_rep_changes.pl
> on HEAD with the backtrace [1] on my dev EC2 c5.4xlarge instance [2].
> Unfortunately I'm not observing it again. I haven't got a chance to
> dive deep into it. However, I'm posting it here just for the records,
> and in case something can be derived out of the backtrace.

By any chance, do you have the log files from when this failure
occurred? If so, please share them.

Regards,
Vignesh



Re: A failure in t/001_rep_changes.pl

From: Kyotaro Horiguchi
At Fri, 23 Feb 2024 15:50:21 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> By any chance do you have the log files when this failure occurred, if
> so please share it.

As I understand it, within a single instance no two proclists can
simultaneously use the same waitlink member of a given PGPROC.

On the other hand, a publisher uses two condition variables, one for
slots and one for WAL waiting, and both work on the same PGPROC member,
cvWaitLink. I suspect this issue arises from that setup. Although it is
unlikely to be related to this specific issue, a similar problem can
arise in instances that function both as a logical publisher and as a
physical primary.

Regardless of this issue, I think we should provide separate waitlink
members for condition variables that can possibly be used
simultaneously.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center