Re: START_REPLICATION SLOT causing a crash in an assert build - Mailing list pgsql-hackers

From Jaime Casanova
Subject Re: START_REPLICATION SLOT causing a crash in an assert build
Msg-id YyAXoU4CQhlZ4/ZN@ahch-to
In response to Re: START_REPLICATION SLOT causing a crash in an assert build  (Andres Freund <andres@anarazel.de>)
Responses Re: START_REPLICATION SLOT causing a crash in an assert build
List pgsql-hackers
On Wed, Sep 07, 2022 at 12:39:08PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-09-06 18:40:49 -0500, Jaime Casanova wrote:
> > I'm not sure what is causing this, but I have seen this twice. The
> > second time without activity after changing the set of tables in a
> > PUBLICATION.

This crash happens after resetting the statistics of a replication slot.

> Can you describe the steps to reproduce?
> 

bin/pg_ctl -D data1 initdb
bin/pg_ctl -D data1 -l logfile1 -o "-c port=54315 -c wal_level=logical" start
bin/psql -p 54315 postgres <<EOF
    create table t1 (i int primary key);
    create publication pub1 for table t1;
EOF

bin/pg_ctl -D data2 initdb
bin/pg_ctl -D data2 -l logfile2 -o "-c port=54316" start
bin/psql -p 54316 postgres <<EOF
    create table t1 (i int primary key);
    create subscription sub1 connection 'host=/tmp port=54315 dbname=postgres' publication pub1;
EOF

bin/psql -p 54315 postgres <<EOF
    select pg_stat_reset_replication_slot('sub1');
    insert into t1 values(1);
EOF



> Which git commit does this happen on?
> 

just tested again on f5047c1293acce3c6c3802b06825aa3a9f9aa55a

> 
> > gdb says that debug_query_string contains:
> > 
> > """
> > START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"')
> > """
> > 
> > attached the backtrace.
> > 
> 
> > #2  0x00005559bfd4f0ed in ExceptionalCondition (
> >     conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0",
> >     errorType=0x5559bff30e0d "FailedAssertion", fileName=0x5559bff30dbb "pgstat_replslot.c",
> >     lineNumber=89) at assert.c:69
> 
> what are statent->slotname and slot->data.name?
> 

The problem seems to be that zeroing the stats also wipes the name of
the replication slot, which is stored in the same entry. The attached
simple patch fixes it, though I'm not sure it's the right fix.
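
To illustrate the failure mode described above, here is a minimal,
self-contained C sketch. It is not the attached patch and not PostgreSQL
code: SlotStats, reset_naive, reset_keep_name, name_matches and
report_stats are hypothetical names, and the struct only assumes that the
slot name lives in the same entry as the counters.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 64

/* Hypothetical stand-in for a per-slot stats entry: the name is stored
 * next to the counters, so a blanket reset touches both. */
typedef struct SlotStats
{
    char name[NAME_LEN];
    long total_txns;
    long total_bytes;
} SlotStats;

/* Naive reset: zeroes the whole entry, including the slot name. */
void
reset_naive(SlotStats *s)
{
    memset(s, 0, sizeof(*s));
}

/* Reset that saves the name, zeroes the entry, then restores the name. */
void
reset_keep_name(SlotStats *s)
{
    char saved[NAME_LEN];

    memcpy(saved, s->name, NAME_LEN);
    memset(s, 0, sizeof(*s));
    memcpy(s->name, saved, NAME_LEN);
}

/* Returns 1 if the entry still carries the expected slot name. */
int
name_matches(const SlotStats *s, const char *slot_name)
{
    return strcmp(s->name, slot_name) == 0;
}

/* Same shape as the failing check in the backtrace: the entry's name
 * must match the slot's name before counters are accumulated. */
void
report_stats(SlotStats *s, const char *slot_name, long bytes)
{
    assert(name_matches(s, slot_name));
    s->total_txns += 1;
    s->total_bytes += bytes;
}
```

In this sketch, calling report_stats() after reset_naive() trips the
assertion in an assert-enabled build, while calling it after
reset_keep_name() does not. Whether preserving the name during the reset
is the right place for the real fix is a separate question.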

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment
