Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From vignesh C
Subject Re: Skipping logical replication transactions on subscriber side
Msg-id CALDaNm3RRM-0FajnQ+QXGTEPckXaoAbNCWhTANd3_dRKYki2dw@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I've attached an updated patch that incorporates all comments I got so
> > > far. Please review it.
> > >
> >
> > Thanks for the updated patch.
> > A few minor comments:
> >
> > doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> >
> > (1) tab in doc updates
> >
> > There's a tab before "Otherwise,":
> >
> > +        copy of the relation with <parameter>relid</parameter>.
> >         Otherwise,
>
> Fixed.
>
> >
> > src/backend/utils/adt/pgstatfuncs.c
> >
> > (2) The function comment for "pg_stat_reset_subscription_worker_sub"
> > seems a bit long and I expected it to be multi-line (did you run
> > pg_indent?)
>
> I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.
>
> >
> > src/include/pgstat.h
> >
> > (3) Remove PgStat_StatSubWorkerEntry.dbid?
> >
> > The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
> > seem to be used, so I think it should be removed.
> > (I could remove it and everything builds OK and tests pass).
> >
>
> Fixed.
>
> Thank you for the comments! I've attached an updated version of the patch.

Thanks for the updated patch.
I found one issue:
This Assert can fail in a few cases:
+void
+pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
+                              LogicalRepMsgType command, TransactionId xid,
+                              const char *errmsg)
+{
+       PgStat_MsgSubWorkerError msg;
+       int                     len;
+
+       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
+       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) + strlen(errmsg) + 1;
+

I could reproduce the problem with the following scenario:
Publisher:
create table t1 (c1 varchar);
create publication pub1 for table t1;
insert into t1 values(repeat('abcd', 5000));

Subscriber:
create table t1(c1 smallint);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1 with ( two_phase = true);
postgres=# select * from pg_stat_subscription_workers;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Subscriber logs:
2021-11-15 19:27:56.380 IST [15685] LOG:  logical replication apply worker for subscription "sub1" has started
2021-11-15 19:27:56.384 IST [15687] LOG:  logical replication table synchronization worker for subscription "sub1", table "t1" has started
TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN", File: "pgstat.c", Line: 1946, PID: 15687)
postgres: logical replication worker for subscription 16387 sync 16384 (ExceptionalCondition+0xd0)[0x55a18f3c727f]
postgres: logical replication worker for subscription 16387 sync 16384 (pgstat_report_subworker_error+0x7a)[0x55a18f126417]
postgres: logical replication worker for subscription 16387 sync 16384 (ApplyWorkerMain+0x493)[0x55a18f176611]
postgres: logical replication worker for subscription 16387 sync 16384 (StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
postgres: logical replication worker for subscription 16387 sync 16384 (+0x54efc0)[0x55a18f134fc0]
postgres: logical replication worker for subscription 16387 sync 16384 (+0x54f3af)[0x55a18f1353af]
postgres: logical replication worker for subscription 16387 sync 16384 (+0x54e338)[0x55a18f134338]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
postgres: logical replication worker for subscription 16387 sync 16384 (+0x5498c2)[0x55a18f12f8c2]
postgres: logical replication worker for subscription 16387 sync 16384 (PostmasterMain+0x134c)[0x55a18f12f1dd]
postgres: logical replication worker for subscription 16387 sync 16384 (+0x43c3d4)[0x55a18f0223d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
postgres: logical replication worker for subscription 16387 sync 16384 (_start+0x2e)[0x55a18ecaf4fe]
2021-11-15 19:27:56.483 IST [15645] LOG:  background worker "logical replication worker" (PID 15687) was terminated by signal 6: Aborted
2021-11-15 19:27:56.483 IST [15645] LOG:  terminating any other active server processes
2021-11-15 19:27:56.485 IST [15645] LOG:  all server processes terminated; reinitializing

Here the Assert fails because the error message is too long: "invalid input syntax
for type smallint:

\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
because we try to insert varchar data into a smallint column.  Maybe
we should truncate the error message to fit within
PGSTAT_SUBWORKERERROR_MSGLEN in this case, instead of asserting on its
length.
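
To illustrate the idea, here is a minimal standalone sketch of the
truncation, not the patch itself. SKETCH_MSGLEN, SketchMsg and
sketch_fill_error_message are hypothetical stand-ins for
PGSTAT_SUBWORKERERROR_MSGLEN, PgStat_MsgSubWorkerError and the copy step
in pgstat_report_subworker_error; the limit of 256 is an assumption, and
the real struct has more fields than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit; stands in for PGSTAT_SUBWORKERERROR_MSGLEN. */
#define SKETCH_MSGLEN 256

/* Hypothetical, simplified stand-in for PgStat_MsgSubWorkerError. */
typedef struct SketchMsg
{
	int			m_header;	/* placeholder for the real header fields */
	char		m_message[SKETCH_MSGLEN];
} SketchMsg;

/*
 * Copy errmsg into msg->m_message, truncating it so that the text plus
 * its terminating NUL always fits, and return the number of bytes of
 * the message struct that would need to be sent.
 */
static size_t
sketch_fill_error_message(SketchMsg *msg, const char *errmsg)
{
	size_t		n;

	assert(msg != NULL && errmsg != NULL);

	n = strlen(errmsg);
	if (n >= SKETCH_MSGLEN)
		n = SKETCH_MSGLEN - 1;	/* trim instead of asserting */

	memcpy(msg->m_message, errmsg, n);
	msg->m_message[n] = '\0';

	return offsetof(SketchMsg, m_message) + n + 1;
}
```

This trims by bytes, so it could cut a multibyte character in half; in
the backend the cut point would presumably need to be chosen with
something like pg_mbcliplen().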

Regards,
Vignesh


