Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > I've attached an updated patch that incorporates all comments I got so
> > > > far. Please review it.
> > > >
> > >
> > > Thanks for the updated patch.
> > > A few minor comments:
> > >
> > > doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> > >
> > > (1) tab in doc updates
> > >
> > > There's a tab before "Otherwise,":
> > >
> > > +        copy of the relation with <parameter>relid</parameter>.
> > >         Otherwise,
> >
> > Fixed.
> >
> > >
> > > src/backend/utils/adt/pgstatfuncs.c
> > >
> > > (2) The function comment for "pg_stat_reset_subscription_worker_sub"
> > > seems a bit long and I expected it to be multi-line (did you run
> > > pg_indent?)
> >
> > I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.
> >
> > >
> > > src/include/pgstat.h
> > >
> > > (3) Remove PgStat_StatSubWorkerEntry.dbid?
> > >
> > > The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
> > > seem to be used, so I think it should be removed.
> > > (I could remove it and everything builds OK and tests pass).
> > >
> >
> > Fixed.
> >
> > Thank you for the comments! I've updated an updated version patch.
>
> Thanks for the updated patch.
> I found one issue:
> This Assert can fail in few cases:
> +void
> +pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
> +
> LogicalRepMsgType command, TransactionId xid,
> +                                                         const char *errmsg)
> +{
> +       PgStat_MsgSubWorkerError msg;
> +       int                     len;
> +
> +       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
> +       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) +
> strlen(errmsg) + 1;
> +
>
> I could reproduce the problem with the following scenario:
> Publisher:
> create table t1 (c1 varchar);
> create publication pub1 for table t1;
> insert into t1 values(repeat('abcd', 5000));
>
> Subscriber:
> create table t1(c1 smallint);
> create subscription sub1 connection 'dbname=postgres port=5432'
> publication pub1 with ( two_phase = true);
> postgres=# select * from pg_stat_subscription_workers;
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
>    This probably means the server terminated abnormally
>    before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Subscriber logs:
> 2021-11-15 19:27:56.380 IST [15685] LOG:  logical replication apply
> worker for subscription "sub1" has started
> 2021-11-15 19:27:56.384 IST [15687] LOG:  logical replication table
> synchronization worker for subscription "sub1", table "t1" has started
> TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
> File: "pgstat.c", Line: 1946, PID: 15687)
> postgres: logical replication worker for subscription 16387 sync 16384
> (ExceptionalCondition+0xd0)[0x55a18f3c727f]
> postgres: logical replication worker for subscription 16387 sync 16384
> (pgstat_report_subworker_error+0x7a)[0x55a18f126417]
> postgres: logical replication worker for subscription 16387 sync 16384
> (ApplyWorkerMain+0x493)[0x55a18f176611]
> postgres: logical replication worker for subscription 16387 sync 16384
> (StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54efc0)[0x55a18f134fc0]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54f3af)[0x55a18f1353af]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x54e338)[0x55a18f134338]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
> /lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x5498c2)[0x55a18f12f8c2]
> postgres: logical replication worker for subscription 16387 sync 16384
> (PostmasterMain+0x134c)[0x55a18f12f1dd]
> postgres: logical replication worker for subscription 16387 sync 16384
> (+0x43c3d4)[0x55a18f0223d4]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
> postgres: logical replication worker for subscription 16387 sync 16384
> (_start+0x2e)[0x55a18ecaf4fe]
> 2021-11-15 19:27:56.483 IST [15645] LOG:  background worker "logical
> replication worker" (PID 15687) was terminated by signal 6: Aborted
> 2021-11-15 19:27:56.483 IST [15645] LOG:  terminating any other active
> server processes
> 2021-11-15 19:27:56.485 IST [15645] LOG:  all server processes
> terminated; reinitializing
>
> Here it fails because of a long error message ""invalid input syntax
> for type smallint:

Good catch!

>
\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
> because we try to insert varchar type data into smallint type.  Maybe
> we should trim the error message in this case.

Right. I've fixed this issue and attached an updated patch.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Unnecessary global variable declared in xlog.c
Next
From: Tom Lane
Date:
Subject: Re: Unnecessary global variable declared in xlog.c