Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoDoQ6pUdXN=wx2UoB5_uWR=24w0q+YwYDr4LEcEjeqxKA@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:
> > > > >
> > > > > On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >>
> > > > >> >
> > > > > >> > Ok, looks nice. But I am curious how this will work in the case
> > > > > >> > when there are two (or more) errors in the same subscription, but
> > > > > >> > different relations?
> > > > >> >
> > > > >>
> > > > >> We can't proceed unless the first error is resolved, so there
> > > > >> shouldn't be multiple unresolved errors.
> > > > >
> > > > >
> > > > > > Ok. I thought multiple errors are possible when many tables are
> > > > > > initialized using parallel workers (with
> > > > > > max_sync_workers_per_subscription > 1).
> > > > >
> > > >
> > > > Yeah, that is possible but that covers under the second condition
> > > > mentioned by me and in such cases I think we should have separate rows
> > > > for each tablesync. Is that right, Sawada-san or do you have something
> > > > else in mind?
> > >
> > > Yeah, I agree to have separate rows for each table sync. The table
> > > should not be processed by both the table sync worker and the apply
> > > worker at a time so the pair of subscription OID and relation OID will
> > > be unique. I think that we have a boolean column in the view,
> > > indicating whether the error entry is reported by the table sync
> > > worker or the apply worker, or maybe we also can have the action
> > > column show "TABLE SYNC" if the error is reported by the table sync
> > > worker.
> > >
> >
> > Or similar to backend_type (text) in pg_stat_activity, we can have
> > something like error_source (text) which will display apply worker or
> > tablesync worker? I think if we have this column then even if there is
> > a chance that both apply and sync worker operates on the same
> > relation, we can identify it via this column.
>
> Sounds good. I'll incorporate this in the next version patch that I'm
> planning to submit this week.

Sorry, I could not make it this week. I'll submit them early next week.
While updating the patch, I realized we need more design discussion
on two points about clearing error details after the error is
resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker has skipped the transaction, we leave the error entry
itself but clear its fields, except for some fields such as failure
counts. But given that stats messages could be lost, how can we
ensure that those error details are cleared? For table sync workers'
errors, we can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
has completed table sync (i.e., checking if srsubstate = 'r'). But
there is no such information for apply workers and subscriptions. In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we could have apply workers
periodically send the message clearing the error details, but that
does not seem like a good approach.

2. Do we really want to leave the table sync worker error entries even
after the error is resolved and the table sync completes? Unlike apply
worker errors, the number of table sync worker errors could be very
large, for example, if a subscriber subscribes to many tables. If we
leave those errors in the stats view, they use more memory and could
slow down writing and reading the stats file. If such leftover table
sync error entries are not helpful in practice, I think we can remove
them rather than just clearing some fields. What do you think?
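
To make the two options concrete, here is a minimal Python sketch of the bookkeeping being discussed. It is only an illustrative model, not code from the patch: the ErrorEntry class, the two function names, and the convention of using relation OID 0 for the apply worker entry are all assumptions made for this example. The only details taken from the thread are the (subscription OID, relation OID) key, the error_source text, and the srsubstate = 'r' check.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical model: key is (subscription OID, relation OID).
# Using relation OID 0 for the apply worker entry is an assumption.
Key = Tuple[int, int]


@dataclass
class ErrorEntry:
    error_source: str         # "apply worker" or "tablesync worker"
    failure_count: int        # kept even after the details are cleared
    message: Optional[str]    # error details; None once cleared


def clear_apply_error(entries: Dict[Key, ErrorEntry], subid: int) -> None:
    """Point 1: keep the entry and its failure count, clear the details."""
    entry = entries.get((subid, 0))
    if entry is not None:
        entry.message = None


def prune_tablesync_errors(entries: Dict[Key, ErrorEntry],
                           srsubstate: Dict[Key, str]) -> int:
    """Point 2: remove table sync entries whose relation is ready ('r')."""
    done = [key for key, entry in entries.items()
            if entry.error_source == "tablesync worker"
            and srsubstate.get(key) == "r"]
    for key in done:
        del entries[key]
    return len(done)
```

In this model the periodic check only needs the per-relation sync state, which is why it works for table sync entries but has no equivalent for the apply worker entry: that one is cleared only if the explicit clear message actually arrives.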

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


