Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAA4eK1+2O68tkwdZsyfw3aZ7zB4YdejM4GzCciRtwcON6gBbTw@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Skipping logical replication transactions on subscriber side
Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 27.05.21 12:04, Amit Kapila wrote:
> >>> Also, I am thinking that instead of a stat view, do we need
> >>> to consider having a system table (pg_replication_conflicts or
> >>> something like that) for this because what if stats information is
> >>> lost (say either due to crash or due to udp packet loss), can we rely
> >>> on stats view for this?
> >> Yeah, it seems better to use a catalog.
> >>
> > Okay.
>
> Could you store it in shared memory?  You don't need it to be crash safe,
> since the subscription will just run into the same error again after
> restart.  You just don't want it to be lost, like with the statistics
> collector.
>

But won't that be costly when the error occurs while processing a very
large transaction? The subscription has to process all of the data
before it hits the error. I think we can even imagine extending this
feature to use commitLSN as a skip candidate, in which case we could
avoid fetching that transaction's data from the publisher at all. So if
this information is persistent, the user can set the skip identifier
after the restart, before the publisher has sent all the data.

Also, I don't think we can assume that we will get the same error after
the restart, because the user can perform some operations after the
restart and before we try to apply the same transaction. It might also
be that the user wants to see all the errors before setting the skip
identifier (and/or method).

I think the XID (or another identifier such as commitLSN) that the user
specifies for skipping the transaction has to be stored in the catalog;
otherwise, after the restart we won't remember it, and the user won't
know that they need to set it again. Now, say we have multiple skip
identifiers (XIDs, commitLSN, ...): isn't it better to store all
conflict-related information in a separate catalog such as
pg_subscription_conflict? I think that might also make it easier to
later extend this to auto conflict resolution, where the user can
specify auto conflict resolution info for a subscription. Is it better
to store all such information in pg_subscription or to have a separate
catalog? Even if we have a separate catalog for conflict info, we might
not want to store error info there.
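
To make the persistence point concrete, here is a rough toy model in
Python. It is only a sketch and not PostgreSQL code: the class
ToySubscription, the JSON file standing in for a catalog row, and the
attribute standing in for shared memory are all hypothetical names made
up for illustration. It only shows that a skip identifier kept in
memory is forgotten across a restart, while one written to durable
storage is still there.

```python
import json
import os
import tempfile


class ToySubscription:
    """Toy model of a subscription's skip setting (hypothetical, not PostgreSQL)."""

    def __init__(self, catalog_path):
        self.catalog_path = catalog_path      # stands in for a catalog row
        self.shmem_skip_xid = None            # stands in for shared memory
        self.catalog_skip_xid = self._load()  # re-read durable state on startup

    def _load(self):
        # Nothing durable has been written yet.
        if not os.path.exists(self.catalog_path):
            return None
        with open(self.catalog_path) as f:
            return json.load(f).get("skip_xid")

    def set_skip_xid(self, xid, persistent):
        if persistent:
            # Durable: survives the simulated restart below.
            with open(self.catalog_path, "w") as f:
                json.dump({"skip_xid": xid}, f)
            self.catalog_skip_xid = xid
        else:
            # In-memory only: lost when the object is rebuilt.
            self.shmem_skip_xid = xid

    def should_skip(self, xid):
        return xid in (self.shmem_skip_xid, self.catalog_skip_xid)


def restart(sub):
    """Simulate a restart: rebuild state from the durable file only."""
    return ToySubscription(sub.catalog_path)


path = os.path.join(tempfile.mkdtemp(), "toy_catalog.json")

sub = ToySubscription(path)
sub.set_skip_xid(740, persistent=False)
sub = restart(sub)
print(sub.should_skip(740))  # False: the in-memory setting is gone

sub.set_skip_xid(740, persistent=True)
sub = restart(sub)
print(sub.should_skip(740))  # True: the durable setting was reloaded
```

In the first case the user would have to notice the restart and set the
identifier again; in the second case nothing more is needed.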

-- 
With Regards,
Amit Kapila.


