Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Skipping logical replication transactions on subscriber side
Msg-id CAKFQuwZey7aeBoO9Bee2CaHO+SoA2++krUU_qSXL6mJOv33bwA@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Sat, Jan 22, 2022 at 9:21 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Jan 22, 2022 at 2:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped.  Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?
>

It will be reset only on subscription drop; otherwise, it will stick
around until another error happens.
 
I really dislike the user experience this provides, and given that it is new in v15 (and right now this table seems to exist solely to support this feature), changing this seems within the realm of possibility. I have to imagine these workers have some local state that would just say "no errors, no need to touch pg_stat_subscription_workers at the end of this transaction's commit". The worker would save the error_xid in that local state, and if a successfully committed transaction has that xid, it would clear the error. The skip code path would likewise see the matching xid value and clear the error. Even if the local-state idea doesn't work, one catalog lookup per transaction seems like potentially reasonable overhead to incur here.
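A minimal Python sketch of the local-state idea above, for illustration only. `StatsView`, `ApplyWorker`, `report_error`, `on_commit` and `on_skip` are hypothetical names, not PostgreSQL code; the view row is modeled as a single in-memory field, and the sketch keeps everything in one process, so it does not model a worker restart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsView:
    """Hypothetical stand-in for the subscription's error row in
    pg_stat_subscription_workers; None means "no error recorded"."""
    last_error_xid: Optional[int] = None
    writes: int = 0  # how many times the shared view was touched


@dataclass
class ApplyWorker:
    """Hypothetical apply worker that remembers the failing xid locally."""
    stats: StatsView
    local_error_xid: Optional[int] = None

    def report_error(self, xid: int) -> None:
        # Record the failure both in local state and in the shared view.
        self.local_error_xid = xid
        self.stats.last_error_xid = xid
        self.stats.writes += 1

    def _clear_if_matching(self, xid: int) -> None:
        # In the common "no errors" case this is only a local comparison;
        # the shared view is touched only when the failed xid finishes.
        if self.local_error_xid is not None and xid == self.local_error_xid:
            self.local_error_xid = None
            self.stats.last_error_xid = None
            self.stats.writes += 1

    def on_commit(self, xid: int) -> None:
        # Called after a transaction is successfully applied and committed.
        self._clear_if_matching(xid)

    def on_skip(self, xid: int) -> None:
        # Called when the user-requested skip of a transaction completes.
        self._clear_if_matching(xid)
```

The point of the design is that the shared view is written only when an error is recorded or resolved, never on ordinary commits.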


It shouldn't even need to be that overhead-intensive. Once an error is encountered, the system stops. By construction, it must then be told to redo, at which point the information about the "last error" is no longer relevant and can be removed (when skipping, the user or system will have already done everything needed with the xid before the redo is issued). In the steady state, the last-error information is then simply empty until a new error arises, at which point it becomes populated again, and it stays that way until the user puts the system back into redo mode via one of several methods.
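The lifecycle described above can be sketched as a two-state machine. This is a hypothetical Python model, not PostgreSQL code: `Subscription`, `WorkerState`, `fail` and `redo` are illustrative names, and "redo" stands for any of the user-initiated ways of resuming apply.

```python
from enum import Enum, auto
from typing import Optional


class WorkerState(Enum):
    APPLYING = auto()
    STOPPED_ON_ERROR = auto()


class Subscription:
    """Hypothetical model: an error populates the last-error info and
    stops apply; a redo clears it because it is no longer relevant."""

    def __init__(self) -> None:
        self.state = WorkerState.APPLYING
        self.last_error_xid: Optional[int] = None  # empty in steady state

    def fail(self, xid: int) -> None:
        # An error records the failing xid and halts the worker.
        if self.state is not WorkerState.APPLYING:
            raise RuntimeError("worker is already stopped")
        self.last_error_xid = xid
        self.state = WorkerState.STOPPED_ON_ERROR

    def redo(self) -> None:
        # Any redo (retry after fixing the conflict, or after a skip has
        # been arranged) makes the stored error obsolete, so clear it.
        if self.state is not WorkerState.STOPPED_ON_ERROR:
            raise RuntimeError("nothing to redo")
        self.last_error_xid = None
        self.state = WorkerState.APPLYING
```

Because clearing happens only on the error-to-redo transition, the normal commit path never has to look at the error information at all.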

David J.
