Re: Design of pg_stat_subscription_workers vs pgstats - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Design of pg_stat_subscription_workers vs pgstats
Date
Msg-id CAKFQuwYS_EUe+sR6MS3aiR9UXtUJfDcmHoDjrXAeDnY5w_9bnw@mail.gmail.com
In response to Re: Design of pg_stat_subscription_workers vs pgstats  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Jan 27, 2022 at 2:15 PM Andres Freund <andres@anarazel.de> wrote:
Another related thing is that using a 32bit xid for allowing skipping is a bad
idea anyway - we shouldn't be adding new interfaces with xid wraparound dangers -
it's getting more and more common to have multiple wraparounds a day.  An
easily better alternative would be the LSN at which a transaction starts.


Interesting idea.  I do not think a well-designed skipping feature needs to worry about wrap-around, though.  The XID to be skipped was just seen by a worker, and because it failed, that worker will keep encountering the same XID until the failure is resolved.  While the subscriber is stuck there is no effective progression in time during which wrap-around could happen.  Since we want to skip the transaction as a whole, adding a layer of hidden indirection to the process seems undesirable.  I'm not against the idea though - to the user it is basically "copy this value from the error message in order to skip the transaction that caused the error".  The system then verifies the value and ensures it skips one, and only one, transaction.
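
A minimal sketch of that "verify, then skip exactly one transaction" flow, under stated assumptions: SkipState, request_skip() and should_skip_transaction() are hypothetical names invented for illustration, and Xid is a plain 32-bit stand-in for TransactionId.  None of this is existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;            /* stand-in for TransactionId (assumption) */
#define INVALID_XID ((Xid) 0)

/* Hypothetical per-subscription state. */
typedef struct SkipState
{
    Xid last_failed_xid;   /* XID reported in the worker's error message */
    Xid skip_xid;          /* XID the user asked to skip, or INVALID_XID */
} SkipState;

/* Accept a skip request only if it names the transaction that just failed. */
static bool
request_skip(SkipState *state, Xid requested)
{
    assert(state != NULL);
    if (requested == INVALID_XID || requested != state->last_failed_xid)
        return false;
    state->skip_xid = requested;
    return true;
}

/*
 * Called when the worker begins applying a remote transaction.  Returns
 * true at most once per accepted request: the request is cleared as soon
 * as it matches, so only one transaction is ever skipped.
 */
static bool
should_skip_transaction(SkipState *state, Xid remote_xid)
{
    assert(state != NULL);
    if (state->skip_xid == INVALID_XID || state->skip_xid != remote_xid)
        return false;
    state->skip_xid = INVALID_XID;
    state->last_failed_xid = INVALID_XID;
    return true;
}
```

Because the stuck worker keeps re-encountering the same failed XID, an exact equality check is enough here; no ordering comparison between XIDs is involved.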


It's pretty easy from the POV of getting into a new transaction.

PG_CATCH():

    /* get us out of the failed transaction */
    AbortOutOfAnyTransaction();

    StartTransactionCommand();
    /* do something to remember the error we just got */
    CommitTransactionCommand();

Thank you.
It may be a bit harder afterwards to not just error out the whole
worker, because we'd need to know what to do instead.


I imagine the launcher and worker startup code can be made to deal with the restart adequately.  Just wait if the last thing seen was an error, and require the user to manually resume the worker - unless we really think a try-until-you-succeed protocol with a backoff is superior.  Upon system restart all error information is cleared, so we start from scratch and let the errors happen (or not) as they will.
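
A minimal sketch of that restart policy, assuming the error flag lives only in memory.  WorkerStatus and the four functions below are hypothetical names for illustration, not existing launcher or worker APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory status the launcher keeps per worker. */
typedef struct WorkerStatus
{
    bool last_run_failed;   /* not persisted: lost on system restart */
} WorkerStatus;

/* The worker records that the last thing it saw was an error. */
static void
record_worker_error(WorkerStatus *w)
{
    assert(w != NULL);
    w->last_run_failed = true;
}

/* The launcher waits instead of restarting a worker that last failed. */
static bool
launcher_should_start(const WorkerStatus *w)
{
    assert(w != NULL);
    return !w->last_run_failed;
}

/* The user manually resumes the worker. */
static void
user_resume_worker(WorkerStatus *w)
{
    assert(w != NULL);
    w->last_run_failed = false;
}

/* System restart: all error information is cleared. */
static void
system_restart(WorkerStatus *w)
{
    assert(w != NULL);
    w->last_run_failed = false;
}
```

A retry-with-backoff variant would replace the boolean with a next-attempt timestamp; it is left out here because the manual-resume behaviour is the one described above.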

David J.
