Re: Design of pg_stat_subscription_workers vs pgstats - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Design of pg_stat_subscription_workers vs pgstats
Date
Msg-id CAKFQuwZk-Trf78QW7-_q9Pr2EGBhtg=wSA2hRiOW-K845Zv2cQ@mail.gmail.com
Whole thread Raw
In response to Re: Design of pg_stat_subscription_workers vs pgstats  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Design of pg_stat_subscription_workers vs pgstats  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Design of pg_stat_subscription_workers vs pgstats  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Design of pg_stat_subscription_workers vs pgstats  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Here is the summary of the discussion, changes, and plan.

1. Move some error information such as the error message to a new
system catalog, pg_subscription_error. The pg_subscription_error table
would have the following columns:

* sesubid : subscription Oid.
* serelid : relation Oid (NULL for apply worker).
* seerrlsn : commit-LSN or the error transaction.
* seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
* seerrmsg : error message

Not a fan of the "se" prefix but overall yes. We should include a timestamp.


The tuple is inserted or updated when an apply worker or a tablesync
worker raises an error. If the same error occurs in a row, the update
is skipped.

I disagree with this - I would treat every new instance of an error to be important and insert on conflict (sesubid, serelid) the new entry, updating lsn/cmd/msg with the new values.

The tuple is removed in the following cases:

* the subscription is dropped.
* the table is dropped. 
* the table is removed from the subscription.
* the worker successfully committed a non-empty transaction.

Correct.  This handles the "end of sync worker" just fine since its final action should be a successful commit of a non-empty transaction.

With this change, pg_stat_subscription_workers will be like:

* subid
* subname
* subrelid
* error_count
* last_error_timestamp

This view will be extended by adding transaction statistics proposed
on another thread[1].

I haven't reviewed that thread but in-so-far as this one goes I would just drop this altogether.  The error count, if desired, can be added to pg_subscription_error, and the timestamp should be added there as noted above.

If this is useful for the feature on the other thread it can be reconstituted from there.


2. Fix a bug in pg_stat_subscription_workers. As pointed out by
Andres, there is a bug in pg_stat_subscription_workers; it doesn't
drop entries for already-dropped tables. We need to fix it.

Given the above, this becomes an N/A.


3. Show commit-LSN of the error transaction in errcontext. Currently,
we show XID and commit timestamp in the errcontext. But given that we
use LSN in pg_subscriptoin_error, it's better to show commit-LSN as
well (or instead of error-XID).

Agreed, I think: what "errcontext" is this referring to?

5. Record skipped data to the system catalog, say
pg_subscription_conflict_history so that there is a chance to analyze
and recover.

We can discuss the
details in a new thread.
Agreed - the "before skipping" consideration seems considerably more helpful; but wouldn't need to be persistent, it could just be a view.  A permanent record probably would best be recorded in the logs - though if we get the pre-skip functionality the user can view directly and save the results wherever they wish or we can provide a function to spool the information to the log.  I don't see persistent in-database storage being that desirable here.

If we only do something after the transaction has been skipped it may be useful to add an option to the skipping command to auto-disable the subscription after the transaction has been skipped and before any subsequent transactions are applied.  This will give the user a chance to process the post-skipped information before restoring sync and having the system begin changing underneath them again.
 

4 and 5 might be better introduced together but I think since the user
is able to check what changes will be skipped on the publisher side we
can do 5 for v16.

How would one go about doing that (checking what changes will be skipped on the publisher side)?

David J.

pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: Time to drop plpython2?
Next
From: Robert Haas
Date:
Subject: Re: buildfarm warnings