Re: Design of pg_stat_subscription_workers vs pgstats - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Design of pg_stat_subscription_workers vs pgstats
Date
Msg-id CAD21AoCOG-BOUor3HyYk9KWxfmaosLq8ANu9Nhk0vyS7Rfd4GQ@mail.gmail.com
Whole thread Raw
In response to Re: Design of pg_stat_subscription_workers vs pgstats  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Sat, Feb 19, 2022 at 4:47 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> Here is the summary of the discussion, changes, and plan.
>>
>> 1. Move some error information such as the error message to a new
>> system catalog, pg_subscription_error. The pg_subscription_error table
>> would have the following columns:
>>
>> * sesubid : subscription Oid.
>> * serelid : relation Oid (NULL for apply worker).
>> * seerrlsn : commit-LSN or the error transaction.
>> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
>> * seerrmsg : error message
>
>
> Not a fan of the "se" prefix but overall yes. We should include a timestamp.
>
>>
>> The tuple is inserted or updated when an apply worker or a tablesync
>> worker raises an error. If the same error occurs in a row, the update
>> is skipped.
>
>
> I disagree with this - I would treat every new instance of an error to be important and insert on conflict (sesubid,
serelid)the new entry, updating lsn/cmd/msg with the new values. 
>
>> The tuple is removed in the following cases:
>>
>> * the subscription is dropped.
>> * the table is dropped.
>>
>> * the table is removed from the subscription.
>> * the worker successfully committed a non-empty transaction.
>
>
> Correct.  This handles the "end of sync worker" just fine since its final action should be a successful commit of a
non-emptytransaction. 
>>
>>
>> With this change, pg_stat_subscription_workers will be like:
>>
>> * subid
>> * subname
>> * subrelid
>> * error_count
>> * last_error_timestamp
>>
>> This view will be extended by adding transaction statistics proposed
>> on another thread[1].
>
>
> I haven't reviewed that thread but in-so-far as this one goes I would just drop this altogether.  The error count, if
desired,can be added to pg_subscription_error, and the timestamp should be added there as noted above. 
>
> If this is useful for the feature on the other thread it can be reconstituted from there.
>
>>
>> 2. Fix a bug in pg_stat_subscription_workers. As pointed out by
>> Andres, there is a bug in pg_stat_subscription_workers; it doesn't
>> drop entries for already-dropped tables. We need to fix it.
>
>
> Given the above, this becomes an N/A.
>
>>
>> 3. Show commit-LSN of the error transaction in errcontext. Currently,
>> we show XID and commit timestamp in the errcontext. But given that we
>> use LSN in pg_subscriptoin_error, it's better to show commit-LSN as
>> well (or instead of error-XID).
>
>
> Agreed, I think: what "errcontext" is this referring to?

The CONTEXT part in a log message. The apply worker and the tablesync
worker write the details of the changes on an error in the CONTEXT
part as follow:

ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (id)=(1) already exists.
CONTEXT:  processing remote data during "INSERT" for replication
target relation "public.test" in transaction 716 with commit timestamp
2021-09-29 15:52:45.165754+00

>>
>>
>> 5. Record skipped data to the system catalog, say
>> pg_subscription_conflict_history so that there is a chance to analyze
>> and recover.
>
>
>> We can discuss the
>> details in a new thread.
>
> Agreed - the "before skipping" consideration seems considerably more helpful; but wouldn't need to be persistent, it
couldjust be a view.  A permanent record probably would best be recorded in the logs - though if we get the pre-skip
functionalitythe user can view directly and save the results wherever they wish or we can provide a function to spool
theinformation to the log.  I don't see persistent in-database storage being that desirable here. 
>
> If we only do something after the transaction has been skipped it may be useful to add an option to the skipping
commandto auto-disable the subscription after the transaction has been skipped and before any subsequent transactions
areapplied.  This will give the user a chance to process the post-skipped information before restoring sync and having
thesystem begin changing underneath them again. 
>
>>
>>
>> 4 and 5 might be better introduced together but I think since the user
>> is able to check what changes will be skipped on the publisher side we
>> can do 5 for v16.
>
>
> How would one go about doing that (checking what changes will be skipped on the publisher side)?

We can copy the replication slot while changing the plugin by using
pg_copy_replication_slot(). Then we can check the changes by using
pg_logical_slot_peek_changes() with the copied slot.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: killing perl2host
Next
From: Simon Riggs
Date:
Subject: Reducing power consumption on idle servers