Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Alexey Lesovsky
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAGnetYfO0AEgLmSPrxRuhq3ykDH5PGsTeTV2xsbZEXnMvzah2w@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,
Have a few notes about pg_stat_logical_replication_error from the DBA point of view (which will use this view in the future).
1. As I understand it, this view might contain many errors related to different subscriptions. It is better to name "pg_stat_logical_replication_errors" using the plural form (like this done for stat views for tables, indexes, functions). Also, I'd like to suggest thinking twice about the view name (and function used in view DDL) -  "pg_stat_logical_replication_error" contains very common "logical replication" words, but the view contains errors related to subscriptions only. In the future there could be other kinds of errors related to logical replication, but not related to subscriptions - what will you do?
2. Add a field with database name or id - it helps to quickly understand to which database the subscription belongs.
3. Add a counter field with total number of errors - it helps to calculate errors rates and aggregations (sum), and don't lose information about errors between view checks.
4. Add text of last error (if it will not be too expensive).
5. Rename the "action" field to "command", as I know this is right from terminology point of view.

Finally, the view might seems like this:

postgres(1:25250)=# select * from pg_stat_logical_replication_errors;
subname | datid | relid | command | xid | total | last_failure | last_failure_text
----------+--------+-------+---------+-----+-------+-------------------------------+---------------------------
sub_1 | 12345 | 16384 | INSERT | 736 | 145 | 2021-06-27 12:12:45.142675+09 | something goes wrong...
sub_2 | 12346 | 16458 | UPDATE | 845 | 12 | 2021-06-27 12:16:01.458752+09 | hmm, something goes wrong

Regards, Alexey

On Mon, Jul 5, 2021 at 2:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > Now, if this function is used by super
> > > users then we can probably trust that they provide the XIDs that we
> > > can trust to be skipped but OTOH making a restriction to allow these
> > > functions to be used by superusers might restrict the usage of this
> > > repair tool.
> >
> > If we specify the subscription id or name, maybe we can allow also the
> > owner of subscription to do that operation?
>
> Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify XID to the subscription by like ALTER
SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
seems straightforward except for setting origin state. After skipping
the transaction we have to update the session origin state so that we
can start streaming the transaction next to the one that we just
skipped in case of the server crash or restarting the apply worker. We
set origin state to the commit WAL record. However, since we skip all
changes we don’t write any WAL even if we call CommitTransaction() at
the end of the skipped transaction. So the patch sets the origin state
to the transaction that updates the pg_subscription system catalog to
reset the skip XID. I think we need a discussion of this part.

With 0002 and 0003 patches, we report the error information in server
logs and the stats view, respectively. 0002 patch adds errcontext for
messages that happened during applying the changes:

ERROR:  duplicate key value violates unique constraint "hoge_pkey"
DETAIL:  Key (c)=(1) already exists.
CONTEXT:  during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follow:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid |         last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Please note that those patches are meant to evaluate the concept we've
discussed so far. Those don't have the doc update yet.

Regards,

[1] https://www.postgresql.org/message-id/DB35438F-9356-4841-89A0-412709EBD3AB%40enterprisedb.com


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


--
С уважением Алексей В. Лесовский


pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Gilles Darold
Date:
Subject: Re: [PATCH] Hooks at XactCommand level