Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From vignesh C
Subject Re: Failed transaction statistics to measure the logical replication progress
Date
Msg-id CALDaNm2kHv481wQS-u7c=taGQ6JcpT_aR6xpnU5pD8kWLRfJZw@mail.gmail.com
In response to Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Thu, Jul 8, 2021 at 12:25 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello, hackers
>
>
> When logical decoding fails on the current HEAD, each failure
> increments the txns count in pg_stat_replication_slots - [1] and adds
> the transaction size to the byte total in the same view, repeatedly,
> on the publisher, until the problem is solved.
> A good example is a duplication error on the subscriber side,
> and this applies to both the streaming and spill cases as well.
>
> This behavior prevents users from grasping the exact number and size of
> successful and unsuccessful transactions. Accordingly, we need
> new columns for failed transactions to distinguish the two
> for all types, which means spill, streaming and normal
> transactions. This will help users measure the exact status of
> logical replication.
>
> The attached file is a POC patch for this.
> The current design saves the failed stats data in the ReplicationSlot struct,
> because after the error I'm not able to access the ReorderBuffer object.
> Thus, I chose an object that I can interact with at ReplicationSlotRelease time.
>
> Below is an example of what I get on the publisher
> after the duplication error on the subscriber, caused by an insert, is resolved.
>
> postgres=# select * from pg_stat_replication_slots;
> -[ RECORD 1 ]-------+------
> slot_name           | mysub
> spill_txns          | 0
> spill_count         | 0
> spill_bytes         | 0
> failed_spill_txns   | 0
> failed_spill_bytes  | 0
> stream_txns         | 0
> stream_count        | 0
> stream_bytes        | 0
> failed_stream_txns  | 0
> failed_stream_bytes | 0
> total_txns          | 4
> total_bytes         | 528
> failed_total_txns   | 3
> failed_total_bytes  | 396
> stats_reset         |
>
>
> Any ideas and comments are welcome.

+1 for having failed-transaction statistics for logical replication.
Currently, if a transaction fails on the subscriber after the decoded
data has been sent to it (for example, because of a constraint violation
or a missing object), the statistics still include the failed decoded
transaction, and there is no way to identify the data for transactions
that actually succeeded. This patch will help in measuring the actual
decoded transaction data.
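
To illustrate how the proposed columns could be read, here is a minimal
sketch that derives the successful totals from the example output quoted
above. It assumes that total_txns and total_bytes include the failed
attempts, which the example numbers suggest but the POC patch would need
to confirm. The failed_* columns exist only in the proposed patch, and
the function name successful_totals is hypothetical.

```python
# Hypothetical row mirroring the example output quoted above. The failed_*
# columns come from the POC patch and are not part of a released PostgreSQL.
row = {
    "total_txns": 4,
    "total_bytes": 528,
    "failed_total_txns": 3,
    "failed_total_bytes": 396,
}


def successful_totals(stats):
    """Return (txns, bytes) for transactions that were applied successfully.

    Assumption: the total_* counters include every failed attempt, so the
    successful part is the difference between the totals and the failed_*
    counters.
    """
    ok_txns = stats["total_txns"] - stats["failed_total_txns"]
    ok_bytes = stats["total_bytes"] - stats["failed_total_bytes"]
    return ok_txns, ok_bytes


print(successful_totals(row))  # (1, 132)
```

With the numbers from the example, one transaction of 132 bytes went
through, and the three failed attempts account for the remaining 396 bytes.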

Regards,
Vignesh


