Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Failed transaction statistics to measure the logical replication progress
Date
Msg-id CAD21AoAixHnpjG-TtnSejJ2Dv1VsrzGr3oVPSRFhYjz3Z8_XZA@mail.gmail.com
In response to Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Thu, Jul 8, 2021 at 3:55 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Hello, hackers
>
>
> When the current HEAD fails during logical decoding, the failure
> increments txns count in pg_stat_replication_slots - [1] and adds
> the transaction size to the sum of bytes in the same repeatedly
> on the publisher, until the problem is solved.
> One of the good examples is duplication error on the subscriber side
> and this applies to both streaming and spill cases as well.
>
> This update prevents users from grasping the exact number and size of
> successful and unsuccessful transactions. Accordingly, we need to
> have new columns of failed transactions that will work to differentiate
> both of them for all types, which means spill, streaming and normal
> transactions. This will help users to measure the exact status of
> logical replication.

Could you please elaborate on the use cases of the proposed statistics?
For example, the current statistics in pg_stat_replication_slots can be
used for tuning logical_decoding_work_mem as well as for inferring the
total amount of bytes passed to the output plugin. How would users
use the new statistics?

Also, if we want the stats of successful transactions, why don't we
show the stats of successful transactions in the view instead of those
of failed transactions?

>
> Attached file is the POC patch for this.
> Current design is to save failed stats data in the ReplicationSlot struct.
> This is because after the error, I'm not able to access the ReorderBuffer object.
> Thus, I chose the object where I can interact with at the ReplicationSlotRelease timing.

When discussing the pg_stat_replication_slots view, there was an idea
to store the slot statistics in the ReplicationSlot struct. But the idea
was rejected mainly because the struct lives in shared memory[1]. If
we store those counts in the ReplicationSlot struct, it increases the
usage of shared memory, even though those statistics are used only by
logical slots and don't necessarily need to be shared among the server
processes. Moreover, if we want to add more statistics to the view in
the future, it further increases the usage of shared memory.

If we want to track the stats of successful transactions, I think it's
easier to track them on the subscriber side rather than the publisher
side. We can increase counters when applying [stream]commit/abort
logical changes on the subscriber.
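
To make the subscriber-side idea a bit more concrete, here is a rough
standalone sketch, not a patch. It assumes the apply worker keeps a
small set of counters in process-local memory and bumps them when it
handles a [stream]commit or [stream]abort message. All names in it
(ApplyXactStats, ApplyXactOutcome, apply_stats_count) are hypothetical
and do not exist in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-subscription counters kept in the apply worker's
 * local memory. This is an illustration only, not an existing
 * PostgreSQL struct.
 */
typedef struct ApplyXactStats
{
    uint64_t commit_count;  /* transactions applied and committed */
    uint64_t commit_bytes;  /* total size of committed transactions */
    uint64_t abort_count;   /* transactions (or streams) aborted */
    uint64_t abort_bytes;   /* total size of aborted transactions */
} ApplyXactStats;

/* Hypothetical outcome of one applied transaction. */
typedef enum ApplyXactOutcome
{
    APPLY_XACT_COMMIT,      /* commit or stream commit */
    APPLY_XACT_ABORT        /* abort or stream abort */
} ApplyXactOutcome;

/*
 * Record the outcome of one transaction. The assumption is that this
 * would be called once per commit/abort message; nbytes is whatever
 * size the caller chooses to account for that transaction.
 */
void
apply_stats_count(ApplyXactStats *stats, ApplyXactOutcome outcome,
                  uint64_t nbytes)
{
    assert(stats != NULL);

    if (outcome == APPLY_XACT_COMMIT)
    {
        stats->commit_count++;
        stats->commit_bytes += nbytes;
    }
    else
    {
        stats->abort_count++;
        stats->abort_bytes += nbytes;
    }
}
```

Since such counters would live in the worker rather than in the
ReplicationSlot struct, adding more of them later would not increase
the usage of shared memory for every slot.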

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1Kuj%2B3G59hh3wu86f4mmpQLpah_mGv2-wfAPyn%2BzT%3DP4A%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


