Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Failed transaction statistics to measure the logical replication progress |
Date | |
Msg-id | CALDaNm32HHjdwmuoF+Nw5CU70r819Kq+Tmat3bzkkvSv_2u=gA@mail.gmail.com |
In response to | RE: Failed transaction statistics to measure the logical replication progress ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
Responses |
RE: Failed transaction statistics to measure the logical replication progress
|
List | pgsql-hackers |
On Wed, Dec 1, 2021 at 3:04 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> On Friday, November 19, 2021 11:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Besides that, I’m not sure how useful commit_bytes, abort_bytes, and
> > error_bytes are. I originally thought these statistics track the size of received
> > data, i.e., how much data is transferred from the publisher and processed on
> > the subscriber. But what the view currently has is how much memory is used in
> > the subscription worker. The subscription worker emulates
> > ReorderBufferChangeSize() on the subscriber side but, as the comment of
> > update_apply_change_size() mentions, the size in the view is not accurate:
> ...
> > I guess that the purpose of these values is to compare them to total_bytes,
> > stream_byte, and spill_bytes but if the calculation is not accurate, does it mean
> > that the more stats are updated, the more the stats will be getting inaccurate?
> Thanks for your comment!
>
> I tried to solve your concerns about byte columns but there are really difficult issues to solve.
> For example, to begin with, the messages of the apply worker are different from those of
> the reorder buffer.
>
> Therefore, I decided to split the previous patch and make counter columns go first.
> v14 was checked by pgperltidy and pgindent.
>
> This patch can be applied to the PG whose commit id is after 8d74fc9 (introduction of
> pg_stat_subscription_workers).

Thanks for the updated patch.
Currently we are storing commit_count, error_count and abort_count for each table of the table sync operation. If we have thousands of tables, we will be storing this information separately for every one of them. Shouldn't we be storing the consolidated information in this case?
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f07983a..02e9486 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1149,6 +1149,11 @@ copy_table_done:
     MyLogicalRepWorker->relstate_lsn = *origin_startpos;
     SpinLockRelease(&MyLogicalRepWorker->relmutex);

+    /* Report the success of table sync. */
+    pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
+                                     MyLogicalRepWorker->relid,
+                                     0 /* no logical message type */ );

postgres=# select * from pg_stat_subscription_workers ;
 subid | subname | subrelid | commit_count | error_count | abort_count | last_error_relid | last_error_command | last_error_xid | last_error_count | last_error_message | last_error_time
-------+---------+----------+--------------+-------------+-------------+------------------+--------------------+----------------+------------------+--------------------+-----------------
 16411 | sub1    |    16387 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16396 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16390 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16393 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16402 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16408 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16384 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16399 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
 16411 | sub1    |    16405 |            1 |           0 |           0 |                  |                    |                |                0 |                    |
(9 rows)

Regards,
Vignesh
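
For illustration only, here is a minimal standalone sketch of what "consolidated information" could mean for the counters in the output above: one set of totals per subscription, in place of one entry per synced table. It is not part of the patch and uses no PostgreSQL APIs. The struct and function names (SubWorkerTableStats, SubWorkerTotals, consolidate_subscription_stats) are hypothetical; only the counter names come from the view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-table entry, mirroring the counter columns of the
 * view output shown in this mail. Not a PostgreSQL struct.
 */
typedef struct SubWorkerTableStats
{
	uint32_t	subid;
	uint32_t	subrelid;
	int64_t		commit_count;
	int64_t		error_count;
	int64_t		abort_count;
} SubWorkerTableStats;

/* Hypothetical consolidated entry: one per subscription. */
typedef struct SubWorkerTotals
{
	uint32_t	subid;
	int64_t		commit_count;
	int64_t		error_count;
	int64_t		abort_count;
} SubWorkerTotals;

/*
 * Sum the counters of every per-table entry that belongs to the given
 * subscription. Entries of other subscriptions are skipped.
 */
SubWorkerTotals
consolidate_subscription_stats(uint32_t subid,
							   const SubWorkerTableStats *entries,
							   size_t nentries)
{
	SubWorkerTotals totals = {subid, 0, 0, 0};
	size_t		i;

	assert(entries != NULL || nentries == 0);

	for (i = 0; i < nentries; i++)
	{
		if (entries[i].subid != subid)
			continue;

		totals.commit_count += entries[i].commit_count;
		totals.error_count += entries[i].error_count;
		totals.abort_count += entries[i].abort_count;
	}

	return totals;
}
```

With the nine rows shown above, this would collapse to a single entry for subscription 16411 with commit_count = 9 and both other counters at 0. Whether the totals should be accumulated at report time or computed when the view is read is a separate design question that this sketch does not address.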