Thread: Re: Collect statistics about conflicts in logical replication

Re: Collect statistics about conflicts in logical replication

From
Peter Smith
Date:
Hi Hou-San.

I tried an experiment where I deliberately violated a primary key
during initial table synchronization.

For example:

test_sub=# create table t1(a int primary key);
CREATE TABLE

test_sub=# insert into t1 values(1);
INSERT 0 1

test_sub=# create subscription sub1 connection 'dbname=test_pub'
publication pub1 with (enabled=false);
2024-08-29 09:53:21.172 AEST [24186] WARNING:  subscriptions created
by regression test cases should have names starting with "regress_"
WARNING:  subscriptions created by regression test cases should have
names starting with "regress_"
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION

test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count |
insert_exists_count | update_differ_count | update_exists_count |
update_missing_count | de
lete_differ_count | delete_missing_count | stats_reset

-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---
------------------+----------------------+-------------
 16390 | sub1    |                 0 |                0 |
     0 |                   0 |                   0 |
 0 |
                0 |                    0 |
(1 row)

test_sub=# alter subscription sub1 enable;
ALTER SUBSCRIPTION

test_sub=# 2024-08-29 09:53:57.245 AEST [4345] LOG:  logical
replication apply worker for subscription "sub1" has started
2024-08-29 09:53:57.258 AEST [4347] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:53:57.311 AEST [4347] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:53:57.311 AEST [4347] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:53:57.311 AEST [4347] CONTEXT:  COPY t1, line 1
2024-08-29 09:53:57.312 AEST [23501] LOG:  background worker "logical
replication tablesync worker" (PID 4347) exited with exit code 1
2024-08-29 09:54:02.385 AEST [4501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:02.462 AEST [4501] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:54:02.462 AEST [4501] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:02.462 AEST [4501] CONTEXT:  COPY t1, line 1
2024-08-29 09:54:02.463 AEST [23501] LOG:  background worker "logical
replication tablesync worker" (PID 4501) exited with exit code 1
2024-08-29 09:54:07.512 AEST [4654] LOG:  logical replication table
synchronization worker for subscription "sub1", table "t1" has started
2024-08-29 09:54:07.580 AEST [4654] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2024-08-29 09:54:07.580 AEST [4654] DETAIL:  Key (a)=(1) already exists.
2024-08-29 09:54:07.580 AEST [4654] CONTEXT:  COPY t1, line 1
...

test_sub=# alter subscription sub1 disable;'
ALTER SUBSCRIPTION
2024-08-29 09:55:10.329 AEST [4345] LOG:  logical replication worker
for subscription "sub1" will stop because the subscription was
disabled

test_sub=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count |
insert_exists_count | update_differ_count | update_exists_count |
update_missing_count | de
lete_differ_count | delete_missing_count | stats_reset

-------+---------+-------------------+------------------+---------------------+---------------------+---------------------+----------------------+---
------------------+----------------------+-------------
 16390 | sub1    |                 0 |               15 |
     0 |                   0 |                   0 |
 0 |
                0 |                    0 |
(1 row)

~~~

Notice how after a while there were multiple (15) 'sync_error_count' recorded.

According to the docs: 'insert_exists' happens when "Inserting a row
that violates a NOT DEFERRABLE unique constraint.".  So why are there
not the same number of 'insert_exists_count' recorded in
pg_stat_subscription_stats?

The 'insert_exists' is either not happening or is not being counted
during table synchronization. Either way, it was not what I was
expecting. If it is not a bug, maybe the docs need to explain more
about the rules for 'insert_exists' during the initial table sync.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Collect statistics about conflicts in logical replication

From
"Zhijie Hou (Fujitsu)"
Date:
On Thursday, August 29, 2024 8:31 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hi,

> I tried an experiment where I deliberately violated a primary key during initial
> table synchronization.
> 
> For example:
...
> test_sub=# 2024-08-29 09:53:57.245 AEST [4345] LOG:  logical replication
> apply worker for subscription "sub1" has started
> 2024-08-29 09:53:57.258 AEST [4347] LOG:  logical replication table
> synchronization worker for subscription "sub1", table "t1" has started
> 2024-08-29 09:53:57.311 AEST [4347] ERROR:  duplicate key value violates
> unique constraint "t1_pkey"
> 2024-08-29 09:53:57.311 AEST [4347] DETAIL:  Key (a)=(1) already exists.
> 2024-08-29 09:53:57.311 AEST [4347] CONTEXT:  COPY t1, line 1
> ~~~
> 
> Notice how after a while there were multiple (15) 'sync_error_count' recorded.
> 
> According to the docs: 'insert_exists' happens when "Inserting a row that
> violates a NOT DEFERRABLE unique constraint.".  So why are there not the
> same number of 'insert_exists_count' recorded in pg_stat_subscription_stats?

Because this error was caused by COPY instead of an INSERT (e.g., CONTEXT:  COPY
t1, line 1), so this is as expected. The doc of conflict counts(
insert_exists_count) has already mentioned that it counts the conflict only *during the
application of changes* which is clear to me that it doesn't count the ones in
initial table synchronization. See the existing apply_error_count where we also
has similar wording(e.g. "an error occurred while applying changes").

Best Regards,
Hou zj