Re: [HACKERS] Logical replication existing data copy - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: [HACKERS] Logical replication existing data copy
Msg-id b0dbcb2a1066d6728cbf62e391e7edf4@xs4all.nl
In response to Re: [HACKERS] Logical replication existing data copy  (Erik Rijkers <er@xs4all.nl>)
Responses Re: [HACKERS] Logical replication existing data copy
List pgsql-hackers
On 2017-02-22 14:48, Erik Rijkers wrote:
> On 2017-02-22 13:03, Petr Jelinek wrote:
> 
>> 0001-Skip-unnecessary-snapshot-builds.patch
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v5.patch
> 
> It works well now, or at least my particular test case seems now 
> solved.

Cried victory too early, I'm afraid.


The logical replication is now certainly much more stable but there are 
still errors, just less often.

The rare 'hang'-error that I mentioned a few emails back I have not yet 
encountered; I am beginning to trust that that is indeed solved.

But there is still sometimes incorrect replication.  The symptoms are 
the ones I mentioned earlier:
- incorrect number of rows in one of (mostly) pgbench_accounts or 
pgbench_history.
  The numbers are always off by a very small amount, say less than 20, 
  often by only 1 row.
- incorrect content in one of pgbench_accounts or pgbench_history 
(detected via md5). Again, it is mostly these two tables.
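
For illustration, the two checks above (row count and md5 per table) can be sketched roughly as follows. This is only a minimal sketch, not what pgbench_derail2.sh actually does: the function names table_digest and compare_instances are hypothetical, and it assumes each table has already been dumped from both instances as a list of text rows.

```python
import hashlib


def table_digest(lines):
    """Return (row_count, md5_hex) for an iterable of row strings.

    Rows are sorted first so the digest does not depend on the order
    in which the rows were dumped.
    """
    rows = sorted(lines)
    h = hashlib.md5()
    for row in rows:
        h.update(row.encode("utf-8"))
        h.update(b"\n")
    return len(rows), h.hexdigest()


def compare_instances(primary, replica):
    """Compare two dicts mapping table name -> list of row strings.

    Returns one human-readable line per table that differs; an empty
    list means the replica matches the primary for every table.
    (Hypothetical helper, for illustration only.)
    """
    problems = []
    for table in sorted(primary):
        p_count, p_md5 = table_digest(primary[table])
        r_count, r_md5 = table_digest(replica.get(table, []))
        if p_count != r_count:
            problems.append(f"{table}: row count {p_count} vs {r_count}")
        elif p_md5 != r_md5:
            problems.append(f"{table}: same row count, md5 differs")
    return problems
```

A run counts as failed as soon as compare_instances returns a non-empty list for the four pgbench tables.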

I sometimes see primary key violations on the replica. That should not 
be possible if I have understood the intent of logical replication 
correctly.
( ERROR:  duplicate key value violates unique constraint 
"pgbench_tellers_pkey" )
These occur mostly on *_tellers, but I have also seen them on *_branches.

Understandably, the errors become more frequent with higher client 
counts: a 25x repeat with 1 client yielded only 1 failed run whereas a 
25x repeat with 16 clients gave 16 failures.

I attach once more the current version of my bash test script that 
drives pgbench, pgbench_derail2.sh.
It is probably easiest to run it yourself.

I also attach the output (of pgbench_derail2.sh) of those two 25x 
repeats:
d2_scale__1_client__1_25x.txt
d2_scale__1_client_16_25x.txt

I worry a bit about the correctness of that test program 
(pgbench_derail2.sh). I especially wonder if it should look around 
better at startup (e.g., at stuff left over from previous iterations).   
If you see any incorrect/dumb things there, or a better way to monitor 
(aka pre-flight checks), please let me know.
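
To make the kind of pre-flight check I have in mind concrete, here is a rough sketch. It is not part of pgbench_derail2.sh; the names LEFTOVER_QUERIES and preflight_problems are hypothetical, and it assumes the counts are collected beforehand on each instance, for example with psql -qtAX -c "<query>".

```python
# Catalog queries that should each return 0 on a clean instance
# before a new test iteration starts.
LEFTOVER_QUERIES = {
    "replication slots": "select count(*) from pg_replication_slots",
    "subscriptions": "select count(*) from pg_subscription",
    "publications": "select count(*) from pg_publication",
}


def preflight_problems(counts):
    """Return one message per check that found leftovers.

    counts maps a check name from LEFTOVER_QUERIES to the number
    returned by its query; a missing name is treated as 0.
    (Hypothetical helper, for illustration only.)
    """
    return [
        f"{name}: {counts[name]} left over"
        for name in LEFTOVER_QUERIES
        if counts.get(name, 0) > 0
    ]
```

The idea would be to refuse to start an iteration while preflight_problems returns anything, so that a failure is not blamed on replication when it was really caused by stale state from an earlier run.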

But the current state is certainly a big step forward -- I guess it's 
just your bad luck that I had the afternoon off ;)

thanks,

Erik Rijkers

