On 2017-02-22 14:48, Erik Rijkers wrote:
> On 2017-02-22 13:03, Petr Jelinek wrote:
>
>> 0001-Skip-unnecessary-snapshot-builds.patch
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v5.patch
>
> It works well now, or at least my particular test case seems now
> solved.
Cried victory too early, I'm afraid.
The logical replication is now certainly much more stable but there are
still errors, just less often.
The rare 'hang'-error that I mentioned a few emails back I have not yet
encountered; I am beginning to trust that that is indeed solved.
But replication is still sometimes incorrect. The symptoms are
the ones I mentioned earlier:

- incorrect number of rows in one of (mostly) pgbench_accounts or
  pgbench_history.
  The numbers are always off by a very small amount, say less than 20,
  often only 1 row.
- incorrect content in one of pgbench_accounts or pgbench_history
  (detected via md5). Again mostly the two tables named above.
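
For reference, the kind of check behind those two symptoms can be sketched roughly as below. This is only a minimal sketch and not the code in pgbench_derail2.sh: the function names table_summary and compare_summary are made up for illustration, and the ports of the two instances are assumed to be passed in by the caller. Rows are sorted client-side before hashing so that the md5 does not depend on physical row order.

```shell
# Sketch only: hypothetical helpers, not taken from pgbench_derail2.sh.

# Print "<table> <rowcount> <md5>" for one table on the instance at port $1.
table_summary() {
  ts_count=$(psql -qtAX -p "$1" -c "select count(*) from $2")
  ts_md5=$(psql -qtAX -p "$1" -c "copy $2 to stdout" \
             | LC_ALL=C sort | md5sum | cut -d' ' -f1)
  echo "$2 $ts_count $ts_md5"
}

# Compare a primary summary line ($1) with a replica summary line ($2).
# Reports what differs and returns non-zero on any mismatch.
compare_summary() {
  # Word-split both lines: $1=table $2=count $3=md5 (primary),
  #                        $4=table $5=count $6=md5 (replica).
  set -- $1 $2
  if [ "$2" != "$5" ]; then
    echo "$1: row count differs by $(( $5 - $2 )) (primary $2, replica $5)"
    return 1
  elif [ "$3" != "$6" ]; then
    echo "$1: same row count ($2) but md5 differs"
    return 1
  fi
  echo "$1: ok"
}
```

It would be called once per pgbench table after the replica has had time to catch up, e.g. compare_summary "$(table_summary $port1 pgbench_history)" "$(table_summary $port2 pgbench_history)", where port1 and port2 are whatever ports the primary and replica listen on.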

I also sometimes see primary key violations on the replica. That should not
be possible if I have understood the intent of logical replication
correctly:

  ERROR: duplicate key value violates unique constraint
  "pgbench_tellers_pkey"

This is mostly on *_tellers, but I have also seen it on *_branches.

Understandably, the errors become more frequent with higher client
counts: a 25x repeat with 1 client yielded only 1 failed run whereas a
25x repeat with 16 clients gave 16 failures.
I attach once more the current incarnation of my bash test script, the
pgbench runner pgbench_derail2.sh.
Easiest to run it yourself, I guess.
I also attach the output (of pgbench_derail2.sh) of those two 25x
repeats:
d2_scale__1_client__1_25x.txt
d2_scale__1_client_16_25x.txt
I worry a bit about the correctness of that test program
(pgbench_derail2.sh). I especially wonder if it should look around
better at startup (e.g., at stuff left over from previous iterations).
If you see any incorrect/dumb things there, or a better way to monitor
(aka pre-flight checks), please let me know.
But the current state is certainly a big step forward -- I guess it's
just your bad luck that I had the afternoon off ;)
thanks,
Erik Rijkers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)