BDR: SSL error: bad write retry - Mailing list pgsql-general

From Florin Andrei
Subject BDR: SSL error: bad write retry
Date
Msg-id d3da40a0ff19413ffecf7ec482b7c4c6@andrei.myip.org
Whole thread Raw
Responses Re: BDR: SSL error: bad write retry
List pgsql-general
BDR-0.9.3 and PG-9.4.4 on Ubuntu 14.04

Two nodes, BDR replication. Cluster is newly created, no nodes have been
removed from it. Making/deleting small tables works well across the
cluster.

Now I'm trying to pg_restore a larger database from another system
(pg_dump output file is 3.1 GB compressed). I am running pg_restore on
one node in the cluster, and hoping that BDR would replicate changes to
the other node.

It went well through creating schemas, extensions, comments, views,
functions, sequences and tables. It imported data into tables. Now it's
trying to create indexes - but it takes a very long time and I see
errors in the logs.

I see periodic data transfers between instances. CPU usage shoots up,
then back down. There's a cycle of a few minutes where these things
occur.

On the instance where I'm running pg_restore, I see this in the logs:

###############################################
2015-11-03 21:45:35.328
UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,5,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08P01,"SSL
error:bad write retry",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", output plugin ""bdr"", in the change
callback,associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 
2015-11-03 21:45:35.328
UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,6,"idle",2015-11-03 21:41:58
UTC,5/146,0,LOG,08006,"couldnot receive data from client: Connection reset by peer",,,,,"slot
""bdr_16387_6212727469166484615_1_16387__"",output plugin ""bdr"", in the change callback, associated LSN
0/2B29E50",,,,"bdr(6212727469166484615,1,16387,):receive" 
2015-11-03 21:45:35.328
UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,7,"idle",2015-11-03 21:41:58
UTC,5/146,0,LOG,08P01,"unexpectedEOF on standby connection",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"",
outputplugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 
2015-11-03 21:45:35.330
UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,8,"idle",2015-11-03 21:41:58
UTC,,0,LOG,00000,"disconnection:session time: 0:03:37.234 user=postgres database=pgmirror host=10.1.1.169
port=36334",,,,,,,,,"bdr(6212727469166484615,1,16387,):receive" 
2015-11-03 21:45:40.421 UTC,,,5066,"",56392b04.13ca,1,"",2015-11-03
21:45:40 UTC,,0,LOG,00000,"connection received: host=10.1.1.169
port=36335",,,,,,,,,""
2015-11-03 21:45:40.427
UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,2,"authentication",2015-11-03 21:45:40
UTC,5/149,0,LOG,00000,"replicationconnection authorized: user=postgres SSL enabled (protocol=TLSv1.2,
cipher=ECDHE-RSA-AES256-GCM-SHA384,compression=off)",,,,,,,,,"" 
2015-11-03 21:45:40.431
UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,3,"idle",2015-11-03 21:45:40
UTC,5/0,0,LOG,00000,"startinglogical decoding for slot ""bdr_16387_6212727469166484615_1_16387__""","streaming
transactionscommitting after 1/14FEDFE0, reading WAL from 0/187AFA0",,,,,,,,"bdr
(6212727469166484615,1,16387,):receive"
2015-11-03 21:45:40.435
UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,4,"idle",2015-11-03 21:45:40
UTC,5/0,0,LOG,00000,"logicaldecoding found consistent point at 0/187AFA0","Logical decoding will begin using saved
snapshot.",,,,,,,,"bdr(6212727469166484615,1,16387,):receive" 
###############################################

On the other node in the cluster (supposed to receive the data via BDR)
I see this in the logs:

###############################################
2015-11-03 21:45:31.885 UTC,,,3391,,56391e57.d3f,19,,2015-11-03 20:51:35
UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,""
2015-11-03 21:45:35.400 UTC,,,4773,,56392a26.12a5,1,,2015-11-03 21:41:58
UTC,3/201,876,ERROR,XX000,"connection to other side has
died",,,,,,,,,"bdr (6212727469166484615,1,16387,): apply"
2015-11-03 21:45:35.408 UTC,,,3388,,56391e56.d3c,34,,2015-11-03 20:51:34
UTC,,0,LOG,00000,"worker process: bdr
(6212727469166484615,1,16387,)->bdr (6212727476664052871,1, (PID 4773)
exited with exit code 1",,,,,,,,,""
2015-11-03 21:45:40.412 UTC,,,3388,,56391e56.d3c,35,,2015-11-03 20:51:34
UTC,,0,LOG,00000,"starting background worker process ""bdr
(6212727469166484615,1,16387,)->bdr
(6212727476664052871,1,""",,,,,,,,,""
###############################################


--
Florin Andrei
http://florin.myip.org/


pgsql-general by date:

Previous
From: Florin Andrei
Date:
Subject: Re: BDR: name conflict when joining a rebuilt node
Next
From: Ramesh T
Date:
Subject: Taking lot time