BDR: SSL error: bad write retry - Mailing list pgsql-general
From | Florin Andrei |
---|---|
Subject | BDR: SSL error: bad write retry |
Date | |
Msg-id | d3da40a0ff19413ffecf7ec482b7c4c6@andrei.myip.org Whole thread Raw |
Responses |
Re: BDR: SSL error: bad write retry
|
List | pgsql-general |
BDR-0.9.3 and PG-9.4.4 on Ubuntu 14.04 Two nodes, BDR replication. Cluster is newly created, no nodes have been removed from it. Making/deleting small tables works well across the cluster. Now I'm trying to pg_restore a larger database from another system (pg_dump output file is 3.1 GB compressed). I am running pg_restore on one node in the cluster, and hoping that BDR would replicate changes to the other node. It went well through creating schemas, extensions, comments, views, functions, sequences and tables. It imported data into tables. Now it's trying to create indexes - but it takes a very long time and I see errors in the logs. I see periodic data transfers between instances. CPU usage shoots up, then back down. There's a cycle of a few minutes where these things occur. On the instance where I'm running pg_restore, I see this in the logs: ############################################### 2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,5,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08P01,"SSL error:bad write retry",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", output plugin ""bdr"", in the change callback,associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,6,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08006,"couldnot receive data from client: Connection reset by peer",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"",output plugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr(6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.328 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,7,"idle",2015-11-03 21:41:58 UTC,5/146,0,LOG,08P01,"unexpectedEOF on standby connection",,,,,"slot ""bdr_16387_6212727469166484615_1_16387__"", outputplugin ""bdr"", in the change callback, associated LSN 0/2B29E50",,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:35.330 UTC,"postgres","pgmirror",4918,"10.1.1.169:36334",56392a26.1336,8,"idle",2015-11-03 21:41:58 UTC,,0,LOG,00000,"disconnection:session time: 0:03:37.234 user=postgres database=pgmirror host=10.1.1.169 port=36334",,,,,,,,,"bdr(6212727469166484615,1,16387,):receive" 2015-11-03 21:45:40.421 UTC,,,5066,"",56392b04.13ca,1,"",2015-11-03 21:45:40 UTC,,0,LOG,00000,"connection received: host=10.1.1.169 port=36335",,,,,,,,,"" 2015-11-03 21:45:40.427 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,2,"authentication",2015-11-03 21:45:40 UTC,5/149,0,LOG,00000,"replicationconnection authorized: user=postgres SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES256-GCM-SHA384,compression=off)",,,,,,,,,"" 2015-11-03 21:45:40.431 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,3,"idle",2015-11-03 21:45:40 UTC,5/0,0,LOG,00000,"startinglogical decoding for slot ""bdr_16387_6212727469166484615_1_16387__""","streaming transactionscommitting after 1/14FEDFE0, reading WAL from 0/187AFA0",,,,,,,,"bdr (6212727469166484615,1,16387,):receive" 2015-11-03 21:45:40.435 UTC,"postgres","pgmirror",5066,"10.1.1.169:36335",56392b04.13ca,4,"idle",2015-11-03 21:45:40 UTC,5/0,0,LOG,00000,"logicaldecoding found consistent point at 0/187AFA0","Logical decoding will begin using saved snapshot.",,,,,,,,"bdr(6212727469166484615,1,16387,):receive" ############################################### On the other node in the cluster (supposed to receive the data via BDR) I see this in the logs: ############################################### 2015-11-03 21:45:31.885 UTC,,,3391,,56391e57.d3f,19,,2015-11-03 20:51:35 UTC,,0,LOG,00000,"checkpoint starting: xlog",,,,,,,,,"" 2015-11-03 21:45:35.400 UTC,,,4773,,56392a26.12a5,1,,2015-11-03 21:41:58 UTC,3/201,876,ERROR,XX000,"connection to other side has died",,,,,,,,,"bdr (6212727469166484615,1,16387,): apply" 2015-11-03 21:45:35.408 UTC,,,3388,,56391e56.d3c,34,,2015-11-03 20:51:34 UTC,,0,LOG,00000,"worker process: bdr (6212727469166484615,1,16387,)->bdr (6212727476664052871,1, (PID 4773) exited with exit code 1",,,,,,,,,"" 2015-11-03 21:45:40.412 UTC,,,3388,,56391e56.d3c,35,,2015-11-03 20:51:34 UTC,,0,LOG,00000,"starting background worker process ""bdr (6212727469166484615,1,16387,)->bdr (6212727476664052871,1,""",,,,,,,,,"" ############################################### -- Florin Andrei http://florin.myip.org/
pgsql-general by date: