Thread: BDR Rejoin of failed node, hangs.
Good Afternoon,

I am in the process of testing out BDR and am having problems rejoining a node after a simulated loss of that node. The join hangs while waiting to complete, and there are interesting errors in the logs.

Simulation process

After creating the database on both nodes, shut down postgres and reset the db on node 2:

    rm -rf /var/lib/postgresql/9.4/main/*
    /usr/lib/postgresql/9.4/bin/initdb -D /var/lib/postgresql/9.4/main -A trust -U postgre

Clean up node 1:

    select bdr.bdr_part_by_node_names('{node2}');
    delete from bdr.bdr_nodes where node_status='k';

When I try to re-add node2 using the NODE 2 create SQL again, the final step hangs:

    SELECT bdr.bdr_node_join_wait_for_ready();

The log on node2 has errors:
http://pastebin.com/8ZsTe5cG

    55000: System identification mismatch between connection and slot
    00000: worker process: bdr db: bdrdemo (PID 12042) exited with exit code 1

The log on node1 does not have any errors:
http://pastebin.com/njVJ9WX7

Both nodes show up in "select * from bdr.bdr_nodes;" on node1 and node2.

Database Creation:

NODE 1

    create database bdrdemo;
    \connect bdrdemo
    CREATE EXTENSION btree_gist;
    CREATE EXTENSION bdr;
    select bdr.bdr_group_create(local_node_name := 'node1', node_external_dsn := 'host=192.168.101.41 port=5432 dbname=bdrdemo');
    SELECT bdr.bdr_node_join_wait_for_ready();

NODE 2

    create database bdrdemo;
    \connect bdrdemo
    CREATE EXTENSION btree_gist;
    CREATE EXTENSION bdr;
    select bdr.bdr_group_join(local_node_name := 'node2', node_external_dsn := 'host=192.168.101.42 port=5432 dbname=bdrdemo', join_using_dsn := 'host=192.168.101.41 port=5432 dbname=bdrdemo');
    SELECT bdr.bdr_node_join_wait_for_ready();

Ubuntu 14.04 packages

    ii  postgresql-bdr-9.4             9.4.4-1trusty  amd64  object-relational SQL database, version 9.4 server
    ii  postgresql-bdr-9.4-bdr-plugin  0.9.2-1trusty  amd64  BDR Plugin for PostgreSQL-BDR 9.4
    ii  postgresql-bdr-client-9.4      9.4.4-1trusty  amd64  front-end programs for PostgreSQL-BDR 9.4
    ii  postgresql-bdr-contrib-9.4     9.4.4-1trusty  amd64  additional facilities for PostgreSQL
    ii  postgresql-client-common       154            all    manager for multiple PostgreSQL client versions
    ii  postgresql-common              154            all    PostgreSQL database-cluster manager

TIA,
Steve

________________________________
Steve Pribyl | Senior Systems Engineer
Akuna Capital LLC
36 S Wabash, Suite 310 Chicago IL 60603 USA | www.akunacapital.com <http://www.akunacapital.com>
p: +1 312 994 4646 | m: | f: +1 312 750 1667 | spribyl@akunacapital.com

Please consider the environment, before printing this email.

This electronic message contains information from Akuna Capital LLC that may be confidential, legally privileged or otherwise protected from disclosure. This information is intended for the use of the addressee only and is not offered as investment advice to be relied upon for personal or professional use. Additionally, all electronic messages are recorded and stored in compliance pursuant to applicable SEC rules. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, printing or any other use of, or any action in reliance on, the contents of this electronic message is strictly prohibited. If you have received this communication in error, please notify us by telephone at (312)994-4640 and destroy the original message.
On 10/1/15 12:27 PM, Steve Pribyl wrote:
> I am in the process of testing out BDR

Please send BDR requests to the BDR mailing list. Thanks!

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby wrote:
> On 10/1/15 12:27 PM, Steve Pribyl wrote:
> > I am in the process of testing out BDR
>
> Please send BDR requests to the BDR mailing list. Thanks!

pgsql-general remains the BDR list.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Good Morning,

Has anyone had a moment to look at this? It is a bit of a show stopper.

Thanks,
Steve
On 5 October 2015 at 20:58, Steve Pribyl <spribyl@akunacapital.com> wrote:
> Clean up node 1.
> select bdr.bdr_part_by_node_names('{node2}');
> delete from bdr.bdr_nodes where node_status='k';

You need to delete the bdr.bdr_connections entry too. 0.9.3 will fix that, so orphan connections entries and those associated with terminated nodes are ignored.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
That was it, thanks.

Steve Pribyl
Sr. Systems Engineer
steve.pribyl@akunacapital.com
Desk: 312-994-4646
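
For anyone hitting the same hang on 0.9.2, the cleanup Craig Ringer describes can be sketched roughly as below. This is only a sketch, not an official procedure: the thread names the bdr.bdr_connections table but not its columns, so the join columns used here (conn_sysid, conn_timeline, conn_dboid matched against node_sysid, node_timeline, node_dboid) are an assumption about the BDR 0.9 catalogs and should be checked with \d bdr.bdr_nodes and \d bdr.bdr_connections before running anything. The node name 'node2' and status 'k' come from the original post.

```sql
-- Run on node1 (the surviving node), connected to the bdrdemo database.

-- 1. Part the lost node, as in the original procedure.
SELECT bdr.bdr_part_by_node_names('{node2}');

-- 2. Remove the stale rows. The connection rows go first because the
--    DELETE joins against bdr.bdr_nodes to find the parted node.
--    ASSUMPTION: these column names match the BDR 0.9 catalogs.
BEGIN;

DELETE FROM bdr.bdr_connections c
USING bdr.bdr_nodes n
WHERE n.node_status = 'k'
  AND c.conn_sysid    = n.node_sysid
  AND c.conn_timeline = n.node_timeline
  AND c.conn_dboid    = n.node_dboid;

DELETE FROM bdr.bdr_nodes
WHERE node_status = 'k';

COMMIT;

-- 3. Confirm that only node1 is left in both tables before re-running
--    the NODE 2 join SQL on the re-initialised node.
SELECT * FROM bdr.bdr_nodes;
SELECT * FROM bdr.bdr_connections;
```

According to Craig's reply, 0.9.3 ignores orphan connection entries and those belonging to terminated nodes, so the extra DELETE on bdr.bdr_connections should only be needed on earlier releases.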