Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers
From | Steve Singer |
---|---|
Subject | Re: pglogical - logical replication contrib module |
Date | |
Msg-id | 5692B798.2050002@ssinger.info Whole thread Raw |
In response to | Re: pglogical - logical replication contrib module (Steve Singer <steve@ssinger.info>) |
Responses |
Re: pglogical - logical replication contrib module
|
List | pgsql-hackers |
On 01/09/2016 01:30 PM, Steve Singer wrote: > On 12/31/2015 06:34 PM, Petr Jelinek wrote: > > I'm not really sure what to do to 'recover' my cluster at this point > so I'll send this off and rebuild my cluster and start over. > > I had a setup test1--->test2 (with 2 tables in the default set) I then created a third database (all three hosted on the same PG cluster) In the third database (test3) test3=# create extension pglogical; CREATE EXTENSION test3=# select pglogical.create_node(node_name:='test3', dsn:='host=localhost dbname=test3 port=5436'); create_node ------------- 2001662995 (1 row) test3=# select pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost dbname=test2 port=5436'); create_subscription --------------------- 2974019075 It copied the schema over but not the data (if I use test2 as the provider_dsn then it does copy the data). I then tried inserting a row into a table on test1. Things crashed and after crash recovery I keep getting 2016-01-10 13:03:15 EST LOG: database system is ready to accept connections 2016-01-10 13:03:15 EST LOG: autovacuum launcher started 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub 2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot "pgl_test3 _test2_defaultsub" 2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing after 0/ 18292D8, reading WAL from 0/18292D8 2016-01-10 13:03:15 EST test2LOG: logical decoding found consistent point at 0/ 18292D8 2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using saved sn apshot. TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c", Line: 1127) 2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection 2016-01-10 13:03:15 EST LOG: worker process: pglogical apply 17016:2974019075 ( PID 24746) was terminated by signal 6: Aborted The stack trace is #3 0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) atcatcache.c:1127 #4 0x00000000007c503e in SearchSysCache (cacheId=cacheId@entry=47, key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:981 #5 0x00000000006996d4 in replorigin_by_name ( roname=0xe51f30 "pgl_test2_test1_defaultsub", missing_ok=missing_ok@entry=0'\000') at origin.c:216 #6 0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0) at pglogical_apply.c:235 #7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031 #8 apply_work (streamConn=streamConn@entry=0xe84fb0) at pglogical_apply.c:1309 #9 0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>) at pglogical_apply.c:1691 #10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726 ---Type <return> to continue, or q <return> to quit--- #11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at postmaster.c:5501 #12 maybe_start_bgworker () at postmaster.c:5676 #13 0x0000000000680206 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4937 #14 <signal handler called> #15 0x00007fdb54fa2293 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 #16 0x0000000000468285 in ServerLoop () at postmaster.c:1648 #17 0x000000000068161e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xddede0) at postmaster.c:1292 #18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223 I tried dropping the subscription and re-adding it. I keep getting 2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent point at 0/1830080 2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions. 2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot: "000004DE-1" with 0 transaction IDs 2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists 2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a ( a integer NOT NULL, b integer ); pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a ssinger pg_restore: [archiver (db)] could not execute query: ERROR: relation "a" already exists Command was: CREATE TABLE a ( a integer NOT NULL, b integer ); 2016-01-10 13:21:48 EST ERROR: could not execute command "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data" --exit-on-error -1 -d "host=localhost dbname=test3 port=5436" "/tmp/pglogical-28079.dump"" 2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection with an open transaction 2016-01-10 13:21:48 EST LOG: worker process: pglogical apply 17016:844915593 (PID 28079) exited with exit code 1 2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization failed during nonrecoverable step (s), please try the setup again Which is probably also the cause of the error I reported yesterday (that I tried creating a subscription without dropping the tables).From a usability point of view I think we need a way of makingthis errors available in the output of pglogical.show_subscription_status(). I asked to subscribe something through psql, even thought it is asynchronous, if the async operation fails I should be able to learn about the problem through psql. If I am writing a script to subscribe a node it needs a way in my script of checking if the subscription has failed and reporting the error. My subscription script might not have easy access to the server log. +- `pglogical.show_subscription_table(subscription_name name, + relation regclass)` + Shows synchronization status of a table. + + Parameters: + - `subscription_name` - name of the existing subscription + - `relation` - name of existing table, optionally qualified + It isn't clear from the documentation what the output of this function means, nor could I tell looking at it. Is this function just supposed to tell us if a table is part of the replication set or if it is 'up-to-date'. It still reports 'synchornized' when a table is behind.
pgsql-hackers by date: