Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers
From | Petr Jelinek |
---|---|
Subject | Re: pglogical - logical replication contrib module |
Date | |
Msg-id | 56992762.1090409@2ndquadrant.com Whole thread Raw |
In response to | Re: pglogical - logical replication contrib module (Steve Singer <steve@ssinger.info>) |
Responses |
Re: pglogical - logical replication contrib module
Re: pglogical - logical replication contrib module |
List | pgsql-hackers |
On 2016-01-10 20:57, Steve Singer wrote: > On 01/09/2016 01:30 PM, Steve Singer wrote: >> On 12/31/2015 06:34 PM, Petr Jelinek wrote: >> > >> I'm not really sure what to do to 'recover' my cluster at this point >> so I'll send this off and rebuild my cluster and start over. >> >> > > I had a setup test1--->test2 (with 2 tables in the default set) > > I then created a third database (all three hosted on the same PG cluster) > > In the third database (test3) > test3=# create extension pglogical; > CREATE EXTENSION > test3=# select pglogical.create_node(node_name:='test3', > dsn:='host=localhost dbname=test3 port=5436'); > create_node > ------------- > 2001662995 > (1 row) > > test3=# select > pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost > dbname=test2 port=5436'); > create_subscription > --------------------- > 2974019075 > > It copied the schema over but not the data (if I use test2 as the > provider_dsn then it does copy the data). Yes, because you probably don't have any replication sets defined there. That's by design, replication sets are defined per provider and their definition is not replicated. This seems to be the only sane way to actually support merging data from multiple provider nodes. I guess this could be documented better, but cascading is something that's still WIP. > > I then tried inserting a row into a table on test1. Things crashed and > after crash recovery I keep getting > > 2016-01-10 13:03:15 EST LOG: database system is ready to accept > connections > 2016-01-10 13:03:15 EST LOG: autovacuum launcher started > 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub > 2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub > 2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot > "pgl_test3 > _test2_defaultsub" > 2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing > after 0/ > 18292D8, reading WAL from 0/18292D8 > 2016-01-10 13:03:15 EST test2LO > I asked to subscribe something through psql, even thought it is > asynchronous, if the async operation fails I should be able to learn > about the problem through psql. If I am writing a script to subscribe a > node it needs a way in my script of checking if the subscription has > failed and reporting the error. > My subscription script might not have easy access to the server log. >G: logical decoding found consistent > point at 0/ > 18292D8 > 2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using > saved sn > apshot. > TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c", > Line: 1127) > 2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection > 2016-01-10 13:03:15 EST LOG: worker process: pglogical apply > 17016:2974019075 ( > PID 24746) was terminated by signal 6: Aborted > > The stack trace is > > #3 0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784, > v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1127 > #4 0x00000000007c503e in SearchSysCache (cacheId=cacheId@entry=47, > key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0, > key4=key4@entry=0) at syscache.c:981 > #5 0x00000000006996d4 in replorigin_by_name ( > roname=0xe51f30 "pgl_test2_test1_defaultsub", > missing_ok=missing_ok@entry=0 '\000') at origin.c:216 > #6 0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0) > at pglogical_apply.c:235 > #7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031 > #8 apply_work (streamConn=streamConn@entry=0xe84fb0) at > pglogical_apply.c:1309 > #9 0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>) > at pglogical_apply.c:1691 > #10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726 > ---Type <return> to continue, or q <return> to quit--- > #11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at > postmaster.c:5501 > #12 maybe_start_bgworker () at postmaster.c:5676 > #13 0x0000000000680206 in sigusr1_handler > (postgres_signal_arg=<optimized out>) > at postmaster.c:4937 > #14 <signal handler called> > #15 0x00007fdb54fa2293 in __select_nocancel () > at ../sysdeps/unix/syscall-template.S:81 > #16 0x0000000000468285 in ServerLoop () at postmaster.c:1648 > #17 0x000000000068161e in PostmasterMain (argc=argc@entry=3, > argv=argv@entry=0xddede0) at postmaster.c:1292 > #18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223 > > That's bug, fixed. > > I tried dropping the subscription and re-adding it. I keep getting > > 2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent > point at 0/1830080 > 2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions. > 2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot: > "000004DE-1" with 0 transaction IDs > 2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists > 2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a ( > a integer NOT NULL, > b integer > ); > > > > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a > ssinger > pg_restore: [archiver (db)] could not execute query: ERROR: relation "a" > already exists > Command was: CREATE TABLE a ( > a integer NOT NULL, > b integer > ); > > > > 2016-01-10 13:21:48 EST ERROR: could not execute command > "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data" > --exit-on-error -1 -d "host=localhost dbname=test3 port=5436" > "/tmp/pglogical-28079.dump"" > 2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection > with an open transaction > 2016-01-10 13:21:48 EST LOG: worker process: pglogical apply > 17016:844915593 (PID 28079) exited with exit code 1 > 2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization > failed during nonrecoverable step (s), please try the setup again > > Which is probably also the cause of the error I reported yesterday (that > I tried creating a subscription without dropping the tables). > From a usability point of view I think we need a way of making this > errors available in the output of pglogical.show_subscription_status(). > Yes it is same reason I explained in previous email, can be solved with synchronize_structure := false in the create_subscription. And yes the show_subscription_status should show the last error. I didn't find good way to do that yet as some errors result in being unable to write to database anymore so this needs to be done either via libpq connection which is ugly or via shmem communication with another process (the manager process seems like good candidate for this), but we don't yet have infrastructure in pglogical to do this. This is definitely on my TODO. > > +- `pglogical.show_subscription_table(subscription_name name, > + relation regclass)` > + Shows synchronization status of a table. > + > + Parameters: > + - `subscription_name` - name of the existing subscription > + - `relation` - name of existing table, optionally qualified > + > > It isn't clear from the documentation what the output of this function > means, nor could I tell looking at it. Is this function just supposed > to tell us if a table is part of the replication set or if it is > 'up-to-date'. It still reports 'synchornized' when a table is behind. > There are several statuses the table goes through, during the COPY it's in synchronizing status, so next logical step seemed to be synchronized. Maybe it should be renamed to 'replicating' instead as that's what it actually means (table has finished synchronization and is now replicating normally). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: