Re: pglogical - logical replication contrib module - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: pglogical - logical replication contrib module
Date
Msg-id 56992762.1090409@2ndquadrant.com
Whole thread Raw
In response to Re: pglogical - logical replication contrib module  (Steve Singer <steve@ssinger.info>)
Responses Re: pglogical - logical replication contrib module
Re: pglogical - logical replication contrib module
List pgsql-hackers
On 2016-01-10 20:57, Steve Singer wrote:
> On 01/09/2016 01:30 PM, Steve Singer wrote:
>> On 12/31/2015 06:34 PM, Petr Jelinek wrote:
>>
>
>> I'm not really sure what to do to 'recover' my cluster at this point
>> so I'll send this off and rebuild my cluster and start over.
>>
>>
>
> I had a setup test1--->test2 (with 2 tables in the default set)
>
> I then created a third database (all three hosted on the same PG cluster)
>
> In the third database (test3)
> test3=# create extension pglogical;
> CREATE EXTENSION
> test3=# select pglogical.create_node(node_name:='test3',
> dsn:='host=localhost dbname=test3 port=5436');
>   create_node
> -------------
>    2001662995
> (1 row)
>
> test3=# select
> pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
> dbname=test2 port=5436');
>   create_subscription
> ---------------------
>            2974019075
>
> It copied the schema over but not the data (if I use test2 as the
> provider_dsn then it does copy the data).

Yes, because you probably don't have any replication sets defined there. 
That's by design, replication sets are defined per provider and their 
definition is not replicated. This seems to be the only sane way to 
actually support merging data from multiple provider nodes. I guess this 
could be documented better, but cascading is something that's still WIP.

>
> I then tried inserting a row into a table on test1.  Things crashed and
> after crash recovery I keep getting
>
> 2016-01-10 13:03:15 EST LOG:  database system is ready to accept
> connections
> 2016-01-10 13:03:15 EST LOG:  autovacuum launcher started
> 2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
> 2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
> 2016-01-10 13:03:15 EST test2LOG:  starting logical decoding for slot
> "pgl_test3
> _test2_defaultsub"
> 2016-01-10 13:03:15 EST test2DETAIL:  streaming transactions committing
> after 0/
> 18292D8, reading WAL from 0/18292D8
> 2016-01-10 13:03:15 EST test2LO
> I asked to subscribe something through psql, even thought it is
> asynchronous, if the async operation fails I should be able to learn
> about the problem through psql.  If I am writing a script to subscribe a
> node it needs a way in my script of checking if the subscription has
> failed and reporting the error.
> My subscription script might not have easy access to the server log.
>G:  logical decoding found consistent
> point at 0/
> 18292D8
> 2016-01-10 13:03:15 EST test2DETAIL:  Logical decoding will begin using
> saved sn
> apshot.
> TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c",
> Line: 1127)
> 2016-01-10 13:03:15 EST test2LOG:  unexpected EOF on standby connection
> 2016-01-10 13:03:15 EST LOG:  worker process: pglogical apply
> 17016:2974019075 (
> PID 24746) was terminated by signal 6: Aborted
>
> The stack trace is
>
> #3  0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
>      v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1127
> #4  0x00000000007c503e in SearchSysCache (cacheId=cacheId@entry=47,
>      key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0,
>      key4=key4@entry=0) at syscache.c:981
> #5  0x00000000006996d4 in replorigin_by_name (
>      roname=0xe51f30 "pgl_test2_test1_defaultsub",
>      missing_ok=missing_ok@entry=0 '\000') at origin.c:216
> #6  0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
>      at pglogical_apply.c:235
> #7  replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
> #8  apply_work (streamConn=streamConn@entry=0xe84fb0) at
> pglogical_apply.c:1309
> #9  0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>)
>      at pglogical_apply.c:1691
> #10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726
> ---Type <return> to continue, or q <return> to quit---
> #11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at
> postmaster.c:5501
> #12 maybe_start_bgworker () at postmaster.c:5676
> #13 0x0000000000680206 in sigusr1_handler
> (postgres_signal_arg=<optimized out>)
>      at postmaster.c:4937
> #14 <signal handler called>
> #15 0x00007fdb54fa2293 in __select_nocancel ()
>      at ../sysdeps/unix/syscall-template.S:81
> #16 0x0000000000468285 in ServerLoop () at postmaster.c:1648
> #17 0x000000000068161e in PostmasterMain (argc=argc@entry=3,
>      argv=argv@entry=0xddede0) at postmaster.c:1292
> #18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223
>
>

That's bug, fixed.

>
> I tried dropping the subscription and re-adding it.  I keep getting
>
> 2016-01-10 13:21:48 EST test1LOG:  logical decoding found consistent
> point at 0/1830080
> 2016-01-10 13:21:48 EST test1DETAIL:  There are no running transactions.
> 2016-01-10 13:21:48 EST test1LOG:  exported logical decoding snapshot:
> "000004DE-1" with 0 transaction IDs
> 2016-01-10 13:21:48 EST test3ERROR:  relation "a" already exists
> 2016-01-10 13:21:48 EST test3STATEMENT:  CREATE TABLE a (
>          a integer NOT NULL,
>          b integer
>      );
>
>
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a
> ssinger
> pg_restore: [archiver (db)] could not execute query: ERROR: relation "a"
> already exists
>      Command was: CREATE TABLE a (
>      a integer NOT NULL,
>      b integer
> );
>
>
>
> 2016-01-10 13:21:48 EST ERROR:  could not execute command
> "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data"
> --exit-on-error -1 -d "host=localhost dbname=test3 port=5436"
> "/tmp/pglogical-28079.dump""
> 2016-01-10 13:21:48 EST test1LOG:  unexpected EOF on client connection
> with an open transaction
> 2016-01-10 13:21:48 EST LOG:  worker process: pglogical apply
> 17016:844915593 (PID 28079) exited with exit code 1
> 2016-01-10 13:21:48 EST ERROR:  subscriber defaultsub4 initialization
> failed during nonrecoverable step (s), please try the setup again
>
> Which is probably also the cause of the error I reported yesterday (that
> I tried creating a subscription without dropping the tables).
>  From a usability point of view I think we need a way of making this
> errors available in the output of pglogical.show_subscription_status().
>

Yes it is same reason I explained in previous email, can be solved with 
synchronize_structure := false in the create_subscription.

And yes the show_subscription_status should show the last error. I 
didn't find good way to do that yet as some errors result in being 
unable to write to database anymore so this needs to be done either via 
libpq connection which is ugly or via shmem communication with another 
process (the manager process seems like good candidate for this), but we 
don't yet have infrastructure in pglogical to do this. This is 
definitely on my TODO.

>
> +- `pglogical.show_subscription_table(subscription_name name,
> +  relation regclass)`
> +  Shows synchronization status of a table.
> +
> +  Parameters:
> +  - `subscription_name` - name of the existing subscription
> +  - `relation` - name of existing table, optionally qualified
> +
>
> It isn't clear from the documentation what the output of this function
> means, nor could I tell looking at it.  Is this function just supposed
> to tell us if a table is part of the replication set or if it is
> 'up-to-date'.  It still reports 'synchornized' when a table is behind.
>

There are several statuses the table goes through, during the COPY it's 
in synchronizing status, so next logical step seemed to be synchronized. 
Maybe it should be renamed to 'replicating' instead as that's what it 
actually means (table has finished synchronization and is now 
replicating normally).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] about test_parser installation failure problem(PostgreSQL in 9.5.0)?
Next
From: Abhijit Menon-Sen
Date:
Subject: Re: dealing with extension dependencies that aren't quite 'e'