Re: Logical Replication WIP - Mailing list pgsql-hackers

From Steve Singer
Subject Re: Logical Replication WIP
Date
Msg-id 57AF3E1A.8090904@ssinger.info
In response to Logical Replication WIP  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: Logical Replication WIP  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-hackers
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
> Hi,
>
> as promised here is WIP version of logical replication patch.
>

Thanks for keeping on this.  This is important work.

> Feedback is welcome.
>

+<sect1 id="logical-replication-publication">
+  <title>Publication</title>
+  <para>
+    A Publication object can be defined on any master node, owned by one
+    user. A Publication is a set of changes generated from a group of
+    tables, and might also be described as a Change Set or Replication Set.
+    Each Publication exists in only one database.

'A publication object can be defined on *any master node*'.  I found
this confusing the first time I read it because I thought it was
circular (what makes a node a 'master' node? Having a publication object
published from it?).  On reflection I realized that you mean 'any
*physical replication master*'.  I think this might be better worded as
'A publication object can be defined on any node other than a standby
node'.  I think referring to 'master' in the context of logical
replication might confuse people.

I am raising this in the context of the larger terminology that we want 
to use and potential confusion with the terminology we use for physical 
replication. I like the publication / subscription terminology you've 
gone with.

 <para>
+    Publications are different from table schema and do not affect
+    how the table is accessed. Each table can be added to multiple
+    Publications if needed.  Publications may include both tables
+    and materialized views. Objects must be added explicitly, except
+    when a Publication is created for "ALL TABLES". There is no
+    default name for a Publication which specifies all tables.
+  </para>
+  <para>
+    The Publication is different from table schema, it does not affect
+    how the table is accessed and each table can be added to multiple

Those 2 paragraphs seem to start the same way.  I get the feeling that
there is some point you're trying to express that I'm not catching onto.
Of course a publication is different from a table's schema, or different
from a function.

The definition of publication you have on the CREATE PUBLICATION page
seems better and should be repeated here (A publication is essentially a
group of tables intended for managing logical replication. See Section
30.1 for details about how publications fit into logical replication
setup.)


+  <para>
+    Conflicts happen when the replicated changes is breaking any
+    specified constraints (with the exception of foreign keys which are
+    not checked). Currently conflicts are not resolved automatically and
+    cause replication to be stopped with an error until the conflict is
+    manually resolved.

What options are there for manually resolving conflicts?  Is the only 
option to change the data on the subscriber to avoid the conflict?
I assume there isn't a way to flag a particular row coming from the 
publisher and say ignore it.  I don't think this is something we need to 
support for the first version.
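
To make the question concrete, here is the kind of conflict I have in
mind.  This is only a sketch: it assumes a table a(id primary key, b text)
exists on both nodes and is already being replicated, and the failure and
the manual fix shown are my reading of the paragraph above, not something
taken from the patch.

```sql
-- on the subscriber: a local write that takes id 2
insert into a(id, b) values (2, 'local row');

-- on the publisher: a row with the same key
insert into a(id, b) values (2, 'published row');

-- assumption: applying the second insert on the subscriber now violates
-- the primary key, and replication stops with an error

-- on the subscriber: the only manual fix I can see is to change local data
delete from a where id = 2;
```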

<sect1 id="logical-replication-architecture">
+  <title>Architecture</title>
+  <para>
+    Logical replication starts by copying a snapshot of the data on
+    the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above.  Do you intend to update that to
'Publisher', or does provider mean something different?  If we like the
'publication' terminology then I think 'publishers' should publish them,
not providers.


I'm trying to test a basic subscription.  I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key,b text);
create publication testpub1; alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster 2:
create database test1;
create table a(id serial8 primary key,b text);
create subscription testsub2 publication testpub1 connection 'host=localhost port=5440 dbname=test1';
NOTICE:  created replication slot "testsub2" on provider
NOTICE:  synchronized table states
CREATE SUBSCRIPTION

This resulted in
LOG:  logical decoding found consistent point at 0/15625E0
DETAIL:  There are no running transactions.
LOG:  exported logical decoding snapshot: "00000494-1" with 0 transaction IDs
LOG:  logical replication apply for subscription testsub2 started
LOG:  starting logical decoding for slot "testsub2"
DETAIL:  streaming transactions committing after 0/1562618, reading WAL from 0/15625E0
LOG:  logical decoding found consistent point at 0/15625E0
DETAIL:  There are no running transactions.
LOG:  logical replication sync for subscription testsub2, table a started
LOG:  logical decoding found consistent point at 0/1562640
DETAIL:  There are no running transactions.
LOG:  exported logical decoding snapshot: "00000495-1" with 0 transaction IDs
LOG:  logical replication synchronization worker finished processing


The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated.

I had the following output

LOG:  terminating walsender process due to replication timeout


On cluster 1 I do

select * FROM pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
(0 rows)



If I then try to kill the cluster 2 postmaster, I have to use kill -9 or
it won't die.

I get

LOG:  worker process: logical replication worker 16396 sync 16387 (PID 3677) exited with exit code 1
WARNING:  could not launch logical replication worker
LOG:  logical replication sync for subscription testsub2, table a started
ERROR:  replication slot "testsub2_sync_a" does not exist
ERROR:  could not start WAL streaming: ERROR:  replication slot "testsub2_sync_a" does not exist

I'm not really sure what I need to do to debug this.  I suspect the
worker on cluster 2 is having some issue.
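
One check that might narrow this down (a sketch; not run against this
setup) is to look at the slots on cluster 1, since the errors complain
about a missing "testsub2_sync_a" slot.  pg_replication_slots is the
existing system view; whether the sync slot is expected to still show up
there at this point is an assumption on my part.

```sql
-- on cluster 1, connected to database test1
select slot_name, slot_type, active, restart_lsn
  from pg_replication_slots
 where slot_name like 'testsub2%';
```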




> [1] https://www.postgresql.org/message-id/flat/CANP8%2Bj%2BNMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_%3D-HA%40mail.gmail.com#CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com



