Re: Logical Replication WIP - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: Logical Replication WIP
Date
Msg-id 1e0f6bde-09e0-3806-d329-36c90b159900@2ndquadrant.com
Whole thread Raw
In response to Re: Logical Replication WIP  (Steve Singer <steve@ssinger.info>)
List pgsql-hackers
On 13/08/16 17:34, Steve Singer wrote:
> On 08/05/2016 11:00 AM, Petr Jelinek wrote:
>> Hi,
>>
>> as promised here is WIP version of logical replication patch.
>>
>
> Thanks for keeping on this.  This is important work
>
>> Feedback is welcome.
>>
>
> +<sect1 id="logical-replication-publication">
> +  <title>Publication</title>
> +  <para>
> +    A Publication object can be defined on any master node, owned by one
> +    user. A Publication is a set of changes generated from a group of
> +    tables, and might also be described as a Change Set or Replication
> Set.
> +    Each Publication exists in only one database.
>
> 'A publication object can be defined on *any master node*'.  I found
> this confusing the first time I read it because I thought it was
> circular (what makes a node a 'master' node? Having a publication object
> published from it?).   On reflection I realized that you mean ' any
> *physical replication master*'.  I think this might be better worded as
> 'A publication object can be defined on any node other than a standby
> node'.  I think referring to 'master' in the context of logical
> replication might confuse people.

Makes sense to me.

>
> I am raising this in the context of the larger terminology that we want
> to use and potential confusion with the terminology we use for physical
> replication. I like the publication / subscription terminology you've
> gone with.
>
>
>  <para>
> +    Publications are different from table schema and do not affect
> +    how the table is accessed. Each table can be added to multiple
> +    Publications if needed.  Publications may include both tables
> +    and materialized views. Objects must be added explicitly, except
> +    when a Publication is created for "ALL TABLES". There is no
> +    default name for a Publication which specifies all tables.
> +  </para>
> +  <para>
> +    The Publication is different from table schema, it does not affect
> +    how the table is accessed and each table can be added to multiple
>
> Those 2 paragraphs seem to start the same way.  I get the feeling that
> there is some point your trying to express that I'm not catching onto.
> Of course a publication is different than a tables schema, or different
> than a function.

Ah that's relic of some editorialization, will fix. The reason why we 
think it's important to mention the difference between publication and 
schema is that they are the only objects that contain tables but they 
affect them in very different ways which might confuse users.

>
> The definition of publication you have on the CREATE PUBLICATION page
> seems better and should be repeated here (A publication is essentially a
> group of tables intended for managing logical replication. See Section
> 30.1 <cid:part1.06040100.08080900@ssinger.info> for details about how
> publications fit into logical replication setup. )
>
>
> +  <para>
> +    Conflicts happen when the replicated changes is breaking any
> +    specified constraints (with the exception of foreign keys which are
> +    not checked). Currently conflicts are not resolved automatically and
> +    cause replication to be stopped with an error until the conflict is
> +    manually resolved.
>
> What options are there for manually resolving conflicts?  Is the only
> option to change the data on the subscriber to avoid the conflict?
> I assume there isn't a way to flag a particular row coming from the
> publisher and say ignore it.  I don't think this is something we need to
> support for the first version.

Yes you have to update data on subscriber or skip the the replication of 
whole transaction (for which the UI is not very friendly currently as 
you either have to consume the transaction 
pg_logical_slot_get_binary_changes or by moving origin on subscriber 
using pg_replication_origin_advance).

It's relatively easy to add some automatic conflict resolution as well, 
but it didn't seem absolutely necessary so I didn't do it for the 
initial version.

>
> <sect1 id="logical-replication-architecture">
> +  <title>Architecture</title>
> +  <para>
> +    Logical replication starts by copying a snapshot of the data on
> +    the Provider database. Once that is done, the changes on Provider
>
> I notice the user of 'Provider' above do you intend to update that to
> 'Publisher' or does provider mean something different. If we like the
> 'publication' terminology then I think 'publishers' should publish them
> not providers.
>

Okay, I am just used to 'provider' in general (I guess londiste habit), 
but 'publisher' is fine as well.

>
> I'm trying to test a basic subscription and I do the following
>
> I did the following:
>
> cluster 1:
> create database test1;
> create table a(id serial8 primary key,b text);
> create publication testpub1;
>  alter publication testpub1 add table a;
> insert into a(b) values ('1');
>
> cluster2
> create database test1;
> create table a(id serial8 primary key,b text);
> create subscription testsub2 publication testpub1 connection
> 'host=localhost port=5440 dbname=test1';
> NOTICE:  created replication slot "testsub2" on provider
> NOTICE:  synchronized table states
> CREATE SUBSCRIPTION
>
>  [...]
>
> The initial sync completed okay, then I did
>
> insert into a(b) values ('2');
>
> but the second insert never replicated.
>
> I had the following output
>
> LOG:  terminating walsender process due to replication timeout
>
>
> On cluster 1 I do
>
> select * FROM pg_stat_replication;
>  pid | usesysid | usename | application_name | client_addr |
> client_hostname | client_port | backend_start |
> backend_xmin | state | sent_location | write_location | flush_location |
> replay_location | sync_priority | sy
> nc_state
> -----+----------+---------+------------------+-------------+-----------------+-------------+---------------+-
>
> -------------+-------+---------------+----------------+----------------+-----------------+---------------+---
>
> ---------
> (0 rows)
>
>
>
> If I then kill  the cluster2 postmaster, I have to do a -9 or it won't die
>

That might explain why it didn't replicate. The wait loops in apply 
worker clearly need some work. Thanks for the report.

--    Petr Jelinek                  http://www.2ndQuadrant.com/   PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Vladimir Sitnikov
Date:
Subject: Re: Slowness of extended protocol
Next
From: Stas Kelvich
Date:
Subject: Re: Logical Replication WIP