Thread: Design for In-Core Logical Replication

Design for In-Core Logical Replication

From
Simon Riggs
Date:
At PgCon we discussed that Petr Jelinek would be working on the code for an in-core logical replication implementation, while I would work on the user interface/security models. Petr has been actively working on the code and will post a patch in a few weeks, as discussed and agreed. Craig Ringer is also actively coding necessary aspects. None of those things are discussed further here at this time.

In this post, Petr and I present a joint view on a design for how this should work in-core, based upon our implementation experiences with physical replication, pglogical and various comments so far.

Note that this has substantial user-visible differences from pglogical, though much of the underlying architecture is reused.

I should stress that not all of the aspects are implemented yet. The post here today is a combination of all of our attempts to bring architecture, usability and security into one place, including a coherent way of describing the features and how they work.

Your comments and questions are sought now as we begin the main development effort to get this into PostgreSQL 10.0.


<!-- doc/src/sgml/logical-replication.sgml -->

<chapter id="logical-replication">

  <title>Logical Replication</title>
  <para>
    Logical Replication is a method of replicating data objects and their
    changes, based upon their Primary Keys (or Replication Identity). We
    use the term Logical in contrast to Physical replication which
    uses exact block addresses and byte-by-byte replication.
    PostgreSQL supports both mechanisms concurrently, see
    <xref linkend="high-availability">. Logical Replication allows
    fine-grained control over both data replication and security.
  </para>
  <para>
    Logical Replication uses a Publish and Subscribe model with one or
    more Subscribers subscribing to one or more Publications on a
    Provider node. Subscribers pull data from the Publications they
    subscribe to and may subsequently re-publish data to allow
    cascading replication or more complex configurations.
  </para>
  <para>
    Data for committed transactions is streamed in real-time to each
    Subscriber.  Logical replication might also be described as Change
    Data Capture (CDC) or Transactional Replication.
  </para>
  <para>
    The typical use-cases for logical replication are:
  </para>
  <itemizedlist>
    <listitem>
      <para>
        Replicating between different major versions of PostgreSQL.
      </para>
    </listitem>
    <listitem>
      <para>
        Replicating a database in full to another master node.
      </para>
    </listitem>
    <listitem>
      <para>
        Replicating a subset of a database to another master node.
      </para>
    </listitem>
    <listitem>
      <para>
        Firing triggers for individual changes as they arrive on the
        Subscriber.
      </para>
    </listitem>
    <listitem>
      <para>
        Gathering data from multiple databases into a single one (for example
        for analytical purposes).
      </para>
    </listitem>
  </itemizedlist>

<sect1 id="publication">
  <title>Publication</title>
  <para>
    A Publication object can be defined on any master node, owned by one
    user. A Publication is a set of changes generated from a group of
    tables, and might also be described as a Change Set or Replication Set.
    Each Publication exists in only one database.
  </para>
  <para>
    Publications are different from the table schema and do not affect
    how the table is accessed. Each table can be added to multiple
    Publications if needed.  Publications may include both tables
    and materialized views. Objects must be added explicitly, except
    when a Publication is created for "ALL TABLES". There is no
    default name for a Publication which specifies all tables.
  </para>
  <para>
    Tables added to a Publication must be accessible via SELECT
    privilege for the user owning the Publication. Usage on the
    Publication can be GRANTed to other users.
  </para>
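  <para>
    For example, the owner of a Publication might allow another role to
    subscribe to it. The GRANT spelling shown here is illustrative only,
    since this design does not yet fix the exact privilege syntax:
<programlisting>
GRANT USAGE ON PUBLICATION mypub TO reporting_user;
</programlisting>
  </para>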
  <para>
    Publications can choose to limit the changes they produce using any
    combination of INSERT, UPDATE, DELETE and TRUNCATE, similar to the
    way triggers are fired by particular event types.
  </para>
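  <para>
    As a sketch only (this design does not yet specify the option syntax,
    so the WITH clause below is illustrative), a Publication that
    replicates only INSERTs and UPDATEs might be declared as:
<programlisting>
CREATE PUBLICATION insert_update_only WITH (publish = 'insert, update');
ALTER PUBLICATION insert_update_only ADD TABLE users;
</programlisting>
  </para>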
  <para>
    When UPDATEs and DELETEs are replicated by a Publication, all tables
    added must have a unique index present on the REPLICA IDENTITY for
    the table, or the addition will be refused.
  </para>
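  <para>
    For example, a table with no Primary Key can have its replica identity
    set to an existing unique index, using the existing ALTER TABLE syntax,
    before being added to a Publication that replicates UPDATEs and DELETEs
    (the table and index names are illustrative):
<programlisting>
ALTER TABLE departments REPLICA IDENTITY USING INDEX departments_code_key;
ALTER PUBLICATION mypub ADD TABLE departments;
</programlisting>
  </para>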
  <para>
    The definition of a Publication object will be included within
    pg_dump by default when all of the objects in the Publication are
    requested as part of the dump specification.
  </para>
  <para>
    Every Publication can have zero, one or more Subscribers.
  </para>
  <para>
    Publications are created using the <xref linkend="sql-createpublication">
    command and may be later altered or dropped using corresponding commands.
  </para>
  <para>
    The individual tables can be added and removed dynamically using
    <xref linkend="sql-alterpublication">. Both the ADD TABLE and DROP
    TABLE operations are transactional so the table will start or stop
    replicating at the correct snapshot once the transaction has committed.
  </para>
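  <para>
    Because these operations are transactional, a table can be swapped in
    and out of a Publication atomically, for example (table names are
    illustrative):
<programlisting>
BEGIN;
ALTER PUBLICATION mypub DROP TABLE departments;
ALTER PUBLICATION mypub ADD TABLE departments_v2;
COMMIT;
</programlisting>
  </para>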
</sect1>
<sect1 id="subscription">
  <title>Subscription</title>
  <para>
    A Subscription is the downstream side of Logical Replication. The
    node where a Subscription is defined is referred to as the Subscriber.
    A Subscription defines the connection to another database and the set
    of Publications (one or more) to which it wants to be subscribed.
    It is possible to have a Subscription that currently has no
    Publications.
  </para>
  <para>
    The Subscriber database behaves in the same way as any other
    PostgreSQL instance and can be used as a Provider for other
    databases by defining its own Publications.
  </para>
  <para>
    A Subscriber may have multiple Subscriptions if desired. It is
    possible to define multiple Subscriptions between a single
    Provider-Subscriber pair, provided that each Publication is
    subscribed to only once from any one Subscriber.
  </para>
  <para>
    Each Subscription will receive changes via one replication slot (see
    <xref linkend="streaming-replication-slots">). Additional temporary
    replication slots may be required for the initial synchronization of
    pre-existing table data.
  </para>
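  <para>
    The slots created for Subscriptions are visible on the Provider in the
    existing pg_replication_slots view, for example:
<programlisting>
SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;
</programlisting>
  </para>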
  <para>
    Subscriptions are not dumped by pg_dump by default, but can be
    requested using the --subscriptions parameter.
  </para>
  <para>
    The Subscription is added using <xref linkend="sql-createsubscription">
    and can be stopped/resumed at any time using the
    <xref linkend="sql-altersubscription"> command, or removed using
    <xref linkend="sql-dropsubscription">.
  </para>
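  <para>
    A sketch of the Subscription lifecycle, using the command names above;
    the exact options for stopping and resuming are not yet finalized, so
    the ENABLE/DISABLE forms are illustrative only:
<programlisting>
CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
ALTER SUBSCRIPTION mysub DISABLE;   -- stop applying changes (illustrative)
ALTER SUBSCRIPTION mysub ENABLE;    -- resume (illustrative)
DROP SUBSCRIPTION mysub;
</programlisting>
  </para>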
  <para>
    When a Subscription is dropped and recreated, the synchronization
    information is lost. This means that the data has to be
    resynchronized afterwards.
  </para>
  <para>
    Changes at the Subscriber are applied as normal database changes.
    If the Subscriber permits local writes, those writes may conflict
    with incoming replicated changes, causing apply conflicts.
    In general, it is recommended that local changes be disallowed,
    treating data as read-only on the Subscriber side.
  </para>
  <para>
    Conflicts happen when a replicated change violates any specified
    constraint (with the exception of foreign keys, which are not
    checked). Currently, conflicts are not resolved automatically;
    replication is stopped with an error until the conflict is
    manually resolved.
  </para>
</sect1>
<sect1 id="logical-replication-architecture">
  <title>Architecture</title>
  <para>
    Logical replication starts by copying a snapshot of the data on
    the Provider database. Once that is done, changes on the Provider
    are sent to the Subscriber as they occur in real time. The Subscriber
    applies the data in the order in which commits were made on the
    Provider so that transactional consistency is guaranteed for the
    Publications within any single Subscription.
  </para>
  <para>
    Logical Replication is built on an architecture similar to that of
    physical streaming replication
    (see <xref linkend="streaming-replication">). It is implemented by
    WalSender and the Apply processes. The WalSender starts the logical
    decoding (described in <xref linkend="logicaldecoding">) of the WAL and
    loads the standard logical decoding plugin (pgoutput). The plugin
    transforms the changes read from WAL to the logical replication protocol
    (see <xref linkend="protocol-logical-replication">) and filters the data
    according to Publication specifications. The data are then continuously
    transferred using the streaming replication protocol to the Apply worker
    which maps them to the local tables and applies the individual changes as
    they are received in exact transactional order.
  </para>
  <para>
    The Apply process on the Subscriber database always runs with
    <varname>session_replication_role</> set to <literal>replica</>, which
    produces the normal effects on triggers and constraints.
  </para>
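  <para>
    Triggers on Subscriber tables therefore fire during apply only if they
    have been enabled for replica mode with the existing ALTER TABLE syntax
    (the trigger name is illustrative):
<programlisting>
ALTER TABLE users ENABLE REPLICA TRIGGER users_audit_trg;  -- fires when session_replication_role = replica
ALTER TABLE users ENABLE ALWAYS TRIGGER users_audit_trg;   -- fires regardless of the setting
</programlisting>
  </para>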
  <sect2 id="logical-replication-snapshot">
    <title>Initial snapshot</title>
    <para>
      The initial snapshot is taken when the replication slot for the
      Subscription is created. The existing data at that snapshot are
      then sent between the WalSender and Apply processes using the
      streaming replication protocol, in a similar way to how changes
      are sent. Once the initial data are copied, the Apply process
      enters a catch-up phase in which it replays the changes that
      happened on the Provider while the initial snapshot was being
      copied. Once replication has caught up, the Apply process switches
      to normal streaming mode and replicates transactions as they happen.
    </para>
  </sect2>
  <sect2 id="logical-replication-table-resync">
    <title>Individual table resynchronization</title>
    <para>
      A table can be resynchronized at any point during normal
      replication operation. When table resynchronization is requested,
      a parallel instance of a special kind of Apply process is started,
      which registers its own temporary replication slot and takes a new
      snapshot. It then works in the same way as the initial snapshot
      (<xref linkend="logical-replication-snapshot">), except that it
      copies the data of a single table only; once the catch-up phase is
      finished, control of the replication of that table is handed back
      to the main Apply process.
    </para>
  </sect2>
</sect1>
<sect1 id="logical-replication-monitoring">
  <title>Monitoring</title>
  <para>
    Replication from the Provider side can be monitored via the
    pg_stat_replication view.
  </para>
  <para>
    Subscription status on the Subscriber side can be monitored via the
    pg_stat_subscription view.
  </para>
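  <para>
    A minimal sketch of the corresponding queries; note that
    pg_stat_subscription is proposed here and does not exist in released
    versions:
<programlisting>
SELECT * FROM pg_stat_replication;   -- on the Provider
SELECT * FROM pg_stat_subscription;  -- on the Subscriber
</programlisting>
  </para>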
</sect1>
<sect1 id="logical-replication-security">
  <title>Security</title>
  <para>
    The replication connection is made in the same way as for physical
    streaming replication. It requires access to be specifically given
    using pg_hba.conf. The role used for replication must have the
    <literal>REPLICATION</literal> privilege granted.
    This gives a role access to both logical and physical replication.
  </para>
  <para>
    In addition, logical replication can be accessed with the
    <literal>SUBSCRIPTION</literal> privilege. This allows you to create
    roles which can pull data from Publications yet cannot request
    physical replication.
  </para>
  <para>
    To create or subscribe to a Publication the user must have the
    REPLICATION role, the SUBSCRIPTION role or be a superuser.
  </para>
  <para>
    <literal>SELECT</literal> privilege is required when the user
    adds a table to a Publication.
    To subscribe to a Publication, the user must be its owner or have
    been granted USAGE privilege on the Publication.
  </para>
  <para>
    To create a Subscription the user must have the
    REPLICATION role, the SUBSCRIPTION role or be a superuser.
    The Subscription Apply process will run in the local database
    with the privileges of the owner of the Subscription. In practice this
    means that the owner of the Subscription must have <literal>INSERT</>,
    <literal>UPDATE</>, <literal>DELETE</> and <literal>TRUNCATE</>
    privileges on the Subscriber for the tables being replicated by the
    Subscription, or be a superuser, though the latter is not recommended.
  </para>
  <para>
    In particular, note that privileges are not re-checked as each change
    record is read from the Provider, nor are they re-checked for each change
    when applied. Security is checked once at startup. Concurrent REVOKEs
    of privilege will interrupt logical replication if they have a material
    effect on the security of the change stream.
  </para>
</sect1>
<sect1 id="logical-replication-gucs">
  <title>Logical replication related configuration parameters</title>
  <para>
    Logical Replication requires several configuration options to be
    set.
  </para>
  <para>
    On the Provider side, <varname>wal_level</> must be set to
    <literal>logical</> and <varname>max_replication_slots</> must be set
    to at least the number of Subscriptions expected to connect, with some
    reserve for table synchronization as well. <varname>max_wal_senders</>
    should be set to at least the same as <varname>max_replication_slots</>,
    plus the number of physical replicas that are connected at the same time.
  </para>
  <para>
    The Subscriber also requires <varname>max_replication_slots</> to
    be set. In this case it should be set to at least the number of
    Subscriptions that will be added to the Subscriber.
    <varname>max_logical_replication_workers</> has to be set to at least
    the number of Subscriptions, again with some reserve for table
    synchronization. Additionally, <varname>max_worker_processes</> may
    need to be adjusted to accommodate the replication workers, at least
    (<varname>max_logical_replication_workers</> + <literal>1</>). Note
    that some extensions and parallel queries also take worker slots
    from <varname>max_worker_processes</>.
  </para>
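  <para>
    For example, a Provider expecting two Subscriptions, up to two
    concurrent table synchronizations and one physical standby, and a
    Subscriber holding those two Subscriptions, might be sized as follows
    (the numbers are illustrative only):
<programlisting>
# Provider
wal_level = logical
max_replication_slots = 4             # 2 subscriptions + 2 temporary sync slots
max_wal_senders = 5                   # 4 slots + 1 physical standby

# Subscriber
max_replication_slots = 2             # one per subscription
max_logical_replication_workers = 4   # 2 apply workers + reserve for table sync
max_worker_processes = 8              # leave headroom for extensions and parallel query
</programlisting>
  </para>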
</sect1>
<sect1 id="logical-replication-quick-setup">
  <title>Quick setup</title>
  <para>
    First set the configuration options in the postgresql.conf:
<programlisting>
wal_level = logical
max_worker_processes = 10 # one per subscription + one per instance needed on subscriber
max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber
max_replication_slots = 10 # one per subscription needed both provider and subscriber
max_wal_senders = 10 # one per subscription needed on provider
</programlisting>
  </para>
  <para>
    The pg_hba.conf needs to be adjusted to allow replication (the
    values here depend on your actual network configuration and the
    user you want to use for connecting):
<programlisting>
host    replication     repuser     0.0.0.0/0       md5
</programlisting>
  </para>
  <para>
    Then on Provider database:
<programlisting>
CREATE PUBLICATION mypub;
ALTER PUBLICATION mypub ADD TABLE users, departments;
</programlisting>
  </para>
  <para>
    And on Subscriber database:
<programlisting>
CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
</programlisting>
  </para>
  <para>
    The above will start the replication process, which synchronizes the
    initial contents of the <literal>users</literal> and
    <literal>departments</literal> tables and then starts replicating
    incremental changes to those tables.
  </para>
</sect1>
</chapter>


--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Design for In-Core Logical Replication

From
"Joshua D. Drake"
Date:
On 07/20/2016 01:08 AM, Simon Riggs wrote:

>
> <!-- doc/src/sgml/logical-replication.sgml -->
>
> <chapter id="logical-replication">
>
>    <title>Logical Replication</title>
>    <para>
>      Logical Replication is a method of replicating data objects and their
>      changes, based upon their Primary Keys (or Replication Identity). We

Do we want a limitation based on Primary Key, or would it be possible to 
use just UNIQUE or is that covered under Replication Identity?

>    <para>
>      Logical Replication uses a Publish and Subscribe model with one or
>      more Subscribers subscribing to one or more Publications on a
>      Provider node. Subscribers pull data from the Publications they
>      subscribe to and may subsequently re-publish data to allow
>      cascading replication or more complex configurations.

Is that somehow different than Origin/Subscriber or Master/Slave? If 
not, why are we using yet more terms?

> <sect1 id="publication">
>    <title>Publication</title>
>    <para>
>      A Publication object can be defined on any master node, owned by one
>      user. A Publication is a set of changes generated from a group of
>      tables, and might also be described as a Change Set or Replication Set.
>      Each Publication exists in only one database.
>       Then on Provider database:
> <programlisting>
> CREATE PUBLICATION mypub;
> ALTER PUBLICATION mypub ADD TABLE users, departments;
> </programlisting>
>    </para>

Outside of my previous comments on reusing terminology that is known to 
our community, I like this. Basically a user creates a pool that is 
replicating, throws various ducks and small children into the pool and 
then replicates. Nice.

Sincerely,

JD


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: Design for In-Core Logical Replication

From
Rod Taylor
Date:


On Wed, Jul 20, 2016 at 4:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
 

  <para>
    And on Subscriber database:
<programlisting>
CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
</programlisting>
  </para>
  <para>
    The above will start the replication process which synchronizes the
    initial table contents of <literal>users</literal> and
    <literal>departments</literal> tables and then starts replicating
    incremental changes to those tables.
  </para>
</sect1>
</chapter>

I think it's important for communication channels to be defined separately from the subscriptions.

If I have nodes 1/2 + 3/4 which operate in pairs, I don't really want to have to have a script reconfigure replication on 3/4 every-time we do maintenance on 1 or 2.

3/4 need to know they subscribe to mypub and that they have connections to machine 1 and machine 2. The replication system should be able to figure out which (of 1/2) has the most recently available data.


So, I'd rather have:

CREATE CONNECTION machine1;
CREATE CONNECTION machine2;
CREATE SUBSCRIPTION TO PUBLICATION mypub;

Notice I explicitly did not tell it how to get the publication but if we did have a preference the DNS weighting model might be appropriate.

I'm not certain the subscription needs to be named. IMO, a publication should have the same properties on all nodes (so any node may become the primary source). If a subscriber needs different behaviour for a publication, it should be created as a different publication.

Documenting that ThisPub is different from ThatPub is easier than documenting that ThisPub on node 1/2/4 is different from ThisPub on node 7/8, except Node 7 is temporarily on Node 4 too (database X instead of database Y) due to that power problem.


Clearly this is advanced. An initial implementation may only allow mypub from a single connection.


I also suspect multiple publications will be normal even if only 2 nodes. Old slow moving data almost always got different treatment than fast-moving data; even if only defining which set needs to hit the other node first and which set can trickle through later.
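
Using the syntax sketched in the proposal, that might simply be two publications (the table names are examples only):

CREATE PUBLICATION fast_pub;
ALTER PUBLICATION fast_pub ADD TABLE orders;

CREATE PUBLICATION slow_pub;
ALTER PUBLICATION slow_pub ADD TABLE order_archive;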

regards,

Rod Taylor

Re: Design for In-Core Logical Replication

From
Simon Riggs
Date:
On 20 July 2016 at 16:39, Joshua D. Drake <jd@commandprompt.com> wrote:
 
     Logical Replication is a method of replicating data objects and their
     changes, based upon their Primary Keys (or Replication Identity). We

Do we want a limitation based on Primary Key, or would it be possible to use just UNIQUE or is that covered under Replication Identity?

That is covered by replication identity.
 
   <para>
     Logical Replication uses a Publish and Subscribe model with one or
     more Subscribers subscribing to one or more Publications on a
     Provider node. Subscribers pull data from the Publications they
     subscribe to and may subsequently re-publish data to allow
     cascading replication or more complex configurations.

Is that somehow different than Origin/Subscriber or Master/Slave? If not, why are we using yet more terms?

Thanks for asking, an important question that we have a chance to get right before we go too far down the road of implementation.

Issue: We need a noun for CREATE "SOMETHING" (or pg_create_$NOUN). So what noun to use? The SQL standard gives us no guidance here.

I'll explain my thinking, so we can discuss the terms I've recommended, which can be summarized as:
A Provider node has one or more Databases, each of which can publish its data in zero, one or more PUBLICATIONs. A Subscribing node can receive data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER may bring together data from one or more PUBLICATIONs

Here's why...

Master/Slave is not appropriate, since both sending and receiving nodes are Masters.

Origin/Subscriber is used by Slony. The term "Replication Origin" is already used in PG9.5 for something related, but not identical.
Provider/Subscriber is used by Londiste.
Bucardo seems to use Master/Slave according to FAQ.

The Noun we are discussing is something that a single Database can have >1 of, so those terms aren't quite appropriate.

pglogical uses Provider/Subscriber and Replication Sets, so I started with the thought that we might want CREATE REPLICATION SET or pg_create_replication_set(). After some time considering this, ISTM that the term "replication set" may not be that useful since we foresee a future where data is actually filtered and transformed and the feature set extends well beyond what we have with Slony, so I began looking for a term that was general and obvious (POLA).

After some thought, I realised that we are describing this as "Publish & Subscribe", so it makes a lot of sense to just use the terms Publication & Subscription. Those phrases are commonly used by SQLServer, Sybase, Oracle, Redis, RabbitMQ etc which is a pretty big set.
It's also a commonly used Enterprise Integration Design pattern https://en.wikipedia.org/wiki/Publish–subscribe_pattern
I note especially that Publish/Subscribe does not imply any particular topology (a mistake I made earlier when I called this stuff BDR, which confused everybody when we tried to talk about a subset of that functionality called UDR).

So that brings us to...
A Provider node has one or more Databases, each of which can publish its data in zero, one or more PUBLICATIONs. A Subscribing node can receive data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER may bring together data from one or more PUBLICATIONs.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Design for In-Core Logical Replication

From
Simon Riggs
Date:
On 20 July 2016 at 17:52, Rod Taylor <rod.taylor@gmail.com> wrote:
 
I think it's important for communication channels to be defined separately from the subscriptions.

I agree and believe it will be that way.

Craig is working on allowing Replication Slots to failover between nodes, to provide exactly that requested behaviour.
 
So, I'd rather have:

CREATE CONNECTION machine1;
CREATE CONNECTION machine2;

I think those map to replication slots. (This discussion might get a bit confusing if we try to guess exactly what each others terms mean, so I'll go no further than "I think").
 
CREATE SUBSCRIPTION TO PUBLICATION mypub;

Yep
 
I'm not certain the subscription needs to be named. IMO, a publication should have the same properties on all nodes (so any node may become the primary source). If a subscriber needs different behaviour for a publication, it should be created as a different publication.

Understood, it's mostly to allow it to be dropped or altered and monitored. It's kind of like an index: it needs a name, we just don't much care what it is.
 
Documenting that ThisPub is different from ThatPub is easier than documenting that ThisPub on node 1/2/4 is different from ThisPub on node 7/8, except Node 7 is temporarily on Node 4 too (database X instead of database Y) due to that power problem.

Which is why pg_dump support is important to allow us to sync up the definitions.
 
Clearly this is advanced. An initial implementation may only allow mypub from a single connection.

Good input and clearly explained, thanks. If any of the above changes, these requirements will remain noted.
 
I also suspect multiple publications will be normal even if only 2 nodes. Old slow moving data almost always got different treatment than fast-moving data; even if only defining which set needs to hit the other node first and which set can trickle through later.

Agreed 


--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Design for In-Core Logical Replication

From
Petr Jelinek
Date:
On 20/07/16 19:07, Simon Riggs wrote:
> On 20 July 2016 at 16:39, Joshua D. Drake <jd@commandprompt.com
> <mailto:jd@commandprompt.com>> wrote:
>             <para>
>               Logical Replication uses a Publish and Subscribe model
>         with one or
>               more Subscribers subscribing to one or more Publications on a
>               Provider node. Subscribers pull data from the Publications
>         they
>               subscribe to and may subsequently re-publish data to allow
>               cascading replication or more complex configurations.
>
>
>     Is that somehow different than Origin/Subscriber or Master/Slave? If
>     not, why are we using yet more terms?
>
>
> Thanks for asking, an important question that we have a chance to get
> right before we go too far down the road of implementation.
>
> I'll explain my thinking, so we can discuss the terms I've recommended,
> which can be summarized as:
> A Provider node has one or more Databases, each of which can publish its
> data in zero, one or more PUBLICATIONs. A Subscribing node can receive
> data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER
> may bring together data from one or more PUBLICATIONs
>
> Here's why...
>

Just to add to what Simon wrote. There is one more reason for not using
the term origin for this: the origin of the data does not necessarily have
to be on the provider database once there is cascading, so it does not
really map all that well.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Design for In-Core Logical Replication

From
Craig Ringer
Date:
On 21 July 2016 at 01:20, Simon Riggs <simon@2ndquadrant.com> wrote:
On 20 July 2016 at 17:52, Rod Taylor <rod.taylor@gmail.com> wrote:
 
I think it's important for communication channels to be defined separately from the subscriptions.

I agree and believe it will be that way.

Craig is working on allowing Replication Slots to failover between nodes, to provide exactly that requested behaviour.
 

First, I'd like to emphasise that logical replication has been stalled for ages now because we can no longer make forward progress on core features needed until we have in-core logical replication (they're dismissed as irrelevant, no in core users, etc) - but we have also had difficulty getting logical replication into core. To break this impasse we really need logical replication in core and need to focus on getting the minimum viable feature in place, not trying to make it do everything all at once. Point-to-point replication with no forwarding should be just fine for the first release. Let's not bog this down in extra "must have" features that aren't actually crucial.

That said:

I had a patch in for 9.6 to provide the foundations for logical replication to follow physical failover, but it got pulled at the last minute. It'll be submitted for 10.0 along with some other enhancements to make it usable without hacky extensions, most notably support for using a physical replication slot and hot standby feedback to pin a master's catalog_xmin where it's needed by slots on a physical replica.

That's for when we're combining physical and logical replication though, e.g. "node A" is a master/standby pair, and "node B" is also a master/standby pair.

For non-star logical topologies, which is what I think you might've been referring to, it's necessary to have:

- Node identity
- Which nodes we want to receive data from
- How we connect to each node

all of which are separate things. Who's out there, what we want from them, and how to get it.

pglogical doesn't really separate the latter two much at this point. Subscriptions identify both the node to connect to and the data we want to receive from a node; there's no selective data forwarding from one node to another. Though there's room for that in pglogical's hooks/filters by using filtering by replication origin, it just doesn't do it yet.

It sounds like that's what you're getting at. Wanting to be able to say "node A wants to get data from node B and node C" separately to "node A connects to node B to receive data", with the replication system somehow working out that that means data written from C to B should be forwarded to A.

Right?

If so, it's not always easy to figure that out. If you create connections to both B and C, we then have to automagically work out that we should stop forwarding data from C over our connection to B.

The plan with pglogical has been to allow connections to specify forwarding options, so the connection explicitly says what nodes it wants to get data from. It's users' job to ensure that they don't declare connections that overlap. This is simpler to implement, but harder to admin.

One challenge with either approach is ensuring a consistent switchover. If you have a single connection A=>B receiving data from [B,C], and then switch to two connections A=>B and A=>C with neither forwarding, you must ensure that the switchover occurs in such a way that no data is replicated twice or skipped. That's made easier by the fact that we have replication origins and we can actually safely receive from both at the same time then discard from one of them, even use upstream filtering to avoid sending it over the wire twice. But it does take care and caution.

Note that none of this is actually for logical _failover_, where we lose a node. For that we need some extra help in the form of placeholder slots maintained on other peers. This can be done at the application / replication system level without the need for new core features, but it might not be something we can support in the first iteration.

I'm not sure how Petr's current design for in-core replication addresses this, if it does, or whether it's presently focused only on point-to-point replication like pglogical. As far as I'm concerned so long as it does direct point-to-point replication with no forwarding that's good enough for a first cut feature, so long as the UI, catalog and schema design leaves room for adding more later.
 
  
I also suspect multiple publications will be normal even if only 2 nodes. Old slow moving data almost always got different treatment than fast-moving data; even if only defining which set needs to hit the other node first and which set can trickle through later.

Agreed 


Yes, especially since we can currently only stream transactions one by one in commit order after commit.

Even once we have interleaved xact streaming, though, there will still be plenty of times we want to receive different sets of data from the same node at different priorities/rates. Small data we want to receive quickly, vs big data we receive when we get the chance to catch up. Of course it's necessary to define non-overlapping replication sets for this.

That's something we can already do in pglogical. I'm not sure if Petr is targeting replication set support as part of the first release of the in-core version of logical replication; they're necessary to do things like this.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Design for In-Core Logical Replication

From
"Joshua D. Drake"
Date:
On 07/20/2016 06:35 PM, Craig Ringer wrote:

> First, I'd like to emphasise that logical replication has been stalled
> for ages now because we can no longer make forward progress on core
> features needed until we have in-core logical replication (they're
> dismissed as irrelevant, no in core users, etc) - but we have also had
> difficulty getting logical replication into core. To break this impasse
> we really need logical replication in core and need to focus on getting
> the minimum viable feature in place, not trying to make it do everything
> all at once. Point-to-point replication with no forwarding should be
> just fine for the first release. Lets not bog this in extra "must have"
> features that aren't actually crucial.

I don't think any person who actually works on postgresql with customers 
and clearly deals with "competition" can state with any sanity that we 
don't need logical replication in core.

JD


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: Design for In-Core Logical Replication

From
Craig Ringer
Date:
On 21 July 2016 at 11:05, Joshua D. Drake <jd@commandprompt.com> wrote:
On 07/20/2016 06:35 PM, Craig Ringer wrote:

First, I'd like to emphasise that logical replication has been stalled
for ages now because we can no longer make forward progress on core
features needed until we have in-core logical replication (they're
dismissed as irrelevant, no in core users, etc) - but we have also had
difficulty getting logical replication into core. To break this impasse
we really need logical replication in core and need to focus on getting
the minimum viable feature in place, not trying to make it do everything
all at once. Point-to-point replication with no forwarding should be
just fine for the first release. Lets not bog this in extra "must have"
features that aren't actually crucial.

I don't think any person who actually works on postgresql with customers and clearly deals with "competition" can state with any sanity that we don't need logical replication in core.

No, and while people used to, we're past that now.

However, infrastructure improvements to make out-of-tree logical replication that we can get into user hands *now* rather than two+ years from now have been getting knocked back because there's no in-tree user, and tools like pglogical dismissed as irrelevant. Once we have logical replication in core hopefully we can start making infrastructure progress again as well.
 
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Design for In-Core Logical Replication

From
Dmitriy Sarafannikov
Date:
><programlisting>
>CREATE PUBLICATION mypub;
>ALTER PUBLICATION mypub ADD TABLE users, departments;

>CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;

>    The above will start the replication process which synchronizes the
>    initial table contents of <literal>users</literal> and
>    <literal>departments</literal> tables and then starts replicating
>    incremental changes to those tables.

Hi, hackers.

it is very good to have logical replication in core. I also have a proposal. What if we had the ability to execute
custom trigger functions on events on a particular table? It would also be useful to be able to ignore some tables
in a publication, or to replicate with some WHERE condition. For example, we want to replicate the table "users" as is
(maybe with some WHERE conditions), but on events on the table "departments" we want to execute the trigger function
departments_event_handler(). And we don't want to handle events on a third table which was added to the publication.

Something like this:

CREATE PUBLICATION mypub;
ALTER PUBLICATION mypub ADD TABLE users, departments, unnecessary_tbl;

CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
ALTER SUBSCRIPTION mysub ADD TABLE users REPLICATE TO LOCAL TABLE my_users INSERT WHERE new.id_user > 1000 UPDATE WHERE
old.id_user < 1000; -- we don't want to replicate deletes, for example
ALTER SUBSCRIPTION mysub ADD TABLE departments ON INSERT WHEN (new.id_user > 1000) EXECUTE PROCEDURE
departments_event_handler(); -- just like a trigger


Regards,
Dmitriy Sarafannikov

Re: Design for In-Core Logical Replication

From
Simon Riggs
Date:
On 22 July 2016 at 09:54, Dmitriy Sarafannikov <d.sarafannikov@bk.ru> wrote:
>
>><programlisting>
>>CREATE PUBLICATION mypub;
>>ALTER PUBLICATION mypub ADD TABLE users, departments;
>
>>CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
>
>>    The above will start the replication process which synchronizes the
>>    initial table contents of <literal>users</literal> and
>>    <literal>departments</literal> tables and then starts replicating
>>    incremental changes to those tables.
>
> Hi, hackers.
>
> it is very good to have logical replication in core. Also i have some proposal.

Thanks for your input.

> What if we would have the ability to execute custom trigger functions on events on a particular table? Also it would be
> useful to have the ability to ignore some tables in a publication or replicate with some WHERE condition.
 

Both of those requirements are eventual goals for this. The initial
commits of Logical Replication for 10.0 won't include those features,
keeping the scope tight so we make sure we get this into 10.0, but we
want to design it with those things in mind and we may be lucky enough
to get it in the first release.

> For example, we want to replicate the table "users" as is (maybe with some WHERE conditions), but on events on the table
> "departments" we want to execute the trigger function departments_event_handler().
 

> ALTER SUBSCRIPTION mysub ADD TABLE users REPLICATE TO LOCAL TABLE my_users INSERT WHERE new.id_user > 1000 UPDATE
> WHERE old.id_user < 1000; -- we don't want to replicate deletes, for example.
>
> ALTER SUBSCRIPTION mysub ADD TABLE departments ON INSERT WHEN (new.id_user > 1000) EXECUTE PROCEDURE
> departments_event_handler(); -- just like a trigger
 

Filtering data at both the sender and receiver has been requested, so
I guess it makes sense to have a WHERE clause on both the Publication
and the Subscription, but I will think more on that.

Transforming data at the receiver/Apply side we had envisaged would be
done using standard triggers. So we had envisaged you would add the
trigger onto the table using the existing trigger syntax, so that when
an INSERT was applied the trigger would execute, using the standard
mechanisms.
So I don't think we need to add trigger style syntax onto the
Subscription, but it is an option to consider, I guess.
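
For illustration only (the names are examples), that would be a normal trigger on the Subscriber table, enabled for replica mode since the Apply process runs with session_replication_role = replica:

CREATE FUNCTION departments_event_handler() RETURNS trigger AS $$
BEGIN
  -- custom handling of the applied row goes here
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER departments_apply_trg
  BEFORE INSERT ON departments
  FOR EACH ROW EXECUTE PROCEDURE departments_event_handler();

ALTER TABLE departments ENABLE REPLICA TRIGGER departments_apply_trg;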

(Note to self: make sure we get the dependencies right here so when we
drop objects they are automatically removed from Subscriptions, whilst
allowing Subscriptions to be dropped without causing tables to be
dropped (and Publications also)).

> And we don't want to handle events on a third table which was added to the publication.

Not sure what you mean for that part. Could you explain a little more?

(I'm on leave for some days, so if I don't respond quickly I will respond
eventually)

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Design for In-Core Logical Replication

From
Craig Ringer
Date:


On 22 July 2016 at 22:55, Simon Riggs <simon@2ndquadrant.com> wrote:
 

Filtering data at both the sender and receiver has been requested, so
I guess it makes sense to have a WHERE clause on both the Publication
and the Subscription, but I will think more on that.

Yeah, output plugin level filtering predicates are certainly a desirable feature for down the track.

One important factor to consider there is that we're very restricted in what we can safely access. Currently the historical snapshot infrastructure set up by logical decoding doesn't stop you from trying to access tables that you can't safely access; you might just get surprising/wrong results or failures. If we're going to allow arbitrary user defined predicates we'll likely need to change that so we explicitly ERROR when an attempt to heap_open a table that isn't accessible during logical decoding is made. Probably not hard, likely a good idea anyway.

Such predicates should generally be simple, though with less strict requirements than CHECK constraints. We can probably do subqueries and I don't think functions have to be immutable, though obviously anything that tries to allocate an xid will fail.

Transforming data at the receiver/Apply side we had envisaged would be
done using standard triggers. So we had envisaged you would add the
trigger onto the table using the existing trigger syntax, so that when
an INSERT was applied the trigger would execute, using the standard
mechanisms.

Yeah. In the medium term at least.

I imagine eventually we'll want more powerful transforms that don't require the overhead of trigger function calls, but that's a looooong way down the road.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Design for In-Core Logical Replication

From
Jim Nasby
Date:
ISTM pg_dump should alter publication/subscription when doing partial 
dumps, similar to how it handles foreign keys.

Subscribed tables should be read-only by default. If you need something 
more advanced than that adding an extra option to the subscription (or 
wherever else) is going to be the least of your worries. If we don't 
want to get too wrapped up in config/syntax I think there should at 
least be a read-only statement level trigger installed by default that 
users can then drop or disable if desired.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Design for In-Core Logical Replication

From
Jim Nasby
Date:
On 7/20/16 8:35 PM, Craig Ringer wrote:
> I'm not sure how Petr's current design for in-core replication addresses
> this, if it does, or whether it's presently focused only on
> point-to-point replication like pglogical. As far as I'm concerned so
> long as it does direct point-to-point replication with no forwarding
> that's good enough for a first cut feature, so long as the UI, catalog
> and schema design leaves room for adding more later.

That's the part I'm pretty worried about. I'd feel a lot better if there 
was at least a wiki page of future topology/communication features and a 
sketch of a design.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Design for In-Core Logical Replication

From
Hannu Krosing
Date:
On 07/20/2016 10:08 AM, Simon Riggs wrote:
> <sect1 id="logical-replication-monitoring">
>   <title>Monitoring</title>
>   <para>
>     pg_stat_replication
>   </para>
>   <para>
>     pg_stat_subscription
>   </para>
> </sect1>

and probably also `pg_stat_publication` or some other way to see what
tables are currently in a PUBLICATION, who has subscribed, etc.

> <programlisting>
> CREATE PUBLICATION mypub;
> ALTER PUBLICATION mypub ADD TABLE users, departments;
> </programlisting>
Would a subscription just be a logical grouping, or would it be something
stronger, like atomic subscriptions and/or a dedicated replication slot?

Can you subscribe to multiple publications through single SUBSCRIPTION ?

What is supposed to happen if table A is in two publications S1 and S2,
and you subscribe to both? Will you get table A only once (both initial
copy and events)?

Would a subscription of "mypub" pop up on the subscriber side atomically,
or will subscribed tables appear one by one as they become ready (initial
copy + catch-up event replay completed)?

I recall that one of the drivers of developing pgq/skytools to replace
Slony was the
fact that Slony's "replication group" design made it very easy to
blunder subscription
changes in more complex topologies which manifested in deadlocks.

PGQ's table-by-table subscription avoided this entirely, at the cost of
non-atomic subscribed table appearance.

Of course once subscribed, everything was transaction-consistent again.

> <programlisting>
> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
> user=repuser</quote> PUBLICATION mypub;
> </programlisting>
For the pgq-like version, which considers a PUBLICATION just as a list of
tables to subscribe to, I would add

CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
user=repuser' PUBLICATION mypub, mypub1;

ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub1;

ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2;





-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: Design for In-Core Logical Replication

From
Petr Jelinek
Date:
On 26/07/16 00:05, Hannu Krosing wrote:
>> <programlisting>
>> CREATE PUBLICATION mypub;
>> ALTER PUBLICATION mypub ADD TABLE users, departments;
>> </programlisting>
> Would a subscription just be a logical grouping or would it be something
> stronger
> like meaning atomic subscriptions and/or a dedicated replication slot ?
>

Not sure what you mean by atomic subscription, but subscription creation 
adds a replication slot to the provider node. Other than that, the 
subscription lives on the downstream node only.

> Can you subscribe to multiple publications through single SUBSCRIPTION ?
>

Yes.

> What is supposed to happen if table A is in two subscriptions S1 and S2,
> and you
> subscribe to both? Will you get table a only once (both initial copy and
> events)?

Yes, only once. The replication works with tables; a publication is really 
just grouping/filtering, and what you get is the union of the tables in 
the publications.

>
> Would a subscription of "mypub" pop up on subscriber side atomically, or
> will subscribed
> tables appear one-by one when they are ready (initial copy + catchup
> event replay completed) ?
>

Yes that's my plan as that makes it easier to parallelize and recover 
from crashes (also makes this faster as tables that are already done 
don't need to be copied again) during the initialization. Also makes it 
easier to reuse the table initialization code for adding new tables at 
later time.

>> <programlisting>
>> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
>> user=repuser</quote> PUBLICATION mypub;
>> </programlisting>
> For the pgq-like version which consider a PUBLICATION just as list of
> tables to subscribe, I would add
>
> CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
> user=repuser' PUBLICATION mypub, mypub1;
>

Yes that works as well.

> ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub1;
>
> ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2;
>

This does not yet, but I agree we should have it.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Design for In-Core Logical Replication

From
Robert Haas
Date:
On Wed, Jul 20, 2016 at 4:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> In this post, Petr and I present a joint view on a design for how this
> should work in-core, based upon our implementation experiences with physical
> replication, pglogical and various comments so far.
>
> Note that this has substantial user-visible differences from pglogical,
> though much of the underlying architecture is reused.
>
> I should stress that not all of the aspects are implemented yet. The post
> here today is a combination of all of our attempts to bring architecture,
> usability and security into one place, including a coherent way of
> describing the features and how they work.
>
> Your comments and questions are sought now as we begin the main development
> effort to get this into PostgreSQL 10.0

Thanks for publishing this.

One minor comment is that this document makes extensive use of Terms
With Initial Capitals, which seems stylistically odd, although I'm not
precisely sure what would be better.

I would have expected that there would be a concept of a REPLICATION
SET, defining which tables are to be replicated; here, that seems to
be the Publication.  That may be fine, but I wonder if there is any value
in separating those things.  It's clear, for example, that a
replication set can be dumped: which tables are members of which
replication sets is durable metadata.  It's less clear that a
publication can be dumped; that might include things which are not
durable metadata, such as associations with slots.

It's generally not really clear to me based on reading this exactly
what information is encapsulated in a Publication or a Subscription,
which makes it hard to evaluate design decisions like this one:

>   <para>
>     The definition of a Publication object will be included within
>     pg_dump by default when all of the objects in the Publication are
>     requested as part of the dump specification.
>   </para>
>   <para>
>     Subscriptions are not dumped by pg_dump by default, but can be
>     requested using --subscriptions parameter.
>   </para>

I think that to really understand exactly what you and Petr have in
mind, we'd need a description of where publication and subscription
data is stored within the server, and exactly what gets stored.
Perhaps that will come in a later email.  I'm not bashing the design,
exactly, I just can't quite see how all of the pieces fit together
yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Design for In-Core Logical Replication

From
Alvaro Herrera
Date:
Robert Haas wrote:

> One minor comment is that this document makes extensive use of Terms
> With Initial Capitals, which seems stylistically odd, although I'm not
> precisely sure what would be better.

We use <firstterm>publication</> on the first use only, which is turned
into italics.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Design for In-Core Logical Replication

From
Simon Riggs
Date:
On 29 July 2016 at 16:53, Robert Haas <robertmhaas@gmail.com> wrote:

> I think that to really understand exactly what you and Petr have in
> mind, we'd need a description of where publication and subscription
> data is stored within the server, and exactly what gets stored.
> Perhaps that will come in a later email.  I'm not bashing the design,
> exactly, I just can't quite see how all of the pieces fit together
> yet.

Sure no problem.

It's clear there are about 12 ways of putting this together and each
way has various terms needed.

We need input from many people to get this right, since this is much
more about UI than system architecture.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services