Thread: Design for In-Core Logical Replication
At PgCon we discussed that Petr Jelinek would be working on the code for an in-core logical replication implementation, while I would work on the user interface/security models. Petr has been actively working on the code and will post a patch in a few weeks, as discussed and agreed. Craig Ringer is also actively coding the necessary aspects. None of those things are discussed further here at this time.
In this post, Petr and I present a joint view on a design for how this should work in-core, based upon our implementation experiences with physical replication, pglogical and various comments so far.
Note that this has substantial user-visible differences from pglogical, though much of the underlying architecture is reused.
I should stress that not all of these aspects are implemented yet. Today's post combines all of our attempts to bring architecture, usability and security into one place, with a coherent way of describing the features and how they work.
Your comments and questions are sought now as we begin the main development effort to get this into PostgreSQL 10.0.
<!-- doc/src/sgml/logical-replication.sgml -->
<chapter id="logical-replication">
<title>Logical Replication</title>
<para>
Logical Replication is a method of replicating data objects and their
changes, based upon their Primary Keys (or Replication Identity). We
use the term Logical in contrast to Physical replication which
uses exact block addresses and byte-by-byte replication.
PostgreSQL supports both mechanisms concurrently, see
<xref linkend="high-availability">. Logical Replication allows
fine-grained control over both data replication and security.
</para>
<para>
Logical Replication uses a Publish and Subscribe model with one or
more Subscribers subscribing to one or more Publications on a
Provider node. Subscribers pull data from the Publications they
subscribe to and may subsequently re-publish data to allow
cascading replication or more complex configurations.
</para>
<para>
Data for committed transactions is streamed in real-time to each
Subscriber. Logical replication might also be described as Change
Data Capture (CDC) or Transactional Replication.
</para>
<para>
The typical use-cases for logical replication are:
</para>
<itemizedlist>
<listitem>
<para>
Replicating between different major versions of PostgreSQL.
</para>
</listitem>
<listitem>
<para>
Replicating a database in full to another master node.
</para>
</listitem>
<listitem>
<para>
Replicating a subset of a database to another master node.
</para>
</listitem>
<listitem>
<para>
Firing triggers for individual changes as they arrive on the
Subscriber.
</para>
</listitem>
<listitem>
<para>
Gathering data from multiple databases into a single one (for example
for analytical purposes).
</para>
</listitem>
</itemizedlist>
<sect1 id="publication">
<title>Publication</title>
<para>
A Publication object can be defined on any master node, owned by one
user. A Publication is a set of changes generated from a group of
tables, and might also be described as a Change Set or Replication Set.
Each Publication exists in only one database.
</para>
<para>
Publications are distinct from the table schema and do not affect
how the table is accessed. Each table can be added to multiple
Publications if needed. Publications may include both tables
and materialized views. Objects must be added explicitly, except
when a Publication is created for "ALL TABLES". There is no
default name for a Publication which specifies all tables.
</para>
<para>
Tables added to a Publication must be accessible via SELECT
privilege for the user owning the Publication. Usage on the
Publication can be GRANTed to other users.
</para>
<para>
Publications can choose to limit the changes they publish using any
combination of INSERT, UPDATE, DELETE and TRUNCATE, in a similar
way to how triggers are fired by particular event types.
</para>
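<para>
For example, a Publication limited to insertions and removals only
might be declared like this (illustrative syntax only; the final
grammar is still under discussion):
<programlisting>
CREATE PUBLICATION insdel_only WITH (publish = 'insert, delete');
</programlisting>
</para>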
<para>
When UPDATEs and DELETEs are replicated by a Publication, every table
added must have a unique index on its REPLICA IDENTITY,
or the addition will be refused.
</para>
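<para>
For example, to use an existing unique index as the REPLICA IDENTITY
for a table before adding it to a Publication that replicates UPDATEs
and DELETEs (the table and index names here are illustrative):
<programlisting>
ALTER TABLE users REPLICA IDENTITY USING INDEX users_email_idx;
</programlisting>
</para>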
<para>
The definition of a Publication object will be included within
pg_dump by default when all of the objects in the Publication are
requested as part of the dump specification.
</para>
<para>
Every Publication can have zero, one or more Subscribers.
</para>
<para>
Publications are created using the <xref linkend="sql-createpublication">
command and may be later altered or dropped using corresponding commands.
</para>
<para>
The individual tables can be added and removed dynamically using
<xref linkend="sql-alterpublication">. Both the ADD TABLE and DROP
TABLE operations are transactional so the table will start or stop
replicating at the correct snapshot once the transaction has committed.
</para>
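<para>
For example, because the operations are transactional, a table swap
can be made atomic by performing both operations in one transaction
(table names are illustrative):
<programlisting>
BEGIN;
ALTER PUBLICATION mypub DROP TABLE departments;
ALTER PUBLICATION mypub ADD TABLE departments_v2;
COMMIT;
</programlisting>
</para>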
</sect1>
<sect1 id="subscription">
<title>Subscription</title>
<para>
A Subscription is the downstream side of Logical Replication. The
node where a Subscription is defined is referred to as the Subscriber.
A Subscription defines the connection to another database and the set
of Publications (one or more) to which it wants to be subscribed.
It is possible to have a Subscription that currently has no
Publications.
</para>
<para>
The Subscriber database behaves in the same way as any other
PostgreSQL instance and can be used as a Provider for other
databases by defining its own Publications.
</para>
<para>
A Subscriber may have multiple Subscriptions if desired. It is
possible to define multiple Subscriptions between a single
Provider-Subscriber pair, provided that each Publication is only
subscribed to once.
</para>
<para>
Each Subscription will receive changes via one replication slot (see
<xref linkend="streaming-replication-slots">). Additional temporary
replication slots may be required for the initial data synchronization
of pre-existing table data.
</para>
<para>
Subscriptions are not dumped by pg_dump by default, but can be
requested using the --subscriptions option.
</para>
<para>
The Subscription is added using <xref linkend="sql-createsubscription">
and can be stopped/resumed at any time using
<xref linkend="sql-altersubscription"> command or removed using
<xref linkend="sql-dropsubscription">.
</para>
<para>
When a subscription is dropped and recreated, the synchronization
information is lost. This means that the data has to be
resynchronized afterwards.
</para>
<para>
Changes at the Subscriber are applied as normal database changes.
If the Subscriber also allows local writes, the incoming changes can
conflict with changes made locally, causing apply conflicts.
In general, it is recommended that local changes be disallowed,
treating data as read-only on the Subscriber side.
</para>
<para>
Conflicts happen when a replicated change would violate a
specified constraint (with the exception of foreign keys, which are
not checked). Currently conflicts are not resolved automatically;
replication is stopped with an error until the conflict is
manually resolved.
</para>
</sect1>
<sect1 id="logical-replication-architecture">
<title>Architecture</title>
<para>
Logical replication starts by copying a snapshot of the data on
the Provider database. Once that is done, the changes on Provider
are sent to Subscriber as they occur in real-time. The Subscriber
applies the data in the order in which commits were made on the
Provider so that transactional consistency is guaranteed for the
Publications within any single Subscription.
</para>
<para>
Logical Replication is built on an architecture similar to that of
physical streaming replication
(see <xref linkend="streaming-replication">). It is implemented by
WalSender and the Apply processes. The WalSender starts the logical
decoding (described in <xref linkend="logicaldecoding">) of the WAL and
loads the standard logical decoding plugin (pgoutput). The plugin
transforms the changes read from WAL to the logical replication protocol
(see <xref linkend="protocol-logical-replication">) and filters the data
according to Publication specifications. The data are then continuously
transferred using the streaming replication protocol to the Apply worker
which maps them to the local tables and applies the individual changes as
they are received in exact transactional order.
</para>
<para>
The Apply process on the Subscriber database always runs with
session_replication_role set to replica, which produces the usual
effects on triggers and constraints.
</para>
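<para>
This means that ordinary triggers do not fire for replicated changes.
A trigger which should fire during apply must be explicitly marked,
for example (the trigger name is illustrative):
<programlisting>
ALTER TABLE users ENABLE REPLICA TRIGGER users_audit_trg;
</programlisting>
</para>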
<sect2 id="logical-replication-snapshot">
<title>Initial snapshot</title>
<para>
The initial snapshot is taken when the replication slot for
Subscription is created. The existing data at that snapshot are
then sent using the streaming replication protocol between the
WalSender and Apply processes, in a similar way to how changes are
sent. Once the initial data are copied, the Apply process enters a
catch-up phase in which it replays the changes which happened on the
Provider while the initial snapshot was being copied. Once replication
catches up, the Apply process switches to normal streaming mode and
replicates transactions as they happen.
</para>
</sect2>
<sect2 id="logical-replication-table-resync">
<title>Individual table resynchronization</title>
<para>
A table can be resynchronized at any point during normal
replication operation. When table resynchronization is
requested, a parallel instance of a special kind of Apply process
is started which registers its own temporary replication slot and
takes a new snapshot. It then works in the same way as the initial
snapshot (see <xref linkend="logical-replication-snapshot">), except
that it only copies the data of a single table; once the catch-up
phase is finished, control of the replication of that table is handed
back to the main Apply process.
</para>
</sect2>
</sect1>
<sect1 id="logical-replication-monitoring">
<title>Monitoring</title>
<para>
Information about each WalSender serving a logical replication
connection is shown in the pg_stat_replication view on the Provider.
</para>
<para>
Information about each Subscription is shown in the
pg_stat_subscription view on the Subscriber.
</para>
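<para>
For example, replication progress can be examined on the Provider
with a query such as (column names as in current releases; the exact
set of columns shown for logical replication is not yet finalised):
<programlisting>
SELECT application_name, sent_location, replay_location
FROM pg_stat_replication;
</programlisting>
</para>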
</sect1>
<sect1 id="logical-replication-security">
<title>Security</title>
<para>
A replication connection can be made in the same way as for physical
streaming replication. It requires access to be specifically granted
in pg_hba.conf. The role used for replication must have the
<literal>REPLICATION</literal> privilege granted.
This gives a role access to both logical and physical replication.
</para>
<para>
In addition, logical replication can be accessed with the
<literal>SUBSCRIPTION</literal> privilege. This allows you to create
roles which can pull data from Publications yet cannot request
physical replication.
</para>
<para>
To create or subscribe to a Publication, the user must have the
REPLICATION role, the SUBSCRIPTION role or be a superuser.
</para>
<para>
<literal>SELECT</literal> privilege is required when a user
adds a table to a Publication.
To subscribe to a Publication, the user must be its owner or have
been granted USAGE on the Publication.
</para>
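<para>
For example (illustrative; the privilege syntax for Publications is
part of this design proposal and not final):
<programlisting>
GRANT USAGE ON PUBLICATION mypub TO repuser;
</programlisting>
</para>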
<para>
To create a Subscription, the user must have the
REPLICATION role, the SUBSCRIPTION role or be a superuser.
The Subscription's Apply process will run in the local database
with the privileges of the owner of the Subscription. In practice this
means that the owner of the Subscription must have <literal>INSERT</>,
<literal>UPDATE</>, <literal>DELETE</> and <literal>TRUNCATE</>
privileges on the Subscriber for the tables being replicated by the
Subscription, or be a superuser, though this is not recommended.
</para>
<para>
In particular, note that privileges are not re-checked as each change
record is read from the Provider, nor are they re-checked for each change
when applied. Security is checked once at startup. Concurrent REVOKEs
of privilege will interrupt logical replication if they have a material
effect on the security of the change stream.
</para>
</sect1>
<sect1 id="logical-replication-gucs">
<title>Logical replication related configuration parameters</title>
<para>
Logical Replication requires several configuration options to be
set.
</para>
<para>
On the Provider side, <varname>wal_level</> must be set to
<literal>logical</> and <varname>max_replication_slots</> has to be
set to at least the number of Subscriptions expected to connect, with
some reserve for table synchronization. <varname>max_wal_senders</>
should be set to at least <varname>max_replication_slots</> plus
the number of physical replicas that are connected at the same time.
</para>
<para>
The Subscriber also requires <varname>max_replication_slots</> to
be set. In this case it should be set to at least the number of
Subscriptions that will be added to the Subscriber.
<varname>max_logical_replication_workers</> has to be set to at least
the number of Subscriptions, again with some reserve for table
synchronization. Additionally, <varname>max_worker_processes</> may
need to be adjusted to accommodate the replication workers, to at least
(<varname>max_logical_replication_workers</> + <literal>1</>). Please
note that some extensions and parallel queries also take worker slots
from <varname>max_worker_processes</>.
</para>
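<para>
For example, a Provider expecting 4 Subscriptions and 2 concurrently
connected physical replicas might be sized as follows, leaving some
reserve for table synchronization:
<programlisting>
max_replication_slots = 6   # 4 subscriptions + 2 reserve
max_wal_senders = 8         # max_replication_slots + 2 physical replicas
</programlisting>
</para>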
</sect1>
<sect1 id="logical-replication-quick-setup">
<title>Quick setup</title>
<para>
First set the configuration options in postgresql.conf:
<programlisting>
wal_level = logical
max_worker_processes = 10 # one per subscription + one per instance needed on subscriber
max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber
max_replication_slots = 10 # one per subscription needed both provider and subscriber
max_wal_senders = 10 # one per subscription needed on provider
</programlisting>
</para>
<para>
The pg_hba.conf needs to be adjusted to allow replication (the
values here depend on your actual network configuration and the
user you want to use for connecting):
<programlisting>
host replication repuser 0.0.0.0/0 md5
</programlisting>
</para>
<para>
Then on Provider database:
<programlisting>
CREATE PUBLICATION mypub;
ALTER PUBLICATION mypub ADD TABLE users, departments;
</programlisting>
</para>
<para>
And on Subscriber database:
<programlisting>
CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
</programlisting>
</para>
<para>
The above will start the replication process which synchronizes the
initial table contents of <literal>users</literal> and
<literal>departments</literal> tables and then starts replicating
incremental changes to those tables.
</para>
</sect1>
</chapter>
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/20/2016 01:08 AM, Simon Riggs wrote:
> <!-- doc/src/sgml/logical-replication.sgml -->
> <chapter id="logical-replication">
> <title>Logical Replication</title>
> <para>
> Logical Replication is a method of replicating data objects and their
> changes, based upon their Primary Keys (or Replication Identity). We

Do we want a limitation based on Primary Key, or would it be possible to use just UNIQUE or is that covered under Replication Identity?

> <para>
> Logical Replication uses a Publish and Subscribe model with one or
> more Subscribers subscribing to one or more Publications on a
> Provider node. Subscribers pull data from the Publications they
> subscribe to and may subsequently re-publish data to allow
> cascading replication or more complex configurations.

Is that somehow different than Origin/Subscriber or Master/Slave? If not, why are we using yet more terms?

> <sect1 id="publication">
> <title>Publication</title>
> <para>
> A Publication object can be defined on any master node, owned by one
> user. A Publication is a set of changes generated from a group of
> tables, and might also be described as a Change Set or Replication Set.
> Each Publication exists in only one database.
>
> Then on Provider database:
> <programlisting>
> CREATE PUBLICATION mypub;
> ALTER PUBLICATION mypub ADD TABLE users, departments;
> </programlisting>
> </para>

Outside of my previous comments on reusing terminology that is known to our community, I like this. Basically a user creates a pool that is replicating, throws various ducks and small children into the pool and then replicates. Nice.

Sincerely,

JD

--
Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
On Wed, Jul 20, 2016 at 4:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> <para>
> And on Subscriber database:
> <programlisting>
> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
> </programlisting>
> </para>
> <para>
> The above will start the replication process which synchronizes the
> initial table contents of <literal>users</literal> and
> <literal>departments</literal> tables and then starts replicating
> incremental changes to those tables.
> </para>
> </sect1>
> </chapter>
I think it's important for communication channels to be defined separately from the subscriptions.
If I have nodes 1/2 + 3/4 which operate in pairs, I don't really want to have to have a script reconfigure replication on 3/4 every-time we do maintenance on 1 or 2.
3/4 need to know they subscribe to mypub and that they have connections to machine 1 and machine 2. The replication system should be able to figure out which (of 1/2) has the most recently available data.
So, I'd rather have:
CREATE CONNECTION machine1;
CREATE CONNECTION machine2;
CREATE SUBSCRIPTION TO PUBLICATION mypub;
Notice I explicitly did not tell it how to get the publication but if we did have a preference the DNS weighting model might be appropriate.
I'm not certain the subscription needs to be named. IMO, a publication should have the same properties on all nodes (so any node may become the primary source). If a subscriber needs different behaviour for a publication, it should be created as a different publication.
Documenting that ThisPub is different from ThatPub is easier than documenting that ThisPub on node 1/2/4 is different from ThisPub on node 7/8, except Node 7 is temporarily on Node 4 too (database X instead of database Y) due to that power problem.
Clearly this is advanced. An initial implementation may only allow mypub from a single connection.
I also suspect multiple publications will be normal even if only 2 nodes. Old slow moving data almost always got different treatment than fast-moving data; even if only defining which set needs to hit the other node first and which set can trickle through later.
regards,
Rod Taylor
On 20 July 2016 at 16:39, Joshua D. Drake <jd@commandprompt.com> wrote:
>> Logical Replication is a method of replicating data objects and their
>> changes, based upon their Primary Keys (or Replication Identity).
>
> Do we want a limitation based on Primary Key, or would it be possible
> to use just UNIQUE or is that covered under Replication Identity?
That is covered by replication identity.
>> <para>
>> Logical Replication uses a Publish and Subscribe model with one or
>> more Subscribers subscribing to one or more Publications on a
>> Provider node. Subscribers pull data from the Publications they
>> subscribe to and may subsequently re-publish data to allow
>> cascading replication or more complex configurations.
>
> Is that somehow different than Origin/Subscriber or Master/Slave? If
> not, why are we using yet more terms?
Thanks for asking, an important question that we have a chance to get right before we go too far down the road of implementation.
Issue: We need a noun for CREATE "SOMETHING" (or pg_create_$NOUN). So what noun to use? SQLStandard gives us no guidance here.
I'll explain my thinking, so we can discuss the terms I've recommended, which can be summarized as:
A Provider node has one or more Databases, each of which can publish its data in zero, one or more PUBLICATIONs. A Subscribing node can receive data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER may bring together data from one or more PUBLICATIONs
Here's why...
Master/Slave is not appropriate, since both sending and receiving nodes are Masters.
Origin/Subscriber is used by Slony. The term "Replication Origin" is already used in PG9.5 for something related, but not identical.
Provider/Subscriber is used by Londiste.
Bucardo seems to use Master/Slave according to FAQ.
The Noun we are discussing is something that a single Database can have >1 of, so those terms aren't quite appropriate.
pglogical uses Provider/Subscriber and Replication Sets, so I started with the thought that we might want CREATE REPLICATION SET or pg_create_replication_set(). After some time considering this, ISTM that the term "replication set" may not be that useful since we foresee a future where data is actually filtered and transformed and the feature set extends well beyond what we have with Slony, so I began looking for a term that was general and obvious (POLA).
After some thought, I realised that we are describing this as "Publish & Subscribe", so it makes a lot of sense to just use the terms Publication & Subscription. Those phrases are commonly used by SQLServer, Sybase, Oracle, Redis, RabbitMQ etc which is a pretty big set.
It's also a commonly used Enterprise Integration Design pattern https://en.wikipedia.org/wiki/Publish–subscribe_pattern
I note especially that Publish/Subscribe does not imply any particular topology (a mistake I made earlier when I called this stuff BDR, which confused everybody when we tried to talk about a subset of that functionality called UDR).
So that brings us to...
A Provider node has one or more Databases, each of which can publish its data in zero, one or more PUBLICATIONs. A Subscribing node can receive data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER may bring together data from one or more PUBLICATIONs.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20 July 2016 at 17:52, Rod Taylor <rod.taylor@gmail.com> wrote:
> I think it's important for communication channels to be defined
> separately from the subscriptions.
I agree and believe it will be that way.
Craig is working on allowing Replication Slots to failover between nodes, to provide exactly that requested behaviour.
> So, I'd rather have:
>
> CREATE CONNECTION machine1;
> CREATE CONNECTION machine2;
I think those map to replication slots. (This discussion might get a bit confusing if we try to guess exactly what each others terms mean, so I'll go no further than "I think").
> CREATE SUBSCRIPTION TO PUBLICATION mypub;
Yep
> I'm not certain the subscription needs to be named. IMO, a publication
> should have the same properties on all nodes (so any node may become
> the primary source). If a subscriber needs different behaviour for a
> publication, it should be created as a different publication.
Understood, its mostly to allow it to be dropped or altered and monitored. It's kindof like an index, it needs a name, we just don't much care what it is.
> Documenting that ThisPub is different from ThatPub is easier than
> documenting that ThisPub on node 1/2/4 is different from ThisPub on
> node 7/8, except Node 7 is temporarily on Node 4 too (database X
> instead of database Y) due to that power problem.
Which is why pg_dump support is important to allow us to sync up the definitions.
> Clearly this is advanced. An initial implementation may only allow
> mypub from a single connection.
Good input and clearly explained, thanks. If any of the above changes, these requirements will remain noted.
> I also suspect multiple publications will be normal even if only 2
> nodes. Old slow moving data almost always got different treatment than
> fast-moving data; even if only defining which set needs to hit the
> other node first and which set can trickle through later.
Agreed
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20/07/16 19:07, Simon Riggs wrote:
> On 20 July 2016 at 16:39, Joshua D. Drake <jd@commandprompt.com> wrote:
>>> <para>
>>> Logical Replication uses a Publish and Subscribe model with one or
>>> more Subscribers subscribing to one or more Publications on a
>>> Provider node. Subscribers pull data from the Publications they
>>> subscribe to and may subsequently re-publish data to allow
>>> cascading replication or more complex configurations.
>>
>> Is that somehow different than Origin/Subscriber or Master/Slave? If
>> not, why are we using yet more terms?
>
> Thanks for asking, an important question that we have a chance to get
> right before we go too far down the road of implementation.
>
> I'll explain my thinking, so we can discuss the terms I've recommended,
> which can be summarized as:
> A Provider node has one or more Databases, each of which can publish its
> data in zero, one or more PUBLICATIONs. A Subscribing node can receive
> data in the form of zero, one or more SUBSCRIBERs, where each SUBSCRIBER
> may bring together data from one or more PUBLICATIONs
>
> Here's why...

Just to add to what Simon wrote. There is one more reason for not using the term origin for this - the origin of the data does not necessarily have to be on the provider database once there is cascading, so it does not really map all that well.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21 July 2016 at 01:20, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 20 July 2016 at 17:52, Rod Taylor <rod.taylor@gmail.com> wrote:
>> I think it's important for communication channels to be defined
>> separately from the subscriptions.
>
> I agree and believe it will be that way.
>
> Craig is working on allowing Replication Slots to failover between
> nodes, to provide exactly that requested behaviour.
First, I'd like to emphasise that logical replication has been stalled for ages now because we can no longer make forward progress on core features needed until we have in-core logical replication (they're dismissed as irrelevant, no in core users, etc) - but we have also had difficulty getting logical replication into core. To break this impasse we really need logical replication in core and need to focus on getting the minimum viable feature in place, not trying to make it do everything all at once. Point-to-point replication with no forwarding should be just fine for the first release. Lets not bog this in extra "must have" features that aren't actually crucial.
That said:
I had a patch in for 9.6 to provide the foundations for logical replication to follow physical failover, but it got pulled at the last minute. It'll be submitted for 10.0 along with some other enhancements to make it usable without hacky extensions, most notably support for using a physical replication slot and hot standby feedback to pin a master's catalog_xmin where it's needed by slots on a physical replica.
That's for when we're combining physical and logical replication though, e.g. "node A" is a master/standby pair, and "node B" is also a master/standby pair.
For non-star logical topologies, which is what I think you might've been referring to, it's necessary to have:
- Node identity
- Which nodes we want to receive data from
- How we connect to each node
all of which are separate things. Who's out there, what we want from them, and how to get it.
pglogical doesn't really separate the latter two much at this point. Subscriptions identify both the node to connect to and the data we want to receive from a node; there's no selective data forwarding from one node to another. Though there's room for that in pglogical's hooks/filters by using filtering by replication origin, it just doesn't do it yet.
It sounds like that's what you're getting at. Wanting to be able to say "node A wants to get data from node B and node C" separately to "node A connects to node B to receive data", with the replication system somehow working out that that means data written from C to B should be forwarded to A.
Right?
If so, it's not always easy to figure that out. If you create connections to both B and C, we then have to automagically work out that we should stop forwarding data from C over our connection to B.
The plan with pglogical has been to allow connections to specify forwarding options, so the connection explicitly says what nodes it wants to get data from. It's users' job to ensure that they don't declare connections that overlap. This is simpler to implement, but harder to admin.
One challenge with either approach is ensuring a consistent switchover. If you have a single connection A=>B receiving data from [B,C], then switch to two connections A=>B and A=>C with neither forwarding, you must ensure that the switchover occurs in such a way that no data is replicated twice or skipped. That's made easier by the fact that we have replication origins and we can actually safely receive from both at the same time then discard from one of them, even use upstream filtering to avoid sending it over the wire twice. But it does take care and caution.
Note that none of this is actually for logical _failover_, where we lose a node. For that we need some extra help in the form of placeholder slots maintained on other peers. This can be done at the application / replication system level without the need for new core features, but it might not be something we can support in the first iteration.
I'm not sure how Petr's current design for in-core replication addresses this, if it does, or whether it's presently focused only on point-to-point replication like pglogical. As far as I'm concerned so long as it does direct point-to-point replication with no forwarding that's good enough for a first cut feature, so long as the UI, catalog and schema design leaves room for adding more later.
>> I also suspect multiple publications will be normal even if only 2
>> nodes. Old slow moving data almost always got different treatment than
>> fast-moving data; even if only defining which set needs to hit the
>> other node first and which set can trickle through later.
>
> Agreed
Yes, especially since we can currently only stream transactions one by one in commit order after commit.
Even once we have interleaved xact streaming, though, there will still be plenty of times we want to receive different sets of data from the same node at different priorities/rates. Small data we want to receive quickly, vs big data we receive when we get the chance to catch up. Of course it's necessary to define non-overlapping replication sets for this.
That's something we can already do in pglogical. I'm not sure if Petr is targeting replication set support as part of the first release of the in-core version of logical replication; they're necessary to do things like this.
On 07/20/2016 06:35 PM, Craig Ringer wrote:
> First, I'd like to emphasise that logical replication has been stalled
> for ages now because we can no longer make forward progress on core
> features needed until we have in-core logical replication (they're
> dismissed as irrelevant, no in core users, etc) - but we have also had
> difficulty getting logical replication into core. To break this impasse
> we really need logical replication in core and need to focus on getting
> the minimum viable feature in place, not trying to make it do everything
> all at once. Point-to-point replication with no forwarding should be
> just fine for the first release. Lets not bog this in extra "must have"
> features that aren't actually crucial.

I don't think any person who actually works on postgresql with customers and clearly deals with "competition" can state with any sanity that we don't need logical replication in core.

JD

--
Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
On 21 July 2016 at 11:05, Joshua D. Drake <jd@commandprompt.com> wrote:
> On 07/20/2016 06:35 PM, Craig Ringer wrote:
>> First, I'd like to emphasise that logical replication has been stalled
>> for ages now because we can no longer make forward progress on core
>> features needed until we have in-core logical replication (they're
>> dismissed as irrelevant, no in core users, etc) - but we have also had
>> difficulty getting logical replication into core. To break this impasse
>> we really need logical replication in core and need to focus on getting
>> the minimum viable feature in place, not trying to make it do everything
>> all at once. Point-to-point replication with no forwarding should be
>> just fine for the first release. Lets not bog this in extra "must have"
>> features that aren't actually crucial.
>
> I don't think any person who actually works on postgresql with customers and clearly deals with "competition" can state with any sanity that we don't need logical replication in core.
No, and while people used to, we're past that now.
However, infrastructure improvements for out-of-tree logical replication (which we can get into users' hands *now*, rather than two-plus years from now) have been getting knocked back because there's no in-tree user, and tools like pglogical are dismissed as irrelevant. Once we have logical replication in core, hopefully we can start making infrastructure progress again as well.
> <programlisting>
> CREATE PUBLICATION mypub;
> ALTER PUBLICATION mypub ADD TABLE users, departments;
> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
> </programlisting>
>
> The above will start the replication process which synchronizes the
> initial table contents of <literal>users</literal> and
> <literal>departments</literal> tables and then starts replicating
> incremental changes to those tables.

Hi, hackers.

It is very good to have logical replication in core. Also I have some proposal.

What if we had the ability to execute custom trigger functions on events on a particular table? It would also be useful to be able to ignore some tables in a publication, or to replicate with some WHERE condition. For example, we want to replicate table "users" as is (maybe with some WHERE conditions), but on events on table "departments" we want to execute a trigger function departments_event_handler(). And we don't want to handle events on a third table which was added to the publication. Something like this:

CREATE PUBLICATION mypub;
ALTER PUBLICATION mypub ADD TABLE users, departments, unnecessary_tbl;
CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
ALTER SUBSCRIPTION mysub ADD TABLE users REPLICATE TO LOCAL TABLE my_users INSERT WHERE new.id_user > 1000 UPDATE WHERE old.id_user < 1000; -- we don't want to replicate deletes, for example
ALTER SUBSCRIPTION mysub ADD TABLE departments ON INSERT WHEN (new.id_user > 1000) EXECUTE PROCEDURE departments_event_handler(); -- just like a trigger

Regards,
Dmitriy Sarafannikov
On 22 July 2016 at 09:54, Dmitriy Sarafannikov <d.sarafannikov@bk.ru> wrote:

> Hi, hackers.
>
> it is very good to have logical replication in core. Also i have some proposal.

Thanks for your input.

> What if we would have ability to execute custom trigger functions on
> events on particular table? Also it would be useful if we had the
> ability to ignore some tables in a publication or replicate with some
> WHERE condition.

Both of those requirements are eventual goals for this. The initial commits of Logical Replication for 10.0 won't include those features, keeping the scope tight so we make sure we get this into 10.0, but we want to design it with those things in mind and we may be lucky enough to get them in the first release.

> For example, we want to replicate table "users" as is (maybe with some
> WHERE conditions), but on events on table "departments" we want to
> execute a trigger function departments_event_handler().
>
> ALTER SUBSCRIPTION mysub ADD TABLE users REPLICATE TO LOCAL TABLE my_users INSERT WHERE new.id_user > 1000 UPDATE WHERE old.id_user < 1000; -- we don't want to replicate deletes, for example
> ALTER SUBSCRIPTION mysub ADD TABLE departments ON INSERT WHEN (new.id_user > 1000) EXECUTE PROCEDURE departments_event_handler(); -- just like a trigger

Filtering data at both the sender and receiver has been requested, so I guess it makes sense to have a WHERE clause on both the Publication and the Subscription, but I will think more on that.
Transforming data at the receiver/apply side, we had envisaged, would be done using standard triggers: you would add the trigger onto the table using the existing trigger syntax, so that when an INSERT was applied the trigger would execute via the standard mechanisms. So I don't think we need to add trigger-style syntax onto the Subscription, but it is an option to consider, I guess.

(Note to self: make sure we get the dependencies right here, so that when we drop objects they are automatically removed from Subscriptions, whilst allowing Subscriptions to be dropped without causing tables to be dropped (and Publications also).)

> And we don't want handle events on third table which was added to publication.

Not sure what you mean by that part. Could you explain a little more?

(I'm on leave for some days, so I won't respond quickly, but I will respond eventually.)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 22 July 2016 at 22:55, Simon Riggs <simon@2ndquadrant.com> wrote:
> Filtering data at both the sender and receiver has been requested, so
> I guess it makes sense to have a WHERE clause on both the Publication
> and the Subscription, but I will think more on that.
Yeah, output plugin level filtering predicates are certainly a desirable feature for down the track.
One important factor to consider there is that we're very restricted in what we can safely access. Currently the historical snapshot infrastructure set up by logical decoding doesn't stop you from trying to access tables that can't safely be accessed; you might just get surprising or wrong results, or failures. If we're going to allow arbitrary user-defined predicates, we'll likely need to change that so we explicitly ERROR when an attempt is made to heap_open a table that isn't accessible during logical decoding. Probably not hard, and likely a good idea anyway.
Such predicates should generally be simple, though with less strict requirements than CHECK constraints. We can probably do subqueries and I don't think functions have to be immutable, though obviously anything that tries to allocate an xid will fail.
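No syntax for such filters exists yet; purely as a sketch of what a sender-side row filter on a Publication might look like if the proposal grew one:

```sql
-- Hypothetical syntax: filter rows at the sender. The predicate would
-- have to be simple enough to evaluate under a historical snapshot
-- (no xid allocation, limited table access).
ALTER PUBLICATION mypub ADD TABLE users WHERE (active AND region = 'eu');
```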
> Transforming data at the receiver/Apply side we had envisaged would be
> done using standard triggers. So we had envisaged you would add the
> trigger onto the table using the existing trigger syntax, so that when
> an INSERT was applied the trigger would execute, using the standard
> mechanisms.
Yeah. In the medium term at least.
I imagine eventually we'll want more powerful transforms that doesn't require the overhead of trigger function calls, but that's a looooong way down the road.
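A sketch of what the standard-trigger approach could look like today, with invented table and function names. It assumes the apply worker runs with session_replication_role = replica (as pglogical's does), so the trigger must be marked to fire in that mode:

```sql
-- Transform incoming rows with an ordinary trigger; nothing
-- subscription-specific is required.
CREATE FUNCTION uppercase_department() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.name := upper(NEW.name);
    RETURN NEW;
END;
$$;

CREATE TRIGGER departments_transform
    BEFORE INSERT OR UPDATE ON departments
    FOR EACH ROW EXECUTE PROCEDURE uppercase_department();

-- Ordinary (ENABLE ORIGIN) triggers are skipped when
-- session_replication_role = replica, so mark this one to fire
-- during replication apply as well:
ALTER TABLE departments ENABLE ALWAYS TRIGGER departments_transform;
```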
ISTM pg_dump should alter publication/subscription when doing partial dumps, similar to how it handles foreign keys.

Subscribed tables should be read-only by default. If you need something more advanced than that, adding an extra option to the subscription (or wherever else) is going to be the least of your worries. If we don't want to get too wrapped up in config/syntax, I think there should at least be a read-only statement-level trigger installed by default that users can then drop or disable if desired.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
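The suggested read-only default could be approximated today with an ordinary statement-level trigger; a minimal sketch, with invented names, assuming the apply worker runs with session_replication_role = replica so that its own changes bypass the trigger while local DML is rejected:

```sql
-- Statement-level triggers can't change rows, but raising an error
-- aborts the command, which is all a read-only guard needs.
CREATE FUNCTION reject_local_write() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'table % is read-only on this subscriber', TG_TABLE_NAME;
END;
$$;

CREATE TRIGGER users_read_only
    BEFORE INSERT OR UPDATE OR DELETE OR TRUNCATE ON users
    FOR EACH STATEMENT EXECUTE PROCEDURE reject_local_write();
```

A user who does want local writes can simply DROP or DISABLE the trigger, which matches the opt-out behaviour described above.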
On 7/20/16 8:35 PM, Craig Ringer wrote:
> I'm not sure how Petr's current design for in-core replication addresses
> this, if it does, or whether it's presently focused only on
> point-to-point replication like pglogical. As far as I'm concerned so
> long as it does direct point-to-point replication with no forwarding
> that's good enough for a first cut feature, so long as the UI, catalog
> and schema design leaves room for adding more later.

That's the part I'm pretty worried about. I'd feel a lot better if there was at least a wiki page of future topology/communication features and a sketch of a design.
On 07/20/2016 10:08 AM, Simon Riggs wrote:
> <sect1 id="logical-replication-monitoring">
> <title>Monitoring</title>
> <para>
> pg_stat_replication
> </para>
> <para>
> pg_stat_subscription
> </para>
> </sect1>

and probably also `pg_stat_publication` or some other way to see what tables are currently in a PUBLICATION, who has subscribed, etc.

> <programlisting>
> CREATE PUBLICATION mypub;
> ALTER PUBLICATION mypub ADD TABLE users, departments;
> </programlisting>

Would a subscription just be a logical grouping, or would it be something stronger, like meaning atomic subscriptions and/or a dedicated replication slot?

Can you subscribe to multiple publications through a single SUBSCRIPTION?

What is supposed to happen if table A is in two subscriptions S1 and S2, and you subscribe to both? Will you get table A only once (both initial copy and events)?

Would a subscription of "mypub" pop up on the subscriber side atomically, or will subscribed tables appear one by one when they are ready (initial copy + catchup event replay completed)?

I recall that one of the drivers of developing pgq/skytools to replace Slony was the fact that Slony's "replication group" design made it very easy to blunder subscription changes in more complex topologies, which manifested in deadlocks. PGQ's table-by-table subscription avoided this entirely, at the cost of non-atomic subscribed table appearance. Of course, once subscribed, everything was transaction-consistent again.
> <programlisting>
> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
> user=repuser</quote> PUBLICATION mypub;
> </programlisting>

For the pgq-like version which considers a PUBLICATION just as a list of tables to subscribe, I would add:

CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub, mypub1;

ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub1;

ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2;

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 26/07/16 00:05, Hannu Krosing wrote:
>> <programlisting>
>> CREATE PUBLICATION mypub;
>> ALTER PUBLICATION mypub ADD TABLE users, departments;
>> </programlisting>
> Would a subscription just be a logical grouping or would it be something
> stronger, like meaning atomic subscriptions and/or a dedicated
> replication slot?

Not sure what you mean by atomic subscription, but subscription creation adds a replication slot to the provider node. Other than that, the subscription lives on the downstream node only.

> Can you subscribe to multiple publications through single SUBSCRIPTION?

Yes.

> What is supposed to happen if table A is in two subscriptions S1 and S2,
> and you subscribe to both? Will you get table A only once (both initial
> copy and events)?

Yes, only once. The replication works with tables; a publication is really just grouping/filtering, and what you get is the union of the tables in the publications.

> Would a subscription of "mypub" pop up on subscriber side atomically, or
> will subscribed tables appear one-by-one when they are ready (initial
> copy + catchup event replay completed)?

One by one; that's my plan, as it makes it easier to parallelize and to recover from crashes during initialization (and also makes this faster, as tables that are already done don't need to be copied again). It also makes it easier to reuse the table initialization code for adding new tables at a later time.

>> <programlisting>
>> CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
>> user=repuser</quote> PUBLICATION mypub;
>> </programlisting>
> For the pgq-like version which consider a PUBLICATION just as list of
> tables to subscribe, I would add
>
> CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
> user=repuser' PUBLICATION mypub, mypub1;

Yes, that works as well.

> ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub1;
>
> ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub2;

This does not yet, but I agree we should have it.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 20, 2016 at 4:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> In this post, Petr and I present a joint view on a design for how this
> should work in-core, based upon our implementation experiences with
> physical replication, pglogical and various comments so far.
>
> Note that this has substantial user-visible differences from pglogical,
> though much of the underlying architecture is reused.
>
> I should stress that not all of the aspects are implemented yet. The
> post here today is a combination of all of our attempts to bring
> architecture, usability and security into one place, including a
> coherent way of describing the features and how they work.
>
> Your comments and questions are sought now as we begin the main
> development effort to get this into PostgreSQL 10.0

Thanks for publishing this.

One minor comment is that this document makes extensive use of Terms With Initial Capitals, which seems stylistically odd, although I'm not precisely sure what would be better.

I would have expected that there would be a concept of a REPLICATION SET, defining which tables are to be replicated; here, that seems to be the Publication. That may be fine, but I wonder if there is any value in separating those things. It's clear, for example, that a replication set can be dumped: which tables are members of which replication sets is durable metadata. It's less clear that a publication can be dumped; that might include things which are not durable metadata, such as associations with slots. It's generally not really clear to me based on reading this exactly what information is encapsulated in a Publication or a Subscription, which makes it hard to evaluate design decisions like this one:

> <para>
> The definition of a Publication object will be included within
> pg_dump by default when all of the objects in the Publication are
> requested as part of the dump specification.
> </para>
> <para>
> Subscriptions are not dumped by pg_dump by default, but can be
> requested using --subscriptions parameter.
> </para>

I think that to really understand exactly what you and Petr have in mind, we'd need a description of where publication and subscription data is stored within the server, and exactly what gets stored. Perhaps that will come in a later email. I'm not bashing the design, exactly, I just can't quite see how all of the pieces fit together yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> One minor comment is that this document makes extensive use of Terms
> With Initial Capitals, which seems stylistically odd, although I'm not
> precisely sure what would be better.

We use <firstterm>publication</> on the first use only, which is turned into italics.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29 July 2016 at 16:53, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that to really understand exactly what you and Petr have in
> mind, we'd need a description of where publication and subscription
> data is stored within the server, and exactly what gets stored.
> Perhaps that will come in a later email. I'm not bashing the design,
> exactly, I just can't quite see how all of the pieces fit together
> yet.

Sure, no problem. It's clear there are about 12 ways of putting this together, and each way needs its own set of terms. We need input from many people to get this right, since this is much more about UI than system architecture.