Thread: Logical Replication WIP

Logical Replication WIP

From: Petr Jelinek
Hi,

as promised here is WIP version of logical replication patch.

This is by no means anywhere close to being committable, but it should
be enough for discussion of the chosen approaches. I plan to give this
some more time before the September CF as well as during the CF itself.

You've seen a preview of some of these ideas in the doc Simon posted
[1], though not all of them are implemented in this patch yet.

I'll start with the overview of the state of things.

What works:
  - Replication of INSERT/UPDATE/DELETE operations on tables in a
    publication.
  - Initial copy of data in a publication.
  - Automatic management of things like slots and origin tracking.
  - Some psql support (\drp, \drs, and additional info in \d for
    tables); it's mainly missing ACLs, as those are not implemented
    yet (see below), and tab completion.

What's missing:
  - Sequences. I'd like to have them in 10.0 but I don't have a good
    way to implement it. PGLogical uses periodic syncing with some
    buffer value, but that's suboptimal. I would like to decode them
    instead, but that has proven to be complicated due to their
    sometimes transactional, sometimes nontransactional nature, so I
    probably won't have time to do it within 10.0 by myself.
  - ACLs. I still expect these to work the way it's documented in the
    logical replication docs, but currently the code just assumes a
    superuser/REPLICATION role. This can probably be discussed further
    in the design thread [1].
  - pg_dump. Same as above; I want publications and membership in them
    dumped unconditionally, and potentially also dump subscription
    definitions if the user asks for it via a command-line option. I
    don't think subscriptions should be dumped by default, as
    automatically starting replication when somebody dumps and restores
    the db goes against POLA.
  - DDL. I see several approaches we could take here for 10.0: a) don't
    deal with DDL at all yet; b) provide a function which pushes the
    DDL into the replication queue and then executes it on the
    downstream (like londiste, slony, and pglogical do); c) capture the
    DDL query as text and allow a user-defined function to be called
    with that DDL text on the subscriber (that's what Oracle did with
    CDC).
  - FDW support on the downstream; currently only INSERTs should work
    there, but that should be easy to fix.
  - Monitoring. I'd like to add some pg_stat_subscription view on the
    downstream (the rest of the monitoring is very similar to physical
    streaming, so that mostly needs docs).
  - TRUNCATE. This is handled using triggers in BDR and pglogical, but
    I am not convinced that's the right way to do it in core, as it
    brings limitations (e.g. the inability to use RESTART IDENTITY).
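Approach c) could be sketched, purely for illustration, like this (all
names below are invented for the sketch; none of this is part of the
patch or actual PostgreSQL code):

```python
# Illustrative sketch of approach (c): capture DDL as text on the
# publisher and hand it to a user-registered callback on the
# subscriber. Every name here is invented; this is not PostgreSQL code.

ddl_handlers = []

def register_ddl_handler(fn):
    """Subscriber side: register a callback that receives raw DDL text."""
    ddl_handlers.append(fn)

def on_ddl_captured(ddl_text):
    """Called when a DDL statement is decoded; run it through each handler."""
    return [handler(ddl_text) for handler in ddl_handlers]

def apply_table_ddl(ddl_text):
    """Example handler: act on table DDL, skip everything else."""
    verb = ddl_text.strip().split()[0].upper()
    if verb in ("CREATE", "ALTER", "DROP"):
        return ("applied", ddl_text)
    return ("skipped", ddl_text)
```

The point of the shape is that core only captures and ships the text;
what to do with it stays the user's decision on the subscriber.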

The parts I am not overly happy with:
  - The fact that the subscription handles slot creation/drop means we
    do some automagic that might fail, and the user might need to fix
    that up manually. I am not saying this is necessarily a problem, as
    that's how most publish/subscribe replication systems work, but I
    wonder if there is a better way of doing this that I missed.
  - The initial copy patch adds some interfaces for getting the table
    list and data into the DecodingContext, and I wonder if that's a
    good place for those, or if we should instead create some TableSync
    API that would load the plugin as well, carry these two new
    interfaces, and live in the tablesync module. One reason I didn't
    do that is that the interface would be almost the same, and the
    plugin would then have to do separate init for DecodingContext and
    TableSync.
  - The initial copy uses the snapshot from slot creation in the
    walsender. I currently just push it as the active snapshot inside
    the snapbuilder, which is probably not the right thing to do (tm).
    That is mostly because I don't really know what the right thing is
    there.

About the individual patches:
0001-Add-PUBLICATION-catalogs-and-DDL.patch: This patch defines a
Publication, which is basically the same thing as a replication set. It
adds a database-local catalog pg_publication which stores the
publications and DML filters, and a pg_publication_rel catalog for
storing membership of relations in publications. It adds the DDL,
dependency handling, and all the necessary boilerplate around that,
including some basic regression tests for the DDL.

0002-Add-SUBSCRIPTION-catalog-and-DDL.patch: Adds Subscriptions, with a
shared, nailed (!) catalog pg_subscription which stores the individual
subscriptions for each database. The reason this is nailed is that it
needs to be accessible without a connection to a database, so that the
logical replication launcher can read it and start/stop workers as
necessary. This does not include regression tests, as I am unsure how
to test this within the regression testing framework given that it is
supposed to start workers (those are added in later patches).

0003-Define-logical-replication-protocol-and-output-plugi.patch:
Adds the logical replication protocol (API and docs) and a "standard"
output plugin for logical decoding that produces output based on that
protocol and the publication definitions.
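As a rough illustration of the shape of such a protocol (the 'I'
message type and field layout below are invented for the sketch, not
the wire format the patch actually defines):

```python
import struct

# Toy encoder/decoder for a logical-replication-style message: a
# one-byte message type followed by big-endian fields. The 'I' type
# and layout are invented for illustration only.

def encode_insert(relation_oid, tuple_data):
    # 'I' = insert, then the relation OID as uint32, then raw tuple bytes
    return b"I" + struct.pack("!I", relation_oid) + tuple_data

def decode(msg):
    kind = msg[0:1]
    if kind == b"I":
        (oid,) = struct.unpack("!I", msg[1:5])
        return ("insert", oid, msg[5:])
    raise ValueError("unknown message type: %r" % kind)
```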

0004-Make-libpqwalreceiver-reentrant.patch: Redesigns libpqwalreceiver
to be reusable outside of the walreceiver by exporting the API as a
struct and an opaque connection handle. Also adds a couple of
additional functions for logical replication.

0005-Add-logical-replication-workers.patch: This patch adds the actual
logical replication workers that use all of the above to implement data
change replication from publisher to subscriber. It adds two different
background workers. The first is the launcher, which works like the
autovacuum launcher in that it gets the list of subscriptions and
starts/stops the apply workers for those subscriptions as needed. Apply
workers connect to the output plugin via the streaming protocol and
handle the actual data replication. I exported the
ExecUpdate/ExecInsert/ExecDelete functions from nodeModifyTable to
handle the actual database updates, so that things like triggers, etc.
are handled automatically without special code. This also adds a couple
of TAP tests that exercise basic replication setup as well as a wide
variety of type support. The overview doc for logical replication that
Simon previously posted to the list is also part of this one.

0006-Logical-replication-support-for-initial-data-copy.patch: PoC of
the initial sync. It adds another mode into the apply worker which just
applies updates for a single table, plus some handover logic for when
the table becomes synchronized and can be replicated normally. It also
adds a new catalog, pg_subscription_rel, which keeps information about
the synchronization status of individual tables. Note that tables added
to publications at a later time are not yet synchronized; there is also
no resynchronization UI yet.

On the upstream side it adds two new commands to the replication
protocol, one for getting the list of tables and one for streaming
existing table data. I discussed above why this part is suboptimal, so
I won't repeat that here.

Feedback is welcome.

[1]

https://www.postgresql.org/message-id/flat/CANP8%2Bj%2BNMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_%3D-HA%40mail.gmail.com#CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From: Andres Freund
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
> as promised here is WIP version of logical replication patch.

Yay!

I'm about to head out for a week of, desperately needed, holidays, but
after that I plan to spend a fair amount of time helping to review
etc. this.



Re: Logical Replication WIP

From: Simon Riggs
On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>> as promised here is WIP version of logical replication patch.
>
> Yay!

Yay2

> I'm about to head out for a week of, desperately needed, holidays, but
> after that I plan to spend a fair amount of time helping to review
> etc. this.

Have a good one.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From: Masahiko Sawada
On Sat, Aug 6, 2016 at 2:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>>> as promised here is WIP version of logical replication patch.
>>
>> Yay!
>
> Yay2
>

Thank you for working on this!

I've applied these patches to current HEAD, but got the following error.

libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous
declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2

After fixing this issue with the attached patch, I used logical replication a little.
Some random comments and questions.

The logical replication launcher process and the apply process are
implemented as bgworkers. Isn't it better to have them as auxiliary
processes, like the checkpointer and wal writer?
IMO the number of logical replication connections should not be
limited by max_worker_processes.

--
We need to set the publication up with at least the CREATE PUBLICATION
and ALTER PUBLICATION commands.
Can we make CREATE PUBLICATION able to define tables as well?
For example,
CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]

--
This patch cannot drop the subscription.

=# drop subscription sub;
ERROR:  unrecognized object class: 6102

--
+/*-------------------------------------------------------------------------
+ *
+ * proto.c
+ *             logical replication protocol functions
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *

The copyright of the added files is old.

And this patch has some whitespace problems.
Please run "git show --check" or "git diff origin/master --check"

Regards,

--
Masahiko Sawada

Attachment

Re: Logical Replication WIP

From: Craig Ringer
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> The logical replication launcher process and the apply process are
> implemented as a bgworker. Isn't better to have them as an auxiliary
> process like checkpointer, wal writer?

I don't think so. The checkpointer, walwriter, autovacuum, etc. predate bgworkers. I strongly suspect that if they were implemented now they'd use bgworkers.

Now, perhaps we want a new bgworker "kind" for system workers or some other minor tweaks. But basically I think bgworkers are exactly what we should be using here.
 
> IMO the number of logical replication connections should not be
> limited by max_worker_processes.

Well, they *are* worker processes... but I take your point, that that setting has been "number of bgworkers the user can run" and it might not be expected that logical replication would use the same space.

max_worker_processes isn't just a limit; it controls how many shmem slots we allocate.

I guess we could have a separate max_logical_workers or something, but I'm inclined to think that adds complexity without really making things any nicer. We'd just add them together to decide how many shmem slots to allocate and we'd have to keep track of how many slots were used by which types of backend. Or create a near-duplicate of the bgworker facility for logical rep.

Sure, you can go deeper down the rabbit hole here and say that we need to add bgworker "categories" with reserved pools of worker slots for each category. But do we really need that?

max_connections includes everything, both system and user backends. It's not like we don't do this elsewhere. It's at worst a mild wart.
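The accounting described above, adding the two limits together to size
the shmem slot pool while tracking per-category usage, can be sketched
as follows (the max_logical_workers GUC is hypothetical here, and none
of this is actual PostgreSQL code):

```python
# Sketch of the slot accounting discussed above: size the shared pool
# as the sum of the two limits and draw each worker from its own
# category's reservation. All names are invented for illustration.

class WorkerSlots:
    def __init__(self, max_worker_processes, max_logical_workers):
        self.limits = {"generic": max_worker_processes,
                       "logical": max_logical_workers}
        self.used = {"generic": 0, "logical": 0}
        # total shmem slots to allocate at startup
        self.total_slots = max_worker_processes + max_logical_workers

    def acquire(self, category):
        """Take a slot from the category's reservation, if any remain."""
        if self.used[category] >= self.limits[category]:
            return False
        self.used[category] += 1
        return True
```

The extra complexity is exactly the per-category bookkeeping this adds,
which is the cost being weighed against a single shared limit.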

The only argument I can see for not using bgworkers is for the supervisor worker. It's a singleton that launches the per-database workers, and arguably is a job that the postmaster could do better. The current design there stems from its origins as an extension. Maybe worker management could be simplified a bit as a result. I'd really rather not invent yet another new and mostly duplicate category of custom workers to achieve that though.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical Replication WIP

From: Michael Paquier
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now they'd
> use bgworkers.

+1. We could always bring them under the umbrella of the bgworker
infrastructure now if that cleans up some code duplication.
-- 
Michael



Re: Logical Replication WIP

From: Petr Jelinek
On 09/08/16 09:59, Masahiko Sawada wrote:
>>> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>>>> as promised here is WIP version of logical replication patch.
>>>
>
> Thank you for working on this!

Thanks for looking!

>
> I've applied these patches to current HEAD, but got the following error.
>
> libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
> ../../../../src/include/replication/walreceiver.h:137: note: previous
> declaration of ‘WalReceiverConnHandle’ was here
> make[2]: *** [libpqwalreceiver.o] Error 1
> make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
> make: *** [install-src-recurse] Error 2
>
> After fixed this issue with attached patch, I used logical replication a little.
> Some random comments and questions.
>

Interesting, my compiler doesn't have a problem with this. Will investigate.

> The logical replication launcher process and the apply process are
> implemented as a bgworker. Isn't better to have them as an auxiliary
> process like checkpointer, wal writer?
> IMO the number of logical replication connections should not be
> limited by max_worker_processes.
>

What Craig said reflects my rationale for doing this pretty well.

> We need to set the publication up by at least CREATE PUBLICATION and
> ALTER PUBLICATION command.
> Can we make CREATE PUBLICATION possible to define tables as well?
> For example,
> CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]

Agreed, that just didn't make it into the first cut sent to -hackers.
We've also been thinking of having a special ALL TABLES parameter there
that would encompass the whole db.

> --
> This patch can not drop the subscription.
>
> =# drop subscription sub;
> ERROR:  unrecognized object class: 6102
>

Yeah, that's because of patch 0006; I didn't finish all the dependency
tracking for the pg_subscription_rel catalog that it adds (which is why
I called it a PoC). I expect to have this working in the next version
(there is still quite a bit of polish work needed in general).

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From: Petr Jelinek
On 09/08/16 10:13, Craig Ringer wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com
> <mailto:sawada.mshk@gmail.com>> wrote:
>
>
>
>     The logical replication launcher process and the apply process are
>     implemented as a bgworker. Isn't better to have them as an auxiliary
>     process like checkpointer, wal writer?
>
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now
> they'd use bgworkers.
>
> Now, perhaps we want a new bgworker "kind" for system workers or some
> other minor tweaks. But basically I think bgworkers are exactly what we
> should be using here.
>

Agreed.

>
>     IMO the number of logical replication connections should not be
>     limited by max_worker_processes.
>
>
> Well, they *are* worker processes... but I take your point, that that
> setting has been "number of bgworkers the user can run" and it might not
> be expected that logical replication would use the same space.

Again I agree; I think we should ultimately go towards what PeterE 
suggested in 
https://www.postgresql.org/message-id/a2fffd92-6e59-a4eb-dd85-c5865ebca1a0@2ndquadrant.com

>
> The only argument I can see for not using bgworkers is for the
> supervisor worker. It's a singleton that launches the per-database
> workers, and arguably is a job that the postmaster could do better. The
> current design there stems from its origins as an extension. Maybe
> worker management could be simplified a bit as a result. I'd really
> rather not invent yet another new and mostly duplicate category of
> custom workers to achieve that though.
>

It is simplified compared to pglogical (there are only 2 worker types, 
not 3). However, I don't think it's the job of the postmaster to scan 
catalogs, so it can't really start workers for logical replication. I 
actually modeled it more after autovacuum (using bgworkers though) than 
the original extension.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From: Masahiko Sawada
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>>
>>
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now they'd
> use bgworkers.
>
> Now, perhaps we want a new bgworker "kind" for system workers or some other
> minor tweaks. But basically I think bgworkers are exactly what we should be
> using here.

I understood. Thanks!

>>
>> IMO the number of logical replication connections should not be
>> limited by max_worker_processes.
>
>
> Well, they *are* worker processes... but I take your point, that that
> setting has been "number of bgworkers the user can run" and it might not be
> expected that logical replication would use the same space.
>
> max_worker_progresses isn't just a limit, it controls how many shmem slots
> we allocate.
>
> I guess we could have a separate max_logical_workers or something, but I'm
> inclined to think that adds complexity without really making things any
> nicer. We'd just add them together to decide how many shmem slots to
> allocate and we'd have to keep track of how many slots were used by which
> types of backend. Or create a near-duplicate of the bgworker facility for
> logical rep.
>
> Sure, you can go deeper down the rabbit hole here and say that we need to
> add bgworker "categories" with reserved pools of worker slots for each
> category. But do we really need that?

If we change these processes to bgworkers, we can categorize them into
two groups: auxiliary processes (checkpointer, wal sender, etc.) and
other worker processes.
And max_worker_processes controls the latter.

> max_connections includes everything, both system and user backends. It's not
> like we don't do this elsewhere. It's at worst a mild wart.
>
> The only argument I can see for not using bgworkers is for the supervisor
> worker. It's a singleton that launches the per-database workers, and
> arguably is a job that the postmaster could do better. The current design
> there stems from its origins as an extension. Maybe worker management could
> be simplified a bit as a result. I'd really rather not invent yet another
> new and mostly duplicate category of custom workers to achieve that though.
>
>

Regards,

--
Masahiko Sawada



Re: Logical Replication WIP

From
Craig Ringer
Date:
On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> > Sure, you can go deeper down the rabbit hole here and say that we need to
> > add bgworker "categories" with reserved pools of worker slots for each
> > category. But do we really need that?
>
> If we change these processes to bgworker, we can categorize them into
> two, auxiliary process(check pointer and  wal sender etc) and other
> worker process.
> And max_worker_processes controls the latter.

Right. I think that's probably the direction we should be going eventually. Personally I don't think such a change should block the logical replication work from proceeding with bgworkers, though. It's been delayed a long time, a lot of people want it, and I think we need to focus on meeting the core requirements, not on getting sidetracked by minor points.

Of course, everyone's idea of what's core and what's a minor sidetrack differs ;)


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Logical Replication WIP

From: Petr Jelinek
On 09/08/16 12:16, Craig Ringer wrote:
> On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com
> <mailto:sawada.mshk@gmail.com>> wrote:
>
>
>     > Sure, you can go deeper down the rabbit hole here and say that we need to
>     > add bgworker "categories" with reserved pools of worker slots for each
>     > category. But do we really need that?
>
>     If we change these processes to bgworker, we can categorize them into
>     two, auxiliary process(check pointer and  wal sender etc) and other
>     worker process.
>     And max_worker_processes controls the latter.
>
>
> Right. I think that's probably the direction we should be going
> eventually. Personally I don't think such a change should block the
> logical replication work from proceeding with bgworkers, though. It's
> been delayed a long time, a lot of people want it, and I think we need
> to focus on meeting the core requirements not getting too sidetracked on
> minor points.
>
> Of course, everyone's idea of what's core and what's a minor sidetrack
> differs ;)
>

Yeah, that's why I added a local max GUC that just handles the logical 
worker limit within max_worker_processes. I didn't want to also write a 
generic framework for managing the max workers using tags or something 
as part of this; it's big enough as it is, and we can always move the 
limit to a more generic place once we have it.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From: Alvaro Herrera
Petr Jelinek wrote:
> On 09/08/16 12:16, Craig Ringer wrote:

> >Right. I think that's probably the direction we should be going
> >eventually. Personally I don't think such a change should block the
> >logical replication work from proceeding with bgworkers, though.
> 
> Yeah that's why I added local max GUC that just handles the logical worker
> limit within the max_worker_processes. I didn't want to also write generic
> framework for managing the max workers using tags or something as part of
> this, it's big enough as it is and we can always move the limit to the more
> generic place once we have it.

Parallel query does exactly that: the workers are allocated from the
bgworkers array, and if you want more, it's on you to increase that
limit (it doesn't even have a GUC for a maximum).  As far as logical
replication and parallel query are concerned, that's fine.  We can
improve this later if it proves to be a problem.

I think there are far more pressing matters to review.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From: Alvaro Herrera
Petr Jelinek wrote:
> On 09/08/16 10:13, Craig Ringer wrote:

> >The only argument I can see for not using bgworkers is for the
> >supervisor worker. It's a singleton that launches the per-database
> >workers, and arguably is a job that the postmaster could do better. The
> >current design there stems from its origins as an extension. Maybe
> >worker management could be simplified a bit as a result. I'd really
> >rather not invent yet another new and mostly duplicate category of
> >custom workers to achieve that though.
> 
> It is simplified compared to pglogical (there is only 2 worker types not 3).
> I don't think it's job of postmaster to scan catalogs however so it can't
> really start workers for logical replication. I actually modeled it more
> after autovacuum (using bgworkers though) than the original extension.

Yeah, it's a very bad idea to put the postmaster on this task.  We
should definitely stay away from that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From: Stas Kelvich
> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> as promised here is WIP version of logical replication patch.

Great!

The proposed DDL for publications/subscriptions looks very nice to me.

Some notes and thoughts about the patch:


* Clang grumbles at the following pieces of code:

apply.c:1316:6: warning: variable 'origin_startpos' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]

tablesync.c:436:45: warning: if statement has empty body [-Wempty-body]
                                if (wait_for_sync_status_change(tstate));


* max_logical_replication_workers is mentioned everywhere in the docs,
but guc.c defines a variable called max_logical_replication_processes
for postgresql.conf.

* Since pg_subscription is already shared across the cluster, it could
also be handy to share pg_publication and allow publication of tables
from different databases. That is a rare scenario but quite important
for the virtual hosting use case: tons of small databases in a single
Postgres cluster.

* There is no way to see the tables/schemas attached to a publication
through \drp.

* As far as I understand there is no way to add a table/tablespace
right in CREATE PUBLICATION, and one needs to explicitly run ALTER
PUBLICATION right after creation. Maybe add something like WITH
TABLE/TABLESPACE to CREATE?

* So the binary protocol goes into core. Is it still possible to use it
as a decoding plugin for a manually created walsender? Maybe also
include JSON as it was in pglogical? While I'm not arguing that it
should be done, I'm interested in your opinion on that.

* Also I've noted that you got rid of the reserved byte (flags) in the
protocol compared to pglogical_native. It was very handy for two-phase
tx decoding (0 for a usual commit, 1 for prepare, 2 for commit
prepared), because both prepare and commit prepared generate a commit
record in the xlog.
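The flag values above can be sketched as a tiny commit-message codec
(only the 0/1/2 flag meanings come from the text; the 'C' framing and
field layout are invented for illustration):

```python
import struct

# Sketch of a commit message carrying a reserved flags byte that
# distinguishes plain commit, PREPARE, and COMMIT PREPARED, all of
# which produce a commit record in the WAL. Framing is invented.

FLAG_COMMIT = 0           # usual commit
FLAG_PREPARE = 1          # prepare
FLAG_COMMIT_PREPARED = 2  # commit prepared

def encode_commit(flags, commit_lsn):
    # 'C' = commit message, then flags (uint8) and the LSN (uint64)
    return b"C" + struct.pack("!BQ", flags, commit_lsn)

def decode_commit(msg):
    assert msg[0:1] == b"C"
    flags, lsn = struct.unpack("!BQ", msg[1:10])
    return flags, lsn
```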

> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> - DDL, I see several approaches we could do here for 10.0. a) don't
>   deal with DDL at all yet, b) provide function which pushes the DDL
>   into replication queue and then executes on downstream (like
>   londiste, slony, pglogical do), c) capture the DDL query as text
>   and allow user defined function to be called with that DDL text on
>   the subscriber

* Since the DDL here is mostly ALTER / CREATE / DROP TABLE (or am I
wrong?), maybe we can add something like WITH SUBSCRIBERS to those
statements?

* Talking about the exact mechanism of DDL replication, I like your
variant b), but since we have transactional DDL we can do two-phase
commit here. That will require two-phase decoding and some logic for
catching prepare responses through logical messages. If that approach
sounds interesting, I can describe the proposal in more detail and
create a patch.

* Also I wasn't actually able to run replication itself =) While the
regression tests pass, the TAP tests and a manual run get stuck:
pg_subscription_rel.substate never becomes 'r'. I'll investigate that
more and write again.

* As far as I understand, sync starts automatically on enabling the
publication. Maybe split that logic into a different command with some
options? Like don't sync at all, for example.

* When I try to create a subscription to a non-existent publication,
CREATE SUBSCRIPTION creates a replication slot and does not destroy it:

# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
NOTICE:  created replication slot "sub" on provider
ERROR:  could not receive list of replicated tables from the provider: ERROR:  cache lookup failed for publication 0
CONTEXT:  slot "sub", output plugin "pgoutput", in the list_tables callback

after that:

postgres=# drop subscription sub;
ERROR:  subscription "sub" does not exist
postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
ERROR:  could not crate replication slot "sub": ERROR:  replication slot "sub" already exists

* Also I can't drop the subscription:

postgres=# \drs
                          List of subscriptions
 Name | Database | Enabled | Publication |            Conninfo
------+----------+---------+-------------+--------------------------------
 sub  | postgres | t       | {mypub}     | host=127.0.0.1 dbname=postgres
(1 row)

postgres=# drop subscription sub;
ERROR:  unrecognized object class: 6102

* Several times I've run into a situation where the provider's
postmaster ignores Ctrl-C until the subscribed node is switched off.

* A patch with small typos fixed is attached.

I'll do more testing; just wanted to share what I have so far.




--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Attachment

Re: Logical Replication WIP

From: Petr Jelinek
Hi,

On 11/08/16 13:34, Stas Kelvich wrote:
>
> * max_logical_replication_workers mentioned everywhere in docs, but guc.c defines
> variable called max_logical_replication_processes for postgresql.conf
>

Ah, I changed it in the code but not in the docs; will fix.

> * Since pg_subscription already shared across the cluster, it can be also handy to
> share pg_publications too and allow publication of tables from different databases. That
> is rare scenarios but quite important for virtual hosting use case — tons of small databases
> in a single postgres cluster.

You can't decode changes from multiple databases in one slot, so I 
don't see the usefulness there. pg_subscription is currently shared 
because it's a technical necessity (as in, I don't see any other way to 
solve the launcher's need to access the catalog), not because I think 
it's a great design :)

>
> * There is no way to see attachecd tables/schemas to publication through \drp
>

That's mostly intentional, as the publications for a table are visible 
in \d, but I am not against adding it to \drp.

> * As far as I understand there is no way to add table/tablespace right in CREATE
> PUBLICATION and one need explicitly do ALTER PUBLICATION right after creation.
> May be add something like WITH TABLE/TABLESPACE to CREATE?
>

Yes, as I said to Masahiko Sawada, it's just not there yet, but I plan 
to have that.

> * So binary protocol goes into core. Is it still possible to use it as decoding plugin for
> manually created walsender? May be also include json as it was in pglogical? While
> i’m not arguing that it should be done, i’m interested about your opinion on that.
>

Well, the plugin is a bit more integrated with the publication 
infrastructure, so if somebody wanted to use it directly they'd have to 
use that part as well. OTOH the protocol itself is provided as an API, 
so it's reusable by other plugins if needed.

A JSON plugin is something that would be nice to have in core as well, 
but I don't think it's part of this patch.

> * Also I’ve noted that you got rid of reserved byte (flags) in protocol comparing to
> pglogical_native. It was very handy to use it for two phase tx decoding (0 — usual
> commit, 1 — prepare, 2 — commit prepared), because both prepare and commit
> prepared generates commit record in xlog.

Hmm, maybe the commit message could get it back. PGLogical has them sprinkled 
all around the protocol, which I don't really like, so I want to limit 
them to the places where they are actually useful.

>
>> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>> - DDL, I see several approaches we could do here for 10.0. a) don't
>>   deal with DDL at all yet, b) provide function which pushes the DDL
>>   into replication queue and then executes on downstream (like
>>   londiste, slony, pglogical do), c) capture the DDL query as text
>>   and allow user defined function to be called with that DDL text on
>>   the subscriber
>
> * Since here DDL is mostly ALTER / CREATE / DROP TABLE (or am I wrong?) may be
> we can add something like WITH SUBSCRIBERS to statements?
>

Not sure I follow. How does that help?

> * Talking about exact mechanism of DDL replication I like you variant b), but since we
> have transactional DDL, we can do two phase commit here. That will require two phase
> decoding and some logic about catching prepare responses through logical messages. If that
> approach sounds interesting i can describe proposal in more details and create a patch.
>

I'd think that such an approach is somewhat more interesting with c), 
honestly. The difference between b) and c) is mostly about explicit vs 
implicit. I definitely would like to see the 2PC patch updated to work 
with this. But maybe it's wise to wait a while until the core of the 
patch stabilizes during the discussion.

> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
> that more and write again.

Interesting, please keep me posted. It's possible for tables to stay in 
's' state for some time if there is nothing happening on the server, but 
that should not mean anything is stuck.

>
> * As far as I understand sync starts automatically on enabling publication. May be split that
> logic into a different command with some options? Like don’t sync at all for example.
>

I think SYNC should be an option of subscription creation, just like 
INITIALLY ENABLED/DISABLED is. And then there should be an interface to 
resync a table manually (like pglogical has). Not yet sure what that 
interface should look like in terms of DDL though.
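As a purely illustrative sketch of what such an interface could look like (none of this syntax exists in the patch; the option and command names below are invented):

```sql
-- Hypothetical syntax, invented for illustration only:
CREATE SUBSCRIPTION sub CONNECTION 'host=example dbname=test1'
    PUBLICATION mypub WITH (NOSYNC);      -- skip the initial copy
ALTER SUBSCRIPTION sub RESYNC TABLE a;    -- resync a single table later
```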

> * When I’m trying to create a subscription to a non-existent publication, CREATE SUBSCRIPTION
> creates a replication slot and does not destroy it:
>
> # create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
> NOTICE:  created replication slot "sub" on provider
> ERROR:  could not receive list of replicated tables from the provider: ERROR:  cache lookup failed for publication 0
> CONTEXT:  slot "sub", output plugin "pgoutput", in the list_tables callback
>
> after that:
>
> postgres=# drop subscription sub;
> ERROR:  subscription "sub" does not exist
> postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
> ERROR:  could not crate replication slot "sub": ERROR:  replication slot "sub" already exists
>

See the TODO in CreateSubscription function :)

> * Also can’t drop subscription:
>
> postgres=# \drs
>                           List of subscriptions
>  Name | Database | Enabled | Publication |            Conninfo
> ------+----------+---------+-------------+--------------------------------
>  sub  | postgres | t       | {mypub}     | host=127.0.0.1 dbname=postgres
> (1 row)
>
> postgres=# drop subscription sub;
> ERROR:  unrecognized object class: 6102

Yes that has been already reported.

>
> * Several time i’ve run in a situation where provider's postmaster ignores Ctrl-C until subscribed
> node is switched off.
>

Hmm, I guess there is a bug in the signal processing code somewhere.


> * Patch with small typos fixed attached.
>
> I’ll do more testing, just want to share what i have so far.
>

Thanks for both.

-- 
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Steve Singer
Date:
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
> Hi,
>
> as promised here is WIP version of logical replication patch.
>

Thanks for keeping on this.  This is important work.

> Feedback is welcome.
>

+<sect1 id="logical-replication-publication">
+  <title>Publication</title>
+  <para>
+    A Publication object can be defined on any master node, owned by one
+    user. A Publication is a set of changes generated from a group of
+    tables, and might also be described as a Change Set or Replication Set.
+    Each Publication exists in only one database.

'A publication object can be defined on *any master node*'.  I found 
this confusing the first time I read it because I thought it was 
circular (what makes a node a 'master' node? Having a publication object 
published from it?).   On reflection I realized that you mean ' any 
*physical replication master*'.  I think this might be better worded as 
'A publication object can be defined on any node other than a standby 
node'.  I think referring to 'master' in the context of logical 
replication might confuse people.

I am raising this in the context of the larger terminology that we want 
to use and potential confusion with the terminology we use for physical 
replication. I like the publication / subscription terminology you've 
gone with.

 <para>
+    Publications are different from table schema and do not affect
+    how the table is accessed. Each table can be added to multiple
+    Publications if needed.  Publications may include both tables
+    and materialized views. Objects must be added explicitly, except
+    when a Publication is created for "ALL TABLES". There is no
+    default name for a Publication which specifies all tables.
+  </para>
+  <para>
+    The Publication is different from table schema, it does not affect
+    how the table is accessed and each table can be added to multiple

Those 2 paragraphs seem to start the same way.  I get the feeling that 
there is some point you're trying to express that I'm not catching onto. 
Of course a publication is different from a table's schema, or different 
from a function.

The definition of publication you have on the CREATE PUBLICATION page 
seems better and should be repeated here (A publication is essentially a 
group of tables intended for managing logical replication. See Section 
30.1 for details about how publications fit into the logical replication 
setup.)


+  <para>
+    Conflicts happen when the replicated changes is breaking any
+    specified constraints (with the exception of foreign keys which are
+    not checked). Currently conflicts are not resolved automatically and
+    cause replication to be stopped with an error until the conflict is
+    manually resolved.

What options are there for manually resolving conflicts?  Is the only 
option to change the data on the subscriber to avoid the conflict?
I assume there isn't a way to flag a particular row coming from the 
publisher and say ignore it.  I don't think this is something we need to 
support for the first version.

<sect1 id="logical-replication-architecture">
+  <title>Architecture</title>
+  <para>
+    Logical replication starts by copying a snapshot of the data on
+    the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above; do you intend to update that to 
'Publisher', or does provider mean something different? If we like the 
'publication' terminology then I think 'publishers' should publish them, 
not providers.


I'm trying to test a basic subscription. I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key,b text);
create publication testpub1; alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster2
create database test1;
create table a(id serial8 primary key,b text);
create subscription testsub2 publication testpub1 connection 
'host=localhost port=5440 dbname=test1';
NOTICE:  created replication slot "testsub2" on provider
NOTICE:  synchronized table states
CREATE SUBSCRIPTION

This resulted in
LOG:  logical decoding found consistent point at 0/15625E0
DETAIL:  There are no running transactions.
LOG:  exported logical decoding snapshot: "00000494-1" with 0 
transaction IDs
LOG:  logical replication apply for subscription testsub2 started
LOG:  starting logical decoding for slot "testsub2"
DETAIL:  streaming transactions committing after 0/1562618, reading WAL 
from 0/15625E0
LOG:  logical decoding found consistent point at 0/15625E0
DETAIL:  There are no running transactions.
LOG:  logical replication sync for subscription testsub2, table a started
LOG:  logical decoding found consistent point at 0/1562640
DETAIL:  There are no running transactions.
LOG:  exported logical decoding snapshot: "00000495-1" with 0 
transaction IDs
LOG:  logical replication synchronization worker finished processing


The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated.

I had the following output

LOG:  terminating walsender process due to replication timeout


On cluster 1 I do

select * FROM pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
(0 rows)



If I then kill  the cluster2 postmaster, I have to do a -9 or it won't die

I get

LOG:  worker process: logical replication worker 16396 sync 16387 (PID 
3677) exited with exit code 1
WARNING:  could not launch logical replication worker
LOG:  logical replication sync for subscription testsub2, table a started
ERROR:  replication slot "testsub2_sync_a" does not exist
ERROR:  could not start WAL streaming: ERROR:  replication slot 
"testsub2_sync_a" does not exist

I'm not really sure what I need to do to debug this, I suspect the 
worker on cluster2 is having some issue.




> [1] 
>
https://www.postgresql.org/message-id/flat/CANP8%2Bj%2BNMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_%3D-HA%40mail.gmail.com#CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com
>
>
>




Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 13/08/16 17:34, Steve Singer wrote:
> On 08/05/2016 11:00 AM, Petr Jelinek wrote:
>> Hi,
>>
>> as promised here is WIP version of logical replication patch.
>>
>
> Thanks for keeping on this.  This is important work
>
>> Feedback is welcome.
>>
>
> +<sect1 id="logical-replication-publication">
> +  <title>Publication</title>
> +  <para>
> +    A Publication object can be defined on any master node, owned by one
> +    user. A Publication is a set of changes generated from a group of
> +    tables, and might also be described as a Change Set or Replication
> Set.
> +    Each Publication exists in only one database.
>
> 'A publication object can be defined on *any master node*'.  I found
> this confusing the first time I read it because I thought it was
> circular (what makes a node a 'master' node? Having a publication object
> published from it?).   On reflection I realized that you mean ' any
> *physical replication master*'.  I think this might be better worded as
> 'A publication object can be defined on any node other than a standby
> node'.  I think referring to 'master' in the context of logical
> replication might confuse people.

Makes sense to me.

>
> I am raising this in the context of the larger terminology that we want
> to use and potential confusion with the terminology we use for physical
> replication. I like the publication / subscription terminology you've
> gone with.
>
>
>  <para>
> +    Publications are different from table schema and do not affect
> +    how the table is accessed. Each table can be added to multiple
> +    Publications if needed.  Publications may include both tables
> +    and materialized views. Objects must be added explicitly, except
> +    when a Publication is created for "ALL TABLES". There is no
> +    default name for a Publication which specifies all tables.
> +  </para>
> +  <para>
> +    The Publication is different from table schema, it does not affect
> +    how the table is accessed and each table can be added to multiple
>
> Those 2 paragraphs seem to start the same way.  I get the feeling that
> there is some point you're trying to express that I'm not catching onto.
> Of course a publication is different from a table's schema, or different
> from a function.

Ah, that's a relic of some editorialization, will fix. The reason we 
think it's important to mention the difference between publication and 
schema is that they are the only objects that contain tables, but they 
affect them in very different ways, which might confuse users.

>
> The definition of publication you have on the CREATE PUBLICATION page
> seems better and should be repeated here (A publication is essentially a
> group of tables intended for managing logical replication. See Section
> 30.1 <cid:part1.06040100.08080900@ssinger.info> for details about how
> publications fit into logical replication setup. )
>
>
> +  <para>
> +    Conflicts happen when the replicated changes is breaking any
> +    specified constraints (with the exception of foreign keys which are
> +    not checked). Currently conflicts are not resolved automatically and
> +    cause replication to be stopped with an error until the conflict is
> +    manually resolved.
>
> What options are there for manually resolving conflicts?  Is the only
> option to change the data on the subscriber to avoid the conflict?
> I assume there isn't a way to flag a particular row coming from the
> publisher and say ignore it.  I don't think this is something we need to
> support for the first version.

Yes, you have to update the data on the subscriber or skip the replication of 
the whole transaction (for which the UI is not very friendly currently, as 
you either have to consume the transaction using 
pg_logical_slot_get_binary_changes or move the origin on the subscriber 
using pg_replication_origin_advance).
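Concretely, the second option could look roughly like this on the subscriber. pg_replication_origin_advance() and pg_replication_origin_status are core facilities, but the origin name ('sub' here, assumed to match the subscription's slot name) and the target LSN are placeholders that must come from your own setup, and the LSN must point past the end of the offending transaction:

```sql
-- Find the origin and its current replay position (names are placeholders):
SELECT external_id, remote_lsn FROM pg_replication_origin_status;

-- Skip the conflicting transaction by advancing the origin past its
-- commit LSN (the LSN below is a placeholder):
SELECT pg_replication_origin_advance('sub', '0/1562618');
```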

It's relatively easy to add some automatic conflict resolution as well, 
but it didn't seem absolutely necessary so I didn't do it for the 
initial version.

>
> <sect1 id="logical-replication-architecture">
> +  <title>Architecture</title>
> +  <para>
> +    Logical replication starts by copying a snapshot of the data on
> +    the Provider database. Once that is done, the changes on Provider
>
> I notice the use of 'Provider' above; do you intend to update that to
> 'Publisher', or does provider mean something different? If we like the
> 'publication' terminology then I think 'publishers' should publish them,
> not providers.
>

Okay, I am just used to 'provider' in general (a londiste habit, I guess), 
but 'publisher' is fine as well.

>
> I'm trying to test a basic subscription. I did the following:
>
> cluster 1:
> create database test1;
> create table a(id serial8 primary key,b text);
> create publication testpub1;
>  alter publication testpub1 add table a;
> insert into a(b) values ('1');
>
> cluster2
> create database test1;
> create table a(id serial8 primary key,b text);
> create subscription testsub2 publication testpub1 connection
> 'host=localhost port=5440 dbname=test1';
> NOTICE:  created replication slot "testsub2" on provider
> NOTICE:  synchronized table states
> CREATE SUBSCRIPTION
>
>  [...]
>
> The initial sync completed okay, then I did
>
> insert into a(b) values ('2');
>
> but the second insert never replicated.
>
> I had the following output
>
> LOG:  terminating walsender process due to replication timeout
>
>
> On cluster 1 I do
>
> select * FROM pg_stat_replication;
>  pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
> -----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
> (0 rows)
>
>
>
> If I then kill  the cluster2 postmaster, I have to do a -9 or it won't die
>

That might explain why it didn't replicate. The wait loops in the apply 
worker clearly need some work. Thanks for the report.

-- 
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Stas Kelvich
Date:
> On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>>
>> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
>> that more and write again.
>
> Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing
> happening on the server, but that should not mean anything is stuck.

I played around a bit; it seems that the apply worker waits forever for the substate change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10
    frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 51 at latch.c:1108
    frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 248 at latch.c:941
    frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000) + 254 at latch.c:347
    frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302
  * frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228
    frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430 at tablesync.c:436
    frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518
    frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122
    frame #9: 0x000000010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
    frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
    frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
    frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
    frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
    frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
    frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
    frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
    frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
    frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
    frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = 'c'

Also I’ve noted that some lsn positions look wrong on the publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
 restart_lsn | confirmed_flush_lsn
-------------+---------------------
 0/1530EF8   | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
 sent_location | write_location | flush_location | replay_location
---------------+----------------+----------------+-----------------
 0/1530F30     | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
(1 row)



--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 15/08/16 15:51, Stas Kelvich wrote:
>> On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>>>
>>> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
>>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
>>> that more and write again.
>>
>> Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing
>> happening on the server, but that should not mean anything is stuck.
>
> I played around a bit; it seems that the apply worker waits forever for the substate change.
>
> (lldb) bt
> * thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>     frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10
>     frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 51 at latch.c:1108
>     frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 248 at latch.c:941
>     frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000) + 254 at latch.c:347
>     frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302
>   * frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228
>     frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430 at tablesync.c:436
>     frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518
>     frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122
>     frame #9: 0x000000010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
>     frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
>     frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
>     frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
>     frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
>     frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
>     frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
>     frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
>     frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
>     frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
>     frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
> (lldb) p state
> (char) $1 = 'c'
> (lldb) p tstate->state
> (char) $2 = 'c'
>

Hmm, not sure why that is; it might be related to the lsn being reported 
wrong. Could you check what the lsn is there (either in tstate or in 
pg_subscription_rel)? Especially in comparison with what the 
sent_location is.
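For reference, something like this should show the values in question (pg_subscription_rel is the WIP catalog from this patch, so `SELECT *` is used here to avoid guessing at its column names):

```sql
-- On the subscriber: per-table sync state and lsn from the WIP catalog.
SELECT * FROM pg_subscription_rel;

-- On the publisher: what has actually been sent.
SELECT sent_location FROM pg_stat_replication;
```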

> Also I’ve noted that some lsn position looks wrong on publisher:
>
> postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
>  restart_lsn | confirmed_flush_lsn
> -------------+---------------------
>  0/1530EF8   | 7FFF/5E7F6A30
> (1 row)
>
> postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
>  sent_location | write_location | flush_location | replay_location
> ---------------+----------------+----------------+-----------------
>  0/1530F30     | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
> (1 row)
>

That's most likely the result of the uninitialized origin_startpos warning. I 
am working on a new version of the patch where that part is fixed; if you want 
to check this before I send it in, the fix looks like this:

diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c
index 581299e..7a9e775 100644
--- a/src/backend/replication/logical/apply.c
+++ b/src/backend/replication/logical/apply.c
@@ -1353,6 +1353,7 @@ ApplyWorkerMain(Datum main_arg)
                originid = replorigin_by_name(myslotname, false);
                replorigin_session_setup(originid);
                replorigin_session_origin = originid;
+               origin_startpos = replorigin_session_get_progress(false);
                CommitTransactionCommand();

                wrcapi->connect(wrchandle, MySubscription->conninfo, true,


-- 
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi all,

attaching updated version of the patch. Still very much WIP but it's
slowly getting there.

Changes since last time:
- Mostly rewrote publication handling in pgoutput which brings a)
ability to add FOR ALL TABLES publications, b) performs better (no need
to syscache lookup for every change like before), c) does correct
invalidation of publications on DDL
- added FOR TABLE and FOR ALL TABLES clause to both CREATE PUBLICATION
and ALTER PUBLICATION so that one can create publication directly with
table list, the FOR TABLE in ALTER PUBLICATION behaves like SET
operation (removes existing, adds new ones)
- fixed several issues with initial table synchronization (most of which
have been reported here)
- added pg_stat_subscription monitoring view
- updated docs to reflect all the changes, also removed the stuff that's
only planned from the docs (there is copy of the planned stuff docs in
the neighboring thread so no need to keep it in the patch)
- added documentation improvements suggested by Steve Singer and removed
the capitalization in the main doc
- added pg_dump support
- improved psql support (\drp+ shows list of tables)
- added flags to COMMIT message in the protocol so that we can add 2PC
support in the future
- fixed DROP SUBSCRIPTION issues and added tests for it

I decided to not deal with ACLs so far, assuming superuser/replication
role for now. We can always make it less restrictive later by adding the
grantable privileges.

FDW support is still TODO. I think TRUNCATE will have to be solved as
part of other DDL in the future. I do have some ideas what to do with
DDL but I don't plan to implement them in the initial patch.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

I found a few bugs and missing docs and fixed those; here is an updated
version of the patch.

No changes in terms of features.


On 20/08/16 19:24, Petr Jelinek wrote:
> Hi all,
>
> attaching updated version of the patch. Still very much WIP but it's
> slowly getting there.
>
> Changes since last time:
> - Mostly rewrote publication handling in pgoutput which brings a)
> ability to add FOR ALL TABLES publications, b) performs better (no need
> to syscache lookup for every change like before), c) does correct
> invalidation of publications on DDL
> - added FOR TABLE and FOR ALL TABLES clause to both CREATE PUBLICATION
> and ALTER PUBLICATION so that one can create publication directly with
> table list, the FOR TABLE in ALTER PUBLICATION behaves like SET
> operation (removes existing, adds new ones)
> - fixed several issues with initial table synchronization (most of which
> have been reported here)
> - added pg_stat_subscription monitoring view
> - updated docs to reflect all the changes, also removed the stuff that's
> only planned from the docs (there is copy of the planned stuff docs in
> the neighboring thread so no need to keep it in the patch)
> - added documentation improvements suggested by Steve Singer and removed
> the capitalization in the main doc
> - added pg_dump support
> - improved psql support (\drp+ shows list of tables)
> - added flags to COMMIT message in the protocol so that we can add 2PC
> support in the future
> - fixed DROP SUBSCRIPTION issues and added tests for it
>
> I decided to not deal with ACLs so far, assuming superuser/replication
> role for now. We can always make it less restrictive later by adding the
> grantable privileges.
>

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

and one more version with bug fixes, improved code docs and a couple more
tests, some general cleanup, and also rebased on current master for the
start of the CF.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-08-31 22:51, Petr Jelinek wrote:
> Hi,
>
> and one more version with bug fixes, improved code docs and couple
> more tests, some general cleanup and also rebased on current master
> for the start of CF.

Clear, well-written docs, thanks.

Here are some small changes to  logical-replication.sgml

Erik Rijkers
Attachment

Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-09-01 01:04, Erik Rijkers wrote:
> On 2016-08-31 22:51, Petr Jelinek wrote:
>
> Here are some small changes to  logical-replication.sgml

...  and other .sgml files.

Erik Rijkers
Attachment

Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Review of 0001-Add-PUBLICATION-catalogs-and-DDL.patch:

The new system catalog pg_publication_rel has columns pubid, relid,
and does not use the customary column name prefixes.  Maybe that is OK
here.  I can't actually think of a naming scheme that wouldn't make
things worse.

The hunk in patch 0006 for
src/backend/replication/logical/publication.c needs to be moved to
0001, for the definition of GetPublicationRelations().

The catalog column puballtables is not mentioned in the documentation.

Unrelated formatting changes in src/backend/commands/Makefile.

In psql, the code psql_error("The server (version %d.%d) does not
...") should be updated to use the new formatPGVersionNumber()
function.

In psql, psql \dr is already for "roles" (\drds).  You are adding \drp
for publications.  Maybe use big R for replication-related describes?

There should be some documentation about how TRUNCATE commands are
handled by publications.  Patch 0005 mentions TRUNCATE in the general
documentation, but I would have questions when reading the CREATE
PUBLICATION reference page.

Also, document how publications deal with INSERT ON CONFLICT.

In some places, the new publication object type is just added to the
end of a list instead of some alphabetical place, e.g.,
event_trigger.c, gram.y (drop_type).

publication.h has /* true if inserts are replicated */ repeated
several times.

What are the BKI_ROWTYPE_OID assignments for?  Are they necessary
here?  (Maybe this was just copied from pg_subscription?)

I think some or all of replication/logical/publication.c should be
catalog/pg_publication.c.  There are various different precedents in
how this can be split up, but I kind of like having command/foocmds.c
call into catalog/pg_foo.c.

Also, some things could be in lsyscache.c, although not too many new
things go in there now.

Most calls of the GetPublication() function could be changed to a
simpler get_publication_name(Oid), because that is all it is used for
so far.  (It will be used later in 0003, but only in one specific
case.)

In get_object_address_publication_rel() you are calling
ObjectAddressSet(address, UserMappingRelationId, InvalidOid).  That is
probably a typo.

Also, document somewhere around get_object_address_publication_rel()
what objname (relation) and objargs (publication) are, otherwise one
has to guess.  (Existing similar functions are also not good about that.)

The code for OCLASS_PUBLICATION_REL in getObjectIdentityParts() does
not fill in objname and objargs, as it is supposed to.

If I add a table to a publication, it requires a primary key.  But
after the table is added, I can remove the primary key.  There is code
in publication_add_relation() to record dependencies for that, but it
doesn't seem to do its job right.

Relatedly, the error messages in check_publication_add_relation() and
AlterPublicationOptions() conflate replica identity index and primary
key.  (I suppose the whole user-facing presentation of what replica
identity indexes are, which have so far been a rather obscure feature,
will need some polishing during this.)

I think the syntax could be made prettier.  For example, instead of
   CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;

how about something like
   CREATE PUBLICATION foo (REPLICATE DELETE, NO REPLICATE UPDATE);

Not that important right now, but something to keep in mind.

I also found ALTER PUBLICATION FOR TABLE / FOR ALL TABLES confusing.
Maybe that should be SET TABLE or something.

Finally, I'd like some more test coverage of DDL error cases, like
adding a view to a publication, trying to drop a primary key (as per
above), and so on.
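For instance, such error-case tests might look roughly like this (hypothetical object names; the exact error messages are up to the patch):

```sql
-- publishing a view should be rejected, since only tables are supported
CREATE VIEW v_test AS SELECT 1 AS a;
CREATE PUBLICATION errpub FOR TABLE v_test;       -- expected to fail

-- dropping the primary key of a published table should be prevented
-- (or at least handled) via the recorded dependency
CREATE TABLE t_test (id int PRIMARY KEY);
CREATE PUBLICATION errpub2 FOR TABLE t_test;
ALTER TABLE t_test DROP CONSTRAINT t_test_pkey;   -- expected to fail
```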

(Various small typos and such I didn't bother with at this time.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 02/09/16 22:57, Peter Eisentraut wrote:
> Review of 0001-Add-PUBLICATION-catalogs-and-DDL.patch:
>

Thanks!

> The new system catalog pg_publication_rel has columns pubid, relid,
> and does not use the customary column name prefixes.  Maybe that is OK
> here.  I can't actually think of a naming scheme that wouldn't make
> things worse.
>

Yeah, well, I could not either, and there are some catalogs that don't use 
the prefixes, so I figured it's probably not a big deal.


> In psql, the code psql_error("The server (version %d.%d) does not
> ...") should be updated to use the new formatPGVersionNumber()
> function.
>

Right, same thing will be in the 2nd patch.

> In psql, psql \dr is already for "roles" (\drds).  You are adding \drp
> for publications.  Maybe use big R for replication-related describes?
>

Seems reasonable.

> There should be some documentation about how TRUNCATE commands are
> handled by publications.  Patch 0005 mentions TRUNCATE in the general
> documentation, but I would have questions when reading the CREATE
> PUBLICATION reference page.
>

That's actually a bug in the 0005 patch; TRUNCATE is not handled ATM, but 
that should probably be documented as well.

> Also, document how publications deal with INSERT ON CONFLICT.
>

Okay, they just replicate whatever was the result of that operation (if 
any).
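A sketch of what that means in practice (assuming a table a(id int primary key, b text)):

```sql
-- whatever the command actually does is what gets decoded and replicated:
INSERT INTO a (id, b) VALUES (1, 'x')
    ON CONFLICT (id) DO UPDATE SET b = excluded.b;
-- if id=1 already existed, the subscriber receives an UPDATE;
-- otherwise it receives an INSERT.  With DO NOTHING on a conflict,
-- nothing is replicated at all ("if any").
```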

> In some places, the new publication object type is just added to the
> end of a list instead of some alphabetical place, e.g.,
> event_trigger.c, gram.y (drop_type).

Hmm, what is and what isn't alphabetically sorted is quite unclear to 
me, as we have a mix of both everywhere. For example, if you consider 
drop_type to be alphabetically sorted, then our locales are much more 
different than I thought :)


> What are the BKI_ROWTYPE_OID assignments for?  Are they necessary
> here?  (Maybe this was just copied from pg_subscription?)
>

Yes they are.

> I think some or all of replication/logical/publication.c should be
> catalog/pg_publication.c.  There are various different precedents in
> how this can be split up, but I kind of like having command/foocmds.c
> call into catalog/pg_foo.c.
>

Okay, I prefer grouping the code by functionality (as in, "this is 
replication") rather than by architecture (as in, "this is catalog code"), 
but no problem moving it. Again, the same applies to the 2nd patch.

> Also, some things could be in lsyscache.c, although not too many new
> things go in there now.

TBH I dislike the whole lsyscache concept of just random lookup 
functions piled in one huge module and would rather not add to it.

>
> Most calls of the GetPublication() function could be changed to a
> simpler get_publication_name(Oid), because that is all it is used for
> so far.  (It will be used later in 0003, but only in one specific
> case.)

You mean the calls from objectaddress? Will change that - I actually 
added the get_publication_name much later in the development and didn't 
go back to use it in preexisting code.


> If I add a table to a publication, it requires a primary key.  But
> after the table is added, I can remove the primary key.  There is code
> in publication_add_relation() to record dependencies for that, but it
> doesn't seem to do its job right.
>

I need to rewrite that part. That's actually something I could use other 
people's opinions on - currently pg_publication_rel does not have 
records for the "all tables" publication, as that seemed redundant, so it 
will need some special handling in tablecmds.c for the "dependency" 
tracking and possibly elsewhere for other things. I do wonder, though, if 
we should instead just add records to the pg_publication_rel catalog.

> Relatedly, the error messages in check_publication_add_relation() and
> AlterPublicationOptions() conflate replica identity index and primary
> key.  (I suppose the whole user-facing presentation of what replica
> identity indexes are, which have so far been a rather obscure feature,
> will need some polishing during this.)
>

Those are copy/paste issues from pglogical. It should say replica 
identity index everywhere. But yes, it might be necessary to make it more 
obvious what replica identity indexes are.
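For reference, the replica identity need not be the primary key at all; since 9.4 it can be pointed at any suitable unique index, which is one reason the error wording matters:

```sql
-- a unique, non-partial index over NOT NULL columns can serve as the
-- replica identity instead of the primary key
CREATE TABLE t (id int NOT NULL, payload text);
CREATE UNIQUE INDEX t_id_idx ON t (id);
ALTER TABLE t REPLICA IDENTITY USING INDEX t_id_idx;
```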

> I think the syntax could be made prettier.  For example, instead of
>
>     CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete
> noreplicate_update;
>
> how about something like
>
>     CREATE PUBLICATION foo (REPLICATE DELETE, NO REPLICATE UPDATE);
>

I went with the same syntax style as CREATE ROLE, but I am open to changes.

> I also found ALTER PUBLICATION FOR TABLE / FOR ALL TABLES confusing.
> Maybe that should be SET TABLE or something.
>

Yeah, I am not sure what the best option is there. SET was also what I 
was thinking, but then it does not map well to the CREATE PUBLICATION 
syntax, and I would like to have some harmony there.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 02/09/16 22:57, Peter Eisentraut wrote:
>> The new system catalog pg_publication_rel has columns pubid, relid,
>> and does not use the customary column name prefixes.  Maybe that is OK
>> here.  I can't actually think of a naming scheme that wouldn't make
>> things worse.

> Yeah, well I could not either and there are some catalogs that don't use 
> the prefixes so I figured it's probably not a big deal.

The ones that don't are not models to be emulated.  They are cases
where somebody ignored project convention and it wasn't caught until
too late.
        regards, tom lane



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 03/09/16 18:04, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 02/09/16 22:57, Peter Eisentraut wrote:
>>> The new system catalog pg_publication_rel has columns pubid, relid,
>>> and does not use the customary column name prefixes.  Maybe that is OK
>>> here.  I can't actually think of a naming scheme that wouldn't make
>>> things worse.
>
>> Yeah, well I could not either and there are some catalogs that don't use
>> the prefixes so I figured it's probably not a big deal.
>
> The ones that don't are not models to be emulated.  They are cases
> where somebody ignored project convention and it wasn't caught until
> too late.
>

Okay, but if I follow the convention, the names of those fields would be 
something like pubrelpubid and pubrelrelid, which does not seem like an 
improvement to me. Maybe the catalog should be pg_publication_map then, 
as that would make the names seem less ugly, although it's less 
future-proof (as we'll want to add more things to publications than just 
tables, and they might need different catalogs).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Steve Singer
Date:
On 08/31/2016 04:51 PM, Petr Jelinek wrote:
> Hi,
>
> and one more version with bug fixes, improved code docs and couple 
> more tests, some general cleanup and also rebased on current master 
> for the start of CF.
>
>
>

To get the 'subscription' TAP tests to pass I need to set

export PGTZ=+02

Shouldn't the expected output be with reference to PST8PDT?





Re: Logical Replication WIP

From
Steve Singer
Date:
On 09/05/2016 03:58 PM, Steve Singer wrote:
> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>> Hi,
>>
>> and one more version with bug fixes, improved code docs and couple 
>> more tests, some general cleanup and also rebased on current master 
>> for the start of CF.
>>
>>
>>
>

A few more things I noticed when playing with the patches

1. Creating a subscription to yourself ends pretty badly:
the 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill 
it.  The background process seems to be waiting for a transaction to 
commit (I assume the CREATE SUBSCRIPTION's own transaction).  I had to 
kill -9 the various processes to get things to stop.  Getting confused 
about hostnames and ports is a common operator error.

2. Failures during the initial subscription aren't recoverable

For example

on db1
  create table a(id serial4 primary key,b text);
  insert into a(b) values ('1');
  create publication testpub for table a;

on db2
  create table a(id serial4 primary key,b text);
  insert into a(b) values ('1');
  create subscription testsub connection 'host=localhost port=5440
dbname=test' publication testpub;

I then get in my db2 log

ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(1) already exists.
LOG:  worker process: logical replication worker 16396 sync 16387 (PID 
10583) exited with exit code 1
LOG:  logical replication sync for subscription testsub, table a started
ERROR:  could not crate replication slot "testsub_sync_a": ERROR: 
replication slot "testsub_sync_a" already exists


LOG:  worker process: logical replication worker 16396 sync 16387 (PID 
10585) exited with exit code 1
LOG:  logical replication sync for subscription testsub, table a started
ERROR:  could not crate replication slot "testsub_sync_a": ERROR: 
replication slot "testsub_sync_a" already exists


and it keeps looping.
If I then truncate "a" on db2 it doesn't help. (I'd expect at that point 
the initial subscription to work)

If I then do on db2 drop subscription testsub cascade;

I still see a slot in use on db1

select * FROM pg_replication_slots ;

   slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40







Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 05/09/16 21:58, Steve Singer wrote:
> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>> Hi,
>>
>> and one more version with bug fixes, improved code docs and couple
>> more tests, some general cleanup and also rebased on current master
>> for the start of CF.
>>
>>
>>
>
> To get the 'subscription' TAP tests to pass I need to set
>
> export PGTZ=+02
>
> Shouldn't the expected output be with reference to PST8PDT?
>

That would break it for other timezones; the expected output should be 
whatever works for everybody. I think the connection just needs to 
set the timezone so that it's stable across environments.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 05/09/16 23:35, Steve Singer wrote:
> On 09/05/2016 03:58 PM, Steve Singer wrote:
>> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>>> Hi,
>>>
>>> and one more version with bug fixes, improved code docs and couple
>>> more tests, some general cleanup and also rebased on current master
>>> for the start of CF.
>>>
>>>
>>>
>>
>
> A few more things I noticed when playing with the patches
>
> 1, Creating a subscription to yourself ends pretty badly,
> the 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill
> it.  The background process seems to be waiting for a transaction to
> commit (I assume the create subscription command).  I had to kill -9 the
> various processes to get things to stop.  Getting confused about
> hostnames and ports is a common operator error.
>

Hmm, I guess there is a missing interrupt check; will look. It would be 
great to detect it properly, but I am not really sure how to do that, as 
AFAIK there is no accurate way to detect that the connection is to yourself.
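One heuristic that comes to mind (just an idea, not something the patch does): compare the remote system identifier with the local one, e.g. via

```sql
-- available since 9.6; the same system_identifier (together with the
-- same database) strongly suggests a subscription pointing at itself,
-- though a cloned standby would share the identifier too, so this can
-- only ever be a heuristic
SELECT system_identifier FROM pg_control_system();
```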

> 2. Failures during the initial subscription  aren't recoverable
>
> For example
>
> on db1
>   create table a(id serial4 primary key,b text);
>   insert into a(b) values ('1');
>   create publication testpub for table a;
>
> on db2
>   create table a(id serial4 primary key,b text);
>   insert into a(b) values ('1');
>   create subscription testsub connection 'host=localhost port=5440
> dbname=test' publication testpub;
>
> I then get in my db2 log
>
> ERROR:  duplicate key value violates unique constraint "a_pkey"
> DETAIL:  Key (id)=(1) already exists.
> LOG:  worker process: logical replication worker 16396 sync 16387 (PID
> 10583) exited with exit code 1
> LOG:  logical replication sync for subscription testsub, table a started
> ERROR:  could not crate replication slot "testsub_sync_a": ERROR:
> replication slot "testsub_sync_a" already exists
>
>
> LOG:  worker process: logical replication worker 16396 sync 16387 (PID
> 10585) exited with exit code 1
> LOG:  logical replication sync for subscription testsub, table a started
> ERROR:  could not crate replication slot "testsub_sync_a": ERROR:
> replication slot "testsub_sync_a" already exists
>
>
> and it keeps looping.
> If I then truncate "a" on db2 it doesn't help. (I'd expect at that point
> the initial subscription to work)

Hmm, looks like the error case does not clean up after itself correctly.

>
> If I then do on db2
>  drop subscription testsub cascade;
>
> I still see a slot in use on db1
>
> select * FROM pg_replication_slots ;
>
>    slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> ----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
>  testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40
>

Same as above.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 01/09/16 08:29, Erik Rijkers wrote:
> On 2016-09-01 01:04, Erik Rijkers wrote:
>> On 2016-08-31 22:51, Petr Jelinek wrote:
>>
>> Here are some small changes to  logical-replication.sgml
>
> ...  and other .sgml files.

Thanks, I'll integrate these into the next iteration of the patch.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 9/3/16 5:14 AM, Petr Jelinek wrote:
>> What are the BKI_ROWTYPE_OID assignments for?  Are they necessary
>> > here?  (Maybe this was just copied from pg_subscription?)
>> >
> Yes they are.

Please explain/document why.  It does not match other catalogs, which
either use it for relcache initialization or because they are shared
catalogs.  (I'm not sure of the details, but this one clearly looks
different.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 06/09/16 20:14, Peter Eisentraut wrote:
> On 9/3/16 5:14 AM, Petr Jelinek wrote:
>>> What are the BKI_ROWTYPE_OID assignments for?  Are they necessary
>>>> here?  (Maybe this was just copied from pg_subscription?)
>>>>
>> Yes they are.
>
> Please explain/document why.  It does not match other catalogs, which
> either use it for relcache initialization or because they are shared
> catalogs.  (I'm not sure of the details, but this one clearly looks
> different.)
>

Erm, I meant yes, they were just copied, and I will remove them. (I see 
how my answer might have been confusing given that you asked multiple 
questions, sorry.)

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:

(As you had already mentioned, some of the review items in 0001 apply
analogously here.)


Changes needed to compile:

--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -218,7 +218,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
    CatalogUpdateIndexes(rel, tup);
    heap_freetuple(tup);
 
-   ObjectAddressSet(myself, SubscriptionRelationId, suboid);
+   ObjectAddressSet(myself, SubscriptionRelationId, subid);
    heap_close(rel, RowExclusiveLock);

This is fixed later in patch 0005.

--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6140,8 +6140,7 @@ <title><structname>pg_subscription</structname> Columns</title>
      <entry><structfield>subpublications</structfield></entry>
      <entry><type>text[]</type></entry>
      <entry></entry>
 
-      <entry>Array of subscribed publication names. For more on publications
-       see <xref linkend="publications">.
+      <entry>Array of subscribed publication names.
       </entry>
      </row>
     </tbody>

I don't see that id defined in any later patch.


Minor problems:

Probably unintentional change in pg_dump.h:

- * The PublicationInfo struct is used to represent publications.
+ * The PublicationInfo struct is used to represent publication.

The pg_subscription column "dbid" should be renamed to "subdbid".

I think subpublications ought to be of type name[], not text[].

It says a subscription can only be dropped by its owner or a
superuser.  But subscriptions don't have owners.  Maybe they should.

On the CREATE SUBSCRIPTION ref page,
   | INITIALLY ( ENABLED | DISABLED )

should use {} instead.

We might want to add ALTER commands to rename subscriptions and
publications.

Similar concerns as before about ALTER syntax, e.g., does ALTER
SUBSCRIPTION ... PUBLICATION add to or replace the publication set?

For that matter, why is there no way to add?

Document why publicationListToArray() creates its own memory context.

I think we should allow creating subscriptions initially without
publications.  This could be useful for example to test connections,
or create slots before later on adding publications.  Seeing that
there is support for changing the publications later, this shouldn't
be a problem.

The synopsis of CREATE SUBSCRIPTION indicates that options are
optional, but it actually requires at least one option.

At the end of CreateSubscription(), the CommandCounterIncrement()
doesn't appear to be necessary (yet, see patch 0005?).

Maybe check for duplicates in the publications list.


Larger conceptual issues:

I haven't read the rest of the code yet to understand why
pg_subscription needs to be a shared catalog, but be that as it may, I
would still make it so that subscription names appear local to the
database.  We already have the database OID in the pg_subscription
catalog, so I would make the key (subname, subdatid).  DDL commands
would only operate on subscriptions in the current database (or you
could use "datname"."subname" syntax), and \drs would only show
subscriptions in the current database.  That way the shared catalog is
an implementation detail that can be changed in the future.  I think
it would be very helpful for users if publications and subscriptions
appear to work in a parallel way.  If I have two databases that I want
to replicate between two servers, I might want to have a publication
"mypub" in each database and a subscription "mysub" in each database.
If I learn that the subscriptions can't be named that way, then I have
to go back to rename to the publications, and it'll all be a bit
frustrating.
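To illustrate, with a (subname, subdatid) key the symmetric setup would just work, using the patch's current syntax (hostnames here are placeholders):

```sql
-- on server A, in each database to be replicated:
CREATE PUBLICATION mypub FOR TABLE t1;

-- on server B, in the corresponding database; the subscription name only
-- has to be unique within the database, so every database can use "mysub"
CREATE SUBSCRIPTION mysub WITH CONNECTION 'host=serverA dbname=db1'
    PUBLICATION mypub;
```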

Some thoughts on pg_dump and such:

Even an INITIALLY DISABLED subscription needs network access to create
the replication slot.  So restoring a dump when the master is not
available will have some difficulties.  And restoring master and slave
at the same time (say disaster recovery) will not necessarily work
well either.  Also, the general idea of doing network access during a
backup restore without obvious prior warning sounds a bit unsafe.

I imagine maybe having three states for subscriptions: DISABLED,
PREPARED, ENABLED (to use existing key words).  DISABLED just exists
in the catalog, PREPARED has the slots set up, ENABLED starts
replicating.  So you can restore a dump with all slots disabled.  And
then it might be good to have a command to "prepare" or "enable" all
subscriptions at once.

That command would also help if you restore a dump not in a
transaction but you want to enable all subscriptions in the same
transaction.
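A sketch of what that three-state lifecycle could look like in DDL (the PREPARE step and the bulk-enable idea are hypothetical, reusing the patch's WITH syntax):

```sql
-- restore time: catalog entry only, no network access
CREATE SUBSCRIPTION mysub WITH CONNECTION 'host=master dbname=test'
    PUBLICATION mypub INITIALLY DISABLED;

-- later: create the remote slot, but do not start applying yet
ALTER SUBSCRIPTION mysub PREPARE;       -- hypothetical

-- finally: start replicating; a bulk form could flip all subscriptions
-- in one transaction after a restore
ALTER SUBSCRIPTION mysub ENABLE;
```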

I'd also prefer having subscriptions dumped by default, just to keep
it so that pg_dump by default backs up everything.

Finally, having disabled subscriptions without network access would
also allow writing some DDL command tests.

As I had mentioned privately before, I would perhaps have CREATE
SUBSCRIPTION use the foreign server infrastructure for storing
connection information.

We'll have to keep thinking about ways to handle abandoned
replication slots.  I imagine that people will want to create
subscriber instances in fully automated ways.  If that fails every so
often and requires manual cleanup of replication slots on the master
some of the time, that will get messy.  I don't have well-formed ideas
about this, though.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 07/09/16 02:56, Peter Eisentraut wrote:
> Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:
>
> Similar concerns as before about ALTER syntax, e.g., does ALTER
> SUBSCRIPTION ... PUBLICATION add to or replace the publication set?
>

It sets (replaces) the publication list.

> For that matter, why is there no way to add?
>

It didn't seem all that useful; the expectation here is that most 
subscriptions will use one or a couple of publications.

>
> I think we should allow creating subscriptions initally without
> publications.  This could be useful for example to test connections,
> or create slots before later on adding publications.  Seeing that
> there is support for changing the publications later, this shouldn't
> be a problem.
>

Sure, but they need to be created disabled then.

>
> Larger conceptual issues:
>
> I haven't read the rest of the code yet to understand why
> pg_subscription needs to be a shared catalog, but be that as it may, I
> would still make it so that subscription names appear local to the
> database.  We already have the database OID in the pg_subscription
> catalog, so I would make the key (subname, subdatid).  DDL commands
> would only operate on subscriptions in the current database (or you
> could use "datname"."subname" syntax), and \drs would only show
> subscriptions in the current database.  That way the shared catalog is
> an implementation detail that can be changed in the future.  I think
> it would be very helpful for users if publications and subscriptions
> appear to work in a parallel way.  If I have two databases that I want
> to replicate between two servers, I might want to have a publication
> "mypub" in each database and a subscription "mysub" in each database.
> If I learn that the subscriptions can't be named that way, then I have
> to go back to rename to the publications, and it'll all be a bit
> frustrating.

Okay, that makes sense. pg_subscription is a shared catalog so that we 
can have one launcher per cluster instead of one per database. Otherwise, 
there is no reason why it could not behave like a local catalog.

>
> Some thoughts on pg_dump and such:
>
> Even an INITIALLY DISABLED subscription needs network access to create
> the replication slot.  So restoring a dump when the master is not
> available will have some difficulties.  And restoring master and slave
> at the same time (say disaster recovery) will not necessarily work
> well either.  Also, the general idea of doing network access during a
> backup restore without obvious prior warning sounds a bit unsafe.
>
> I imagine maybe having three states for subscriptions: DISABLED,
> PREPARED, ENABLED (to use existing key words).  DISABLED just exists
> in the catalog, PREPARED has the slots set up, ENABLED starts
> replicating.  So you can restore a dump with all slots disabled.  And
> then it might be good to have a command to "prepare" or "enable" all
> subscriptions at once.

Well, the DISABLED keyword is also used in ALTER to stop the subscription 
without removing it; that would no longer map well if we used the 
behavior you described. That being said, I agree with the idea of having 
a subscription that exists just locally in the catalog; we just need to 
figure out better naming.

>
> As I had mentioned privately before, I would perhaps have CREATE
> SUBSCRIPTION use the foreign server infrastructure for storing
> connection information.
>

Hmm, yeah, it's an idea. My worry there is that it will make setup a bit 
more complex, as the user will have to first create a server and a user 
mapping before creating a subscription.

> We'll have to keep thinking about ways to handle abandonded
> replication slots.  I imagine that people will want to create
> subscriber instances in fully automated ways.  If that fails every so
> often and requires manual cleanup of replication slots on the master
> some of the time, that will get messy.  I don't have well-formed ideas
> about this, though.
>

Yes, it's a potential issue; I don't have a good solution for it either. 
It's a loosely coupled system, so we can't have 100% control over everything.
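Until something better exists, the manual cleanup on the master would look something like this (slot name taken from Steve's example):

```sql
-- an inactive slot still holds back WAL and catalog_xmin, so stale
-- ones left behind by failed subscribers need to be dropped by hand
SELECT slot_name, active, restart_lsn
  FROM pg_replication_slots
 WHERE NOT active;

SELECT pg_drop_replication_slot('testsub_sync_a');
```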

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-08-31 22:51, Petr Jelinek wrote:
> 
> and one more version with bug fixes, improved code docs and couple


I am not able to get the replication to work. Would you (or anyone) be 
so kind as to point out what I am doing wrong?

Patches applied, compiled, make-checked, installed OK.

I have 2 copies compiled and installed,  logical_replication and 
logical_replication2, to be publisher and subscriber, ports 6972 and 
6973 respectively.

(BTW, there is no postgres user; the OS user is 'aardvark'. 'aardvark' is 
also the db superuser, and it is also the user under which the two 
installations are installed.)

PGPASSFILE is set up and works for both instances.

both pg_hba.conf's changed to have:
local   replication     aardvark                                md5


instances.sh
--------------------------------------------------------------------
#!/bin/sh
project1=logical_replication      # publisher
project2=logical_replication2    # subscriber
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
port1=6972
port2=6973
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1} "

options2="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2} "


# start two instances:
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &

export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
--------------------------------------------------------------------

Both instances run fine.

On the publisher db:
Create a table testt with 20 rows.

CREATE PUBLICATION pub1 FOR TABLE testt ;
No problem.

On the subscriber db:
CREATE SUBSCRIPTION sub1 WITH CONNECTION 'host=/tmp dbname=testdb port=6972'
  PUBLICATION pub1 INITIALLY DISABLED;
ALTER SUBSCRIPTION sub1 ENABLE;

Adding rows to the table (publisher-side) gets activity going. Here are 
the resulting logs from both sides:

Logfile publisher side:
[...]
2016-09-07 13:47:44.287 CEST 21995 LOG:  logical replication launcher 
started
2016-09-07 13:51:42.601 CEST 22141 LOG:  logical decoding found 
consistent point at 0/230F478
2016-09-07 13:51:42.601 CEST 22141 DETAIL:  There are no running 
transactions.
2016-09-07 13:51:42.601 CEST 22141 LOG:  exported logical decoding 
snapshot: "00000702-1" with 0 transaction IDs
2016-09-07 13:52:11.326 CEST 22144 LOG:  starting logical decoding for 
slot "sub1"
2016-09-07 13:52:11.326 CEST 22144 DETAIL:  streaming transactions 
committing after 0/230F4B0, reading WAL from 0/230F478
2016-09-07 13:52:11.326 CEST 22144 LOG:  logical decoding found 
consistent point at 0/230F478
2016-09-07 13:52:11.326 CEST 22144 DETAIL:  There are no running 
transactions.
2016-09-07 13:53:47.012 CEST 22144 LOG:  could not receive data from 
client: Connection reset by peer
2016-09-07 13:53:47.012 CEST 22144 LOG:  unexpected EOF on standby 
connection
2016-09-07 13:53:47.025 CEST 22185 LOG:  starting logical decoding for 
slot "sub1"
2016-09-07 13:53:47.025 CEST 22185 DETAIL:  streaming transactions 
committing after 0/230F628, reading WAL from 0/230F5F0
2016-09-07 13:53:47.025 CEST 22185 LOG:  logical decoding found 
consistent point at 0/230F5F0
2016-09-07 13:53:47.025 CEST 22185 DETAIL:  There are no running 
transactions.
2016-09-07 13:53:47.030 CEST 22185 LOG:  could not receive data from 
client: Connection reset by peer
2016-09-07 13:53:47.030 CEST 22185 LOG:  unexpected EOF on standby 
connection
2016-09-07 13:53:52.044 CEST 22188 LOG:  starting logical decoding for 
slot "sub1"
2016-09-07 13:53:52.044 CEST 22188 DETAIL:  streaming transactions 
committing after 0/230F628, reading WAL from 0/230F5F0
2016-09-07 13:53:52.044 CEST 22188 LOG:  logical decoding found 
consistent point at 0/230F5F0
2016-09-07 13:53:52.044 CEST 22188 DETAIL:  There are no running 
transactions.
2016-09-07 13:53:52.195 CEST 22188 LOG:  could not receive data from 
client: Connection reset by peer
2016-09-07 13:53:52.195 CEST 22188 LOG:  unexpected EOF on standby 
connection
(repeat every few seconds)


Logfile subscriber-side:
[...]
2016-09-07 13:47:44.441 CEST 21997 LOG:  MultiXact member wraparound 
protections are now enabled
2016-09-07 13:47:44.528 CEST 21986 LOG:  database system is ready to 
accept connections
2016-09-07 13:47:44.529 CEST 22002 LOG:  logical replication launcher 
started
2016-09-07 13:52:11.319 CEST 22143 LOG:  logical replication apply for 
subscription sub1 started
2016-09-07 13:53:47.010 CEST 22143 ERROR:  could not open relation with 
OID 0
2016-09-07 13:53:47.012 CEST 21986 LOG:  worker process: logical 
replication worker 24048 (PID 22143) exited with exit code 1
2016-09-07 13:53:47.018 CEST 22184 LOG:  logical replication apply for 
subscription sub1 started
2016-09-07 13:53:47.028 CEST 22184 ERROR:  could not open relation with 
OID 0
2016-09-07 13:53:47.030 CEST 21986 LOG:  worker process: logical 
replication worker 24048 (PID 22184) exited with exit code 1
2016-09-07 13:53:52.041 CEST 22187 LOG:  logical replication apply for 
subscription sub1 started
2016-09-07 13:53:52.045 CEST 22187 ERROR:  could not open relation with 
OID 0
2016-09-07 13:53:52.046 CEST 21986 LOG:  worker process: logical 
replication worker 24048 (PID 22187) exited with exit code 1
(repeat every few seconds)


Any hints welcome.

Thanks!

Erik Rijkers



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

On 07/09/16 14:10, Erik Rijkers wrote:
> On 2016-08-31 22:51, Petr Jelinek wrote:
>>
>> and one more version with bug fixes, improved code docs and couple
>
>
> I am not able to get the replication to work. Would you (or anyone) be
> so kind to point out what I am doing wrong?
>
> Patches applied, compiled, make-checked, installed OK.
>
> I have 2 copies compiled and installed,  logical_replication and
> logical_replication2, to be publisher and subscriber, ports 6972 and
> 6973 respectively.
>
>
> Logfile subscriber-side:
> [...]
> 2016-09-07 13:47:44.441 CEST 21997 LOG:  MultiXact member wraparound
> protections are now enabled
> 2016-09-07 13:47:44.528 CEST 21986 LOG:  database system is ready to
> accept connections
> 2016-09-07 13:47:44.529 CEST 22002 LOG:  logical replication launcher
> started
> 2016-09-07 13:52:11.319 CEST 22143 LOG:  logical replication apply for
> subscription sub1 started
> 2016-09-07 13:53:47.010 CEST 22143 ERROR:  could not open relation with
> OID 0
> 2016-09-07 13:53:47.012 CEST 21986 LOG:  worker process: logical
> replication worker 24048 (PID 22143) exited with exit code 1
> 2016-09-07 13:53:47.018 CEST 22184 LOG:  logical replication apply for
> subscription sub1 started
> 2016-09-07 13:53:47.028 CEST 22184 ERROR:  could not open relation with
> OID 0
> 2016-09-07 13:53:47.030 CEST 21986 LOG:  worker process: logical
> replication worker 24048 (PID 22184) exited with exit code 1
> 2016-09-07 13:53:52.041 CEST 22187 LOG:  logical replication apply for
> subscription sub1 started
> 2016-09-07 13:53:52.045 CEST 22187 ERROR:  could not open relation with
> OID 0
> 2016-09-07 13:53:52.046 CEST 21986 LOG:  worker process: logical
> replication worker 24048 (PID 22187) exited with exit code 1
> (repeat every few seconds)
>
>

It means the tables don't exist on the subscriber. I added a check and a 
proper error message in my local dev branch; it will be part of the next 
update.
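
In other words, creating matching tables on the subscriber before enabling the subscription avoids the loop of "could not open relation with OID 0" errors. A minimal sketch, with the column list assumed since Erik's table definition isn't shown:

```sql
-- On the subscriber, before ALTER SUBSCRIPTION sub1 ENABLE:
CREATE TABLE testt (i integer);  -- must match the publisher's definition
```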

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

Updated version; this should address most of the things in Peter's
reviews so far, though not all, as some of it needs more discussion.

Changes:
- I moved the publication.c to pg_publication.c, subscription.c to
pg_subscription.c.
- changed \drp and \drs to \dRp and \dRs
- fixed definitions of the catalogs (BKI_ROWTYPE_OID)
- changed some GetPublication calls to get_publication_name
- fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
- fixed get_object_address_publication_rel
- fixed the dependencies between pkeys and publications; for this I
actually had to add a new interface to dependency.c that allows dropping
a single dependency
- fixed the 'for all tables' and 'for all tables in schema' publications
- changed the alter publication from FOR to SET
- added more test cases for the publication DDL
- fixed compilation of subscription patch alone and docs
- changed subpublications to name[]
- added check for publication list duplicates
- made the subscriptions behave more like they are inside the database
instead of a shared catalog (even though the catalog is still shared)
- added options for CREATE SUBSCRIPTION to optionally not create a
slot and not do the initial data sync - that should solve the complaint
about CREATE SUBSCRIPTION always connecting
- the CREATE SUBSCRIPTION also tries to check if the specified
connection connects back to the same db (although that check is somewhat
imperfect) and if it gets stuck on create slot it should normally be
cancelable (that should solve the issue Steve Singer had)
- fixed the tests to work in any timezone
- added DDL regress tests for subscription
- added proper detection of missing schemas and tables on subscriber
- rebased on top of 19acee8 as the DefElem changes broke the patch

The table sync is still far from ready.

--
   Petr Jelinek                  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Review of 0003-Define-logical-replication-protocol-and-output-plugi.patch:

(This is still based on the Aug 31 patch set, but at quick glance I
didn't see any significant changes in the Sep 8 set.)

Generally, this all seems mostly fine.  Everything is encapsulated
well enough that problems are localized and any tweaks don't affect
the overall work.

Changes needed to build:

--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2158,8 +2158,8 @@ <title>Logical Streaming Replication Parameters</title>
     <listitem>
      <para>
       Comma separated list of publication names for which to subscribe
-       (receive changes). See
-       <xref linkend="logical-replication-publication"> for more info.
+       (receive changes). <!-- See
+       <xref linkend="logical-replication-publication"> for more info. -->
      </para>
     </listitem>
    </varlistentry>
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -25,6 +25,7 @@
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
+#include "utils/syscache.h"

 PG_MODULE_MAGIC;

This is all fixed in later patches.

AFAICT, pgoutput does not use libpq, so the mentions in
src/backend/replication/pgoutput/Makefile are not needed (perhaps
copied from libpqwalreceiver?).

The start_replication option pg_version option is not documented and
not used in any later patch.  We can probably do without it and just
rely on the protocol version.

In pgoutput_startup(), you check opt->output_type.  But it is not set
anywhere.  Actually, the startup callback is supposed to set it
itself.

In init_rel_sync_cache(), the way hash_flags is set seems kind of
weird.  I think that variable could be removed and the flags put
directly into the hash_create() call.

pgoutput_config.c seems over-engineered, e.g., converting cstring to
Datum and back.  Just do normal DefElem list parsing in pgoutput.c.
That's not pretty either, but at least it's a common coding pattern.

In the protocol documentation, explain the meaning of int64 as a
commit timestamp.

Also, the documentation should emphasize more clearly that all the
messages are not actually top-level protocol messages but are
contained inside binary copy data.

On the actual protocol messages:

Why do strings have a length byte?  That is not how other strings in
the protocol work.  As a minor side-effect, this would limit for
example column names to 255 characters.

The message structure doesn't match the documentation in some ways.
For example Attributes and TupleData are not separate messages but are
contained in Relation and Insert/Update/Delete messages.  So the
documentation needs to be structured a bit differently.

In the Attributes message (or actually Relation message), we don't
need the 'A' and 'C' bytes.

I'm not sure that pgoutput should concern itself with the client
encoding.  The client encoding should already be set by the initial
FE/BE protocol handshake.  I haven't checked that further yet, so it
might already work, or it should be made to work that way, or I might
be way off.

Slight abuse of pqformat functions.  We're not composing messages
using pq_beginmessage()/pq_endmessage(), and we're not using
pq_getmsgend() when reading.  The "proper" way to do this is probably
to define a custom set of PQcommMethods.  (low priority)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 9/6/16 8:56 PM, Peter Eisentraut wrote:
> Some thoughts on pg_dump and such:

Another issue to add to this list:

With the current patch set, pg_dump will fail for unprivileged users,
because it can't read pg_subscription.  The include_subscription flag
ought to be checked in getSubscriptions() already, not (only) in
dumpSubscription().  The test suite for pg_dump fails because of this.

We might make further changes in this area, per ongoing discussion, but
it would be good to put in a quick fix for this in the next patch set so
that the global test suite doesn't fail.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Review of 0004-Make-libpqwalreceiver-reentrant.patch:

This looks like a good change.

typo: _PG_walreceirver_conn_init

For libpqrcv_create_slot(), slotname should be const char *.
Similarly, for slotname in libpqrcv_startstreaming*() and conninfo in
libpqrcv_connect(). (the latter two pre-existing)

The connection handle should record in libpqrcv_connect() whether a
connection is a logical or physical replication stream.  Then that
parameter doesn't have to be passed around later (or at least some
asserts could double-check it).

In libpqrcv_connect(), the new argument connname is actually just the
application name, for which in later patches the subscription name is
passed in.  Does this have a deeper meaning, or should we call the
argument appname to avoid introducing another term?

New function libpqrcv_create_slot(): Hardcoded cmd length (hmm, other
functions do that too), should use StringInfo.  ereport instead of
elog.  No newline at the end of error message, since PQerrorMessage()
already supplies it.  Typo "could not crate".  Briefly document return
value.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 09/09/16 06:33, Peter Eisentraut wrote:
> Review of 0003-Define-logical-replication-protocol-and-output-plugi.patch:
>
> (This is still based on the Aug 31 patch set, but at quick glance I
> didn't see any significant changes in the Sep 8 set.)
>

Yep.

> The start_replication option pg_version option is not documented and
> not used in any later patch.  We can probably do without it and just
> rely on the protocol version.
>

That's a leftover from binary data transfer, which is not part of this 
initial approach as it adds a lot of complications to both the protocol 
and the apply side. So yes, we can do without it.

> In pgoutput_startup(), you check opt->output_type.  But it is not set
> anywhere.  Actually, the startup callback is supposed to set it
> itself.

Leftover from pglogical which actually supports both output types.

> In init_rel_sync_cache(), the way hash_flags is set seems kind of
> weird.  I think that variable could be removed and the flags put
> directly into the hash_create() call.
>

Eh, yes no idea how that came to be.

> pgoutput_config.c seems over-engineered, e.g., converting cstring to
> Datum and back.  Just do normal DefElem list parsing in pgoutput.c.
> That's not pretty either, but at least it's a common coding pattern.
>

Yes, now that we have only a couple of options, I agree.

> In the protocol documentation, explain the meaning of int64 as a
> commit timestamp.
>

You mean that it's milliseconds since postgres epoch?

> On the actual protocol messages:
>
> Why do strings have a length byte?  That is not how other strings in
> the protocol work.  As a minor side-effect, this would limit for
> example column names to 255 characters.

Because I originally sent them without the null termination, but I guess 
they don't really need the length byte anymore. (The 255-char limit is 
not really important in practice given that column names are limited to 
64 characters anyway.)

>
> The message structure doesn't match the documentation in some ways.
> For example Attributes and TupleData are not separate messages but are
> contained in Relation and Insert/Update/Delete messages.  So the
> documentation needs to be structured a bit differently.
>
> In the Attributes message (or actually Relation message), we don't
> need the 'A' and 'C' bytes.
>

Hmm, okay, I will look into it. I guess if we remove the 'A' then the 
rest of the Attributes message neatly merges into the Relation message. 
The more interesting part will be the TupleData, as it's a common part 
of other messages.

> I'm not sure that pgoutput should concern itself with the client
> encoding.  The client encoding should already be set by the initial
> FE/BE protocol handshake.  I haven't checked that further yet, so it
> might already work, or it should be made to work that way, or I might
> be way off.

Yes, I think you are right; that was there mostly for the same reason as 
the pg_version.

>
> Slight abuse of pqformat functions.  We're not composing messages
> using pq_beginmessage()/pq_endmessage(), and we're not using
> pq_getmsgend() when reading.  The "proper" way to do this is probably
> to define a custom set of PQcommMethods.  (low priority)
>

If we change that, I'd probably rather go with direct use of StringInfo 
functions.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote:
> On 09/09/16 06:33, Peter Eisentraut wrote:
> > The start_replication option pg_version option is not documented and
> > not used in any later patch.  We can probably do without it and just
> > rely on the protocol version.
> > 
> 
> That's leftover from binary type data transfer which is not part of this
> initial approach as it adds a lot of complications to both protocol and
> apply side. So yes can do without.

FWIW, I don't think we can leave this out of the initial protocol
design. We don't have to implement it, but it has to be part of the
design.

Greetings,

Andres Freund



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 12/09/16 21:54, Andres Freund wrote:
> On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote:
>> On 09/09/16 06:33, Peter Eisentraut wrote:
>>> The start_replication option pg_version option is not documented and
>>> not used in any later patch.  We can probably do without it and just
>>> rely on the protocol version.
>>>
>>
>> That's leftover from binary type data transfer which is not part of this
>> initial approach as it adds a lot of complications to both protocol and
>> apply side. So yes can do without.
>
> FWIW, I don't think we can leave this out of the initial protocol
> design. We don't have to implement it, but it has to be part of the
> design.
>

I don't think it's a good idea to have unimplemented parts of the 
protocol; we have a protocol version, so it can be added in v2 when we 
have code that is able to handle it.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-09-12 21:57:39 +0200, Petr Jelinek wrote:
> On 12/09/16 21:54, Andres Freund wrote:
> > On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote:
> > > On 09/09/16 06:33, Peter Eisentraut wrote:
> > > > The start_replication option pg_version option is not documented and
> > > > not used in any later patch.  We can probably do without it and just
> > > > rely on the protocol version.
> > > > 
> > > 
> > > That's leftover from binary type data transfer which is not part of this
> > > initial approach as it adds a lot of complications to both protocol and
> > > apply side. So yes can do without.
> > 
> > FWIW, I don't think we can leave this out of the initial protocol
> > design. We don't have to implement it, but it has to be part of the
> > design.
> > 
> 
> I don't think it's a good idea to have unimplemented parts of protocol, we
> have protocol version so it can be added in v2 when we have code that is
> able to handle it.

I don't think we have to have it as part of the protocol. But it has to
be foreseen, otherwise introducing it later will end up requiring more
invasive changes than acceptable. I don't want to repeat the "libpq v3
protocol" evolution story here.



Re: Logical Replication WIP

From
Petr Jelinek
Date:

On 12/09/16 22:21, Andres Freund wrote:
> On 2016-09-12 21:57:39 +0200, Petr Jelinek wrote:
>> On 12/09/16 21:54, Andres Freund wrote:
>>> On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote:
>>>> On 09/09/16 06:33, Peter Eisentraut wrote:
>>>>> The start_replication option pg_version option is not documented and
>>>>> not used in any later patch.  We can probably do without it and just
>>>>> rely on the protocol version.
>>>>>
>>>>
>>>> That's leftover from binary type data transfer which is not part of this
>>>> initial approach as it adds a lot of complications to both protocol and
>>>> apply side. So yes can do without.
>>>
>>> FWIW, I don't think we can leave this out of the initial protocol
>>> design. We don't have to implement it, but it has to be part of the
>>> design.
>>>
>>
>> I don't think it's a good idea to have unimplemented parts of protocol, we
>> have protocol version so it can be added in v2 when we have code that is
>> able to handle it.
>
> I don't think we have to have it part of the protocol. But it has to be
> forseen, otherwise introducing it later will end up requiring more
> invasive changes than acceptable. I don't want to repeat the "libpq v3
> protocol" evolution story here.
>

Oh sure, I don't see that as a big problem. The TupleData already 
contains the type of the data it sends (to distinguish between nulls and 
text data), so that's mostly about adding a different type there. We'll 
also need type info in the column part of the Relation message, but that 
should be easy to fence with a single if on the protocol version.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Craig Ringer
Date:
On 13 September 2016 at 06:03, Petr Jelinek <petr@2ndquadrant.com> wrote:

> Oh sure, I don't see that as big problem, the TupleData already contains
> type of the data it sends (to distinguish between nulls and text data) so
> that's mostly about adding some different type there and we'll also need
> type info in the column part of the Relation message but that should be easy
> to fence with one if for different protocol version.

The missing piece seems to be negotiation.

If a binary-aware client connects to a non-binary-aware server, the
non-binary-aware server needs a way to say "you requested this option
I don't understand, go away" or "you asked for binary but I don't
support that".

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 13/09/16 02:55, Craig Ringer wrote:
> On 13 September 2016 at 06:03, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> Oh sure, I don't see that as big problem, the TupleData already contains
>> type of the data it sends (to distinguish between nulls and text data) so
>> that's mostly about adding some different type there and we'll also need
>> type info in the column part of the Relation message but that should be easy
>> to fence with one if for different protocol version.
>
> The missing piece seems to be negotiation.
>
> If a binary-aware client connects to a non-binary aware server, the
> non-binary-aware server needs a way to say "you requested this option
> I don't understand, go away" or "you asked for binary but I don't
> support that".
>

Not sure what you mean by negotiation. Why would that be needed? You 
know the server version when you connect, and once you know that, you 
also know what capabilities that version of Postgres has. If you send an 
unrecognized option you get a corresponding error.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,

First read through the current version. Hence no real architectural
comments.

On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote:

> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
> new file mode 100644
> index 0000000..e0c719d
> --- /dev/null
> +++ b/src/backend/commands/publicationcmds.c
> @@ -0,0 +1,761 @@
> +/*-------------------------------------------------------------------------
> + *
> + * publicationcmds.c
> + *        publication manipulation
> + *
> + * Copyright (c) 2015, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *        publicationcmds.c
>

Not that I'm a fan of this line in the first place, but usually it does
include the path.


> +static void
> +check_replication_permissions(void)
> +{
> +    if (!superuser() && !has_rolreplication(GetUserId()))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 (errmsg("must be superuser or replication role to manipulate publications"))));
> +}

Do we want to require owner privileges for replication roles? I'd say
no, but want to raise the question.


> +ObjectAddress
> +CreatePublication(CreatePublicationStmt *stmt)
> +{
> +    Relation    rel;
> +    ObjectAddress myself;
> +    Oid            puboid;
> +    bool        nulls[Natts_pg_publication];
> +    Datum        values[Natts_pg_publication];
> +    HeapTuple    tup;
> +    bool        replicate_insert_given;
> +    bool        replicate_update_given;
> +    bool        replicate_delete_given;
> +    bool        replicate_insert;
> +    bool        replicate_update;
> +    bool        replicate_delete;
> +
> +    check_replication_permissions();
> +
> +    rel = heap_open(PublicationRelationId, RowExclusiveLock);
> +
> +    /* Check if name is used */
> +    puboid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(stmt->pubname));
> +    if (OidIsValid(puboid))
> +    {
> +        ereport(ERROR,
> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
> +                 errmsg("publication \"%s\" already exists",
> +                        stmt->pubname)));
> +    }
> +
> +    /* Form a tuple. */
> +    memset(values, 0, sizeof(values));
> +    memset(nulls, false, sizeof(nulls));
> +
> +    values[Anum_pg_publication_pubname - 1] =
> +        DirectFunctionCall1(namein, CStringGetDatum(stmt->pubname));
> +
> +    parse_publication_options(stmt->options,
> +                              &replicate_insert_given, &replicate_insert,
> +                              &replicate_update_given, &replicate_update,
> +                              &replicate_delete_given, &replicate_delete);
> +
> +    values[Anum_pg_publication_puballtables - 1] =
> +        BoolGetDatum(stmt->for_all_tables);
> +    values[Anum_pg_publication_pubreplins - 1] =
> +        BoolGetDatum(replicate_insert);
> +    values[Anum_pg_publication_pubreplupd - 1] =
> +        BoolGetDatum(replicate_update);
> +    values[Anum_pg_publication_pubrepldel - 1] =
> +        BoolGetDatum(replicate_delete);
> +
> +    tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
> +
> +    /* Insert tuple into catalog. */
> +    puboid = simple_heap_insert(rel, tup);
> +    CatalogUpdateIndexes(rel, tup);
> +    heap_freetuple(tup);
> +
> +    ObjectAddressSet(myself, PublicationRelationId, puboid);
> +
> +    /* Make the changes visible. */
> +    CommandCounterIncrement();
> +
> +    if (stmt->tables)
> +    {
> +        List       *rels;
> +
> +        Assert(list_length(stmt->tables) > 0);
> +
> +        rels = GatherTableList(stmt->tables);
> +        PublicationAddTables(puboid, rels, true, NULL);
> +        CloseTables(rels);
> +    }
> +    else if (stmt->for_all_tables || stmt->schema)
> +    {
> +        List       *rels;
> +
> +        rels = GatherTables(stmt->schema);
> +        PublicationAddTables(puboid, rels, true, NULL);
> +        CloseTables(rels);
> +    }

Isn't this (and ALTER) racy? What happens if tables are concurrently
created? This session wouldn't necessarily see the tables, and other
sessions won't see for_all_tables/schema.  Evaluating
for_all_tables/all_in_schema when the publication is used would solve
that problem.

> +/*
> + * Gather all tables optinally filtered by schema name.
> + * The gathered tables are locked in access share lock mode.
> + */
> +static List *
> +GatherTables(char *nspname)
> +{
> +    Oid            nspid = InvalidOid;
> +    List       *rels = NIL;
> +    Relation    rel;
> +    SysScanDesc scan;
> +    ScanKeyData key[1];
> +    HeapTuple    tup;
> +
> +    /* Resolve and validate the schema if specified */
> +    if (nspname)
> +    {
> +        nspid = LookupExplicitNamespace(nspname, false);
> +        if (IsSystemNamespace(nspid) || IsToastNamespace(nspid))
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                     errmsg("only tables in user schemas can be added to publication"),
> +                     errdetail("%s is a system schema", strVal(nspname))));
> +    }

Why are we restricting pg_catalog here? There's a bunch of extensions
creating objects therein, and we allow that. Seems better to just rely
on the IsSystemClass check for that below.

> +/*
> + * Gather Relations based o provided by RangeVar list.
> + * The gathered tables are locked in access share lock mode.
> + */

Why access share? Shouldn't we make this ShareUpdateExclusive or
similar, to prevent schema changes?


> +static List *
> +GatherTableList(List *tables)
> +{
> +    List       *relids = NIL;
> +    List       *rels = NIL;
> +    ListCell   *lc;
> +
> +    /*
> +     * Open, share-lock, and check all the explicitly-specified relations
> +     */
> +    foreach(lc, tables)
> +    {
> +        RangeVar   *rv = lfirst(lc);
> +        Relation    rel;
> +        bool        recurse = interpretInhOption(rv->inhOpt);
> +        Oid            myrelid;
> +
> +        rel = heap_openrv(rv, AccessShareLock);
> +        myrelid = RelationGetRelid(rel);
> +        /* don't throw error for "foo, foo" */
> +        if (list_member_oid(relids, myrelid))
> +        {
> +            heap_close(rel, AccessShareLock);
> +            continue;
> +        }
> +        rels = lappend(rels, rel);
> +        relids = lappend_oid(relids, myrelid);
> +
> +        if (recurse)
> +        {
> +            ListCell   *child;
> +            List       *children;
> +
> +            children = find_all_inheritors(myrelid, AccessShareLock,
> +                                           NULL);
> +
> +            foreach(child, children)
> +            {
> +                Oid            childrelid = lfirst_oid(child);
> +
> +                if (list_member_oid(relids, childrelid))
> +                    continue;
> +
> +                /* find_all_inheritors already got lock */
> +                rel = heap_open(childrelid, NoLock);
> +                rels = lappend(rels, rel);
> +                relids = lappend_oid(relids, childrelid);
> +            }
> +        }
> +    }

Hm, can't this yield duplicates, when both an inherited and a top level
relation are specified?


> @@ -713,6 +714,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
>      ObjectAddressSet(address, RelationRelationId, relationId);
>  
>      /*
> +     * If the newly created relation is a table and there are publications
> +     * which were created as FOR ALL TABLES, we want to add the relation
> +     * membership to those publications.
> +     */
> +
> +    if (relkind == RELKIND_RELATION)
> +    {
> +        List       *pubids = GetAllTablesPublications();
> +        ListCell   *lc;
> +
> +        foreach(lc, pubids)
> +        {
> +            Oid    pubid = lfirst_oid(lc);
> +
> +            publication_add_relation(pubid, rel, false);
> +        }
> +    }
> +

Hm, this has the potential to noticeably slow down table creation.

> +publication_opt_item:
> +            IDENT
> +                {
> +                    /*
> +                     * We handle identifiers that aren't parser keywords with
> +                     * the following special-case codes, to avoid bloating the
> +                     * size of the main parser.
> +                     */
> +                    if (strcmp($1, "replicate_insert") == 0)
> +                        $$ = makeDefElem("replicate_insert",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "noreplicate_insert") == 0)
> +                        $$ = makeDefElem("replicate_insert",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else if (strcmp($1, "replicate_update") == 0)
> +                        $$ = makeDefElem("replicate_update",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "noreplicate_update") == 0)
> +                        $$ = makeDefElem("replicate_update",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else if (strcmp($1, "replicate_delete") == 0)
> +                        $$ = makeDefElem("replicate_delete",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "noreplicate_delete") == 0)
> +                        $$ = makeDefElem("replicate_delete",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else
> +                        ereport(ERROR,
> +                                (errcode(ERRCODE_SYNTAX_ERROR),
> +                                 errmsg("unrecognized publication option \"%s\"", $1),
> +                                     parser_errposition(@1)));
> +                }
> +        ;

I'm kind of inclined to do this checking at execution (or transform)
time instead.  That allows extensions to add options and handle them in
utility hooks.

> +
> +/* ----------------
> + *        pg_publication_rel definition.  cpp turns this into
> + *        typedef struct FormData_pg_publication_rel
> + *
> + * ----------------
> + */
> +#define PublicationRelRelationId                6106
> +
> +CATALOG(pg_publication_rel,6106)
> +{
> +    Oid        pubid;                /* Oid of the publication */
> +    Oid        relid;                /* Oid of the relation */
> +} FormData_pg_publication_rel;

Hm. Do we really want this to have an oid? Won't that significantly
increase our oid consumption, especially if multiple publications are
present?  It seems entirely sufficient to identify rows in here using
(pubid, relid).


> +ObjectAddress
> +CreateSubscription(CreateSubscriptionStmt *stmt)
> +{
> +    Relation    rel;
> +    ObjectAddress myself;
> +    Oid            subid;
> +    bool        nulls[Natts_pg_subscription];
> +    Datum        values[Natts_pg_subscription];
> +    HeapTuple    tup;
> +    bool        enabled_given;
> +    bool        enabled;
> +    char       *conninfo;
> +    List       *publications;
> +
> +    check_subscription_permissions();
> +
> +    rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
> +
> +    /* Check if name is used */
> +    subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId,
> +                            CStringGetDatum(stmt->subname));
> +    if (OidIsValid(subid))
> +    {
> +        ereport(ERROR,
> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
> +                 errmsg("subscription \"%s\" already exists",
> +                        stmt->subname)));
> +    }
> +
> +    /* Parse and check options. */
> +    parse_subscription_options(stmt->options, &enabled_given, &enabled,
> +                               &conninfo, &publications);
> +
> +    /* TODO: improve error messages here. */
> +    if (conninfo == NULL)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_SYNTAX_ERROR),
> +                 errmsg("connection not specified")));

Probably also makes sense to parse the conninfo here to verify it looks
sane.  Although that's fairly annoying to do, because the relevant code
is libpq :(


> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
> index 65230e2..f3d54c8 100644
> --- a/src/backend/nodes/copyfuncs.c
> +++ b/src/backend/nodes/copyfuncs.c

I think you might be missing outfuncs support.

> +
> +CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO
> +{
> +    Oid            subdbid;            /* Database the subscription is in. */
> +    NameData    subname;        /* Name of the subscription */
> +    bool        subenabled;        /* True if the subsription is enabled (running) */

Not sure what "running" means here.

> +#ifdef CATALOG_VARLEN            /* variable-length fields start here */
> +    text        subconninfo;    /* Connection string to the provider */
> +    NameData    subslotname;    /* Slot name on provider */
> +
> +    name        subpublications[1];    /* List of publications subscribed to */
> +#endif
> +} FormData_pg_subscription;

> +    <varlistentry>
> +     <term>
> +      publication_names
> +     </term>
> +     <listitem>
> +      <para>
> +       Comma separated list of publication names for which to subscribe
> +       (receive changes). See
> +       <xref linkend="logical-replication-publication"> for more info.
> +      </para>
> +     </listitem>
> +    </varlistentry>
> +   </variablelist>

Do we need to specify an escaping scheme here?

> +  <para>
> +   Every DML message contains arbitraty relation id, which can be mapped to

Typo: "arbitraty"


> +<listitem>
> +<para>
> +                Commit timestamp of the transaction.
> +</para>
> +</listitem>
> +</varlistentry>

Perhaps mention it's relative to postgres epoch?



> +<variablelist>
> +<varlistentry>
> +<term>
> +        Byte1('O')
> +</term>
> +<listitem>
> +<para>
> +                Identifies the message as an origin message.
> +</para>
> +</listitem>
> +</varlistentry>
> +<varlistentry>
> +<term>
> +        Int64
> +</term>
> +<listitem>
> +<para>
> +                The LSN of the commit on the origin server.
> +</para>
> +</listitem>
> +</varlistentry>
> +<varlistentry>
> +<term>
> +        Int8
> +</term>
> +<listitem>
> +<para>
> +                Length of the origin name (including the NULL-termination
> +                character).
> +</para>
> +</listitem>
> +</varlistentry>

Should this explain that there could be multiple origin messages (when
replay switched origins during an xact)?

> +<para>
> +                Relation name.
> +</para>
> +</listitem>
> +</varlistentry>
> +</variablelist>
> +
> +</para>
> +
> +<para>
> +This message is always followed by Attributes message.
> +</para>

What's the point of having this separate from the relation message?

> +<varlistentry>
> +<term>
> +        Byte1('C')
> +</term>
> +<listitem>
> +<para>
> +                Start of column block.
> +</para>
> +</listitem>

"block"?

> +</varlistentry><varlistentry>
> +<term>
> +        Int8
> +</term>
> +<listitem>
> +<para>
> +                Flags for the column. Currently can be either 0 for no flags
> +                or one which marks the column as part of the key.
> +</para>
> +</listitem>
> +</varlistentry>
> +<varlistentry>
> +<term>
> +        Int8
> +</term>
> +<listitem>
> +<para>
> +                Length of column name (including the NULL-termination
> +                character).
> +</para>
> +</listitem>
> +</varlistentry>
> +<varlistentry>
> +<term>
> +        String
> +</term>
> +<listitem>
> +<para>
> +                Name of the column.
> +</para>
> +</listitem>
> +</varlistentry>

Huh, no type information?



> +<varlistentry>
> +<term>
> +        Byte1('O')
> +</term>
> +<listitem>
> +<para>
> +                Identifies the following TupleData message as the old tuple
> +                (deleted tuple).
> +</para>
> +</listitem>
> +</varlistentry>

Should we discern between old key and old tuple?


> +#define IS_REPLICA_IDENTITY    1

Defining this in the c file doesn't seem particularly useful?



> +/*
> + * Read transaction BEGIN from the stream.
> + */
> +void
> +logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn,
> +                      TimestampTz *committime, TransactionId *remote_xid)
> +{
> +    /* read fields */
> +    *remote_lsn = pq_getmsgint64(in);
> +    Assert(*remote_lsn != InvalidXLogRecPtr);
> +    *committime = pq_getmsgint64(in);
> +    *remote_xid = pq_getmsgint(in, 4);
> +}

In network-exposed code it seems better not to use Assert, and to error
out instead.


> +/*
> + * Write UPDATE to the output stream.
> + */
> +void
> +logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple,
> +                       HeapTuple newtuple)
> +{
> +    pq_sendbyte(out, 'U');        /* action UPDATE */
> +
> +    /* use Oid as relation identifier */
> +    pq_sendint(out, RelationGetRelid(rel), 4);

Wonder if there's a way that could screw us. What happens if there's an
oid wraparound, and a relation is dropped? Then a new relation could end
up with same id. Maybe answered somewhere further down.


> +/*
> + * Write a tuple to the outputstream, in the most efficient format possible.
> + */
> +static void
> +logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
> +{

> +    /* Write the values */
> +    for (i = 0; i < desc->natts; i++)
> +    {
> +        outputstr =    OidOutputFunctionCall(typclass->typoutput, values[i]);

Odd spacing.



> +/*
> + * Initialize this plugin
> + */
> +static void
> +pgoutput_startup(LogicalDecodingContext * ctx, OutputPluginOptions *opt,
> +                  bool is_init)
> +{
> +    PGOutputData   *data = palloc0(sizeof(PGOutputData));
> +    int                client_encoding;
> +
> +    /* Create our memory context for private allocations. */
> +    data->context = AllocSetContextCreate(ctx->context,
> +                                          "logical replication output context",
> +                                          ALLOCSET_DEFAULT_MINSIZE,
> +                                          ALLOCSET_DEFAULT_INITSIZE,
> +                                          ALLOCSET_DEFAULT_MAXSIZE);
> +
> +    ctx->output_plugin_private = data;
> +
> +    /*
> +     * This is replication start and not slot initialization.
> +     *
> +     * Parse and validate options passed by the client.
> +     */
> +    if (!is_init)
> +    {
> +        /* We can only do binary */
> +        if (opt->output_type != OUTPUT_PLUGIN_BINARY_OUTPUT)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                     errmsg("only binary mode is supported for logical replication protocol")));

Shouldn't you just set opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
instead, or is the goal just to output a better message?

> +
> +/*
> + * COMMIT callback
> + */
> +static void
> +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> +                     XLogRecPtr commit_lsn)
> +{
> +    OutputPluginPrepareWrite(ctx, true);
> +    logicalrep_write_commit(ctx->out, txn, commit_lsn);
> +    OutputPluginWrite(ctx, true);
> +}

Hm, so we don't reset the context for these...

> +/*
> + * Sends the decoded DML over wire.
> + */
> +static void
> +pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> +                Relation relation, ReorderBufferChange *change)
> +{

> +    /* Avoid leaking memory by using and resetting our own context */
> +    old = MemoryContextSwitchTo(data->context);
> +
> +    /*
> +     * Write the relation schema if the current schema haven't been sent yet.
> +     */
> +    if (!relentry->schema_sent)
> +    {
> +        OutputPluginPrepareWrite(ctx, false);
> +        logicalrep_write_rel(ctx->out, relation);
> +        OutputPluginWrite(ctx, false);
> +        relentry->schema_sent = true;
> +    }
> +
> +    /* Send the data */
> +    switch (change->action)
> +    {
...
> +    /* Cleanup */
> +    MemoryContextSwitchTo(old);
> +    MemoryContextReset(data->context);
> +}

IIRC there were some pfree's in called functions. It's probably better
to remove those and rely on this.

> +/*
> + * Load publications from the list of publication names.
> + */
> +static List *
> +LoadPublications(List *pubnames)
> +{
> +    List       *result = NIL;
> +    ListCell   *lc;
> +
> +    foreach (lc, pubnames)
> +    {
> +        char           *pubname = (char *) lfirst(lc);
> +        Publication       *pub = GetPublicationByName(pubname, false);
> +
> +        result = lappend(result, pub);
> +    }
> +
> +    return result;
> +}

Why are we doing this eagerly? On systems with a lot of relations
this'll suck up a fair amount of memory, without much need?

> +/*
> + * Remove all the entries from our relation cache.
> + */
> +static void
> +destroy_rel_sync_cache(void)
> +{
> +    HASH_SEQ_STATUS        status;
> +    RelationSyncEntry  *entry;
> +
> +    if (RelationSyncCache == NULL)
> +        return;
> +
> +    hash_seq_init(&status, RelationSyncCache);
> +
> +    while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> +    {
> +        if (hash_search(RelationSyncCache, (void *) &entry->relid,
> +                        HASH_REMOVE, NULL) == NULL)
> +            elog(ERROR, "hash table corrupted");
> +    }
> +
> +    RelationSyncCache = NULL;
> +}

Any reason not to just destroy the hash table instead?





> +enum {
> +    PARAM_UNRECOGNISED,
> +    PARAM_PROTOCOL_VERSION,
> +    PARAM_ENCODING,
> +    PARAM_PG_VERSION,
> +    PARAM_PUBLICATION_NAMES,
> +} OutputPluginParamKey;
> +
> +typedef struct {
> +    const char * const paramname;
> +    int    paramkey;
> +} OutputPluginParam;
> +
> +/* Oh, if only C had switch on strings */
> +static OutputPluginParam param_lookup[] = {
> +    {"proto_version", PARAM_PROTOCOL_VERSION},
> +    {"encoding", PARAM_ENCODING},
> +    {"pg_version", PARAM_PG_VERSION},
> +    {"publication_names", PARAM_PUBLICATION_NAMES},
> +    {NULL, PARAM_UNRECOGNISED}
> +};
> +
> +
> +/*
> + * Read parameters sent by client at startup and store recognised
> + * ones in the parameters PGOutputData.
> + *
> + * The data must have all client-supplied parameter fields zeroed,
> + * such as by memset or palloc0, since values not supplied
> + * by the client are not set.
> + */
> +void
> +pgoutput_process_parameters(List *options, PGOutputData *data)
> +{
> +    ListCell    *lc;
> +
> +    /* Examine all the other params in the message. */
> +    foreach(lc, options)
> +    {
> +        DefElem    *elem = lfirst(lc);
> +        Datum        val;
> +
> +        Assert(elem->arg == NULL || IsA(elem->arg, String));
> +
> +        /* Check each param, whether or not we recognise it */
> +        switch(get_param_key(elem->defname))
> +        {
> +            case PARAM_PROTOCOL_VERSION:
> +                val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false);
> +                data->protocol_version = DatumGetUInt32(val);
> +                break;
> +
> +            case PARAM_ENCODING:
> +                val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false);
> +                data->client_encoding = DatumGetCString(val);
> +                break;
> +
> +            case PARAM_PG_VERSION:
> +                val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false);
> +                data->client_pg_version = DatumGetUInt32(val);
> +                break;
> +
> +            case PARAM_PUBLICATION_NAMES:
> +                val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false);
> +                if (!SplitIdentifierString(DatumGetCString(val), ',',
> +                                           &data->publication_names))
> +                    ereport(ERROR,
> +                            (errcode(ERRCODE_INVALID_NAME),
> +                             errmsg("invalid publication name syntax")));
> +
> +                break;
> +
> +            default:
> +                ereport(ERROR,
> +                        (errmsg("Unrecognised pgoutput parameter %s",
> +                                elem->defname)));
> +                break;
> +        }
> +    }
> +}
> +
> +/*
> + * Look up a param name to find the enum value for the
> + * param, or PARAM_UNRECOGNISED if not found.
> + */
> +static int
> +get_param_key(const char * const param_name)
> +{
> +    OutputPluginParam *param = &param_lookup[0];
> +
> +    do {
> +        if (strcmp(param->paramname, param_name) == 0)
> +            return param->paramkey;
> +        param++;
> +    } while (param->paramname != NULL);
> +
> +    return PARAM_UNRECOGNISED;
> +}

I'm not following why this isn't just one routine with a chain of
else if (strcmp() == 0)
blocks?


> From 2241471aec03de553126c2d5fc012fcba1ecf50d Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos@pjmodos.net>
> Date: Wed, 6 Jul 2016 13:59:23 +0200
> Subject: [PATCH 4/6] Make libpqwalreceiver reentrant
> 
> ---
>  .../libpqwalreceiver/libpqwalreceiver.c            | 328 ++++++++++++++-------
>  src/backend/replication/walreceiver.c              |  67 +++--
>  src/include/replication/walreceiver.h              |  75 +++--
>  3 files changed, 306 insertions(+), 164 deletions(-)
> 
> diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
> index f1c843e..5da4474 100644
> --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
> +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
> @@ -25,6 +25,7 @@
>  #include "miscadmin.h"
>  #include "replication/walreceiver.h"
>  #include "utils/builtins.h"
> +#include "utils/pg_lsn.h"
>  
>  #ifdef HAVE_POLL_H
>  #include <poll.h>
> @@ -38,62 +39,83 @@
>  
>  PG_MODULE_MAGIC;
>  
> -void        _PG_init(void);
> +struct WalReceiverConnHandle {
> +    /* Current connection to the primary, if any */
> +    PGconn *streamConn;
> +    /* Buffer for currently read records */
> +    char   *recvBuf;
> +};

newline before {

> -/* Current connection to the primary, if any */
> -static PGconn *streamConn = NULL;
> -
> -/* Buffer for currently read records */
> -static char *recvBuf = NULL;

Yuck, this indeed seems better.

>  
>  /*
> - * Module load callback
> + * Module initialization callback
>   */
> -void
> -_PG_init(void)
> +WalReceiverConnHandle *
> +_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi)
>  {
> -    /* Tell walreceiver how to reach us */
> -    if (walrcv_connect != NULL || walrcv_identify_system != NULL ||
> -        walrcv_readtimelinehistoryfile != NULL ||
> -        walrcv_startstreaming != NULL || walrcv_endstreaming != NULL ||
> -        walrcv_receive != NULL || walrcv_send != NULL ||
> -        walrcv_disconnect != NULL)
> -        elog(ERROR, "libpqwalreceiver already loaded");
> -    walrcv_connect = libpqrcv_connect;
> -    walrcv_get_conninfo = libpqrcv_get_conninfo;
> -    walrcv_identify_system = libpqrcv_identify_system;
> -    walrcv_readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
> -    walrcv_startstreaming = libpqrcv_startstreaming;
> -    walrcv_endstreaming = libpqrcv_endstreaming;
> -    walrcv_receive = libpqrcv_receive;
> -    walrcv_send = libpqrcv_send;
> -    walrcv_disconnect = libpqrcv_disconnect;
> +    WalReceiverConnHandle *handle;
> +
> +    handle = palloc0(sizeof(WalReceiverConnHandle));
> +
> +    /* Tell caller how to reach us */
> +    wrcapi->connect = libpqrcv_connect;
> +    wrcapi->get_conninfo = libpqrcv_get_conninfo;
> +    wrcapi->identify_system = libpqrcv_identify_system;
> +    wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
> +    wrcapi->create_slot = libpqrcv_create_slot;
> +    wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical;
> +    wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical;
> +    wrcapi->endstreaming = libpqrcv_endstreaming;
> +    wrcapi->receive = libpqrcv_receive;
> +    wrcapi->send = libpqrcv_send;
> +    wrcapi->disconnect = libpqrcv_disconnect;
> +
> +    return handle;
>  }

This however I'm not following. Why do we need multiple copies of this?
And why aren't we doing the assignments in _PG_init?  Seems better to
just allocate one WalRcvCallbacks globally and assign all these as
constants.  Then the establishment function can just return all these
(as part of a bigger struct).


(skipped logical rep docs)

> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
> index 8acdff1..34007d3 100644
> --- a/doc/src/sgml/reference.sgml
> +++ b/doc/src/sgml/reference.sgml
> @@ -54,11 +54,13 @@
>     &alterOperatorClass;
>     &alterOperatorFamily;
>     &alterPolicy;
> +   &alterPublication;
>     &alterRole;
>     &alterRule;
>     &alterSchema;
>     &alterSequence;
>     &alterServer;
> +   &alterSubscription;
>     &alterSystem;
>     &alterTable;
>     &alterTableSpace;
> @@ -100,11 +102,13 @@
>     &createOperatorClass;
>     &createOperatorFamily;
>     &createPolicy;
> +   &createPublication;
>     &createRole;
>     &createRule;
>     &createSchema;
>     &createSequence;
>     &createServer;
> +   &createSubscription;
>     &createTable;
>     &createTableAs;
>     &createTableSpace;
> @@ -144,11 +148,13 @@
>     &dropOperatorFamily;
>     &dropOwned;
>     &dropPolicy;
> +   &dropPublication;
>     &dropRole;
>     &dropRule;
>     &dropSchema;
>     &dropSequence;
>     &dropServer;
> +   &dropSubscription;
>     &dropTable;
>     &dropTableSpace;
>     &dropTSConfig;

Hm, shouldn't all these have been registered in the earlier patch?



> diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
> index d29d3f9..f2052b8 100644
> --- a/src/backend/commands/subscriptioncmds.c
> +++ b/src/backend/commands/subscriptioncmds.c

This sure is a lot of yanking around of previously added code.  At least
some of it looks like it should really have been part of the earlier
commit.


> @@ -327,6 +431,18 @@ DropSubscriptionById(Oid subid)
>  {
>      Relation    rel;
>      HeapTuple    tup;
> +    Datum        datum;
> +    bool        isnull;
> +    char       *subname;
> +    char       *conninfo;
> +    char       *slotname;
> +    RepOriginId    originid;
> +    MemoryContext            tmpctx,
> +                            oldctx;
> +    WalReceiverConnHandle  *wrchandle = NULL;
> +    WalReceiverConnAPI       *wrcapi = NULL;
> +    walrcvconn_init_fn        walrcvconn_init;
> +    LogicalRepWorker       *worker;
>  
>      check_subscription_permissions();
>  
> @@ -337,9 +453,135 @@ DropSubscriptionById(Oid subid)
>      if (!HeapTupleIsValid(tup))
>          elog(ERROR, "cache lookup failed for subscription %u", subid);
>  
> +    /*
> +     * Create temporary memory context to keep copy of subscription
> +     * info needed later in the execution.
> +     */
> +    tmpctx = AllocSetContextCreate(TopMemoryContext,
> +                                          "DropSubscription Ctx",
> +                                          ALLOCSET_DEFAULT_MINSIZE,
> +                                          ALLOCSET_DEFAULT_INITSIZE,
> +                                          ALLOCSET_DEFAULT_MAXSIZE);
> +    oldctx = MemoryContextSwitchTo(tmpctx);
> +
> +    /* Get subname */
> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
> +                            Anum_pg_subscription_subname, &isnull);
> +    Assert(!isnull);
> +    subname = pstrdup(NameStr(*DatumGetName(datum)));
> +
> +    /* Get conninfo */
> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
> +                            Anum_pg_subscription_subconninfo, &isnull);
> +    Assert(!isnull);
> +    conninfo = pstrdup(TextDatumGetCString(datum));
> +
> +    /* Get slotname */
> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
> +                            Anum_pg_subscription_subslotname, &isnull);
> +    Assert(!isnull);
> +    slotname = pstrdup(NameStr(*DatumGetName(datum)));
> +
> +    MemoryContextSwitchTo(oldctx);
> +
> +    /* Remove the tuple from catalog. */
>      simple_heap_delete(rel, &tup->t_self);
>  
> -    ReleaseSysCache(tup);
> +    /* Protect against launcher restarting the worker. */
> +    LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
>  
> -    heap_close(rel, RowExclusiveLock);
> +    /* Kill the apply worker so that the slot becomes accessible. */
> +    LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> +    worker = logicalrep_worker_find(subid);
> +    if (worker)
> +        logicalrep_worker_stop(worker);
> +    LWLockRelease(LogicalRepWorkerLock);
> +
> +    /* Wait for apply process to die. */
> +    for (;;)
> +    {
> +        int    rc;
> +
> +        CHECK_FOR_INTERRUPTS();
> +
> +        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> +        if (logicalrep_worker_count(subid) < 1)
> +        {
> +            LWLockRelease(LogicalRepWorkerLock);
> +            break;
> +        }
> +        LWLockRelease(LogicalRepWorkerLock);
> +
> +        /* Wait for more work. */
> +        rc = WaitLatch(&MyProc->procLatch,
> +                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +                       1000L);
> +
> +        /* emergency bailout if postmaster has died */
> +        if (rc & WL_POSTMASTER_DEATH)
> +            proc_exit(1);
> +
> +        ResetLatch(&MyProc->procLatch);
> +    }

I'm really far from convinced this is the right layer to perform these
operations.  Previously these routines were low level catalog
manipulation routines. Now they're certainly not.


> +    /* Remove the origin trakicking. */

typo.



> +    /*
> +     * Now that the catalog update is done, try to reserve slot at the
> +     * provider node using replication connection.
> +     */
> +    wrcapi = palloc0(sizeof(WalReceiverConnAPI));
> +
> +    walrcvconn_init = (walrcvconn_init_fn)
> +        load_external_function("libpqwalreceiver",
> +                               "_PG_walreceirver_conn_init", false, NULL);
> +
> +    if (walrcvconn_init == NULL)
> +        elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol");

This does rather reinforce my opinion that the _PG_init removal in
libpqwalreceiver isn't useful.

> diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
> index 699c934..fc998cd 100644
> --- a/src/backend/postmaster/bgworker.c
> +++ b/src/backend/postmaster/bgworker.c
> @@ -93,6 +93,9 @@ struct BackgroundWorkerHandle
>  
>  static BackgroundWorkerArray *BackgroundWorkerData;
>  
> +/* Enables registration of internal background workers. */
> +bool internal_bgworker_registration_in_progress = false;
> +
>  /*
>   * Calculate shared memory needed.
>   */
> @@ -745,7 +748,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
>          ereport(DEBUG1,
>           (errmsg("registering background worker \"%s\"", worker->bgw_name)));
>  
> -    if (!process_shared_preload_libraries_in_progress)
> +    if (!process_shared_preload_libraries_in_progress &&
> +        !internal_bgworker_registration_in_progress)
>      {
>          if (!IsUnderPostmaster)
>              ereport(LOG,

Ugh.




>  /*
> + * Register internal background workers.
> + *
> + * This is here mainly because the permanent bgworkers are normally allowed
> + * to be registered only when share preload libraries are loaded which does
> + * not work for the internal ones.
> + */
> +static void
> +register_internal_bgworkers(void)
> +{
> +    internal_bgworker_registration_in_progress = true;
> +
> +    /* Register the logical replication worker launcher if appropriate. */
> +    if (!IsBinaryUpgrade && max_logical_replication_workers > 0)
> +    {
> +        BackgroundWorker bgw;
> +
> +        bgw.bgw_flags =    BGWORKER_SHMEM_ACCESS |
> +            BGWORKER_BACKEND_DATABASE_CONNECTION;
> +        bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
> +        bgw.bgw_main = ApplyLauncherMain;
> +        snprintf(bgw.bgw_name, BGW_MAXLEN,
> +                 "logical replication launcher");
> +        bgw.bgw_restart_time = 5;
> +        bgw.bgw_notify_pid = 0;
> +        bgw.bgw_main_arg = (Datum) 0;
> +
> +        RegisterBackgroundWorker(&bgw);
> +    }
> +
> +    internal_bgworker_registration_in_progress = false;
> +}

Who says these flags are right for everyone?  If we indeed want to go
through bgworkers here, I think you'll have to generalize this a bit,
so we don't check for max_logical_replication_workers and such here.  We
could e.g. have the shared memory sizing hooks set up a chain of
registrations.



> -static void
> +static char *
>  libpqrcv_identify_system(WalReceiverConnHandle *handle,
> -                         TimeLineID *primary_tli)
> +                         TimeLineID *primary_tli,
> +                         char **dbname)
>  {
> +    char       *sysid;
>      PGresult   *res;
> -    char       *primary_sysid;
> -    char        standby_sysid[32];
>  
>      /*
>       * Get the system identifier and timeline ID as a DataRow message from the
> @@ -231,24 +234,19 @@ libpqrcv_identify_system(WalReceiverConnHandle *handle,
>                   errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields.",
>                             ntuples, nfields, 3, 1)));
>      }
> -    primary_sysid = PQgetvalue(res, 0, 0);
> +    sysid = pstrdup(PQgetvalue(res, 0, 0));
>      *primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
> -
> -    /*
> -     * Confirm that the system identifier of the primary is the same as ours.
> -     */
> -    snprintf(standby_sysid, sizeof(standby_sysid), UINT64_FORMAT,
> -             GetSystemIdentifier());
> -    if (strcmp(primary_sysid, standby_sysid) != 0)
> +    if (dbname)
>      {
> -        primary_sysid = pstrdup(primary_sysid);
> -        PQclear(res);
> -        ereport(ERROR,
> -                (errmsg("database system identifier differs between the primary and standby"),
> -                 errdetail("The primary's identifier is %s, the standby's identifier is %s.",
> -                           primary_sysid, standby_sysid)));
> +        if (PQgetisnull(res, 0, 3))
> +            *dbname = NULL;
> +        else
> +            *dbname = pstrdup(PQgetvalue(res, 0, 3));
>      }
> +
>      PQclear(res);
> +
> +    return sysid;
>  }
>  
>  /*
> @@ -274,7 +272,7 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
>  
>      if (PQresultStatus(res) != PGRES_TUPLES_OK)
>      {
> -        elog(FATAL, "could not crate replication slot \"%s\": %s\n",
> +        elog(ERROR, "could not crate replication slot \"%s\": %s\n",
>               slotname, PQerrorMessage(handle->streamConn));
>      }
>  
> @@ -287,6 +285,28 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
>      return snapshot;
>  }
>  
> +/*
> + * Drop replication slot.
> + */
> +static void
> +libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname)
> +{
> +    PGresult       *res;
> +    char            cmd[256];
> +
> +    snprintf(cmd, sizeof(cmd),
> +             "DROP_REPLICATION_SLOT \"%s\"", slotname);
> +
> +    res = libpqrcv_PQexec(handle, cmd);
> +
> +    if (PQresultStatus(res) != PGRES_COMMAND_OK)
> +    {
> +        elog(ERROR, "could not drop replication slot \"%s\": %s\n",
> +             slotname, PQerrorMessage(handle->streamConn));
> +    }
> +
> +    PQclear(res);
> +}


Given that the earlier commit to libpqwalreceiver added a lot of this
information, it doesn't seem right to change it again here.



> +typedef struct LogicalRepRelMapEntry {

early {


Ok, running out of time. See you soon I guess ;)

Andres



Re: Logical Replication WIP

From
Andres Freund
Date:
(continuing, uh, a bit happier)

On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote:

> +/*
> + * Relcache invalidation callback for our relation map cache.
> + */
> +static void
> +logicalreprelmap_invalidate_cb(Datum arg, Oid reloid)
> +{
> +    LogicalRepRelMapEntry  *entry;
> +
> +    /* Just to be sure. */
> +    if (LogicalRepRelMap == NULL)
> +        return;
> +
> +    if (reloid != InvalidOid)
> +    {
> +        HASH_SEQ_STATUS status;
> +
> +        hash_seq_init(&status, LogicalRepRelMap);
> +
> +        /* TODO, use inverse lookup hastable? */

*hashtable

> +        while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
> +        {
> +            if (entry->reloid == reloid)
> +                entry->reloid = InvalidOid;

can't we break here?


> +/*
> + * Initialize the relation map cache.
> + */
> +static void
> +remoterelmap_init(void)
> +{
> +    HASHCTL        ctl;
> +
> +    /* Make sure we've initialized CacheMemoryContext. */
> +    if (CacheMemoryContext == NULL)
> +        CreateCacheMemoryContext();
> +
> +    /* Initialize the hash table. */
> +    MemSet(&ctl, 0, sizeof(ctl));
> +    ctl.keysize = sizeof(uint32);
> +    ctl.entrysize = sizeof(LogicalRepRelMapEntry);
> +    ctl.hcxt = CacheMemoryContext;

Wonder if this (and similar code earlier) should try to do everything in
a sub-context of CacheMemoryContext instead. That'd make some issues
easier to track down.

> +/*
> + * Open the local relation associated with the remote one.
> + */
> +static LogicalRepRelMapEntry *
> +logicalreprel_open(uint32 remoteid, LOCKMODE lockmode)
> +{
> +    LogicalRepRelMapEntry  *entry;
> +    bool        found;
> +
> +    if (LogicalRepRelMap == NULL)
> +        remoterelmap_init();
> +
> +    /* Search for existing entry. */
> +    entry = hash_search(LogicalRepRelMap, (void *) &remoteid,
> +                        HASH_FIND, &found);
> +
> +    if (!found)
> +        elog(FATAL, "cache lookup failed for remote relation %u",
> +             remoteid);
> +
> +    /* Need to update the local cache? */
> +    if (!OidIsValid(entry->reloid))
> +    {
> +        Oid            nspid;
> +        Oid            relid;
> +        int            i;
> +        TupleDesc    desc;
> +        LogicalRepRelation *remoterel;
> +
> +        remoterel = &entry->remoterel;
> +
> +        nspid = LookupExplicitNamespace(remoterel->nspname, false);
> +        if (!OidIsValid(nspid))
> +            ereport(FATAL,
> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("the logical replication target %s not found",
> +                            quote_qualified_identifier(remoterel->nspname,
> +                                                       remoterel->relname))));
 
> +        relid = get_relname_relid(remoterel->relname, nspid);
> +        if (!OidIsValid(relid))
> +            ereport(FATAL,
> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                     errmsg("the logical replication target %s not found",
> +                            quote_qualified_identifier(remoterel->nspname,
> +                                                       remoterel->relname))));
> +
> +        entry->rel = heap_open(relid, lockmode);

This seems rather racy. I think this really instead needs something akin
to RangeVarGetRelidExtended().
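I.e. instead of the separate namespace/relname lookups, something like (sketch):

```c
RangeVar   *rv = makeRangeVar(remoterel->nspname, remoterel->relname, -1);
Oid         relid;

/* Takes the lock and rechecks that the name still maps to the same
 * relation under that lock, so concurrent DDL can't race us. */
relid = RangeVarGetRelid(rv, lockmode, false);
entry->rel = heap_open(relid, NoLock);  /* lock already acquired above */
```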

> +/*
> + * Executor state preparation for evaluation of constraint expressions,
> + * indexes and triggers.
> + *
> + * This is based on similar code in copy.c
> + */
> +static EState *
> +create_estate_for_relation(LogicalRepRelMapEntry *rel)
> +{
> +    EState       *estate;
> +    ResultRelInfo *resultRelInfo;
> +    RangeTblEntry *rte;
> +
> +    estate = CreateExecutorState();
> +
> +    rte = makeNode(RangeTblEntry);
> +    rte->rtekind = RTE_RELATION;
> +    rte->relid = RelationGetRelid(rel->rel);
> +    rte->relkind = rel->rel->rd_rel->relkind;
> +    estate->es_range_table = list_make1(rte);
> +
> +    resultRelInfo = makeNode(ResultRelInfo);
> +    InitResultRelInfo(resultRelInfo, rel->rel, 1, 0);
> +
> +    estate->es_result_relations = resultRelInfo;
> +    estate->es_num_result_relations = 1;
> +    estate->es_result_relation_info = resultRelInfo;
> +
> +    /* Triggers might need a slot */
> +    if (resultRelInfo->ri_TrigDesc)
> +        estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
> +
> +    return estate;
> +}

Ugh, we do this for every single change? That's pretty darn heavy.


> +/*
> + * Check if the local attribute is present in relation definition used
> + * by upstream and hence updated by the replication.
> + */
> +static bool
> +physatt_in_attmap(LogicalRepRelMapEntry *rel, int attid)
> +{
> +    AttrNumber    i;
> +
> +    /* Fast path for tables that are same on upstream and downstream. */
> +    if (attid < rel->remoterel.natts && rel->attmap[attid] == attid)
> +        return true;
> +
> +    /* Try to find the attribute in the map. */
> +    for (i = 0; i < rel->remoterel.natts; i++)
> +        if (rel->attmap[i] == attid)
> +            return true;
> +
> +    return false;
> +}

Shouldn't we rather try to keep an attribute map that always can map
remote attribute numbers to local ones? That doesn't seem hard on a
first blush? But I might be missing something here.


> +/*
> + * Executes default values for columns for which we can't map to remote
> + * relation columns.
> + *
> + * This allows us to support tables which have more columns on the downstream
> + * than on the upsttream.
> + */

Typo: upsttream.


> +static void
> +FillSlotDefaults(LogicalRepRelMapEntry *rel, EState *estate,
> +                 TupleTableSlot *slot)
> +{

Why is this using a different naming scheme?


> +/*
> + * Handle COMMIT message.
> + *
> + * TODO, support tracking of multiple origins
> + */
> +static void
> +handle_commit(StringInfo s)
> +{
> +    XLogRecPtr        commit_lsn;
> +    XLogRecPtr        end_lsn;
> +    TimestampTz        commit_time;
> +
> +    logicalrep_read_commit(s, &commit_lsn, &end_lsn, &commit_time);

Perhaps this (and related routines) should rather be

    LogicalRepCommitData commit_data;
    logicalrep_read_commit(s, &commit_data);

etc? That way the data can transparently be enhanced.

> +    Assert(commit_lsn == replorigin_session_origin_lsn);
> +    Assert(commit_time == replorigin_session_origin_timestamp);
> +
> +    if (IsTransactionState())
> +    {
> +        FlushPosition *flushpos;
> +
> +        CommitTransactionCommand();
> +        MemoryContextSwitchTo(CacheMemoryContext);
> +
> +        /* Track commit lsn  */
> +        flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));
> +        flushpos->local_end = XactLastCommitEnd;
> +        flushpos->remote_end = end_lsn;
> +
> +        dlist_push_tail(&lsn_mapping, &flushpos->node);
> +        MemoryContextSwitchTo(ApplyContext);

Seems like it should be in a separate function.
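E.g. just factoring the quoted block out (sketch; function name invented):

```c
static void
store_flush_position(XLogRecPtr remote_lsn)
{
    FlushPosition *flushpos;

    /* Track commit lsn; the entry must live in a long-lived context. */
    MemoryContextSwitchTo(CacheMemoryContext);
    flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));
    flushpos->local_end = XactLastCommitEnd;
    flushpos->remote_end = remote_lsn;
    dlist_push_tail(&lsn_mapping, &flushpos->node);
    MemoryContextSwitchTo(ApplyContext);
}
```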


> +/*
> + * Handle INSERT message.
> + */
> +static void
> +handle_insert(StringInfo s)
> +{
> +    LogicalRepRelMapEntry *rel;
> +    LogicalRepTupleData    newtup;
> +    LogicalRepRelId        relid;
> +    EState               *estate;
> +    TupleTableSlot       *remoteslot;
> +    MemoryContext        oldctx;
> +
> +    ensure_transaction();
> +
> +    relid = logicalrep_read_insert(s, &newtup);
> +    rel = logicalreprel_open(relid, RowExclusiveLock);
> +
> +    /* Initialize the executor state. */
> +    estate = create_estate_for_relation(rel);
> +    remoteslot = ExecInitExtraTupleSlot(estate);
> +    ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel));

This seems incredibly expensive for replicating a lot of rows.

> +    /* Process and store remote tuple in the slot */
> +    oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> +    SlotStoreCStrings(remoteslot, newtup.values);
> +    FillSlotDefaults(rel, estate, remoteslot);
> +    MemoryContextSwitchTo(oldctx);
> +
> +    PushActiveSnapshot(GetTransactionSnapshot());
> +    ExecOpenIndices(estate->es_result_relation_info, false);
> +
> +    ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */
> +               remoteslot,
> +               remoteslot,
> +               NIL,
> +               ONCONFLICT_NONE,
> +               estate,
> +               false);

I have *severe* doubts about just using the (newly) exposed functions
1:1 here.


> +/*
> + * Search the relation 'rel' for tuple using the replication index.
> + *
> + * If a matching tuple is found lock it with lockmode, fill the slot with its
> + * contents and return true, return false is returned otherwise.
> + */
> +static bool
> +tuple_find_by_replidx(Relation rel, LockTupleMode lockmode,
> +                      TupleTableSlot *searchslot, TupleTableSlot *slot)
> +{
> +    HeapTuple        scantuple;
> +    ScanKeyData        skey[INDEX_MAX_KEYS];
> +    IndexScanDesc    scan;
> +    SnapshotData    snap;
> +    TransactionId    xwait;
> +    Oid                idxoid;
> +    Relation        idxrel;
> +    bool            found;
> +
> +    /* Open REPLICA IDENTITY index.*/
> +    idxoid = RelationGetReplicaIndex(rel);
> +    if (!OidIsValid(idxoid))
> +    {
> +        elog(ERROR, "could not find configured replica identity for table \"%s\"",
> +             RelationGetRelationName(rel));
> +        return false;
> +    }
> +    idxrel = index_open(idxoid, RowExclusiveLock);
> +
> +    /* Start an index scan. */
> +    InitDirtySnapshot(snap);
> +    scan = index_beginscan(rel, idxrel, &snap,
> +                           RelationGetNumberOfAttributes(idxrel),
> +                           0);
> +
> +    /* Build scan key. */
> +    build_replindex_scan_key(skey, rel, idxrel, searchslot);
> +
> +retry:
> +    found = false;
> +
> +    index_rescan(scan, skey, RelationGetNumberOfAttributes(idxrel), NULL, 0);
> +
> +    /* Try to find the tuple */
> +    if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
> +    {
> +        found = true;
> +        ExecStoreTuple(scantuple, slot, InvalidBuffer, false);
> +        ExecMaterializeSlot(slot);
> +
> +        xwait = TransactionIdIsValid(snap.xmin) ?
> +            snap.xmin : snap.xmax;
> +
> +        /*
> +         * If the tuple is locked, wait for locking transaction to finish
> +         * and retry.
> +         */
> +        if (TransactionIdIsValid(xwait))
> +        {
> +            XactLockTableWait(xwait, NULL, NULL, XLTW_None);
> +            goto retry;
> +        }
> +    }

Hm. So we potentially find multiple tuples here, and lock all of
them. but then only use one for the update.



> +static List *
> +get_subscription_list(void)
> +{
> +    List       *res = NIL;
> +    Relation    rel;
> +    HeapScanDesc scan;
> +    HeapTuple    tup;
> +    MemoryContext resultcxt;
> +
> +    /* This is the context that we will allocate our output data in */
> +    resultcxt = CurrentMemoryContext;
> +
> +    /*
> +     * Start a transaction so we can access pg_database, and get a snapshot.
> +     * We don't have a use for the snapshot itself, but we're interested in
> +     * the secondary effect that it sets RecentGlobalXmin.  (This is critical
> +     * for anything that reads heap pages, because HOT may decide to prune
> +     * them even if the process doesn't attempt to modify any tuples.)
> +     */

> +    StartTransactionCommand();
> +    (void) GetTransactionSnapshot();
> +
> +    rel = heap_open(SubscriptionRelationId, AccessShareLock);
> +    scan = heap_beginscan_catalog(rel, 0, NULL);
> +
> +    while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
> +    {
> +        Form_pg_subscription subform = (Form_pg_subscription) GETSTRUCT(tup);
> +        Subscription   *sub;
> +        MemoryContext    oldcxt;
> +
> +        /*
> +         * Allocate our results in the caller's context, not the
> +         * transaction's. We do this inside the loop, and restore the original
> +         * context at the end, so that leaky things like heap_getnext() are
> +         * not called in a potentially long-lived context.
> +         */
> +        oldcxt = MemoryContextSwitchTo(resultcxt);
> +
> +        sub = (Subscription *) palloc(sizeof(Subscription));
> +        sub->oid = HeapTupleGetOid(tup);
> +        sub->dbid = subform->subdbid;
> +        sub->enabled = subform->subenabled;
> +
> +        /* We don't fill fields we are not intereste in. */
> +        sub->name = NULL;
> +        sub->conninfo = NULL;
> +        sub->slotname = NULL;
> +        sub->publications = NIL;
> +
> +        res = lappend(res, sub);
> +        MemoryContextSwitchTo(oldcxt);
> +    }
> +
> +    heap_endscan(scan);
> +    heap_close(rel, AccessShareLock);
> +
> +    CommitTransactionCommand();

Hm. this doesn't seem quite right from a locking pov. What if, in the
middle of this, a new subscription is created?



> +void
> +logicalrep_worker_stop(LogicalRepWorker *worker)
> +{
> +    Assert(LWLockHeldByMe(LogicalRepWorkerLock));
> +
> +    /* Check that the worker is up and what we expect. */
> +    if (!worker->proc)
> +        return;
> +    if (!IsBackendPid(worker->proc->pid))
> +        return;
> +
> +    /* Terminate the worker. */
> +    kill(worker->proc->pid, SIGTERM);
> +
> +    LWLockRelease(LogicalRepLauncherLock);
> +
> +    /* Wait for it to detach. */
> +    for (;;)
> +    {
> +        int    rc = WaitLatch(&MyProc->procLatch,
> +                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +                           1000L);
> +
> +        /* emergency bailout if postmaster has died */
> +        if (rc & WL_POSTMASTER_DEATH)
> +            proc_exit(1);
> +
> +        ResetLatch(&MyProc->procLatch);
> +
> +        CHECK_FOR_INTERRUPTS();
> +
> +        if (!worker->proc)
> +            return;
> +    }
> +}

indentation here seems screwed up.



> +static void
> +xacthook_signal_launcher(XactEvent event, void *arg)
> +{
> +    switch (event)
> +    {
> +        case XACT_EVENT_COMMIT:
> +            if (xacthook_do_signal_launcher)
> +                ApplyLauncherWakeup();
> +            break;
> +        default:
> +            /* We're not interested in other tx events */
> +            break;
> +    }
> +}

> +void
> +ApplyLauncherWakeupOnCommit(void)
> +{
> +    if (!xacthook_do_signal_launcher)
> +    {
> +        RegisterXactCallback(xacthook_signal_launcher, NULL);
> +        xacthook_do_signal_launcher = true;
> +    }
> +}

Hm. This seems like it really should be an AtCommit_* routine instead.
This also needs more docs.


Hadn't I previously read about always streaming data to disk first?

> @@ -0,0 +1,674 @@
> +/*-------------------------------------------------------------------------
> + * tablesync.c
> + *       PostgreSQL logical replication
> + *
> + * Copyright (c) 2012-2016, PostgreSQL Global Development Group
> + *
> + * IDENTIFICATION
> + *      src/backend/replication/logical/tablesync.c
> + *
> + * NOTES
> + *      This file contains code for initial table data synchronization for
> + *      logical replication.
> + *
> + *    The initial data synchronization is done separately for each table,
> + *    in separate apply worker that only fetches the initial snapshot data
> + *    from the provider and then synchronizes the position in stream with
> + *    the main apply worker.

Why? I guess that's because it allows adding tables incrementally, with
acceptable overhead.


> + *    The stream position synchronization works in multiple steps.
> + *     - sync finishes copy and sets table state as SYNCWAIT and waits
> + *       for state to change in a loop
> + *     - apply periodically checks unsynced tables for SYNCWAIT, when it
> + *       appears it will compare its position in the stream with the
> + *       SYNCWAIT position and decides to either set it to CATCHUP when
> + *       the apply was infront (and wait for the sync to do the catchup),
> + *       or set the state to SYNCDONE if the sync was infront or in case
> + *       both sync and apply are at the same position it will set it to
> + *       READY and stops tracking it

I'm not quite following here.

> + *     - if the state was set to CATCHUP sync will read the stream and
> + *       apply changes until it catches up to the specified stream
> + *       position and then sets state to READY and signals apply that it
> + *       can stop waiting and exits, if the state was set to something
> + *       else than CATCHUP the sync process will simply end
> + *     - if the state was set to SYNCDONE by apply, the apply will
> + *       continue tracking the table until it reaches the SYNCDONE stream
> + *       position at which point it sets state to READY and stops tracking
> + *
> + *    Example flows look like this:
> + *     - Apply is infront:
> + *          sync:8   -> set SYNCWAIT
> + *        apply:10 -> set CATCHUP
> + *        sync:10  -> set ready
> + *          exit
> + *        apply:10
> + *          stop tracking
> + *          continue rep
> + *    - Sync infront:
> + *        sync:10
> + *          set SYNCWAIT
> + *        apply:8
> + *          set SYNCDONE
> + *        sync:10
> + *          exit
> + *        apply:10
> + *          set READY
> + *          stop tracking
> + *          continue rep

This definitely needs to be expanded a bit. Where are we tracking how
far replication has progressed on individual tables? Are we creating new
slots for syncing? Is there any parallelism in syncing?

> +/*
> + * Exit routine for synchronization worker.
> + */
> +static void
> +finish_sync_worker(char *slotname)
> +{
> +    LogicalRepWorker   *worker;
> +    RepOriginId            originid;
> +    MemoryContext        oldctx = CurrentMemoryContext;
> +
> +    /*
> +     * Drop the replication slot on remote server.
> +     * We want to continue even in the case that the slot on remote side
> +     * is already gone. This means that we can leave slot on the remote
> +     * side but that can happen for other reasons as well so we can't
> +     * really protect against that.
> +     */
> +    PG_TRY();
> +    {
> +        wrcapi->drop_slot(wrchandle, slotname);
> +    }
> +    PG_CATCH();
> +    {
> +        MemoryContext    ectx;
> +        ErrorData       *edata;
> +
> +        ectx = MemoryContextSwitchTo(oldctx);
> +        /* Save error info */
> +        edata = CopyErrorData();
> +        MemoryContextSwitchTo(ectx);
> +        FlushErrorState();
> +
> +        ereport(WARNING,
> +                (errmsg("there was problem dropping the replication slot "
> +                        "\"%s\" on provider", slotname),
> +                 errdetail("The error was: %s", edata->message),
> +                 errhint("You may have to drop it manually")));
> +        FreeErrorData(edata);

ISTM we really should rather return success/failure here, and not throw
an error inside the libpqwalreceiver stuff.  I kind of wonder if we
actually can get rid of this indirection.


> +    /* Find the main apply worker and signal it. */
> +    LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE);
> +    worker = logicalrep_worker_find(MyLogicalRepWorker->subid, InvalidOid);
> +    if (worker && worker->proc)
> +        SetLatch(&worker->proc->procLatch);
> +    LWLockRelease(LogicalRepWorkerLock);

I'd rather do the SetLatch outside of the critical section.

> +static bool
> +wait_for_sync_status_change(TableState *tstate)
> +{
> +    int        rc;
> +    char    state = tstate->state;
> +
> +    while (!got_SIGTERM)
> +    {
> +        StartTransactionCommand();
> +        tstate->state = GetSubscriptionRelState(MyLogicalRepWorker->subid,
> +                                                tstate->relid,
> +                                                &tstate->lsn,
> +                                                true);
> +        CommitTransactionCommand();
> +
> +        /* Status record was removed. */
> +        if (tstate->state == SUBREL_STATE_UNKNOWN)
> +            return false;
> +
> +        if (tstate->state != state)
> +            return true;
> +
> +        rc = WaitLatch(&MyProc->procLatch,
> +                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +                       10000L);
> +
> +        /* emergency bailout if postmaster has died */
> +        if (rc & WL_POSTMASTER_DEATH)
> +            proc_exit(1);
> +
> +        ResetLatch(&MyProc->procLatch);

broken indentation.


> +/*
> + * Read the state of the tables in the subscription and update our table
> + * state list.
> + */
> +static void
> +reread_sync_state(Oid relid)
> +{
> +    dlist_mutable_iter    iter;
> +    Relation    rel;
> +    HeapTuple    tup;
> +    ScanKeyData    skey[2];
> +    HeapScanDesc    scan;
> +
> +    /* Clean the old list. */
> +    dlist_foreach_modify(iter, &table_states)
> +    {
> +        TableState *tstate = dlist_container(TableState, node, iter.cur);
> +
> +        dlist_delete(iter.cur);
> +        pfree(tstate);
> +    }
> +
> +    /*
> +     * Fetch all the subscription relation states that are not marked as
> +     * ready and push them into our table state tracking list.
> +     */
> +    rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
> +
> +    ScanKeyInit(&skey[0],
> +                Anum_pg_subscription_rel_subid,
> +                BTEqualStrategyNumber, F_OIDEQ,
> +                ObjectIdGetDatum(MyLogicalRepWorker->subid));
> +
> +    if (OidIsValid(relid))
> +    {
> +        ScanKeyInit(&skey[1],
> +                    Anum_pg_subscription_rel_subrelid,
> +                    BTEqualStrategyNumber, F_OIDEQ,
> +                    ObjectIdGetDatum(relid));
> +    }
> +    else
> +    {
> +        ScanKeyInit(&skey[1],
> +                    Anum_pg_subscription_rel_substate,
> +                    BTEqualStrategyNumber, F_CHARNE,
> +                    CharGetDatum(SUBREL_STATE_READY));
> +    }
> +
> +    scan = heap_beginscan_catalog(rel, 2, skey);

Hm. So this is a seqscan. Shouldn't we make this use an index (depending
on which branch is taken above)?


> +    while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
> +    {
> +        Form_pg_subscription_rel    subrel;
> +        TableState       *tstate;
> +        MemoryContext    oldctx;
> +
> +        subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
> +
> +        /* Allocate the tracking info in a permament memory context. */

s/permament/permanent/

> +/*
> + * Handle table synchronization cooperation from the synchroniation
> + * worker.
> + */
> +static void
> +process_syncing_tables_sync(char *slotname, XLogRecPtr end_lsn)
> +{
> +    TableState *tstate;
> +    TimeLineID    tli;
> +
> +    Assert(!IsTransactionState());
> +
> +    /*
> +     * Synchronization workers don't keep track of all synchronization
> +     * tables, they only care about their table.
> +     */
> +    if (!table_states_valid)
> +    {
> +        StartTransactionCommand();
> +        reread_sync_state(MyLogicalRepWorker->relid);
> +        CommitTransactionCommand();
> +    }
> +
> +    /* Somebody removed table underneath this worker, nothing more to do. */
> +    if (dlist_is_empty(&table_states))
> +    {
> +        wrcapi->endstreaming(wrchandle, &tli);
> +        finish_sync_worker(slotname);
> +    }
> +
> +    /* Check if we are done with catchup now. */
> +    tstate = dlist_container(TableState, node, dlist_head_node(&table_states));
> +    if (tstate->state == SUBREL_STATE_CATCHUP)
> +    {
> +        Assert(tstate->lsn != InvalidXLogRecPtr);
> +
> +        if (tstate->lsn == end_lsn)
> +        {
> +            tstate->state = SUBREL_STATE_READY;
> +            tstate->lsn = InvalidXLogRecPtr;
> +            /* Update state of the synchronization. */
> +            StartTransactionCommand();
> +            SetSubscriptionRelState(MyLogicalRepWorker->subid,
> +                                    tstate->relid, tstate->state,
> +                                    tstate->lsn);
> +            CommitTransactionCommand();
> +
> +            wrcapi->endstreaming(wrchandle, &tli);
> +            finish_sync_worker(slotname);
> +        }
> +        return;
> +    }
> +}

The return inside the if is a bit weird. Makes one think it might be a
loop or such.


> +/*
> + * Handle table synchronization cooperation from the apply worker.
> + */
> +static void
> +process_syncing_tables_apply(char *slotname, XLogRecPtr end_lsn)
> +{
> +    dlist_mutable_iter    iter;
> +
> +    Assert(!IsTransactionState());
> +
> +    if (!table_states_valid)
> +    {
> +        StartTransactionCommand();
> +        reread_sync_state(InvalidOid);
> +        CommitTransactionCommand();
> +    }

So this pattern is repeated a bunch of times, maybe we can encapsulate
that somewhat? Maybe like ensure_sync_state_valid() or such?
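I.e. something like (sketch, just wrapping the repeated block):

```c
static void
ensure_sync_state_valid(Oid relid)
{
    if (!table_states_valid)
    {
        StartTransactionCommand();
        reread_sync_state(relid);
        CommitTransactionCommand();
    }
}
```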


> +    dlist_foreach_modify(iter, &table_states)
> +    {
> +        TableState *tstate = dlist_container(TableState, node, iter.cur);
> +        bool        start_worker;
> +        LogicalRepWorker   *worker;
> +
> +        /*
> +         * When the synchronization process is at the cachup phase we need

s/cachup/catchup/


> +         * to ensure that we are not behind it (it's going to wait at this
> +         * point for the change of state). Once we are infront or at the same
> +         * position as the synchronization proccess we can signal it to
> +         * finish the catchup.
> +         */
> +        if (tstate->state == SUBREL_STATE_SYNCWAIT)
> +        {
> +            if (end_lsn > tstate->lsn)
> +            {
> +                /*
> +                 * Apply is infront, tell sync to catchup. and wait until
> +                 * it does.
> +                 */
> +                tstate->state = SUBREL_STATE_CATCHUP;
> +                tstate->lsn = end_lsn;
> +                StartTransactionCommand();
> +                SetSubscriptionRelState(MyLogicalRepWorker->subid,
> +                                        tstate->relid, tstate->state,
> +                                        tstate->lsn);
> +                CommitTransactionCommand();
> +
> +                /* Signal the worker as it may be waiting for us. */
> +                LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> +                worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
> +                                                tstate->relid);
> +                if (worker && worker->proc)
> +                    SetLatch(&worker->proc->procLatch);
> +                LWLockRelease(LogicalRepWorkerLock);

Different parts of this file use different lock level to set the
latch. Why?


> +                if (wait_for_sync_status_change(tstate))
> +                    Assert(tstate->state == SUBREL_STATE_READY);
> +            }
> +            else
> +            {
> +                /*
> +                 * Apply is either behind in which case sync worker is done
> +                 * but apply needs to keep tracking the table until it
> +                 * catches up to where sync finished.
> +                 * Or apply and sync are at the same position in which case
> +                 * table can be switched to standard replication mode
> +                 * immediately.
> +                 */
> +                if (end_lsn < tstate->lsn)
> +                    tstate->state = SUBREL_STATE_SYNCDONE;
> +                else
> +                    tstate->state = SUBREL_STATE_READY;
> +

What I'm failing to understand is how this can be done under
concurrency. You probably thought about this, but it should really be
explained somewhere.


> +                StartTransactionCommand();
> +                SetSubscriptionRelState(MyLogicalRepWorker->subid,
> +                                        tstate->relid, tstate->state,
> +                                        tstate->lsn);
> +                CommitTransactionCommand();
> +
> +                /* Signal the worker as it may be waiting for us. */
> +                LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> +                worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
> +                                                tstate->relid);
> +                if (worker && worker->proc)
> +                    SetLatch(&worker->proc->procLatch);
> +                LWLockRelease(LogicalRepWorkerLock);

Oh, and again, please set latches outside of the lock.


> +        else if (tstate->state == SUBREL_STATE_SYNCDONE &&
> +                 end_lsn >= tstate->lsn)
> +        {
> +            /*
> +             * Apply catched up to the position where table sync finished,
> +             * mark the table as ready for normal replication.
> +             */

Sentence needs to be rephrased a bit.

> +        /*
> +         * In case table is supposed to be synchronizing but the
> +         * synchronization worker is not running, start it.
> +         * Limit the number of launched workers here to one (for now).
> +         */

Hm. That seems problematic for online-upgrade-type cases; we might never
catch up that way...


> +/*
> + * Start syncing the table in the sync worker.
> + */
> +char *
> +LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
> +{
> +    StringInfoData    s;
> +    TableState        tstate;
> +    MemoryContext    oldctx;
> +    char           *slotname;
> +
> +    /* Check the state of the table synchronization. */
> +    StartTransactionCommand();
> +    tstate.relid = MyLogicalRepWorker->relid;
> +    tstate.state = GetSubscriptionRelState(MySubscription->oid, tstate.relid,
> +                                           &tstate.lsn, false);
> +
> +    /*
> +     * Build unique slot name.
> +     * TODO: protect against too long slot name.
> +     */
> +    oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +    initStringInfo(&s);
> +    appendStringInfo(&s, "%s_sync_%s", MySubscription->slotname,
> +                     get_rel_name(tstate.relid));
> +    slotname = s.data;

Is this memory freed somewhere?


> +                /*
> +                 * We want to do the table data sync in single
> +                 * transaction so do not close the transaction opened
> +                 * above.
> +                 * There will be no BEGIN or COMMIT messages coming via
> +                 * logical replication while the copy table command is
> +                 * running so start the transaction here.
> +                 * Note the memory context for data handling will still
> +                 * be done using ensure_transaction called by the insert
> +                 * handler.
> +                 */
> +                StartTransactionCommand();
> +
> +                /*
> +                 * Don't allow parallel access other than SELECT while
> +                 * the initial contents are being copied.
> +                 */
> +                rel = heap_open(tstate.relid, ExclusiveLock);

Why do we want to allow access at all?



> @@ -87,6 +92,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>      cb->commit_cb = pgoutput_commit_txn;
>      cb->filter_by_origin_cb = pgoutput_origin_filter;
>      cb->shutdown_cb = pgoutput_shutdown;
> +    cb->tuple_cb = pgoutput_tuple;
> +    cb->list_tables_cb = pgoutput_list_tables;
>  }

What are these new, and undocumented callbacks actually doing? And why
is this integrated into logical decoding?


>  /*
> + * Handle LIST_TABLES command.
> + */
> +static void
> +SendTableList(ListTablesCmd *cmd)
> +{

Ugh.


I really dislike this kind of command. I think we should instead change
things around, allowing normal SQL to be issued via the replication
protocol. We'll have to error out when SQL is run on non-database-connected
replication connections, but that seems fine.


Andres



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 9/14/16 11:21 AM, Andres Freund wrote:
>> +    ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */
>> > +               remoteslot,
>> > +               remoteslot,
>> > +               NIL,
>> > +               ONCONFLICT_NONE,
>> > +               estate,
>> > +               false);
> I have *severe* doubts about just using the (newly) exposed functions
> 1:1 here.

It is a valid concern, but what is the alternative?  ExecInsert() and
the others appear to do exactly the right things that are required.

Are your concerns mainly philosophical about calling into internal
executor code, or do you have technical concerns that this will not do
the right thing in some cases?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-09-14 13:20:02 -0500, Peter Eisentraut wrote:
> On 9/14/16 11:21 AM, Andres Freund wrote:
> >> +    ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */
> >> > +               remoteslot,
> >> > +               remoteslot,
> >> > +               NIL,
> >> > +               ONCONFLICT_NONE,
> >> > +               estate,
> >> > +               false);
> > I have *severe* doubts about just using the (newly) exposed functions
> > 1:1 here.
> 
> It is a valid concern, but what is the alternative?  ExecInsert() and
> the others appear to do exactly the right things that are required.

They're actually a lot more heavyweight than what's required. If you
e.g. do a large COPY on the source side, we create a single executor
state (if at all), and then insert the rows using lower level
routines. And that's *vastly* faster than going through all the setup
costs here for each row.


> Are your concerns mainly philosophical about calling into internal
> executor code, or do you have technical concerns that this will not do
> the right thing in some cases?

Well, not about it being wrong in the sense of returning wrong results,
but wrong in the sense of not even remotely being able to keep up in
common cases.

Andres



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 14/09/16 00:48, Andres Freund wrote:
>
> First read through the current version. Hence no real architectural
> comments.

Hi,

Thanks for looking!

>
> On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote:
>
>> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
>> new file mode 100644
>> index 0000000..e0c719d
>> --- /dev/null
>> +++ b/src/backend/commands/publicationcmds.c
>> @@ -0,0 +1,761 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * publicationcmds.c
>> + *        publication manipulation
>> + *
>> + * Copyright (c) 2015, PostgreSQL Global Development Group
>> + *
>> + * IDENTIFICATION
>> + *        publicationcmds.c
>>
>
> Not that I'm a fan of this line in the first place, but usually it does
> include the path.
>

Yes, though I don't bother with it in the WIP version; this way I 
won't forget to update it when the patch is getting close to ready, in 
case there were renames.

>> +static void
>> +check_replication_permissions(void)
>> +{
>> +    if (!superuser() && !has_rolreplication(GetUserId()))
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
>> +                 (errmsg("must be superuser or replication role to manipulate publications"))));
>> +}
>
> Do we want to require owner privileges for replication roles? I'd say
> no, but want to raise the question.
>

No. We might want to invent some kind of publish role later so that 
we can do logical replication with higher granularity, but for the 
replication role it does not make sense. And I think higher-granularity 
ACLs are something for followup patches.

>
>> +ObjectAddress
>> +CreatePublication(CreatePublicationStmt *stmt)
>> +{
>> +    Relation    rel;
>> +    ObjectAddress myself;
>> +    Oid            puboid;
>> +    bool        nulls[Natts_pg_publication];
>> +    Datum        values[Natts_pg_publication];
>> +    HeapTuple    tup;
>> +    bool        replicate_insert_given;
>> +    bool        replicate_update_given;
>> +    bool        replicate_delete_given;
>> +    bool        replicate_insert;
>> +    bool        replicate_update;
>> +    bool        replicate_delete;
>> +
>> +    check_replication_permissions();
>> +
>> +    rel = heap_open(PublicationRelationId, RowExclusiveLock);
>> +
>> +    /* Check if name is used */
>> +    puboid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(stmt->pubname));
>> +    if (OidIsValid(puboid))
>> +    {
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
>> +                 errmsg("publication \"%s\" already exists",
>> +                        stmt->pubname)));
>> +    }
>> +
>> +    /* Form a tuple. */
>> +    memset(values, 0, sizeof(values));
>> +    memset(nulls, false, sizeof(nulls));
>> +
>> +    values[Anum_pg_publication_pubname - 1] =
>> +        DirectFunctionCall1(namein, CStringGetDatum(stmt->pubname));
>> +
>> +    parse_publication_options(stmt->options,
>> +                              &replicate_insert_given, &replicate_insert,
>> +                              &replicate_update_given, &replicate_update,
>> +                              &replicate_delete_given, &replicate_delete);
>> +
>> +    values[Anum_pg_publication_puballtables - 1] =
>> +        BoolGetDatum(stmt->for_all_tables);
>> +    values[Anum_pg_publication_pubreplins - 1] =
>> +        BoolGetDatum(replicate_insert);
>> +    values[Anum_pg_publication_pubreplupd - 1] =
>> +        BoolGetDatum(replicate_update);
>> +    values[Anum_pg_publication_pubrepldel - 1] =
>> +        BoolGetDatum(replicate_delete);
>> +
>> +    tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
>> +
>> +    /* Insert tuple into catalog. */
>> +    puboid = simple_heap_insert(rel, tup);
>> +    CatalogUpdateIndexes(rel, tup);
>> +    heap_freetuple(tup);
>> +
>> +    ObjectAddressSet(myself, PublicationRelationId, puboid);
>> +
>> +    /* Make the changes visible. */
>> +    CommandCounterIncrement();
>> +
>> +    if (stmt->tables)
>> +    {
>> +        List       *rels;
>> +
>> +        Assert(list_length(stmt->tables) > 0);
>> +
>> +        rels = GatherTableList(stmt->tables);
>> +        PublicationAddTables(puboid, rels, true, NULL);
>> +        CloseTables(rels);
>> +    }
>> +    else if (stmt->for_all_tables || stmt->schema)
>> +    {
>> +        List       *rels;
>> +
>> +        rels = GatherTables(stmt->schema);
>> +        PublicationAddTables(puboid, rels, true, NULL);
>> +        CloseTables(rels);
>> +    }
>
> Isn't this (and ALTER) racy? What happens if tables are concurrently
> created? This session wouldn't necessarily see the tables, and other
> sessions won't see for_all_tables/schema.   Evaluating
> for_all_tables/all_in_schema when the publication is used, would solve
> that problem.

Well, yes it is. It's technically not a problem for all_in_schema, as 
that's just shorthand for TABLE a,b,c,d etc. where future tables don't 
matter (and should be added manually, unless we want to change that 
behavior to act more like for_all_tables just with a schema filter, which 
I wouldn't be against). But for for_all_tables it is a problem, I agree.

Based on offline discussion, I'll move the check to the actual DML 
operation instead of DDL, and have for_all_tables be evaluated when used, 
not when defined.

>
>> +/*
>> + * Gather all tables optinally filtered by schema name.
>> + * The gathered tables are locked in access share lock mode.
>> + */
>> +static List *
>> +GatherTables(char *nspname)
>> +{
>> +    Oid            nspid = InvalidOid;
>> +    List       *rels = NIL;
>> +    Relation    rel;
>> +    SysScanDesc scan;
>> +    ScanKeyData key[1];
>> +    HeapTuple    tup;
>> +
>> +    /* Resolve and validate the schema if specified */
>> +    if (nspname)
>> +    {
>> +        nspid = LookupExplicitNamespace(nspname, false);
>> +        if (IsSystemNamespace(nspid) || IsToastNamespace(nspid))
>> +            ereport(ERROR,
>> +                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                     errmsg("only tables in user schemas can be added to publication"),
>> +                     errdetail("%s is a system schema", strVal(nspname))));
>> +    }
>
> Why are we restricting pg_catalog here? There's a bunch of extensions
> creating objects therein, and we allow that. Seems better to just rely
> on the IsSystemClass check for that below.
>

Makes sense.

>> +/*
>> + * Gather Relations based o provided by RangeVar list.
>> + * The gathered tables are locked in access share lock mode.
>> + */
>
> Why access share? Shouldn't we make this ShareUpdateExclusive or
> similar, to prevent schema changes?
>

Hm, I thought AccessShare would be enough to prevent schema changes that 
matter to us (which is basically just drop afaik).

>
>> +static List *
>> +GatherTableList(List *tables)
>> +{
>> +    List       *relids = NIL;
>> +    List       *rels = NIL;
>> +    ListCell   *lc;
>> +
>> +    /*
>> +     * Open, share-lock, and check all the explicitly-specified relations
>> +     */
>> +    foreach(lc, tables)
>> +    {
>> +        RangeVar   *rv = lfirst(lc);
>> +        Relation    rel;
>> +        bool        recurse = interpretInhOption(rv->inhOpt);
>> +        Oid            myrelid;
>> +
>> +        rel = heap_openrv(rv, AccessShareLock);
>> +        myrelid = RelationGetRelid(rel);
>> +        /* don't throw error for "foo, foo" */
>> +        if (list_member_oid(relids, myrelid))
>> +        {
>> +            heap_close(rel, AccessShareLock);
>> +            continue;
>> +        }
>> +        rels = lappend(rels, rel);
>> +        relids = lappend_oid(relids, myrelid);
>> +
>> +        if (recurse)
>> +        {
>> +            ListCell   *child;
>> +            List       *children;
>> +
>> +            children = find_all_inheritors(myrelid, AccessShareLock,
>> +                                           NULL);
>> +
>> +            foreach(child, children)
>> +            {
>> +                Oid            childrelid = lfirst_oid(child);
>> +
>> +                if (list_member_oid(relids, childrelid))
>> +                    continue;
>> +
>> +                /* find_all_inheritors already got lock */
>> +                rel = heap_open(childrelid, NoLock);
>> +                rels = lappend(rels, rel);
>> +                relids = lappend_oid(relids, childrelid);
>> +            }
>> +        }
>> +    }
>
> Hm, can't this yield duplicates, when both an inherited and a top level
> relation are specified?
>

Hmm possible, I'll do the same check as I do above.

>
>> @@ -713,6 +714,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
>>      ObjectAddressSet(address, RelationRelationId, relationId);
>>
>>      /*
>> +     * If the newly created relation is a table and there are publications
>> +     * which were created as FOR ALL TABLES, we want to add the relation
>> +     * membership to those publications.
>> +     */
>> +
>> +    if (relkind == RELKIND_RELATION)
>> +    {
>> +        List       *pubids = GetAllTablesPublications();
>> +        ListCell   *lc;
>> +
>> +        foreach(lc, pubids)
>> +        {
>> +            Oid    pubid = lfirst_oid(lc);
>> +
>> +            publication_add_relation(pubid, rel, false);
>> +        }
>> +    }
>> +
>
> Hm, this has the potential to noticeably slow down table creation.
>

I doubt it's going to be noticeable given all the work CREATE TABLE 
already does, but it certainly won't make it any faster. But since we 
agreed to move the check to DML this will be removed as well.

>> +publication_opt_item:
>> +            IDENT
>> +                {
>> +                    /*
>> +                     * We handle identifiers that aren't parser keywords with
>> +                     * the following special-case codes, to avoid bloating the
>> +                     * size of the main parser.
>> +                     */
>> +                    if (strcmp($1, "replicate_insert") == 0)
>> +                        $$ = makeDefElem("replicate_insert",
>> +                                         (Node *)makeInteger(TRUE), @1);
>> +                    else if (strcmp($1, "noreplicate_insert") == 0)
>> +                        $$ = makeDefElem("replicate_insert",
>> +                                         (Node *)makeInteger(FALSE), @1);
>> +                    else if (strcmp($1, "replicate_update") == 0)
>> +                        $$ = makeDefElem("replicate_update",
>> +                                         (Node *)makeInteger(TRUE), @1);
>> +                    else if (strcmp($1, "noreplicate_update") == 0)
>> +                        $$ = makeDefElem("replicate_update",
>> +                                         (Node *)makeInteger(FALSE), @1);
>> +                    else if (strcmp($1, "replicate_delete") == 0)
>> +                        $$ = makeDefElem("replicate_delete",
>> +                                         (Node *)makeInteger(TRUE), @1);
>> +                    else if (strcmp($1, "noreplicate_delete") == 0)
>> +                        $$ = makeDefElem("replicate_delete",
>> +                                         (Node *)makeInteger(FALSE), @1);
>> +                    else
>> +                        ereport(ERROR,
>> +                                (errcode(ERRCODE_SYNTAX_ERROR),
>> +                                 errmsg("unrecognized publication option \"%s\"", $1),
>> +                                     parser_errposition(@1)));
>> +                }
>> +        ;
>
> I'm kind of inclined to do this checking at execution (or transform)
> time instead.  That allows extensions to add options, and handle them in
> utility hooks.
>

That's an interesting point. I prefer the parsing to be done in gram.y, 
but it might be worth moving it for extensibility, although so far there 
are other barriers to that.

>> +
>> +/* ----------------
>> + *        pg_publication_rel definition.  cpp turns this into
>> + *        typedef struct FormData_pg_publication_rel
>> + *
>> + * ----------------
>> + */
>> +#define PublicationRelRelationId                6106
>> +
>> +CATALOG(pg_publication_rel,6106)
>> +{
>> +    Oid        pubid;                /* Oid of the publication */
>> +    Oid        relid;                /* Oid of the relation */
>> +} FormData_pg_publication_rel;
>
> Hm. Do we really want this to have an oid? Won't that significantly,
> especially if multiple publications are present, increase our oid
> consumption?  It seems entirely sufficient to identify rows in here
> using (pubid, relid).
>

It could, but I'll have to check and possibly fix the dependency code; I 
vaguely remember that there is some part of it that assumes that suboid 
is only used for relation columns and nothing else.

>
>> +ObjectAddress
>> +CreateSubscription(CreateSubscriptionStmt *stmt)
>> +{
>> +    Relation    rel;
>> +    ObjectAddress myself;
>> +    Oid            subid;
>> +    bool        nulls[Natts_pg_subscription];
>> +    Datum        values[Natts_pg_subscription];
>> +    HeapTuple    tup;
>> +    bool        enabled_given;
>> +    bool        enabled;
>> +    char       *conninfo;
>> +    List       *publications;
>> +
>> +    check_subscription_permissions();
>> +
>> +    rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
>> +
>> +    /* Check if name is used */
>> +    subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId,
>> +                            CStringGetDatum(stmt->subname));
>> +    if (OidIsValid(subid))
>> +    {
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
>> +                 errmsg("subscription \"%s\" already exists",
>> +                        stmt->subname)));
>> +    }
>> +
>> +    /* Parse and check options. */
>> +    parse_subscription_options(stmt->options, &enabled_given, &enabled,
>> +                               &conninfo, &publications);
>> +
>> +    /* TODO: improve error messages here. */
>> +    if (conninfo == NULL)
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_SYNTAX_ERROR),
>> +                 errmsg("connection not specified")));
>
> Probably also makes sense to parse the conninfo here to verify it looks
> sane.  Although that's fairly annoying to do, because the relevant code
> is libpq :(
>

Well, the connection is eventually used (in later patches), so maybe 
that's not a problem.

>
>> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
>> index 65230e2..f3d54c8 100644
>> --- a/src/backend/nodes/copyfuncs.c
>> +++ b/src/backend/nodes/copyfuncs.c
>
> I think you might be missing outfuncs support.
>

I thought that we don't do outfuncs for DDL?

>> +
>> +CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO
>> +{
>> +    Oid            subdbid;            /* Database the subscription is in. */
>> +    NameData    subname;        /* Name of the subscription */
>> +    bool        subenabled;        /* True if the subsription is enabled (running) */
>
> Not sure what "running" means here.

It's a very terse way of saying that enabled means the worker should be running.


>> +    <varlistentry>
>> +     <term>
>> +      publication_names
>> +     </term>
>> +     <listitem>
>> +      <para>
>> +       Comma separated list of publication names for which to subscribe
>> +       (receive changes). See
>> +       <xref linkend="logical-replication-publication"> for more info.
>> +      </para>
>> +     </listitem>
>> +    </varlistentry>
>> +   </variablelist>
>
> Do we need to specify an escaping scheme here?
>

Probably, as we allow whatever Name allows.
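Since Name allows essentially arbitrary characters, one conceivable escaping scheme would be SQL-identifier-style quoting of the individual names in the comma-separated list. A hypothetical standalone sketch (this is not the actual implementation; the function name and the exact rule are made up):

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: quote a publication name like a SQL identifier
 * when it contains anything beyond [a-z0-9_], doubling any embedded
 * double quotes, so names containing commas survive in a
 * comma-separated publication_names list.
 */
static void quote_pubname(const char *name, char *out, size_t outlen)
{
    int         need_quotes = 0;
    const char *p;

    for (p = name; *p; p++)
        if (!(islower((unsigned char) *p) ||
              isdigit((unsigned char) *p) || *p == '_'))
            need_quotes = 1;

    if (!need_quotes)
    {
        snprintf(out, outlen, "%s", name);
        return;
    }

    size_t n = 0;
    out[n++] = '"';
    for (p = name; *p && n + 2 < outlen; p++)
    {
        if (*p == '"')
            out[n++] = '"';     /* double embedded quotes */
        out[n++] = *p;
    }
    out[n++] = '"';
    out[n] = '\0';
}
```

The receiving side would then split on commas only outside quotes and un-double embedded quotes.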

>
>> +<listitem>
>> +<para>
>> +                Commit timestamp of the transaction.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>
> Perhaps mention it's relative to postgres epoch?
>

Already done in my local working copy.

>
>
>> +<variablelist>
>> +<varlistentry>
>> +<term>
>> +        Byte1('O')
>> +</term>
>> +<listitem>
>> +<para>
>> +                Identifies the message as an origin message.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +<varlistentry>
>> +<term>
>> +        Int64
>> +</term>
>> +<listitem>
>> +<para>
>> +                The LSN of the commit on the origin server.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +<varlistentry>
>> +<term>
>> +        Int8
>> +</term>
>> +<listitem>
>> +<para>
>> +                Length of the origin name (including the NULL-termination
>> +                character).
>> +</para>
>> +</listitem>
>> +</varlistentry>
>
> Should this explain that there could be multiple origin messages (when
> replay switched origins during an xact)?
>

Makes sense.
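For reference, the origin message layout quoted above (Byte1('O'), Int64 origin commit LSN, Int8 name length including the terminating NUL, then the name) could be parsed roughly like this; this is a toy standalone sketch, with the function name, error convention, and buffer handling invented:

```c
#include <stdint.h>
#include <string.h>

/*
 * Toy parser for the origin message described above.  Integers are in
 * network byte order; returns 0 on success, -1 on a malformed message.
 */
static int parse_origin_msg(const uint8_t *buf, size_t len,
                            uint64_t *lsn, char *name, size_t namebuf)
{
    if (len < 10 || buf[0] != 'O')      /* type byte + Int64 + Int8 */
        return -1;

    uint64_t v = 0;
    for (int i = 0; i < 8; i++)         /* Int64 LSN, big-endian */
        v = (v << 8) | buf[1 + i];
    *lsn = v;

    uint8_t namelen = buf[9];           /* includes the NUL */
    if (namelen == 0 || len < 10 + (size_t) namelen || namelen > namebuf)
        return -1;
    if (buf[9 + namelen] != '\0')       /* must be NUL-terminated */
        return -1;
    memcpy(name, buf + 10, namelen);
    return 0;
}
```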

>> +<para>
>> +                Relation name.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +</variablelist>
>> +
>> +</para>
>> +
>> +<para>
>> +This message is always followed by Attributes message.
>> +</para>
>
> What's the point of having this separate from the relation message?
>

It's not separate, it's part of it, but the documentation does not make 
that very clear.

>> +<varlistentry>
>> +<term>
>> +        Byte1('C')
>> +</term>
>> +<listitem>
>> +<para>
>> +                Start of column block.
>> +</para>
>> +</listitem>
>
> "block"?
>

Block, message part, sub-message... I am not sure what to call something 
that repeats inside a message.

>> +</varlistentry><varlistentry>
>> +<term>
>> +        Int8
>> +</term>
>> +<listitem>
>> +<para>
>> +                Flags for the column. Currently can be either 0 for no flags
>> +                or one which marks the column as part of the key.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +<varlistentry>
>> +<term>
>> +        Int8
>> +</term>
>> +<listitem>
>> +<para>
>> +                Length of column name (including the NULL-termination
>> +                character).
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +<varlistentry>
>> +<term>
>> +        String
>> +</term>
>> +<listitem>
>> +<para>
>> +                Name of the column.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>
> Huh, no type information?
>

It's not necessary for text-mode transfer; it will be if we ever add 
binary data transfer, but that will require a protocol version bump anyway.


>> +<varlistentry>
>> +<term>
>> +        Byte1('O')
>> +</term>
>> +<listitem>
>> +<para>
>> +                Identifies the following TupleData message as the old tuple
>> +                (deleted tuple).
>> +</para>
>> +</listitem>
>> +</varlistentry>
>
> Should we discern between old key and old tuple?
>

Yes, otherwise it will be hard to support REPLICA IDENTITY FULL.

>
>> +/*
>> + * Read transaction BEGIN from the stream.
>> + */
>> +void
>> +logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn,
>> +                      TimestampTz *committime, TransactionId *remote_xid)
>> +{
>> +    /* read fields */
>> +    *remote_lsn = pq_getmsgint64(in);
>> +    Assert(*remote_lsn != InvalidXLogRecPtr);
>> +    *committime = pq_getmsgint64(in);
>> +    *remote_xid = pq_getmsgint(in, 4);
>> +}
>
> In network exposed stuff it seems better not to use assert, and error
> out instead.
>

Okay
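The point being that for wire data the check has to survive in production builds; a minimal sketch of the direction agreed above (names and the error convention are made up, not the actual patch code):

```c
#include <stdint.h>

#define INVALID_LSN UINT64_C(0)

/*
 * Sketch: validate fields read off the wire and report an error
 * instead of Assert, since a buggy or malicious peer can send
 * anything; an Assert would be compiled out in non-assert builds.
 */
static int read_begin_lsn(uint64_t wire_lsn, uint64_t *remote_lsn)
{
    if (wire_lsn == INVALID_LSN)
        return -1;              /* caller reports a protocol error */
    *remote_lsn = wire_lsn;
    return 0;
}
```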

>
>> +/*
>> + * Write UPDATE to the output stream.
>> + */
>> +void
>> +logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple,
>> +                       HeapTuple newtuple)
>> +{
>> +    pq_sendbyte(out, 'U');        /* action UPDATE */
>> +
>> +    /* use Oid as relation identifier */
>> +    pq_sendint(out, RelationGetRelid(rel), 4);
>
> Wonder if there's a way that could screw us. What happens if there's an
> oid wraparound, and a relation is dropped? Then a new relation could end
> up with same id. Maybe answered somewhere further down.
>

It should not; we'll know we haven't sent the message for the new table 
yet, so we'll send a new Relation message.
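In other words, the per-connection cache decides when a Relation message precedes a change; a dropped-and-reused oid misses the cache (or has its flag reset by an invalidation) and gets a fresh Relation message. A toy standalone model of that logic (hypothetical names, fixed-size array instead of a real hash table):

```c
#include <stdint.h>

#define MAXRELS 8

typedef struct
{
    uint32_t    relid;
    int         schema_sent;
    int         used;
} RelEntry;

static RelEntry cache[MAXRELS];
static int  relation_msgs;      /* how many Relation messages "sent" */

static RelEntry *
rel_lookup(uint32_t relid)
{
    for (int i = 0; i < MAXRELS; i++)
        if (cache[i].used && cache[i].relid == relid)
            return &cache[i];
    for (int i = 0; i < MAXRELS; i++)
        if (!cache[i].used)
        {
            cache[i] = (RelEntry) { relid, 0, 1 };
            return &cache[i];
        }
    return 0;                   /* toy model: cache full */
}

static void
send_change(uint32_t relid)
{
    RelEntry   *e = rel_lookup(relid);

    if (!e->schema_sent)
    {
        relation_msgs++;        /* send Relation message first */
        e->schema_sent = 1;
    }
    /* ... then send the actual change message ... */
}

static void
invalidate(uint32_t relid)      /* e.g. on drop/recreate of the oid */
{
    RelEntry   *e = rel_lookup(relid);

    e->schema_sent = 0;
}
```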

>> +
>> +/*
>> + * COMMIT callback
>> + */
>> +static void
>> +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> +                     XLogRecPtr commit_lsn)
>> +{
>> +    OutputPluginPrepareWrite(ctx, true);
>> +    logicalrep_write_commit(ctx->out, txn, commit_lsn);
>> +    OutputPluginWrite(ctx, true);
>> +}
>
> Hm, so we don't reset the context for these...
>

What?

>> +/*
>> + * Sends the decoded DML over wire.
>> + */
>> +static void
>> +pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> +                Relation relation, ReorderBufferChange *change)
>> +{
>
>> +    /* Avoid leaking memory by using and resetting our own context */
>> +    old = MemoryContextSwitchTo(data->context);
>> +
>> +    /*
>> +     * Write the relation schema if the current schema haven't been sent yet.
>> +     */
>> +    if (!relentry->schema_sent)
>> +    {
>> +        OutputPluginPrepareWrite(ctx, false);
>> +        logicalrep_write_rel(ctx->out, relation);
>> +        OutputPluginWrite(ctx, false);
>> +        relentry->schema_sent = true;
>> +    }
>> +
>> +    /* Send the data */
>> +    switch (change->action)
>> +    {
> ...
>> +    /* Cleanup */
>> +    MemoryContextSwitchTo(old);
>> +    MemoryContextReset(data->context);
>> +}
>
> IIRC there were some pfree's in called functions. It's probably better
> to remove those and rely on this.
>

Only write_tuple calls pfree; that's mostly because we may call it twice 
for a single tuple and it might allocate a lot of data.

>> +/*
>> + * Load publications from the list of publication names.
>> + */
>> +static List *
>> +LoadPublications(List *pubnames)
>> +{
>> +    List       *result = NIL;
>> +    ListCell   *lc;
>> +
>> +    foreach (lc, pubnames)
>> +    {
>> +        char           *pubname = (char *) lfirst(lc);
>> +        Publication       *pub = GetPublicationByName(pubname, false);
>> +
>> +        result = lappend(result, pub);
>> +    }
>> +
>> +    return result;
>> +}
>
> Why are we doing this eagerly? On systems with a lot of relations
> this'll suck up a fair amount of memory, without much need?
>

I don't follow; it only reads the publications, not the relations in 
them. The reason we do it eagerly is to validate that the requested 
publications actually exist.

>> +/*
>> + * Remove all the entries from our relation cache.
>> + */
>> +static void
>> +destroy_rel_sync_cache(void)
>> +{
>> +    HASH_SEQ_STATUS        status;
>> +    RelationSyncEntry  *entry;
>> +
>> +    if (RelationSyncCache == NULL)
>> +        return;
>> +
>> +    hash_seq_init(&status, RelationSyncCache);
>> +
>> +    while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
>> +    {
>> +        if (hash_search(RelationSyncCache, (void *) &entry->relid,
>> +                        HASH_REMOVE, NULL) == NULL)
>> +            elog(ERROR, "hash table corrupted");
>> +    }
>> +
>> +    RelationSyncCache = NULL;
>> +}
>
> Any reason not to just destroy the hash table instead?
>

Missed that we have an API for that.

>>
>>  /*
>> - * Module load callback
>> + * Module initialization callback
>>   */
>> -void
>> -_PG_init(void)
>> +WalReceiverConnHandle *
>> +_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi)
>>  {
>> -    /* Tell walreceiver how to reach us */
>> -    if (walrcv_connect != NULL || walrcv_identify_system != NULL ||
>> -        walrcv_readtimelinehistoryfile != NULL ||
>> -        walrcv_startstreaming != NULL || walrcv_endstreaming != NULL ||
>> -        walrcv_receive != NULL || walrcv_send != NULL ||
>> -        walrcv_disconnect != NULL)
>> -        elog(ERROR, "libpqwalreceiver already loaded");
>> -    walrcv_connect = libpqrcv_connect;
>> -    walrcv_get_conninfo = libpqrcv_get_conninfo;
>> -    walrcv_identify_system = libpqrcv_identify_system;
>> -    walrcv_readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
>> -    walrcv_startstreaming = libpqrcv_startstreaming;
>> -    walrcv_endstreaming = libpqrcv_endstreaming;
>> -    walrcv_receive = libpqrcv_receive;
>> -    walrcv_send = libpqrcv_send;
>> -    walrcv_disconnect = libpqrcv_disconnect;
>> +    WalReceiverConnHandle *handle;
>> +
>> +    handle = palloc0(sizeof(WalReceiverConnHandle));
>> +
>> +    /* Tell caller how to reach us */
>> +    wrcapi->connect = libpqrcv_connect;
>> +    wrcapi->get_conninfo = libpqrcv_get_conninfo;
>> +    wrcapi->identify_system = libpqrcv_identify_system;
>> +    wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile;
>> +    wrcapi->create_slot = libpqrcv_create_slot;
>> +    wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical;
>> +    wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical;
>> +    wrcapi->endstreaming = libpqrcv_endstreaming;
>> +    wrcapi->receive = libpqrcv_receive;
>> +    wrcapi->send = libpqrcv_send;
>> +    wrcapi->disconnect = libpqrcv_disconnect;
>> +
>> +    return handle;
>>  }
>
> This however I'm not following. Why do we need multiple copies of this?
> And why aren't we doing the assignments in _PG_init?  Seems better to
> just allocate one WalRcvCallbacks globally and assign all these as
> constants.  Then the establishment function can just return all these
> (as part of a bigger struct).
>

Meh, if I understand you correctly, that will make the access a bit more 
ugly (multiple layers of structs).

>
> (skipped logical rep docs)
>
>> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
>> index 8acdff1..34007d3 100644
>> --- a/doc/src/sgml/reference.sgml
>> +++ b/doc/src/sgml/reference.sgml
>> @@ -54,11 +54,13 @@
>>     &alterOperatorClass;
>>     &alterOperatorFamily;
>>     &alterPolicy;
>> +   &alterPublication;
>>     &alterRole;
>>     &alterRule;
>>     &alterSchema;
>>     &alterSequence;
>>     &alterServer;
>> +   &alterSubscription;
>>     &alterSystem;
>>     &alterTable;
>>     &alterTableSpace;
>> @@ -100,11 +102,13 @@
>>     &createOperatorClass;
>>     &createOperatorFamily;
>>     &createPolicy;
>> +   &createPublication;
>>     &createRole;
>>     &createRule;
>>     &createSchema;
>>     &createSequence;
>>     &createServer;
>> +   &createSubscription;
>>     &createTable;
>>     &createTableAs;
>>     &createTableSpace;
>> @@ -144,11 +148,13 @@
>>     &dropOperatorFamily;
>>     &dropOwned;
>>     &dropPolicy;
>> +   &dropPublication;
>>     &dropRole;
>>     &dropRule;
>>     &dropSchema;
>>     &dropSequence;
>>     &dropServer;
>> +   &dropSubscription;
>>     &dropTable;
>>     &dropTableSpace;
>>     &dropTSConfig;
>
> Hm, shouldn't all these have been registered in the earlier patch?
>

Yeah, all the rebasing sometimes produces artefacts.

>
>
>> diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
>> index d29d3f9..f2052b8 100644
>> --- a/src/backend/commands/subscriptioncmds.c
>> +++ b/src/backend/commands/subscriptioncmds.c
>
> This sure is a lot of yanking around of previously added code.  At least
> some of it looks like it should really have been part of the earlier
> commit.
>

True, but it depends on the previous patch ... scratches head ... hmm, 
although libpqwalreceiver actually does not depend on anything, so it 
could be the first patch in the series; then this code could be moved to 
the patch which adds subscriptions.

>
>> @@ -327,6 +431,18 @@ DropSubscriptionById(Oid subid)
>>  {
>>      Relation    rel;
>>      HeapTuple    tup;
>> +    Datum        datum;
>> +    bool        isnull;
>> +    char       *subname;
>> +    char       *conninfo;
>> +    char       *slotname;
>> +    RepOriginId    originid;
>> +    MemoryContext            tmpctx,
>> +                            oldctx;
>> +    WalReceiverConnHandle  *wrchandle = NULL;
>> +    WalReceiverConnAPI       *wrcapi = NULL;
>> +    walrcvconn_init_fn        walrcvconn_init;
>> +    LogicalRepWorker       *worker;
>>
>>      check_subscription_permissions();
>>
>> @@ -337,9 +453,135 @@ DropSubscriptionById(Oid subid)
>>      if (!HeapTupleIsValid(tup))
>>          elog(ERROR, "cache lookup failed for subscription %u", subid);
>>
>> +    /*
>> +     * Create temporary memory context to keep copy of subscription
>> +     * info needed later in the execution.
>> +     */
>> +    tmpctx = AllocSetContextCreate(TopMemoryContext,
>> +                                          "DropSubscription Ctx",
>> +                                          ALLOCSET_DEFAULT_MINSIZE,
>> +                                          ALLOCSET_DEFAULT_INITSIZE,
>> +                                          ALLOCSET_DEFAULT_MAXSIZE);
>> +    oldctx = MemoryContextSwitchTo(tmpctx);
>> +
>> +    /* Get subname */
>> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
>> +                            Anum_pg_subscription_subname, &isnull);
>> +    Assert(!isnull);
>> +    subname = pstrdup(NameStr(*DatumGetName(datum)));
>> +
>> +    /* Get conninfo */
>> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
>> +                            Anum_pg_subscription_subconninfo, &isnull);
>> +    Assert(!isnull);
>> +    conninfo = pstrdup(TextDatumGetCString(datum));
>> +
>> +    /* Get slotname */
>> +    datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup,
>> +                            Anum_pg_subscription_subslotname, &isnull);
>> +    Assert(!isnull);
>> +    slotname = pstrdup(NameStr(*DatumGetName(datum)));
>> +
>> +    MemoryContextSwitchTo(oldctx);
>> +
>> +    /* Remove the tuple from catalog. */
>>      simple_heap_delete(rel, &tup->t_self);
>>
>> -    ReleaseSysCache(tup);
>> +    /* Protect against launcher restarting the worker. */
>> +    LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE);
>>
>> -    heap_close(rel, RowExclusiveLock);
>> +    /* Kill the apply worker so that the slot becomes accessible. */
>> +    LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>> +    worker = logicalrep_worker_find(subid);
>> +    if (worker)
>> +        logicalrep_worker_stop(worker);
>> +    LWLockRelease(LogicalRepWorkerLock);
>> +
>> +    /* Wait for apply process to die. */
>> +    for (;;)
>> +    {
>> +        int    rc;
>> +
>> +        CHECK_FOR_INTERRUPTS();
>> +
>> +        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>> +        if (logicalrep_worker_count(subid) < 1)
>> +        {
>> +            LWLockRelease(LogicalRepWorkerLock);
>> +            break;
>> +        }
>> +        LWLockRelease(LogicalRepWorkerLock);
>> +
>> +        /* Wait for more work. */
>> +        rc = WaitLatch(&MyProc->procLatch,
>> +                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
>> +                       1000L);
>> +
>> +        /* emergency bailout if postmaster has died */
>> +        if (rc & WL_POSTMASTER_DEATH)
>> +            proc_exit(1);
>> +
>> +        ResetLatch(&MyProc->procLatch);
>> +    }
>
> I'm really far from convinced this is the right layer to perform these
> operations.  Previously these routines were low level catalog
> manipulation routines. Now they're certainly not.
>

Well, I do want this to happen when the DDL is executed so that I 
can inform the user about failures. I can move this code to a separate 
function, but it will still be executed in this layer.

>
>> +    /*
>> +     * Now that the catalog update is done, try to reserve slot at the
>> +     * provider node using replication connection.
>> +     */
>> +    wrcapi = palloc0(sizeof(WalReceiverConnAPI));
>> +
>> +    walrcvconn_init = (walrcvconn_init_fn)
>> +        load_external_function("libpqwalreceiver",
>> +                               "_PG_walreceirver_conn_init", false, NULL);
>> +
>> +    if (walrcvconn_init == NULL)
>> +        elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol");
>
> This does rather reinforce my opinion that the _PG_init removal in
> libpqwalreceiver isn't useful.

I don't see how it helps; you said we'd still return a struct from some 
interface, so this would be more or less the same?

>
>> diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
>> index 699c934..fc998cd 100644
>> --- a/src/backend/postmaster/bgworker.c
>> +++ b/src/backend/postmaster/bgworker.c
>> @@ -93,6 +93,9 @@ struct BackgroundWorkerHandle
>>
>>  static BackgroundWorkerArray *BackgroundWorkerData;
>>
>> +/* Enables registration of internal background workers. */
>> +bool internal_bgworker_registration_in_progress = false;
>> +
>>  /*
>>   * Calculate shared memory needed.
>>   */
>> @@ -745,7 +748,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
>>          ereport(DEBUG1,
>>           (errmsg("registering background worker \"%s\"", worker->bgw_name)));
>>
>> -    if (!process_shared_preload_libraries_in_progress)
>> +    if (!process_shared_preload_libraries_in_progress &&
>> +        !internal_bgworker_registration_in_progress)
>>      {
>>          if (!IsUnderPostmaster)
>>              ereport(LOG,
>
> Ugh.
>
>
>
>
>>  /*
>> + * Register internal background workers.
>> + *
>> + * This is here mainly because the permanent bgworkers are normally allowed
>> + * to be registered only when share preload libraries are loaded which does
>> + * not work for the internal ones.
>> + */
>> +static void
>> +register_internal_bgworkers(void)
>> +{
>> +    internal_bgworker_registration_in_progress = true;
>> +
>> +    /* Register the logical replication worker launcher if appropriate. */
>> +    if (!IsBinaryUpgrade && max_logical_replication_workers > 0)
>> +    {
>> +        BackgroundWorker bgw;
>> +
>> +        bgw.bgw_flags =    BGWORKER_SHMEM_ACCESS |
>> +            BGWORKER_BACKEND_DATABASE_CONNECTION;
>> +        bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
>> +        bgw.bgw_main = ApplyLauncherMain;
>> +        snprintf(bgw.bgw_name, BGW_MAXLEN,
>> +                 "logical replication launcher");
>> +        bgw.bgw_restart_time = 5;
>> +        bgw.bgw_notify_pid = 0;
>> +        bgw.bgw_main_arg = (Datum) 0;
>> +
>> +        RegisterBackgroundWorker(&bgw);
>> +    }
>> +
>> +    internal_bgworker_registration_in_progress = false;
>> +}
>
> Who says these flags are right for everyone?  If we indeed want to go
> through bgworkers here, I think you'll have to generallize this a bit,
> so we don't check for max_logical_replication_workers and such here.  We
> could e.g. have the shared memory sizing hooks set up a chain of
> registrations.
>

It could be more generalized, I agree; this is more of a WIP hack.

I would like to make a special version of RegisterBackgroundWorker, called
something like RegisterInternalBackgroundWorker, that does something
similar to the above function (obviously the if should be moved to the
caller of that function). The main point here is to be able to register
a static worker without an extension.

>
>
>> -static void
>> +static char *
>>  libpqrcv_identify_system(WalReceiverConnHandle *handle,
>> -                         TimeLineID *primary_tli)
>> +                         TimeLineID *primary_tli,
>> +                         char **dbname)
>>  {
>> +    char       *sysid;
>>      PGresult   *res;
>> -    char       *primary_sysid;
>> -    char        standby_sysid[32];
>>
>>      /*
>>       * Get the system identifier and timeline ID as a DataRow message from the
>> @@ -231,24 +234,19 @@ libpqrcv_identify_system(WalReceiverConnHandle *handle,
>>                   errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more
fields.",
>>                             ntuples, nfields, 3, 1)));
>>      }
>> -    primary_sysid = PQgetvalue(res, 0, 0);
>> +    sysid = pstrdup(PQgetvalue(res, 0, 0));
>>      *primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
>> -
>> -    /*
>> -     * Confirm that the system identifier of the primary is the same as ours.
>> -     */
>> -    snprintf(standby_sysid, sizeof(standby_sysid), UINT64_FORMAT,
>> -             GetSystemIdentifier());
>> -    if (strcmp(primary_sysid, standby_sysid) != 0)
>> +    if (dbname)
>>      {
>> -        primary_sysid = pstrdup(primary_sysid);
>> -        PQclear(res);
>> -        ereport(ERROR,
>> -                (errmsg("database system identifier differs between the primary and standby"),
>> -                 errdetail("The primary's identifier is %s, the standby's identifier is %s.",
>> -                           primary_sysid, standby_sysid)));
>> +        if (PQgetisnull(res, 0, 3))
>> +            *dbname = NULL;
>> +        else
>> +            *dbname = pstrdup(PQgetvalue(res, 0, 3));
>>      }
>> +
>>      PQclear(res);
>> +
>> +    return sysid;
>>  }
>>
>>  /*
>> @@ -274,7 +272,7 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
>>
>>      if (PQresultStatus(res) != PGRES_TUPLES_OK)
>>      {
>> -        elog(FATAL, "could not crate replication slot \"%s\": %s\n",
>> +        elog(ERROR, "could not crate replication slot \"%s\": %s\n",
>>               slotname, PQerrorMessage(handle->streamConn));
>>      }
>>
>> @@ -287,6 +285,28 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname,
>>      return snapshot;
>>  }
>>
>> +/*
>> + * Drop replication slot.
>> + */
>> +static void
>> +libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname)
>> +{
>> +    PGresult       *res;
>> +    char            cmd[256];
>> +
>> +    snprintf(cmd, sizeof(cmd),
>> +             "DROP_REPLICATION_SLOT \"%s\"", slotname);
>> +
>> +    res = libpqrcv_PQexec(handle, cmd);
>> +
>> +    if (PQresultStatus(res) != PGRES_COMMAND_OK)
>> +    {
>> +        elog(ERROR, "could not drop replication slot \"%s\": %s\n",
>> +             slotname, PQerrorMessage(handle->streamConn));
>> +    }
>> +
>> +    PQclear(res);
>> +}
>
>
> Given that the earlier commit to libpqwalreciever added a lot of this
> information, it doesn't seem right to change it again here.
>

Why? It's pretty unrelated to the previous change, which is basically
just refactoring; this one actually adds new functionality.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 14/09/16 20:50, Andres Freund wrote:
> On 2016-09-14 13:20:02 -0500, Peter Eisentraut wrote:
>> On 9/14/16 11:21 AM, Andres Freund wrote:
>>>> +    ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */
>>>>> +               remoteslot,
>>>>> +               remoteslot,
>>>>> +               NIL,
>>>>> +               ONCONFLICT_NONE,
>>>>> +               estate,
>>>>> +               false);
>>> I have *severe* doubts about just using the (newly) exposed functions
>>> 1:1 here.
>>
>> It is a valid concern, but what is the alternative?  ExecInsert() and
>> the others appear to do exactly the right things that are required.
>
> They're actually a lot more heavyweight than what's required. If you
> e.g. do a large COPY on the source side, we create a single executor
> state (if at all), and then insert the rows using lower level
> routines. And that's *vastly* faster, than going through all the setup
> costs here for each row.
>
>
>> Are your concerns mainly philosophical about calling into internal
>> executor code, or do you have technical concerns that this will not do
>> the right thing in some cases?
>
> Well, not about it being wrong in the sene of returning wrong results,
> but wrong in the sense of not even remotely being able to keep up in
> common cases.
>

I'd say in the common case they will. I don't plan to use these forever,
btw, but IMHO it's simplest to just use them in v1 instead of trying to
invent new versions that perform better while also behaving
correctly (in terms of triggers and such, for example).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,

On 2016-09-14 21:17:42 +0200, Petr Jelinek wrote:
> > > +/*
> > > + * Gather Relations based o provided by RangeVar list.
> > > + * The gathered tables are locked in access share lock mode.
> > > + */
> > 
> > Why access share? Shouldn't we make this ShareUpdateExclusive or
> > similar, to prevent schema changes?
> > 
> 
> Hm, I thought AccessShare would be enough to prevent schema changes that
> matter to us (which is basically just drop afaik).

Doesn't e.g. dropping an index matter as well?


> > > +                    if (strcmp($1, "replicate_insert") == 0)
> > > +                        $$ = makeDefElem("replicate_insert",
> > > +                                         (Node *)makeInteger(TRUE), @1);
> > > +                    else if (strcmp($1, "noreplicate_insert") == 0)
> > > +                        $$ = makeDefElem("replicate_insert",
> > > +                                         (Node *)makeInteger(FALSE), @1);
> > > +                    else if (strcmp($1, "replicate_update") == 0)
> > > +                        $$ = makeDefElem("replicate_update",
> > > +                                         (Node *)makeInteger(TRUE), @1);
> > > +                    else if (strcmp($1, "noreplicate_update") == 0)
> > > +                        $$ = makeDefElem("replicate_update",
> > > +                                         (Node *)makeInteger(FALSE), @1);
> > > +                    else if (strcmp($1, "replicate_delete") == 0)
> > > +                        $$ = makeDefElem("replicate_delete",
> > > +                                         (Node *)makeInteger(TRUE), @1);
> > > +                    else if (strcmp($1, "noreplicate_delete") == 0)
> > > +                        $$ = makeDefElem("replicate_delete",
> > > +                                         (Node *)makeInteger(FALSE), @1);
> > > +                    else
> > > +                        ereport(ERROR,
> > > +                                (errcode(ERRCODE_SYNTAX_ERROR),
> > > +                                 errmsg("unrecognized publication option \"%s\"", $1),
> > > +                                     parser_errposition(@1)));
> > > +                }
> > > +        ;
> > 
> > I'm kind of inclined to do this checking at execution (or transform)
> > time instead.  That allows extension to add options, and handle them in
> > utility hooks.
> > 
> 
> That's an interesting point. I prefer the parsing to be done in gram.y, but
> it might be worth moving it for extensibility, although so far there are
> other barriers to that.

Citus uses the lack of such a check for COPY to implement copy over its
distributed tables, for example. So there's some benefit.



> > > +    check_subscription_permissions();
> > > +
> > > +    rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
> > > +
> > > +    /* Check if name is used */
> > > +    subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId,
> > > +                            CStringGetDatum(stmt->subname));
> > > +    if (OidIsValid(subid))
> > > +    {
> > > +        ereport(ERROR,
> > > +                (errcode(ERRCODE_DUPLICATE_OBJECT),
> > > +                 errmsg("subscription \"%s\" already exists",
> > > +                        stmt->subname)));
> > > +    }
> > > +
> > > +    /* Parse and check options. */
> > > +    parse_subscription_options(stmt->options, &enabled_given, &enabled,
> > > +                               &conninfo, &publications);
> > > +
> > > +    /* TODO: improve error messages here. */
> > > +    if (conninfo == NULL)
> > > +        ereport(ERROR,
> > > +                (errcode(ERRCODE_SYNTAX_ERROR),
> > > +                 errmsg("connection not specified")));
> > 
> > Probably also makes sense to parse the conninfo here to verify it looks
> > saen.  Although that's fairly annoying to do, because the relevant code
> > is libpq :(
> > 
> 
> Well, the connection is eventually used (in later patches), so maybe that's
> not a problem.

Well, it's nicer if it's immediately parsed, before doing complex and
expensive stuff, especially if that happens outside of the transaction.


> > 
> > > diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
> > > index 65230e2..f3d54c8 100644
> > > --- a/src/backend/nodes/copyfuncs.c
> > > +++ b/src/backend/nodes/copyfuncs.c
> > 
> > I think you might be missing outfuncs support.
> > 
> 
> I thought that we don't do outfuncs for DDL?

I think it's just readfuncs that's skipped.


> > > +                Length of column name (including the NULL-termination
> > > +                character).
> > > +</para>
> > > +</listitem>
> > > +</varlistentry>
> > > +<varlistentry>
> > > +<term>
> > > +        String
> > > +</term>
> > > +<listitem>
> > > +<para>
> > > +                Name of the column.
> > > +</para>
> > > +</listitem>
> > > +</varlistentry>
> > 
> > Huh, no type information?
> > 
> 
> It's not necessary for text transfer; it will be if we ever add binary data
> transfer, but that would require a protocol version bump anyway.

I'm *hugely* unconvinced of this. For one, type information is useful for
error reporting and such as well. For another, it's one thing to add a
new protocol message (for differently encoded tuples), and something
entirely different to change the format of existing messages.


> > > +
> > > +/*
> > > + * COMMIT callback
> > > + */
> > > +static void
> > > +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> > > +                     XLogRecPtr commit_lsn)
> > > +{
> > > +    OutputPluginPrepareWrite(ctx, true);
> > > +    logicalrep_write_commit(ctx->out, txn, commit_lsn);
> > > +    OutputPluginWrite(ctx, true);
> > > +}
> > 
> > Hm, so we don't reset the context for these...
> > 
> 
> What?

We only use & reset the data-> memory context in the change
callback. I'm not sure that's good.



> > This however I'm not following. Why do we need multiple copies of this?
> > And why aren't we doing the assignments in _PG_init?  Seems better to
> > just allocate one WalRcvCalllbacks globally and assign all these as
> > constants.  Then the establishment function can just return all these
> > (as part of a bigger struct).
> > 
> 
> Meh, if I understand you correctly, that will make the access a bit more
> ugly (multiple layers of structs).

On the other hand, you right now need to access one struct, and pass the
other...



> > This does rather reinforce my opinion that the _PG_init removal in
> > libpqwalreceiver isn't useful.
> 
> I don't see how it helps, you said we'd still return struct from some
> interface so this would be more or less the same?

Or we just set some global vars and use them directly.


Andres



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 14/09/16 18:21, Andres Freund wrote:
> (continuing, uh, a bit happier)
>
> On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote:
>
>> +/*
>> + * Relcache invalidation callback for our relation map cache.
>> + */
>> +static void
>> +logicalreprelmap_invalidate_cb(Datum arg, Oid reloid)
>> +{
>> +    LogicalRepRelMapEntry  *entry;
>> +
>> +    /* Just to be sure. */
>> +    if (LogicalRepRelMap == NULL)
>> +        return;
>> +
>> +    if (reloid != InvalidOid)
>> +    {
>> +        HASH_SEQ_STATUS status;
>> +
>> +        hash_seq_init(&status, LogicalRepRelMap);
>> +
>> +        /* TODO, use inverse lookup hastable? */
>
> *hashtable
>
>> +        while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
>> +        {
>> +            if (entry->reloid == reloid)
>> +                entry->reloid = InvalidOid;
>
> can't we break here?
>

Probably.

>
>> +/*
>> + * Initialize the relation map cache.
>> + */
>> +static void
>> +remoterelmap_init(void)
>> +{
>> +    HASHCTL        ctl;
>> +
>> +    /* Make sure we've initialized CacheMemoryContext. */
>> +    if (CacheMemoryContext == NULL)
>> +        CreateCacheMemoryContext();
>> +
>> +    /* Initialize the hash table. */
>> +    MemSet(&ctl, 0, sizeof(ctl));
>> +    ctl.keysize = sizeof(uint32);
>> +    ctl.entrysize = sizeof(LogicalRepRelMapEntry);
>> +    ctl.hcxt = CacheMemoryContext;
>
> Wonder if this (and similar code earlier) should try to do everything in
> a sub-context of CacheMemoryContext instead. That'd make some issues
> easier to track down.

Sure, I don't see why not.

>
>> +/*
>> + * Open the local relation associated with the remote one.
>> + */
>> +static LogicalRepRelMapEntry *
>> +logicalreprel_open(uint32 remoteid, LOCKMODE lockmode)
>> +{
>> +    LogicalRepRelMapEntry  *entry;
>> +    bool        found;
>> +
>> +    if (LogicalRepRelMap == NULL)
>> +        remoterelmap_init();
>> +
>> +    /* Search for existing entry. */
>> +    entry = hash_search(LogicalRepRelMap, (void *) &remoteid,
>> +                        HASH_FIND, &found);
>> +
>> +    if (!found)
>> +        elog(FATAL, "cache lookup failed for remote relation %u",
>> +             remoteid);
>> +
>> +    /* Need to update the local cache? */
>> +    if (!OidIsValid(entry->reloid))
>> +    {
>> +        Oid            nspid;
>> +        Oid            relid;
>> +        int            i;
>> +        TupleDesc    desc;
>> +        LogicalRepRelation *remoterel;
>> +
>> +        remoterel = &entry->remoterel;
>> +
>> +        nspid = LookupExplicitNamespace(remoterel->nspname, false);
>> +        if (!OidIsValid(nspid))
>> +            ereport(FATAL,
>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +                     errmsg("the logical replication target %s not found",
>> +                            quote_qualified_identifier(remoterel->nspname,
>                                                        remoterel->relname))));
>> +        relid = get_relname_relid(remoterel->relname, nspid);
>> +        if (!OidIsValid(relid))
>> +            ereport(FATAL,
>> +                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +                     errmsg("the logical replication target %s not found",
>> +                            quote_qualified_identifier(remoterel->nspname,
>> +                                                       remoterel->relname))));
>> +
>> +        entry->rel = heap_open(relid, lockmode);
>
> This seems rather racy. I think this really instead needs something akin
> to RangeVarGetRelidExtended().

Maybe, I am not sure if it really matters here given how it's used, but 
I can change that.

>
>> +/*
>> + * Executor state preparation for evaluation of constraint expressions,
>> + * indexes and triggers.
>> + *
>> + * This is based on similar code in copy.c
>> + */
>> +static EState *
>> +create_estate_for_relation(LogicalRepRelMapEntry *rel)
>> +{
>> +    EState       *estate;
>> +    ResultRelInfo *resultRelInfo;
>> +    RangeTblEntry *rte;
>> +
>> +    estate = CreateExecutorState();
>> +
>> +    rte = makeNode(RangeTblEntry);
>> +    rte->rtekind = RTE_RELATION;
>> +    rte->relid = RelationGetRelid(rel->rel);
>> +    rte->relkind = rel->rel->rd_rel->relkind;
>> +    estate->es_range_table = list_make1(rte);
>> +
>> +    resultRelInfo = makeNode(ResultRelInfo);
>> +    InitResultRelInfo(resultRelInfo, rel->rel, 1, 0);
>> +
>> +    estate->es_result_relations = resultRelInfo;
>> +    estate->es_num_result_relations = 1;
>> +    estate->es_result_relation_info = resultRelInfo;
>> +
>> +    /* Triggers might need a slot */
>> +    if (resultRelInfo->ri_TrigDesc)
>> +        estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);
>> +
>> +    return estate;
>> +}
>
> Ugh, we do this for every single change? That's pretty darn heavy.
>

I plan to add caching but haven't come up with a good way of doing that yet.

>
>> +/*
>> + * Check if the local attribute is present in relation definition used
>> + * by upstream and hence updated by the replication.
>> + */
>> +static bool
>> +physatt_in_attmap(LogicalRepRelMapEntry *rel, int attid)
>> +{
>> +    AttrNumber    i;
>> +
>> +    /* Fast path for tables that are same on upstream and downstream. */
>> +    if (attid < rel->remoterel.natts && rel->attmap[attid] == attid)
>> +        return true;
>> +
>> +    /* Try to find the attribute in the map. */
>> +    for (i = 0; i < rel->remoterel.natts; i++)
>> +        if (rel->attmap[i] == attid)
>> +            return true;
>> +
>> +    return false;
>> +}
>
> Shouldn't we rather try to keep an attribute map that always can map
> remote attribute numbers to local ones? That doesn't seem hard on a
> first blush? But I might be missing something here.
>




>
>> +static void
>> +FillSlotDefaults(LogicalRepRelMapEntry *rel, EState *estate,
>> +                 TupleTableSlot *slot)
>> +{
>
> Why is this using a different naming scheme?
>

Because I originally wanted to put it into the executor.

>> +/*
>> + * Handle INSERT message.
>> + */
>> +static void
>> +handle_insert(StringInfo s)
>> +{
>> +    LogicalRepRelMapEntry *rel;
>> +    LogicalRepTupleData    newtup;
>> +    LogicalRepRelId        relid;
>> +    EState               *estate;
>> +    TupleTableSlot       *remoteslot;
>> +    MemoryContext        oldctx;
>> +
>> +    ensure_transaction();
>> +
>> +    relid = logicalrep_read_insert(s, &newtup);
>> +    rel = logicalreprel_open(relid, RowExclusiveLock);
>> +
>> +    /* Initialize the executor state. */
>> +    estate = create_estate_for_relation(rel);
>> +    remoteslot = ExecInitExtraTupleSlot(estate);
>> +    ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel));
>
> This seems incredibly expensive for replicating a lot of rows.

You mean because of create_estate_for_relation()?

>
>> +/*
>> + * Search the relation 'rel' for tuple using the replication index.
>> + *
>> + * If a matching tuple is found lock it with lockmode, fill the slot with its
>> + * contents and return true, return false is returned otherwise.
>> + */
>> +static bool
>> +tuple_find_by_replidx(Relation rel, LockTupleMode lockmode,
>> +                      TupleTableSlot *searchslot, TupleTableSlot *slot)
>> +{
>> +    HeapTuple        scantuple;
>> +    ScanKeyData        skey[INDEX_MAX_KEYS];
>> +    IndexScanDesc    scan;
>> +    SnapshotData    snap;
>> +    TransactionId    xwait;
>> +    Oid                idxoid;
>> +    Relation        idxrel;
>> +    bool            found;
>> +
>> +    /* Open REPLICA IDENTITY index.*/
>> +    idxoid = RelationGetReplicaIndex(rel);
>> +    if (!OidIsValid(idxoid))
>> +    {
>> +        elog(ERROR, "could not find configured replica identity for table \"%s\"",
>> +             RelationGetRelationName(rel));
>> +        return false;
>> +    }
>> +    idxrel = index_open(idxoid, RowExclusiveLock);
>> +
>> +    /* Start an index scan. */
>> +    InitDirtySnapshot(snap);
>> +    scan = index_beginscan(rel, idxrel, &snap,
>> +                           RelationGetNumberOfAttributes(idxrel),
>> +                           0);
>> +
>> +    /* Build scan key. */
>> +    build_replindex_scan_key(skey, rel, idxrel, searchslot);
>> +
>> +retry:
>> +    found = false;
>> +
>> +    index_rescan(scan, skey, RelationGetNumberOfAttributes(idxrel), NULL, 0);
>> +
>> +    /* Try to find the tuple */
>> +    if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL)
>> +    {
>> +        found = true;
>> +        ExecStoreTuple(scantuple, slot, InvalidBuffer, false);
>> +        ExecMaterializeSlot(slot);
>> +
>> +        xwait = TransactionIdIsValid(snap.xmin) ?
>> +            snap.xmin : snap.xmax;
>> +
>> +        /*
>> +         * If the tuple is locked, wait for locking transaction to finish
>> +         * and retry.
>> +         */
>> +        if (TransactionIdIsValid(xwait))
>> +        {
>> +            XactLockTableWait(xwait, NULL, NULL, XLTW_None);
>> +            goto retry;
>> +        }
>> +    }
>
> Hm. So we potentially find multiple tuples here, and lock all of
> them. but then only use one for the update.
>

That's not how that code reads for me.

>
>> +static List *
>> +get_subscription_list(void)
>> +{
>> +    List       *res = NIL;
>> +    Relation    rel;
>> +    HeapScanDesc scan;
>> +    HeapTuple    tup;
>> +    MemoryContext resultcxt;
>> +
>> +    /* This is the context that we will allocate our output data in */
>> +    resultcxt = CurrentMemoryContext;
>> +
>> +    /*
>> +     * Start a transaction so we can access pg_database, and get a snapshot.
>> +     * We don't have a use for the snapshot itself, but we're interested in
>> +     * the secondary effect that it sets RecentGlobalXmin.  (This is critical
>> +     * for anything that reads heap pages, because HOT may decide to prune
>> +     * them even if the process doesn't attempt to modify any tuples.)
>> +     */
>
>> +    StartTransactionCommand();
>> +    (void) GetTransactionSnapshot();
>> +
>> +    rel = heap_open(SubscriptionRelationId, AccessShareLock);
>> +    scan = heap_beginscan_catalog(rel, 0, NULL);
>> +
>> +    while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
>> +    {
>> +        Form_pg_subscription subform = (Form_pg_subscription) GETSTRUCT(tup);
>> +        Subscription   *sub;
>> +        MemoryContext    oldcxt;
>> +
>> +        /*
>> +         * Allocate our results in the caller's context, not the
>> +         * transaction's. We do this inside the loop, and restore the original
>> +         * context at the end, so that leaky things like heap_getnext() are
>> +         * not called in a potentially long-lived context.
>> +         */
>> +        oldcxt = MemoryContextSwitchTo(resultcxt);
>> +
>> +        sub = (Subscription *) palloc(sizeof(Subscription));
>> +        sub->oid = HeapTupleGetOid(tup);
>> +        sub->dbid = subform->subdbid;
>> +        sub->enabled = subform->subenabled;
>> +
>> +        /* We don't fill fields we are not intereste in. */
>> +        sub->name = NULL;
>> +        sub->conninfo = NULL;
>> +        sub->slotname = NULL;
>> +        sub->publications = NIL;
>> +
>> +        res = lappend(res, sub);
>> +        MemoryContextSwitchTo(oldcxt);
>> +    }
>> +
>> +    heap_endscan(scan);
>> +    heap_close(rel, AccessShareLock);
>> +
>> +    CommitTransactionCommand();
>
> Hm. this doesn't seem quite right from a locking pov. What if, in the
> middle of this, a new subscription is created?
>

Then it will be picked up eventually, in the next iteration of the main
loop. We don't need a perfectly stable world view here, just a snapshot
of it to work with.

>
> Hadn't I previously read about always streaming data to disk first?
>
>> @@ -0,0 +1,674 @@
>> +/*-------------------------------------------------------------------------
>> + * tablesync.c
>> + *       PostgreSQL logical replication
>> + *
>> + * Copyright (c) 2012-2016, PostgreSQL Global Development Group
>> + *
>> + * IDENTIFICATION
>> + *      src/backend/replication/logical/tablesync.c
>> + *
>> + * NOTES
>> + *      This file contains code for initial table data synchronization for
>> + *      logical replication.
>> + *
>> + *    The initial data synchronization is done separately for each table,
>> + *    in separate apply worker that only fetches the initial snapshot data
>> + *    from the provider and then synchronizes the position in stream with
>> + *    the main apply worker.
>
> Why? I guess that's because it allows to incrementally add tables, with
> acceptable overhead.
>

Yes, I need to document the whys more here. It enables us to copy multiple
tables in parallel (in the future). It is also needed for adding tables
after the initial sync, as you say.

>
>> + *    The stream position synchronization works in multiple steps.
>> + *     - sync finishes copy and sets table state as SYNCWAIT and waits
>> + *       for state to change in a loop
>> + *     - apply periodically checks unsynced tables for SYNCWAIT, when it
>> + *       appears it will compare its position in the stream with the
>> + *       SYNCWAIT position and decides to either set it to CATCHUP when
>> + *       the apply was infront (and wait for the sync to do the catchup),
>> + *       or set the state to SYNCDONE if the sync was infront or in case
>> + *       both sync and apply are at the same position it will set it to
>> + *       READY and stops tracking it
>
> I'm not quite following here.
>

It's hard for me to explain, I guess; that's why the flow diagram is
underneath. The point is to reach the same LSN for the table before the main
apply process can take over the replication of that table. There are two
possible scenarios:
a) either apply has replayed more of the stream than sync did, in which case
sync needs to ask apply to wait for it a bit (which blocks
replication for a short while),
b) or sync has replayed more of the stream than apply, in which case apply
needs to track the table for a while (and not apply changes to it)
until it reaches the position where sync stopped; once it
reaches that point it can apply changes to the table the same as to any old table.

>> + *     - if the state was set to CATCHUP sync will read the stream and
>> + *       apply changes until it catches up to the specified stream
>> + *       position and then sets state to READY and signals apply that it
>> + *       can stop waiting and exits, if the state was set to something
>> + *       else than CATCHUP the sync process will simply end
>> + *     - if the state was set to SYNCDONE by apply, the apply will
>> + *       continue tracking the table until it reaches the SYNCDONE stream
>> + *       position at which point it sets state to READY and stops tracking
>> + *
>> + *    Example flows look like this:
>> + *     - Apply is infront:
>> + *          sync:8   -> set SYNCWAIT
>> + *        apply:10 -> set CATCHUP
>> + *        sync:10  -> set ready
>> + *          exit
>> + *        apply:10
>> + *          stop tracking
>> + *          continue rep
>> + *    - Sync infront:
>> + *        sync:10
>> + *          set SYNCWAIT
>> + *        apply:8
>> + *          set SYNCDONE
>> + *        sync:10
>> + *          exit
>> + *        apply:10
>> + *          set READY
>> + *          stop tracking
>> + *          continue rep
>
> This definitely needs to be expanded a bit. Where are we tracking how
> far replication has progressed on individual tables? Are we creating new
> slots for syncing? Is there any parallelism in syncing?
>

Yes, new slots; tracking is in pg_subscription_rel. Parallelism is not
there yet, but the design is ready to be expanded (I currently
artificially limit the number of sync workers to one to limit potential
bugs, but afaik it could just be bumped higher and it should work).

>> +/*
>> + * Exit routine for synchronization worker.
>> + */
>> +static void
>> +finish_sync_worker(char *slotname)
>> +{
>> +    LogicalRepWorker   *worker;
>> +    RepOriginId            originid;
>> +    MemoryContext        oldctx = CurrentMemoryContext;
>> +
>> +    /*
>> +     * Drop the replication slot on remote server.
>> +     * We want to continue even in the case that the slot on remote side
>> +     * is already gone. This means that we can leave slot on the remote
>> +     * side but that can happen for other reasons as well so we can't
>> +     * really protect against that.
>> +     */
>> +    PG_TRY();
>> +    {
>> +        wrcapi->drop_slot(wrchandle, slotname);
>> +    }
>> +    PG_CATCH();
>> +    {
>> +        MemoryContext    ectx;
>> +        ErrorData       *edata;
>> +
>> +        ectx = MemoryContextSwitchTo(oldctx);
>> +        /* Save error info */
>> +        edata = CopyErrorData();
>> +        MemoryContextSwitchTo(ectx);
>> +        FlushErrorState();
>> +
>> +        ereport(WARNING,
>> +                (errmsg("there was problem dropping the replication slot "
>> +                        "\"%s\" on provider", slotname),
>> +                 errdetail("The error was: %s", edata->message),
>> +                 errhint("You may have to drop it manually")));
>> +        FreeErrorData(edata);
>
> ISTM we really should rather return success/failure here, and not throw
> an error inside the libpqwalreceiver stuff.  I kind of wonder if we
> actually can get rid of this indirection.
>

Yeah, I can do success/failure. I'm not sure what you mean by indirection.

>> +         * to ensure that we are not behind it (it's going to wait at this
>> +         * point for the change of state). Once we are infront or at the same
>> +         * position as the synchronization proccess we can signal it to
>> +         * finish the catchup.
>> +         */
>> +        if (tstate->state == SUBREL_STATE_SYNCWAIT)
>> +        {
>> +            if (end_lsn > tstate->lsn)
>> +            {
>> +                /*
>> +                 * Apply is infront, tell sync to catchup. and wait until
>> +                 * it does.
>> +                 */
>> +                tstate->state = SUBREL_STATE_CATCHUP;
>> +                tstate->lsn = end_lsn;
>> +                StartTransactionCommand();
>> +                SetSubscriptionRelState(MyLogicalRepWorker->subid,
>> +                                        tstate->relid, tstate->state,
>> +                                        tstate->lsn);
>> +                CommitTransactionCommand();
>> +
>> +                /* Signal the worker as it may be waiting for us. */
>> +                LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>> +                worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
>> +                                                tstate->relid);
>> +                if (worker && worker->proc)
>> +                    SetLatch(&worker->proc->procLatch);
>> +                LWLockRelease(LogicalRepWorkerLock);
>
> Different parts of this file use different lock level to set the
> latch. Why?
>

The latch itself does not need the lock, so I'm not really following
what you mean. The lock here is for the benefit of logicalrep_worker_find.

>
>> +                if (wait_for_sync_status_change(tstate))
>> +                    Assert(tstate->state == SUBREL_STATE_READY);
>> +            }
>> +            else
>> +            {
>> +                /*
>> +                 * Apply is either behind in which case sync worker is done
>> +                 * but apply needs to keep tracking the table until it
>> +                 * catches up to where sync finished.
>> +                 * Or apply and sync are at the same position in which case
>> +                 * table can be switched to standard replication mode
>> +                 * immediately.
>> +                 */
>> +                if (end_lsn < tstate->lsn)
>> +                    tstate->state = SUBREL_STATE_SYNCDONE;
>> +                else
>> +                    tstate->state = SUBREL_STATE_READY;
>> +
>
> What I'm failing to understand is how this can be done under
> concurrency. You probably thought about this, but it should really be
> explained somewhere.

Well, so, if the original state was syncdone (the previous branch), the 
apply won't actually do any work until the state changes (and it can 
only change to either syncdone or ready at that point), so there is no 
real concurrency. If we reach this branch then either the sync worker 
has already exited (if it set the state to syncdone) or it is not doing 
anything and is waiting for apply to set the state to ready, in which 
case there is also no concurrency.
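To make the handoff described above easier to follow, here is a small hypothetical model of the decision the apply worker makes (this is an illustrative sketch, not the patch's C code; the state names mirror the SUBREL_STATE_* constants):

```python
# Hypothetical model of the apply/sync handoff described above.
# The apply worker compares the LSN it has applied up to (end_lsn)
# with the LSN the sync worker reported for the table (tstate_lsn).

SYNCWAIT, CATCHUP, SYNCDONE, READY = "syncwait", "catchup", "syncdone", "ready"

def apply_handoff(state, end_lsn, tstate_lsn):
    """Return the next table state as seen from the apply worker."""
    if state != SYNCWAIT:
        return state          # no handoff pending for this table
    if end_lsn > tstate_lsn:
        return CATCHUP        # apply is ahead: tell sync to catch up
    if end_lsn < tstate_lsn:
        return SYNCDONE       # sync finished ahead: apply keeps tracking
    return READY              # same position: switch to normal replication

print(apply_handoff(SYNCWAIT, 200, 100))  # catchup
print(apply_handoff(SYNCWAIT, 100, 200))  # syncdone
print(apply_handoff(SYNCWAIT, 150, 150))  # ready
```

In each branch only one of the two workers is active, which is why no further locking is needed for the state itself.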

>> +        /*
>> +         * In case table is supposed to be synchronizing but the
>> +         * synchronization worker is not running, start it.
>> +         * Limit the number of launched workers here to one (for now).
>> +         */
>
> Hm. That seems problematic for online upgrade type cases, we might never
> be catch up that way...
>

You mean the limit of 1? That's just because I didn't get to creating a 
GUC for configuring this yet.

>
>
>> +                /*
>> +                 * We want to do the table data sync in single
>> +                 * transaction so do not close the transaction opened
>> +                 * above.
>> +                 * There will be no BEGIN or COMMIT messages coming via
>> +                 * logical replication while the copy table command is
>> +                 * running so start the transaction here.
>> +                 * Note the memory context for data handling will still
>> +                 * be done using ensure_transaction called by the insert
>> +                 * handler.
>> +                 */
>> +                StartTransactionCommand();
>> +
>> +                /*
>> +                 * Don't allow parallel access other than SELECT while
>> +                 * the initial contents are being copied.
>> +                 */
>> +                rel = heap_open(tstate.relid, ExclusiveLock);
>
> Why do we want to allow access at all?
>

I didn't see a reason not to allow SELECTs.

>
>
>> @@ -87,6 +92,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>>      cb->commit_cb = pgoutput_commit_txn;
>>      cb->filter_by_origin_cb = pgoutput_origin_filter;
>>      cb->shutdown_cb = pgoutput_shutdown;
>> +    cb->tuple_cb = pgoutput_tuple;
>> +    cb->list_tables_cb = pgoutput_list_tables;
>>  }
>
> What are these new, and undocumented callbacks actually doing? And why
> is this integrated into logical decoding?
>

As I said in the initial email, I am not very happy with this design, 
and that's still true, because these callbacks don't belong in decoding.

>
>>  /*
>> + * Handle LIST_TABLES command.
>> + */
>> +static void
>> +SendTableList(ListTablesCmd *cmd)
>> +{
>
> Ugh.
>
>
> I really dislike this kind of command. I think we should instead change
> things around, allowing to issue normal SQL via the replication
> command. We'll have to error out for running sql for non-database
> connected replication connections, but that seems fine.
>

Note that per offline discussion we agreed to do this over a normal 
connection for now.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 14/09/16 21:53, Andres Freund wrote:
> Hi,
>
> On 2016-09-14 21:17:42 +0200, Petr Jelinek wrote:
>>>> +/*
>>>> + * Gather Relations based o provided by RangeVar list.
>>>> + * The gathered tables are locked in access share lock mode.
>>>> + */
>>>
>>> Why access share? Shouldn't we make this ShareUpdateExclusive or
>>> similar, to prevent schema changes?
>>>
>>
>> Hm, I thought AccessShare would be enough to prevent schema changes that
>> matter to us (which is basically just drop afaik).
>
> Doesn't e.g. dropping an index matter as well?
>

Dropping the primary key matters, I guess.

>
>>>> +                    if (strcmp($1, "replicate_insert") == 0)
>>>> +                        $$ = makeDefElem("replicate_insert",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_insert") == 0)
>>>> +                        $$ = makeDefElem("replicate_insert",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else if (strcmp($1, "replicate_update") == 0)
>>>> +                        $$ = makeDefElem("replicate_update",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_update") == 0)
>>>> +                        $$ = makeDefElem("replicate_update",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else if (strcmp($1, "replicate_delete") == 0)
>>>> +                        $$ = makeDefElem("replicate_delete",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_delete") == 0)
>>>> +                        $$ = makeDefElem("replicate_delete",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else
>>>> +                        ereport(ERROR,
>>>> +                                (errcode(ERRCODE_SYNTAX_ERROR),
>>>> +                                 errmsg("unrecognized publication option \"%s\"", $1),
>>>> +                                     parser_errposition(@1)));
>>>> +                }
>>>> +        ;
>>>
>>> I'm kind of inclined to do this checking at execution (or transform)
>>> time instead.  That allows extension to add options, and handle them in
>>> utility hooks.
>>>
>>
>> That's an interesting point, I prefer the parsing to be done in gram.y, but it
>> might be worth moving it for extensibility. Although there are so far other
>> barriers for that.
>
> Citus uses the lack of such check for COPY to implement copy over it's
> distributed tables for example. So there's some benefit.
>

Yeah, I am not saying that I am fundamentally against it, I am just 
saying it probably won't help all that much.

>
>
>>>> +    check_subscription_permissions();
>>>> +
>>>> +    rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
>>>> +
>>>> +    /* Check if name is used */
>>>> +    subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId,
>>>> +                            CStringGetDatum(stmt->subname));
>>>> +    if (OidIsValid(subid))
>>>> +    {
>>>> +        ereport(ERROR,
>>>> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
>>>> +                 errmsg("subscription \"%s\" already exists",
>>>> +                        stmt->subname)));
>>>> +    }
>>>> +
>>>> +    /* Parse and check options. */
>>>> +    parse_subscription_options(stmt->options, &enabled_given, &enabled,
>>>> +                               &conninfo, &publications);
>>>> +
>>>> +    /* TODO: improve error messages here. */
>>>> +    if (conninfo == NULL)
>>>> +        ereport(ERROR,
>>>> +                (errcode(ERRCODE_SYNTAX_ERROR),
>>>> +                 errmsg("connection not specified")));
>>>
>>> Probably also makes sense to parse the conninfo here to verify it looks
>>> saen.  Although that's fairly annoying to do, because the relevant code
>>> is libpq :(
>>>
>>
>> Well the connection is eventually used (in later patches) so maybe that's
>> not problem.
>
> Well, it's nicer if it's immediately parsed, before doing complex and
> expensive stuff, especially if that happens outside of the transaction.
>

Maybe, it's not too hard to add another function to libpqwalreceiver I 
guess.

>
>>>
>>>> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
>>>> index 65230e2..f3d54c8 100644
>>>> --- a/src/backend/nodes/copyfuncs.c
>>>> +++ b/src/backend/nodes/copyfuncs.c
>>>
>>> I think you might be missing outfuncs support.
>>>
>>
>> I thought that we don't do outfuncs for DDL?
>
> I think it's just readfuncs that's skipped.
>

I see only a couple of odd DDL commands in outfuncs.c.

>
>>>> +                Length of column name (including the NULL-termination
>>>> +                character).
>>>> +</para>
>>>> +</listitem>
>>>> +</varlistentry>
>>>> +<varlistentry>
>>>> +<term>
>>>> +        String
>>>> +</term>
>>>> +<listitem>
>>>> +<para>
>>>> +                Name of the column.
>>>> +</para>
>>>> +</listitem>
>>>> +</varlistentry>
>>>
>>> Huh, no type information?
>>>
>>
>> It's not necessary for the text transfer, it will be if we ever add binary
>> data transfer but that will require protocol version bump anyway.
>
> I'm *hugely* unconvinced of this. For one type information is useful for
> error reporting and such as well. For another, it's one thing to add a
> new protocol message (for differently encoded tuples), and something
> entirely different to change the format of existing messages.
>

Well, it's one if on the write side and one if on the read side in this 
case, but I can add it, it's a rather simple change. One thing that we 
need to clarify is how we actually send the type info; I think for 
built-in types the Oid should be enough, but for all other ones we need 
the qualified name of the type IMHO.

>
>>>> +
>>>> +/*
>>>> + * COMMIT callback
>>>> + */
>>>> +static void
>>>> +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>>>> +                     XLogRecPtr commit_lsn)
>>>> +{
>>>> +    OutputPluginPrepareWrite(ctx, true);
>>>> +    logicalrep_write_commit(ctx->out, txn, commit_lsn);
>>>> +    OutputPluginWrite(ctx, true);
>>>> +}
>>>
>>> Hm, so we don't reset the context for these...
>>>
>>
>> What?
>
> We only use & reset the data-> memory context in the change
> callback. I'm not sure that's good.
>

Well we don't do anything with the data memory context here.

>
>
>>> This however I'm not following. Why do we need multiple copies of this?
>>> And why aren't we doing the assignments in _PG_init?  Seems better to
>>> just allocate one WalRcvCalllbacks globally and assign all these as
>>> constants.  Then the establishment function can just return all these
>>> (as part of a bigger struct).
>>>
>>
Meh, if I understand you correctly that will make the access a bit more 
ugly (multiple layers of structs).
>
> On the other hand, you right now need to access one struct, and pass the
> other...
>

Point taken.

>
>
>>> This does rather reinforce my opinion that the _PG_init removal in
>>> libpqwalreceiver isn't useful.
>>
>> I don't see how it helps, you said we'd still return struct from some
>> interface so this would be more or less the same?
>
> Or we just set some global vars and use them directly.
>

I really hate the "global vars filled by external library when loaded" 
design pattern; it's how it was done before, but it's ugly, especially 
when you share the library between multiple C modules later.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Craig Ringer
Date:
On 14 September 2016 at 04:56, Petr Jelinek <petr@2ndquadrant.com> wrote:

> Not sure what you mean by negotiation. Why would that be needed? You know
> server version when you connect and when you know that you also know what
> capabilities that version of Postgres has. If you send unrecognized option
> you get corresponding error.

Right, because we can rely on the server version = the logical
replication version now.

All good.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Steve Singer
Date:
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> - the CREATE SUBSCRIPTION also tries to check if the specified 
> connection connects back to same db (although that check is somewhat 
> imperfect) and if it gets stuck on create slot it should be normally 
> cancelable (that should solve the issue Steve Singer had) 



When I create my subscriber database by doing a physical backup of the 
publisher cluster (with cp before I add any data) then I am unable to 
subscribe.
ie
initdb ../data
cp -r ../data ../data2
./postgres -D ../data
./postgres -D ../data2


This makes sense when I look at your code, but it might not be what we want.

I had the same issue when I created my subscriber cluster with 
pg_basebackup (the timeline on the destination cluster still shows as 1).






Re: Logical Replication WIP

From
Steve Singer
Date:
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> Hi,
>
> Updated version, this should address most of the things in Peter's 
> reviews so far, not all though as some of it needs more discussion.
>

Another bug report.

I had subscribed a subscriber database to a publication with 1 table
create table a (a serial4 primary key, b text);

* I then dropped column b on the subscriber
* inserted some rows on the publisher
* Noticed the expected error about column b not existing in the 
subscriber log
* Added column c on the subscriber, then added column b after column C

I now get the following stack trace

#1  0x00000000007dc8f9 in cstring_to_text (s=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>) at varlena.c:152
#2  0x00000000008046a3 in InputFunctionCall (flinfo=flinfo@entry=0x7fffa02d0250, str=str@entry=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>, typioparam=typioparam@entry=25, typmod=typmod@entry=-1) at fmgr.c:1909
#3  0x0000000000804971 in OidInputFunctionCall (functionId=<optimized out>, str=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>, typioparam=25, typmod=-1) at fmgr.c:2040
#4  0x00000000006aa485 in SlotStoreCStrings (slot=0x2748670, values=0x7fffa02d0330) at apply.c:569
#5  0x00000000006ab45c in handle_insert (s=0x274d088) at apply.c:756
#6  0x00000000006abcea in handle_message (s=0x7fffa02d3e20) at apply.c:978
#7  LogicalRepApplyLoop (last_received=117457680) at apply.c:1146
#8  0x00000000006ac37e in ApplyWorkerMain (main_arg=<optimized out>) at apply.c:1530


In SlotStoreCStrings values only has 2 elements but natts is 4



> Changes:
> - I moved the publication.c to pg_publication.c, subscription.c to 
> pg_subscription.c.
> - changed \drp and \drs to \dRp and \dRs
> - fixed definitions of the catalogs (BKI_ROWTYPE_OID)
> - changed some GetPublication calls to get_publication_name
> - fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
> - fixed get_object_address_publication_rel
> - fixed the dependencies between pkeys and publications, for this I 
> actually had to add new interface to depenency.c that allows dropping 
> single dependency
> - fixed the 'for all tables' and 'for tables all in schema' publications
> - changed the alter publication from FOR to SET
> - added more test cases for the publication DDL
> - fixed compilation of subscription patch alone and docs
> - changed subpublications to name[]
> - added check for publication list duplicates
> - made the subscriptions behave more like they are inside the database 
> instead of shared catalog (even though the catalog is still shared)
> - added options for for CREATE SUBSCRIPTION to optionally not create 
> slot and not do the initial data sync - that should solve the 
> complaint about CREATE SUBSCRIPTION always connecting
> - the CREATE SUBSCRIPTION also tries to check if the specified 
> connection connects back to same db (although that check is somewhat 
> imperfect) and if it gets stuck on create slot it should be normally 
> cancelable (that should solve the issue Steve Singer had)
> - fixed the tests to work in any timezone
> - added DDL regress tests for subscription
> - added proper detection of missing schemas and tables on subscriber
> - rebased on top of 19acee8 as the DefElem changes broke the patch
>
> The table sync is still far from ready.
>
>
>




Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 9/18/16 4:17 PM, Steve Singer wrote:
> When I create my subscriber database by doing a physical backup of the 
> publisher cluster (with cp before I add any data) then I am unable to 
> connect subscribe.
> ie
> initdb ../data
> cp -r ../data ../data2
> ./postgres -D ../data
> ./postgres -D ../data2
> 
> This make sense when I look at your code, but it might not be what we want

I think if we want to prevent the creation of subscriptions that point
to self, then we need to create a magic token when the postmaster starts
and check for that when we connect.  So more of a running-instance
identifier instead of an instance-on-disk identifier.

The other option is that we just allow it and make it more robust.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Steve Singer
Date:
On Tue, 20 Sep 2016, Peter Eisentraut wrote:

> On 9/18/16 4:17 PM, Steve Singer wrote:

>
> I think if we want to prevent the creation of subscriptions that point
> to self, then we need to create a magic token when the postmaster starts
> and check for that when we connect.  So more of a running-instance
> identifier instead of a instance-on-disk identifier.
>
> The other option is that we just allow it and make it more robust.

I think we should go with the second option for now. I feel that the effort 
is better spent making sure that initial syncs that don't subscribe 
(for whatever reason) can be aborted, instead of trying to build a concept of 
node identity before we really need it.

Steve

>
> -- 
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>




Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 21/09/16 05:35, Steve Singer wrote:
> On Tue, 20 Sep 2016, Peter Eisentraut wrote:
> 
>> On 9/18/16 4:17 PM, Steve Singer wrote:
> 
>>
>> I think if we want to prevent the creation of subscriptions that point
>> to self, then we need to create a magic token when the postmaster starts
>> and check for that when we connect.  So more of a running-instance
>> identifier instead of a instance-on-disk identifier.
>>
>> The other option is that we just allow it and make it more robust.
> 
> I think we should go with the second option for now. I feel that the
> effort is better spent making sure that initial syncs that don't
> subscribe (for whatever reason) can be aborted instead of trying to
> build a concept of node identity before we really need it.
> 

Well, connecting to yourself will always hang though, because the slot
creation needs a snapshot and it will wait forever for the current query
to finish. So it will never really work. The hanging query is now
abortable though.

Question is if doing the logical snapshot is really required since we
don't really use the snapshot for anything here.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Some partial notes on 0005-Add-logical-replication-workers.patch:

Documentation still says that TRUNCATE is supported.

In catalogs.sgml for pg_subscription column subpublications I'd add a
note that those are publications that live on the remote server.
Otherwise one might think by mistake that it references pg_publication.

The changes in reference.sgml should go into an earlier patch.

Document that table and column names are matched by name.  (This seems
obvious, but it's not explained anywhere, AFAICT.)

Document to what extent other relation types are supported (e.g.,
materialized views as source, view or foreign table or temp table as
target).  Suggest an updatable view as target if user wants to have
different table names or write into a different table structure.

subscriptioncmds.c: In CreateSubscription(), the
CommandCounterIncrement() call is apparently not needed.

subscriptioncmds.c: Duplicative code for libpqwalreceiver loading and
init, should be refactored.

subscriptioncmds.c: Perhaps combine logicalrep_worker_find() and
logicalrep_worker_stop() into one call that also encapsulates the
required locking.

001_rep_changes.pl: The TAP protocol does not allow direct printing to
stdout.  (It needs to be prefixed with # or with spaces or something; I
forget.)  In this case, the print calls can just be removed, because the
following is() calls in each case will print the failing value anyway.

In get_subscription_list(), the memory context pointers don't appear to
do anything useful, because everything ends up being CurrentMemoryContext.

pg_stat_get_subscription(NULL) for "all" seems a bit of a weird interface.

pglogical_apply_main not used, should be removed.

In logicalreprel_open(), the error message "cache lookup failed for
remote relation %u" could be clarified.  This message could probably
happen if the protocol did not send a Relation message first.  (The term
"cache" is perhaps inappropriate for LogicalRepRelMap, because it
implies that the value can be gotten from elsewhere if it's not in the
cache.  In this case it's really session state that cannot be recovered
easily.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 21/09/16 15:04, Peter Eisentraut wrote:
> Some partial notes on 0005-Add-logical-replication-workers.patch:
> 
> Document to what extent other relation types are supported (e.g.,
> materialized views as source, view or foreign table or temp table as
> target).  Suggest an updatable view as target if user wants to have
> different table names or write into a different table structure.
> 

I don't think that's a good suggestion; for one, it won't work for UPDATEs,
as we have a completely different path for finding the tuple to update
which only works on real data, not on a view. I am thinking of even just
allowing table-to-table replication in v1 tbh, but yes, it should be
documented what the target relation types can be.

> 
> subscriptioncmds.c: Perhaps combine logicalrep_worker_find() and
> logicalrep_worker_stop() into one call that also encapsulates the
> required locking.

I was actually thinking of moving the wait loop that waits for worker to
finish there as well.

> 
> In get_subscription_list(), the memory context pointers don't appear to
> do anything useful, because everything ends up being CurrentMemoryContext.
> 

That's kind of the point of the memory context pointers there though, as
we start a transaction inside that function.

> pg_stat_get_subscription(NULL) for "all" seems a bit of a weird interface.
> 

I modeled that after pg_stat_get_activity(), which seems to be a similar
type of interface.

> pglogical_apply_main not used, should be removed.
> 

Hah.

> In logicalreprel_open(), the error message "cache lookup failed for
> remote relation %u" could be clarified.  This message could probably
> happen if the protocol did not send a Relation message first.  (The term
> "cache" is perhaps inappropriate for LogicalRepRelMap, because it
> implies that the value can be gotten from elsewhere if it's not in the
> cache.  In this case it's really session state that cannot be recovered
> easily.)
> 

Yeah I have different code and error for that now.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 9/23/16 9:28 PM, Petr Jelinek wrote:
>> Document to what extent other relation types are supported (e.g.,
>> > materialized views as source, view or foreign table or temp table as
>> > target).  Suggest an updatable view as target if user wants to have
>> > different table names or write into a different table structure.
>> > 
> I don't think that's good suggestion, for one it won't work for UPDATEs
> as we have completely different path for finding the tuple to update
> which only works on real data, not on view. I am thinking of even just
> allowing table to table replication in v1 tbh, but yes it should be
> documented what target relation types can be.

I'll generalize this then to: Determine which relation types should be
supported at either end, document that, and then make sure it works that
way.  A restrictive implementation is OK for the first version, as long
as it keeps options open.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Logical Replication WIP

From
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 10:12 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/23/16 9:28 PM, Petr Jelinek wrote:
>>> Document to what extent other relation types are supported (e.g.,
>>> > materialized views as source, view or foreign table or temp table as
>>> > target).  Suggest an updatable view as target if user wants to have
>>> > different table names or write into a different table structure.
>>> >
>> I don't think that's good suggestion, for one it won't work for UPDATEs
>> as we have completely different path for finding the tuple to update
>> which only works on real data, not on view. I am thinking of even just
>> allowing table to table replication in v1 tbh, but yes it should be
>> documented what target relation types can be.
>
> I'll generalize this then to: Determine which relation types should be
> supported at either end, document that, and then make sure it works that
> way.  A restrictive implementation is OK for the first version, as long
> as it keeps options open.

The newest patch is 3-week old, so marking this entry as returned with feedback.
-- 
Michael



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

attached is an updated version of the patch.

There are quite a few improvements and some restructuring; I fixed all the
bugs and basically everything that came up from the reviews and was
agreed on. There are still a couple of things missing, i.e. the column type
definition in the protocol and some things related to the existing data copy.

The biggest changes are:

I added one more prerequisite patch (the first one) which adds ephemeral
slots (or, well, implements UI on top of the code that was mostly already
there). The ephemeral slots are different in that they go away either on
error or when the session is closed. This means the initial data sync does
not have to worry about cleaning up the slots after itself. I think this
will be useful in other places as well (for example basebackup). I
originally wanted to call them temporary slots in the UI but since the
behavior is a bit different from temp tables I decided to go with what the
underlying code calls them in the UI as well.

I also split out the libpqwalreceiver rewrite to separate patch which
does just the re-architecture and does not really add new functionality.
And I did the re-architecture bit differently based on the review.

There is now new executor module in execReplication.c, no new nodes but
several utility commands. I moved there the tuple lookup functions from
apply and also wrote new interfaces for doing inserts/updates/deletes to
a table including index updates and constraints checks and trigger
execution but without the need for the whole nodeModifyTable handling.

What I also did when rewriting this is implement the tuple lookup also
using a sequential scan, so that we can support replica identity full
properly. This greatly simplified the dependency handling between pkeys
and publications (by removing it completely ;) ). Also, when there is
replica identity full and the table has a primary key, the code will use
the primary key to look up the row even though it's not the replica
identity index, so that users who want to combine logical replication
with some other system that requires replica identity full (e.g.
auditing) still get a usable experience.
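The lookup preference described above can be summarized with a tiny hypothetical decision function (an illustrative sketch only, not the code in execReplication.c):

```python
# Hypothetical sketch of the row-lookup strategy described above:
# use the replica identity index when one is configured; with
# REPLICA IDENTITY FULL, prefer the primary key if the table has one,
# otherwise fall back to a sequential scan.
def choose_lookup(replica_identity, has_pkey):
    if replica_identity == "index":
        return "replica identity index"
    if replica_identity == "full":
        return "primary key" if has_pkey else "sequential scan"
    return "sequential scan"

print(choose_lookup("index", True))   # replica identity index
print(choose_lookup("full", True))    # primary key
print(choose_lookup("full", False))   # sequential scan
```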

The way copy is done was heavily reworked. For one, it uses the ephemeral
slots mentioned above. But more importantly, there are no new custom
commands anymore. Instead the walsender accepts some SQL; currently
allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is
implemented is probably not perfect and it could use a look from somebody
who knows bison better. How it works is that if the command sent to
walsender starts with one of the above mentioned keywords, the walsender
parser passes the whole query back and it's then passed to
exec_simple_query. The main reason why we need BEGIN is so that the COPY
can use the snapshot exported by the slot creation, so that there is a
synchronization point when there are concurrent writes. This probably
needs more discussion.
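The dispatch rule being described can be modeled as a one-liner (a hypothetical illustration of the routing decision only; the real implementation lives in the walsender's bison grammar):

```python
# Rough model of the dispatch described above: a command arriving on a
# walsender connection that starts with one of these SQL keywords is
# handed to the regular simple-query path (exec_simple_query); anything
# else goes through the replication-command parser.
SQL_PASSTHROUGH = ("BEGIN", "ROLLBACK", "SELECT", "COPY")

def route(command):
    first_word = command.lstrip().split(None, 1)[0].rstrip(";").upper()
    return "sql" if first_word in SQL_PASSTHROUGH else "replication"

print(route("COPY public.a TO STDOUT"))            # sql
print(route("CREATE_REPLICATION_SLOT s LOGICAL"))  # replication
```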

I also tried to keep the naming more consistent so cleaned up all
mentions of "provider" and changed them to "publisher" and also
publications don't mention that they "replicate", they just "publish"
now (that has effect on DDL syntax as well).


Some things that were discussed in the reviews that I didn't implement
knowingly include:

Removal of the Oid in the pg_publication_rel, that's mainly because it
would need significant changes to pg_dump which assumes everything
that's dumped has Oid and it's not something that seems worth it as part
of this patch.

I also didn't do the outfuncs; it's unclear to me what the rules are
there, as the only DDL statement present is CreateStmt atm.


There are still few TODOs:

Type info for columns. My current best idea is to write typeOid and
typemod in the relation message and add another message (type message)
that describes the type which will skip the built-in types (as we can't
really remap those without breaking a lot of software so they seem safe
to skip). I plan to do this soonish barring objections.

Removal of use of replication origin in the table sync worker.

Parallelization of the initial copy. And ability to resync (do new copy)
of a table. These two mainly wait for agreement over how the current way
of doing copy should work.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Steve Singer
Date:
On 10/24/2016 09:22 AM, Petr Jelinek wrote:
> Hi,
>
> attached is updated version of the patch.
>
> There are quite a few improvements and restructuring, I fixed all the
> bugs and basically everything that came up from the reviews and was
> agreed on. There are still couple of things missing, ie column type
> definition in protocol and some things related to existing data copy.

Here are a few things I've noticed so far.

+<programlisting>
+CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar 
user=repuser</quote> PUBLICATION mypub;
+</programlisting>
+  </para>
+  <para>

The documentation above doesn't match the syntax; CONNECTION needs to be
in single quotes, not double quotes. I think you want:
+<programlisting>
+CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar 
user=repuser' PUBLICATION mypub;
+</programlisting>
+  </para>
+  <para>



I am not sure if this is a known issue covered by your comments about 
data copy but I am still having issues with error reporting on a failed 
subscription.

I created a subscription, dropped the subscription and created a second 
one.  The second subscription isn't active but shows no errors.


P: create publication mypub for table public.a;
S: create subscription mysub with connection 'dbname=test host=localhost 
port=5440' publication mypub;
P:  insert into a(b) values ('t');
S: select * FROM a;
 a | b
---+---
 1 | t
(1 row)

Everything is good
Then I do
S: drop subscription mysub;

S: create subscription mysub2 with connection 'dbname=test 
host=localhost port=5440' publication mypub;
P:  insert into a(b) values ('f');
S: select * FROM a;
 a | b
---+---
 1 | t
The data doesn't replicate


select * FROM pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 16398 | mysub2  |     |       |              |                    |                       |                |
(1 row)


The only thing in my log is

2016-10-30 15:27:27.038 EDT [6028] NOTICE:  dropped replication slot 
"mysub" on publisher
2016-10-30 15:27:36.072 EDT [6028] NOTICE:  created replication slot 
"mysub2" on publisher
2016-10-30 15:27:36.082 EDT [6028] NOTICE:  synchronized table states


I'd expect an error in the log or something.
However, if I delete everything from the table on the subscriber, the
subscription proceeds.

I think there are still problems with signal handling in the initial sync

If I try to drop mysub2 (while the subscription is stuck, instead of
deleting the data), the drop hangs.
If I then try to kill the postmaster for the subscriber, nothing
happens; I have to send it a -9 to make it go away.

However, once I do that and then restart the postmaster for the
subscriber, I start to see the duplicate key errors in the log:

2016-10-30 16:00:54.635 EDT [7018] ERROR:  duplicate key value violates 
unique constraint "a_pkey"
2016-10-30 16:00:54.635 EDT [7018] DETAIL:  Key (a)=(1) already exists.
2016-10-30 16:00:54.635 EDT [7018] CONTEXT:  COPY a, line 1
2016-10-30 16:00:54.637 EDT [7007] LOG:  worker process: logical 
replication worker 16400 sync 16387 (PID 7018) exited with exit code 1

I'm not sure why I didn't get those until I restarted the postmaster,
but it seems to happen whenever I drop a subscription and then create a
new one. Creating the second subscription from the same psql session as
the one where I create/drop the first seems important in reproducing this.




I am also having issues dropping a second subscription from the same 
psql session

(table a is empty on both nodes to avoid duplicate key errors)
S: create subscription sub1 with connection  'host=localhost dbname=test 
port=5440' publication mypub;
S: create subscription sub2 with connection  'host=localhost dbname=test 
port=5440' publication mypub;
S: drop subscription sub1;
S: drop subscription sub2;

At this point the drop subscription hangs.






>
> The biggest changes are:
>
> I added one more prerequisite patch (the first one) which adds ephemeral
> slots (or well implements UI on top of the code that was mostly already
> there). The ephemeral slots are different in that they go away either on
> error or when session is closed. This means the initial data sync does
> not have to worry about cleaning up the slots after itself. I think this
> will be useful in other places as well (for example basebackup). I
> originally wanted to call them temporary slots in the UI but since the
> behavior is bit different from temp tables I decided to go with what the
> underlying code calls them in UI as well.
>
> I also split out the libpqwalreceiver rewrite to separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review.
>
> There is now new executor module in execReplication.c, no new nodes but
> several utility commands. I moved there the tuple lookup functions from
> apply and also wrote new interfaces for doing inserts/updates/deletes to
> a table including index updates and constraints checks and trigger
> execution but without the need for the whole nodeModifyTable handling.
>
> What I also did when rewriting this is implementation of the tuple
> lookup also using sequential scan so that we can support replica
> identity full properly. This greatly simplified the dependency handling
> between pkeys and publications (by removing it completely ;) ). Also
> when there is replica identity full and the table has primary key, the
> code will use the primary key even though it's not replica identity
> index to lookup the row so that users who want to combine the logical
> replication with some kind of other system that requires replica
> identity full (ie auditing) they still get usable experience.
>
> The way copy is done was heavily reworked. For one it uses the ephemeral
> slots mentioned above. But more importantly there are now new custom
> commands anymore. Instead the walsender accepts some SQL, currently
> allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is
> implemented is probably not perfect and it could use look from somebody
> who knows bison better. How it works is that if the command sent to
> walsender starts with one of the above mentioned keywords the walsender
> parser passes the whole query back and it's passed then to
> exec_simple_query. The main reason why we need BEGIN is so that the COPY
> can use the snapshot exported by the slot creation so that there is
> synchronization point when there are concurrent writes. This probably
> needs more discussion.
>
> I also tried to keep the naming more consistent so cleaned up all
> mentions of "provider" and changed them to "publisher" and also
> publications don't mention that they "replicate", they just "publish"
> now (that has effect on DDL syntax as well).
>
>
> Some things that were discussed in the reviews that I didn't implement
> knowingly include:
>
> Removal of the Oid in the pg_publication_rel, that's mainly because it
> would need significant changes to pg_dump which assumes everything
> that's dumped has Oid and it's not something that seems worth it as part
> of this patch.
>
> Also didn't do the outfuncs, it's unclear to me what are the rules there
> as the only DDL statement there is CreateStmt atm.
>
>
> There are still few TODOs:
>
> Type info for columns. My current best idea is to write typeOid and
> typemod in the relation message and add another message (type message)
> that describes the type which will skip the built-in types (as we can't
> really remap those without breaking a lot of software so they seem safe
> to skip). I plan to do this soonish barring objections.
>
> Removal of use of replication origin in the table sync worker.
>
> Parallelization of the initial copy. And ability to resync (do new copy)
> of a table. These two mainly wait for agreement over how the current way
> of doing copy should work.
>
>
>




Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 31/10/16 00:52, Steve Singer wrote:
> On 10/24/2016 09:22 AM, Petr Jelinek wrote:
>> Hi,
>>
>> attached is updated version of the patch.
>>
>> There are quite a few improvements and restructuring, I fixed all the
>> bugs and basically everything that came up from the reviews and was
>> agreed on. There are still couple of things missing, ie column type
>> definition in protocol and some things related to existing data copy.
>
> Here are a few things I've noticed so far.
>
> +<programlisting>
> +CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
> user=repuser</quote> PUBLICATION mypub;
> +</programlisting>
> +  </para>
> +  <para>
>
> The documentation above doesn't match the syntax, CONNECTION needs to be
> in single quotes not double quotes
> I think you want
> +<programlisting>
> +CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
> user=repuser' PUBLICATION mypub;
> +</programlisting>
> +  </para>
> +  <para>
>

Yes.

>
> I am not sure if this is a known issue covered by your comments about
> data copy but I am still having issues with error reporting on a failed
> subscription.
>
> I created a subscription, dropped the subscription and created a second
> one.  The second subscription isn't active but shows no errors.
>

There are some fundamental issues with the initial sync that need to be
discussed on-list, but this one is not known. I'll try to convert this
into a test case (it seems like a useful one) and fix it; thanks for the
report.

In the meantime I realized I broke the last patch in the series during
the rebase, so attached is the fixed version. It also contains the type
info in the protocol.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I added one more prerequisite patch (the first one) which adds ephemeral
> slots (or well implements UI on top of the code that was mostly already
> there). The ephemeral slots are different in that they go away either on
> error or when session is closed. This means the initial data sync does
> not have to worry about cleaning up the slots after itself. I think this
> will be useful in other places as well (for example basebackup). I
> originally wanted to call them temporary slots in the UI but since the
> behavior is bit different from temp tables I decided to go with what the
> underlying code calls them in UI as well.

I think it makes sense to expose this.

Some of the comments need some polishing.

Eventually, we might want to convert the option list in
CREATE_REPLICATION_SLOT into a list instead of adding more and more
keywords (see also VACUUM), but not necessarily now.
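Something along these lines, analogous to the parenthesized VACUUM
option list (purely illustrative syntax, not anything in the patch):

```sql
-- keyword-based form (what the patch adds keeps growing the grammar):
CREATE_REPLICATION_SLOT "s1" EPHEMERAL LOGICAL pgoutput;

-- hypothetical option-list form that avoids new parser keywords:
CREATE_REPLICATION_SLOT "s1" LOGICAL pgoutput (EPHEMERAL);
```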

I find the way Acquire and Release are handled now quite confusing.
Because Release of an ephemeral slot means to delete it, you have
changed most code to never release them until the end of the session.
So there is a lot of ugly and confusing code that needs to know this
difference.  I think we need to use some different verbs for different
purposes here.  Acquire and release should keep their meaning of "I'm
using this", and the calls in proc.c and postgres.c should be something
like ReplicationSlotCleanup().

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 02/11/16 17:22, Peter Eisentraut wrote:
> On 10/24/16 9:22 AM, Petr Jelinek wrote:
>> I added one more prerequisite patch (the first one) which adds ephemeral
>> slots (or well implements UI on top of the code that was mostly already
>> there). The ephemeral slots are different in that they go away either on
>> error or when session is closed. This means the initial data sync does
>> not have to worry about cleaning up the slots after itself. I think this
>> will be useful in other places as well (for example basebackup). I
>> originally wanted to call them temporary slots in the UI but since the
>> behavior is bit different from temp tables I decided to go with what the
>> underlying code calls them in UI as well.
> 
> I think it makes sense to expose this.
> 
> Some of the comments need some polishing.
> 
> Eventually, we might want to convert the option list in
> CREATE_REPLICATION_SLOT into a list instead of adding more and more
> keywords (see also VACUUM), but not necessarily now.
> 
> I find the way Acquire and Release are handled now quite confusing.
> Because Release of an ephemeral slot means to delete it, you have
> changed most code to never release them until the end of the session.
> So there is a lot of ugly and confusing code that needs to know this
> difference.  I think we need to use some different verbs for different
> purposes here.  Acquire and release should keep their meaning of "I'm
> using this", and the calls in proc.c and postgres.c should be something
> like ReplicationSlotCleanup().
> 

Release does not really change behavior; it has always dropped the
ephemeral slot.

So if I understand correctly, what you are proposing is to change the
behavior of Release to not remove the ephemeral slot, add a function
that removes the ephemeral slots of the current session, and add
tracking of the ephemeral slots created in the current session? That
seems quite a bit more complicated than what the patch does, with
little gain.

What about just releasing the ephemeral slot when a different one is
being acquired, instead of the current error?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 11/3/16 9:31 AM, Petr Jelinek wrote:
> Release does not really change behavior, it has always dropped ephemeral
> slot.

Well, currently ephemeral is just a temporary state while a slot is
being created.  It's not really something that can exist independently.
You might as well call it RS_NOTREADY.  Therefore, dropping the slot
when you de-acquire (release) it makes sense.

But what you want is a slot that exists across acquire/release but is
dropped at the end of the session.  And what is implicit is that the
slot is only usable by one session, so you don't really need to ever
"release" it for use by other sessions.  And so half the Release calls
have been changed to Release-if-persistent, but it's not explained why
in each case.  It all seems to work OK, but there are a lot of hidden
assumptions in each case that make it hard to follow.

> So if I understand correctly what you are proposing is to change
> behavior of Release to not remove ephemeral slot, add function that
> removes the ephemeral slots of current session and add tracking of
> ephemeral slots created in current session? That seems like quite more
> complicated than what the patch does with little gain.
> 
> What about just releasing the ephemeral slot if the different one is
> being acquired instead of the current error?

Maybe that would help reduce some of the mystery about when you have
to call Release and when ReleasePersistent (better called
ReleaseIfPersistent).

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I also split out the libpqwalreceiver rewrite to separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review.

That looks good to me, and it appears to address the previous discussions.

I wouldn't change walrcv_xxx to walrcvconn_xxx.  If we're going to have
macros to hide the internals, we might as well keep the names the same.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
/*
 * Replication slot on-disk data structure.
@@ -225,10 +226,25 @@ ReplicationSlotCreate(const char *name, bool db_specific,
    ReplicationSlot *slot = NULL;
    int            i;
 

-    Assert(MyReplicationSlot == NULL);
+    /* Only aka ephemeral slots can survive across commands. */

What does this comment mean?


+    Assert(!MyReplicationSlot ||
+           MyReplicationSlot->data.persistency == RS_EPHEMERAL);

+    if (MyReplicationSlot)
+    {
+        /* Already acquired? Nothis to do. */

typo.

+        if (namestrcmp(&MyReplicationSlot->data.name, name) == 0)
+            return;
+
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("cannot create replication slot %s, another slot %s is "
+                        "already active in this session",
+                        name, NameStr(MyReplicationSlot->data.name))));
+    }
+

Why do we now create slots that are already created? That seems like an
odd API change.
    /*
     * If some other backend ran this code concurrently with us, we'd likely
     * both allocate the same slot, and that would be bad.  We'd also be at
 
@@ -331,10 +347,25 @@ ReplicationSlotAcquire(const char *name)
    int            i;
    int            active_pid = 0;

-    Assert(MyReplicationSlot == NULL);
+    /* Only aka ephemeral slots can survive across commands. */
+    Assert(!MyReplicationSlot ||
+           MyReplicationSlot->data.persistency == RS_EPHEMERAL);
    ReplicationSlotValidateName(name, ERROR);

+    if (MyReplicationSlot)
+    {
+        /* Already acquired? Nothis to do. */
+        if (namestrcmp(&MyReplicationSlot->data.name, name) == 0)
+            return;
+
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("cannot acquire replication slot %s, another slot %s is "
+                        "already active in this session",
+                        name, NameStr(MyReplicationSlot->data.name))));
+    }
+
    /* Search for the named slot and mark it active if we find it. */
    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (i = 0; i < max_replication_slots; i++)
 
@@ -406,12 +437,26 @@ ReplicationSlotRelease(void)
}

Uh? We shouldn't ever have to acquire ephemeral
/*
+ * Same as above but only if currently acquired slot is peristent one.
+ */

s/peristent/persistent/

+void
+ReplicationSlotReleasePersistent(void)
+{
+    Assert(MyReplicationSlot);
+
+    if (MyReplicationSlot->data.persistency == RS_PERSISTENT)
+        ReplicationSlotRelease();
+}

Ick.



Hm. I think I have to agree a bit with Peter here.  Overloading
MyReplicationSlot this way seems ugly, and I think there's a bunch of
bugs around it too.


Sounds like what we really want is a) two different lifetimes for
ephemeral slots, session and "command", and b) a number of slots that
are released either after a failed transaction / command or at session
end.  The easiest way to do that appears to be a list of slots that is
checked at end-of-xact and backend shutdown.

Regards,

Andres



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,


/* Prototypes for interface functions */
-static void libpqrcv_connect(char *conninfo);
-static char *libpqrcv_get_conninfo(void);
-static void libpqrcv_identify_system(TimeLineID *primary_tli);
-static void libpqrcv_readtimelinehistoryfile(TimeLineID tli, char **filename, char **content, int *len);
-static bool libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint,
-                        char *slotname);
-static void libpqrcv_endstreaming(TimeLineID *next_tli);
-static int    libpqrcv_receive(char **buffer, pgsocket *wait_fd);
-static void libpqrcv_send(const char *buffer, int nbytes);
-static void libpqrcv_disconnect(void);
+static WalReceiverConn *libpqrcv_connect(char *conninfo,
+                                         bool logical, const char *appname);
+static char *libpqrcv_get_conninfo(WalReceiverConn *conn);
+static char *libpqrcv_identify_system(WalReceiverConn *conn,
+                                      TimeLineID *primary_tli);
+static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
+                                 TimeLineID tli, char **filename,
+                                 char **content, int *len);
+static bool libpqrcv_startstreaming(WalReceiverConn *conn,
+                             TimeLineID tli, XLogRecPtr startpoint,
+                             const char *slotname);
+static void libpqrcv_endstreaming(WalReceiverConn *conn,
+                                  TimeLineID *next_tli);
+static int    libpqrcv_receive(WalReceiverConn *conn, char **buffer,
+                             pgsocket *wait_fd);
+static void libpqrcv_send(WalReceiverConn *conn, const char *buffer,
+                          int nbytes);
+static void libpqrcv_disconnect(WalReceiverConn *conn);


That looks good.
/* Prototypes for private functions */
-static bool libpq_select(int timeout_ms);
+static bool libpq_select(PGconn *streamConn,
+                         int timeout_ms);

If we're starting to use this more widely, we really should use a latch
instead of the plain select(). In fact, I think it's more or less a bug
that we don't (select is only interruptible by signals on a subset of
our platforms).  That shouldn't bother this patch, but...



This looks pretty close to committable, Peter do you want to do that, or
should I?

Andres



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,

+ <sect1 id="catalog-pg-publication-rel">
+  <title><structname>pg_publication_rel</structname></title>
+
+  <indexterm zone="catalog-pg-publication-rel">
+   <primary>pg_publication_rel</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_publication_rel</structname> catalog contains
+   mapping between tables and publications in the database. This is many to
+   many mapping.
+  </para>

I wonder if we shouldn't abstract this a bit away from relations to
allow other objects to be exported too. Could structure it a bit more
like pg_depend.


+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
 
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+      PuBLISH_INSERT | NOPuBLISH_INSERT
+    | PuBLISH_UPDATE | NOPuBLISH_UPDATE
+    | PuBLISH_DELETE | NOPuBLISH_DELETE

That's odd casing.


+   <varlistentry>
+    <term><literal>PuBLISH_INSERT</literal></term>
+    <term><literal>NOPuBLISH_INSERT</literal></term>
+    <term><literal>PuBLISH_UPDATE</literal></term>
+    <term><literal>NOPuBLISH_UPDATE</literal></term>
+    <term><literal>PuBLISH_DELETE</literal></term>
+    <term><literal>NOPuBLISH_DELETE</literal></term>

More odd casing.

+   <varlistentry>
+    <term><literal>FOR TABLE</literal></term>
+    <listitem>
+     <para>
+      Specifies optional list of tables to add to the publication.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
+    <listitem>
+     <para>
+      Specifies optional schema for which all logged tables will be added to
+      publication.
+     </para>
+    </listitem>
+   </varlistentry>

"FOR TABLE ALL IN SCHEMA" sounds weird.


+  <para>
+   This operation does not reserve any resources on the server. It only
+   defines grouping and filtering logic for future subscribers.
+  </para>

That's strictly speaking not true, maybe rephrase a bit?

+/*
+ * Check if relation can be in given publication and throws appropriate
+ * error if not.
+ */
+static void
+check_publication_add_relation(Relation targetrel)
+{
+    /* Must be table */
+    if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("only tables can be added to publication"),
+                 errdetail("%s is not a table",
+                           RelationGetRelationName(targetrel))));
+
+    /* Can't be system table */
+    if (IsCatalogRelation(targetrel))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("only user tables can be added to publication"),
+                 errdetail("%s is a system table",
+                           RelationGetRelationName(targetrel))));
+
+    /* UNLOGGED and TEMP relations cannot be part of publication. */
+    if (!RelationNeedsWAL(targetrel))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("UNLOGGED and TEMP relations cannot be replicated")));
+}

This probably means we need a check in the ALTER TABLE ... SET UNLOGGED
path.
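Illustratively, this is the sequence that would need to be rejected
(table and publication names made up):

```sql
CREATE TABLE t (id int PRIMARY KEY);
CREATE PUBLICATION mypub FOR TABLE t;
-- check_publication_add_relation() rejects adding an UNLOGGED table,
-- but nothing stops a table that is already published from becoming
-- unlogged afterwards:
ALTER TABLE t SET UNLOGGED;   -- presumably should error here
```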


+/*
+ * Returns if relation represented by oid and Form_pg_class entry
+ * is publishable.
+ *
+ * Does same checks as the above, but does not need relation to be opened
+ * and also does not throw errors.
+ */
+static bool
+is_publishable_class(Oid relid, Form_pg_class reltuple)
+{
+    return reltuple->relkind == RELKIND_RELATION &&
+        !IsCatalogClass(relid, reltuple) &&
+        reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
+        /* XXX needed to exclude information_schema tables */
+        relid >= FirstNormalObjectId;
+}

Shouldn't that be IsCatalogRelation() instead?


+CREATE VIEW pg_publication_tables AS
+    SELECT
+        P.pubname AS pubname,
+        N.nspname AS schemaname,
+        C.relname AS tablename
+    FROM pg_publication P, pg_class C
+         JOIN pg_namespace N ON (N.oid = C.relnamespace)
+    WHERE C.relkind = 'r'
+      AND C.oid IN (SELECT relid FROM pg_get_publication_tables(P.pubname));

That's going to be quite inefficient if you filter by table... Might be
better to do that via the underlying table.
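For example, a per-table lookup could go through pg_publication_rel
directly, where the (prrelid, prpubid) lookup can be indexed, instead of
expanding pg_get_publication_tables() for every publication (sketch;
ignores FOR ALL TABLES publications):

```sql
SELECT p.pubname
FROM pg_publication_rel pr
     JOIN pg_publication p ON p.oid = pr.prpubid
WHERE pr.prrelid = 'public.a'::regclass;
```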


+/*
+ * Create new publication.
+ * TODO ACL check
+ */

Hm?

+ObjectAddress
+CreatePublication(CreatePublicationStmt *stmt)
+{
+    check_replication_permissions();

+
+/*
+ * Drop publication by OID
+ */
+void
+DropPublicationById(Oid pubid)
+
+/*
+ * Remove relation from publication by mapping OID.
+ */
+void
+RemovePublicationRelById(Oid proid)
+{

Permission checks?

+}

Hm. Neither of these does dependency checking, wonder if that can be
argued to be problematic.


+/*
+ * Gather Relations based o provided by RangeVar list.
+ * The gathered tables are locked in ShareUpdateExclusiveLock mode.
+ */

s/o/on/.  Not sure if gather is the best name.

+static List *
+GatherTableList(List *tables)


+/*
+ * Close all relations in the list.
+ */
+static void
+CloseTables(List *rels)

Shouldn't that be CloseTableList() based on the preceding function's naming?

+
+/*
+ * Add listed tables to the publication.
+ */
+static void
+PublicationAddTables(Oid pubid, List *rels, bool if_not_exists,
+                     AlterPublicationStmt *stmt)
+{
+    ListCell       *lc;
+
+    Assert(!stmt || !stmt->for_all_tables);
+
+    foreach(lc, rels)
+    {
+        Relation    rel = (Relation) lfirst(lc);
+        ObjectAddress    obj;
+
+        obj = publication_add_relation(pubid, rel, if_not_exists);
+        if (stmt)
+            EventTriggerCollectSimpleCommand(obj, InvalidObjectAddress,
+                                             (Node *) stmt);
+    }
+}
+
+/*
+ * Remove listed tables to the publication.
+ */

s/to/from/

+static void
+PublicationDropTables(Oid pubid, List *rels, bool missing_ok)
+{
+    ObjectAddress    obj;
+    ListCell       *lc;
+    Oid                prid;
+
+    foreach(lc, rels)
+    {
+        Relation    rel = (Relation) lfirst(lc);
+        Oid            relid = RelationGetRelid(rel);
+
+        prid = GetSysCacheOid2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid),
+                               ObjectIdGetDatum(pubid));
+        if (!OidIsValid(prid))
+        {
+            if (missing_ok)
+                continue;
+
+            ereport(ERROR,
+                    (errcode(ERRCODE_UNDEFINED_OBJECT),
+                     errmsg("relation \"%s\" is not part of the publication",
+                            RelationGetRelationName(rel))));
+        }
+
+        ObjectAddressSet(obj, PublicationRelRelationId, prid);
+        performDeletion(&obj, DROP_CASCADE, 0);
+    }
+}


/*
+ * Check if command can be executed with current replica identity.
+ */
+static void
+CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
+{
+    PublicationActions *pubactions;
+
+    /* We only need to do checks for UPDATE and DELETE. */
+    if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
+        return;
+
+    /* If relation has replica identity we are always good. */
+    if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
+        OidIsValid(RelationGetReplicaIndex(rel)))
+        return;
+
+    /*
+     * This is either UPDATE OR DELETE and there is no replica identity.
+     *
+     * Check if the table publishes UPDATES or DELETES.
+     */
+    pubactions = GetRelationPublicationActions(rel);
+    if (pubactions->pubupdate || pubactions->pubdelete)

I think that leads to spurious errors. Consider a DELETE with a
publication that replicates updates but not deletes.
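Concretely, using the option syntax from this patch (sketch):

```sql
-- publication that publishes updates but not deletes:
CREATE PUBLICATION updonly FOR TABLE t WITH (PUBLISH_UPDATE, NOPUBLISH_DELETE);
-- with the check testing (pubupdate || pubdelete), this DELETE on a
-- table without a replica identity would error out even though deletes
-- are never published:
DELETE FROM t WHERE id = 1;
```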

+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("cannot update table \"%s\" because it does not have replica identity and publishes updates",
+                        RelationGetRelationName(rel)),
+                 errhint("To enable updating the table, provide set REPLICA IDENTITY using ALTER TABLE.")));
+}

"provide set"


+publication_opt_item:
+            IDENT
+                {
+                    /*
+                     * We handle identifiers that aren't parser keywords with
+                     * the following special-case codes, to avoid bloating the
+                     * size of the main parser.
+                     */
+                    if (strcmp($1, "publish_insert") == 0)
+                        $$ = makeDefElem("publish_insert",
+                                         (Node *)makeInteger(TRUE), @1);
+                    else if (strcmp($1, "nopublish_insert") == 0)
+                        $$ = makeDefElem("publish_insert",
+                                         (Node *)makeInteger(FALSE), @1);
+                    else if (strcmp($1, "publish_update") == 0)
+                        $$ = makeDefElem("publish_update",
+                                         (Node *)makeInteger(TRUE), @1);
+                    else if (strcmp($1, "nopublish_update") == 0)
+                        $$ = makeDefElem("publish_update",
+                                         (Node *)makeInteger(FALSE), @1);
+                    else if (strcmp($1, "publish_delete") == 0)
+                        $$ = makeDefElem("publish_delete",
+                                         (Node *)makeInteger(TRUE), @1);
+                    else if (strcmp($1, "nopublish_delete") == 0)
+                        $$ = makeDefElem("publish_delete",
+                                         (Node *)makeInteger(FALSE), @1);
+                    else
+                        ereport(ERROR,
+                                (errcode(ERRCODE_SYNTAX_ERROR),
+                                 errmsg("unrecognized publication option \"%s\"", $1),
+                                     parser_errposition(@1)));
+                }
+        ;

I still would very much like to move this outside of gram.y and just use
IDENTs here. Like how COPY options are handled.


+/*
+ * Get publication actions for list of publication oids.
+ */
+struct PublicationActions *
+GetRelationPublicationActions(Relation relation)

API description and function name/parameters don't quite match.



+CATALOG(pg_publication,6104)
+{
+    NameData    pubname;            /* name of the publication */
+
+    /*
+     * indicates that this is special publication which should encompass
+     * all tables in the database (except for the unlogged and temp ones)
+     */
+    bool        puballtables;
+
+    /* true if inserts are published */
+    bool        pubinsert;
+
+    /* true if updates are published */
+    bool        pubupdate;
+
+    /* true if deletes are published */
+    bool        pubdelete;
+
+} FormData_pg_publication;

Shouldn't this have an owner?   I also wonder if we want an easier to
extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
pub ... without changing the schema).


+/* ----------------
+ *        pg_publication_rel definition.  cpp turns this into
+ *        typedef struct FormData_pg_publication_rel
+ *
+ * ----------------
+ */
+#define PublicationRelRelationId                6106
+
+CATALOG(pg_publication_rel,6106)
+{
+    Oid        prpubid;                /* Oid of the publication */
+    Oid        prrelid;                /* Oid of the relation */
+} FormData_pg_publication_rel;

To me it seems like a good idea to have objclassid/objsubid here.


Regards,

Andres



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,

(btw, I vote against tarballing patches)

+   <tgroup cols="4">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Type</entry>
+      <entry>References</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry><structfield>oid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry></entry>
+      <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+     </row>
+

+     <row>
+      <entry><structfield>subpublications</structfield></entry>
+      <entry><type>name[]</type></entry>
+      <entry></entry>
+      <entry>Array of subscribed publication names. These reference the
+       publications on the publisher server.
+      </entry>

Why is this names and not oids? So you can see it across databases?

I think this again should have an owner.

include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 68d7e46..523008d 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -112,6 +112,7 @@ static event_trigger_support_data event_trigger_support[] = {
     {"SCHEMA", true},
     {"SEQUENCE", true},
     {"SERVER", true},
+    {"SUBSCRIPTION", true},

Hm, is that ok? Subscriptions are shared, so ...?


+        /*
+         * If requested, create the replication slot on remote side for our
+         * newly created subscription.
+         *
+         * Note, we can't cleanup slot in case of failure as reason for
+         * failure might be already existing slot of the same name and we
+         * don't want to drop somebody else's slot by mistake.
+         */
+        if (create_slot)
+        {
+            XLogRecPtr            lsn;
+
+            /*
+             * Create the replication slot on remote side for our newly created
+             * subscription.
+             *
+             * Note, we can't cleanup slot in case of failure as reason for
+             * failure might be already existing slot of the same name and we
+             * don't want to drop somebody else's slot by mistake.
+             */

We should really be able to recognize that based on the error code...
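A sketch of what recognizing it could look like, under the assumption that slot creation reports the collision as SQLSTATE 42710 (duplicate_object): with libpq the code is available via PQresultErrorField(res, PG_DIAG_SQLSTATE), and the helper below (an invented name, shown standalone) just models the comparison.

```c
#include <stdbool.h>
#include <string.h>

/*
 * True when a remote SQLSTATE indicates "object already exists"
 * (42710, duplicate_object), i.e. the slot name collided; any other
 * failure would still be treated as fatal by the caller.
 */
static bool
slot_already_exists(const char *sqlstate)
{
    return sqlstate != NULL && strcmp(sqlstate, "42710") == 0;
}
```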

+/*
+ * Drop subscription by OID
+ */
+void
+DropSubscriptionById(Oid subid)
+{

+    /*
+     * We must ignore errors here as that would make it impossible to drop
+     * subscription when publisher is down.
+     */

I'm not convinced.  Leaving a slot around without a "record" of it on
the creating side isn't nice either. Maybe a FORCE flag or something?

+subscription_create_opt_item:
+            subscription_opt_item
+            | INITIALLY IDENT
+                {
+                    if (strcmp($2, "enabled") == 0)
+                        $$ = makeDefElem("enabled",
+                                         (Node *)makeInteger(TRUE), @1);
+                    else if (strcmp($2, "disabled") == 0)
+                        $$ = makeDefElem("enabled",
+                                         (Node *)makeInteger(FALSE), @1);
+                    else
+                        ereport(ERROR,
+                                (errcode(ERRCODE_SYNTAX_ERROR),
+                                 errmsg("unrecognized subscription option \"%s\"", $1),
+                                     parser_errposition(@2)));
+                }
+            | IDENT
+                {
+                    if (strcmp($1, "create_slot") == 0)
+                        $$ = makeDefElem("create_slot",
+                                         (Node *)makeInteger(TRUE), @1);
+                    else if (strcmp($1, "nocreate_slot") == 0)
+                        $$ = makeDefElem("create_slot",
+                                         (Node *)makeInteger(FALSE), @1);
+                }
+        ;

Hm, the IDENT case ignores $1 if it's not create_slot/nocreate_slot and
thus leaves $$ uninitialized?

I again really would like to have the error checking elsewhere.



- Andres



Re: Logical Replication WIP

From
Steve Singer
Date:
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list, but this one is not known. I'll try to convert this
> to a test case (seems like a useful one) and fix it; thanks for the
> report. In the meantime I realized I broke the last patch in the series
> during rebase, so attached is the fixed version. It also contains the
> type info in the protocol.
>
>

I don't know if this is covered by the known initial_sync problems or not


If I have an 'all tables' publication and then create a new table, the
data doesn't seem to replicate to the new table.


P: create table a(a serial4 primary key, b text);
S: create table a(a serial4 primary key, b text);
P: create publication mypub for all tables;
S: create subscription mysub connection 'host=localhost dbname=test 
port=5441' publication mypub;
P: create table b(a serial4 primary key, b text);
P: insert into b(b) values ('foo2');
P: insert into a(b) values ('foo3');

Then I check my subscriber

select * FROM a;
 a |  b
---+------
 1 | foo
 2 | foo3
(2 rows)

test=# select * FROM b;
 a | b
---+---
(0 rows)


However, if the table isn't on the subscriber I do get an error:

ie

P: create table c(a serial4 primary key, b text);
P: insert into c(b) values('foo');

2016-11-05 11:49:31.456 EDT [14938] FATAL:  the logical replication 
target public.c not found
2016-11-05 11:49:31.457 EDT [13703] LOG:  worker process: logical 
replication worker 16457 (PID 14938) exited with exit code 1

but if I then add the table
S: create table c(a serial4 primary key, b text);
2016-11-05 11:51:08.583 EDT [15014] LOG:  logical replication apply for 
subscription mysub started

but the data doesn't replicate to table c either.





Re: Logical Replication WIP

From
Peter Eisentraut
Date:
Review of v7 0003-Add-PUBLICATION-catalogs-and-DDL.patch:

This appears to address previous reviews and is looking pretty solid.  I
have some comments that are easily addressed:

[still from previous review] The code for OCLASS_PUBLICATION_REL in
getObjectIdentityParts() does not fill in objname and objargs, as it is
supposed to.

catalog.sgml: pg_publication_rel column names must be updated after renaming

alter_publication.sgml and elsewhere: typos PuBLISH_INSERT etc.

create_publication.sgml: FOR TABLE ALL IN SCHEMA does not exist anymore

create_publication.sgml: talks about not-yet-existing SUBSCRIPTION role

DropPublicationById maybe name RemovePublicationById for consistency

system_views.sql: C.relkind = 'r' unnecessary

CheckCmdReplicaIdentity: error message says "cannot update", should
distinguish between update and delete

relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=

RelationData.rd_pubactions could be a bitmap, simplifying some memcpy
and context management.  But RelationData appears to favor rich data
structures, so maybe that is fine.
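For comparison, a bitmap form might look like the standalone sketch below (the flag names are invented); merging the actions of several publications then collapses into a bitwise OR instead of per-field boolean assignments:

```c
/*
 * Invented bitmap encoding of the published actions; new actions
 * (truncate, ddl, ...) would only add a flag, not a struct field.
 */
typedef enum PubActionFlags
{
    PUBACTION_INSERT = 1 << 0,
    PUBACTION_UPDATE = 1 << 1,
    PUBACTION_DELETE = 1 << 2
} PubActionFlags;

/* Combining the actions of several publications is a plain OR. */
static int
merge_pubactions(int a, int b)
{
    return a | b;
}
```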

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 11/4/16 9:00 AM, Andres Freund wrote:
> +  <para>
> +   The <structname>pg_publication_rel</structname> catalog contains
> +   mapping between tables and publications in the database. This is many to
> +   many mapping.
> +  </para>
> 
> I wonder if we shouldn't abstract this a bit away from relations to
> allow other objects to be exported to. Could structure it a bit more
> like pg_depend.

I think we can add/change that when we have use for it.

> +   <varlistentry>
> +    <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies optional schema for which all logged tables will be added to
> +      publication.
> +     </para>
> +    </listitem>
> +   </varlistentry>
> 
> "FOR TABLE ALL IN SCHEMA" sounds weird.

That clause no longer exists anyway.

> +  <para>
> +   This operation does not reserve any resources on the server. It only
> +   defines grouping and filtering logic for future subscribers.
> +  </para>
> 
> That's strictly speaking not true, maybe rephrase a bit?

Maybe the point is that it does not initiate any contact with remote nodes.

> +/*
> + * Create new publication.
> + * TODO ACL check
> + */
> 
> Hm?

The first patch is going to be just superuser and replication role.  I'm
working on a patch set for later that adds proper ACLs, owners, and all
that.  So I'd suggest ignoring these details for now, unless of course
you find permission checks *missing*.

> +/*
> + * Drop publication by OID
> + */
> +void
> +DropPublicationById(Oid pubid)
> +
> +/*
> + * Remove relation from publication by mapping OID.
> + */
> +void
> +RemovePublicationRelById(Oid proid)
> +{
> 
> Permission checks?
> 
> +}
> 
> Hm. Neither of these does dependency checking, wonder if that can be
> argued to be problematic.

The dependency checking is done before it gets to these functions, no?

> /*
> + * Check if command can be executed with current replica identity.
> + */
> +static void
> +CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
> +{
> +    PublicationActions *pubactions;
> +
> +    /* We only need to do checks for UPDATE and DELETE. */
> +    if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
> +        return;
> +
> +    /* If relation has replica identity we are always good. */
> +    if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
> +        OidIsValid(RelationGetReplicaIndex(rel)))
> +        return;
> +
> +    /*
> +     * This is either UPDATE OR DELETE and there is no replica identity.
> +     *
> +     * Check if the table publishes UPDATES or DELETES.
> +     */
> +    pubactions = GetRelationPublicationActions(rel);
> +    if (pubactions->pubupdate || pubactions->pubdelete)
> 
> I think that leads to spurious errors. Consider a DELETE with a
> publication that replicates updates but not deletes.

Yeah, it needs to check the pubactions against the specific command.
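A standalone model of the corrected check (the struct mirrors the patch, but the function and its boolean return are invented for illustration; the backend would ereport(ERROR, ...) instead):

```c
#include <stdbool.h>

typedef enum CmdType { CMD_UPDATE, CMD_DELETE, CMD_OTHER } CmdType;

typedef struct PublicationActions
{
    bool pubinsert;
    bool pubupdate;
    bool pubdelete;
} PublicationActions;

/*
 * Only complain about a missing replica identity when the table
 * actually publishes the specific command being executed, so a DELETE
 * is fine on a table that publishes only updates, and vice versa.
 */
static bool
needs_replica_identity_error(CmdType cmd, const PublicationActions *pa,
                             bool has_replica_identity)
{
    if (has_replica_identity)
        return false;
    if (cmd == CMD_UPDATE)
        return pa->pubupdate;
    if (cmd == CMD_DELETE)
        return pa->pubdelete;
    return false;               /* other commands never need it */
}
```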

> +} FormData_pg_publication;
> 
> Shouldn't this have an owner?

Yes, see above.

> I also wonder if we want an easier to
> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> pub ... without changing the schema).

Maybe, but how?  (without using weird array constructs that are a pain
to parse in psql and pg_dump, for example)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 04/11/16 13:15, Andres Freund wrote:
> 
>  /* Prototypes for private functions */
> -static bool libpq_select(int timeout_ms);
> +static bool libpq_select(PGconn *streamConn,
> +                         int timeout_ms);
> 
> If we're starting to use this more widely, we really should just use a latch
> instead of the plain select(). In fact, I think it's more or less a bug
> that we don't (select is only interruptible by signals on a subset of
> our platforms).  That shouldn't bother this patch, but...
> 
> 

Agree that this is a problem, especially for the subscription creation
later. We should be doing WaitLatchOrSocket, but the question is which
latch. We can't use the MyProc one, as that's not the latch that WalReceiver
uses, so I guess we would have to pass a latch as a parameter to every
caller of this, which is not very pretty from an API perspective, but I
don't have a better idea here.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 04/11/16 13:07, Andres Freund wrote:
>
> Hm. I think I have to agree a bit with Peter here.  Overloading
> MyReplicationSlot this way seems ugly, and I think there's a bunch of
> bugs around it too.
>
>
> Sounds like what we really want is a) two different lifetimes for ephemeral
> slots, session and "command", and b) a number of slots that are released
> either after a failed transaction / command or at session end.  The
> easiest way to do that appears to be a list of slots to be checked at
> end-of-xact and backend shutdown.
>

Ok, so how about the attached? It adds temp slots as a new type of
persistence. It does not really touch the behavior of any of the existing
API or persistence settings.

The temp slots are just cleaned up on backend exit or error; other than
that they are not special. I don't use any backend-local list to track
them; instead they always have active_pid set, and at the end of the
session we just clean up everything that has it set to our pid. This has
the nice property that it forbids other backends from acquiring them.

It does not do any locking while searching for the slots to clean up (see
ReplicationSlotCleanup), mainly because locking complicates the
interaction with ReplicationSlotDropPtr, and it seems to me that locking
is not really needed there: other backends will never change active_pid
to our backend pid, and ReplicationSlotDropPtr takes the exclusive lock
when resetting it.
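As a standalone model of that cleanup (all names here are invented; the patch's version walks the shared slot array and goes through the regular drop path):

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RS_PERSISTENT, RS_EPHEMERAL, RS_TEMPORARY } SlotPersistency;

typedef struct ModelSlot
{
    bool in_use;
    int  active_pid;            /* 0 when not acquired by any backend */
    SlotPersistency persistency;
} ModelSlot;

/*
 * Drop every temporary slot owned by the exiting backend.  The scan
 * itself needs no lock: no other backend ever sets active_pid to *our*
 * pid, and the real drop routine takes the exclusive lock when it
 * resets the slot.
 */
static int
cleanup_temp_slots(ModelSlot *slots, size_t nslots, int my_pid)
{
    int dropped = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        ModelSlot *s = &slots[i];

        if (s->in_use && s->persistency == RS_TEMPORARY &&
            s->active_pid == my_pid)
        {
            s->in_use = false;  /* stands in for the actual slot drop */
            s->active_pid = 0;
            dropped++;
        }
    }
    return dropped;
}
```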

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 04/11/16 14:00, Andres Freund wrote:
> Hi,
> 
> + <sect1 id="catalog-pg-publication-rel">
> +  <title><structname>pg_publication_rel</structname></title>
> +
> +  <indexterm zone="catalog-pg-publication-rel">
> +   <primary>pg_publication_rel</primary>
> +  </indexterm>
> +
> +  <para>
> +   The <structname>pg_publication_rel</structname> catalog contains
> +   mapping between tables and publications in the database. This is many to
> +   many mapping.
> +  </para>
> 
> I wonder if we shouldn't abstract this a bit away from relations to
> allow other objects to be exported to. Could structure it a bit more
> like pg_depend.
> 

Honestly, let's not overdesign this. A change like that can be made in the
future if we need it, and I am quite unconvinced we do, given that
anything we might want to replicate will be a relation. I understand that
it might be useful to know what's on the downstream in terms of objects
at some point for some future functionality, but I don't have an idea of
how that functionality will look, so it's premature to guess what catalog
structure it will need.

> 
> +ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable>[ ... ] ]
> +
> +<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
> +
> +      PuBLISH_INSERT | NOPuBLISH_INSERT
> +    | PuBLISH_UPDATE | NOPuBLISH_UPDATE
> +    | PuBLISH_DELETE | NOPuBLISH_DELETE
> 
> That's odd casing.
> 
> 
> +   <varlistentry>
> +    <term><literal>PuBLISH_INSERT</literal></term>
> +    <term><literal>NOPuBLISH_INSERT</literal></term>
> +    <term><literal>PuBLISH_UPDATE</literal></term>
> +    <term><literal>NOPuBLISH_UPDATE</literal></term>
> +    <term><literal>PuBLISH_DELETE</literal></term>
> +    <term><literal>NOPuBLISH_DELETE</literal></term>
> 

Ah typo in my sed script, fun.

> More odd casing.
> 
> +   <varlistentry>
> +    <term><literal>FOR TABLE</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies optional list of tables to add to the publication.
> +     </para>
> +    </listitem>
> +   </varlistentry>
> +
> +   <varlistentry>
> +    <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies optional schema for which all logged tables will be added to
> +      publication.
> +     </para>
> +    </listitem>
> +   </varlistentry>
> 
> "FOR TABLE ALL IN SCHEMA" sounds weird.
> 

I actually removed support for this at some point, but forgot to remove
the docs. I might add this feature again in the future, but I reckon we
can live without it in v1.


> +  <para>
> +   This operation does not reserve any resources on the server. It only
> +   defines grouping and filtering logic for future subscribers.
> +  </para>
> 
> That's strictly speaking not true, maybe rephrase a bit?
> 

Sure, this is basically supposed to mean that it does not really start
replication or retain WAL or anything like that, as opposed to what, for
example, slots do.

> +/*
> + * Check if relation can be in given publication and throws appropriate
> + * error if not.
> + */
> +static void
> +check_publication_add_relation(Relation targetrel)
> +{
> +    /* Must be table */
> +    if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("only tables can be added to publication"),
> +                 errdetail("%s is not a table",
> +                           RelationGetRelationName(targetrel))));
> +
> +    /* Can't be system table */
> +    if (IsCatalogRelation(targetrel))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("only user tables can be added to publication"),
> +                 errdetail("%s is a system table",
> +                           RelationGetRelationName(targetrel))));
> +
> +    /* UNLOGGED and TEMP relations cannot be part of publication. */
> +    if (!RelationNeedsWAL(targetrel))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("UNLOGGED and TEMP relations cannot be replicated")));
> +}
> 
> This probably means we need a check in the ALTER TABLE ... SET UNLOGGED
> path.
> 

Good point.

> 
> +/*
> + * Returns if relation represented by oid and Form_pg_class entry
> + * is publishable.
> + *
> + * Does same checks as the above, but does not need relation to be opened
> + * and also does not throw errors.
> + */
> +static bool
> +is_publishable_class(Oid relid, Form_pg_class reltuple)
> +{
> +    return reltuple->relkind == RELKIND_RELATION &&
> +        !IsCatalogClass(relid, reltuple) &&
> +        reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
> +        /* XXX needed to exclude information_schema tables */
> +        relid >= FirstNormalObjectId;
> +}
> 
> Shouldn't that be IsCatalogRelation() instead?
> 

Well, IsCatalogRelation just calls IsCatalogClass, and we call
IsCatalogClass here as well. The problem with IsCatalogClass is that it
does not consider tables in information_schema that were created as part
of initdb to be system catalogs, because it first does a negative check
on the pg_catalog and toast schemas and only then considers
FirstNormalObjectId. I was actually wondering if that might be a bug in
IsCatalogClass.

> 
> +/*
> + * Create new publication.
> + * TODO ACL check
> + */
> 

That was meant for future enhancements, but I think I won't do detailed
ACLs in v1, so I'll remove that TODO.

> +
> +/*
> + * Drop publication by OID
> + */
> +void
> +DropPublicationById(Oid pubid)
> +
> +/*
> + * Remove relation from publication by mapping OID.
> + */
> +void
> +RemovePublicationRelById(Oid proid)
> +{
> 
> Permission checks?
> 
> +}
> 
> Hm. Neither of these does dependency checking, wonder if that can be
> argued to be problematic.
> 

As PeterE said, that's done by the caller; none of the Drop...ById
functions do dependency checks.

> +publication_opt_item:
> +            IDENT
> +                {
> +                    /*
> +                     * We handle identifiers that aren't parser keywords with
> +                     * the following special-case codes, to avoid bloating the
> +                     * size of the main parser.
> +                     */
> +                    if (strcmp($1, "publish_insert") == 0)
> +                        $$ = makeDefElem("publish_insert",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "nopublish_insert") == 0)
> +                        $$ = makeDefElem("publish_insert",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else if (strcmp($1, "publish_update") == 0)
> +                        $$ = makeDefElem("publish_update",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "nopublish_update") == 0)
> +                        $$ = makeDefElem("publish_update",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else if (strcmp($1, "publish_delete") == 0)
> +                        $$ = makeDefElem("publish_delete",
> +                                         (Node *)makeInteger(TRUE), @1);
> +                    else if (strcmp($1, "nopublish_delete") == 0)
> +                        $$ = makeDefElem("publish_delete",
> +                                         (Node *)makeInteger(FALSE), @1);
> +                    else
> +                        ereport(ERROR,
> +                                (errcode(ERRCODE_SYNTAX_ERROR),
> +                                 errmsg("unrecognized publication option \"%s\"", $1),
> +                                     parser_errposition(@1)));
> +                }
> +        ;
> 
> I still would very much like to move this outside of gram.y and just use
> IDENTs here. Like how COPY options are handled.
> 

Well, I looked into it and it means some loss of info in the error
messages - mainly the error position in the query because utility
statements don't get ParseState (unlike COPY). It might be worth the
flexibility though.

> 
> 
> +CATALOG(pg_publication,6104)
> +{
> +    NameData    pubname;            /* name of the publication */
> +
> +    /*
> +     * indicates that this is special publication which should encompass
> +     * all tables in the database (except for the unlogged and temp ones)
> +     */
> +    bool        puballtables;
> +
> +    /* true if inserts are published */
> +    bool        pubinsert;
> +
> +    /* true if updates are published */
> +    bool        pubupdate;
> +
> +    /* true if deletes are published */
> +    bool        pubdelete;
> +
> +} FormData_pg_publication;
> 
> Shouldn't this have an owner? 

Probably, I wanted to do that as follow-up patch originally, but looks
like it should be in initial version.

>  I also wonder if we want an easier to
> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> pub ... without changing the schema).
> 

So like a text array that's then parsed everywhere (I am definitely not
doing a bitmask/int)?

> 
> +/* ----------------
> + *        pg_publication_rel definition.  cpp turns this into
> + *        typedef struct FormData_pg_publication_rel
> + *
> + * ----------------
> + */
> +#define PublicationRelRelationId                6106
> +
> +CATALOG(pg_publication_rel,6106)
> +{
> +    Oid        prpubid;                /* Oid of the publication */
> +    Oid        prrelid;                /* Oid of the relation */
> +} FormData_pg_publication_rel;
> 
> To me it seems like a good idea to have objclassid/objsubid here.
> 

You said that in the beginning, but again, I am not quite convinced of
that yet. I guess if PeterE moves the sequence patches all the way and we
lose the notion that sequences are relations (not sure if that's where he
is ultimately going, though), that might make sense; otherwise, I don't
really think we need that.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 04/11/16 14:24, Andres Freund wrote:
> Hi,
> 
> (btw, I vote against tarballing patches)
>

Well, I vote against the CF app not handling emails with multiple
attachments correctly :)


> +   <tgroup cols="4">
> +    <thead>
> +     <row>
> +      <entry>Name</entry>
> +      <entry>Type</entry>
> +      <entry>References</entry>
> +      <entry>Description</entry>
> +     </row>
> +    </thead>
> +
> +    <tbody>
> +     <row>
> +      <entry><structfield>oid</structfield></entry>
> +      <entry><type>oid</type></entry>
> +      <entry></entry>
> +      <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
> +     </row>
> +
> 
> +     <row>
> +      <entry><structfield>subpublications</structfield></entry>
> +      <entry><type>name[]</type></entry>
> +      <entry></entry>
> +      <entry>Array of subscribed publication names. These reference the
> +       publications on the publisher server.
> +      </entry>
> 
> Why is this names and not oids? So you can see it across databases?
> 

Because they only exist on the remote server.

> 
> 
>  include $(top_srcdir)/src/backend/common.mk
> diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
> index 68d7e46..523008d 100644
> --- a/src/backend/commands/event_trigger.c
> +++ b/src/backend/commands/event_trigger.c
> @@ -112,6 +112,7 @@ static event_trigger_support_data event_trigger_support[] = {
>      {"SCHEMA", true},
>      {"SEQUENCE", true},
>      {"SERVER", true},
> +    {"SUBSCRIPTION", true},
> 
> Hm, is that ok? Subscriptions are shared, so ...?
> 

Good point; I forgot event triggers don't handle shared objects.

> 
> +        /*
> +         * If requested, create the replication slot on remote side for our
> +         * newly created subscription.
> +         *
> +         * Note, we can't cleanup slot in case of failure as reason for
> +         * failure might be already existing slot of the same name and we
> +         * don't want to drop somebody else's slot by mistake.
> +         */
> +        if (create_slot)
> +        {
> +            XLogRecPtr            lsn;
> +
> +            /*
> +             * Create the replication slot on remote side for our newly created
> +             * subscription.
> +             *
> +             * Note, we can't cleanup slot in case of failure as reason for
> +             * failure might be already existing slot of the same name and we
> +             * don't want to drop somebody else's slot by mistake.
> +             */
> 
> We should really be able to recognize that based on the error code...
> 

We could, provided that the slot is active, but that would leave a nasty
race condition: if you do the drop while the other subscription of the
same name is not running (restarting, temporarily disabled, etc.), we'll
remove its slot. Maybe we should not care about that, and say that the
slot represents the subscription, and if you name the slots of two
different subscriptions the same then that's your problem.

> +/*
> + * Drop subscription by OID
> + */
> +void
> +DropSubscriptionById(Oid subid)
> +{
> 
> +    /*
> +     * We must ignore errors here as that would make it impossible to drop
> +     * subscription when publisher is down.
> +     */
> 
> I'm not convinced.  Leaving a slot around without a "record" of it on
> the creating side isn't nice either. Maybe a FORCE flag or something?
> 

I would like to have this as an option, yes. I'm not sure FORCE is the
best name, but I have trouble coming up with a good one. We have
CREATE_SLOT and NOCREATE_SLOT for CREATE SUBSCRIPTION, so maybe we could
have DROP_SLOT (default) and NODROP_SLOT for DROP SUBSCRIPTION.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 08/11/16 19:51, Peter Eisentraut wrote:
> Review of v7 0003-Add-PUBLICATION-catalogs-and-DDL.patch:
> 
> This appears to address previous reviews and is looking pretty solid.  I
> have some comments that are easily addressed:
> 
> [still from previous review] The code for OCLASS_PUBLICATION_REL in
> getObjectIdentityParts() does not fill in objname and objargs, as it is
> supposed to.
> 
> catalog.sgml: pg_publication_rel column names must be updated after renaming
> 
> alter_publication.sgml and elsewhere: typos PuBLISH_INSERT etc.
> 
> create_publication.sgml: FOR TABLE ALL IN SCHEMA does not exist anymore
> 
> create_publication.sgml: talks about not-yet-existing SUBSCRIPTION role
> 
> DropPublicationById maybe name RemovePublicationById for consistency
> 
> system_views.sql: C.relkind = 'r' unnecessary
> 
> CheckCmdReplicaIdentity: error message says "cannot update", should
> distinguish between update and delete
> 
> relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=
> 
> RelationData.rd_pubactions could be a bitmap, simplifying some memcpy
> and context management.  But RelationData appears to favor rich data
> structures, so maybe that is fine.
> 

Thanks for these. Some of it is a result of the various rebases I did
(the sync patch makes rebasing a bit complicated, as it touches
everything), and it's easy for me to overlook things at this point.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
> On 04/11/16 13:15, Andres Freund wrote:
> > 
> >  /* Prototypes for private functions */
> > -static bool libpq_select(int timeout_ms);
> > +static bool libpq_select(PGconn *streamConn,
> > +                         int timeout_ms);
> > 
> > If we're starting to use this more widely, we really should just use a latch
> > instead of the plain select(). In fact, I think it's more or less a bug
> > that we don't (select is only interruptible by signals on a subset of
> > our platforms).  That shouldn't bother this patch, but...
> > 
> > 
> 
> Agree that this is a problem, especially for the subscription creation
> later. We should be doing WaitLatchOrSocket, but the question is which
> latch. We can't use the MyProc one, as that's not the latch that WalReceiver
> uses, so I guess we would have to pass a latch as a parameter to every
> caller of this, which is not very pretty from an API perspective, but I
> don't have a better idea here.

I think we should simply make walsender use the standard proc
latch. Afaics that should be fairly trivial?

Greetings,

Andres Freund



Re: Logical Replication WIP

From
Andres Freund
Date:
Hi,

On 2016-11-11 12:04:27 +0100, Petr Jelinek wrote:
> On 04/11/16 14:00, Andres Freund wrote:
> > Hi,
> > 
> > + <sect1 id="catalog-pg-publication-rel">
> > +  <title><structname>pg_publication_rel</structname></title>
> > +
> > +  <indexterm zone="catalog-pg-publication-rel">
> > +   <primary>pg_publication_rel</primary>
> > +  </indexterm>
> > +
> > +  <para>
> > +   The <structname>pg_publication_rel</structname> catalog contains
> > +   mapping between tables and publications in the database. This is many to
> > +   many mapping.
> > +  </para>
> > 
> > I wonder if we shouldn't abstract this a bit away from relations to
> > allow other objects to be exported to. Could structure it a bit more
> > like pg_depend.
> > 
> 
Honestly, let's not overdesign this. A change like that can be made in the
future if we need it, and I am quite unconvinced we do, given that
anything we might want to replicate will be a relation. I understand that
it might be useful to know what's on the downstream in terms of objects at
some point for some future functionality, but I don't have an idea of what
that functionality will look like, so it's premature to guess what
catalog structure it will need.

I slightly prefer to make it more generic right now, but I don't think
that's a blocker.

> > I still would very much like to move this outside of gram.y and just use
> > IDENTs here. Like how COPY options are handled.
> > 
> 
> Well, I looked into it and it means some loss of info in the error
> messages - mainly the error position in the query because utility
> statements don't get ParseState (unlike COPY). It might be worth the
> flexibility though.

Pretty sure that that's the case.


> >  I also wonder if we want an easier to
> > extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> > pub ... without changing the schema).
> > 
> 
> So like, text array that's then parsed everywhere (I am not doing
> bitmask/int definitely)?

Yes, that sounds good to me. Then convert it to individual booleans or a
bitmask when loading the publications into the in-memory form (which you
already do).

Greetings,

Andres Freund



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 12/11/16 20:19, Andres Freund wrote:
> On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
>> On 04/11/16 13:15, Andres Freund wrote:
>>>
>>>  /* Prototypes for private functions */
>>> -static bool libpq_select(int timeout_ms);
>>> +static bool libpq_select(PGconn *streamConn,
>>> +                         int timeout_ms);
>>>
>>> If we're starting to use this more widely, we really should just a latch
>>> instead of the plain select(). In fact, I think it's more or less a bug
>>> that we don't (select is only interruptible by signals on a subset of
>>> our platforms).  That shouldn't bother this patch, but...
>>>
>>>
>>
>> Agree that this is problem, especially for the subscription creation
>> later. We should be doing WaitLatchOrSocket, but the question is which
>> latch. We can't use MyProc one as that's not the latch that WalReceiver
>> uses so I guess we would have to send latch as parameter to any caller
>> of this which is not very pretty from api perspective but I don't have
>> better idea here.
> 
> I think we should simply make walsender use the standard proc
> latch. Afaics that should be fairly trivial?

Walreceiver you mean. Yeah, that should be simple; looking at the code I
am not quite sure why it uses a separate latch in the first place.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 11/12/16 2:18 PM, Andres Freund wrote:
>>>  I also wonder if we want an easier to
>>> > > extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
>>> > > pub ... without changing the schema).
>>> > > 
>> > 
>> > So like, text array that's then parsed everywhere (I am not doing
>> > bitmask/int definitely)?
> Yes, that sounds good to me. Then convert it to individual booleans or a
> bitmask when loading the publications into the in-memory form (which you
> already do).

I'm not sure why that would be better.  Adding catalog columns in future
versions is not a problem.  We're not planning on adding hundreds of
publication attributes.  Denormalizing catalog columns creates all kinds
of inconveniences, in the backend code, in frontend code, for users.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-11-13 00:40:12 -0500, Peter Eisentraut wrote:
> On 11/12/16 2:18 PM, Andres Freund wrote:
> >>>  I also wonder if we want an easier to
> >>> > > extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> >>> > > pub ... without changing the schema).
> >>> > > 
> >> > 
> >> > So like, text array that's then parsed everywhere (I am not doing
> >> > bitmask/int definitely)?
> > Yes, that sounds good to me. Then convert it to individual booleans or a
> > bitmask when loading the publications into the in-memory form (which you
> > already do).
> 
> I'm not sure why that would be better.  Adding catalog columns in future
> versions is not a problem.

It can be extended from what core provides, for extended versions of
replication solutions, for one. I presume publications/subscriptions
aren't only going to be used by built-in code.

Andres



Re: Logical Replication WIP

From
Steve Singer
Date:
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> On 31/10/16 00:52, Steve Singer wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list but this one is not known. I'll try to convert this
> to test case (seems like useful one) and fix it, thanks for the
> report. In meantime I realized I broke the last patch in the series
> during rebase so attached is the fixed version. It also contains the
> type info in the protocol.
>

Attached are some proposed documentation updates (to be applied on top of
your 20161031 patch set)

Also

<sect1 id="logical-replication-publication">
   <title>Publication</title>


+  <para>
+    The tables are matched using fully qualified table name. Renaming of
+    tables or schemas is not supported.
+  </para>

Is renaming of tables any less supported than other DDL operations?
For example

alter table nokey2 rename to nokey3
select * FROM pg_publication_tables ;
  pubname | schemaname | tablename
---------+------------+-----------
  tpub    | public     | nokey3
(1 row)


If I then kill the postmaster on my subscriber and restart it, I get

2016-11-13 16:17:11.341 EST [29488] FATAL:  the logical replication
target public.nokey3 not found
2016-11-13 16:17:11.342 EST [29272] LOG:  worker process: logical
replication worker 41076 (PID 29488) exited with exit code 1
2016-11-13 16:17:16.350 EST [29496] LOG:  logical replication apply for
subscription nokeysub started
2016-11-13 16:17:16.358 EST [29498] LOG:  logical replication sync for
subscription nokeysub, table nokey2 started
2016-11-13 16:17:16.515 EST [29498] ERROR:  table public.nokey2 not
found on publisher
2016-11-13 16:17:16.517 EST [29272] LOG:  worker process: logical
replication worker 41076 sync 24688 (PID 29498) exited with exit code 1

but if I then rename the table on the subscriber everything seems to work.

(I suspect the need to kill+restart is a bug; I've seen other instances
where a hard restart of the subscriber following changes is required)


I am also having issues adding a table to a publication; it doesn't seem
to work.

P: create publication tpub for table a;
S: create subscription mysub connection 'host=localhost dbname=test
port=5440' publication tpub;
P: insert into a(b) values ('1');
P: alter publication tpub add table b;
P: insert into b(b) values ('1');
P: insert into a(b) values ('2');


select * FROM pg_publication_tables ;
  pubname | schemaname | tablename
---------+------------+-----------
  tpub    | public     | a
  tpub    | public     | b


but


S:  select * FROM b;
  a | b
---+---
(0 rows)
S: select * FROM a;
  a | b
---+---
  5 | 1
  6 | 2
(2 rows)



Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 13/11/16 10:21, Andres Freund wrote:
> On 2016-11-13 00:40:12 -0500, Peter Eisentraut wrote:
>> On 11/12/16 2:18 PM, Andres Freund wrote:
>>>>>  I also wonder if we want an easier to
>>>>>>> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
>>>>>>> pub ... without changing the schema).
>>>>>>>
>>>>>
>>>>> So like, text array that's then parsed everywhere (I am not doing
>>>>> bitmask/int definitely)?
>>> Yes, that sounds good to me. Then convert it to individual booleans or a
>>> bitmask when loading the publications into the in-memory form (which you
>>> already do).
>>
>> I'm not sure why that would be better.  Adding catalog columns in future
>> versions is not a problem.
> 
> It can be extended from what core provides, for extended versions of
> replication solutions, for one. I presume publications/subscriptions
> aren't only going to be used by built-in code.
> 

I understand the desire here (especially as an author of such out-of-core
tools), but I am not sure if this is a good place to start having
pluggable catalogs given that we have no generic design for those.
Currently, plugins writing arbitrary data to catalogs will cause things
to break when those plugins get uninstalled (and we don't have a good
mechanism for cleaning that up when that happens). And that won't change
if we convert this into an array. Besides, shouldn't the code then anyway
check that we only have expected data in that array, otherwise we might
miss corruption?

So if the main reason for turning this into an array is extendability for
other providers then I am -1 on the idea. IMHO this belongs in a completely
different patch that adds user catalogs with a proper syscache-like
interface and everything, but has nothing to do with publications.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Robert Haas
Date:
On Sun, Nov 13, 2016 at 4:21 AM, Andres Freund <andres@anarazel.de> wrote:
> It can be extended from what core provides, for extended versions of
> replication solutions, for one. I presume publications/subscriptions
> aren't only going to be used by built-in code.

Hmm, I would not have presumed that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

attached is v8. No tarballing this time ;)

About the patches:

0001:
This is the reworked approach to temporary slots that I sent earlier.

0002:
I ripped out the libpq_select completely and did what Andres suggested,
i.e., WaitLatchOrSocket; that needed changes for WalReceiver to use
procLatch, but that was trivial. Otherwise it's the same.
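Schematically, the change looks like this (C-style pseudocode modeled on the backend's latch API, not the actual patch code):

```
/* previously: select() on PQsocket(streamConn) with a timeout */
rc = WaitLatchOrSocket(MyLatch,   /* walreceiver now uses the standard proc latch */
                       WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT,
                       PQsocket(streamConn),
                       timeout_ms);
if (rc & WL_LATCH_SET)
    ResetLatch(MyLatch);          /* woken, e.g. by a signal handler; re-check state */
if (rc & WL_SOCKET_READABLE)
    /* data is ready on the connection, go consume it */;
```

Unlike plain select(), this wakes up promptly when the latch is set, which is the interruptibility Andres was after.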

0003:
Changes:
 - Moved the parsing of options into C
 - Removed the dead references to "FOR TABLE ALL IN SCHEMA"
 - Rephrased some things and fixed several typos
 - Added needed check into ALTER TABLE ... SET UNLOGGED
 - Fixed the UPDATE/DELETE check in CheckCmdReplicaIdentity
 - Added owner
 - Fixed permission checks

I didn't do the text array instead of bools, nor the
objclassid/objsubid, as the reasoning for the former is wrong IMHO and the
latter is quite premature; I am still not convinced it will ever be
needed.

I also didn't do a couple of things reported by PeterE:
> relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=

This one does not seem to be true; there is no ||=, and |= works fine for
booleans.

And
> The code for OCLASS_PUBLICATION_REL in
> getObjectIdentityParts() does not fill in objname and objargs, as it is
> supposed to.

From what I see it already does that.

0004:
Changes:
 - Added a separate DropSubscriptionStmt statement for DROP. This was
prompted by Andres' comment about event triggers. The event triggers
actually work fine, as all the SQL is only supposed to touch
subscriptions in the current database even though it's a shared catalog
(it's only shared because we need the catalog pin, but that's an
implementation detail), but DROP would break if the name matched a
subscription in another database if it were handled by DropStmt.
 - Added SLOT_DROP/NOSLOT_DROP options to DROP SUBSCRIPTION, the new
DropSubscriptionStmt helps here as well
 - Added owner
 - Moved the option parsing into C

0005/0006 - Mainly just included the doc patch from Steve Singer and did
some additional doc fixes.

The 0007 is something that's more of a question for discussion, namely
whether we want it. It adds a new GUC that sets synchronous commit for
apply workers and defaults to off. This gives a quite noticeable
performance boost while still working correctly even if the provider uses
sync replication. This is based on the experience (and default behaviour)
of BDR and pglogical, but I am not quite sure if core postgres should have
that as well (I think it definitely should have the option, the question
is more about the default setting).

And that's it for now. After some discussion with PeterE I decided to
skip the initial sync patch, as it has quite a high impact on development
of the rest of the patch (because it touches everything, I spend all my
time rebasing it instead of actually fixing things) and can be done as a
follow-up patch. I also believe that it will be polished much faster
once I can fully concentrate on it when this part is done.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 13/11/16 23:02, Steve Singer wrote:
> On 10/31/2016 06:38 AM, Petr Jelinek wrote:
>> On 31/10/16 00:52, Steve Singer wrote:
>> There are some fundamental issues with initial sync that need to be
>> discussed on list but this one is not known. I'll try to convert this
>> to test case (seems like useful one) and fix it, thanks for the
>> report. In meantime I realized I broke the last patch in the series
>> during rebase so attached is the fixed version. It also contains the
>> type info in the protocol.
>>
> 
> Attached are some proposed documentation updates (to be applied ontop of
> your 20161031 patch set)
> 

Merged into v8, thanks!

There is one exception though:
> *** 195,214 ****
>     </para>
>     <para>
>       A conflict will produce an error and will stop the replication; it
> !     must be resolved manually by the user.
>     </para>
>     <para>
> !     The resolution can be done either by changing data on the subscriber
> !     so that it does not conflict with incoming change or by skipping the
> !     transaction that conflicts with the existing data. The transaction
> !     can be skipped by calling the
> !     <link linkend="pg-replication-origin-advance">
> !     <function>pg_replication_origin_advance()</function></link> function
> !     with a <literal>node_name</> corresponding to the subscription name. The
> !     current position of origins can be seen in the
> !     <link linkend="view-pg-replication-origin-status">
> !     <structname>pg_replication_origin_status</structname></link> system view.
> !   </para>
>   </sect1>
>   <sect1 id="logical-replication-architecture">

I don't see why this needs to be removed. Maybe it could be improved, but
certainly not removed?

> Also
> 
> <sect1 id="logical-replication-publication">
>   <title>Publication</title>
> 
> 
> +  <para>
> +    The tables are matched using fully qualified table name. Renaming of
> +    tables or schemas is not supported.
> +  </para>
> 
> Is renaming of tables any less supported than other DDL operations
> For example
> 

I changed that text as it means something completely different.

> alter table nokey2 rename to nokey3
> select * FROM pg_publication_tables ;
>  pubname | schemaname | tablename
> ---------+------------+-----------
>  tpub    | public     | nokey3
> (1 row)
> 
> 
> If I then kill the postmaster on my subscriber and restart it, I get
> 
> 2016-11-13 16:17:11.341 EST [29488] FATAL:  the logical replication
> target public.nokey3 not found
> 2016-11-13 16:17:11.342 EST [29272] LOG:  worker process: logical
> replication worker 41076 (PID 29488) exited with exit code 1
> 2016-11-13 16:17:16.350 EST [29496] LOG:  logical replication apply for
> subscription nokeysub started
> 2016-11-13 16:17:16.358 EST [29498] LOG:  logical replication sync for
> subscription nokeysub, table nokey2 started
> 2016-11-13 16:17:16.515 EST [29498] ERROR:  table public.nokey2 not
> found on publisher
> 2016-11-13 16:17:16.517 EST [29272] LOG:  worker process: logical
> replication worker 41076 sync 24688 (PID 29498) exited with exit code 1
> 
> but if I then rename the table on the subscriber everything seems to work.
> 
> (I suspect the need to kill+restart is a bug, I've seen other instances
> where a hard restart of the subscriber following changes to is required)
> 

This is another initial sync patch bug.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Steve Singer
Date:
On Sun, 20 Nov 2016, Petr Jelinek wrote:

> On 13/11/16 23:02, Steve Singer wrote:

> There is one exception though:
>> *** 195,214 ****
>>     </para>
>>     <para>
>>       A conflict will produce an error and will stop the replication; it
>> !     must be resolved manually by the user.
>>     </para>
>>     <para>
>> !     The resolution can be done either by changing data on the subscriber
>> !     so that it does not conflict with incoming change or by skipping the
>> !     transaction that conflicts with the existing data. The transaction
>> !     can be skipped by calling the
>> !     <link linkend="pg-replication-origin-advance">
>> !     <function>pg_replication_origin_advance()</function></link> function
>> !     with a <literal>node_name</> corresponding to the subscription name. The
>> !     current position of origins can be seen in the
>> !     <link linkend="view-pg-replication-origin-status">
>> !     <structname>pg_replication_origin_status</structname></link> system view.
>> !   </para>
>>   </sect1>
>>   <sect1 id="logical-replication-architecture">
>
> I don't see why this needs to be removed? Maybe it could be improved but
> certainly not removed?
>

Sorry, I was confused. I noticed that the function was missing in the patch 
and thought it was documentation for a function that you had removed from 
recent versions of the patch, rather than a reference to a function that is 
already committed.




Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-11-20 19:06, Petr Jelinek wrote:
> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz

This patch contains 2 tabs which break the html build when using 'make 
oldhtml':

$ ( cd 
/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml; 
time make oldhtml )
make check-tabs
make[1]: Entering directory 
`/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml'
./ref/create_subscription.sgml:                WITH (DISABLED);
Tabs appear in SGML/XML files
make[1]: *** [check-tabs] Error 1
make[1]: Leaving directory 
`/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml'
make: *** [oldhtml-stamp] Error 2

Very minor change, but it fixes that build.

Thanks,

Erik Rijkers



Re: Logical Replication WIP

From
Erik Rijkers
Date:
and the attachment...

On 2016-11-22 14:55, Erik Rijkers wrote:
> On 2016-11-20 19:06, Petr Jelinek wrote:
>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz
>
> This patch contains 2 tabs which break the html build when using 'make
> oldhtml':
>
> $ ( cd
> /var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml;
> time make oldhtml )
> make check-tabs
> make[1]: Entering directory
> `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml'
> ./ref/create_subscription.sgml:                WITH (DISABLED);
> Tabs appear in SGML/XML files
> make[1]: *** [check-tabs] Error 1
> make[1]: Leaving directory
> `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml'
> make: *** [oldhtml-stamp] Error 2
>
> Very minor change, but it fixes that build.
>
> Thanks,
>
> Erik Rijkers
>
>

Attachment

Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-11-20 19:02, Petr Jelinek wrote:

> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB)
> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB)
> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB)
> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB)
> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB)
> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB)
> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB)

Apply, make, make check, install OK.


A crash of the subscriber can be forced by running  vacuum <published 
table>  on the publisher.


- publisher
create table if not exists testt( id integer primary key, c text );
create publication pub1 for table testt;

- subscriber
create table if not exists testt( id integer primary key, c text );
create subscription sub1 connection 'dbname=testdb port=6444' 
publication pub1 with (disabled);
alter  subscription sub1 enable;

- publisher
vacuum testt;

now a data change on the published table (perhaps also a select on the 
subscriber-side data) leads to:


- subscriber log:
TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: 
1001)
2016-11-22 18:13:13.983 CET 10177 LOG:  worker process: ??)? (PID 10334) 
was terminated by signal 6: Aborted
2016-11-22 18:13:13.983 CET 10177 LOG:  terminating any other active 
server processes
2016-11-22 18:13:13.983 CET 10338 WARNING:  terminating connection 
because of crash of another server process
2016-11-22 18:13:13.983 CET 10338 DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted 
shared memory.
[...]




Erik Rijkers



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 22/11/16 18:42, Erik Rijkers wrote:
> On 2016-11-20 19:02, Petr Jelinek wrote:
> 
>> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB)
>> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB)
>> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB)
>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB)
>> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB)
>> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB)
>> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB)
> 
> Apply, make, make check, install OK.
> 
> 
> A crash of the subscriber can be forced by running  vacuum <published
> table>  on the publisher.
> 
> 
> - publisher
> create table if not exists testt( id integer primary key, c text );
> create publication pub1 for table testt;
> 
> - subscriber
> create table if not exists testt( id integer primary key, c text );
> create subscription sub1 connection 'dbname=testdb port=6444'
> publication pub1 with (disabled);
> alter  subscription sub1 enable;
> 
> - publisher
> vacuum testt;
> 
> now data change on the published table, (perhaps also a select on the
> subscriber-side data) leads to:
> 
> 
> - subscriber log:
> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line:
> 1001)
> 2016-11-22 18:13:13.983 CET 10177 LOG:  worker process: ??)? (PID 10334)
> was terminated by signal 6: Aborted
> 2016-11-22 18:13:13.983 CET 10177 LOG:  terminating any other active
> server processes
> 2016-11-22 18:13:13.983 CET 10338 WARNING:  terminating connection
> because of crash of another server process
> 2016-11-22 18:13:13.983 CET 10338 DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> [...]
> 

Hi, thanks for report.

I very much doubt this is a problem of vacuum, as it does not send anything
to the subscriber. Is there anything else you did on those servers?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-11-27 19:57, Petr Jelinek wrote:
> On 22/11/16 18:42, Erik Rijkers wrote:
>> On 2016-11-20 19:02, Petr Jelinek wrote:
>> 
>>> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB)
>>> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB)
>>> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB)
>>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB)
>>> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB)
>>> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB)
>>> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB)
>> 
>> Apply, make, make check, install OK.
>> 
>> 
>> A crash of the subscriber can be forced by running  vacuum <published
>> table>  on the publisher.
>> 
>> 
>> - publisher
>> create table if not exists testt( id integer primary key, c text );
>> create publication pub1 for table testt;
>> 
>> - subscriber
>> create table if not exists testt( id integer primary key, c text );
>> create subscription sub1 connection 'dbname=testdb port=6444'
>> publication pub1 with (disabled);
>> alter  subscription sub1 enable;
>> 
>> - publisher
>> vacuum testt;
>> 
>> now data change on the published table, (perhaps also a select on the
>> subscriber-side data) leads to:
>> 
>> 
>> - subscriber log:
>> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", 
>> Line:
>> 1001)

> 
> I very much doubt this is problem of vacuum as it does not send 
> anything
> to subscriber. Is there anything else you did on those servers?
> 

It is not the vacuum that triggers the crash but the data change (insert 
or delete, on the publisher)  /after/ that vacuum.

Just now, I compiled 2 instances from master and such a crash (after 
vacuum + delete) seems reliable here.

(If you can't duplicate such a crash let me know; then I'll dig out more 
precise set-up detail)

(by the way, the logical replication between the two instances works 
well otherwise)








Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 27/11/16 23:42, Erik Rijkers wrote:
> On 2016-11-27 19:57, Petr Jelinek wrote:
>> On 22/11/16 18:42, Erik Rijkers wrote:
>>> On 2016-11-20 19:02, Petr Jelinek wrote:
>>>
>>>> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB)
>>>> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB)
>>>> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB)
>>>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB)
>>>> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB)
>>>> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB)
>>>> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB)
>>>
>>> Apply, make, make check, install OK.
>>>
>>>
>>> A crash of the subscriber can be forced by running  vacuum <published
>>> table>  on the publisher.
>>>
>>>
>>> - publisher
>>> create table if not exists testt( id integer primary key, c text );
>>> create publication pub1 for table testt;
>>>
>>> - subscriber
>>> create table if not exists testt( id integer primary key, c text );
>>> create subscription sub1 connection 'dbname=testdb port=6444'
>>> publication pub1 with (disabled);
>>> alter  subscription sub1 enable;
>>>
>>> - publisher
>>> vacuum testt;
>>>
>>> now data change on the published table, (perhaps also a select on the
>>> subscriber-side data) leads to:
>>>
>>>
>>> - subscriber log:
>>> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line:
>>> 1001)
> 
>>
>> I very much doubt this is problem of vacuum as it does not send anything
>> to subscriber. Is there anything else you did on those servers?
>>
> 
> It is not the vacuum that triggers the crash but the data change (insert
> or delete, on the publisher)  /after/ that vacuum.
> 
> Just now, I compiled 2 instances from master and such a crash (after
> vacuum + delete) seems reliable here.
> 
> (If you can't duplicate such a crash let me know; then I'll dig out more
> precise set-up detail)
> 

I found the reason. It's not just vacuum (which was what confused me);
it's when the publishing side sends the info about the relation again
(which happens when there was a cache invalidation on the relation and
new data were then written), and I freed one pointer that I never set.
I'll send a fixed patch tomorrow.
Thanks!

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 27/11/16 23:54, Petr Jelinek wrote:
> On 27/11/16 23:42, Erik Rijkers wrote:
>> On 2016-11-27 19:57, Petr Jelinek wrote:
>>> On 22/11/16 18:42, Erik Rijkers wrote:
>>>> A crash of the subscriber can be forced by running  vacuum <published
>>>> table>  on the publisher.
>>>>
>>>>
>>>> - publisher
>>>> create table if not exists testt( id integer primary key, c text );
>>>> create publication pub1 for table testt;
>>>>
>>>> - subscriber
>>>> create table if not exists testt( id integer primary key, c text );
>>>> create subscription sub1 connection 'dbname=testdb port=6444'
>>>> publication pub1 with (disabled);
>>>> alter  subscription sub1 enable;
>>>>
>>>> - publisher
>>>> vacuum testt;
>>>>
>>>> now data change on the published table, (perhaps also a select on the
>>>> subscriber-side data) leads to:
>>>>
>>>>
>>>> - subscriber log:
>>>> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line:
>>>> 1001)
>>
>>>
>>> I very much doubt this is problem of vacuum as it does not send anything
>>> to subscriber. Is there anything else you did on those servers?
>>>
>>
>> It is not the vacuum that triggers the crash but the data change (insert
>> or delete, on the publisher)  /after/ that vacuum.
>>
>> Just now, I compiled 2 instances from master and such a crash (after
>> vacuum + delete) seems reliable here.
>>
>> (If you can't duplicate such a crash let me know; then I'll dig out more
>> precise set-up detail)
>>
>
> I found the reason. It's not just vacuum (which was what confused me)
> it's when the publishing side sends the info about relation again (which
> happens when there was cache invalidation on the relation and then new
> data were written) and I did free one pointer that I never set. I'll
> send fixed patch tomorrow.
> Thanks!
>

Okay, so here it is. I also included your doc fix, added a test for
REPLICA IDENTITY FULL (which also tests this issue as a side effect) and
fixed one relcache leak.

I also rebased it against current master as there was some conflict in
bgworker.c.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Peter Eisentraut
Date:
I have taken the libpqwalreceiver refactoring patch and split it into
two: one for the latch change, one for the API change.  I have done some
mild editing.

These two patches are now ready to commit in my mind.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 30/11/16 22:37, Peter Eisentraut wrote:
> I have taken the libpqwalreceiver refactoring patch and split it into
> two: one for the latch change, one for the API change.  I have done some
> mild editing.
> 
> These two patches are now ready to commit in my mind.
> 

Hi, looks good to me. Do you plan to commit this soon, or would you
rather have me resubmit the patches rebased on top of this (and including
this) first?

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 11/30/16 8:06 PM, Petr Jelinek wrote:
> On 30/11/16 22:37, Peter Eisentraut wrote:
>> I have taken the libpqwalreceiver refactoring patch and split it into
>> two: one for the latch change, one for the API change.  I have done some
>> mild editing.
>>
>> These two patches are now ready to commit in my mind.

> Hi, looks good to me, do you plan to commit this soon or would you
> rather me to resubmit the patches rebased on top of this (and including
> this) first?

committed those two

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Thomas Munro
Date:
On Fri, Dec 2, 2016 at 2:32 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 11/30/16 8:06 PM, Petr Jelinek wrote:
>> On 30/11/16 22:37, Peter Eisentraut wrote:
>>> I have taken the libpqwalreceiver refactoring patch and split it into
>>> two: one for the latch change, one for the API change.  I have done some
>>> mild editing.
>>>
>>> These two patches are now ready to commit in my mind.
>
>> Hi, looks good to me, do you plan to commit this soon or would you
>> rather me to resubmit the patches rebased on top of this (and including
>> this) first?
>
> committed those two

Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments
behind that referred to the select() that it removed.  Maybe rewrite
like in the attached?

I wonder if it would be worth creating and reusing a WaitEventSet here.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 02/12/16 02:55, Thomas Munro wrote:
> On Fri, Dec 2, 2016 at 2:32 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 11/30/16 8:06 PM, Petr Jelinek wrote:
>>> On 30/11/16 22:37, Peter Eisentraut wrote:
>>>> I have taken the libpqwalreceiver refactoring patch and split it into
>>>> two: one for the latch change, one for the API change.  I have done some
>>>> mild editing.
>>>>
>>>> These two patches are now ready to commit in my mind.
>>
>>> Hi, looks good to me, do you plan to commit this soon or would you
>>> rather me to resubmit the patches rebased on top of this (and including
>>> this) first?
>>
>> committed those two
> 
> Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments
> behind that referred to the select() that it removed.  Maybe rewrite
> like in the attached?

Agreed.

> 
> I wonder if it would be worth creating and reusing a WaitEventSet here.
> 

I don't think it's worth the extra code given that this is a
rarely-called interface.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Alvaro Herrera
Date:
Petr Jelinek wrote:
> On 02/12/16 02:55, Thomas Munro wrote:

> > Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments
> > behind that referred to the select() that it removed.  Maybe rewrite
> > like in the attached?
> 
> Agreed.

Thanks, pushed.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 11/20/16 1:02 PM, Petr Jelinek wrote:
> 0001:
> This is the reworked approach to temporary slots that I sent earlier.

Andres, you had expressed an interest in this.  Will you be able to
review it soon?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

this is a rebased version after one of the patches was committed and
there was some renaming.

I also did some small fixes around pg_dump and changed the syntax
slightly to what PeterE suggested at the beginning of the thread, since
I like it more as it reads more like English (PUBLISH_INSERT => PUBLISH
INSERT, SLOT_NAME => SLOT NAME, etc.).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 02/12/16 19:35, Petr Jelinek wrote:
> Hi,
>
> this is rebased version after one of the patches was committed and there
> were some renaming.
>
> I also did some small fixes around pg_dump and changes syntax slightly
> to what PeterE suggested in the beginning of the thread since I like it
> more as it looks more like English (PUBLISH_INSERT => PUBLISH INSERT,
> SLOT_NAME => SLOT NAME, etc).
>

Ah sorry, wrong attachment.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Logical Replication WIP

From
Peter Eisentraut
Date:
I massaged the temporary replication slot patch a bit.  I changed the
column name in pg_replication_slots from "persistent" to
"temporary" and flipped the logical sense, so that it is consistent with
the creation commands.  I also adjusted some comments and removed some
changes in ReplicationSlotCreate() that didn't seem to do anything
useful (they might have been from a previous patch).

The attached patch looks good to me.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Logical Replication WIP

From
Haribabu Kommi
Date:


On Sun, Dec 4, 2016 at 12:06 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> I massaged the temporary replication slot patch a bit.  I changed the
> column name in pg_replication_slots from "persistent" to
> "temporary" and flipped the logical sense, so that it is consistent with
> the creation commands.  I also adjusted some comments and removed some
> changes in ReplicationSlotCreate() that didn't seem to do anything
> useful (might have been from a previous patch).
>
> The attached patch looks good to me.

Moved to next CF with "needs review" status.


Regards,
Hari Babu
Fujitsu Australia

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 04/12/16 02:06, Peter Eisentraut wrote:
> I massaged the temporary replication slot patch a bit.  I changed the
> column name in pg_replication_slots from "persistent" to
> "temporary" and flipped the logical sense, so that it is consistent with
> the creation commands.  I also adjusted some comments and removed some
> changes in ReplicationSlotCreate() that didn't seem to do anything
> useful (might have been from a previous patch).
> 
> The attached patch looks good to me.
> 

I think that the removal of the changes to ReplicationSlotAcquire() that
you did will make it impossible to reacquire a temporary slot once you
have switched to a different one in the session, as the
if (active_pid != 0) check will always be true for a temp slot.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical Replication WIP

From
Andres Freund
Date:
On 2016-12-02 12:37:49 -0500, Peter Eisentraut wrote:
> On 11/20/16 1:02 PM, Petr Jelinek wrote:
> > 0001:
> > This is the reworked approach to temporary slots that I sent earlier.
> 
> Andres, you had expressed an interest in this.  Will you be able to
> review it soon?

Yep. Needed to get that WIP stuff about expression evaluation and JITing
out of the door first though.

Regards,

Andres



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 12/5/16 6:24 PM, Petr Jelinek wrote:
> I think that the removal of changes to ReplicationSlotAcquire() that you
> did will result in making it impossible to reacquire temporary slot once
> you switched to different one in the session as the if (active_pid != 0)
> will always be true for temp slot.

I see.  I suppose it's difficult to get a test case for this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication WIP

From
Peter Eisentraut
Date:
On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>> I think that the removal of changes to ReplicationSlotAcquire() that you
>> did will result in making it impossible to reacquire temporary slot once
>> you switched to different one in the session as the if (active_pid != 0)
>> will always be true for temp slot.
>
> I see.  I suppose it's difficult to get a test case for this.

I created a test case, saw the error of my ways, and added your code
back in.  Patch attached.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Logical Replication WIP

From
Petr Jelinek
Date:
On 08/12/16 20:16, Peter Eisentraut wrote:
> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>> did will result in making it impossible to reacquire temporary slot once
>>> you switched to different one in the session as the if (active_pid != 0)
>>> will always be true for temp slot.
>>
>> I see.  I suppose it's difficult to get a test case for this.
> 
> I created a test case, saw the error of my ways, and added your code
> back in.  Patch attached.
> 

Hi,

I am happy with this version, thanks for moving it forward.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
Here is a "fixup" patch for
0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch.gz with some minor fixes.

Two issues that should be addressed:

1. I think ALTER PUBLICATION does not need to require CREATE privilege
on the database.  That should be easy to change.

2. By requiring only SELECT privilege to include a table in a
publication, someone could include a table without replica identity into
a publication and thus prevent updates to the table.

A while ago I had been working on a patch to create a new PUBLICATION
privilege for this purpose.  I have attached the in-progress patch here.
 We could either finish that up and include it, or commit your patch
initially with requiring superuser and then refine the permissions later.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-09 17:08, Peter Eisentraut wrote:

Your earlier 0001-Add-support-for-temporary-replication-slots.patch 
could be applied instead of the similarly named original patch by Petr.
(I applied this patch set on top of 
19fcc0058ecc8e5eb756547006bc1b24a93cbb80.)

(And it was, by the way, pretty stable and running well.)

I'd like to get it running again but now I can't find a way to also 
include your newer 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch of 
today.

How should these patches be applied (and at what level)?

20161208: 0001-Add-support-for-temporary-replication-slots__petere.patch  # petere
20161202: 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch  # PJ
20161209: 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch  # petere
20161202: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v11.patch  # PJ
20161202: 0004-Define-logical-replication-protocol-and-output-plugi-v11.patch  # PJ
20161202: 0005-Add-logical-replication-workers-v11.patch  # PJ
20161202: 0006-Add-separate-synchronous-commit-control-for-logical--v11.patch  # PJ

Could (one of) you give me a hint?

Thanks,

Erik Rijkers



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 09/12/16 17:08, Peter Eisentraut wrote:
> Here is a "fixup" patch for
> 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch.gz with some minor fixes.
> 

Thanks, merged.

> Two issues that should be addressed:
> 
> 1. I think ALTER PUBLICATION does not need to require CREATE privilege
> on the database.  That should be easy to change.
> 

Right, I removed the check.

> 2. By requiring only SELECT privilege to include a table in a
> publication, someone could include a table without replica identity into
> a publication and thus prevent updates to the table.
> 
> A while ago I had been working on a patch to create a new PUBLICATION
> privilege for this purpose.  I have attached the in-progress patch here.
>  We could either finish that up and include it, or commit your patch
> initially with requiring superuser and then refine the permissions later.
> 

Hmm, good catch. I changed the SELECT privilege check to an owner check
for now; that seems relatively reasonable.

I agree that we should eventually have a special privilege for that,
though. But then we also need to invent privileges for PUBLICATIONs
themselves for this to work reasonably, as right now you need to be the
owner of the PUBLICATION to add tables, so having a PUBLICATION
privilege on a table does not seem to do an awful lot. Also, if we add a
table privilege for this, it's probably better named PUBLISH rather than
PUBLICATION, but that's not really important.

Attached new version with your updates and rebased on top of the current
HEAD (the partitioning patch produced quite a few conflicts).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

On 09/12/16 22:00, Erik Rijkers wrote:
> On 2016-12-09 17:08, Peter Eisentraut wrote:
> 
> Your earlier 0001-Add-support-for-temporary-replication-slots.patch
> could be applied instead of the similarly named, original patch by Petr.
> (I used 19fcc0058ecc8e5eb756547006bc1b24a93cbb80 to apply this patch-set
> to)
> 
> (And it was, by the way,  pretty stable and running well.)
> 

Great, thanks for testing.

> I'd like to get it running again but now I can't find a way to also
> include your newer 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch of
> today.
> 
> How should these patches be applied (and at what level)?
> 
> 20161208: 0001-Add-support-for-temporary-replication-slots__petere.patch
>  # petere
> 20161202: 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch  # PJ
> 20161209: 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch  # petere
> 20161202: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v11.patch  # PJ
> 20161202:
> 0004-Define-logical-replication-protocol-and-output-plugi-v11.patch  # PJ
> 20161202: 0005-Add-logical-replication-workers-v11.patch  # PJ
> 20161202:
> 0006-Add-separate-synchronous-commit-control-for-logical--v11.patch  # PJ
> 
> Could (one of) you give me a hint?
> 

I just sent in a rebased patch that includes all of it.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 12/8/16 4:10 PM, Petr Jelinek wrote:
> On 08/12/16 20:16, Peter Eisentraut wrote:
>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>> did will result in making it impossible to reacquire temporary slot once
>>>> you switched to different one in the session as the if (active_pid != 0)
>>>> will always be true for temp slot.
>>>
>>> I see.  I suppose it's difficult to get a test case for this.
>>
>> I created a test case, saw the error of my ways, and added your code
>> back in.  Patch attached.
>>
> 
> Hi,
> 
> I am happy with this version, thanks for moving it forward.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Andres Freund
Date:
Hi,

On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
> On 12/8/16 4:10 PM, Petr Jelinek wrote:
> > On 08/12/16 20:16, Peter Eisentraut wrote:
> >> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> >>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
> >>>> I think that the removal of changes to ReplicationSlotAcquire() that you
> >>>> did will result in making it impossible to reacquire temporary slot once
> >>>> you switched to different one in the session as the if (active_pid != 0)
> >>>> will always be true for temp slot.
> >>>
> >>> I see.  I suppose it's difficult to get a test case for this.
> >>
> >> I created a test case, saw the error of my ways, and added your code
> >> back in.  Patch attached.
> >>
> >
> > Hi,
> >
> > I am happy with this version, thanks for moving it forward.
>
> committed

Hm.
/*
+ * Cleanup all temporary slots created in current session.
+ */
+void
+ReplicationSlotCleanup()

I'd rather see a (void) there. The prototype has it, but still.


+
+    /*
+     * No need for locking as we are only interested in slots active in
+     * current process and those are not touched by other processes.

I'm a bit suspicious of this claim.  Without a memory barrier you could
actually look at outdated versions of active_pid.  In practice there are
enough full memory barriers in the slot creation code that it's
guaranteed not to be the same pid from before a wraparound, though.

I think that doing iterations of slots without
ReplicationSlotControlLock makes things more fragile, because suddenly
assumptions that previously held aren't true anymore.   E.g. factually
	/*
	 * The slot is definitely gone.  Lock out concurrent scans of the array
	 * long enough to kill it.  It's OK to clear the active flag here without
	 * grabbing the mutex because nobody else can be scanning the array here,
	 * and nobody can be attached to this slot and thus access it without
	 * scanning the array.
	 */
is now simply not true anymore.  It's probably not harmfully broken, but
at least you've changed the locking protocol without adapting comments.

 /*
- * Permanently drop the currently acquired replication slot which will be
- * released by the point this function returns.
+ * Permanently drop the currently acquired replication slot.
  */
 static void
 ReplicationSlotDropAcquired(void)

Isn't that actually removing interesting information? Yes, the comment's
been moved to ReplicationSlotDropPtr(), but that routine is an internal
one...


@@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
     if (MyReplicationSlot != NULL)
         ReplicationSlotRelease();

+    /* Also cleanup all the temporary slots. */
+    ReplicationSlotCleanup();
+

So we now have exactly this code in several places. Why does a
generically named Cleanup routine not also deal with a currently
acquired slot? Right now it'd be more appropriately named
ReplicationSlotDropTemporary() or such.


@@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
    l.slot_type,
    l.datoid,
    d.datname AS database,
+    l.temporary,
    l.active,
    l.active_pid,
    l.xmin,
    l.catalog_xmin,
    l.restart_lsn,
    l.confirmed_flush_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
     LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
    pg_authid.rolsuper,

If we start to expose this, shouldn't we expose the persistency instead
(i.e. persistent/ephemeral/temporary)?


new file   contrib/test_decoding/sql/slot.sql
@@ -0,0 +1,20 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t', 'test_decoding', true);
+
+SELECT pg_drop_replication_slot('regression_slot_p');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding', false);
+
+-- reconnect to clean temp slots
+\c

Can we add multiple slots to clean up here? Can we also add a test for
the cleanup on error for temporary slots? E.g. something like in
ddl.sql (maybe we should actually move some of the relevant tests from
there to here).

It'd also be good to test this with physical slots?


+-- test switching between slots in a session
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);

Can we actually output something? Right now this doesn't test that
much...


- Andres



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 01:33, Andres Freund wrote:
> HJi,
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
>>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>>>> did will result in making it impossible to reacquire temporary slot once
>>>>>> you switched to different one in the session as the if (active_pid != 0)
>>>>>> will always be true for temp slot.
>>>>>
>>>>> I see.  I suppose it's difficult to get a test case for this.
>>>>
>>>> I created a test case, saw the error of my ways, and added your code
>>>> back in.  Patch attached.
>>>>
>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> +    /*
> +     * No need for locking as we are only interested in slots active in
> +     * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>     /*
>      * The slot is definitely gone.  Lock out concurrent scans of the array
>      * long enough to kill it.  It's OK to clear the active flag here without
>      * grabbing the mutex because nobody else can be scanning the array here,
>      * and nobody can be attached to this slot and thus access it without
>      * scanning the array.
>      */
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 

Well, it's protected by being called only from ReplicationSlotCleanup()
and ReplicationSlotDropAcquired(). The comment could be improved, though, yes.

Holding ReplicationSlotControlLock during the scan is somewhat
problematic because ReplicationSlotDropPtr tries to take it as well (and
in exclusive mode), so we'd have to take an exclusive lock in
ReplicationSlotCleanup(), which I don't really like much.

> 
>  /*
> - * Permanently drop the currently acquired replication slot which will be
> - * released by the point this function returns.
> + * Permanently drop the currently acquired replication slot.
>   */
>  static void
>  ReplicationSlotDropAcquired(void)
> 
> Isn't that actually removing interesting information? Yes, the comment's
> been moved to ReplicationSlotDropPtr(), but that routine is an internal
> one...
> 

ReplicationSlotDropAcquired() is internal one as well.

> 
> @@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
>      if (MyReplicationSlot != NULL)
>          ReplicationSlotRelease();
> 
> +    /* Also cleanup all the temporary slots. */
> +    ReplicationSlotCleanup();
> +
> 
> So we now have exactly this code in several places. Why does a
> generically named Cleanup routine not also deal with a currently
> acquired slot? Right now it'd be more appropriately named
> ReplicationSlotDropTemporary() or such.
> 

It definitely could release MyReplicationSlot as well.

> 
> @@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
>      l.slot_type,
>      l.datoid,
>      d.datname AS database,
> +    l.temporary,
>      l.active,
>      l.active_pid,
>      l.xmin,
>      l.catalog_xmin,
>      l.restart_lsn,
>      l.confirmed_flush_lsn
> -   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
> +   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
>       LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
>  pg_roles| SELECT pg_authid.rolname,
>      pg_authid.rolsuper,
> 
> If we start to expose this, shouldn't we expose the persistency instead
> (i.e. persistent/ephemeral/temporary)?
> 

Not sure how useful that is, given that ephemeral is a transient state
only present during slot creation.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Andres Freund
Date:
On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote:

> diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
> new file mode 100644
> index 0000000..e3560b7
> --- /dev/null
> +++ b/src/backend/catalog/pg_publication.c
> +
> +Datum pg_get_publication_tables(PG_FUNCTION_ARGS);

Don't we usually put these in a header?

> +/*
> + * Insert new publication / relation mapping.
> + */
> +ObjectAddress
> +publication_add_relation(Oid pubid, Relation targetrel,
> +                         bool if_not_exists)
> +{
> +    Relation    rel;
> +    HeapTuple    tup;
> +    Datum        values[Natts_pg_publication_rel];
> +    bool        nulls[Natts_pg_publication_rel];
> +    Oid            relid = RelationGetRelid(targetrel);
> +    Oid            prrelid;
> +    Publication *pub = GetPublication(pubid);
> +    ObjectAddress    myself,
> +                    referenced;
> +
> +    rel = heap_open(PublicationRelRelationId, RowExclusiveLock);
> +
> +    /* Check for duplicates */

Maybe mention that that check is racy, but a unique index protects
against the race?


> +    /* Insert tuple into catalog. */
> +    prrelid = simple_heap_insert(rel, tup);
> +    CatalogUpdateIndexes(rel, tup);
> +    heap_freetuple(tup);
> +
> +    ObjectAddressSet(myself, PublicationRelRelationId, prrelid);
> +
> +    /* Add dependency on the publication */
> +    ObjectAddressSet(referenced, PublicationRelationId, pubid);
> +    recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
> +
> +    /* Add dependency on the relation */
> +    ObjectAddressSet(referenced, RelationRelationId, relid);
> +    recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
> +
> +    /* Close the table. */
> +    heap_close(rel, RowExclusiveLock);

I'm not quite sure about the policy, but shouldn't we invoke
InvokeObjectPostCreateHook etc. here?


> +/*
> + * Gets list of relation oids for a publication.
> + *
> + * This should only be used for normal publications, the FOR ALL TABLES
> + * should use GetAllTablesPublicationRelations().
> + */
> +List *
> +GetPublicationRelations(Oid pubid)
> +{
> +    List           *result;
> +    Relation        pubrelsrel;
> +    ScanKeyData        scankey;
> +    SysScanDesc        scan;
> +    HeapTuple        tup;
> +
> +    /* Find all publications associated with the relation. */
> +    pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
> +
> +    ScanKeyInit(&scankey,
> +                Anum_pg_publication_rel_prpubid,
> +                BTEqualStrategyNumber, F_OIDEQ,
> +                ObjectIdGetDatum(pubid));
> +
> +    scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
> +                              NULL, 1, &scankey);
> +
> +    result = NIL;
> +    while (HeapTupleIsValid(tup = systable_getnext(scan)))
> +    {
> +        Form_pg_publication_rel        pubrel;
> +
> +        pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
> +
> +        result = lappend_oid(result, pubrel->prrelid);
> +    }
> +
> +    systable_endscan(scan);
> +    heap_close(pubrelsrel, NoLock);

In other parts of this you drop the lock, but not here?


> +    heap_close(rel, NoLock);
> +
> +    return result;
> +}

and here.


> +/*
> + * Gets list of all relations published by FOR ALL TABLES publication(s).
> + */
> +List *
> +GetAllTablesPublicationRelations(void)
> +{
> +    Relation    classRel;
> +    ScanKeyData key[1];
> +    HeapScanDesc scan;
> +    HeapTuple    tuple;
> +    List       *result = NIL;
> +
> +    classRel = heap_open(RelationRelationId, AccessShareLock);

> +    heap_endscan(scan);
> +    heap_close(classRel, AccessShareLock);
> +
> +    return result;
> +}

but here.


Btw, why are matviews not publishable?

> +/*
> + * Get Publication using name.
> + */
> +Publication *
> +GetPublicationByName(const char *pubname, bool missing_ok)
> +{
> +    Oid            oid;
> +
> +    oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
> +    if (!OidIsValid(oid))
> +    {
> +        if (missing_ok)
> +            return NULL;
> +
> +        ereport(ERROR,
> +                (errcode(ERRCODE_UNDEFINED_OBJECT),
> +                 errmsg("publication \"%s\" does not exist", pubname)));
> +    }
> +
> +    return GetPublication(oid);
> +}

That's racy... Also, shouldn't we specify how to deal with the returned
memory of the Publication *-returning methods?



> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
> new file mode 100644
> index 0000000..954b2bd
> --- /dev/null
> +++ b/src/backend/commands/publicationcmds.c
> @@ -0,0 +1,613 @@

> +/*
> + * Create new publication.
> + */
> +ObjectAddress
> +CreatePublication(CreatePublicationStmt *stmt)
> +{
> +    Relation    rel;

> +
> +    values[Anum_pg_publication_puballtables - 1] =
> +        BoolGetDatum(stmt->for_all_tables);
> +    values[Anum_pg_publication_pubinsert - 1] =
> +        BoolGetDatum(publish_insert);
> +    values[Anum_pg_publication_pubupdate - 1] =
> +        BoolGetDatum(publish_update);
> +    values[Anum_pg_publication_pubdelete - 1] =
> +        BoolGetDatum(publish_delete);

I remain convinced that a different representation would be
better. There'll be more options over time (truncate, DDL at least).


> +static void
> +AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel,
> +                       HeapTuple tup)
> +{
> +    bool        publish_insert_given;
> +    bool        publish_update_given;
> +    bool        publish_delete_given;
> +    bool        publish_insert;
> +    bool        publish_update;
> +    bool        publish_delete;
> +    ObjectAddress        obj;
> +
> +    parse_publication_options(stmt->options,
> +                              &publish_insert_given, &publish_insert,
> +                              &publish_update_given, &publish_update,
> +                              &publish_delete_given, &publish_delete);

You could pass it a struct instead...


> +static List *
> +OpenTableList(List *tables)
> +{
> +    List       *relids = NIL;
> +    List       *rels = NIL;
> +    ListCell   *lc;
> +
> +    /*
> +     * Open, share-lock, and check all the explicitly-specified relations
> +     */
> +    foreach(lc, tables)
> +    {
> +        RangeVar   *rv = lfirst(lc);
> +        Relation    rel;
> +        bool        recurse = interpretInhOption(rv->inhOpt);
> +        Oid            myrelid;
> +
> +        rel = heap_openrv(rv, ShareUpdateExclusiveLock);
> +        myrelid = RelationGetRelid(rel);
> +        /* filter out duplicates when user specifies "foo, foo" */
> +        if (list_member_oid(relids, myrelid))
> +        {
> +            heap_close(rel, ShareUpdateExclusiveLock);
> +            continue;
> +        }

This is a quadratic algorithm - that could bite us... Not sure if we
need to care.  If we want to fix it, one approach would be to use
RangeVarGetRelid() instead, and then do a qsort/deduplicate before
actually opening the relations.
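For illustration, the sort-then-deduplicate step could look roughly like this (a sketch with invented names, not code from the patch; the backend would operate on its own Oid arrays and comparators):

```c
#include <stdlib.h>

typedef unsigned int Oid;       /* stand-in for the backend's Oid */

static int
oid_cmp(const void *a, const void *b)
{
    Oid oa = *(const Oid *) a;
    Oid ob = *(const Oid *) b;

    return (oa > ob) - (oa < ob);
}

/*
 * Sort the array and squeeze out duplicates in one O(n log n) pass,
 * replacing the O(n^2) list_member_oid() membership checks.
 * Returns the deduplicated length.
 */
static int
sort_dedup_oids(Oid *oids, int n)
{
    int i, j;

    if (n <= 1)
        return n;

    qsort(oids, n, sizeof(Oid), oid_cmp);

    for (i = 1, j = 0; i < n; i++)
    {
        if (oids[i] != oids[j])
            oids[++j] = oids[i];
    }
    return j + 1;
}
```

The relations would then be resolved with RangeVarGetRelid() first and only opened after this dedup pass.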

>  
> -def_elem:    ColLabel '=' def_arg
> +def_elem:    def_key '=' def_arg
>                  {
>                      $$ = makeDefElem($1, (Node *) $3, @1);
>                  }
> -            | ColLabel
> +            | def_key
>                  {
>                      $$ = makeDefElem($1, NULL, @1);
>                  }
>          ;

> +def_key:
> +            ColLabel                        { $$ = $1; }
> +            | ColLabel ColLabel                { $$ = psprintf("%s %s", $1, $2); }
> +        ;
> +

Not quite sure what this is about?  Doesn't that change the accepted
syntax in a bunch of places?


> @@ -2337,6 +2338,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
>      bms_free(relation->rd_indexattr);
>      bms_free(relation->rd_keyattr);
>      bms_free(relation->rd_idattr);
> +    if (relation->rd_pubactions)
> +        pfree(relation->rd_pubactions);
>      if (relation->rd_options)
>          pfree(relation->rd_options);
>      if (relation->rd_indextuple)
> @@ -4992,6 +4995,67 @@ RelationGetExclusionInfo(Relation indexRelation,
>      MemoryContextSwitchTo(oldcxt);
>  }
>  
> +/*
> + * Get publication actions for the given relation.
> + */
> +struct PublicationActions *
> +GetRelationPublicationActions(Relation relation)
> +{
> +    List       *puboids;
> +    ListCell   *lc;
> +    MemoryContext        oldcxt;
> +    PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> +
> +    if (relation->rd_pubactions)
> +        return memcpy(pubactions, relation->rd_pubactions,
> +                      sizeof(PublicationActions));
> +
> +    /* Fetch the publication membership info. */
> +    puboids = GetRelationPublications(RelationGetRelid(relation));
> +    puboids = list_concat_unique_oid(puboids, GetAllTablesPublications());
> +
> +    foreach(lc, puboids)
> +    {
> +        Oid            pubid = lfirst_oid(lc);
> +        HeapTuple    tup;
> +        Form_pg_publication pubform;
> +
> +        tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
> +
> +        if (!HeapTupleIsValid(tup))
> +            elog(ERROR, "cache lookup failed for publication %u", pubid);
> +
> +        pubform = (Form_pg_publication) GETSTRUCT(tup);
> +
> +        pubactions->pubinsert |= pubform->pubinsert;
> +        pubactions->pubupdate |= pubform->pubupdate;
> +        pubactions->pubdelete |= pubform->pubdelete;
> +
> +        ReleaseSysCache(tup);
> +
> +        /*
> +         * If we know everything is replicated, there is no point to check
> +         * for other publications.
> +         */
> +        if (pubactions->pubinsert && pubactions->pubupdate &&
> +            pubactions->pubdelete)
> +            break;
> +    }
> +
> +    if (relation->rd_pubactions)
> +    {
> +        pfree(relation->rd_pubactions);
> +        relation->rd_pubactions = NULL;
> +    }
> +
> +    /* Now save copy of the actions in the relcache entry. */
> +    oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> +    relation->rd_pubactions = palloc(sizeof(PublicationActions));
> +    memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions));
> +    MemoryContextSwitchTo(oldcxt);
> +
> +    return pubactions;
> +}


Hm. Do we actually have enough cache invalidation support to make this
cached version correct?  I haven't seen anything in that regard? Seems
to mean that all changes to an ALL TABLES publication need to do a
global relcache invalidation?

- Andres



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 02:41, Andres Freund wrote:
> On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote:
> 
>> diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
>> new file mode 100644
>> index 0000000..e3560b7
>> --- /dev/null
>> +++ b/src/backend/catalog/pg_publication.c
>> +
>> +Datum pg_get_publication_tables(PG_FUNCTION_ARGS);
> 
> Don't we usually put these in a header?
>

We put these in rather random places; I don't mind either way.

> 
>> +/*
>> + * Gets list of relation oids for a publication.
>> + *
>> + * This should only be used for normal publications, the FOR ALL TABLES
>> + * should use GetAllTablesPublicationRelations().
>> + */
>> +List *
>> +GetPublicationRelations(Oid pubid)
>> +{
>> +    List           *result;
>> +    Relation        pubrelsrel;
>> +    ScanKeyData        scankey;
>> +    SysScanDesc        scan;
>> +    HeapTuple        tup;
>> +
>> +    /* Find all publications associated with the relation. */
>> +    pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock);
>> +
>> +    ScanKeyInit(&scankey,
>> +                Anum_pg_publication_rel_prpubid,
>> +                BTEqualStrategyNumber, F_OIDEQ,
>> +                ObjectIdGetDatum(pubid));
>> +
>> +    scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true,
>> +                              NULL, 1, &scankey);
>> +
>> +    result = NIL;
>> +    while (HeapTupleIsValid(tup = systable_getnext(scan)))
>> +    {
>> +        Form_pg_publication_rel        pubrel;
>> +
>> +        pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
>> +
>> +        result = lappend_oid(result, pubrel->prrelid);
>> +    }
>> +
>> +    systable_endscan(scan);
>> +    heap_close(pubrelsrel, NoLock);
> 
> In other parts of this you drop the lock, but not here?
> 
> 
>> +    heap_close(rel, NoLock);
>> +
>> +    return result;
>> +}
> 
> and here.
> 

Meh, ignore, that's some pglogical legacy.


> 
> Btw, why are matviews not publishable?
> 

Because the standard way of updating them is REFRESH MATERIALIZED VIEW,
which is decoded as inserts into a pg_temp_<oid> table. I think we'll
have to rethink how we do this before we can sanely support them.

>> +/*
>> + * Get Publication using name.
>> + */
>> +Publication *
>> +GetPublicationByName(const char *pubname, bool missing_ok)
>> +{
>> +    Oid            oid;
>> +
>> +    oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname));
>> +    if (!OidIsValid(oid))
>> +    {
>> +        if (missing_ok)
>> +            return NULL;
>> +
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_UNDEFINED_OBJECT),
>> +                 errmsg("publication \"%s\" does not exist", pubname)));
>> +    }
>> +
>> +    return GetPublication(oid);
>> +}
> 
> That's racy... Also, shouldn't we specify for how to deal with the
> returned memory for Publication * returning methods?
> 

So are most of the other existing functions with a similar purpose. The
worst case is that with enough concurrency around DDL on the same
publication name you'll get a cache lookup failure.

I added a comment to GetPublication saying that the memory is palloc'd.

> 
>> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
>> new file mode 100644
>> index 0000000..954b2bd
>> --- /dev/null
>> +++ b/src/backend/commands/publicationcmds.c
>> @@ -0,0 +1,613 @@
> 
>> +/*
>> + * Create new publication.
>> + */
>> +ObjectAddress
>> +CreatePublication(CreatePublicationStmt *stmt)
>> +{
>> +    Relation    rel;
> 
>> +
>> +    values[Anum_pg_publication_puballtables - 1] =
>> +        BoolGetDatum(stmt->for_all_tables);
>> +    values[Anum_pg_publication_pubinsert - 1] =
>> +        BoolGetDatum(publish_insert);
>> +    values[Anum_pg_publication_pubupdate - 1] =
>> +        BoolGetDatum(publish_update);
>> +    values[Anum_pg_publication_pubdelete - 1] =
>> +        BoolGetDatum(publish_delete);
> 
> I remain convinced that a different representation would be
> better. There'll be more options over time (truncate, DDL at least).
> 

So? They are boolean properties; it's not like we store bitmaps in
catalogs much. I very much expect DDL to be much more complex than a
boolean, btw.

> 
>> +static void
>> +AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel,
>> +                       HeapTuple tup)
>> +{
>> +    bool        publish_insert_given;
>> +    bool        publish_update_given;
>> +    bool        publish_delete_given;
>> +    bool        publish_insert;
>> +    bool        publish_update;
>> +    bool        publish_delete;
>> +    ObjectAddress        obj;
>> +
>> +    parse_publication_options(stmt->options,
>> +                              &publish_insert_given, &publish_insert,
>> +                              &publish_update_given, &publish_update,
>> +                              &publish_delete_given, &publish_delete);
> 
> You could pass it a struct instead...
> 

Here yes, but not in the similar code for subscriptions; I slightly
prefer consistency between those similar functions.

> 
>> +static List *
>> +OpenTableList(List *tables)
>> +{
>> +    List       *relids = NIL;
>> +    List       *rels = NIL;
>> +    ListCell   *lc;
>> +
>> +    /*
>> +     * Open, share-lock, and check all the explicitly-specified relations
>> +     */
>> +    foreach(lc, tables)
>> +    {
>> +        RangeVar   *rv = lfirst(lc);
>> +        Relation    rel;
>> +        bool        recurse = interpretInhOption(rv->inhOpt);
>> +        Oid            myrelid;
>> +
>> +        rel = heap_openrv(rv, ShareUpdateExclusiveLock);
>> +        myrelid = RelationGetRelid(rel);
>> +        /* filter out duplicates when user specifies "foo, foo" */
>> +        if (list_member_oid(relids, myrelid))
>> +        {
>> +            heap_close(rel, ShareUpdateExclusiveLock);
>> +            continue;
>> +        }
> 
> This is a quadratic algorithm - that could bite us... Not sure if we
> need to care.  If we want to fix it, one approach would be to use
> RangeVarGetRelid() instead, and then do a qsort/deduplicate before
> actually opening the relations.
> 

I guess it could get really slow only with a big inheritance tree; I'll
look into how much work the other way of doing things would be (this is
not exactly a hot code path).

>>  
>> -def_elem:    ColLabel '=' def_arg
>> +def_elem:    def_key '=' def_arg
>>                  {
>>                      $$ = makeDefElem($1, (Node *) $3, @1);
>>                  }
>> -            | ColLabel
>> +            | def_key
>>                  {
>>                      $$ = makeDefElem($1, NULL, @1);
>>                  }
>>          ;
> 
>> +def_key:
>> +            ColLabel                        { $$ = $1; }
>> +            | ColLabel ColLabel                { $$ = psprintf("%s %s", $1, $2); }
>> +        ;
>> +
> 
> Not quite sure what this is about?  Doesn't that change the accepted
> syntax in a bunch of places?
> 

Well, all those places have to check the actual values in the C code
later anyway. It will change the error message a bit in some DDL. I did
it this way so that we don't have to introduce a near-duplicate of
def_elem just for this small change.

> 
>> @@ -2337,6 +2338,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
>>      bms_free(relation->rd_indexattr);
>>      bms_free(relation->rd_keyattr);
>>      bms_free(relation->rd_idattr);
>> +    if (relation->rd_pubactions)
>> +        pfree(relation->rd_pubactions);
>>      if (relation->rd_options)
>>          pfree(relation->rd_options);
>>      if (relation->rd_indextuple)
>> @@ -4992,6 +4995,67 @@ RelationGetExclusionInfo(Relation indexRelation,
>>      MemoryContextSwitchTo(oldcxt);
>>  }
>>  
>> +/*
>> + * Get publication actions for the given relation.
>> + */
>> +struct PublicationActions *
>> +GetRelationPublicationActions(Relation relation)
>> +{
>> +    List       *puboids;
>> +    ListCell   *lc;
>> +    MemoryContext        oldcxt;
>> +    PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
>> +
>> +    if (relation->rd_pubactions)
>> +        return memcpy(pubactions, relation->rd_pubactions,
>> +                      sizeof(PublicationActions));
>> +
>> +    /* Fetch the publication membership info. */
>> +    puboids = GetRelationPublications(RelationGetRelid(relation));
>> +    puboids = list_concat_unique_oid(puboids, GetAllTablesPublications());
>> +
>> +    foreach(lc, puboids)
>> +    {
>> +        Oid            pubid = lfirst_oid(lc);
>> +        HeapTuple    tup;
>> +        Form_pg_publication pubform;
>> +
>> +        tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
>> +
>> +        if (!HeapTupleIsValid(tup))
>> +            elog(ERROR, "cache lookup failed for publication %u", pubid);
>> +
>> +        pubform = (Form_pg_publication) GETSTRUCT(tup);
>> +
>> +        pubactions->pubinsert |= pubform->pubinsert;
>> +        pubactions->pubupdate |= pubform->pubupdate;
>> +        pubactions->pubdelete |= pubform->pubdelete;
>> +
>> +        ReleaseSysCache(tup);
>> +
>> +        /*
>> +         * If we know everything is replicated, there is no point to check
>> +         * for other publications.
>> +         */
>> +        if (pubactions->pubinsert && pubactions->pubupdate &&
>> +            pubactions->pubdelete)
>> +            break;
>> +    }
>> +
>> +    if (relation->rd_pubactions)
>> +    {
>> +        pfree(relation->rd_pubactions);
>> +        relation->rd_pubactions = NULL;
>> +    }
>> +
>> +    /* Now save copy of the actions in the relcache entry. */
>> +    oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>> +    relation->rd_pubactions = palloc(sizeof(PublicationActions));
>> +    memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions));
>> +    MemoryContextSwitchTo(oldcxt);
>> +
>> +    return pubactions;
>> +}
> 
> 
> Hm. Do we actually have enough cache invalidation support to make this
> cached version correct?  I haven't seen anything in that regard? Seems
> to mean that all changes to an ALL TABLES publication need to do a
> global relcache invalidation?
> 

Yeah you're right, we definitely don't do enough relcache invalidation
for this.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 03:26, Petr Jelinek wrote:
> On 13/12/16 02:41, Andres Freund wrote:
>> On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote: 
>>
>>> +static List *
>>> +OpenTableList(List *tables)
>>> +{
>>> +    List       *relids = NIL;
>>> +    List       *rels = NIL;
>>> +    ListCell   *lc;
>>> +
>>> +    /*
>>> +     * Open, share-lock, and check all the explicitly-specified relations
>>> +     */
>>> +    foreach(lc, tables)
>>> +    {
>>> +        RangeVar   *rv = lfirst(lc);
>>> +        Relation    rel;
>>> +        bool        recurse = interpretInhOption(rv->inhOpt);
>>> +        Oid            myrelid;
>>> +
>>> +        rel = heap_openrv(rv, ShareUpdateExclusiveLock);
>>> +        myrelid = RelationGetRelid(rel);
>>> +        /* filter out duplicates when user specifies "foo, foo" */
>>> +        if (list_member_oid(relids, myrelid))
>>> +        {
>>> +            heap_close(rel, ShareUpdateExclusiveLock);
>>> +            continue;
>>> +        }
>>
>> This is a quadratic algorithm - that could bite us... Not sure if we
>> need to care.  If we want to fix it, one approach owuld be to use
>> RangeVarGetRelid() instead, and then do a qsort/deduplicate before
>> actually opening the relations.
>>
> 
> I guess it could get really slow only with big inheritance tree, I'll
> look into how much work is the other way of doing things (this is not
> exactly hot code path).
> 

Actually, looking at it, it only processes user input, so I don't think
it's very problematic in terms of performance. You'd have to pass many
thousands of tables in a single DDL statement to notice.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 12/10/16 2:48 AM, Petr Jelinek wrote:
> Attached new version with your updates and rebased on top of the current
> HEAD (the partitioning patch produced quite a few conflicts).

I have attached a few more "fixup" patches, mostly with some editing of
documentation and comments and some compiler warnings.

In 0006 in the protocol documentation I have left a "XXX ???" where I
didn't understand what it was trying to say.

All issues from (my) previous reviews appear to have been addressed.

Comments besides that:


0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch

Still wondering about the best workflow with pg_dump, but it seems all
the pieces are there right now, and the interfaces can be tweaked later.

DROP SUBSCRIPTION requires superuser, but should perhaps be owner check
only?

DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
exist.

Maybe write the grammar so that SLOT does not need to be a new key word.
 The changes you made for CREATE PUBLICATION should allow that.

The tests are not added to serial_schedule.  Intentional?  If so, document?


0004-Define-logical-replication-protocol-and-output-plugi-v12.patch

Not sure why pg_catalog is encoded as a zero-length string.  I guess it
saves some space.  Maybe that could be explained in a brief code comment?


0005-Add-logical-replication-workers-v12.patch

The way the executor stuff is organized now looks better to me.

The subscriber crashes if max_replication_slots is 0:

TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
Line: 999)

The documentation says that replication slots are required on the
subscriber, but from a user's perspective, it's not clear why that is.

Dropping a table that is part of a live subscription results in log
messages like

WARNING:  leaked hash_seq_search scan for hash table 0x7f9d2a807238

I was testing replicating into a temporary table, which failed like this:

FATAL:  the logical replication target public.test1 not found
LOG:  worker process:  (PID 2879) exited with exit code 1
LOG:  starting logical replication worker for subscription 16392
LOG:  logical replication apply for subscription mysub started

That's okay, but those messages were repeated every few seconds or so
and would create quite some log volume.  I wonder if that needs to be
reined in somewhat.


I think this is getting very close to the point where it's committable.
So if anyone else has major concerns about the whole approach and
perhaps the way the new code in 0005 is organized, now would be the time ...

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Andres Freund
Date:
Hi,

On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
> I think this is getting very close to the point where it's committable.
> So if anyone else has major concerns about the whole approach and
> perhaps the way the new code in 0005 is organized, now would be the time ...

Uh. The whole cache invalidation thing is completely unresolved, and
that's just the publication patch. I've not looked in detail at later
patches.  So no, I don't think so.

I think after the invalidation issue is resolved the publication patch
might be close to being ready. I'm doubtful the later patches are.

Greetings,

Andres Freund



Re: [HACKERS] Logical Replication WIP

From
Andres Freund
Date:
On 2016-12-13 06:55:31 +0100, Petr Jelinek wrote:
> >> This is a quadratic algorithm - that could bite us... Not sure if we
> >> need to care.  If we want to fix it, one approach would be to use
> >> RangeVarGetRelid() instead, and then do a qsort/deduplicate before
> >> actually opening the relations.
> >>
> > 
> > I guess it could get really slow only with big inheritance tree, I'll
> > look into how much work is the other way of doing things (this is not
> > exactly hot code path).
> > 
> 
> Actually looking at it, it only processes user input so I don't think
> it's very problematic in terms of performance. You'd have to pass many
> thousands of tables in single DDL to notice.

Well, at least we should put a CHECK_FOR_INTERRUPTS there. At the moment
it's IIRC uninterruptible, which isn't good for something directly
triggered by the user.  A comment that it's known to be O(n^2), but
considered acceptable, would be good too.
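Concretely, the suggested mitigation is just an interrupt check at the top of each loop iteration, plus a comment about the accepted quadratic cost. A stubbed sketch (CHECK_FOR_INTERRUPTS is a placeholder here; the real macro comes from miscadmin.h, and the dedup function is invented for illustration):

```c
#define CHECK_FOR_INTERRUPTS() ((void) 0)   /* stub; real macro is in miscadmin.h */

/*
 * Known to be O(n^2) in the number of relations, but the list comes
 * straight from user-supplied DDL, so that is considered acceptable;
 * the interrupt check keeps a huge list cancellable.
 */
static int
dedup_count(const unsigned int *relids, int n)
{
    int unique = 0;

    for (int i = 0; i < n; i++)
    {
        int seen = 0;

        CHECK_FOR_INTERRUPTS();     /* allow query cancel mid-loop */

        for (int j = 0; j < i; j++)
            if (relids[j] == relids[i])
                seen = 1;
        if (!seen)
            unique++;
    }
    return unique;
}
```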

Andres



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 12/12/16 7:33 PM, Andres Freund wrote:
> +-- test switching between slots in a session
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
> 
> Can we actually output something? Right now this doesn't test that
> much...

This test was added because an earlier version of the patch would crash
on this.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 14/12/16 01:26, Peter Eisentraut wrote:
> On 12/12/16 7:33 PM, Andres Freund wrote:
>> +-- test switching between slots in a session
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>>
>> Can we actually output something? Right now this doesn't test that
>> much...
> 
> This test was added because an earlier version of the patch would crash
> on this.
> 

I did improve the test as part of the test improvements that were sent
to the committers list, btw.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 22:05, Andres Freund wrote:
> Hi,
> 
> On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
>> I think this is getting very close to the point where it's committable.
>> So if anyone else has major concerns about the whole approach and
>> perhaps the way the new code in 0005 is organized, now would be the time ...
> 
> Uh. The whole cache invalidation thing is completely unresolved, and
> that's just the publication patch. I've not looked in detail at later
> patches.  So no, I don't think so.
> 

I already have code for that. I'll submit the next version once I go
over PeterE's review. BTW the relcache thing is not as bad as it seems
from the publication patch, because the output plugin has to deal with
relcache/publication cache invalidations and handles most of the
updates correctly. But there was still a problem with the write
filtering, so the publications still have to reset the relcache too.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
> 
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
> 
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
> 

Okay I'll address that separately, thanks.

> All issues from (my) previous reviews appear to have been addressed.
> 
> Comments besides that:
> 
> 
> 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch
> 
> Still wondering about the best workflow with pg_dump, but it seems all
> the pieces are there right now, and the interfaces can be tweaked later.

Right, either way there needs to be some special handling for
subscriptions; having to request them specifically seems like the
safest option to me, but I am open to suggestions there.

> 
> DROP SUBSCRIPTION requires superuser, but should perhaps be owner check
> only?
> 

Hmm, I'm not sure that it requires superuser; I actually think it
mistakenly didn't require anything. In any case I will make sure it
just does an owner check.

> DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
> exist.
>

Right, missing return.

> Maybe write the grammar so that SLOT does not need to be a new key word.
>  The changes you made for CREATE PUBLICATION should allow that.
> 

Hmm, what would that look like? Would opt_drop_slot become IDENT
IDENT? Or maybe you want me to add the WITH (definition) kind of thing?

> The tests are not added to serial_schedule.  Intentional?  If so, document?
> 

Not intentional, will fix. I never use it, so it's easy to forget about.

> 
> 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch
> 
> Not sure why pg_catalog is encoded as a zero-length string.  I guess it
> saves some space.  Maybe that could be explained in a brief code comment?
> 

Yes, it's to save space, mainly for built-in types.
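The convention can be sketched like this (hypothetical helper names, not the patch's actual functions):

```c
#include <string.h>

/*
 * Sender side: built-in types live in pg_catalog, so transmit an
 * empty string instead of the full namespace name to save bytes on
 * the wire.  Illustrative names only.
 */
static const char *
namespace_to_wire(const char *nspname)
{
    return strcmp(nspname, "pg_catalog") == 0 ? "" : nspname;
}

/* Receiver side: an empty string is read back as pg_catalog. */
static const char *
namespace_from_wire(const char *wire)
{
    return wire[0] == '\0' ? "pg_catalog" : wire;
}
```

Since almost every column of a typical table uses a built-in type, dropping the repeated "pg_catalog" string adds up across type and relation messages.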

> 
> 0005-Add-logical-replication-workers-v12.patch
> 
> The way the executor stuff is organized now looks better to me.
> 
> The subscriber crashes if max_replication_slots is 0:
> 
> TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
> Line: 999)
> 
> The documentation says that replication slots are required on the
> subscriber, but from a user's perspective, it's not clear why that is.

Yeah, honestly I think origins should not depend on
max_replication_slots. They are not really connected (you can have many
of one and none of the other, and vice versa). Also,
max_replication_slots should IMHO default to max_wal_senders at this
point. (In an ideal world all three of those would live in DSM instead
of shared memory, governed only by some implementation maximum, which
is probably 2^16, and the GUCs would be removed.)

But yes as it is, we should check for that, probably both during CREATE
SUBSCRIPTION and during apply start.

> 
> Dropping a table that is part of a live subscription results in log
> messages like
> 
> WARNING:  leaked hash_seq_search scan for hash table 0x7f9d2a807238
> 
> I was testing replicating into a temporary table, which failed like this:
> 
> FATAL:  the logical replication target public.test1 not found
> LOG:  worker process:  (PID 2879) exited with exit code 1
> LOG:  starting logical replication worker for subscription 16392
> LOG:  logical replication apply for subscription mysub started
> 
> That's okay, but those messages were repeated every few seconds or so
> and would create quite some log volume.  I wonder if that needs to be
> reined in somewhat.

It retries every 5s or so, I think. I am not sure how that could be
improved besides using wal_retrieve_retry_interval instead of the
hardcoded 5s (or, maybe better, adding a GUC for apply). Maybe some
kind of backoff algorithm could be added as well.
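A capped exponential backoff for the reconnect loop could be as simple as the following sketch (illustrative only; the worker currently retries at a fixed interval and none of these names exist in the patch):

```c
/*
 * Next retry delay for the apply worker's reconnect loop: start at a
 * base interval and double up to a cap; the caller would reset the
 * delay to 0 once a connection succeeds.
 */
static int
next_retry_ms(int current_ms)
{
    const int base_ms = 5000;   /* roughly the current hardcoded 5s */
    const int cap_ms = 300000;  /* stop growing at 5 minutes */

    if (current_ms <= 0)
        return base_ms;
    return current_ms * 2 > cap_ms ? cap_ms : current_ms * 2;
}
```

This keeps the first few retries prompt while bounding log volume when the failure persists.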

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 01:33, Andres Freund wrote:
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
>>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>>>> did will result in making it impossible to reacquire temporary slot once
>>>>>> you switched to different one in the session as the if (active_pid != 0)
>>>>>> will always be true for temp slot.
>>>>>
>>>>> I see.  I suppose it's difficult to get a test case for this.
>>>>
>>>> I created a test case, saw the error of my ways, and added your code
>>>> back in.  Patch attached.
>>>>
>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> +    /*
> +     * No need for locking as we are only interested in slots active in
> +     * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>     /*
>      * The slot is definitely gone.  Lock out concurrent scans of the array
>      * long enough to kill it.  It's OK to clear the active flag here without
>      * grabbing the mutex because nobody else can be scanning the array here,
>      * and nobody can be attached to this slot and thus access it without
>      * scanning the array.
>      */
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 
> 

Any thoughts on the attached? Yes, it does repeated scans, which can in
theory be slow, but as I explained in the comment, in practice there is
not much need to have many temporary slots active within a single
session, so it should not be a big issue.

I am not quite convinced that all the locking is necessary from the
current logic perspective TBH but it should help prevent mistakes by
whoever changes things in slot.c next.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
> 
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
> 
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
> 

Ah so you didn't understand the
> +                Identifies the following TupleData submessage as a key.
> +                This field is optional and is only present if
> +                the update changed the REPLICA IDENTITY index. XXX???

So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains 'N' message
which is the new data. Then it can optionally contain 'O' message with
old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
IDENTITY index like pkey, etc). Or it can include 'K' message that only
contains old data for the columns in the REPLICA IDENTITY index. But if
the REPLICA IDENTITY index didn't change (ie, old and new would be same
for those columns) we simply omit the 'K' message and let the downstream
take the key data from the 'N' message to save space.
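
Put differently, a downstream apply path could resolve which tuple to use as the key roughly like this (an illustrative Python sketch; the function and argument names are made up and this is not the actual pgoutput reader):

```python
def resolve_update(submessages):
    """submessages: dict mapping submessage tag to decoded tuple data.

    'N' (new tuple) is always present; at most one of 'O' (full old
    tuple, sent for REPLICA IDENTITY FULL) or 'K' (old values of the
    replica identity columns, sent only when they changed) accompanies
    it. Returns (key_source, new_tuple).
    """
    new_tuple = submessages["N"]
    if "O" in submessages:
        key_source = submessages["O"]   # full old row available
    elif "K" in submessages:
        key_source = submessages["K"]   # identity columns changed
    else:
        key_source = new_tuple          # identity unchanged: reuse 'N'
    return key_source, new_tuple
```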

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Craig Ringer
Date:
On 15 Dec. 2016 18:19, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
>

Ah so you didn't understand the
> +                Identifies the following TupleData submessage as a key.
> +                This field is optional and is only present if
> +                the update changed the REPLICA IDENTITY index. XXX???

So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains 'N' message
which is the new data. Then it can optionally contain 'O' message with
old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
IDENTITY index like pkey, etc). Or it can include 'K' message that only
contains old data for the columns in the REPLICA IDENTITY index. But if
the REPLICA IDENTITY index didn't change (ie, old and new would be same
for those columns) we simply omit the 'K' message and let the downstream
take the key data from the 'N' message to save space.

Something we forgot to bake into pglogical that might be worth leaving room for here: sending the whole old tuple, with some fields marked as key.

So you can use replica identity pkey or whatever and the downstream knows which are the key fields. But can still transmit the whole old tuple in case the downstream wants it for conflict resolution/logging/etc.

We don't have the logical decoding and WAL output for this yet, nor a way of requesting old-tuple recording table by table. So all I'm suggesting is leaving room in the protocol.

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 15/12/16 13:06, Craig Ringer wrote:
> On 15 Dec. 2016 18:19, "Petr Jelinek" <petr.jelinek@2ndquadrant.com
> <mailto:petr.jelinek@2ndquadrant.com>> wrote:
> 
>     On 13/12/16 21:42, Peter Eisentraut wrote:
>     > On 12/10/16 2:48 AM, Petr Jelinek wrote:
>     >> Attached new version with your updates and rebased on top of the
>     current
>     >> HEAD (the partitioning patch produced quite a few conflicts).
>     >
>     > I have attached a few more "fixup" patches, mostly with some
>     editing of
>     > documentation and comments and some compiler warnings.
>     >
>     > In 0006 in the protocol documentation I have left a "XXX ???" where I
>     > didn't understand what it was trying to say.
>     >
> 
>     Ah so you didn't understand the
>     > +                Identifies the following TupleData submessage as
>     a key.
>     > +                This field is optional and is only present if
>     > +                the update changed the REPLICA IDENTITY index. XXX???
> 
>     So what happens here is that the update message can contain one or two
>     out of 3 possible tuple submessages. It always contains 'N' message
>     which is the new data. Then it can optionally contain 'O' message with
>     old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
>     IDENTITY index like pkey, etc). Or it can include 'K' message that only
>     contains old data for the columns in the REPLICA IDENTITY index. But if
>     the REPLICA IDENTITY index didn't change (ie, old and new would be same
>     for those columns) we simply omit the 'K' message and let the downstream
>     take the key data from the 'N' message to save space.
> 
> 
> Something we forgot to bake into pglogical that might be worth leaving
> room for here: sending the whole old tuple, with some fields marked as key.
> 
> So you can use replica identity pkey or whatever and the downstream
> knows which are the key fields. But can still transmit the whole old
> tuple in case the downstream wants it for conflict resolution/logging/etc.
> 
> We don't have the logical decoding and wal output for this yet, nor a
> way of requesting old tuple recording table by table. So all i'm
> suggesting is leaving room in the protocol.
> 

Not really sure I follow; which columns are keys is not part of the info
in the data message, it's part of the relation message, so it's already
possible in the protocol. Also the current implementation is fully
capable of taking advantage of a PK on the downstream even with REPLICA
IDENTITY FULL.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

attached is version 13 of the patch.

I merged in changes from PeterE and made the following changes:
- fixed the ownership error messages for both provider and subscriber
- added ability to send invalidation message to invalidate whole
relcache and use it in publication code
- added the post creation/alter/drop hooks
- removed parts of docs that refer to initial sync (which does not exist
yet)
- added timeout handling/retry, etc to apply/launcher based on the GUCs
that exist for wal receiver (they could use renaming though)
- improved feedback behavior
- apply worker now uses owner of the subscription as connection user
- more tests
- check for max_replication_slots in launcher
- clarify the update 'K' sub-message description in protocol

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-16 13:49, Petr Jelinek wrote:
> 
> version 13 of the patch.
> 
> 0001-Add-PUBLICATION-catalogs-and-DDL-v13.patch.gz (~32 KB)
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v13.patch.gz (~28 KB)
> 0003-Define-logical-rep...utput-plugi-v13.patch.gz (~13 KB)
> 0004-Add-logical-replication-workers-v13.patch.gz (~44 KB)
> 0005-Add-separate-synch...or-logical--v13.patch.gz (~2 KB)

Hi,

You wrote on 2016-08-05:

> What's missing:
>  - sequences, I'd like to have them in 10.0 but I don't have good
>    way to implement it. PGLogical uses periodical syncing with some
>    buffer value but that's suboptimal. I would like to decode them
>    but that has proven to be complicated due to their sometimes
>    transactional sometimes nontransactional nature, so I probably
>    won't have time to do it within 10.0 by myself.

I ran into problems with sequences and I wonder if sequences-problems
are still expected, as the above seems to imply.

(short story: I tried to run pgbench across logical replication, and
therefore added a sequence to pgbench_history to give it a replica
identity, and cannot get it to work reliably.)


thanks,

Erik Rijkers

Re: [HACKERS] Logical Replication WIP

From
Steve Singer
Date:
On 12/16/2016 07:49 AM, Petr Jelinek wrote:
> Hi,
>
> attached is version 13 of the patch.
>
> I merged in changes from PeterE. And did following changes:
> - fixed the ownership error messages for both provider and subscriber
> - added ability to send invalidation message to invalidate whole
> relcache and use it in publication code
> - added the post creation/alter/drop hooks
> - removed parts of docs that refer to initial sync (which does not exist
> yet)
> - added timeout handling/retry, etc to apply/launcher based on the GUCs
> that exist for wal receiver (they could use renaming though)
> - improved feedback behavior
> - apply worker now uses owner of the subscription as connection user
> - more tests
> - check for max_replication_slots in launcher
> - clarify the update 'K' sub-message description in protocol

A few things I've noticed so far

If I shutdown the publisher I see the following in the log

2016-12-17 11:33:49.548 EST [1891] LOG:  worker process: ?)G? (PID 1987) 
exited with exit code 1

but then if I shutdown the subscriber postmaster and restart it switches to
2016-12-17 11:43:09.628 EST [2373] LOG:  worker process: ???? (PID 2393) 
exited with exit code 1

Not sure where the 'G' was coming from (other times I have seen an 'I' 
here or other random characters)


I don't think we are cleaning up subscriptions on a drop database

If I do the following

1) Create a subscription in a new database
2) Stop the publisher
3) Drop the database on the subscriber

test=# create subscription mysuba connection 'host=localhost dbname=test 
port=5440' publication mypub;
test=# \c b
b=# drop database test;
DROP DATABASE
b=# select * FROM pg_subscription ;
 subdbid | subname | subowner | subenabled |             subconninfo              | subslotname | subpublications
---------+---------+----------+------------+--------------------------------------+-------------+-----------------
   16384 | mysuba  |       10 | t          | host=localhost dbname=test port=5440 | mysuba      | {mypub}

b=# select datname FROM pg_database where oid=16384;
 datname
---------
(0 rows)

Also I don't think I can now drop mysuba
b=# drop subscription mysuba;
ERROR:  subscription "mysuba" does not exist




Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 17/12/16 13:37, Erik Rijkers wrote:
> On 2016-12-16 13:49, Petr Jelinek wrote:
>>
>> version 13 of the patch.
>>
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v13.patch.gz (~32 KB)
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v13.patch.gz (~28 KB)
>> 0003-Define-logical-rep...utput-plugi-v13.patch.gz (~13 KB)
>> 0004-Add-logical-replication-workers-v13.patch.gz (~44 KB)
>> 0005-Add-separate-synch...or-logical--v13.patch.gz (~2 KB)
> 
> Hi,
> 
> You wrote on 2016-08-05:
> 
>> What's missing:
>>  - sequences, I'd like to have them in 10.0 but I don't have good
>>    way to implement it. PGLogical uses periodical syncing with some
>>    buffer value but that's suboptimal. I would like to decode them
>>    but that has proven to be complicated due to their sometimes
>>    transactional sometimes nontransactional nature, so I probably
>>    won't have time to do it within 10.0 by myself.
> 
> I ran into problems with sequences and I wonder if sequences-problems
> are still expected, as the above seems to imply.
> 
> (short story: I tried to run pgbench across logical replication; and
> therefore
> added a sequence to pgbench_history to give it a replica identity, and
> cannot get it to work reliably ).
> 

Sequences are not replicated, but that should not prevent pgbench_history
itself from being replicated when you add a serial column to it.

BTW you don't need to add a primary key to pgbench_history. Simply ALTER
TABLE pgbench_history REPLICA IDENTITY FULL; should be enough.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 17/12/16 18:34, Steve Singer wrote:
> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>> Hi,
>>
>> attached is version 13 of the patch.
>>
>> I merged in changes from PeterE. And did following changes:
>> - fixed the ownership error messages for both provider and subscriber
>> - added ability to send invalidation message to invalidate whole
>> relcache and use it in publication code
>> - added the post creation/alter/drop hooks
>> - removed parts of docs that refer to initial sync (which does not exist
>> yet)
>> - added timeout handling/retry, etc to apply/launcher based on the GUCs
>> that exist for wal receiver (they could use renaming though)
>> - improved feedback behavior
>> - apply worker now uses owner of the subscription as connection user
>> - more tests
>> - check for max_replication_slots in launcher
>> - clarify the update 'K' sub-message description in protocol
> 
> A few things I've noticed so far
> 
> If I shutdown the publisher I see the following in the log
> 
> 2016-12-17 11:33:49.548 EST [1891] LOG:  worker process: ?)G? (PID 1987)
> exited with exit code 1
> 
> but then if I shutdown the subscriber postmaster and restart it switches to
> 2016-12-17 11:43:09.628 EST [2373] LOG:  worker process: ???? (PID 2393)
> exited with exit code 1
> 
> Not sure where the 'G' was coming from (other times I have seen an 'I'
> here or other random characters)
> 

Uninitialized bgw_name for apply worker. Rather silly bug. Fixed.

> 
> I don't think we are cleaning up subscriptions on a drop database
> 
> If I do the following
> 
> 1) Create a subscription in a new database
> 2) Stop the publisher
> 3) Drop the database on the subscriber
> 
> test=# create subscription mysuba connection 'host=localhost dbname=test
> port=5440' publication mypub;
> test=# \c b
> b=# drop database test;
> DROP DATABASE
> b=# select * FROM pg_subscription ;
>  subdbid | subname | subowner | subenabled | subconninfo              |
> subslotname | subpublications
> ---------+---------+----------+------------+--------------------------------------+-------------+-----------------
> 
>    16384 | mysuba  |       10 | t          | host=localhost dbname=test
> port=5440 | mysuba      | {mypub}
> 

Good one. I added a check that prevents dropping a database when there is
a subscription defined for it. I think we can't cascade here, as the
subscription may or may not hold resources (a slot) in another
instance/database, so preventing the drop is the best we can do.
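
The guard described above boils down to something like the following (an illustrative Python sketch of the idea, not the actual dbcommands.c change; the names and error text are made up):

```python
def check_no_subscriptions(subscriptions, db_oid):
    """Refuse to drop a database that pg_subscription rows point at.

    subscriptions: list of dicts with 'subname' and 'subdbid' keys,
    modeling rows of the pg_subscription catalog.
    """
    held = [s["subname"] for s in subscriptions if s["subdbid"] == db_oid]
    if held:
        # cannot cascade: the subscriptions may hold slots elsewhere
        raise ValueError("database is referenced by subscriptions: %s"
                         % ", ".join(held))
```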

> 
> Also I don't think I can now drop mysuba
> b=# drop subscription mysuba;
> ERROR:  subscription "mysuba" does not exist
> 

Yeah subscriptions are per database.

I don't want to make v14 just for these 2 changes, as that would make
life harder for anybody code-reviewing the v13, so attached is a diff
with the above fixes that applies on top of v13.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Steve Singer
Date:
On 12/18/2016 05:28 AM, Petr Jelinek wrote:
> On 17/12/16 18:34, Steve Singer wrote:
>> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>> Yeah subscriptions are per database. I don't want to make v14 just 
>> for these 2 changes as that would make life harder for anybody 
>> code-reviewing the v13 so attached is diff with above fixes that 
>> applies on top of v13. 
>


Thanks that fixes those issues.

A few more I've noticed


pg_dumping subscriptions doesn't seem to work

./pg_dump -h localhost --port 5441 --include-subscriptions test
pg_dump: [archiver (db)] query failed: ERROR:  missing FROM-clause entry 
for table "p"
LINE 1: ...LECT rolname FROM pg_catalog.pg_roles WHERE oid = p.subowner...
                                                              ^
pg_dump: [archiver (db)] query was: SELECT s.tableoid, s.oid, 
s.subname,(SELECT rolname FROM pg_catalog.pg_roles WHERE oid = 
p.subowner) AS rolname, s.subenabled,  s.subconninfo, s.subslotname, 
s.subpublications FROM pg_catalog.pg_subscription s WHERE s.subdbid = 
(SELECT oid FROM pg_catalog.pg_database                   WHERE datname 
= current_database())

I have attached a patch that fixes this.

pg_dump is also generating warnings

pg_dump: [archiver] WARNING: don't know how to set owner for object type 
SUBSCRIPTION

I know that the plan is to add proper ACL's for publications and 
subscriptions later. I don't know if we want to leave the warning in 
until then or do something about it.


Also the tab-completion for create subscription doesn't seem to work as 
intended.
I've attached a patch that fixes it and patches to add tab completion 
for alter publication|subscription





Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 18/12/16 19:02, Steve Singer wrote:
> On 12/18/2016 05:28 AM, Petr Jelinek wrote:
>> On 17/12/16 18:34, Steve Singer wrote:
>>> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>>> Yeah subscriptions are per database. I don't want to make v14 just
>>> for these 2 changes as that would make life harder for anybody
>>> code-reviewing the v13 so attached is diff with above fixes that
>>> applies on top of v13. 
>>
> 
> 
> Thanks that fixes those issues.
> 
> A few more I've noticed
> 
> 
> pg_dumping subscriptions doesn't seem to work
> 
> ./pg_dump -h localhost --port 5441 --include-subscriptions test
> pg_dump: [archiver (db)] query failed: ERROR:  missing FROM-clause entry
> for table "p"
> LINE 1: ...LECT rolname FROM pg_catalog.pg_roles WHERE oid = p.subowner...
>                                                              ^
> pg_dump: [archiver (db)] query was: SELECT s.tableoid, s.oid,
> s.subname,(SELECT rolname FROM pg_catalog.pg_roles WHERE oid =
> p.subowner) AS rolname, s.subenabled,  s.subconninfo, s.subslotname,
> s.subpublications FROM pg_catalog.pg_subscription s WHERE s.subdbid =
> (SELECT oid FROM pg_catalog.pg_database                   WHERE datname
> = current_database())
> 
> I have attached a patch that fixes this.
> 

Thanks, merged.

> pg_dump is also generating warnings
> 
> pg_dump: [archiver] WARNING: don't know how to set owner for object type
> SUBSCRIPTION
> 
> I know that the plan is to add proper ACL's for publications and
> subscriptions later. I don't know if we want to leave the warning in
> until then or do something about it.
> 

No, ACLs are separate from owner. This is thinko on my side. I was
thinking we can live without ALTER ... OWNER TO for now, but we actually
need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO
for both PUBLICATION and SUBSCRIPTION.

> 
> Also the tab-completion for create subscription doesn't seem to work as
> intended.
> I've attached a patch that fixes it and patches to add tab completion
> for alter publication|subscription
> 

Merged as well.

Okay so now is the time for v14 I guess as more changes accumulated (I
also noticed missing doc for max_logical_replication_workers GUC).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-18 11:12, Petr Jelinek wrote:

(now using latest: patchset:)

0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
0004-Add-logical-replication-workers-v14.patch
0005-Add-separate-synchronous-commit-control-for-logical--v14.patch

> BTW you don't need to add primary key to pgbench_history. Simply ALTER
> TABLE pgbench_history REPLICA IDENTITY FULL; should be enough.

Either should, but neither is.

set-up:
Before creating the publication/subscription:
On master I run   pgbench -qis 1,  then set the replica identity (and/or
add a serial column) for pgbench_history, then dump/restore the 4 pgbench
tables from master to replica.
Then I enable the publication/subscription.  The logs look fine.  (Other
tests I've devised earlier (on other tables) still work nicely.)

Now when I do a pgbench-run on master, something like:
   pgbench -c 1 -T 20 -P 1

I often see this (when running pgbench):

ERROR:  publisher does not send replica identity column expected by the 
logical replication target public.pgbench_tellers
or, sometimes (less often) the same ERROR for pgbench_accounts appears 
(as in the subscriber log below)

-- publisher log
2016-12-19 07:44:22.738 CET 22690 LOG:  logical decoding found 
consistent point at 0/14598C78
2016-12-19 07:44:22.738 CET 22690 DETAIL:  There are no running 
transactions.
2016-12-19 07:44:22.738 CET 22690 LOG:  exported logical decoding 
snapshot: "000130FA-1" with 0 transaction IDs
2016-12-19 07:44:22.886 CET 22729 LOG:  starting logical decoding for 
slot "sub1"
2016-12-19 07:44:22.886 CET 22729 DETAIL:  streaming transactions 
committing after 0/14598CB0, reading WAL from 0/14598C78
2016-12-19 07:44:22.886 CET 22729 LOG:  logical decoding found 
consistent point at 0/14598C78
2016-12-19 07:44:22.886 CET 22729 DETAIL:  There are no running 
transactions.
2016-12-19 07:45:25.568 CET 22729 LOG:  could not receive data from 
client: Connection reset by peer
2016-12-19 07:45:25.568 CET 22729 LOG:  unexpected EOF on standby 
connection
2016-12-19 07:45:25.580 CET 26696 LOG:  starting logical decoding for 
slot "sub1"
2016-12-19 07:45:25.580 CET 26696 DETAIL:  streaming transactions 
committing after 0/1468E0D0, reading WAL from 0/1468DC90
2016-12-19 07:45:25.589 CET 26696 LOG:  logical decoding found 
consistent point at 0/1468DC90
2016-12-19 07:45:25.589 CET 26696 DETAIL:  There are no running 
transactions.

-- subscriber log
2016-12-19 07:44:22.878 CET 17027 LOG:  starting logical replication 
worker for subscription 24581
2016-12-19 07:44:22.883 CET 22726 LOG:  logical replication apply for 
subscription sub1 started
2016-12-19 07:45:11.069 CET 22726 WARNING:  leaked hash_seq_search scan 
for hash table 0x2def1a8
2016-12-19 07:45:25.566 CET 22726 ERROR:  publisher does not send 
replica identity column expected by the logical replication target 
public.pgbench_accounts
2016-12-19 07:45:25.568 CET 16984 LOG:  worker process: logical 
replication worker 24581 (PID 22726) exited with exit code 1
2016-12-19 07:45:25.568 CET 17027 LOG:  starting logical replication 
worker for subscription 24581
2016-12-19 07:45:25.574 CET 26695 LOG:  logical replication apply for 
subscription sub1 started
2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan 
for hash table 0x2def2c8
2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan 
for hash table 0x2def2c8
2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan 
for hash table 0x2def2c8


Sometimes  replication (caused by a pgbench run)  runs for a few seconds 
replicating all 4 pgbench tables correctly, but never longer than 10 to 
20 seconds.

If you cannot reproduce it with the provided info I will make a more 
precise setup-description, but it's so solidly failing here that I hope 
that won't be necessary.


Erik Rijkers

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 19/12/16 08:04, Erik Rijkers wrote:
> On 2016-12-18 11:12, Petr Jelinek wrote:
> 
> (now using latest: patchset:)
> 
> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
> 0004-Add-logical-replication-workers-v14.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
> 
>> BTW you don't need to add primary key to pgbench_history. Simply ALTER
>> TABLE pgbench_history REPLICA IDENTITY FULL; should be enough.
> 
> Either should, but neither is.
> 
> set-up:
> Before creating the publication/subscription:
> On master I run   pgbench -qis 1,  then set replica identity (and/or add
> serial column) for pgbench_history, then dump/restore the 4 pgbench
> tables from master to replica.
> Then enabling publication/subscription.  logs looks well.  (Other tests 
> I've devised earlier (on other tables) still work nicely.)
> 
> Now when I do a pgbench-run on master, something like:
> 
>    pgbench -c 1 -T 20 -P 1
> 
> I often see this (when running pgbench):
> 
> ERROR:  publisher does not send replica identity column expected by the
> logical replication target public.pgbench_tellers
> or, sometimes (less often) the same ERROR for pgbench_accounts appears
> (as in the subscriber log below)
> 
> -- publisher log
> 2016-12-19 07:44:22.738 CET 22690 LOG:  logical decoding found
> consistent point at 0/14598C78
> 2016-12-19 07:44:22.738 CET 22690 DETAIL:  There are no running
> transactions.
> 2016-12-19 07:44:22.738 CET 22690 LOG:  exported logical decoding
> snapshot: "000130FA-1" with 0 transaction IDs
> 2016-12-19 07:44:22.886 CET 22729 LOG:  starting logical decoding for
> slot "sub1"
> 2016-12-19 07:44:22.886 CET 22729 DETAIL:  streaming transactions
> committing after 0/14598CB0, reading WAL from 0/14598C78
> 2016-12-19 07:44:22.886 CET 22729 LOG:  logical decoding found
> consistent point at 0/14598C78
> 2016-12-19 07:44:22.886 CET 22729 DETAIL:  There are no running
> transactions.
> 2016-12-19 07:45:25.568 CET 22729 LOG:  could not receive data from
> client: Connection reset by peer
> 2016-12-19 07:45:25.568 CET 22729 LOG:  unexpected EOF on standby
> connection
> 2016-12-19 07:45:25.580 CET 26696 LOG:  starting logical decoding for
> slot "sub1"
> 2016-12-19 07:45:25.580 CET 26696 DETAIL:  streaming transactions
> committing after 0/1468E0D0, reading WAL from 0/1468DC90
> 2016-12-19 07:45:25.589 CET 26696 LOG:  logical decoding found
> consistent point at 0/1468DC90
> 2016-12-19 07:45:25.589 CET 26696 DETAIL:  There are no running
> transactions.
> 
> -- subscriber log
> 2016-12-19 07:44:22.878 CET 17027 LOG:  starting logical replication
> worker for subscription 24581
> 2016-12-19 07:44:22.883 CET 22726 LOG:  logical replication apply for
> subscription sub1 started
> 2016-12-19 07:45:11.069 CET 22726 WARNING:  leaked hash_seq_search scan
> for hash table 0x2def1a8
> 2016-12-19 07:45:25.566 CET 22726 ERROR:  publisher does not send
> replica identity column expected by the logical replication target
> public.pgbench_accounts
> 2016-12-19 07:45:25.568 CET 16984 LOG:  worker process: logical
> replication worker 24581 (PID 22726) exited with exit code 1
> 2016-12-19 07:45:25.568 CET 17027 LOG:  starting logical replication
> worker for subscription 24581
> 2016-12-19 07:45:25.574 CET 26695 LOG:  logical replication apply for
> subscription sub1 started
> 2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan
> for hash table 0x2def2c8
> 2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan
> for hash table 0x2def2c8
> 2016-12-19 07:46:10.950 CET 26695 WARNING:  leaked hash_seq_search scan
> for hash table 0x2def2c8
> 
> 
> Sometimes  replication (caused by a pgbench run)  runs for a few seconds
> replicating all 4 pgbench tables correctly, but never longer than 10 to
> 20 seconds.
> 
> If you cannot reproduce it with the provided info I will make a more
> precise setup-description, but it's so solidly failing here that I hope
> that won't be necessary.
> 

Hi,

Nope, can't reproduce that. I can reproduce the leaked hash_seq_search;
the attached fixes that. But no issues with replication itself.

The error basically means that the pkey on publisher and subscriber isn't
the same.
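
That check can be sketched as follows (illustrative Python, not the actual apply-worker code; the names are made up): the subscriber verifies that every column of its local replica identity is among the columns the publisher sent, and errors out otherwise.

```python
def missing_identity_columns(sent_columns, local_identity_columns):
    """Return the local replica-identity columns absent from the data
    the publisher sent; a non-empty result corresponds to the
    'publisher does not send replica identity column' error above."""
    sent = set(sent_columns)
    return [c for c in local_identity_columns if c not in sent]
```

So a mismatched primary key, e.g. a pkey column added only on the subscriber, makes the result non-empty and the apply worker bails out.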

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Steve Singer
Date:
On 12/18/2016 09:04 PM, Petr Jelinek wrote:
> On 18/12/16 19:02, Steve Singer wrote:
>
>> pg_dump is also generating warnings
>>
>> pg_dump: [archiver] WARNING: don't know how to set owner for object type
>> SUBSCRIPTION
>>
>> I know that the plan is to add proper ACL's for publications and
>> subscriptions later. I don't know if we want to leave the warning in
>> until then or do something about it.
>>
> No, ACLs are separate from owner. This is thinko on my side. I was
> thinking we can live without ALTER ... OWNER TO for now, but we actually
> need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO
> for both PUBLICATION and SUBSCRIPTION.


When I try to restore my pg_dump with publications I get

./pg_dump  -h localhost --port 5440 test |./psql -h localhost --port 
5440 test2


ALTER TABLE
CREATE PUBLICATION
ERROR:  unexpected command tag "PUBLICATION

This comes from a
ALTER PUBLICATION mypub OWNER TO ssinger;


Does the OWNER TO clause need to be added to AlterPublicationStmt 
instead of AlterOwnerStmt?
Also we should update the tab completion for ALTER PUBLICATION to show 
the OWNER TO option, plus the \h help in psql and the reference SGML.

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 19/12/16 15:39, Steve Singer wrote:
> On 12/18/2016 09:04 PM, Petr Jelinek wrote:
>> On 18/12/16 19:02, Steve Singer wrote:
>>
>>> pg_dump is also generating warnings
>>>
>>> pg_dump: [archiver] WARNING: don't know how to set owner for object type
>>> SUBSCRIPTION
>>>
>>> I know that the plan is to add proper ACL's for publications and
>>> subscriptions later. I don't know if we want to leave the warning in
>>> until then or do something about it.
>>>
>> No, ACLs are separate from owner. This is thinko on my side. I was
>> thinking we can live without ALTER ... OWNER TO for now, but we actually
>> need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO
>> for both PUBLICATION and SUBSCRIPTION.
> 
> 
> When I try to restore my pg_dump with publications I get
> 
> ./pg_dump  -h localhost --port 5440 test |./psql -h localhost --port
> 5440 test2
> 
> 
> ALTER TABLE
> CREATE PUBLICATION
> ERROR:  unexpected command tag "PUBLICATION
> 
> This comes from a
> ALTER PUBLICATION mypub OWNER TO ssinger;
> 
> 
> Does the OWNER TO  clause need to be added to AlterPublicationStmt:
> instead of AlterOwnerStmt ?

Nah, that's just a bug in what command tag string we return in
utility.c; I noticed this myself after sending v14, it's a one-line fix.

> Also we should update the tab-complete for ALTER PUBLICATION to show the
> OWNER to options  + the \h help in psql and the reference SGML
> 

Yeah.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-19 08:04, Erik Rijkers wrote:
> On 2016-12-18 11:12, Petr Jelinek wrote:
> 
> (now using latest: patchset:)
> 
> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
> 0004-Add-logical-replication-workers-v14.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
> 
> Sometimes  replication (caused by a pgbench run)  runs for a few
> seconds replicating all 4 pgbench tables correctly, but never longer
> than 10 to 20 seconds.
> 

I've concocted pgbench_derail.sh.  It assumes 2 instances running, 
initially without the publication and subscription.

There are two separate installations, on the same machine.

To startup the two instances I use instance.sh:

# ./instances.sh
#!/bin/sh
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1} "

options2="
-c wal_level=replica
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2} "
which postgres
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
# end ./instances.sh







#--- pgbench_derail.sh
#!/bin/sh

# assumes both instances are running

# clear logs
# echo > $HOME/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
# echo > $HOME/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2

port1=6972
port2=6973


function cb()
{
  # display the 4 pgbench tables' accumulated content as md5s
  # a,b,t,h stand for: pgbench_accounts, -branches, -tellers, -history
  for port in $port1 $port2
  do
    md5_a=$(echo "select * from pgbench_accounts order by aid"         |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_b=$(echo "select * from pgbench_branches order by bid"         |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_t=$(echo "select * from pgbench_tellers  order by tid"         |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_h=$(echo "select * from pgbench_history  order by aid,bid,tid" |psql -qtAXp$port|md5sum|cut -b 1-9)
    cnt_a=$(echo "select count(*) from pgbench_accounts"|psql -qtAXp $port)
    cnt_b=$(echo "select count(*) from pgbench_branches"|psql -qtAXp $port)
    cnt_t=$(echo "select count(*) from pgbench_tellers" |psql -qtAXp $port)
    cnt_h=$(echo "select count(*) from pgbench_history" |psql -qtAXp $port)
    printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a $cnt_b $cnt_t $cnt_h
    echo -n "   $md5_a  $md5_b  $md5_t  $md5_h"
    if   [[ $port -eq $port1 ]]; then echo "   master"
    elif [[ $port -eq $port2 ]]; then echo "   replica"
    else                              echo "             ERROR"
    fi
  done
}


echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port1 \
  && echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port2 \
  && pgbench -p $port1 -qis 1 \
  && echo "alter table pgbench_history replica identity full;" | psql -1p $port1 \
  && pg_dump -F c -p $port1 \
         -t pgbench_accounts \
         -t pgbench_branches \
         -t pgbench_tellers \
         -t pgbench_history \
    | pg_restore -p $port2 -d testdb

echo  "$(cb)"

sleep 2

echo  "$(cb)"

echo "create publication pub1 for all tables;" | psql -p $port1 -aqtAX

echo "
create subscription sub1  connection 'port=${port1}'  publication pub1  with (disabled);
alter subscription sub1 enable;
" | psql -p $port2 -aqtAX
#------------------------------------

# repeat a short (10 s) pgbench run to show that during such
# short runs the logical replication often remains intact.
# Longer pgbench runs always derail the logrep of one or more
# of these 4 tables
#
# bug:  pgbench_history no longer replicates
#       sometimes also the other 3 tables de-sync.

echo  "$(cb)"
echo "-- pgbench -c 1 -T 10 -P 5 (short run, first)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo  "$(cb)"

echo "-- pgbench -c 1 -T 10 -P 5 (short run, second)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo  "$(cb)"

echo "-- pgbench -c 1 -T 120 -P 15 (long run)"
pgbench -c 1 -T 120 -P 15
sleep 2
echo "-- 60 second (1)"
echo  "$(cb)"
#--- end pgbench_derail.sh
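As an aside, the comparison trick cb() relies on can be shown in isolation: identical row streams hash identically, so comparing truncated md5 sums of each port's output detects divergence. A minimal sketch with made-up sample rows (not real pgbench data):

```shell
# two stand-in "table dumps"; in cb() these come from psql on each port
rows_master="1|100
2|200"
rows_replica="1|100
2|200"

# hash each stream and keep the first 9 hex chars, as cb() does with cut
md5_m=$(printf '%s' "$rows_master" | md5sum | cut -b 1-9)
md5_r=$(printf '%s' "$rows_replica" | md5sum | cut -b 1-9)

# equal digests mean the replica matches the master for this table
[ "$md5_m" = "$md5_r" ] && echo in-sync || echo derailed
```

With identical streams this prints "in-sync"; a single diverging row on either side changes the digest and flags "derailed".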


(Sorry for the messy bash.)

thanks,

Erik Rijkers






Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/12/16 08:10, Erik Rijkers wrote:
> On 2016-12-19 08:04, Erik Rijkers wrote:
>> On 2016-12-18 11:12, Petr Jelinek wrote:
>>
>> (now using latest: patchset:)
>>
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
>> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
>> 0004-Add-logical-replication-workers-v14.patch
>> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
>>
>> Sometimes  replication (caused by a pgbench run)  runs for a few
>> seconds replicating all 4 pgbench tables correctly, but never longer
>> than 10 to 20 seconds.
>>
> 
> I've concocted pgbench_derail.sh.  It assumes 2 instances running,
> initially without the publication and subsciption.
> 
> There are two separate installations, on the same machine.
> 

Thanks, this was very useful. We had wrong attribute-index arithmetic
in the place where we verify that replica identities match well enough.

BTW that script you have for testing has 2 minor flaws in terms of
pgbench_history - the ORDER BY is not unique enough (adding mtime or
something helps) and second, pgbench actually truncates
pgbench_history unless -n is added to the command line.

So attached is v15, which fixes this and the
ERROR:  unexpected command tag "PUBLICATION
as reported by Steve Singer (plus tab completion fixes and doc fixes).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-20 09:43, Petr Jelinek wrote:

> Thanks, this was very useful. We had wrong attribute index arithmetics
> in the place where we verify that replica identities match well enough.

Well, I spent a lot of time on the whole thing so I am glad it's not just
something stupid I did :)

> BTW that script you have for testing has 2 minor flaws in terms of
> pgbench_history - the order by is not unique enough (adding mtime or
> something helps)

yes, in another version I did
   ALTER TABLE pgbench_history ADD COLUMN hid SERIAL PRIMARY KEY.
I suppose that's the best way (adding mtime doesn't work; apparently
mtime gets repeated too).  (I have now added that ALTER TABLE statement
again.)

> and second, the pgbench actually truncates the
> pgbench_history unless -n is added to command line.

ok, -n  added.

> So attached is v15, which fixes this and the
> ERROR:  unexpected command tag "PUBLICATION
> as reported by Steve Singer (plus tab completion fixes and doc fixes).

Great. It seems to fix the problem: I just ran an unprecedented
5-minute run with correct replication.

The first compile gave the attached diffs in the publication regression
test; subsequent compiles went OK (2x). If I have time later today I'll
try to reproduce that one FAILED test, but maybe you can see immediately
what's wrong there.

thanks,

Erik Rijkers





Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/12/16 10:41, Erik Rijkers wrote:
> On 2016-12-20 09:43, Petr Jelinek wrote:
> 
>> Thanks, this was very useful. We had wrong attribute index arithmetics
>> in the place where we verify that replica identities match well enough.
> 
> Well, I spent a lot of time on the whole thing so I am glad it's not just
> something stupid I did :)

Yeah sadly it was something stupid I did ;)

> 
>> BTW that script you have for testing has 2 minor flaws in terms of
>> pgbench_history - the order by is not unique enough (adding mtime or
>> something helps)
> 
> yes, in another version I did
>   ALTER TABLE pgbench_history ADD COLUMN hid SERIAL PRIMARY KEY.
> I suppose that's the best way (adding mtime doesn't work; apparently mtime
> gets repeated too).  (I have now added that alter table-statement  again.)
> 
>> and second, the pgbench actually truncates the
>> pgbench_history unless -n is added to command line.
> 
> ok, -n  added.
> 
>> So attached is v15, which fixes this and the
>> ERROR:  unexpected command tag "PUBLICATION
>> as reported by Steve Singer (plus tab completion fixes and doc fixes).
> 
> Great. It seems to fix the problem: I just ran an unprecedented
> 5-minute run with correct replication.
> 

Great, thanks.

> The first compile gave the attached diffs in the publication regression
> test; subsequent compiles went OK (2x). If I have time later today I'll
> try to reproduce that one FAILED test, but maybe you can see immediately
> what's wrong there.

Seems like the tables are just returned in a different order but otherwise
it's ok. I guess a way to make this more stable would be to add an ORDER BY
to the query psql sends to get the list of tables in the publication.
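The effect can be illustrated without a server: the same set of table names emitted in a different order produces a spurious diff, while sorting the listing (which is what an ORDER BY in psql's catalog query would accomplish) makes it stable. Table names below are just the pgbench ones used earlier in the thread:

```shell
# two runs returning the same set of tables in different order
run1=$(printf 'pgbench_tellers\npgbench_accounts')
run2=$(printf 'pgbench_accounts\npgbench_tellers')

# the raw listings differ even though the sets are equal,
# which is exactly what the regression diff showed
[ "$run1" = "$run2" ] || echo "spurious diff"

# sorting both listings makes the comparison deterministic
[ "$(printf '%s\n' "$run1" | sort)" = "$(printf '%s\n' "$run2" | sort)" ] \
  && echo "stable"
```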

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-20 10:48, Petr Jelinek wrote:

Here is another small thing:

$ psql -d testdb -p 6972
psql (10devel_logical_replication_20161220_1008_db80acfc9d50)
Type "help" for help.

testdb=# drop publication if exists xxx;
ERROR:  unrecognized object type: 28


testdb=# drop subscription if exists xxx;
WARNING:  relcache reference leak: relation "pg_subscription" not closed
DROP SUBSCRIPTION


I don't mind but I suppose eventually other messages need to go there


thanks,

Erik Rijkers



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/12/16 10:56, Erik Rijkers wrote:
> On 2016-12-20 10:48, Petr Jelinek wrote:
> 
> Here is another small thing:
> 
> $ psql -d testdb -p 6972
> psql (10devel_logical_replication_20161220_1008_db80acfc9d50)
> Type "help" for help.
> 
> testdb=# drop publication if exists xxx;
> ERROR:  unrecognized object type: 28
> 
> 
> testdb=# drop subscription if exists xxx;
> WARNING:  relcache reference leak: relation "pg_subscription" not closed
> DROP SUBSCRIPTION
> 
> 
> I don't mind but I suppose eventually other messages need to go there
> 

Yep, attached should fix it.

DDL for completely new db objects surely touches a lot of places.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

I rebased this for the changes made to inheritance and merged in the
fixes that I previously sent separately.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-30 11:53, Petr Jelinek wrote:
> I rebased this for the changes made to inheritance and merged in the

> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz (~31 KB)







Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2016-12-30 11:53, Petr Jelinek wrote:
> I rebased this for the changes made to inheritance and merged in the

> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz (~31 KB)

couple of orthography errors in messages




Attachment

Re: [HACKERS] Logical Replication WIP

From
Steve Singer
Date:
On 12/30/2016 05:53 AM, Petr Jelinek wrote:
> Hi,
>
> I rebased this for the changes made to inheritance and merged in the
> fixes that I previously sent separately.
>
>
>


I'm not sure if the following is expected or not

I have 1 publisher and 1 subscriber.
I then do pg_dump on my subscriber
./pg_dump -h localhost --port 5441 --include-subscriptions 
--no-create-subscription-slot test|./psql --port 5441 test_b

I now can't do a drop database test_b  , which is expected

but I can't drop the subscription either


test_b=# drop subscription mysub;
ERROR:  could not drop replication origin with OID 1, in use by PID 24996
 alter subscription mysub disable;
ALTER SUBSCRIPTION
drop subscription mysub;
ERROR:  could not drop replication origin with OID 1, in use by PID 24996

drop subscription mysub nodrop slot;

doesn't work either.  If I first drop the working/active subscription on 
the original 'test' database it works but I can't seem to drop the 
subscription record on test_b






Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 02/01/17 05:23, Steve Singer wrote:
> On 12/30/2016 05:53 AM, Petr Jelinek wrote:
>> Hi,
>>
>> I rebased this for the changes made to inheritance and merged in the
>> fixes that I previously sent separately.
>>
>>
>>
> 
> 
> I'm not sure if the following is expected or not
> 
> I have 1 publisher and 1 subscriber.
> I then do pg_dump on my subscriber
> ./pg_dump -h localhost --port 5441 --include-subscriptions
> --no-create-subscription-slot test|./psql --port 5441 test_b
> 
> I now can't do a drop database test_b  , which is expected
> 
> but I can't drop the subscription either
> 
> 
> test_b=# drop subscription mysub;
> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
> 
>  alter subscription mysub disable;
> ALTER SUBSCRIPTION
> drop subscription mysub;
> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
> 
> drop subscription mysub nodrop slot;
> 
> doesn't work either.  If I first drop the working/active subscription on
> the original 'test' database it works but I can't seem to drop the
> subscription record on test_b
> 

I guess this is because replication origins are pg-instance global and
we use the subscription name as the origin name internally. Maybe we need
to prefix/suffix it with the db oid or something like that, but that's a
bit problematic as well, as they both have the same length limit. I guess
we could use the subscription OID as the replication origin name, which is
somewhat less user friendly in terms of debugging but would be unique.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,

+static bool
+is_publishable_class(Oid relid, Form_pg_class reltuple)
+{
+       return reltuple->relkind == RELKIND_RELATION &&
+               !IsCatalogClass(relid, reltuple) &&
+               reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
+               /* XXX needed to exclude information_schema tables */
+               relid >= FirstNormalObjectId;
+}

I don't think the XXX part is necessary, because IsCatalogClass()
already checks for the same thing.  (The whole thing is a bit bogus
anyway, because you can drop and recreate the information schema at run
time without restriction.)

+#define MAX_RELCACHE_INVAL_MSGS 100
+       List    *relids = GetPublicationRelations(HeapTupleGetOid(tup));
+
+       /*
+        * We don't want to send too many individual messages, at some point
+        * it's cheaper to just reset whole relcache.
+        *
+        * XXX: the MAX_RELCACHE_INVAL_MSGS was picked arbitrarily, maybe
+        * there is better limit.
+        */
+       if (list_length(relids) < MAX_RELCACHE_INVAL_MSGS)

Do we have more data on this?  There are people running with 100000
tables, and changing a publication with a 1000 tables would blow all
that away?

Maybe at least it should be set relative to INITRELCACHESIZE (400) to
tie things together a bit?

Update the documentation of SharedInvalCatalogMsg in sinval.h for the
"all relations" case.  (Maybe look around the whole file to make sure
comments are still valid.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/3/17 2:39 PM, Peter Eisentraut wrote:
> In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,

Attached are a couple of small fixes for this.  Feel free to ignore the
removal of the header files if they are needed by later patches.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 03/01/17 20:39, Peter Eisentraut wrote:
> In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,
> 
> +static bool
> +is_publishable_class(Oid relid, Form_pg_class reltuple)
> +{
> +       return reltuple->relkind == RELKIND_RELATION &&
> +               !IsCatalogClass(relid, reltuple) &&
> +               reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
> +               /* XXX needed to exclude information_schema tables */
> +               relid >= FirstNormalObjectId;
> +}
> 
> I don't think the XXX part is necessary, because IsCatalogClass()
> already checks for the same thing.  (The whole thing is a bit bogus
> anyway, because you can drop and recreate the information schema at run
> time without restriction.)
>

I got this remark about IsCatalogClass() from Andres offline as well,
but it's not true, it only checks for FirstNormalObjectId for objects in
pg_catalog and toast schemas, not anywhere else.

> +#define MAX_RELCACHE_INVAL_MSGS 100
> +       List    *relids = GetPublicationRelations(HeapTupleGetOid(tup));
> +
> +       /*
> +        * We don't want to send too many individual messages, at some point
> +        * it's cheaper to just reset whole relcache.
> +        *
> +        * XXX: the MAX_RELCACHE_INVAL_MSGS was picked arbitrarily, maybe
> +        * there is better limit.
> +        */
> +       if (list_length(relids) < MAX_RELCACHE_INVAL_MSGS)
> 
> Do we have more data on this?  There are people running with 100000
> tables, and changing a publication with a 1000 tables would blow all
> that away?
> 
> Maybe at least it should be set relative to INITRELCACHESIZE (400) to
> tie things together a bit?
> 

I am actually thinking this should correspond to MAXNUMMESSAGES (4096),
as that's the limit on the buffer size. I didn't find it the first time
around when I was looking for a good number.
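The thresholding rule being discussed can be sketched as follows (the helper name is made up for illustration; 4096 is the MAXNUMMESSAGES figure from the paragraph above): below the limit, sending one relcache invalidation message per relation is cheap; at or above it, resetting the whole relcache is the cheaper option.

```shell
# MAXNUMMESSAGES: size of the sinval message buffer (value from the thread)
max_inval_msgs=4096

# hypothetical helper: decide how to invalidate after a publication
# change that touches $1 relations
inval_strategy() {
  if [ "$1" -lt "$max_inval_msgs" ]; then
    echo per-relation   # one relcache inval message per relation
  else
    echo full-reset     # cheaper to reset the whole relcache
  fi
}

inval_strategy 100     # prints per-relation
inval_strategy 100000  # prints full-reset
```

The point of tying the cutoff to MAXNUMMESSAGES rather than an arbitrary 100 is that it matches the actual capacity of the invalidation buffer, so a publication with a few thousand tables no longer forces a full relcache reset.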

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 03/01/17 22:51, Peter Eisentraut wrote:
> On 1/3/17 2:39 PM, Peter Eisentraut wrote:
>> In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,
> 
> Attached are a couple of small fixes for this.  Feel free to ignore the
> removal of the header files if they are needed by later patches.
> 

Thanks, merged, no they are not needed by other patches.

I also hopefully resolved the concerns you had about the relcache
invalidation and expanded the comment in is_publishable_class to make
the intention there a bit clearer.

Only attached the changed patch, the rest should still apply fine on top
of it.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz:

- Add a get_subscription_name() function

- Remove call for ApplyLauncherWakeupAtCommit() (rebasing error?)

- Remove some unused include files (same as before)

- Rename pg_dump --no-create-subscription-slot to
--no-create-subscription-slots (plural), add documentation.

In CreateSubscription(), I don't think we should connect to the remote
if no slot creation is requested.  Arguably, the point of that option is
to not make network connections.  (That is what my documentation patch
above claims, in any case.)

I don't know why we need to check the PostgreSQL version number of the
remote.  We should rely on the protocol version number, and we should
just make it work.  When PG 11 comes around, subscribing from PG 10 to a
publisher on PG 11 should just work without any warnings, IMO.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
0003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz
looks good now, documentation is clear now.

Another fixup patch to remove excessive includes. ;-)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
Comments on 0004-Add-logical-replication-workers-v16.patch.gz:

I didn't find any major problems.  At times while I was testing strange
things it was not clear why "nothing is happening".  I'll do some more
checking in that direction.

Fixup patch attached that enhances some error messages, fixes some
typos, and other minor changes.  See also comments below.

---

The way check_max_logical_replication_workers() is implemented creates
potential ordering dependencies in postgresql.conf.  For example,

max_logical_replication_workers = 100
max_worker_processes = 200

fails, but if you change the order, it works.  The existing
check_max_worker_processes() has the same problem, but I suspect because
it only checks against MAX_BACKENDS, nobody has ever seriously hit that
limit.

I suggest just removing the check.  If you set
max_logical_replication_workers higher than max_worker_processes and you
hit the lower limit, then whatever is controlling max_worker_processes
should complain with its own error message.

---

The default for max_logical_replication_workers is 4, which seems very
little.  Maybe it should be more like 10 or 20.  The "Quick setup"
section recommends changing it to 10.  We should at least be
consistent there: if you set a default value that is not 0, then it
should be enough that we don't need to change it again in the Quick
setup.  (Maybe the default max_worker_processes should also be
raised?)

+max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber

I think this is incorrect (copied from max_worker_processes?).  The
launcher does not count as one of the workers here.

On a related note, should the minimum not be 0 instead of 1?

---

About the changes to libpqrcv_startstreaming().  The timeline is not
really an option in the syntax.  Just passing in a string that is
pasted in the final command creates too much coupling, I think.  I
would keep the old timeline (TimeLineID tli) argument, and make the
options const char * [], and let startstreaming() assemble the final
string, including commas and parentheses.  It's still not a perfect
abstraction, because you need to do the quoting yourself, but much
better.  (Alternatively, get rid of the startstreaming call and just
have callers use libpqrcv_PQexec directly.)

---

Some of the header files are named inconsistently with their .c files.
I think src/include/replication/logicalworker.h should be split into
logicalapply.h and logicallauncher.h.  Not sure about
worker_internal.h.  Maybe rename apply.c to worker.c?

(I'm also not fond of throwing publicationcmds.h and
subscriptioncmds.h together into replicationcmds.h.  Maybe that could
be changed, too.)

---

Various FATAL errors in logical/relation.c when the target relation is
not in the right state.  Could those not be ERRORs?  The behavior is
the same at the moment because background workers terminate on
uncaught exceptions, but that should eventually be improved.

A FATAL error will lead to a

LOG:  unexpected EOF on standby connection

on the publisher, because the process just dies without protocol
shutdown.  (And then it reconnects and tries again.  So we might as
well not die and just retry again.)

---

In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in
the code using this struct.  (Maybe reloid -> localreloid)

---

Partitioned tables are not supported in either publications or as
replication targets.  This is expected but should be fixed before the
final release.

---

In apply.c:

The comment in apply_handle_relation() makes a point that the schema
validation is done later, but does not tell why.  The answer is
probably because it doesn't matter and it's more convenient, but it
should be explained in the comment.

See XXX comment in logicalrep_worker_stop().

The get_flush_position() return value is not intuitive from the
function name.  Maybe make that another pointer argument for clarity.

reread_subscription() complains if the subscription name was changed.
I don't know why that is a problem.

---

In launcher.c:

pg_stat_get_subscription should hold LogicalRepWorkerLock around the
whole loop, so that it doesn't get inconsistent results when workers
change during the loop.

---

In relation.c:

Inconsistent use of uint32 vs LogicalRepRelId.  Pick one. :)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
0005-Add-separate-synchronous-commit-control-for-logical--v16.patch.gz

This looks a little bit hackish.  I'm not sure how this would behave
properly when either synchronous_commit or
logical_replication_synchronous_commit is changed at run time with a reload.

I'm thinking maybe this and perhaps some other WAL receiver settings
should be properties of a subscription, like ALTER SUBSCRIPTION ...
SET/RESET.

Actually, maybe I'm a bit confused what this is supposed to achieve.
synchronous_commit has both a local and a remote meaning.  What behavior
are the various combinations of physical and logical replication
supposed to accomplish?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/2/17 8:32 AM, Petr Jelinek wrote:
> On 02/01/17 05:23, Steve Singer wrote:
>> but I can't drop the subscription either
>>
>>
>> test_b=# drop subscription mysub;
>> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
>>
>>  alter subscription mysub disable;
>> ALTER SUBSCRIPTION
>> drop subscription mysub;
>> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
>>
>> drop subscription mysub nodrop slot;
>>
>> doesn't work either.  If I first drop the working/active subscription on
>> the original 'test' database it works but I can't seem to drop the
>> subscription record on test_b

I can't reproduce this exactly, but I notice that CREATE SUBSCRIPTION
NOCREATE SLOT does not create a replication origin, but DROP
SUBSCRIPTION NODROP SLOT does attempt to drop the origin.  If the origin
is not in use, it will just go away, but if it is in use, it might lead
to the situation described above, where the second subscription cannot
be removed.

> I guess this is because replication origins are pg instance global and
> we use subscription name for origin name internally. Maybe we need to
> prefix/suffix it with db oid or something like that, but that's
> problematic a bit as well as they both have same length limit. I guess
> we could use subscription OID as replication origin name which is
> somewhat less user friendly in terms of debugging but would be unique.

I think the most robust way would be to associate origins to
subscriptions using the object dependency mechanism, and just pick an
internal name like we do for automatically created indexes or sequences,
for example.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/3/17 5:23 PM, Petr Jelinek wrote:
> I got this remark about IsCatalogClass() from Andres offline as well,
> but it's not true, it only checks for FirstNormalObjectId for objects in
> pg_catalog and toast schemas, not anywhere else.

I see your statement is correct, but I'm not sure the overall behavior
is sensible.  Either we consider the information_schema tables to be
catalog tables, and then IsCatalogClass() should be changed, or we
consider them non-catalog tables, and then we should let them be in
publications.  I don't think having a third category of
sometimes-catalog tables is desirable.

Currently, they clearly behave like non-catalog tables, since you can
just drop and recreate them freely, so I would choose the second option.
It might be worth changing that, but it doesn't have to be the job of
this patch set.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 10/01/17 14:52, Peter Eisentraut wrote:
> On 1/2/17 8:32 AM, Petr Jelinek wrote:
>> On 02/01/17 05:23, Steve Singer wrote:
>>> but I can't drop the subscription either
>>>
>>>
>>> test_b=# drop subscription mysub;
>>> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
>>>
>>>  alter subscription mysub disable;
>>> ALTER SUBSCRIPTION
>>> drop subscription mysub;
>>> ERROR:  could not drop replication origin with OID 1, in use by PID 24996
>>>
>>> drop subscription mysub nodrop slot;
>>>
>>> doesn't work either.  If I first drop the working/active subscription on
>>> the original 'test' database it works but I can't seem to drop the
>>> subscription record on test_b
> 
> I can't reproduce this exactly, but I notice that CREATE SUBSCRIPTION
> NOCREATE SLOT does not create a replication origin, but DROP
> SUBSCRIPTION NODROP SLOT does attempt to drop the origin.  If the origin
> is not in use, it will just go away, but if it is in use, it might lead
> to the situation described above, where the second subscription cannot
> be removed.

This is a thinko in its own regard; the origin needs to be created
regardless of the slot.

> 
>> I guess this is because replication origins are pg instance global and
>> we use subscription name for origin name internally. Maybe we need to
>> prefix/suffix it with db oid or something like that, but that's
>> problematic a bit as well as they both have same length limit. I guess
>> we could use subscription OID as replication origin name which is
>> somewhat less user friendly in terms of debugging but would be unique.
> 
> I think the most robust way would be to associate origins to
> subscriptions using the object dependency mechanism, and just pick an
> internal name like we do for automatically created indexes or sequences,
> for example.
> 

That will not help, issue is that we consider names for origins to be
unique across cluster while subscription names are per database so if
there is origin per subscription (which there has to be) it will always
clash if we just use the name. I already have locally changed this to
pg_<subscription_oid> naming scheme and it works fine.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 10/01/17 15:06, Peter Eisentraut wrote:
> On 1/3/17 5:23 PM, Petr Jelinek wrote:
>> I got this remark about IsCatalogClass() from Andres offline as well,
>> but it's not true, it only checks for FirstNormalObjectId for objects in
>> pg_catalog and toast schemas, not anywhere else.
> 
> I see your statement is correct, but I'm not sure the overall behavior
> is sensible.  Either we consider the information_schema tables to be
> catalog tables, and then IsCatalogClass() should be changed, or we
> consider them non-catalog tables, and then we should let them be in
> publications.  I don't think having a third category of
> sometimes-catalog tables is desirable.
> 
> Currently, they clearly behave like non-catalog tables, since you can
> just drop and recreate them freely, so I would choose the second option.
>  It might be worth changing that, but it doesn't have to be the job of
> this patch set.
> 

Okay, looking into my notes, I originally did this because we did not
allow adding tables without pkeys to publications which effectively
prohibited FOR ALL TABLES publication from working because of
information_schema without this. Since this is no longer the case I
think it's safe to skip the FirstNormalObjectId check.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/11/17 3:11 AM, Petr Jelinek wrote:
> That will not help, issue is that we consider names for origins to be
> unique across cluster while subscription names are per database so if
> there is origin per subscription (which there has to be) it will always
> clash if we just use the name. I already have locally changed this to
> pg_<subscription_oid> naming scheme and it works fine.

How will that make it unique across the cluster?

Should we include the system ID from pg_control?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/11/17 3:29 AM, Petr Jelinek wrote:
> Okay, looking into my notes, I originally did this because we did not
> allow adding tables without pkeys to publications which effectively
> prohibited FOR ALL TABLES publication from working because of
> information_schema without this. Since this is no longer the case I
> think it's safe to skip the FirstNormalObjectId check.

Wouldn't that mean that FOR ALL TABLES replicates the tables from
information_schema?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 11/01/17 18:32, Peter Eisentraut wrote:
> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>> Okay, looking into my notes, I originally did this because we did not
>> allow adding tables without pkeys to publications which effectively
>> prohibited FOR ALL TABLES publication from working because of
>> information_schema without this. Since this is no longer the case I
>> think it's safe to skip the FirstNormalObjectId check.
> 
> Wouldn't that mean that FOR ALL TABLES replicates the tables from
> information_schema?
> 

Yes, as they are not catalog tables, I thought that was your point.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 11/01/17 18:27, Peter Eisentraut wrote:
> On 1/11/17 3:11 AM, Petr Jelinek wrote:
>> That will not help, issue is that we consider names for origins to be
>> unique across cluster while subscription names are per database so if
>> there is origin per subscription (which there has to be) it will always
>> clash if we just use the name. I already have locally changed this to
>> pg_<subscription_oid> naming scheme and it works fine.
> 
> How will that make it unique across the cluster?
> 
> Should we include the system ID from pg_control?
> 

pg_subscription is shared catalog so oids are unique.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/11/17 3:35 PM, Petr Jelinek wrote:
> On 11/01/17 18:27, Peter Eisentraut wrote:
>> On 1/11/17 3:11 AM, Petr Jelinek wrote:
>>> That will not help, issue is that we consider names for origins to be
>>> unique across cluster while subscription names are per database so if
>>> there is origin per subscription (which there has to be) it will always
>>> clash if we just use the name. I already have locally changed this to
>>> pg_<subscription_oid> naming scheme and it works fine.
>>
>> How will that make it unique across the cluster?
>>
>> Should we include the system ID from pg_control?
>>
> 
> pg_subscription is shared catalog so oids are unique.

Oh, I see what you mean by cluster now.  It's a confusing term.


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/11/17 3:35 PM, Petr Jelinek wrote:
> On 11/01/17 18:32, Peter Eisentraut wrote:
>> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>>> Okay, looking into my notes, I originally did this because we did not
>>> allow adding tables without pkeys to publications which effectively
>>> prohibited FOR ALL TABLES publication from working because of
>>> information_schema without this. Since this is no longer the case I
>>> think it's safe to skip the FirstNormalObjectId check.
>>
>> Wouldn't that mean that FOR ALL TABLES replicates the tables from
>> information_schema?
>>
> 
> Yes, as they are not catalog tables, I thought that was your point.

But we shouldn't do that.  So we need to exclude information_schema from
"all tables" somehow.  Just probably not by OID, since that is not fixed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 11/01/17 22:30, Peter Eisentraut wrote:
> On 1/11/17 3:35 PM, Petr Jelinek wrote:
>> On 11/01/17 18:32, Peter Eisentraut wrote:
>>> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>>>> Okay, looking into my notes, I originally did this because we did not
>>>> allow adding tables without pkeys to publications which effectively
>>>> prohibited FOR ALL TABLES publication from working because of
>>>> information_schema without this. Since this is no longer the case I
>>>> think it's safe to skip the FirstNormalObjectId check.
>>>
>>> Wouldn't that mean that FOR ALL TABLES replicates the tables from
>>> information_schema?
>>>
>>
>> Yes, as they are not catalog tables, I thought that was your point.
> 
> But we shouldn't do that.  So we need to exclude information_schema from
> "all tables" somehow.  Just probably not by OID, since that is not fixed.
> 

I am not quite sure I agree with this. Either it's system object and we
don't replicate it (which I would have considered to be anything with
Oid < FirstNormalObjectId) or it's user made and then it should be
replicated. Filtering by schema name is IMHO way too fragile (what stops
user creating additional tables there for example).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 06/01/17 21:26, Peter Eisentraut wrote:
> 0005-Add-separate-synchronous-commit-control-for-logical--v16.patch.gz
> 
> This looks a little bit hackish.  I'm not sure how this would behave
> properly when either synchronous_commit or
> logical_replication_synchronous_commit is changed at run time with a reload.
> 

Yes, I said in the initial email that this is meant for discussion and
not as final implementation. And certainly it's not required for initial
commit. Perhaps I should have started separate thread for this part.

> I'm thinking maybe this and perhaps some other WAL receiver settings
> should be properties of a subscription, like ALTER SUBSCRIPTION ...
> SET/RESET.
>

True, but we still need the GUC defaults.

> Actually, maybe I'm a bit confused what this is supposed to achieve.
> synchronous_commit has both a local and a remote meaning.  What behavior
> are the various combinations of physical and logical replication
> supposed to accomplish?
> 

It's meant to decouple the synchronous commit setting for logical
replication workers from the one set for normal clients. Now that we
have owners for subscription and subscription runs as that owner, maybe
we could do that via ALTER USER. However I think the apply should by
default run with sync commit turned off as the performance benefits are
important there given that there is one worker that has to replicate in
serialized manner and the success of replication is not confirmed by
responding to COMMIT but by reporting LSNs of various replication stages.

Perhaps the logical_replication_synchronous_commit should only be
boolean that would translate to 'off' and 'local' for the real
synchronous_commit.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Euler Taveira
Date:
On 15-01-2017 15:13, Petr Jelinek wrote:
> I am not quite sure I agree with this. Either it's system object and we
> don't replicate it (which I would have considered to be anything with
> Oid < FirstNormalObjectId) or it's user made and then it should be
> replicated. Filtering by schema name is IMHO way too fragile (what stops
> user creating additional tables there for example).
> 
What happens if you replicate information_schema tables? AFAICS, those
tables are already in the subscriber database. And will it generate
error or warning? (I'm not sure how this functionality deals with
schemas.) Also, why do I want to replicate an information schema table?
Their contents are static and, by default, it is already in each database.

Information schema isn't a catalog but I think it is good to exclude it
from FOR ALL TABLES clause because the use case is almost zero. Of
course, it should be documented. Also, if someone wants to replicate an
information schema table, it could do it with ALTER PUBLICATION.


-- 
  Euler Taveira                  Timbira - http://www.timbira.com.br/
  PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 15/01/17 20:20, Euler Taveira wrote:
> On 15-01-2017 15:13, Petr Jelinek wrote:
>> I am not quite sure I agree with this. Either it's system object and we
>> don't replicate it (which I would have considered to be anything with
>> Oid < FirstNormalObjectId) or it's user made and then it should be
>> replicated. Filtering by schema name is IMHO way too fragile (what stops
>> user creating additional tables there for example).
>>
> What happens if you replicate information_schema tables? AFAICS, those
> tables are already in the subscriber database. And will it generate
> error or warning? (I'm not sure how this functionality deals with
> schemas.) Also, why do I want to replicate an information schema table?
> Their contents are static and, by default, it is already in each database.
> 
> Information schema isn't a catalog but I think it is good to exclude it
> from FOR ALL TABLES clause because the use case is almost zero. Of
> course, it should be documented. Also, if someone wants to replicate an
> information schema table, it could do it with ALTER PUBLICATION.
> 

Well the preinstalled information_schema is excluded by the
FirstNormalObjectId filter as it's created by initdb. If user drops and
recreates it that means it was created as user object.

My opinion is that FOR ALL TABLES should replicate all user tables (ie,
anything that has Oid >= FirstNormalObjectId), if those are added to
information_schema that's up to user. We also replicate user created
tables in pg_catalog even if it's system catalog so I don't see why
information_schema should be filtered on schema level.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

finally got to this (multiple emails squashed into one).

On 04/01/17 18:46, Peter Eisentraut wrote:
> Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz:
> 

Merged thanks.

> In CreateSubscription(), I don't think we should connect to the remote
> if no slot creation is requested.  Arguably, the point of that option is
> to not make network connections.  (That is what my documentation patch
> above claims, in any case.)
> 

Agreed and done.

> I don't know why we need to check the PostgreSQL version number of the
> remote.  We should rely on the protocol version number, and we should
> just make it work.  When PG 11 comes around, subscribing from PG 10 to a
> publisher on PG 11 should just work without any warnings, IMO.
> 

Also agreed and removed.

> 003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz
> looks good now, documentation is clear now.
> 
> Another fixup patch to remove excessive includes. 

Thanks merged.

> Comments on 0004-Add-logical-replication-workers-v16.patch.gz:
> 
> I didn't find any major problems.  At times while I was testing strange
> things it was not clear why "nothing is happening".  I'll do some more
> checking in that direction.
> 
> Fixup patch attached that enhances some error messages, fixes some
> typos, and other minor changes.  See also comments below.
> 

Merged.

> 
> The way check_max_logical_replication_workers() is implemented creates
> potential ordering dependencies in postgresql.conf.  For example,
> 
> max_logical_replication_workers = 100
> max_worker_processes = 200
> 
> fails, but if you change the order, it works.  The existing
> check_max_worker_processes() has the same problem, but I suspect because
> it only checks against MAX_BACKENDS, nobody has ever seriously hit that
> limit.
> 
> I suggest just removing the check.  If you set
> max_logical_replication_workers higher than max_worker_processes and you
> hit the lower limit, then whatever is controlling max_worker_processes
> should complain with its own error message.
> 

Good point, removed.

> 
> The default for max_logical_replication_workers is 4, which seems very
> little.  Maybe it should be more like 10 or 20.  The "Quick setup"
> section recommends changing it to 10.  We should at least be
> consistent there: If you set a default value that is not 0, then it
> should be enough that we don't need to change it again in the Quick
> setup.  (Maybe the default max_worker_processes should also be
> raised?)

Well, it's 4 because max_worker_processes is 8, I think default
max_worker_processes should be higher than
max_logical_replication_workers so that's why I picked 4. If we are okay
with bumping the max_worker_processes a bit, I am all for increasing
max_logical_replication_workers as well.

The quick setup mentions 10 mainly for consistency with slots and wal
senders (those IMHO should also not be 0 by default at this point...).

> 
> +max_logical_replication_workers = 10 # one per subscription + one per
> instance needed on subscriber
> 
> I think this is incorrect (copied from max_worker_processes?).  The
> launcher does not count as one of the workers here.
> 
> On a related note, should the minimum not be 0 instead of 1?
> 

Eh, yes.

> 
> About the changes to libpqrcv_startstreaming().  The timeline is not
> really an option in the syntax.  Just passing in a string that is
> pasted in the final command creates too much coupling, I think.  I
> would keep the old timeline (TimeLineID tli) argument, and make the
> options const char * [], and let startstreaming() assemble the final
> string, including commas and parentheses.  It's still not a perfect
> abstraction, because you need to do the quoting yourself, but much
> better.  (Alternatively, get rid of the startstreaming call and just
> have callers use libpqrcv_PQexec directly.)
> 

I did this somewhat differently, with struct that defines options and
has different union members for physical and logical replication. What
do you think of that?

> 
> Some of the header files are named inconsistently with their .c files.
> I think src/include/replication/logicalworker.h should be split into
> logicalapply.h and logicallauncher.h.

Okay.

>  Not sure about
> worker_internal.h.  Maybe rename apply.c to worker.c?
> 

Hmm I did that, seems reasonably okay. Original patch in fact had both
worker.c and apply.c and I eventually moved the worker.c functions to
either apply.c or launcher.c.

> (I'm also not fond of throwing publicationcmds.h and
> subscriptioncmds.h together into replicationcmds.h.  Maybe that could
> be changed, too)

Okay.

> 
> Various FATAL errors in logical/relation.c when the target relation is
> not in the right state.  Could those not be ERRORs?  The behavior is
> the same at the moment because background workers terminate on
> uncaught exceptions, but that should eventually be improved.
> 

Seems like you changed this in your patch. I don't have any objections.

> 
> In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in
> the code using this struct.  (Maybe reloid -> localreloid)
> 

Okay.

> 
> Partitioned tables are not supported in either publications or as
> replication targets.  This is expected but should be fixed before the
> final release.
> 

Yes, that will need some discussion about corner case behaviour. For
example, have partitioned table 'foo' which is in publication, then you
have table 'bar' which is not in publication, you attach it to the
partitioned table 'foo', should it automatically be added to
publication? Then you detach it, should it then be removed from publication?
What if 'bar' was in publication before it was attached/detached to/from
'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
allow ONLY syntax for partitioned table when they are being added and
removed?

Sadly current partitioning section of the docs doesn't provide any
guidance in terms of precedents for other actions here as it still
speaks about using inheritance and check constraints directly instead of
the new feature.

My proposal would be to let partitions to be added/removed to/from
publications normally (as they are now) and have them also check if
parent table is published in case they aren't (ie, if partitioned table
is in some publications, all partitions are implicitly as well without
adding them to the pg_publication_rel catalog, but they also keep their
own membership in publications as well individually there). That would
mean we don't allow ONLY syntax for partitioned tables. One scenario
where I am on the fence is what should happen here if we do ALTER
PUBLICATION ... DROP TABLE partitioned_table in case that
partitioned_table also contains partition which was explicitly added to
the publication, should it keep its own membership or should it be
removed? Maybe we could allow the ONLY clause only for DROP but not for ADD?

> 
> In apply.c:
> 
> The comment in apply_handle_relation() makes a point that the schema
> validation is done later, but does not tell why.  The answer is
> probably because it doesn't matter and it's more convenient, but it
> should be explained in the comment.

Yes I noticed, I tried to explain.

> 
> See XXX comment in logicalrep_worker_stop().

Yes that was a good point.

> 
> The get_flush_position() return value is not intuitive from the
> function name.  Maybe make that another pointer argument for clarity.

Okay.

> reread_subscription() complains if the subscription name was changed.
> I don't know why that is a problem.

Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
be Assert?

> 
> In launcher.c:
> 
> pg_stat_get_subscription should hold LogicalRepWorkerLock around the
> whole loop, so that it doesn't get inconsistent results when workers
> change during the loop.
> 

Done.

> In relation.c:
> 
> Inconsistent use of uint32 vs LogicalRepRelId.  Pick one. 

Done.

Attached is new version with your changes merged and above suggestions
applied. It still does not support partitioned tables and does the
filtering using FirstNormalObjectId.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Erik Rijkers
Date:
On 2017-01-15 23:20, Petr Jelinek wrote:

> 0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
> 0004-Add-logical-replication-workers-v18.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch

patches apply OK (to master), but I get this compile error:

execReplication.c: In function ‘ExecSimpleRelationInsert’:
execReplication.c:392:41: warning: passing argument 3 of ‘ExecConstraints’ from incompatible pointer type [-Wincompatible-pointer-types]
    ExecConstraints(resultRelInfo, slot, estate);
                                         ^~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: expected ‘TupleTableSlot * {aka struct TupleTableSlot *}’ but argument is of type ‘EState * {aka struct EState *}’
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
execReplication.c:392:4: error: too few arguments to function ‘ExecConstraints’
    ExecConstraints(resultRelInfo, slot, estate);
    ^~~~~~~~~~~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: declared here
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~

execReplication.c: In function ‘ExecSimpleRelationUpdate’:
execReplication.c:451:41: warning: passing argument 3 of ‘ExecConstraints’ from incompatible pointer type [-Wincompatible-pointer-types]
    ExecConstraints(resultRelInfo, slot, estate);
                                         ^~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: expected ‘TupleTableSlot * {aka struct TupleTableSlot *}’ but argument is of type ‘EState * {aka struct EState *}’
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
execReplication.c:451:4: error: too few arguments to function ‘ExecConstraints’
    ExecConstraints(resultRelInfo, slot, estate);
    ^~~~~~~~~~~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: declared here
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
make[3]: *** [execReplication.o] Error 1
make[2]: *** [executor-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2



Erik Rijkers






Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 15/01/17 23:57, Erik Rijkers wrote:
> On 2017-01-15 23:20, Petr Jelinek wrote:
> 
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
>> 0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
>> 0004-Add-logical-replication-workers-v18.patch
>> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch
> 
> patches apply OK (to master), but I get this compile error:
> 

Ah missed that during final rebase, sorry. Here is fixed 0004 patch.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/15/17 2:28 PM, Petr Jelinek wrote:
> Well the preinstalled information_schema is excluded by the
> FirstNormalObjectId filter as it's created by initdb. If user drops and
> recreates it that means it was created as user object.
> 
> My opinion is that FOR ALL TABLES should replicate all user tables (ie,
> anything that has Oid >= FirstNormalObjectId), if those are added to
> information_schema that's up to user. We also replicate user created
> tables in pg_catalog even if it's system catalog so I don't see why
> information_schema should be filtered on schema level.

Fair enough.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/15/17 5:20 PM, Petr Jelinek wrote:
> Well, it's 4 because max_worker_processes is 8, I think default
> max_worker_processes should be higher than
> max_logical_replication_workers so that's why I picked 4. If we are okay
> with bumping the max_worker_processes a bit, I am all for increasing
> max_logical_replication_workers as well.
> 
> The quick setup mentions 10 mainly for consistency with slots and wal
> senders (those IMHO should also not be 0 by default at this point...).

Those defaults have now been changed, so the "Quick setup" section could
potentially be simplified a bit.

> I did this somewhat differently, with struct that defines options and
> has different union members for physical and logical replication. What
> do you think of that?

Looks good.

>>  Not sure about
>> worker_internal.h.  Maybe rename apply.c to worker.c?
>>
> 
> Hmm I did that, seems reasonably okay. Original patch in fact had both
> worker.c and apply.c and I eventually moved the worker.c functions to
> either apply.c or launcher.c.

I'm not too worried about this.

> Yes, that will need some discussion about corner case behaviour. For
> example, have partitioned table 'foo' which is in publication, then you
> have table 'bar' which is not in publication, you attach it to the
> partitioned table 'foo', should it automatically be added to
> publication? Then you detach it, should it then be removed from publication?
> What if 'bar' was in publication before it was attached/detached to/from
> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
> allow ONLY syntax for partitioned table when they are being added and
> removed?

Let's think about that in a separate thread.

>> reread_subscription() complains if the subscription name was changed.
>> I don't know why that is a problem.
> 
> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
> be Assert?

Is there anything stopping anyone from implementing it?


I'm happy with these patches now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/15/17 1:48 PM, Petr Jelinek wrote:
> It's meant to decouple the synchronous commit setting for logical
> replication workers from the one set for normal clients. Now that we
> have owners for subscription and subscription runs as that owner, maybe
> we could do that via ALTER USER.

I was thinking about that as well.

> However I think the apply should by
> default run with sync commit turned off as the performance benefits are
> important there given that there is one worker that has to replicate in
> serialized manner and the success of replication is not confirmed by
> responding to COMMIT but by reporting LSNs of various replication stages.

Hmm, I don't think we should ship with an "unsafe" default.  Do we have
any measurements of the performance impact?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 17/01/17 17:09, Peter Eisentraut wrote:
> 
>> Yes, that will need some discussion about corner case behaviour. For
>> example, have partitioned table 'foo' which is in publication, then you
>> have table 'bar' which is not in publication, you attach it to the
>> partitioned table 'foo', should it automatically be added to
>> publication? Then you detach it, should it then be removed from publication?
>> What if 'bar' was in publication before it was attached/detached to/from
>> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
>> allow ONLY syntax for partitioned table when they are being added and
>> removed?
> 
> Let's think about that in a separate thread.
> 

Agreed.

>>> reread_subscription() complains if the subscription name was changed.
>>> I don't know why that is a problem.
>>
>> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
>> be Assert?
> 
> Is there anything stopping anyone from implementing it?
> 

No, just didn't seem priority for the functionality right now.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 17/01/17 17:11, Peter Eisentraut wrote:
> On 1/15/17 1:48 PM, Petr Jelinek wrote:
>> It's meant to decouple the synchronous commit setting for logical
>> replication workers from the one set for normal clients. Now that we
>> have owners for subscription and subscription runs as that owner, maybe
>> we could do that via ALTER USER.
> 
> I was thinking about that as well.
> 
>> However I think the apply should by
>> default run with sync commit turned off as the performance benefits are
>> important there given that there is one worker that has to replicate in
>> serialized manner and the success of replication is not confirmed by
>> responding to COMMIT but by reporting LSNs of various replication stages.
> 
> Hmm, I don't think we should ship with an "unsafe" default.  Do we have
> any measurements of the performance impact?
> 

I will have to do some for the patch specifically, I only have ones for
pglogical/bdr where it's quite significant.

The default is not unsafe really, we still report correct flush position
to the publisher. The synchronous replication on publisher will still
work even if synchronous standby is subscription which itself has sync
commit off (that's why the complicated
send_feedback()/get_flush_position()) but will have higher latency as
flushes don't happen immediately. Cascading should be fine as well even
around crashes as logical decoding only picks up flushed WAL.

It could however be argued that there may be some consistency issues
around a crash, as other transactions could have already seen data that
disappeared after postgres recovery and then reappeared when the
replication caught up again. That might indeed be a show stopper for
defaulting to off.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Robert Haas
Date:
On Tue, Jan 17, 2017 at 11:15 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
>> Is there anything stopping anyone from implementing it?
>
> No, just didn't seem priority for the functionality right now.

Why is it OK for this to not support rename like everything else does?
It shouldn't be more than a few hours of work to fix that, and I
think leaving stuff like that out just because it's a lower priority
is fairly short-sighted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 17/01/17 22:43, Robert Haas wrote:
> On Tue, Jan 17, 2017 at 11:15 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>>> Is there anything stopping anyone from implementing it?
>>
>> No, just didn't seem priority for the functionality right now.
> 
> Why is it OK for this to not support rename like everything else does?
>  It shouldn't be more than a few hours of work to fix that, and I
> think leaving stuff like that out just because it's a lower priority
> is fairly short-sighted.
> 

Sigh, I wanted to leave it for next CF, but since you insist. Here is a
patch that adds rename.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On 2017-01-19 01:02, Petr Jelinek wrote:

This causes the replica to crash:

#--------------
#!/bin/bash

# 2 instances on 6972 (master) and 6973 (replica)
# initially without publication or subscription

# clean logs
#echo > 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
#echo > 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2

SLEEP=1
bail=0
pub_count=$( echo "select count(*) from pg_publication" | psql -qtAXp 6972 )
if [[ $pub_count -ne 0 ]]
then
  echo "pub_count -ne 0 - deleting pub1 & bailing out"
  echo "drop publication if exists pub1" | psql -Xp 6972
  bail=1
fi
sub_count=$( echo "select count(*) from pg_subscription" | psql -qtAXp 6973 )
if [[ $sub_count -ne 0 ]]
then
  echo "sub_count -ne 0 - deleting sub1 & bailing out"
  echo "drop subscription if exists sub1" | psql -Xp 6973
  bail=1
fi

if [[ $bail -eq 1 ]]
then
  exit -1
fi

echo "drop table if exists testt;" | psql -qXap 6972
echo "drop table if exists testt;" | psql -qXap 6973

echo "-- on master  (port 6972):
create table testt(id serial primary key, n integer, c text);
create publication pub1 for all tables; " | psql -qXap 6972

echo "-- on replica (port 6973):
create table testt(id serial primary key, n integer, c text);
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);
alter  subscription sub1 enable; "        | psql -qXap 6973

sleep $SLEEP

echo "table testt /*limit 3*/; select current_setting('port'), count(*) 
from testt;" | psql -qXp 6972
echo "table testt /*limit 3*/; select current_setting('port'), count(*) 
from testt;" | psql -qXp 6973

echo "-- now crash:
analyze pg_subscription" | psql -qXp 6973
#--------------



-- log of the replica:

2017-01-19 17:54:09.163 CET 224200 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:09.166 CET 21166 LOG:  logical replication apply for 
subscription sub1 started
2017-01-19 17:54:09.169 CET 21166 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:09.172 CET 21171 LOG:  logical replication sync for 
subscription sub1, table testt started
2017-01-19 17:54:09.190 CET 21171 LOG:  logical replication 
synchronization worker finished processing
TRAP: FailedAssertion("!(((array)->elemtype) == extra_data->type_id)", 
File: "array_typanalyze.c", Line: 340)
2017-01-19 17:54:20.110 CET 224190 LOG:  server process (PID 21183) was 
terminated by signal 6: Aborted
2017-01-19 17:54:20.110 CET 224190 DETAIL:  Failed process was running: 
autovacuum: ANALYZE pg_catalog.pg_subscription
2017-01-19 17:54:20.110 CET 224190 LOG:  terminating any other active 
server processes
2017-01-19 17:54:20.110 CET 224198 WARNING:  terminating connection 
because of crash of another server process
2017-01-19 17:54:20.110 CET 224198 DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted 
shared memory.
2017-01-19 17:54:20.110 CET 224198 HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
2017-01-19 17:54:20.111 CET 224190 LOG:  all server processes 
terminated; reinitializing
2017-01-19 17:54:20.143 CET 21184 LOG:  database system was interrupted; 
last known up at 2017-01-19 17:38:48 CET
2017-01-19 17:54:20.179 CET 21184 LOG:  recovered replication state of 
node 1 to 0/2CEBF08
2017-01-19 17:54:20.179 CET 21184 LOG:  database system was not properly 
shut down; automatic recovery in progress
2017-01-19 17:54:20.181 CET 21184 LOG:  redo starts at 0/2513E88
2017-01-19 17:54:20.184 CET 21184 LOG:  invalid record length at 
0/2546980: wanted 24, got 0
2017-01-19 17:54:20.184 CET 21184 LOG:  redo done at 0/2546918
2017-01-19 17:54:20.184 CET 21184 LOG:  last completed transaction was 
at log time 2017-01-19 17:54:09.191697+01
2017-01-19 17:54:20.191 CET 21184 LOG:  MultiXact member wraparound 
protections are now enabled
2017-01-19 17:54:20.193 CET 224190 LOG:  database system is ready to 
accept connections
2017-01-19 17:54:20.193 CET 21188 LOG:  autovacuum launcher started
2017-01-19 17:54:20.194 CET 21190 LOG:  logical replication launcher 
started
2017-01-19 17:54:20.194 CET 21190 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:20.202 CET 21191 LOG:  logical replication apply for 
subscription sub1 started



Could probably be whittled down to something shorter but I hope it's 
still easily reproduced.


thanks,

Erik Rijkers





setup of the 2 instances:


#---------------- ./instances.sh
#!/bin/bash
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
# pg_stuff_dir=$HOME/pg_stuff
pg_stuff_dir=/var/data1/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1} "

options2="
-c wal_level=replica
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=10
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2} "
which postgres
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
#---------------- ./instances.sh end




On 19/01/17 18:44, Erik Rijkers wrote:
> 
> Could probably be whittled down to something shorter but I hope it's
> still easily reproduced.
> 

Just analyze on the pg_subscription is enough. Looks like it's the
name[] type, when I change it to text[] like in the attached patch it
works fine for me.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On 2017-01-19 19:12, Petr Jelinek wrote:
> On 19/01/17 18:44, Erik Rijkers wrote:
>> 
>> Could probably be whittled down to something shorter but I hope it's
>> still easily reproduced.
>> 
> 
> Just analyze on the pg_subscription is enough.


heh. Ah well, I did find it :)


Can you give the current patch set? I am failing to get a compilable 
set.

In the following order they apply, but then fail during compile.

0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v3.patch
pg_subscription-analyze-fix.diff

The compile fails with:

In file included from ../../../../src/include/postgres.h:47:0,
                 from worker.c:27:
worker.c: In function ‘create_estate_for_relation’:
../../../../src/include/c.h:203:14: warning: passing argument 4 of ‘InitResultRelInfo’ makes pointer from integer without a cast [-Wint-conversion]
 #define true ((bool) 1)
              ^
worker.c:187:53: note: in expansion of macro ‘true’
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
                                                     ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected ‘Relation {aka struct RelationData *}’ but argument is of type ‘char’
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
worker.c:187:59: warning: passing argument 5 of ‘InitResultRelInfo’ makes integer from pointer without a cast [-Wint-conversion]
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
                                                           ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected ‘int’ but argument is of type ‘void *’
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
worker.c:187:2: error: too many arguments to function ‘InitResultRelInfo’
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
  ^~~~~~~~~~~~~~~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: declared here
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
make[4]: *** [worker.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [logical-recursive] Error 2
make[2]: *** [replication-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2




but perhaps that patchset itself is incorrect, or the order in which I 
applied them.

Can you please put them in the right order?  (I tried already a few...)


thanks,


Erik Rijkers




Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

There were some conflicting changes committed today so I rebased the
patch on top of them.

Other than that nothing much has changed, I removed the separate sync
commit patch, included the rename patch in the patchset and fixed the
bug around pg_subscription catalog reported by Erik Rijkers.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/19/17 5:01 PM, Petr Jelinek wrote:
> There were some conflicting changes committed today so I rebased the
> patch on top of them.
> 
> Other than that nothing much has changed, I removed the separate sync
> commit patch, included the rename patch in the patchset and fixed the
> bug around pg_subscription catalog reported by Erik Rijkers.

Committed.  I haven't reviewed the rename patch yet, so I'll get back to
that later.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Fujii Masao
Date:
On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>> There were some conflicting changes committed today so I rebased the
>> patch on top of them.
>>
>> Other than that nothing much has changed, I removed the separate sync
>> commit patch, included the rename patch in the patchset and fixed the
>> bug around pg_subscription catalog reported by Erik Rijkers.
>
> Committed.

Sorry I've not followed the discussion about logical replication at all, but
why does logical replication launcher need to start up by default?
    $ initdb -D data
    $ pg_ctl -D data start

When I ran the above commands, I got the following message and
found that the bgworker for logical replication launcher was running.
   LOG:  logical replication launcher started

Regards,

-- 
Fujii Masao



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/01/17 15:08, Peter Eisentraut wrote:
> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>> There were some conflicting changes committed today so I rebased the
>> patch on top of them.
>>
>> Other than that nothing much has changed, I removed the separate sync
>> commit patch, included the rename patch in the patchset and fixed the
>> bug around pg_subscription catalog reported by Erik Rijkers.
> 
> Committed.  I haven't reviewed the rename patch yet, so I'll get back to
> that later.
> 

Hi,

Thanks!

Here is fix for the dependency mess.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/01/17 17:05, Fujii Masao wrote:
> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>> There were some conflicting changes committed today so I rebased the
>>> patch on top of them.
>>>
>>> Other than that nothing much has changed, I removed the separate sync
>>> commit patch, included the rename patch in the patchset and fixed the
>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>
>> Committed.
> 
> Sorry I've not followed the discussion about logical replication at all, but
> why does logical replication launcher need to start up by default?
> 

Because running subscriptions is allowed by default. You'd need to set
max_logical_replication_workers to 0 to disable that.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Jaime Casanova
Date:
On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> On 20/01/17 17:05, Fujii Masao wrote:
>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>>> There were some conflicting changes committed today so I rebased the
>>>> patch on top of them.
>>>>
>>>> Other than that nothing much has changed, I removed the separate sync
>>>> commit patch, included the rename patch in the patchset and fixed the
>>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>>
>>> Committed.
>>
>> Sorry I've not followed the discussion about logical replication at all, but
>> why does logical replication launcher need to start up by default?
>>
>
> Because running subscriptions is allowed by default. You'd need to set
> max_logical_replication_workers to 0 to disable that.
>

surely wal_level < logical shouldn't start a logical replication
launcher, and after an initdb wal_level is only replica


-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/01/17 17:33, Jaime Casanova wrote:
> On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>> On 20/01/17 17:05, Fujii Masao wrote:
>>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>>>> There were some conflicting changes committed today so I rebased the
>>>>> patch on top of them.
>>>>>
>>>>> Other than that nothing much has changed, I removed the separate sync
>>>>> commit patch, included the rename patch in the patchset and fixed the
>>>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>>>
>>>> Committed.
>>>
>>> Sorry I've not followed the discussion about logical replication at all, but
>>> why does logical replication launcher need to start up by default?
>>>
>>
>> Because running subscriptions is allowed by default. You'd need to set
>> max_logical_replication_workers to 0 to disable that.
>>
> 
> surely wal_level < logical shouldn't start a logical replication
> launcher, and after an initdb wal_level is only replica
> 

Launcher is needed for subscriptions, subscriptions don't depend on
wal_level.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Jaime Casanova
Date:
On 20 January 2017 at 11:39, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> On 20/01/17 17:33, Jaime Casanova wrote:
>>>
>>
>> surely wal_level < logical shouldn't start a logical replication
>> launcher, and after an initdb wal_level is only replica
>>
>
> Launcher is needed for subscriptions, subscriptions don't depend on
> wal_level.
>

mmm... ok, i need to read a little then. thanks

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 11:39 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> Launcher is needed for subscriptions, subscriptions don't depend on
> wal_level.

I don't see how a subscription can do anything useful with wal_level < logical?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Logical Replication WIP

From
Craig Ringer
Date:


On 21 Jan. 2017 06:48, "Robert Haas" <robertmhaas@gmail.com> wrote:
On Fri, Jan 20, 2017 at 11:39 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> Launcher is needed for subscriptions, subscriptions don't depend on
> wal_level.

I don't see how a subscription can do anything useful with wal_level < logical?

The upstream must have it set to logical so we can decode the change stream.

The downstream need not. It's an independent instance. 

Re: [HACKERS] Logical Replication WIP

From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 2:57 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > I don't see how a subscription can do anything useful with wal_level <
> > logical?
>
> The upstream must have it set to logical so we can decide the change stream.
>
> The downstream need not. It's an independent instance.

/me facepalms.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/01/17 17:23, Petr Jelinek wrote:
> On 20/01/17 15:08, Peter Eisentraut wrote:
>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>> There were some conflicting changes committed today so I rebased the
>>> patch on top of them.
>>>
>>> Other than that nothing much has changed, I removed the separate sync
>>> commit patch, included the rename patch in the patchset and fixed the
>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>
>> Committed.  I haven't reviewed the rename patch yet, so I'll get back to
>> that later.
>>
> 
> Hi,
> 
> Thanks!
> 
> Here is fix for the dependency mess.
> 

Álvaro pointed out off-list a couple of issues with how we handle
interruption of commands that connect to the walsender.

a) libpqwalreceiver.c does a blocking connect, so it's impossible to
cancel a CREATE SUBSCRIPTION which is stuck on connect. This is btw a
preexisting problem and applies to walreceiver as well. I rewrote the
connect function to use the asynchronous API (patch 0001).

b) We can cancel in the middle of the command (when stuck in
libpqrcv_PQexec), but the connection to the walsender stays open, which
in case we are waiting for a snapshot can mean that it will stay idle in
transaction. I added a PG_TRY wrapper around this which disconnects on
error (patch 0002).

And finally, while testing these two I found a bug in walsender StringInfo
initialization (or lack thereof). There are 3 static StringInfo buffers
that are initialized in WalSndLoop. The problem is that in some rare
scenarios they can be used from CreateReplicationSlot (and IMHO
StartLogicalReplication) before WalSndLoop is called, which causes a
segfault of the walsender. This is rare because it only happens when the
downstream closes the connection during logical decoding initialization.

Since it's not exactly straight forward to find when these need to be
initialized based on commands, I decided to move the initialization code
to exec_replication_command() since that's always called before anything
so that makes it much less error prone (patch 0003).

The 0003 should be backpatched all the way to 9.4 where multiple
commands started using those buffers.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 20/01/17 22:30, Petr Jelinek wrote:
> Since it's not exactly straight forward to find when these need to be
> initialized based on commands, I decided to move the initialization code
> to exec_replication_command() since that's always called before anything
> so that makes it much less error prone (patch 0003).
> 
> The 0003 should be backpatched all the way to 9.4 where multiple
> commands started using those buffers.
> 

Actually there is a better place, WalSndInit().

Just to make it easier for PeterE (or whichever committer picks this up)
I attached all the logical replication followup fix/polish patches:

0001 - Changes the libpqrcv_connect to use async libpq api so that it
won't get stuck forever in case of connect is stuck. This is preexisting
bug that also affects walreceiver but it's less visible there as there
is no SQL interface to initiate connection there.

0002 - Close replication connection when CREATE SUBSCRIPTION gets
canceled (otherwise walsender on the other side may stay in idle in
transaction state).

0003 - Fixes buffer initialization in walsender that I found when
testing the above two. This one should be back-patched to 9.4 since it's
broken since then.

0004 - Fixes the foreign key issue reported by Thom Brown and also adds
tests for FK and trigger handling.

0005 - Adds support for renaming publications and subscriptions.

All rebased on top of current master (90992e0).

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Thom Brown
Date:
On 23 January 2017 at 01:11, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> On 20/01/17 22:30, Petr Jelinek wrote:
>> Since it's not exactly straight forward to find when these need to be
>> initialized based on commands, I decided to move the initialization code
>> to exec_replication_command() since that's always called before anything
>> so that makes it much less error prone (patch 0003).
>>
>> The 0003 should be backpatched all the way to 9.4 where multiple
>> commands started using those buffers.
>>
>
> Actually there is better place, the WalSndInit().
>
> Just to make it easier for PeterE (or whichever committer picks this up)
> I attached all the logical replication followup fix/polish patches:
>
> 0001 - Changes the libpqrcv_connect to use async libpq api so that it
> won't get stuck forever in case of connect is stuck. This is preexisting
> bug that also affects walreceiver but it's less visible there as there
> is no SQL interface to initiate connection there.
>
> 0002 - Close replication connection when CREATE SUBSCRIPTION gets
> canceled (otherwise walsender on the other side may stay in idle in
> transaction state).
>
> 0003 - Fixes buffer initialization in walsender that I found when
> testing the above two. This one should be back-patched to 9.4 since it's
> broken since then.
>
> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
> tests for FK and trigger handling.

This fixes the problem for me.  Thanks.
>
> 0005 - Adds support for renaming publications and subscriptions.

Works for me.

I haven't tested the first 3.

Regards

Thom



Re: [HACKERS] Logical Replication WIP

From
Fujii Masao
Date:
On Sat, Jan 21, 2017 at 1:39 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 20/01/17 17:33, Jaime Casanova wrote:
>> On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>> On 20/01/17 17:05, Fujii Masao wrote:
>>>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
>>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>>>>> There were some conflicting changes committed today so I rebased the
>>>>>> patch on top of them.
>>>>>>
>>>>>> Other than that nothing much has changed, I removed the separate sync
>>>>>> commit patch, included the rename patch in the patchset and fixed the
>>>>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>>>>
>>>>> Committed.
>>>>
>>>> Sorry I've not followed the discussion about logical replication at all, but
>>>> why does logical replication launcher need to start up by default?
>>>>
>>>
>>> Because running subscriptions is allowed by default. You'd need to set
>>> max_logical_replication_workers to 0 to disable that.
>>>
>>
>> surely wal_level < logical shouldn't start a logical replication
>> launcher, and after an initdb wal_level is only replica
>>
>
> Launcher is needed for subscriptions, subscriptions don't depend on
> wal_level.

But why did you enable only subscription by default while publication is
disabled by default (i.e., wal_level != logical)? I think that it's better to
enable both by default OR disable both by default.

While I was reading the logical rep code, I found that
logicalrep_worker_launch returns *without* releasing LogicalRepWorkerLock
when there is no unused worker slot. This seems a bug.
    /* Report this after the initial starting message for consistency. */
    if (max_replication_slots == 0)
        ereport(ERROR,
            (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
            errmsg("cannot start logical replication workers when max_replication_slots = 0")));

logicalrep_worker_launch checks max_replication_slots as above.
Why does it need to check that setting value in the *subscriber* side?
Maybe I'm missing something here, but ISTM that the subscription uses
one replication slot in *publisher* side but doesn't use in *subscriber* side.
    *  The apply worker may spawn additional workers (sync) for initial data
    *  synchronization of tables.

The above header comment in logical/worker.c is true?

The copyright in each file that the commit of logical rep added needs to
be updated.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 23/01/17 17:19, Fujii Masao wrote:
> On Sat, Jan 21, 2017 at 1:39 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> On 20/01/17 17:33, Jaime Casanova wrote:
>>> On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>>> On 20/01/17 17:05, Fujii Masao wrote:
>>>>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut
>>>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>>>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>>>>>> There were some conflicting changes committed today so I rebased the
>>>>>>> patch on top of them.
>>>>>>>
>>>>>>> Other than that nothing much has changed, I removed the separate sync
>>>>>>> commit patch, included the rename patch in the patchset and fixed the
>>>>>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>>>>>
>>>>>> Committed.
>>>>>
>>>>> Sorry I've not followed the discussion about logical replication at all, but
>>>>> why does logical replication launcher need to start up by default?
>>>>>
>>>>
>>>> Because running subscriptions is allowed by default. You'd need to set
>>>> max_logical_replication_workers to 0 to disable that.
>>>>
>>>
>>> surely wal_level < logical shouldn't start a logical replication
>>> launcher, and after an initdb wal_level is only replica
>>>
>>
>> Launcher is needed for subscriptions, subscriptions don't depend on
>> wal_level.
> 
> But why did you enable only subscription by default while publication is
> disabled by default (i.e., wal_level != logical)? I think that it's better to
> enable both by default OR disable both by default.
> 

That depends, the wal_level = logical default was deemed to not be
worth the potential overhead in the thread about wal_level defaults. There
is no such overhead associated with enabling subscriptions, and one could
say it's less work this way to set up the whole thing. But I guess it's up
for debate.

> While I was reading the logical rep code, I found that
> logicalrep_worker_launch returns *without* releasing LogicalRepWorkerLock
> when there is no unused worker slot. This seems a bug.

True, fix attached.

> 
>     /* Report this after the initial starting message for consistency. */
>     if (max_replication_slots == 0)
>         ereport(ERROR,
>             (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
>             errmsg("cannot start logical replication workers when
> max_replication_slots = 0")));
> 
> logicalrep_worker_launch checks max_replication_slots as above.
> Why does it need to check that setting value in the *subscriber* side?
> Maybe I'm missing something here, but ISTM that the subscription uses
> one replication slot in *publisher* side but doesn't use in *subscriber* side.

Because replication origins are also limited by max_replication_slots
and they are required for the subscription to work (I am not quite sure
why that's the case, I guess we wanted to save a GUC).

> 
>     *  The apply worker may spawn additional workers (sync) for initial data
>     *  synchronization of tables.
> 
> The above header comment in logical/worker.c is true?
> 

Hmm, not yet; there is a separate patch for it in the CF, I guess it fell
through the cracks while rebasing.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/22/17 8:11 PM, Petr Jelinek wrote:
> 0001 - Changes libpqrcv_connect to use the async libpq API so that it
> won't get stuck forever in case the connect is stuck. This is a
> preexisting bug that also affects the walreceiver, but it's less visible
> there as there is no SQL interface to initiate a connection.

Probably a mistake here:

+                       case PGRES_POLLING_READING:
+                               extra_flag = WL_SOCKET_READABLE;
+                               /* pass through */
+                       case PGRES_POLLING_WRITING:
+                               extra_flag = WL_SOCKET_WRITEABLE;

extra_flag gets overwritten in the reading case.

Please elaborate in the commit message what this change is for.

> 0002 - Close replication connection when CREATE SUBSCRIPTION gets
> canceled (otherwise walsender on the other side may stay in idle in
> transaction state).

committed

> 0003 - Fixes buffer initialization in walsender that I found when
> testing the above two. This one should be back-patched to 9.4 since it's
> broken since then.

Can you explain more in which code path this problem occurs?

I think we should get rid of the global variables and give each function
its own buffer that it initializes the first time through.  Otherwise
we'll keep having to worry about this.

> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
> tests for FK and trigger handling.

I think the trigger handling should go into execReplication.c.

> 0005 - Adds support for renaming publications and subscriptions.

Could those not be handled in the generic rename support in
ExecRenameStmt()?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 1/23/17 11:19 AM, Fujii Masao wrote:
> The copyright notice in each file added by the logical rep commit needs
> to be updated.

I have fixed that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 25/01/17 18:16, Peter Eisentraut wrote:
> On 1/22/17 8:11 PM, Petr Jelinek wrote:
>> 0001 - Changes libpqrcv_connect to use the async libpq API so that it
>> won't get stuck forever in case the connect is stuck. This is a
>> preexisting bug that also affects the walreceiver, but it's less visible
>> there as there is no SQL interface to initiate a connection.
> 
> Probably a mistake here:
> 
> +                       case PGRES_POLLING_READING:
> +                               extra_flag = WL_SOCKET_READABLE;
> +                               /* pass through */
> +                       case PGRES_POLLING_WRITING:
> +                               extra_flag = WL_SOCKET_WRITEABLE;
> 
> extra_flag gets overwritten in the reading case.
> 

Eh, reworked that into a plain if statement, as the switch does not
really buy us anything there.

> Please elaborate in the commit message what this change is for.
> 

Okay.

>> 0002 - Close replication connection when CREATE SUBSCRIPTION gets
>> canceled (otherwise walsender on the other side may stay in idle in
>> transaction state).
> 
> committed

Thanks!

> 
>> 0003 - Fixes buffer initialization in walsender that I found when
>> testing the above two. This one should be back-patched to 9.4 since it's
>> broken since then.
> 
> Can you explain more in which code path this problem occurs?

With the existing code base: anything that calls WalSndWaitForWal (which
calls ProcessRepliesIfAny()). That in turn is called from
logical_read_xlog_page, which is given as a callback to logical decoding
in CreateReplicationSlot and StartLogicalReplication.

The reason I decided to put it into init is that following all the paths
to where the buffers are used is rather complicated due to the various
callbacks, so if anybody starts poking around in the future it could
easily get broken again unless we initialize them unconditionally. (Plus,
the memory footprint is a few kB, and in the usual use of WalSender they
will eventually be initialized anyway, as they are needed for streaming.)

> I think we should get rid of the global variables and give each function
> its own buffer that it initializes the first time through.  Otherwise
> we'll keep having to worry about this.
> 

Because of the above, it would mean some refactoring of the logical
decoding APIs, not just WalSender, so it would not be back-patchable
(and in general it's a much bigger patch in that case).

>> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
>> tests for FK and trigger handling.
> 
> I think the trigger handling should go into execReplication.c.
> 

Not in the current state. Eventually (and I am afraid that's PG11
material at this point, as we still have partitioned-table support and
the initial data copy to finish in this release) we'll want to move all
the executor state code to execReplication.c and do less
reinitialization, but in the current code the trigger stuff belongs in
the worker IMHO.

>> 0005 - Adds support for renaming publications and subscriptions.
> 
> Could those not be handled in the generic rename support in
> ExecRenameStmt()?

Yes, it seems they can.

Attached updated version of the uncommitted patches.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
Hi,

I updated these patches for current HEAD and removed the string
initialization in the walsender, as Fujii Masao committed a similar fix
in the meantime.

I also found a typo/thinko in the first patch, which is now fixed.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Petr Jelinek
Date:
On 22/02/17 12:24, Petr Jelinek wrote:
> Hi,
> 
> I updated these patches for current HEAD and removed the string
> initialization in the walsender, as Fujii Masao committed a similar fix
> in the meantime.
> 
> I also found a typo/thinko in the first patch, which is now fixed.
> 

And of course I missed the xlog->wal rename, sigh. Fixed.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Logical Replication WIP

From
Peter Eisentraut
Date:
On 2/22/17 07:00, Petr Jelinek wrote:
> On 22/02/17 12:24, Petr Jelinek wrote:
>> Hi,
>>
>> I updated these patches for current HEAD and removed the string
>> initialization in the walsender, as Fujii Masao committed a similar fix
>> in the meantime.
>>
>> I also found a typo/thinko in the first patch, which is now fixed.
>>
> 
> And of course I missed the xlog->wal rename, sigh. Fixed.

all three committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services