Thread: Logical Replication WIP
Hi,

as promised here is the WIP version of the logical replication patch. This is by no means anywhere close to being committable, but it should be enough for discussion of the approaches chosen. I plan to give this some more time before the September CF as well as during the CF itself. You've seen a preview of the ideas in the doc Simon posted [1], though not all of them are implemented in this patch yet.

I'll start with an overview of the state of things.

What works:
- Replication of INSERT/UPDATE/DELETE operations on tables in a publication.
- Initial copy of the data in a publication.
- Automatic management of things like slots and origin tracking.
- Some psql support (\drp, \drs and additional info in \d for tables); it's mainly missing ACLs, as those are not implemented yet (see below), and tab completion.

What's missing:
- Sequences. I'd like to have them in 10.0 but I don't have a good way to implement it. PGLogical uses periodic syncing with some buffer value, but that's suboptimal. I would like to decode them instead, but that has proven to be complicated due to their sometimes transactional, sometimes nontransactional nature, so I probably won't have time to do it for 10.0 by myself.
- ACLs. I still expect to have them work the way it's documented in the logical replication docs, but currently the code just assumes a superuser/REPLICATION role. This can probably be discussed further in the design thread [1].
- pg_dump. Same as above; I want to have publications and membership in them dumped unconditionally, and potentially also dump subscription definitions if the user asks for it via a command-line option, as I don't think subscriptions should be dumped by default: automatically starting replication when somebody dumps and restores the db goes against POLA.
- DDL. I see several approaches we could take here for 10.0:
  a) don't deal with DDL at all yet;
  b) provide a function which pushes the DDL into the replication queue and then executes it on the downstream (like londiste, slony and pglogical do);
  c) capture the DDL query as text and allow a user-defined function to be called with that DDL text on the subscriber (that's what Oracle did with CDC).
- FDW support on the downstream. Currently only INSERTs should work there, but that should be easy to fix.
- Monitoring. I'd like to add some pg_stat_subscription view on the downstream (the rest of the monitoring is very similar to physical streaming, so that mostly needs docs).
- TRUNCATE. This is handled using triggers in BDR and pglogical, but I am not convinced that's the right way to do it in core, as it brings limitations (e.g. the inability to use RESTART IDENTITY).

The parts I am not overly happy with:
- The fact that the subscription handles slot creation/drop means we do some automagic that might fail, and the user might need to fix that up manually. I am not saying this is necessarily a problem, as that's how most publish/subscribe replication systems work, but I wonder if there is a better way of doing this that I missed.
- The initial copy patch adds some interfaces for getting the table list and data into the DecodingContext, and I wonder if that's a good place for those, or if we should instead create some TableSync API that would load the plugin as well, carry these two new interfaces, and be put into the tablesync module. One reason why I didn't do it is that the interface would be almost the same, and the plugin would then have to do separate init for DecodingContext and TableSync.
- The initial copy uses the snapshot from slot creation in the walsender. I currently just push it as the active snapshot inside the snapbuilder, which is probably not the right thing to do (tm). That is mostly because I don't really know what the right thing is there.
About the individual patches:

0001-Add-PUBLICATION-catalogs-and-DDL.patch:
This patch defines a Publication, which is basically the same thing as a replication set. It adds the database-local catalog pg_publication, which stores the publications and DML filters, and the pg_publication_rel catalog for storing membership of relations in publications. It adds the DDL, dependency handling and all the necessary boilerplate around that, including some basic regression tests for the DDL.

0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:
Adds Subscriptions, with a shared, nailed (!) catalog pg_subscription which stores the individual subscriptions for each database. The reason this is nailed is that it needs to be accessible without a connection to a database, so that the logical replication launcher can read it and start/stop workers as necessary. This does not include regression tests, as I am unsure how to test this within the regression testing framework given that it is supposed to start workers (those are added in later patches).

0003-Define-logical-replication-protocol-and-output-plugi.patch:
Adds the logical replication protocol (API and docs) and a "standard" output plugin for logical decoding that produces output based on that protocol and the publication definitions.

0004-Make-libpqwalreceiver-reentrant.patch:
Redesigns libpqwalreceiver to be reusable outside of the walreceiver, by exporting the API as a struct and an opaque connection handle. Also adds a couple of additional functions for logical replication.

0005-Add-logical-replication-workers.patch:
This patch adds the actual logical replication workers that use all of the above to implement the data change replication from publisher to subscriber. It adds two different background workers. The first is the Launcher, which works like the autovacuum launcher in that it gets the list of subscriptions and starts/stops the apply workers for those subscriptions as needed. Apply workers connect to the output plugin via the streaming protocol and handle the actual data replication.
I exported the ExecUpdate/ExecInsert/ExecDelete functions from nodeModifyTable to handle the actual database updates, so that things like triggers etc. are handled automatically without special code. This also adds a couple of TAP tests that cover the basic replication setup and a wide variety of type support. The overview doc for logical replication that Simon previously posted to the list is also part of this one.

0006-Logical-replication-support-for-initial-data-copy.patch:
PoC of the initial sync. It adds another mode into the apply worker which just applies updates for a single table, plus some handover logic for when the table is deemed synchronized and can be replicated normally. It also adds a new catalog, pg_subscription_rel, which keeps information about the synchronization status of individual tables. Note that tables added to publications at a later time are not yet synchronized, and there is also no resynchronization UI yet. On the upstream side it adds two new commands to the replication protocol, for getting the list of tables and for streaming existing table data. I discussed above why I consider this part suboptimal, so I won't repeat that here.

Feedback is welcome.

[1] https://www.postgresql.org/message-id/flat/CANP8%2Bj%2BNMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_%3D-HA%40mail.gmail.com#CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
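For readers following along, the end-to-end workflow the patches above implement can be sketched roughly like this, based on the commands exercised later in this thread. Note the syntax was still in flux at this point; in particular, the ADD TABLE spelling for publication membership is an assumption.

```sql
-- On the publisher: create a publication and add a table to it
-- (the ADD TABLE spelling here is illustrative, not final syntax).
CREATE PUBLICATION mypub;
ALTER PUBLICATION mypub ADD TABLE mytable;

-- On the subscriber: create a subscription; per the patch description
-- this creates the replication slot on the publisher, performs the
-- initial copy, and starts the apply worker.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=127.0.0.1 dbname=postgres'
    PUBLICATION mypub;
```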
Attachment
On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote: > as promised here is WIP version of logical replication patch. Yay! I'm about to head out for a week of, desperately needed, holidays, but after that I plan to spend a fair amount of time helping to review etc. this.
On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote: > On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote: >> as promised here is WIP version of logical replication patch. > > Yay! Yay2 > I'm about to head out for a week of, desperately needed, holidays, but > after that I plan to spend a fair amount of time helping to review > etc. this. Have a good one. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 6, 2016 at 2:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 5 August 2016 at 16:22, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>>> as promised here is WIP version of logical replication patch.
>>
>> Yay!
>
> Yay2
>

Thank you for working on this!

I've applied these patches to current HEAD, but got the following error:

libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2

After fixing this issue with the attached patch, I used logical replication a little. Some random comments and questions:

The logical replication launcher process and the apply process are implemented as bgworkers. Isn't it better to have them as auxiliary processes like the checkpointer and wal writer? IMO the number of logical replication connections should not be limited by max_worker_processes.

--

We need to set the publication up with at least the CREATE PUBLICATION and ALTER PUBLICATION commands. Can we make it possible to define tables in CREATE PUBLICATION as well? For example:

CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]

--

This patch cannot drop the subscription:

=# drop subscription sub;
ERROR: unrecognized object class: 6102

--

+/*-------------------------------------------------------------------------
+ *
+ * proto.c
+ *    logical replication protocol functions
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *

The copyright of the added files is old. And this patch has some whitespace problems; please run "git show --check" or "git diff origin/master --check".

Regards,

-- Masahiko Sawada
Attachment
On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The logical replication launcher process and the apply process are
> implemented as a bgworker. Isn't better to have them as an auxiliary
> process like checkpointer, wal writer?
I don't think so. The checkpointer, walwriter, autovacuum, etc. predate bgworkers. I strongly suspect that if they were to be implemented now, they'd use bgworkers.
Now, perhaps we want a new bgworker "kind" for system workers or some other minor tweaks. But basically I think bgworkers are exactly what we should be using here.
> IMO the number of logical replication connections should not be
> limited by max_worker_processes.
Well, they *are* worker processes... but I take your point, that that setting has been "number of bgworkers the user can run" and it might not be expected that logical replication would use the same space.
max_worker_processes isn't just a limit, it controls how many shmem slots we allocate.
I guess we could have a separate max_logical_workers or something, but I'm inclined to think that adds complexity without really making things any nicer. We'd just add them together to decide how many shmem slots to allocate and we'd have to keep track of how many slots were used by which types of backend. Or create a near-duplicate of the bgworker facility for logical rep.
Sure, you can go deeper down the rabbit hole here and say that we need to add bgworker "categories" with reserved pools of worker slots for each category. But do we really need that?
max_connections includes everything, both system and user backends. It's not like we don't do this elsewhere. It's at worst a mild wart.
The only argument I can see for not using bgworkers is for the supervisor worker. It's a singleton that launches the per-database workers, and arguably is a job that the postmaster could do better. The current design there stems from its origins as an extension. Maybe worker management could be simplified a bit as a result. I'd really rather not invent yet another new and mostly duplicate category of custom workers to achieve that though.
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> The logical replication launcher process and the apply process are >> implemented as a bgworker. Isn't better to have them as an auxiliary >> process like checkpointer, wal writer? > > I don't think so. The checkpointer, walwriter, autovacuum, etc predate > bgworkers. I strongly suspect that if they were to be implemented now they'd > use bgworkers. +1. We could always get them now under the umbrella of the bgworker infrastructure if this cleans up some code duplication. -- Michael
On 09/08/16 09:59, Masahiko Sawada wrote:
>>> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>>>> as promised here is WIP version of logical replication patch.
>
> Thank you for working on this!

Thanks for looking!

> I've applied these patches to current HEAD, but got the following error.
>
> libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
> ../../../../src/include/replication/walreceiver.h:137: note: previous
> declaration of ‘WalReceiverConnHandle’ was here
> make[2]: *** [libpqwalreceiver.o] Error 1
> make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
> make: *** [install-src-recurse] Error 2
>
> After fixed this issue with attached patch, I used logical replication a little.
> Some random comments and questions.

Interesting, my compiler doesn't have a problem with this. Will investigate.

> The logical replication launcher process and the apply process are
> implemented as a bgworker. Isn't better to have them as an auxiliary
> process like checkpointer, wal writer?
> IMO the number of logical replication connections should not be
> limited by max_worker_processes.

What Craig said reflects my rationale for doing this pretty well.

> We need to set the publication up by at least CREATE PUBLICATION and
> ALTER PUBLICATION command.
> Can we make CREATE PUBLICATION possible to define tables as well?
> For example,
> CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]

Agreed, that just didn't make it into the first cut sent to -hackers. We've also been thinking of having a special ALL TABLES parameter there that would encompass the whole db.

> This patch can not drop the subscription.
>
> =# drop subscription sub;
> ERROR: unrecognized object class: 6102

Yeah, that's because of patch 0006; I didn't finish all the dependency tracking for the pg_subscription_rel catalog that it adds (which is why I called it a PoC).
I expect to have this working in the next version (there is still quite a bit of polish work needed in general).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 09/08/16 10:13, Craig Ringer wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now
> they'd use bgworkers.
>
> Now, perhaps we want a new bgworker "kind" for system workers or some
> other minor tweaks. But basically I think bgworkers are exactly what we
> should be using here.

Agreed.

>> IMO the number of logical replication connections should not be
>> limited by max_worker_processes.
>
> Well, they *are* worker processes... but I take your point, that that
> setting has been "number of bgworkers the user can run" and it might not
> be expected that logical replication would use the same space.

Again agreed. I think we should ultimately go towards what PeterE suggested in https://www.postgresql.org/message-id/a2fffd92-6e59-a4eb-dd85-c5865ebca1a0@2ndquadrant.com

> The only argument I can see for not using bgworkers is for the
> supervisor worker. It's a singleton that launches the per-database
> workers, and arguably is a job that the postmaster could do better. The
> current design there stems from its origins as an extension. Maybe
> worker management could be simplified a bit as a result. I'd really
> rather not invent yet another new and mostly duplicate category of
> custom workers to achieve that though.

It is simplified compared to pglogical (there are only 2 worker types, not 3). I don't think it's the job of the postmaster to scan catalogs, however, so it can't really start workers for logical replication. I actually modeled it more after autovacuum (using bgworkers though) than the original extension.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now they'd
> use bgworkers.
>
> Now, perhaps we want a new bgworker "kind" for system workers or some other
> minor tweaks. But basically I think bgworkers are exactly what we should be
> using here.

I understood. Thanks!

>> IMO the number of logical replication connections should not be
>> limited by max_worker_processes.
>
> Well, they *are* worker processes... but I take your point, that that
> setting has been "number of bgworkers the user can run" and it might not be
> expected that logical replication would use the same space.
>
> max_worker_processes isn't just a limit, it controls how many shmem slots
> we allocate.
>
> I guess we could have a separate max_logical_workers or something, but I'm
> inclined to think that adds complexity without really making things any
> nicer. We'd just add them together to decide how many shmem slots to
> allocate and we'd have to keep track of how many slots were used by which
> types of backend. Or create a near-duplicate of the bgworker facility for
> logical rep.
>
> Sure, you can go deeper down the rabbit hole here and say that we need to
> add bgworker "categories" with reserved pools of worker slots for each
> category. But do we really need that?

If we change these processes to bgworkers, we can categorize them into two groups: auxiliary processes (checkpointer, wal writer, etc.) and other worker processes, with max_worker_processes controlling the latter.

> max_connections includes everything, both system and user backends. It's not
> like we don't do this elsewhere. It's at worst a mild wart.
>
> The only argument I can see for not using bgworkers is for the supervisor
> worker. It's a singleton that launches the per-database workers, and
> arguably is a job that the postmaster could do better. The current design
> there stems from its origins as an extension. Maybe worker management could
> be simplified a bit as a result. I'd really rather not invent yet another
> new and mostly duplicate category of custom workers to achieve that though.

Regards,

-- Masahiko Sawada
On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Sure, you can go deeper down the rabbit hole here and say that we need to
>> add bgworker "categories" with reserved pools of worker slots for each
>> category. But do we really need that?
>
> If we change these processes to bgworker, we can categorize them into
> two, auxiliary process (checkpointer and wal writer etc) and other
> worker process.
> And max_worker_processes controls the latter.
Right. I think that's probably the direction we should be going eventually. Personally I don't think such a change should block the logical replication work from proceeding with bgworkers, though. It's been delayed a long time, a lot of people want it, and I think we need to focus on meeting the core requirements not getting too sidetracked on minor points.
Of course, everyone's idea of what's core and what's a minor sidetrack differs ;)
On 09/08/16 12:16, Craig Ringer wrote:
> On 9 August 2016 at 17:28, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> Sure, you can go deeper down the rabbit hole here and say that we need to
>> add bgworker "categories" with reserved pools of worker slots for each
>> category. But do we really need that?
>
> If we change these processes to bgworker, we can categorize them into
> two, auxiliary process (checkpointer and wal writer etc) and other
> worker process.
> And max_worker_processes controls the latter.
>
> Right. I think that's probably the direction we should be going
> eventually. Personally I don't think such a change should block the
> logical replication work from proceeding with bgworkers, though. It's
> been delayed a long time, a lot of people want it, and I think we need
> to focus on meeting the core requirements, not getting too sidetracked on
> minor points.
>
> Of course, everyone's idea of what's core and what's a minor sidetrack
> differs ;)

Yeah, that's why I added a local max GUC that just handles the logical worker limit within max_worker_processes. I didn't want to also write a generic framework for managing the max workers using tags or something as part of this; it's big enough as it is, and we can always move the limit to a more generic place once we have it.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
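A minimal sketch (in Python, purely illustrative — this is not the patch's code, and the names are invented) of the slot-budgeting scheme Petr describes above: all background workers share one pool sized by max_worker_processes, while a separate, smaller GUC caps how many of those shared slots logical replication may occupy.

```python
# Toy model of the discussed design: one fixed pool of worker slots
# (as max_worker_processes sizes the shmem array), with logical
# replication capped by its own local limit inside that pool.
class WorkerPool:
    def __init__(self, max_worker_processes, max_logical_workers):
        self.slots = [None] * max_worker_processes  # shmem-like fixed array
        self.max_logical = max_logical_workers

    def _logical_count(self):
        return sum(1 for s in self.slots if s == "logical")

    def start(self, kind):
        """Try to start a worker of the given kind; return slot index or None."""
        if kind == "logical" and self._logical_count() >= self.max_logical:
            return None  # logical replication hits its own cap first
        for i, s in enumerate(self.slots):
            if s is None:
                self.slots[i] = kind
                return i
        return None  # shared pool exhausted (parallel query etc. compete too)

pool = WorkerPool(max_worker_processes=4, max_logical_workers=2)
assert pool.start("logical") == 0
assert pool.start("logical") == 1
assert pool.start("logical") is None   # per-subsystem cap reached
assert pool.start("parallel") == 2     # other worker kinds still fit
```

The point of the design, as discussed above, is that no second shmem array or duplicate bgworker facility is needed: the local cap only restricts how much of the existing pool one subsystem may consume.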
Petr Jelinek wrote:
> On 09/08/16 12:16, Craig Ringer wrote:
>> Right. I think that's probably the direction we should be going
>> eventually. Personally I don't think such a change should block the
>> logical replication work from proceeding with bgworkers, though.
>
> Yeah that's why I added local max GUC that just handles the logical worker
> limit within the max_worker_processes. I didn't want to also write generic
> framework for managing the max workers using tags or something as part of
> this, it's big enough as it is and we can always move the limit to the more
> generic place once we have it.

Parallel query does exactly that: the workers are allocated from the bgworkers array, and if you want more, it's on you to increase that limit (it doesn't even have a GUC for a maximum). As far as logical replication and parallel query are concerned, that's fine. We can improve this later, if it proves to be a problem. I think there are far more pressing matters to review.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Petr Jelinek wrote:
> On 09/08/16 10:13, Craig Ringer wrote:
>> The only argument I can see for not using bgworkers is for the
>> supervisor worker. It's a singleton that launches the per-database
>> workers, and arguably is a job that the postmaster could do better. The
>> current design there stems from its origins as an extension. Maybe
>> worker management could be simplified a bit as a result. I'd really
>> rather not invent yet another new and mostly duplicate category of
>> custom workers to achieve that though.
>
> It is simplified compared to pglogical (there is only 2 worker types not 3).
> I don't think it's job of postmaster to scan catalogs however so it can't
> really start workers for logical replication. I actually modeled it more
> after autovacuum (using bgworkers though) than the original extension.

Yeah, it's a very bad idea to put the postmaster on this task. We should definitely stay away from that.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> as promised here is WIP version of logical replication patch.

Great! The proposed DDL for publications/subscriptions looks very nice to me.

Some notes and thoughts about the patch:

* Clang grumbles at the following pieces of code:

apply.c:1316:6: warning: variable 'origin_startpos' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
tablesync.c:436:45: warning: if statement has empty body [-Wempty-body]
    if (wait_for_sync_status_change(tstate));

* max_logical_replication_workers is mentioned everywhere in the docs, but guc.c defines a variable called max_logical_replication_processes for postgresql.conf.

* Since pg_subscription is already shared across the cluster, it could also be handy to share pg_publication too and allow publication of tables from different databases. That is a rare scenario but quite important for the virtual hosting use case: tons of small databases in a single postgres cluster.

* There is no way to see the tables/schemas attached to a publication through \drp.

* As far as I understand, there is no way to add a table/tablespace right in CREATE PUBLICATION, and one needs to explicitly do ALTER PUBLICATION right after creation. Maybe add something like WITH TABLE/TABLESPACE to CREATE?

* So the binary protocol goes into core. Is it still possible to use it as a decoding plugin for a manually created walsender? Maybe also include json as it was in pglogical? While I'm not arguing that it should be done, I'm interested in your opinion on that.

* Also I've noted that you got rid of the reserved byte (flags) in the protocol compared to pglogical_native. It was very handy to use it for two-phase tx decoding (0 — usual commit, 1 — prepare, 2 — commit prepared), because both prepare and commit prepared generate a commit record in the xlog.

> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> - DDL, I see several approaches we could do here for 10.0.
>   a) don't deal with DDL at all yet, b) provide function which pushes the DDL
>   into replication queue and then executes on downstream (like
>   londiste, slony, pglogical do), c) capture the DDL query as text
>   and allow user defined function to be called with that DDL text on
>   the subscriber

* Since DDL here is mostly ALTER / CREATE / DROP TABLE (or am I wrong?), maybe we can add something like WITH SUBSCRIBERS to the statements?

* Talking about the exact mechanism of DDL replication, I like your variant b), but since we have transactional DDL, we can do two-phase commit here. That will require two-phase decoding and some logic for catching prepare responses through logical messages. If that approach sounds interesting, I can describe the proposal in more detail and create a patch.

* Also I actually wasn't able to run replication itself =) While the regression tests pass, the TAP tests and a manual run get stuck: pg_subscription_rel.substate never becomes 'r'. I'll investigate that more and write again.

* As far as I understand, sync starts automatically on enabling a publication. Maybe split that logic into a different command with some options? Like don't sync at all, for example.
* When I'm trying to create a subscription to a non-existent publication, CREATE SUBSCRIPTION creates the replication slot and does not destroy it:

# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
NOTICE: created replication slot "sub" on provider
ERROR: could not receive list of replicated tables from the provider: ERROR: cache lookup failed for publication 0
CONTEXT: slot "sub", output plugin "pgoutput", in the list_tables callback

after that:

postgres=# drop subscription sub;
ERROR: subscription "sub" does not exist
postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
ERROR: could not crate replication slot "sub": ERROR: replication slot "sub" already exists

* Also can't drop subscription:

postgres=# \drs
        List of subscriptions
 Name | Database | Enabled | Publication |            Conninfo
------+----------+---------+-------------+--------------------------------
 sub  | postgres | t       | {mypub}     | host=127.0.0.1 dbname=postgres
(1 row)

postgres=# drop subscription sub;
ERROR: unrecognized object class: 6102

* Several times I've run into a situation where the provider's postmaster ignores Ctrl-C until the subscriber node is switched off.

* Patch with small typos fixed attached.

I'll do more testing, just wanted to share what I have so far.

-- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company
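The flags byte Stas describes above (for distinguishing a plain commit from a PREPARE and a COMMIT PREPARED, since the latter two also produce commit-like records in the WAL) can be illustrated with a toy encoder. The message layout below is invented for illustration — it is not pgoutput's or pglogical's actual wire format.

```python
# Toy commit message: 'C' tag, one flags byte, then two 64-bit LSNs,
# all in network byte order. The flags byte lets a single commit-shaped
# message carry the three two-phase-commit variants.
import struct

COMMIT, PREPARE, COMMIT_PREPARED = 0, 1, 2

def encode_commit(flags, commit_lsn, end_lsn):
    # 'C' message tag + flags byte + two unsigned 64-bit LSNs
    return b"C" + struct.pack("!BQQ", flags, commit_lsn, end_lsn)

def decode_commit(msg):
    assert msg[0:1] == b"C"
    flags, commit_lsn, end_lsn = struct.unpack("!BQQ", msg[1:])
    return flags, commit_lsn, end_lsn

msg = encode_commit(PREPARE, 0x16B3748, 0x16B3780)
assert decode_commit(msg) == (PREPARE, 0x16B3748, 0x16B3780)
assert len(msg) == 1 + 1 + 8 + 8  # tag + flags + two LSNs
```

The design point is that a receiver that only understands flags == 0 can still reject (or skip) the two-phase variants cleanly, which is why reserving the byte in the protocol from the start is attractive.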
Attachment
Hi,

On 11/08/16 13:34, Stas Kelvich wrote:
>
> * max_logical_replication_workers mentioned everywhere in docs, but guc.c defines
> variable called max_logical_replication_processes for postgresql.conf

Ah, I changed it in the code but not in the docs, will fix.

> * Since pg_subscription already shared across the cluster, it can be also handy to
> share pg_publications too and allow publication of tables from different databases. That
> is rare scenarios but quite important for virtual hosting use case — tons of small databases
> in a single postgres cluster.

You can't decode changes from multiple databases in one slot, so I don't see the usefulness there. pg_subscription is currently shared because it's a technical necessity (as in, I don't see how else to solve the need to access the catalog from the launcher), not because I think it's a great design :)

> * There is no way to see attached tables/schemas to publication through \drp

That's mostly intentional, as publications for a table are visible in \d, but I am not against adding it to \drp.

> * As far as I understand there is no way to add table/tablespace right in CREATE
> PUBLICATION and one need explicitly do ALTER PUBLICATION right after creation.
> May be add something like WITH TABLE/TABLESPACE to CREATE?

Yes, as I said to Masahiko Sawada, it's just not there yet, but I plan to have that.

> * So binary protocol goes into core. Is it still possible to use it as decoding plugin for
> manually created walsender? May be also include json as it was in pglogical? While
> i'm not arguing that it should be done, i'm interested about your opinion on that.

Well, the plugin is a bit more integrated with the publication infrastructure, so if somebody wanted to use it directly they'd have to use that part as well. OTOH the protocol itself is provided as an API, so it's reusable by other plugins if needed. A JSON plugin is something that would be nice to have in core as well, but I don't think it's part of this patch.
> * Also I've noted that you got rid of reserved byte (flags) in protocol comparing to
> pglogical_native. It was very handy to use it for two phase tx decoding (0 — usual
> commit, 1 — prepare, 2 — commit prepared), because both prepare and commit
> prepared generates commit record in xlog.

Hmm, maybe the commit message could get it back. PGLogical has them sprinkled all around the protocol, which I don't really like, so I want to limit them to the places where they are actually useful.

>> On 05 Aug 2016, at 18:00, Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>> - DDL, I see several approaches we could do here for 10.0. a) don't
>> deal with DDL at all yet, b) provide function which pushes the DDL
>> into replication queue and then executes on downstream (like
>> londiste, slony, pglogical do), c) capture the DDL query as text
>> and allow user defined function to be called with that DDL text on
>> the subscriber
>
> * Since here DDL is mostly ALTER / CREATE / DROP TABLE (or am I wrong?) may be
> we can add something like WITH SUBSCRIBERS to statements?

Not sure I follow; how does that help?

> * Talking about exact mechanism of DDL replication I like you variant b), but since we
> have transactional DDL, we can do two phase commit here. That will require two phase
> decoding and some logic about catching prepare responses through logical messages. If that
> approach sounds interesting i can describe proposal in more details and create a patch.

I'd think such an approach is somewhat more interesting with c), honestly. The difference between b) and c) is mostly about explicit vs implicit. I definitely would like to see the 2PC patch updated to work with this. But maybe it's wise to wait a while until the core of the patch stabilizes during the discussion.

> * Also I wasn't able actually to run replication itself =) While regression tests passes, TAP
> tests and manual run stuck. pg_subscription_rel.substate never becomes 'r'. I'll investigate
> that more and write again.

Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing happening on the server, but that should not mean anything is stuck.

> * As far as I understand sync starts automatically on enabling publication. May be split that
> logic into a different command with some options? Like don't sync at all for example.

I think SYNC should be an option of subscription creation, just like INITIALLY ENABLED/DISABLED is. And then there should be an interface to resync a table manually (like pglogical has). I am not yet sure what that interface should look like in terms of DDL though.

> * When I'm trying to create subscription to non-existent publication, CREATE SUBSCRIPTION
> creates replication slot and do not destroys it:
>
> # create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication mypub;
> NOTICE: created replication slot "sub" on provider
> ERROR: could not receive list of replicated tables from the provider: ERROR: cache lookup failed for publication 0
> CONTEXT: slot "sub", output plugin "pgoutput", in the list_tables callback
>
> after that:
>
> postgres=# drop subscription sub;
> ERROR: subscription "sub" does not exist
> postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' publication pub;
> ERROR: could not crate replication slot "sub": ERROR: replication slot "sub" already exists

See the TODO in the CreateSubscription function :)

> * Also can't drop subscription:
>
> postgres=# \drs
>         List of subscriptions
>  Name | Database | Enabled | Publication |            Conninfo
> ------+----------+---------+-------------+--------------------------------
>  sub  | postgres | t       | {mypub}     | host=127.0.0.1 dbname=postgres
> (1 row)
>
> postgres=# drop subscription sub;
> ERROR: unrecognized object class: 6102

Yes, that has already been reported.
> > * Several time i’ve run in a situation where provider's postmaster ignores Ctrl-C until subscribed > node is switched off. > Hmm I guess there is bug in signal processing code somewhere. > * Patch with small typos fixed attached. > > I’ll do more testing, just want to share what i have so far. > Thanks for both. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
> Hi,
>
> as promised here is WIP version of logical replication patch.

Thanks for keeping on this. This is important work.

> Feedback is welcome.

+<sect1 id="logical-replication-publication">
+ <title>Publication</title>
+ <para>
+   A Publication object can be defined on any master node, owned by one
+   user. A Publication is a set of changes generated from a group of
+   tables, and might also be described as a Change Set or Replication Set.
+   Each Publication exists in only one database.

'A publication object can be defined on *any master node*'. I found this confusing the first time I read it because I thought it was circular (what makes a node a 'master' node? Having a publication object published from it?). On reflection I realized that you mean 'any *physical replication master*'. I think this might be better worded as 'A publication object can be defined on any node other than a standby node'. I think referring to 'master' in the context of logical replication might confuse people.

I am raising this in the context of the larger terminology that we want to use and the potential confusion with the terminology we use for physical replication. I like the publication/subscription terminology you've gone with.

+ <para>
+   Publications are different from table schema and do not affect
+   how the table is accessed. Each table can be added to multiple
+   Publications if needed. Publications may include both tables
+   and materialized views. Objects must be added explicitly, except
+   when a Publication is created for "ALL TABLES". There is no
+   default name for a Publication which specifies all tables.
+ </para>
+ <para>
+   The Publication is different from table schema, it does not affect
+   how the table is accessed and each table can be added to multiple

Those two paragraphs seem to start the same way. I get the feeling that there is some point you're trying to express that I'm not catching onto. Of course a publication is different than a table's schema, or different than a function.

The definition of publication you have on the CREATE PUBLICATION page seems better and should be repeated here ("A publication is essentially a group of tables intended for managing logical replication. See Section 30.1 for details about how publications fit into logical replication setup.")

+ <para>
+   Conflicts happen when the replicated changes is breaking any
+   specified constraints (with the exception of foreign keys which are
+   not checked). Currently conflicts are not resolved automatically and
+   cause replication to be stopped with an error until the conflict is
+   manually resolved.

What options are there for manually resolving conflicts? Is the only option to change the data on the subscriber to avoid the conflict? I assume there isn't a way to flag a particular row coming from the publisher and say "ignore it". I don't think this is something we need to support for the first version.

<sect1 id="logical-replication-architecture">
+ <title>Architecture</title>
+ <para>
+   Logical replication starts by copying a snapshot of the data on
+   the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above; do you intend to update that to 'Publisher', or does provider mean something different? If we like the 'publication' terminology then I think 'publishers' should publish them, not providers.
I'm trying to test a basic subscription. I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key, b text);
create publication testpub1;
alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster 2:
create database test1;
create table a(id serial8 primary key, b text);
create subscription testsub2 publication testpub1 connection 'host=localhost port=5440 dbname=test1';
NOTICE: created replication slot "testsub2" on provider
NOTICE: synchronized table states
CREATE SUBSCRIPTION

This resulted in

LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "00000494-1" with 0 transaction IDs
LOG: logical replication apply for subscription testsub2 started
LOG: starting logical decoding for slot "testsub2"
DETAIL: streaming transactions committing after 0/1562618, reading WAL from 0/15625E0
LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: logical replication sync for subscription testsub2, table a started
LOG: logical decoding found consistent point at 0/1562640
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "00000495-1" with 0 transaction IDs
LOG: logical replication synchronization worker finished processing

The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated. I had the following output:

LOG: terminating walsender process due to replication timeout

On cluster 1 I do

select * FROM pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
(0 rows)

If I then kill the cluster2 postmaster (I have to do a -9 or it won't die) I get

LOG: worker process: logical replication worker 16396 sync 16387 (PID 3677) exited with exit code 1
WARNING: could not launch logical replication worker
LOG: logical replication sync for subscription testsub2, table a started
ERROR: replication slot "testsub2_sync_a" does not exist
ERROR: could not start WAL streaming: ERROR: replication slot "testsub2_sync_a" does not exist

I'm not really sure what I need to do to debug this; I suspect the worker on cluster2 is having some issue.

> [1]
> https://www.postgresql.org/message-id/flat/CANP8%2Bj%2BNMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_%3D-HA%40mail.gmail.com#CANP8+j+NMHP-yFvoG03tpb4_s7GdmnCriEEOJeKkXWmUu_=-HA@mail.gmail.com
On 13/08/16 17:34, Steve Singer wrote: > On 08/05/2016 11:00 AM, Petr Jelinek wrote: >> Hi, >> >> as promised here is WIP version of logical replication patch. >> > > Thanks for keeping on this. This is important work > >> Feedback is welcome. >> > > +<sect1 id="logical-replication-publication"> > + <title>Publication</title> > + <para> > + A Publication object can be defined on any master node, owned by one > + user. A Publication is a set of changes generated from a group of > + tables, and might also be described as a Change Set or Replication > Set. > + Each Publication exists in only one database. > > 'A publication object can be defined on *any master node*'. I found > this confusing the first time I read it because I thought it was > circular (what makes a node a 'master' node? Having a publication object > published from it?). On reflection I realized that you mean ' any > *physical replication master*'. I think this might be better worded as > 'A publication object can be defined on any node other than a standby > node'. I think referring to 'master' in the context of logical > replication might confuse people. Makes sense to me. > > I am raising this in the context of the larger terminology that we want > to use and potential confusion with the terminology we use for physical > replication. I like the publication / subscription terminology you've > gone with. > > > <para> > + Publications are different from table schema and do not affect > + how the table is accessed. Each table can be added to multiple > + Publications if needed. Publications may include both tables > + and materialized views. Objects must be added explicitly, except > + when a Publication is created for "ALL TABLES". There is no > + default name for a Publication which specifies all tables. 
> + </para>
> + <para>
> +   The Publication is different from table schema, it does not affect
> +   how the table is accessed and each table can be added to multiple
>
> Those 2 paragraphs seem to start the same way. I get the feeling that
> there is some point you're trying to express that I'm not catching onto.
> Of course a publication is different than a table's schema, or different
> than a function.

Ah, that's a relic of some editorialization, will fix. The reason why we think it's important to mention the difference between publication and schema is that they are the only objects that contain tables, but they affect them in very different ways, which might confuse users.

> The definition of publication you have on the CREATE PUBLICATION page
> seems better and should be repeated here (A publication is essentially a
> group of tables intended for managing logical replication. See Section
> 30.1 for details about how publications fit into logical replication
> setup.)
>
> + <para>
> +   Conflicts happen when the replicated changes is breaking any
> +   specified constraints (with the exception of foreign keys which are
> +   not checked). Currently conflicts are not resolved automatically and
> +   cause replication to be stopped with an error until the conflict is
> +   manually resolved.
>
> What options are there for manually resolving conflicts? Is the only
> option to change the data on the subscriber to avoid the conflict?
> I assume there isn't a way to flag a particular row coming from the
> publisher and say ignore it. I don't think this is something we need to
> support for the first version.

Yes, you have to update the data on the subscriber or skip the replication of the whole transaction (for which the UI is not very friendly currently, as you either have to consume the transaction using pg_logical_slot_get_binary_changes or move the origin on the subscriber using pg_replication_origin_advance).
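For reference, a hedged sketch of the manual transaction-skip described above, using the existing replication origin functions. The origin name 'sub' (matching the subscription slot name) is an assumption about this WIP patch, and the LSN is a placeholder you would take from the apply error context; this would be run on the subscriber while the apply worker is stopped:

```sql
-- Advance the origin past the conflicting remote commit so the apply
-- worker skips that transaction on restart.
-- 'sub' and the LSN here are placeholders, not values from the thread.
SELECT pg_replication_origin_advance('sub', '0/1562700'::pg_lsn);
```

The alternative mentioned above is to consume the offending transaction on the provider side via pg_logical_slot_get_binary_changes, which discards it from the slot instead.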
It's relatively easy to add some automatic conflict resolution as well, but it didn't seem absolutely necessary so I didn't do it for the initial version. > > <sect1 id="logical-replication-architecture"> > + <title>Architecture</title> > + <para> > + Logical replication starts by copying a snapshot of the data on > + the Provider database. Once that is done, the changes on Provider > > I notice the user of 'Provider' above do you intend to update that to > 'Publisher' or does provider mean something different. If we like the > 'publication' terminology then I think 'publishers' should publish them > not providers. > Okay, I am just used to 'provider' in general (I guess londiste habit), but 'publisher' is fine as well. > > I'm trying to test a basic subscription and I do the following > > I did the following: > > cluster 1: > create database test1; > create table a(id serial8 primary key,b text); > create publication testpub1; > alter publication testpub1 add table a; > insert into a(b) values ('1'); > > cluster2 > create database test1; > create table a(id serial8 primary key,b text); > create subscription testsub2 publication testpub1 connection > 'host=localhost port=5440 dbname=test1'; > NOTICE: created replication slot "testsub2" on provider > NOTICE: synchronized table states > CREATE SUBSCRIPTION > > [...] > > The initial sync completed okay, then I did > > insert into a(b) values ('2'); > > but the second insert never replicated. 
> > I had the following output > > LOG: terminating walsender process due to replication timeout > > > On cluster 1 I do > > select * FROM pg_stat_replication; > pid | usesysid | usename | application_name | client_addr | > client_hostname | client_port | backend_start | > backend_xmin | state | sent_location | write_location | flush_location | > replay_location | sync_priority | sy > nc_state > -----+----------+---------+------------------+-------------+-----------------+-------------+---------------+- > > -------------+-------+---------------+----------------+----------------+-----------------+---------------+--- > > --------- > (0 rows) > > > > If I then kill the cluster2 postmaster, I have to do a -9 or it won't die > That might explain why it didn't replicate. The wait loops in apply worker clearly need some work. Thanks for the report. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
> On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
>> that more and write again.
>
> Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing happening on the server, but that should not mean anything is stuck.

Slightly played around; it seems that the apply worker waits forever for a substate change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10
    frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 51 at latch.c:1108
    frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 248 at latch.c:941
    frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000) + 254 at latch.c:347
    frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302
  * frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228
    frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430 at tablesync.c:436
    frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518
    frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122
    frame #9: 0x000000010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
    frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
    frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
    frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
    frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
    frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
    frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
    frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
    frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
    frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
    frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = 'c'

Also I’ve noted that some lsn positions look wrong on the publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
 restart_lsn | confirmed_flush_lsn
-------------+---------------------
 0/1530EF8   | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
 sent_location | write_location | flush_location | replay_location
---------------+----------------+----------------+-----------------
 0/1530F30     | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
(1 row)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 15/08/16 15:51, Stas Kelvich wrote: >> On 11 Aug 2016, at 17:43, Petr Jelinek <petr@2ndquadrant.com> wrote: >> >>> >>> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP >>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate >>> that more and write again. >> >> Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing happeningon the server, but that should not mean anything is stuck. > > Slightly played around, it seems that apply worker waits forever for substate change. > > (lldb) bt > * thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stopreason = signal SIGSTOP > frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10 > frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8,nevents=1) + 51 at latch.c:1108 > frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8,nevents=1) + 248 at latch.c:941 > frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000)+ 254 at latch.c:347 > frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302 > * frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228 > frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430at tablesync.c:436 > frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518 > frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122 > frame #9: 0x000000010177bef4 
postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
> frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
> frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
> frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
> frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
> frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
> frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
> frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
> frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
> frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
> frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
> (lldb) p state
> (char) $1 = 'c'
> (lldb) p tstate->state
> (char) $2 = 'c'

Hmm, not sure why that is; it might be related to the reported lsn being wrong. Could you check what the lsn is there (either in tstate or in pg_subscription_rel)? Especially in comparison with what the sent_location is.

> Also I’ve noted that some lsn position looks wrong on publisher:
>
> postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
>  restart_lsn | confirmed_flush_lsn
> -------------+---------------------
>  0/1530EF8   | 7FFF/5E7F6A30
> (1 row)
>
> postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
>  sent_location | write_location | flush_location | replay_location
> ---------------+----------------+----------------+-----------------
>  0/1530F30     | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
> (1 row)

That's most likely the result of the uninitialized origin_startpos warning.
I am working on new version of patch where that part is fixed, if you want to check this before I send it in, the patch looks like this: diff --git a/src/backend/replication/logical/apply.c b/src/backend/replication/logical/apply.c index 581299e..7a9e775 100644 --- a/src/backend/replication/logical/apply.c +++ b/src/backend/replication/logical/apply.c @@ -1353,6 +1353,7 @@ ApplyWorkerMain(Datum main_arg) originid = replorigin_by_name(myslotname, false); replorigin_session_setup(originid); replorigin_session_origin = originid; + origin_startpos = replorigin_session_get_progress(false); CommitTransactionCommand(); wrcapi->connect(wrchandle, MySubscription->conninfo, true, -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi all,

attaching updated version of the patch. Still very much WIP but it's slowly getting there.

Changes since last time:
- Mostly rewrote publication handling in pgoutput, which brings a) the ability to add FOR ALL TABLES publications, b) better performance (no need for a syscache lookup for every change like before), c) correct invalidation of publications on DDL
- Added FOR TABLE and FOR ALL TABLES clauses to both CREATE PUBLICATION and ALTER PUBLICATION so that one can create a publication directly with a table list; FOR TABLE in ALTER PUBLICATION behaves like a SET operation (removes existing tables, adds new ones)
- Fixed several issues with initial table synchronization (most of which have been reported here)
- Added a pg_stat_subscription monitoring view
- Updated docs to reflect all the changes, also removed the stuff that's only planned from the docs (there is a copy of the planned-stuff docs in the neighboring thread so no need to keep it in the patch)
- Added documentation improvements suggested by Steve Singer and removed the capitalization in the main doc
- Added pg_dump support
- Improved psql support (\drp+ shows the list of tables)
- Added flags to the COMMIT message in the protocol so that we can add 2PC support in the future
- Fixed DROP SUBSCRIPTION issues and added tests for it

I decided not to deal with ACLs so far, assuming superuser/replication role for now. We can always make it less restrictive later by adding the grantable privileges.

FDW support is still TODO. I think TRUNCATE will have to be solved as part of other DDL in the future. I do have some ideas what to do with DDL but I don't plan to implement them in the initial patch.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-PUBLICATION-catalogs-and-DDL.patch.gz
- 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz
- 0003-Define-logical-replication-protocol-and-output-plugi.patch.gz
- 0004-Make-libpqwalreceiver-reentrant.patch.gz
- 0005-Add-logical-replication-workers.patch.gz
- 0006-Logical-replication-support-for-initial-data-copy.patch.gz
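For readers following along, the new CREATE/ALTER PUBLICATION clauses described in the message above might be exercised roughly like this. The table names are hypothetical and the exact WIP syntax may still change:

```sql
-- Create a publication with an explicit table list.
CREATE PUBLICATION mypub FOR TABLE users, orders;

-- FOR TABLE on ALTER behaves like a SET operation: after this, the
-- membership is just "users" (orders is removed).
ALTER PUBLICATION mypub FOR TABLE users;

-- Or publish every table in the database.
CREATE PUBLICATION allpub FOR ALL TABLES;
```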
Hi,

I found a few bugs and missing docs and fixed those; here is an updated version of the patch. No changes in terms of features.

On 20/08/16 19:24, Petr Jelinek wrote:
> Hi all,
>
> attaching updated version of the patch. Still very much WIP but it's
> slowly getting there.
>
> Changes since last time:
> - Mostly rewrote publication handling in pgoutput which brings a)
> ability to add FOR ALL TABLES publications, b) performs better (no need
> to syscache lookup for every change like before), c) does correct
> invalidation of publications on DDL
> - added FOR TABLE and FOR ALL TABLES clause to both CREATE PUBLICATION
> and ALTER PUBLICATION so that one can create publication directly with
> table list, the FOR TABLE in ALTER PUBLICATION behaves like SET
> operation (removes existing, adds new ones)
> - fixed several issues with initial table synchronization (most of which
> have been reported here)
> - added pg_stat_subscription monitoring view
> - updated docs to reflect all the changes, also removed the stuff that's
> only planned from the docs (there is copy of the planned stuff docs in
> the neighboring thread so no need to keep it in the patch)
> - added documentation improvements suggested by Steve Singer and removed
> the capitalization in the main doc
> - added pg_dump support
> - improved psql support (\drp+ shows list of tables)
> - added flags to COMMIT message in the protocol so that we can add 2PC
> support in the future
> - fixed DROP SUBSCRIPTION issues and added tests for it
>
> I decided to not deal with ACLs so far, assuming superuser/replication
> role for now. We can always make it less restrictive later by adding the
> grantable privileges.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-PUBLICATION-catalogs-and-DDL.patch.gz
- 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz
- 0003-Define-logical-replication-protocol-and-output-plugi.patch.gz
- 0004-Make-libpqwalreceiver-reentrant.patch.gz
- 0005-Add-logical-replication-workers.patch.gz
- 0006-Logical-replication-support-for-initial-data-copy.patch.gz
Hi,

and one more version with bug fixes, improved code docs, a couple more tests, some general cleanup, and also rebased on current master for the start of the CF.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-PUBLICATION-catalogs-and-DDL.patch.gz
- 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz
- 0003-Define-logical-replication-protocol-and-output-plugi.patch.gz
- 0004-Make-libpqwalreceiver-reentrant.patch.gz
- 0005-Add-logical-replication-workers.patch.gz
- 0006-Logical-replication-support-for-initial-data-copy.patch.gz
On 2016-08-31 22:51, Petr Jelinek wrote:
> Hi,
>
> and one more version with bug fixes, improved code docs and couple
> more tests, some general cleanup and also rebased on current master
> for the start of CF.

Clear, well-written docs, thanks. Here are some small changes to logical-replication.sgml.

Erik Rijkers
Attachment
On 2016-09-01 01:04, Erik Rijkers wrote:
> On 2016-08-31 22:51, Petr Jelinek wrote:
>
> Here are some small changes to logical-replication.sgml

... and other .sgml files.

Erik Rijkers
Attachment
Review of 0001-Add-PUBLICATION-catalogs-and-DDL.patch:

The new system catalog pg_publication_rel has columns pubid, relid, and does not use the customary column name prefixes. Maybe that is OK here. I can't actually think of a naming scheme that wouldn't make things worse.

The hunk in patch 0006 for src/backend/replication/logical/publication.c needs to be moved to 0001, for the definition of GetPublicationRelations().

The catalog column puballtables is not mentioned in the documentation.

Unrelated formatting changes in src/backend/commands/Makefile.

In psql, the code psql_error("The server (version %d.%d) does not ...") should be updated to use the new formatPGVersionNumber() function.

In psql, \dr is already for "roles" (\drds). You are adding \drp for publications. Maybe use big R for replication-related describes?

There should be some documentation about how TRUNCATE commands are handled by publications. Patch 0005 mentions TRUNCATE in the general documentation, but I would have questions when reading the CREATE PUBLICATION reference page. Also, document how publications deal with INSERT ON CONFLICT.

In some places, the new publication object type is just added to the end of a list instead of some alphabetical place, e.g., event_trigger.c, gram.y (drop_type).

publication.h has /* true if inserts are replicated */ repeated several times.

What are the BKI_ROWTYPE_OID assignments for? Are they necessary here? (Maybe this was just copied from pg_subscription?)

I think some or all of replication/logical/publication.c should be catalog/pg_publication.c. There are various different precedents in how this can be split up, but I kind of like having commands/foocmds.c call into catalog/pg_foo.c. Also, some things could be in lsyscache.c, although not too many new things go in there now.

Most calls of the GetPublication() function could be changed to a simpler get_publication_name(Oid), because that is all it is used for so far. (It will be used later in 0003, but only in one specific case.)

In get_object_address_publication_rel() you are calling ObjectAddressSet(address, UserMappingRelationId, InvalidOid). That is probably a typo. Also, document somewhere around get_object_address_publication_rel() what objname (relation) and objargs (publication) are, otherwise one has to guess. (Existing similar functions are also not good about that.)

The code for OCLASS_PUBLICATION_REL in getObjectIdentityParts() does not fill in objname and objargs, as it is supposed to.

If I add a table to a publication, it requires a primary key. But after the table is added, I can remove the primary key. There is code in publication_add_relation() to record dependencies for that, but it doesn't seem to do its job right.

Relatedly, the error messages in check_publication_add_relation() and AlterPublicationOptions() conflate replica identity index and primary key. (I suppose the whole user-facing presentation of what replica identity indexes are, which have so far been a rather obscure feature, will need some polishing during this.)

I think the syntax could be made prettier. For example, instead of

    CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;

how about something like

    CREATE PUBLICATION foo (REPLICATE DELETE, NO REPLICATE UPDATE);

Not that important right now, but something to keep in mind. I also found ALTER PUBLICATION FOR TABLE / FOR ALL TABLES confusing. Maybe that should be SET TABLE or something.

Finally, I'd like some more test coverage of DDL error cases, like adding a view to a publication, trying to drop a primary key (as per above), and so on.

(Various small typos and such I didn't bother with at this time.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/09/16 22:57, Peter Eisentraut wrote:
> Review of 0001-Add-PUBLICATION-catalogs-and-DDL.patch:

Thanks!

> The new system catalog pg_publication_rel has columns pubid, relid,
> and does not use the customary column name prefixes. Maybe that is OK
> here. I can't actually think of a naming scheme that wouldn't make
> things worse.

Yeah, well, I could not either, and there are some catalogs that don't use the prefixes, so I figured it's probably not a big deal.

> In psql, the code psql_error("The server (version %d.%d) does not
> ...") should be updated to use the new formatPGVersionNumber()
> function.

Right, the same thing will be in the 2nd patch.

> In psql, psql \dr is already for "roles" (\drds). You are adding \drp
> for publications. Maybe use big R for replication-related describes?

Seems reasonable.

> There should be some documentation about how TRUNCATE commands are
> handled by publications. Patch 0005 mentions TRUNCATE in the general
> documentation, but I would have questions when reading the CREATE
> PUBLICATION reference page.

That's actually a bug in the 0005 patch; TRUNCATE is not handled ATM, but that should probably be documented as well.

> Also, document how publications deal with INSERT ON CONFLICT.

Okay, they just replicate whatever was the result of that operation (if any).

> In some places, the new publication object type is just added to the
> end of a list instead of some alphabetical place, e.g.,
> event_trigger.c, gram.y (drop_type).

Hmm, what is and what isn't alphabetically sorted is quite unclear to me, as we have a mix of both everywhere. For example, if you consider drop_type to be alphabetically sorted then our locales are much more different than I thought :)

> What are the BKI_ROWTYPE_OID assignments for? Are they necessary
> here? (Maybe this was just copied from pg_subscription?)

Yes they are.

> I think some or all of replication/logical/publication.c should be
> catalog/pg_publication.c. There are various different precedents in
> how this can be split up, but I kind of like having commands/foocmds.c
> call into catalog/pg_foo.c.

Okay, I prefer grouping the code by functionality (as in, "this is replication") rather than by architecture (as in, "this is catalog"), but no problem moving it. Again, the same thing will be in the 2nd patch.

> Also, some things could be in lsyscache.c, although not too many new
> things go in there now.

TBH I dislike the whole lsyscache concept of just random lookup functions piled in one huge module and would rather not add to it.

> Most calls of the GetPublication() function could be changed to a
> simpler get_publication_name(Oid), because that is all it is used for
> so far. (It will be used later in 0003, but only in one specific
> case.)

You mean the calls from objectaddress? Will change that - I actually added get_publication_name much later in the development and didn't go back to use it in preexisting code.

> If I add a table to a publication, it requires a primary key. But
> after the table is added, I can remove the primary key. There is code
> in publication_add_relation() to record dependencies for that, but it
> doesn't seem to do its job right.

I need to rewrite that part. That's actually something I could use other people's opinion on - currently pg_publication_rel does not have records for the "all tables" publication, as that seemed redundant, so it will need some special handling in tablecmds.c for the "dependency" tracking and possibly elsewhere for other things. I do wonder though if we should instead just add records to the pg_publication_rel catalog.

> Relatedly, the error messages in check_publication_add_relation() and
> AlterPublicationOptions() conflate replica identity index and primary
> key. (I suppose the whole user-facing presentation of what replica
> identity indexes are, which have so far been a rather obscure feature,
> will need some polishing during this.)

Those are copy/paste issues from pglogical. It should say replica identity index everywhere. But yes, it might be necessary to make it more obvious what replica identity indexes are.

> I think the syntax could be made prettier. For example, instead of
>
> CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete
> noreplicate_update;
>
> how about something like
>
> CREATE PUBLICATION foo (REPLICATE DELETE, NO REPLICATE UPDATE);

I went with the same syntax style as CREATE ROLE, but I am open to changes.

> I also found ALTER PUBLICATION FOR TABLE / FOR ALL TABLES confusing.
> Maybe that should be SET TABLE or something.

Yeah, I am not sure what the best option is there. SET was also what I was thinking, but then it does not map well to the CREATE PUBLICATION syntax and I would like to have some harmony there.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
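[Editor's note: for easier comparison, the syntax variants under discussion look roughly like this. None of these forms is final; the WITH form is what the patch implements, the other two are Peter's suggestions.]

```sql
-- Current CREATE ROLE-style option list (as in the patch):
CREATE PUBLICATION testpib_ins_trunct WITH noreplicate_delete noreplicate_update;

-- Peter's suggested parenthesized form:
CREATE PUBLICATION foo (REPLICATE DELETE, NO REPLICATE UPDATE);

-- And SET instead of FOR when replacing the table list in ALTER:
ALTER PUBLICATION foo SET TABLE mytable;
```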
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 02/09/16 22:57, Peter Eisentraut wrote:
>> The new system catalog pg_publication_rel has columns pubid, relid,
>> and does not use the customary column name prefixes. Maybe that is OK
>> here. I can't actually think of a naming scheme that wouldn't make
>> things worse.

> Yeah, well, I could not either, and there are some catalogs that don't use
> the prefixes, so I figured it's probably not a big deal.

The ones that don't are not models to be emulated. They are cases where somebody ignored project convention and it wasn't caught until too late.

regards, tom lane
On 03/09/16 18:04, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 02/09/16 22:57, Peter Eisentraut wrote:
>>> The new system catalog pg_publication_rel has columns pubid, relid,
>>> and does not use the customary column name prefixes. Maybe that is OK
>>> here. I can't actually think of a naming scheme that wouldn't make
>>> things worse.
>
>> Yeah, well, I could not either, and there are some catalogs that don't use
>> the prefixes, so I figured it's probably not a big deal.
>
> The ones that don't are not models to be emulated. They are cases
> where somebody ignored project convention and it wasn't caught until
> too late.

Okay, but if I follow the convention the names of those fields would be something like pubrelpubid and pubrelrelid, which does not seem like an improvement to me. Maybe the catalog should be pg_publication_map then, as that would make it seem less ugly, although it's less future proof (as we'll want to add more things to publications than just tables and they might need different catalogs).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
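[Editor's note: a middle ground between no prefix and the full pubrelpubid form would be a short abbreviated prefix, by analogy with pg_class ("rel") and pg_attribute ("att"). The column names below are a hypothetical sketch, not something either mail proposes verbatim.]

```sql
-- Hypothetical prefix scheme for pg_publication_rel: abbreviate the
-- catalog name to a short prefix such as "pr", giving:
--
--   prpubid oid   -- OID of the publication
--   prrelid oid   -- OID of the relation
```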
On 08/31/2016 04:51 PM, Petr Jelinek wrote: > Hi, > > and one more version with bug fixes, improved code docs and couple > more tests, some general cleanup and also rebased on current master > for the start of CF. > > > To get the 'subscription' TAP tests to pass I need to set export PGTZ=+02 Shouldn't the expected output be with reference to PST8PDT?
On 09/05/2016 03:58 PM, Steve Singer wrote:
> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>> Hi,
>>
>> and one more version with bug fixes, improved code docs and couple
>> more tests, some general cleanup and also rebased on current master
>> for the start of CF.

A few more things I noticed when playing with the patches:

1. Creating a subscription to yourself ends pretty badly. The 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill it. The background process seems to be waiting for a transaction to commit (I assume the create subscription command). I had to kill -9 the various processes to get things to stop. Getting confused about hostnames and ports is a common operator error.

2. Failures during the initial subscription aren't recoverable.

For example, on db1:

create table a(id serial4 primary key, b text);
insert into a(b) values ('1');
create publication testpub for table a;

on db2:

create table a(id serial4 primary key, b text);
insert into a(b) values ('1');
create subscription testsub connection 'host=localhost port=5440 dbname=test' publication testpub;

I then get in my db2 log:

ERROR: duplicate key value violates unique constraint "a_pkey"
DETAIL: Key (id)=(1) already exists.
LOG: worker process: logical replication worker 16396 sync 16387 (PID 10583) exited with exit code 1
LOG: logical replication sync for subscription testsub, table a started
ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists
LOG: worker process: logical replication worker 16396 sync 16387 (PID 10585) exited with exit code 1
LOG: logical replication sync for subscription testsub, table a started
ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists

and it keeps looping. If I then truncate "a" on db2 it doesn't help. (I'd expect at that point the initial subscription to work)

If I then do on db2:

drop subscription testsub cascade;

I still see a slot in use on db1:

select * FROM pg_replication_slots ;
   slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40
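[Editor's note: until DROP SUBSCRIPTION cleans up properly, the orphaned slot can be removed by hand on the publisher. pg_drop_replication_slot() is an existing core function and works as long as the slot is inactive, which the active = f column above indicates.]

```sql
-- On db1, drop the leftover table-sync slot manually:
SELECT pg_drop_replication_slot('testsub_sync_a');
```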
On 05/09/16 21:58, Steve Singer wrote: > On 08/31/2016 04:51 PM, Petr Jelinek wrote: >> Hi, >> >> and one more version with bug fixes, improved code docs and couple >> more tests, some general cleanup and also rebased on current master >> for the start of CF. >> >> >> > > To get the 'subscription' TAP tests to pass I need to set > > export PGTZ=+02 > > Shouldn't the expected output be with reference to PST8PDT? > That would break it for other timezones, the expected output should be whatever will work for everybody. I think the connection just needs to set the timezone so that it's stable across environments. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
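[Editor's note: one way to make the TAP test output environment-independent, as suggested above, is to pin the zone on the test connection rather than depending on the server or OS default. Illustrative sketch:]

```sql
-- Issued by the test session (or equivalently via PGOPTIONS='-c timezone=UTC'
-- in the test environment) so timestamp output is stable everywhere:
SET TimeZone = 'UTC';
```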
On 05/09/16 23:35, Steve Singer wrote: > On 09/05/2016 03:58 PM, Steve Singer wrote: >> On 08/31/2016 04:51 PM, Petr Jelinek wrote: >>> Hi, >>> >>> and one more version with bug fixes, improved code docs and couple >>> more tests, some general cleanup and also rebased on current master >>> for the start of CF. >>> >>> >>> >> > > A few more things I noticed when playing with the patches > > 1, Creating a subscription to yourself ends pretty badly, > the 'CREATE SUBSCRIPTION' command seems to get stuck, and you can't kill > it. The background process seems to be waiting for a transaction to > commit (I assume the create subscription command). I had to kill -9 the > various processes to get things to stop. Getting confused about > hostnames and ports is a common operator error. > Hmm I guess there is missing interrupts check, will look. It would be great to detect it properly but I am not really sure how to do that as afaik there is no accurate way to detect that the connection is to yourself. > 2. Failures during the initial subscription aren't recoverable > > For example > > on db1 > create table a(id serial4 primary key,b text); > insert into a(b) values ('1'); > create publication testpub for table a; > > on db2 > create table a(id serial4 primary key,b text); > insert into a(b) values ('1'); > create subscription testsub connection 'host=localhost port=5440 > dbname=test' publication testpub; > > I then get in my db2 log > > ERROR: duplicate key value violates unique constraint "a_pkey" > DETAIL: Key (id)=(1) already exists. 
> LOG: worker process: logical replication worker 16396 sync 16387 (PID > 10583) exited with exit code 1 > LOG: logical replication sync for subscription testsub, table a started > ERROR: could not crate replication slot "testsub_sync_a": ERROR: > replication slot "testsub_sync_a" already exists > > > LOG: worker process: logical replication worker 16396 sync 16387 (PID > 10585) exited with exit code 1 > LOG: logical replication sync for subscription testsub, table a started > ERROR: could not crate replication slot "testsub_sync_a": ERROR: > replication slot "testsub_sync_a" already exists > > > and it keeps looping. > If I then truncate "a" on db2 it doesn't help. (I'd expect at that point > the initial subscription to work) Hmm, looks like the error case does not cleanup correctly after itself. > > If I then do on db2 > drop subscription testsub cascade; > > I still see a slot in use on db1 > > select * FROM pg_replication_slots ; > slot_name | plugin | slot_type | datoid | database | active | > active_pid | xmin | catalog_xmin | rest > art_lsn | confirmed_flush_lsn > ----------------+----------+-----------+--------+----------+--------+------------+------+--------------+----- > > --------+--------------------- > testsub_sync_a | pgoutput | logical | 16384 | test | f > | | | 1173 | 0/15 > 66E08 | 0/1566E40 > Same as above. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01/09/16 08:29, Erik Rijkers wrote: > On 2016-09-01 01:04, Erik Rijkers wrote: >> On 2016-08-31 22:51, Petr Jelinek wrote: >> >> Here are some small changes to logical-replication.sgml > > ... and other .sgml files. Thanks I'll integrate these into next iteration of the patch, -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 9/3/16 5:14 AM, Petr Jelinek wrote: >> What are the BKI_ROWTYPE_OID assignments for? Are they necessary >> > here? (Maybe this was just copied from pg_subscription?) >> > > Yes they are. Please explain/document why. It does not match other catalogs, which either use it for relcache initialization or because they are shared catalogs. (I'm not sure of the details, but this one clearly looks different.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/09/16 20:14, Peter Eisentraut wrote:
> On 9/3/16 5:14 AM, Petr Jelinek wrote:
>>> What are the BKI_ROWTYPE_OID assignments for? Are they necessary
>>> here? (Maybe this was just copied from pg_subscription?)
>>
>> Yes they are.
>
> Please explain/document why. It does not match other catalogs, which
> either use it for relcache initialization or because they are shared
> catalogs. (I'm not sure of the details, but this one clearly looks
> different.)

Erm, I meant "yes, they are just copied" and I will remove them (I see how my answer might have been confusing given that you asked multiple questions, sorry).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:

(As you had already mentioned, some of the review items in 0001 apply analogously here.)

Changes needed to compile:

--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -218,7 +218,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
     CatalogUpdateIndexes(rel, tup);
     heap_freetuple(tup);
 
-    ObjectAddressSet(myself, SubscriptionRelationId, suboid);
+    ObjectAddressSet(myself, SubscriptionRelationId, subid);
 
     heap_close(rel, RowExclusiveLock);

This is fixed later in patch 0005.

--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6140,8 +6140,7 @@ <title><structname>pg_subscription</structname> Columns</title>
      <entry><structfield>subpublications</structfield></entry>
      <entry><type>text[]</type></entry>
      <entry></entry>
-     <entry>Array of subscribed publication names. For more on publications
-      see <xref linkend="publications">.
+     <entry>Array of subscribed publication names.
      </entry>
     </row>
    </tbody>

I don't see that id defined in any later patch.

Minor problems:

Probably unintentional change in pg_dump.h:

- * The PublicationInfo struct is used to represent publications.
+ * The PublicationInfo struct is used to represent publication.

The pg_subscription column "dbid" should be renamed to "subdbid".

I think subpublications ought to be of type name[], not text[].

It says a subscription can only be dropped by its owner or a superuser. But subscriptions don't have owners. Maybe they should.

On the CREATE SUBSCRIPTION ref page, | INITIALLY ( ENABLED | DISABLED ) should use {} instead.

We might want to add ALTER commands to rename subscriptions and publications.

Similar concerns as before about ALTER syntax, e.g., does ALTER SUBSCRIPTION ... PUBLICATION add to or replace the publication set? For that matter, why is there no way to add?

Document why publicationListToArray() creates its own memory context.

I think we should allow creating subscriptions initially without publications. This could be useful for example to test connections, or to create slots before later on adding publications. Seeing that there is support for changing the publications later, this shouldn't be a problem.

The synopsis of CREATE SUBSCRIPTION indicates that options are optional, but it actually requires at least one option.

At the end of CreateSubscription(), the CommandCounterIncrement() doesn't appear to be necessary (yet, see patch 0005?).

Maybe check for duplicates in the publications list.

Larger conceptual issues:

I haven't read the rest of the code yet to understand why pg_subscription needs to be a shared catalog, but be that as it may, I would still make it so that subscription names appear local to the database. We already have the database OID in the pg_subscription catalog, so I would make the key (subname, subdatid). DDL commands would only operate on subscriptions in the current database (or you could use "datname"."subname" syntax), and \drs would only show subscriptions in the current database. That way the shared catalog is an implementation detail that can be changed in the future. I think it would be very helpful for users if publications and subscriptions appear to work in a parallel way. If I have two databases that I want to replicate between two servers, I might want to have a publication "mypub" in each database and a subscription "mysub" in each database. If I learn that the subscriptions can't be named that way, then I have to go back and rename the publications, and it'll all be a bit frustrating.

Some thoughts on pg_dump and such:

Even an INITIALLY DISABLED subscription needs network access to create the replication slot. So restoring a dump when the master is not available will have some difficulties. And restoring master and slave at the same time (say disaster recovery) will not necessarily work well either. Also, the general idea of doing network access during a backup restore without obvious prior warning sounds a bit unsafe.

I imagine maybe having three states for subscriptions: DISABLED, PREPARED, ENABLED (to use existing key words). DISABLED just exists in the catalog, PREPARED has the slots set up, ENABLED starts replicating. So you can restore a dump with all slots disabled. And then it might be good to have a command to "prepare" or "enable" all subscriptions at once. That command would also help if you restore a dump not in a transaction but you want to enable all subscriptions in the same transaction.

I'd also prefer having subscriptions dumped by default, just to keep it so that pg_dump by default backs up everything.

Finally, having disabled subscriptions without network access would also allow writing some DDL command tests.

As I had mentioned privately before, I would perhaps have CREATE SUBSCRIPTION use the foreign server infrastructure for storing connection information.

We'll have to keep thinking about ways to handle abandoned replication slots. I imagine that people will want to create subscriber instances in fully automated ways. If that fails every so often and requires manual cleanup of replication slots on the master some of the time, that will get messy. I don't have well-formed ideas about this, though.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
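[Editor's note: on the SQL level, the proposed three-state lifecycle might look like this. The CREATE SUBSCRIPTION syntax follows the patch's WITH CONNECTION form; the PREPARE command is purely illustrative syntax for the idea, not something in the current patch.]

```sql
-- DISABLED: catalog entry only, no network access (safe to dump/restore)
CREATE SUBSCRIPTION mysub WITH CONNECTION 'host=master dbname=test'
    PUBLICATION mypub INITIALLY DISABLED;

-- PREPARED: create the remote replication slot, but don't replicate yet
ALTER SUBSCRIPTION mysub PREPARE;   -- hypothetical

-- ENABLED: start streaming and applying changes
ALTER SUBSCRIPTION mysub ENABLE;
```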
On 07/09/16 02:56, Peter Eisentraut wrote:
> Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:
>
> Similar concerns as before about ALTER syntax, e.g., does ALTER
> SUBSCRIPTION ... PUBLICATION add to or replace the publication set?

It sets.

> For that matter, why is there no way to add?

Didn't seem all that useful; the expectation here is that most subscriptions will use one or a couple of publications.

> I think we should allow creating subscriptions initially without
> publications. This could be useful for example to test connections,
> or create slots before later on adding publications. Seeing that
> there is support for changing the publications later, this shouldn't
> be a problem.

Sure, but they need to be created disabled then.

> Larger conceptual issues:
>
> I haven't read the rest of the code yet to understand why
> pg_subscription needs to be a shared catalog, but be that as it may, I
> would still make it so that subscription names appear local to the
> database. We already have the database OID in the pg_subscription
> catalog, so I would make the key (subname, subdatid). DDL commands
> would only operate on subscriptions in the current database (or you
> could use "datname"."subname" syntax), and \drs would only show
> subscriptions in the current database. That way the shared catalog is
> an implementation detail that can be changed in the future. I think
> it would be very helpful for users if publications and subscriptions
> appear to work in a parallel way. If I have two databases that I want
> to replicate between two servers, I might want to have a publication
> "mypub" in each database and a subscription "mysub" in each database.
> If I learn that the subscriptions can't be named that way, then I have
> to go back to rename the publications, and it'll all be a bit
> frustrating.

Okay, that makes sense. pg_subscription is a shared catalog so that we can have one launcher per cluster instead of one per database. Otherwise there is no reason why it could not behave like a local catalog.

> Some thoughts on pg_dump and such:
>
> Even an INITIALLY DISABLED subscription needs network access to create
> the replication slot. So restoring a dump when the master is not
> available will have some difficulties. And restoring master and slave
> at the same time (say disaster recovery) will not necessarily work
> well either. Also, the general idea of doing network access during a
> backup restore without obvious prior warning sounds a bit unsafe.
>
> I imagine maybe having three states for subscriptions: DISABLED,
> PREPARED, ENABLED (to use existing key words). DISABLED just exists
> in the catalog, PREPARED has the slots set up, ENABLED starts
> replicating. So you can restore a dump with all slots disabled. And
> then it might be good to have a command to "prepare" or "enable" all
> subscriptions at once.

Well, the DISABLED keyword is also used in ALTER to stop the subscription but not remove it; that would no longer map well if we used the behavior you described. That being said, I agree with the idea of having a subscription that exists just locally in the catalog, we just need to figure out better naming.

> As I had mentioned privately before, I would perhaps have CREATE
> SUBSCRIPTION use the foreign server infrastructure for storing
> connection information.

Hmm, yeah, it's an idea. My worry there is that it will make it a bit more complex to set up, as the user will have to first create a server and user mapping before creating the subscription.

> We'll have to keep thinking about ways to handle abandoned
> replication slots. I imagine that people will want to create
> subscriber instances in fully automated ways. If that fails every so
> often and requires manual cleanup of replication slots on the master
> some of the time, that will get messy. I don't have well-formed ideas
> about this, though.

Yes, it's a potential issue; I don't have a good solution for it either. It's a loosely coupled system, so we can't have 100% control over everything.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2016-08-31 22:51, Petr Jelinek wrote:
>
> and one more version with bug fixes, improved code docs and couple

I am not able to get the replication to work. Would you (or anyone) be so kind to point out what I am doing wrong?

Patches applied, compiled, make-checked, installed OK.

I have 2 copies compiled and installed, logical_replication and logical_replication2, to be publisher and subscriber, ports 6972 and 6973 respectively.

(BTW, there is no postgres user; the OS user is 'aardvark'. 'aardvark' is also the db superuser, and it is also the user as which the two installations are installed.)

PGPASSFILE is set up and works for both instances.

Both pg_hba.conf's changed to have:

local replication aardvark md5

instances.sh:
--------------------------------------------------------------------
#!/bin/sh
project1=logical_replication    # publisher
project2=logical_replication2   # subscriber
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
port1=6972
port2=6973
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
 -c wal_level=logical
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir1
 -c log_filename=logfile.${project1} "
options2="
 -c wal_level=logical
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir2
 -c log_filename=logfile.${project2} "
# start two instances:
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
--------------------------------------------------------------------

Both instances run fine.

On the publisher db: Create a table testt, with 20 rows.

CREATE PUBLICATION pub1 FOR TABLE testt;

No problem.

On the subscriber db:

CREATE SUBSCRIPTION sub1 WITH CONNECTION 'host=/tmp dbname=testdb port=6972' PUBLICATION pub1 INITIALLY DISABLED;
ALTER SUBSCRIPTION sub1 ENABLE;

Adding rows to the table (publisher-side) gets activity going. I give the resulting logs of both sides:

Logfile publisher side:
[...]
2016-09-07 13:47:44.287 CEST 21995 LOG: logical replication launcher started
2016-09-07 13:51:42.601 CEST 22141 LOG: logical decoding found consistent point at 0/230F478
2016-09-07 13:51:42.601 CEST 22141 DETAIL: There are no running transactions.
2016-09-07 13:51:42.601 CEST 22141 LOG: exported logical decoding snapshot: "00000702-1" with 0 transaction IDs
2016-09-07 13:52:11.326 CEST 22144 LOG: starting logical decoding for slot "sub1"
2016-09-07 13:52:11.326 CEST 22144 DETAIL: streaming transactions committing after 0/230F4B0, reading WAL from 0/230F478
2016-09-07 13:52:11.326 CEST 22144 LOG: logical decoding found consistent point at 0/230F478
2016-09-07 13:52:11.326 CEST 22144 DETAIL: There are no running transactions.
2016-09-07 13:53:47.012 CEST 22144 LOG: could not receive data from client: Connection reset by peer
2016-09-07 13:53:47.012 CEST 22144 LOG: unexpected EOF on standby connection
2016-09-07 13:53:47.025 CEST 22185 LOG: starting logical decoding for slot "sub1"
2016-09-07 13:53:47.025 CEST 22185 DETAIL: streaming transactions committing after 0/230F628, reading WAL from 0/230F5F0
2016-09-07 13:53:47.025 CEST 22185 LOG: logical decoding found consistent point at 0/230F5F0
2016-09-07 13:53:47.025 CEST 22185 DETAIL: There are no running transactions.
2016-09-07 13:53:47.030 CEST 22185 LOG: could not receive data from client: Connection reset by peer
2016-09-07 13:53:47.030 CEST 22185 LOG: unexpected EOF on standby connection
2016-09-07 13:53:52.044 CEST 22188 LOG: starting logical decoding for slot "sub1"
2016-09-07 13:53:52.044 CEST 22188 DETAIL: streaming transactions committing after 0/230F628, reading WAL from 0/230F5F0
2016-09-07 13:53:52.044 CEST 22188 LOG: logical decoding found consistent point at 0/230F5F0
2016-09-07 13:53:52.044 CEST 22188 DETAIL: There are no running transactions.
2016-09-07 13:53:52.195 CEST 22188 LOG: could not receive data from client: Connection reset by peer
2016-09-07 13:53:52.195 CEST 22188 LOG: unexpected EOF on standby connection
(repeat every few seconds)

Logfile subscriber-side:
[...]
2016-09-07 13:47:44.441 CEST 21997 LOG: MultiXact member wraparound protections are now enabled
2016-09-07 13:47:44.528 CEST 21986 LOG: database system is ready to accept connections
2016-09-07 13:47:44.529 CEST 22002 LOG: logical replication launcher started
2016-09-07 13:52:11.319 CEST 22143 LOG: logical replication apply for subscription sub1 started
2016-09-07 13:53:47.010 CEST 22143 ERROR: could not open relation with OID 0
2016-09-07 13:53:47.012 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22143) exited with exit code 1
2016-09-07 13:53:47.018 CEST 22184 LOG: logical replication apply for subscription sub1 started
2016-09-07 13:53:47.028 CEST 22184 ERROR: could not open relation with OID 0
2016-09-07 13:53:47.030 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22184) exited with exit code 1
2016-09-07 13:53:52.041 CEST 22187 LOG: logical replication apply for subscription sub1 started
2016-09-07 13:53:52.045 CEST 22187 ERROR: could not open relation with OID 0
2016-09-07 13:53:52.046 CEST 21986 LOG: worker process: logical replication worker 24048 (PID 22187) exited with exit code 1
(repeat every few seconds)

Any hints welcome. Thanks!
Erik Rijkers
Hi, On 07/09/16 14:10, Erik Rijkers wrote: > On 2016-08-31 22:51, Petr Jelinek wrote: >> >> and one more version with bug fixes, improved code docs and couple > > > I am not able to get the replication to work. Would you (or anyone) be > so kind to point out what I am doing wrong? > > Patches applied, compiled, make-checked, installed OK. > > I have 2 copies compiled and installed, logical_replication and > logical_replication2, to be publisher and subscriber, ports 6972 and > 6973 respectively. > > > Logfile subscriber-side: > [...] > 2016-09-07 13:47:44.441 CEST 21997 LOG: MultiXact member wraparound > protections are now enabled > 2016-09-07 13:47:44.528 CEST 21986 LOG: database system is ready to > accept connections > 2016-09-07 13:47:44.529 CEST 22002 LOG: logical replication launcher > started > 2016-09-07 13:52:11.319 CEST 22143 LOG: logical replication apply for > subscription sub1 started > 2016-09-07 13:53:47.010 CEST 22143 ERROR: could not open relation with > OID 0 > 2016-09-07 13:53:47.012 CEST 21986 LOG: worker process: logical > replication worker 24048 (PID 22143) exited with exit code 1 > 2016-09-07 13:53:47.018 CEST 22184 LOG: logical replication apply for > subscription sub1 started > 2016-09-07 13:53:47.028 CEST 22184 ERROR: could not open relation with > OID 0 > 2016-09-07 13:53:47.030 CEST 21986 LOG: worker process: logical > replication worker 24048 (PID 22184) exited with exit code 1 > 2016-09-07 13:53:52.041 CEST 22187 LOG: logical replication apply for > subscription sub1 started > 2016-09-07 13:53:52.045 CEST 22187 ERROR: could not open relation with > OID 0 > 2016-09-07 13:53:52.046 CEST 21986 LOG: worker process: logical > replication worker 24048 (PID 22187) exited with exit code 1 > (repeat every few seconds) > > It means the tables don't exist on subscriber. I added check and proper error message in my local dev branch, it will be part of the next update. 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
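[Editor's note: since DDL is not replicated by the patch, the user-side fix for "could not open relation with OID 0" is to create a matching table on the subscriber before enabling the subscription. The column list below is an assumption standing in for whatever testt actually looks like on the publisher.]

```sql
-- On the subscriber database, before CREATE SUBSCRIPTION:
CREATE TABLE testt (id int PRIMARY KEY, n int);  -- definition assumed
```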
Hi,

Updated version; this should address most of the things in Peter's reviews so far, though not all, as some of it needs more discussion.

Changes:
- moved publication.c to pg_publication.c and subscription.c to pg_subscription.c
- changed \drp and \drs to \dRp and \dRs
- fixed definitions of the catalogs (BKI_ROWTYPE_OID)
- changed some GetPublication calls to get_publication_name
- fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
- fixed get_object_address_publication_rel
- fixed the dependencies between pkeys and publications; for this I actually had to add a new interface to dependency.c that allows dropping a single dependency
- fixed the 'for all tables' and 'for all tables in schema' publications
- changed the ALTER PUBLICATION from FOR to SET
- added more test cases for the publication DDL
- fixed compilation of the subscription patch alone, and docs
- changed subpublications to name[]
- added a check for duplicates in the publication list
- made the subscriptions behave more like they are inside the database instead of a shared catalog (even though the catalog is still shared)
- added options for CREATE SUBSCRIPTION to optionally not create a slot and not do the initial data sync - that should solve the complaint about CREATE SUBSCRIPTION always connecting
- the CREATE SUBSCRIPTION also tries to check if the specified connection connects back to the same db (although that check is somewhat imperfect), and if it gets stuck on create slot it should normally be cancelable (that should solve the issue Steve Singer had)
- fixed the tests to work in any timezone
- added DDL regress tests for subscription
- added proper detection of missing schemas and tables on the subscriber
- rebased on top of 19acee8 as the DefElem changes broke the patch

The table sync is still far from ready.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
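[Editor's note: one way such a "connects back to the same db" check can be built - a heuristic only, and not necessarily what the patch does - is to compare the remote cluster's system identifier and database OID with the local ones over the new connection. pg_control_system() is an existing function as of 9.6; note a physical replica shares the system identifier with its primary, so the check can still be fooled.]

```sql
-- Run over the new connection and compare against the local values:
SELECT system_identifier FROM pg_control_system();
SELECT oid FROM pg_database WHERE datname = current_database();
```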
Attachment
- 0001-Add-PUBLICATION-catalogs-and-DDL.patch.gz
- 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz
- 0003-Define-logical-replication-protocol-and-output-plugi.patch.gz
- 0004-Make-libpqwalreceiver-reentrant.patch.gz
- 0005-Add-logical-replication-workers.patch.gz
- 0006-Logical-replication-support-for-initial-data-copy.patch.gz
Review of 0003-Define-logical-replication-protocol-and-output-plugi.patch: (This is still based on the Aug 31 patch set, but at quick glance I didn't see any significant changes in the Sep 8 set.) Generally, this all seems mostly fine. Everything is encapsulated well enough that problems are localized and any tweaks don't affect the overall work. Changes needed to build:

--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2158,8 +2158,8 @@ <title>Logical Streaming Replication Parameters</title>
     <listitem>
      <para>
       Comma separated list of publication names for which to subscribe
-      (receive changes). See
-      <xref linkend="logical-replication-publication"> for more info.
+      (receive changes). <!-- See
+      <xref linkend="logical-replication-publication"> for more info. -->
      </para>
     </listitem>
    </varlistentry>

--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -25,6 +25,7 @@
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
+#include "utils/syscache.h"

 PG_MODULE_MAGIC;

This is all fixed in later patches. AFAICT, pgoutput does not use libpq, so the mentions in src/backend/replication/pgoutput/Makefile are not needed (perhaps copied from libpqwalreceiver?). The start_replication option pg_version is not documented and not used in any later patch. We can probably do without it and just rely on the protocol version. In pgoutput_startup(), you check opt->output_type. But it is not set anywhere. Actually, the startup callback is supposed to set it itself. In init_rel_sync_cache(), the way hash_flags is set seems kind of weird. I think that variable could be removed and the flags put directly into the hash_create() call. pgoutput_config.c seems over-engineered, e.g., converting cstring to Datum and back. Just do normal DefElem list parsing in pgoutput.c. That's not pretty either, but at least it's a common coding pattern. 
In the protocol documentation, explain the meaning of int64 as a commit timestamp. Also, the documentation should emphasize more clearly that all the messages are not actually top-level protocol messages but are contained inside binary copy data. On the actual protocol messages: Why do strings have a length byte? That is not how other strings in the protocol work. As a minor side-effect, this would limit for example column names to 255 characters. The message structure doesn't match the documentation in some ways. For example Attributes and TupleData are not separate messages but are contained in Relation and Insert/Update/Delete messages. So the documentation needs to be structured a bit differently. In the Attributes message (or actually Relation message), we don't need the 'A' and 'C' bytes. I'm not sure that pgoutput should concern itself with the client encoding. The client encoding should already be set by the initial FE/BE protocol handshake. I haven't checked that further yet, so it might already work, or it should be made to work that way, or I might be way off. Slight abuse of pqformat functions. We're not composing messages using pq_beginmessage()/pq_endmessage(), and we're not using pq_getmsgend() when reading. The "proper" way to do this is probably to define a custom set of PQcommMethods. (low priority) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
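The string-framing point is easy to see in isolation. A minimal standalone sketch, where MsgCursor is an illustrative stand-in rather than the real StringInfo/pqformat API: the patch's length-byte form caps a name at 255 bytes, while the NUL-terminated form used elsewhere in the FE/BE protocol has no such cap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a received message body. */
typedef struct
{
    const char *data;
    int         cursor;
} MsgCursor;

/* The WIP patch's form: one Int8 length (documented as including the
 * NUL terminator), then the bytes.  A single length byte caps names
 * at 255 bytes. */
static const char *
read_len_string(MsgCursor *buf)
{
    uint8_t     len = (uint8_t) buf->data[buf->cursor++];
    const char *s = buf->data + buf->cursor;

    buf->cursor += len;
    return s;
}

/* The usual FE/BE protocol form: scan to the NUL terminator; no
 * length byte and no length limit. */
static const char *
read_cstring(MsgCursor *buf)
{
    const char *s = buf->data + buf->cursor;

    buf->cursor += (int) strlen(s) + 1;
    return s;
}
```

Once the strings carry their NUL terminator anyway, the length byte carries no extra information, which is the point of the review comment.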
On 9/6/16 8:56 PM, Peter Eisentraut wrote: > Some thoughts on pg_dump and such: Another issue to add to this list: With the current patch set, pg_dump will fail for unprivileged users, because it can't read pg_subscription. The include_subscription flag ought to be checked in getSubscriptions() already, not (only) in dumpSubscription(). The test suite for pg_dump fails because of this. We might make further changes in this area, per ongoing discussion, but it would be good to put in a quick fix for this in the next patch set so that the global test suite doesn't fail. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Review of 0004-Make-libpqwalreceiver-reentrant.patch: This looks like a good change. typo: _PG_walreceirver_conn_init For libpqrcv_create_slot(), slotname should be const char *. Similarly, for slotname in libpqrcv_startstreaming*() and conninfo in libpqrcv_connect(). (the latter two pre-existing) The connection handle should record in libpqrcv_connect() whether a connection is a logical or physical replication stream. Then that parameter doesn't have to be passed around later (or at least some asserts could double-check it). In libpqrcv_connect(), the new argument connname is actually just the application name, for which in later patches the subscription name is passed in. Does this have a deeper meaning, or should we call the argument appname to avoid introducing another term? New function libpqrcv_create_slot(): Hardcoded cmd length (hmm, other functions do that too), should use StringInfo. ereport instead of elog. No newline at the end of the error message, since PQerrorMessage() already supplies it. Typo "could not crate". Briefly document the return value. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
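The "hardcoded cmd length" complaint is about assembling the CREATE_REPLICATION_SLOT command into a fixed-size buffer; a grow-on-demand buffer in the style of StringInfo avoids guessing the length. A standalone sketch (StrBuf and strbuf_appendf are illustrative stand-ins, not the real palloc-based StringInfo API):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Grow-on-demand string buffer, StringInfo-style. */
typedef struct
{
    char *data;
    int   len;
    int   maxlen;
} StrBuf;

static void
strbuf_init(StrBuf *buf)
{
    buf->maxlen = 64;
    buf->len = 0;
    buf->data = malloc(buf->maxlen);
    buf->data[0] = '\0';
}

/* Append formatted text, enlarging the buffer as needed, so callers
 * never have to guess a command's length up front. */
static void
strbuf_appendf(StrBuf *buf, const char *fmt, ...)
{
    for (;;)
    {
        va_list ap;
        int     avail = buf->maxlen - buf->len;
        int     needed;

        va_start(ap, fmt);
        needed = vsnprintf(buf->data + buf->len, avail, fmt, ap);
        va_end(ap);

        if (needed < avail)
        {
            buf->len += needed;
            return;             /* it fit, including the terminating NUL */
        }
        /* enlarge and retry; needed excludes the NUL, hence the +1 */
        buf->maxlen = (buf->len + needed + 1) * 2;
        buf->data = realloc(buf->data, buf->maxlen);
    }
}
```

The retry loop relies on vsnprintf reporting the space it would have needed, so at most one reallocation per append is required, regardless of how long the slot name or plugin name turns out to be.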
On 09/09/16 06:33, Peter Eisentraut wrote: > Review of 0003-Define-logical-replication-protocol-and-output-plugi.patch: > > (This is still based on the Aug 31 patch set, but at quick glance I > didn't see any significant changes in the Sep 8 set.) > Yep. > The start_replication option pg_version option is not documented and > not used in any later patch. We can probably do without it and just > rely on the protocol version. > That's leftover from binary type data transfer, which is not part of this initial approach as it adds a lot of complications to both the protocol and the apply side. So yes, we can do without it. > In pgoutput_startup(), you check opt->output_type. But it is not set > anywhere. Actually, the startup callback is supposed to set it > itself. Leftover from pglogical, which actually supports both output types. > In init_rel_sync_cache(), the way hash_flags is set seems kind of > weird. I think that variable could be removed and the flags put > directly into the hash_create() call. > Eh, yes, no idea how that came to be. > pgoutput_config.c seems over-engineered, e.g., converting cstring to > Datum and back. Just do normal DefElem list parsing in pgoutput.c. > That's not pretty either, but at least it's a common coding pattern. > Yes, now that we have only a couple of options I agree. > In the protocol documentation, explain the meaning of int64 as a > commit timestamp. > You mean that it's microseconds since the postgres epoch? > On the actual protocol messages: > > Why do strings have a length byte? That is not how other strings in > the protocol work. As a minor side-effect, this would limit for > example column names to 255 characters. Because I originally sent them without the null termination, but I guess they don't really need it anymore. (The 255 char limit is not really important in practice given that column name length is limited to 64 characters anyway.) > > The message structure doesn't match the documentation in some ways. 
> For example Attributes and TupleData are not separate messages but are > contained in Relation and Insert/Update/Delete messages. So the > documentation needs to be structured a bit differently. > > In the Attributes message (or actually Relation message), we don't > need the 'A' and 'C' bytes. > Hmm, okay, will look into it. I guess if we remove the 'A' then the rest of the Attributes message neatly merges into the Relation message. The more interesting part will be the TupleData, as it's a common part of other messages. > I'm not sure that pgoutput should concern itself with the client > encoding. The client encoding should already be set by the initial > FE/BE protocol handshake. I haven't checked that further yet, so it > might already work, or it should be made to work that way, or I might > be way off. Yes, I think you are right; that was there mostly for the same reason as the pg_version. > > Slight abuse of pqformat functions. We're not composing messages > using pq_beginmessage()/pq_endmessage(), and we're not using > pq_getmsgend() when reading. The "proper" way to do this is probably > to define a custom set of PQcommMethods. (low priority) > If we change that, I'd probably rather go with direct use of StringInfo functions. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
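On the commit-timestamp question: assuming the value on the wire is a raw TimestampTz, it counts microseconds since the PostgreSQL epoch of 2000-01-01 00:00:00 UTC. A sketch of the conversion to Unix time (if the protocol ends up documenting a different unit, only the divisor changes):

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01) and the PostgreSQL epoch
 * (2000-01-01): 10957 days, both at UTC midnight, so the offset is exact. */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)
#define USECS_PER_SEC        INT64_C(1000000)

/* Convert an int64 commit timestamp, taken here to be microseconds
 * since the PostgreSQL epoch (the TimestampTz representation), into
 * Unix-epoch seconds. */
static int64_t
pg_commit_ts_to_unix_secs(int64_t ts)
{
    return ts / USECS_PER_SEC + PG_EPOCH_OFFSET_SECS;
}
```

Spelling out exactly this epoch and unit in the protocol documentation would answer Peter's question.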
On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote: > On 09/09/16 06:33, Peter Eisentraut wrote: > > The start_replication option pg_version option is not documented and > > not used in any later patch. We can probably do without it and just > > rely on the protocol version. > > > > That's leftover from binary type data transfer which is not part of this > initial approach as it adds a lot of complications to both protocol and > apply side. So yes can do without. FWIW, I don't think we can leave this out of the initial protocol design. We don't have to implement it, but it has to be part of the design. Greetings, Andres Freund
On 12/09/16 21:54, Andres Freund wrote: > On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote: >> On 09/09/16 06:33, Peter Eisentraut wrote: >>> The start_replication option pg_version option is not documented and >>> not used in any later patch. We can probably do without it and just >>> rely on the protocol version. >>> >> >> That's leftover from binary type data transfer which is not part of this >> initial approach as it adds a lot of complications to both protocol and >> apply side. So yes can do without. > > FWIW, I don't think we can leave this out of the initial protocol > design. We don't have to implement it, but it has to be part of the > design. > I don't think it's a good idea to have unimplemented parts of the protocol; we have a protocol version, so it can be added in v2 when we have code that is able to handle it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2016-09-12 21:57:39 +0200, Petr Jelinek wrote: > On 12/09/16 21:54, Andres Freund wrote: > > On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote: > > > On 09/09/16 06:33, Peter Eisentraut wrote: > > > > The start_replication option pg_version option is not documented and > > > > not used in any later patch. We can probably do without it and just > > > > rely on the protocol version. > > > > > > > > > > That's leftover from binary type data transfer which is not part of this > > > initial approach as it adds a lot of complications to both protocol and > > > apply side. So yes can do without. > > > > FWIW, I don't think we can leave this out of the initial protocol > > design. We don't have to implement it, but it has to be part of the > > design. > > > > I don't think it's a good idea to have unimplemented parts of protocol, we > have protocol version so it can be added in v2 when we have code that is > able to handle it. I don't think we have to have it part of the protocol. But it has to be foreseen; otherwise introducing it later will end up requiring more invasive changes than acceptable. I don't want to repeat the "libpq v3 protocol" evolution story here.
On 12/09/16 22:21, Andres Freund wrote: > On 2016-09-12 21:57:39 +0200, Petr Jelinek wrote: >> On 12/09/16 21:54, Andres Freund wrote: >>> On 2016-09-12 21:47:08 +0200, Petr Jelinek wrote: >>>> On 09/09/16 06:33, Peter Eisentraut wrote: >>>>> The start_replication option pg_version option is not documented and >>>>> not used in any later patch. We can probably do without it and just >>>>> rely on the protocol version. >>>>> >>>> >>>> That's leftover from binary type data transfer which is not part of this >>>> initial approach as it adds a lot of complications to both protocol and >>>> apply side. So yes can do without. >>> >>> FWIW, I don't think we can leave this out of the initial protocol >>> design. We don't have to implement it, but it has to be part of the >>> design. >>> >> >> I don't think it's a good idea to have unimplemented parts of protocol, we >> have protocol version so it can be added in v2 when we have code that is >> able to handle it. > > I don't think we have to have it part of the protocol. But it has to be > forseen, otherwise introducing it later will end up requiring more > invasive changes than acceptable. I don't want to repeat the "libpq v3 > protocol" evolution story here. > Oh sure, I don't see that as a big problem; the TupleData already contains the type of the data it sends (to distinguish between nulls and text data), so that's mostly about adding a different type there. We'll also need type info in the column part of the Relation message, but that should be easy to fence with one if for a different protocol version. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
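Petr's point about fencing a future binary format behind the protocol version can be sketched as a per-column dispatch. The byte values and version threshold here are illustrative, not what the patch defines ('n' and 't' mirror the nulls-vs-text distinction he mentions; 'b' is a hypothetical future binary kind):

```c
#include <assert.h>

/* Outcome of classifying one TupleData column value. */
typedef enum
{
    COL_NULL,
    COL_TEXT,
    COL_BINARY,
    COL_UNKNOWN
} ColKind;

/* Dispatch on the per-column type byte.  Adding binary transfer later
 * means teaching this switch (and the sender) one more byte, fenced by
 * the negotiated protocol version, rather than reshaping the message. */
static ColKind
classify_column(char kind, int proto_version)
{
    switch (kind)
    {
        case 'n':
            return COL_NULL;
        case 't':
            return COL_TEXT;
        case 'b':
            /* only legal once a binary-capable protocol version is in use */
            return proto_version >= 2 ? COL_BINARY : COL_UNKNOWN;
        default:
            return COL_UNKNOWN;
    }
}
```

An old server simply never emits 'b', and an apply worker speaking protocol v1 treats it as an error, which is the forward-compatibility Andres is asking for.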
On 13 September 2016 at 06:03, Petr Jelinek <petr@2ndquadrant.com> wrote: > Oh sure, I don't see that as big problem, the TupleData already contains > type of the data it sends (to distinguish between nulls and text data) so > that's mostly about adding some different type there and we'll also need > type info in the column part of the Relation message but that should be easy > to fence with one if for different protocol version. The missing piece seems to be negotiation. If a binary-aware client connects to a non-binary-aware server, the server needs a way to say "you requested this option I don't understand, go away" or "you asked for binary but I don't support that". -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 13/09/16 02:55, Craig Ringer wrote: > On 13 September 2016 at 06:03, Petr Jelinek <petr@2ndquadrant.com> wrote: > >> Oh sure, I don't see that as big problem, the TupleData already contains >> type of the data it sends (to distinguish between nulls and text data) so >> that's mostly about adding some different type there and we'll also need >> type info in the column part of the Relation message but that should be easy >> to fence with one if for different protocol version. > > The missing piece seems to be negotiation. > > If a binary-aware client connects to a non-binary aware server, the > non-binary-aware server needs a way to say "you requested this option > I don't understand, go away" or "you asked for binary but I don't > support that". > Not sure what you mean by negotiation; why would that be needed? You know the server version when you connect, and when you know that you also know what capabilities that version of Postgres has. If you send an unrecognized option you get a corresponding error. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, First read through the current version. Hence no real architectural comments. On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote: > diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c > new file mode 100644 > index 0000000..e0c719d > --- /dev/null > +++ b/src/backend/commands/publicationcmds.c > @@ -0,0 +1,761 @@ > +/*------------------------------------------------------------------------- > + * > + * publicationcmds.c > + * publication manipulation > + * > + * Copyright (c) 2015, PostgreSQL Global Development Group > + * > + * IDENTIFICATION > + * publicationcmds.c > Not that I'm a fan of this line in the first place, but usually it does include the path. > +static void > +check_replication_permissions(void) > +{ > + if (!superuser() && !has_rolreplication(GetUserId())) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + (errmsg("must be superuser or replication role to manipulate publications")))); > +} Do we want to require owner privileges for replication roles? I'd say no, but want to raise the question. > +ObjectAddress > +CreatePublication(CreatePublicationStmt *stmt) > +{ > + Relation rel; > + ObjectAddress myself; > + Oid puboid; > + bool nulls[Natts_pg_publication]; > + Datum values[Natts_pg_publication]; > + HeapTuple tup; > + bool replicate_insert_given; > + bool replicate_update_given; > + bool replicate_delete_given; > + bool replicate_insert; > + bool replicate_update; > + bool replicate_delete; > + > + check_replication_permissions(); > + > + rel = heap_open(PublicationRelationId, RowExclusiveLock); > + > + /* Check if name is used */ > + puboid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(stmt->pubname)); > + if (OidIsValid(puboid)) > + { > + ereport(ERROR, > + (errcode(ERRCODE_DUPLICATE_OBJECT), > + errmsg("publication \"%s\" already exists", > + stmt->pubname))); > + } > + > + /* Form a tuple. 
*/ > + memset(values, 0, sizeof(values)); > + memset(nulls, false, sizeof(nulls)); > + > + values[Anum_pg_publication_pubname - 1] = > + DirectFunctionCall1(namein, CStringGetDatum(stmt->pubname)); > + > + parse_publication_options(stmt->options, > + &replicate_insert_given, &replicate_insert, > + &replicate_update_given, &replicate_update, > + &replicate_delete_given, &replicate_delete); > + > + values[Anum_pg_publication_puballtables - 1] = > + BoolGetDatum(stmt->for_all_tables); > + values[Anum_pg_publication_pubreplins - 1] = > + BoolGetDatum(replicate_insert); > + values[Anum_pg_publication_pubreplupd - 1] = > + BoolGetDatum(replicate_update); > + values[Anum_pg_publication_pubrepldel - 1] = > + BoolGetDatum(replicate_delete); > + > + tup = heap_form_tuple(RelationGetDescr(rel), values, nulls); > + > + /* Insert tuple into catalog. */ > + puboid = simple_heap_insert(rel, tup); > + CatalogUpdateIndexes(rel, tup); > + heap_freetuple(tup); > + > + ObjectAddressSet(myself, PublicationRelationId, puboid); > + > + /* Make the changes visible. */ > + CommandCounterIncrement(); > + > + if (stmt->tables) > + { > + List *rels; > + > + Assert(list_length(stmt->tables) > 0); > + > + rels = GatherTableList(stmt->tables); > + PublicationAddTables(puboid, rels, true, NULL); > + CloseTables(rels); > + } > + else if (stmt->for_all_tables || stmt->schema) > + { > + List *rels; > + > + rels = GatherTables(stmt->schema); > + PublicationAddTables(puboid, rels, true, NULL); > + CloseTables(rels); > + } Isn't this (and ALTER) racy? What happens if tables are concurrently created? This session wouldn't necessarily see the tables, and other sessions won't see for_all_tables/schema. Evaluating for_all_tables/all_in_schema when the publication is used would solve that problem. > +/* > + * Gather all tables optinally filtered by schema name. > + * The gathered tables are locked in access share lock mode. 
> + */ > +static List * > +GatherTables(char *nspname) > +{ > + Oid nspid = InvalidOid; > + List *rels = NIL; > + Relation rel; > + SysScanDesc scan; > + ScanKeyData key[1]; > + HeapTuple tup; > + > + /* Resolve and validate the schema if specified */ > + if (nspname) > + { > + nspid = LookupExplicitNamespace(nspname, false); > + if (IsSystemNamespace(nspid) || IsToastNamespace(nspid)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("only tables in user schemas can be added to publication"), > + errdetail("%s is a system schema", strVal(nspname)))); > + } Why are we restricting pg_catalog here? There's a bunch of extensions creating objects therein, and we allow that. Seems better to just rely on the IsSystemClass check for that below. > +/* > + * Gather Relations based o provided by RangeVar list. > + * The gathered tables are locked in access share lock mode. > + */ Why access share? Shouldn't we make this ShareUpdateExclusive or similar, to prevent schema changes? 
> +static List * > +GatherTableList(List *tables) > +{ > + List *relids = NIL; > + List *rels = NIL; > + ListCell *lc; > + > + /* > + * Open, share-lock, and check all the explicitly-specified relations > + */ > + foreach(lc, tables) > + { > + RangeVar *rv = lfirst(lc); > + Relation rel; > + bool recurse = interpretInhOption(rv->inhOpt); > + Oid myrelid; > + > + rel = heap_openrv(rv, AccessShareLock); > + myrelid = RelationGetRelid(rel); > + /* don't throw error for "foo, foo" */ > + if (list_member_oid(relids, myrelid)) > + { > + heap_close(rel, AccessShareLock); > + continue; > + } > + rels = lappend(rels, rel); > + relids = lappend_oid(relids, myrelid); > + > + if (recurse) > + { > + ListCell *child; > + List *children; > + > + children = find_all_inheritors(myrelid, AccessShareLock, > + NULL); > + > + foreach(child, children) > + { > + Oid childrelid = lfirst_oid(child); > + > + if (list_member_oid(relids, childrelid)) > + continue; > + > + /* find_all_inheritors already got lock */ > + rel = heap_open(childrelid, NoLock); > + rels = lappend(rels, rel); > + relids = lappend_oid(relids, childrelid); > + } > + } > + } Hm, can't this yield duplicates, when both an inherited and a top level relation are specified? > @@ -713,6 +714,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, > ObjectAddressSet(address, RelationRelationId, relationId); > > /* > + * If the newly created relation is a table and there are publications > + * which were created as FOR ALL TABLES, we want to add the relation > + * membership to those publications. > + */ > + > + if (relkind == RELKIND_RELATION) > + { > + List *pubids = GetAllTablesPublications(); > + ListCell *lc; > + > + foreach(lc, pubids) > + { > + Oid pubid = lfirst_oid(lc); > + > + publication_add_relation(pubid, rel, false); > + } > + } > + Hm, this has the potential to noticeably slow down table creation. 
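Andres's alternative to materializing FOR ALL TABLES membership rows at CREATE TABLE time is to evaluate the flag when the publication is used. A toy model of that lookup, with plain structs standing in for the catalogs:

```c
#include <assert.h>

/* Stand-in for the catalogs: a publication either carries the
 * "all tables" flag or an explicit membership list. */
typedef struct
{
    int        puballtables;
    int        nrels;
    const int *relids;
} Publication;

/* Evaluate membership at use time: a FOR ALL TABLES publication matches
 * every relation without any per-table row having been inserted when
 * the table was created, so CREATE TABLE stays untouched. */
static int
rel_is_published(const Publication *pub, int relid)
{
    int i;

    if (pub->puballtables)
        return 1;
    for (i = 0; i < pub->nrels; i++)
        if (pub->relids[i] == relid)
            return 1;
    return 0;
}
```

Besides keeping DefineRelation() fast, this also sidesteps the visibility race above: a concurrently created table is "in" the publication as soon as any session can see the table at all.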
> +publication_opt_item: > + IDENT > + { > + /* > + * We handle identifiers that aren't parser keywords with > + * the following special-case codes, to avoid bloating the > + * size of the main parser. > + */ > + if (strcmp($1, "replicate_insert") == 0) > + $$ = makeDefElem("replicate_insert", > + (Node *)makeInteger(TRUE), @1); > + else if (strcmp($1, "noreplicate_insert") == 0) > + $$ = makeDefElem("replicate_insert", > + (Node *)makeInteger(FALSE), @1); > + else if (strcmp($1, "replicate_update") == 0) > + $$ = makeDefElem("replicate_update", > + (Node *)makeInteger(TRUE), @1); > + else if (strcmp($1, "noreplicate_update") == 0) > + $$ = makeDefElem("replicate_update", > + (Node *)makeInteger(FALSE), @1); > + else if (strcmp($1, "replicate_delete") == 0) > + $$ = makeDefElem("replicate_delete", > + (Node *)makeInteger(TRUE), @1); > + else if (strcmp($1, "noreplicate_delete") == 0) > + $$ = makeDefElem("replicate_delete", > + (Node *)makeInteger(FALSE), @1); > + else > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("unrecognized publication option \"%s\"", $1), > + parser_errposition(@1))); > + } > + ; I'm kind of inclined to do this checking at execution (or transform) time instead. That allows extensions to add options, and handle them in utility hooks. > + > +/* ---------------- > + * pg_publication_rel definition. cpp turns this into > + * typedef struct FormData_pg_publication_rel > + * > + * ---------------- > + */ > +#define PublicationRelRelationId 6106 > + > +CATALOG(pg_publication_rel,6106) > +{ > + Oid pubid; /* Oid of the publication */ > + Oid relid; /* Oid of the relation */ > +} FormData_pg_publication_rel; Hm. Do we really want this to have an oid? Won't that significantly, especially if multiple publications are present, increase our oid consumption? It seems entirely sufficient to identify rows in here using (pubid, relid). 
> +ObjectAddress > +CreateSubscription(CreateSubscriptionStmt *stmt) > +{ > + Relation rel; > + ObjectAddress myself; > + Oid subid; > + bool nulls[Natts_pg_subscription]; > + Datum values[Natts_pg_subscription]; > + HeapTuple tup; > + bool enabled_given; > + bool enabled; > + char *conninfo; > + List *publications; > + > + check_subscription_permissions(); > + > + rel = heap_open(SubscriptionRelationId, RowExclusiveLock); > + > + /* Check if name is used */ > + subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId, > + CStringGetDatum(stmt->subname)); > + if (OidIsValid(subid)) > + { > + ereport(ERROR, > + (errcode(ERRCODE_DUPLICATE_OBJECT), > + errmsg("subscription \"%s\" already exists", > + stmt->subname))); > + } > + > + /* Parse and check options. */ > + parse_subscription_options(stmt->options, &enabled_given, &enabled, > + &conninfo, &publications); > + > + /* TODO: improve error messages here. */ > + if (conninfo == NULL) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("connection not specified"))); Probably also makes sense to parse the conninfo here to verify it looks sane. Although that's fairly annoying to do, because the relevant code is libpq :( > diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c > index 65230e2..f3d54c8 100644 > --- a/src/backend/nodes/copyfuncs.c > +++ b/src/backend/nodes/copyfuncs.c I think you might be missing outfuncs support. > + > +CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO > +{ > + Oid subdbid; /* Database the subscription is in. */ > + NameData subname; /* Name of the subscription */ > + bool subenabled; /* True if the subsription is enabled (running) */ Not sure what "running" means here. 
> +#ifdef CATALOG_VARLEN /* variable-length fields start here */ > + text subconninfo; /* Connection string to the provider */ > + NameData subslotname; /* Slot name on provider */ > + > + name subpublications[1]; /* List of publications subscribed to */ > +#endif > +} FormData_pg_subscription; > + <varlistentry> > + <term> > + publication_names > + </term> > + <listitem> > + <para> > + Comma separated list of publication names for which to subscribe > + (receive changes). See > + <xref linkend="logical-replication-publication"> for more info. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> Do we need to specify an escaping scheme here? > + <para> > + Every DML message contains arbitraty relation id, which can be mapped to Typo: "arbitraty" > +<listitem> > +<para> > + Commit timestamp of the transaction. > +</para> > +</listitem> > +</varlistentry> Perhaps mention it's relative to postgres epoch? > +<variablelist> > +<varlistentry> > +<term> > + Byte1('O') > +</term> > +<listitem> > +<para> > + Identifies the message as an origin message. > +</para> > +</listitem> > +</varlistentry> > +<varlistentry> > +<term> > + Int64 > +</term> > +<listitem> > +<para> > + The LSN of the commit on the origin server. > +</para> > +</listitem> > +</varlistentry> > +<varlistentry> > +<term> > + Int8 > +</term> > +<listitem> > +<para> > + Length of the origin name (including the NULL-termination > + character). > +</para> > +</listitem> > +</varlistentry> Should this explain that there could be multiple origin messages (when replay switched origins during an xact)? > +<para> > + Relation name. > +</para> > +</listitem> > +</varlistentry> > +</variablelist> > + > +</para> > + > +<para> > +This message is always followed by Attributes message. > +</para> What's the point of having this separate from the relation message? > +<varlistentry> > +<term> > + Byte1('C') > +</term> > +<listitem> > +<para> > + Start of column block. > +</para> > +</listitem> "block"? 
> +</varlistentry><varlistentry> > +<term> > + Int8 > +</term> > +<listitem> > +<para> > + Flags for the column. Currently can be either 0 for no flags > + or one which marks the column as part of the key. > +</para> > +</listitem> > +</varlistentry> > +<varlistentry> > +<term> > + Int8 > +</term> > +<listitem> > +<para> > + Length of column name (including the NULL-termination > + character). > +</para> > +</listitem> > +</varlistentry> > +<varlistentry> > +<term> > + String > +</term> > +<listitem> > +<para> > + Name of the column. > +</para> > +</listitem> > +</varlistentry> Huh, no type information? > +<varlistentry> > +<term> > + Byte1('O') > +</term> > +<listitem> > +<para> > + Identifies the following TupleData message as the old tuple > + (deleted tuple). > +</para> > +</listitem> > +</varlistentry> Should we discern between old key and old tuple? > +#define IS_REPLICA_IDENTITY 1 Defining this in the c file doesn't seem particularly useful? > +/* > + * Read transaction BEGIN from the stream. > + */ > +void > +logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn, > + TimestampTz *committime, TransactionId *remote_xid) > +{ > + /* read fields */ > + *remote_lsn = pq_getmsgint64(in); > + Assert(*remote_lsn != InvalidXLogRecPtr); > + *committime = pq_getmsgint64(in); > + *remote_xid = pq_getmsgint(in, 4); > +} In network exposed stuff it seems better not to use assert, and error out instead. > +/* > + * Write UPDATE to the output stream. > + */ > +void > +logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple, > + HeapTuple newtuple) > +{ > + pq_sendbyte(out, 'U'); /* action UPDATE */ > + > + /* use Oid as relation identifier */ > + pq_sendint(out, RelationGetRelid(rel), 4); Wonder if there's a way that could screw us. What happens if there's an oid wraparound, and a relation is dropped? Then a new relation could end up with same id. Maybe answered somewhere further down. 
> +/* > + * Write a tuple to the outputstream, in the most efficient format possible. > + */ > +static void > +logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple) > +{ > + /* Write the values */ > + for (i = 0; i < desc->natts; i++) > + { > + outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]); Odd spacing. > +/* > + * Initialize this plugin > + */ > +static void > +pgoutput_startup(LogicalDecodingContext * ctx, OutputPluginOptions *opt, > + bool is_init) > +{ > + PGOutputData *data = palloc0(sizeof(PGOutputData)); > + int client_encoding; > + > + /* Create our memory context for private allocations. */ > + data->context = AllocSetContextCreate(ctx->context, > + "logical replication output context", > + ALLOCSET_DEFAULT_MINSIZE, > + ALLOCSET_DEFAULT_INITSIZE, > + ALLOCSET_DEFAULT_MAXSIZE); > + > + ctx->output_plugin_private = data; > + > + /* > + * This is replication start and not slot initialization. > + * > + * Parse and validate options passed by the client. > + */ > + if (!is_init) > + { > + /* We can only do binary */ > + if (opt->output_type != OUTPUT_PLUGIN_BINARY_OUTPUT) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("only binary mode is supported for logical replication protocol"))); Shouldn't you just set opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT; or is the goal just to output a better message? > + > +/* > + * COMMIT callback > + */ > +static void > +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > + XLogRecPtr commit_lsn) > +{ > + OutputPluginPrepareWrite(ctx, true); > + logicalrep_write_commit(ctx->out, txn, commit_lsn); > + OutputPluginWrite(ctx, true); > +} Hm, so we don't reset the context for these... > +/* > + * Sends the decoded DML over wire. 
> + */ > +static void > +pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > + Relation relation, ReorderBufferChange *change) > +{ > + /* Avoid leaking memory by using and resetting our own context */ > + old = MemoryContextSwitchTo(data->context); > + > + /* > + * Write the relation schema if the current schema haven't been sent yet. > + */ > + if (!relentry->schema_sent) > + { > + OutputPluginPrepareWrite(ctx, false); > + logicalrep_write_rel(ctx->out, relation); > + OutputPluginWrite(ctx, false); > + relentry->schema_sent = true; > + } > + > + /* Send the data */ > + switch (change->action) > + { ... > + /* Cleanup */ > + MemoryContextSwitchTo(old); > + MemoryContextReset(data->context); > +} IIRC there were some pfree's in called functions. It's probably better to remove those and rely on this. > +/* > + * Load publications from the list of publication names. > + */ > +static List * > +LoadPublications(List *pubnames) > +{ > + List *result = NIL; > + ListCell *lc; > + > + foreach (lc, pubnames) > + { > + char *pubname = (char *) lfirst(lc); > + Publication *pub = GetPublicationByName(pubname, false); > + > + result = lappend(result, pub); > + } > + > + return result; > +} Why are we doing this eagerly? On systems with a lot of relations this'll suck up a fair amount of memory, without much need? > +/* > + * Remove all the entries from our relation cache. > + */ > +static void > +destroy_rel_sync_cache(void) > +{ > + HASH_SEQ_STATUS status; > + RelationSyncEntry *entry; > + > + if (RelationSyncCache == NULL) > + return; > + > + hash_seq_init(&status, RelationSyncCache); > + > + while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) > + { > + if (hash_search(RelationSyncCache, (void *) &entry->relid, > + HASH_REMOVE, NULL) == NULL) > + elog(ERROR, "hash table corrupted"); > + } > + > + RelationSyncCache = NULL; > +} Any reason not to just destroy the hash table instead? 
> +enum { > + PARAM_UNRECOGNISED, > + PARAM_PROTOCOL_VERSION, > + PARAM_ENCODING, > + PARAM_PG_VERSION, > + PARAM_PUBLICATION_NAMES, > +} OutputPluginParamKey; > + > +typedef struct { > + const char * const paramname; > + int paramkey; > +} OutputPluginParam; > + > +/* Oh, if only C had switch on strings */ > +static OutputPluginParam param_lookup[] = { > + {"proto_version", PARAM_PROTOCOL_VERSION}, > + {"encoding", PARAM_ENCODING}, > + {"pg_version", PARAM_PG_VERSION}, > + {"publication_names", PARAM_PUBLICATION_NAMES}, > + {NULL, PARAM_UNRECOGNISED} > +}; > + > + > +/* > + * Read parameters sent by client at startup and store recognised > + * ones in the parameters PGOutputData. > + * > + * The data must have all client-supplied parameter fields zeroed, > + * such as by memset or palloc0, since values not supplied > + * by the client are not set. > + */ > +void > +pgoutput_process_parameters(List *options, PGOutputData *data) > +{ > + ListCell *lc; > + > + /* Examine all the other params in the message. 
*/ > + foreach(lc, options) > + { > + DefElem *elem = lfirst(lc); > + Datum val; > + > + Assert(elem->arg == NULL || IsA(elem->arg, String)); > + > + /* Check each param, whether or not we recognise it */ > + switch(get_param_key(elem->defname)) > + { > + case PARAM_PROTOCOL_VERSION: > + val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false); > + data->protocol_version = DatumGetUInt32(val); > + break; > + > + case PARAM_ENCODING: > + val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false); > + data->client_encoding = DatumGetCString(val); > + break; > + > + case PARAM_PG_VERSION: > + val = get_param_value(elem, OUTPUT_PARAM_TYPE_UINT32, false); > + data->client_pg_version = DatumGetUInt32(val); > + break; > + > + case PARAM_PUBLICATION_NAMES: > + val = get_param_value(elem, OUTPUT_PARAM_TYPE_STRING, false); > + if (!SplitIdentifierString(DatumGetCString(val), ',', > + &data->publication_names)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_NAME), > + errmsg("invalid publication name syntax"))); > + > + break; > + > + default: > + ereport(ERROR, > + (errmsg("Unrecognised pgoutput parameter %s", > + elem->defname))); > + break; > + } > + } > +} > + > +/* > + * Look up a param name to find the enum value for the > + * param, or PARAM_UNRECOGNISED if not found. > + */ > +static int > +get_param_key(const char * const param_name) > +{ > + OutputPluginParam *param = &param_lookup[0]; > + > + do { > + if (strcmp(param->paramname, param_name) == 0) > + return param->paramkey; > + param++; > + } while (param->paramname != NULL); > + > + return PARAM_UNRECOGNISED; > +} I'm not following why this isn't just one routine with a chain of else if (strcmp() == 0) blocks?
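I.e. something along these lines — a self-contained sketch of the strcmp() chain I have in mind; DemoOutputData, the field names, and the int return convention are made up for illustration, not the patch's actual structs:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's option struct. */
typedef struct DemoOutputData
{
	unsigned int protocol_version;
	const char *client_encoding;
} DemoOutputData;

/*
 * One routine, one strcmp() chain: the enum, the lookup table and the
 * separate get_param_key() routine all go away.  Returns 0 on success,
 * -1 for an unrecognized parameter.
 */
static int
demo_process_parameter(DemoOutputData *data, const char *name,
					   const char *value)
{
	if (strcmp(name, "proto_version") == 0)
		data->protocol_version = (unsigned int) strtoul(value, NULL, 10);
	else if (strcmp(name, "encoding") == 0)
		data->client_encoding = value;
	else
		return -1;				/* caller can ereport() here */
	return 0;
}
```

Same number of string comparisons, but only one place to touch when adding a parameter.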
> From 2241471aec03de553126c2d5fc012fcba1ecf50d Mon Sep 17 00:00:00 2001 > From: Petr Jelinek <pjmodos@pjmodos.net> > Date: Wed, 6 Jul 2016 13:59:23 +0200 > Subject: [PATCH 4/6] Make libpqwalreceiver reentrant > > --- > .../libpqwalreceiver/libpqwalreceiver.c | 328 ++++++++++++++------- > src/backend/replication/walreceiver.c | 67 +++-- > src/include/replication/walreceiver.h | 75 +++-- > 3 files changed, 306 insertions(+), 164 deletions(-) > > diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c > index f1c843e..5da4474 100644 > --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c > +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c > @@ -25,6 +25,7 @@ > #include "miscadmin.h" > #include "replication/walreceiver.h" > #include "utils/builtins.h" > +#include "utils/pg_lsn.h" > > #ifdef HAVE_POLL_H > #include <poll.h> > @@ -38,62 +39,83 @@ > > PG_MODULE_MAGIC; > > -void _PG_init(void); > +struct WalReceiverConnHandle { > + /* Current connection to the primary, if any */ > + PGconn *streamConn; > + /* Buffer for currently read records */ > + char *recvBuf; > +}; newline before { > -/* Current connection to the primary, if any */ > -static PGconn *streamConn = NULL; > - > -/* Buffer for currently read records */ > -static char *recvBuf = NULL; Yuck, this indeed seems better. 
> > /* > - * Module load callback > + * Module initialization callback > */ > -void > -_PG_init(void) > +WalReceiverConnHandle * > +_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi) > { > - /* Tell walreceiver how to reach us */ > - if (walrcv_connect != NULL || walrcv_identify_system != NULL || > - walrcv_readtimelinehistoryfile != NULL || > - walrcv_startstreaming != NULL || walrcv_endstreaming != NULL || > - walrcv_receive != NULL || walrcv_send != NULL || > - walrcv_disconnect != NULL) > - elog(ERROR, "libpqwalreceiver already loaded"); > - walrcv_connect = libpqrcv_connect; > - walrcv_get_conninfo = libpqrcv_get_conninfo; > - walrcv_identify_system = libpqrcv_identify_system; > - walrcv_readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile; > - walrcv_startstreaming = libpqrcv_startstreaming; > - walrcv_endstreaming = libpqrcv_endstreaming; > - walrcv_receive = libpqrcv_receive; > - walrcv_send = libpqrcv_send; > - walrcv_disconnect = libpqrcv_disconnect; > + WalReceiverConnHandle *handle; > + > + handle = palloc0(sizeof(WalReceiverConnHandle)); > + > + /* Tell caller how to reach us */ > + wrcapi->connect = libpqrcv_connect; > + wrcapi->get_conninfo = libpqrcv_get_conninfo; > + wrcapi->identify_system = libpqrcv_identify_system; > + wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile; > + wrcapi->create_slot = libpqrcv_create_slot; > + wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical; > + wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical; > + wrcapi->endstreaming = libpqrcv_endstreaming; > + wrcapi->receive = libpqrcv_receive; > + wrcapi->send = libpqrcv_send; > + wrcapi->disconnect = libpqrcv_disconnect; > + > + return handle; > } This however I'm not following. Why do we need multiple copies of this? And why aren't we doing the assignments in _PG_init? Seems better to just allocate one WalRcvCallbacks globally and assign all these as constants.
Then the establishment function can just return all these (as part of a bigger struct). (skipped logical rep docs) > diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml > index 8acdff1..34007d3 100644 > --- a/doc/src/sgml/reference.sgml > +++ b/doc/src/sgml/reference.sgml > @@ -54,11 +54,13 @@ > &alterOperatorClass; > &alterOperatorFamily; > &alterPolicy; > + &alterPublication; > &alterRole; > &alterRule; > &alterSchema; > &alterSequence; > &alterServer; > + &alterSubscription; > &alterSystem; > &alterTable; > &alterTableSpace; > @@ -100,11 +102,13 @@ > &createOperatorClass; > &createOperatorFamily; > &createPolicy; > + &createPublication; > &createRole; > &createRule; > &createSchema; > &createSequence; > &createServer; > + &createSubscription; > &createTable; > &createTableAs; > &createTableSpace; > @@ -144,11 +148,13 @@ > &dropOperatorFamily; > &dropOwned; > &dropPolicy; > + &dropPublication; > &dropRole; > &dropRule; > &dropSchema; > &dropSequence; > &dropServer; > + &dropSubscription; > &dropTable; > &dropTableSpace; > &dropTSConfig; Hm, shouldn't all these have been registered in the earlier patch? > diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c > index d29d3f9..f2052b8 100644 > --- a/src/backend/commands/subscriptioncmds.c > +++ b/src/backend/commands/subscriptioncmds.c This sure is a lot of yanking around of previously added code. At least some of it looks like it should really have been part of the earlier commit. 
> @@ -327,6 +431,18 @@ DropSubscriptionById(Oid subid) > { > Relation rel; > HeapTuple tup; > + Datum datum; > + bool isnull; > + char *subname; > + char *conninfo; > + char *slotname; > + RepOriginId originid; > + MemoryContext tmpctx, > + oldctx; > + WalReceiverConnHandle *wrchandle = NULL; > + WalReceiverConnAPI *wrcapi = NULL; > + walrcvconn_init_fn walrcvconn_init; > + LogicalRepWorker *worker; > > check_subscription_permissions(); > > @@ -337,9 +453,135 @@ DropSubscriptionById(Oid subid) > if (!HeapTupleIsValid(tup)) > elog(ERROR, "cache lookup failed for subscription %u", subid); > > + /* > + * Create temporary memory context to keep copy of subscription > + * info needed later in the execution. > + */ > + tmpctx = AllocSetContextCreate(TopMemoryContext, > + "DropSubscription Ctx", > + ALLOCSET_DEFAULT_MINSIZE, > + ALLOCSET_DEFAULT_INITSIZE, > + ALLOCSET_DEFAULT_MAXSIZE); > + oldctx = MemoryContextSwitchTo(tmpctx); > + > + /* Get subname */ > + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, > + Anum_pg_subscription_subname, &isnull); > + Assert(!isnull); > + subname = pstrdup(NameStr(*DatumGetName(datum))); > + > + /* Get conninfo */ > + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, > + Anum_pg_subscription_subconninfo, &isnull); > + Assert(!isnull); > + conninfo = pstrdup(TextDatumGetCString(datum)); > + > + /* Get slotname */ > + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, > + Anum_pg_subscription_subslotname, &isnull); > + Assert(!isnull); > + slotname = pstrdup(NameStr(*DatumGetName(datum))); > + > + MemoryContextSwitchTo(oldctx); > + > + /* Remove the tuple from catalog. */ > simple_heap_delete(rel, &tup->t_self); > > - ReleaseSysCache(tup); > + /* Protect against launcher restarting the worker. */ > + LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE); > > - heap_close(rel, RowExclusiveLock); > + /* Kill the apply worker so that the slot becomes accessible. 
*/ > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > + worker = logicalrep_worker_find(subid); > + if (worker) > + logicalrep_worker_stop(worker); > + LWLockRelease(LogicalRepWorkerLock); > + > + /* Wait for apply process to die. */ > + for (;;) > + { > + int rc; > + > + CHECK_FOR_INTERRUPTS(); > + > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > + if (logicalrep_worker_count(subid) < 1) > + { > + LWLockRelease(LogicalRepWorkerLock); > + break; > + } > + LWLockRelease(LogicalRepWorkerLock); > + > + /* Wait for more work. */ > + rc = WaitLatch(&MyProc->procLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + 1000L); > + > + /* emergency bailout if postmaster has died */ > + if (rc & WL_POSTMASTER_DEATH) > + proc_exit(1); > + > + ResetLatch(&MyProc->procLatch); > + } I'm really far from convinced this is the right layer to perform these operations. Previously these routines were low level catalog manipulation routines. Now they're certainly not. > + /* Remove the origin trakicking. */ typo. > + /* > + * Now that the catalog update is done, try to reserve slot at the > + * provider node using replication connection. > + */ > + wrcapi = palloc0(sizeof(WalReceiverConnAPI)); > + > + walrcvconn_init = (walrcvconn_init_fn) > + load_external_function("libpqwalreceiver", > + "_PG_walreceirver_conn_init", false, NULL); > + > + if (walrcvconn_init == NULL) > + elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol"); This does rather reinforce my opinion that the _PG_init removal in libpqwalreceiver isn't useful. > diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c > index 699c934..fc998cd 100644 > --- a/src/backend/postmaster/bgworker.c > +++ b/src/backend/postmaster/bgworker.c > @@ -93,6 +93,9 @@ struct BackgroundWorkerHandle > > static BackgroundWorkerArray *BackgroundWorkerData; > > +/* Enables registration of internal background workers. 
*/ > +bool internal_bgworker_registration_in_progress = false; > + > /* > * Calculate shared memory needed. > */ > @@ -745,7 +748,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker) > ereport(DEBUG1, > (errmsg("registering background worker \"%s\"", worker->bgw_name))); > > - if (!process_shared_preload_libraries_in_progress) > + if (!process_shared_preload_libraries_in_progress && > + !internal_bgworker_registration_in_progress) > { > if (!IsUnderPostmaster) > ereport(LOG, Ugh. > /* > + * Register internal background workers. > + * > + * This is here mainly because the permanent bgworkers are normally allowed > + * to be registered only when share preload libraries are loaded which does > + * not work for the internal ones. > + */ > +static void > +register_internal_bgworkers(void) > +{ > + internal_bgworker_registration_in_progress = true; > + > + /* Register the logical replication worker launcher if appropriate. */ > + if (!IsBinaryUpgrade && max_logical_replication_workers > 0) > + { > + BackgroundWorker bgw; > + > + bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | > + BGWORKER_BACKEND_DATABASE_CONNECTION; > + bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; > + bgw.bgw_main = ApplyLauncherMain; > + snprintf(bgw.bgw_name, BGW_MAXLEN, > + "logical replication launcher"); > + bgw.bgw_restart_time = 5; > + bgw.bgw_notify_pid = 0; > + bgw.bgw_main_arg = (Datum) 0; > + > + RegisterBackgroundWorker(&bgw); > + } > + > + internal_bgworker_registration_in_progress = false; > +} Who says these flags are right for everyone? If we indeed want to go through bgworkers here, I think you'll have to generalize this a bit, so we don't check for max_logical_replication_workers and such here. We could e.g. have the shared memory sizing hooks set up a chain of registrations.
> -static void > +static char * > libpqrcv_identify_system(WalReceiverConnHandle *handle, > - TimeLineID *primary_tli) > + TimeLineID *primary_tli, > + char **dbname) > { > + char *sysid; > PGresult *res; > - char *primary_sysid; > - char standby_sysid[32]; > > /* > * Get the system identifier and timeline ID as a DataRow message from the > @@ -231,24 +234,19 @@ libpqrcv_identify_system(WalReceiverConnHandle *handle, > errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields.", > ntuples, nfields, 3, 1))); > } > - primary_sysid = PQgetvalue(res, 0, 0); > + sysid = pstrdup(PQgetvalue(res, 0, 0)); > *primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0); > - > - /* > - * Confirm that the system identifier of the primary is the same as ours. > - */ > - snprintf(standby_sysid, sizeof(standby_sysid), UINT64_FORMAT, > - GetSystemIdentifier()); > - if (strcmp(primary_sysid, standby_sysid) != 0) > + if (dbname) > { > - primary_sysid = pstrdup(primary_sysid); > - PQclear(res); > - ereport(ERROR, > - (errmsg("database system identifier differs between the primary and standby"), > - errdetail("The primary's identifier is %s, the standby's identifier is %s.", > - primary_sysid, standby_sysid))); > + if (PQgetisnull(res, 0, 3)) > + *dbname = NULL; > + else > + *dbname = pstrdup(PQgetvalue(res, 0, 3)); > } > + > PQclear(res); > + > + return sysid; > } > > /* > @@ -274,7 +272,7 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname, > > if (PQresultStatus(res) != PGRES_TUPLES_OK) > { > - elog(FATAL, "could not crate replication slot \"%s\": %s\n", > + elog(ERROR, "could not crate replication slot \"%s\": %s\n", > slotname, PQerrorMessage(handle->streamConn)); > } > > @@ -287,6 +285,28 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname, > return snapshot; > } > > +/* > + * Drop replication slot. 
> + */ > +static void > +libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname) > +{ > + PGresult *res; > + char cmd[256]; > + > + snprintf(cmd, sizeof(cmd), > + "DROP_REPLICATION_SLOT \"%s\"", slotname); > + > + res = libpqrcv_PQexec(handle, cmd); > + > + if (PQresultStatus(res) != PGRES_COMMAND_OK) > + { > + elog(ERROR, "could not drop replication slot \"%s\": %s\n", > + slotname, PQerrorMessage(handle->streamConn)); > + } > + > + PQclear(res); > +} Given that the earlier commit to libpqwalreceiver added a lot of this information, it doesn't seem right to change it again here. > +typedef struct LogicalRepRelMapEntry { early { Ok, running out of time. See you soon I guess ;) Andres
(continuing, uh, a bit happier) On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote: > +/* > + * Relcache invalidation callback for our relation map cache. > + */ > +static void > +logicalreprelmap_invalidate_cb(Datum arg, Oid reloid) > +{ > + LogicalRepRelMapEntry *entry; > + > + /* Just to be sure. */ > + if (LogicalRepRelMap == NULL) > + return; > + > + if (reloid != InvalidOid) > + { > + HASH_SEQ_STATUS status; > + > + hash_seq_init(&status, LogicalRepRelMap); > + > + /* TODO, use inverse lookup hastable? */ *hashtable > + while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL) > + { > + if (entry->reloid == reloid) > + entry->reloid = InvalidOid; can't we break here? > +/* > + * Initialize the relation map cache. > + */ > +static void > +remoterelmap_init(void) > +{ > + HASHCTL ctl; > + > + /* Make sure we've initialized CacheMemoryContext. */ > + if (CacheMemoryContext == NULL) > + CreateCacheMemoryContext(); > + > + /* Initialize the hash table. */ > + MemSet(&ctl, 0, sizeof(ctl)); > + ctl.keysize = sizeof(uint32); > + ctl.entrysize = sizeof(LogicalRepRelMapEntry); > + ctl.hcxt = CacheMemoryContext; Wonder if this (and similar code earlier) should try to do everything in a sub-context of CacheMemoryContext instead. That'd make some issues easier to track down. > +/* > + * Open the local relation associated with the remote one. > + */ > +static LogicalRepRelMapEntry * > +logicalreprel_open(uint32 remoteid, LOCKMODE lockmode) > +{ > + LogicalRepRelMapEntry *entry; > + bool found; > + > + if (LogicalRepRelMap == NULL) > + remoterelmap_init(); > + > + /* Search for existing entry. */ > + entry = hash_search(LogicalRepRelMap, (void *) &remoteid, > + HASH_FIND, &found); > + > + if (!found) > + elog(FATAL, "cache lookup failed for remote relation %u", > + remoteid); > + > + /* Need to update the local cache? 
*/ > + if (!OidIsValid(entry->reloid)) > + { > + Oid nspid; > + Oid relid; > + int i; > + TupleDesc desc; > + LogicalRepRelation *remoterel; > + > + remoterel = &entry->remoterel; > + > + nspid = LookupExplicitNamespace(remoterel->nspname, false); > + if (!OidIsValid(nspid)) > + ereport(FATAL, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("the logical replication target %s not found", > + quote_qualified_identifier(remoterel->nspname, remoterel->relname)))); > + relid = get_relname_relid(remoterel->relname, nspid); > + if (!OidIsValid(relid)) > + ereport(FATAL, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("the logical replication target %s not found", > + quote_qualified_identifier(remoterel->nspname, > + remoterel->relname)))); > + > + entry->rel = heap_open(relid, lockmode); This seems rather racy. I think this really instead needs something akin to RangeVarGetRelidExtended(). > +/* > + * Executor state preparation for evaluation of constraint expressions, > + * indexes and triggers. > + * > + * This is based on similar code in copy.c > + */ > +static EState * > +create_estate_for_relation(LogicalRepRelMapEntry *rel) > +{ > + EState *estate; > + ResultRelInfo *resultRelInfo; > + RangeTblEntry *rte; > + > + estate = CreateExecutorState(); > + > + rte = makeNode(RangeTblEntry); > + rte->rtekind = RTE_RELATION; > + rte->relid = RelationGetRelid(rel->rel); > + rte->relkind = rel->rel->rd_rel->relkind; > + estate->es_range_table = list_make1(rte); > + > + resultRelInfo = makeNode(ResultRelInfo); > + InitResultRelInfo(resultRelInfo, rel->rel, 1, 0); > + > + estate->es_result_relations = resultRelInfo; > + estate->es_num_result_relations = 1; > + estate->es_result_relation_info = resultRelInfo; > + > + /* Triggers might need a slot */ > + if (resultRelInfo->ri_TrigDesc) > + estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); > + > + return estate; > +} Ugh, we do this for every single change? That's pretty darn heavy. 
> +/* > + * Check if the local attribute is present in relation definition used > + * by upstream and hence updated by the replication. > + */ > +static bool > +physatt_in_attmap(LogicalRepRelMapEntry *rel, int attid) > +{ > + AttrNumber i; > + > + /* Fast path for tables that are same on upstream and downstream. */ > + if (attid < rel->remoterel.natts && rel->attmap[attid] == attid) > + return true; > + > + /* Try to find the attribute in the map. */ > + for (i = 0; i < rel->remoterel.natts; i++) > + if (rel->attmap[i] == attid) > + return true; > + > + return false; > +} Shouldn't we rather try to keep an attribute map that always can map remote attribute numbers to local ones? That doesn't seem hard on a first blush? But I might be missing something here. > +/* > + * Executes default values for columns for which we can't map to remote > + * relation columns. > + * > + * This allows us to support tables which have more columns on the downstream > + * than on the upsttream. > + */ Typo: upsttream. > +static void > +FillSlotDefaults(LogicalRepRelMapEntry *rel, EState *estate, > + TupleTableSlot *slot) > +{ Why is this using a different naming scheme? > +/* > + * Handle COMMIT message. > + * > + * TODO, support tracking of multiple origins > + */ > +static void > +handle_commit(StringInfo s) > +{ > + XLogRecPtr commit_lsn; > + XLogRecPtr end_lsn; > + TimestampTz commit_time; > + > + logicalrep_read_commit(s, &commit_lsn, &end_lsn, &commit_time); Perhaps this (and related routines) should rather be LogicalRepCommitdata commit_data; logicalrep_read_commit(s,&commit_data); etc? That way the data can transparently be enhanced. 
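Going back to physatt_in_attmap for a second, to sketch what I mean by keeping a remote-to-local attribute map (hypothetical names, plain ints instead of AttrNumber):

```c
#include <stddef.h>

/*
 * Hypothetical remote->local attribute map: attmap[remote_attnum] holds
 * the matching local attribute number, or -1 when the remote column has
 * no local counterpart.  Lookups then cost O(1) instead of a linear
 * search per attribute per tuple.
 */
typedef struct DemoAttMap
{
	int			natts;			/* number of remote attributes */
	int		   *attmap;			/* remote attnum -> local attnum, or -1 */
} DemoAttMap;

/* Map a remote attribute number to the local one; -1 if unmapped. */
static int
demo_remote_to_local(const DemoAttMap *map, int remote_attnum)
{
	if (remote_attnum < 0 || remote_attnum >= map->natts)
		return -1;
	return map->attmap[remote_attnum];
}
```

The map gets built once when the relation entry is (re)validated, so the per-change path never scans.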
> + Assert(commit_lsn == replorigin_session_origin_lsn); > + Assert(commit_time == replorigin_session_origin_timestamp); > + > + if (IsTransactionState()) > + { > + FlushPosition *flushpos; > + > + CommitTransactionCommand(); > + MemoryContextSwitchTo(CacheMemoryContext); > + > + /* Track commit lsn */ > + flushpos = (FlushPosition *) palloc(sizeof(FlushPosition)); > + flushpos->local_end = XactLastCommitEnd; > + flushpos->remote_end = end_lsn; > + > + dlist_push_tail(&lsn_mapping, &flushpos->node); > + MemoryContextSwitchTo(ApplyContext); Seems like it should be in a separate function. > +/* > + * Handle INSERT message. > + */ > +static void > +handle_insert(StringInfo s) > +{ > + LogicalRepRelMapEntry *rel; > + LogicalRepTupleData newtup; > + LogicalRepRelId relid; > + EState *estate; > + TupleTableSlot *remoteslot; > + MemoryContext oldctx; > + > + ensure_transaction(); > + > + relid = logicalrep_read_insert(s, &newtup); > + rel = logicalreprel_open(relid, RowExclusiveLock); > + > + /* Initialize the executor state. */ > + estate = create_estate_for_relation(rel); > + remoteslot = ExecInitExtraTupleSlot(estate); > + ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel)); This seems incredibly expensive for replicating a lot of rows. > + /* Process and store remote tuple in the slot */ > + oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > + SlotStoreCStrings(remoteslot, newtup.values); > + FillSlotDefaults(rel, estate, remoteslot); > + MemoryContextSwitchTo(oldctx); > + > + PushActiveSnapshot(GetTransactionSnapshot()); > + ExecOpenIndices(estate->es_result_relation_info, false); > + > + ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */ > + remoteslot, > + remoteslot, > + NIL, > + ONCONFLICT_NONE, > + estate, > + false); I have *severe* doubts about just using the (newly) exposed functions 1:1 here. > +/* > + * Search the relation 'rel' for tuple using the replication index. 
> + * > + * If a matching tuple is found lock it with lockmode, fill the slot with its > + * contents and return true, return false is returned otherwise. > + */ > +static bool > +tuple_find_by_replidx(Relation rel, LockTupleMode lockmode, > + TupleTableSlot *searchslot, TupleTableSlot *slot) > +{ > + HeapTuple scantuple; > + ScanKeyData skey[INDEX_MAX_KEYS]; > + IndexScanDesc scan; > + SnapshotData snap; > + TransactionId xwait; > + Oid idxoid; > + Relation idxrel; > + bool found; > + > + /* Open REPLICA IDENTITY index.*/ > + idxoid = RelationGetReplicaIndex(rel); > + if (!OidIsValid(idxoid)) > + { > + elog(ERROR, "could not find configured replica identity for table \"%s\"", > + RelationGetRelationName(rel)); > + return false; > + } > + idxrel = index_open(idxoid, RowExclusiveLock); > + > + /* Start an index scan. */ > + InitDirtySnapshot(snap); > + scan = index_beginscan(rel, idxrel, &snap, > + RelationGetNumberOfAttributes(idxrel), > + 0); > + > + /* Build scan key. */ > + build_replindex_scan_key(skey, rel, idxrel, searchslot); > + > +retry: > + found = false; > + > + index_rescan(scan, skey, RelationGetNumberOfAttributes(idxrel), NULL, 0); > + > + /* Try to find the tuple */ > + if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL) > + { > + found = true; > + ExecStoreTuple(scantuple, slot, InvalidBuffer, false); > + ExecMaterializeSlot(slot); > + > + xwait = TransactionIdIsValid(snap.xmin) ? > + snap.xmin : snap.xmax; > + > + /* > + * If the tuple is locked, wait for locking transaction to finish > + * and retry. > + */ > + if (TransactionIdIsValid(xwait)) > + { > + XactLockTableWait(xwait, NULL, NULL, XLTW_None); > + goto retry; > + } > + } Hm. So we potentially find multiple tuples here, and lock all of them. but then only use one for the update. 
> +static List * > +get_subscription_list(void) > +{ > + List *res = NIL; > + Relation rel; > + HeapScanDesc scan; > + HeapTuple tup; > + MemoryContext resultcxt; > + > + /* This is the context that we will allocate our output data in */ > + resultcxt = CurrentMemoryContext; > + > + /* > + * Start a transaction so we can access pg_database, and get a snapshot. > + * We don't have a use for the snapshot itself, but we're interested in > + * the secondary effect that it sets RecentGlobalXmin. (This is critical > + * for anything that reads heap pages, because HOT may decide to prune > + * them even if the process doesn't attempt to modify any tuples.) > + */ > + StartTransactionCommand(); > + (void) GetTransactionSnapshot(); > + > + rel = heap_open(SubscriptionRelationId, AccessShareLock); > + scan = heap_beginscan_catalog(rel, 0, NULL); > + > + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) > + { > + Form_pg_subscription subform = (Form_pg_subscription) GETSTRUCT(tup); > + Subscription *sub; > + MemoryContext oldcxt; > + > + /* > + * Allocate our results in the caller's context, not the > + * transaction's. We do this inside the loop, and restore the original > + * context at the end, so that leaky things like heap_getnext() are > + * not called in a potentially long-lived context. > + */ > + oldcxt = MemoryContextSwitchTo(resultcxt); > + > + sub = (Subscription *) palloc(sizeof(Subscription)); > + sub->oid = HeapTupleGetOid(tup); > + sub->dbid = subform->subdbid; > + sub->enabled = subform->subenabled; > + > + /* We don't fill fields we are not intereste in. */ > + sub->name = NULL; > + sub->conninfo = NULL; > + sub->slotname = NULL; > + sub->publications = NIL; > + > + res = lappend(res, sub); > + MemoryContextSwitchTo(oldcxt); > + } > + > + heap_endscan(scan); > + heap_close(rel, AccessShareLock); > + > + CommitTransactionCommand(); Hm. this doesn't seem quite right from a locking pov. 
What if, in the middle of this, a new subscription is created? > +void > +logicalrep_worker_stop(LogicalRepWorker *worker) > +{ > + Assert(LWLockHeldByMe(LogicalRepWorkerLock)); > + > + /* Check that the worker is up and what we expect. */ > + if (!worker->proc) > + return; > + if (!IsBackendPid(worker->proc->pid)) > + return; > + > + /* Terminate the worker. */ > + kill(worker->proc->pid, SIGTERM); > + > + LWLockRelease(LogicalRepLauncherLock); > + > + /* Wait for it to detach. */ > + for (;;) > + { > + int rc = WaitLatch(&MyProc->procLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + 1000L); > + > + /* emergency bailout if postmaster has died */ > + if (rc & WL_POSTMASTER_DEATH) > + proc_exit(1); > + > + ResetLatch(&MyProc->procLatch); > + > + CHECK_FOR_INTERRUPTS(); > + > + if (!worker->proc) > + return; > + } > +} indentation here seems screwed. > +static void > +xacthook_signal_launcher(XactEvent event, void *arg) > +{ > + switch (event) > + { > + case XACT_EVENT_COMMIT: > + if (xacthook_do_signal_launcher) > + ApplyLauncherWakeup(); > + break; > + default: > + /* We're not interested in other tx events */ > + break; > + } > +} > +void > +ApplyLauncherWakeupOnCommit(void) > +{ > + if (!xacthook_do_signal_launcher) > + { > + RegisterXactCallback(xacthook_signal_launcher, NULL); > + xacthook_do_signal_launcher = true; > + } > +} Hm. This seems like it really should be an AtCommit_* routine instead. This also needs more docs. Hadn't I previously read about always streaming data to disk first? > @@ -0,0 +1,674 @@ > +/*------------------------------------------------------------------------- > + * tablesync.c > + * PostgreSQL logical replication > + * > + * Copyright (c) 2012-2016, PostgreSQL Global Development Group > + * > + * IDENTIFICATION > + * src/backend/replication/logical/tablesync.c > + * > + * NOTES > + * This file contains code for initial table data synchronization for > + * logical replication.
> + * > + * The initial data synchronization is done separately for each table, > + * in separate apply worker that only fetches the initial snapshot data > + * from the provider and then synchronizes the position in stream with > + * the main apply worker. Why? I guess that's because it allows to incrementally add tables, with acceptable overhead. > + * The stream position synchronization works in multiple steps. > + * - sync finishes copy and sets table state as SYNCWAIT and waits > + * for state to change in a loop > + * - apply periodically checks unsynced tables for SYNCWAIT, when it > + * appears it will compare its position in the stream with the > + * SYNCWAIT position and decides to either set it to CATCHUP when > + * the apply was infront (and wait for the sync to do the catchup), > + * or set the state to SYNCDONE if the sync was infront or in case > + * both sync and apply are at the same position it will set it to > + * READY and stops tracking it I'm not quite following here. > + * - if the state was set to CATCHUP sync will read the stream and > + * apply changes until it catches up to the specified stream > + * position and then sets state to READY and signals apply that it > + * can stop waiting and exits, if the state was set to something > + * else than CATCHUP the sync process will simply end > + * - if the state was set to SYNCDONE by apply, the apply will > + * continue tracking the table until it reaches the SYNCDONE stream > + * position at which point it sets state to READY and stops tracking > + * > + * Example flows look like this: > + * - Apply is infront: > + * sync:8 -> set SYNCWAIT > + * apply:10 -> set CATCHUP > + * sync:10 -> set ready > + * exit > + * apply:10 > + * stop tracking > + * continue rep > + * - Sync infront: > + * sync:10 > + * set SYNCWAIT > + * apply:8 > + * set SYNCDONE > + * sync:10 > + * exit > + * apply:10 > + * set READY > + * stop tracking > + * continue rep This definitely needs to be expanded a bit. 
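FWIW, here's how I currently read the SYNCWAIT handoff, as a self-contained sketch; the enum and the bare integer LSN comparison are mine, not the patch's, so please correct me if this isn't what's intended:

```c
/* Hypothetical encoding of the table-sync states described above. */
typedef enum DemoSyncState
{
	DEMO_SYNCWAIT,				/* sync finished copy, waiting on apply */
	DEMO_CATCHUP,				/* apply was ahead; sync must read on */
	DEMO_SYNCDONE,				/* sync was ahead; apply keeps tracking */
	DEMO_READY					/* positions match; stop tracking */
} DemoSyncState;

/*
 * Apply-side decision on seeing a table in SYNCWAIT: compare the apply
 * worker's stream position against the position the sync worker
 * advertised, and pick the next state per the flows above.
 */
static DemoSyncState
demo_apply_decide(unsigned long apply_lsn, unsigned long sync_lsn)
{
	if (apply_lsn > sync_lsn)
		return DEMO_CATCHUP;	/* sync applies changes up to apply_lsn */
	if (apply_lsn < sync_lsn)
		return DEMO_SYNCDONE;	/* apply tracks table until sync_lsn */
	return DEMO_READY;			/* both at the same position already */
}
```

If that's right, spelling it out like this in the NOTES section would save the next reader a lot of head-scratching.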
Where are we tracking how far replication has progressed on individual tables? Are we creating new slots for syncing? Is there any parallelism in syncing? > +/* > + * Exit routine for synchronization worker. > + */ > +static void > +finish_sync_worker(char *slotname) > +{ > + LogicalRepWorker *worker; > + RepOriginId originid; > + MemoryContext oldctx = CurrentMemoryContext; > + > + /* > + * Drop the replication slot on remote server. > + * We want to continue even in the case that the slot on remote side > + * is already gone. This means that we can leave slot on the remote > + * side but that can happen for other reasons as well so we can't > + * really protect against that. > + */ > + PG_TRY(); > + { > + wrcapi->drop_slot(wrchandle, slotname); > + } > + PG_CATCH(); > + { > + MemoryContext ectx; > + ErrorData *edata; > + > + ectx = MemoryContextSwitchTo(oldctx); > + /* Save error info */ > + edata = CopyErrorData(); > + MemoryContextSwitchTo(ectx); > + FlushErrorState(); > + > + ereport(WARNING, > + (errmsg("there was problem dropping the replication slot " > + "\"%s\" on provider", slotname), > + errdetail("The error was: %s", edata->message), > + errhint("You may have to drop it manually"))); > + FreeErrorData(edata); ISTM we really should rather return success/failure here, and not throw an error inside the libpqwalreceiver stuff. I kind of wonder if we actually can get rid of this indirection. > + /* Find the main apply worker and signal it. */ > + LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE); > + worker = logicalrep_worker_find(MyLogicalRepWorker->subid, InvalidOid); > + if (worker && worker->proc) > + SetLatch(&worker->proc->procLatch); > + LWLockRelease(LogicalRepWorkerLock); I'd rather do the SetLatch outside of the critical section. 
> +static bool > +wait_for_sync_status_change(TableState *tstate) > +{ > + int rc; > + char state = tstate->state; > + > + while (!got_SIGTERM) > + { > + StartTransactionCommand(); > + tstate->state = GetSubscriptionRelState(MyLogicalRepWorker->subid, > + tstate->relid, > + &tstate->lsn, > + true); > + CommitTransactionCommand(); > + > + /* Status record was removed. */ > + if (tstate->state == SUBREL_STATE_UNKNOWN) > + return false; > + > + if (tstate->state != state) > + return true; > + > + rc = WaitLatch(&MyProc->procLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + 10000L); > + > + /* emergency bailout if postmaster has died */ > + if (rc & WL_POSTMASTER_DEATH) > + proc_exit(1); > + > + ResetLatch(&MyProc->procLatch); broken indentation. > +/* > + * Read the state of the tables in the subscription and update our table > + * state list. > + */ > +static void > +reread_sync_state(Oid relid) > +{ > + dlist_mutable_iter iter; > + Relation rel; > + HeapTuple tup; > + ScanKeyData skey[2]; > + HeapScanDesc scan; > + > + /* Clean the old list. */ > + dlist_foreach_modify(iter, &table_states) > + { > + TableState *tstate = dlist_container(TableState, node, iter.cur); > + > + dlist_delete(iter.cur); > + pfree(tstate); > + } > + > + /* > + * Fetch all the subscription relation states that are not marked as > + * ready and push them into our table state tracking list. 
> + */ > + rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock); > + > + ScanKeyInit(&skey[0], > + Anum_pg_subscription_rel_subid, > + BTEqualStrategyNumber, F_OIDEQ, > + ObjectIdGetDatum(MyLogicalRepWorker->subid)); > + > + if (OidIsValid(relid)) > + { > + ScanKeyInit(&skey[1], > + Anum_pg_subscription_rel_subrelid, > + BTEqualStrategyNumber, F_OIDEQ, > + ObjectIdGetDatum(relid)); > + } > + else > + { > + ScanKeyInit(&skey[1], > + Anum_pg_subscription_rel_substate, > + BTEqualStrategyNumber, F_CHARNE, > + CharGetDatum(SUBREL_STATE_READY)); > + } > + > + scan = heap_beginscan_catalog(rel, 2, skey); Hm. So this is a seqscan. Shouldn't we make this use an index (depending on which branch is taken above)? > + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) > + { > + Form_pg_subscription_rel subrel; > + TableState *tstate; > + MemoryContext oldctx; > + > + subrel = (Form_pg_subscription_rel) GETSTRUCT(tup); > + > + /* Allocate the tracking info in a permament memory context. */ s/permament/permanent/ > +/* > + * Handle table synchronization cooperation from the synchroniation > + * worker. > + */ > +static void > +process_syncing_tables_sync(char *slotname, XLogRecPtr end_lsn) > +{ > + TableState *tstate; > + TimeLineID tli; > + > + Assert(!IsTransactionState()); > + > + /* > + * Synchronization workers don't keep track of all synchronization > + * tables, they only care about their table. > + */ > + if (!table_states_valid) > + { > + StartTransactionCommand(); > + reread_sync_state(MyLogicalRepWorker->relid); > + CommitTransactionCommand(); > + } > + > + /* Somebody removed table underneath this worker, nothing more to do. */ > + if (dlist_is_empty(&table_states)) > + { > + wrcapi->endstreaming(wrchandle, &tli); > + finish_sync_worker(slotname); > + } > + > + /* Check if we are done with catchup now. 
*/ > + tstate = dlist_container(TableState, node, dlist_head_node(&table_states)); > + if (tstate->state == SUBREL_STATE_CATCHUP) > + { > + Assert(tstate->lsn != InvalidXLogRecPtr); > + > + if (tstate->lsn == end_lsn) > + { > + tstate->state = SUBREL_STATE_READY; > + tstate->lsn = InvalidXLogRecPtr; > + /* Update state of the synchronization. */ > + StartTransactionCommand(); > + SetSubscriptionRelState(MyLogicalRepWorker->subid, > + tstate->relid, tstate->state, > + tstate->lsn); > + CommitTransactionCommand(); > + > + wrcapi->endstreaming(wrchandle, &tli); > + finish_sync_worker(slotname); > + } > + return; > + } > +} The return inside the if is a bit weird. Makes one think it might be a loop or such. > +/* > + * Handle table synchronization cooperation from the apply worker. > + */ > +static void > +process_syncing_tables_apply(char *slotname, XLogRecPtr end_lsn) > +{ > + dlist_mutable_iter iter; > + > + Assert(!IsTransactionState()); > + > + if (!table_states_valid) > + { > + StartTransactionCommand(); > + reread_sync_state(InvalidOid); > + CommitTransactionCommand(); > + } So this pattern is repeated a bunch of times, maybe we can encapsulate that somewhat? Maybe like ensure_sync_state_valid() or such? > + dlist_foreach_modify(iter, &table_states) > + { > + TableState *tstate = dlist_container(TableState, node, iter.cur); > + bool start_worker; > + LogicalRepWorker *worker; > + > + /* > + * When the synchronization process is at the cachup phase we need s/cachup/catchup/ > + * to ensure that we are not behind it (it's going to wait at this > + * point for the change of state). Once we are infront or at the same > + * position as the synchronization proccess we can signal it to > + * finish the catchup. > + */ > + if (tstate->state == SUBREL_STATE_SYNCWAIT) > + { > + if (end_lsn > tstate->lsn) > + { > + /* > + * Apply is infront, tell sync to catchup. and wait until > + * it does. 
> + */ > + tstate->state = SUBREL_STATE_CATCHUP; > + tstate->lsn = end_lsn; > + StartTransactionCommand(); > + SetSubscriptionRelState(MyLogicalRepWorker->subid, > + tstate->relid, tstate->state, > + tstate->lsn); > + CommitTransactionCommand(); > + > + /* Signal the worker as it may be waiting for us. */ > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > + worker = logicalrep_worker_find(MyLogicalRepWorker->subid, > + tstate->relid); > + if (worker && worker->proc) > + SetLatch(&worker->proc->procLatch); > + LWLockRelease(LogicalRepWorkerLock); Different parts of this file use different lock level to set the latch. Why? > + if (wait_for_sync_status_change(tstate)) > + Assert(tstate->state == SUBREL_STATE_READY); > + } > + else > + { > + /* > + * Apply is either behind in which case sync worker is done > + * but apply needs to keep tracking the table until it > + * catches up to where sync finished. > + * Or apply and sync are at the same position in which case > + * table can be switched to standard replication mode > + * immediately. > + */ > + if (end_lsn < tstate->lsn) > + tstate->state = SUBREL_STATE_SYNCDONE; > + else > + tstate->state = SUBREL_STATE_READY; > + What I'm failing to understand is how this can be done under concurrency. You probably thought about this, but it should really be explained somewhere. > + StartTransactionCommand(); > + SetSubscriptionRelState(MyLogicalRepWorker->subid, > + tstate->relid, tstate->state, > + tstate->lsn); > + CommitTransactionCommand(); > + > + /* Signal the worker as it may be waiting for us. */ > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > + worker = logicalrep_worker_find(MyLogicalRepWorker->subid, > + tstate->relid); > + if (worker && worker->proc) > + SetLatch(&worker->proc->procLatch); > + LWLockRelease(LogicalRepWorkerLock); Oh, and again, please set latches outside of the lock. 
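The repeated "if state invalid, reread inside a transaction" pattern could be wrapped as the review suggests. A minimal sketch, with the transaction machinery reduced to a counter and all names hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the ensure_sync_state_valid() helper suggested above,
 * wrapping the repeated "if (!table_states_valid) reread" pattern. */
static bool table_states_valid = false;
static int  reread_calls = 0;

static void
reread_sync_state_stub(void)
{
    reread_calls++;             /* stands in for StartTransactionCommand();
                                 * reread_sync_state(...);
                                 * CommitTransactionCommand(); */
    table_states_valid = true;
}

static void
ensure_sync_state_valid(void)
{
    if (!table_states_valid)
        reread_sync_state_stub();
}
```

Callers then just invoke ensure_sync_state_valid() unconditionally; redundant calls are cheap because the reread only happens while the cached state is invalid.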
> + else if (tstate->state == SUBREL_STATE_SYNCDONE && > + end_lsn >= tstate->lsn) > + { > + /* > + * Apply catched up to the position where table sync finished, > + * mark the table as ready for normal replication. > + */ Sentence needs to be rephrased a bit. > + /* > + * In case table is supposed to be synchronizing but the > + * synchronization worker is not running, start it. > + * Limit the number of launched workers here to one (for now). > + */ Hm. That seems problematic for online upgrade type cases, we might never catch up that way... > +/* > + * Start syncing the table in the sync worker. > + */ > +char * > +LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) > +{ > + StringInfoData s; > + TableState tstate; > + MemoryContext oldctx; > + char *slotname; > + > + /* Check the state of the table synchronization. */ > + StartTransactionCommand(); > + tstate.relid = MyLogicalRepWorker->relid; > + tstate.state = GetSubscriptionRelState(MySubscription->oid, tstate.relid, > + &tstate.lsn, false); > + > + /* > + * Build unique slot name. > + * TODO: protect against too long slot name. > + */ > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > + initStringInfo(&s); > + appendStringInfo(&s, "%s_sync_%s", MySubscription->slotname, > + get_rel_name(tstate.relid)); > + slotname = s.data; Is this memory freed somewhere? > + /* > + * We want to do the table data sync in single > + * transaction so do not close the transaction opened > + * above. > + * There will be no BEGIN or COMMIT messages coming via > + * logical replication while the copy table command is > + * running so start the transaction here. > + * Note the memory context for data handling will still > + * be done using ensure_transaction called by the insert > + * handler. > + */ > + StartTransactionCommand(); > + > + /* > + * Don't allow parallel access other than SELECT while > + * the initial contents are being copied.
> + */ > + rel = heap_open(tstate.relid, ExclusiveLock); Why do we want to allow access at all? > @@ -87,6 +92,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb) > cb->commit_cb = pgoutput_commit_txn; > cb->filter_by_origin_cb = pgoutput_origin_filter; > cb->shutdown_cb = pgoutput_shutdown; > + cb->tuple_cb = pgoutput_tuple; > + cb->list_tables_cb = pgoutput_list_tables; > } What are these new, undocumented callbacks actually doing? And why is this integrated into logical decoding? > /* > + * Handle LIST_TABLES command. > + */ > +static void > +SendTableList(ListTablesCmd *cmd) > +{ Ugh. I really dislike this kind of command. I think we should instead change things around, allowing normal SQL to be issued via the replication protocol. We'll have to error out for running SQL on non-database-connected replication connections, but that seems fine. Andres
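Returning to the slot-name TODO quoted a little earlier (`"%s_sync_%s"` with no length guard): slot names, like other identifiers, are limited to NAMEDATALEN-1 bytes, so a guard could truncate via snprintf. A sketch under that assumption — a real fix would also have to keep truncated names unique, e.g. by appending the relation OID:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL identifier limit (63 + NUL) */

/* Hypothetical helper: build "<slotname>_sync_<relname>" but never emit
 * more than NAMEDATALEN-1 bytes; snprintf truncates for us. */
static void
build_sync_slot_name(char *out, size_t outlen,
                     const char *slotname, const char *relname)
{
    snprintf(out, outlen, "%s_sync_%s", slotname, relname);
}
```

Called with `outlen = NAMEDATALEN`, the result is always a valid-length name, at the cost of possible collisions between long relation names — hence the uniqueness caveat above.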
On 9/14/16 11:21 AM, Andres Freund wrote: >> + ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */ >> > + remoteslot, >> > + remoteslot, >> > + NIL, >> > + ONCONFLICT_NONE, >> > + estate, >> > + false); > I have *severe* doubts about just using the (newly) exposed functions > 1:1 here. It is a valid concern, but what is the alternative? ExecInsert() and the others appear to do exactly the right things that are required. Are your concerns mainly philosophical about calling into internal executor code, or do you have technical concerns that this will not do the right thing in some cases? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-09-14 13:20:02 -0500, Peter Eisentraut wrote: > On 9/14/16 11:21 AM, Andres Freund wrote: > >> + ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */ > >> > + remoteslot, > >> > + remoteslot, > >> > + NIL, > >> > + ONCONFLICT_NONE, > >> > + estate, > >> > + false); > > I have *severe* doubts about just using the (newly) exposed functions > > 1:1 here. > > It is a valid concern, but what is the alternative? ExecInsert() and > the others appear to do exactly the right things that are required. They're actually a lot more heavyweight than what's required. If you e.g. do a large COPY on the source side, we create a single executor state (if at all), and then insert the rows using lower level routines. And that's *vastly* faster than going through all the setup costs here for each row. > Are your concerns mainly philosophical about calling into internal > executor code, or do you have technical concerns that this will not do > the right thing in some cases? Well, not about it being wrong in the sense of returning wrong results, but wrong in the sense of not even remotely being able to keep up in common cases. Andres
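The performance argument here can be modeled with a toy: count how many times per-row setup work happens versus a single amortized setup for a whole batch. Purely illustrative counters, not a benchmark of the actual executor:

```c
#include <assert.h>

/* Toy model of the argument above: building executor state per row
 * versus once per batch.  "Setups" stand in for the expensive part. */
static int
apply_rows_per_row_setup(int nrows)
{
    int setups = 0;
    for (int i = 0; i < nrows; i++)
        setups++;               /* executor state built for every row */
    return setups;
}

static int
apply_rows_batched(int nrows)
{
    int setups = 1;             /* one executor state for the batch */
    (void) nrows;               /* rows inserted via lower-level routines */
    return setups;
}
```

With a million-row COPY replayed on the subscriber, the per-row scheme pays the setup cost a million times; the batched scheme pays it once, which is the gap Andres is pointing at.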
On 14/09/16 00:48, Andres Freund wrote: > > First read through the current version. Hence no real architectural > comments. Hi, Thanks for looking! > > On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote: > >> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c >> new file mode 100644 >> index 0000000..e0c719d >> --- /dev/null >> +++ b/src/backend/commands/publicationcmds.c >> @@ -0,0 +1,761 @@ >> +/*------------------------------------------------------------------------- >> + * >> + * publicationcmds.c >> + * publication manipulation >> + * >> + * Copyright (c) 2015, PostgreSQL Global Development Group >> + * >> + * IDENTIFICATION >> + * publicationcmds.c >> > > Not that I'm a fan of this line in the first place, but usually it does > include the path. > Yes, I don't bother with it in the WIP version though, because this way I won't forget to change it when it's getting close to ready if there were renames. >> +static void >> +check_replication_permissions(void) >> +{ >> + if (!superuser() && !has_rolreplication(GetUserId())) >> + ereport(ERROR, >> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), >> + (errmsg("must be superuser or replication role to manipulate publications")))); >> +} > > Do we want to require owner privileges for replication roles? I'd say > no, but want to raise the question. > No, we might want to invent some publish role for which we would require that, so that we can do logical replication with higher granularity, but for the replication role it does not make sense. And I think higher-granularity ACLs are something for followup patches.
> >> +ObjectAddress >> +CreatePublication(CreatePublicationStmt *stmt) >> +{ >> + Relation rel; >> + ObjectAddress myself; >> + Oid puboid; >> + bool nulls[Natts_pg_publication]; >> + Datum values[Natts_pg_publication]; >> + HeapTuple tup; >> + bool replicate_insert_given; >> + bool replicate_update_given; >> + bool replicate_delete_given; >> + bool replicate_insert; >> + bool replicate_update; >> + bool replicate_delete; >> + >> + check_replication_permissions(); >> + >> + rel = heap_open(PublicationRelationId, RowExclusiveLock); >> + >> + /* Check if name is used */ >> + puboid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(stmt->pubname)); >> + if (OidIsValid(puboid)) >> + { >> + ereport(ERROR, >> + (errcode(ERRCODE_DUPLICATE_OBJECT), >> + errmsg("publication \"%s\" already exists", >> + stmt->pubname))); >> + } >> + >> + /* Form a tuple. */ >> + memset(values, 0, sizeof(values)); >> + memset(nulls, false, sizeof(nulls)); >> + >> + values[Anum_pg_publication_pubname - 1] = >> + DirectFunctionCall1(namein, CStringGetDatum(stmt->pubname)); >> + >> + parse_publication_options(stmt->options, >> + &replicate_insert_given, &replicate_insert, >> + &replicate_update_given, &replicate_update, >> + &replicate_delete_given, &replicate_delete); >> + >> + values[Anum_pg_publication_puballtables - 1] = >> + BoolGetDatum(stmt->for_all_tables); >> + values[Anum_pg_publication_pubreplins - 1] = >> + BoolGetDatum(replicate_insert); >> + values[Anum_pg_publication_pubreplupd - 1] = >> + BoolGetDatum(replicate_update); >> + values[Anum_pg_publication_pubrepldel - 1] = >> + BoolGetDatum(replicate_delete); >> + >> + tup = heap_form_tuple(RelationGetDescr(rel), values, nulls); >> + >> + /* Insert tuple into catalog. */ >> + puboid = simple_heap_insert(rel, tup); >> + CatalogUpdateIndexes(rel, tup); >> + heap_freetuple(tup); >> + >> + ObjectAddressSet(myself, PublicationRelationId, puboid); >> + >> + /* Make the changes visible. 
*/ >> + CommandCounterIncrement(); >> + >> + if (stmt->tables) >> + { >> + List *rels; >> + >> + Assert(list_length(stmt->tables) > 0); >> + >> + rels = GatherTableList(stmt->tables); >> + PublicationAddTables(puboid, rels, true, NULL); >> + CloseTables(rels); >> + } >> + else if (stmt->for_all_tables || stmt->schema) >> + { >> + List *rels; >> + >> + rels = GatherTables(stmt->schema); >> + PublicationAddTables(puboid, rels, true, NULL); >> + CloseTables(rels); >> + } > > Isn't this (and ALTER) racy? What happens if tables are concurrently > created? This session wouldn't necessarily see the tables, and other > sessions won't see for_all_tables/schema. Evaluating > for_all_tables/all_in_schema when the publication is used, would solve > that problem. Well, yes it is. It's technically not problem for all_in_schema as that's just shorthand for TABLE a,b,c,d etc where future tables don't matter (and should be added manually, unless we want to change that behavior to act more like for_all_tables just with schema filter which I wouldn't be against). But for for_all_tables it's problem I agree. Based on discussion offline I'll move the check to the actual DML operation instead of DDL and have for_all_tables be evaluated when used not when defined. > >> +/* >> + * Gather all tables optinally filtered by schema name. >> + * The gathered tables are locked in access share lock mode. 
>> + */ >> +static List * >> +GatherTables(char *nspname) >> +{ >> + Oid nspid = InvalidOid; >> + List *rels = NIL; >> + Relation rel; >> + SysScanDesc scan; >> + ScanKeyData key[1]; >> + HeapTuple tup; >> + >> + /* Resolve and validate the schema if specified */ >> + if (nspname) >> + { >> + nspid = LookupExplicitNamespace(nspname, false); >> + if (IsSystemNamespace(nspid) || IsToastNamespace(nspid)) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("only tables in user schemas can be added to publication"), >> + errdetail("%s is a system schema", strVal(nspname)))); >> + } > > Why are we restricting pg_catalog here? There's a bunch of extensions > creating objects therein, and we allow that. Seems better to just rely > on the IsSystemClass check for that below. > Makes sense. >> +/* >> + * Gather Relations based o provided by RangeVar list. >> + * The gathered tables are locked in access share lock mode. >> + */ > > Why access share? Shouldn't we make this ShareUpdateExclusive or > similar, to prevent schema changes? > Hm, I thought AccessShare would be enough to prevent schema changes that matter to us (which is basically just drop afaik). 
> >> +static List * >> +GatherTableList(List *tables) >> +{ >> + List *relids = NIL; >> + List *rels = NIL; >> + ListCell *lc; >> + >> + /* >> + * Open, share-lock, and check all the explicitly-specified relations >> + */ >> + foreach(lc, tables) >> + { >> + RangeVar *rv = lfirst(lc); >> + Relation rel; >> + bool recurse = interpretInhOption(rv->inhOpt); >> + Oid myrelid; >> + >> + rel = heap_openrv(rv, AccessShareLock); >> + myrelid = RelationGetRelid(rel); >> + /* don't throw error for "foo, foo" */ >> + if (list_member_oid(relids, myrelid)) >> + { >> + heap_close(rel, AccessShareLock); >> + continue; >> + } >> + rels = lappend(rels, rel); >> + relids = lappend_oid(relids, myrelid); >> + >> + if (recurse) >> + { >> + ListCell *child; >> + List *children; >> + >> + children = find_all_inheritors(myrelid, AccessShareLock, >> + NULL); >> + >> + foreach(child, children) >> + { >> + Oid childrelid = lfirst_oid(child); >> + >> + if (list_member_oid(relids, childrelid)) >> + continue; >> + >> + /* find_all_inheritors already got lock */ >> + rel = heap_open(childrelid, NoLock); >> + rels = lappend(rels, rel); >> + relids = lappend_oid(relids, childrelid); >> + } >> + } >> + } > > Hm, can't this yield duplicates, when both an inherited and a top level > relation are specified? > Hmm possible, I'll do the same check as I do above. > >> @@ -713,6 +714,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, >> ObjectAddressSet(address, RelationRelationId, relationId); >> >> /* >> + * If the newly created relation is a table and there are publications >> + * which were created as FOR ALL TABLES, we want to add the relation >> + * membership to those publications. 
>> + */ >> + >> + if (relkind == RELKIND_RELATION) >> + { >> + List *pubids = GetAllTablesPublications(); >> + ListCell *lc; >> + >> + foreach(lc, pubids) >> + { >> + Oid pubid = lfirst_oid(lc); >> + >> + publication_add_relation(pubid, rel, false); >> + } >> + } >> + > > Hm, this has the potential to noticeably slow down table creation. > I doubt it's going to be noticeable given all the work CREATE TABLE already does, but it certainly won't make it any faster. But since we agreed to move the check to DML this will be removed as well. >> +publication_opt_item: >> + IDENT >> + { >> + /* >> + * We handle identifiers that aren't parser keywords with >> + * the following special-case codes, to avoid bloating the >> + * size of the main parser. >> + */ >> + if (strcmp($1, "replicate_insert") == 0) >> + $$ = makeDefElem("replicate_insert", >> + (Node *)makeInteger(TRUE), @1); >> + else if (strcmp($1, "noreplicate_insert") == 0) >> + $$ = makeDefElem("replicate_insert", >> + (Node *)makeInteger(FALSE), @1); >> + else if (strcmp($1, "replicate_update") == 0) >> + $$ = makeDefElem("replicate_update", >> + (Node *)makeInteger(TRUE), @1); >> + else if (strcmp($1, "noreplicate_update") == 0) >> + $$ = makeDefElem("replicate_update", >> + (Node *)makeInteger(FALSE), @1); >> + else if (strcmp($1, "replicate_delete") == 0) >> + $$ = makeDefElem("replicate_delete", >> + (Node *)makeInteger(TRUE), @1); >> + else if (strcmp($1, "noreplicate_delete") == 0) >> + $$ = makeDefElem("replicate_delete", >> + (Node *)makeInteger(FALSE), @1); >> + else >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("unrecognized publication option \"%s\"", $1), >> + parser_errposition(@1))); >> + } >> + ; > > I'm kind of inclined to do this checking at execution (or transform) > time instead. That allows extensions to add options, and handle them in > utility hooks. > That's an interesting point; I prefer the parsing to be done in gram.y, but it might be worth moving it for extensibility.
Although there are so far other barriers for that. >> + >> +/* ---------------- >> + * pg_publication_rel definition. cpp turns this into >> + * typedef struct FormData_pg_publication_rel >> + * >> + * ---------------- >> + */ >> +#define PublicationRelRelationId 6106 >> + >> +CATALOG(pg_publication_rel,6106) >> +{ >> + Oid pubid; /* Oid of the publication */ >> + Oid relid; /* Oid of the relation */ >> +} FormData_pg_publication_rel; > > Hm. Do we really want this to have an oid? Won't that significantly, > especially if multiple publications are present, increase our oid > consumption? It seems entirely sufficient to identify rows in here > using (pubid, relid). > It could, but I'll have to check and possibly fix dependency code, I vaguely remember that there is some part of it that assumes that suboid is only used for relation column and nothing else. > >> +ObjectAddress >> +CreateSubscription(CreateSubscriptionStmt *stmt) >> +{ >> + Relation rel; >> + ObjectAddress myself; >> + Oid subid; >> + bool nulls[Natts_pg_subscription]; >> + Datum values[Natts_pg_subscription]; >> + HeapTuple tup; >> + bool enabled_given; >> + bool enabled; >> + char *conninfo; >> + List *publications; >> + >> + check_subscription_permissions(); >> + >> + rel = heap_open(SubscriptionRelationId, RowExclusiveLock); >> + >> + /* Check if name is used */ >> + subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId, >> + CStringGetDatum(stmt->subname)); >> + if (OidIsValid(subid)) >> + { >> + ereport(ERROR, >> + (errcode(ERRCODE_DUPLICATE_OBJECT), >> + errmsg("subscription \"%s\" already exists", >> + stmt->subname))); >> + } >> + >> + /* Parse and check options. */ >> + parse_subscription_options(stmt->options, &enabled_given, &enabled, >> + &conninfo, &publications); >> + >> + /* TODO: improve error messages here. 
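Going back to the duplicate-gathering concern a few messages up — a child table reached both via inheritance recursion and as an explicit list entry — the same membership check already used for "foo, foo" covers it. A standalone sketch, with Oid reduced to an unsigned int and the list to an array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* simplified stand-in */

/* Equivalent of list_member_oid() on a plain array. */
static bool
oid_member(const Oid *list, size_t n, Oid oid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == oid)
            return true;
    return false;
}

/* Add oid unless already gathered (whether via the explicit list or via
 * find_all_inheritors); returns the new element count. */
static size_t
add_unique(Oid *list, size_t n, Oid oid)
{
    if (oid_member(list, n, oid))
        return n;               /* don't open/add the relation twice */
    list[n] = oid;
    return n + 1;
}
```

Applying this check in the inheritance loop as well as the top-level loop removes the duplicate possibility Andres raised.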
*/ >> + if (conninfo == NULL) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("connection not specified"))); > > Probably also makes sense to parse the conninfo here to verify it looks > sane. Although that's fairly annoying to do, because the relevant code > is libpq :( > Well the connection is eventually used (in later patches) so maybe that's not a problem. > >> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c >> index 65230e2..f3d54c8 100644 >> --- a/src/backend/nodes/copyfuncs.c >> +++ b/src/backend/nodes/copyfuncs.c > > I think you might be missing outfuncs support. > I thought that we don't do outfuncs for DDL? >> + >> +CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO >> +{ >> + Oid subdbid; /* Database the subscription is in. */ >> + NameData subname; /* Name of the subscription */ >> + bool subenabled; /* True if the subsription is enabled (running) */ > > Not sure what "running" means here. It's a very terse way of saying that enabled means the worker should be running. >> + <varlistentry> >> + <term> >> + publication_names >> + </term> >> + <listitem> >> + <para> >> + Comma separated list of publication names for which to subscribe >> + (receive changes). See >> + <xref linkend="logical-replication-publication"> for more info. >> + </para> >> + </listitem> >> + </varlistentry> >> + </variablelist> > > Do we need to specify an escaping scheme here? > Probably, as we allow whatever Name allows. > >> +<listitem> >> +<para> >> + Commit timestamp of the transaction. >> +</para> >> +</listitem> >> +</varlistentry> > > Perhaps mention it's relative to postgres epoch? > Already done in my local working copy. > > >> +<variablelist> >> +<varlistentry> >> +<term> >> + Byte1('O') >> +</term> >> +<listitem> >> +<para> >> + Identifies the message as an origin message.
>> +</para> >> +</listitem> >> +</varlistentry> >> +<varlistentry> >> +<term> >> + Int64 >> +</term> >> +<listitem> >> +<para> >> + The LSN of the commit on the origin server. >> +</para> >> +</listitem> >> +</varlistentry> >> +<varlistentry> >> +<term> >> + Int8 >> +</term> >> +<listitem> >> +<para> >> + Length of the origin name (including the NULL-termination >> + character). >> +</para> >> +</listitem> >> +</varlistentry> > > Should this explain that there could be multiple origin messages (when > replay switched origins during an xact)? > Makes sense. >> +<para> >> + Relation name. >> +</para> >> +</listitem> >> +</varlistentry> >> +</variablelist> >> + >> +</para> >> + >> +<para> >> +This message is always followed by Attributes message. >> +</para> > > What's the point of having this separate from the relation message? > It's not; it's part of it, but the documentation does not make that very clear. >> +<varlistentry> >> +<term> >> + Byte1('C') >> +</term> >> +<listitem> >> +<para> >> + Start of column block. >> +</para> >> +</listitem> > > "block"? > Block, message part, sub-message, I am not sure what to call something that's repeating inside of a message. >> +</varlistentry><varlistentry> >> +<term> >> + Int8 >> +</term> >> +<listitem> >> +<para> >> + Flags for the column. Currently can be either 0 for no flags >> + or one which marks the column as part of the key. >> +</para> >> +</listitem> >> +</varlistentry> >> +<varlistentry> >> +<term> >> + Int8 >> +</term> >> +<listitem> >> +<para> >> + Length of column name (including the NULL-termination >> + character). >> +</para> >> +</listitem> >> +</varlistentry> >> +<varlistentry> >> +<term> >> + String >> +</term> >> +<listitem> >> +<para> >> + Name of the column. >> +</para> >> +</listitem> >> +</varlistentry> > > Huh, no type information? > It's not necessary for the text transfer, it will be if we ever add binary data transfer but that will require a protocol version bump anyway.
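As a reading aid for the column "block" described above — a flags byte (bit 0 = part of the key), a length byte, and a NUL-terminated name — here is a hedged sketch of a writer/reader pair. The layout follows the documentation quoted above; everything else (buffer handling, function names, absence of bounds checking) is invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write one column sub-message: 'C' marker, flags, name length
 * (including the NUL terminator), then the name itself. */
static size_t
write_column(uint8_t *buf, uint8_t flags, const char *name)
{
    size_t len = strlen(name) + 1;      /* includes NUL terminator */
    size_t off = 0;

    buf[off++] = 'C';
    buf[off++] = flags;
    buf[off++] = (uint8_t) len;
    memcpy(buf + off, name, len);
    return off + len;
}

/* Read it back; returns the number of bytes consumed. */
static size_t
read_column(const uint8_t *buf, uint8_t *flags, const char **name)
{
    size_t off = 0;

    off++;                              /* skip the 'C' marker */
    *flags = buf[off++];
    off++;                              /* length byte; name is NUL-terminated */
    *name = (const char *) (buf + off);
    return off + strlen(*name) + 1;
}
```

A round trip through write and read should consume the same number of bytes and reproduce the flags and name.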
>> +<varlistentry> >> +<term> >> + Byte1('O') >> +</term> >> +<listitem> >> +<para> >> + Identifies the following TupleData message as the old tuple >> + (deleted tuple). >> +</para> >> +</listitem> >> +</varlistentry> > > Should we discern between old key and old tuple? > Yes, otherwise it will be hard to support REPLICA IDENTITY FULL. > >> +/* >> + * Read transaction BEGIN from the stream. >> + */ >> +void >> +logicalrep_read_begin(StringInfo in, XLogRecPtr *remote_lsn, >> + TimestampTz *committime, TransactionId *remote_xid) >> +{ >> + /* read fields */ >> + *remote_lsn = pq_getmsgint64(in); >> + Assert(*remote_lsn != InvalidXLogRecPtr); >> + *committime = pq_getmsgint64(in); >> + *remote_xid = pq_getmsgint(in, 4); >> +} > > In network exposed stuff it seems better not to use assert, and error > out instead. > Okay > >> +/* >> + * Write UPDATE to the output stream. >> + */ >> +void >> +logicalrep_write_update(StringInfo out, Relation rel, HeapTuple oldtuple, >> + HeapTuple newtuple) >> +{ >> + pq_sendbyte(out, 'U'); /* action UPDATE */ >> + >> + /* use Oid as relation identifier */ >> + pq_sendint(out, RelationGetRelid(rel), 4); > > Wonder if there's a way that could screw us. What happens if there's an > oid wraparound, and a relation is dropped? Then a new relation could end > up with same id. Maybe answered somewhere further down. > Should not, we'll know we didn't send the message for the new table yet so we'll send new Relation message. >> + >> +/* >> + * COMMIT callback >> + */ >> +static void >> +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, >> + XLogRecPtr commit_lsn) >> +{ >> + OutputPluginPrepareWrite(ctx, true); >> + logicalrep_write_commit(ctx->out, txn, commit_lsn); >> + OutputPluginWrite(ctx, true); >> +} > > Hm, so we don't reset the context for these... > What? >> +/* >> + * Sends the decoded DML over wire. 
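Following the "error out instead of Assert" point for network-facing readers, a sketch of a BEGIN-message reader that reports a bad LSN to its caller instead of asserting. The big-endian field layout (Int64 LSN, Int64 commit time, Int32 xid) matches logicalrep_read_begin as quoted; the helper names and the boolean return convention are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define InvalidXLogRecPtr 0

/* Big-endian decode, mirroring the pq_getmsgint64 convention. */
static uint64_t
get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parse a BEGIN message; return false (so the caller can ereport)
 * rather than Assert on an invalid LSN coming off the wire. */
static bool
read_begin(const uint8_t *msg, uint64_t *remote_lsn,
           uint64_t *committime, uint32_t *remote_xid)
{
    *remote_lsn = get_be64(msg);
    if (*remote_lsn == InvalidXLogRecPtr)
        return false;           /* malformed message: error out, don't Assert */
    *committime = get_be64(msg + 8);
    *remote_xid = (uint32_t) ((msg[16] << 24) | (msg[17] << 16) |
                              (msg[18] << 8) | msg[19]);
    return true;
}
```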
>> + */ >> +static void >> +pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, >> + Relation relation, ReorderBufferChange *change) >> +{ > >> + /* Avoid leaking memory by using and resetting our own context */ >> + old = MemoryContextSwitchTo(data->context); >> + >> + /* >> + * Write the relation schema if the current schema haven't been sent yet. >> + */ >> + if (!relentry->schema_sent) >> + { >> + OutputPluginPrepareWrite(ctx, false); >> + logicalrep_write_rel(ctx->out, relation); >> + OutputPluginWrite(ctx, false); >> + relentry->schema_sent = true; >> + } >> + >> + /* Send the data */ >> + switch (change->action) >> + { > ... >> + /* Cleanup */ >> + MemoryContextSwitchTo(old); >> + MemoryContextReset(data->context); >> +} > > IIRC there were some pfree's in called functions. It's probably better > to remove those and rely on this. > Only write_tuple calls pfree, that's mostly because we may call it twice for single tuple and it might allocate a lot of data. >> +/* >> + * Load publications from the list of publication names. >> + */ >> +static List * >> +LoadPublications(List *pubnames) >> +{ >> + List *result = NIL; >> + ListCell *lc; >> + >> + foreach (lc, pubnames) >> + { >> + char *pubname = (char *) lfirst(lc); >> + Publication *pub = GetPublicationByName(pubname, false); >> + >> + result = lappend(result, pub); >> + } >> + >> + return result; >> +} > > Why are we doing this eagerly? On systems with a lot of relations > this'll suck up a fair amount of memory, without much need? > Don't follow, it only reads publications not relations in them, reason why we do it eagerly is to validate that the requested publications actually exist. >> +/* >> + * Remove all the entries from our relation cache. 
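The eager-validation rationale just stated can be shown in miniature: walk the requested publication names once at startup and fail fast on the first unknown one, so a typo errors immediately instead of silently replicating nothing. The catalog is faked as a static array; all names here are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the pg_publication catalog. */
static const char *known_pubs[] = { "pub_all", "pub_inserts" };

/* Validate every requested name up front; on failure report which name
 * was missing so the caller can raise the error. */
static bool
validate_publications(const char **names, size_t n, const char **missing)
{
    for (size_t i = 0; i < n; i++)
    {
        bool found = false;
        for (size_t j = 0; j < sizeof(known_pubs) / sizeof(known_pubs[0]); j++)
            if (strcmp(names[i], known_pubs[j]) == 0)
                found = true;
        if (!found)
        {
            *missing = names[i];
            return false;
        }
    }
    return true;
}
```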
>> + */ >> +static void >> +destroy_rel_sync_cache(void) >> +{ >> + HASH_SEQ_STATUS status; >> + RelationSyncEntry *entry; >> + >> + if (RelationSyncCache == NULL) >> + return; >> + >> + hash_seq_init(&status, RelationSyncCache); >> + >> + while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) >> + { >> + if (hash_search(RelationSyncCache, (void *) &entry->relid, >> + HASH_REMOVE, NULL) == NULL) >> + elog(ERROR, "hash table corrupted"); >> + } >> + >> + RelationSyncCache = NULL; >> +} > > Any reason not to just destroy the hash table instead? > Missed that we have API for that. >> >> /* >> - * Module load callback >> + * Module initialization callback >> */ >> -void >> -_PG_init(void) >> +WalReceiverConnHandle * >> +_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi) >> { >> - /* Tell walreceiver how to reach us */ >> - if (walrcv_connect != NULL || walrcv_identify_system != NULL || >> - walrcv_readtimelinehistoryfile != NULL || >> - walrcv_startstreaming != NULL || walrcv_endstreaming != NULL || >> - walrcv_receive != NULL || walrcv_send != NULL || >> - walrcv_disconnect != NULL) >> - elog(ERROR, "libpqwalreceiver already loaded"); >> - walrcv_connect = libpqrcv_connect; >> - walrcv_get_conninfo = libpqrcv_get_conninfo; >> - walrcv_identify_system = libpqrcv_identify_system; >> - walrcv_readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile; >> - walrcv_startstreaming = libpqrcv_startstreaming; >> - walrcv_endstreaming = libpqrcv_endstreaming; >> - walrcv_receive = libpqrcv_receive; >> - walrcv_send = libpqrcv_send; >> - walrcv_disconnect = libpqrcv_disconnect; >> + WalReceiverConnHandle *handle; >> + >> + handle = palloc0(sizeof(WalReceiverConnHandle)); >> + >> + /* Tell caller how to reach us */ >> + wrcapi->connect = libpqrcv_connect; >> + wrcapi->get_conninfo = libpqrcv_get_conninfo; >> + wrcapi->identify_system = libpqrcv_identify_system; >> + wrcapi->readtimelinehistoryfile = libpqrcv_readtimelinehistoryfile; >> + wrcapi->create_slot
= libpqrcv_create_slot; >> + wrcapi->startstreaming_physical = libpqrcv_startstreaming_physical; >> + wrcapi->startstreaming_logical = libpqrcv_startstreaming_logical; >> + wrcapi->endstreaming = libpqrcv_endstreaming; >> + wrcapi->receive = libpqrcv_receive; >> + wrcapi->send = libpqrcv_send; >> + wrcapi->disconnect = libpqrcv_disconnect; >> + >> + return handle; >> } > > This however I'm not following. Why do we need multiple copies of this? > And why aren't we doing the assignments in _PG_init? Seems better to > just allocate one WalRcvCallbacks globally and assign all these as > constants. Then the establishment function can just return all these > (as part of a bigger struct). > Meh, if I understand you correctly that will make the access a bit more ugly (multiple layers of structs). > > (skipped logical rep docs) > >> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml >> index 8acdff1..34007d3 100644 >> --- a/doc/src/sgml/reference.sgml >> +++ b/doc/src/sgml/reference.sgml >> @@ -54,11 +54,13 @@ >> &alterOperatorClass; >> &alterOperatorFamily; >> &alterPolicy; >> + &alterPublication; >> &alterRole; >> &alterRule; >> &alterSchema; >> &alterSequence; >> &alterServer; >> + &alterSubscription; >> &alterSystem; >> &alterTable; >> &alterTableSpace; >> @@ -100,11 +102,13 @@ >> &createOperatorClass; >> &createOperatorFamily; >> &createPolicy; >> + &createPublication; >> &createRole; >> &createRule; >> &createSchema; >> &createSequence; >> &createServer; >> + &createSubscription; >> &createTable; >> &createTableAs; >> &createTableSpace; >> @@ -144,11 +148,13 @@ >> &dropOperatorFamily; >> &dropOwned; >> &dropPolicy; >> + &dropPublication; >> &dropRole; >> &dropRule; >> &dropSchema; >> &dropSequence; >> &dropServer; >> + &dropSubscription; >> &dropTable; >> &dropTableSpace; >> &dropTSConfig; > > Hm, shouldn't all these have been registered in the earlier patch? > Yeah, all the rebasing sometimes produces artefacts.
> > >> diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c >> index d29d3f9..f2052b8 100644 >> --- a/src/backend/commands/subscriptioncmds.c >> +++ b/src/backend/commands/subscriptioncmds.c > > This sure is a lot of yanking around of previously added code. At least > some of it looks like it should really have been part of the earlier > commit. > True, but it depends on the previous patch ... scratches head ... hmm although the libpqwalreceiver actually does not depend on anything so it could be first patch in series, then this code could be moved to the patch which adds subscriptions. > >> @@ -327,6 +431,18 @@ DropSubscriptionById(Oid subid) >> { >> Relation rel; >> HeapTuple tup; >> + Datum datum; >> + bool isnull; >> + char *subname; >> + char *conninfo; >> + char *slotname; >> + RepOriginId originid; >> + MemoryContext tmpctx, >> + oldctx; >> + WalReceiverConnHandle *wrchandle = NULL; >> + WalReceiverConnAPI *wrcapi = NULL; >> + walrcvconn_init_fn walrcvconn_init; >> + LogicalRepWorker *worker; >> >> check_subscription_permissions(); >> >> @@ -337,9 +453,135 @@ DropSubscriptionById(Oid subid) >> if (!HeapTupleIsValid(tup)) >> elog(ERROR, "cache lookup failed for subscription %u", subid); >> >> + /* >> + * Create temporary memory context to keep copy of subscription >> + * info needed later in the execution. 
>> + */ >> + tmpctx = AllocSetContextCreate(TopMemoryContext, >> + "DropSubscription Ctx", >> + ALLOCSET_DEFAULT_MINSIZE, >> + ALLOCSET_DEFAULT_INITSIZE, >> + ALLOCSET_DEFAULT_MAXSIZE); >> + oldctx = MemoryContextSwitchTo(tmpctx); >> + >> + /* Get subname */ >> + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, >> + Anum_pg_subscription_subname, &isnull); >> + Assert(!isnull); >> + subname = pstrdup(NameStr(*DatumGetName(datum))); >> + >> + /* Get conninfo */ >> + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, >> + Anum_pg_subscription_subconninfo, &isnull); >> + Assert(!isnull); >> + conninfo = pstrdup(TextDatumGetCString(datum)); >> + >> + /* Get slotname */ >> + datum = SysCacheGetAttr(SUBSCRIPTIONOID, tup, >> + Anum_pg_subscription_subslotname, &isnull); >> + Assert(!isnull); >> + slotname = pstrdup(NameStr(*DatumGetName(datum))); >> + >> + MemoryContextSwitchTo(oldctx); >> + >> + /* Remove the tuple from catalog. */ >> simple_heap_delete(rel, &tup->t_self); >> >> - ReleaseSysCache(tup); >> + /* Protect against launcher restarting the worker. */ >> + LWLockAcquire(LogicalRepLauncherLock, LW_EXCLUSIVE); >> >> - heap_close(rel, RowExclusiveLock); >> + /* Kill the apply worker so that the slot becomes accessible. */ >> + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); >> + worker = logicalrep_worker_find(subid); >> + if (worker) >> + logicalrep_worker_stop(worker); >> + LWLockRelease(LogicalRepWorkerLock); >> + >> + /* Wait for apply process to die. */ >> + for (;;) >> + { >> + int rc; >> + >> + CHECK_FOR_INTERRUPTS(); >> + >> + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); >> + if (logicalrep_worker_count(subid) < 1) >> + { >> + LWLockRelease(LogicalRepWorkerLock); >> + break; >> + } >> + LWLockRelease(LogicalRepWorkerLock); >> + >> + /* Wait for more work. 
*/ >> + rc = WaitLatch(&MyProc->procLatch, >> + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, >> + 1000L); >> + >> + /* emergency bailout if postmaster has died */ >> + if (rc & WL_POSTMASTER_DEATH) >> + proc_exit(1); >> + >> + ResetLatch(&MyProc->procLatch); >> + } > > I'm really far from convinced this is the right layer to perform these > operations. Previously these routines were low level catalog > manipulation routines. Now they're certainly not. > Well I do want to have this happen when the DDL is executed so that I can inform user about failure. I can move this code to a separate function but it will still be executed in this layer. > >> + /* >> + * Now that the catalog update is done, try to reserve slot at the >> + * provider node using replication connection. >> + */ >> + wrcapi = palloc0(sizeof(WalReceiverConnAPI)); >> + >> + walrcvconn_init = (walrcvconn_init_fn) >> + load_external_function("libpqwalreceiver", >> + "_PG_walreceirver_conn_init", false, NULL); >> + >> + if (walrcvconn_init == NULL) >> + elog(ERROR, "libpqwalreceiver does not declare _PG_walreceirver_conn_init symbol"); > > This does rather reinforce my opinion that the _PG_init removal in > libpqwalreceiver isn't useful. I don't see how it helps, you said we'd still return struct from some interface so this would be more or less the same? > >> diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c >> index 699c934..fc998cd 100644 >> --- a/src/backend/postmaster/bgworker.c >> +++ b/src/backend/postmaster/bgworker.c >> @@ -93,6 +93,9 @@ struct BackgroundWorkerHandle >> >> static BackgroundWorkerArray *BackgroundWorkerData; >> >> +/* Enables registration of internal background workers. */ >> +bool internal_bgworker_registration_in_progress = false; >> + >> /* >> * Calculate shared memory needed. 
>> */ >> @@ -745,7 +748,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker) >> ereport(DEBUG1, >> (errmsg("registering background worker \"%s\"", worker->bgw_name))); >> >> - if (!process_shared_preload_libraries_in_progress) >> + if (!process_shared_preload_libraries_in_progress && >> + !internal_bgworker_registration_in_progress) >> { >> if (!IsUnderPostmaster) >> ereport(LOG, > > Ugh. > > > > >> /* >> + * Register internal background workers. >> + * >> + * This is here mainly because the permanent bgworkers are normally allowed >> + * to be registered only when share preload libraries are loaded which does >> + * not work for the internal ones. >> + */ >> +static void >> +register_internal_bgworkers(void) >> +{ >> + internal_bgworker_registration_in_progress = true; >> + >> + /* Register the logical replication worker launcher if appropriate. */ >> + if (!IsBinaryUpgrade && max_logical_replication_workers > 0) >> + { >> + BackgroundWorker bgw; >> + >> + bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | >> + BGWORKER_BACKEND_DATABASE_CONNECTION; >> + bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; >> + bgw.bgw_main = ApplyLauncherMain; >> + snprintf(bgw.bgw_name, BGW_MAXLEN, >> + "logical replication launcher"); >> + bgw.bgw_restart_time = 5; >> + bgw.bgw_notify_pid = 0; >> + bgw.bgw_main_arg = (Datum) 0; >> + >> + RegisterBackgroundWorker(&bgw); >> + } >> + >> + internal_bgworker_registration_in_progress = false; >> +} > > Who says these flags are right for everyone? If we indeed want to go > through bgworkers here, I think you'll have to generallize this a bit, > so we don't check for max_logical_replication_workers and such here. We > could e.g. have the shared memory sizing hooks set up a chain of > registrations. > It could be more generalized, I agree, this is more of a WIP hack. 
I would like to make a special version of RegisterBackgroundWorker called something like RegisterInternalBackgroundWorker that does something similar to the above function (obviously the if should be moved to the caller of that function). The main point here is to be able to register a static worker without an extension. > > >> -static void >> +static char * >> libpqrcv_identify_system(WalReceiverConnHandle *handle, >> - TimeLineID *primary_tli) >> + TimeLineID *primary_tli, >> + char **dbname) >> { >> + char *sysid; >> PGresult *res; >> - char *primary_sysid; >> - char standby_sysid[32]; >> >> /* >> * Get the system identifier and timeline ID as a DataRow message from the >> @@ -231,24 +234,19 @@ libpqrcv_identify_system(WalReceiverConnHandle *handle, >> errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields.", >> ntuples, nfields, 3, 1))); >> } >> - primary_sysid = PQgetvalue(res, 0, 0); >> + sysid = pstrdup(PQgetvalue(res, 0, 0)); >> *primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0); >> - >> - /* >> - * Confirm that the system identifier of the primary is the same as ours.
>> - */ >> - snprintf(standby_sysid, sizeof(standby_sysid), UINT64_FORMAT, >> - GetSystemIdentifier()); >> - if (strcmp(primary_sysid, standby_sysid) != 0) >> + if (dbname) >> { >> - primary_sysid = pstrdup(primary_sysid); >> - PQclear(res); >> - ereport(ERROR, >> - (errmsg("database system identifier differs between the primary and standby"), >> - errdetail("The primary's identifier is %s, the standby's identifier is %s.", >> - primary_sysid, standby_sysid))); >> + if (PQgetisnull(res, 0, 3)) >> + *dbname = NULL; >> + else >> + *dbname = pstrdup(PQgetvalue(res, 0, 3)); >> } >> + >> PQclear(res); >> + >> + return sysid; >> } >> >> /* >> @@ -274,7 +272,7 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname, >> >> if (PQresultStatus(res) != PGRES_TUPLES_OK) >> { >> - elog(FATAL, "could not crate replication slot \"%s\": %s\n", >> + elog(ERROR, "could not crate replication slot \"%s\": %s\n", >> slotname, PQerrorMessage(handle->streamConn)); >> } >> >> @@ -287,6 +285,28 @@ libpqrcv_create_slot(WalReceiverConnHandle *handle, char *slotname, >> return snapshot; >> } >> >> +/* >> + * Drop replication slot. >> + */ >> +static void >> +libpqrcv_drop_slot(WalReceiverConnHandle *handle, char *slotname) >> +{ >> + PGresult *res; >> + char cmd[256]; >> + >> + snprintf(cmd, sizeof(cmd), >> + "DROP_REPLICATION_SLOT \"%s\"", slotname); >> + >> + res = libpqrcv_PQexec(handle, cmd); >> + >> + if (PQresultStatus(res) != PGRES_COMMAND_OK) >> + { >> + elog(ERROR, "could not drop replication slot \"%s\": %s\n", >> + slotname, PQerrorMessage(handle->streamConn)); >> + } >> + >> + PQclear(res); >> +} > > > Given that the earlier commit to libpqwalreciever added a lot of this > information, it doesn't seem right to change it again here. > Why? It's pretty unrelated to the previous change which is basically just refactoring, this actually adds new functionality. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 14/09/16 20:50, Andres Freund wrote: > On 2016-09-14 13:20:02 -0500, Peter Eisentraut wrote: >> On 9/14/16 11:21 AM, Andres Freund wrote: >>>> + ExecInsert(NULL, /* mtstate is only used for onconflict handling which we don't support atm */ >>>>> + remoteslot, >>>>> + remoteslot, >>>>> + NIL, >>>>> + ONCONFLICT_NONE, >>>>> + estate, >>>>> + false); >>> I have *severe* doubts about just using the (newly) exposed functions >>> 1:1 here. >> >> It is a valid concern, but what is the alternative? ExecInsert() and >> the others appear to do exactly the right things that are required. > > They're actually a lot more heavyweight than what's required. If you > e.g. do a large COPY on the source side, we create a single executor > state (if at all), and then insert the rows using lower level > routines. And that's *vastly* faster than going through all the setup > costs here for each row. > > >> Are your concerns mainly philosophical about calling into internal >> executor code, or do you have technical concerns that this will not do >> the right thing in some cases? > > Well, not about it being wrong in the sense of returning wrong results, > but wrong in the sense of not even remotely being able to keep up in > common cases. > I'd say in the common case they will. I don't plan to use these forever, btw, but IMHO it's simplest to just use them in v1 instead of trying to reinvent versions of these that perform better while still behaving correctly (in terms of triggers and such, for example). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2016-09-14 21:17:42 +0200, Petr Jelinek wrote: > > > +/* > > > + * Gather Relations based o provided by RangeVar list. > > > + * The gathered tables are locked in access share lock mode. > > > + */ > > > > Why access share? Shouldn't we make this ShareUpdateExclusive or > > similar, to prevent schema changes? > > > > Hm, I thought AccessShare would be enough to prevent schema changes that > matter to us (which is basically just drop afaik). Doesn't e.g. dropping an index matter as well? > > > + if (strcmp($1, "replicate_insert") == 0) > > > + $$ = makeDefElem("replicate_insert", > > > + (Node *)makeInteger(TRUE), @1); > > > + else if (strcmp($1, "noreplicate_insert") == 0) > > > + $$ = makeDefElem("replicate_insert", > > > + (Node *)makeInteger(FALSE), @1); > > > + else if (strcmp($1, "replicate_update") == 0) > > > + $$ = makeDefElem("replicate_update", > > > + (Node *)makeInteger(TRUE), @1); > > > + else if (strcmp($1, "noreplicate_update") == 0) > > > + $$ = makeDefElem("replicate_update", > > > + (Node *)makeInteger(FALSE), @1); > > > + else if (strcmp($1, "replicate_delete") == 0) > > > + $$ = makeDefElem("replicate_delete", > > > + (Node *)makeInteger(TRUE), @1); > > > + else if (strcmp($1, "noreplicate_delete") == 0) > > > + $$ = makeDefElem("replicate_delete", > > > + (Node *)makeInteger(FALSE), @1); > > > + else > > > + ereport(ERROR, > > > + (errcode(ERRCODE_SYNTAX_ERROR), > > > + errmsg("unrecognized publication option \"%s\"", $1), > > > + parser_errposition(@1))); > > > + } > > > + ; > > > > I'm kind of inclined to do this checking at execution (or transform) > > time instead. That allows extension to add options, and handle them in > > utility hooks. > > > > That's an interesting point, I prefer the parsing to be done in gram.y, but it > might be worth moving it for extensibility. Although there are so far other > barriers for that. Citus uses the lack of such a check for COPY to implement copy over its distributed tables for example.
So there's some benefit. > > > + check_subscription_permissions(); > > > + > > > + rel = heap_open(SubscriptionRelationId, RowExclusiveLock); > > > + > > > + /* Check if name is used */ > > > + subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId, > > > + CStringGetDatum(stmt->subname)); > > > + if (OidIsValid(subid)) > > > + { > > > + ereport(ERROR, > > > + (errcode(ERRCODE_DUPLICATE_OBJECT), > > > + errmsg("subscription \"%s\" already exists", > > > + stmt->subname))); > > > + } > > > + > > > + /* Parse and check options. */ > > > + parse_subscription_options(stmt->options, &enabled_given, &enabled, > > > + &conninfo, &publications); > > > + > > > + /* TODO: improve error messages here. */ > > > + if (conninfo == NULL) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_SYNTAX_ERROR), > > > + errmsg("connection not specified"))); > > > > Probably also makes sense to parse the conninfo here to verify it looks > > saen. Although that's fairly annoying to do, because the relevant code > > is libpq :( > > > > Well the connection is eventually used (in later patches) so maybe that's > not problem. Well, it's nicer if it's immediately parsed, before doing complex and expensive stuff, especially if that happens outside of the transaction. > > > > > diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c > > > index 65230e2..f3d54c8 100644 > > > --- a/src/backend/nodes/copyfuncs.c > > > +++ b/src/backend/nodes/copyfuncs.c > > > > I think you might be missing outfuncs support. > > > > I thought that we don't do outfuncs for DDL? I think it's just readfuncs that's skipped. > > > + Length of column name (including the NULL-termination > > > + character). > > > +</para> > > > +</listitem> > > > +</varlistentry> > > > +<varlistentry> > > > +<term> > > > + String > > > +</term> > > > +<listitem> > > > +<para> > > > + Name of the column. > > > +</para> > > > +</listitem> > > > +</varlistentry> > > > > Huh, no type information? 
> > > > It's not necessary for the text transfer, it will be if we ever add binary > data transfer but that will require protocol version bump anyway. I'm *hugely* unconvinced of this. For one type information is useful for error reporting and such as well. For another, it's one thing to add a new protocol message (for differently encoded tuples), and something entirely different to change the format of existing messages. > > > + > > > +/* > > > + * COMMIT callback > > > + */ > > > +static void > > > +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > > > + XLogRecPtr commit_lsn) > > > +{ > > > + OutputPluginPrepareWrite(ctx, true); > > > + logicalrep_write_commit(ctx->out, txn, commit_lsn); > > > + OutputPluginWrite(ctx, true); > > > +} > > > > Hm, so we don't reset the context for these... > > > > What? We only use & reset the data-> memory context in the change callback. I'm not sure that's good. > > This however I'm not following. Why do we need multiple copies of this? > > And why aren't we doing the assignments in _PG_init? Seems better to > > just allocate one WalRcvCalllbacks globally and assign all these as > > constants. Then the establishment function can just return all these > > (as part of a bigger struct). > > > > Meh, If I understand you correctly that will make the access bit more ugly > (multiple layers of structs). On the other hand, you right now need to access one struct, and pass the other... > > This does rather reinforce my opinion that the _PG_init removal in > > libpqwalreceiver isn't useful. > > I don't see how it helps, you said we'd still return struct from some > interface so this would be more or less the same? Or we just set some global vars and use them directly. Andres
On 14/09/16 18:21, Andres Freund wrote: > (continuing, uh, a bit happier) > > On 2016-09-09 00:59:26 +0200, Petr Jelinek wrote: > >> +/* >> + * Relcache invalidation callback for our relation map cache. >> + */ >> +static void >> +logicalreprelmap_invalidate_cb(Datum arg, Oid reloid) >> +{ >> + LogicalRepRelMapEntry *entry; >> + >> + /* Just to be sure. */ >> + if (LogicalRepRelMap == NULL) >> + return; >> + >> + if (reloid != InvalidOid) >> + { >> + HASH_SEQ_STATUS status; >> + >> + hash_seq_init(&status, LogicalRepRelMap); >> + >> + /* TODO, use inverse lookup hastable? */ > > *hashtable > >> + while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL) >> + { >> + if (entry->reloid == reloid) >> + entry->reloid = InvalidOid; > > can't we break here? > Probably. > >> +/* >> + * Initialize the relation map cache. >> + */ >> +static void >> +remoterelmap_init(void) >> +{ >> + HASHCTL ctl; >> + >> + /* Make sure we've initialized CacheMemoryContext. */ >> + if (CacheMemoryContext == NULL) >> + CreateCacheMemoryContext(); >> + >> + /* Initialize the hash table. */ >> + MemSet(&ctl, 0, sizeof(ctl)); >> + ctl.keysize = sizeof(uint32); >> + ctl.entrysize = sizeof(LogicalRepRelMapEntry); >> + ctl.hcxt = CacheMemoryContext; > > Wonder if this (and similar code earlier) should try to do everything in > a sub-context of CacheMemoryContext instead. That'd make some issues > easier to track down. Sure. don't see why not. > >> +/* >> + * Open the local relation associated with the remote one. >> + */ >> +static LogicalRepRelMapEntry * >> +logicalreprel_open(uint32 remoteid, LOCKMODE lockmode) >> +{ >> + LogicalRepRelMapEntry *entry; >> + bool found; >> + >> + if (LogicalRepRelMap == NULL) >> + remoterelmap_init(); >> + >> + /* Search for existing entry. 
*/ >> + entry = hash_search(LogicalRepRelMap, (void *) &remoteid, >> + HASH_FIND, &found); >> + >> + if (!found) >> + elog(FATAL, "cache lookup failed for remote relation %u", >> + remoteid); >> + >> + /* Need to update the local cache? */ >> + if (!OidIsValid(entry->reloid)) >> + { >> + Oid nspid; >> + Oid relid; >> + int i; >> + TupleDesc desc; >> + LogicalRepRelation *remoterel; >> + >> + remoterel = &entry->remoterel; >> + >> + nspid = LookupExplicitNamespace(remoterel->nspname, false); >> + if (!OidIsValid(nspid)) >> + ereport(FATAL, >> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> + errmsg("the logical replication target %s not found", >> + quote_qualified_identifier(remoterel->nspname, > remoterel->relname)))); >> + relid = get_relname_relid(remoterel->relname, nspid); >> + if (!OidIsValid(relid)) >> + ereport(FATAL, >> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> + errmsg("the logical replication target %s not found", >> + quote_qualified_identifier(remoterel->nspname, >> + remoterel->relname)))); >> + >> + entry->rel = heap_open(relid, lockmode); > > This seems rather racy. I think this really instead needs something akin > to RangeVarGetRelidExtended(). Maybe, I am not sure if it really matters here given how it's used, but I can change that. > >> +/* >> + * Executor state preparation for evaluation of constraint expressions, >> + * indexes and triggers. 
>> + * >> + * This is based on similar code in copy.c >> + */ >> +static EState * >> +create_estate_for_relation(LogicalRepRelMapEntry *rel) >> +{ >> + EState *estate; >> + ResultRelInfo *resultRelInfo; >> + RangeTblEntry *rte; >> + >> + estate = CreateExecutorState(); >> + >> + rte = makeNode(RangeTblEntry); >> + rte->rtekind = RTE_RELATION; >> + rte->relid = RelationGetRelid(rel->rel); >> + rte->relkind = rel->rel->rd_rel->relkind; >> + estate->es_range_table = list_make1(rte); >> + >> + resultRelInfo = makeNode(ResultRelInfo); >> + InitResultRelInfo(resultRelInfo, rel->rel, 1, 0); >> + >> + estate->es_result_relations = resultRelInfo; >> + estate->es_num_result_relations = 1; >> + estate->es_result_relation_info = resultRelInfo; >> + >> + /* Triggers might need a slot */ >> + if (resultRelInfo->ri_TrigDesc) >> + estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); >> + >> + return estate; >> +} > > Ugh, we do this for every single change? That's pretty darn heavy. > I plan to add caching but didn't come up with good way of doing that yet. > >> +/* >> + * Check if the local attribute is present in relation definition used >> + * by upstream and hence updated by the replication. >> + */ >> +static bool >> +physatt_in_attmap(LogicalRepRelMapEntry *rel, int attid) >> +{ >> + AttrNumber i; >> + >> + /* Fast path for tables that are same on upstream and downstream. */ >> + if (attid < rel->remoterel.natts && rel->attmap[attid] == attid) >> + return true; >> + >> + /* Try to find the attribute in the map. */ >> + for (i = 0; i < rel->remoterel.natts; i++) >> + if (rel->attmap[i] == attid) >> + return true; >> + >> + return false; >> +} > > Shouldn't we rather try to keep an attribute map that always can map > remote attribute numbers to local ones? That doesn't seem hard on a > first blush? But I might be missing something here. 
> > >> +static void >> +FillSlotDefaults(LogicalRepRelMapEntry *rel, EState *estate, >> + TupleTableSlot *slot) >> +{ > > Why is this using a different naming scheme? > Because I originally wanted to put it into executor. >> +/* >> + * Handle INSERT message. >> + */ >> +static void >> +handle_insert(StringInfo s) >> +{ >> + LogicalRepRelMapEntry *rel; >> + LogicalRepTupleData newtup; >> + LogicalRepRelId relid; >> + EState *estate; >> + TupleTableSlot *remoteslot; >> + MemoryContext oldctx; >> + >> + ensure_transaction(); >> + >> + relid = logicalrep_read_insert(s, &newtup); >> + rel = logicalreprel_open(relid, RowExclusiveLock); >> + >> + /* Initialize the executor state. */ >> + estate = create_estate_for_relation(rel); >> + remoteslot = ExecInitExtraTupleSlot(estate); >> + ExecSetSlotDescriptor(remoteslot, RelationGetDescr(rel->rel)); > > This seems incredibly expensive for replicating a lot of rows. You mean because of create_estate_for_relation()? > >> +/* >> + * Search the relation 'rel' for tuple using the replication index. >> + * >> + * If a matching tuple is found lock it with lockmode, fill the slot with its >> + * contents and return true, return false is returned otherwise. >> + */ >> +static bool >> +tuple_find_by_replidx(Relation rel, LockTupleMode lockmode, >> + TupleTableSlot *searchslot, TupleTableSlot *slot) >> +{ >> + HeapTuple scantuple; >> + ScanKeyData skey[INDEX_MAX_KEYS]; >> + IndexScanDesc scan; >> + SnapshotData snap; >> + TransactionId xwait; >> + Oid idxoid; >> + Relation idxrel; >> + bool found; >> + >> + /* Open REPLICA IDENTITY index.*/ >> + idxoid = RelationGetReplicaIndex(rel); >> + if (!OidIsValid(idxoid)) >> + { >> + elog(ERROR, "could not find configured replica identity for table \"%s\"", >> + RelationGetRelationName(rel)); >> + return false; >> + } >> + idxrel = index_open(idxoid, RowExclusiveLock); >> + >> + /* Start an index scan. 
*/ >> + InitDirtySnapshot(snap); >> + scan = index_beginscan(rel, idxrel, &snap, >> + RelationGetNumberOfAttributes(idxrel), >> + 0); >> + >> + /* Build scan key. */ >> + build_replindex_scan_key(skey, rel, idxrel, searchslot); >> + >> +retry: >> + found = false; >> + >> + index_rescan(scan, skey, RelationGetNumberOfAttributes(idxrel), NULL, 0); >> + >> + /* Try to find the tuple */ >> + if ((scantuple = index_getnext(scan, ForwardScanDirection)) != NULL) >> + { >> + found = true; >> + ExecStoreTuple(scantuple, slot, InvalidBuffer, false); >> + ExecMaterializeSlot(slot); >> + >> + xwait = TransactionIdIsValid(snap.xmin) ? >> + snap.xmin : snap.xmax; >> + >> + /* >> + * If the tuple is locked, wait for locking transaction to finish >> + * and retry. >> + */ >> + if (TransactionIdIsValid(xwait)) >> + { >> + XactLockTableWait(xwait, NULL, NULL, XLTW_None); >> + goto retry; >> + } >> + } > > Hm. So we potentially find multiple tuples here, and lock all of > them. but then only use one for the update. > That's not how that code reads for me. > >> +static List * >> +get_subscription_list(void) >> +{ >> + List *res = NIL; >> + Relation rel; >> + HeapScanDesc scan; >> + HeapTuple tup; >> + MemoryContext resultcxt; >> + >> + /* This is the context that we will allocate our output data in */ >> + resultcxt = CurrentMemoryContext; >> + >> + /* >> + * Start a transaction so we can access pg_database, and get a snapshot. >> + * We don't have a use for the snapshot itself, but we're interested in >> + * the secondary effect that it sets RecentGlobalXmin. (This is critical >> + * for anything that reads heap pages, because HOT may decide to prune >> + * them even if the process doesn't attempt to modify any tuples.) 
>> + */ > >> + StartTransactionCommand(); >> + (void) GetTransactionSnapshot(); >> + >> + rel = heap_open(SubscriptionRelationId, AccessShareLock); >> + scan = heap_beginscan_catalog(rel, 0, NULL); >> + >> + while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection))) >> + { >> + Form_pg_subscription subform = (Form_pg_subscription) GETSTRUCT(tup); >> + Subscription *sub; >> + MemoryContext oldcxt; >> + >> + /* >> + * Allocate our results in the caller's context, not the >> + * transaction's. We do this inside the loop, and restore the original >> + * context at the end, so that leaky things like heap_getnext() are >> + * not called in a potentially long-lived context. >> + */ >> + oldcxt = MemoryContextSwitchTo(resultcxt); >> + >> + sub = (Subscription *) palloc(sizeof(Subscription)); >> + sub->oid = HeapTupleGetOid(tup); >> + sub->dbid = subform->subdbid; >> + sub->enabled = subform->subenabled; >> + >> + /* We don't fill fields we are not intereste in. */ >> + sub->name = NULL; >> + sub->conninfo = NULL; >> + sub->slotname = NULL; >> + sub->publications = NIL; >> + >> + res = lappend(res, sub); >> + MemoryContextSwitchTo(oldcxt); >> + } >> + >> + heap_endscan(scan); >> + heap_close(rel, AccessShareLock); >> + >> + CommitTransactionCommand(); > > Hm. this doesn't seem quite right from a locking pov. What if, in the > middle of this, a new subscription is created? > So it will be called again eventually in the next iteration of the main loop. We don't need a perfectly stable world view here, just a snapshot of it to work with. > > Hadn't I previously read about always streaming data to disk first?
> >> @@ -0,0 +1,674 @@ >> +/*------------------------------------------------------------------------- >> + * tablesync.c >> + * PostgreSQL logical replication >> + * >> + * Copyright (c) 2012-2016, PostgreSQL Global Development Group >> + * >> + * IDENTIFICATION >> + * src/backend/replication/logical/tablesync.c >> + * >> + * NOTES >> + * This file contains code for initial table data synchronization for >> + * logical replication. >> + * >> + * The initial data synchronization is done separately for each table, >> + * in separate apply worker that only fetches the initial snapshot data >> + * from the provider and then synchronizes the position in stream with >> + * the main apply worker. > > Why? I guess that's because it allows to incrementally add tables, with > acceptable overhead. > Yes I need to document why's more here. It enables us to copy multiple tables in parallel (in the future). It also is needed for adding tables after the initial sync as you say. > >> + * The stream position synchronization works in multiple steps. >> + * - sync finishes copy and sets table state as SYNCWAIT and waits >> + * for state to change in a loop >> + * - apply periodically checks unsynced tables for SYNCWAIT, when it >> + * appears it will compare its position in the stream with the >> + * SYNCWAIT position and decides to either set it to CATCHUP when >> + * the apply was infront (and wait for the sync to do the catchup), >> + * or set the state to SYNCDONE if the sync was infront or in case >> + * both sync and apply are at the same position it will set it to >> + * READY and stops tracking it > > I'm not quite following here. > It's hard for me to explain I guess, that's why the flow diagram is underneath. The point is to reach same LSN for the table before the main apply process can take over the replication of that table. 
There are 2 possible scenarios: a) apply has replayed more of the stream than sync did, and then sync needs to ask apply to wait for it a bit (which blocks replication for a short while); b) sync has replayed more of the stream than apply, and then apply needs to track the table for a while (and not apply changes to it) until it reaches the same position where sync stopped; once it reaches that point it can just apply changes to it the same as to any old table >> + * - if the state was set to CATCHUP sync will read the stream and >> + * apply changes until it catches up to the specified stream >> + * position and then sets state to READY and signals apply that it >> + * can stop waiting and exits, if the state was set to something >> + * else than CATCHUP the sync process will simply end >> + * - if the state was set to SYNCDONE by apply, the apply will >> + * continue tracking the table until it reaches the SYNCDONE stream >> + * position at which point it sets state to READY and stops tracking >> + * >> + * Example flows look like this: >> + * - Apply is infront: >> + * sync:8 -> set SYNCWAIT >> + * apply:10 -> set CATCHUP >> + * sync:10 -> set ready >> + * exit >> + * apply:10 >> + * stop tracking >> + * continue rep >> + * - Sync infront: >> + * sync:10 >> + * set SYNCWAIT >> + * apply:8 >> + * set SYNCDONE >> + * sync:10 >> + * exit >> + * apply:10 >> + * set READY >> + * stop tracking >> + * continue rep > > This definitely needs to be expanded a bit. Where are we tracking how > far replication has progressed on individual tables? Are we creating new > slots for syncing? Is there any parallelism in syncing? > Yes, new slots, tracking is in pg_subscription_rel; parallelism is not there yet, but the design is ready for expanding it (I currently artificially limit the number of sync workers to one to limit potential bugs, but afaik it could just be bumped to more and it should work). >> +/* >> + * Exit routine for synchronization worker.
>> + */
>> +static void
>> +finish_sync_worker(char *slotname)
>> +{
>> +    LogicalRepWorker *worker;
>> +    RepOriginId originid;
>> +    MemoryContext oldctx = CurrentMemoryContext;
>> +
>> +    /*
>> +     * Drop the replication slot on the remote server.
>> +     * We want to continue even in the case that the slot on the remote side
>> +     * is already gone. This means that we can leave a slot on the remote
>> +     * side, but that can happen for other reasons as well so we can't
>> +     * really protect against that.
>> +     */
>> +    PG_TRY();
>> +    {
>> +        wrcapi->drop_slot(wrchandle, slotname);
>> +    }
>> +    PG_CATCH();
>> +    {
>> +        MemoryContext ectx;
>> +        ErrorData  *edata;
>> +
>> +        ectx = MemoryContextSwitchTo(oldctx);
>> +        /* Save error info */
>> +        edata = CopyErrorData();
>> +        MemoryContextSwitchTo(ectx);
>> +        FlushErrorState();
>> +
>> +        ereport(WARNING,
>> +                (errmsg("there was a problem dropping the replication slot "
>> +                        "\"%s\" on the provider", slotname),
>> +                 errdetail("The error was: %s", edata->message),
>> +                 errhint("You may have to drop it manually.")));
>> +        FreeErrorData(edata);
>
> ISTM we really should rather return success/failure here, and not throw
> an error inside the libpqwalreceiver stuff. I kind of wonder if we
> actually can get rid of this indirection.
>

Yeah, I can do success/failure. Not sure what you mean by indirection.

>> +     * to ensure that we are not behind it (it's going to wait at this
>> +     * point for the change of state). Once we are in front of or at the same
>> +     * position as the synchronization process we can signal it to
>> +     * finish the catchup.
>> +     */
>> +    if (tstate->state == SUBREL_STATE_SYNCWAIT)
>> +    {
>> +        if (end_lsn > tstate->lsn)
>> +        {
>> +            /*
>> +             * Apply is in front, tell sync to catch up and wait until
>> +             * it does.
>> +             */
>> +            tstate->state = SUBREL_STATE_CATCHUP;
>> +            tstate->lsn = end_lsn;
>> +            StartTransactionCommand();
>> +            SetSubscriptionRelState(MyLogicalRepWorker->subid,
>> +                                    tstate->relid, tstate->state,
>> +                                    tstate->lsn);
>> +            CommitTransactionCommand();
>> +
>> +            /* Signal the worker as it may be waiting for us. */
>> +            LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>> +            worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
>> +                                            tstate->relid);
>> +            if (worker && worker->proc)
>> +                SetLatch(&worker->proc->procLatch);
>> +            LWLockRelease(LogicalRepWorkerLock);
>
> Different parts of this file use different lock level to set the
> latch. Why?
>

The latch does not need the lock, so I am not really following what you mean. But the lock here is for the benefit of logicalrep_worker_find.

>
>> +            if (wait_for_sync_status_change(tstate))
>> +                Assert(tstate->state == SUBREL_STATE_READY);
>> +        }
>> +        else
>> +        {
>> +            /*
>> +             * Apply is either behind, in which case the sync worker is done
>> +             * but apply needs to keep tracking the table until it
>> +             * catches up to where sync finished,
>> +             * or apply and sync are at the same position, in which case the
>> +             * table can be switched to standard replication mode
>> +             * immediately.
>> +             */
>> +            if (end_lsn < tstate->lsn)
>> +                tstate->state = SUBREL_STATE_SYNCDONE;
>> +            else
>> +                tstate->state = SUBREL_STATE_READY;
>> +
>
> What I'm failing to understand is how this can be done under
> concurrency. You probably thought about this, but it should really be
> explained somewhere.

Well, so, if the original state was syncdone (the previous branch), the apply won't actually do any work until the state changes (and it can only change to either syncdone or ready at that point), so there is no real concurrency. If we reach this branch, then either the sync worker already exited (if it set the state to syncdone) or it's not doing anything and is waiting for apply to set the state to ready, in which case there is also no concurrency.
>> +        /*
>> +         * In case the table is supposed to be synchronizing but the
>> +         * synchronization worker is not running, start it.
>> +         * Limit the number of launched workers here to one (for now).
>> +         */
>
> Hm. That seems problematic for online upgrade type cases, we might never
> catch up that way...
>

You mean the limit to 1? That's just because I didn't get to creating a GUC for configuring this.

>
>> +                /*
>> +                 * We want to do the table data sync in a single
>> +                 * transaction so do not close the transaction opened
>> +                 * above.
>> +                 * There will be no BEGIN or COMMIT messages coming via
>> +                 * logical replication while the copy table command is
>> +                 * running so start the transaction here.
>> +                 * Note the memory context for data handling will still
>> +                 * be done using ensure_transaction called by the insert
>> +                 * handler.
>> +                 */
>> +                StartTransactionCommand();
>> +
>> +                /*
>> +                 * Don't allow parallel access other than SELECT while
>> +                 * the initial contents are being copied.
>> +                 */
>> +                rel = heap_open(tstate.relid, ExclusiveLock);
>
> Why do we want to allow access at all?
>

I didn't see a reason to not allow SELECTs.

>
>> @@ -87,6 +92,8 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>>      cb->commit_cb = pgoutput_commit_txn;
>>      cb->filter_by_origin_cb = pgoutput_origin_filter;
>>      cb->shutdown_cb = pgoutput_shutdown;
>> +    cb->tuple_cb = pgoutput_tuple;
>> +    cb->list_tables_cb = pgoutput_list_tables;
>>  }
>
> What are these new, and undocumented callbacks actually doing? And why
> is this integrated into logical decoding?
>

In the initial email I was saying that I am not very happy with this design; that's still true, because they don't belong to decoding.

>
>> /*
>> + * Handle LIST_TABLES command.
>> + */
>> +static void
>> +SendTableList(ListTablesCmd *cmd)
>> +{
>
> Ugh.
>
> I really dislike this kind of command. I think we should instead change
> things around, allowing to issue normal SQL via the replication
> command.
> We'll have to error out for running SQL for non-database-
> connected replication connections, but that seems fine.
>

Note: per discussion offline we agreed to do this stuff over a normal connection for now.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 14/09/16 21:53, Andres Freund wrote:
> Hi,
>
> On 2016-09-14 21:17:42 +0200, Petr Jelinek wrote:
>>>> +/*
>>>> + * Gather Relations based on the provided RangeVar list.
>>>> + * The gathered tables are locked in access share lock mode.
>>>> + */
>>>
>>> Why access share? Shouldn't we make this ShareUpdateExclusive or
>>> similar, to prevent schema changes?
>>>
>>
>> Hm, I thought AccessShare would be enough to prevent schema changes that
>> matter to us (which is basically just drop, AFAIK).
>
> Doesn't e.g. dropping an index matter as well?
>

Drop of a primary key matters, I guess.

>
>>>> +                    if (strcmp($1, "replicate_insert") == 0)
>>>> +                        $$ = makeDefElem("replicate_insert",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_insert") == 0)
>>>> +                        $$ = makeDefElem("replicate_insert",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else if (strcmp($1, "replicate_update") == 0)
>>>> +                        $$ = makeDefElem("replicate_update",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_update") == 0)
>>>> +                        $$ = makeDefElem("replicate_update",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else if (strcmp($1, "replicate_delete") == 0)
>>>> +                        $$ = makeDefElem("replicate_delete",
>>>> +                                         (Node *)makeInteger(TRUE), @1);
>>>> +                    else if (strcmp($1, "noreplicate_delete") == 0)
>>>> +                        $$ = makeDefElem("replicate_delete",
>>>> +                                         (Node *)makeInteger(FALSE), @1);
>>>> +                    else
>>>> +                        ereport(ERROR,
>>>> +                                (errcode(ERRCODE_SYNTAX_ERROR),
>>>> +                                 errmsg("unrecognized publication option \"%s\"", $1),
>>>> +                                 parser_errposition(@1)));
>>>> +                }
>>>> +            ;
>>>
>>> I'm kind of inclined to do this checking at execution (or transform)
>>> time instead. That allows extensions to add options, and handle them in
>>> utility hooks.
>>>
>>
>> That's an interesting point. I prefer the parsing to be done in gram.y, but it
>> might be worth moving it for extensibility. Although there are so far other
>> barriers to that.
>
> Citus uses the lack of such a check for COPY to implement copy over its
> distributed tables, for example. So there's some benefit.
>

Yeah, I am not saying that I am fundamentally against it; I am just saying it probably won't help all that much.

>
>>>> +    check_subscription_permissions();
>>>> +
>>>> +    rel = heap_open(SubscriptionRelationId, RowExclusiveLock);
>>>> +
>>>> +    /* Check if name is used */
>>>> +    subid = GetSysCacheOid2(SUBSCRIPTIONNAME, MyDatabaseId,
>>>> +                            CStringGetDatum(stmt->subname));
>>>> +    if (OidIsValid(subid))
>>>> +    {
>>>> +        ereport(ERROR,
>>>> +                (errcode(ERRCODE_DUPLICATE_OBJECT),
>>>> +                 errmsg("subscription \"%s\" already exists",
>>>> +                        stmt->subname)));
>>>> +    }
>>>> +
>>>> +    /* Parse and check options. */
>>>> +    parse_subscription_options(stmt->options, &enabled_given, &enabled,
>>>> +                               &conninfo, &publications);
>>>> +
>>>> +    /* TODO: improve error messages here. */
>>>> +    if (conninfo == NULL)
>>>> +        ereport(ERROR,
>>>> +                (errcode(ERRCODE_SYNTAX_ERROR),
>>>> +                 errmsg("connection not specified")));
>>>
>>> Probably also makes sense to parse the conninfo here to verify it looks
>>> sane. Although that's fairly annoying to do, because the relevant code
>>> is libpq :(
>>>
>>
>> Well, the connection is eventually used (in later patches), so maybe that's
>> not a problem.
>
> Well, it's nicer if it's immediately parsed, before doing complex and
> expensive stuff, especially if that happens outside of the transaction.
>

Maybe; it's not too hard to add another function to libpqwalreceiver, I guess.

>
>>>
>>>> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
>>>> index 65230e2..f3d54c8 100644
>>>> --- a/src/backend/nodes/copyfuncs.c
>>>> +++ b/src/backend/nodes/copyfuncs.c
>>>
>>> I think you might be missing outfuncs support.
>>>
>>
>> I thought that we don't do outfuncs for DDL?
>
> I think it's just readfuncs that's skipped.
>

I see only a couple of odd DDL commands in outfuncs.c.
>
>>>> +     Length of column name (including the NULL-termination
>>>> +     character).
>>>> +</para>
>>>> +</listitem>
>>>> +</varlistentry>
>>>> +<varlistentry>
>>>> +<term>
>>>> +     String
>>>> +</term>
>>>> +<listitem>
>>>> +<para>
>>>> +     Name of the column.
>>>> +</para>
>>>> +</listitem>
>>>> +</varlistentry>
>>>
>>> Huh, no type information?
>>>
>>
>> It's not necessary for the text transfer; it will be if we ever add binary
>> data transfer, but that will require a protocol version bump anyway.
>
> I'm *hugely* unconvinced of this. For one, type information is useful for
> error reporting and such as well. For another, it's one thing to add a
> new protocol message (for differently encoded tuples), and something
> entirely different to change the format of existing messages.
>

Well, it's one if on the write side and one if on the read side in this case, but I can add it; it's a rather simple change. One thing that we need to clarify is how we actually send the type info. I think for built-in types the Oid should be enough, but for all other ones we need the qualified name of the type IMHO.

>
>>>> +
>>>> +/*
>>>> + * COMMIT callback
>>>> + */
>>>> +static void
>>>> +pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>>>> +                    XLogRecPtr commit_lsn)
>>>> +{
>>>> +    OutputPluginPrepareWrite(ctx, true);
>>>> +    logicalrep_write_commit(ctx->out, txn, commit_lsn);
>>>> +    OutputPluginWrite(ctx, true);
>>>> +}
>>>
>>> Hm, so we don't reset the context for these...
>>>
>>
>> What?
>
> We only use & reset the data-> memory context in the change
> callback. I'm not sure that's good.
>

Well, we don't do anything with the data memory context here.

>
>
>>> This however I'm not following. Why do we need multiple copies of this?
>>> And why aren't we doing the assignments in _PG_init? Seems better to
>>> just allocate one WalRcvCallbacks globally and assign all these as
>>> constants. Then the establishment function can just return all these
>>> (as part of a bigger struct).
>>>
>>
>> Meh, if I understand you correctly that will make the access a bit more ugly
>> (multiple layers of structs).
>
> On the other hand, you right now need to access one struct, and pass the
> other...
>

Point taken.

>
>
>>> This does rather reinforce my opinion that the _PG_init removal in
>>> libpqwalreceiver isn't useful.
>>
>> I don't see how it helps; you said we'd still return a struct from some
>> interface, so this would be more or less the same?
>
> Or we just set some global vars and use them directly.
>

I really hate the "global vars filled by external library when loaded" as a design pattern; it's how it was done before, but it's ugly, especially when you share the library between multiple C modules later.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 14 September 2016 at 04:56, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Not sure what you mean by negotiation. Why would that be needed? You know
> the server version when you connect, and when you know that you also know
> what capabilities that version of Postgres has. If you send an unrecognized
> option you get a corresponding error.

Right, because we can rely on the server version = the logical replication version now. All good.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> - the CREATE SUBSCRIPTION also tries to check if the specified
>   connection connects back to same db (although that check is somewhat
>   imperfect) and if it gets stuck on create slot it should be normally
>   cancelable (that should solve the issue Steve Singer had)

When I create my subscriber database by doing a physical backup of the publisher cluster (with cp, before I add any data), then I am unable to subscribe, i.e.:

initdb ../data
cp -r ../data ../data2
./postgres -D ../data
./postgres -D ../data2

This makes sense when I look at your code, but it might not be what we want. I had the same issue when I created my subscriber cluster with pg_basebackup (the timeline on the destination cluster still shows as 1).
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> Hi,
>
> Updated version, this should address most of the things in Peter's
> reviews so far, not all though as some of it needs more discussion.
>

Another bug report. I had subscribed a subscriber database to a publication with 1 table:

create table a (a serial4 primary key, b text);

* I then dropped column b on the subscriber
* inserted some rows on the publisher
* noticed the expected error about column b not existing in the subscriber log
* added column c on the subscriber, then added column b after column c

I now get the following stack trace:

#1  0x00000000007dc8f9 in cstring_to_text (
    s=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>)
    at varlena.c:152
#2  0x00000000008046a3 in InputFunctionCall (
    flinfo=flinfo@entry=0x7fffa02d0250,
    str=str@entry=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>,
    typioparam=typioparam@entry=25, typmod=typmod@entry=-1) at fmgr.c:1909
#3  0x0000000000804971 in OidInputFunctionCall (functionId=<optimized out>,
    str=0x16f238af0 <error: Cannot access memory at address 0x16f238af0>,
    typioparam=25, typmod=-1) at fmgr.c:2040
#4  0x00000000006aa485 in SlotStoreCStrings (slot=0x2748670,
    values=0x7fffa02d0330) at apply.c:569
#5  0x00000000006ab45c in handle_insert (s=0x274d088) at apply.c:756
#6  0x00000000006abcea in handle_message (s=0x7fffa02d3e20) at apply.c:978
#7  LogicalRepApplyLoop (last_received=117457680) at apply.c:1146
#8  0x00000000006ac37e in ApplyWorkerMain (main_arg=<optimized out>) at apply.c:1530

In SlotStoreCStrings, values only has 2 elements but natts is 4.

> Changes:
> - I moved the publication.c to pg_publication.c, subscription.c to
>   pg_subscription.c.
> - changed \drp and \drs to \dRp and \dRs
> - fixed definitions of the catalogs (BKI_ROWTYPE_OID)
> - changed some GetPublication calls to get_publication_name
> - fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
> - fixed get_object_address_publication_rel
> - fixed the dependencies between pkeys and publications; for this I
>   actually had to add a new interface to dependency.c that allows dropping
>   a single dependency
> - fixed the 'for all tables' and 'for tables all in schema' publications
> - changed the alter publication from FOR to SET
> - added more test cases for the publication DDL
> - fixed compilation of the subscription patch alone and docs
> - changed subpublications to name[]
> - added a check for publication list duplicates
> - made the subscriptions behave more like they are inside the database
>   instead of a shared catalog (even though the catalog is still shared)
> - added options for CREATE SUBSCRIPTION to optionally not create the
>   slot and not do the initial data sync - that should solve the
>   complaint about CREATE SUBSCRIPTION always connecting
> - the CREATE SUBSCRIPTION also tries to check if the specified
>   connection connects back to the same db (although that check is somewhat
>   imperfect) and if it gets stuck on create slot it should be normally
>   cancelable (that should solve the issue Steve Singer had)
> - fixed the tests to work in any timezone
> - added DDL regress tests for subscription
> - added proper detection of missing schemas and tables on the subscriber
> - rebased on top of 19acee8 as the DefElem changes broke the patch
>
> The table sync is still far from ready.
>
>
>
On 9/18/16 4:17 PM, Steve Singer wrote:
> When I create my subscriber database by doing a physical backup of the
> publisher cluster (with cp before I add any data) then I am unable to
> subscribe.
> ie
> initdb ../data
> cp -r ../data ../data2
> ./postgres -D ../data
> ./postgres -D ../data2
>
> This makes sense when I look at your code, but it might not be what we want

I think if we want to prevent the creation of subscriptions that point to self, then we need to create a magic token when the postmaster starts and check for that when we connect. So more of a running-instance identifier instead of an instance-on-disk identifier.

The other option is that we just allow it and make it more robust.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 20 Sep 2016, Peter Eisentraut wrote:

> On 9/18/16 4:17 PM, Steve Singer wrote:
>
> I think if we want to prevent the creation of subscriptions that point
> to self, then we need to create a magic token when the postmaster starts
> and check for that when we connect. So more of a running-instance
> identifier instead of an instance-on-disk identifier.
>
> The other option is that we just allow it and make it more robust.

I think we should go with the second option for now. I feel that the effort is better spent making sure that initial syncs that don't complete (for whatever reason) can be aborted, instead of trying to build a concept of node identity before we really need it.

Steve

>
> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
On 21/09/16 05:35, Steve Singer wrote:
> On Tue, 20 Sep 2016, Peter Eisentraut wrote:
>
>> On 9/18/16 4:17 PM, Steve Singer wrote:
>>
>> I think if we want to prevent the creation of subscriptions that point
>> to self, then we need to create a magic token when the postmaster starts
>> and check for that when we connect. So more of a running-instance
>> identifier instead of an instance-on-disk identifier.
>>
>> The other option is that we just allow it and make it more robust.
>
> I think we should go with the second option for now. I feel that the
> effort is better spent making sure that initial syncs that don't
> complete (for whatever reason) can be aborted instead of trying to
> build a concept of node identity before we really need it.
>

Well, connecting to yourself will always hang, though, because the slot creation needs a snapshot and it will wait forever for the current query to finish. So it will never really work. The hanging query is now abortable, though. The question is whether doing the logical snapshot is really required, since we don't really use the snapshot for anything here.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Some partial notes on 0005-Add-logical-replication-workers.patch:

Documentation still says that TRUNCATE is supported.

In catalogs.sgml for pg_subscription column subpublications I'd add a note that those are publications that live on the remote server. Otherwise one might think by mistake that it references pg_publication.

The changes in reference.sgml should go into an earlier patch.

Document that table and column names are matched by name. (This seems obvious, but it's not explained anywhere, AFAICT.)

Document to what extent other relation types are supported (e.g., materialized views as source, view or foreign table or temp table as target). Suggest an updatable view as target if user wants to have different table names or write into a different table structure.

subscriptioncmds.c: In CreateSubscription(), the CommandCounterIncrement() call is apparently not needed.

subscriptioncmds.c: Duplicative code for libpqwalreceiver loading and init, should be refactored.

subscriptioncmds.c: Perhaps combine logicalrep_worker_find() and logicalrep_worker_stop() into one call that also encapsulates the required locking.

001_rep_changes.pl: The TAP protocol does not allow direct printing to stdout. (It needs to be prefixed with # or with spaces or something; I forget.) In this case, the print calls can just be removed, because the following is() calls in each case will print the failing value anyway.

In get_subscription_list(), the memory context pointers don't appear to do anything useful, because everything ends up being CurrentMemoryContext.

pg_stat_get_subscription(NULL) for "all" seems a bit of a weird interface.

pglogical_apply_main not used, should be removed.

In logicalreprel_open(), the error message "cache lookup failed for remote relation %u" could be clarified. This message could probably happen if the protocol did not send a Relation message first.
(The term "cache" is perhaps inappropriate for LogicalRepRelMap, because it implies that the value can be gotten from elsewhere if it's not in the cache. In this case it's really session state that cannot be recovered easily.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 21/09/16 15:04, Peter Eisentraut wrote:
> Some partial notes on 0005-Add-logical-replication-workers.patch:
>
> Document to what extent other relation types are supported (e.g.,
> materialized views as source, view or foreign table or temp table as
> target). Suggest an updatable view as target if user wants to have
> different table names or write into a different table structure.
>

I don't think that's a good suggestion; for one, it won't work for UPDATEs, as we have a completely different path for finding the tuple to update which only works on real data, not on a view. I am thinking of even just allowing table-to-table replication in v1, tbh, but yes, it should be documented what the target relation types can be.

>
> subscriptioncmds.c: Perhaps combine logicalrep_worker_find() and
> logicalrep_worker_stop() into one call that also encapsulates the
> required locking.

I was actually thinking of moving the wait loop that waits for the worker to finish there as well.

>
> In get_subscription_list(), the memory context pointers don't appear to
> do anything useful, because everything ends up being CurrentMemoryContext.
>

That's kind of the point of the memory context pointers there, though, as we start a transaction inside that function.

> pg_stat_get_subscription(NULL) for "all" seems a bit of a weird interface.
>

I modeled that after pg_stat_get_activity(), which seems to be a similar type of interface.

> pglogical_apply_main not used, should be removed.
>

Hah.

> In logicalreprel_open(), the error message "cache lookup failed for
> remote relation %u" could be clarified. This message could probably
> happen if the protocol did not send a Relation message first. (The term
> "cache" is perhaps inappropriate for LogicalRepRelMap, because it
> implies that the value can be gotten from elsewhere if it's not in the
> cache. In this case it's really session state that cannot be recovered
> easily.)
>

Yeah, I have different code and a different error for that now.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 9/23/16 9:28 PM, Petr Jelinek wrote: >> Document to what extent other relation types are supported (e.g., >> > materialized views as source, view or foreign table or temp table as >> > target). Suggest an updatable view as target if user wants to have >> > different table names or write into a different table structure. >> > > I don't think that's good suggestion, for one it won't work for UPDATEs > as we have completely different path for finding the tuple to update > which only works on real data, not on view. I am thinking of even just > allowing table to table replication in v1 tbh, but yes it should be > documented what target relation types can be. I'll generalize this then to: Determine which relation types should be supported at either end, document that, and then make sure it works that way. A restrictive implementation is OK for the first version, as long as it keeps options open. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 28, 2016 at 10:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/23/16 9:28 PM, Petr Jelinek wrote: >>> Document to what extent other relation types are supported (e.g., >>> > materialized views as source, view or foreign table or temp table as >>> > target). Suggest an updatable view as target if user wants to have >>> > different table names or write into a different table structure. >>> > >> I don't think that's good suggestion, for one it won't work for UPDATEs >> as we have completely different path for finding the tuple to update >> which only works on real data, not on view. I am thinking of even just >> allowing table to table replication in v1 tbh, but yes it should be >> documented what target relation types can be. > > I'll generalize this then to: Determine which relation types should be > supported at either end, document that, and then make sure it works that > way. A restrictive implementation is OK for the first version, as long > as it keeps options open. The newest patch is 3-week old, so marking this entry as returned with feedback. -- Michael
Hi,

attached is an updated version of the patch.

There are quite a few improvements and restructurings; I fixed all the bugs and basically everything that came up from the reviews and was agreed on. There are still a couple of things missing, i.e. the column type definition in the protocol and some things related to the existing data copy.

The biggest changes are:

I added one more prerequisite patch (the first one) which adds ephemeral slots (or, well, implements a UI on top of the code that was mostly already there). The ephemeral slots are different in that they go away either on error or when the session is closed. This means the initial data sync does not have to worry about cleaning up the slots after itself. I think this will be useful in other places as well (for example basebackup). I originally wanted to call them temporary slots in the UI, but since the behavior is a bit different from temp tables I decided to go with what the underlying code calls them in the UI as well.

I also split out the libpqwalreceiver rewrite into a separate patch which does just the re-architecture and does not really add new functionality. And I did the re-architecture a bit differently based on the review.

There is now a new executor module in execReplication.c: no new nodes, but several utility commands. I moved the tuple lookup functions there from apply and also wrote new interfaces for doing inserts/updates/deletes to a table, including index updates, constraint checks and trigger execution, but without the need for the whole nodeModifyTable handling. What I also did when rewriting this is an implementation of the tuple lookup using a sequential scan as well, so that we can support replica identity full properly. This greatly simplified the dependency handling between pkeys and publications (by removing it completely ;) ).
Also, when there is replica identity full and the table has a primary key, the code will use the primary key to look up the row even though it's not the replica identity index, so that users who want to combine logical replication with some other system that requires replica identity full (e.g. auditing) still get a usable experience.

The way the copy is done was heavily reworked. For one, it uses the ephemeral slots mentioned above. But more importantly, there are no new custom commands anymore. Instead, the walsender accepts some SQL; currently allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is implemented is probably not perfect and it could use a look from somebody who knows bison better. How it works is that if the command sent to the walsender starts with one of the above-mentioned keywords, the walsender parser passes the whole query back and it's then passed to exec_simple_query. The main reason why we need BEGIN is so that the COPY can use the snapshot exported by the slot creation, so that there is a synchronization point when there are concurrent writes. This probably needs more discussion.

I also tried to keep the naming more consistent, so I cleaned up all mentions of "provider" and changed them to "publisher", and publications also don't mention that they "replicate"; they just "publish" now (that has an effect on the DDL syntax as well).

Some things that were discussed in the reviews that I knowingly didn't implement include: removal of the Oid in pg_publication_rel, mainly because it would need significant changes to pg_dump, which assumes everything that's dumped has an Oid, and it's not something that seems worth it as part of this patch. I also didn't do the outfuncs; it's unclear to me what the rules are there, as the only DDL statement there is CreateStmt atm.

There are still a few TODOs:

Type info for columns.
My current best idea is to write typeOid and typemod in the relation message and add another message (type message) that describes the type which will skip the built-in types (as we can't really remap those without breaking a lot of software so they seem safe to skip). I plan to do this soonish barring objections. Removal of use of replication origin in the table sync worker. Parallelization of the initial copy. And ability to resync (do new copy) of a table. These two mainly wait for agreement over how the current way of doing copy should work. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-user-interface-for-EPHEMERAL-replication-slots.patch.gz
- 0002-Make-libpqwalreceiver-reentrant.patch.gz
- 0003-Add-PUBLICATION-catalogs-and-DDL.patch.gz
- 0004-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz
- 0005-Define-logical-replication-protocol-and-output-plugi.patch.gz
- 0006-Add-logical-replication-workers.patch.gz
- 0007-Logical-replication-support-for-initial-data-copy.patch.gz
On 10/24/2016 09:22 AM, Petr Jelinek wrote:
> Hi,
>
> attached is updated version of the patch.
>
> There are quite a few improvements and restructuring, I fixed all the
> bugs and basically everything that came up from the reviews and was
> agreed on. There are still couple of things missing, ie column type
> definition in protocol and some things related to existing data copy.

Here are a few things I've noticed so far.

+<programlisting>
+CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub;
+</programlisting>
+  </para>
+  <para>

The documentation above doesn't match the syntax: CONNECTION needs to be in single quotes, not double quotes. I think you want

+<programlisting>
+CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;
+</programlisting>
+  </para>
+  <para>

I am not sure if this is a known issue covered by your comments about the data copy, but I am still having issues with error reporting on a failed subscription. I created a subscription, dropped the subscription and created a second one. The second subscription isn't active but shows no errors.
P: create publication mypub for table public.a;
S: create subscription mysub with connection 'dbname=test host=localhost port=5440' publication mypub;
P: insert into a(b) values ('t');
S: select * FROM a;
 a | b
---+---
 1 | t
(1 row)

Everything is good. Then I do:

S: drop subscription mysub;
S: create subscription mysub2 with connection 'dbname=test host=localhost port=5440' publication mypub;
P: insert into a(b) values ('f');
S: select * FROM a;
 a | b
---+---
 1 | t

The data doesn't replicate.

select * FROM pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 16398 | mysub2  |     |       |              |                    |                       |                |
(1 row)

The only thing in my log is:

2016-10-30 15:27:27.038 EDT [6028] NOTICE:  dropped replication slot "mysub" on publisher
2016-10-30 15:27:36.072 EDT [6028] NOTICE:  created replication slot "mysub2" on publisher
2016-10-30 15:27:36.082 EDT [6028] NOTICE:  synchronized table states

I'd expect an error in the log or something. However, if I delete everything from the table on the subscriber, then the subscription proceeds.

I think there are still problems with signal handling in the initial sync. If I try to drop mysub2 (while the subscription is stuck, instead of deleting the data), the drop hangs. If I then try to kill the postmaster for the subscriber, nothing happens; I have to send it a -9 to make it go away. However, once I do that and then restart the postmaster for the subscriber, I start to see the duplicate key errors in the log:

2016-10-30 16:00:54.635 EDT [7018] ERROR:  duplicate key value violates unique constraint "a_pkey"
2016-10-30 16:00:54.635 EDT [7018] DETAIL:  Key (a)=(1) already exists.
2016-10-30 16:00:54.635 EDT [7018] CONTEXT:  COPY a, line 1
2016-10-30 16:00:54.637 EDT [7007] LOG:  worker process: logical replication worker 16400 sync 16387 (PID 7018) exited with exit code 1

I'm not sure why I didn't get those until I restarted the postmaster, but it seems to happen whenever I drop a subscription and then create a new one. Creating the second subscription from the same psql session as I create/drop the first seems important in reproducing this.

I am also having issues dropping a second subscription from the same psql session (table a is empty on both nodes to avoid duplicate key errors):

S: create subscription sub1 with connection 'host=localhost dbname=test port=5440' publication mypub;
S: create subscription sub2 with connection 'host=localhost dbname=test port=5440' publication mypub;
S: drop subscription sub1;
S: drop subscription sub2;

At this point the drop subscription hangs.

>
> The biggest changes are:
>
> I added one more prerequisite patch (the first one) which adds ephemeral
> slots (or well implements UI on top of the code that was mostly already
> there). The ephemeral slots are different in that they go away either on
> error or when session is closed. This means the initial data sync does
> not have to worry about cleaning up the slots after itself. I think this
> will be useful in other places as well (for example basebackup). I
> originally wanted to call them temporary slots in the UI but since the
> behavior is bit different from temp tables I decided to go with what the
> underlying code calls them in UI as well.
>
> I also split out the libpqwalreceiver rewrite to separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review.
>
> There is now new executor module in execReplication.c, no new nodes but
> several utility commands.
> I moved there the tuple lookup functions from apply and also wrote new
> interfaces for doing inserts/updates/deletes to a table including index
> updates and constraints checks and trigger execution, but without the
> need for the whole nodeModifyTable handling.
>
> What I also did when rewriting this is implementation of the tuple
> lookup also using sequential scan so that we can support replica
> identity full properly. This greatly simplified the dependency handling
> between pkeys and publications (by removing it completely ;) ). Also,
> when there is replica identity full and the table has a primary key, the
> code will use the primary key, even though it's not the replica identity
> index, to look up the row, so that users who want to combine logical
> replication with some other system that requires replica identity full
> (ie auditing) still get a usable experience.
>
> The way copy is done was heavily reworked. For one, it uses the ephemeral
> slots mentioned above. But more importantly, there are now no new custom
> commands anymore. Instead the walsender accepts some SQL; currently
> allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is
> implemented is probably not perfect and it could use a look from
> somebody who knows bison better. How it works is that if the command
> sent to walsender starts with one of the above mentioned keywords, the
> walsender parser passes the whole query back and it's then passed to
> exec_simple_query. The main reason why we need BEGIN is so that the COPY
> can use the snapshot exported by the slot creation, so that there is a
> synchronization point when there are concurrent writes. This probably
> needs more discussion.
>
> I also tried to keep the naming more consistent, so I cleaned up all
> mentions of "provider" and changed them to "publisher", and also
> publications don't mention that they "replicate"; they just "publish"
> now (that has effect on DDL syntax as well).
>
>
> Some things that were discussed in the reviews that I didn't implement
> knowingly include:
>
> Removal of the Oid in the pg_publication_rel, that's mainly because it
> would need significant changes to pg_dump which assumes everything
> that's dumped has Oid and it's not something that seems worth it as part
> of this patch.
>
> Also didn't do the outfuncs, it's unclear to me what are the rules there
> as the only DDL statement there is CreateStmt atm.
>
>
> There are still few TODOs:
>
> Type info for columns. My current best idea is to write typeOid and
> typemod in the relation message and add another message (type message)
> that describes the type which will skip the built-in types (as we can't
> really remap those without breaking a lot of software so they seem safe
> to skip). I plan to do this soonish barring objections.
>
> Removal of use of replication origin in the table sync worker.
>
> Parallelization of the initial copy. And ability to resync (do new copy)
> of a table. These two mainly wait for agreement over how the current way
> of doing copy should work.
>
>
On 31/10/16 00:52, Steve Singer wrote:
> On 10/24/2016 09:22 AM, Petr Jelinek wrote:
>> Hi,
>>
>> attached is updated version of the patch.
>>
>> There are quite a few improvements and restructuring, I fixed all the
>> bugs and basically everything that came up from the reviews and was
>> agreed on. There are still couple of things missing, ie column type
>> definition in protocol and some things related to existing data copy.
>
> Here are a few things I've noticed so far.
>
> +<programlisting>
> +CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar
> user=repuser</quote> PUBLICATION mypub;
> +</programlisting>
> + </para>
> + <para>
>
> The documentation above doesn't match the syntax, CONNECTION needs to be
> in single quotes not double quotes
> I think you want
> +<programlisting>
> +CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
> user=repuser' PUBLICATION mypub;
> +</programlisting>
> + </para>
> + <para>

Yes.

> I am not sure if this is a known issue covered by your comments about
> data copy but I am still having issues with error reporting on a failed
> subscription.
>
> I created a subscription, dropped the subscription and created a second
> one. The second subscription isn't active but shows no errors.

There are some fundamental issues with initial sync that need to be discussed on the list, but this one is not known. I'll try to convert this to a test case (it seems like a useful one) and fix it; thanks for the report.

In the meantime, I realized I broke the last patch in the series during rebase, so attached is the fixed version. It also contains the type info in the protocol.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I added one more prerequisite patch (the first one) which adds ephemeral
> slots (or well implements UI on top of the code that was mostly already
> there). The ephemeral slots are different in that they go away either on
> error or when session is closed. This means the initial data sync does
> not have to worry about cleaning up the slots after itself. I think this
> will be useful in other places as well (for example basebackup). I
> originally wanted to call them temporary slots in the UI but since the
> behavior is bit different from temp tables I decided to go with what the
> underlying code calls them in UI as well.

I think it makes sense to expose this.

Some of the comments need some polishing.

Eventually, we might want to convert the option list in CREATE_REPLICATION_SLOT into a list instead of adding more and more keywords (see also VACUUM), but not necessarily now.

I find the way Acquire and Release are handled now quite confusing. Because Release of an ephemeral slot means to delete it, you have changed most code to never release them until the end of the session. So there is a lot of ugly and confusing code that needs to know this difference. I think we need to use some different verbs for different purposes here. Acquire and release should keep their meaning of "I'm using this", and the calls in proc.c and postgres.c should be something like ReplicationSlotCleanup().

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/11/16 17:22, Peter Eisentraut wrote:
> On 10/24/16 9:22 AM, Petr Jelinek wrote:
>> I added one more prerequisite patch (the first one) which adds ephemeral
>> slots (or well implements UI on top of the code that was mostly already
>> there). The ephemeral slots are different in that they go away either on
>> error or when session is closed. This means the initial data sync does
>> not have to worry about cleaning up the slots after itself. I think this
>> will be useful in other places as well (for example basebackup). I
>> originally wanted to call them temporary slots in the UI but since the
>> behavior is bit different from temp tables I decided to go with what the
>> underlying code calls them in UI as well.
>
> I think it makes sense to expose this.
>
> Some of the comments need some polishing.
>
> Eventually, we might want to convert the option list in
> CREATE_REPLICATION_SLOT into a list instead of adding more and more
> keywords (see also VACUUM), but not necessarily now.
>
> I find the way Acquire and Release are handled now quite confusing.
> Because Release of an ephemeral slot means to delete it, you have
> changed most code to never release them until the end of the session.
> So there is a lot of ugly and confusing code that needs to know this
> difference. I think we need to use some different verbs for different
> purposes here. Acquire and release should keep their meaning of "I'm
> using this", and the calls in proc.c and postgres.c should be something
> like ReplicationSlotCleanup().

Release does not really change behavior; it has always dropped an ephemeral slot. So if I understand correctly, what you are proposing is to change the behavior of Release to not remove the ephemeral slot, add a function that removes the ephemeral slots of the current session, and add tracking of ephemeral slots created in the current session? That seems quite a bit more complicated than what the patch does, with little gain.
What about just releasing the ephemeral slot if a different one is being acquired, instead of the current error?

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/3/16 9:31 AM, Petr Jelinek wrote:
> Release does not really change behavior, it has always dropped ephemeral
> slot.

Well, currently ephemeral is just a temporary state while a slot is being created. It's not really something that can exist independently. You might as well call it RS_NOTREADY. Therefore, dropping the slot when you de-acquire (release) it makes sense.

But what you want is a slot that exists across acquire/release but is dropped at the end of the session. And what is implicit is that the slot is only usable by one session, so you don't really need to ever "release" it for use by other sessions. And so half the Release calls have been changed to Release-if-persistent, but it's not explained why in each case. It all seems to work OK, but there are a lot of hidden assumptions in each case that make it hard to follow.

> So if I understand correctly what you are proposing is to change
> behavior of Release to not remove ephemeral slot, add function that
> removes the ephemeral slots of current session and add tracking of
> ephemeral slots created in current session? That seems like quite more
> complicated than what the patch does with little gain.
>
> What about just releasing the ephemeral slot if the different one is
> being acquired instead of the current error?

Maybe that would help reduce some of the mystery about when you have to call Release and when ReleasePersistent (better called ReleaseIfPersistent).

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I also split out the libpqwalreceiver rewrite to separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review.

That looks good to me, and it appears to address the previous discussions.

I wouldn't change walrcv_xxx to walrcvconn_xxx. If we're going to have macros to hide the internals, we might as well keep the names the same.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
 /*
  * Replication slot on-disk data structure.
@@ -225,10 +226,25 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	ReplicationSlot *slot = NULL;
 	int			i;

-	Assert(MyReplicationSlot == NULL);
+	/* Only aka ephemeral slots can survive across commands. */

What does this comment mean?

+	Assert(!MyReplicationSlot ||
+		   MyReplicationSlot->data.persistency == RS_EPHEMERAL);

+	if (MyReplicationSlot)
+	{
+		/* Already acquired? Nothis to do. */

typo.

+		if (namestrcmp(&MyReplicationSlot->data.name, name) == 0)
+			return;
+
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot create replication slot %s, another slot %s is "
+						"already active in this session",
+						name, NameStr(MyReplicationSlot->data.name))));
+	}
+

Why do we now create slots that are already created? That seems like an odd API change.

 	/*
 	 * If some other backend ran this code concurrently with us, we'd likely
 	 * both allocate the same slot, and that would be bad.  We'd also be at
@@ -331,10 +347,25 @@ ReplicationSlotAcquire(const char *name)
 	int			i;
 	int			active_pid = 0;

-	Assert(MyReplicationSlot == NULL);
+	/* Only aka ephemeral slots can survive across commands. */
+	Assert(!MyReplicationSlot ||
+		   MyReplicationSlot->data.persistency == RS_EPHEMERAL);

 	ReplicationSlotValidateName(name, ERROR);

+	if (MyReplicationSlot)
+	{
+		/* Already acquired? Nothis to do. */
+		if (namestrcmp(&MyReplicationSlot->data.name, name) == 0)
+			return;
+
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot acquire replication slot %s, another slot %s is "
+						"already active in this session",
+						name, NameStr(MyReplicationSlot->data.name))));
+	}
+
 	/* Search for the named slot and mark it active if we find it. */
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
@@ -406,12 +437,26 @@ ReplicationSlotRelease(void)
 }

Uh?
We shouldn't ever have to acquire ephemeral

 /*
+ * Same as above but only if currently acquired slot is peristent one.
+ */

s/peristent/persistent/

+void
+ReplicationSlotReleasePersistent(void)
+{
+	Assert(MyReplicationSlot);
+
+	if (MyReplicationSlot->data.persistency == RS_PERSISTENT)
+		ReplicationSlotRelease();
+}

Ick.

Hm. I think I have to agree a bit with Peter here. Overloading MyReplicationSlot this way seems ugly, and I think there's a bunch of bugs around it too.

Sounds like what we really want is a) two different lifetimes for ephemeral slots, session and "command", and b) to have a number of slots that are released either after a failed transaction / command or at session end. The easiest way to do that appears to be a list of slots to be checked at end-of-xact and backend shutdown.

Regards,

Andres
Hi,

 /* Prototypes for interface functions */
-static void libpqrcv_connect(char *conninfo);
-static char *libpqrcv_get_conninfo(void);
-static void libpqrcv_identify_system(TimeLineID *primary_tli);
-static void libpqrcv_readtimelinehistoryfile(TimeLineID tli, char **filename, char **content, int *len);
-static bool libpqrcv_startstreaming(TimeLineID tli, XLogRecPtr startpoint,
-									char *slotname);
-static void libpqrcv_endstreaming(TimeLineID *next_tli);
-static int	libpqrcv_receive(char **buffer, pgsocket *wait_fd);
-static void libpqrcv_send(const char *buffer, int nbytes);
-static void libpqrcv_disconnect(void);
+static WalReceiverConn *libpqrcv_connect(char *conninfo,
+										 bool logical, const char *appname);
+static char *libpqrcv_get_conninfo(WalReceiverConn *conn);
+static char *libpqrcv_identify_system(WalReceiverConn *conn,
+									  TimeLineID *primary_tli);
+static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
+											 TimeLineID tli, char **filename,
+											 char **content, int *len);
+static bool libpqrcv_startstreaming(WalReceiverConn *conn,
+									TimeLineID tli, XLogRecPtr startpoint,
+									const char *slotname);
+static void libpqrcv_endstreaming(WalReceiverConn *conn,
+								  TimeLineID *next_tli);
+static int	libpqrcv_receive(WalReceiverConn *conn, char **buffer,
+							 pgsocket *wait_fd);
+static void libpqrcv_send(WalReceiverConn *conn, const char *buffer,
+						  int nbytes);
+static void libpqrcv_disconnect(WalReceiverConn *conn);

That looks good.

 /* Prototypes for private functions */
-static bool libpq_select(int timeout_ms);
+static bool libpq_select(PGconn *streamConn,
+						 int timeout_ms);

If we're starting to use this more widely, we really should just use a latch instead of the plain select(). In fact, I think it's more or less a bug that we don't (select is only interruptible by signals on a subset of our platforms). That shouldn't bother this patch, but...

This looks pretty close to committable. Peter, do you want to do that, or should I?

Andres
Hi,

+  <sect1 id="catalog-pg-publication-rel">
+   <title><structname>pg_publication_rel</structname></title>
+
+   <indexterm zone="catalog-pg-publication-rel">
+    <primary>pg_publication_rel</primary>
+   </indexterm>
+
+   <para>
+    The <structname>pg_publication_rel</structname> catalog contains
+    mapping between tables and publications in the database. This is many to
+    many mapping.
+   </para>

I wonder if we shouldn't abstract this a bit away from relations to allow other objects to be exported too. Could structure it a bit more like pg_depend.

+ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+
+<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
+
+      PuBLISH_INSERT | NOPuBLISH_INSERT
+    | PuBLISH_UPDATE | NOPuBLISH_UPDATE
+    | PuBLISH_DELETE | NOPuBLISH_DELETE

That's odd casing.

+   <varlistentry>
+    <term><literal>PuBLISH_INSERT</literal></term>
+    <term><literal>NOPuBLISH_INSERT</literal></term>
+    <term><literal>PuBLISH_UPDATE</literal></term>
+    <term><literal>NOPuBLISH_UPDATE</literal></term>
+    <term><literal>PuBLISH_DELETE</literal></term>
+    <term><literal>NOPuBLISH_DELETE</literal></term>

More odd casing.

+   <varlistentry>
+    <term><literal>FOR TABLE</literal></term>
+    <listitem>
+     <para>
+      Specifies optional list of tables to add to the publication.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
+    <listitem>
+     <para>
+      Specifies optional schema for which all logged tables will be added to
+      publication.
+     </para>
+    </listitem>
+   </varlistentry>

"FOR TABLE ALL IN SCHEMA" sounds weird.

+  <para>
+   This operation does not reserve any resources on the server. It only
+   defines grouping and filtering logic for future subscribers.
+  </para>

That's strictly speaking not true, maybe rephrase a bit?

+/*
+ * Check if relation can be in given publication and throws appropriate
+ * error if not.
+ */
+static void
+check_publication_add_relation(Relation targetrel)
+{
+	/* Must be table */
+	if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("only tables can be added to publication"),
+				 errdetail("%s is not a table",
+						   RelationGetRelationName(targetrel))));
+
+	/* Can't be system table */
+	if (IsCatalogRelation(targetrel))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("only user tables can be added to publication"),
+				 errdetail("%s is a system table",
+						   RelationGetRelationName(targetrel))));
+
+	/* UNLOGGED and TEMP relations cannot be part of publication. */
+	if (!RelationNeedsWAL(targetrel))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("UNLOGGED and TEMP relations cannot be replicated")));
+}

This probably means we need a check in the ALTER TABLE ... SET UNLOGGED path.

+/*
+ * Returns if relation represented by oid and Form_pg_class entry
+ * is publishable.
+ *
+ * Does same checks as the above, but does not need relation to be opened
+ * and also does not throw errors.
+ */
+static bool
+is_publishable_class(Oid relid, Form_pg_class reltuple)
+{
+	return reltuple->relkind == RELKIND_RELATION &&
+		!IsCatalogClass(relid, reltuple) &&
+		reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
+		/* XXX needed to exclude information_schema tables */
+		relid >= FirstNormalObjectId;
+}

Shouldn't that be IsCatalogRelation() instead?

+CREATE VIEW pg_publication_tables AS
+    SELECT
+        P.pubname AS pubname,
+        N.nspname AS schemaname,
+        C.relname AS tablename
+    FROM pg_publication P, pg_class C
+         JOIN pg_namespace N ON (N.oid = C.relnamespace)
+    WHERE C.relkind = 'r'
+          AND C.oid IN (SELECT relid FROM pg_get_publication_tables(P.pubname));

That's going to be quite inefficient if you filter by table... Might be better to do that via the underlying table.

+/*
+ * Create new publication.
+ * TODO ACL check
+ */

Hm?
+ObjectAddress
+CreatePublication(CreatePublicationStmt *stmt)
+{
+	check_replication_permissions();
+

+/*
+ * Drop publication by OID
+ */
+void
+DropPublicationById(Oid pubid)
+
+/*
+ * Remove relation from publication by mapping OID.
+ */
+void
+RemovePublicationRelById(Oid proid)
+{

Permission checks?

+}

Hm. Neither of these does dependency checking, wonder if that can be argued to be problematic.

+/*
+ * Gather Relations based o provided by RangeVar list.
+ * The gathered tables are locked in ShareUpdateExclusiveLock mode.
+ */

s/o/on/. Not sure if gather is the best name.

+static List *
+GatherTableList(List *tables)

+/*
+ * Close all relations in the list.
+ */
+static void
+CloseTables(List *rels)

Shouldn't that be CloseTableList() based on the preceding function's naming?

+
+/*
+ * Add listed tables to the publication.
+ */
+static void
+PublicationAddTables(Oid pubid, List *rels, bool if_not_exists,
+					 AlterPublicationStmt *stmt)
+{
+	ListCell   *lc;
+
+	Assert(!stmt || !stmt->for_all_tables);
+
+	foreach(lc, rels)
+	{
+		Relation	rel = (Relation) lfirst(lc);
+		ObjectAddress obj;
+
+		obj = publication_add_relation(pubid, rel, if_not_exists);
+		if (stmt)
+			EventTriggerCollectSimpleCommand(obj, InvalidObjectAddress,
+											 (Node *) stmt);
+	}
+}
+
+/*
+ * Remove listed tables to the publication.
+ */

s/to/from/

+static void
+PublicationDropTables(Oid pubid, List *rels, bool missing_ok)
+{
+	ObjectAddress obj;
+	ListCell   *lc;
+	Oid			prid;
+
+	foreach(lc, rels)
+	{
+		Relation	rel = (Relation) lfirst(lc);
+		Oid			relid = RelationGetRelid(rel);
+
+		prid = GetSysCacheOid2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid),
+							   ObjectIdGetDatum(pubid));
+		if (!OidIsValid(prid))
+		{
+			if (missing_ok)
+				continue;
+
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_OBJECT),
+					 errmsg("relation \"%s\" is not part of the publication",
+							RelationGetRelationName(rel))));
+		}
+
+		ObjectAddressSet(obj, PublicationRelRelationId, prid);
+		performDeletion(&obj, DROP_CASCADE, 0);
+	}
+}

 /*
+ * Check if command can be executed with current replica identity.
+ */
+static void
+CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
+{
+	PublicationActions *pubactions;
+
+	/* We only need to do checks for UPDATE and DELETE. */
+	if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
+		return;
+
+	/* If relation has replica identity we are always good. */
+	if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
+		OidIsValid(RelationGetReplicaIndex(rel)))
+		return;
+
+	/*
+	 * This is either UPDATE OR DELETE and there is no replica identity.
+	 *
+	 * Check if the table publishes UPDATES or DELETES.
+	 */
+	pubactions = GetRelationPublicationActions(rel);
+	if (pubactions->pubupdate || pubactions->pubdelete)

I think that leads to spurious errors. Consider a DELETE with a publication that replicates updates but not deletes.
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("cannot update table \"%s\" because it does not have replica identity and publishes updates",
+						RelationGetRelationName(rel)),
+				 errhint("To enable updating the table, provide set REPLICA IDENTITY using ALTER TABLE.")));
+}

"provide set"

+publication_opt_item:
+			IDENT
+				{
+					/*
+					 * We handle identifiers that aren't parser keywords with
+					 * the following special-case codes, to avoid bloating the
+					 * size of the main parser.
+					 */
+					if (strcmp($1, "publish_insert") == 0)
+						$$ = makeDefElem("publish_insert",
+										 (Node *)makeInteger(TRUE), @1);
+					else if (strcmp($1, "nopublish_insert") == 0)
+						$$ = makeDefElem("publish_insert",
+										 (Node *)makeInteger(FALSE), @1);
+					else if (strcmp($1, "publish_update") == 0)
+						$$ = makeDefElem("publish_update",
+										 (Node *)makeInteger(TRUE), @1);
+					else if (strcmp($1, "nopublish_update") == 0)
+						$$ = makeDefElem("publish_update",
+										 (Node *)makeInteger(FALSE), @1);
+					else if (strcmp($1, "publish_delete") == 0)
+						$$ = makeDefElem("publish_delete",
+										 (Node *)makeInteger(TRUE), @1);
+					else if (strcmp($1, "nopublish_delete") == 0)
+						$$ = makeDefElem("publish_delete",
+										 (Node *)makeInteger(FALSE), @1);
+					else
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("unrecognized publication option \"%s\"", $1),
+								 parser_errposition(@1)));
+				}
+		;

I still would very much like to move this outside of gram.y and just use IDENTs here. Like how COPY options are handled.

+/*
+ * Get publication actions for list of publication oids.
+ */
+struct PublicationActions *
+GetRelationPublicationActions(Relation relation)

API description and function name/parameters don't quite match.
+CATALOG(pg_publication,6104)
+{
+	NameData	pubname;		/* name of the publication */
+
+	/*
+	 * indicates that this is special publication which should encompass
+	 * all tables in the database (except for the unlogged and temp ones)
+	 */
+	bool		puballtables;
+
+	/* true if inserts are published */
+	bool		pubinsert;
+
+	/* true if updates are published */
+	bool		pubupdate;
+
+	/* true if deletes are published */
+	bool		pubdelete;
+
+} FormData_pg_publication;

Shouldn't this have an owner? I also wonder if we want an easier-to-extend form of pubinsert/update/delete (say to add pubddl, pubtruncate, pub ... without changing the schema).

+/* ----------------
+ *		pg_publication_rel definition.  cpp turns this into
+ *		typedef struct FormData_pg_publication_rel
+ *
+ * ----------------
+ */
+#define PublicationRelRelationId	6106
+
+CATALOG(pg_publication_rel,6106)
+{
+	Oid			prpubid;		/* Oid of the publication */
+	Oid			prrelid;		/* Oid of the relation */
+} FormData_pg_publication_rel;

To me it seems like a good idea to have objclassid/objsubid here.

Regards,

Andres
Hi,

(btw, I vote against tarballing patches)

+  <tgroup cols="4">
+   <thead>
+    <row>
+     <entry>Name</entry>
+     <entry>Type</entry>
+     <entry>References</entry>
+     <entry>Description</entry>
+    </row>
+   </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>oid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry></entry>
+     <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
+    </row>
+
+    <row>
+     <entry><structfield>subpublications</structfield></entry>
+     <entry><type>name[]</type></entry>
+     <entry></entry>
+     <entry>Array of subscribed publication names. These reference the
+      publications on the publisher server.
+     </entry>

Why is this names and not oids? So you can see it across databases?

I think this again should have an owner.

 include $(top_srcdir)/src/backend/common.mk

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 68d7e46..523008d 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -112,6 +112,7 @@ static event_trigger_support_data event_trigger_support[] = {
 	{"SCHEMA", true},
 	{"SEQUENCE", true},
 	{"SERVER", true},
+	{"SUBSCRIPTION", true},

Hm, is that ok? Subscriptions are shared, so...?

+	/*
+	 * If requested, create the replication slot on remote side for our
+	 * newly created subscription.
+	 *
+	 * Note, we can't cleanup slot in case of failure as reason for
+	 * failure might be already existing slot of the same name and we
+	 * don't want to drop somebody else's slot by mistake.
+	 */
+	if (create_slot)
+	{
+		XLogRecPtr	lsn;
+
+		/*
+		 * Create the replication slot on remote side for our newly created
+		 * subscription.
+		 *
+		 * Note, we can't cleanup slot in case of failure as reason for
+		 * failure might be already existing slot of the same name and we
+		 * don't want to drop somebody else's slot by mistake.
+		 */

We should really be able to recognize that based on the error code...
+/*
+ * Drop subscription by OID
+ */
+void
+DropSubscriptionById(Oid subid)
+{

+	/*
+	 * We must ignore errors here as that would make it impossible to drop
+	 * subscription when publisher is down.
+	 */

I'm not convinced. Leaving a slot around without a "record" of it on the creating side isn't nice either. Maybe a FORCE flag or something?

+subscription_create_opt_item:
+			subscription_opt_item
+			| INITIALLY IDENT
+				{
+					if (strcmp($2, "enabled") == 0)
+						$$ = makeDefElem("enabled",
+										 (Node *)makeInteger(TRUE), @1);
+					else if (strcmp($2, "disabled") == 0)
+						$$ = makeDefElem("enabled",
+										 (Node *)makeInteger(FALSE), @1);
+					else
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("unrecognized subscription option \"%s\"", $1),
+								 parser_errposition(@2)));
+				}
+			| IDENT
+				{
+					if (strcmp($1, "create_slot") == 0)
+						$$ = makeDefElem("create_slot",
+										 (Node *)makeInteger(TRUE), @1);
+					else if (strcmp($1, "nocreate_slot") == 0)
+						$$ = makeDefElem("create_slot",
+										 (Node *)makeInteger(FALSE), @1);
+				}
+		;

Hm, the IDENT case ignores $1 if it's not create_slot/nocreate_slot and thus leaves $$ uninitialized? I again really would like to have the error checking elsewhere.

- Andres
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list but this one is not known. I'll try to convert this
> to test case (seems like useful one) and fix it, thanks for the
> report. In meantime I realized I broke the last patch in the series
> during rebase so attached is the fixed version. It also contains the
> type info in the protocol.

I don't know if this is covered by the known initial-sync problems or not, but if I have an 'all tables' publication and then create a new table, the data doesn't seem to replicate to the new table.

P: create table a(a serial4 primary key, b text);
S: create table a(a serial4 primary key, b text);
P: create publication mypub for all tables;
S: create subscription mysub connection 'host=localhost dbname=test port=5441' publication mypub;
P: create table b(a serial4 primary key, b text);
P: insert into b(b) values ('foo2');
P: insert into a(b) values ('foo3');

Then I check my subscriber:

select * FROM a;
 a |  b
---+------
 1 | foo
 2 | foo3
(2 rows)

test=# select * FROM b;
 a | b
---+---
(0 rows)

However, if the table isn't on the subscriber, I do get an error, i.e.:

P: create table c(a serial4 primary key, b text);
P: insert into c(b) values('foo');

2016-11-05 11:49:31.456 EDT [14938] FATAL:  the logical replication target public.c not found
2016-11-05 11:49:31.457 EDT [13703] LOG:  worker process: logical replication worker 16457 (PID 14938) exited with exit code 1

But if I then add the table:

S: create table c(a serial4 primary key, b text);

2016-11-05 11:51:08.583 EDT [15014] LOG:  logical replication apply for subscription mysub started

the data doesn't replicate to table c either.
Review of v7 0003-Add-PUBLICATION-catalogs-and-DDL.patch:

This appears to address previous reviews and is looking pretty solid. I
have some comments that are easily addressed:

[still from previous review] The code for OCLASS_PUBLICATION_REL in
getObjectIdentityParts() does not fill in objname and objargs, as it is
supposed to.

catalog.sgml: pg_publication_rel column names must be updated after renaming

alter_publication.sgml and elsewhere: typos PuBLISH_INSERT etc.

create_publication.sgml: FOR TABLE ALL IN SCHEMA does not exist anymore

create_publication.sgml: talks about not-yet-existing SUBSCRIPTION role

DropPublicationById: maybe name it RemovePublicationById for consistency

system_views.sql: C.relkind = 'r' unnecessary

CheckCmdReplicaIdentity: error message says "cannot update", should
distinguish between update and delete

relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=

RelationData.rd_pubactions could be a bitmap, simplifying some memcpy
and context management. But RelationData appears to favor rich data
structures, so maybe that is fine.

--
Peter Eisentraut                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/4/16 9:00 AM, Andres Freund wrote:
> +  <para>
> +   The <structname>pg_publication_rel</structname> catalog contains
> +   mapping between tables and publications in the database. This is many to
> +   many mapping.
> +  </para>
>
> I wonder if we shouldn't abstract this a bit away from relations to
> allow other objects to be exported to. Could structure it a bit more
> like pg_depend.

I think we can add/change that when we have use for it.

> +  <varlistentry>
> +   <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
> +   <listitem>
> +    <para>
> +     Specifies optional schema for which all logged tables will be added to
> +     publication.
> +    </para>
> +   </listitem>
> +  </varlistentry>
>
> "FOR TABLE ALL IN SCHEMA" sounds weird.

That clause no longer exists anyway.

> +  <para>
> +   This operation does not reserve any resources on the server. It only
> +   defines grouping and filtering logic for future subscribers.
> +  </para>
>
> That's strictly speaking not true, maybe rephrase a bit?

Maybe the point is that it does not initiate any contact with remote nodes.

> +/*
> + * Create new publication.
> + * TODO ACL check
> + */
>
> Hm?

The first patch is going to be just superuser and replication role. I'm
working on a patch set for later that adds proper ACLs, owners, and all
that. So I'd suggest to ignore these details for now, unless of course
you find permission checks *missing*.

> +/*
> + * Drop publication by OID
> + */
> +void
> +DropPublicationById(Oid pubid)
>
> +/*
> + * Remove relation from publication by mapping OID.
> + */
> +void
> +RemovePublicationRelById(Oid proid)
> +{
>
> Permission checks?
>
> +}
>
> Hm. Neither of these does dependency checking, wonder if that can be
> argued to be problematic.

The dependency checking is done before it gets to these functions, no?

> /*
> + * Check if command can be executed with current replica identity.
> + */
> +static void
> +CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
> +{
> +	PublicationActions *pubactions;
> +
> +	/* We only need to do checks for UPDATE and DELETE. */
> +	if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
> +		return;
> +
> +	/* If relation has replica identity we are always good. */
> +	if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
> +		OidIsValid(RelationGetReplicaIndex(rel)))
> +		return;
> +
> +	/*
> +	 * This is either UPDATE OR DELETE and there is no replica identity.
> +	 *
> +	 * Check if the table publishes UPDATES or DELETES.
> +	 */
> +	pubactions = GetRelationPublicationActions(rel);
> +	if (pubactions->pubupdate || pubactions->pubdelete)
>
> I think that leads to spurious errors. Consider a DELETE with a
> publication that replicates updates but not deletes.

Yeah, it needs to check the pubactions against the specific command.

> +} FormData_pg_publication;
>
> Shouldn't this have an owner?

Yes, see above.

> I also wonder if we want an easier to
> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> pub ... without changing the schema).

Maybe, but how? (without using weird array constructs that are a pain to
parse in psql and pg_dump, for example)

--
Peter Eisentraut                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
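The per-command check agreed on above can be sketched in isolation. This is a standalone illustration with simplified stand-in types, not the backend code: the point is simply that UPDATE must only be rejected when updates are published, and DELETE only when deletes are, instead of testing `pubupdate || pubdelete` for either command:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the backend types in the discussion. */
typedef enum { CMD_INSERT, CMD_UPDATE, CMD_DELETE } CmdType;

typedef struct PublicationActions
{
	bool		pubinsert;
	bool		pubupdate;
	bool		pubdelete;
} PublicationActions;

/*
 * Whether this command needs a replica identity, decided per command:
 * checking pubupdate || pubdelete for either command would give the
 * spurious error Andres describes (DELETE failing on a publication
 * that replicates updates but not deletes).
 */
static bool
cmd_needs_replica_identity(CmdType cmd, const PublicationActions *pubactions)
{
	if (cmd == CMD_UPDATE)
		return pubactions->pubupdate;
	if (cmd == CMD_DELETE)
		return pubactions->pubdelete;
	return false;				/* INSERT never needs a replica identity */
}
```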
On 04/11/16 13:15, Andres Freund wrote:
>
>  /* Prototypes for private functions */
> -static bool libpq_select(int timeout_ms);
> +static bool libpq_select(PGconn *streamConn,
> +			 int timeout_ms);
>
> If we're starting to use this more widely, we really should just a latch
> instead of the plain select(). In fact, I think it's more or less a bug
> that we don't (select is only interruptible by signals on a subset of
> our platforms). That shouldn't bother this patch, but...
>

Agree that this is problem, especially for the subscription creation
later. We should be doing WaitLatchOrSocket, but the question is which
latch. We can't use MyProc one as that's not the latch that WalReceiver
uses so I guess we would have to send latch as parameter to any caller
of this which is not very pretty from api perspective but I don't have
better idea here.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/11/16 13:07, Andres Freund wrote:
>
> Hm. I think I have to agree a bit with Peter here. Overloading
> MyReplicationSlot this way seems ugly, and I think there's a bunch of
> bugs around it too.
>
> Sounds what we really want is a) two different lifetimes for ephemeral
> slots, session and "command" b) have a number of slots that are released
> either after a failed transaction / command or at session end. The
> easiest way for that appears to have a list of slots to be checked at
> end-of-xact and backend shutdown.
>

Ok, so how about attached? It adds temp slots as a new type of
persistence. It does not really touch the behavior of any of the existing
API or persistence settings. The temp slots are just cleaned up on backend
exit or error; other than that they are not special.

I don't use any backend-local list to track them; instead they have
active_pid always set, and at the end of the session we just clean up
everything that has it set to our pid. This has the nice property that it
forbids other backends from acquiring them.

It does not do any locking while searching for the slots to clean up (see
ReplicationSlotCleanup), mainly because locking complicates the
interaction with ReplicationSlotDropPtr, and it seems to me that it is not
really needed there, as other backends will never change active_pid to our
backend pid, and ReplicationSlotDropPtr takes an exclusive lock when
resetting it.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
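The cleanup rule described above ("drop every temporary slot whose active_pid is mine") can be modeled in a few lines. This is a toy illustration of the idea only; the names, struct layout, and array-based storage are made up here and are not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model of the proposed temporary-slot cleanup: temporary slots
 * keep active_pid set to their creating backend for their whole
 * lifetime, so end-of-session cleanup is a single scan dropping every
 * temp slot owned by this backend.  Keeping active_pid set also stops
 * other backends from acquiring the slot, since acquisition requires
 * active_pid to be unset.
 */
typedef enum { RS_PERSISTENT, RS_EPHEMERAL, RS_TEMPORARY } SlotPersistency;

typedef struct ReplicationSlot
{
	bool			in_use;
	SlotPersistency persistency;
	int				active_pid;	/* 0 when not acquired */
} ReplicationSlot;

static void
replication_slot_cleanup(ReplicationSlot *slots, int nslots, int my_pid)
{
	for (int i = 0; i < nslots; i++)
	{
		ReplicationSlot *s = &slots[i];

		if (s->in_use &&
			s->persistency == RS_TEMPORARY &&
			s->active_pid == my_pid)
		{
			s->in_use = false;	/* drop the slot */
			s->active_pid = 0;
		}
	}
}
```

Slots belonging to other backends, and persistent slots, are untouched by the scan, which matches the description that temp slots "are not special" apart from this cleanup.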
On 04/11/16 14:00, Andres Freund wrote:
> Hi,
>
> +  <sect1 id="catalog-pg-publication-rel">
> +   <title><structname>pg_publication_rel</structname></title>
> +
> +   <indexterm zone="catalog-pg-publication-rel">
> +    <primary>pg_publication_rel</primary>
> +   </indexterm>
> +
> +   <para>
> +    The <structname>pg_publication_rel</structname> catalog contains
> +    mapping between tables and publications in the database. This is many to
> +    many mapping.
> +   </para>
>
> I wonder if we shouldn't abstract this a bit away from relations to
> allow other objects to be exported to. Could structure it a bit more
> like pg_depend.
>

Honestly, let's not overdesign this. A change like that can be made in the
future if we need it, and I am quite unconvinced we do, given that
anything we might want to replicate will be a relation. I understand that
it might be useful to know what's on the downstream in terms of objects at
some point for some future functionality, but I don't have an idea how
that functionality will look, so it's premature to guess what catalog
structure it will need.

>
> +ALTER PUBLICATION <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
> +
> +<phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
> +
> +      PuBLISH_INSERT | NOPuBLISH_INSERT
> +    | PuBLISH_UPDATE | NOPuBLISH_UPDATE
> +    | PuBLISH_DELETE | NOPuBLISH_DELETE
>
> That's odd casing.
>
> +   <varlistentry>
> +    <term><literal>PuBLISH_INSERT</literal></term>
> +    <term><literal>NOPuBLISH_INSERT</literal></term>
> +    <term><literal>PuBLISH_UPDATE</literal></term>
> +    <term><literal>NOPuBLISH_UPDATE</literal></term>
> +    <term><literal>PuBLISH_DELETE</literal></term>
> +    <term><literal>NOPuBLISH_DELETE</literal></term>
>

Ah, typo in my sed script, fun.

> More odd casing.
>
> +   <varlistentry>
> +    <term><literal>FOR TABLE</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies optional list of tables to add to the publication.
> +     </para>
> +    </listitem>
> +   </varlistentry>
> +
> +   <varlistentry>
> +    <term><literal>FOR TABLE ALL IN SCHEMA</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies optional schema for which all logged tables will be added to
> +      publication.
> +     </para>
> +    </listitem>
> +   </varlistentry>
>
> "FOR TABLE ALL IN SCHEMA" sounds weird.
>

I actually removed support for this at some point and forgot to remove the
docs. I might add this feature again in the future, but I reckon we can
live without it in v1.

> +  <para>
> +   This operation does not reserve any resources on the server. It only
> +   defines grouping and filtering logic for future subscribers.
> +  </para>
>
> That's strictly speaking not true, maybe rephrase a bit?
>

Sure; this is basically supposed to mean that it does not really start
replication or keep WAL or anything like that, as opposed to what, for
example, slots do.

> +/*
> + * Check if relation can be in given publication and throws appropriate
> + * error if not.
> + */
> +static void
> +check_publication_add_relation(Relation targetrel)
> +{
> +	/* Must be table */
> +	if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION)
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +				 errmsg("only tables can be added to publication"),
> +				 errdetail("%s is not a table",
> +						   RelationGetRelationName(targetrel))));
> +
> +	/* Can't be system table */
> +	if (IsCatalogRelation(targetrel))
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +				 errmsg("only user tables can be added to publication"),
> +				 errdetail("%s is a system table",
> +						   RelationGetRelationName(targetrel))));
> +
> +	/* UNLOGGED and TEMP relations cannot be part of publication. */
> +	if (!RelationNeedsWAL(targetrel))
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +				 errmsg("UNLOGGED and TEMP relations cannot be replicated")));
> +}
>
> This probably means we need a check in the ALTER TABLE ... SET UNLOGGED
> path.
>

Good point.
>
> +/*
> + * Returns if relation represented by oid and Form_pg_class entry
> + * is publishable.
> + *
> + * Does same checks as the above, but does not need relation to be opened
> + * and also does not throw errors.
> + */
> +static bool
> +is_publishable_class(Oid relid, Form_pg_class reltuple)
> +{
> +	return reltuple->relkind == RELKIND_RELATION &&
> +		!IsCatalogClass(relid, reltuple) &&
> +		reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
> +		/* XXX needed to exclude information_schema tables */
> +		relid >= FirstNormalObjectId;
> +}
>
> Shouldn't that be IsCatalogRelation() instead?
>

Well, IsCatalogRelation just calls IsCatalogClass, and we call
IsCatalogClass here as well. The problem with IsCatalogClass is that it
does not consider tables in information_schema that were created as part
of initdb to be system catalogs, because it first does a negative check on
the pg_catalog and toast schemas and only then considers
FirstNormalObjectId. I was actually wondering if that might be a bug in
IsCatalogClass.

>
> +/*
> + * Create new publication.
> + * TODO ACL check
> + */
>

That was meant for future enhancements, but I think I won't do detailed
ACLs in v1, so I'll remove that TODO.

> +
> +/*
> + * Drop publication by OID
> + */
> +void
> +DropPublicationById(Oid pubid)
>
> +/*
> + * Remove relation from publication by mapping OID.
> + */
> +void
> +RemovePublicationRelById(Oid proid)
> +{
>
> Permission checks?
>
> +}
>
> Hm. Neither of these does dependency checking, wonder if that can be
> argued to be problematic.
>

As PeterE said, that's done by the caller; none of the Drop...ById
functions do dependency checks.

> +publication_opt_item:
> +			IDENT
> +				{
> +					/*
> +					 * We handle identifiers that aren't parser keywords with
> +					 * the following special-case codes, to avoid bloating the
> +					 * size of the main parser.
> +					 */
> +					if (strcmp($1, "publish_insert") == 0)
> +						$$ = makeDefElem("publish_insert",
> +										 (Node *)makeInteger(TRUE), @1);
> +					else if (strcmp($1, "nopublish_insert") == 0)
> +						$$ = makeDefElem("publish_insert",
> +										 (Node *)makeInteger(FALSE), @1);
> +					else if (strcmp($1, "publish_update") == 0)
> +						$$ = makeDefElem("publish_update",
> +										 (Node *)makeInteger(TRUE), @1);
> +					else if (strcmp($1, "nopublish_update") == 0)
> +						$$ = makeDefElem("publish_update",
> +										 (Node *)makeInteger(FALSE), @1);
> +					else if (strcmp($1, "publish_delete") == 0)
> +						$$ = makeDefElem("publish_delete",
> +										 (Node *)makeInteger(TRUE), @1);
> +					else if (strcmp($1, "nopublish_delete") == 0)
> +						$$ = makeDefElem("publish_delete",
> +										 (Node *)makeInteger(FALSE), @1);
> +					else
> +						ereport(ERROR,
> +								(errcode(ERRCODE_SYNTAX_ERROR),
> +								 errmsg("unrecognized publication option \"%s\"", $1),
> +								 parser_errposition(@1)));
> +				}
> +		;
>
> I still would very much like to move this outside of gram.y and just use
> IDENTs here. Like how COPY options are handled.
>

Well, I looked into it, and it means some loss of info in the error
messages, mainly the error position in the query, because utility
statements don't get a ParseState (unlike COPY). It might be worth the
flexibility though.

>
> +CATALOG(pg_publication,6104)
> +{
> +	NameData	pubname;		/* name of the publication */
> +
> +	/*
> +	 * indicates that this is special publication which should encompass
> +	 * all tables in the database (except for the unlogged and temp ones)
> +	 */
> +	bool		puballtables;
> +
> +	/* true if inserts are published */
> +	bool		pubinsert;
> +
> +	/* true if updates are published */
> +	bool		pubupdate;
> +
> +	/* true if deletes are published */
> +	bool		pubdelete;
> +
> +} FormData_pg_publication;
>
> Shouldn't this have an owner?

Probably. I wanted to do that as a follow-up patch originally, but it
looks like it should be in the initial version.
> I also wonder if we want an easier to
> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> pub ... without changing the schema).
>

So like a text array that's then parsed everywhere? (I am definitely not
doing a bitmask/int.)

>
> +/* ----------------
> + *		pg_publication_rel definition.  cpp turns this into
> + *		typedef struct FormData_pg_publication_rel
> + *
> + * ----------------
> + */
> +#define PublicationRelRelationId	6106
> +
> +CATALOG(pg_publication_rel,6106)
> +{
> +	Oid			prpubid;		/* Oid of the publication */
> +	Oid			prrelid;		/* Oid of the relation */
> +} FormData_pg_publication_rel;
>
> To me it seems like a good idea to have objclassid/objsubid here.
>

You said that in the beginning, but again I am not quite convinced of that
yet. I guess if PeterE moves the sequence patches all the way and we lose
the notion that sequences are relations (not sure if that's where he is
ultimately going though), that might make sense; otherwise I don't really
think we need that.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/11/16 14:24, Andres Freund wrote:
> Hi,
>
> (btw, I vote against tarballing patches)
>

Well, I vote against the CF app not handling emails with multiple
attachments correctly :)

> +  <tgroup cols="4">
> +   <thead>
> +    <row>
> +     <entry>Name</entry>
> +     <entry>Type</entry>
> +     <entry>References</entry>
> +     <entry>Description</entry>
> +    </row>
> +   </thead>
> +
> +   <tbody>
> +    <row>
> +     <entry><structfield>oid</structfield></entry>
> +     <entry><type>oid</type></entry>
> +     <entry></entry>
> +     <entry>Row identifier (hidden attribute; must be explicitly selected)</entry>
> +    </row>
>
> +    <row>
> +     <entry><structfield>subpublications</structfield></entry>
> +     <entry><type>name[]</type></entry>
> +     <entry></entry>
> +     <entry>Array of subscribed publication names. These reference the
> +      publications on the publisher server.
> +     </entry>
>
> Why is this names and not oids? So you can see it across databases?
>

Because they only exist on the remote server.

>
>  include $(top_srcdir)/src/backend/common.mk
> diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
> index 68d7e46..523008d 100644
> --- a/src/backend/commands/event_trigger.c
> +++ b/src/backend/commands/event_trigger.c
> @@ -112,6 +112,7 @@ static event_trigger_support_data event_trigger_support[] = {
>  	{"SCHEMA", true},
>  	{"SEQUENCE", true},
>  	{"SERVER", true},
> +	{"SUBSCRIPTION", true},
>
> Hm, is that ok? Subscriptions are shared, so ...?
>

Good point, I forgot event triggers don't handle shared objects.

>
> +	/*
> +	 * If requested, create the replication slot on remote side for our
> +	 * newly created subscription.
> +	 *
> +	 * Note, we can't cleanup slot in case of failure as reason for
> +	 * failure might be already existing slot of the same name and we
> +	 * don't want to drop somebody else's slot by mistake.
> +	 */
> +	if (create_slot)
> +	{
> +		XLogRecPtr	lsn;
> +
> +		/*
> +		 * Create the replication slot on remote side for our newly created
> +		 * subscription.
> +		 *
> +		 * Note, we can't cleanup slot in case of failure as reason for
> +		 * failure might be already existing slot of the same name and we
> +		 * don't want to drop somebody else's slot by mistake.
> +		 */
>
> We should really be able to recognize that based on the error code...
>

We could, provided that the slot is active, but that would leave a nasty
race condition: if you do the drop while the other subscription of the
same name is not running (restarting, temporarily disabled, etc.), we'll
remove its slot. Maybe we should not care about that and say the slot
represents the subscription, and if you name the slot the same for two
different subscriptions then that's your problem.

> +/*
> + * Drop subscription by OID
> + */
> +void
> +DropSubscriptionById(Oid subid)
> +{
>
> +	/*
> +	 * We must ignore errors here as that would make it impossible to drop
> +	 * subscription when publisher is down.
> +	 */
>
> I'm not convinced. Leaving a slot around without a "record" of it on
> the creating side isn't nice either. Maybe a FORCE flag or something?
>

I would like to have this as an option, yes. I'm not sure FORCE is the
best name, but I have trouble coming up with a good one. We have
CREATE_SLOT and NOCREATE_SLOT for CREATE SUBSCRIPTION, so maybe we could
have DROP_SLOT (default) and NODROP_SLOT for DROP SUBSCRIPTION.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 08/11/16 19:51, Peter Eisentraut wrote:
> Review of v7 0003-Add-PUBLICATION-catalogs-and-DDL.patch:
>
> This appears to address previous reviews and is looking pretty solid. I
> have some comments that are easily addressed:
>
> [still from previous review] The code for OCLASS_PUBLICATION_REL in
> getObjectIdentityParts() does not fill in objname and objargs, as it is
> supposed to.
>
> catalog.sgml: pg_publication_rel column names must be updated after renaming
>
> alter_publication.sgml and elsewhere: typos PuBLISH_INSERT etc.
>
> create_publication.sgml: FOR TABLE ALL IN SCHEMA does not exist anymore
>
> create_publication.sgml: talks about not-yet-existing SUBSCRIPTION role
>
> DropPublicationById maybe name RemovePublicationById for consistency
>
> system_views.sql: C.relkind = 'r' unnecessary
>
> CheckCmdReplicaIdentity: error message says "cannot update", should
> distinguish between update and delete
>
> relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=
>
> RelationData.rd_pubactions could be a bitmap, simplifying some memcpy
> and context management. But RelationData appears to favor rich data
> structures, so maybe that is fine.
>

Thanks for these. Some of it is the result of the various rebases I did
(the sync patch makes rebasing a bit complicated as it touches
everything), and it's easy for me to overlook things at this point.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
> On 04/11/16 13:15, Andres Freund wrote:
> >
> >  /* Prototypes for private functions */
> > -static bool libpq_select(int timeout_ms);
> > +static bool libpq_select(PGconn *streamConn,
> > +			 int timeout_ms);
> >
> > If we're starting to use this more widely, we really should just a latch
> > instead of the plain select(). In fact, I think it's more or less a bug
> > that we don't (select is only interruptible by signals on a subset of
> > our platforms). That shouldn't bother this patch, but...
> >
>
> Agree that this is problem, especially for the subscription creation
> later. We should be doing WaitLatchOrSocket, but the question is which
> latch. We can't use MyProc one as that's not the latch that WalReceiver
> uses so I guess we would have to send latch as parameter to any caller
> of this which is not very pretty from api perspective but I don't have
> better idea here.

I think we should simply make walsender use the standard proc
latch. Afaics that should be fairly trivial?

Greetings,

Andres Freund
Hi,

On 2016-11-11 12:04:27 +0100, Petr Jelinek wrote:
> On 04/11/16 14:00, Andres Freund wrote:
> > Hi,
> >
> > +  <sect1 id="catalog-pg-publication-rel">
> > +   <title><structname>pg_publication_rel</structname></title>
> > +
> > +   <indexterm zone="catalog-pg-publication-rel">
> > +    <primary>pg_publication_rel</primary>
> > +   </indexterm>
> > +
> > +   <para>
> > +    The <structname>pg_publication_rel</structname> catalog contains
> > +    mapping between tables and publications in the database. This is many to
> > +    many mapping.
> > +   </para>
> >
> > I wonder if we shouldn't abstract this a bit away from relations to
> > allow other objects to be exported to. Could structure it a bit more
> > like pg_depend.
> >
>
> Honestly, let's not overdesign this. Change like that can be made in the
> future if we need it and I am quite unconvinced we do given that
> anything we might want to replicate will be relation. I understand that
> it might be useful to know what's on downstream in terms of objects at
> some point for some future functionality, but I am don't have idea how
> that functionality will look like so it's premature to guess what
> catalog structure it will need.

I slightly prefer to make it more generic right now, but I don't think
that's a blocker.

> > I still would very much like to move this outside of gram.y and just use
> > IDENTs here. Like how COPY options are handled.
> >
>
> Well, I looked into it and it means some loss of info in the error
> messages - mainly the error position in the query because utility
> statements don't get ParseState (unlike COPY). It might be worth the
> flexibility though.

Pretty sure that that's the case.

> > I also wonder if we want an easier to
> > extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> > pub ... without changing the schema).
> >
>
> So like, text array that's then parsed everywhere (I am not doing
> bitmask/int definitely)?

Yes, that sounds good to me.
Then convert it to individual booleans or a bitmask when loading the
publications into the in-memory form (which you already do).

Greetings,

Andres Freund
On 12/11/16 20:19, Andres Freund wrote:
> On 2016-11-10 23:31:27 +0100, Petr Jelinek wrote:
>> On 04/11/16 13:15, Andres Freund wrote:
>>>
>>>  /* Prototypes for private functions */
>>> -static bool libpq_select(int timeout_ms);
>>> +static bool libpq_select(PGconn *streamConn,
>>> +			 int timeout_ms);
>>>
>>> If we're starting to use this more widely, we really should just a latch
>>> instead of the plain select(). In fact, I think it's more or less a bug
>>> that we don't (select is only interruptible by signals on a subset of
>>> our platforms). That shouldn't bother this patch, but...
>>>
>>
>> Agree that this is problem, especially for the subscription creation
>> later. We should be doing WaitLatchOrSocket, but the question is which
>> latch. We can't use MyProc one as that's not the latch that WalReceiver
>> uses so I guess we would have to send latch as parameter to any caller
>> of this which is not very pretty from api perspective but I don't have
>> better idea here.
>
> I think we should simply make walsender use the standard proc
> latch. Afaics that should be fairly trivial?

Walreceiver you mean. Yeah, that should be simple; looking at the code I
am not quite sure why it uses a separate latch in the first place.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
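The interruptibility point being discussed can be illustrated outside PostgreSQL: instead of a plain `select()` on the connection socket alone, the wait multiplexes the socket with a signalling fd, so another process can wake the waiter without any data arriving. In the backend, `WaitLatchOrSocket` with the process latch plays this role; in the hedged sketch below, a self-pipe stands in for the latch and `poll()` for the wait primitive (all names here are illustrative):

```c
#include <poll.h>
#include <stdio.h>

/*
 * Wait until either the "socket" fd or the "latch" fd becomes
 * readable, or the timeout expires.  Returns 0 on timeout/error,
 * 1 when the socket is readable, 2 when the latch fd was signalled.
 * A plain select() on the socket alone could not be woken this way.
 */
static int
wait_socket_or_latch(int sock_fd, int latch_fd, int timeout_ms)
{
	struct pollfd fds[2];

	fds[0].fd = sock_fd;
	fds[0].events = POLLIN;
	fds[1].fd = latch_fd;
	fds[1].events = POLLIN;

	if (poll(fds, 2, timeout_ms) <= 0)
		return 0;				/* timeout or error */
	if (fds[1].revents & POLLIN)
		return 2;				/* "latch" was set */
	return 1;					/* socket data arrived */
}
```

Making walreceiver use the standard proc latch, as Andres suggests, means callers need not pass a latch around: the process's own latch is always the signalling side of this wait.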
On 11/12/16 2:18 PM, Andres Freund wrote:
>>> I also wonder if we want an easier to
>>> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
>>> pub ... without changing the schema).
>>
>> So like, text array that's then parsed everywhere (I am not doing
>> bitmask/int definitely)?
>
> Yes, that sounds good to me. Then convert it to individual booleans or a
> bitmask when loading the publications into the in-memory form (which you
> already do).

I'm not sure why that would be better. Adding catalog columns in future
versions is not a problem. We're not planning on adding hundreds of
publication attributes. Denormalizing catalog columns creates all kinds
of inconveniences, in the backend code, in frontend code, for users.

--
Peter Eisentraut                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-11-13 00:40:12 -0500, Peter Eisentraut wrote:
> On 11/12/16 2:18 PM, Andres Freund wrote:
> >>> I also wonder if we want an easier to
> >>> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
> >>> pub ... without changing the schema).
> >>
> >> So like, text array that's then parsed everywhere (I am not doing
> >> bitmask/int definitely)?
> > Yes, that sounds good to me. Then convert it to individual booleans or a
> > bitmask when loading the publications into the in-memory form (which you
> > already do).
>
> I'm not sure why that would be better. Adding catalog columns in future
> versions is not a problem.

It can be extended from what core provides, for extended versions of
replication solutions, for one. I presume publications/subscriptions
aren't only going to be used by built-in code.

Andres
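For concreteness, the load-time conversion Andres describes could look roughly like the following. This is a standalone illustration only, with C strings standing in for a `text[]` catalog value and made-up names; note that rejecting unknown entries (rather than ignoring them) is what keeps unexpected catalog contents from being silently missed:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* In-memory form of the published actions. */
typedef struct PublicationActions
{
	bool		pubinsert;
	bool		pubupdate;
	bool		pubdelete;
} PublicationActions;

/*
 * Decode a text-array pubactions value (modeled as an array of C
 * strings) into booleans once, when the publication is loaded.
 * Returns false on an unexpected entry instead of ignoring it.
 */
static bool
decode_pubactions(const char *const *elems, int nelems,
				  PublicationActions *out)
{
	out->pubinsert = out->pubupdate = out->pubdelete = false;

	for (int i = 0; i < nelems; i++)
	{
		if (strcmp(elems[i], "insert") == 0)
			out->pubinsert = true;
		else if (strcmp(elems[i], "update") == 0)
			out->pubupdate = true;
		else if (strcmp(elems[i], "delete") == 0)
			out->pubdelete = true;
		else
			return false;		/* unexpected entry in the array */
	}
	return true;
}
```

The trade-off debated in the thread is visible here: the decode step adds validation work and makes the value harder for psql and pg_dump to display, while plain boolean columns need a catalog change for every new action.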
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> On 31/10/16 00:52, Steve Singer wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list but this one is not known. I'll try to convert this
> to test case (seems like useful one) and fix it, thanks for the
> report. In meantime I realized I broke the last patch in the series
> during rebase so attached is the fixed version. It also contains the
> type info in the protocol.
>

Attached are some proposed documentation updates (to be applied on top of
your 20161031 patch set).

Also, in

 <sect1 id="logical-replication-publication">
  <title>Publication</title>

+ <para>
+  The tables are matched using fully qualified table name. Renaming of
+  tables or schemas is not supported.
+ </para>

Is renaming of tables any less supported than other DDL operations?
For example:

alter table nokey2 rename to nokey3;

select * FROM pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 tpub    | public     | nokey3
(1 row)

If I then kill the postmaster on my subscriber and restart it, I get:

2016-11-13 16:17:11.341 EST [29488] FATAL:  the logical replication target public.nokey3 not found
2016-11-13 16:17:11.342 EST [29272] LOG:  worker process: logical replication worker 41076 (PID 29488) exited with exit code 1
2016-11-13 16:17:16.350 EST [29496] LOG:  logical replication apply for subscription nokeysub started
2016-11-13 16:17:16.358 EST [29498] LOG:  logical replication sync for subscription nokeysub, table nokey2 started
2016-11-13 16:17:16.515 EST [29498] ERROR:  table public.nokey2 not found on publisher
2016-11-13 16:17:16.517 EST [29272] LOG:  worker process: logical replication worker 41076 sync 24688 (PID 29498) exited with exit code 1

but if I then rename the table on the subscriber, everything seems to work.
(I suspect the need to kill+restart is a bug; I've seen other instances
where a hard restart of the subscriber following changes is required.)

I am also having issues adding a table to a publication; it doesn't seem
to work:

P: create publication tpub for table a;
S: create subscription mysub connection 'host=localhost dbname=test port=5440' publication tpub;
P: insert into a(b) values ('1');
P: alter publication tpub add table b;
P: insert into b(b) values ('1');
P: insert into a(b) values ('2');

select * FROM pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 tpub    | public     | a
 tpub    | public     | b

but

S: select * FROM b;
 a | b
---+---
(0 rows)

S: select * FROM a;
 a | b
---+---
 5 | 1
 6 | 2
(2 rows)
On 13/11/16 10:21, Andres Freund wrote:
> On 2016-11-13 00:40:12 -0500, Peter Eisentraut wrote:
>> On 11/12/16 2:18 PM, Andres Freund wrote:
>>>>> I also wonder if we want an easier to
>>>>> extend form of pubinsert/update/delete (say to add pubddl, pubtruncate,
>>>>> pub ... without changing the schema).
>>>>
>>>> So like, text array that's then parsed everywhere (I am not doing
>>>> bitmask/int definitely)?
>>> Yes, that sounds good to me. Then convert it to individual booleans or a
>>> bitmask when loading the publications into the in-memory form (which you
>>> already do).
>>
>> I'm not sure why that would be better. Adding catalog columns in future
>> versions is not a problem.
>
> It can be extended from what core provides, for extended versions of
> replication solutions, for one. I presume publications/subscriptions
> aren't only going to be used by built-in code.
>

I understand the desire here (especially as an author of such out-of-core
tools), but I am not sure this is a good place to start having pluggable
catalogs, given that we have no generic design for those. Currently,
plugins writing arbitrary data to catalogs will cause things to break when
those plugins get uninstalled (and we don't have a good mechanism for
cleaning that up when it happens). And that won't change if we convert
this into an array. Besides, shouldn't the code then check anyway that we
only have expected data in that array, otherwise we might miss corruption?

So if the main reason for turning this into an array is extendability for
other providers, then I am -1 on the idea. IMHO this belongs in a
completely different patch that adds user catalogs with a proper
syscache-like interface and everything, and has nothing to do with
publications.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Nov 13, 2016 at 4:21 AM, Andres Freund <andres@anarazel.de> wrote:
> It can be extended from what core provides, for extended versions of
> replication solutions, for one. I presume publications/subscriptions
> aren't only going to be used by built-in code.

Hmm, I would not have presumed that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

attached is v8. No tarballing this time ;)

About the patches:

0001: This is the reworked approach to temporary slots that I sent earlier.

0002: I ripped out libpq_select completely and did what Andres suggested,
ie, WaitLatchOrSocket. That needed changes for WalReceiver to use
procLatch, but that was trivial. Otherwise it's the same.

0003: Changes:
- Moved the parsing of options into C
- Removed the dead references to "FOR TABLE ALL IN SCHEMA"
- Rephrased some things and fixed several typos
- Added needed check into ALTER TABLE ... SET UNLOGGED
- Fixed the UPDATE/DELETE check in CheckCmdReplicaIdentity
- Added owner
- Fixed permission checks

I didn't do the text array instead of bools or the objclassid/objsubid, as
the reasoning for the former is wrong IMHO and the latter is quite
premature; I am still not convinced it will ever be needed.

I also didn't do a couple of things reported by PeterE:

> relcache.c: pubactions->pubinsert |= pubform->pubinsert; etc. should be ||=

This one does not seem to be true; there is no ||= and |= works fine for
booleans.

And

> The code for OCLASS_PUBLICATION_REL in
> getObjectIdentityParts() does not fill in objname and objargs, as it is
> supposed to.

From what I see it already does that.

0004: Changes:
- Added separate DropSubscriptionStmt statement for DROP. This was
  prompted by Andres' comment about event triggers. The event triggers
  actually work fine, as all the SQL is only supposed to touch
  subscriptions in the current database even though it's a shared catalog
  (it's only shared because we need the catalog pin, but that's an
  implementation detail), but the DROP would break if the name matched a
  subscription in another database when handled by DropStmt.
- Added SLOT_DROP/NOSLOT_DROP options to DROP SUBSCRIPTION; the new
  DropSubscriptionStmt helps here as well
- Added owner
- Moved the option parsing into C

0005/0006: Mainly just included the doc patch from Steve Singer and did
some additional doc fixes.
The 0007 is more a question for discussion of whether we want it. It adds a new GUC that sets synchronous commit for apply workers and defaults to off. This gives a quite noticeable performance boost while still working correctly even if the provider uses sync replication. This is based on the experience (and default behaviour) of BDR and pglogical, but I am not quite sure if core postgres should have that as well (I think it definitely should have the option; the question is more about the default setting). And that's it for now. After some discussion with PeterE I decided to skip the initial sync patch, as it has quite a high impact on development of the rest of the patch (because it touches everything, I was spending all my time rebasing it instead of actually fixing things) and can be done as a follow-up patch. I also believe that it will be polished much faster once I can fully concentrate on it when this part is done. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-support-for-TEMPORARY-replication-slots-v8.patch.gz
- 0002-Refactor-libpqwalreceiver-v8.patch.gz
- 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz
- 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz
- 0005-Define-logical-replication-protocol-and-output-plugi-v8.patch.gz
- 0006-Add-logical-replication-workers-v8.patch.gz
- 0007-Add-separate-synchronous-commit-control-for-logical--v8.patch.gz
On 13/11/16 23:02, Steve Singer wrote: > On 10/31/2016 06:38 AM, Petr Jelinek wrote: >> On 31/10/16 00:52, Steve Singer wrote: >> There are some fundamental issues with initial sync that need to be >> discussed on list but this one is not known. I'll try to convert this >> to test case (seems like useful one) and fix it, thanks for the >> report. In meantime I realized I broke the last patch in the series >> during rebase so attached is the fixed version. It also contains the >> type info in the protocol. >> > > Attached are some proposed documentation updates (to be applied ontop of > your 20161031 patch set) > Merged into v8, thanks! There is one exception though: > *** 195,214 **** > </para> > <para> > A conflict will produce an error and will stop the replication; it > ! must be resolved manually by the user. > </para> > <para> > ! The resolution can be done either by changing data on the subscriber > ! so that it does not conflict with incoming change or by skipping the > ! transaction that conflicts with the existing data. The transaction > ! can be skipped by calling the > ! <link linkend="pg-replication-origin-advance"> > ! <function>pg_replication_origin_advance()</function></link> function > ! with a <literal>node_name</> corresponding to the subscription name. The > ! current position of origins can be seen in the > ! <link linkend="view-pg-replication-origin-status"> > ! <structname>pg_replication_origin_status</structname></link> system view. > ! </para> > </sect1> > <sect1 id="logical-replication-architecture"> I don't see why this needs to be removed? Maybe it could be improved but certainly not removed? > Also > > <sect1 id="logical-replication-publication"> > <title>Publication</title> > > > + <para> > + The tables are matched using fully qualified table name. Renaming of > + tables or schemas is not supported. 
> + </para> > > Is renaming of tables any less supported than other DDL operations > For example > I changed that text as it means something completely different. > alter table nokey2 rename to nokey3 > select * FROM pg_publication_tables ; > pubname | schemaname | tablename > ---------+------------+----------- > tpub | public | nokey3 > (1 row) > > > If I then kill the postmaster on my subscriber and restart it, I get > > 2016-11-13 16:17:11.341 EST [29488] FATAL: the logical replication > target public.nokey3 not found > 2016-11-13 16:17:11.342 EST [29272] LOG: worker process: logical > replication worker 41076 (PID 29488) exited with exit code 1 > 2016-11-13 16:17:16.350 EST [29496] LOG: logical replication apply for > subscription nokeysub started > 2016-11-13 16:17:16.358 EST [29498] LOG: logical replication sync for > subscription nokeysub, table nokey2 started > 2016-11-13 16:17:16.515 EST [29498] ERROR: table public.nokey2 not > found on publisher > 2016-11-13 16:17:16.517 EST [29272] LOG: worker process: logical > replication worker 41076 sync 24688 (PID 29498) exited with exit code 1 > > but if I then rename the table on the subscriber everything seems to work. > > (I suspect the need to kill+restart is a bug, I've seen other instances > where a hard restart of the subscriber following changes to is required) > This is another initial sync patch bug. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, 20 Nov 2016, Petr Jelinek wrote: > On 13/11/16 23:02, Steve Singer wrote: > There is one exception though: >> *** 195,214 **** >> </para> >> <para> >> A conflict will produce an error and will stop the replication; it >> ! must be resolved manually by the user. >> </para> >> <para> >> ! The resolution can be done either by changing data on the subscriber >> ! so that it does not conflict with incoming change or by skipping the >> ! transaction that conflicts with the existing data. The transaction >> ! can be skipped by calling the >> ! <link linkend="pg-replication-origin-advance"> >> ! <function>pg_replication_origin_advance()</function></link> function >> ! with a <literal>node_name</> corresponding to the subscription name. The >> ! current position of origins can be seen in the >> ! <link linkend="view-pg-replication-origin-status"> >> ! <structname>pg_replication_origin_status</structname></link> system view. >> ! </para> >> </sect1> >> <sect1 id="logical-replication-architecture"> > > I don't see why this needs to be removed? Maybe it could be improved but > certainly not removed? > Sorry, I was confused. I noticed that the function was missing in the patch and thought it was documentation for a function that you had removed from recent versions of the patch versus referencing a function that is already committed.
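Since `pg_replication_origin_advance()` is indeed already committed, the skip procedure that doc text describes can be sketched as follows (the subscription name and LSN below are placeholders, not values from this thread):

```sql
-- On the subscriber, check where the origin for the subscription stands:
SELECT external_id, remote_lsn, local_lsn
FROM pg_replication_origin_status;

-- Advance the origin just past the conflicting transaction so that it
-- is skipped on the next apply; 'nokeysub' and the LSN are placeholders.
SELECT pg_replication_origin_advance('nokeysub', '0/16D1B58');
```

Note that, as the doc text warns, this discards the conflicting transaction's changes entirely, so it should be used only after changing the subscriber-side data is ruled out.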
On 2016-11-20 19:06, Petr Jelinek wrote: > 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz This patch contains 2 tabs which break the html build when using 'make oldhtml': $ ( cd /var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml; time make oldhtml ) make check-tabs make[1]: Entering directory `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml' ./ref/create_subscription.sgml: WITH (DISABLED); Tabs appear in SGML/XML files make[1]: *** [check-tabs] Error 1 make[1]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml' make: *** [oldhtml-stamp] Error 2 Very minor change, but it fixes that build. Thanks, Erik Rijkers
and the attachment... On 2016-11-22 14:55, Erik Rijkers wrote: > On 2016-11-20 19:06, Petr Jelinek wrote: >> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz > > This patch contains 2 tabs which break the html build when using 'make > oldhtml': > > $ ( cd > /var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml; > time make oldhtml ) > make check-tabs > make[1]: Entering directory > `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml' > ./ref/create_subscription.sgml: WITH (DISABLED); > Tabs appear in SGML/XML files > make[1]: *** [check-tabs] Error 1 > make[1]: Leaving directory > `/var/data1/pg_stuff/pg_sandbox/pgsql.logical_replication/doc/src/sgml' > make: *** [oldhtml-stamp] Error 2 > > Very minor change, but it fixes that build. > > Thanks, > > Erik Rijkers > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2016-11-20 19:02, Petr Jelinek wrote: > 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB) > 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB) > 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB) > 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB) > 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB) > 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB) > 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB) Apply, make, make check, install OK. A crash of the subscriber can be forced by running vacuum <published table> on the publisher. - publisher create table if not exists testt( id integer primary key, c text ); create publication pub1 for table testt; - subscriber create table if not exists testt( id integer primary key, c text ); create subscription sub1 connection 'dbname=testdb port=6444' publication pub1 with (disabled); alter subscription sub1 enable; - publisher vacuum testt; now data change on the published table, (perhaps also a select on the subscriber-side data) leads to: - subscriber log: TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: 1001) 2016-11-22 18:13:13.983 CET 10177 LOG: worker process: ??)? (PID 10334) was terminated by signal 6: Aborted 2016-11-22 18:13:13.983 CET 10177 LOG: terminating any other active server processes 2016-11-22 18:13:13.983 CET 10338 WARNING: terminating connection because of crash of another server process 2016-11-22 18:13:13.983 CET 10338 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. [...] Erik Rijkers
On 22/11/16 18:42, Erik Rijkers wrote: > On 2016-11-20 19:02, Petr Jelinek wrote: > >> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB) >> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB) >> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB) >> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB) >> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB) >> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB) >> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB) > > Apply, make, make check, install OK. > > > A crash of the subscriber can be forced by running vacuum <published > table> on the publisher. > > > - publisher > create table if not exists testt( id integer primary key, c text ); > create publication pub1 for table testt; > > - subscriber > create table if not exists testt( id integer primary key, c text ); > create subscription sub1 connection 'dbname=testdb port=6444' > publication pub1 with (disabled); > alter subscription sub1 enable; > > - publisher > vacuum testt; > > now data change on the published table, (perhaps also a select on the > subscriber-side data) leads to: > > > - subscriber log: > TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: > 1001) > 2016-11-22 18:13:13.983 CET 10177 LOG: worker process: ??)? (PID 10334) > was terminated by signal 6: Aborted > 2016-11-22 18:13:13.983 CET 10177 LOG: terminating any other active > server processes > 2016-11-22 18:13:13.983 CET 10338 WARNING: terminating connection > because of crash of another server process > 2016-11-22 18:13:13.983 CET 10338 DETAIL: The postmaster has commanded > this server process to roll back the current transaction and exit, > because another server process exited abnormally and possibly corrupted > shared memory. > [...] > Hi, thanks for the report. I very much doubt this is a problem with vacuum, as vacuum does not send anything to the subscriber. Is there anything else you did on those servers? 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2016-11-27 19:57, Petr Jelinek wrote: > On 22/11/16 18:42, Erik Rijkers wrote: >> On 2016-11-20 19:02, Petr Jelinek wrote: >> >>> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB) >>> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB) >>> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB) >>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB) >>> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB) >>> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB) >>> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB) >> >> Apply, make, make check, install OK. >> >> >> A crash of the subscriber can be forced by running vacuum <published >> table> on the publisher. >> >> >> - publisher >> create table if not exists testt( id integer primary key, c text ); >> create publication pub1 for table testt; >> >> - subscriber >> create table if not exists testt( id integer primary key, c text ); >> create subscription sub1 connection 'dbname=testdb port=6444' >> publication pub1 with (disabled); >> alter subscription sub1 enable; >> >> - publisher >> vacuum testt; >> >> now data change on the published table, (perhaps also a select on the >> subscriber-side data) leads to: >> >> >> - subscriber log: >> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", >> Line: >> 1001) > > I very much doubt this is problem of vacuum as it does not send > anything > to subscriber. Is there anything else you did on those servers? > It is not the vacuum that triggers the crash but the data change (insert or delete, on the publisher) /after/ that vacuum. Just now, I compiled 2 instances from master and such a crash (after vacuum + delete) seems reliable here. (If you can't duplicate such a crash let me know; then I'll dig out more precise set-up detail) (by the way, the logical replication between the two instances works well otherwise)
On 27/11/16 23:42, Erik Rijkers wrote: > On 2016-11-27 19:57, Petr Jelinek wrote: >> On 22/11/16 18:42, Erik Rijkers wrote: >>> On 2016-11-20 19:02, Petr Jelinek wrote: >>> >>>> 0001-Add-support-for-TE...cation-slots-v8.patch.gz (~8 KB) >>>> 0002-Refactor-libpqwalreceiver-v8.patch.gz (~9 KB) >>>> 0003-Add-PUBLICATION-catalogs-and-DDL-v8.patch.gz (~30 KB) >>>> 0004-Add-SUBSCRIPTION-catalog-and-DDL-v8.patch.gz (~27 KB) >>>> 0005-Define-logical-rep...output-plugi-v8.patch.gz (~13 KB) >>>> 0006-Add-logical-replication-workers-v8.patch.gz (~43 KB) >>>> 0007-Add-separate-synch...for-logical--v8.patch.gz (~2 KB) >>> >>> Apply, make, make check, install OK. >>> >>> >>> A crash of the subscriber can be forced by running vacuum <published >>> table> on the publisher. >>> >>> >>> - publisher >>> create table if not exists testt( id integer primary key, c text ); >>> create publication pub1 for table testt; >>> >>> - subscriber >>> create table if not exists testt( id integer primary key, c text ); >>> create subscription sub1 connection 'dbname=testdb port=6444' >>> publication pub1 with (disabled); >>> alter subscription sub1 enable; >>> >>> - publisher >>> vacuum testt; >>> >>> now data change on the published table, (perhaps also a select on the >>> subscriber-side data) leads to: >>> >>> >>> - subscriber log: >>> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: >>> 1001) > >> >> I very much doubt this is problem of vacuum as it does not send anything >> to subscriber. Is there anything else you did on those servers? >> > > It is not the vacuum that triggers the crash but the data change (insert > or delete, on the publisher) /after/ that vacuum. > > Just now, I compiled 2 instances from master and such a crash (after > vacuum + delete) seems reliable here. > > (If you can't duplicate such a crash let me know; then I'll dig out more > precise set-up detail) > I found the reason. 
It's not just vacuum (which was what confused me); it's when the publishing side sends the info about the relation again (which happens when there was a cache invalidation on the relation and new data were then written), and I freed one pointer that I never set. I'll send a fixed patch tomorrow. Thanks! -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 27/11/16 23:54, Petr Jelinek wrote: > On 27/11/16 23:42, Erik Rijkers wrote: >> On 2016-11-27 19:57, Petr Jelinek wrote: >>> On 22/11/16 18:42, Erik Rijkers wrote: >>>> A crash of the subscriber can be forced by running vacuum <published >>>> table> on the publisher. >>>> >>>> >>>> - publisher >>>> create table if not exists testt( id integer primary key, c text ); >>>> create publication pub1 for table testt; >>>> >>>> - subscriber >>>> create table if not exists testt( id integer primary key, c text ); >>>> create subscription sub1 connection 'dbname=testdb port=6444' >>>> publication pub1 with (disabled); >>>> alter subscription sub1 enable; >>>> >>>> - publisher >>>> vacuum testt; >>>> >>>> now data change on the published table, (perhaps also a select on the >>>> subscriber-side data) leads to: >>>> >>>> >>>> - subscriber log: >>>> TRAP: FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: >>>> 1001) >> >>> >>> I very much doubt this is problem of vacuum as it does not send anything >>> to subscriber. Is there anything else you did on those servers? >>> >> >> It is not the vacuum that triggers the crash but the data change (insert >> or delete, on the publisher) /after/ that vacuum. >> >> Just now, I compiled 2 instances from master and such a crash (after >> vacuum + delete) seems reliable here. >> >> (If you can't duplicate such a crash let me know; then I'll dig out more >> precise set-up detail) >> > > I found the reason. It's not just vacuum (which was what confused me) > it's when the publishing side sends the info about relation again (which > happens when there was cache invalidation on the relation and then new > data were written) and I did free one pointer that I never set. I'll > send fixed patch tomorrow. > Thanks! > Okay, so here it is. I also included your doc fix, added a test for REPLICA IDENTITY FULL (which also tests this issue as a side effect) and fixed one relcache leak. 
I also rebased it against current master as there was a conflict in bgworker.c. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-support-for-TEMPORARY-replication-slots-v9.patch.gz
- 0002-Refactor-libpqwalreceiver-v9.patch.gz
- 0003-Add-PUBLICATION-catalogs-and-DDL-v9.patch.gz
- 0004-Add-SUBSCRIPTION-catalog-and-DDL-v9.patch.gz
- 0005-Define-logical-replication-protocol-and-output-plugi-v9.patch.gz
- 0006-Add-logical-replication-workers-v9.patch.gz
- 0007-Add-separate-synchronous-commit-control-for-logical--v9.patch.gz
I have taken the libpqwalreceiver refactoring patch and split it into two: one for the latch change, one for the API change. I have done some mild editing. These two patches are now ready to commit in my mind. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 30/11/16 22:37, Peter Eisentraut wrote: > I have taken the libpqwalreceiver refactoring patch and split it into > two: one for the latch change, one for the API change. I have done some > mild editing. > > These two patches are now ready to commit in my mind. > Hi, looks good to me. Do you plan to commit this soon, or would you rather I resubmit the patches rebased on top of it (and including it) first? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/30/16 8:06 PM, Petr Jelinek wrote: > On 30/11/16 22:37, Peter Eisentraut wrote: >> I have taken the libpqwalreceiver refactoring patch and split it into >> two: one for the latch change, one for the API change. I have done some >> mild editing. >> >> These two patches are now ready to commit in my mind. > Hi, looks good to me, do you plan to commit this soon or would you > rather me to resubmit the patches rebased on top of this (and including > this) first? committed those two -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 2, 2016 at 2:32 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 11/30/16 8:06 PM, Petr Jelinek wrote: >> On 30/11/16 22:37, Peter Eisentraut wrote: >>> I have taken the libpqwalreceiver refactoring patch and split it into >>> two: one for the latch change, one for the API change. I have done some >>> mild editing. >>> >>> These two patches are now ready to commit in my mind. > >> Hi, looks good to me, do you plan to commit this soon or would you >> rather me to resubmit the patches rebased on top of this (and including >> this) first? > > committed those two Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments behind that referred to the select() that it removed. Maybe rewrite like in the attached? I wonder if it would be worth creating and reusing a WaitEventSet here. -- Thomas Munro http://www.enterprisedb.com
Attachment
On 02/12/16 02:55, Thomas Munro wrote: > On Fri, Dec 2, 2016 at 2:32 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 11/30/16 8:06 PM, Petr Jelinek wrote: >>> On 30/11/16 22:37, Peter Eisentraut wrote: >>>> I have taken the libpqwalreceiver refactoring patch and split it into >>>> two: one for the latch change, one for the API change. I have done some >>>> mild editing. >>>> >>>> These two patches are now ready to commit in my mind. >> >>> Hi, looks good to me, do you plan to commit this soon or would you >>> rather me to resubmit the patches rebased on top of this (and including >>> this) first? >> >> committed those two > > Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments > behind that referred to the select() that it removed. Maybe rewrite > like in the attached? Agreed. > > I wonder if it would be worth creating and reusing a WaitEventSet here. > I don't think it's worth the extra code given that this is a rarely called interface. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek wrote: > On 02/12/16 02:55, Thomas Munro wrote: > > Commit 597a87ccc9a6fa8af7f3cf280b1e24e41807d555 left some comments > > behind that referred to the select() that it removed. Maybe rewrite > > like in the attached? > > Agreed. Thanks, pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/20/16 1:02 PM, Petr Jelinek wrote: > 0001: > This is the reworked approach to temporary slots that I sent earlier. Andres, you had expressed an interest in this. Will you be able to review it soon? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, this is a rebased version after one of the patches was committed and there was some renaming. I also did some small fixes around pg_dump and changed the syntax slightly to what PeterE suggested at the beginning of the thread, since I like it more as it looks more like English (PUBLISH_INSERT => PUBLISH INSERT, SLOT_NAME => SLOT NAME, etc). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-support-for-TEMPORARY-replication-slots-v10.patch.gz
- 0002-Add-PUBLICATION-catalogs-and-DDL-v10.patch.gz
- 0003-Add-SUBSCRIPTION-catalog-and-DDL-v10.patch.gz
- 0004-Define-logical-replication-protocol-and-output-plugi-v10.patch.gz
- 0005-Add-logical-replication-workers-v10.patch.gz
- 0006-Add-separate-synchronous-commit-control-for-logical--v10.patch.gz
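To illustrate the revised spelling: the DDL would look roughly like the following (option names follow the description above, so treat the exact spellings as approximate; they may differ in the patch itself):

```sql
-- Hypothetical illustration of the two-word option spelling;
-- option names here follow the PUBLISH_INSERT => PUBLISH INSERT,
-- SLOT_NAME => SLOT NAME renames described above, not the patch text.
CREATE PUBLICATION mypub FOR TABLE users
    WITH (PUBLISH INSERT, NOPUBLISH DELETE);

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=provider dbname=testdb'
    PUBLICATION mypub
    WITH (SLOT NAME = mysub_slot);
```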
On 02/12/16 19:35, Petr Jelinek wrote: > Hi, > > this is rebased version after one of the patches was committed and there > were some renaming. > > I also did some small fixes around pg_dump and changes syntax slightly > to what PeterE suggested in the beginning of the thread since I like it > more as it looks more like English (PUBLISH_INSERT => PUBLISH INSERT, > SLOT_NAME => SLOT NAME, etc). > Ah sorry, wrong attachment. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-support-for-TEMPORARY-replication-slots-v11.patch.gz
- 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch.gz
- 0003-Add-SUBSCRIPTION-catalog-and-DDL-v11.patch.gz
- 0004-Define-logical-replication-protocol-and-output-plugi-v11.patch.gz
- 0005-Add-logical-replication-workers-v11.patch.gz
- 0006-Add-separate-synchronous-commit-control-for-logical--v11.patch.gz
I massaged the temporary replication slot patch a bit. I changed the column name in pg_stat_replication_slots from "persistent" to "temporary" and flipped the logical sense, so that it is consistent with the creation commands. I also adjusted some comments and removed some changes in ReplicationSlotCreate() that didn't seem to do anything useful (might have been from a previous patch). The attached patch looks good to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
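Assuming the patch also exposes the temporary flag through the SQL-level slot creation function (the third argument below is an assumption based on the patch description, as is the exact view/column naming), usage would look like:

```sql
-- Create a slot that is dropped automatically at session end; the
-- boolean 'temporary' third argument is assumed from the description.
SELECT pg_create_physical_replication_slot('tmp_slot', false, true);

-- The flipped column: 'temporary' rather than 'persistent'.
SELECT slot_name, temporary, active_pid
FROM pg_replication_slots
WHERE slot_name = 'tmp_slot';
```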
On Sun, Dec 4, 2016 at 12:06 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> I massaged the temporary replication slot patch a bit. I changed the
> column name in pg_stat_replication_slots from "persistent" to
> "temporary" and flipped the logical sense, so that it is consistent with
> the creation commands. I also adjusted some comments and removed some
> changes in ReplicationSlotCreate() that didn't seem to do anything
> useful (might have been from a previous patch).
> The attached patch looks good to me.
Moved to next CF with "needs review" status.
Regards,
Hari Babu
Fujitsu Australia
On 04/12/16 02:06, Peter Eisentraut wrote: > I massaged the temporary replication slot patch a bit. I changed the > column name in pg_stat_replication_slots from "persistent" to > "temporary" and flipped the logical sense, so that it is consistent with > the creation commands. I also adjusted some comments and removed some > changes in ReplicationSlotCreate() that didn't seem to do anything > useful (might have been from a previous patch). > > The attached patch looks good to me. > I think that the removal of the changes to ReplicationSlotAcquire() will make it impossible to reacquire a temporary slot once you have switched to a different one in the session, as the if (active_pid != 0) check will always be true for a temp slot. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2016-12-02 12:37:49 -0500, Peter Eisentraut wrote: > On 11/20/16 1:02 PM, Petr Jelinek wrote: > > 0001: > > This is the reworked approach to temporary slots that I sent earlier. > > Andres, you had expressed an interest in this. Will you be able to > review it soon? Yep. Needed to get that WIP stuff about expression evaluation and JITing out of the door first though. Regards, Andres
On 12/5/16 6:24 PM, Petr Jelinek wrote: > I think that the removal of changes to ReplicationSlotAcquire() that you > did will result in making it impossible to reacquire temporary slot once > you switched to different one in the session as the if (active_pid != 0) > will always be true for temp slot. I see. I suppose it's difficult to get a test case for this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/6/16 11:58 AM, Peter Eisentraut wrote: > On 12/5/16 6:24 PM, Petr Jelinek wrote: >> I think that the removal of changes to ReplicationSlotAcquire() that you >> did will result in making it impossible to reacquire temporary slot once >> you switched to different one in the session as the if (active_pid != 0) >> will always be true for temp slot. > > I see. I suppose it's difficult to get a test case for this. I created a test case, saw the error of my ways, and added your code back in. Patch attached. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 08/12/16 20:16, Peter Eisentraut wrote: > On 12/6/16 11:58 AM, Peter Eisentraut wrote: >> On 12/5/16 6:24 PM, Petr Jelinek wrote: >>> I think that the removal of changes to ReplicationSlotAcquire() that you >>> did will result in making it impossible to reacquire temporary slot once >>> you switched to different one in the session as the if (active_pid != 0) >>> will always be true for temp slot. >> >> I see. I suppose it's difficult to get a test case for this. > > I created a test case, saw the error of my ways, and added your code > back in. Patch attached. > Hi, I am happy with this version, thanks for moving it forward. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Here is a "fixup" patch for 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch.gz with some minor fixes. Two issues that should be addressed: 1. I think ALTER PUBLICATION does not need to require CREATE privilege on the database. That should be easy to change. 2. By requiring only SELECT privilege to include a table in a publication, someone could include a table without a replica identity in a publication and thus prevent updates to the table. A while ago I had been working on a patch to create a new PUBLICATION privilege for this purpose. I have attached the in-progress patch here. We could either finish that up and include it, or commit your patch initially requiring superuser and then refine the permissions later. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
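Under the in-progress privilege patch, the intended shape would presumably be something like the following (proposed, uncommitted syntax; the privilege name is taken from the description above and may change):

```sql
-- Proposed, uncommitted syntax from the in-progress privilege patch:
-- only roles holding the new privilege could add the table to a
-- publication, closing the SELECT-only loophole described above.
GRANT PUBLICATION ON TABLE users TO pub_admin;
```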
On 2016-12-09 17:08, Peter Eisentraut wrote: Your earlier 0001-Add-support-for-temporary-replication-slots.patch could be applied instead of the similarly named, original patch by Petr. (I used 19fcc0058ecc8e5eb756547006bc1b24a93cbb80 to apply this patch-set to) (And it was, by the way, pretty stable and running well.) I'd like to get it running again but now I can't find a way to also include your newer 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch of today. How should these patches be applied (and at what level)? 20161208: 0001-Add-support-for-temporary-replication-slots__petere.patch # petere 20161202: 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch # PJ 20161209: 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch # petere 20161202: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v11.patch # PJ 20161202: 0004-Define-logical-replication-protocol-and-output-plugi-v11.patch # PJ 20161202: 0005-Add-logical-replication-workers-v11.patch # PJ 20161202: 0006-Add-separate-synchronous-commit-control-for-logical--v11.patch # PJ Could (one of) you give me a hint? Thanks, Erik Rijkers
On 09/12/16 17:08, Peter Eisentraut wrote: > Here is a "fixup" patch for > 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch.gz with some minor fixes. > Thanks, merged. > Two issues that should be addressed: > > 1. I think ALTER PUBLICATION does not need to require CREATE privilege > on the database. That should be easy to change. > Right, I removed the check. > 2. By requiring only SELECT privilege to include a table in a > publication, someone could include a table without replica identity into > a publication and thus prevent updates to the table. > > A while ago I had been working on a patch to create a new PUBLICATION > privilege for this purpose. I have attached the in-progress patch here. > We could either finish that up and include it, or commit your patch > initially with requiring superuser and then refine the permissions later. > Hmm, good catch. I changed the SELECT privilege check to an owner check for now; that seems relatively reasonable. I agree that we should eventually have a special privilege for that, though. But then we also need to invent privileges for PUBLICATIONs themselves for this to work reasonably, as you need to be the owner of a PUBLICATION to add tables right now, so having PUBLICATION privilege on a table does not seem to do an awful lot. Also, I think if we add a table privilege for this it's probably better named PUBLISH rather than PUBLICATION, but that's not really important. Attached is a new version with your updates, rebased on top of the current HEAD (the partitioning patch produced quite a few conflicts). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Add-support-for-temporary-replication-slots-v12.patch.gz
- 0002-Add-PUBLICATION-catalogs-and-DDL-v12.patch.gz
- 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch.gz
- 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch.gz
- 0005-Add-logical-replication-workers-v12.patch.gz
- 0006-Add-separate-synchronous-commit-control-for-logical--v12.patch.gz
Hi, On 09/12/16 22:00, Erik Rijkers wrote: > On 2016-12-09 17:08, Peter Eisentraut wrote: > > Your earlier 0001-Add-support-for-temporary-replication-slots.patch > could be applied instead of the similarly named, original patch by Petr. > (I used 19fcc0058ecc8e5eb756547006bc1b24a93cbb80 to apply this patch-set > to) > > (And it was, by the way, pretty stable and running well.) > Great, thanks for testing. > I'd like to get it running again but now I can't find a way to also > include your newer 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch of > today. > > How should these patches be applied (and at what level)? > > 20161208: 0001-Add-support-for-temporary-replication-slots__petere.patch > # petere > 20161202: 0002-Add-PUBLICATION-catalogs-and-DDL-v11.patch # PJ > 20161209: 0001-fixup-Add-PUBLICATION-catalogs-and-DDL.patch # petere > 20161202: 0003-Add-SUBSCRIPTION-catalog-and-DDL-v11.patch # PJ > 20161202: > 0004-Define-logical-replication-protocol-and-output-plugi-v11.patch # PJ > 20161202: 0005-Add-logical-replication-workers-v11.patch # PJ > 20161202: > 0006-Add-separate-synchronous-commit-control-for-logical--v11.patch # PJ > > Could (one of) you give me a hint? > I just sent in a rebased patch that includes all of it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 12/8/16 4:10 PM, Petr Jelinek wrote: > On 08/12/16 20:16, Peter Eisentraut wrote: >> On 12/6/16 11:58 AM, Peter Eisentraut wrote: >>> On 12/5/16 6:24 PM, Petr Jelinek wrote: >>>> I think that the removal of changes to ReplicationSlotAcquire() that you >>>> did will result in making it impossible to reacquire temporary slot once >>>> you switched to different one in the session as the if (active_pid != 0) >>>> will always be true for temp slot. >>> >>> I see. I suppose it's difficult to get a test case for this. >> >> I created a test case, saw the error of my ways, and added your code >> back in. Patch attached. >> > > Hi, > > I am happy with this version, thanks for moving it forward. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote: > On 12/8/16 4:10 PM, Petr Jelinek wrote: > > On 08/12/16 20:16, Peter Eisentraut wrote: > >> On 12/6/16 11:58 AM, Peter Eisentraut wrote: > >>> On 12/5/16 6:24 PM, Petr Jelinek wrote: > >>>> I think that the removal of changes to ReplicationSlotAcquire() that you > >>>> did will result in making it impossible to reacquire temporary slot once > >>>> you switched to different one in the session as the if (active_pid != 0) > >>>> will always be true for temp slot. > >>> > >>> I see. I suppose it's difficult to get a test case for this. > >> > >> I created a test case, saw the error of my ways, and added your code > >> back in. Patch attached. > >> > > > > Hi, > > > > I am happy with this version, thanks for moving it forward. > > committed Hm. /* + * Cleanup all temporary slots created in current session. + */ +void +ReplicationSlotCleanup() I'd rather see a (void) there. The prototype has it, but still. + + /* + * No need for locking as we are only interested in slots active in + * current process and those are not touched by other processes. I'm a bit suspicious of this claim. Without a memory barrier you could actually look at outdated versions of active_pid. In practice there's enough full memory barriers in the slot creation code that it's guaranteed to not be the same pid from before a wraparound though. I think that doing iterations of slots without ReplicationSlotControlLock makes things more fragile, because suddenly assumptions that previously held aren't true anymore. E.g. factually /* * The slot is definitely gone. Lock out concurrent scans of the array * long enough to kill it. It's OK to clear the active flag here without * grabbing the mutex because nobody else can be scanning the array here, * and nobody can be attached to this slot and thus access it without * scanning the array. */ is now simply not true anymore. 
It's probably not harmfully broken, but at least you've changed the locking protocol without adapting comments. /* - * Permanently drop the currently acquired replication slot which will be - * released by the point this function returns. + * Permanently drop the currently acquired replication slot. */ static void ReplicationSlotDropAcquired(void) Isn't that actually removing interesting information? Yes, the comment's been moved to ReplicationSlotDropPtr(), but that routine is an internal one... @@ -810,6 +810,9 @@ ProcKill(int code, Datum arg) if (MyReplicationSlot != NULL) ReplicationSlotRelease(); + /* Also cleanup all the temporary slots. */ + ReplicationSlotCleanup(); + So we now have exactly this code in several places. Why does a generically named Cleanup routine not also deal with a currently acquired slot? Right now it'd be more appropriately named ReplicationSlotDropTemporary() or such. @@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name, l.slot_type, l.datoid, d.datname AS database, + l.temporary, l.active, l.active_pid, l.xmin, l.catalog_xmin, l.restart_lsn, l.confirmed_flush_lsn - FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn) + FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn) LEFT JOIN pg_database d ON ((l.datoid = d.oid))); pg_roles| SELECT pg_authid.rolname, pg_authid.rolsuper, If we start to expose this, shouldn't we expose the persistency instead (i.e. persistent/ephemeral/temporary)? 
new file contrib/test_decoding/sql/slot.sql @@ -0,0 +1,20 @@ +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding'); +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t', 'test_decoding', true); + +SELECT pg_drop_replication_slot('regression_slot_p'); +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding', false); + +-- reconnect to clean temp slots +\c Can we add multiple slots to clean up here? Can we also add a test for the cleanup on error for temporary slots? E.g. something like in ddl.sql (maybe we should actually move some of the relevant tests from there to here). It'd also be good to test this with physical slots? +-- test switching between slots in a session +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true); +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true); +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL); +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL); Can we actually output something? Right now this doesn't test that much... - Andres
On 13/12/16 01:33, Andres Freund wrote: > HJi, > > On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote: >> On 12/8/16 4:10 PM, Petr Jelinek wrote: >>> On 08/12/16 20:16, Peter Eisentraut wrote: >>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote: >>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote: >>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you >>>>>> did will result in making it impossible to reacquire temporary slot once >>>>>> you switched to different one in the session as the if (active_pid != 0) >>>>>> will always be true for temp slot. >>>>> >>>>> I see. I suppose it's difficult to get a test case for this. >>>> >>>> I created a test case, saw the error of my ways, and added your code >>>> back in. Patch attached. >>>> >>> >>> Hi, >>> >>> I am happy with this version, thanks for moving it forward. >> >> committed > > Hm. > > /* > + * Cleanup all temporary slots created in current session. > + */ > +void > +ReplicationSlotCleanup() > > I'd rather see a (void) there. The prototype has it, but still. > > > + > + /* > + * No need for locking as we are only interested in slots active in > + * current process and those are not touched by other processes. > > I'm a bit suspicious of this claim. Without a memory barrier you could > actually look at outdated versions of active_pid. In practice there's > enough full memory barriers in the slot creation code that it's > guaranteed to not be the same pid from before a wraparound though. > > I think that doing iterations of slots without > ReplicationSlotControlLock makes things more fragile, because suddenly > assumptions that previously held aren't true anymore. E.g. factually > /* > * The slot is definitely gone. Lock out concurrent scans of the array > * long enough to kill it. It's OK to clear the active flag here without > * grabbing the mutex because nobody else can be scanning the array here, > * and nobody can be attached to this slot and thus access it without > * scanning the array. 
> */ > is now simply not true anymore. It's probably not harmfully broken, but > at least you've changed the locking protocol without adapting comments. > Well it's protected by being called only by ReplicationSlotCleanup() and ReplicationSlotDropAcquired(). The comment could be improved though, yes. Holding the ReplicationSlotControlLock in the scan is somewhat problematic because ReplicationSlotDropPtr tries to use it as well (and in exclusive mode), so we'd have to take an exclusive lock in ReplicationSlotCleanup() which I don't really like much. > > /* > - * Permanently drop the currently acquired replication slot which will be > - * released by the point this function returns. > + * Permanently drop the currently acquired replication slot. > */ > static void > ReplicationSlotDropAcquired(void) > > Isn't that actually removing interesting information? Yes, the comment's > been moved to ReplicationSlotDropPtr(), but that routine is an internal > one... > ReplicationSlotDropAcquired() is an internal one as well. > > @@ -810,6 +810,9 @@ ProcKill(int code, Datum arg) > if (MyReplicationSlot != NULL) > ReplicationSlotRelease(); > > + /* Also cleanup all the temporary slots. */ > + ReplicationSlotCleanup(); > + > > So we now have exactly this code in several places. Why does a > generically named Cleanup routine not also deal with a currently > acquired slot? Right now it'd be more appropriately named > ReplicationSlotDropTemporary() or such. > It definitely could release MyReplicationSlot as well. 
> > @@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name, > l.slot_type, > l.datoid, > d.datname AS database, > + l.temporary, > l.active, > l.active_pid, > l.xmin, > l.catalog_xmin, > l.restart_lsn, > l.confirmed_flush_lsn > - FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn) > + FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn) > LEFT JOIN pg_database d ON ((l.datoid = d.oid))); > pg_roles| SELECT pg_authid.rolname, > pg_authid.rolsuper, > > If we start to expose this, shouldn't we expose the persistency instead > (i.e. persistent/ephemeral/temporary)? > Not sure how useful that is given that ephemeral is a transient state only present during slot creation. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote: > diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c > new file mode 100644 > index 0000000..e3560b7 > --- /dev/null > +++ b/src/backend/catalog/pg_publication.c > + > +Datum pg_get_publication_tables(PG_FUNCTION_ARGS); Don't we usually put these in a header? > +/* > + * Insert new publication / relation mapping. > + */ > +ObjectAddress > +publication_add_relation(Oid pubid, Relation targetrel, > + bool if_not_exists) > +{ > + Relation rel; > + HeapTuple tup; > + Datum values[Natts_pg_publication_rel]; > + bool nulls[Natts_pg_publication_rel]; > + Oid relid = RelationGetRelid(targetrel); > + Oid prrelid; > + Publication *pub = GetPublication(pubid); > + ObjectAddress myself, > + referenced; > + > + rel = heap_open(PublicationRelRelationId, RowExclusiveLock); > + > + /* Check for duplicates */ Maybe mention that that check is racy, but a unique index protects against the race? > + /* Insert tuple into catalog. */ > + prrelid = simple_heap_insert(rel, tup); > + CatalogUpdateIndexes(rel, tup); > + heap_freetuple(tup); > + > + ObjectAddressSet(myself, PublicationRelRelationId, prrelid); > + > + /* Add dependency on the publication */ > + ObjectAddressSet(referenced, PublicationRelationId, pubid); > + recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO); > + > + /* Add dependency on the relation */ > + ObjectAddressSet(referenced, RelationRelationId, relid); > + recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO); > + > + /* Close the table. */ > + heap_close(rel, RowExclusiveLock); I'm not quite sure about the policy, but shouldn't we invoke InvokeObjectPostCreateHook etc here? > +/* > + * Gets list of relation oids for a publication. > + * > + * This should only be used for normal publications, the FOR ALL TABLES > + * should use GetAllTablesPublicationRelations(). 
> + */ > +List * > +GetPublicationRelations(Oid pubid) > +{ > + List *result; > + Relation pubrelsrel; > + ScanKeyData scankey; > + SysScanDesc scan; > + HeapTuple tup; > + > + /* Find all publications associated with the relation. */ > + pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock); > + > + ScanKeyInit(&scankey, > + Anum_pg_publication_rel_prpubid, > + BTEqualStrategyNumber, F_OIDEQ, > + ObjectIdGetDatum(pubid)); > + > + scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true, > + NULL, 1, &scankey); > + > + result = NIL; > + while (HeapTupleIsValid(tup = systable_getnext(scan))) > + { > + Form_pg_publication_rel pubrel; > + > + pubrel = (Form_pg_publication_rel) GETSTRUCT(tup); > + > + result = lappend_oid(result, pubrel->prrelid); > + } > + > + systable_endscan(scan); > + heap_close(pubrelsrel, NoLock); In other parts of this you drop the lock, but not here? > + heap_close(rel, NoLock); > + > + return result; > +} and here. > +/* > + * Gets list of all relation published by FOR ALL TABLES publication(s). > + */ > +List * > +GetAllTablesPublicationRelations(void) > +{ > + Relation classRel; > + ScanKeyData key[1]; > + HeapScanDesc scan; > + HeapTuple tuple; > + List *result = NIL; > + > + classRel = heap_open(RelationRelationId, AccessShareLock); > + heap_endscan(scan); > + heap_close(classRel, AccessShareLock); > + > + return result; > +} but here. Btw, why are matviews not publishable? > +/* > + * Get Publication using name. > + */ > +Publication * > +GetPublicationByName(const char *pubname, bool missing_ok) > +{ > + Oid oid; > + > + oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname)); > + if (!OidIsValid(oid)) > + { > + if (missing_ok) > + return NULL; > + > + ereport(ERROR, > + (errcode(ERRCODE_UNDEFINED_OBJECT), > + errmsg("publication \"%s\" does not exist", pubname))); > + } > + > + return GetPublication(oid); > +} That's racy... 
Also, shouldn't we specify for how to deal with the returned memory for Publication * returning methods? > diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c > new file mode 100644 > index 0000000..954b2bd > --- /dev/null > +++ b/src/backend/commands/publicationcmds.c > @@ -0,0 +1,613 @@ > +/* > + * Create new publication. > + */ > +ObjectAddress > +CreatePublication(CreatePublicationStmt *stmt) > +{ > + Relation rel; > + > + values[Anum_pg_publication_puballtables - 1] = > + BoolGetDatum(stmt->for_all_tables); > + values[Anum_pg_publication_pubinsert - 1] = > + BoolGetDatum(publish_insert); > + values[Anum_pg_publication_pubupdate - 1] = > + BoolGetDatum(publish_update); > + values[Anum_pg_publication_pubdelete - 1] = > + BoolGetDatum(publish_delete); I remain convinced that a different representation would be better. There'll be more options over time (truncate, DDL at least). > +static void > +AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel, > + HeapTuple tup) > +{ > + bool publish_insert_given; > + bool publish_update_given; > + bool publish_delete_given; > + bool publish_insert; > + bool publish_update; > + bool publish_delete; > + ObjectAddress obj; > + > + parse_publication_options(stmt->options, > + &publish_insert_given, &publish_insert, > + &publish_update_given, &publish_update, > + &publish_delete_given, &publish_delete); You could pass it a struct instead... 
> +static List * > +OpenTableList(List *tables) > +{ > + List *relids = NIL; > + List *rels = NIL; > + ListCell *lc; > + > + /* > + * Open, share-lock, and check all the explicitly-specified relations > + */ > + foreach(lc, tables) > + { > + RangeVar *rv = lfirst(lc); > + Relation rel; > + bool recurse = interpretInhOption(rv->inhOpt); > + Oid myrelid; > + > + rel = heap_openrv(rv, ShareUpdateExclusiveLock); > + myrelid = RelationGetRelid(rel); > + /* filter out duplicates when user specifies "foo, foo" */ > + if (list_member_oid(relids, myrelid)) > + { > + heap_close(rel, ShareUpdateExclusiveLock); > + continue; > + } This is a quadratic algorithm - that could bite us... Not sure if we need to care. If we want to fix it, one approach would be to use RangeVarGetRelid() instead, and then do a qsort/deduplicate before actually opening the relations. > > -def_elem: ColLabel '=' def_arg > +def_elem: def_key '=' def_arg > { > $$ = makeDefElem($1, (Node *) $3, @1); > } > - | ColLabel > + | def_key > { > $$ = makeDefElem($1, NULL, @1); > } > ; > +def_key: > + ColLabel { $$ = $1; } > + | ColLabel ColLabel { $$ = psprintf("%s %s", $1, $2); } > + ; > + Not quite sure what this is about? Doesn't that change the accepted syntax in a bunch of places? > @@ -2337,6 +2338,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc) > bms_free(relation->rd_indexattr); > bms_free(relation->rd_keyattr); > bms_free(relation->rd_idattr); > + if (relation->rd_pubactions) > + pfree(relation->rd_pubactions); > if (relation->rd_options) > pfree(relation->rd_options); > if (relation->rd_indextuple) > @@ -4992,6 +4995,67 @@ RelationGetExclusionInfo(Relation indexRelation, > MemoryContextSwitchTo(oldcxt); > } > > +/* > + * Get publication actions for the given relation. 
> + */ > +struct PublicationActions * > +GetRelationPublicationActions(Relation relation) > +{ > + List *puboids; > + ListCell *lc; > + MemoryContext oldcxt; > + PublicationActions *pubactions = palloc0(sizeof(PublicationActions)); > + > + if (relation->rd_pubactions) > + return memcpy(pubactions, relation->rd_pubactions, > + sizeof(PublicationActions)); > + > + /* Fetch the publication membership info. */ > + puboids = GetRelationPublications(RelationGetRelid(relation)); > + puboids = list_concat_unique_oid(puboids, GetAllTablesPublications()); > + > + foreach(lc, puboids) > + { > + Oid pubid = lfirst_oid(lc); > + HeapTuple tup; > + Form_pg_publication pubform; > + > + tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid)); > + > + if (!HeapTupleIsValid(tup)) > + elog(ERROR, "cache lookup failed for publication %u", pubid); > + > + pubform = (Form_pg_publication) GETSTRUCT(tup); > + > + pubactions->pubinsert |= pubform->pubinsert; > + pubactions->pubupdate |= pubform->pubupdate; > + pubactions->pubdelete |= pubform->pubdelete; > + > + ReleaseSysCache(tup); > + > + /* > + * If we know everything is replicated, there is no point to check > + * for other publications. > + */ > + if (pubactions->pubinsert && pubactions->pubupdate && > + pubactions->pubdelete) > + break; > + } > + > + if (relation->rd_pubactions) > + { > + pfree(relation->rd_pubactions); > + relation->rd_pubactions = NULL; > + } > + > + /* Now save copy of the actions in the relcache entry. */ > + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); > + relation->rd_pubactions = palloc(sizeof(PublicationActions)); > + memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions)); > + MemoryContextSwitchTo(oldcxt); > + > + return pubactions; > +} Hm. Do we actually have enough cache invalidation support to make this cached version correct? I haven't seen anything in that regard? Seems to mean that all changes to an ALL TABLES publication need to do a global relcache invalidation? 
- Andres
On 13/12/16 02:41, Andres Freund wrote: > On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote: > >> diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c >> new file mode 100644 >> index 0000000..e3560b7 >> --- /dev/null >> +++ b/src/backend/catalog/pg_publication.c >> + >> +Datum pg_get_publication_tables(PG_FUNCTION_ARGS); > > Don't we usually put these in a header? > We put these in rather random places, I don't mind either way. > >> +/* >> + * Gets list of relation oids for a publication. >> + * >> + * This should only be used for normal publications, the FOR ALL TABLES >> + * should use GetAllTablesPublicationRelations(). >> + */ >> +List * >> +GetPublicationRelations(Oid pubid) >> +{ >> + List *result; >> + Relation pubrelsrel; >> + ScanKeyData scankey; >> + SysScanDesc scan; >> + HeapTuple tup; >> + >> + /* Find all publications associated with the relation. */ >> + pubrelsrel = heap_open(PublicationRelRelationId, AccessShareLock); >> + >> + ScanKeyInit(&scankey, >> + Anum_pg_publication_rel_prpubid, >> + BTEqualStrategyNumber, F_OIDEQ, >> + ObjectIdGetDatum(pubid)); >> + >> + scan = systable_beginscan(pubrelsrel, PublicationRelMapIndexId, true, >> + NULL, 1, &scankey); >> + >> + result = NIL; >> + while (HeapTupleIsValid(tup = systable_getnext(scan))) >> + { >> + Form_pg_publication_rel pubrel; >> + >> + pubrel = (Form_pg_publication_rel) GETSTRUCT(tup); >> + >> + result = lappend_oid(result, pubrel->prrelid); >> + } >> + >> + systable_endscan(scan); >> + heap_close(pubrelsrel, NoLock); > > In other parts of this you drop the lock, but not here? > > >> + heap_close(rel, NoLock); >> + >> + return result; >> +} > > and here. > Meh, ignore, that's some pglogical legacy. > > Btw, why are matviews not publishable? > Because the standard way of updating them is REFRESH MATERIALIZED VIEW, which is decoded as inserts into a pg_temp_<oid> table. I think we'll have to rethink how we do this before we can sanely support them. 
>> +/* >> + * Get Publication using name. >> + */ >> +Publication * >> +GetPublicationByName(const char *pubname, bool missing_ok) >> +{ >> + Oid oid; >> + >> + oid = GetSysCacheOid1(PUBLICATIONNAME, CStringGetDatum(pubname)); >> + if (!OidIsValid(oid)) >> + { >> + if (missing_ok) >> + return NULL; >> + >> + ereport(ERROR, >> + (errcode(ERRCODE_UNDEFINED_OBJECT), >> + errmsg("publication \"%s\" does not exist", pubname))); >> + } >> + >> + return GetPublication(oid); >> +} > > That's racy... Also, shouldn't we specify for how to deal with the > returned memory for Publication * returning methods? > So are most of the other existing functions with similar purpose. The worst case is that with enough concurrency around same publication name DDL you'll get cache lookup failure. I added comment to GetPublication saying that memory is palloced. > >> diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c >> new file mode 100644 >> index 0000000..954b2bd >> --- /dev/null >> +++ b/src/backend/commands/publicationcmds.c >> @@ -0,0 +1,613 @@ > >> +/* >> + * Create new publication. >> + */ >> +ObjectAddress >> +CreatePublication(CreatePublicationStmt *stmt) >> +{ >> + Relation rel; > >> + >> + values[Anum_pg_publication_puballtables - 1] = >> + BoolGetDatum(stmt->for_all_tables); >> + values[Anum_pg_publication_pubinsert - 1] = >> + BoolGetDatum(publish_insert); >> + values[Anum_pg_publication_pubupdate - 1] = >> + BoolGetDatum(publish_update); >> + values[Anum_pg_publication_pubdelete - 1] = >> + BoolGetDatum(publish_delete); > > I remain convinced that a different representation would be > better. There'll be more options over time (truncate, DDL at least). > So? It's boolean properties, it's not like we store bitmaps in catalogs much. I very much expect DDL to be much more complex than boolean btw. 
> >> +static void >> +AlterPublicationOptions(AlterPublicationStmt *stmt, Relation rel, >> + HeapTuple tup) >> +{ >> + bool publish_insert_given; >> + bool publish_update_given; >> + bool publish_delete_given; >> + bool publish_insert; >> + bool publish_update; >> + bool publish_delete; >> + ObjectAddress obj; >> + >> + parse_publication_options(stmt->options, >> + &publish_insert_given, &publish_insert, >> + &publish_update_given, &publish_update, >> + &publish_delete_given, &publish_delete); > > You could pass it a struct instead... > Here yes, but not in the similar code for subscriptions; I slightly prefer consistency between those similar functions. > >> +static List * >> +OpenTableList(List *tables) >> +{ >> + List *relids = NIL; >> + List *rels = NIL; >> + ListCell *lc; >> + >> + /* >> + * Open, share-lock, and check all the explicitly-specified relations >> + */ >> + foreach(lc, tables) >> + { >> + RangeVar *rv = lfirst(lc); >> + Relation rel; >> + bool recurse = interpretInhOption(rv->inhOpt); >> + Oid myrelid; >> + >> + rel = heap_openrv(rv, ShareUpdateExclusiveLock); >> + myrelid = RelationGetRelid(rel); >> + /* filter out duplicates when user specifies "foo, foo" */ >> + if (list_member_oid(relids, myrelid)) >> + { >> + heap_close(rel, ShareUpdateExclusiveLock); >> + continue; >> + } > > This is a quadratic algorithm - that could bite us... Not sure if we > need to care. If we want to fix it, one approach would be to use > RangeVarGetRelid() instead, and then do a qsort/deduplicate before > actually opening the relations. > I guess it could get really slow only with a big inheritance tree, I'll look into how much work the other way of doing things is (this is not exactly a hot code path). 
>> >> -def_elem: ColLabel '=' def_arg >> +def_elem: def_key '=' def_arg >> { >> $$ = makeDefElem($1, (Node *) $3, @1); >> } >> - | ColLabel >> + | def_key >> { >> $$ = makeDefElem($1, NULL, @1); >> } >> ; > >> +def_key: >> + ColLabel { $$ = $1; } >> + | ColLabel ColLabel { $$ = psprintf("%s %s", $1, $2); } >> + ; >> + > > Not quite sure what this is about? Doesn't that change the accepted > syntax in a bunch of places? > Well all those places have to check the actual values in the C code later. It will change the error message a bit in some DDL. I made it this way so that we don't have to introduce same thing as definition with just this small change. > >> @@ -2337,6 +2338,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc) >> bms_free(relation->rd_indexattr); >> bms_free(relation->rd_keyattr); >> bms_free(relation->rd_idattr); >> + if (relation->rd_pubactions) >> + pfree(relation->rd_pubactions); >> if (relation->rd_options) >> pfree(relation->rd_options); >> if (relation->rd_indextuple) >> @@ -4992,6 +4995,67 @@ RelationGetExclusionInfo(Relation indexRelation, >> MemoryContextSwitchTo(oldcxt); >> } >> >> +/* >> + * Get publication actions for the given relation. >> + */ >> +struct PublicationActions * >> +GetRelationPublicationActions(Relation relation) >> +{ >> + List *puboids; >> + ListCell *lc; >> + MemoryContext oldcxt; >> + PublicationActions *pubactions = palloc0(sizeof(PublicationActions)); >> + >> + if (relation->rd_pubactions) >> + return memcpy(pubactions, relation->rd_pubactions, >> + sizeof(PublicationActions)); >> + >> + /* Fetch the publication membership info. 
*/ >> + puboids = GetRelationPublications(RelationGetRelid(relation)); >> + puboids = list_concat_unique_oid(puboids, GetAllTablesPublications()); >> + >> + foreach(lc, puboids) >> + { >> + Oid pubid = lfirst_oid(lc); >> + HeapTuple tup; >> + Form_pg_publication pubform; >> + >> + tup = SearchSysCache1(PUBLICATIONOID, ObjectIdGetDatum(pubid)); >> + >> + if (!HeapTupleIsValid(tup)) >> + elog(ERROR, "cache lookup failed for publication %u", pubid); >> + >> + pubform = (Form_pg_publication) GETSTRUCT(tup); >> + >> + pubactions->pubinsert |= pubform->pubinsert; >> + pubactions->pubupdate |= pubform->pubupdate; >> + pubactions->pubdelete |= pubform->pubdelete; >> + >> + ReleaseSysCache(tup); >> + >> + /* >> + * If we know everything is replicated, there is no point to check >> + * for other publications. >> + */ >> + if (pubactions->pubinsert && pubactions->pubupdate && >> + pubactions->pubdelete) >> + break; >> + } >> + >> + if (relation->rd_pubactions) >> + { >> + pfree(relation->rd_pubactions); >> + relation->rd_pubactions = NULL; >> + } >> + >> + /* Now save copy of the actions in the relcache entry. */ >> + oldcxt = MemoryContextSwitchTo(CacheMemoryContext); >> + relation->rd_pubactions = palloc(sizeof(PublicationActions)); >> + memcpy(relation->rd_pubactions, pubactions, sizeof(PublicationActions)); >> + MemoryContextSwitchTo(oldcxt); >> + >> + return pubactions; >> +} > > > Hm. Do we actually have enough cache invalidation support to make this > cached version correct? I haven't seen anything in that regard? Seems > to mean that all changes to an ALL TABLES publication need to do a > global relcache invalidation? > Yeah you're right, we definitely don't do enough relcache invalidation for this. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 13/12/16 03:26, Petr Jelinek wrote: > On 13/12/16 02:41, Andres Freund wrote: >> On 2016-12-10 08:48:55 +0100, Petr Jelinek wrote: >> >>> +static List * >>> +OpenTableList(List *tables) >>> +{ >>> + List *relids = NIL; >>> + List *rels = NIL; >>> + ListCell *lc; >>> + >>> + /* >>> + * Open, share-lock, and check all the explicitly-specified relations >>> + */ >>> + foreach(lc, tables) >>> + { >>> + RangeVar *rv = lfirst(lc); >>> + Relation rel; >>> + bool recurse = interpretInhOption(rv->inhOpt); >>> + Oid myrelid; >>> + >>> + rel = heap_openrv(rv, ShareUpdateExclusiveLock); >>> + myrelid = RelationGetRelid(rel); >>> + /* filter out duplicates when user specifies "foo, foo" */ >>> + if (list_member_oid(relids, myrelid)) >>> + { >>> + heap_close(rel, ShareUpdateExclusiveLock); >>> + continue; >>> + } >> >> This is a quadratic algorithm - that could bite us... Not sure if we >> need to care. If we want to fix it, one approach would be to use >> RangeVarGetRelid() instead, and then do a qsort/deduplicate before >> actually opening the relations. >> > > I guess it could get really slow only with a big inheritance tree, I'll > look into how much work the other way of doing things is (this is not > exactly a hot code path). > Actually looking at it, it only processes user input so I don't think it's very problematic in terms of performance. You'd have to pass many thousands of tables in a single DDL statement to notice. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 12/10/16 2:48 AM, Petr Jelinek wrote:
> Attached new version with your updates and rebased on top of the current
> HEAD (the partitioning patch produced quite a few conflicts).

I have attached a few more "fixup" patches, mostly with some editing of
documentation and comments and some compiler warnings.

In 0006 in the protocol documentation I have left a "XXX ???" where I
didn't understand what it was trying to say.

All issues from (my) previous reviews appear to have been addressed.

Comments besides that:


0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch

Still wondering about the best workflow with pg_dump, but it seems all
the pieces are there right now, and the interfaces can be tweaked later.

DROP SUBSCRIPTION requires superuser, but should perhaps be an owner
check only?

DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
exist.

Maybe write the grammar so that SLOT does not need to be a new key word.
The changes you made for CREATE PUBLICATION should allow that.

The tests are not added to serial_schedule. Intentional? If so, document?


0004-Define-logical-replication-protocol-and-output-plugi-v12.patch

Not sure why pg_catalog is encoded as a zero-length string. I guess it
saves some space. Maybe that could be explained in a brief code comment?


0005-Add-logical-replication-workers-v12.patch

The way the executor stuff is organized now looks better to me.

The subscriber crashes if max_replication_slots is 0:

TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
Line: 999)

The documentation says that replication slots are required on the
subscriber, but from a user's perspective, it's not clear why that is.

Dropping a table that is part of a live subscription results in log
messages like

WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238

I was testing replicating into a temporary table, which failed like this:

FATAL: the logical replication target public.test1 not found
LOG: worker process: (PID 2879) exited with exit code 1
LOG: starting logical replication worker for subscription 16392
LOG: logical replication apply for subscription mysub started

That's okay, but those messages were repeated every few seconds or so
and would create quite some log volume. I wonder if that needs to be
reined in somewhat.

I think this is getting very close to the point where it's committable.
So if anyone else has major concerns about the whole approach and
perhaps the way the new code in 0005 is organized, now would be the time ...

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi,

On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
> I think this is getting very close to the point where it's committable.
> So if anyone else has major concerns about the whole approach and
> perhaps the way the new code in 0005 is organized, now would be the time ...

Uh. The whole cache invalidation thing is completely unresolved, and
that's just the publication patch. I've not looked in detail at later
patches. So no, I don't think so.

I think after the invalidation issue is resolved the publication patch
might be close to being ready. I'm doubtful the later patches are.

Greetings,

Andres Freund
On 2016-12-13 06:55:31 +0100, Petr Jelinek wrote:
>>> This is a quadratic algorithm - that could bite us... Not sure if we
>>> need to care. If we want to fix it, one approach would be to use
>>> RangeVarGetRelid() instead, and then do a qsort/deduplicate before
>>> actually opening the relations.
>>>
>>
>> I guess it could get really slow only with a big inheritance tree; I'll
>> look into how much work the other way of doing things is (this is not
>> exactly a hot code path).
>>
>
> Actually looking at it, it only processes user input, so I don't think
> it's very problematic in terms of performance. You'd have to pass many
> thousands of tables in a single DDL statement to notice.

Well, at least we should put a CHECK_FOR_INTERRUPTS there. At the moment
it's IIRC uninterruptible, which isn't good for something directly
triggered by the user. A comment that it's known to be O(n^2), but
considered acceptable, would be good too.

Andres
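For readers following along: the sort-then-deduplicate alternative Andres suggests (resolve names to OIDs first, sort, drop adjacent duplicates, then open relations in a second pass) replaces the O(n^2) `list_member_oid()` membership test with an O(n log n) step. A standalone sketch of the idea — illustrative Python, not PostgreSQL code, and `dedupe_relids` is a hypothetical name:

```python
def dedupe_relids(relids):
    """Sort-then-deduplicate sketch of the qsort/deduplicate approach:
    after sorting, duplicate OIDs are adjacent, so a single linear pass
    removes them.  Contrast with the patch's loop, which does a linear
    membership scan per table (quadratic overall)."""
    deduped = []
    for oid in sorted(relids):
        # Duplicates land next to each other after sorting, so comparing
        # against only the last kept element suffices.
        if not deduped or deduped[-1] != oid:
            deduped.append(oid)
    return deduped
```

With this shape, the expensive per-table work (actually opening and locking the relation) happens only once per distinct OID.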
On 12/12/16 7:33 PM, Andres Freund wrote:
> +-- test switching between slots in a session
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>
> Can we actually output something? Right now this doesn't test that
> much...

This test was added because an earlier version of the patch would crash
on this.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14/12/16 01:26, Peter Eisentraut wrote:
> On 12/12/16 7:33 PM, Andres Freund wrote:
>> +-- test switching between slots in a session
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
>> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
>> +SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);
>>
>> Can we actually output something? Right now this doesn't test that
>> much...
>
> This test was added because an earlier version of the patch would crash
> on this.
>

I did improve the test as part of the test improvements that were sent
to the committers list, btw.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13/12/16 22:05, Andres Freund wrote:
> Hi,
>
> On 2016-12-13 15:42:17 -0500, Peter Eisentraut wrote:
>> I think this is getting very close to the point where it's committable.
>> So if anyone else has major concerns about the whole approach and
>> perhaps the way the new code in 0005 is organized, now would be the time ...
>
> Uh. The whole cache invalidation thing is completely unresolved, and
> that's just the publication patch. I've not looked in detail at later
> patches. So no, I don't think so.
>

I already have code for that. I'll submit the next version once I go
over PeterE's review.

BTW the relcache thing is not as bad as it seems from the publication
patch, because the output plugin has to deal with relcache/publication
cache invalidations anyway, and it handles most of the updates
correctly. But there was still a problem in terms of the write
filtering, so the publications still have to reset the relcache too.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
>

Okay, I'll address that separately, thanks.

> All issues from (my) previous reviews appear to have been addressed.
>
> Comments besides that:
>
>
> 0003-Add-SUBSCRIPTION-catalog-and-DDL-v12.patch
>
> Still wondering about the best workflow with pg_dump, but it seems all
> the pieces are there right now, and the interfaces can be tweaked later.

Right, either way there needs to be some special handling for
subscriptions; having to request them specifically seems the safest
option to me, but I am open to suggestions there.

>
> DROP SUBSCRIPTION requires superuser, but should perhaps be owner check
> only?
>

Hmm, I am not sure that it requires superuser; I actually think it
mistakenly didn't require anything. In any case I will make sure it just
does an owner check.

> DROP SUBSCRIPTION IF EXISTS crashes if the subscription does not in fact
> exist.
>

Right, missing return.

> Maybe write the grammar so that SLOT does not need to be a new key word.
> The changes you made for CREATE PUBLICATION should allow that.
>

Hmm, what would that look like? Would opt_drop_slot become IDENT IDENT?
Or do you want me to add the WITH (definition) kind of thing?

> The tests are not added to serial_schedule. Intentional? If so, document?
>

Not intentional, will fix. I never use it, so it's easy to forget about.

>
> 0004-Define-logical-replication-protocol-and-output-plugi-v12.patch
>
> Not sure why pg_catalog is encoded as a zero-length string. I guess it
> saves some space. Maybe that could be explained in a brief code comment?
>

Yes, it's to save space, mainly for built-in types.

>
> 0005-Add-logical-replication-workers-v12.patch
>
> The way the executor stuff is organized now looks better to me.
>
> The subscriber crashes if max_replication_slots is 0:
>
> TRAP: FailedAssertion("!(max_replication_slots > 0)", File: "origin.c",
> Line: 999)
>
> The documentation says that replication slots are required on the
> subscriber, but from a user's perspective, it's not clear why that is.

Yeah, honestly I think origins should not depend on
max_replication_slots. They are not really connected (you can have many
of one and none of the other, and vice versa). Also,
max_replication_slots should IMHO default to max_wal_senders at this
point. (In an ideal world all three of those would be in DSM instead of
SHM and only governed by some implementation maximum, which is probably
2^16, and the GUCs would be removed.)

But yes, as it is, we should check for that, probably both during CREATE
SUBSCRIPTION and during apply start.

>
> Dropping a table that is part of a live subscription results in log
> messages like
>
> WARNING: leaked hash_seq_search scan for hash table 0x7f9d2a807238
>
> I was testing replicating into a temporary table, which failed like this:
>
> FATAL: the logical replication target public.test1 not found
> LOG: worker process: (PID 2879) exited with exit code 1
> LOG: starting logical replication worker for subscription 16392
> LOG: logical replication apply for subscription mysub started
>
> That's okay, but those messages were repeated every few seconds or so
> and would create quite some log volume. I wonder if that needs to be
> reined in somewhat.

It retries every 5s or so, I think; I am not sure how that could be
improved besides using wal_retrieve_retry_interval instead of the
hardcoded 5s (or, maybe better, adding a GUC for apply). Maybe some kind
of backoff algorithm could be added as well.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
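The "backoff algorithm" floated above would typically be a capped exponential backoff on the worker restart interval. A minimal sketch of that idea — purely illustrative, assuming a 5 s base and a 5 min cap, neither of which is in the patch:

```python
def next_retry_delay(attempt, base=5.0, cap=300.0):
    """Capped exponential backoff for apply-worker restarts:
    5s, 10s, 20s, ... doubling each failed attempt, never exceeding
    the cap.  A sketch of the idea only; the patch as posted retries
    at a fixed interval."""
    return min(base * (2 ** attempt), cap)
```

A persistent failure (e.g. the missing-table case Peter hit) would then generate log lines at a decreasing rate instead of one every few seconds, while a transient failure still recovers quickly.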
On 13/12/16 01:33, Andres Freund wrote:
>
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
>>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>>>> did will result in making it impossible to reacquire temporary slot once
>>>>>> you switched to different one in the session as the if (active_pid != 0)
>>>>>> will always be true for temp slot.
>>>>>
>>>>> I see. I suppose it's difficult to get a test case for this.
>>>>
>>>> I created a test case, saw the error of my ways, and added your code
>>>> back in. Patch attached.
>>>>
>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
>
> Hm.
>
> /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
>
> I'd rather see a (void) there. The prototype has it, but still.
>
>
> +
> +    /*
> +     * No need for locking as we are only interested in slots active in
> +     * current process and those are not touched by other processes.
>
> I'm a bit suspicious of this claim. Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
>
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore. E.g. factually
>
> /*
>  * The slot is definitely gone. Lock out concurrent scans of the array
>  * long enough to kill it. It's OK to clear the active flag here without
>  * grabbing the mutex because nobody else can be scanning the array here,
>  * and nobody can be attached to this slot and thus access it without
>  * scanning the array.
>  */
>
> is now simply not true anymore. It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
>
> Any thoughts on attached?

Yes, it does repeated scans, which can in theory be slow, but as I
explained in the comment, in practice there is not much need to have
many temporary slots active within a single session, so it should not be
a big issue.

I am not quite convinced that all the locking is necessary from the
current logic perspective TBH, but it should help prevent mistakes by
whoever changes things in slot.c next.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
>

Ah, so you didn't understand the

> +          Identifies the following TupleData submessage as a key.
> +          This field is optional and is only present if
> +          the update changed the REPLICA IDENTITY index. XXX???

So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains the 'N' message,
which is the new data. Then it can optionally contain an 'O' message
with the old data if the table has REPLICA IDENTITY FULL (i.e., not a
REPLICA IDENTITY index like a pkey, etc.). Or it can include a 'K'
message that only contains the old data for the columns in the REPLICA
IDENTITY index. But if the REPLICA IDENTITY index didn't change (i.e.,
old and new would be the same for those columns) we simply omit the 'K'
message and let the downstream take the key data from the 'N' message to
save space.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
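The rule Petr describes — 'N' always present, plus at most one of 'O' or 'K' — can be sketched as a small decision function. This is an illustrative model of the protocol logic described above, not the actual pgoutput code, and the old-tuple-before-new ordering is how the wire messages are laid out:

```python
def update_submessages(replica_identity_full, key_columns_changed):
    """Decide which tuple submessages an UPDATE wire message carries.
    - 'O': whole old tuple, sent when the table has REPLICA IDENTITY FULL
    - 'K': old values of the replica identity (key) columns only, sent
           when those columns actually changed
    - 'N': new tuple, always sent; when neither 'O' nor 'K' is present,
           the subscriber reads the key from 'N' to save space
    """
    old = []
    if replica_identity_full:
        old = ['O']
    elif key_columns_changed:
        old = ['K']
    return old + ['N']
```

So an UPDATE that doesn't touch the key on a pkey-identity table transmits only 'N', which is the space saving Petr mentions.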
On 15 Dec. 2016 18:19, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
>

Ah so you didn't understand the
> + Identifies the following TupleData submessage as a key.
> + This field is optional and is only present if
> + the update changed the REPLICA IDENTITY index. XXX???
So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains 'N' message
which is the new data. Then it can optionally contain 'O' message with
old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
IDENTITY index like pkey, etc). Or it can include 'K' message that only
contains old data for the columns in the REPLICA IDENTITY index. But if
the REPLICA IDENTITY index didn't change (ie, old and new would be same
for those columns) we simply omit the 'K' message and let the downstream
take the key data from the 'N' message to save space.
Something we forgot to bake into pglogical that might be worth leaving room for here: sending the whole old tuple, with some fields marked as key.
So you can use replica identity pkey or whatever and the downstream knows which are the key fields. But can still transmit the whole old tuple in case the downstream wants it for conflict resolution/logging/etc.
We don't have the logical decoding and wal output for this yet, nor a way of requesting old tuple recording table by table. So all i'm suggesting is leaving room in the protocol.
On 15/12/16 13:06, Craig Ringer wrote:
> On 15 Dec. 2016 18:19, "Petr Jelinek" <petr.jelinek@2ndquadrant.com
> <mailto:petr.jelinek@2ndquadrant.com>> wrote:
>
>> On 13/12/16 21:42, Peter Eisentraut wrote:
>>> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>>>> Attached new version with your updates and rebased on top of the current
>>>> HEAD (the partitioning patch produced quite a few conflicts).
>>>
>>> I have attached a few more "fixup" patches, mostly with some editing of
>>> documentation and comments and some compiler warnings.
>>>
>>> In 0006 in the protocol documentation I have left a "XXX ???" where I
>>> didn't understand what it was trying to say.
>>>
>>
>> Ah so you didn't understand the
>>
>>> +          Identifies the following TupleData submessage as a key.
>>> +          This field is optional and is only present if
>>> +          the update changed the REPLICA IDENTITY index. XXX???
>>
>> So what happens here is that the update message can contain one or two
>> out of 3 possible tuple submessages. It always contains 'N' message
>> which is the new data. Then it can optionally contain 'O' message with
>> old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
>> IDENTITY index like pkey, etc). Or it can include 'K' message that only
>> contains old data for the columns in the REPLICA IDENTITY index. But if
>> the REPLICA IDENTITY index didn't change (ie, old and new would be same
>> for those columns) we simply omit the 'K' message and let the downstream
>> take the key data from the 'N' message to save space.
>
> Something we forgot to bake into pglogical that might be worth leaving
> room for here: sending the whole old tuple, with some fields marked as key.
>
> So you can use replica identity pkey or whatever and the downstream
> knows which are the key fields. But can still transmit the whole old
> tuple in case the downstream wants it for conflict resolution/logging/etc.
>
> We don't have the logical decoding and wal output for this yet, nor a
> way of requesting old tuple recording table by table. So all i'm
> suggesting is leaving room in the protocol.
>

Not really sure I follow; which columns are keys is not part of the info
in the data message, it's part of the relation message, so that's
already possible in the protocol. Also, the current implementation is
fully capable of taking advantage of a PK on the downstream even with
REPLICA IDENTITY FULL.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

attached is version 13 of the patch.

I merged in changes from PeterE. And did the following changes:
- fixed the ownership error messages for both provider and subscriber
- added ability to send an invalidation message to invalidate the whole
  relcache and use it in publication code
- added the post creation/alter/drop hooks
- removed parts of docs that refer to initial sync (which does not exist
  yet)
- added timeout handling/retry, etc. to apply/launcher based on the GUCs
  that exist for wal receiver (they could use renaming though)
- improved feedback behavior
- apply worker now uses the owner of the subscription as the connection
  user
- more tests
- check for max_replication_slots in launcher
- clarified the update 'K' sub-message description in the protocol

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2016-12-16 13:49, Petr Jelinek wrote:
>
> version 13 of the patch.
>
> 0001-Add-PUBLICATION-catalogs-and-DDL-v13.patch.gz (~32 KB)
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v13.patch.gz (~28 KB)
> 0003-Define-logical-rep...utput-plugi-v13.patch.gz (~13 KB)
> 0004-Add-logical-replication-workers-v13.patch.gz (~44 KB)
> 0005-Add-separate-synch...or-logical--v13.patch.gz (~2 KB)

Hi,

You wrote on 2016-08-05:

> What's missing:
> - sequences, I'd like to have them in 10.0 but I don't have good
>   way to implement it. PGLogical uses periodical syncing with some
>   buffer value but that's suboptimal. I would like to decode them
>   but that has proven to be complicated due to their sometimes
>   transactional sometimes nontransactional nature, so I probably
>   won't have time to do it within 10.0 by myself.

I ran into problems with sequences and I wonder if sequence problems
are still expected, as the above seems to imply.

(Short story: I tried to run pgbench across logical replication, and
therefore added a sequence to pgbench_history to give it a replica
identity, and cannot get it to work reliably.)

thanks,

Erik Rijkers
On 12/16/2016 07:49 AM, Petr Jelinek wrote:
> Hi,
>
> attached is version 13 of the patch.
>
> I merged in changes from PeterE. And did following changes:
> - fixed the ownership error messages for both provider and subscriber
> - added ability to send invalidation message to invalidate whole
>   relcache and use it in publication code
> - added the post creation/alter/drop hooks
> - removed parts of docs that refer to initial sync (which does not exist
>   yet)
> - added timeout handling/retry, etc to apply/launcher based on the GUCs
>   that exist for wal receiver (they could use renaming though)
> - improved feedback behavior
> - apply worker now uses owner of the subscription as connection user
> - more tests
> - check for max_replication_slots in launcher
> - clarify the update 'K' sub-message description in protocol

A few things I've noticed so far.

If I shut down the publisher I see the following in the log:

2016-12-17 11:33:49.548 EST [1891] LOG: worker process: ?)G? (PID 1987)
exited with exit code 1

but then if I shut down the subscriber postmaster and restart, it
switches to

2016-12-17 11:43:09.628 EST [2373] LOG: worker process: ????
(PID 2393) exited with exit code 1

Not sure where the 'G' was coming from (other times I have seen an 'I'
here or other random characters).

I don't think we are cleaning up subscriptions on a drop database.

If I do the following:

1) Create a subscription in a new database
2) Stop the publisher
3) Drop the database on the subscriber

test=# create subscription mysuba connection 'host=localhost dbname=test
port=5440' publication mypub;
test=# \c b
b=# drop database test;
DROP DATABASE
b=# select * FROM pg_subscription ;
 subdbid | subname | subowner | subenabled |             subconninfo              | subslotname | subpublications
---------+---------+----------+------------+--------------------------------------+-------------+-----------------
   16384 | mysuba  |       10 | t          | host=localhost dbname=test port=5440 | mysuba      | {mypub}

b=# select datname FROM pg_database where oid=16384;
 datname
---------
(0 rows)

Also I don't think I can now drop mysuba:

b=# drop subscription mysuba;
ERROR: subscription "mysuba" does not exist
On 17/12/16 13:37, Erik Rijkers wrote:
> On 2016-12-16 13:49, Petr Jelinek wrote:
>>
>> version 13 of the patch.
>>
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v13.patch.gz (~32 KB)
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v13.patch.gz (~28 KB)
>> 0003-Define-logical-rep...utput-plugi-v13.patch.gz (~13 KB)
>> 0004-Add-logical-replication-workers-v13.patch.gz (~44 KB)
>> 0005-Add-separate-synch...or-logical--v13.patch.gz (~2 KB)
>
> Hi,
>
> You wrote on 2016-08-05:
>
>> What's missing:
>> - sequences, I'd like to have them in 10.0 but I don't have good
>>   way to implement it. PGLogical uses periodical syncing with some
>>   buffer value but that's suboptimal. I would like to decode them
>>   but that has proven to be complicated due to their sometimes
>>   transactional sometimes nontransactional nature, so I probably
>>   won't have time to do it within 10.0 by myself.
>
> I ran into problems with sequences and I wonder if sequence problems
> are still expected, as the above seems to imply.
>
> (short story: I tried to run pgbench across logical replication; and
> therefore added a sequence to pgbench_history to give it a replica
> identity, and cannot get it to work reliably).
>

Sequences are not replicated, but that should not prevent
pgbench_history itself from being replicated when you add a serial
column to it.

BTW you don't need to add a primary key to pgbench_history. Simply
ALTER TABLE pgbench_history REPLICA IDENTITY FULL; should be enough.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17/12/16 18:34, Steve Singer wrote:
> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>> Hi,
>>
>> attached is version 13 of the patch.
>>
>> I merged in changes from PeterE. And did following changes:
>> - fixed the ownership error messages for both provider and subscriber
>> - added ability to send invalidation message to invalidate whole
>>   relcache and use it in publication code
>> - added the post creation/alter/drop hooks
>> - removed parts of docs that refer to initial sync (which does not exist
>>   yet)
>> - added timeout handling/retry, etc to apply/launcher based on the GUCs
>>   that exist for wal receiver (they could use renaming though)
>> - improved feedback behavior
>> - apply worker now uses owner of the subscription as connection user
>> - more tests
>> - check for max_replication_slots in launcher
>> - clarify the update 'K' sub-message description in protocol
>
> A few things I've noticed so far
>
> If I shutdown the publisher I see the following in the log
>
> 2016-12-17 11:33:49.548 EST [1891] LOG: worker process: ?)G? (PID 1987)
> exited with exit code 1
>
> but then if I shutdown the subscriber postmaster and restart it switches to
>
> 2016-12-17 11:43:09.628 EST [2373] LOG: worker process: ???? (PID 2393)
> exited with exit code 1
>
> Not sure where the 'G' was coming from (other times I have seen an 'I'
> here or other random characters)
>

Uninitialized bgw_name for the apply worker. Rather silly bug. Fixed.

>
> I don't think we are cleaning up subscriptions on a drop database
>
> If I do the following
>
> 1) Create a subscription in a new database
> 2) Stop the publisher
> 3) Drop the database on the subscriber
>
> test=# create subscription mysuba connection 'host=localhost dbname=test
> port=5440' publication mypub;
> test=# \c b
> b=# drop database test;
> DROP DATABASE
> b=# select * FROM pg_subscription ;
>  subdbid | subname | subowner | subenabled | subconninfo | subslotname | subpublications
> ---------+---------+----------+------------+-------------+-------------+-----------------
>    16384 | mysuba  |       10 | t | host=localhost dbname=test port=5440 | mysuba | {mypub}
>

Good one. I added a check that prevents dropping a database when there
is a subscription defined for it. I think we can't cascade here, as the
subscription may or may not hold resources (a slot) in another
instance/database, so preventing the drop is the best we can do.

>
> Also I don't think I can now drop mysuba
> b=# drop subscription mysuba;
> ERROR: subscription "mysuba" does not exist
>

Yeah, subscriptions are per database.

I don't want to make v14 just for these 2 changes, as that would make
life harder for anybody code-reviewing the v13, so attached is a diff
with the above fixes that applies on top of v13.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
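The guard Petr describes — refusing DROP DATABASE while any subscription is defined in that database — boils down to counting pg_subscription rows whose subdbid matches the target database. A loose Python model of that check (hypothetical names; the real fix is C code in the dropdb path, not this):

```python
def check_drop_database(dbid, subscriptions):
    """Model of the DROP DATABASE guard: scan the subscription catalog
    for entries belonging to the database being dropped and refuse the
    drop if any exist, since a subscription may hold a slot on another
    instance that nobody could clean up afterwards."""
    n = sum(1 for s in subscriptions if s["subdbid"] == dbid)
    if n:
        raise RuntimeError(
            "database is being used by %d logical replication "
            "subscription(s)" % n)
```

This also explains why cascading is off the table: the resource a subscription holds (the remote slot) lives outside the database being dropped, so the only safe behavior is to make the user drop the subscriptions first.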
Attachment
On 12/18/2016 05:28 AM, Petr Jelinek wrote:
> On 17/12/16 18:34, Steve Singer wrote:
>> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>
> Yeah subscriptions are per database. I don't want to make v14 just
> for these 2 changes as that would make life harder for anybody
> code-reviewing the v13 so attached is diff with above fixes that
> applies on top of v13.
>

Thanks, that fixes those issues.

A few more I've noticed.

pg_dumping subscriptions doesn't seem to work:

./pg_dump -h localhost --port 5441 --include-subscriptions test
pg_dump: [archiver (db)] query failed: ERROR: missing FROM-clause entry
for table "p"
LINE 1: ...LECT rolname FROM pg_catalog.pg_roles WHERE oid = p.subowner...
                                                             ^
pg_dump: [archiver (db)] query was: SELECT s.tableoid, s.oid,
s.subname,(SELECT rolname FROM pg_catalog.pg_roles WHERE oid =
p.subowner) AS rolname, s.subenabled, s.subconninfo, s.subslotname,
s.subpublications FROM pg_catalog.pg_subscription s WHERE s.subdbid =
(SELECT oid FROM pg_catalog.pg_database WHERE datname =
current_database())

I have attached a patch that fixes this.

pg_dump is also generating warnings:

pg_dump: [archiver] WARNING: don't know how to set owner for object type
SUBSCRIPTION

I know that the plan is to add proper ACLs for publications and
subscriptions later. I don't know if we want to leave the warning in
until then or do something about it.

Also the tab-completion for CREATE SUBSCRIPTION doesn't seem to work as
intended. I've attached a patch that fixes it, and patches to add tab
completion for ALTER PUBLICATION|SUBSCRIPTION.
Attachment
On 18/12/16 19:02, Steve Singer wrote:
> On 12/18/2016 05:28 AM, Petr Jelinek wrote:
>> On 17/12/16 18:34, Steve Singer wrote:
>>> On 12/16/2016 07:49 AM, Petr Jelinek wrote:
>>
>> Yeah subscriptions are per database. I don't want to make v14 just
>> for these 2 changes as that would make life harder for anybody
>> code-reviewing the v13 so attached is diff with above fixes that
>> applies on top of v13.
>
> Thanks that fixes those issues.
>
> A few more I've noticed
>
> pg_dumping subscriptions doesn't seem to work
>
> ./pg_dump -h localhost --port 5441 --include-subscriptions test
> pg_dump: [archiver (db)] query failed: ERROR: missing FROM-clause entry
> for table "p"
> LINE 1: ...LECT rolname FROM pg_catalog.pg_roles WHERE oid = p.subowner...
>                                                              ^
> pg_dump: [archiver (db)] query was: SELECT s.tableoid, s.oid,
> s.subname,(SELECT rolname FROM pg_catalog.pg_roles WHERE oid =
> p.subowner) AS rolname, s.subenabled, s.subconninfo, s.subslotname,
> s.subpublications FROM pg_catalog.pg_subscription s WHERE s.subdbid =
> (SELECT oid FROM pg_catalog.pg_database WHERE datname
> = current_database())
>
> I have attached a patch that fixes this.
>

Thanks, merged.

> pg_dump is also generating warnings
>
> pg_dump: [archiver] WARNING: don't know how to set owner for object type
> SUBSCRIPTION
>
> I know that the plan is to add proper ACL's for publications and
> subscriptions later. I don't know if we want to leave the warning in
> until then or do something about it.
>

No, ACLs are separate from owner. This is a thinko on my side. I was
thinking we could live without ALTER ... OWNER TO for now, but we
actually need it for pg_dump and for REASSIGN OWNED. So now I added
OWNER TO for both PUBLICATION and SUBSCRIPTION.

>
> Also the tab-competion for create subscription doesn't seem to work as
> intended.
> I've attached a patch that fixes it and patches to add tab completion
> for alter publication|subscription
>

Merged as well.

Okay, so now is the time for v14, I guess, as more changes accumulated
(I also noticed the missing doc for the max_logical_replication_workers
GUC).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2016-12-18 11:12, Petr Jelinek wrote: (now using latest: patchset:) 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch 0004-Add-logical-replication-workers-v14.patch 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch > BTW you don't need to add primary key to pgbench_history. Simply ALTER > TABLE pgbench_history REPLICA IDENTITY FULL; should be enough. Either should, but neither is. set-up: Before creating the publication/subscription: On master I run pgbench -qis 1, then set replica identity (and/or add serial column) for pgbench_history, then dump/restore the 4 pgbench tables from master to replica. Then enabling publication/subscription. logs looks well. (Other tests I've devised earlier (on other tables) still work nicely.) Now when I do a pgbench-run on master, something like: pgbench -c 1 -T 20 -P 1 I often see this (when running pgbench): ERROR: publisher does not send replica identity column expected by the logical replication target public.pgbench_tellers or, sometimes (less often) the same ERROR for pgbench_accounts appears (as in the subsciber-log below) -- publisher log 2016-12-19 07:44:22.738 CET 22690 LOG: logical decoding found consistent point at 0/14598C78 2016-12-19 07:44:22.738 CET 22690 DETAIL: There are no running transactions. 2016-12-19 07:44:22.738 CET 22690 LOG: exported logical decoding snapshot: "000130FA-1" with 0 transaction IDs 2016-12-19 07:44:22.886 CET 22729 LOG: starting logical decoding for slot "sub1" 2016-12-19 07:44:22.886 CET 22729 DETAIL: streaming transactions committing after 0/14598CB0, reading WAL from 0/14598C78 2016-12-19 07:44:22.886 CET 22729 LOG: logical decoding found consistent point at 0/14598C78 2016-12-19 07:44:22.886 CET 22729 DETAIL: There are no running transactions. 
2016-12-19 07:45:25.568 CET 22729 LOG: could not receive data from client: Connection reset by peer
2016-12-19 07:45:25.568 CET 22729 LOG: unexpected EOF on standby connection
2016-12-19 07:45:25.580 CET 26696 LOG: starting logical decoding for slot "sub1"
2016-12-19 07:45:25.580 CET 26696 DETAIL: streaming transactions committing after 0/1468E0D0, reading WAL from 0/1468DC90
2016-12-19 07:45:25.589 CET 26696 LOG: logical decoding found consistent point at 0/1468DC90
2016-12-19 07:45:25.589 CET 26696 DETAIL: There are no running transactions.

-- subscriber log
2016-12-19 07:44:22.878 CET 17027 LOG: starting logical replication worker for subscription 24581
2016-12-19 07:44:22.883 CET 22726 LOG: logical replication apply for subscription sub1 started
2016-12-19 07:45:11.069 CET 22726 WARNING: leaked hash_seq_search scan for hash table 0x2def1a8
2016-12-19 07:45:25.566 CET 22726 ERROR: publisher does not send replica identity column expected by the logical replication target public.pgbench_accounts
2016-12-19 07:45:25.568 CET 16984 LOG: worker process: logical replication worker 24581 (PID 22726) exited with exit code 1
2016-12-19 07:45:25.568 CET 17027 LOG: starting logical replication worker for subscription 24581
2016-12-19 07:45:25.574 CET 26695 LOG: logical replication apply for subscription sub1 started
2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan for hash table 0x2def2c8
2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan for hash table 0x2def2c8
2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan for hash table 0x2def2c8

Sometimes replication (caused by a pgbench run) runs for a few seconds replicating all 4 pgbench tables correctly, but never longer than 10 to 20 seconds.

If you cannot reproduce it with the provided info I will make a more precise setup description, but it's failing so reliably here that I hope that won't be necessary.

Erik Rijkers
On 19/12/16 08:04, Erik Rijkers wrote:
> On 2016-12-18 11:12, Petr Jelinek wrote:
>
> (now using latest patchset:)
>
> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
> 0004-Add-logical-replication-workers-v14.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
>
>> BTW you don't need to add primary key to pgbench_history. Simply ALTER
>> TABLE pgbench_history REPLICA IDENTITY FULL; should be enough.
>
> Either should work, but neither does.
>
> Set-up, before creating the publication/subscription:
> On master I run pgbench -qis 1, then set the replica identity (and/or
> add a serial column) for pgbench_history, then dump/restore the 4
> pgbench tables from master to replica.
> Then I enable the publication/subscription; the logs look fine. (Other
> tests I've devised earlier (on other tables) still work nicely.)
>
> Now when I do a pgbench run on master, something like:
>
> pgbench -c 1 -T 20 -P 1
>
> I often see this (while pgbench is running):
>
> ERROR: publisher does not send replica identity column expected by the
> logical replication target public.pgbench_tellers
>
> or, sometimes (less often), the same ERROR for pgbench_accounts (as in
> the subscriber log below).
>
> -- publisher log
> 2016-12-19 07:44:22.738 CET 22690 LOG: logical decoding found
> consistent point at 0/14598C78
> 2016-12-19 07:44:22.738 CET 22690 DETAIL: There are no running
> transactions.
> 2016-12-19 07:44:22.738 CET 22690 LOG: exported logical decoding
> snapshot: "000130FA-1" with 0 transaction IDs
> 2016-12-19 07:44:22.886 CET 22729 LOG: starting logical decoding for
> slot "sub1"
> 2016-12-19 07:44:22.886 CET 22729 DETAIL: streaming transactions
> committing after 0/14598CB0, reading WAL from 0/14598C78
> 2016-12-19 07:44:22.886 CET 22729 LOG: logical decoding found
> consistent point at 0/14598C78
> 2016-12-19 07:44:22.886 CET 22729 DETAIL: There are no running
> transactions.
> 2016-12-19 07:45:25.568 CET 22729 LOG: could not receive data from
> client: Connection reset by peer
> 2016-12-19 07:45:25.568 CET 22729 LOG: unexpected EOF on standby
> connection
> 2016-12-19 07:45:25.580 CET 26696 LOG: starting logical decoding for
> slot "sub1"
> 2016-12-19 07:45:25.580 CET 26696 DETAIL: streaming transactions
> committing after 0/1468E0D0, reading WAL from 0/1468DC90
> 2016-12-19 07:45:25.589 CET 26696 LOG: logical decoding found
> consistent point at 0/1468DC90
> 2016-12-19 07:45:25.589 CET 26696 DETAIL: There are no running
> transactions.
>
> -- subscriber log
> 2016-12-19 07:44:22.878 CET 17027 LOG: starting logical replication
> worker for subscription 24581
> 2016-12-19 07:44:22.883 CET 22726 LOG: logical replication apply for
> subscription sub1 started
> 2016-12-19 07:45:11.069 CET 22726 WARNING: leaked hash_seq_search scan
> for hash table 0x2def1a8
> 2016-12-19 07:45:25.566 CET 22726 ERROR: publisher does not send
> replica identity column expected by the logical replication target
> public.pgbench_accounts
> 2016-12-19 07:45:25.568 CET 16984 LOG: worker process: logical
> replication worker 24581 (PID 22726) exited with exit code 1
> 2016-12-19 07:45:25.568 CET 17027 LOG: starting logical replication
> worker for subscription 24581
> 2016-12-19 07:45:25.574 CET 26695 LOG: logical replication apply for
> subscription sub1 started
> 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan
> for hash table 0x2def2c8
> 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan
> for hash table 0x2def2c8
> 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan
> for hash table 0x2def2c8
>
> Sometimes replication (caused by a pgbench run) runs for a few seconds
> replicating all 4 pgbench tables correctly, but never longer than 10 to
> 20 seconds.
>
> If you cannot reproduce it with the provided info I will make a more
> precise setup description, but it's failing so reliably here that I
> hope that won't be necessary.
>

Hi, nope, I can't reproduce that. I can reproduce the leaked hash_seq_search; the attached fixes that. But no issues with the replication itself. The error basically means that the pkey on publisher and subscriber isn't the same.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
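[Editor's note: the "publisher does not send replica identity column" error above means the replica identity configured on the two nodes differs. As a quick manual check — this query is only an illustration, not part of the patchset — running it on both publisher and subscriber and comparing the output shows each pgbench table's replica identity setting ('d' = default/primary key, 'f' = full, 'i' = index, 'n' = nothing):]

```sql
-- Illustrative diagnostic only: compare the output of this query on
-- publisher and subscriber; a mismatch in relreplident (or in the
-- underlying key columns) is what triggers the apply-side error above.
SELECT c.relname, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relname LIKE 'pgbench_%'
ORDER BY c.relname;
```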
On 12/18/2016 09:04 PM, Petr Jelinek wrote:
> On 18/12/16 19:02, Steve Singer wrote:
>
>> pg_dump is also generating warnings
>>
>> pg_dump: [archiver] WARNING: don't know how to set owner for object type
>> SUBSCRIPTION
>>
>> I know that the plan is to add proper ACLs for publications and
>> subscriptions later. I don't know if we want to leave the warning in
>> until then or do something about it.
>>
> No, ACLs are separate from owner. This is a thinko on my side. I was
> thinking we can live without ALTER ... OWNER TO for now, but we actually
> need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO
> for both PUBLICATION and SUBSCRIPTION.

When I try to restore my pg_dump with publications I get:

./pg_dump -h localhost --port 5440 test | ./psql -h localhost --port 5440 test2

ALTER TABLE
CREATE PUBLICATION
ERROR: unexpected command tag "PUBLICATION

This comes from:

ALTER PUBLICATION mypub OWNER TO ssinger;

Does the OWNER TO clause need to be added to AlterPublicationStmt instead of AlterOwnerStmt?

Also, we should update the tab completion for ALTER PUBLICATION to show the OWNER TO option, plus the \h help in psql and the reference SGML.
On 19/12/16 15:39, Steve Singer wrote:
> On 12/18/2016 09:04 PM, Petr Jelinek wrote:
>> On 18/12/16 19:02, Steve Singer wrote:
>>
>>> pg_dump is also generating warnings
>>>
>>> pg_dump: [archiver] WARNING: don't know how to set owner for object type
>>> SUBSCRIPTION
>>>
>>> I know that the plan is to add proper ACLs for publications and
>>> subscriptions later. I don't know if we want to leave the warning in
>>> until then or do something about it.
>>>
>> No, ACLs are separate from owner. This is a thinko on my side. I was
>> thinking we can live without ALTER ... OWNER TO for now, but we actually
>> need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO
>> for both PUBLICATION and SUBSCRIPTION.
>
> When I try to restore my pg_dump with publications I get:
>
> ./pg_dump -h localhost --port 5440 test | ./psql -h localhost --port
> 5440 test2
>
> ALTER TABLE
> CREATE PUBLICATION
> ERROR: unexpected command tag "PUBLICATION
>
> This comes from:
>
> ALTER PUBLICATION mypub OWNER TO ssinger;
>
> Does the OWNER TO clause need to be added to AlterPublicationStmt
> instead of AlterOwnerStmt?

Nah, that's just a bug in the command tag string we return in utility.c. I noticed this myself after sending v14; it's a one-line fix.

> Also, we should update the tab completion for ALTER PUBLICATION to show
> the OWNER TO option, plus the \h help in psql and the reference SGML.

Yeah.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2016-12-19 08:04, Erik Rijkers wrote:
> On 2016-12-18 11:12, Petr Jelinek wrote:
>
> (now using latest patchset:)
>
> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
> 0004-Add-logical-replication-workers-v14.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
>
> Sometimes replication (caused by a pgbench run) runs for a few
> seconds replicating all 4 pgbench tables correctly, but never longer
> than 10 to 20 seconds.
>

I've concocted pgbench_derail.sh. It assumes 2 instances running, initially without the publication and subscription.

There are two separate installations, on the same machine. To start up the two instances I use instances.sh:

# ./instances.sh
#!/bin/sh
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
 -c wal_level=logical
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir1
 -c log_filename=logfile.${project1}"
options2="
 -c wal_level=replica
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir2
 -c log_filename=logfile.${project2}"
which postgres
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
# end ./instances.sh

#--- pgbench_derail.sh
#!/bin/sh
# assumes both instances are running
# clear logs
# echo > $HOME/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
# echo > $HOME/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2
port1=6972
port2=6973

function cb()
{
  # display the 4 pgbench tables' accumulated content as md5s
  # a,b,t,h stand for: pgbench_accounts, -branches, -tellers, -history
  for port in $port1 $port2
  do
    md5_a=$(echo "select * from pgbench_accounts order by aid"        | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_b=$(echo "select * from pgbench_branches order by bid"        | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_t=$(echo "select * from pgbench_tellers order by tid"         | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_h=$(echo "select * from pgbench_history order by aid,bid,tid" | psql -qtAXp$port | md5sum | cut -b 1-9)
    cnt_a=$(echo "select count(*) from pgbench_accounts" | psql -qtAXp $port)
    cnt_b=$(echo "select count(*) from pgbench_branches" | psql -qtAXp $port)
    cnt_t=$(echo "select count(*) from pgbench_tellers"  | psql -qtAXp $port)
    cnt_h=$(echo "select count(*) from pgbench_history"  | psql -qtAXp $port)
    printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a $cnt_b $cnt_t $cnt_h
    echo -n " $md5_a $md5_b $md5_t $md5_h"
    if   [[ $port -eq $port1 ]]; then echo " master"
    elif [[ $port -eq $port2 ]]; then echo " replica"
    else                              echo " ERROR"
    fi
  done
}

echo "
 drop table if exists pgbench_accounts;
 drop table if exists pgbench_branches;
 drop table if exists pgbench_tellers;
 drop table if exists pgbench_history;" | psql -X -p $port1 \
&& echo "
 drop table if exists pgbench_accounts;
 drop table if exists pgbench_branches;
 drop table if exists pgbench_tellers;
 drop table if exists pgbench_history;" | psql -X -p $port2 \
&& pgbench -p $port1 -qis 1 \
&& echo "alter table pgbench_history replica identity full;" | psql -1p $port1 \
&& pg_dump -F c -p $port1 \
     -t pgbench_accounts \
     -t pgbench_branches \
     -t pgbench_tellers \
     -t pgbench_history \
   | pg_restore -p $port2 -d testdb

echo "$(cb)"
sleep 2
echo "$(cb)"

echo "create publication pub1 for all tables;" | psql -p $port1 -aqtAX
echo "
create subscription sub1 connection 'port=${port1}' publication pub1 with (disabled);
alter subscription sub1 enable;
" | psql -p $port2 -aqtAX

#------------------------------------
# repeat a short (10 s) pgbench run to show that during such
# short runs the logical replication often remains intact.
# Longer pgbench runs always derail the logrep of one or more
# of these 4 tables.
#
# bug: pgbench_history no longer replicates
# sometimes also the other 3 tables get de-synced.

echo "$(cb)"
echo "-- pgbench -c 1 -T 10 -P 5 (short run, first)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo "$(cb)"
echo "-- pgbench -c 1 -T 10 -P 5 (short run, second)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo "$(cb)"
echo "-- pgbench -c 1 -T 120 -P 15 (long run)"
pgbench -c 1 -T 120 -P 15
sleep 2
echo "-- 60 second (1)"
echo "$(cb)"
#--- end pgbench_derail.sh

(Sorry for the messy bash.)

thanks,

Erik Rijkers
On 20/12/16 08:10, Erik Rijkers wrote:
> On 2016-12-19 08:04, Erik Rijkers wrote:
>> On 2016-12-18 11:12, Petr Jelinek wrote:
>>
>> (now using latest patchset:)
>>
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
>> 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
>> 0004-Add-logical-replication-workers-v14.patch
>> 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch
>>
>> Sometimes replication (caused by a pgbench run) runs for a few
>> seconds replicating all 4 pgbench tables correctly, but never longer
>> than 10 to 20 seconds.
>>
>
> I've concocted pgbench_derail.sh. It assumes 2 instances running,
> initially without the publication and subscription.
>
> There are two separate installations, on the same machine.
>

Thanks, this was very useful. We had wrong attribute index arithmetic in the place where we verify that the replica identities match well enough.

BTW, that script you have for testing has 2 minor flaws in terms of pgbench_history: first, the ORDER BY is not unique enough (adding mtime or something helps), and second, pgbench actually truncates pgbench_history unless -n is added to the command line.

So attached is v15, which fixes this and the

ERROR: unexpected command tag "PUBLICATION

as reported by Steve Singer (plus tab completion fixes and doc fixes).

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2016-12-20 09:43, Petr Jelinek wrote:

> Thanks, this was very useful. We had wrong attribute index arithmetics
> in the place where we verify that replica identities match well enough.

Well, I spent a lot of time on the whole thing so I am glad it's not just something stupid I did :)

> BTW that script you have for testing has 2 minor flaws in terms of
> pgbench_history - the order by is not unique enough (adding mtime or
> something helps)

Yes, in another version I did ALTER TABLE pgbench_history ADD COLUMN hid SERIAL PRIMARY KEY. I suppose that's the best way (adding mtime doesn't work; apparently mtime gets repeated too). (I have now added that ALTER TABLE statement again.)

> and second, the pgbench actually truncates the
> pgbench_history unless -n is added to command line.

OK, -n added.

> So attached is v15, which fixes this and the
> ERROR: unexpected command tag "PUBLICATION
> as reported by Steve Singer (plus tab completion fixes and doc fixes).

Great. It seems to fix the problem: I just ran an unprecedented 5-minute run with correct replication.

The first compile gave the attached diffs in the publication regression test; subsequent compiles went OK (2x). If I have time later today I'll try to reproduce that one FAILED test, but maybe you can see immediately what's wrong there.

thanks,

Erik Rijkers
Attachment
On 20/12/16 10:41, Erik Rijkers wrote:
> On 2016-12-20 09:43, Petr Jelinek wrote:
>
>> Thanks, this was very useful. We had wrong attribute index arithmetics
>> in the place where we verify that replica identities match well enough.
>
> Well, I spent a lot of time on the whole thing so I am glad it's not
> just something stupid I did :)

Yeah, sadly it was something stupid I did ;)

>
>> BTW that script you have for testing has 2 minor flaws in terms of
>> pgbench_history - the order by is not unique enough (adding mtime or
>> something helps)
>
> yes, in another version I did
> ALTER TABLE pgbench_history ADD COLUMN hid SERIAL PRIMARY KEY.
> I suppose that's the best way (adding mtime doesn't work; apparently
> mtime gets repeated too). (I have now added that alter table statement
> again.)
>
>> and second, the pgbench actually truncates the
>> pgbench_history unless -n is added to command line.
>
> ok, -n added.
>
>> So attached is v15, which fixes this and the
>> ERROR: unexpected command tag "PUBLICATION
>> as reported by Steve Singer (plus tab completion fixes and doc fixes).
>
> Great. It seems to fix the problem: I just ran an unprecedented
> 5-minute run with correct replication.
>

Great, thanks.

> The first compile gave the attached diffs in the publication regression
> test; subsequent compiles went OK (2x). If I have time later today I'll
> try to reproduce that one FAILED test, but maybe you can see immediately
> what's wrong there.

Seems like the tables are just returned in a different order, but otherwise it's OK. I guess a way to make this more stable would be to add an ORDER BY to the query psql sends to get the list of tables in the publication.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2016-12-20 10:48, Petr Jelinek wrote:

Here is another small thing:

$ psql -d testdb -p 6972
psql (10devel_logical_replication_20161220_1008_db80acfc9d50)
Type "help" for help.

testdb=# drop publication if exists xxx;
ERROR: unrecognized object type: 28

testdb=# drop subscription if exists xxx;
WARNING: relcache reference leak: relation "pg_subscription" not closed
DROP SUBSCRIPTION

I don't mind, but I suppose eventually other messages need to go there.

thanks,

Erik Rijkers
On 20/12/16 10:56, Erik Rijkers wrote:
> On 2016-12-20 10:48, Petr Jelinek wrote:
>
> Here is another small thing:
>
> $ psql -d testdb -p 6972
> psql (10devel_logical_replication_20161220_1008_db80acfc9d50)
> Type "help" for help.
>
> testdb=# drop publication if exists xxx;
> ERROR: unrecognized object type: 28
>
> testdb=# drop subscription if exists xxx;
> WARNING: relcache reference leak: relation "pg_subscription" not closed
> DROP SUBSCRIPTION
>
> I don't mind but I suppose eventually other messages need to go there
>

Yep, attached should fix it. DDL for completely new db objects surely touches a lot of places.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hi,

I rebased this for the changes made to inheritance and merged in the fixes that I previously sent separately.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2016-12-30 11:53, Petr Jelinek wrote:
> I rebased this for the changes made to inheritance and merged in the

0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz (~31 KB)

A couple of orthography errors in the messages.
Attachment
On 12/30/2016 05:53 AM, Petr Jelinek wrote:
> Hi,
>
> I rebased this for the changes made to inheritance and merged in the
> fixes that I previously sent separately.
>

I'm not sure if the following is expected or not.

I have 1 publisher and 1 subscriber. I then do pg_dump on my subscriber:

./pg_dump -h localhost --port 5441 --include-subscriptions --no-create-subscription-slot test | ./psql --port 5441 test_b

I now can't do a drop database test_b, which is expected, but I can't drop the subscription either:

test_b=# drop subscription mysub;
ERROR: could not drop replication origin with OID 1, in use by PID 24996

alter subscription mysub disable;
ALTER SUBSCRIPTION
drop subscription mysub;
ERROR: could not drop replication origin with OID 1, in use by PID 24996

drop subscription mysub nodrop slot;

doesn't work either. If I first drop the working/active subscription on the original 'test' database it works, but I can't seem to drop the subscription record on test_b.
On 02/01/17 05:23, Steve Singer wrote:
> On 12/30/2016 05:53 AM, Petr Jelinek wrote:
>> Hi,
>>
>> I rebased this for the changes made to inheritance and merged in the
>> fixes that I previously sent separately.
>>
>
> I'm not sure if the following is expected or not.
>
> I have 1 publisher and 1 subscriber. I then do pg_dump on my subscriber:
>
> ./pg_dump -h localhost --port 5441 --include-subscriptions
> --no-create-subscription-slot test | ./psql --port 5441 test_b
>
> I now can't do a drop database test_b, which is expected, but I can't
> drop the subscription either:
>
> test_b=# drop subscription mysub;
> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>
> alter subscription mysub disable;
> ALTER SUBSCRIPTION
> drop subscription mysub;
> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>
> drop subscription mysub nodrop slot;
>
> doesn't work either. If I first drop the working/active subscription on
> the original 'test' database it works, but I can't seem to drop the
> subscription record on test_b.
>

I guess this is because replication origins are global to the pg instance and we use the subscription name as the origin name internally. Maybe we need to prefix/suffix it with the db oid or something like that, but that's a bit problematic as well, since they both have the same length limit. I guess we could use the subscription OID as the replication origin name, which is somewhat less user friendly in terms of debugging, but would be unique.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
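[Editor's note: the collision described here is visible directly, because replication origins live in the shared catalog pg_replication_origin and are therefore unique across the whole cluster, not per database. For illustration only (this query is not part of the patchset):]

```sql
-- Illustrative only: pg_replication_origin is a shared catalog, so this
-- returns the same rows from any database in the cluster. With the WIP
-- patch using the subscription name as the origin name, a second
-- subscription named "mysub" in another database collides on roname.
SELECT roident, roname FROM pg_replication_origin;
```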
In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,

+static bool
+is_publishable_class(Oid relid, Form_pg_class reltuple)
+{
+    return reltuple->relkind == RELKIND_RELATION &&
+        !IsCatalogClass(relid, reltuple) &&
+        reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
+        /* XXX needed to exclude information_schema tables */
+        relid >= FirstNormalObjectId;
+}

I don't think the XXX part is necessary, because IsCatalogClass() already checks for the same thing. (The whole thing is a bit bogus anyway, because you can drop and recreate the information schema at run time without restriction.)

+#define MAX_RELCACHE_INVAL_MSGS 100
+    List *relids = GetPublicationRelations(HeapTupleGetOid(tup));
+
+    /*
+     * We don't want to send too many individual messages, at some point
+     * it's cheaper to just reset whole relcache.
+     *
+     * XXX: the MAX_RELCACHE_INVAL_MSGS was picked arbitrarily, maybe
+     * there is better limit.
+     */
+    if (list_length(relids) < MAX_RELCACHE_INVAL_MSGS)

Do we have more data on this? There are people running with 100000 tables, and changing a publication with a 1000 tables would blow all that away? Maybe at least it should be set relative to INITRELCACHESIZE (400) to tie things together a bit?

Update the documentation of SharedInvalCatalogMsg in sinval.h for the "all relations" case. (Maybe look around the whole file to make sure comments are still valid.)

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/3/17 2:39 PM, Peter Eisentraut wrote: > In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz, Attached are a couple of small fixes for this. Feel free to ignore the removal of the header files if they are needed by later patches. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 03/01/17 20:39, Peter Eisentraut wrote:
> In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,
>
> +static bool
> +is_publishable_class(Oid relid, Form_pg_class reltuple)
> +{
> +    return reltuple->relkind == RELKIND_RELATION &&
> +        !IsCatalogClass(relid, reltuple) &&
> +        reltuple->relpersistence == RELPERSISTENCE_PERMANENT &&
> +        /* XXX needed to exclude information_schema tables */
> +        relid >= FirstNormalObjectId;
> +}
>
> I don't think the XXX part is necessary, because IsCatalogClass()
> already checks for the same thing. (The whole thing is a bit bogus
> anyway, because you can drop and recreate the information schema at run
> time without restriction.)
>

I got this remark about IsCatalogClass() from Andres offline as well, but it's not true: it only checks FirstNormalObjectId for objects in the pg_catalog and toast schemas, not anywhere else.

> +#define MAX_RELCACHE_INVAL_MSGS 100
> +    List *relids = GetPublicationRelations(HeapTupleGetOid(tup));
> +
> +    /*
> +     * We don't want to send too many individual messages, at some point
> +     * it's cheaper to just reset whole relcache.
> +     *
> +     * XXX: the MAX_RELCACHE_INVAL_MSGS was picked arbitrarily, maybe
> +     * there is better limit.
> +     */
> +    if (list_length(relids) < MAX_RELCACHE_INVAL_MSGS)
>
> Do we have more data on this? There are people running with 100000
> tables, and changing a publication with a 1000 tables would blow all
> that away?
>
> Maybe at least it should be set relative to INITRELCACHESIZE (400) to
> tie things together a bit?
>

I am actually thinking this should correspond to MAXNUMMESSAGES (4096), as that's the limit on the buffer size. I didn't find it the first time around when I was looking for a good number.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 03/01/17 22:51, Peter Eisentraut wrote:
> On 1/3/17 2:39 PM, Peter Eisentraut wrote:
>> In 0001-Add-PUBLICATION-catalogs-and-DDL-v16.patch.gz,
>
> Attached are a couple of small fixes for this. Feel free to ignore the
> removal of the header files if they are needed by later patches.
>

Thanks, merged; no, they are not needed by the other patches. I also hopefully resolved the concerns you had about the relcache invalidation, and expanded the comment in is_publishable_class to make the intention there a bit clearer.

Only attached the changed patch; the rest should still apply fine on top of it.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz:

- Add a get_subscription_name() function
- Remove call for ApplyLauncherWakeupAtCommit() (rebasing error?)
- Remove some unused include files (same as before)
- Rename pg_dump --no-create-subscription-slot to --no-create-subscription-slots (plural), add documentation.

In CreateSubscription(), I don't think we should connect to the remote if no slot creation is requested. Arguably, the point of that option is to not make network connections. (That is what my documentation patch above claims, in any case.)

I don't know why we need to check the PostgreSQL version number of the remote. We should rely on the protocol version number, and we should just make it work. When PG 11 comes around, subscribing from PG 10 to a publisher on PG 11 should just work without any warnings, IMO.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
0003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz looks good now; the documentation is clear. Another fixup patch to remove excessive includes. ;-)

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Comments on 0004-Add-logical-replication-workers-v16.patch.gz:

I didn't find any major problems. At times, while I was testing strange things, it was not clear why "nothing is happening". I'll do some more checking in that direction.

Fixup patch attached that enhances some error messages, fixes some typos, and other minor changes. See also comments below.

---

The way check_max_logical_replication_workers() is implemented creates potential ordering dependencies in postgresql.conf. For example,

max_logical_replication_workers = 100
max_worker_processes = 200

fails, but if you change the order, it works. The existing check_max_worker_processes() has the same problem, but I suspect that because it only checks against MAX_BACKENDS, nobody has ever seriously hit that limit. I suggest just removing the check. If you set max_logical_replication_workers higher than max_worker_processes and you hit the lower limit, then whatever is controlling max_worker_processes should complain with its own error message.

---

The default for max_logical_replication_workers is 4, which seems very little. Maybe it should be more like 10 or 20. The "Quick setup" section recommends changing it to 10. We should at least be consistent there: if you set a default value that is not 0, then it should be enough that we don't need to change it again in the Quick setup. (Maybe the default max_worker_processes should also be raised?)

+max_logical_replication_workers = 10 # one per subscription + one per instance needed on subscriber

I think this is incorrect (copied from max_worker_processes?). The launcher does not count as one of the workers here. On a related note, should the minimum not be 0 instead of 1?

---

About the changes to libpqrcv_startstreaming(). The timeline is not really an option in the syntax. Just passing in a string that is pasted into the final command creates too much coupling, I think.
I would keep the old timeline (TimeLineID tli) argument, and make the options const char * [], and let startstreaming() assemble the final string, including commas and parentheses. It's still not a perfect abstraction, because you need to do the quoting yourself, but much better. (Alternatively, get rid of the startstreaming call and just have callers use libpqrcv_PQexec directly.) --- Some of the header files are named inconsistently with their .c files. I think src/include/replication/logicalworker.h should be split into logicalapply.h and logicallauncher.h. Not sure about worker_internal.h. Maybe rename apply.c to worker.c? (I'm also not fond of throwing publicationcmds.h and subscriptioncmds.h together into replicationcmds.h. Maybe that could be changed, too.) --- Various FATAL errors in logical/relation.c when the target relation is not in the right state. Could those not be ERRORs? The behavior is the same at the moment because background workers terminate on uncaught exceptions, but that should eventually be improved. A FATAL error will lead to a LOG: unexpected EOF on standby connection on the publisher, because the process just dies without protocol shutdown. (And then it reconnects and tries again. So we might as well not die and just retry again.) --- In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in the code using this struct. (Maybe reloid -> localreloid) --- Partitioned tables are not supported in either publications or as replication targets. This is expected but should be fixed before the final release. --- In apply.c: The comment in apply_handle_relation() makes a point that the schema validation is done later, but does not tell why. The answer is probably because it doesn't matter and it's more convenient, but it should be explained in the comment. See XXX comment in logicalrep_worker_stop(). The get_flush_position() return value is not intuitive from the function name. Maybe make that another pointer argument for clarity. 
reread_subscription() complains if the subscription name was changed. I don't know why that is a problem. --- In launcher.c: pg_stat_get_subscription should hold LogicalRepWorkerLock around the whole loop, so that it doesn't get inconsistent results when workers change during the loop. --- In relation.c: Inconsistent use of uint32 vs LogicalRepRelId. Pick one. :) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
0005-Add-separate-synchronous-commit-control-for-logical--v16.patch.gz

This looks a little bit hackish. I'm not sure how this would behave properly when either synchronous_commit or logical_replication_synchronous_commit is changed at run time with a reload.

I'm thinking maybe this and perhaps some other WAL receiver settings should be properties of a subscription, like ALTER SUBSCRIPTION ... SET/RESET.

Actually, maybe I'm a bit confused about what this is supposed to achieve. synchronous_commit has both a local and a remote meaning. What behavior are the various combinations of physical and logical replication supposed to accomplish?

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/2/17 8:32 AM, Petr Jelinek wrote:
> On 02/01/17 05:23, Steve Singer wrote:
>> but I can't drop the subscription either
>>
>> test_b=# drop subscription mysub;
>> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>>
>> alter subscription mysub disable;
>> ALTER SUBSCRIPTION
>> drop subscription mysub;
>> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>>
>> drop subscription mysub nodrop slot;
>>
>> doesn't work either. If I first drop the working/active subscription on
>> the original 'test' database it works, but I can't seem to drop the
>> subscription record on test_b

I can't reproduce this exactly, but I notice that CREATE SUBSCRIPTION NOCREATE SLOT does not create a replication origin, but DROP SUBSCRIPTION NODROP SLOT does attempt to drop the origin. If the origin is not in use, it will just go away, but if it is in use, it might lead to the situation described above, where the second subscription cannot be removed.

> I guess this is because replication origins are pg instance global and
> we use subscription name for origin name internally. Maybe we need to
> prefix/suffix it with db oid or something like that, but that's
> problematic a bit as well as they both have same length limit. I guess
> we could use subscription OID as replication origin name which is
> somewhat less user friendly in terms of debugging but would be unique.

I think the most robust way would be to associate origins to subscriptions using the object dependency mechanism, and just pick an internal name, like we do for automatically created indexes or sequences, for example.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/3/17 5:23 PM, Petr Jelinek wrote:
> I got this remark about IsCatalogClass() from Andres offline as well,
> but it's not true, it only checks for FirstNormalObjectId for objects in
> pg_catalog and toast schemas, not anywhere else.

I see your statement is correct, but I'm not sure the overall behavior is sensible. Either we consider the information_schema tables to be catalog tables, and then IsCatalogClass() should be changed, or we consider them non-catalog tables, and then we should let them be in publications. I don't think having a third category of sometimes-catalog tables is desirable.

Currently, they clearly behave like non-catalog tables, since you can just drop and recreate them freely, so I would choose the second option. It might be worth changing that, but it doesn't have to be the job of this patch set.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/01/17 14:52, Peter Eisentraut wrote:
> On 1/2/17 8:32 AM, Petr Jelinek wrote:
>> On 02/01/17 05:23, Steve Singer wrote:
>>> but I can't drop the subscription either
>>>
>>> test_b=# drop subscription mysub;
>>> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>>>
>>> alter subscription mysub disable;
>>> ALTER SUBSCRIPTION
>>> drop subscription mysub;
>>> ERROR: could not drop replication origin with OID 1, in use by PID 24996
>>>
>>> drop subscription mysub nodrop slot;
>>>
>>> doesn't work either. If I first drop the working/active subscription on
>>> the original 'test' database it works but I can't seem to drop the
>>> subscription record on test_b
>
> I can't reproduce this exactly, but I notice that CREATE SUBSCRIPTION
> NOCREATE SLOT does not create a replication origin, but DROP
> SUBSCRIPTION NODROP SLOT does attempt to drop the origin. If the origin
> is not in use, it will just go away, but if it is in use, it might lead
> to the situation described above, where the second subscription cannot
> be removed.

This is a thinko in its own right; the origin needs to be created regardless of the slot.

>> I guess this is because replication origins are pg instance global and
>> we use subscription name for origin name internally. Maybe we need to
>> prefix/suffix it with db oid or something like that, but that's
>> problematic a bit as well as they both have same length limit. I guess
>> we could use subscription OID as replication origin name which is
>> somewhat less user friendly in terms of debugging but would be unique.
>
> I think the most robust way would be to associate origins to
> subscriptions using the object dependency mechanism, and just pick an
> internal name like we do for automatically created indexes or sequences,
> for example.

That will not help; the issue is that we consider names for origins to be unique across the cluster, while subscription names are per database, so if there is an origin per subscription (which there has to be) it will always clash if we just use the name. I have already changed this locally to a pg_<subscription_oid> naming scheme and it works fine.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/01/17 15:06, Peter Eisentraut wrote:
> On 1/3/17 5:23 PM, Petr Jelinek wrote:
>> I got this remark about IsCatalogClass() from Andres offline as well,
>> but it's not true, it only checks for FirstNormalObjectId for objects in
>> pg_catalog and toast schemas, not anywhere else.
>
> I see your statement is correct, but I'm not sure the overall behavior
> is sensible. Either we consider the information_schema tables to be
> catalog tables, and then IsCatalogClass() should be changed, or we
> consider them non-catalog tables, and then we should let them be in
> publications. I don't think having a third category of
> sometimes-catalog tables is desirable.
>
> Currently, they clearly behave like non-catalog tables, since you can
> just drop and recreate them freely, so I would choose the second option.
> It might be worth changing that, but it doesn't have to be the job of
> this patch set.

Okay, looking into my notes, I originally did this because we did not allow adding tables without pkeys to publications, which, without this check, would have effectively prohibited FOR ALL TABLES publications from working because of information_schema. Since this is no longer the case, I think it's safe to skip the FirstNormalObjectId check.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/11/17 3:11 AM, Petr Jelinek wrote:
> That will not help, issue is that we consider names for origins to be
> unique across cluster while subscription names are per database so if
> there is origin per subscription (which there has to be) it will always
> clash if we just use the name. I already have locally changed this to
> pg_<subscription_oid> naming scheme and it works fine.

How will that make it unique across the cluster?

Should we include the system ID from pg_control?

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/11/17 3:29 AM, Petr Jelinek wrote:
> Okay, looking into my notes, I originally did this because we did not
> allow adding tables without pkeys to publications which effectively
> prohibited FOR ALL TABLES publication from working because of
> information_schema without this. Since this is no longer the case I
> think it's safe to skip the FirstNormalObjectId check.

Wouldn't that mean that FOR ALL TABLES replicates the tables from information_schema?

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/17 18:32, Peter Eisentraut wrote:
> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>> Okay, looking into my notes, I originally did this because we did not
>> allow adding tables without pkeys to publications which effectively
>> prohibited FOR ALL TABLES publication from working because of
>> information_schema without this. Since this is no longer the case I
>> think it's safe to skip the FirstNormalObjectId check.
>
> Wouldn't that mean that FOR ALL TABLES replicates the tables from
> information_schema?

Yes, as they are not catalog tables; I thought that was your point.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/01/17 18:27, Peter Eisentraut wrote:
> On 1/11/17 3:11 AM, Petr Jelinek wrote:
>> That will not help, issue is that we consider names for origins to be
>> unique across cluster while subscription names are per database so if
>> there is origin per subscription (which there has to be) it will always
>> clash if we just use the name. I already have locally changed this to
>> pg_<subscription_oid> naming scheme and it works fine.
>
> How will that make it unique across the cluster?
>
> Should we include the system ID from pg_control?

pg_subscription is a shared catalog, so OIDs are unique.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/11/17 3:35 PM, Petr Jelinek wrote:
> On 11/01/17 18:27, Peter Eisentraut wrote:
>> On 1/11/17 3:11 AM, Petr Jelinek wrote:
>>> That will not help, issue is that we consider names for origins to be
>>> unique across cluster while subscription names are per database so if
>>> there is origin per subscription (which there has to be) it will always
>>> clash if we just use the name. I already have locally changed this to
>>> pg_<subscription_oid> naming scheme and it works fine.
>>
>> How will that make it unique across the cluster?
>>
>> Should we include the system ID from pg_control?
>
> pg_subscription is shared catalog so oids are unique.

Oh, I see what you mean by cluster now. It's a confusing term.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/11/17 3:35 PM, Petr Jelinek wrote:
> On 11/01/17 18:32, Peter Eisentraut wrote:
>> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>>> Okay, looking into my notes, I originally did this because we did not
>>> allow adding tables without pkeys to publications which effectively
>>> prohibited FOR ALL TABLES publication from working because of
>>> information_schema without this. Since this is no longer the case I
>>> think it's safe to skip the FirstNormalObjectId check.
>>
>> Wouldn't that mean that FOR ALL TABLES replicates the tables from
>> information_schema?
>
> Yes, as they are not catalog tables, I thought that was your point.

But we shouldn't do that. So we need to exclude information_schema from "all tables" somehow. Just probably not by OID, since that is not fixed.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/01/17 22:30, Peter Eisentraut wrote:
> On 1/11/17 3:35 PM, Petr Jelinek wrote:
>> On 11/01/17 18:32, Peter Eisentraut wrote:
>>> On 1/11/17 3:29 AM, Petr Jelinek wrote:
>>>> Okay, looking into my notes, I originally did this because we did not
>>>> allow adding tables without pkeys to publications which effectively
>>>> prohibited FOR ALL TABLES publication from working because of
>>>> information_schema without this. Since this is no longer the case I
>>>> think it's safe to skip the FirstNormalObjectId check.
>>>
>>> Wouldn't that mean that FOR ALL TABLES replicates the tables from
>>> information_schema?
>>
>> Yes, as they are not catalog tables, I thought that was your point.
>
> But we shouldn't do that. So we need to exclude information_schema from
> "all tables" somehow. Just probably not by OID, since that is not fixed.

I am not quite sure I agree with this. Either it's a system object and we don't replicate it (which I would have considered to be anything with an OID < FirstNormalObjectId), or it's user-made and then it should be replicated. Filtering by schema name is IMHO way too fragile (what stops a user from creating additional tables there, for example?).

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/01/17 21:26, Peter Eisentraut wrote:
> 0005-Add-separate-synchronous-commit-control-for-logical--v16.patch.gz
>
> This looks a little bit hackish. I'm not sure how this would behave
> properly when either synchronous_commit or
> logical_replication_synchronous_commit is changed at run time with a reload.

Yes, I said in the initial email that this is meant for discussion and not as a final implementation. And it's certainly not required for the initial commit. Perhaps I should have started a separate thread for this part.

> I'm thinking maybe this and perhaps some other WAL receiver settings
> should be properties of a subscription, like ALTER SUBSCRIPTION ...
> SET/RESET.

True, but we still need the GUC defaults.

> Actually, maybe I'm a bit confused about what this is supposed to achieve.
> synchronous_commit has both a local and a remote meaning. What behavior
> are the various combinations of physical and logical replication
> supposed to accomplish?

It's meant to decouple the synchronous commit setting for logical replication workers from the one set for normal clients. Now that we have owners for subscriptions and the subscription runs as that owner, maybe we could do that via ALTER USER.

However, I think apply should by default run with sync commit turned off, as the performance benefits are important there, given that there is one worker that has to replicate in a serialized manner, and the success of replication is not confirmed by responding to COMMIT but by reporting the LSNs of the various replication stages. Perhaps logical_replication_synchronous_commit should only be a boolean that translates to 'off' or 'local' for the real synchronous_commit.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15-01-2017 15:13, Petr Jelinek wrote:
> I am not quite sure I agree with this. Either it's system object and we
> don't replicate it (which I would have considered to be anything with
> Oid < FirstNormalObjectId) or it's user made and then it should be
> replicated. Filtering by schema name is IMHO way too fragile (what stops
> a user from creating additional tables there, for example).

What happens if you replicate information_schema tables? AFAICS, those tables are already in the subscriber database. And will it generate an error or a warning? (I'm not sure how this functionality deals with schemas.) Also, why would I want to replicate an information_schema table? Their contents are static and, by default, they are already in each database.

The information schema isn't a catalog, but I think it is good to exclude it from the FOR ALL TABLES clause because the use case is almost zero. Of course, it should be documented. Also, if someone wants to replicate an information_schema table, they could do it with ALTER PUBLICATION.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 15/01/17 20:20, Euler Taveira wrote:
> On 15-01-2017 15:13, Petr Jelinek wrote:
>> I am not quite sure I agree with this. Either it's system object and we
>> don't replicate it (which I would have considered to be anything with
>> Oid < FirstNormalObjectId) or it's user made and then it should be
>> replicated. Filtering by schema name is IMHO way too fragile (what stops
>> user creating additional tables there for example).
>
> What happens if you replicate information_schema tables? AFAICS, those
> tables are already in the subscriber database. And will it generate
> error or warning? (I'm not sure how this functionality deals with
> schemas.) Also, why do I want to replicate a information schema table?
> Their contents are static and, by default, it is already in each database.
>
> Information schema isn't a catalog but I think it is good to exclude it
> from FOR ALL TABLES clause because the use case is almost zero. Of
> course, it should be documented. Also, if someone wants to replicate an
> information schema table, it could do it with ALTER PUBLICATION.

Well, the preinstalled information_schema is excluded by the FirstNormalObjectId filter, as it's created by initdb. If the user drops and recreates it, that means it was created as a user object.

My opinion is that FOR ALL TABLES should replicate all user tables (i.e., anything that has an OID >= FirstNormalObjectId); whether those are added to information_schema is up to the user. We also replicate user-created tables in pg_catalog even though it's a system catalog, so I don't see why information_schema should be filtered at the schema level.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi, finally got to this (multiple emails squashed into one).

On 04/01/17 18:46, Peter Eisentraut wrote:
> Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz:

Merged, thanks.

> In CreateSubscription(), I don't think we should connect to the remote
> if no slot creation is requested. Arguably, the point of that option is
> to not make network connections. (That is what my documentation patch
> above claims, in any case.)

Agreed and done.

> I don't know why we need to check the PostgreSQL version number of the
> remote. We should rely on the protocol version number, and we should
> just make it work. When PG 11 comes around, subscribing from PG 10 to a
> publisher on PG 11 should just work without any warnings, IMO.

Also agreed and removed.

> 0003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz
> looks good now, documentation is clear now.
>
> Another fixup patch to remove excessive includes.

Thanks, merged.

> Comments on 0004-Add-logical-replication-workers-v16.patch.gz:
>
> I didn't find any major problems. At times while I was testing strange
> things it was not clear why "nothing is happening". I'll do some more
> checking in that direction.
>
> Fixup patch attached that enhances some error messages, fixes some
> typos, and other minor changes. See also comments below.

Merged.

> The way check_max_logical_replication_workers() is implemented creates
> potential ordering dependencies in postgresql.conf. For example,
>
> max_logical_replication_workers = 100
> max_worker_processes = 200
>
> fails, but if you change the order, it works. The existing
> check_max_worker_processes() has the same problem, but I suspect because
> it only checks against MAX_BACKENDS, nobody has ever seriously hit that
> limit.
>
> I suggest just removing the check.
> If you set
> max_logical_replication_workers higher than max_worker_processes and you
> hit the lower limit, then whatever is controlling max_worker_processes
> should complain with its own error message.

Good point, removed.

> The default for max_logical_replication_workers is 4, which seems very
> little. Maybe it should be more like 10 or 20. The "Quick setup"
> section recommends changing it to 10. We should at least be
> consistent there: If you set a default value that is not 0, then it
> should be enough that we don't need to change it again in the Quick
> setup. (Maybe the default max_worker_processes should also be
> raised?)

Well, it's 4 because max_worker_processes is 8; I think the default max_worker_processes should be higher than max_logical_replication_workers, so that's why I picked 4. If we are okay with bumping max_worker_processes a bit, I am all for increasing max_logical_replication_workers as well.

The quick setup mentions 10 mainly for consistency with slots and WAL senders (those IMHO should also not be 0 by default at this point...).

> +max_logical_replication_workers = 10 # one per subscription + one per
> instance needed on subscriber
>
> I think this is incorrect (copied from max_worker_processes?). The
> launcher does not count as one of the workers here.
>
> On a related note, should the minimum not be 0 instead of 1?

Eh, yes.

> About the changes to libpqrcv_startstreaming(). The timeline is not
> really an option in the syntax. Just passing in a string that is
> pasted in the final command creates too much coupling, I think. I
> would keep the old timeline (TimeLineID tli) argument, and make the
> options const char * [], and let startstreaming() assemble the final
> string, including commas and parentheses. It's still not a perfect
> abstraction, because you need to do the quoting yourself, but much
> better. (Alternatively, get rid of the startstreaming call and just
> have callers use libpqrcv_PQexec directly.)
I did this somewhat differently, with a struct that defines the options and has different union members for physical and logical replication. What do you think of that?

> Some of the header files are named inconsistently with their .c files.
> I think src/include/replication/logicalworker.h should be split into
> logicalapply.h and logicallauncher.h.

Okay.

> Not sure about
> worker_internal.h. Maybe rename apply.c to worker.c?

Hmm, I did that; seems reasonably okay. The original patch in fact had both worker.c and apply.c, and I eventually moved the worker.c functions to either apply.c or launcher.c.

> (I'm also not fond of throwing publicationcmds.h and
> subscriptioncmds.h together into replicationcmds.h. Maybe that could
> be changed, too.)

Okay.

> Various FATAL errors in logical/relation.c when the target relation is
> not in the right state. Could those not be ERRORs? The behavior is
> the same at the moment because background workers terminate on
> uncaught exceptions, but that should eventually be improved.

Seems like you changed this in your patch. I don't have any objections.

> In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in
> the code using this struct. (Maybe reloid -> localreloid)

Okay.

> Partitioned tables are not supported in either publications or as
> replication targets. This is expected but should be fixed before the
> final release.

Yes, that will need some discussion about corner-case behaviour. For example, you have a partitioned table 'foo' which is in a publication, and a table 'bar' which is not; you attach 'bar' to the partitioned table 'foo': should it automatically be added to the publication? Then you detach it: should it then be removed from the publication? What if 'bar' was in a publication before it was attached/detached to/from 'foo'? What if 'foo' wasn't in a publication but 'bar' was? Should we allow the ONLY syntax for partitioned tables when they are being added and removed?
Sadly, the current partitioning section of the docs doesn't provide any guidance in terms of precedents for other actions here, as it still talks about using inheritance and check constraints directly instead of the new feature.

My proposal would be to let partitions be added to and removed from publications normally (as they are now) and have them also check whether the parent table is published in case they aren't (i.e., if a partitioned table is in some publications, all its partitions are implicitly in them as well without being added to the pg_publication_rel catalog, but they individually keep their own publication memberships there too). That would mean we don't allow the ONLY syntax for partitioned tables.

One scenario where I am on the fence is what should happen if we do ALTER PUBLICATION ... DROP TABLE partitioned_table when that partitioned_table contains a partition which was explicitly added to the publication: should the partition keep its own membership, or should it be removed? Maybe we could allow the ONLY clause only for DROP but not for ADD?

> In apply.c:
>
> The comment in apply_handle_relation() makes a point that the schema
> validation is done later, but does not tell why. The answer is
> probably because it doesn't matter and it's more convenient, but it
> should be explained in the comment.

Yes, I noticed; I tried to explain it.

> See XXX comment in logicalrep_worker_stop().

Yes, that was a good point.

> The get_flush_position() return value is not intuitive from the
> function name. Maybe make that another pointer argument for clarity.

Okay.

> reread_subscription() complains if the subscription name was changed.
> I don't know why that is a problem.

Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe it should be an Assert?

> In launcher.c:
>
> pg_stat_get_subscription should hold LogicalRepWorkerLock around the
> whole loop, so that it doesn't get inconsistent results when workers
> change during the loop.

Done.
> In relation.c:
>
> Inconsistent use of uint32 vs LogicalRepRelId. Pick one.

Done.

Attached is a new version with your changes merged and the above suggestions applied. It still does not support partitioned tables and does the filtering using FirstNormalObjectId.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-01-15 23:20, Petr Jelinek wrote:
> 0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
> 0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
> 0004-Add-logical-replication-workers-v18.patch
> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch

The patches apply OK (to master), but I get this compile error:

execReplication.c: In function ‘ExecSimpleRelationInsert’:
execReplication.c:392:41: warning: passing argument 3 of ‘ExecConstraints’ from incompatible pointer type [-Wincompatible-pointer-types]
   ExecConstraints(resultRelInfo, slot, estate);
                                        ^~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: expected ‘TupleTableSlot * {aka struct TupleTableSlot *}’ but argument is of type ‘EState * {aka struct EState *}’
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
execReplication.c:392:4: error: too few arguments to function ‘ExecConstraints’
   ExecConstraints(resultRelInfo, slot, estate);
   ^~~~~~~~~~~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: declared here
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
execReplication.c: In function ‘ExecSimpleRelationUpdate’:
execReplication.c:451:41: warning: passing argument 3 of ‘ExecConstraints’ from incompatible pointer type [-Wincompatible-pointer-types]
   ExecConstraints(resultRelInfo, slot, estate);
                                        ^~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: expected ‘TupleTableSlot * {aka struct TupleTableSlot *}’ but argument is of type ‘EState * {aka struct EState *}’
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
execReplication.c:451:4: error: too few arguments to function ‘ExecConstraints’
   ExecConstraints(resultRelInfo, slot, estate);
   ^~~~~~~~~~~~~~~
In file included from execReplication.c:21:0:
../../../src/include/executor/executor.h:197:13: note: declared here
 extern void ExecConstraints(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~
make[3]: *** [execReplication.o] Error 1
make[2]: *** [executor-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2

Erik Rijkers
On 15/01/17 23:57, Erik Rijkers wrote:
> On 2017-01-15 23:20, Petr Jelinek wrote:
>
>> 0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
>> 0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
>> 0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
>> 0004-Add-logical-replication-workers-v18.patch
>> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch
>
> patches apply OK (to master), but I get this compile error:

Ah, missed that during the final rebase, sorry. Here is a fixed 0004 patch.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/15/17 2:28 PM, Petr Jelinek wrote:
> Well the preinstalled information_schema is excluded by the
> FirstNormalObjectId filter as it's created by initdb. If user drops and
> recreates it that means it was created as user object.
>
> My opinion is that FOR ALL TABLES should replicate all user tables (ie,
> anything that has Oid >= FirstNormalObjectId), if those are added to
> information_schema that's up to user. We also replicate user created
> tables in pg_catalog even if it's system catalog so I don't see why
> information_schema should be filtered on schema level.

Fair enough.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/15/17 5:20 PM, Petr Jelinek wrote:
> Well, it's 4 because max_worker_processes is 8, I think default
> max_worker_processes should be higher than
> max_logical_replication_workers so that's why I picked 4. If we are okay
> with bumping the max_worker_processes a bit, I am all for increasing
> max_logical_replication_workers as well.
>
> The quick setup mentions 10 mainly for consistency with slots and wal
> senders (those IMHO should also not be 0 by default at this point...).

Those defaults have now been changed, so the "Quick setup" section could potentially be simplified a bit.

> I did this somewhat differently, with struct that defines options and
> has different union members for physical and logical replication. What
> do you think of that?

Looks good.

>> Not sure about
>> worker_internal.h. Maybe rename apply.c to worker.c?
>
> Hmm I did that, seems reasonably okay. Original patch in fact had both
> worker.c and apply.c and I eventually moved the worker.c functions to
> either apply.c or launcher.c.

I'm not too worried about this.

> Yes, that will need some discussion about corner case behaviour. For
> example, have partitioned table 'foo' which is in publication, then you
> have table 'bar' which is not in publication, you attach it to the
> partitioned table 'foo', should it automatically be added to
> publication? Then you detach it, should it then be removed from publication?
> What if 'bar' was in publication before it was attached/detached to/from
> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
> allow ONLY syntax for partitioned table when they are being added and
> removed?

Let's think about that in a separate thread.

>> reread_subscription() complains if the subscription name was changed.
>> I don't know why that is a problem.
>
> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
> be Assert?

Is there anything stopping anyone from implementing it?

I'm happy with these patches now.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/15/17 1:48 PM, Petr Jelinek wrote: > It's meant to decouple the synchronous commit setting for logical > replication workers from the one set for normal clients. Now that we > have owners for subscription and subscription runs as that owner, maybe > we could do that via ALTER USER. I was thinking about that as well. > However I think the apply should by > default run with sync commit turned off as the performance benefits are > important there given that there is one worker that has to replicate in > serialized manner and the success of replication is not confirmed by > responding to COMMIT but by reporting LSNs of various replication stages. Hmm, I don't think we should ship with an "unsafe" default. Do we have any measurements of the performance impact? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17/01/17 17:09, Peter Eisentraut wrote: > >> Yes, that will need some discussion about corner case behaviour. For >> example, have partitioned table 'foo' which is in publication, then you >> have table 'bar' which is not in publication, you attach it to the >> partitioned table 'foo', should it automatically be added to >> publication? Then you detach it, should it then be removed from publication? >> What if 'bar' was in publication before it was attached/detached to/from >> 'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we >> allow ONLY syntax for partitioned table when they are being added and >> removed? > > Let's think about that in a separate thread. > Agreed. >>> reread_subscription() complains if the subscription name was changed. >>> I don't know why that is a problem. >> >> Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should >> be Assert? > > Is there anything stopping anyone from implementing it? > No, just didn't seem priority for the functionality right now. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 17/01/17 17:11, Peter Eisentraut wrote: > On 1/15/17 1:48 PM, Petr Jelinek wrote: >> It's meant to decouple the synchronous commit setting for logical >> replication workers from the one set for normal clients. Now that we >> have owners for subscription and subscription runs as that owner, maybe >> we could do that via ALTER USER. > > I was thinking about that as well. > >> However I think the apply should by >> default run with sync commit turned off as the performance benefits are >> important there given that there is one worker that has to replicate in >> serialized manner and the success of replication is not confirmed by >> responding to COMMIT but by reporting LSNs of various replication stages. > > Hmm, I don't think we should ship with an "unsafe" default. Do we have > any measurements of the performance impact? > I will have to do some for the patch specifically, I only have ones for pglogical/bdr where it's quite significant. The default is not unsafe really, we still report the correct flush position to the publisher. Synchronous replication on the publisher will still work even if the synchronous standby is a subscription which itself has sync commit off (that's why the complicated send_feedback()/get_flush_position() logic), but it will have higher latency as flushes don't happen immediately. Cascading should be fine as well, even around crashes, as logical decoding only picks up flushed WAL. It could, however, be argued that there may be some consistency issues around a crash, as other transactions could have already seen data that disappeared after postgres recovery and then reappeared when replication caught up again. That might indeed be a showstopper for the default off. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 17, 2017 at 11:15 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> Is there anything stopping anyone from implementing it? > > No, just didn't seem priority for the functionality right now. Why is it OK for this to not support rename like everything else does? It shouldn't be more than a few hours of work to fix that, and I think leaving stuff like that out just because it's a lower priority is fairly short-sighted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17/01/17 22:43, Robert Haas wrote: > On Tue, Jan 17, 2017 at 11:15 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >>> Is there anything stopping anyone from implementing it? >> >> No, just didn't seem priority for the functionality right now. > > Why is it OK for this to not support rename like everything else does? > It shouldn't be more than a few hours of work to fix that, and I > think leaving stuff like that out just because it's a lower priority > is fairly short-sighted. > Sigh, I wanted to leave it for next CF, but since you insist. Here is a patch that adds rename. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
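[Editor's note: for readers following along, the rename patch presumably follows the grammar of the existing ALTER ... RENAME commands; usage would then look like the sketch below. The syntax is assumed from that convention, not taken verbatim from the patch.]

```sql
-- on the publisher
ALTER PUBLICATION pub1 RENAME TO pub1_new;

-- on the subscriber
ALTER SUBSCRIPTION sub1 RENAME TO sub1_new;
```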
Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340
From: Erik Rijkers
On 2017-01-19 01:02, Petr Jelinek wrote:

This causes the replica to crash:

#--------------
#!/bin/bash
# 2 instances on 6972 (master) and 6973 (replica)
# initially without publication or subscription

# clean logs
#echo > /var/data1/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
#echo > /var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2

SLEEP=1
bail=0

pub_count=$( echo "select count(*) from pg_publication" | psql -qtAXp 6972 )
if [[ $pub_count -ne 0 ]]
then
  echo "pub_count -ne 0 - deleting pub1 & bailing out"
  echo "drop publication if exists pub1" | psql -Xp 6972
  bail=1
fi

sub_count=$( echo "select count(*) from pg_subscription" | psql -qtAXp 6973 )
if [[ $sub_count -ne 0 ]]
then
  echo "sub_count -ne 0 - deleting sub1 & bailing out"
  echo "drop subscription if exists sub1" | psql -Xp 6973
  bail=1
fi

if [[ $bail -eq 1 ]]
then
  exit -1
fi

echo "drop table if exists testt;" | psql -qXap 6972
echo "drop table if exists testt;" | psql -qXap 6973

echo "-- on master (port 6972):
create table testt(id serial primary key, n integer, c text);
create publication pub1 for all tables;
" | psql -qXap 6972

echo "-- on replica (port 6973):
create table testt(id serial primary key, n integer, c text);
create subscription sub1 connection 'port=6972' publication pub1 with (disabled);
alter subscription sub1 enable;
" | psql -qXap 6973

sleep $SLEEP

echo "table testt /*limit 3*/; select current_setting('port'), count(*) from testt;" | psql -qXp 6972
echo "table testt /*limit 3*/; select current_setting('port'), count(*) from testt;" | psql -qXp 6973

echo "-- now crash:
analyze pg_subscription" | psql -qXp 6973
#--------------

-- log of the replica:

2017-01-19 17:54:09.163 CET 224200 LOG: starting logical replication worker for subscription "sub1"
2017-01-19 17:54:09.166 CET 21166 LOG: logical replication apply for subscription sub1 started
2017-01-19 17:54:09.169 CET 21166 LOG: starting logical replication worker for subscription "sub1"
2017-01-19 17:54:09.172 CET 21171 LOG: logical replication sync for subscription sub1, table testt started
2017-01-19 17:54:09.190 CET 21171 LOG: logical replication synchronization worker finished processing
TRAP: FailedAssertion("!(((array)->elemtype) == extra_data->type_id)", File: "array_typanalyze.c", Line: 340)
2017-01-19 17:54:20.110 CET 224190 LOG: server process (PID 21183) was terminated by signal 6: Aborted
2017-01-19 17:54:20.110 CET 224190 DETAIL: Failed process was running: autovacuum: ANALYZE pg_catalog.pg_subscription
2017-01-19 17:54:20.110 CET 224190 LOG: terminating any other active server processes
2017-01-19 17:54:20.110 CET 224198 WARNING: terminating connection because of crash of another server process
2017-01-19 17:54:20.110 CET 224198 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-01-19 17:54:20.110 CET 224198 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-01-19 17:54:20.111 CET 224190 LOG: all server processes terminated; reinitializing
2017-01-19 17:54:20.143 CET 21184 LOG: database system was interrupted; last known up at 2017-01-19 17:38:48 CET
2017-01-19 17:54:20.179 CET 21184 LOG: recovered replication state of node 1 to 0/2CEBF08
2017-01-19 17:54:20.179 CET 21184 LOG: database system was not properly shut down; automatic recovery in progress
2017-01-19 17:54:20.181 CET 21184 LOG: redo starts at 0/2513E88
2017-01-19 17:54:20.184 CET 21184 LOG: invalid record length at 0/2546980: wanted 24, got 0
2017-01-19 17:54:20.184 CET 21184 LOG: redo done at 0/2546918
2017-01-19 17:54:20.184 CET 21184 LOG: last completed transaction was at log time 2017-01-19 17:54:09.191697+01
2017-01-19 17:54:20.191 CET 21184 LOG: MultiXact member wraparound protections are now enabled
2017-01-19 17:54:20.193 CET 224190 LOG: database system is ready to accept connections
2017-01-19 17:54:20.193 CET 21188 LOG: autovacuum launcher started
2017-01-19 17:54:20.194 CET 21190 LOG: logical replication launcher started
2017-01-19 17:54:20.194 CET 21190 LOG: starting logical replication worker for subscription "sub1"
2017-01-19 17:54:20.202 CET 21191 LOG: logical replication apply for subscription sub1 started

Could probably be whittled down to something shorter but I hope it's still easily reproduced.
thanks,
Erik Rijkers

setup of the 2 instances:

#---------------- ./instances.sh
#!/bin/bash
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
# pg_stuff_dir=$HOME/pg_stuff
pg_stuff_dir=/var/data1/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
 -c wal_level=logical
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir1
 -c log_filename=logfile.${project1}"
options2="
 -c wal_level=replica
 -c max_replication_slots=10
 -c max_worker_processes=12
 -c max_logical_replication_workers=10
 -c max_wal_senders=10
 -c logging_collector=on
 -c log_directory=$server_dir2
 -c log_filename=logfile.${project2}"
which postgres
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
#---------------- ./instances.sh end
Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340
From: Petr Jelinek
On 19/01/17 18:44, Erik Rijkers wrote: > > Could probably be whittled down to something shorter but I hope it's > still easily reproduced. > Just analyze on the pg_subscription is enough. Looks like it's the name[] type; when I change it to text[] like in the attached patch, it works fine for me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340
From: Erik Rijkers
On 2017-01-19 19:12, Petr Jelinek wrote:
> On 19/01/17 18:44, Erik Rijkers wrote:
>>
>> Could probably be whittled down to something shorter but I hope it's
>> still easily reproduced.
>>
>
> Just analyze on the pg_subscription is enough.

heh. Ah well, I did find it :)

Can you give the current patch set? I am failing to get a compilable set. In the following order they apply, but then fail during compile:

0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v3.patch
pg_subscription-analyze-fix.diff

The compile fails with:

In file included from ../../../../src/include/postgres.h:47:0,
                 from worker.c:27:
worker.c: In function ‘create_estate_for_relation’:
../../../../src/include/c.h:203:14: warning: passing argument 4 of ‘InitResultRelInfo’ makes pointer from integer without a cast [-Wint-conversion]
 #define true ((bool) 1)
              ^
worker.c:187:53: note: in expansion of macro ‘true’
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
                                                     ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected ‘Relation {aka struct RelationData *}’ but argument is of type ‘char’
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
worker.c:187:59: warning: passing argument 5 of ‘InitResultRelInfo’ makes integer from pointer without a cast [-Wint-conversion]
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
                                                           ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected ‘int’ but argument is of type ‘void *’
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
worker.c:187:2: error: too many arguments to function ‘InitResultRelInfo’
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
  ^~~~~~~~~~~~~~~~~
In file included from ../../../../src/include/funcapi.h:21:0,
                 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: declared here
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
             ^~~~~~~~~~~~~~~~~
make[4]: *** [worker.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [logical-recursive] Error 2
make[2]: *** [replication-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

but perhaps that patchset itself is incorrect, or the order in which I applied them. Can you please put them in the right order? (I tried already a few...)

thanks,
Erik Rijkers
Hi, There were some conflicting changes committed today so I rebased the patch on top of them. Other than that nothing much has changed, I removed the separate sync commit patch, included the rename patch in the patchset and fixed the bug around pg_subscription catalog reported by Erik Rijkers. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 1/19/17 5:01 PM, Petr Jelinek wrote: > There were some conflicting changes committed today so I rebased the > patch on top of them. > > Other than that nothing much has changed, I removed the separate sync > commit patch, included the rename patch in the patchset and fixed the > bug around pg_subscription catalog reported by Erik Rijkers. Committed. I haven't reviewed the rename patch yet, so I'll get back to that later. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/19/17 5:01 PM, Petr Jelinek wrote: >> There were some conflicting changes committed today so I rebased the >> patch on top of them. >> >> Other than that nothing much has changed, I removed the separate sync >> commit patch, included the rename patch in the patchset and fixed the >> bug around pg_subscription catalog reported by Erik Rijkers. > > Committed. Sorry I've not followed the discussion about logical replication at all, but why does the logical replication launcher need to start up by default?

$ initdb -D data
$ pg_ctl -D data start

When I ran the above commands, I got the following message and found that the bgworker for the logical replication launcher was running.

LOG: logical replication launcher started

Regards, -- Fujii Masao
On 20/01/17 15:08, Peter Eisentraut wrote: > On 1/19/17 5:01 PM, Petr Jelinek wrote: >> There were some conflicting changes committed today so I rebased the >> patch on top of them. >> >> Other than that nothing much has changed, I removed the separate sync >> commit patch, included the rename patch in the patchset and fixed the >> bug around pg_subscription catalog reported by Erik Rijkers. > > Committed. I haven't reviewed the rename patch yet, so I'll get back to > that later. > Hi, Thanks! Here is a fix for the dependency mess. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20/01/17 17:05, Fujii Masao wrote: > On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>> There were some conflicting changes committed today so I rebased the >>> patch on top of them. >>> >>> Other than that nothing much has changed, I removed the separate sync >>> commit patch, included the rename patch in the patchset and fixed the >>> bug around pg_subscription catalog reported by Erik Rijkers. >> >> Committed. > > Sorry I've not followed the discussion about logical replication at all, but > why does logical replication launcher need to start up by default? > Because running subscriptions is allowed by default. You'd need to set max_logical_replication_workers to 0 to disable that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/01/17 17:05, Fujii Masao wrote: >> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>>> There were some conflicting changes committed today so I rebased the >>>> patch on top of them. >>>> >>>> Other than that nothing much has changed, I removed the separate sync >>>> commit patch, included the rename patch in the patchset and fixed the >>>> bug around pg_subscription catalog reported by Erik Rijkers. >>> >>> Committed. >> >> Sorry I've not followed the discussion about logical replication at all, but >> why does logical replication launcher need to start up by default? >> > > Because running subscriptions is allowed by default. You'd need to set > max_logical_replication_workers to 0 to disable that. > surely wal_level < logical shouldn't start a logical replication launcher, and after an initdb wal_level is only replica -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20/01/17 17:33, Jaime Casanova wrote: > On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >> On 20/01/17 17:05, Fujii Masao wrote: >>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut >>> <peter.eisentraut@2ndquadrant.com> wrote: >>>> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>>>> There were some conflicting changes committed today so I rebased the >>>>> patch on top of them. >>>>> >>>>> Other than that nothing much has changed, I removed the separate sync >>>>> commit patch, included the rename patch in the patchset and fixed the >>>>> bug around pg_subscription catalog reported by Erik Rijkers. >>>> >>>> Committed. >>> >>> Sorry I've not followed the discussion about logical replication at all, but >>> why does logical replication launcher need to start up by default? >>> >> >> Because running subscriptions is allowed by default. You'd need to set >> max_logical_replication_workers to 0 to disable that. >> > > surely wal_level < logical shouldn't start a logical replication > launcher, and after an initdb wal_level is only replica > Launcher is needed for subscriptions, subscriptions don't depend on wal_level. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 20 January 2017 at 11:39, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/01/17 17:33, Jaime Casanova wrote: >>> >> >> surely wal_level < logical shouldn't start a logical replication >> launcher, and after an initdb wal_level is only replica >> > > Launcher is needed for subscriptions, subscriptions don't depend on > wal_level. > mmm... ok, i need to read a little then. thanks -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 20, 2017 at 11:39 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > Launcher is needed for subscriptions, subscriptions don't depend on > wal_level. I don't see how a subscription can do anything useful with wal_level < logical? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 21 Jan. 2017 06:48, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Fri, Jan 20, 2017 at 11:39 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> Launcher is needed for subscriptions, subscriptions don't depend on
>> wal_level.
>
> I don't see how a subscription can do anything useful with wal_level < logical?

The upstream must have it set to logical so we can decode the change stream.

The downstream need not. It's an independent instance.
On Fri, Jan 20, 2017 at 2:57 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > > I don't see how a subscription can do anything useful with wal_level < > > logical? > > The upstream must have it set to logical so we can decode the change stream. > > The downstream need not. It's an independent instance. /me facepalms. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
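[Editor's note: to summarize the sub-thread, wal_level = logical is purely an upstream requirement, while the launcher and apply workers are a downstream concern. A minimal sketch of the two configurations, using limits mentioned elsewhere in this thread (values illustrative, not recommendations):]

```
# publisher (upstream) postgresql.conf -- logical decoding requires logical WAL
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10        # the subscription's slot lives here

# subscriber (downstream) postgresql.conf -- wal_level may stay at 'replica'
max_worker_processes = 8
max_logical_replication_workers = 4
max_replication_slots = 10        # on this side it also caps replication origins
```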
On 20/01/17 17:23, Petr Jelinek wrote: > On 20/01/17 15:08, Peter Eisentraut wrote: >> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>> There were some conflicting changes committed today so I rebased the >>> patch on top of them. >>> >>> Other than that nothing much has changed, I removed the separate sync >>> commit patch, included the rename patch in the patchset and fixed the >>> bug around pg_subscription catalog reported by Erik Rijkers. >> >> Committed. I haven't reviewed the rename patch yet, so I'll get back to >> that later. >> > > Hi, > > Thanks! > > Here is a fix for the dependency mess.

Álvaro pointed out off list a couple of issues with how we handle interruption of commands that connect to the walsender.

a) The libpqwalreceiver.c does a blocking connect, so it's impossible to cancel a CREATE SUBSCRIPTION which is stuck on connect. This is btw a preexisting problem and applies to the walreceiver as well. I rewrote the connect function to use the asynchronous API (patch 0001).

b) We can cancel in the middle of the command (when stuck in libpqrcv_PQexec), but the connection to the walsender stays open, which in case we are waiting for a snapshot can mean that it will stay idle in transaction. I added a PG_TRY wrapper which disconnects on error around this (patch 0002).

And finally, while testing these two I found a bug in walsender StringInfo initialization (or lack thereof). There are 3 static StringInfo buffers that are initialized in WalSndLoop. The problem with that is that they can in some rare scenarios be used from CreateReplicationSlot (and IMHO StartLogicalReplication) before WalSndLoop is called, which causes a segfault of the walsender. This is rare because it only happens when the downstream closes the connection during logical decoding initialization.
Since it's not exactly straightforward to find when these need to be initialized based on commands, I decided to move the initialization code to exec_replication_command() since that's always called before anything so that makes it much less error prone (patch 0003). The 0003 should be backpatched all the way to 9.4 where multiple commands started using those buffers. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
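[Editor's note: the shape of the 0003 fix is "initialize the shared buffers once at the common entry point instead of in one particular loop". A standalone sketch of that pattern in plain C (a toy char buffer stands in for Postgres' StringInfo; all names here are invented for illustration). An alternative, suggested later in the thread, is to drop the globals and give each function its own lazily initialized buffer.]

```c
#include <string.h>

/* Toy stand-in for walsender's static StringInfo reply buffer.  In the
 * reported bug, the buffer was only initialized in the main loop
 * (WalSndLoop), so commands that ran before entering the loop
 * dereferenced an uninitialized buffer. */
static char *reply_buf = NULL;

/* Initialize the buffer exactly once, at the single entry point every
 * command goes through (the analogue of exec_replication_command()). */
static void ensure_buffers(void)
{
    static char storage[256];

    if (reply_buf == NULL)
    {
        reply_buf = storage;
        reply_buf[0] = '\0';
    }
}

/* Any command handler can now rely on the buffer being valid,
 * regardless of which command the client sends first. */
int handle_command(const char *cmd)
{
    ensure_buffers();
    strncpy(reply_buf, cmd, 255);
    reply_buf[255] = '\0';
    return (int) strlen(reply_buf);
}
```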
On 20/01/17 22:30, Petr Jelinek wrote: > Since it's not exactly straightforward to find when these need to be > initialized based on commands, I decided to move the initialization code > to exec_replication_command() since that's always called before anything > so that makes it much less error prone (patch 0003). > > The 0003 should be backpatched all the way to 9.4 where multiple > commands started using those buffers.

Actually there is a better place, the WalSndInit().

Just to make it easier for PeterE (or whichever committer picks this up) I attached all the logical replication followup fix/polish patches:

0001 - Changes the libpqrcv_connect to use the async libpq API so that it won't get stuck forever in case the connect is stuck. This is a preexisting bug that also affects the walreceiver, but it's less visible there as there is no SQL interface to initiate a connection there.

0002 - Close replication connection when CREATE SUBSCRIPTION gets canceled (otherwise walsender on the other side may stay in idle in transaction state).

0003 - Fixes buffer initialization in walsender that I found when testing the above two. This one should be back-patched to 9.4 since it's broken since then.

0004 - Fixes the foreign key issue reported by Thom Brown and also adds tests for FK and trigger handling.

0005 - Adds support for renaming publications and subscriptions.

All rebased on top of current master (90992e0).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
- 0002-Close-replication-connection-when-slot-creation-gets.patch
- 0003-Always-initialize-stringinfo-buffers-in-walsender.patch
- 0004-Fix-after-trigger-execution-in-logical-replication.patch
- 0005-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
On 23 January 2017 at 01:11, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/01/17 22:30, Petr Jelinek wrote: >> Since it's not exactly straight forward to find when these need to be >> initialized based on commands, I decided to move the initialization code >> to exec_replication_command() since that's always called before anything >> so that makes it much less error prone (patch 0003). >> >> The 0003 should be backpatched all the way to 9.4 where multiple >> commands started using those buffers. >> > > Actually there is better place, the WalSndInit(). > > Just to make it easier for PeterE (or whichever committer picks this up) > I attached all the logical replication followup fix/polish patches: > > 0001 - Changes the libpqrcv_connect to use async libpq api so that it > won't get stuck forever in case of connect is stuck. This is preexisting > bug that also affects walreceiver but it's less visible there as there > is no SQL interface to initiate connection there. > > 0002 - Close replication connection when CREATE SUBSCRIPTION gets > canceled (otherwise walsender on the other side may stay in idle in > transaction state). > > 0003 - Fixes buffer initialization in walsender that I found when > testing the above two. This one should be back-patched to 9.4 since it's > broken since then. > > 0004 - Fixes the foreign key issue reported by Thom Brown and also adds > tests for FK and trigger handling. This fixes the problem for me. Thanks. > > 0005 - Adds support for renaming publications and subscriptions. Works for me. I haven't tested the first 3. Regards Thom
On Sat, Jan 21, 2017 at 1:39 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 20/01/17 17:33, Jaime Casanova wrote: >> On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >>> On 20/01/17 17:05, Fujii Masao wrote: >>>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut >>>> <peter.eisentraut@2ndquadrant.com> wrote: >>>>> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>>>>> There were some conflicting changes committed today so I rebased the >>>>>> patch on top of them. >>>>>> >>>>>> Other than that nothing much has changed, I removed the separate sync >>>>>> commit patch, included the rename patch in the patchset and fixed the >>>>>> bug around pg_subscription catalog reported by Erik Rijkers. >>>>> >>>>> Committed. >>>> >>>> Sorry I've not followed the discussion about logical replication at all, but >>>> why does logical replication launcher need to start up by default? >>>> >>> >>> Because running subscriptions is allowed by default. You'd need to set >>> max_logical_replication_workers to 0 to disable that. >>> >> >> surely wal_level < logical shouldn't start a logical replication >> launcher, and after an initdb wal_level is only replica >> > > Launcher is needed for subscriptions, subscriptions don't depend on > wal_level. But why did you enable only subscription by default while publication is disabled by default (i.e., wal_level != logical)? I think that it's better to enable both by default OR disable both by default.

While I was reading the logical rep code, I found that logicalrep_worker_launch returns *without* releasing LogicalRepWorkerLock when there is no unused worker slot. This seems a bug.

    /* Report this after the initial starting message for consistency. */
    if (max_replication_slots == 0)
        ereport(ERROR,
                (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
                 errmsg("cannot start logical replication workers when max_replication_slots = 0")));

logicalrep_worker_launch checks max_replication_slots as above.
Why does it need to check that setting value in the *subscriber* side? Maybe I'm missing something here, but ISTM that the subscription uses one replication slot in *publisher* side but doesn't use in *subscriber* side. * The apply worker may spawn additional workers (sync) for initial data * synchronization of tables. The above header comment in logical/worker.c is true? The copyright in each file that the commit of logical rep added needs to be updated. Regards, -- Fujii Masao
On 23/01/17 17:19, Fujii Masao wrote: > On Sat, Jan 21, 2017 at 1:39 AM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> On 20/01/17 17:33, Jaime Casanova wrote: >>> On 20 January 2017 at 11:25, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: >>>> On 20/01/17 17:05, Fujii Masao wrote: >>>>> On Fri, Jan 20, 2017 at 11:08 PM, Peter Eisentraut >>>>> <peter.eisentraut@2ndquadrant.com> wrote: >>>>>> On 1/19/17 5:01 PM, Petr Jelinek wrote: >>>>>>> There were some conflicting changes committed today so I rebased the >>>>>>> patch on top of them. >>>>>>> >>>>>>> Other than that nothing much has changed, I removed the separate sync >>>>>>> commit patch, included the rename patch in the patchset and fixed the >>>>>>> bug around pg_subscription catalog reported by Erik Rijkers. >>>>>> >>>>>> Committed. >>>>> >>>>> Sorry I've not followed the discussion about logical replication at all, but >>>>> why does logical replication launcher need to start up by default? >>>>> >>>> >>>> Because running subscriptions is allowed by default. You'd need to set >>>> max_logical_replication_workers to 0 to disable that. >>>> >>> >>> surely wal_level < logical shouldn't start a logical replication >>> launcher, and after an initdb wal_level is only replica >>> >> >> Launcher is needed for subscriptions, subscriptions don't depend on >> wal_level. > > But why did you enable only subscription by default while publication is > disabled by default (i.e., wal_level != logical)? I think that it's better to > enable both by default OR disable both by default. > That depends; making wal_level = logical the default was deemed not worth the potential overhead in the thread about the wal_level default. There is no such overhead associated with enabling subscriptions; one could say that it's less work this way to set up the whole thing. But I guess it's up for debate.
> While I was reading the logical rep code, I found that
> logicalrep_worker_launch returns *without* releasing LogicalRepWorkerLock
> when there is no unused worker slot. This seems a bug.

True, fix attached.

>     /* Report this after the initial starting message for consistency. */
>     if (max_replication_slots == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
>                  errmsg("cannot start logical replication workers when
>                         max_replication_slots = 0")));
>
> logicalrep_worker_launch checks max_replication_slots as above.
> Why does it need to check that setting value in the *subscriber* side?
> Maybe I'm missing something here, but ISTM that the subscription uses
> one replication slot in *publisher* side but doesn't use in *subscriber* side.

Because replication origins are also limited by max_replication_slots, and
they are required for subscriptions to work (I am not quite sure why that's
the case, I guess we wanted to save a GUC).

> * The apply worker may spawn additional workers (sync) for initial data
> * synchronization of tables.
>
> The above header comment in logical/worker.c is true?

Hmm, not yet; there is a separate patch for it in the CF. I guess it fell
through the cracks while rebasing.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 1/22/17 8:11 PM, Petr Jelinek wrote:
> 0001 - Changes the libpqrcv_connect to use the async libpq API so that it
> won't get stuck forever in case the connect is stuck. This is a
> preexisting bug that also affects walreceiver, but it's less visible
> there as there is no SQL interface to initiate a connection.

Probably a mistake here:

+            case PGRES_POLLING_READING:
+                extra_flag = WL_SOCKET_READABLE;
+                /* pass through */
+            case PGRES_POLLING_WRITING:
+                extra_flag = WL_SOCKET_WRITEABLE;

extra_flag gets overwritten in the reading case.

Please elaborate in the commit message what this change is for.

> 0002 - Close the replication connection when CREATE SUBSCRIPTION gets
> canceled (otherwise the walsender on the other side may stay in the
> "idle in transaction" state).

committed

> 0003 - Fixes buffer initialization in walsender that I found when
> testing the above two. This one should be back-patched to 9.4 since it's
> been broken since then.

Can you explain more in which code path this problem occurs?

I think we should get rid of the global variables and give each function
its own buffer that it initializes the first time through. Otherwise
we'll keep having to worry about this.

> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
> tests for FK and trigger handling.

I think the trigger handling should go into execReplication.c.

> 0005 - Adds support for renaming publications and subscriptions.

Could those not be handled in the generic rename support in
ExecRenameStmt()?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/23/17 11:19 AM, Fujii Masao wrote:
> The copyright in each file that the commit of logical rep added needs to
> be updated.

I have fixed that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25/01/17 18:16, Peter Eisentraut wrote:
> On 1/22/17 8:11 PM, Petr Jelinek wrote:
>> 0001 - Changes the libpqrcv_connect to use async libpq api so that it
>> won't get stuck forever in case of connect is stuck. This is preexisting
>> bug that also affects walreceiver but it's less visible there as there
>> is no SQL interface to initiate connection there.
>
> Probably a mistake here:
>
> +            case PGRES_POLLING_READING:
> +                extra_flag = WL_SOCKET_READABLE;
> +                /* pass through */
> +            case PGRES_POLLING_WRITING:
> +                extra_flag = WL_SOCKET_WRITEABLE;
>
> extra_flag gets overwritten in the reading case.

Eh, reworked that into just an if statement, as the switch does not really
buy us anything there.

> Please elaborate in the commit message what this change is for.

Okay.

>> 0002 - Close replication connection when CREATE SUBSCRIPTION gets
>> canceled (otherwise walsender on the other side may stay in idle in
>> transaction state).
>
> committed

Thanks!

>> 0003 - Fixes buffer initialization in walsender that I found when
>> testing the above two. This one should be back-patched to 9.4 since it's
>> broken since then.
>
> Can you explain more in which code path this problem occurs?

With the existing code base, anything that calls WalSndWaitForWal (it calls
ProcessRepliesIfAny()), which is called from logical_read_xlog_page, which
is given as a callback to logical decoding in CreateReplicationSlot and
StartLogicalReplication.

The reason why I decided to put it into init is that following all the
paths to where the buffers are used is rather complicated due to the
various callbacks, so if anybody else starts poking around in the future it
might easily get broken again if we don't initialize those unconditionally
(plus the memory footprint is a few kB, and in the usual use of WalSender
they will eventually be initialized anyway as they are needed for
streaming).

> I think we should get rid of the global variables and give each function
> its own buffer that it initializes the first time through. Otherwise
> we'll keep having to worry about this.

Because of the above, it would mean some refactoring in the logical
decoding APIs, not just in WalSender, so that would not be backpatchable
(and in general it's a much bigger patch then).

>> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
>> tests for FK and trigger handling.
>
> I think the trigger handling should go into execReplication.c.

Not in the current state. Eventually (and I am afraid that's PG11 material
at this point, as we still have partitioned table support and initial data
copy to finish in this release) we'll want to move all the executor state
code to execReplication.c and do less reinitialization, but in the current
code the trigger stuff belongs to the worker IMHO.

>> 0005 - Adds support for renaming publications and subscriptions.
>
> Could those not be handled in the generic rename support in
> ExecRenameStmt()?

Yes, it seems they can. Attached updated version of the uncommitted
patches.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

I updated these patches for current HEAD and removed the string
initialization in walsender, as Fujii Masao committed a similar fix in the
meantime.

I also found a typo/thinko in the first patch, which is now fixed.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 22/02/17 12:24, Petr Jelinek wrote:
> Hi,
>
> I updated these patches for current HEAD and removed the string
> initialization in walsender as Fujii Masao committed similar fix in
> meantime.
>
> I also found typo/thinko in the first patch which is now fixed.

And of course I missed the xlog->wal rename, sigh. Fixed.

-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2/22/17 07:00, Petr Jelinek wrote:
> On 22/02/17 12:24, Petr Jelinek wrote:
>> Hi,
>>
>> I updated these patches for current HEAD and removed the string
>> initialization in walsender as Fujii Masao committed similar fix in
>> meantime.
>>
>> I also found typo/thinko in the first patch which is now fixed.
>
> And of course I missed the xlog->wal rename, sigh. Fixed.

all three committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services