Thread: adding partitioned tables to publications
One cannot currently add partitioned tables to a publication. create table p (a int, b int) partition by hash (a); create table p1 partition of p for values with (modulus 3, remainder 0); create table p2 partition of p for values with (modulus 3, remainder 1); create table p3 partition of p for values with (modulus 3, remainder 2); create publication publish_p for table p; ERROR: "p" is a partitioned table DETAIL: Adding partitioned tables to publications is not supported. HINT: You can add the table partitions individually. One can do this instead: create publication publish_p1 for table p1; create publication publish_p2 for table p2; create publication publish_p3 for table p3; but maybe that's too much code to maintain for users. I propose that we make this command: create publication publish_p for table p; automatically add all the partitions to the publication. Also, any future partitions should also be automatically added to the publication. So, publishing a partitioned table automatically publishes all of its existing and future partitions. Attached patch implements that. What doesn't change with this patch is that the partitions on the subscription side still have to match one-to-one with the partitions on the publication side, because the changes are still replicated as being made to the individual partitions, not as the changes to the root partitioned table. It might be useful to implement that functionality on the publication side, because it allows users to define the replication target any way they need to, but this patch doesn't implement that. Thanks, Amit
Attachment
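For illustration, a minimal sketch of the behavior the proposal describes, with the patch applied; querying pg_publication_tables is just one way to observe the effect, and the exact view contents depend on the patch version:

-- with the proposed patch applied, publishing p publishes its partitions
create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create publication publish_p for table p;

-- the existing partitions are now part of the publication
select * from pg_publication_tables where pubname = 'publish_p';

-- a partition created later is picked up automatically as well
create table p3 partition of p for values with (modulus 3, remainder 2);
select * from pg_publication_tables where pubname = 'publish_p';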
On Mon, Oct 7, 2019 at 9:55 AM Amit Langote <amitlangote09@gmail.com> wrote: > One cannot currently add partitioned tables to a publication. > > create table p (a int, b int) partition by hash (a); > create table p1 partition of p for values with (modulus 3, remainder 0); > create table p2 partition of p for values with (modulus 3, remainder 1); > create table p3 partition of p for values with (modulus 3, remainder 2); > > create publication publish_p for table p; > ERROR: "p" is a partitioned table > DETAIL: Adding partitioned tables to publications is not supported. > HINT: You can add the table partitions individually. > > One can do this instead: > > create publication publish_p1 for table p1; > create publication publish_p2 for table p2; > create publication publish_p3 for table p3; > > but maybe that's too much code to maintain for users. > > I propose that we make this command: > > create publication publish_p for table p; > > automatically add all the partitions to the publication. Also, any > future partitions should also be automatically added to the > publication. So, publishing a partitioned table automatically > publishes all of its existing and future partitions. Attached patch > implements that. > > What doesn't change with this patch is that the partitions on the > subscription side still have to match one-to-one with the partitions > on the publication side, because the changes are still replicated as > being made to the individual partitions, not as the changes to the > root partitioned table. It might be useful to implement that > functionality on the publication side, because it allows users to > define the replication target any way they need to, but this patch > doesn't implement that. Added this to the next CF: https://commitfest.postgresql.org/25/2301/ Thanks, Amit
On Thu, 10 Oct 2019 at 08:29, Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Oct 7, 2019 at 9:55 AM Amit Langote <amitlangote09@gmail.com> wrote:
> One cannot currently add partitioned tables to a publication.
>
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
>
> create publication publish_p for table p;
> ERROR: "p" is a partitioned table
> DETAIL: Adding partitioned tables to publications is not supported.
> HINT: You can add the table partitions individually.
>
> One can do this instead:
>
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;
>
> but maybe that's too much code to maintain for users.
>
> I propose that we make this command:
>
> create publication publish_p for table p;
>
> automatically add all the partitions to the publication. Also, any
> future partitions should also be automatically added to the
> publication. So, publishing a partitioned table automatically
> publishes all of its existing and future partitions. Attached patch
> implements that.
>
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table. It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.
Added this to the next CF: https://commitfest.postgresql.org/25/2301/
Hi Amit,
Lately I have been exploring the logical replication feature of PostgreSQL, and I found this addition to its scope for partitioned tables a useful one.
In order to understand the working of your patch a bit more, I performed an experiment: I created a partitioned table with several children and a default partition on the publisher side, and normal tables with the same names as the publisher's parent, children, and default partition on the subscriber side. Next I established the logical replication connection, and to my surprise the data was successfully replicated from the partitioned table to the normal tables; then this error filled the logs,
LOG: logical replication table synchronization worker for subscription "my_subscription", table "parent" has started
ERROR: table "public.parent" not found on publisher
Here, parent is the name of the partitioned table on the publisher side, and it is present as a normal table on the subscriber side as well. Which is understandable: it is trying to find a normal table of the same name but couldn't find one; maybe it should not worry about that at this stage either, if it does not at replication time.
Please let me know if this is expected, because in my opinion it is not desirable; there should be some check on the table type for replication. This wasn't important until now, maybe because only normal tables were to be replicated, but with the extension of the scope of logical replication to more objects such checks would be helpful.
On a separate note, I was thinking: for partitioned tables, wouldn't it be cleaner to have something where you create only the partitioned table at the subscriber, and then when logical replication starts it creates the child tables accordingly? Or would that be too much for the future...?
Regards,
Rafia Sabih
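For readers who want to reproduce the scenario described above, a minimal sketch follows; the object and connection names are hypothetical and not taken from Rafia's actual test:

-- publisher: a partitioned table with children and a default partition
create table parent (a int) partition by range (a);
create table child1 partition of parent for values from (0) to (10);
create table child_def partition of parent default;
create publication my_publication for table parent;

-- subscriber: plain (non-partitioned) tables with the same names
create table parent (a int);
create table child1 (a int);
create table child_def (a int);
create subscription my_subscription
  connection 'host=... port=... dbname=...' publication my_publication;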
Hello Rafia, Great to hear that you are interested in this feature and thanks for testing the patch. On Thu, Oct 10, 2019 at 10:13 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote: > Lately I have been exploring the logical replication feature of PostgreSQL, and I found this addition to its scope for partitioned tables a useful one. > > In order to understand the working of your patch a bit more, I performed an experiment: I created a partitioned table with several children and a default partition on the publisher side, and normal tables with the same names as the publisher's parent, children, and default partition on the subscriber side. Next I established the logical replication connection, and to my surprise the data was successfully replicated from the partitioned table to the normal tables; then this error filled the logs, > LOG: logical replication table synchronization worker for subscription "my_subscription", table "parent" has started > ERROR: table "public.parent" not found on publisher > > Here, parent is the name of the partitioned table on the publisher side, and it is present as a normal table on the subscriber side as well. Which is understandable: it is trying to find a normal table of the same name but couldn't find one; maybe it should not worry about that at this stage either, if it does not at replication time. > > Please let me know if this is expected, because in my opinion it is not desirable; there should be some check on the table type for replication. This wasn't important until now, maybe because only normal tables were to be replicated, but with the extension of the scope of logical replication to more objects such checks would be helpful. Thanks for sharing this case. I hadn't considered it, but you're right that it should be handled sensibly. I have fixed table sync code to handle this case properly. Could you please check your case with the attached updated patch? > On a separate note, I was thinking: for partitioned tables, wouldn't it be cleaner to have something where you create only the partitioned table at the subscriber, and then when logical replication starts it creates the child tables accordingly? Or would that be too much for the future...? Hmm, we'd first need to build the "automatic partition creation" feature to consider doing something like that. I'm sure you'd agree that we should undertake that project separately from this tiny logical replication usability improvement project. :) Thanks again. Regards, Amit
Attachment
Hi, On 07/10/2019 02:55, Amit Langote wrote: > One cannot currently add partitioned tables to a publication. > > create table p (a int, b int) partition by hash (a); > create table p1 partition of p for values with (modulus 3, remainder 0); > create table p2 partition of p for values with (modulus 3, remainder 1); > create table p3 partition of p for values with (modulus 3, remainder 2); > > create publication publish_p for table p; > ERROR: "p" is a partitioned table > DETAIL: Adding partitioned tables to publications is not supported. > HINT: You can add the table partitions individually. > > One can do this instead: > > create publication publish_p1 for table p1; > create publication publish_p2 for table p2; > create publication publish_p3 for table p3; Or just create publication publish_p for table p1, p2, p3; > > but maybe that's too much code to maintain for users. > > I propose that we make this command: > > create publication publish_p for table p; > +1 > automatically add all the partitions to the publication. Also, any > future partitions should also be automatically added to the > publication. So, publishing a partitioned table automatically > publishes all of its existing and future partitions. Attached patch > implements that. > > What doesn't change with this patch is that the partitions on the > subscription side still have to match one-to-one with the partitions > on the publication side, because the changes are still replicated as > being made to the individual partitions, not as the changes to the > root partitioned table. It might be useful to implement that > functionality on the publication side, because it allows users to > define the replication target any way they need to, but this patch > doesn't implement that. > Yeah for that to work subscription would need to also need to be able to write to partitioned tables, so it needs both sides to add support for this. I think if we do both what you did and the transparent handling of root only, we'll need new keyword to differentiate the two. It might make sense to think about if we want your way to need an extra keyword or the transparent one will need it. One issue that I see reading the patch is following set of commands: CREATE TABLE foo ...; CREATE PUBLICATION mypub FOR TABLE foo; CREATE TABLE bar ...; ALTER PUBLICATION mypub ADD TABLE bar; ALTER TABLE foo ATTACH PARTITION bar ...; ALTER TABLE foo DETACH PARTITION bar ...; This will end up with bar not being in any publication even though it was explicitly added. That might be acceptable caveat but it at least should be clearly documented (IMHO with warning). -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Mon, Oct 07, 2019 at 09:55:23AM +0900, Amit Langote wrote: > One cannot currently add partitioned tables to a publication. > > create table p (a int, b int) partition by hash (a); > create table p1 partition of p for values with (modulus 3, remainder 0); > create table p2 partition of p for values with (modulus 3, remainder 1); > create table p3 partition of p for values with (modulus 3, remainder 2); > > create publication publish_p for table p; > ERROR: "p" is a partitioned table > DETAIL: Adding partitioned tables to publications is not supported. > HINT: You can add the table partitions individually. > > One can do this instead: > > create publication publish_p1 for table p1; > create publication publish_p2 for table p2; > create publication publish_p3 for table p3; > > but maybe that's too much code to maintain for users. > > I propose that we make this command: > > create publication publish_p for table p; > > automatically add all the partitions to the publication. Also, any > future partitions should also be automatically added to the > publication. So, publishing a partitioned table automatically > publishes all of its existing and future partitions. Attached patch > implements that. > > What doesn't change with this patch is that the partitions on the > subscription side still have to match one-to-one with the partitions > on the publication side, because the changes are still replicated as > being made to the individual partitions, not as the changes to the > root partitioned table. It might be useful to implement that > functionality on the publication side, because it allows users to > define the replication target any way they need to, but this patch > doesn't implement that. With this patch, is it possible to remove a partition manually from a subscription, or will it just get automatically re-added at some point? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Hi David, On Sun, Oct 13, 2019 at 4:55 PM David Fetter <david@fetter.org> wrote: > On Mon, Oct 07, 2019 at 09:55:23AM +0900, Amit Langote wrote: > > I propose that we make this command: > > > > create publication publish_p for table p; > > > > automatically add all the partitions to the publication. Also, any > > future partitions should also be automatically added to the > > publication. So, publishing a partitioned table automatically > > publishes all of its existing and future partitions. Attached patch > > implements that. > > > > What doesn't change with this patch is that the partitions on the > > subscription side still have to match one-to-one with the partitions > > on the publication side, because the changes are still replicated as > > being made to the individual partitions, not as the changes to the > > root partitioned table. It might be useful to implement that > > functionality on the publication side, because it allows users to > > define the replication target any way they need to, but this patch > > doesn't implement that. > > With this patch, is it possible to remove a partition manually from a > subscription, or will it just get automatically re-added at some > point? Hmm, I don't think there is any way (commands) to manually remove tables from a subscription. Testing shows that if you drop a table on the subscription server that is currently being fed data via a subscription, then a subscription worker will complain and quit if it receives a row targeting the dropped table and workers that are subsequently started will do the same thing. Interestingly, this behavior prevents replication for any other tables in the subscription from proceeding, which seems unfortunate. If you were asking if the patch extends the subscription side functionality to re-add needed partitions that were manually removed likely by accident, then no. Thanks, Amit
Hi Petr, Thanks for your comments. On Sun, Oct 13, 2019 at 5:01 AM Petr Jelinek <petr@2ndquadrant.com> wrote: > On 07/10/2019 02:55, Amit Langote wrote: > > One cannot currently add partitioned tables to a publication. > > > > create table p (a int, b int) partition by hash (a); > > create table p1 partition of p for values with (modulus 3, remainder 0); > > create table p2 partition of p for values with (modulus 3, remainder 1); > > create table p3 partition of p for values with (modulus 3, remainder 2); > > > > create publication publish_p for table p; > > ERROR: "p" is a partitioned table > > DETAIL: Adding partitioned tables to publications is not supported. > > HINT: You can add the table partitions individually. > > > > One can do this instead: > > > > create publication publish_p1 for table p1; > > create publication publish_p2 for table p2; > > create publication publish_p3 for table p3; > > Or just create publication publish_p for table p1, p2, p3; Yep, facepalm! :) So, one doesn't really need as many publication objects as there are partitions as my version suggests, which is good. Although, as you can tell, a user would still manually need to keep the set of published partitions up to date, for example when new partitions are added. > > but maybe that's too much code to maintain for users. > > > > I propose that we make this command: > > > > create publication publish_p for table p; > > > > +1 > > > automatically add all the partitions to the publication. Also, any > > future partitions should also be automatically added to the > > publication. So, publishing a partitioned table automatically > > publishes all of its existing and future partitions. Attached patch > > implements that. > > > > What doesn't change with this patch is that the partitions on the > > subscription side still have to match one-to-one with the partitions > > on the publication side, because the changes are still replicated as > > being made to the individual partitions, not as the changes to the > > root partitioned table. It might be useful to implement that > > functionality on the publication side, because it allows users to > > define the replication target any way they need to, but this patch > > doesn't implement that. > > > > Yeah for that to work subscription would need to also need to be able to > write to partitioned tables, so it needs both sides to add support for > this. Ah, I didn't know that the subscription code doesn't out-of-the-box support tuple routing. Indeed, we will need to fix that. > I think if we do both what you did and the transparent handling of > root only, we'll need new keyword to differentiate the two. It might > make sense to think about if we want your way to need an extra keyword > or the transparent one will need it. I didn't think about that but maybe you are right. > One issue that I see reading the patch is following set of commands: > > CREATE TABLE foo ...; > CREATE PUBLICATION mypub FOR TABLE foo; > > CREATE TABLE bar ...; > ALTER PUBLICATION mypub ADD TABLE bar; > > ALTER TABLE foo ATTACH PARTITION bar ...; > ALTER TABLE foo DETACH PARTITION bar ...; > > This will end up with bar not being in any publication even though it > was explicitly added. 
I tested and bar continues to be in the publication with above steps: create table foo (a int) partition by list (a); create publication mypub for table foo; create table bar (a int); alter publication mypub add table bar; \d bar Table "public.bar" Column │ Type │ Collation │ Nullable │ Default ────────┼─────────┼───────────┼──────────┼───────── a │ integer │ │ │ Publications: "mypub" alter table foo attach partition bar for values in (1); \d bar Table "public.bar" Column │ Type │ Collation │ Nullable │ Default ────────┼─────────┼───────────┼──────────┼───────── a │ integer │ │ │ Partition of: foo FOR VALUES IN (1) Publications: "mypub" -- can't now drop bar from mypub (its membership is no longer standalone) alter publication mypub drop table bar; ERROR: cannot drop partition "bar" from an inherited publication HINT: Drop the parent from publication instead. alter table foo detach partition bar; -- bar is still in mypub (now a standalone member) \d bar Table "public.bar" Column │ Type │ Collation │ Nullable │ Default ────────┼─────────┼───────────┼──────────┼───────── a │ integer │ │ │ Publications: "mypub" -- ok to drop now from mypub alter publication mypub drop table bar; Thanks, Amit
This patch seems excessively complicated to me. Why don't you just add the actual partitioned table to pg_publication_rel and then expand the partition hierarchy in pgoutput (get_rel_sync_entry() or GetRelationPublications() or somewhere around there). Then you don't need to do any work in table DDL to keep the list of published tables up to date. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 11 Oct 2019 at 08:06, Amit Langote <amitlangote09@gmail.com> wrote:
Thanks for sharing this case. I hadn't considered it, but you're
right that it should be handled sensibly. I have fixed table sync
code to handle this case properly. Could you please check your case
with the attached updated patch?
I was checking this today and found that the behavior doesn't change much with the updated patch. The tables are still replicated, just that a select count from the parent table shows 0, while the rest of the partitions, including the default one, have the data from the publisher. I was expecting something more like an error at the subscriber saying the table type is not the same.
Please find the attached file for the test case, in case something is unclear.
Regards,
Rafia Sabih
Attachment
Sorry about the delay. On Mon, Nov 4, 2019 at 8:00 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > This patch seems excessively complicated to me. Why don't you just add > the actual partitioned table to pg_publication_rel and then expand the > partition hierarchy in pgoutput (get_rel_sync_entry() or > GetRelationPublications() or somewhere around there). Then you don't > need to do any work in table DDL to keep the list of published tables up > to date. I tend to agree that having to manage this at the DDL level would be bug-prone, not to mention pretty complicated code to implement it. I have tried to implement it the way you suggested. So every decoded change to a leaf partition will now be published not only via its own publication but also via the publications of its ancestors, if any. That is irrespective of whether a row is directly inserted into the leaf partition or got routed to it via an insert done on an ancestor. In this implementation, the only pg_publication_rel entry is the one corresponding to the partitioned table. On the subscription side, when creating pg_subscription_rel entries for a publication containing a partitioned table, all of its partitions too must be fetched as being included in the publication. That is necessary, because the initial syncing copy and subsequently received changes must be applied to individual partitions. That could be changed in the future by publishing leaf partition changes as changes to the actually published partitioned table. That future implementation will also hopefully take care of the concern that Rafia mentioned on this thread: even with this patch, one must make sure that tables match one-to-one when they're in a publish-subscribe relationship, which actually requires us to bake low-level details like the table's relkind into the protocol exchanges. Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002 implements the feature. Thanks, Amit
Attachment
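To make the catalog-level difference described above concrete, here is a sketch of how it could be inspected, assuming the behavior of this version of the patch (only the partitioned table itself gets a pg_publication_rel row; its partitions are expanded later, in pgoutput and on the subscription side):

create publication pub_parted for table p;   -- p is a partitioned table

-- only p appears here, not its partitions
select pr.prrelid::regclass
  from pg_publication_rel pr
  join pg_publication pub on pub.oid = pr.prpubid
 where pub.pubname = 'pub_parted';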
Hello Rafia, On Tue, Nov 5, 2019 at 12:41 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote: > On Fri, 11 Oct 2019 at 08:06, Amit Langote <amitlangote09@gmail.com> wrote: >> Thanks for sharing this case. I hadn't considered it, but you're >> right that it should be handled sensibly. I have fixed table sync >> code to handle this case properly. Could you please check your case >> with the attached updated patch? >> > I was checking this today and found that the behavior doesn't change much with the updated patch. The tables are still replicated, just that a select count from the parent table shows 0, while the rest of the partitions, including the default one, have the data from the publisher. I was expecting something more like an error at the subscriber saying the table type is not the same. > > Please find the attached file for the test case, in case something is unclear. Thanks for the test case. With the latest patch I posted, you'll get the following error on the subscriber: create subscription mysub connection 'host=localhost port=5432 dbname=postgres' publication mypub; ERROR: cannot use relation "public.t" as logical replication target DETAIL: "public.t" is a regular table on subscription side whereas a partitioned table on publication side Although to be honest, I'd rather not see the error. As I mentioned in my email earlier, it'd be nice to be able to sync a partitioned table and a regular table (or vice versa) via replication. Thanks, Amit
On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote: > Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002 > implements the feature. 0002 didn't contain the necessary pg_dump changes, which I have fixed in the attached new version. Thanks, Amit
Attachment
On 2019-11-11 08:59, Amit Langote wrote: > On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote: >> Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002 >> implements the feature. > > 0002 didn't contain necessary pg_dump changes, which fixed in the > attached new version. That looks more pleasant. I don't understand why you go through great lengths to ensure that the relkinds match between publisher and subscriber. We already ensure that only regular tables are published and only regular tables are allowed as subscription target. In the future, we may want to allow further combinations. What situation are you trying to address here? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Nov 11, 2019 at 9:49 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-11-11 08:59, Amit Langote wrote: > > On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002 > >> implements the feature. > > > > 0002 didn't contain necessary pg_dump changes, which fixed in the > > attached new version. > > That looks more pleasant. Thanks for looking. > I don't understand why you go through great lengths to ensure that the > relkinds match between publisher and subscriber. We already ensure that > only regular tables are published and only regular tables are allowed as > subscription target. In the future, we may want to allow further > combinations. What situation are you trying to address here? I'd really want to see the requirement for relkinds to have to match go away, but as you can see, this patch doesn't modify enough of pgoutput.c and worker.c to make that possible. Both the code for the initial syncing and that for the subsequent real-time replication assume that both source and target are regular tables. So even if partitioned tables can now be in a publication, they're never sent in the protocol messages, only their leaf partitions are. Initial syncing code can be easily modified to support any combination of source and target relations, but changes needed for real-time replication seem non-trivial. Do you think we should do that before we can say partitioned tables support logical replication? Thanks, Amit
On Tue, Nov 12, 2019 at 10:11 AM Amit Langote <amitlangote09@gmail.com> wrote: > Initial > syncing code can be easily modified to support any combination of > source and target relations, but changes needed for real-time > replication seem non-trivial. I have spent some time hacking on this. With the attached updated patch, adding a partitioned table to publication results in publishing the inserts, updates, deletes of the table's leaf partitions as inserts, updates, deletes of the table itself (it all happens inside pgoutput). So, the replication target table doesn't necessarily have to be a partitioned table and even if it is partitioned its partitions don't have to match one-to-one. One restriction remains though: partitioned tables on a subscriber can't accept updates and deletes, because we'd need to map those to updates and deletes of their partitions, including handling a tuple possibly moving from one partition to another during an update. Also, I haven't added subscription tests yet. Attached updated patch. The previous division into a refactoring patch and feature patch no longer made sense to me, so there is only one this time. Thanks, Amit
Attachment
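A sketch of the kind of setup the above enables, with hypothetical names; per the restriction just described, only inserts can be applied to a partitioned table on the subscriber with this version of the patch:

-- publisher: hash-partitioned
create table t (a int, b text) partition by hash (a);
create table t_h0 partition of t for values with (modulus 2, remainder 0);
create table t_h1 partition of t for values with (modulus 2, remainder 1);
create publication pub_t for table t;

-- subscriber: partitioned differently; partitions need not match,
-- because changes arrive as changes to t itself
create table t (a int, b text) partition by range (a);
create table t_low partition of t for values from (minvalue) to (1000);
create table t_high partition of t for values from (1000) to (maxvalue);
create subscription sub_t
  connection 'host=... port=... dbname=...' publication pub_t;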
On 2019-11-18 09:53, Amit Langote wrote: > I have spent some time hacking on this. With the attached updated > patch, adding a partitioned table to publication results in publishing > the inserts, updates, deletes of the table's leaf partitions as > inserts, updates, deletes of the table itself (it all happens inside > pgoutput). So, the replication target table doesn't necessarily have > to be a partitioned table and even if it is partitioned its partitions > don't have to match one-to-one. > > One restriction remains though: partitioned tables on a subscriber > can't accept updates and deletes, because we'd need to map those to > updates and deletes of their partitions, including handling a tuple > possibly moving from one partition to another during an update. Right. Without that second part, the first part isn't really that useful yet, is it? I'm not sure what your intent with this patch is now. I thought the previous behavior -- add a partitioned table to a publication and its leaf tables appear in the replication output -- was pretty welcome. Do we not want that anymore? There should probably be an option to pick the behavior, like we do in pg_dump. What happens when you add a leaf table directly to a publication? Is it replicated under its own identity or under its ancestor partitioned table? (What if both the leaf table and a partitioned table are publication members?) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-11-12 02:11, Amit Langote wrote: >> I don't understand why you go through great lengths to ensure that the >> relkinds match between publisher and subscriber. We already ensure that >> only regular tables are published and only regular tables are allowed as >> subscription target. In the future, we may want to allow further >> combinations. What situation are you trying to address here? > I'd really want to see the requirement for relkinds to have to match > go away, but as you can see, this patch doesn't modify enough of > pgoutput.c and worker.c to make that possible. Both the code for the > initital syncing and that for the subsequent real-time replication > assume that both source and target are regular tables. So even if > partitioned tables can now be in a publication, they're never sent in > the protocol messages, only their leaf partitions are. Initial > syncing code can be easily modified to support any combination of > source and target relations, but changes needed for real-time > replication seem non-trivial. Do you think we should do that before > we can say partitioned tables support logical replication? My question was more simply why you have this check: + /* + * Cannot replicate from a regular to a partitioned table or vice + * versa. + */ + if (local_relkind != pt->relkind) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("cannot use relation \"%s.%s\" as logical replication target", + rv->schemaname, rv->relname), It doesn't seem necessary. What happens if you remove it? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Peter, On Wed, Nov 20, 2019 at 4:55 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-11-18 09:53, Amit Langote wrote: > > I have spent some time hacking on this. With the attached updated > > patch, adding a partitioned table to publication results in publishing > > the inserts, updates, deletes of the table's leaf partitions as > > inserts, updates, deletes of the table itself (it all happens inside > > pgoutput). So, the replication target table doesn't necessarily have > > to be a partitioned table and even if it is partitioned its partitions > > don't have to match one-to-one. > > > > One restriction remains though: partitioned tables on a subscriber > > can't accept updates and deletes, because we'd need to map those to > > updates and deletes of their partitions, including handling a tuple > > possibly moving from one partition to another during an update. > > Right. Without that second part, the first part isn't really that > useful yet, is it? I would say yes. > I'm not sure what your intent with this patch is now. I thought the > previous behavior -- add a partitioned table to a publication and its > leaf tables appear in the replication output -- was pretty welcome. Do > we not want that anymore? Hmm, I thought it would be more desirable to not expose a published partitioned table's leaf partitions to a subscriber, because it allows the target table to be defined more flexibly. > There should probably be an option to pick the behavior, like we do in > pg_dump. I don't understand which existing behavior. Can you clarify? Regarding allowing users to choose between publishing partitioned table changes using the leaf tables' schema vs. using the root table's own schema, I tend to agree that there would be value in that. Users who choose the former will have to ensure that target leaf partitions match exactly. Users who want flexibility in how the target table is defined can use the latter. > What happens when you add a leaf table directly to a publication? Is it > replicated under its own identity or under its ancestor partitioned > table? (What if both the leaf table and a partitioned table are > publication members?) If both a leaf partition and an ancestor belong to the same publication, then leaf partition changes are replicated using the ancestor's schema. For a leaf partition to be replicated using its own schema it must be published via a separate publication that doesn't contain the ancestor. At least that's what the current patch does. Thanks, Amit
On 2019-11-22 07:28, Amit Langote wrote: > Hmm, I thought it would be more desirable to not expose a published > partitioned table's leaf partitions to a subscriber, because it allows > the target table to be defined more flexibly. There are multiple different variants that we probably eventually want to support. But I think there is value in exposing the partition structure to the subscriber. Most notably, it allows the subscriber to run the initial table sync per partition rather than in one big chunk -- which ultimately reflects one of the reasons partitioning exists. The other way, exposing only the partitioned table, is also useful, especially if you want to partition differently on the subscriber. But without the ability to target a partitioned table on the subscriber, this would right now only allow you to replicate a partitioned table into a non-partitioned table. Which is valid but probably not often useful. >> What happens when you add a leaf table directly to a publication? Is it >> replicated under its own identity or under its ancestor partitioned >> table? (What if both the leaf table and a partitioned table are >> publication members?) > > If both a leaf partition and an ancestor belong to the same > publication, then leaf partition changes are replicated using the > ancestor's schema. For a leaf partition to be replicated using its > own schema it must be published via a separate publication that > doesn't contain the ancestor. At least that's what the current patch > does. Hmm, that seems confusing. This would mean that if you add a partitioned table to a publication that already contains leaf tables, the publication behavior of the leaf tables would change. So again, I think this alternative behavior of publishing partitions under the name of their root table should be an explicit option on a publication, and then it should be ensured somehow that individual partitions are not added to the publication in confusing ways. So, it's up to you which aspect of this you want to tackle, but I thought your original goal of being able to add partitioned tables to publications and have that implicitly expand to all member partitions on the publication side seemed quite useful, self-contained, and uncontroversial. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 22, 2019 at 4:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-11-22 07:28, Amit Langote wrote: > > >> What happens when you add a leaf table directly to a publication? Is it > >> replicated under its own identity or under its ancestor partitioned > >> table? (What if both the leaf table and a partitioned table are > >> publication members?) > > > > If both a leaf partition and an ancestor belong to the same > > publication, then leaf partition changes are replicated using the > > ancestor's schema. For a leaf partition to be replicated using its > > own schema it must be published via a separate publication that > > doesn't contain the ancestor. At least that's what the current patch > > does. > > Hmm, that seems confusing. This would mean that if you add a > partitioned table to a publication that already contains leaf tables, > the publication behavior of the leaf tables would change. So again, I > think this alternative behavior of publishing partitions under the name > of their root table should be an explicit option on a publication, and > then it should be ensured somehow that individual partitions are not > added to the publication in confusing ways. > Yeah, it can probably detect and throw an error for such cases. > So, it's up to you which aspect of this you want to tackle, but I > thought your original goal of being able to add partitioned tables to > publications and have that implicitly expand to all member partitions on > the publication side seemed quite useful, self-contained, and > uncontroversial. > +1. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Nov 22, 2019 at 7:46 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-11-22 07:28, Amit Langote wrote: > > Hmm, I thought it would be more desirable to not expose a published > > partitioned table's leaf partitions to a subscriber, because it allows > > the target table to be defined more flexibly. > > There are multiple different variants that we probably eventually want > to support. But I think there is value in exposing the partition > structure to the subscriber. Most notably, it allows the subscriber to > run the initial table sync per partition rather than in one big chunk -- > which ultimately reflects one of the reasons partitioning exists. I agree that replicating leaf-to-leaf has the least overhead. > The other way, exposing only the partitioned table, is also useful, > especially if you want to partition differently on the subscriber. But > without the ability to target a partitioned table on the subscriber, > this would right now only allow you to replicate a partitioned table > into a non-partitioned table. Which is valid but probably not often useful. Handling non-partitioned target tables was the main reason for me to make publishing using the root parent's schema the default behavior. But given that replicating from partitioned tables into non-partitioned ones would be rare, I agree to replicating using the leaf schema by default. > >> What happens when you add a leaf table directly to a publication? Is it > >> replicated under its own identity or under its ancestor partitioned > >> table? (What if both the leaf table and a partitioned table are > >> publication members?) > > > > If both a leaf partition and an ancestor belong to the same > > publication, then leaf partition changes are replicated using the > > ancestor's schema. For a leaf partition to be replicated using its > > own schema it must be published via a separate publication that > > doesn't contain the ancestor. At least that's what the current patch > > does. > > Hmm, that seems confusing. This would mean that if you add a > partitioned table to a publication that already contains leaf tables, > the publication behavior of the leaf tables would change. So again, I > think this alternative behavior of publishing partitions under the name > of their root table should be an explicit option on a publication, and > then it should be ensured somehow that individual partitions are not > added to the publication in confusing ways. > > So, it's up to you which aspect of this you want to tackle, but I > thought your original goal of being able to add partitioned tables to > publications and have that implicitly expand to all member partitions on > the publication side seemed quite useful, self-contained, and > uncontroversial. OK, let's make whether to publish with root or leaf schema an option, with the latter being the default. I will see about updating the patch that way. Thanks, Amit
On Mon, Nov 25, 2019 at 6:37 PM Amit Langote <amitlangote09@gmail.com> wrote: > OK, let's make whether to publish with root or leaf schema an option, > with the latter being the default. I will see about updating the > patch that way. Here are the updated patches. 0001: Adding a partitioned table to a publication implicitly adds all its partitions. The receiving side must have tables matching the published partitions, which is typically the case, because the same partition tree is defined on both nodes. 0002: Add a new Boolean publication parameter 'publish_using_root_schema'. If true, a partitioned table's partitions are not exposed to the subscriber, that is, changes to its partitions are published as the table's own. This makes it possible to replicate partitioned table changes to a non-partitioned table (seldom useful) or to a partitioned table that has a different set of partitions than on the publisher (a reasonable use case). This patch only adds the parameter and doesn't implement any of that behavior. 0003: A refactoring patch for worker.c to make it a bit easier to handle partitioned tables as targets of logical replication commands. 0004: This implements the 'publish_using_root_schema = true' behavior described above. (An unintended benefit of making partitioned tables an accepted relation type in worker.c is that it allows partitions on the subscriber to be sub-partitioned even if they are not on the publisher, that is, when replicating partition-to-partition!) Thanks, Amit
Attachment
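For illustration, this is how the new parameter would be used; the option name and syntax are the patch set's proposal (0002, implemented in 0004), not an existing parameter:

-- default behavior (0001): partitions are published under their own identity
create publication pub_leaf for table p;

-- proposed option: changes to partitions of p are published as changes to p itself
create publication pub_root for table p
  with (publish_using_root_schema = true);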
On 2019-12-06 08:48, Amit Langote wrote: > 0001: Adding a partitioned table to a publication implicitly adds all > its partitions. The receiving side must have tables matching the > published partitions, which is typically the case, because the same > partition tree is defined on both nodes. This looks pretty good to me now. But you need to make all the changed queries version-aware so that you can still replicate from and to older versions. (For example, pg_partition_tree is not very old.) This part looks a bit fishy: + /* + * If either table is partitioned, skip copying. Individual partitions + * will be copied instead. + */ + if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE || + remote_relkind == RELKIND_PARTITIONED_TABLE) + { + logicalrep_rel_close(relmapentry, NoLock); + return; + } I don't think you want to filter out a partitioned table on the local side, since (a) COPY can handle that, and (b) it's (as of this patch) an error to have a partitioned table in the subscription table set. I'm not a fan of the new ValidateSubscriptionRel() function. It's too obscure, especially the return value. Doesn't seem worth it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for checking. On Thu, Dec 12, 2019 at 12:48 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-12-06 08:48, Amit Langote wrote: > > 0001: Adding a partitioned table to a publication implicitly adds all > > its partitions. The receiving side must have tables matching the > > published partitions, which is typically the case, because the same > > partition tree is defined on both nodes. > > This looks pretty good to me now. But you need to make all the changed > queries version-aware so that you can still replicate from and to older > versions. (For example, pg_partition_tree is not very old.) True, fixed that. > This part looks a bit fishy: > > + /* > + * If either table is partitioned, skip copying. Individual > partitions > + * will be copied instead. > + */ > + if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE || > + remote_relkind == RELKIND_PARTITIONED_TABLE) > + { > + logicalrep_rel_close(relmapentry, NoLock); > + return; > + } > > I don't think you want to filter out a partitioned table on the local > side, since (a) COPY can handle that, and (b) it's (as of this patch) an > error to have a partitioned table in the subscription table set. Yeah, (b) is true, so copy_table() should only ever see regular tables with only patch 0001 applied. > I'm not a fan of the new ValidateSubscriptionRel() function. It's too > obscure, especially the return value. Doesn't seem worth it. It went through many variants since I first introduced it, but yeah I agree we don't need it if only because of the weird interface. It occurred to me that, *as of 0001*, we should indeed disallow replicating from a regular table on publisher node into a partitioned table of the same name on subscriber node (as the earlier patches did), because 0001 doesn't implement tuple routing support that would be needed to apply such changes. Attached updated patches. Thanks, Amit
Attachment
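As a side note on the version-aware queries mentioned above: one way a new-enough server (v12 or later) can expand a published partitioned table into its leaf partitions is via pg_partition_tree(); whether the patch uses exactly this form is up to the patch itself:

-- list the leaf partitions of a published partitioned table p
select relid::regclass
  from pg_partition_tree('p')
 where isleaf;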
On Mon, Dec 16, 2019 at 2:50 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Thanks for checking. > > On Thu, Dec 12, 2019 at 12:48 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-12-06 08:48, Amit Langote wrote: > > > 0001: Adding a partitioned table to a publication implicitly adds all > > > its partitions. The receiving side must have tables matching the > > > published partitions, which is typically the case, because the same > > > partition tree is defined on both nodes. > > > > This looks pretty good to me now. But you need to make all the changed > > queries version-aware so that you can still replicate from and to older > > versions. (For example, pg_partition_tree is not very old.) > > True, fixed that. > > > This part looks a bit fishy: > > > > + /* > > + * If either table is partitioned, skip copying. Individual > > partitions > > + * will be copied instead. > > + */ > > + if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE || > > + remote_relkind == RELKIND_PARTITIONED_TABLE) > > + { > > + logicalrep_rel_close(relmapentry, NoLock); > > + return; > > + } > > > > I don't think you want to filter out a partitioned table on the local > > side, since (a) COPY can handle that, and (b) it's (as of this patch) an > > error to have a partitioned table in the subscription table set. > > Yeah, (b) is true, so copy_table() should only ever see regular tables > with only patch 0001 applied. > > > I'm not a fan of the new ValidateSubscriptionRel() function. It's too > > obscure, especially the return value. Doesn't seem worth it. > > It went through many variants since I first introduced it, but yeah I > agree we don't need it if only because of the weird interface. > > It occurred to me that, *as of 0001*, we should indeed disallow > replicating from a regular table on publisher node into a partitioned > table of the same name on subscriber node (as the earlier patches > did), because 0001 doesn't implement tuple routing support that would > be needed to apply such changes. > > Attached updated patches. > I am planning to review this patch. Currently, it is not applying on the head so can you rebase it? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Hi Amit, I went through this patch set once again today and here are my two cents. On Mon, 16 Dec 2019 at 10:19, Amit Langote <amitlangote09@gmail.com> wrote: > > Attached updated patches. - differently partitioned setup. Attempts to replicate tables other than - base tables will result in an error. + Replication is only supported by regular and partitioned tables, although + the table kind must match between the two servers, that is, one cannot I find the phrase 'table kind' a bit odd, how about something like type of the table. /* Only plain tables can be aded to publications. */ - if (tbinfo->relkind != RELKIND_RELATION) + /* Only plain and partitioned tables can be added to publications. */ IMHO using regular instead of plain would be more consistent. + /* + * Find a partition for the tuple contained in remoteslot. + * + * For insert, remoteslot is tuple to insert. For update and delete, it + * is the tuple to be replaced and deleted, repectively. + */ Misspelled 'respectively'. +static void +apply_handle_tuple_routing(ResultRelInfo *relinfo, + LogicalRepRelMapEntry *relmapentry, + EState *estate, CmdType operation, + TupleTableSlot *remoteslot, + LogicalRepTupleData *newtup) +{ + Relation rel = relinfo->ri_RelationDesc; + ModifyTableState *mtstate = NULL; + PartitionTupleRouting *proute = NULL; + ResultRelInfo *partrelinfo, + *partrelinfo1; IMHO, partrelinfo1 can be better named to improve readability. Otherwise, as Dilip already mentioned, there is a rebase required particularly for 0003 and 0004. -- Regards, Rafia Sabih
On Mon, Jan 6, 2020 at 8:25 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote: > Hi Amit, > > I went through this patch set once again today and here are my two cents. Thanks Rafia. Rebased and updated to address your comments. Regards, Amit
Attachment
On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote: > Rebased and updated to address your comments. > + <para> + Partitioned tables are not considered when <literal>FOR ALL TABLES</literal> + is specified. + </para> + What is the reason for the above? I mean, not for the comment, but for not including partitioned tables in the FOR ALL TABLES option. -- Regards, Rafia Sabih
On 2020-01-07 06:01, Amit Langote wrote: > On Mon, Jan 6, 2020 at 8:25 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote: >> Hi Amit, >> >> I went through this patch set once again today and here are my two cents. > > Thanks Rafia. > > Rebased and updated to address your comments. Looking through 0001, I think perhaps there is a better way to structure some of the API changes. Instead of passing the root_target_rel to CheckValidResultRel() and CheckCmdReplicaIdentity(), which we only need to check the publication actions of the root table, how about changing GetRelationPublicationActions() to automatically include the publication information of the root table. Then we have that information in the relcache once and don't need to check the base table and the partition root separately at each call site (of which there is only one right now). (Would that work correctly with relcache invalidation?) Similarly, couldn't GetRelationPublications() just automatically take partitioning into account? We don't need the separation between GetRelationPublications() and GetRelationAncestorPublications(). This would also avoid errors of omission, for example the GetRelationPublications() call in ATPrepChangePersistence() doesn't take GetRelationAncestorPublications() into account. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-01-07 15:18, Rafia Sabih wrote: > On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote: > >> Rebased and updated to address your comments. >> > + <para> > + Partitioned tables are not considered when <literal>FOR ALL TABLES</literal> > + is specified. > + </para> > + > What is the reason for above, I mean not for the comment but not > including partitioned tables in for all tables options. This comment is kind of a noop, because the leaf partitions are already included in FOR ALL TABLES, so whether partitioned tables are considered included in FOR ALL TABLES is irrelevant. I suggest removing the comment to avoid any confusion. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
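A quick way to see this, even against an unpatched server: pg_publication_tables already lists the leaf partitions for a FOR ALL TABLES publication, while the partitioned parents themselves do not appear:

create publication pub_all for all tables;
select * from pg_publication_tables where pubname = 'pub_all';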
On Wed, Jan 8, 2020 at 7:57 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-01-07 15:18, Rafia Sabih wrote: > > On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote: > > > >> Rebased and updated to address your comments. > >> > > + <para> > > + Partitioned tables are not considered when <literal>FOR ALL TABLES</literal> > > + is specified. > > + </para> > > + > > What is the reason for above, I mean not for the comment but not > > including partitioned tables in for all tables options. > > This comment is kind of a noop, because the leaf partitions are already > included in FOR ALL TABLES, so whether partitioned tables are considered > included in FOR ALL TABLES is irrelevant. I suggest removing the > comment to avoid any confusion. I agree. I had written that comment considering the other feature where the changes are published as root table's, but even in that case it'd be wrong to do what it says -- partitioned tables *should* be included in that case. I will fix the patches accordingly. Thanks, Amit
Hi Peter , Thanks for the review and sorry it took me a while to get back. On Wed, Jan 8, 2020 at 7:54 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Looking through 0001, I think perhaps there is a better way to structure > some of the API changes. > > Instead of passing the root_target_rel to CheckValidResultRel() and > CheckCmdReplicaIdentity(), which we only need to check the publication > actions of the root table, how about changing > GetRelationPublicationActions() to automatically include the publication > information of the root table. Then we have that information in the > relcache once and don't need to check the base table and the partition > root separately at each call site (of which there is only one right > now). (Would that work correctly with relcache invalidation?) > > Similarly, couldn't GetRelationPublications() just automatically take > partitioning into account? We don't need the separation between > GetRelationPublications() and GetRelationAncestorPublications(). This > would also avoid errors of omission, for example the > GetRelationPublications() call in ATPrepChangePersistence() doesn't take > GetRelationAncestorPublications() into account. I have addressed these comments in the attached updated patch. Other than that, the updated patch contains following significant changes: * Changed pg_publication.c: GetPublicationRelations() so that any published partitioned tables are expanded as needed * Since the pg_publication_tables view is backed by GetPublicationRelations(), that means subscriptioncmds.c: fetch_table_list() no longer needs to craft a query to include partitions when needed, because partitions are included at source. That seems better, because it allows to limit the complexity surrounding publication of partitioned tables to the publication side. * Fixed the publication table DDL to spot more cases of tables being added to a publication in a duplicative manner. For example, partition being added to a publication which already contains its ancestor and a partitioned tables being added to a publication (implying all of its partitions are added) which already contains a partition Only attaching 0001. Will send the rest after polishing them a bit more. Thanks, Amit
Attachment
On Wed, Jan 22, 2020 at 2:38 PM Amit Langote <amitlangote09@gmail.com> wrote: > Other than that, the updated patch contains following significant changes: > > * Changed pg_publication.c: GetPublicationRelations() so that any > published partitioned tables are expanded as needed > > * Since the pg_publication_tables view is backed by > GetPublicationRelations(), that means subscriptioncmds.c: > fetch_table_list() no longer needs to craft a query to include > partitions when needed, because partitions are included at source. > That seems better, because it allows to limit the complexity > surrounding publication of partitioned tables to the publication side. > > * Fixed the publication table DDL to spot more cases of tables being > added to a publication in a duplicative manner. For example, > partition being added to a publication which already contains its > ancestor and a partitioned tables being added to a publication > (implying all of its partitions are added) which already contains a > partition On second thought, this seems like an overkill. It might be OK after all for both a partitioned table and its partitions to be explicitly added to a publication without complaining of duplication. IOW, it's the user's call whether it makes sense to do that or not. > Only attaching 0001. Attached updated 0001 considering the above and the rest of the patches that add support for replicating partitioned tables using their own identity and schema. I have reorganized the other patches as follows: 0002: refactoring of logical/worker.c without any functionality changes (contains much less churn than in earlier versions) 0003: support logical replication into partitioned tables on the subscription side (allows replicating from a non-partitioned table on publisher node into a partitioned table on subscriber node) 0004: support optionally replicating partitioned table changes (and changes directly made to partitions) using root partitioned table identity and schema Thanks, Amit
Attachment
On 2020-01-23 11:10, Amit Langote wrote: > On Wed, Jan 22, 2020 at 2:38 PM Amit Langote<amitlangote09@gmail.com> wrote: >> Other than that, the updated patch contains following significant changes: >> >> * Changed pg_publication.c: GetPublicationRelations() so that any >> published partitioned tables are expanded as needed >> >> * Since the pg_publication_tables view is backed by >> GetPublicationRelations(), that means subscriptioncmds.c: >> fetch_table_list() no longer needs to craft a query to include >> partitions when needed, because partitions are included at source. >> That seems better, because it allows to limit the complexity >> surrounding publication of partitioned tables to the publication side. >> >> * Fixed the publication table DDL to spot more cases of tables being >> added to a publication in a duplicative manner. For example, >> partition being added to a publication which already contains its >> ancestor and a partitioned tables being added to a publication >> (implying all of its partitions are added) which already contains a >> partition > On second thought, this seems like an overkill. It might be OK after > all for both a partitioned table and its partitions to be explicitly > added to a publication without complaining of duplication. IOW, it's > the user's call whether it makes sense to do that or not. This structure looks good now. However, it does seem unfortunate that in pg_get_publication_tables() we need to postprocess the result of GetPublicationRelations(). Since we're already changing the API of GetPublicationRelations(), couldn't we also make it optionally not include partitioned tables? For the test, perhaps add test cases where partitions are attached and detached so that we can see whether their publication relcache information is properly updated. (I'm not doubting that it works, but it would be good to have a test for, in case of future restructuring.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
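One possible shape for such a test in publication.sql, sketched here with invented object names (not the actual test added by the patch):

create table testpub_parted (a int) partition by list (a);
create table testpub_parted1 partition of testpub_parted for values in (1);
create publication testpub_forparted for table testpub_parted;
-- the partition is published via its parent
select * from pg_publication_tables where pubname = 'testpub_forparted';
alter table testpub_parted detach partition testpub_parted1;
-- after detach, the former partition should no longer be published
select * from pg_publication_tables where pubname = 'testpub_forparted';
alter table testpub_parted attach partition testpub_parted1 for values in (1);
-- after re-attach, it should show up again
select * from pg_publication_tables where pubname = 'testpub_forparted';
drop publication testpub_forparted;
drop table testpub_parted;

An UPDATE against the detached table could additionally be used to check that the relation's cached publication actions are refreshed, not just the catalog view.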
Hi Amit, Once again I went through this patch set and here are my few comments, On Thu, 23 Jan 2020 at 11:10, Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Jan 22, 2020 at 2:38 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Other than that, the updated patch contains following significant changes: > > > > * Changed pg_publication.c: GetPublicationRelations() so that any > > published partitioned tables are expanded as needed > > > > * Since the pg_publication_tables view is backed by > > GetPublicationRelations(), that means subscriptioncmds.c: > > fetch_table_list() no longer needs to craft a query to include > > partitions when needed, because partitions are included at source. > > That seems better, because it allows to limit the complexity > > surrounding publication of partitioned tables to the publication side. > > > > * Fixed the publication table DDL to spot more cases of tables being > > added to a publication in a duplicative manner. For example, > > partition being added to a publication which already contains its > > ancestor and a partitioned tables being added to a publication > > (implying all of its partitions are added) which already contains a > > partition > > On second thought, this seems like an overkill. It might be OK after > all for both a partitioned table and its partitions to be explicitly > added to a publication without complaining of duplication. IOW, it's > the user's call whether it makes sense to do that or not. > > > Only attaching 0001. > > Attached updated 0001 considering the above and the rest of the > patches that add support for replicating partitioned tables using > their own identity and schema. I have reorganized the other patches > as follows: > > 0002: refactoring of logical/worker.c without any functionality > changes (contains much less churn than in earlier versions) > > 0003: support logical replication into partitioned tables on the > subscription side (allows replicating from a non-partitioned table on > publisher node into a partitioned table on subscriber node) > > 0004: support optionally replicating partitioned table changes (and > changes directly made to partitions) using root partitioned table > identity and schema + cannot replicate from a regular table into a partitioned able or vice Here is a missing t from table. + <para> + When a partitioned table is added to a publication, all of its existing + and future partitions are also implicitly considered to be part of the + publication. So, even operations that are performed directly on a + partition are also published via its ancestors' publications. Now this is confusing, does it mean that when partitions are later added to the table they will be replicated too, I think not, because you need to first create them manually at the replication side, isn't it...? 
+ /* Must be a regular or partitioned table */ + if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION && + RelationGetForm(targetrel)->relkind != RELKIND_PARTITIONED_TABLE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("\"%s\" is not a table", IMHO the error message and details should be modified here to something along the lines of 'is neither a regular or partitioned table' + * published via an ancestor and when a partitioned tables's partitions tables's --> tables' + if (get_rel_relispartition(relid)) + { + List *ancestors = get_partition_ancestors(relid); Now, this is just for my understanding, why the ancestors have to be a list, I always assumed that a partition could only have one ancestor -- the root table. Is there something more to it that I am totally missing here or is it to cover the scenario of having partitions of partitions. Here I also want to clarify one thing, does it also happen like if a partitioned table is dropped from a publication then all its partitions are also implicitly dropped? As far as my understanding goes that doesn't happen, so shouldn't there be some notice about it. -GetPublicationRelations(Oid pubid) +GetPublicationRelations(Oid pubid, bool include_partitions) How about having an enum here with INCLUDE_PARTITIONS, INCLUDE_PARTITIONED_REL, and SKIP_PARTITIONS to address the three possibilities and avoiding reiterating through the list in pg_get_publication_tables(). -- Regards, Rafia Sabih
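On the get_partition_ancestors() question above: the result is a list because partitions can themselves be partitioned, so a leaf table can have several ancestors. A small self-contained illustration (names invented):

create table tbl_root (a int, b int) partition by range (a);
create table tbl_mid partition of tbl_root
    for values from (1) to (100) partition by hash (b);
create table tbl_leaf partition of tbl_mid
    for values with (modulus 2, remainder 0);

-- tbl_leaf has two ancestors, tbl_mid and tbl_root, which is why the code
-- has to walk a list of ancestors rather than a single parent
select relid, parentrelid, isleaf, level from pg_partition_tree('tbl_root');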
On Tue, Jan 28, 2020 at 6:11 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > This structure looks good now. Thanks for taking a look. > However, it does seem unfortunate that in pg_get_publication_tables() we > need to postprocess the result of GetPublicationRelations(). Since > we're already changing the API of GetPublicationRelations(), couldn't we > also make it optionally not include partitioned tables? Hmm, okay. We really need GetPublicationRelations() to handle partitioned tables in 3 ways: 1. Don't expand and return them as-is 2. Expand and return only leaf partitions 3. Expand and return all partitions I will try that in the new patch. > For the test, perhaps add test cases where partitions are attached and > detached so that we can see whether their publication relcache > information is properly updated. (I'm not doubting that it works, but > it would be good to have a test for, in case of future restructuring.) Okay, I will add some to publication.sql. Will send updated patches after addressing Rafia's comments. Thanks, Amit
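For reference, modes 1 and 2 are visible from SQL, assuming a publication pub_parted that contains only a partitioned table (mode 3, returning every partition including intermediate ones, is needed only internally):

-- mode 1: pg_publication_rel stores exactly what was added,
-- i.e. the partitioned table itself
select prrelid::regclass
  from pg_publication_rel pr
  join pg_publication p on p.oid = pr.prpubid
 where p.pubname = 'pub_parted';

-- mode 2: pg_publication_tables, built on GetPublicationRelations(),
-- expands it to the leaf partitions that are actually replicated
select tablename from pg_publication_tables where pubname = 'pub_parted';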
Thank Rafia for the review. On Wed, Jan 29, 2020 at 3:55 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote: > On Thu, 23 Jan 2020 at 11:10, Amit Langote <amitlangote09@gmail.com> wrote: > > v10 patches > + cannot replicate from a regular table into a partitioned able or vice > Here is a missing t from table. Oops, fixed. > + <para> > + When a partitioned table is added to a publication, all of its existing > + and future partitions are also implicitly considered to be part of the > + publication. So, even operations that are performed directly on a > + partition are also published via its ancestors' publications. > > Now this is confusing, does it mean that when partitions are later > added to the table they will be replicated too, I think not, because > you need to first create them manually at the replication side, isn't > it...? Yes, it's upon the user to make sure that they have set up the partitions correctly on the subscriber. I don't see how that's very different from what needs to be done when tables are added to a publication after-the-fact. Did I misunderstand you? > + /* Must be a regular or partitioned table */ > + if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION && > + RelationGetForm(targetrel)->relkind != RELKIND_PARTITIONED_TABLE) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("\"%s\" is not a table", > > IMHO the error message and details should be modified here to > something along the lines of 'is neither a regular or partitioned > table' Hmm, this is simply following a convention that's used in most places around the code, although I'm not really a fan of these "not a <whatever>"-style messages to begin with. It's less ambiguous with a "cannot perform <action> on <relkind>"-style message, which some places already use. In that view, I have changed the documentation too to say this: + Replication is only supported by tables, partitioned or not, although a + given table must either be partitioned on both servers or not partitioned + at all. Also, when replicating between partitioned tables, the actual + replication occurs between leaf partitions, so partitions on the two + servers must match one-to-one. In retrospect, the confusion surrounding how we communicate the various operations and properties that cannot be supported on a table if partitioned, both in the error messages and the documentation, could have been avoided if it wasn't based on relkind. I guess it's too late now though. :( > + * published via an ancestor and when a partitioned tables's partitions > tables's --> tables' > > + if (get_rel_relispartition(relid)) > + { > + List *ancestors = get_partition_ancestors(relid); > > Now, this is just for my understanding, why the ancestors have to be a > list, I always assumed that a partition could only have one ancestor > -- the root table. Is there something more to it that I am totally > missing here or is it to cover the scenario of having partitions of > partitions. Yes, with multi-level partitioning. > Here I also want to clarify one thing, does it also happen like if a > partitioned table is dropped from a publication then all its > partitions are also implicitly dropped? As far as my understanding > goes that doesn't happen, so shouldn't there be some notice about it. Actually, that is what happens, unless partitions were explicitly added to the publication, in which case they will continue to be published. 
> -GetPublicationRelations(Oid pubid) > +GetPublicationRelations(Oid pubid, bool include_partitions) > > How about having an enum here with INCLUDE_PARTITIONS, > INCLUDE_PARTITIONED_REL, and SKIP_PARTITIONS to address the three > possibilities and avoiding reiterating through the list in > pg_get_publication_tables(). I have done something similar in the updated patch, as I mentioned in my earlier reply. Please check the updated patches. Thanks, Amit
Attachment
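A concrete (hypothetical) example of the drop behavior just described, where parted is a partitioned table with partitions parted_p1 and parted_p2:

create publication pub_parted for table parted, parted_p1;  -- one partition also added explicitly
alter publication pub_parted drop table parted;
-- parted_p1 remains published because it was added in its own right;
-- parted_p2, which was published only via its parent, drops out
select tablename from pg_publication_tables where pubname = 'pub_parted';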
I have committed the 0001 patch of this series (partitioned table member of publication). I changed the new argument of GetPublicationRelations() to an enum and reformatted some comments. I'll continue looking through the subsequent patches. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 10, 2020 at 5:52 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I have committed the 0001 patch of this series (partitioned table member > of publication). I changed the new argument of > GetPublicationRelations() to an enum and reformatted some comments. > I'll continue looking through the subsequent patches. Thank you. - Amit
I was trying to extract some preparatory work from the remaining patches and came up with the attached. This is part of your patch 0003, but also relevant for part 0004. The problem was that COPY (SELECT *) is not sufficient when the table has generated columns, so we need to build the column list explicitly. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
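The underlying generated-columns behavior can be sketched with plain SQL (hypothetical table), which shows why the initial-sync query has to spell out the column list rather than rely on SELECT *:

create table gtest (a int, b int generated always as (a * 2) stored);
insert into gtest (a) values (1);

copy gtest to stdout;                   -- a plain COPY leaves the generated column out
copy (select * from gtest) to stdout;   -- SELECT * drags the generated column back in
copy (select a from gtest) to stdout;   -- an explicit column list restores the plain-COPY behavior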
Hi Peter, On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > I was trying to extract some preparatory work from the remaining patches > and came up with the attached. This is part of your patch 0003, but > also relevant for part 0004. The problem was that COPY (SELECT *) is > not sufficient when the table has generated columns, so we need to build > the column list explicitly. > > Thoughts? Thank you for that. + if (isnull || !remote_is_publishable) + ereport(ERROR, + (errmsg("table \"%s.%s\" on the publisher is not publishable", + nspname, relname))); Maybe add a one-line comment above this to say it's an "not supposed to happen" error or am I missing something? Wouldn't elog() suffice for this? Other than that, looks good. -- Thank you, Amit
On Wed, Mar 18, 2020 at 12:06 PM Amit Langote <amitlangote09@gmail.com> wrote: > Hi Peter, > > On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > I was trying to extract some preparatory work from the remaining patches > > and came up with the attached. This is part of your patch 0003, but > > also relevant for part 0004. The problem was that COPY (SELECT *) is > > not sufficient when the table has generated columns, so we need to build > > the column list explicitly. > > > > Thoughts? > > Thank you for that. > > + if (isnull || !remote_is_publishable) > + ereport(ERROR, > + (errmsg("table \"%s.%s\" on the publisher is not publishable", > + nspname, relname))); > > Maybe add a one-line comment above this to say it's an "not supposed > to happen" error or am I missing something? Wouldn't elog() suffice > for this? > > Other than that, looks good. Wait, the following Assert in copy_table() should now be gone: Assert(relmapentry->localrel->rd_rel->relkind == RELKIND_RELATION); because just below it: /* Start copy on the publisher. */ initStringInfo(&cmd); - appendStringInfo(&cmd, "COPY %s TO STDOUT", - quote_qualified_identifier(lrel.nspname, lrel.relname)); + if (lrel.relkind == RELKIND_RELATION) + appendStringInfo(&cmd, "COPY %s TO STDOUT", + quote_qualified_identifier(lrel.nspname, lrel.relname)); + else + { + /* + * For non-tables, we need to do COPY (SELECT ...), but we can't just + * do SELECT * because we need to not copy generated columns. + */ By the way, I have rebased the patches, although maybe you've got your own copies; attached. -- Thank you, Amit
Attachment
On 2020-03-18 04:06, Amit Langote wrote: > + if (isnull || !remote_is_publishable) > + ereport(ERROR, > + (errmsg("table \"%s.%s\" on the publisher is not publishable", > + nspname, relname))); > > Maybe add a one-line comment above this to say it's an "not supposed > to happen" error or am I missing something? Wouldn't elog() suffice > for this? On second thought, maybe we should just drop this check. The list of tables that is part of the publication was already filtered by the publisher, so this query doesn't need to check it again. We just need the relkind to be able to construct the COPY command, but we don't need to second-guess it beyond that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 18, 2020 at 8:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-03-18 04:06, Amit Langote wrote: > > + if (isnull || !remote_is_publishable) > > + ereport(ERROR, > > + (errmsg("table \"%s.%s\" on the publisher is not publishable", > > + nspname, relname))); > > > > Maybe add a one-line comment above this to say it's an "not supposed > > to happen" error or am I missing something? Wouldn't elog() suffice > > for this? > > On second thought, maybe we should just drop this check. The list of > tables that is part of the publication was already filtered by the > publisher, so this query doesn't need to check it again. We just need > the relkind to be able to construct the COPY command, but we don't need > to second-guess it beyond that. Agreed. -- Thank you, Amit
On 2020-03-18 15:19, Amit Langote wrote: > On Wed, Mar 18, 2020 at 8:16 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 2020-03-18 04:06, Amit Langote wrote: >>> + if (isnull || !remote_is_publishable) >>> + ereport(ERROR, >>> + (errmsg("table \"%s.%s\" on the publisher is not publishable", >>> + nspname, relname))); >>> >>> Maybe add a one-line comment above this to say it's an "not supposed >>> to happen" error or am I missing something? Wouldn't elog() suffice >>> for this? >> >> On second thought, maybe we should just drop this check. The list of >> tables that is part of the publication was already filtered by the >> publisher, so this query doesn't need to check it again. We just need >> the relkind to be able to construct the COPY command, but we don't need >> to second-guess it beyond that. > > Agreed. Committed with that change then. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-03-18 08:33, Amit Langote wrote: > By the way, I have rebased the patches, although maybe you've got your > own copies; attached. Looking through 0002 and 0003 now. The structure looks generally good. In 0002, the naming of apply_handle_insert() vs. apply_handle_do_insert() etc. seems a bit prone to confusion. How about something like apply_handle_insert_internal()? Also, should we put each of those internal functions next to their internal function instead of in a separate group like you have it? In apply_handle_do_insert(), the argument localslot should probably be remoteslot. In apply_handle_do_delete(), the ExecOpenIndices() call was moved to a different location relative to the rest of the code. That was probably not intended. In 0003, you have /* TODO, use inverse lookup hashtable? */. Is this something you plan to address in this cycle, or is that more for future generations? 0003 could use some more tests. The one test that you adjusted just ensures the data goes somewhere instead of being rejected, but there are no tests that check whether it ends up in the right partition, whether cross-partition updates work etc. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
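A sketch of the kind of test being asked for here, written as the SQL a TAP test would run on each node (connection string and names are placeholders):

-- publisher
create table tab_rep (a int primary key, b text);
create publication pub_rep for table tab_rep;

-- subscriber: the target is partitioned, with a boundary at 10
create table tab_rep (a int primary key, b text) partition by range (a);
create table tab_rep_1 partition of tab_rep for values from (minvalue) to (10);
create table tab_rep_2 partition of tab_rep for values from (10) to (maxvalue);
create subscription sub_rep connection 'dbname=...' publication pub_rep;  -- placeholder conninfo

-- publisher: this update should move the replicated row across the boundary
insert into tab_rep values (1, 'one');
update tab_rep set a = 11 where a = 1;

-- subscriber: check that the row ended up in the expected leaf partition
select tableoid::regclass, * from tab_rep;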
On Thu, Mar 19, 2020 at 11:18 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-03-18 08:33, Amit Langote wrote: > > By the way, I have rebased the patches, although maybe you've got your > > own copies; attached. > > Looking through 0002 and 0003 now. > > The structure looks generally good. Thanks for the review. > In 0002, the naming of apply_handle_insert() vs. > apply_handle_do_insert() etc. seems a bit prone to confusion. How about > something like apply_handle_insert_internal()? Also, should we put each > of those internal functions next to their internal function instead of > in a separate group like you have it? Sure. > In apply_handle_do_insert(), the argument localslot should probably be > remoteslot. You're right, fixed. > In apply_handle_do_delete(), the ExecOpenIndices() call was moved to a > different location relative to the rest of the code. That was probably > not intended. Fixed. > In 0003, you have /* TODO, use inverse lookup hashtable? */. Is this > something you plan to address in this cycle, or is that more for future > generations? Sorry, this is simply a copy-paste from logicalrep_relmap_invalidate_cb(). > 0003 could use some more tests. The one test that you adjusted just > ensures the data goes somewhere instead of being rejected, but there are > no tests that check whether it ends up in the right partition, whether > cross-partition updates work etc. Okay, added some tests. Attached updated patches. -- Thank you, Amit
Attachment
On 2020-03-23 06:02, Amit Langote wrote: > Okay, added some tests. > > Attached updated patches. I have committed the worker.c refactoring patch. "Add subscription support to replicate into partitioned tables" still has lacking test coverage. Your changes in relation.c are not exercised at all because the partitioned table branch in apply_handle_update() is never taken. This is critical and tricky code, so I would look for significant testing. The code looks okay to me. I would remove this code + memset(entry->attrmap->attnums, -1, + entry->attrmap->maplen * sizeof(AttrNumber)); because the entries are explicitly filled right after anyway, and filling the bytes with -1 has an unclear effect. There is also seemingly some fishiness in this code around whether attribute numbers are zero- or one-based. Perhaps this could be documented briefly. Maybe I'm misunderstanding something. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 25, 2020 at 9:29 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-03-23 06:02, Amit Langote wrote: > > Okay, added some tests. > > > > Attached updated patches. > > I have committed the worker.c refactoring patch. > > "Add subscription support to replicate into partitioned tables" still > has lacking test coverage. Your changes in relation.c are not exercised > at all because the partitioned table branch in apply_handle_update() is > never taken. This is critical and tricky code, so I would look for > significant testing. While trying some tests around the code you mentioned, I found what looks like a bug, which looking into now. > The code looks okay to me. I would remove this code > > + memset(entry->attrmap->attnums, -1, > + entry->attrmap->maplen * sizeof(AttrNumber)); > > because the entries are explicitly filled right after anyway, and > filling the bytes with -1 has an unclear effect. There is also > seemingly some fishiness in this code around whether attribute numbers > are zero- or one-based. Perhaps this could be documented briefly. > Maybe I'm misunderstanding something. Will check and fix as necessary. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Thu, Mar 26, 2020 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Mar 25, 2020 at 9:29 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-03-23 06:02, Amit Langote wrote: > > > Okay, added some tests. > > > > > > Attached updated patches. > > > > I have committed the worker.c refactoring patch. > > > > "Add subscription support to replicate into partitioned tables" still > > has lacking test coverage. Your changes in relation.c are not exercised > > at all because the partitioned table branch in apply_handle_update() is > > never taken. This is critical and tricky code, so I would look for > > significant testing. > > While trying some tests around the code you mentioned, I found what > looks like a bug, which looking into now. Turns out the code in apply_handle_tuple_routing() for the UPDATE message was somewhat bogus, which is fixed in the updated version. I ended up with another refactoring patch, which is attached as 0001. It appears to me that the tests now seem sufficient to cover apply_handle_tuple_routing(), although more could still be added. > > The code looks okay to me. I would remove this code > > > > + memset(entry->attrmap->attnums, -1, > > + entry->attrmap->maplen * sizeof(AttrNumber)); > > > > because the entries are explicitly filled right after anyway, and > > filling the bytes with -1 has an unclear effect. There is also > > seemingly some fishiness in this code around whether attribute numbers > > are zero- or one-based. Perhaps this could be documented briefly. > > Maybe I'm misunderstanding something. > > Will check and fix as necessary. Removed that memset. I have added a comment about one- vs. zero-based indexes contained in the maps coming from two different modules, viz. tuple routing and logical replication, respectively. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
I have updated the comments in apply_handle_tuple_routing() (see 0002) to better explain what's going on with UPDATE handling. I also rearranged the tests a bit for clarity. Attached updated patches. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020-03-30 17:42, Amit Langote wrote: > I have updated the comments in apply_handle_tuple_routing() (see 0002) > to better explain what's going on with UPDATE handling. I also > rearranged the tests a bit for clarity. > > Attached updated patches. Test coverage for 0002 is still a bit lacking. Please do a coverage build yourself and get at least one test case to exercise every branch in apply_handle_tuple_routing(). Right now, I don't see any coverage for updates without attribute remapping and updates that don't move to a new partition. Also, the coverage report reveals that in logicalrep_partmap_init(), the patch is mistakenly initializing LogicalRepRelMapContext instead of LogicalRepPartMapContext. (Hmm, how does it even work like that?) I think apart from some of these details, this patch is okay, but I don't have deep experience in the partitioning code, I can just see that it looks like other code elsewhere. Perhaps someone with more knowledge can give this a look as well. About patch 0003, I was talking to some people offline about the name of the option. There was some confusion about using the term "schema". How about naming it "publish_via_partition_root", which also matches the name of the analogous option in pg_dump. Code coverage here could also be improved. A lot of the new code in pgoutput.c is not tested. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
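For what it's worth, with the suggested name the option would read like this (sketch; parted_tbl stands in for any partitioned table):

create publication pub_root for table parted_tbl
    with (publish_via_partition_root = true);
-- changes to any partition are then published using the root table's
-- identity and schema, so the subscriber only needs a table called
-- parted_tbl, which may be partitioned differently or not at all
alter publication pub_root set (publish_via_partition_root = false);  -- back to per-partition publishing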
Hi, On 02/04/2020 14:23, Peter Eisentraut wrote: > On 2020-03-30 17:42, Amit Langote wrote: >> I have updated the comments in apply_handle_tuple_routing() (see 0002) >> to better explain what's going on with UPDATE handling. I also >> rearranged the tests a bit for clarity. >> >> Attached updated patches. > > Also, the coverage report reveals that in logicalrep_partmap_init(), the > patch is mistakenly initializing LogicalRepRelMapContext instead of > LogicalRepPartMapContext. (Hmm, how does it even work like that?) > It works because it's just a MemoryContext and it's long lived. I wonder if the fix here is to simply remove the LogicalRepPartMapContext... > I think apart from some of these details, this patch is okay, but I > don't have deep experience in the partitioning code, I can just see that > it looks like other code elsewhere. Perhaps someone with more knowledge > can give this a look as well. > FWIW it looks okay to me as well from perspective of somebody who implemented something similar outside of core. > About patch 0003, I was talking to some people offline about the name of > the option. There was some confusion about using the term "schema". How > about naming it "publish_via_partition_root", which also matches the > name of the analogous option in pg_dump. > +1 (disclaimer: I was one of the people who discussed this offline) -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Fri, Apr 3, 2020 at 4:52 PM Petr Jelinek <petr@2ndquadrant.com> wrote: > On 02/04/2020 14:23, Peter Eisentraut wrote: > > On 2020-03-30 17:42, Amit Langote wrote: > >> I have updated the comments in apply_handle_tuple_routing() (see 0002) > >> to better explain what's going on with UPDATE handling. I also > >> rearranged the tests a bit for clarity. > >> > >> Attached updated patches. > > > Also, the coverage report reveals that in logicalrep_partmap_init(), the > > patch is mistakenly initializing LogicalRepRelMapContext instead of > > LogicalRepPartMapContext. (Hmm, how does it even work like that?) > > > > It works because it's just a MemoryContext and it's long lived. I wonder > if the fix here is to simply remove the LogicalRepPartMapContext... Actually, there is no LogicalRepPartMapContext in the patches posted so far, but I have decided to add it in the updated patch. One advantage beside avoiding confusion is that it might help to tell memory consumed by the partitions apart from that consumed by the actual replication targets. > > I think apart from some of these details, this patch is okay, but I > > don't have deep experience in the partitioning code, I can just see that > > it looks like other code elsewhere. Perhaps someone with more knowledge > > can give this a look as well. > > > > FWIW it looks okay to me as well from perspective of somebody who > implemented something similar outside of core. Thanks for giving it a look. > > About patch 0003, I was talking to some people offline about the name of > > the option. There was some confusion about using the term "schema". How > > about naming it "publish_via_partition_root", which also matches the > > name of the analogous option in pg_dump. > > > > +1 (disclaimer: I was one of the people who discussed this offline) Okay, I like that too. I am checking test coverage at the moment and should have the patches ready by sometime later today. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote: > I am checking test coverage at the moment and should have the patches > ready by sometime later today. Attached updated patches. I confirmed using a coverage build that all the new code in logical/worker.c due to 0002 is now covered. For some reason, coverage report for pgoutput.c doesn't say the same thing for 0003's changes, although I doubt that result. It seems strange to believe that *none* of the new code is tested. I even checked by adding debugging elog()s next to the lines that the coverage report says aren't exercised, which tell me that that's not true. Perhaps my coverage build is somehow getting messed up, so it would be nice if someone with reliable coverage builds can confirm one way or the other. I will continue to check what's wrong. I fixed a couple of bugs in 0002. One of the bugs was that the "partition map" hash table in logical/relation.c didn't really work, so logicalrep_partition_open() would always create a new entry. In 0003, I changed the publication parameter name to publish_via_partition_root. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi, On 03/04/2020 16:25, Amit Langote wrote: > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote: >> I am checking test coverage at the moment and should have the patches >> ready by sometime later today. > > Attached updated patches. > > I confirmed using a coverage build that all the new code in > logical/worker.c due to 0002 is now covered. For some reason, coverage > report for pgoutput.c doesn't say the same thing for 0003's changes, > although I doubt that result. It seems strange to believe that *none* > of the new code is tested. I even checked by adding debugging elog()s > next to the lines that the coverage report says aren't exercised, > which tell me that that's not true. Perhaps my coverage build is > somehow getting messed up, so it would be nice if someone with > reliable coverage builds can confirm one way or the other. I will > continue to check what's wrong. > AFAIK gcov can't handle multiple instances of same process being started as it just overwrites the coverage files. So for TAP test it will report bogus info (as in some code that's executed will look as not executed). We'd probably have to do some kind of `GCOV_PREFIX` magic in the TAP framework and merge (gcov/lcov can do that AFAIK) the resulting files to get accurate coverage info. But that's beyond this patch IMHO. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Petr Jelinek <petr@2ndquadrant.com> writes: > AFAIK gcov can't handle multiple instances of same process being started > as it just overwrites the coverage files. So for TAP test it will report > bogus info (as in some code that's executed will look as not executed). Hm, really? I routinely run "make check" (ie, parallel regression tests) under coverage, and I get results that seem sane. If I were losing large chunks of the data, I think I'd have noticed. regards, tom lane
On 03/04/2020 16:59, Tom Lane wrote: > Petr Jelinek <petr@2ndquadrant.com> writes: >> AFAIK gcov can't handle multiple instances of same process being started >> as it just overwrites the coverage files. So for TAP test it will report >> bogus info (as in some code that's executed will look as not executed). > > Hm, really? I routinely run "make check" (ie, parallel regression > tests) under coverage, and I get results that seem sane. If I were > losing large chunks of the data, I think I'd have noticed. > Parallel regression still just starts single postgres instance no? -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Petr Jelinek <petr@2ndquadrant.com> writes: > On 03/04/2020 16:59, Tom Lane wrote: >> Petr Jelinek <petr@2ndquadrant.com> writes: >>> AFAIK gcov can't handle multiple instances of same process being started >>> as it just overwrites the coverage files. So for TAP test it will report >>> bogus info (as in some code that's executed will look as not executed). >> Hm, really? I routinely run "make check" (ie, parallel regression >> tests) under coverage, and I get results that seem sane. If I were >> losing large chunks of the data, I think I'd have noticed. > Parallel regression still just starts single postgres instance no? But the forked-off children have to write the gcov files independently, don't they? regards, tom lane
On 03/04/2020 17:51, Tom Lane wrote: > Petr Jelinek <petr@2ndquadrant.com> writes: >> On 03/04/2020 16:59, Tom Lane wrote: >>> Petr Jelinek <petr@2ndquadrant.com> writes: >>>> AFAIK gcov can't handle multiple instances of same process being started >>>> as it just overwrites the coverage files. So for TAP test it will report >>>> bogus info (as in some code that's executed will look as not executed). > >>> Hm, really? I routinely run "make check" (ie, parallel regression >>> tests) under coverage, and I get results that seem sane. If I were >>> losing large chunks of the data, I think I'd have noticed. > >> Parallel regression still just starts single postgres instance no? > > But the forked-off children have to write the gcov files independently, > don't they? > Hmm that's very good point. I did see these missing coverage issue when running tests that explicitly start more instances of postgres before though. And with some quick googling, parallel testing seems to be issue with gcov for more people. I wonder if the program checksum that gcov calculates when merging the .gcda data while updating it is somehow different for separately started instances but not for the ones forked from same parent or something. I don't know internals of gcov well enough to say how exactly that works. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Petr Jelinek <petr@2ndquadrant.com> writes: > On 03/04/2020 17:51, Tom Lane wrote: >> But the forked-off children have to write the gcov files independently, >> don't they? > Hmm that's very good point. I did see these missing coverage issue when > running tests that explicitly start more instances of postgres before > though. And with some quick googling, parallel testing seems to be issue > with gcov for more people. I poked around and found this: https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html which says gcov instrumentation is multi-process safe, but not multi-thread safe. The multi-processing safety relies on OS level file locking, which is not available on some systems. That would explain why it works for me, but then there's a question of why it doesn't work for you ... regards, tom lane
On 04/04/2020 07:25, Tom Lane wrote: > Petr Jelinek <petr@2ndquadrant.com> writes: >> On 03/04/2020 17:51, Tom Lane wrote: >>> But the forked-off children have to write the gcov files independently, >>> don't they? > >> Hmm that's very good point. I did see these missing coverage issue when >> running tests that explicitly start more instances of postgres before >> though. And with some quick googling, parallel testing seems to be issue >> with gcov for more people. > > I poked around and found this: > > https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html > > which says > > gcov instrumentation is multi-process safe, but not multi-thread > safe. The multi-processing safety relies on OS level file locking, > which is not available on some systems. > > That would explain why it works for me, but then there's a question > of why it doesn't work for you ... Hmm, I wonder if it has something to do with docker then (I rarely run any tests directly on the main system nowadays). But that does not explain why it does not work for Amit either. -- Petr Jelinek 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
On Sat, Apr 4, 2020 at 5:56 PM Petr Jelinek <petr@2ndquadrant.com> wrote: > On 04/04/2020 07:25, Tom Lane wrote: > > Petr Jelinek <petr@2ndquadrant.com> writes: > >> On 03/04/2020 17:51, Tom Lane wrote: > >>> But the forked-off children have to write the gcov files independently, > >>> don't they? > > > >> Hmm that's very good point. I did see these missing coverage issue when > >> running tests that explicitly start more instances of postgres before > >> though. And with some quick googling, parallel testing seems to be issue > >> with gcov for more people. > > > > I poked around and found this: > > > > https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html > > > > which says > > > > gcov instrumentation is multi-process safe, but not multi-thread > > safe. The multi-processing safety relies on OS level file locking, > > which is not available on some systems. > > > > That would explain why it works for me, but then there's a question > > of why it doesn't work for you ... > > Hmm, I wonder if it has something to do with docker then (I rarely run > any tests directly on the main system nowadays). But that does not > explain why it does not work for Amit either. One thing to I must clarify: coverage for most of pgoutput.c looks okay on each run. I am concerned that the coverage for the code added by the patch is shown to be close to zero, which is a mystery to me, because I can confirm by other means such as debugging elogs() to next to the new code that the newly added tests do cover them. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes: > One thing to I must clarify: coverage for most of pgoutput.c looks > okay on each run. I am concerned that the coverage for the code added > by the patch is shown to be close to zero, which is a mystery to me, > because I can confirm by other means such as debugging elogs() to next > to the new code that the newly added tests do cover them. According to https://coverage.postgresql.org/src/backend/replication/pgoutput/index.html the coverage is pretty good. Maybe you're doing something wrong in enabling coverage testing locally? regards, tom lane
On 2020-04-03 16:25, Amit Langote wrote: > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote: >> I am checking test coverage at the moment and should have the patches >> ready by sometime later today. > > Attached updated patches. Committed 0001 now. I'll work on the rest tomorrow. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 6, 2020 at 10:25 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-04-03 16:25, Amit Langote wrote: > > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> I am checking test coverage at the moment and should have the patches > >> ready by sometime later today. > > > > Attached updated patches. > > Committed 0001 now. I'll work on the rest tomorrow. Thank you. I have rebased the one remaining. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Apr 7, 2020 at 12:04 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Apr 6, 2020 at 10:25 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-04-03 16:25, Amit Langote wrote: > > > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote: > > >> I am checking test coverage at the moment and should have the patches > > >> ready by sometime later today. > > > > > > Attached updated patches. > > > > Committed 0001 now. I'll work on the rest tomorrow. > > Thank you. I have rebased the one remaining. I updated the patch to make the following changes: * Rewrote the tests to match in style with those committed yesterday * Renamed all variables that had pubasroot in it to have pubviaroot instead to match the publication parameter * Updated pg_publication catalog documentation -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020-04-07 08:44, Amit Langote wrote: > I updated the patch to make the following changes: > > * Rewrote the tests to match in style with those committed yesterday > * Renamed all variables that had pubasroot in it to have pubviaroot > instead to match the publication parameter > * Updated pg_publication catalog documentation Thanks. I have some further questions: The change in nodeModifyTable.c to add CheckValidResultRel() is unclear. It doesn't seem to do anything, and it's not clear how it's related to this patch. The changes in GetRelationPublications() are confusing to me: + if (published_rels) + { + num = list_length(result); + for (i = 0; i < num; i++) + *published_rels = lappend_oid(*published_rels, relid); + } This adds relid to the output list "num" times, where num is the number of publications found. Shouldn't "i" be used in the loop somehow? Similarly later in the function. The descriptions of the new fields in RelationSyncEntry don't seem to match the code accurately, or at least it's confusing. replicate_as_relid is always filled in with an ancestor, even if pubviaroot is not set. I think the pubviaroot field is actually not necessary. We only need replicate_as_relid. There is a markup typo in logical-replication.sgml: <xref linkend=="sql-createpublication"/> In pg_dump, you missed updating a branch for an older version. See attached patch. Also attached a patch to rephrase the psql output a bit to make it not so long. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Thanks for the review. On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-04-07 08:44, Amit Langote wrote: > > I updated the patch to make the following changes: > > > > * Rewrote the tests to match in style with those committed yesterday > > * Renamed all variables that had pubasroot in it to have pubviaroot > > instead to match the publication parameter > > * Updated pg_publication catalog documentation > > Thanks. I have some further questions: > > The change in nodeModifyTable.c to add CheckValidResultRel() is unclear. > It doesn't seem to do anything, and it's not clear how it's related to > this patch. CheckValidResultRel() checks that a replica identity is present for replicating a given update/delete, which, I think, is better done on the root table itself, rather than some partition that would be affected. The latter already occurs by way of CheckValidResultRel() being called on partitions to be updated. I think we get a more helpful message if the root parent is flagged instead of a partition. update prt1 set b = b + 1 where a = 578; ERROR: cannot update table "prt1" because it does not have a replica identity and publishes updates HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE. vs. -- checking the partition update prt1 set b = b + 1 where a = 578; ERROR: cannot update table "prt1_p3" because it does not have a replica identity and publishes updates HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE. I am okay with getting rid of the check on the root table if flagging individual partitions seems good enough. > The changes in GetRelationPublications() are confusing to me: > > + if (published_rels) > + { > + num = list_length(result); > + for (i = 0; i < num; i++) > + *published_rels = lappend_oid(*published_rels, relid); > + } > > This adds relid to the output list "num" times, where num is the number > of publications found. Shouldn't "i" be used in the loop somehow? > Similarly later in the function. published_rels contains an *OID* for each publication that will be in result. Callers should iterate the two lists together and for each publication found in result, it will know which relation it is associated with using the OID found in published_rels being scanned in parallel. If publishing through an ancestor's publication, we need to know which ancestor, so the whole dance. I had thought this was a bit ugly before, but after having to explain it, I think it's better to use some other approach for this. I have updated the patch so that GetRelationPublications no longer considers a relation's ancestors. That way, it doesn't have to second-guess what other information will be needed by the caller. I hope that's clearer, because all the logic is in one place and that is get_rel_sync_entry(). > The descriptions of the new fields in RelationSyncEntry don't seem to > match the code accurately, or at least it's confusing. > replicate_as_relid is always filled in with an ancestor, even if > pubviaroot is not set. Given this confusion, I have changed how replicate_as_relid works so that it's now always set -- if different from the relation's own OID, the code for "publishing via root" kicks in in various places. > I think the pubviaroot field is actually not necessary. We only need > replicate_as_relid. Looking through the code, I agree. I guess I only kept it around to go with pubupdate, etc.
I guess it might also be a good idea to call it publish_as_relid instead of replicate_as_relid for consistency. > There is a markup typo in logical-replication.sgml: > > <xref linkend=="sql-createpublication"/> Oops, fixed. > In pg_dump, you missed updating a branch for an older version. See > attached patch. > > Also attached a patch to rephrase the psql output a bit to make it not > so long. Thank you, merged. Attached updated patch with above changes. -- Thank you, Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
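As an aside to the replica-identity errors shown above, the HINT boils down to something like the following (using the example's partition name; REPLICA IDENTITY FULL is the simplest, though not the cheapest, way out):

-- make updates/deletes on this partition publishable by using the whole
-- row as the replica identity
alter table prt1_p3 replica identity full;
-- alternatively, point REPLICA IDENTITY at a suitable unique index
-- on NOT NULL column(s) of the partition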
On Wed, Apr 8, 2020 at 1:22 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > The descriptions of the new fields in RelationSyncEntry don't seem to > > match the code accurately, or at least it's confusing. > > replicate_as_relid is always filled in with an ancestor, even if > > pubviaroot is not set. > > Given this confusion, I have changed how replicate_as_relid works so > that it's now always set -- if different from the relation's own OID, > the code for "publishing via root" kicks in in various places. > > > I think the pubviaroot field is actually not necessary. We only need > > replicate_as_relid. > > Looking through the code, I agree. I guess I only kept it around to > go with pubupdate, etc. Think I broke truncate replication with this. Fixed in the attached updated patch. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020-04-08 07:45, Amit Langote wrote: > On Wed, Apr 8, 2020 at 1:22 AM Amit Langote <amitlangote09@gmail.com> wrote: >> On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> The descriptions of the new fields in RelationSyncEntry don't seem to >>> match the code accurately, or at least it's confusing. >>> replicate_as_relid is always filled in with an ancestor, even if >>> pubviaroot is not set. >> >> Given this confusion, I have changed how replicate_as_relid works so >> that it's now always set -- if different from the relation's own OID, >> the code for "publishing via root" kicks in in various places. >> >>> I think the pubviaroot field is actually not necessary. We only need >>> replicate_as_relid. >> >> Looking through the code, I agree. I guess I only kept it around to >> go with pubupdate, etc. > > Think I broke truncate replication with this. Fixed in the attached > updated patch. All committed. Thank you and everyone very much for working on this. I'm very happy that these two features from PG10 have finally met. :) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > All committed. > > Thank you and everyone very much for working on this. I'm very happy > that these two features from PG10 have finally met. :) Thanks a lot for reviewing and committing. prion seems to have failed: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13 Also, still unsure why the coverage report for pgoutput.c changes not good: https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html Will check. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On 2020-04-08 13:16, Amit Langote wrote: > On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> All committed. >> >> Thank you and everyone very much for working on this. I'm very happy >> that these two features from PG10 have finally met. :) > > Thanks a lot for reviewing and committing. > > prion seems to have failed: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13 This comes from -DRELCACHE_FORCE_RELEASE. > Also, still unsure why the coverage report for pgoutput.c changes not good: > https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html I think this is because the END { } section in PostgresNode.pm shuts down all running instances in immediate mode, which doesn't save coverage properly. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-04-08 13:16, Amit Langote wrote: > > On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut > > <peter.eisentraut@2ndquadrant.com> wrote: > >> All committed. > >> > >> Thank you and everyone very much for working on this. I'm very happy > >> that these two features from PG10 have finally met. :) > > > > Thanks a lot for reviewing and committing. > > > > prion seems to have failed: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13 > > This comes from -DRELCACHE_FORCE_RELEASE. I'm seeing some funny stuff on such a build locally too, although haven't been able to make sense of it yet. > > Also, still unsure why the coverage report for pgoutput.c changes not good: > > https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html > > I think this is because the END { } section in PostgresNode.pm shuts > down all running instances in immediate mode, which doesn't save > coverage properly. Thanks for that tip. Appending the following at the end of the test file has fixed the coverage reporting for me. I noticed the following coverage issues: 1. The previous commit f1ac27bfd missed a command that I had included to cover the following blocks of apply_handle_tuple_routing(): 1165 : else 1166 : { 1167 0 : remoteslot = ExecCopySlot(remoteslot, remoteslot_part); 1168 0 : slot_getallattrs(remoteslot); 1169 : } ... 1200 2 : if (map != NULL) 1201 : { 1202 0 : remoteslot_part = execute_attr_map_slot(map->attrMap, 1203 : remoteslot, 1204 : remoteslot_part); 1205 : } 2. Now that I am able to see proper coverage fo publish_via_partition_root related changes, I can see that a block in pgoutput_change() is missing coverage The attached fixes these coverage issues. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > I think this is because the END { } section in PostgresNode.pm shuts > > down all running instances in immediate mode, which doesn't save > > coverage properly. > > Thanks for that tip. Appending the following at the end of the test > file has fixed the coverage reporting for me. The patch posted in the previous email has it, but I meant this by "the following": + +$node_publisher->stop('fast'); +$node_subscriber1->stop('fast'); +$node_subscriber2->stop('fast'); -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-04-08 13:16, Amit Langote wrote: > > > prion seems to have failed: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13 > > > > This comes from -DRELCACHE_FORCE_RELEASE. > > I'm seeing some funny stuff on such a build locally too, although > haven't been able to make sense of it yet. So, I see the following repeated in the publisher's log (013_partition.pl) until PostgresNode.pm times out: sub_viaroot ERROR: number of columns (2601) exceeds limit (1664) sub_viaroot CONTEXT: slot "sub_viaroot", output plugin "pgoutput", in the change callback, associated LSN 0/1621010 causing the tests introduced by this last commit to stall. Just before where the above starts repeating is this: sub_viaroot_16479_sync_16455 LOG: starting logical decoding for slot "sub_viaroot_16479_sync_16455" sub_viaroot_16479_sync_16455 DETAIL: Streaming transactions committing after 0/1620A40, reading WAL from 0/1620A08. sub_viaroot_16479_sync_16455 LOG: logical decoding found consistent point at 0/1620A08 sub_viaroot_16479_sync_16455 DETAIL: There are no running transactions. sub_viaroot_16479_sync_16470 LOG: statement: COPY public.tab3_1 TO STDOUT sub_viaroot_16479_sync_16470 LOG: statement: COMMIT Same thing for the other subscriber sub2. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On 2020-04-09 05:39, Amit Langote wrote: > sub_viaroot ERROR: number of columns (2601) exceeds limit (1664) > sub_viaroot CONTEXT: slot "sub_viaroot", output plugin "pgoutput", in > the change callback, associated LSN 0/1621010 I think the problem is that in maybe_send_schema(), RelationClose(ancestor) releases the relcache entry, but the tuple descriptors, which are part of the relcache entry, are still pointed to by the tuple map. This patch makes the tests pass for me: diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index 5fbf2d4367..cf6e8629c1 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx, /* Map must live as long as the session does. */ oldctx = MemoryContextSwitchTo(CacheMemoryContext); - relentry->map = convert_tuples_by_name(indesc, outdesc); + relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc), CreateTupleDescCopy(outdesc)); MemoryContextSwitchTo(oldctx); send_relation_and_attrs(ancestor, ctx); RelationClose(ancestor); Please check. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 9, 2020 at 4:14 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-04-09 05:39, Amit Langote wrote: > > sub_viaroot ERROR: number of columns (2601) exceeds limit (1664) > > sub_viaroot CONTEXT: slot "sub_viaroot", output plugin "pgoutput", in > > the change callback, associated LSN 0/1621010 > > I think the problem is that in maybe_send_schema(), > RelationClose(ancestor) releases the relcache entry, but the tuple > descriptors, which are part of the relcache entry, are still pointed to > by the tuple map. > > This patch makes the tests pass for me: > > diff --git a/src/backend/replication/pgoutput/pgoutput.c > b/src/backend/replication/pgoutput/pgoutput.c > index 5fbf2d4367..cf6e8629c1 100644 > --- a/src/backend/replication/pgoutput/pgoutput.c > +++ b/src/backend/replication/pgoutput/pgoutput.c > @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx, > > /* Map must live as long as the session does. */ > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > - relentry->map = convert_tuples_by_name(indesc, outdesc); > + relentry->map = > convert_tuples_by_name(CreateTupleDescCopy(indesc), > CreateTupleDescCopy(outdesc)); > MemoryContextSwitchTo(oldctx); > send_relation_and_attrs(ancestor, ctx); > RelationClose(ancestor); > > Please check. Thanks. Yes, that's what I just found out too and was about to send a patch, which is basically same as yours as far as the fix for this issue is concerned. While figuring this out, I thought the nearby code could be rearranged a bit, especially to de-duplicate the code. Also, I think get_rel_sync_entry() may be a better place to set the map, rather than maybe_send_schema(). Thoughts? -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020-04-09 09:28, Amit Langote wrote: >> This patch makes the tests pass for me: >> >> diff --git a/src/backend/replication/pgoutput/pgoutput.c >> b/src/backend/replication/pgoutput/pgoutput.c >> index 5fbf2d4367..cf6e8629c1 100644 >> --- a/src/backend/replication/pgoutput/pgoutput.c >> +++ b/src/backend/replication/pgoutput/pgoutput.c >> @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx, >> >> /* Map must live as long as the session does. */ >> oldctx = MemoryContextSwitchTo(CacheMemoryContext); >> - relentry->map = convert_tuples_by_name(indesc, outdesc); >> + relentry->map = >> convert_tuples_by_name(CreateTupleDescCopy(indesc), >> CreateTupleDescCopy(outdesc)); >> MemoryContextSwitchTo(oldctx); >> send_relation_and_attrs(ancestor, ctx); >> RelationClose(ancestor); >> >> Please check. > > Thanks. Yes, that's what I just found out too and was about to send a > patch, which is basically same as yours as far as the fix for this > issue is concerned. I have committed my patch but not ... > While figuring this out, I thought the nearby code could be rearranged > a bit, especially to de-duplicate the code. Also, I think > get_rel_sync_entry() may be a better place to set the map, rather than > maybe_send_schema(). Thoughts? because I didn't really have an opinion on that at the time, but if you still want it considered or have any open thoughts on this thread, please resend or explain. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 17, 2020 at 10:23 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2020-04-09 09:28, Amit Langote wrote: > > While figuring this out, I thought the nearby code could be rearranged > > a bit, especially to de-duplicate the code. Also, I think > > get_rel_sync_entry() may be a better place to set the map, rather than > > maybe_send_schema(). Thoughts? > > because I didn't really have an opinion on that at the time, but if you > still want it considered or have any open thoughts on this thread, > please resend or explain. Sure, thanks for taking care of the bug. Rebased the code rearrangement patch. Also resending the patch to fix TAP tests for improving coverage as described in: https://www.postgresql.org/message-id/CA%2BHiwqFyydvQ5g%3Dqa54UM%2BXjm77BdhX-nM4dXQkNOgH%3DzvDjoA%40mail.gmail.com To summarize: 1. Missing coverage for a couple of related blocks in apply_handle_tuple_routing() 2. Missing coverage report for the code in pgoutput.c added by 83fd4532 -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
The first file of Amit's patch not only rearranges the code, but also fixes a hidden bug.
To make it easy to see, I have attached a separate patch.
"RelationIdGetRelation" increments a reference in owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and must be enlarged and re-hashed.
Once the capacity of owner->relrefarr exceeds 10 million, enlarging and re-hashing takes several hours. What's worse, adding a reference also starts to take minutes, because hash collisions are resolved by scanning an array in order.
When we want to publish 10 billion rows under one partitioned table, it takes several days for the reference bookkeeping, enlarging, and re-hashing, and the CPU stays at 99%.
After applying my patch, the 10 billion rows are published in 10 minutes.
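Spelled out, the pattern in question looks roughly like the sketch below; the variable names are assumed for illustration and this is not the exact pgoutput.c code:

    /*
     * Every successful RelationIdGetRelation() registers a reference with
     * the current resource owner.  Without a matching RelationClose(),
     * that reference array grows on every decoded change, and the linear
     * collision handling plus the periodic enlarge/re-hash is what pins
     * the CPU at large row counts.
     */
    Relation    ancestor = RelationIdGetRelation(ancestor_oid);

    if (RelationIsValid(ancestor))
    {
        /* ... build or refresh the tuple conversion map for this entry ... */

        /* Drop the reference once the descriptors have been copied. */
        RelationClose(ancestor);
    }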
Attachment
On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941708@qq.com> wrote: > > The first file of Amit's patch not only rearranges the code, but also fixes a hidden bug. > To make it easy to see, I have attached a separate patch. > "RelationIdGetRelation" increments a reference in owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and must be enlarged and re-hashed. > Once the capacity of owner->relrefarr exceeds 10 million, enlarging and re-hashing takes several hours. What's worse, adding a reference also starts to take minutes, because hash collisions are resolved by scanning an array in order. > When we want to publish 10 billion rows under one partitioned table, it takes several days for the reference bookkeeping, enlarging, and re-hashing, and the CPU stays at 99%. > After applying my patch, the 10 billion rows are published in 10 minutes. > It is a clear relation descriptor leak. The proposed fix seems correct to me. The patch wasn't applying to HEAD, so I have prepared separate patches for HEAD and 13. There are minor modifications in the patch, such as using RelationIsValid before closing the relation. I have not added any test because I see that there is already a test in src/test/subscription/t/013_partition. Could you kindly let me know your English name so that I can give you credit as a co-author? -- With Regards, Amit Kapila.
Attachment
Thanks for your reply. The patch is exactly what I want.
My English name is Mark Zhao, which should be the current email name.
Thanks,
Mark Zhao
On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote: > > Thanks for your reply. The patch is exactly what I want. > My English name is Mark Zhao, which should be the current email name. > Pushed the fix. -- With Regards, Amit Kapila.
On Tue, Jan 12, 2021 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote: > > > > Thanks for your reply. The patch is exactly what I want. > > My English name is Mark Zhao, which should be the current email name. > > > > Pushed the fix. Thanks Amit and Mark. I hadn't realized at the time that the relation descriptor leak was occurring, but good to see it tested and fixed. -- Amit Langote EDB: http://www.enterprisedb.com