Thread: adding partitioned tables to publications

adding partitioned tables to publications

From
Amit Langote
Date:
One cannot currently add partitioned tables to a publication.

create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

create publication publish_p for table p;
ERROR:  "p" is a partitioned table
DETAIL:  Adding partitioned tables to publications is not supported.
HINT:  You can add the table partitions individually.

One can do this instead:

create publication publish_p1 for table p1;
create publication publish_p2 for table p2;
create publication publish_p3 for table p3;

but maybe that's too much code to maintain for users.

I propose that we make this command:

create publication publish_p for table p;

automatically add all the partitions to the publication.  Also, any
future partitions should also be automatically added to the
publication.  So, publishing a partitioned table automatically
publishes all of its existing and future partitions.  Attached patch
implements that.

What doesn't change with this patch is that the partitions on the
subscription side still have to match one-to-one with the partitions
on the publication side, because the changes are still replicated as
being made to the individual partitions, not as the changes to the
root partitioned table.  It might be useful to implement that
functionality on the publication side, because it allows users to
define the replication target any way they need to, but this patch
doesn't implement that.
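
For example, replicating publish_p with the patch still requires recreating the same partition layout on the subscriber; a minimal sketch (the connection string is illustrative):

```sql
-- On the subscriber: partitions must match the publisher's one-to-one,
-- because changes are replicated as changes to the individual partitions.
create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

create subscription sub_p
  connection 'host=publisher dbname=postgres'  -- illustrative connection string
  publication publish_p;
```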

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Mon, Oct 7, 2019 at 9:55 AM Amit Langote <amitlangote09@gmail.com> wrote:
> One cannot currently add partitioned tables to a publication.
>
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
>
> create publication publish_p for table p;
> ERROR:  "p" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
>
> One can do this instead:
>
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;
>
> but maybe that's too much code to maintain for users.
>
> I propose that we make this command:
>
> create publication publish_p for table p;
>
> automatically add all the partitions to the publication.  Also, any
> future partitions should also be automatically added to the
> publication.  So, publishing a partitioned table automatically
> publishes all of its existing and future partitions.  Attached patch
> implements that.
>
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table.  It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.

Added this to the next CF: https://commitfest.postgresql.org/25/2301/

Thanks,
Amit



Re: adding partitioned tables to publications

From
Rafia Sabih
Date:


On Thu, 10 Oct 2019 at 08:29, Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Oct 7, 2019 at 9:55 AM Amit Langote <amitlangote09@gmail.com> wrote:
> One cannot currently add partitioned tables to a publication.
>
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
>
> create publication publish_p for table p;
> ERROR:  "p" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
>
> One can do this instead:
>
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;
>
> but maybe that's too much code to maintain for users.
>
> I propose that we make this command:
>
> create publication publish_p for table p;
>
> automatically add all the partitions to the publication.  Also, any
> future partitions should also be automatically added to the
> publication.  So, publishing a partitioned table automatically
> publishes all of its existing and future partitions.  Attached patch
> implements that.
>
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table.  It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.

Added this to the next CF: https://commitfest.postgresql.org/25/2301/

Hi Amit,

Lately I have been exploring the logical replication feature of PostgreSQL, and I find this addition to its scope for partitioned tables a useful one.

In order to understand the working of your patch a bit more, I performed an experiment wherein I created a partitioned table with several children and a default partition on the publisher side, and normal tables with the same names as the parent, children, and default partition on the subscriber side. Next I established the logical replication connection, and to my surprise the data was successfully replicated from the partitioned tables to the normal tables, and then this error filled the logs:
LOG:  logical replication table synchronization worker for subscription "my_subscription", table "parent" has started
ERROR:  table "public.parent" not found on publisher

Here "parent" is the name of the partitioned table on the publisher side, and it is present as a normal table on the subscriber side as well. That is understandable: it is trying to find a normal table of the same name but couldn't find one; maybe it should not worry about that, if not at replication time.

Please let me know if this is something expected, because in my opinion this is not desirable; there should be some check of the table type for replication. This wasn't important till now, maybe because only normal tables could be replicated, but with the extension of the scope of logical replication to more objects such checks would be helpful.

On a separate note, I was thinking that for partitioned tables, wouldn't it be cleaner if you created only the partitioned table at the subscriber, and then when logical replication starts it creates the child tables accordingly? Or would that be too much for the future...?

--
Regards,
Rafia Sabih

Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hello Rafia,

Great to hear that you are interested in this feature and thanks for
testing the patch.

On Thu, Oct 10, 2019 at 10:13 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> Lately I was exploring logical replication feature of postgresql and I
> found this addition in the scope of feature for partitioned tables a
> useful one.
>
> In order to understand the working of your patch a bit more, I performed
> an experiment wherein I created a partitioned table with several
> children and a default partition at the publisher side and normal tables
> of the same name as parent, children, and default partition of the
> publisher side at the subscriber side. Next I established the logical
> replication connection and to my surprise the data was successfully
> replicated from partitioned tables to normal tables and then this error
> filled the logs,
> LOG:  logical replication table synchronization worker for subscription "my_subscription", table "parent" has started
> ERROR:  table "public.parent" not found on publisher
>
> here parent is the name of the partitioned table at the publisher side
> and it is present as normal table at subscriber side as well. Which is
> understandable, it is trying to find a normal table of the same name but
> couldn't find one, maybe it should not worry about that now also if not
> at replication time.
>
> Please let me know if this is something expected because in my opinion
> this is not desirable, there should be some check to check the table
> type for replication. This wasn't important till now maybe because only
> normal tables were to be replicated, but with the extension of the scope
> of logical replication to more objects such checks would be helpful.

Thanks for sharing this case.  I hadn't considered it, but you're
right that it should be handled sensibly.  I have fixed table sync
code to handle this case properly.  Could you please check your case
with the attached updated patch?

> On a separate note was thinking for partitioned tables, wouldn't it be
> cleaner to have something like you create only partition table at the
> subscriber and then when logical replication starts it creates the child
> tables accordingly. Or would that be too much in future...?

Hmm, we'd first need to build the "automatic partition creation"
feature to consider doing something like that.  I'm sure you'd agree
that we should undertake that project separately from this tiny
logical replication usability improvement project. :)

Thanks again.

Regards,
Amit

Attachment

Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
Hi,

On 07/10/2019 02:55, Amit Langote wrote:
> One cannot currently add partitioned tables to a publication.
> 
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
> 
> create publication publish_p for table p;
> ERROR:  "p" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
> 
> One can do this instead:
> 
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;

Or just create publication publish_p for table p1, p2, p3;

> 
> but maybe that's too much code to maintain for users.
> 
> I propose that we make this command:
> 
> create publication publish_p for table p;
> 

+1

> automatically add all the partitions to the publication.  Also, any
> future partitions should also be automatically added to the
> publication.  So, publishing a partitioned table automatically
> publishes all of its existing and future partitions.  Attached patch
> implements that.
> 
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table.  It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.
> 

Yeah, for that to work the subscription side would also need to be able
to write to partitioned tables, so both sides need to add support for
this. I think if we do both what you did and the transparent handling of
the root only, we'll need a new keyword to differentiate the two. It
might make sense to think about whether your way should need the extra
keyword or the transparent one should.

One issue that I see reading the patch is following set of commands:

CREATE TABLE foo ...;
CREATE PUBLICATION mypub FOR TABLE foo;

CREATE TABLE bar ...;
ALTER PUBLICATION mypub ADD TABLE bar;

ALTER TABLE foo ATTACH PARTITION bar ...;
ALTER TABLE foo DETACH PARTITION bar ...;

This will end up with bar not being in any publication even though it 
was explicitly added. That might be an acceptable caveat, but it should 
at least be clearly documented (IMHO with a warning).

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
David Fetter
Date:
On Mon, Oct 07, 2019 at 09:55:23AM +0900, Amit Langote wrote:
> One cannot currently add partitioned tables to a publication.
> 
> create table p (a int, b int) partition by hash (a);
> create table p1 partition of p for values with (modulus 3, remainder 0);
> create table p2 partition of p for values with (modulus 3, remainder 1);
> create table p3 partition of p for values with (modulus 3, remainder 2);
> 
> create publication publish_p for table p;
> ERROR:  "p" is a partitioned table
> DETAIL:  Adding partitioned tables to publications is not supported.
> HINT:  You can add the table partitions individually.
> 
> One can do this instead:
> 
> create publication publish_p1 for table p1;
> create publication publish_p2 for table p2;
> create publication publish_p3 for table p3;
> 
> but maybe that's too much code to maintain for users.
> 
> I propose that we make this command:
> 
> create publication publish_p for table p;
> 
> automatically add all the partitions to the publication.  Also, any
> future partitions should also be automatically added to the
> publication.  So, publishing a partitioned table automatically
> publishes all of its existing and future partitions.  Attached patch
> implements that.
> 
> What doesn't change with this patch is that the partitions on the
> subscription side still have to match one-to-one with the partitions
> on the publication side, because the changes are still replicated as
> being made to the individual partitions, not as the changes to the
> root partitioned table.  It might be useful to implement that
> functionality on the publication side, because it allows users to
> define the replication target any way they need to, but this patch
> doesn't implement that.

With this patch, is it possible to remove a partition manually from a
subscription, or will it just get automatically re-added at some
point?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hi David,

On Sun, Oct 13, 2019 at 4:55 PM David Fetter <david@fetter.org> wrote:
> On Mon, Oct 07, 2019 at 09:55:23AM +0900, Amit Langote wrote:
> > I propose that we make this command:
> >
> > create publication publish_p for table p;
> >
> > automatically add all the partitions to the publication.  Also, any
> > future partitions should also be automatically added to the
> > publication.  So, publishing a partitioned table automatically
> > publishes all of its existing and future partitions.  Attached patch
> > implements that.
> >
> > What doesn't change with this patch is that the partitions on the
> > subscription side still have to match one-to-one with the partitions
> > on the publication side, because the changes are still replicated as
> > being made to the individual partitions, not as the changes to the
> > root partitioned table.  It might be useful to implement that
> > functionality on the publication side, because it allows users to
> > define the replication target any way they need to, but this patch
> > doesn't implement that.
>
> With this patch, is it possible to remove a partition manually from a
> subscription, or will it just get automatically re-added at some
> point?

Hmm, I don't think there is any way (commands) to manually remove
tables from a subscription.  Testing shows that if you drop a table on
the subscription server that is currently being fed data via a
subscription, then a subscription worker will complain and quit if it
receives a row targeting the dropped table and workers that are
subsequently started will do the same thing.  Interestingly, this
behavior prevents replication for any other tables in the subscription
from proceeding, which seems unfortunate.

If you were asking if the patch extends the subscription side
functionality to re-add needed partitions that were manually removed
likely by accident, then no.

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hi Petr,

Thanks for your comments.

On Sun, Oct 13, 2019 at 5:01 AM Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 07/10/2019 02:55, Amit Langote wrote:
> > One cannot currently add partitioned tables to a publication.
> >
> > create table p (a int, b int) partition by hash (a);
> > create table p1 partition of p for values with (modulus 3, remainder 0);
> > create table p2 partition of p for values with (modulus 3, remainder 1);
> > create table p3 partition of p for values with (modulus 3, remainder 2);
> >
> > create publication publish_p for table p;
> > ERROR:  "p" is a partitioned table
> > DETAIL:  Adding partitioned tables to publications is not supported.
> > HINT:  You can add the table partitions individually.
> >
> > One can do this instead:
> >
> > create publication publish_p1 for table p1;
> > create publication publish_p2 for table p2;
> > create publication publish_p3 for table p3;
>
> Or just create publication publish_p for table p1, p2, p3;

Yep, facepalm! :)

So one doesn't really need as many publication objects as there are
partitions, as my example suggested, which is good.  Still, as you
can tell, a user would need to manually keep the set of published
partitions up to date, for example when new partitions are added.
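
For example, with a range-partitioned table, every new partition has to be added to the publication by hand (a sketch; the table and publication names are illustrative):

```sql
create table q (ts date, v int) partition by range (ts);
create table q_2019 partition of q
  for values from ('2019-01-01') to ('2020-01-01');
create publication publish_q for table q_2019;

-- A year later, a new partition appears; the publication must be
-- updated manually, or changes to q_2020 will not be replicated:
create table q_2020 partition of q
  for values from ('2020-01-01') to ('2021-01-01');
alter publication publish_q add table q_2020;
```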

> > but maybe that's too much code to maintain for users.
> >
> > I propose that we make this command:
> >
> > create publication publish_p for table p;
> >
>
> +1
>
> > automatically add all the partitions to the publication.  Also, any
> > future partitions should also be automatically added to the
> > publication.  So, publishing a partitioned table automatically
> > publishes all of its existing and future partitions.  Attached patch
> > implements that.
> >
> > What doesn't change with this patch is that the partitions on the
> > subscription side still have to match one-to-one with the partitions
> > on the publication side, because the changes are still replicated as
> > being made to the individual partitions, not as the changes to the
> > root partitioned table.  It might be useful to implement that
> > functionality on the publication side, because it allows users to
> > define the replication target any way they need to, but this patch
> > doesn't implement that.
> >
>
> Yeah for that to work subscription would need to also need to be able to
> write to partitioned tables, so it needs both sides to add support for
> this.

Ah, I didn't know that the subscription code doesn't out-of-the-box
support tuple routing.  Indeed, we will need to fix that.

> I think if we do both what you did and the transparent handling of
> root only, we'll need new keyword to differentiate the two. It might
> make sense to think about if we want your way to need an extra keyword
> or the transparent one will need it.

I didn't think about that but maybe you are right.

> One issue that I see reading the patch is following set of commands:
>
> CREATE TABLE foo ...;
> CREATE PUBLICATION mypub FOR TABLE foo;
>
> CREATE TABLE bar ...;
> ALTER PUBLICATION mypub ADD TABLE bar;
>
> ALTER TABLE foo ATTACH PARTITION bar ...;
> ALTER TABLE foo DETACH PARTITION bar ...;
>
> This will end up with bar not being in any publication even though it
> was explicitly added.

I tested, and bar continues to be in the publication with the above steps:

create table foo (a int) partition by list (a);
create publication mypub for table foo;
create table bar (a int);
alter publication mypub add table bar;
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Publications:
    "mypub"

alter table foo attach partition bar for values in (1);
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Partition of: foo FOR VALUES IN (1)
Publications:
    "mypub"

-- can't now drop bar from mypub (its membership is no longer standalone)
alter publication mypub drop table bar;
ERROR:  cannot drop partition "bar" from an inherited publication
HINT:  Drop the parent from publication instead.

alter table foo detach partition bar;

-- bar is still in mypub (now a standalone member)
\d bar
                Table "public.bar"
 Column │  Type   │ Collation │ Nullable │ Default
────────┼─────────┼───────────┼──────────┼─────────
 a      │ integer │           │          │
Publications:
    "mypub"

-- ok to drop now from mypub
alter publication mypub drop table bar;

Thanks,
Amit



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
This patch seems excessively complicated to me.  Why don't you just add 
the actual partitioned table to pg_publication_rel and then expand the 
partition hierarchy in pgoutput (get_rel_sync_entry() or 
GetRelationPublications() or somewhere around there).  Then you don't 
need to do any work in table DDL to keep the list of published tables up 
to date.
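
The effect of that expansion can be sketched at the SQL level: with only the root stored in pg_publication_rel, the publications applying to a leaf partition can be found by walking its ancestors. A conceptual sketch using pg_partition_ancestors(), which includes the relation itself (this is an illustration, not the patch's actual code):

```sql
-- Publications that apply to leaf partition p1, directly or via an ancestor
select pub.pubname
from pg_publication pub
join pg_publication_rel pr on pr.prpubid = pub.oid
where pr.prrelid in (select relid
                     from pg_partition_ancestors('p1'::regclass));
```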

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Rafia Sabih
Date:

Hi Amit,

On Fri, 11 Oct 2019 at 08:06, Amit Langote <amitlangote09@gmail.com> wrote:

Thanks for sharing this case.  I hadn't considered it, but you're
right that it should be handled sensibly.  I have fixed table sync
code to handle this case properly.  Could you please check your case
with the attached updated patch?

I was checking this today and found that the behavior doesn't change much with the updated patch. The tables are still replicated; it's just that a select count(*) from the parent table shows 0, while the rest of the partitions, including the default one, have the data from the publisher. I was expecting something more like an error at the subscriber saying the table types are not the same. 

Please find the attached file for the test case, in case something is unclear.

--
Regards,
Rafia Sabih
Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
Sorry about the delay.

On Mon, Nov 4, 2019 at 8:00 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> This patch seems excessively complicated to me.  Why don't you just add
> the actual partitioned table to pg_publication_rel and then expand the
> partition hierarchy in pgoutput (get_rel_sync_entry() or
> GetRelationPublications() or somewhere around there).  Then you don't
> need to do any work in table DDL to keep the list of published tables up
> to date.

I tend to agree that having to manage this at the DDL level would be
bug-prone, not to mention require pretty complicated code to implement.

I have tried to implement it the way you suggested.  Every decoded
change to a leaf partition is now published not only via its own
publications but also via the publications of its ancestors, if any.
That is irrespective of whether the row was inserted directly into the
leaf partition or was routed to it via an insert done on an ancestor.
In this implementation, the only pg_publication_rel entry is the one
corresponding to the partitioned table.

On the subscription side, when creating pg_subscription_rel entries
for a publication containing a partitioned table, all of its
partitions must also be fetched as being included in the publication.
That is necessary because the initial syncing copy and the
subsequently received changes must be applied to the individual
partitions.  That could be changed in the future by publishing leaf
partition changes as changes to the actually published partitioned
table.  Such a future implementation would also, hopefully, take care
of the concern Rafia mentioned on this thread: even with this patch,
one must make sure that tables match one-to-one when they're in a
publish-subscribe relationship, which requires baking low-level
details like a table's relkind into the protocol exchanges.

Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002
implements the feature.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hello Rafia,

On Tue, Nov 5, 2019 at 12:41 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> On Fri, 11 Oct 2019 at 08:06, Amit Langote <amitlangote09@gmail.com> wrote:
>> Thanks for sharing this case.  I hadn't considered it, but you're
>> right that it should be handled sensibly.  I have fixed table sync
>> code to handle this case properly.  Could you please check your case
>> with the attached updated patch?
>>
> I was checking this today and found that the behavior doesn't change
> much with the updated patch. The tables are still replicated, just that
> a select count from parent table shows 0, rest of the partitions
> including default one has the data from the publisher. I was expecting
> more like an error at subscriber saying the table type is not same.
>
> Please find the attached file for the test case, in case something is unclear.

Thanks for the test case.

With the latest patch I posted, you'll get the following error on subscriber:

create subscription mysub connection 'host=localhost port=5432
dbname=postgres' publication mypub;
ERROR:  cannot use relation "public.t" as logical replication target
DETAIL:  "public.t" is a regular table on subscription side whereas a
partitioned table on publication side

Although to be honest, I'd rather not see the error.  As I mentioned
in my earlier email, it'd be nice to be able to sync a partitioned
table and a regular table (or vice versa) via replication.

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002
> implements the feature.

0002 didn't contain the necessary pg_dump changes, which are fixed in
the attached new version.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2019-11-11 08:59, Amit Langote wrote:
> On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002
>> implements the feature.
> 
> 0002 didn't contain necessary pg_dump changes, which fixed in the
> attached new version.

That looks more pleasant.

I don't understand why you go through great lengths to ensure that the 
relkinds match between publisher and subscriber.  We already ensure that 
only regular tables are published and only regular tables are allowed as 
subscription target.  In the future, we may want to allow further 
combinations.  What situation are you trying to address here?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Mon, Nov 11, 2019 at 9:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-11-11 08:59, Amit Langote wrote:
> > On Fri, Nov 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> Anyway, I've attached two patches -- 0001 is a refactoring patch. 0002
> >> implements the feature.
> >
> > 0002 didn't contain necessary pg_dump changes, which fixed in the
> > attached new version.
>
> That looks more pleasant.

Thanks for looking.

> I don't understand why you go through great lengths to ensure that the
> relkinds match between publisher and subscriber.  We already ensure that
> only regular tables are published and only regular tables are allowed as
> subscription target.  In the future, we may want to allow further
> combinations.  What situation are you trying to address here?

I'd really want to see the requirement for relkinds to have to match
go away, but as you can see, this patch doesn't modify enough of
pgoutput.c and worker.c to make that possible.  Both the code for the
initial syncing and that for the subsequent real-time replication
assume that both source and target are regular tables.  So even if
partitioned tables can now be in a publication, they're never sent in
the protocol messages, only their leaf partitions are.  Initial
syncing code can be easily modified to support any combination of
source and target relations, but changes needed for real-time
replication seem non-trivial.  Do you think we should do that before
we can say partitioned tables support logical replication?

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Tue, Nov 12, 2019 at 10:11 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Initial
> syncing code can be easily modified to support any combination of
> source and target relations, but changes needed for real-time
> replication seem non-trivial.

I have spent some time hacking on this.  With the attached updated
patch, adding a partitioned table to publication results in publishing
the inserts, updates, deletes of the table's leaf partitions as
inserts, updates, deletes of the table itself (it all happens inside
pgoutput).  So, the replication target table doesn't necessarily have
to be a partitioned table and even if it is partitioned its partitions
don't have to match one-to-one.

One restriction remains though: partitioned tables on a subscriber
can't accept updates and deletes, because we'd need to map those to
updates and deletes of their partitions, including handling a tuple
possibly moving from one partition to another during an update.
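
The hard case on the subscriber is an update that moves a row between partitions, which the apply worker would have to turn into a delete on one partition plus an insert on another. A sketch of the triggering case:

```sql
create table t (a int) partition by list (a);
create table t1 partition of t for values in (1);
create table t2 partition of t for values in (2);

insert into t values (1);        -- the row is routed to t1
update t set a = 2 where a = 1;  -- the row moves from t1 to t2
```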

Also, I haven't added subscription tests yet.

Attached updated patch.  The previous division into a refactoring
patch and feature patch no longer made sense to me, so there is
only one this time.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2019-11-18 09:53, Amit Langote wrote:
> I have spent some time hacking on this.  With the attached updated
> patch, adding a partitioned table to publication results in publishing
> the inserts, updates, deletes of the table's leaf partitions as
> inserts, updates, deletes of the table itself (it all happens inside
> pgoutput).  So, the replication target table doesn't necessarily have
> to be a partitioned table and even if it is partitioned its partitions
> don't have to match one-to-one.
> 
> One restriction remains though: partitioned tables on a subscriber
> can't accept updates and deletes, because we'd need to map those to
> updates and deletes of their partitions, including handling a tuple
> possibly moving from one partition to another during an update.

Right.  Without that second part, the first part isn't really that 
useful yet, is it?

I'm not sure what your intent with this patch is now.  I thought the 
previous behavior -- add a partitioned table to a publication and its 
leaf tables appear in the replication output -- was pretty welcome.  Do 
we not want that anymore?

There should probably be an option to pick the behavior, like we do in 
pg_dump.

What happens when you add a leaf table directly to a publication?  Is it 
replicated under its own identity or under its ancestor partitioned 
table?  (What if both the leaf table and a partitioned table are 
publication members?)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2019-11-12 02:11, Amit Langote wrote:
>> I don't understand why you go through great lengths to ensure that the
>> relkinds match between publisher and subscriber.  We already ensure that
>> only regular tables are published and only regular tables are allowed as
>> subscription target.  In the future, we may want to allow further
>> combinations.  What situation are you trying to address here?
> I'd really want to see the requirement for relkinds to have to match
> go away, but as you can see, this patch doesn't modify enough of
> pgoutput.c and worker.c to make that possible.  Both the code for the
> initial syncing and that for the subsequent real-time replication
> assume that both source and target are regular tables.  So even if
> partitioned tables can now be in a publication, they're never sent in
> the protocol messages, only their leaf partitions are.  Initial
> syncing code can be easily modified to support any combination of
> source and target relations, but changes needed for real-time
> replication seem non-trivial.  Do you think we should do that before
> we can say partitioned tables support logical replication?

My question was more simply why you have this check:

+   /*
+    * Cannot replicate from a regular to a partitioned table or vice
+    * versa.
+    */
+   if (local_relkind != pt->relkind)
+       ereport(ERROR,
+               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                errmsg("cannot use relation \"%s.%s\" as logical 
replication target",
+                       rv->schemaname, rv->relname),

It doesn't seem necessary.  What happens if you remove it?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hi Peter,

On Wed, Nov 20, 2019 at 4:55 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-11-18 09:53, Amit Langote wrote:
> > I have spent some time hacking on this.  With the attached updated
> > patch, adding a partitioned table to publication results in publishing
> > the inserts, updates, deletes of the table's leaf partitions as
> > inserts, updates, deletes of the table itself (it all happens inside
> > pgoutput).  So, the replication target table doesn't necessarily have
> > to be a partitioned table and even if it is partitioned its partitions
> > don't have to match one-to-one.
> >
> > One restriction remains though: partitioned tables on a subscriber
> > can't accept updates and deletes, because we'd need to map those to
> > updates and deletes of their partitions, including handling a tuple
> > possibly moving from one partition to another during an update.
>
> Right.  Without that second part, the first part isn't really that
> useful yet, is it?

I would say yes.

> I'm not sure what your intent with this patch is now.  I thought the
> previous behavior -- add a partitioned table to a publication and its
> leaf tables appear in the replication output -- was pretty welcome.  Do
> we not want that anymore?

Hmm, I thought it would be more desirable to not expose a published
partitioned table's leaf partitions to a subscriber, because it allows
the target table to be defined more flexibly.

> There should probably be an option to pick the behavior, like we do in
> pg_dump.

I don't understand which existing behavior.  Can you clarify?

Regarding allowing users to choose between publishing partitioned
table changes using the leaf tables' schemas vs. using its own schema, I tend
to agree that there would be value in that.  Users who choose the
former will have to ensure that target leaf partitions match exactly.
Users who want flexibility in how the target table is defined can use
the latter.

> What happens when you add a leaf table directly to a publication?  Is it
> replicated under its own identity or under its ancestor partitioned
> table?  (What if both the leaf table and a partitioned table are
> publication members?)

If both a leaf partition and an ancestor belong to the same
publication, then leaf partition changes are replicated using the
ancestor's schema.  For a leaf partition to be replicated using its
own schema it must be published via a separate publication that
doesn't contain the ancestor.  At least that's what the current patch
does.
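
A short sketch of that rule, with hypothetical publication names:

```sql
-- p is the partitioned ancestor, p1 one of its leaf partitions.

-- Both in one publication: p1's changes are replicated using p's schema.
CREATE PUBLICATION pub_root FOR TABLE p, p1;

-- Leaf alone in a publication: p1's changes use p1's own schema.
CREATE PUBLICATION pub_leaf FOR TABLE p1;
```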

Thanks,
Amit



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2019-11-22 07:28, Amit Langote wrote:
> Hmm, I thought it would be more desirable to not expose a published
> partitioned table's leaf partitions to a subscriber, because it allows
> the target table to be defined more flexibly.

There are multiple different variants that we probably eventually want 
to support.  But I think there is value in exposing the partition 
structure to the subscriber.  Most notably, it allows the subscriber to 
run the initial table sync per partition rather than in one big chunk -- 
which ultimately reflects one of the reasons partitioning exists.

The other way, exposing only the partitioned table, is also useful, 
especially if you want to partition differently on the subscriber.  But 
without the ability to target a partitioned table on the subscriber, 
this would right now only allow you to replicate a partitioned table 
into a non-partitioned table.  Which is valid but probably not often useful.

>> What happens when you add a leaf table directly to a publication?  Is it
>> replicated under its own identity or under its ancestor partitioned
>> table?  (What if both the leaf table and a partitioned table are
>> publication members?)
> 
> If both a leaf partition and an ancestor belong to the same
> publication, then leaf partition changes are replicated using the
> ancestor's schema.  For a leaf partition to be replicated using its
> own schema it must be published via a separate publication that
> doesn't contain the ancestor.  At least that's what the current patch
> does.

Hmm, that seems confusing.  This would mean that if you add a 
partitioned table to a publication that already contains leaf tables, 
the publication behavior of the leaf tables would change.  So again, I 
think this alternative behavior of publishing partitions under the name 
of their root table should be an explicit option on a publication, and 
then it should be ensured somehow that individual partitions are not 
added to the publication in confusing ways.

So, it's up to you which aspect of this you want to tackle, but I 
thought your original goal of being able to add partitioned tables to 
publications and have that implicitly expand to all member partitions on 
the publication side seemed quite useful, self-contained, and 
uncontroversial.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Kapila
Date:
On Fri, Nov 22, 2019 at 4:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2019-11-22 07:28, Amit Langote wrote:
>
> >> What happens when you add a leaf table directly to a publication?  Is it
> >> replicated under its own identity or under its ancestor partitioned
> >> table?  (What if both the leaf table and a partitioned table are
> >> publication members?)
> >
> > If both a leaf partition and an ancestor belong to the same
> > publication, then leaf partition changes are replicated using the
> > ancestor's schema.  For a leaf partition to be replicated using its
> > own schema it must be published via a separate publication that
> > doesn't contain the ancestor.  At least that's what the current patch
> > does.
>
> Hmm, that seems confusing.  This would mean that if you add a
> partitioned table to a publication that already contains leaf tables,
> the publication behavior of the leaf tables would change.  So again, I
> think this alternative behavior of publishing partitions under the name
> of their root table should be an explicit option on a publication, and
> then it should be ensured somehow that individual partitions are not
> added to the publication in confusing ways.
>

Yeah, it can probably detect and throw an error for such cases.

> So, it's up to you which aspect of this you want to tackle, but I
> thought your original goal of being able to add partitioned tables to
> publications and have that implicitly expand to all member partitions on
> the publication side seemed quite useful, self-contained, and
> uncontroversial.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Fri, Nov 22, 2019 at 7:46 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-11-22 07:28, Amit Langote wrote:
> > Hmm, I thought it would be more desirable to not expose a published
> > partitioned table's leaf partitions to a subscriber, because it allows
> > the target table to be defined more flexibly.
>
> There are multiple different variants that we probably eventually want
> to support.  But I think there is value in exposing the partition
> structure to the subscriber.  Most notably, it allows the subscriber to
> run the initial table sync per partition rather than in one big chunk --
> which ultimately reflects one of the reasons partitioning exists.

I agree that replicating leaf-to-leaf has the least overhead.

> The other way, exposing only the partitioned table, is also useful,
> especially if you want to partition differently on the subscriber.  But
> without the ability to target a partitioned table on the subscriber,
> this would right now only allow you to replicate a partitioned table
> into a non-partitioned table.  Which is valid but probably not often useful.

Handling non-partitioned target tables was the main reason for me to
make publishing using the root parent's schema the default behavior.
But given that replicating from partitioned tables into
non-partitioned ones would be rare, I agree with replicating using the
leaf schema by default.

> >> What happens when you add a leaf table directly to a publication?  Is it
> >> replicated under its own identity or under its ancestor partitioned
> >> table?  (What if both the leaf table and a partitioned table are
> >> publication members?)
> >
> > If both a leaf partition and an ancestor belong to the same
> > publication, then leaf partition changes are replicated using the
> > ancestor's schema.  For a leaf partition to be replicated using its
> > own schema it must be published via a separate publication that
> > doesn't contain the ancestor.  At least that's what the current patch
> > does.
>
> Hmm, that seems confusing.  This would mean that if you add a
> partitioned table to a publication that already contains leaf tables,
> the publication behavior of the leaf tables would change.  So again, I
> think this alternative behavior of publishing partitions under the name
> of their root table should be an explicit option on a publication, and
> then it should be ensured somehow that individual partitions are not
> added to the publication in confusing ways.
>
> So, it's up to you which aspect of this you want to tackle, but I
> thought your original goal of being able to add partitioned tables to
> publications and have that implicitly expand to all member partitions on
> the publication side seemed quite useful, self-contained, and
> uncontroversial.

OK, let's make whether to publish with root or leaf schema an option,
with the latter being the default.  I will see about updating the
patch that way.

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Mon, Nov 25, 2019 at 6:37 PM Amit Langote <amitlangote09@gmail.com> wrote:
> OK, let's make whether to publish with root or leaf schema an option,
> with the latter being the default.  I will see about updating the
> patch that way.

Here are the updated patches.

0001:  Adding a partitioned table to a publication implicitly adds all
its partitions.  The receiving side must have tables matching the
published partitions, which is typically the case, because the same
partition tree is defined on both nodes.

0002:  Add a new Boolean publication parameter
'publish_using_root_schema'.  If true, a partitioned table's
partitions are not exposed to the subscriber, that is, changes to its
partitions are published as the partitioned table's own.  This allows
replicating partitioned table changes to a non-partitioned table
(seldom useful) or to a partitioned table that has a different set of
partitions than on the publisher (a reasonable use case).  This patch
only adds the
parameter and doesn't implement any of that behavior.

0003: A refactoring patch for worker.c to make handling partitioned
tables as targets of logical replication commands a bit easier.

0004: This implements the 'publish_using_root_schema = true' behavior
described above.  (An unintended benefit of making partitioned tables
an accepted relation type in worker.c is that it allows partitions on the
subscriber to be sub-partitioned even if they are not on the
publisher, that is, when replicating partition-to-partition!)
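
Under this split, choosing between the two behaviors would look roughly as follows (syntax as proposed in these patches; the parameter only takes effect with 0004 applied):

```sql
-- 0001 behavior (default): partitions are published under their own
-- identity, so the subscriber needs matching leaf tables.
CREATE PUBLICATION pub_leaf FOR TABLE p;

-- 0002/0004 behavior: changes to p's partitions are published as
-- changes to p itself, decoupling the subscriber's table layout.
CREATE PUBLICATION pub_root FOR TABLE p
    WITH (publish_using_root_schema = true);
```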

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2019-12-06 08:48, Amit Langote wrote:
> 0001:  Adding a partitioned table to a publication implicitly adds all
> its partitions.  The receiving side must have tables matching the
> published partitions, which is typically the case, because the same
> partition tree is defined on both nodes.

This looks pretty good to me now.  But you need to make all the changed 
queries version-aware so that you can still replicate from and to older 
versions.  (For example, pg_partition_tree is not very old.)

This part looks a bit fishy:

+       /*
+        * If either table is partitioned, skip copying.  Individual 
partitions
+        * will be copied instead.
+        */
+       if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+               remote_relkind == RELKIND_PARTITIONED_TABLE)
+       {
+               logicalrep_rel_close(relmapentry, NoLock);
+               return;
+       }

I don't think you want to filter out a partitioned table on the local 
side, since (a) COPY can handle that, and (b) it's (as of this patch) an 
error to have a partitioned table in the subscription table set.

I'm not a fan of the new ValidateSubscriptionRel() function.  It's too 
obscure, especially the return value.  Doesn't seem worth it.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Thanks for checking.

On Thu, Dec 12, 2019 at 12:48 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-12-06 08:48, Amit Langote wrote:
> > 0001:  Adding a partitioned table to a publication implicitly adds all
> > its partitions.  The receiving side must have tables matching the
> > published partitions, which is typically the case, because the same
> > partition tree is defined on both nodes.
>
> This looks pretty good to me now.  But you need to make all the changed
> queries version-aware so that you can still replicate from and to older
> versions.  (For example, pg_partition_tree is not very old.)

True, fixed that.

> This part looks a bit fishy:
>
> +       /*
> +        * If either table is partitioned, skip copying.  Individual
> partitions
> +        * will be copied instead.
> +        */
> +       if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
> +               remote_relkind == RELKIND_PARTITIONED_TABLE)
> +       {
> +               logicalrep_rel_close(relmapentry, NoLock);
> +               return;
> +       }
>
> I don't think you want to filter out a partitioned table on the local
> side, since (a) COPY can handle that, and (b) it's (as of this patch) an
> error to have a partitioned table in the subscription table set.

Yeah, (b) is true, so copy_table() should only ever see regular tables
with only patch 0001 applied.

> I'm not a fan of the new ValidateSubscriptionRel() function.  It's too
> obscure, especially the return value.  Doesn't seem worth it.

It went through many variants since I first introduced it, but yeah I
agree we don't need it if only because of the weird interface.

It occurred to me that, *as of 0001*, we should indeed disallow
replicating from a regular table on publisher node into a partitioned
table of the same name on subscriber node (as the earlier patches
did), because 0001 doesn't implement tuple routing support that would
be needed to apply such changes.

Attached updated patches.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Dilip Kumar
Date:
On Mon, Dec 16, 2019 at 2:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Thanks for checking.
>
> On Thu, Dec 12, 2019 at 12:48 AM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 2019-12-06 08:48, Amit Langote wrote:
> > > 0001:  Adding a partitioned table to a publication implicitly adds all
> > > its partitions.  The receiving side must have tables matching the
> > > published partitions, which is typically the case, because the same
> > > partition tree is defined on both nodes.
> >
> > This looks pretty good to me now.  But you need to make all the changed
> > queries version-aware so that you can still replicate from and to older
> > versions.  (For example, pg_partition_tree is not very old.)
>
> True, fixed that.
>
> > This part looks a bit fishy:
> >
> > +       /*
> > +        * If either table is partitioned, skip copying.  Individual
> > partitions
> > +        * will be copied instead.
> > +        */
> > +       if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
> > +               remote_relkind == RELKIND_PARTITIONED_TABLE)
> > +       {
> > +               logicalrep_rel_close(relmapentry, NoLock);
> > +               return;
> > +       }
> >
> > I don't think you want to filter out a partitioned table on the local
> > side, since (a) COPY can handle that, and (b) it's (as of this patch) an
> > error to have a partitioned table in the subscription table set.
>
> Yeah, (b) is true, so copy_table() should only ever see regular tables
> with only patch 0001 applied.
>
> > I'm not a fan of the new ValidateSubscriptionRel() function.  It's too
> > obscure, especially the return value.  Doesn't seem worth it.
>
> It went through many variants since I first introduced it, but yeah I
> agree we don't need it if only because of the weird interface.
>
> It occurred to me that, *as of 0001*, we should indeed disallow
> replicating from a regular table on publisher node into a partitioned
> table of the same name on subscriber node (as the earlier patches
> did), because 0001 doesn't implement tuple routing support that would
> be needed to apply such changes.
>
> Attached updated patches.
>
I am planning to review this patch.  Currently, it does not apply on
HEAD, so could you rebase it?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Rafia Sabih
Date:
Hi Amit,

I went through this patch set once again today and here are my two cents.

On Mon, 16 Dec 2019 at 10:19, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Attached updated patches.
-     differently partitioned setup.  Attempts to replicate tables other than
-     base tables will result in an error.
+     Replication is only supported by regular and partitioned tables, although
+     the table kind must match between the two servers, that is, one cannot

I find the phrase 'table kind' a bit odd; how about something like
'type of the table'?

/* Only plain tables can be aded to publications. */
- if (tbinfo->relkind != RELKIND_RELATION)
+ /* Only plain and partitioned tables can be added to publications. */
IMHO using regular instead of plain would be more consistent.

+ /*
+ * Find a partition for the tuple contained in remoteslot.
+ *
+ * For insert, remoteslot is tuple to insert.  For update and delete, it
+ * is the tuple to be replaced and deleted, repectively.
+ */
Misspelled 'respectively'.

+static void
+apply_handle_tuple_routing(ResultRelInfo *relinfo,
+    LogicalRepRelMapEntry *relmapentry,
+    EState *estate, CmdType operation,
+    TupleTableSlot *remoteslot,
+    LogicalRepTupleData *newtup)
+{
+ Relation rel = relinfo->ri_RelationDesc;
+ ModifyTableState *mtstate = NULL;
+ PartitionTupleRouting *proute = NULL;
+ ResultRelInfo *partrelinfo,
+    *partrelinfo1;

IMHO, partrelinfo1 can be better named to improve readability.

Otherwise, as Dilip already mentioned, there is a rebase required
particularly for 0003 and 0004.

-- 
Regards,
Rafia Sabih



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Mon, Jan 6, 2020 at 8:25 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> Hi Amit,
>
> I went through this patch set once again today and here are my two cents.

Thanks Rafia.

Rebased and updated to address your comments.

Regards,
Amit

Attachment

Re: adding partitioned tables to publications

From
Rafia Sabih
Date:
On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote:

> Rebased and updated to address your comments.
>
+  <para>
+   Partitioned tables are not considered when <literal>FOR ALL TABLES</literal>
+   is specified.
+  </para>
+
What is the reason for the above?  I mean, not for the comment itself,
but for not including partitioned tables in the FOR ALL TABLES option.


-- 
Regards,
Rafia Sabih



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-01-07 06:01, Amit Langote wrote:
> On Mon, Jan 6, 2020 at 8:25 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
>> Hi Amit,
>>
>> I went through this patch set once again today and here are my two cents.
> 
> Thanks Rafia.
> 
> Rebased and updated to address your comments.

Looking through 0001, I think perhaps there is a better way to structure 
some of the API changes.

Instead of passing the root_target_rel to CheckValidResultRel() and 
CheckCmdReplicaIdentity(), which we only need to check the publication 
actions of the root table, how about changing 
GetRelationPublicationActions() to automatically include the publication 
information of the root table.  Then we have that information in the 
relcache once and don't need to check the base table and the partition 
root separately at each call site (of which there is only one right 
now).  (Would that work correctly with relcache invalidation?)
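
As a rough standalone model of that suggestion (these are simplified stand-ins, not the actual PostgreSQL relcache structures), the lookup would union a partition's own publication actions with those of all its ancestors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for PublicationActions in the real code. */
typedef struct PubActions
{
    bool pubinsert;
    bool pubupdate;
    bool pubdelete;
} PubActions;

/* Simplified stand-in for a relcache entry with a partition ancestor. */
typedef struct Rel
{
    PubActions actions;     /* actions published for this table directly */
    struct Rel *parent;     /* partition's ancestor, NULL for a root */
} Rel;

/*
 * Sketch of the suggested behavior: the result for a partition is the
 * union of its own publication actions and those of its ancestors, so
 * call sites no longer need to check the root separately.
 */
static PubActions
get_relation_publication_actions(const Rel *rel)
{
    PubActions result = {false, false, false};

    for (; rel != NULL; rel = rel->parent)
    {
        result.pubinsert |= rel->actions.pubinsert;
        result.pubupdate |= rel->actions.pubupdate;
        result.pubdelete |= rel->actions.pubdelete;
    }
    return result;
}
```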

Similarly, couldn't GetRelationPublications() just automatically take 
partitioning into account?  We don't need the separation between 
GetRelationPublications() and GetRelationAncestorPublications().  This 
would also avoid errors of omission, for example the 
GetRelationPublications() call in ATPrepChangePersistence() doesn't take 
GetRelationAncestorPublications() into account.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-01-07 15:18, Rafia Sabih wrote:
> On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote:
> 
>> Rebased and updated to address your comments.
>>
> +  <para>
> +   Partitioned tables are not considered when <literal>FOR ALL TABLES</literal>
> +   is specified.
> +  </para>
> +
> What is the reason for above, I mean not for the comment but not
> including partitioned tables in for all tables options.

This comment is kind of a noop, because the leaf partitions are already 
included in FOR ALL TABLES, so whether partitioned tables are considered 
included in FOR ALL TABLES is irrelevant.  I suggest removing the 
comment to avoid any confusion.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Jan 8, 2020 at 7:57 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-01-07 15:18, Rafia Sabih wrote:
> > On Tue, 7 Jan 2020 at 06:02, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> >> Rebased and updated to address your comments.
> >>
> > +  <para>
> > +   Partitioned tables are not considered when <literal>FOR ALL TABLES</literal>
> > +   is specified.
> > +  </para>
> > +
> > What is the reason for above, I mean not for the comment but not
> > including partitioned tables in for all tables options.
>
> This comment is kind of a noop, because the leaf partitions are already
> included in FOR ALL TABLES, so whether partitioned tables are considered
> included in FOR ALL TABLES is irrelevant.  I suggest removing the
> comment to avoid any confusion.

I agree.  I had written that comment considering the other feature
where the changes are published as the root table's, but even in that case
it'd be wrong to do what it says -- partitioned tables *should* be
included in that case.

I will fix the patches accordingly.

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hi Peter,

Thanks for the review and sorry it took me a while to get back.

On Wed, Jan 8, 2020 at 7:54 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Looking through 0001, I think perhaps there is a better way to structure
> some of the API changes.
>
> Instead of passing the root_target_rel to CheckValidResultRel() and
> CheckCmdReplicaIdentity(), which we only need to check the publication
> actions of the root table, how about changing
> GetRelationPublicationActions() to automatically include the publication
> information of the root table.  Then we have that information in the
> relcache once and don't need to check the base table and the partition
> root separately at each call site (of which there is only one right
> now).  (Would that work correctly with relcache invalidation?)
>
> Similarly, couldn't GetRelationPublications() just automatically take
> partitioning into account?  We don't need the separation between
> GetRelationPublications() and GetRelationAncestorPublications().  This
> would also avoid errors of omission, for example the
> GetRelationPublications() call in ATPrepChangePersistence() doesn't take
> GetRelationAncestorPublications() into account.

I have addressed these comments in the attached updated patch.

Other than that, the updated patch contains following significant changes:

* Changed pg_publication.c: GetPublicationRelations() so that any
published partitioned tables are expanded as needed

* Since the pg_publication_tables view is backed by
GetPublicationRelations(), that means subscriptioncmds.c:
fetch_table_list() no longer needs to craft a query to include
partitions when needed, because partitions are included at source.
That seems better, because it allows limiting the complexity
surrounding publication of partitioned tables to the publication side.

* Fixed the publication table DDL to spot more cases of tables being
added to a publication in a duplicative manner.  For example,
a partition being added to a publication which already contains its
ancestor, and a partitioned table being added to a publication
(implying all of its partitions are added) which already contains a
partition
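
For instance, with partitions included at source, the subscriber side can rely on the view directly (a sketch using the tables from earlier in the thread):

```sql
-- With publish_p containing partitioned table p, the view backed by
-- GetPublicationRelations() lists the leaf partitions themselves.
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'publish_p';
-- with the patch, this should list p1, p2 and p3
```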

Only attaching 0001.  Will send the rest after polishing them a bit more.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Jan 22, 2020 at 2:38 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Other than that, the updated patch contains following significant changes:
>
> * Changed pg_publication.c: GetPublicationRelations() so that any
> published partitioned tables are expanded as needed
>
> * Since the pg_publication_tables view is backed by
> GetPublicationRelations(), that means subscriptioncmds.c:
> fetch_table_list() no longer needs to craft a query to include
> partitions when needed, because partitions are included at source.
> That seems better, because it allows limiting the complexity
> surrounding publication of partitioned tables to the publication side.
>
> * Fixed the publication table DDL to spot more cases of tables being
> added to a publication in a duplicative manner.  For example,
> a partition being added to a publication which already contains its
> ancestor, and a partitioned table being added to a publication
> (implying all of its partitions are added) which already contains a
> partition

On second thought, this seems like overkill.  It might be OK after
all for both a partitioned table and its partitions to be explicitly
added to a publication without complaining of duplication. IOW, it's
the user's call whether it makes sense to do that or not.

> Only attaching 0001.

Attached updated 0001 considering the above and the rest of the
patches that add support for replicating partitioned tables using
their own identity and schema.  I have reorganized the other patches
as follows:

0002: refactoring of logical/worker.c without any functionality
changes (contains much less churn than in earlier versions)

0003: support logical replication into partitioned tables on the
subscription side (allows replicating from a non-partitioned table on
publisher node into a partitioned table on subscriber node)

0004: support optionally replicating partitioned table changes (and
changes directly made to partitions) using root partitioned table
identity and schema

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-01-23 11:10, Amit Langote wrote:
> On Wed, Jan 22, 2020 at 2:38 PM Amit Langote<amitlangote09@gmail.com>  wrote:
>> Other than that, the updated patch contains following significant changes:
>>
>> * Changed pg_publication.c: GetPublicationRelations() so that any
>> published partitioned tables are expanded as needed
>>
>> * Since the pg_publication_tables view is backed by
>> GetPublicationRelations(), that means subscriptioncmds.c:
>> fetch_table_list() no longer needs to craft a query to include
>> partitions when needed, because partitions are included at source.
>> That seems better, because it allows limiting the complexity
>> surrounding publication of partitioned tables to the publication side.
>>
>> * Fixed the publication table DDL to spot more cases of tables being
>> added to a publication in a duplicative manner.  For example,
>> a partition being added to a publication which already contains its
>> ancestor, and a partitioned table being added to a publication
>> (implying all of its partitions are added) which already contains a
>> partition
> On second thought, this seems like overkill.  It might be OK after
> all for both a partitioned table and its partitions to be explicitly
> added to a publication without complaining of duplication. IOW, it's
> the user's call whether it makes sense to do that or not.

This structure looks good now.

However, it does seem unfortunate that in pg_get_publication_tables() we 
need to postprocess the result of GetPublicationRelations().  Since 
we're already changing the API of GetPublicationRelations(), couldn't we 
also make it optionally not include partitioned tables?

For the test, perhaps add test cases where partitions are attached and 
detached so that we can see whether their publication relcache 
information is properly updated.  (I'm not doubting that it works, but 
it would be good to have a test for, in case of future restructuring.)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Rafia Sabih
Date:
Hi Amit,

Once again I went through this patch set and here are my few comments,

On Thu, 23 Jan 2020 at 11:10, Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Jan 22, 2020 at 2:38 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Other than that, the updated patch contains following significant changes:
> >
> > * Changed pg_publication.c: GetPublicationRelations() so that any
> > published partitioned tables are expanded as needed
> >
> > * Since the pg_publication_tables view is backed by
> > GetPublicationRelations(), that means subscriptioncmds.c:
> > fetch_table_list() no longer needs to craft a query to include
> > partitions when needed, because partitions are included at source.
> > That seems better, because it allows to limit the complexity
> > surrounding publication of partitioned tables to the publication side.
> >
> > * Fixed the publication table DDL to spot more cases of tables being
> > added to a publication in a duplicative manner.  For example,
> > partition being added to a publication which already contains its
> > ancestor and a partitioned tables being added to a publication
> > (implying all of its partitions are added) which already contains a
> > partition
>
> On second thought, this seems like an overkill.  It might be OK after
> all for both a partitioned table and its partitions to be explicitly
> added to a publication without complaining of duplication. IOW, it's
> the user's call whether it makes sense to do that or not.
>
> > Only attaching 0001.
>
> Attached updated 0001 considering the above and the rest of the
> patches that add support for replicating partitioned tables using
> their own identity and schema.  I have reorganized the other patches
> as follows:
>
> 0002: refactoring of logical/worker.c without any functionality
> changes (contains much less churn than in earlier versions)
>
> 0003: support logical replication into partitioned tables on the
> subscription side (allows replicating from a non-partitioned table on
> publisher node into a partitioned table on subscriber node)
>
> 0004: support optionally replicating partitioned table changes (and
> changes directly made to partitions) using root partitioned table
> identity and schema

+     cannot replicate from a regular table into a partitioned able or vice
Here is a missing t from table.

+     <para>
+      When a partitioned table is added to a publication, all of its existing
+      and future partitions are also implicitly considered to be part of the
+      publication.  So, even operations that are performed directly on a
+      partition are also published via its ancestors' publications.

Now this is confusing, does it mean that when partitions are later
added to the table they will be replicated too, I think not, because
you need to first create them manually at the replication side, isn't
it...?

+ /* Must be a regular or partitioned table */
+ if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION &&
+ RelationGetForm(targetrel)->relkind != RELKIND_PARTITIONED_TABLE)
  ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("\"%s\" is not a table",

IMHO the error message and details should be modified here to
something along the lines of 'is neither a regular or partitioned
table'

+ * published via an ancestor and when a partitioned tables's partitions
tables's --> tables'

+ if (get_rel_relispartition(relid))
+ {
+ List    *ancestors = get_partition_ancestors(relid);

Now, this is just for my understanding, why the ancestors have to be a
list, I always assumed that a partition could only have one ancestor
-- the root table. Is there something more to it that I am totally
missing here or is it to cover the scenario of having partitions of
partitions.

Here I also want to clarify one thing, does it also happen like if a
partitioned table is dropped from a publication then all its
partitions are also implicitly dropped? As far as my understanding
goes that doesn't happen, so shouldn't there be some notice about it.

-GetPublicationRelations(Oid pubid)
+GetPublicationRelations(Oid pubid, bool include_partitions)

How about having an enum here with INCLUDE_PARTITIONS,
INCLUDE_PARTITIONED_REL, and SKIP_PARTITIONS to address the three
possibilities and avoiding reiterating through the list in
pg_get_publication_tables().

-- 
Regards,
Rafia Sabih



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Tue, Jan 28, 2020 at 6:11 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> This structure looks good now.

Thanks for taking a look.

> However, it does seem unfortunate that in pg_get_publication_tables() we
> need to postprocess the result of GetPublicationRelations().  Since
> we're already changing the API of GetPublicationRelations(), couldn't we
> also make it optionally not include partitioned tables?

Hmm, okay.  We really need GetPublicationRelations() to handle
partitioned tables in 3 ways:

1. Don't expand and return them as-is
2. Expand and return only leaf partitions
3. Expand and return all partitions

I will try that in the new patch.

> For the test, perhaps add test cases where partitions are attached and
> detached so that we can see whether their publication relcache
> information is properly updated.  (I'm not doubting that it works, but
> it would be good to have a test for, in case of future restructuring.)

Okay, I will add some to publication.sql.

Will send updated patches after addressing Rafia's comments.

Thanks,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
Thanks, Rafia, for the review.

On Wed, Jan 29, 2020 at 3:55 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
> On Thu, 23 Jan 2020 at 11:10, Amit Langote <amitlangote09@gmail.com> wrote:
> > v10 patches
> +     cannot replicate from a regular table into a partitioned able or vice
> Here is a missing t from table.

Oops, fixed.

> +     <para>
> +      When a partitioned table is added to a publication, all of its existing
> +      and future partitions are also implicitly considered to be part of the
> +      publication.  So, even operations that are performed directly on a
> +      partition are also published via its ancestors' publications.
>
> Now this is confusing, does it mean that when partitions are later
> added to the table they will be replicated too, I think not, because
> you need to first create them manually at the replication side, isn't
> it...?

Yes, it's upon the user to make sure that they have set up the
partitions correctly on the subscriber.  I don't see how that's very
different from what needs to be done when tables are added to a
publication after-the-fact.  Did I misunderstand you?

> + /* Must be a regular or partitioned table */
> + if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION &&
> + RelationGetForm(targetrel)->relkind != RELKIND_PARTITIONED_TABLE)
>   ereport(ERROR,
>   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>   errmsg("\"%s\" is not a table",
>
> IMHO the error message and details should be modified here to
> something along the lines of 'is neither a regular or partitioned
> table'

Hmm, this is simply following a convention that's used in most places
around the code, although I'm not really a fan of these "not a
<whatever>"-style messages to begin with.  It's less ambiguous with a
"cannot perform <action> on <relkind>"-style message, which some
places already use.

In that view, I have changed the documentation too to say this:

+     Replication is only supported by tables, partitioned or not, although a
+     given table must either be partitioned on both servers or not partitioned
+     at all.  Also, when replicating between partitioned tables, the actual
+     replication occurs between leaf partitions, so partitions on the two
+     servers must match one-to-one.
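To make the quoted documentation concrete: replication between partitioned tables works when the subscriber mirrors the publisher's leaf partitions one-to-one. A sketch (the connection string is a placeholder, not from the patch):

```sql
-- Run the same DDL on both publisher and subscriber:
create table p (a int, b int) partition by hash (a);
create table p1 partition of p for values with (modulus 3, remainder 0);
create table p2 partition of p for values with (modulus 3, remainder 1);
create table p3 partition of p for values with (modulus 3, remainder 2);

-- On the publisher:
create publication publish_p for table p;

-- On the subscriber:
create subscription sub_p
  connection 'host=publisher dbname=postgres'
  publication publish_p;
```

Because changes are replicated as changes to the individual leaf partitions, a mismatch in partition boundaries or names on the subscriber would break the apply.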

In retrospect, the confusion surrounding how we communicate the
various operations and properties that cannot be supported on a table
if partitioned, both in the error messages and the documentation,
could have been avoided if it wasn't based on relkind. I guess it's
too late now though. :(

> + * published via an ancestor and when a partitioned tables's partitions
> tables's --> tables'
>
> + if (get_rel_relispartition(relid))
> + {
> + List    *ancestors = get_partition_ancestors(relid);
>
> Now, this is just for my understanding, why the ancestors have to be a
> list, I always assumed that a partition could only have one ancestor
> -- the root table. Is there something more to it that I am totally
> missing here or is it to cover the scenario of having partitions of
> partitions.

Yes, with multi-level partitioning.
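For example, with multi-level partitioning a leaf partition has more than one ancestor, which is why a list is needed (a sketch; `pg_partition_ancestors()` returns the relation itself first, then each ancestor up to the root):

```sql
create table p (a int) partition by range (a);
create table p1 partition of p for values from (1) to (100)
  partition by range (a);
create table p11 partition of p1 for values from (1) to (50);

-- p11's ancestors are p1 and p, so publishing either one
-- implicitly publishes p11.
select relid::regclass from pg_partition_ancestors('p11');
```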

> Here I also want to clarify one thing, does it also happen like if a
> partitioned table is dropped from a publication then all its
> partitions are also implicitly dropped? As far as my understanding
> goes that doesn't happen, so shouldn't there be some notice about it.

Actually, that is what happens, unless partitions were explicitly
added to the publication, in which case they will continue to be
published.
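A sketch of that behavior (intended semantics, not output produced by the patch):

```sql
create table p (a int) partition by hash (a);
create table p1 partition of p for values with (modulus 2, remainder 0);
create table p2 partition of p for values with (modulus 2, remainder 1);

-- p1 is added both implicitly (via p) and explicitly:
create publication pub for table p, p1;

alter publication pub drop table p;
-- p2 is no longer published, but p1 still is, because it was added
-- to the publication in its own right.
```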

> -GetPublicationRelations(Oid pubid)
> +GetPublicationRelations(Oid pubid, bool include_partitions)
>
> How about having an enum here with INCLUDE_PARTITIONS,
> INCLUDE_PARTITIONED_REL, and SKIP_PARTITIONS to address the three
> possibilities and avoiding reiterating through the list in
> pg_get_publication_tables().

I have done something similar in the updated patch, as I mentioned in
my earlier reply.

Please check the updated patches.

Thanks,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
I have committed the 0001 patch of this series (partitioned table member 
of publication).  I changed the new argument of 
GetPublicationRelations() to an enum and reformatted some comments. 
I'll continue looking through the subsequent patches.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Tue, Mar 10, 2020 at 5:52 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I have committed the 0001 patch of this series (partitioned table member
> of publication).  I changed the new argument of
> GetPublicationRelations() to an enum and reformatted some comments.
> I'll continue looking through the subsequent patches.

Thank you.

- Amit



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
I was trying to extract some preparatory work from the remaining patches 
and came up with the attached.  This is part of your patch 0003, but 
also relevant for part 0004.  The problem was that COPY (SELECT *) is 
not sufficient when the table has generated columns, so we need to build 
the column list explicitly.

Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
Hi Peter,

On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> I was trying to extract some preparatory work from the remaining patches
> and came up with the attached.  This is part of your patch 0003, but
> also relevant for part 0004.  The problem was that COPY (SELECT *) is
> not sufficient when the table has generated columns, so we need to build
> the column list explicitly.
>
> Thoughts?

Thank you for that.

+   if (isnull || !remote_is_publishable)
+       ereport(ERROR,
+               (errmsg("table \"%s.%s\" on the publisher is not publishable",
+                       nspname, relname)));

Maybe add a one-line comment above this to say it's an "not supposed
to happen" error or am I missing something?  Wouldn't elog() suffice
for this?

Other than that, looks good.

--
Thank you,
Amit



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Mar 18, 2020 at 12:06 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Hi Peter,
>
> On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > I was trying to extract some preparatory work from the remaining patches
> > and came up with the attached.  This is part of your patch 0003, but
> > also relevant for part 0004.  The problem was that COPY (SELECT *) is
> > not sufficient when the table has generated columns, so we need to build
> > the column list explicitly.
> >
> > Thoughts?
>
> Thank you for that.
>
> +   if (isnull || !remote_is_publishable)
> +       ereport(ERROR,
> +               (errmsg("table \"%s.%s\" on the publisher is not publishable",
> +                       nspname, relname)));
>
> Maybe add a one-line comment above this to say it's an "not supposed
> to happen" error or am I missing something?  Wouldn't elog() suffice
> for this?
>
> Other than that, looks good.

Wait, the following Assert in copy_table() should now be gone:

    Assert(relmapentry->localrel->rd_rel->relkind == RELKIND_RELATION);

because just below it:

    /* Start copy on the publisher. */
    initStringInfo(&cmd);
-   appendStringInfo(&cmd, "COPY %s TO STDOUT",
-                    quote_qualified_identifier(lrel.nspname, lrel.relname));
+   if (lrel.relkind == RELKIND_RELATION)
+       appendStringInfo(&cmd, "COPY %s TO STDOUT",
+                        quote_qualified_identifier(lrel.nspname,
+                                                   lrel.relname));
+   else
+   {
+       /*
+        * For non-tables, we need to do COPY (SELECT ...), but we can't just
+        * do SELECT * because we need to not copy generated columns.
+        */

By the way, I have rebased the patches, although maybe you've got your
own copies; attached.

-- 
Thank you,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-03-18 04:06, Amit Langote wrote:
> +   if (isnull || !remote_is_publishable)
> +       ereport(ERROR,
> +               (errmsg("table \"%s.%s\" on the publisher is not publishable",
> +                       nspname, relname)));
> 
> Maybe add a one-line comment above this to say it's an "not supposed
> to happen" error or am I missing something?  Wouldn't elog() suffice
> for this?

On second thought, maybe we should just drop this check.  The list of 
tables that is part of the publication was already filtered by the 
publisher, so this query doesn't need to check it again.  We just need 
the relkind to be able to construct the COPY command, but we don't need 
to second-guess it beyond that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Mar 18, 2020 at 8:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-03-18 04:06, Amit Langote wrote:
> > +   if (isnull || !remote_is_publishable)
> > +       ereport(ERROR,
> > +               (errmsg("table \"%s.%s\" on the publisher is not publishable",
> > +                       nspname, relname)));
> >
> > Maybe add a one-line comment above this to say it's an "not supposed
> > to happen" error or am I missing something?  Wouldn't elog() suffice
> > for this?
>
> On second thought, maybe we should just drop this check.  The list of
> tables that is part of the publication was already filtered by the
> publisher, so this query doesn't need to check it again.  We just need
> the relkind to be able to construct the COPY command, but we don't need
> to second-guess it beyond that.

Agreed.

-- 
Thank you,
Amit



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-03-18 15:19, Amit Langote wrote:
> On Wed, Mar 18, 2020 at 8:16 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 2020-03-18 04:06, Amit Langote wrote:
>>> +   if (isnull || !remote_is_publishable)
>>> +       ereport(ERROR,
>>> +               (errmsg("table \"%s.%s\" on the publisher is not publishable",
>>> +                       nspname, relname)));
>>>
>>> Maybe add a one-line comment above this to say it's an "not supposed
>>> to happen" error or am I missing something?  Wouldn't elog() suffice
>>> for this?
>>
>> On second thought, maybe we should just drop this check.  The list of
>> tables that is part of the publication was already filtered by the
>> publisher, so this query doesn't need to check it again.  We just need
>> the relkind to be able to construct the COPY command, but we don't need
>> to second-guess it beyond that.
> 
> Agreed.

Committed with that change then.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-03-18 08:33, Amit Langote wrote:
> By the way, I have rebased the patches, although maybe you've got your
> own copies; attached.

Looking through 0002 and 0003 now.

The structure looks generally good.

In 0002, the naming of apply_handle_insert() vs. 
apply_handle_do_insert() etc. seems a bit prone to confusion.  How about 
something like apply_handle_insert_internal()?  Also, should we put each 
of those internal functions next to their internal function instead of 
in a separate group like you have it?

In apply_handle_do_insert(), the argument localslot should probably be 
remoteslot.

In apply_handle_do_delete(), the ExecOpenIndices() call was moved to a 
different location relative to the rest of the code.  That was probably 
not intended.

In 0003, you have /* TODO, use inverse lookup hashtable? */.  Is this 
something you plan to address in this cycle, or is that more for future 
generations?

0003 could use some more tests.  The one test that you adjusted just 
ensures the data goes somewhere instead of being rejected, but there are 
no tests that check whether it ends up in the right partition, whether 
cross-partition updates work etc.
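A cross-partition update is the kind of case worth covering; on the publisher it looks like this (a sketch):

```sql
create table p (a int) partition by range (a);
create table p1 partition of p for values from (0) to (10);
create table p2 partition of p for values from (10) to (20);
insert into p values (5);

-- This moves the row from p1 to p2; when applied to a partitioned
-- table on the subscriber, the worker must delete the old tuple from
-- one partition and route the new one into another.
update p set a = 15 where a = 5;
```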

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Thu, Mar 19, 2020 at 11:18 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-03-18 08:33, Amit Langote wrote:
> > By the way, I have rebased the patches, although maybe you've got your
> > own copies; attached.
>
> Looking through 0002 and 0003 now.
>
> The structure looks generally good.

Thanks for the review.

> In 0002, the naming of apply_handle_insert() vs.
> apply_handle_do_insert() etc. seems a bit prone to confusion.  How about
> something like apply_handle_insert_internal()?  Also, should we put each
> of those internal functions next to their internal function instead of
> in a separate group like you have it?

Sure.

> In apply_handle_do_insert(), the argument localslot should probably be
> remoteslot.

You're right, fixed.

> In apply_handle_do_delete(), the ExecOpenIndices() call was moved to a
> different location relative to the rest of the code.  That was probably
> not intended.

Fixed.

> In 0003, you have /* TODO, use inverse lookup hashtable? */.  Is this
> something you plan to address in this cycle, or is that more for future
> generations?

Sorry, this is simply a copy-paste from logicalrep_relmap_invalidate_cb().

> 0003 could use some more tests.  The one test that you adjusted just
> ensures the data goes somewhere instead of being rejected, but there are
> no tests that check whether it ends up in the right partition, whether
> cross-partition updates work etc.

Okay, added some tests.

Attached updated patches.

--
Thank you,
Amit

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-03-23 06:02, Amit Langote wrote:
> Okay, added some tests.
> 
> Attached updated patches.

I have committed the worker.c refactoring patch.

"Add subscription support to replicate into partitioned tables" still 
has lacking test coverage.  Your changes in relation.c are not exercised 
at all because the partitioned table branch in apply_handle_update() is 
never taken.  This is critical and tricky code, so I would look for 
significant testing.

The code looks okay to me.  I would remove this code

+       memset(entry->attrmap->attnums, -1,
+              entry->attrmap->maplen * sizeof(AttrNumber));

because the entries are explicitly filled right after anyway, and 
filling the bytes with -1 has an unclear effect.  There is also 
seemingly some fishiness in this code around whether attribute numbers 
are zero- or one-based.  Perhaps this could be documented briefly. 
Maybe I'm misunderstanding something.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Mar 25, 2020 at 9:29 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-03-23 06:02, Amit Langote wrote:
> > Okay, added some tests.
> >
> > Attached updated patches.
>
> I have committed the worker.c refactoring patch.
>
> "Add subscription support to replicate into partitioned tables" still
> has lacking test coverage.  Your changes in relation.c are not exercised
> at all because the partitioned table branch in apply_handle_update() is
> never taken.  This is critical and tricky code, so I would look for
> significant testing.

While trying some tests around the code you mentioned, I found what
looks like a bug, which I'm looking into now.

> The code looks okay to me.  I would remove this code
>
> +       memset(entry->attrmap->attnums, -1,
> +              entry->attrmap->maplen * sizeof(AttrNumber));
>
> because the entries are explicitly filled right after anyway, and
> filling the bytes with -1 has an unclear effect.  There is also
> seemingly some fishiness in this code around whether attribute numbers
> are zero- or one-based.  Perhaps this could be documented briefly.
> Maybe I'm misunderstanding something.

Will check and fix as necessary.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Thu, Mar 26, 2020 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Mar 25, 2020 at 9:29 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 2020-03-23 06:02, Amit Langote wrote:
> > > Okay, added some tests.
> > >
> > > Attached updated patches.
> >
> > I have committed the worker.c refactoring patch.
> >
> > "Add subscription support to replicate into partitioned tables" still
> > has lacking test coverage.  Your changes in relation.c are not exercised
> > at all because the partitioned table branch in apply_handle_update() is
> > never taken.  This is critical and tricky code, so I would look for
> > significant testing.
>
> While trying some tests around the code you  mentioned, I found what
> looks like a bug, which looking into now.

Turns out the code in apply_handle_tuple_routing() for the UPDATE
message was somewhat bogus, which is fixed in the updated version.  I
ended up with another refactoring patch, which is attached as 0001.

It appears to me that the tests now seem enough to cover
apply_handle_tuple_routing(), although more could still be added.

> > The code looks okay to me.  I would remove this code
> >
> > +       memset(entry->attrmap->attnums, -1,
> > +              entry->attrmap->maplen * sizeof(AttrNumber));
> >
> > because the entries are explicitly filled right after anyway, and
> > filling the bytes with -1 has an unclear effect.  There is also
> > seemingly some fishiness in this code around whether attribute numbers
> > are zero- or one-based.  Perhaps this could be documented briefly.
> > Maybe I'm misunderstanding something.
>
> Will check and fix as necessary.

Removed that memset.  I have added a comment about one- vs. zero-based
indexes contained in the maps coming from two different modules, viz.
tuple routing and logical replication, resp.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
I have updated the comments in apply_handle_tuple_routing() (see 0002)
to better explain what's going on with UPDATE handling.  I also
rearranged the tests a bit for clarity.

Attached updated patches.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-03-30 17:42, Amit Langote wrote:
> I have updated the comments in apply_handle_tuple_routing() (see 0002)
> to better explain what's going on with UPDATE handling.  I also
> rearranged the tests a bit for clarity.
> 
> Attached updated patches.

Test coverage for 0002 is still a bit lacking.  Please do a coverage 
build yourself and get at least one test case to exercise every branch 
in apply_handle_tuple_routing().  Right now, I don't see any coverage 
for updates without attribute remapping and updates that don't move to a 
new partition.

Also, the coverage report reveals that in logicalrep_partmap_init(), the 
patch is mistakenly initializing LogicalRepRelMapContext instead of 
LogicalRepPartMapContext.  (Hmm, how does it even work like that?)

I think apart from some of these details, this patch is okay, but I 
don't have deep experience in the partitioning code, I can just see that 
it looks like other code elsewhere.  Perhaps someone with more knowledge 
can give this a look as well.

About patch 0003, I was talking to some people offline about the name of 
the option.  There was some confusion about using the term "schema". 
How about naming it "publish_via_partition_root", which also matches the 
name of the analogous option in pg_dump.

Code coverage here could also be improved.  A lot of the new code in 
pgoutput.c is not tested.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
Hi,

On 02/04/2020 14:23, Peter Eisentraut wrote:
> On 2020-03-30 17:42, Amit Langote wrote:
>> I have updated the comments in apply_handle_tuple_routing() (see 0002)
>> to better explain what's going on with UPDATE handling.  I also
>> rearranged the tests a bit for clarity.
>>
>> Attached updated patches.
> 
> Also, the coverage report reveals that in logicalrep_partmap_init(), the
> patch is mistakenly initializing LogicalRepRelMapContext instead of 
> LogicalRepPartMapContext.  (Hmm, how does it even work like that?)
> 

It works because it's just a MemoryContext and it's long lived. I wonder 
if the fix here is to simply remove the LogicalRepPartMapContext...

> I think apart from some of these details, this patch is okay, but I 
> don't have deep experience in the partitioning code, I can just see that 
> it looks like other code elsewhere.  Perhaps someone with more knowledge 
> can give this a look as well.
> 

FWIW it looks okay to me as well from perspective of somebody who 
implemented something similar outside of core.

> About patch 0003, I was talking to some people offline about the name of 
> the option.  There was some confusion about using the term "schema". How 
> about naming it "publish_via_partition_root", which also matches the 
> name of the analogous option in pg_dump.
> 

+1 (disclaimer: I was one of the people who discussed this offline)

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Fri, Apr 3, 2020 at 4:52 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 02/04/2020 14:23, Peter Eisentraut wrote:
> > On 2020-03-30 17:42, Amit Langote wrote:
> >> I have updated the comments in apply_handle_tuple_routing() (see 0002)
> >> to better explain what's going on with UPDATE handling.  I also
> >> rearranged the tests a bit for clarity.
> >>
> >> Attached updated patches.
> > > Also, the coverage report reveals that in logicalrep_partmap_init(), the
> > patch is mistakenly initializing LogicalRepRelMapContext instead of
> > LogicalRepPartMapContext.  (Hmm, how does it even work like that?)
> >
>
> It works because it's just a MemoryContext and it's long lived. I wonder
> if the fix here is to simply remove the LogicalRepPartMapContext...

Actually, there is no LogicalRepPartMapContext in the patches posted
so far, but I have decided to add it in the updated patch. One
advantage besides avoiding confusion is that it might help to tell
memory consumed by the partitions apart from that consumed by the
actual replication targets.

> > I think apart from some of these details, this patch is okay, but I
> > don't have deep experience in the partitioning code, I can just see that
> > it looks like other code elsewhere.  Perhaps someone with more knowledge
> > can give this a look as well.
> >
>
> FWIW it looks okay to me as well from perspective of somebody who
> implemented something similar outside of core.

Thanks for giving it a look.

> > About patch 0003, I was talking to some people offline about the name of
> > the option.  There was some confusion about using the term "schema". How
> > about naming it "publish_via_partition_root", which also matches the
> > name of the analogous option in pg_dump.
> >
>
> +1 (disclaimer: I was one of the people who discussed this offline)

Okay, I like that too.

I am checking test coverage at the moment and should have the patches
ready by sometime later today.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I am checking test coverage at the moment and should have the patches
> ready by sometime later today.

Attached updated patches.

I confirmed using a coverage build that all the new code in
logical/worker.c due to 0002 is now covered.  For some reason, the
coverage report for pgoutput.c doesn't say the same about 0003's
changes, although I doubt that result.  It seems strange to believe
that *none* of the new code is tested.  I even checked by adding
debugging elog()s next to the lines that the coverage report says
aren't exercised, which tells me that that's not true.  Perhaps my
coverage build is somehow getting messed up, so it would be nice if
someone with reliable coverage builds could confirm one way or the
other.  I will continue to check what's wrong.

I fixed a couple of bugs in 0002.  One of the bugs was that the
"partition map" hash table in logical/relation.c didn't really work,
so logicalrep_partition_open() would always create a new entry.

In 0003, changed the publication parameter name to publish_via_partition_root.
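With the renamed parameter, the 0003 behavior would be enabled like this (a sketch based on the discussion above):

```sql
create publication pub for table p
  with (publish_via_partition_root = true);

-- Changes to p's partitions are then published using p's identity and
-- schema, so the subscriber only needs a table named p, which need not
-- be partitioned the same way (or at all).
```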

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
Hi,

On 03/04/2020 16:25, Amit Langote wrote:
> On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I am checking test coverage at the moment and should have the patches
>> ready by sometime later today.
> 
> Attached updated patches.
> 
> I confirmed using a coverage build that all the new code in
> logical/worker.c due to 0002 is now covered. For some reason, coverage
> report for pgoutput.c doesn't say the same thing for 0003's changes,
> although I doubt that result.  It seems strange to believe that *none*
> of the new code is tested.  I even checked by adding debugging elog()s
> next to the lines that the coverage report says aren't exercised,
> which tell me that that's not true. Perhaps my coverage build is
> somehow getting messed up, so it would be nice if someone with
> reliable coverage builds can confirm one way or the other. I will
> continue to check what's wrong.
> 

AFAIK gcov can't handle multiple instances of same process being started 
as it just overwrites the coverage files. So for TAP test it will report 
bogus info (as in some code that's executed will look as not executed). 
We'd probably have to do some kind of `GCOV_PREFIX` magic in the TAP 
framework and merge (gcov/lcov can do that AFAIK) the resulting files to 
get accurate coverage info. But that's beyond this patch IMHO.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> AFAIK gcov can't handle multiple instances of same process being started 
> as it just overwrites the coverage files. So for TAP test it will report 
> bogus info (as in some code that's executed will look as not executed). 

Hm, really?  I routinely run "make check" (ie, parallel regression
tests) under coverage, and I get results that seem sane.  If I were
losing large chunks of the data, I think I'd have noticed.

            regards, tom lane



Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
On 03/04/2020 16:59, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> AFAIK gcov can't handle multiple instances of same process being started
>> as it just overwrites the coverage files. So for TAP test it will report
>> bogus info (as in some code that's executed will look as not executed).
> 
> Hm, really?  I routinely run "make check" (ie, parallel regression
> tests) under coverage, and I get results that seem sane.  If I were
> losing large chunks of the data, I think I'd have noticed.
> 

Parallel regression still just starts single postgres instance no?

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 03/04/2020 16:59, Tom Lane wrote:
>> Petr Jelinek <petr@2ndquadrant.com> writes:
>>> AFAIK gcov can't handle multiple instances of same process being started
>>> as it just overwrites the coverage files. So for TAP test it will report
>>> bogus info (as in some code that's executed will look as not executed).

>> Hm, really?  I routinely run "make check" (ie, parallel regression
>> tests) under coverage, and I get results that seem sane.  If I were
>> losing large chunks of the data, I think I'd have noticed.

> Parallel regression still just starts single postgres instance no?

But the forked-off children have to write the gcov files independently,
don't they?

            regards, tom lane



Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
On 03/04/2020 17:51, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 03/04/2020 16:59, Tom Lane wrote:
>>> Petr Jelinek <petr@2ndquadrant.com> writes:
>>>> AFAIK gcov can't handle multiple instances of same process being started
>>>> as it just overwrites the coverage files. So for TAP test it will report
>>>> bogus info (as in some code that's executed will look as not executed).
> 
>>> Hm, really?  I routinely run "make check" (ie, parallel regression
>>> tests) under coverage, and I get results that seem sane.  If I were
>>> losing large chunks of the data, I think I'd have noticed.
> 
>> Parallel regression still just starts single postgres instance no?
> 
> But the forked-off children have to write the gcov files independently,
> don't they?
> 

Hmm, that's a very good point. I did see these missing-coverage issues 
when running tests that explicitly start more instances of postgres, 
though. And with some quick googling, parallel testing seems to be an 
issue with gcov for other people as well.

I wonder if the program checksum that gcov calculates when merging the 
.gcda data while updating it is somehow different for separately started 
instances but not for the ones forked from the same parent, or something. I 
don't know the internals of gcov well enough to say how exactly that works.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 03/04/2020 17:51, Tom Lane wrote:
>> But the forked-off children have to write the gcov files independently,
>> don't they?

> Hmm that's very good point. I did see these missing coverage issue when 
> running tests that explicitly start more instances of postgres before 
> though. And with some quick googling, parallel testing seems to be issue 
> with gcov for more people.

I poked around and found this:

https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html

which says

    gcov instrumentation is multi-process safe, but not multi-thread
    safe. The multi-processing safety relies on OS level file locking,
    which is not available on some systems.

That would explain why it works for me, but then there's a question
of why it doesn't work for you ...

            regards, tom lane



Re: adding partitioned tables to publications

From
Petr Jelinek
Date:
On 04/04/2020 07:25, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 03/04/2020 17:51, Tom Lane wrote:
>>> But the forked-off children have to write the gcov files independently,
>>> don't they?
> 
>> Hmm that's very good point. I did see these missing coverage issue when
>> running tests that explicitly start more instances of postgres before
>> though. And with some quick googling, parallel testing seems to be issue
>> with gcov for more people.
> 
> I poked around and found this:
> 
> https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html
> 
> which says
> 
>      gcov instrumentation is multi-process safe, but not multi-thread
>      safe. The multi-processing safety relies on OS level file locking,
>      which is not available on some systems.
> 
> That would explain why it works for me, but then there's a question
> of why it doesn't work for you ...

Hmm, I wonder if it has something to do with docker then (I rarely run 
any tests directly on the main system nowadays). But that does not 
explain why it does not work for Amit either.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Sat, Apr 4, 2020 at 5:56 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 04/04/2020 07:25, Tom Lane wrote:
> > Petr Jelinek <petr@2ndquadrant.com> writes:
> >> On 03/04/2020 17:51, Tom Lane wrote:
> >>> But the forked-off children have to write the gcov files independently,
> >>> don't they?
> >
> >> Hmm that's very good point. I did see these missing coverage issue when
> >> running tests that explicitly start more instances of postgres before
> >> though. And with some quick googling, parallel testing seems to be issue
> >> with gcov for more people.
> >
> > I poked around and found this:
> >
> > https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html
> >
> > which says
> >
> >      gcov instrumentation is multi-process safe, but not multi-thread
> >      safe. The multi-processing safety relies on OS level file locking,
> >      which is not available on some systems.
> >
> > That would explain why it works for me, but then there's a question
> > of why it doesn't work for you ...
>
> Hmm, I wonder if it has something to do with docker then (I rarely run
> any tests directly on the main system nowadays). But that does not
> explain why it does not work for Amit either.

One thing I must clarify: coverage for most of pgoutput.c looks
okay on each run.  I am concerned that the coverage for the code added
by the patch is shown to be close to zero, which is a mystery to me,
because I can confirm by other means, such as adding debugging elog()s
next to the new code, that the newly added tests do cover it.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> One thing to I must clarify: coverage for most of pgoutput.c looks
> okay on each run.  I am concerned that the coverage for the code added
> by the patch is shown to be close to zero, which is a mystery to me,
> because I can confirm by other means such as debugging elogs() to next
> to the new code that the newly added tests do cover them.

According to

https://coverage.postgresql.org/src/backend/replication/pgoutput/index.html

the coverage is pretty good.  Maybe you're doing something wrong
in enabling coverage testing locally?

            regards, tom lane



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-03 16:25, Amit Langote wrote:
> On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I am checking test coverage at the moment and should have the patches
>> ready by sometime later today.
> 
> Attached updated patches.

Committed 0001 now.  I'll work on the rest tomorrow.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Mon, Apr 6, 2020 at 10:25 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-03 16:25, Amit Langote wrote:
> > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> I am checking test coverage at the moment and should have the patches
> >> ready by sometime later today.
> >
> > Attached updated patches.
>
> Committed 0001 now.  I'll work on the rest tomorrow.

Thank you.  I have rebased the one remaining.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Tue, Apr 7, 2020 at 12:04 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Apr 6, 2020 at 10:25 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 2020-04-03 16:25, Amit Langote wrote:
> > > On Fri, Apr 3, 2020 at 6:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > >> I am checking test coverage at the moment and should have the patches
> > >> ready by sometime later today.
> > >
> > > Attached updated patches.
> >
> > Committed 0001 now.  I'll work on the rest tomorrow.
>
> Thank you.  I have rebased the one remaining.

I updated the patch to make the following changes:

* Rewrote the tests to match in style with those committed yesterday
* Renamed all variables that had pubasroot in it to have pubviaroot
instead to match the publication parameter
* Updated pg_publication catalog documentation

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-07 08:44, Amit Langote wrote:
> I updated the patch to make the following changes:
> 
> * Rewrote the tests to match in style with those committed yesterday
> * Renamed all variables that had pubasroot in it to have pubviaroot
> instead to match the publication parameter
> * Updated pg_publication catalog documentation

Thanks.  I have some further questions:

The change in nodeModifyTable.c to add CheckValidResultRel() is unclear. 
  It doesn't seem to do anything, and it's not clear how it's related to 
this patch.

The changes in GetRelationPublications() are confusing to me:

+   if (published_rels)
+   {
+       num = list_length(result);
+       for (i = 0; i < num; i++)
+           *published_rels = lappend_oid(*published_rels, relid);
+   }

This adds relid to the output list "num" times, where num is the number 
of publications found.  Shouldn't "i" be used in the loop somehow? 
Similarly later in the function.

The descriptions of the new fields in RelationSyncEntry don't seem to 
match the code accurately, or at least it's confusing. 
replicate_as_relid is always filled in with an ancestor, even if 
pubviaroot is not set.

I think the pubviaroot field is actually not necessary.  We only need 
replicate_as_relid.

There is a markup typo in logical-replication.sgml:

     <xref linkend=="sql-createpublication"/>

In pg_dump, you missed updating a branch for an older version.  See 
attached patch.

Also attached a patch to rephrase the psql output a bit to make it not 
so long.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
Thanks for the review.

On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-07 08:44, Amit Langote wrote:
> > I updated the patch to make the following changes:
> >
> > * Rewrote the tests to match in style with those committed yesterday
> > * Renamed all variables that had pubasroot in it to have pubviaroot
> > instead to match the publication parameter
> > * Updated pg_publication catalog documentation
>
> Thanks.  I have some further questions:
>
> The change in nodeModifyTable.c to add CheckValidResultRel() is unclear.
>   It doesn't seem to do anything, and it's not clear how it's related to
> this patch.

CheckValidResultRel() checks that a replica identity is present for
replicating a given update/delete, which, I think, is better performed
on the root table itself, rather than on some partition that would be
affected.  The latter already occurs by way of CheckValidResultRel()
being called on the partitions to be updated.  I think we get a more
helpful message if the root parent is flagged instead of a partition.

update prt1 set b = b + 1 where a = 578;
ERROR:  cannot update table "prt1" because it does not have a replica
identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

vs.

-- checking the partition
update prt1 set b = b + 1 where a = 578;
ERROR:  cannot update table "prt1_p3" because it does not have a
replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

I am okay with getting rid of the check on the root table if flagging
individual partitions seems good enough.

> The changes in GetRelationPublications() are confusing to me:
>
> +   if (published_rels)
> +   {
> +       num = list_length(result);
> +       for (i = 0; i < num; i++)
> +           *published_rels = lappend_oid(*published_rels, relid);
> +   }
>
> This adds relid to the output list "num" times, where num is the number
> of publications found.  Shouldn't "i" be used in the loop somehow?
> Similarly later in the function.

published_rels contains an *OID* for each publication that will be in
result.  Callers should iterate the two lists together; for each
publication found in result, the caller then knows which relation it
is associated with using the OID found at the same position in
published_rels.  If publishing through an ancestor's publication, we
need to know which ancestor, hence the whole dance.

I have thought this to be a bit ugly before, but after having to
explain it, I think it's better to use some other approach for this.
I have updated the patch so that GetRelationPublications no longer
considers a relation's ancestors.  That way, it doesn't have to
second-guess what other information will be needed by the caller.

I hope that's clearer, because all the logic is in one place and that
is get_rel_sync_entry().

> The descriptions of the new fields in RelationSyncEntry don't seem to
> match the code accurately, or at least it's confusing.
> replicate_as_relid is always filled in with an ancestor, even if
> pubviaroot is not set.

Given this confusion, I have changed how replicate_as_relid works so
that it's now always set -- if different from the relation's own OID,
the code for "publishing via root" kicks in in various places.

> I think the pubviaroot field is actually not necessary.  We only need
> replicate_as_relid.

Looking through the code, I agree.  I guess I only kept it around to
go with pubupdate, etc.

I guess it might also be a good idea to call it publish_as_relid
instead of replicate_as_relid for consistency.

> There is a markup typo in logical-replication.sgml:
>
>      <xref linkend=="sql-createpublication"/>

Oops, fixed.

> In pg_dump, you missed updating a branch for an older version.  See
> attached patch.
>
> Also attached a patch to rephrase the psql output a bit to make it not
> so long.

Thank you, merged.

Attached updated patch with above changes.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Apr 8, 2020 at 1:22 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > The descriptions of the new fields in RelationSyncEntry don't seem to
> > match the code accurately, or at least it's confusing.
> > replicate_as_relid is always filled in with an ancestor, even if
> > pubviaroot is not set.
>
> Given this confusion, I have changed how replicate_as_relid works so
> that it's now always set -- if different from the relation's own OID,
> the code for "publishing via root" kicks in in various places.
>
> > I think the pubviaroot field is actually not necessary.  We only need
> > replicate_as_relid.
>
> Looking through the code, I agree.  I guess I only kept it around to
> go with pubupdate, etc.

Think I broke truncate replication with this.  Fixed in the attached
updated patch.

--

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-08 07:45, Amit Langote wrote:
> On Wed, Apr 8, 2020 at 1:22 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> The descriptions of the new fields in RelationSyncEntry don't seem to
>>> match the code accurately, or at least it's confusing.
>>> replicate_as_relid is always filled in with an ancestor, even if
>>> pubviaroot is not set.
>>
>> Given this confusion, I have changed how replicate_as_relid works so
>> that it's now always set -- if different from the relation's own OID,
>> the code for "publishing via root" kicks in in various places.
>>
>>> I think the pubviaroot field is actually not necessary.  We only need
>>> replicate_as_relid.
>>
>> Looking through the code, I agree.  I guess I only kept it around to
>> go with pubupdate, etc.
> 
> Think I broke truncate replication with this.  Fixed in the attached
> updated patch.

All committed.

Thank you and everyone very much for working on this.  I'm very happy 
that these two features from PG10 have finally met. :)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> All committed.
>
> Thank you and everyone very much for working on this.  I'm very happy
> that these two features from PG10 have finally met. :)

Thanks a lot for reviewing and committing.

prion seems to have failed:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13

Also, I am still unsure why the coverage report for the pgoutput.c
changes is not good:
https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

Will check.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-08 13:16, Amit Langote wrote:
> On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> All committed.
>>
>> Thank you and everyone very much for working on this.  I'm very happy
>> that these two features from PG10 have finally met. :)
> 
> Thanks a lot for reviewing and committing.
> 
> prion seems to have failed:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13

This comes from -DRELCACHE_FORCE_RELEASE.

> Also, still unsure why the coverage report for pgoutput.c changes not good:
> https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

I think this is because the END { } section in PostgresNode.pm shuts 
down all running instances in immediate mode, which doesn't save 
coverage properly.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
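The immediate-shutdown explanation above can be reproduced outside of
Postgres (a hypothetical sketch, not code from this thread; assumes gcc
with gcov support): gcov counters are flushed from an exit handler, so a
process killed outright, as in an immediate-mode shutdown, never writes
its .gcda file.

```shell
cat > killdemo.c <<'EOF'
#include <unistd.h>
int main(void) { pause(); return 0; }
EOF
gcc --coverage -o killdemo killdemo.c

./killdemo & pid=$!
sleep 1
kill -9 "$pid"        # terminate without running exit handlers
wait "$pid" || true

# No counter file was written for the killed process.
ls killdemo.gcda 2>/dev/null || echo "no .gcda written"
```

A graceful or fast shutdown, by contrast, lets the backend reach a
normal exit() and flush its counters, which is why stopping the nodes
explicitly at the end of the TAP test fixes the report.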



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-08 13:16, Amit Langote wrote:
> > On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
> > <peter.eisentraut@2ndquadrant.com> wrote:
> >> All committed.
> >>
> >> Thank you and everyone very much for working on this.  I'm very happy
> >> that these two features from PG10 have finally met. :)
> >
> > Thanks a lot for reviewing and committing.
> >
> > prion seems to have failed:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13
>
> This comes from -DRELCACHE_FORCE_RELEASE.

I'm seeing some funny stuff on such a build locally too, although
haven't been able to make sense of it yet.

> > Also, still unsure why the coverage report for pgoutput.c changes not good:
> > https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
>
> I think this is because the END { } section in PostgresNode.pm shuts
> down all running instances in immediate mode, which doesn't save
> coverage properly.

Thanks for that tip.  Appending the following at the end of the test
file has fixed the coverage reporting for me.

I noticed the following coverage issues:

1. The previous commit f1ac27bfd missed a command that I had included
to cover the following blocks of apply_handle_tuple_routing():

    1165             :                     else
    1166             :                     {
    1167           0 :                         remoteslot = ExecCopySlot(remoteslot, remoteslot_part);
    1168           0 :                         slot_getallattrs(remoteslot);
    1169             :                     }
...
    1200           2 :                     if (map != NULL)
    1201             :                     {
    1202           0 :                         remoteslot_part = execute_attr_map_slot(map->attrMap,
    1203             :                                                                 remoteslot,
    1204             :                                                                 remoteslot_part);
    1205             :                     }

2. Now that I am able to see proper coverage for the
publish_via_partition_root related changes, I can see that a block in
pgoutput_change() is missing coverage

The attached fixes these coverage issues.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > I think this is because the END { } section in PostgresNode.pm shuts
> > down all running instances in immediate mode, which doesn't save
> > coverage properly.
>
> Thanks for that tip.  Appending the following at the end of the test
> file has fixed the coverage reporting for me.

The patch posted in the previous email has it, but I meant this by
"the following":

+
+$node_publisher->stop('fast');
+$node_subscriber1->stop('fast');
+$node_subscriber2->stop('fast');

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 2020-04-08 13:16, Amit Langote wrote:
> > > prion seems to have failed:
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13
> >
> > This comes from -DRELCACHE_FORCE_RELEASE.
>
> I'm seeing some funny stuff on such a build locally too, although
> haven't been able to make sense of it yet.

So, I see the following repeated in the publisher's log
(013_partition.pl) until PostgresNode.pm times out:

sub_viaroot ERROR:  number of columns (2601) exceeds limit (1664)
sub_viaroot CONTEXT:  slot "sub_viaroot", output plugin "pgoutput", in
the change callback, associated LSN 0/1621010

causing the tests introduced by this last commit to stall.

Just before where the above starts repeating is this:

sub_viaroot_16479_sync_16455 LOG:  starting logical decoding for slot
"sub_viaroot_16479_sync_16455"
sub_viaroot_16479_sync_16455 DETAIL:  Streaming transactions
committing after 0/1620A40, reading WAL from 0/1620A08.
sub_viaroot_16479_sync_16455 LOG:  logical decoding found consistent
point at 0/1620A08
sub_viaroot_16479_sync_16455 DETAIL:  There are no running transactions.
sub_viaroot_16479_sync_16470 LOG:  statement: COPY public.tab3_1 TO STDOUT
sub_viaroot_16479_sync_16470 LOG:  statement: COMMIT

Same thing for the other subscriber sub2.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-09 05:39, Amit Langote wrote:
> sub_viaroot ERROR:  number of columns (2601) exceeds limit (1664)
> sub_viaroot CONTEXT:  slot "sub_viaroot", output plugin "pgoutput", in
> the change callback, associated LSN 0/1621010

I think the problem is that in maybe_send_schema(), 
RelationClose(ancestor) releases the relcache entry, but the tuple 
descriptors, which are part of the relcache entry, are still pointed to 
by the tuple map.

This patch makes the tests pass for me:

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5fbf2d4367..cf6e8629c1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
 
         /* Map must live as long as the session does. */
         oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-       relentry->map = convert_tuples_by_name(indesc, outdesc);
+       relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc), CreateTupleDescCopy(outdesc));
         MemoryContextSwitchTo(oldctx);
         send_relation_and_attrs(ancestor, ctx);
         RelationClose(ancestor);

Please check.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Thu, Apr 9, 2020 at 4:14 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-09 05:39, Amit Langote wrote:
> > sub_viaroot ERROR:  number of columns (2601) exceeds limit (1664)
> > sub_viaroot CONTEXT:  slot "sub_viaroot", output plugin "pgoutput", in
> > the change callback, associated LSN 0/1621010
>
> I think the problem is that in maybe_send_schema(),
> RelationClose(ancestor) releases the relcache entry, but the tuple
> descriptors, which are part of the relcache entry, are still pointed to
> by the tuple map.
>
> This patch makes the tests pass for me:
>
> diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
> index 5fbf2d4367..cf6e8629c1 100644
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>
>          /* Map must live as long as the session does. */
>          oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> -       relentry->map = convert_tuples_by_name(indesc, outdesc);
> +       relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc), CreateTupleDescCopy(outdesc));
>          MemoryContextSwitchTo(oldctx);
>          send_relation_and_attrs(ancestor, ctx);
>          RelationClose(ancestor);
>
> Please check.

Thanks.  Yes, that's what I just found out too and was about to send a
patch, which is basically same as yours as far as the fix for this
issue is concerned.

While figuring this out, I thought the nearby code could be rearranged
a bit, especially to de-duplicate the code.  Also, I think
get_rel_sync_entry() may be a better place to set the map, rather than
maybe_send_schema().  Thoughts?

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
Peter Eisentraut
Date:
On 2020-04-09 09:28, Amit Langote wrote:
>> This patch makes the tests pass for me:
>>
>> diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
>> index 5fbf2d4367..cf6e8629c1 100644
>> --- a/src/backend/replication/pgoutput/pgoutput.c
>> +++ b/src/backend/replication/pgoutput/pgoutput.c
>> @@ -305,7 +305,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>>
>>           /* Map must live as long as the session does. */
>>           oldctx = MemoryContextSwitchTo(CacheMemoryContext);
>> -       relentry->map = convert_tuples_by_name(indesc, outdesc);
>> +       relentry->map = convert_tuples_by_name(CreateTupleDescCopy(indesc), CreateTupleDescCopy(outdesc));
>>           MemoryContextSwitchTo(oldctx);
>>           send_relation_and_attrs(ancestor, ctx);
>>           RelationClose(ancestor);
>>
>> Please check.
> 
> Thanks.  Yes, that's what I just found out too and was about to send a
> patch, which is basically same as yours as far as the fix for this
> issue is concerned.

I have committed my patch but not ...

> While figuring this out, I thought the nearby code could be rearranged
> a bit, especially to de-duplicate the code.  Also, I think
> get_rel_sync_entry() may be a better place to set the map, rather than
> maybe_send_schema().  Thoughts?

because I didn't really have an opinion on that at the time, but if you 
still want it considered or have any open thoughts on this thread, 
please resend or explain.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Fri, Apr 17, 2020 at 10:23 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-09 09:28, Amit Langote wrote:
> > While figuring this out, I thought the nearby code could be rearranged
> > a bit, especially to de-duplicate the code.  Also, I think
> > get_rel_sync_entry() may be a better place to set the map, rather than
> > maybe_send_schema().  Thoughts?
>
> because I didn't really have an opinion on that at the time, but if you
> still want it considered or have any open thoughts on this thread,
> please resend or explain.

Sure, thanks for taking care of the bug.

Rebased the code rearrangement patch.  Also resending the patch to fix
TAP tests for improving coverage as described in:
https://www.postgresql.org/message-id/CA%2BHiwqFyydvQ5g%3Dqa54UM%2BXjm77BdhX-nM4dXQkNOgH%3DzvDjoA%40mail.gmail.com

To summarize:
1. Missing coverage for a couple of related blocks in
apply_handle_tuple_routing()
2. Missing coverage report for the code in pgoutput.c added by 83fd4532

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: adding partitioned tables to publications

From
"赵锐"
Date:
The first file of Amit's patch does not only rearrange the code; it also fixes a hidden bug.
To make it easy to see, I attach a separate patch.
RelationIdGetRelation increments a reference in owner->relrefarr; without a matching RelationClose, owner->relrefarr keeps growing and must be enlarged and re-hashed.
Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse, each reference increment itself takes minutes, because hash collisions are resolved by scanning an array in order.
Publishing 10 billion rows under one partitioned table therefore takes several days, mostly spent incrementing references and enlarging and re-hashing, with CPU pegged at 99%.
After applying my patch, the same 10 billion rows are published in about 10 minutes.



Attachment

Re: adding partitioned tables to publications

From
Amit Kapila
Date:
On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941708@qq.com> wrote:
>
> The first file of Amit's patch does not only rearrange the code; it also fixes a hidden bug.
> To make it easy to see, I attach a separate patch.
> RelationIdGetRelation increments a reference in owner->relrefarr; without a matching RelationClose, owner->relrefarr keeps growing and must be enlarged and re-hashed.
> Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse, each reference increment itself takes minutes, because hash collisions are resolved by scanning an array in order.
> Publishing 10 billion rows under one partitioned table therefore takes several days, mostly spent incrementing references and enlarging and re-hashing, with CPU pegged at 99%.
> After applying my patch, the same 10 billion rows are published in about 10 minutes.
>

It is a clear relation descriptor leak. The proposed fix seems correct
to me. The patch didn't apply to HEAD, so I have prepared separate
patches for HEAD and 13. There are minor modifications in the patch,
such as using RelationIsValid before closing the relation. I have not
added any test because there is already a test in
src/test/subscription/t/013_partition.

Kindly let me know your English name so that I can credit you as a
co-author.

--
With Regards,
Amit Kapila.

Attachment

Re: adding partitioned tables to publications

From
"Mark Zhao"
Date:
Thanks for your reply. The patch is exactly what I want.
My English name is Mark Zhao, which is now the name on this email.

Thanks,
Mark Zhao



Re: adding partitioned tables to publications

From
Amit Kapila
Date:
On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote:
>
> Thanks for your reply. The patch is exactly what I want.
> My English name is Mark Zhao, which should be the current email name.
>

Pushed the fix.

-- 
With Regards,
Amit Kapila.



Re: adding partitioned tables to publications

From
Amit Langote
Date:
On Tue, Jan 12, 2021 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote:
> >
> > Thanks for your reply. The patch is exactly what I want.
> > My English name is Mark Zhao, which should be the current email name.
> >
>
> Pushed the fix.

Thanks Amit and Mark.  I hadn't realized at the time that the relation
descriptor leak was occurring, but good to see it tested and fixed.

-- 
Amit Langote
EDB: http://www.enterprisedb.com