Thread: RFC: logical publication via inheritance root?

RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
Hi,

TImescale makes use of inheritance in its partitioning implementation,
so we can't make use of the publish_via_partition_root publication
option during logical replication. We don't guarantee that the exact
same partitions exist on both sides, so that's a major roadblock for
our implementing logical subscription support, and by the same token
it's not possible to replicate out to a "standard" table.

If we were to work on a corresponding publish_via_inheritance_root
option, is there a chance that it'd be accepted, or is there some
other technical reason preventing it? In addition to Timescale, it
seems like other installations using extensions like pg_partman could
potentially make use of this, during online migrations from the old
style of partitioning to the new.

Some inheritance hierarchies won't be "partitioned" hierarchies, of
course, but the user can simply not set that replication option for
those publications. (Alternatively, I can imagine a system where an
extension explicitly marks a table as having a different "publication
root", and then handling that marker with the existing replication
option. But that may be overengineering things.)

WDYT?

--Jacob



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Fri, Dec 9, 2022 at 10:21 AM Jacob Champion <jchampion@timescale.com> wrote:
> Some inheritance hierarchies won't be "partitioned" hierarchies, of
> course, but the user can simply not set that replication option for
> those publications.

The more I noodle around with this approach, the less I like it: it
feels overly brittle, we have to deal with multiple inheritance
somehow, and there seem to be many code paths that need to be
partially duplicated. And my suggestion that the user could just opt
out of problematic cases would be a bad user experience, since any
non-partition inheritance hierarchies would just silently break.

Instead...

> (Alternatively, I can imagine a system where an
> extension explicitly marks a table as having a different "publication
> root", and then handling that marker with the existing replication
> option. But that may be overengineering things.)

...I'm going to try this approach next, since it's opt-in and may be
able to better use the existing code paths.

--Jacob



Re: RFC: logical publication via inheritance root?

From
Aleksander Alekseev
Date:
Hi Jacob,

> we have to deal with multiple inheritance somehow

I would like to point out that we shouldn't necessarily support
multiple inheritance in all the possible cases, at least not in the
first implementation. Supporting simple cases of inheritance would be
already a valuable feature even if it will have certain limitations.

-- 
Best regards,
Aleksander Alekseev



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Mon, Jan 9, 2023 at 12:41 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> I would like to point out that we shouldn't necessarily support
> multiple inheritance in all the possible cases, at least not in the
> first implementation. Supporting simple cases of inheritance would be
> already a valuable feature even if it will have certain limitations.

I agree. What I'm trying to avoid is the case where replication works
nicely for a table until someone performs an ALTER TABLE ... [NO]
INHERIT, and then Something Bad happens because we can't support the
new edge case. If every inheritance tree is automatically opted into
this new publication behavior, I think it'll be easier to hit that by
accident, making the whole thing feel brittle.

By contrast, if we have to opt specific tables into this feature by
marking them in the catalog, then not only will it be harder to hit by
accident (because we can document the requirements for the marker
function, and then it's up to the callers/extension authors/DBAs to
maintain those requirements), but we even have the chance to bail out
during an inheritance change if we see that the table is marked in
this way.

Two general pieces of progress to report:

1) I'm playing around with a marker in pg_inherits, where the inhseqno
is set to a sentinel value (0) for an inheritance relationship that
has been marked for logical publication. The intent is that the
pg_inherits helpers will prevent further inheritance relationships
when they see that marker, and reusing inhseqno means we can make use
of the existing index to do the lookups. An example:

    =# CREATE TABLE root (a int);
    =# CREATE TABLE root_p1 () INHERITS (root);
    =# SELECT pg_set_logical_root('root_p1', 'root');

and then any data written to root_p1 gets replicated via root instead,
if publish_via_partition_root = true. If root_p1 is set up with extra
columns, they'll be omitted from replication.

2) While this strategy works well for ongoing replication, it's not
enough to get the initial synchronization correct. The subscriber
still does a COPY of the root table directly, missing out on all the
logical descendant data. The publisher will have to tell the
subscriber about the relationship somehow, and older subscriber
versions won't understand how to use that (similar to how old
subscribers can't correctly handle row filters).

--Jacob



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On 1/10/23 11:36, Jacob Champion wrote:
> 1) I'm playing around with a marker in pg_inherits, where the inhseqno
> is set to a sentinel value (0) for an inheritance relationship that
> has been marked for logical publication. The intent is that the
> pg_inherits helpers will prevent further inheritance relationships
> when they see that marker, and reusing inhseqno means we can make use
> of the existing index to do the lookups. An example:
> 
>     =# CREATE TABLE root (a int);
>     =# CREATE TABLE root_p1 () INHERITS (root);
>     =# SELECT pg_set_logical_root('root_p1', 'root');
> 
> and then any data written to root_p1 gets replicated via root instead,
> if publish_via_partition_root = true. If root_p1 is set up with extra
> columns, they'll be omitted from replication.

First draft attached. (Due to some indentation changes, it's easiest to
read with --ignore-all-space.)

The overall strategy is
- introduce pg_set_logical_root, which sets the sentinel in pg_inherits,
- swap out any checks for partition parents with checks for logical
parents in the publishing code, and
- introduce the ability for a subscriber to perform an initial table
sync from multiple tables on the publisher.

> 2) While this strategy works well for ongoing replication, it's not
> enough to get the initial synchronization correct. The subscriber
> still does a COPY of the root table directly, missing out on all the
> logical descendant data. The publisher will have to tell the
> subscriber about the relationship somehow, and older subscriber
> versions won't understand how to use that (similar to how old
> subscribers can't correctly handle row filters).

I partially solved this by having the subscriber pull the logical
hierarchy from the publisher to figure out which tables to COPY. This
works when publish_via_partition_root=true, but it doesn't correctly
return to the previous behavior when the setting is false. I need to
check the publication setting from the subscriber, too, but that opens
up the question of what to do if two different publications conflict.

And while I go down that rabbit hole, I wanted to see if anyone thinks
this whole thing is unacceptable. :D

Thanks,
--Jacob
Attachment

Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
Hi,

I'm going to register this in CF for feedback.

Summary for potential reviewers: we don't use declarative partitions in
the Timescale partitioning scheme, but it'd be really nice to be able to
replicate between our tables and standard tables, or between two
Timescale-partitioned tables with different layouts. This patch lets
extensions (or savvy users) upgrade an existing inheritance relationship
between two tables into a "logical partition" relationship, so that they
can be handled with the publish_via_partition_root machinery.

I hope this might also help pg_partman users migrate between old- and
new-style partition schemes, but that's speculation.

On 1/20/23 09:53, Jacob Champion wrote:
>> 2) While this strategy works well for ongoing replication, it's not
>> enough to get the initial synchronization correct. The subscriber
>> still does a COPY of the root table directly, missing out on all the
>> logical descendant data. The publisher will have to tell the
>> subscriber about the relationship somehow, and older subscriber
>> versions won't understand how to use that (similar to how old
>> subscribers can't correctly handle row filters).
> 
> I partially solved this by having the subscriber pull the logical
> hierarchy from the publisher to figure out which tables to COPY. This
> works when publish_via_partition_root=true, but it doesn't correctly
> return to the previous behavior when the setting is false. I need to
> check the publication setting from the subscriber, too, but that opens
> up the question of what to do if two different publications conflict.

Second draft attached, which fixes that bug. I kept thinking to myself
that this would be much easier if the publisher told the subscriber what
data to copy rather than having the subscriber hardcode the initial sync
process... and then I realized that I could, sort of, move in that
direction.

This version adds a SQL function to determine the list of source tables
to COPY into a subscriber's target table. Now the publisher can make use
of whatever catalogs it needs to make that list and the subscriber
doesn't need to couple to them. (This could also provide a way for
publishers to provide more generic "table indirection" in the future,
but I'm wary of selling genericism as a feature here.)

I haven't solved the problem where two publications of the same table
have different settings for publish_via_partition_root. I was curious to
see how the existing partition code prevented problems, but I'm not
really sure that it does... Here are some situations where the existing
implementation duplicates data on the initial sync:

1) A single subscription to two publications, one with
publish_via_partition_root on and the other off, which publish the same
partitioned table

2) A single subscription to two publications with
publish_via_partition_root on, one of which publishes a root partition
and the other of which publishes a descendant/leaf

3) A single subscription to two publications with
publish_via_partition_root on, one of which publishes FOR ALL TABLES and
the other of which publishes a descendant/leaf

Is it expected that DBAs should avoid these cases, or are they worth
pursuing with a bug fix?

Thanks,
--Jacob
Attachment

Re: RFC: logical publication via inheritance root?

From
Aleksander Alekseev
Date:
Hi Jacob,

> I'm going to register this in CF for feedback.

Many thanks for the updated patch.

Despite the fact that the patch is still work in progress all in all
it looks very good to me.

So far I only have a couple of nitpicks, mostly regarding the code coverage [1]:

```
+    tablename = get_rel_name(tableoid);
+    if (tablename == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_UNDEFINED_TABLE),
+                 errmsg("OID %u does not refer to a table", tableoid)));
+    rootname = get_rel_name(rootoid);
+    if (rootname == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_UNDEFINED_TABLE),
+                 errmsg("OID %u does not refer to a table", rootoid)));
```

```
+        res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data,
+                          lengthof(descRow), descRow);
+
+        if (res->status != WALRCV_OK_TUPLES)
+            ereport(ERROR,
+                    (errmsg("could not fetch logical descendants for
table \"%s.%s\" from publisher: %s",
+                            nspname, relname, res->err)));
```

```
+        res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 0, NULL);
+        pfree(cmd.data);
+        if (res->status != WALRCV_OK_COPY_OUT)
+            ereport(ERROR,
+                    (errcode(ERRCODE_CONNECTION_FAILURE),
+                     errmsg("could not start initial contents copy
for table \"%s.%s\" from remote %s: %s",
+                            lrel.nspname, lrel.relname, quoted_name,
res->err)));
```

These new ereport() paths are never executed when we run the tests.
I'm not 100% sure if they are "should never happen in practice" cases
or not. If they are, I suggest adding corresponding comments.
Otherwise we have to test these paths.

```
+    else
+    {
+        /* For older servers, we only COPY the table itself. */
+        char   *quoted = quote_qualified_identifier(lrel->nspname,
+                                                    lrel->relname);
+        *to_copy = lappend(*to_copy, quoted);
+    }
```

Also we have to be extra careful with this code path because it is not
test-covered too.

```
+Datum
+pg_get_publication_rels_to_sync(PG_FUNCTION_ARGS)
+{
+#define NUM_SYNC_TABLES_ELEM   1
```

What is this macro for?

```
+{ oid => '8137', descr => 'get list of tables to copy during initial sync',
+  proname => 'pg_get_publication_rels_to_sync', prorows => '10',
proretset => 't',
+  provolatile => 's', prorettype => 'regclass', proargtypes => 'regclass text',
+  proargnames => '{rootid,pubname}',
+  prosrc => 'pg_get_publication_rels_to_sync' },
```

Something seems odd here. Is there a chance that it can return
different results even within one statement, especially considering
the fact that pg_set_logical_root() is VOLATILE? Maybe
pg_get_publication_rels_to_sync() should be VOLATILE too [2].

```
+{ oid => '8136', descr => 'mark a table root for logical replication',
+  proname => 'pg_set_logical_root', provolatile => 'v', proparallel => 'u',
+  prorettype => 'void', proargtypes => 'regclass regclass',
+  prosrc => 'pg_set_logical_root' },
```

Shouldn't we also have pg_unset(reset?)_logical_root?

[1]: https://github.com/afiskon/pgscripts/blob/master/code-coverage.sh
[2]: https://www.postgresql.org/docs/current/xfunc-volatility.html

-- 
Best regards,
Aleksander Alekseev



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Tue, Mar 7, 2023 at 2:40 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
> So far I only have a couple of nitpicks, mostly regarding the code coverage [1]:

Yeah, I need to work on error cases and their coverage in general.
There are more cases that I need to reject as well (marked TODO).

> +Datum
> +pg_get_publication_rels_to_sync(PG_FUNCTION_ARGS)
> +{
> +#define NUM_SYNC_TABLES_ELEM   1
> ```
>
> What is this macro for?

Whoops, that's cruft from an intermediate implementation. Will fix in
the next draft.

> +{ oid => '8137', descr => 'get list of tables to copy during initial sync',
> +  proname => 'pg_get_publication_rels_to_sync', prorows => '10',
> proretset => 't',
> +  provolatile => 's', prorettype => 'regclass', proargtypes => 'regclass text',
> +  proargnames => '{rootid,pubname}',
> +  prosrc => 'pg_get_publication_rels_to_sync' },
> ```
>
> Something seems odd here. Is there a chance that it can return
> different results even within one statement, especially considering
> the fact that pg_set_logical_root() is VOLATILE? Maybe
> pg_get_publication_rels_to_sync() should be VOLATILE too [2].

Hm. I'm not sure how this all should behave in the face of concurrent
structural changes, or how the existing publication queries handle
that same situation (e.g. partition attachment), so that's definitely
something for me to look into. At a glance, I'm not sure that
returning different results for the same table is more correct. And I
feel like a VOLATILE implementation might significantly impact the
JOIN/LATERAL performance in the pg_dump query? But I don't really know
how that's planned.

> +{ oid => '8136', descr => 'mark a table root for logical replication',
> +  proname => 'pg_set_logical_root', provolatile => 'v', proparallel => 'u',
> +  prorettype => 'void', proargtypes => 'regclass regclass',
> +  prosrc => 'pg_set_logical_root' },
> ```
>
> Shouldn't we also have pg_unset(reset?)_logical_root?

My initial thought was that a one-way "upgrade" makes things easier to
reason about. But a one-way function is not good UX, so maybe we
should provide that. We'd need to verify and test what happens if you
undo/"detach" the logical tree during replication.

If it's okay to blindly replace any existing inhseqno with, say, 1 (on
a table with single inheritance), then we can reverse the process
safely. If not, we can't -- at least not with the current
implementation -- because we don't save the previous value anywhere.

Thanks for the review!

--Jacob



Re: RFC: logical publication via inheritance root?

From
Peter Smith
Date:
On Wed, Mar 1, 2023 at 9:47 AM Jacob Champion <jchampion@timescale.com> wrote:
>
> Hi,
>
> I'm going to register this in CF for feedback.
>
> Summary for potential reviewers: we don't use declarative partitions in
> the Timescale partitioning scheme, but it'd be really nice to be able to
> replicate between our tables and standard tables, or between two
> Timescale-partitioned tables with different layouts. This patch lets
> extensions (or savvy users) upgrade an existing inheritance relationship
> between two tables into a "logical partition" relationship, so that they
> can be handled with the publish_via_partition_root machinery.
>
> I hope this might also help pg_partman users migrate between old- and
> new-style partition schemes, but that's speculation.
>

OK, my understanding is that TimescaleDB uses some kind of
quasi-partitioned/inherited tables (aka hypertables? [1]) internally,
and your proposed WIP patch provides a set_logical_root() function
which combines with the logical replication (LR) PUBLICATION option
"publish_via_partition_root" to help to replicate those.

You also mentioned pg_partman. IIUC pg_partman is a partitioning
extension [2] that pre-dated the native PostgreSQL partitioning
introduced in PG10 (i.e. quite a while ago). I guess it would be a
very niche group of users that are still using pg_partman old-style
(pre-PG10) partitions and want to migrate them but have not already
done so. Also, the pg_partman README [3] says since v4.0.0 there is
extensive support for native PostgreSQL partitions, so perhaps
existing LR already works for those.

Outside the scope of special TimescaleDB tables and the speculated
pg_partman old-style table migration, will this proposed new feature
have any other application? In other words, do you know if this
proposal will be of any benefit to the *normal* users who just have
native PostgreSQL inherited tables they want to replicate? I haven’t
yet looked at the WIP patch TAP tests – so apologies for my question
if the benefits to normal users are self-evident from your test cases.

------
[1] https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/
[2] https://www.crunchydata.com/blog/native-partitioning-with-postgres
[3] https://github.com/pgpartman/pg_partman

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Fri, Mar 31, 2023 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
> OK, my understanding is that TimescaleDB uses some kind of
> quasi-partitioned/inherited tables (aka hypertables? [1]) internally,
> and your proposed WIP patch provides a set_logical_root() function
> which combines with the logical replication (LR) PUBLICATION option
> "publish_via_partition_root" to help to replicate those.

Correct!

> You also mentioned pg_partman. IIUC pg_partman is a partitioning
> extension [2] that pre-dated the native PostgreSQL partitioning
> introduced in PG10 (i.e. quite a while ago). I guess it would be a
> very niche group of users that are still using pg_partman old-style
> (pre-PG10) partitions and want to migrate them but have not already
> done so.

Yeah. I've got no evidence either way, unfortunately -- on the one
hand, surely people have been able to upgrade by now? And on the
other, implementation inertia seems to override most other engineering
goals...

Probably best to ask the partman users, and not me. :D Or assume it's
a non-benefit unless someone says otherwise (even then, the partman
maintainers would need to agree it's useful and add support for this).

> Outside the scope of special TimescaleDB tables and the speculated
> pg_partman old-style table migration, will this proposed new feature
> have any other application? In other words, do you know if this
> proposal will be of any benefit to the *normal* users who just have
> native PostgreSQL inherited tables they want to replicate?

I think it comes down to why an inheritance scheme was used. If it's
because you want to group rows into named, queryable subsets (e.g. the
"cities/capitals" example in the docs [1]), I don't think this has any
utility, because I assume you'd want to replicate your subsets as-is.

But if it's because you've implemented a partitioning scheme of your
own (the docs still list reasons you might want to [2], even today),
and all you ever really do is interact with the root table, I think
this feature will give you some of the same benefits that
publish_via_partition_root gives native partition users. We're very
much in that boat, but I don't know how many others are.

Thanks!
--Jacob

[1] https://www.postgresql.org/docs/15/tutorial-inheritance.html
[2] https://www.postgresql.org/docs/15/ddl-partitioning.html#DDL-PARTITIONING-USING-INHERITANCE



Re: RFC: logical publication via inheritance root?

From
Aleksander Alekseev
Date:
Hi,

> > Outside the scope of special TimescaleDB tables and the speculated
> > pg_partman old-style table migration, will this proposed new feature
> > have any other application? In other words, do you know if this
> > proposal will be of any benefit to the *normal* users who just have
> > native PostgreSQL inherited tables they want to replicate?
>
> I think it comes down to why an inheritance scheme was used. If it's
> because you want to group rows into named, queryable subsets (e.g. the
> "cities/capitals" example in the docs [1]), I don't think this has any
> utility, because I assume you'd want to replicate your subsets as-is.
>
> But if it's because you've implemented a partitioning scheme of your
> own (the docs still list reasons you might want to [2], even today),
> and all you ever really do is interact with the root table, I think
> this feature will give you some of the same benefits that
> publish_via_partition_root gives native partition users. We're very
> much in that boat, but I don't know how many others are.

I would like to point out that inheritance is merely a tool for
modeling data. Its use cases are not limited to only partitioning,
although many people ended up using it for this purpose back when we
didn't have a proper built-in partitioning. So unless we are going to
remove inheritance in nearest releases (*) I believe it should work
with logical replication in a sane and convenient way.

Correct me if I'm wrong, but I got an impression that the patch tries
to accomplish just that.

(*) Which personally I believe would be a good change. Unlikely to
happen, though.

-- 
Best regards,
Aleksander Alekseev



Re: RFC: logical publication via inheritance root?

From
Peter Smith
Date:
FYI, the WIP patch does not seem to apply cleanly anymore using the latest HEAD.

See the cfbot rebase logs [1].

------
[1] http://cfbot.cputube.org/patch_42_4225.log

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Mon, Apr 3, 2023 at 8:53 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FYI, the WIP patch does not seem to apply cleanly anymore using the latest HEAD.

Yes, sorry -- after 062a84442, the architecture needs to change in a
way that I'm still working through. I've moved the patch to Waiting on
Author while I figure out the rebase.

Thanks,
--Jacob



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On 4/4/23 08:14, Jacob Champion wrote:
> Yes, sorry -- after 062a84442, the architecture needs to change in a
> way that I'm still working through. I've moved the patch to Waiting on
> Author while I figure out the rebase.

Okay -- that took longer than I wanted, but here's a rebased patchset
that I'll call v2.

Commit 062a84442 necessitated some rework of the new
pg_get_publication_rels_to_sync() helper. It now takes a list of
publications so that we can handle conflicts in the pubviaroot settings.
This is more complicated than before -- unlike partitions, standard
inheritance trees can selectively publish tables that aren't leaves. But
I think I've finally settled on some semantics for it which are
unsurprising.

As part of that, I've pulled out a patch in 0001 which I hope is
independently useful. Today, there appears to be no way to check which
relid a table will be published through, short of creating a
subscription just to see what happens. 0001 introduces
pg_get_relation_publishing_info() to surface this information, which
makes testing it easier and also makes it possible to inspect what's
happening with more complicated publication setups.

0001 also moves the determination of publish_as_relid out of the
pgoutput plugin and into a pg_publication helper function, because
unless I've missed something crucial, it doesn't seem like an output
plugin is really free to make that decision independently of the
publication settings. The subscriber is not going to ask a plugin for
the right tables to COPY during initial sync, so the plugin had better
be using the same logic as the core.

Many TODOs and upthread points of feedback are still pending, and I
think that several of them are actually symptoms of one architectural
problem with my patch:

- The volatility classifications of pg_set_logical_root() and
pg_get_publication_rels_to_sync() appear to conflict
- A dump/restore cycle loses the new marker
- Inheritance can be tampered with after the logical root has been set
- There's currently no way to clear a logical root after setting it

I wonder if pg_set_logical_root() might be better implemented as part of
ALTER TABLE. Maybe with a relation option? If it all went through ALTER
TABLE ONLY ... SET, then we wouldn't have to worry about a user
modifying roots while reading pg_get_publication_rels_to_sync() in the
same query. The permissions checks should be more consistent with less
effort, and there's an existing way to set/clear the option that already
plays well with pg_dump and pg_upgrade. The downsides I can see are the
need to handle simultaneous changes to INHERIT and SET (since we'd be
manipulating pg_inherits in both), as well as the fact that ALTER TABLE
... SET defaults to altering the entire table hierarchy, which may be
bad UX for this case.

--Jacob
Attachment

Re: RFC: logical publication via inheritance root?

From
Amit Kapila
Date:
On Sat, Apr 1, 2023 at 5:06 AM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Mar 31, 2023 at 3:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> > Outside the scope of special TimescaleDB tables and the speculated
> > pg_partman old-style table migration, will this proposed new feature
> > have any other application? In other words, do you know if this
> > proposal will be of any benefit to the *normal* users who just have
> > native PostgreSQL inherited tables they want to replicate?
>
> I think it comes down to why an inheritance scheme was used. If it's
> because you want to group rows into named, queryable subsets (e.g. the
> "cities/capitals" example in the docs [1]), I don't think this has any
> utility, because I assume you'd want to replicate your subsets as-is.
>

I also think so and your idea to have a function like
pg_set_logical_root() seems to make the inheritance hierarchy behaves
as a declarative partitioning scheme for the purpose of logical
replication.

> But if it's because you've implemented a partitioning scheme of your
> own (the docs still list reasons you might want to [2], even today),
> and all you ever really do is interact with the root table, I think
> this feature will give you some of the same benefits that
> publish_via_partition_root gives native partition users. We're very
> much in that boat, but I don't know how many others are.
>

I agree that there may still be cases as pointed out by you where
people want to use inheritance as a mechanism for partitioning but I
feel those would still be in the minority. Personally, I am not very
excited about this idea.

--
With Regards,
Amit Kapila.



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Fri, Jun 16, 2023 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Apr 1, 2023 at 5:06 AM Jacob Champion <jchampion@timescale.com> wrote:
> > I think it comes down to why an inheritance scheme was used. If it's
> > because you want to group rows into named, queryable subsets (e.g. the
> > "cities/capitals" example in the docs [1]), I don't think this has any
> > utility, because I assume you'd want to replicate your subsets as-is.
>
> I also think so and your idea to have a function like
> pg_set_logical_root() seems to make the inheritance hierarchy behaves
> as a declarative partitioning scheme for the purpose of logical
> replication.

Right.

> > But if it's because you've implemented a partitioning scheme of your
> > own (the docs still list reasons you might want to [2], even today),
> > and all you ever really do is interact with the root table, I think
> > this feature will give you some of the same benefits that
> > publish_via_partition_root gives native partition users. We're very
> > much in that boat, but I don't know how many others are.
> >
>
> I agree that there may still be cases as pointed out by you where
> people want to use inheritance as a mechanism for partitioning but I
> feel those would still be in the minority.

(Just to clarify -- timescaledb is one of those cases. They definitely
still exist.)

> Personally, I am not very
> excited about this idea.

Yeah, "exciting" isn't how I'd describe this feature either :D But I
think we're probably locked out of logical replication without the
ability to override publish_as_relid for our internal tables, somehow.
And I don't think DDL replication will help, just like it wouldn't
necessarily help existing publish_via_partition_root use cases,
because we don't want to force the source table's hierarchy on the
target table. (A later version of timescaledb may not even use the
same internal layout.)

Is there an alternative implementation I'm missing, maybe, or a way to
make this feature more generally applicable? "We have table Y and want
it to be migrated as part of table X" seems to fall squarely under the
logical replication umbrella.

Thanks,
--Jacob



Re: RFC: logical publication via inheritance root?

From
Amit Kapila
Date:
On Sat, Jun 17, 2023 at 1:51 AM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Jun 16, 2023 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > But if it's because you've implemented a partitioning scheme of your
> > > own (the docs still list reasons you might want to [2], even today),
> > > and all you ever really do is interact with the root table, I think
> > > this feature will give you some of the same benefits that
> > > publish_via_partition_root gives native partition users. We're very
> > > much in that boat, but I don't know how many others are.
> > >
> >
> > I agree that there may still be cases as pointed out by you where
> > people want to use inheritance as a mechanism for partitioning but I
> > feel those would still be in the minority.
>
> (Just to clarify -- timescaledb is one of those cases. They definitely
> still exist.)
>

Noted, but I think that can't be the reason to accept this feature in core.

> > Personally, I am not very
> > excited about this idea.
>
> Yeah, "exciting" isn't how I'd describe this feature either :D But I
> think we're probably locked out of logical replication without the
> ability to override publish_as_relid for our internal tables, somehow.
> And I don't think DDL replication will help, just like it wouldn't
> necessarily help existing publish_via_partition_root use cases,
> because we don't want to force the source table's hierarchy on the
> target table. (A later version of timescaledb may not even use the
> same internal layout.)
>
> Is there an alternative implementation I'm missing, maybe, or a way to
> make this feature more generally applicable? "We have table Y and want
> it to be migrated as part of table X" seems to fall squarely under the
> logical replication umbrella.
>

Are you talking about this w.r.t inheritance/partition hierarchy? I
don't see any other way except "publish_via_partition_root" because we
expect the same schema and relation name on the subscriber to
replicate. You haven't explained why exactly you have such a
requirement of replicating via inheritance root aka why you want
inheritance hierarchy to be different on target db.

The other idea that came across my mind was to provide some schema
mapping kind of feature on subscribers where we could route the tuples
from table X to table Y provided they have the same or compatible
schema. I don't know if this is feasible or how generally it will be
useful and whether any other DB (replication solution) provides such a
feature but I guess something like that would have helped your use
case.

--
With Regards,
Amit Kapila.



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Fri, Jun 16, 2023 at 9:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Is there an alternative implementation I'm missing, maybe, or a way to
> > make this feature more generally applicable? "We have table Y and want
> > it to be migrated as part of table X" seems to fall squarely under the
> > logical replication umbrella.
>
> Are you talking about this w.r.t inheritance/partition hierarchy? I
> don't see any other way except "publish_via_partition_root" because we
> expect the same schema and relation name on the subscriber to
> replicate. You haven't explained why exactly you have such a
> requirement of replicating via inheritance root aka why you want
> inheritance hierarchy to be different on target db.

I think all the "standard" use cases for publish_via_partition_root
still apply to our hypertables, and then add on the fact that our
partitions are dynamically created as needed. The subscriber may have
different ideas on how to divide and size those partitions based on
the extension version. (I'm still trying to figure out how to make
sure those new partitions are automatically included in the
publication, for what it's worth.)

> The other idea that came across my mind was to provide some schema
> mapping kind of feature on subscribers where we could route the tuples
> from table X to table Y provided they have the same or compatible
> schema. I don't know if this is feasible or how generally it will be
> useful and whether any other DB (replication solution) provides such a
> feature but I guess something like that would have helped your use
> case.

Yes, that may have also worked. Making it a subscriber-side feature
requires tight coupling between the two peers, though. (For the
timescaledb case, how does the subscriber know which new partitions
belong to which root? The publisher knows already.) And if it's
publisher-side instead, it would still need something like the
pg_get_publication_rels_to_sync() proposed here.

Thanks,
--Jacob



Re: RFC: logical publication via inheritance root?

From
Amit Kapila
Date:
On Tue, Jun 20, 2023 at 10:39 PM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Fri, Jun 16, 2023 at 9:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > The other idea that came across my mind was to provide some schema
> > mapping kind of feature on subscribers where we could route the tuples
> > from table X to table Y provided they have the same or compatible
> > schema. I don't know if this is feasible or how generally it will be
> > useful and whether any other DB (replication solution) provides such a
> > feature but I guess something like that would have helped your use
> > case.
>
> Yes, that may have also worked. Making it a subscriber-side feature
> requires tight coupling between the two peers, though. (For the
> timescaledb case, how does the subscriber know which new partitions
> belong to which root?
>

Yeah, the subscriber can't figure that out automatically. Users need
to provide the mapping manually.

--
With Regards,
Amit Kapila.



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On Wed, Jun 21, 2023 at 3:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jun 20, 2023 at 10:39 PM Jacob Champion <jchampion@timescale.com> wrote:
> > Making it a subscriber-side feature
> > requires tight coupling between the two peers, though. (For the
> > timescaledb case, how does the subscriber know which new partitions
> > belong to which root?
>
> Yeah, the subscriber can't figure that out automatically. Users need
> to provide the mapping manually.

Right. For that reason, I think subscriber-side mappings probably
won't help this particular use case. This patchset is pushing more in
the direction of publisher-side mappings.

Thanks,
--Jacob



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On 6/6/23 08:50, Jacob Champion wrote:
> Commit 062a84442 necessitated some rework of the new
> pg_get_publication_rels_to_sync() helper. It now takes a list of
> publications so that we can handle conflicts in the pubviaroot settings.
> This is more complicated than before -- unlike partitions, standard
> inheritance trees can selectively publish tables that aren't leaves. But
> I think I've finally settled on some semantics for it which are
> unsurprising.
The semantics I've picked are wrong :( I've accidentally reintroduced a
bug that's similar to the one discussed upthread, where if you have two
leaf tables published through different roots:

  tree    pub1    pub2
  -----   -----   -----
    A       -       A
   B C     B -     - -
  D       D       D

then a subscription on both publications will duplicate the data in the
leaf (D in the above example). I need to choose semantics that are
closer to the current behavior of partitions.

> I wonder if pg_set_logical_root() might be better implemented as part of
> ALTER TABLE. Maybe with a relation option? If it all went through ALTER
> TABLE ONLY ... SET, then we wouldn't have to worry about a user
> modifying roots while reading pg_get_publication_rels_to_sync() in the
> same query. The permissions checks should be more consistent with less
> effort, and there's an existing way to set/clear the option that already
> plays well with pg_dump and pg_upgrade.

I've implemented ALTER TABLE in v3; I like it a lot more. The new
reloption is named publish_via_parent. So now we can unset the flag and
dump/restore.

v3 also fixes a nasty uninitialized stack variable, along with a bad
collation assumption I made.

> The downsides I can see are the
> need to handle simultaneous changes to INHERIT and SET (since we'd be
> manipulating pg_inherits in both),

(This didn't turn out to be as bad as I feared.)

> as well as the fact that ALTER TABLE
> ... SET defaults to altering the entire table hierarchy, which may be
> bad UX for this case.

This was just wrong -- ALTER TABLE ... SET does not recurse. I think the
docs are misleading here, and I'm not sure why we allow an explicit '*'
after a table name if we're going to ignore it. But I also don't really
want to poke at that in this patchset, if I can avoid it.

--Jacob
Attachment

Re: RFC: logical publication via inheritance root?

From
Aleksander Alekseev
Date:
Hi,

> v3 also fixes a nasty uninitialized stack variable, along with a bad
> collation assumption I made.

I decided to take a closer look at 0001.

Since pg_get_relation_publishing_info() is exposed to the users I
think it should be described in a bit more detail than:

```
+  descr => 'get information on how a relation will be published via a
list of publications',
```

This description in \df+ output doesn't seem to be particularly
useful. Also the function should be documented. In order to accomplish
all this it could make sense to reconsider the signature of the
function and/or split it into several separate functions.

The volatility is declared as STABLE. This is probably correct. At
least at first glance I don't see any calls of VOLATILE functions and
off the top of my head can't give an example when it will not behave
as STABLE. This being said, a second opinion would be appreciated.

process_relation_publications() misses a brief comment before the
declaration. What are the arguments, what is the return value, are
there any pre/postconditions (locks, memory), etc.

Otherwise 0001 is in a decent shape, it passes make
installcheck-world, etc. I would suggest focusing on delivering this
part, assuming there will be no push-back to the refactorings and
slight test improvements. If 0002 could be further decomposed into
separate iterative improvements this could be helpful.

-- 
Best regards,
Aleksander Alekseev



Re: RFC: logical publication via inheritance root?

From
Jacob Champion
Date:
On 7/19/23 04:54, Aleksander Alekseev wrote:
> I decided to take a closer look at 0001.

Hi Aleks! I saw you put this back into Needs Review; thanks.

This thread has been pretty quiet from me, because we've run into
difficulties on the subscriber end. Our original optimistic assumption
was that we just needed to funnel all the leaf tables through the root
on the publisher, and then a sufficiently complex replica trigger on
the subscriber would be able to correctly route the data through the
root into new leaves.

That has not panned out for a number of reasons. Probably the easiest
one to describe is that replica identity handling breaks: if we move
the incoming tuples to different tables, and an UPDATE or DELETE comes
in later for those rows, the replication logic checks the root table
(bypassing our existing routing logic) and sees that they don't exist.
We never get the chance to handle routing the way that partitions do
[1]. Given that, I think I need to pivot and focus on the subscriber
side first. That might(?) be a smaller effort anyway, and if we can't
make headway there then publisher-side support probably doesn't make
sense at all.

So I'll pause this CF entry for now. This would also be a good time to
ask the crowd: are there alternative approaches to solve the OP that I
may be missing?

Thanks!
--Jacob

[1] https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/replication/logical/worker.c?h=8bf7db02#n2631

P.S. I've attached a v4, which fixes the semantics problem I mentioned
upthread, so it doesn't get lost in the shuffle.

Attachment