Re: Added schema level support for publication. - Mailing list pgsql-hackers

From vignesh C
Subject Re: Added schema level support for publication.
Date
Msg-id CALDaNm2FY6zFBqEZR+yHX7i=jbjnXZeWa_QOpFsq_+Ghf_CJng@mail.gmail.com
Whole thread Raw
In response to Re: Added schema level support for publication.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Added schema level support for publication.
List pgsql-hackers
On Sat, 14 Dec 2024 at 05:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ reviving one aspect of an old thread ]
>
> vignesh C <vignesh21@gmail.com> writes:
> > On Mon, Jul 19, 2021 at 9:32 AM tanghy.fnst@fujitsu.com <
> > tanghy.fnst@fujitsu.com> wrote:
> >> I tested your v12 patch and found a problem in the following case.
> >>
> >> Step 1:
> >> postgres=# create schema s1;
> >> CREATE SCHEMA
> >> postgres=# create table s1.t1 (a int);
> >> CREATE TABLE
> >> postgres=# create publication pub_t for table s1.t1;
> >> CREATE PUBLICATION
> >> postgres=# create publication pub_s for schema s1;
> >> CREATE PUBLICATION
> >>
> >> Step 2:
> >> pg_dump -N s1
> >>
> >> I dumped and excluded schema s1, pg_dump generated the following SQL:
> >> -------------------------------
> >> ALTER PUBLICATION pub_s ADD SCHEMA s1;
> >>
> >> I think it was not expected because SQL like "ALTER PUBLICATION pub_t ADD
> > TABLE s1.t1" was not generated in my case. Thoughts?
>
> > Thanks for reporting this issue, this issue is fixed in the v13 patch
>
> I suppose this exchange is what led to this bit in
> getPublicationNamespaces:
>
>         /*
>          * We always dump publication namespaces unless the corresponding
>          * namespace is excluded from the dump.
>          */
>         if (nspinfo->dobj.dump == DUMP_COMPONENT_NONE)
>             continue;
>
> I'd like to push back against this on three separate grounds:
>
>
> 1. The behavior this produces is extremely non-obvious and not
> adequately explained by the comment, which makes one wonder how
> much of it was intended.  For example:
>
> * The public schema will be included if listed in FOR ALL TABLES IN,
> even though it's not dumped explicitly in the dump, because its dump
> mask includes other bits besides DUMP_COMPONENT_DEFINITION.  OK, that
> was probably intentional, but you wouldn't know it from the comment.
>
> * Schemas created within extensions will be included if listed in FOR
> ALL TABLES IN, even though they're not dumped explicitly in the dump.
> This seems like a quite accidental by-product of the fact that
> checkExtensionMembership will set DUMP_COMPONENT_ACL on extension
> member objects, thus making their dump mask not NONE.  If this
> behavior was intentional, it needs a less-fragile implementation.
>
> * The information_schema will NOT be included, even if it was listed in
> FOR ALL TABLES IN.  Admittedly, information_schema doesn't normally
> contain any tables that'd be useful to publish.  But still, this seems
> like randomly ignoring the user's intent.
>
>
> 2. The complaint was that if a schema is excluded from the dump
> by --exclude-schema, then it should not get included in the
> publication either.  I think this is at best highly debatable:
> arguably it amounts to breaking the publication.  It seems
> analogous to deciding that if a function is excluded from the
> dump, while a view using the function is included, we should
> silently adjust the view by removing the output columns or
> WHERE clauses that use the function.  I'm pretty sure that
> nobody would think that was sane.  Perhaps there's a case for
> excluding the view as a whole, but we don't do that.  Besides, the
> corresponding behavior would be to exclude the whole publication,
> not silently modify its definition.
>
>
> 3. The corresponding test for individual tables listed in
> a publication is coded differently:
>
>         /*
>          * Ignore publication membership of tables whose definitions are not
>          * to be dumped.
>          */
>         if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
>             continue;
>
> This is considerably easier to understand the effects of than a test
> on the whole dump mask: it will list the table if we intend to emit
> CREATE TABLE, and not otherwise, regardless of side issues like ACLs.
> But why is it different from the code for schemas?
>
>
> So I think that this was just wrongly thought through.  My
> preference would be to either delete the above-quoted bit in
> getPublicationNamespaces entirely, or make it look like the
> test in getPublicationTables.  Or maybe we should delete
> both of these tests, on the grounds that redefining the
> contents of the publication is far outside pg_dump's charter.

We cannot keep the code identical for getPublicationNamespaces and
getPublicationTables because selectDumpableNamespace performs special
handling for the public schema. Specifically, it unsets
DUMP_COMPONENT_DEFINITION for the public namespace, which prevents the
inclusion of 'TABLES IN SCHEMA public' in the publication. That is the
reason we did not keep the code similar to getPublicationTables.
I prefer the other approach to remove both the checks in
getPublicationTables() and getPublicationNamespaces() which also makes
it consistent with the other case that Amit mentioned at [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BZYanA51c9NzKM31AqJSw-j0-edGz91%2BVh-nsoKdzKfQ%40mail.gmail.com

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: DOCS: pg_createsubscriber wrong link?
Next
From: Enes Özdeniz
Date:
Subject: Query optimization about cube