Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From vignesh C
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CALDaNm06=LDytYyY+xcAQd8UK_YpJ3zMo4P5V8KBArw6MoDWDg@mail.gmail.com
Whole thread Raw
In response to Re: Column Filtering in Logical Replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Column Filtering in Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Sep 16, 2021 at 7:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Sep-16, vignesh C wrote:
>
> > diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> > index e3068a374e..c50bb570ea 100644
> > --- a/src/backend/parser/gram.y
> > +++ b/src/backend/parser/gram.y
>
> Yeah, on a quick glance this looks all wrong.  Your PublicationObjSpec
> production should return a node with tag PublicationObjSpec, and
> pubobj_expr should not exist at all -- that stuff is just making it all
> more confusing.
>
> I think it'd be something like this:
>
> PublicationObjSpec:
>                         ALL TABLES
>                                         {
>                                                 $$ = makeNode(PublicationObjSpec);
>                                                 $$->pubobjtype = PUBLICATIONOBJ_ALL_TABLES;
>                                                 $$->location = @1;
>                                         }
>                         | TABLE qualified_name
>                                         {
>                                                 $$ = makeNode(PublicationObjSpec);
>                                                 $$->pubobjtype = PUBLICATIONOBJ_TABLE;
>                                                 $$->pubobj = $2;
>                                                 $$->location = @1;
>                                         }
>                         | ALL TABLES IN_P SCHEMA name
>                                         {
>                                                 $$ = makeNode(PublicationObjSpec);
>                                                 $$->pubobjtype = PUBLICATIONOBJ_ALL_TABLES_IN_SCHEMA;
>                                                 $$->pubobj = makeRangeVar( ... $5 ... );
>                                                 $$->location = @1;
>                                         }
>                         | qualified_name
>                                         {
>                                                 $$ = makeNode(PublicationObjSpec);
>                                                 $$->pubobjtype = PUBLICATIONOBJ_CONTINUATION;
>                                                 $$->pubobj = $1;
>                                                 $$->location = @1;
>                                         };
>
> You need a single object name under TABLE, not a list -- this was Tom's
> point about needing post-processing to determine how to assign a type to
> a object that's what I named PUBLICATIONOBJ_CONTINUATION here.

In the above, we will not be able to use qualified_name, as
qualified_name will not support the following syntaxes:
create publication pub1 for table t1 *;
create publication pub1 for table ONLY t1 *;
create publication pub1 for table ONLY (t1);

To solve this problem we can change qualified_name to relation_expr
but the problem with doing that is that the user will be able to
provide the following syntaxes:
create publication pub1 for all tables in schema sch1 *;
create publication pub1 for all tables in schema ONLY sch1 *;
create publication pub1 for all tables in schema ONLY (sch1);

To handle this we will need some special flag which will differentiate
these and throw errors at post processing time. We need to define an
expression similar to relation_expr say pub_expr which handles all
variants of qualified_name and then use a special flag so that we can
throw an error if somebody uses the above type of syntax for schema
names. And then if we have to distinguish between schema name and
relation name variant, then we need few other things.

We proposed the below solution which handles all these problems and
also used Node type which need not store schemaname in RangeVar type:
pubobj_expr:
                        pubobj_name
                                {
                                        /* inheritance query, implicitly */
                                        $$ = makeNode(PublicationObjSpec);
                                        $$->object = $1;
                                }
                        | extended_relation_expr
                                {
                                        $$ = makeNode(PublicationObjSpec);
                                        $$->object = (Node *)$1;
                                }
                        | CURRENT_SCHEMA
                                {
                                        $$ = makeNode(PublicationObjSpec);
                                        $$->object = (Node
*)makeString("CURRENT_SCHEMA");
                                }
                ;
/* This can be either a schema or relation name. */
pubobj_name:
                        ColId
                                {
                                        $$ = (Node *) makeString($1);
                                }
                        | ColId indirection
                                {
                                        $$ = (Node *)
makeRangeVarFromQualifiedName($1, $2, @1, yyscanner);
                                }
                ;
/* FOR TABLE and FOR ALL TABLES IN SCHEMA specifications */
PublicationObjSpec:     TABLE pubobj_expr
                                        {
                                                $$ = $2;
                                                $$->pubobjtype =
PUBLICATIONOBJ_TABLE;
                                                $$->location = @1;
                                        }
                        | ALL TABLES IN_P SCHEMA pubobj_expr
                                        {
                                                $$ = $5;
                                                $$->pubobjtype =
PUBLICATIONOBJ_REL_IN_SCHEMA;
                                                $$->location = @1;
                                        }
                        | pubobj_expr
                                        {
                                                $$ = $1;
                                                $$->pubobjtype =
PUBLICATIONOBJ_UNKNOWN;
                                                $$->location = @1;
                                        }
                ;

The same has been proposed in the recent version of patch [1].
[1] - https://www.postgresql.org/message-id/CALDaNm0OudeDeFN7bSWPro0hgKx%3D1zPgcNFWnvU_G6w3mDPX0Q%40mail.gmail.com
Thoughts?

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Split xlog.c
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Improve logging when using Huge Pages