Home > mailing lists

Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: Data is copied twice when specifying both child and parent table in publication
Date	November 17, 2022 05:57:58
Msg-id	CALDaNm3oUXw_YA4PeQrUAxFupkgye+e2LZbJxVOY9uCCvg2Jng@mail.gmail.com Whole thread Raw
In response to	RE: Data is copied twice when specifying both child and parent table in publication ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses	RE: Data is copied twice when specifying both child and parent table in publication
List	pgsql-hackers

Tree view

On Wed, 16 Nov 2022 at 14:28, wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Mon, Nov 14, 2022 at 0:56 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Attach new patches.
> >
>
> Thanks for your comments.
>
> > Here we are having tables list to store the relids and table_infos
> > list which stores pubid along with relid. Here tables list acts as a
> > temporary list to get filter_partitions and then delete the
> > published_rel from table_infos. Will it be possible to directly
> > operate on table_infos list and remove the temporary tables list used.
> > We might have to implement comparator, deduplication functions and
> > change filter_partitions function to work directly on published_rel
> > type list.
> > +                       /
> > +                        * Record the published table and the
> > corresponding publication so
> > +                        * that we can get row filters and column list later.
> > +                        *
> > +                        * When a table is published by multiple
> > publications, to obtain
> > +                        * all row filters and column list, the
> > structure related to this
> > +                        * table will be recorded multiple times.
> > +                        */
> > +                       foreach(lc, pub_elem_tables)
> > +                       {
> > +                               published_rel *table_info =
> > (published_rel *) malloc(sizeof(published_rel));
> > +
> > +                               table_info->relid = lfirst_oid(lc);
> > +                               table_info->pubid = pub_elem->oid;
> > +                               table_infos = lappend(table_infos, table_info);
> > +                       }
> > +
> > +                       tables = list_concat(tables, pub_elem_tables);
> >
> > Thoughts?
>
> I think we could only deduplicate published tables per publication to get all
> row filters and column lists for each published table later.
> I removed the temporary list 'tables' and modified the API of the function
> filter_partitions to handle published_rel type list.
>
> Attach the new patch set.

Thanks for the update patch.
One suggestion:
+/* Records association between publication and published table */
+typedef struct
+{
+       Oid                     relid;          /* OID of published table */
+       Oid                     pubid;          /* OID of publication
that publishes this
+                                                        * table. */
+} published_rel;
+

+                       /*
+                        * Record the published table and the
corresponding publication so
+                        * that we can get row filters and column list later.
+                        *
+                        * When a table is published by multiple
publications, to obtain
+                        * all row filters and column list, the
structure related to this
+                        * table will be recorded multiple times.
+                        */
+                       foreach(lc, pub_elem_tables)
+                       {
+                               published_rel *table_info =
(published_rel *) malloc(sizeof(published_rel));
+
+                               table_info->relid = lfirst_oid(lc);
+                               table_info->pubid = pub_elem->oid;
+                               table_infos = lappend(table_infos, table_info);
+                       }

In this format if there are n relations in publication we will store
pubid n times, in all tables publication there will many thousands of
tables. We could avoid storing the pubid for every relid, instead we
could  represent it like below to avoid storing publication id for
each tables:

+/* Records association between publication and published tables */
+typedef struct
+{
+       List    *relids,          /* OIDs of the publisher tables */
+       Oid         pubid;           /* OID of publication that publishes this
+                                     * tables. */
+}published_rel;

Thoughts?

Regards,
Vignesh

pgsql-hackers by date:

From: "Waithant Myo (Fujitsu)"
Date: 17 November 2022, 05:16:47
Subject: RE: Fix the README file for MERGE command

From: Noah Misch
Date: 17 November 2022, 06:18:05
Subject: Re: Supporting TAP tests with MSVC and Windows

Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

Previous

Next