Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From vignesh C
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CALDaNm3oUXw_YA4PeQrUAxFupkgye+e2LZbJxVOY9uCCvg2Jng@mail.gmail.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses RE: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Wed, 16 Nov 2022 at 14:28, wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Mon, Nov 14, 2022 at 0:56 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Attach new patches.
> >
>
> Thanks for your comments.
>
> > Here we are having tables list to store the relids and table_infos
> > list which stores pubid along with relid. Here tables list acts as a
> > temporary list to get filter_partitions and then delete the
> > published_rel from table_infos. Will it be possible to directly
> > operate on table_infos list and remove the temporary tables list used.
> > We might have to implement comparator, deduplication functions and
> > change filter_partitions function to work directly on published_rel
> > type list.
> > +                       /
> > +                        * Record the published table and the
> > corresponding publication so
> > +                        * that we can get row filters and column list later.
> > +                        *
> > +                        * When a table is published by multiple
> > publications, to obtain
> > +                        * all row filters and column list, the
> > structure related to this
> > +                        * table will be recorded multiple times.
> > +                        */
> > +                       foreach(lc, pub_elem_tables)
> > +                       {
> > +                               published_rel *table_info =
> > (published_rel *) malloc(sizeof(published_rel));
> > +
> > +                               table_info->relid = lfirst_oid(lc);
> > +                               table_info->pubid = pub_elem->oid;
> > +                               table_infos = lappend(table_infos, table_info);
> > +                       }
> > +
> > +                       tables = list_concat(tables, pub_elem_tables);
> >
> > Thoughts?
>
> I think we could only deduplicate published tables per publication to get all
> row filters and column lists for each published table later.
> I removed the temporary list 'tables' and modified the API of the function
> filter_partitions to handle published_rel type list.
>
> Attach the new patch set.

Thanks for the update patch.
One suggestion:
+/* Records association between publication and published table */
+typedef struct
+{
+       Oid                     relid;          /* OID of published table */
+       Oid                     pubid;          /* OID of publication
that publishes this
+                                                        * table. */
+} published_rel;
+

+                       /*
+                        * Record the published table and the
corresponding publication so
+                        * that we can get row filters and column list later.
+                        *
+                        * When a table is published by multiple
publications, to obtain
+                        * all row filters and column list, the
structure related to this
+                        * table will be recorded multiple times.
+                        */
+                       foreach(lc, pub_elem_tables)
+                       {
+                               published_rel *table_info =
(published_rel *) malloc(sizeof(published_rel));
+
+                               table_info->relid = lfirst_oid(lc);
+                               table_info->pubid = pub_elem->oid;
+                               table_infos = lappend(table_infos, table_info);
+                       }

In this format if there are n relations in publication we will store
pubid n times, in all tables publication there will many thousands of
tables. We could avoid storing the pubid for every relid, instead we
could  represent it like below to avoid storing publication id for
each tables:

+/* Records association between publication and published tables */
+typedef struct
+{
+       List    *relids,          /* OIDs of the publisher tables */
+       Oid         pubid;           /* OID of publication that publishes this
+                                     * tables. */
+}published_rel;

Thoughts?

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: "Waithant Myo (Fujitsu)"
Date:
Subject: RE: Fix the README file for MERGE command
Next
From: Noah Misch
Date:
Subject: Re: Supporting TAP tests with MSVC and Windows