RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB6275470131128E4C371E0F7C9E069@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Thurs, Nov 17, 2022 at 13:58 PM vignesh C <vignesh21@gmail.com> wrote:
> On Wed, 16 Nov 2022 at 14:28, wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> >
> > On Mon, Nov 14, 2022 at 0:56 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > Attach new patches.
> > >
> >
> > Thanks for your comments.
> >
> > > Here we are having tables list to store the relids and table_infos
> > > list which stores pubid along with relid. Here tables list acts as a
> > > temporary list to get filter_partitions and then delete the
> > > published_rel from table_infos. Will it be possible to directly
> > > operate on table_infos list and remove the temporary tables list used.
> > > We might have to implement comparator, deduplication functions and
> > > change filter_partitions function to work directly on published_rel
> > > type list.
> > > +                       /
> > > +                        * Record the published table and the
> > > corresponding publication so
> > > +                        * that we can get row filters and column list later.
> > > +                        *
> > > +                        * When a table is published by multiple
> > > publications, to obtain
> > > +                        * all row filters and column list, the
> > > structure related to this
> > > +                        * table will be recorded multiple times.
> > > +                        */
> > > +                       foreach(lc, pub_elem_tables)
> > > +                       {
> > > +                               published_rel *table_info =
> > > (published_rel *) malloc(sizeof(published_rel));
> > > +
> > > +                               table_info->relid = lfirst_oid(lc);
> > > +                               table_info->pubid = pub_elem->oid;
> > > +                               table_infos = lappend(table_infos, table_info);
> > > +                       }
> > > +
> > > +                       tables = list_concat(tables, pub_elem_tables);
> > >
> > > Thoughts?
> >
> > I think we could only deduplicate published tables per publication to get all
> > row filters and column lists for each published table later.
> > I removed the temporary list 'tables' and modified the API of the function
> > filter_partitions to handle published_rel type list.
> >
> > Attach the new patch set.
> 
> Thanks for the update patch.

Thanks for your comment.

> One suggestion:
> +/* Records association between publication and published table */
> +typedef struct
> +{
> +       Oid                     relid;          /* OID of published table */
> +       Oid                     pubid;          /* OID of publication
> that publishes this
> +                                                        * table. */
> +} published_rel;
> +
> 
> +                       /*
> +                        * Record the published table and the
> corresponding publication so
> +                        * that we can get row filters and column list later.
> +                        *
> +                        * When a table is published by multiple
> publications, to obtain
> +                        * all row filters and column list, the
> structure related to this
> +                        * table will be recorded multiple times.
> +                        */
> +                       foreach(lc, pub_elem_tables)
> +                       {
> +                               published_rel *table_info =
> (published_rel *) malloc(sizeof(published_rel));
> +
> +                               table_info->relid = lfirst_oid(lc);
> +                               table_info->pubid = pub_elem->oid;
> +                               table_infos = lappend(table_infos, table_info);
> +                       }
> 
> In this format if there are n relations in publication we will store
> pubid n times, in all tables publication there will many thousands of
> tables. We could avoid storing the pubid for every relid, instead we
> could  represent it like below to avoid storing publication id for
> each tables:
> 
> +/* Records association between publication and published tables */
> +typedef struct
> +{
> +       List    *relids,          /* OIDs of the publisher tables */
> +       Oid         pubid;           /* OID of publication that publishes this
> +                                     * tables. */
> +}published_rel;
> 
> Thoughts?

I think this complicates the function filter_partitions.
Because if we use such a node type, I think we need to concatenate 'relids'
list of each node of the 'table_infos' list in the function filter_partitions
to become a temporary list. Then filter this temporary list and process the
'table_infos' list according to the filtering result.

Regards,
Wang wei

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Reducing power consumption on idle servers
Next
From: Simon Riggs
Date:
Subject: Re: Reducing power consumption on idle servers