Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Column Filtering in Logical Replication |
Date | |
Msg-id | CAA4eK1KUpOnFam+2bBuGGX-_pZbmO_ShJV6G=GT-RyM7gQSvWg@mail.gmail.com Whole thread Raw |
In response to | Re: Column Filtering in Logical Replication (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Column Filtering in Logical Replication
|
List | pgsql-hackers |
On Fri, Mar 11, 2022 at 6:20 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 3/11/22 10:52, Amit Kapila wrote: > > On Fri, Mar 11, 2022 at 7:26 AM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> On 3/10/22 20:10, Tomas Vondra wrote: > >>> > >>> > >>> FWIW I think the reason is pretty simple - pgoutput_row_filter_init is > >>> broken. It assumes you can just do this > >>> > >>> rftuple = SearchSysCache2(PUBLICATIONRELMAP, > >>> ObjectIdGetDatum(entry->publish_as_relid), > >>> ObjectIdGetDatum(pub->oid)); > >>> > >>> if (HeapTupleIsValid(rftuple)) > >>> { > >>> /* Null indicates no filter. */ > >>> rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple, > >>> Anum_pg_publication_rel_prqual, > >>> &pub_no_filter); > >>> } > >>> else > >>> { > >>> pub_no_filter = true; > >>> } > >>> > >>> > >>> and pub_no_filter=true means there's no filter at all. Which is > >>> nonsense, because we're using publish_as_relid here - the publication > >>> may not include this particular ancestor, in which case we need to just > >>> ignore this publication. > >>> > >>> So yeah, this needs to be reworked. > >>> > >> > >> I spent a bit of time looking at this, and I think a minor change in > >> get_rel_sync_entry() fixes this - it's enough to ensure rel_publications > >> only includes publications that actually include publish_as_relid. > >> > > > > Thanks for looking into this. I think in the first patch before > > calling get_partition_ancestors() we need to ensure it is a partition > > (the call expects that) and pubviaroot is true. > > Does the call really require that? > There may not be any harm but I have mentioned it because (a) the comments atop get_partition_ancestors(...it should only be called when it is known that the relation is a partition.) indicates the same; (b) all existing callers seems to use it only for partitions. > Also, I'm not sure why we'd need to > look at pubviaroot - that's already considered earlier when calculating > publish_as_relid, here we just need to know the relationship of the two > OIDs (if one is ancestor/child of the other). > I thought of avoiding calling get_partition_ancestors when pubviaroot is not set. It will unnecessary check the whole hierarchy for partitions even when it is not required. I agree that this is not a common code path but still felt why do it needlessly? > > I think it would be > > good if we can avoid an additional call to get_partition_ancestors() > > as it could be costly. > > Maybe. OTOH we only should do this only very rarely anyway. > > > I wonder why it is not sufficient to ensure > > that publish_as_relid exists after ancestor in ancestors list before > > assigning the ancestor to publish_as_relid? This only needs to be done > > in case of (if (!publish)). I have not tried this so I could be wrong. > > > > I'm not sure what exactly are you proposing. Maybe try coding it? That's > probably faster than trying to describe what the code might do ... > Okay, please find attached. I have done basic testing of this, if we agree with this approach then this will require some more testing. -- With Regards, Amit Kapila.
Attachment
pgsql-hackers by date: