Re: Added schema level support for publication. - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Added schema level support for publication. |
Date | |
Msg-id | CALDaNm0V=Tx=jQ310duLM0xFeCht0nQYr=kU5NetqnjWLYLHnA@mail.gmail.com Whole thread Raw |
In response to | Re: Added schema level support for publication. (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Mon, Aug 9, 2021 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 2:52 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, Aug 6, 2021 at 4:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Aug 6, 2021 at 2:02 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 4:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Aug 3, 2021 at 8:38 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > > 6.
> > > > > + {PublicationSchemaRelationId, /* PUBLICATIONSCHEMAMAP */
> > > > > + PublicationSchemaPsnspcidPspubidIndexId,
> > > > > + 2,
> > > > > + {
> > > > > + Anum_pg_publication_schema_psnspcid,
> > > > > + Anum_pg_publication_schema_pspubid,
> > > > > + 0,
> > > > > + 0
> > > > > + },
> > > > >
> > > > > Why don't we keep pubid as the first column in this index?
> > > >
> > > > I wanted to keep it similar to PUBLICATIONRELMAP, should we keep it as
> > > > it is, thoughts?
> > > >
> > >
> > > Okay, I see your point. I think for PUBLICATIONRELMAP, we need it
> > > because it is searched using the only relid in
> > > GetRelationPublications, so, similarly, in the patch, you are using
> > > schema_oid in GetSchemaPublications, so probably that will help. I was
> > > wondering why you haven't directly used the cache in
> > > GetSchemaPublications similar to GetRelationPublications?
> >
> > Both of the approaches work, I was not sure which one is better, If
> > the approach in GetRelationPublications is better, I will change it to
> > something similar to GetRelationPublications. Thoughts?
> >
>
> I think it is better to use the cache as if we don't find the entry in
> the cache, then we will anyway search the required entry via sys table
> scan, see SearchCatCacheList.
Modified. This is handled in the v20 patch posted at [1].
I think the point I wanted to ensure was
> that a concurrent session won't blow up the entry while we are looking
> at it. How is that ensured?
The concurrency points occur at two places, Walsender session and user session:
For Walsender process when we query the data from the cache we will get the results based on historic snapshot. I also debugged and verified that we get the results based on historic snapshot, if we check the cache during our operation (insert is just before drop) we will be able to get the dropped record from the cache as this drop occurred after our insert. And if we query the cache after the drop, we will not get the dropped information from the cache. So I feel our existing code is good enough which handles the concurrency through the historic snapshot.
For user sessions, user session checks for replica identity for update/delete operations. To prevent concurrency issues, when schema is added to the publication, the rel cache invalidation happens in publication_add_schema by calling InvalidatePublicationRels, similarly when a schema is dropped from the publication, the rel cache invalidation is handled in RemovePublicationSchemaById by calling InvalidatePublicationRels. Once the invalidation happens it will check the system tables again before deciding. I felt this rel cache invalidation will prevent concurrency issues.
[1] - https://www.postgresql.org/message-id/CALDaNm00X9SQBokUTy1OxN1Sa2DFsK8rg8j_wLgc-7ZuKcuh0Q%40mail.gmail.com
Regards,
Vignesh
>
> On Sun, Aug 8, 2021 at 2:52 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, Aug 6, 2021 at 4:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Aug 6, 2021 at 2:02 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 4:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Aug 3, 2021 at 8:38 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > > 6.
> > > > > + {PublicationSchemaRelationId, /* PUBLICATIONSCHEMAMAP */
> > > > > + PublicationSchemaPsnspcidPspubidIndexId,
> > > > > + 2,
> > > > > + {
> > > > > + Anum_pg_publication_schema_psnspcid,
> > > > > + Anum_pg_publication_schema_pspubid,
> > > > > + 0,
> > > > > + 0
> > > > > + },
> > > > >
> > > > > Why don't we keep pubid as the first column in this index?
> > > >
> > > > I wanted to keep it similar to PUBLICATIONRELMAP, should we keep it as
> > > > it is, thoughts?
> > > >
> > >
> > > Okay, I see your point. I think for PUBLICATIONRELMAP, we need it
> > > because it is searched using the only relid in
> > > GetRelationPublications, so, similarly, in the patch, you are using
> > > schema_oid in GetSchemaPublications, so probably that will help. I was
> > > wondering why you haven't directly used the cache in
> > > GetSchemaPublications similar to GetRelationPublications?
> >
> > Both of the approaches work, I was not sure which one is better, If
> > the approach in GetRelationPublications is better, I will change it to
> > something similar to GetRelationPublications. Thoughts?
> >
>
> I think it is better to use the cache as if we don't find the entry in
> the cache, then we will anyway search the required entry via sys table
> scan, see SearchCatCacheList.
Modified. This is handled in the v20 patch posted at [1].
I think the point I wanted to ensure was
> that a concurrent session won't blow up the entry while we are looking
> at it. How is that ensured?
The concurrency points occur at two places, Walsender session and user session:
For Walsender process when we query the data from the cache we will get the results based on historic snapshot. I also debugged and verified that we get the results based on historic snapshot, if we check the cache during our operation (insert is just before drop) we will be able to get the dropped record from the cache as this drop occurred after our insert. And if we query the cache after the drop, we will not get the dropped information from the cache. So I feel our existing code is good enough which handles the concurrency through the historic snapshot.
For user sessions, user session checks for replica identity for update/delete operations. To prevent concurrency issues, when schema is added to the publication, the rel cache invalidation happens in publication_add_schema by calling InvalidatePublicationRels, similarly when a schema is dropped from the publication, the rel cache invalidation is handled in RemovePublicationSchemaById by calling InvalidatePublicationRels. Once the invalidation happens it will check the system tables again before deciding. I felt this rel cache invalidation will prevent concurrency issues.
[1] - https://www.postgresql.org/message-id/CALDaNm00X9SQBokUTy1OxN1Sa2DFsK8rg8j_wLgc-7ZuKcuh0Q%40mail.gmail.com
Regards,
Vignesh
pgsql-hackers by date: