Thread: Re: How is this possible "publication does not exist"

Re: How is this possible "publication does not exist"

From
Amit Kapila
Date:
On Tue, May 24, 2022 at 7:51 AM operations i <ioperations.c@gmail.com> wrote:
I have found this topic (https://www.postgresql.org/message-id/89b22955-bf9f-f7f2-9620-fbe2e4766a18%402ndquadrant.com)
cause I have the same problem in my environment, and I'd like to continue the story .
it seems the problem will definitely occur when start the script.
find an empty directory and start the script .

The problem is due to the reason that the slot used for sending changes to the subscription is created before publication. So, when we decode the changes (in your case 'insert') and check the respective publication, it won't exist by that time. The same problem is being discussed in a slightly different context in the email [1]. As a workaround, if you create a slot after creating a publication, you should not see this error.


--
With Regards,
Amit Kapila.

Re: How is this possible "publication does not exist"

From
Amit Kapila
Date:
On Tue, May 24, 2022 at 11:42 AM operations i <ioperations.c@gmail.com> wrote:
>
> Thanks for your quick response,So the problem is user's misusing it. Is there any plan to throw an error when using
logicalreplication like this?
 
>

It could be tricky to detect that but if you have ideas on how to do
that, I am all ears. I think you can wait for that other thread [1] to
reach conclusion and see if that might help your case as well.

Note - Please keep 'pgsql*' lists in cc while replying as others can
also help/provide suggestions.

[1] - https://www.postgresql.org/message-id/CAHut%2BPvMbCsL8PAz1Qc6LNoL0Ag0y3YJtPVJ8V0xVXJOPb%2B0xw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



How is this possible "publication does not exist"

From
operations i
Date:
I have found this topic (https://www.postgresql.org/message-id/89b22955-bf9f-f7f2-9620-fbe2e4766a18%402ndquadrant.com)
cause I have the same problem in my environment, and I'd like to continue the story .
it seems the problem will definitely occur when start the script.
find an empty directory and start the script . 
I have tested the latest github repo build. version 15beta1.
Screenshot from 2022-05-24 10-15-12.png

Attachment

Re: How is this possible "publication does not exist"

From
Amit Kapila
Date:
On Wed, May 25, 2022 at 9:32 AM operations i <ioperations.c@gmail.com> wrote:
During a detailed test , I've found the order of replication slot creation and publication creation is not the key point , but after replication slot creation , there should not be any insertion to the table,so I wonder why the change ( my case insert) will lead to src/backend/catalog/pg_publication:1040 GetSysCacheOid1() cache miss.

It is due to the reason that both slot creation and 'insert' are before "CREATE PUBLICATION". Without inserts, it won't try to decode anything before the publication is created. Can you test it with the patch provided in the email [1] and if possible join that thread for discussion on this topic.


--
With Regards,
Amit Kapila.

Re: How is this possible "publication does not exist"

From
operations i
Date:
During a detailed test , I've found the order of replication slot creation and publication creation is not the key point , but after replication slot creation , there should not be any insertion to the table,so I wonder why the change ( my case insert) will lead to src/backend/catalog/pg_publication:1040 GetSysCacheOid1() cache miss.
image.png

On Tue, May 24, 2022 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, May 24, 2022 at 11:42 AM operations i <ioperations.c@gmail.com> wrote:
>
> Thanks for your quick response,So the problem is user's misusing it. Is there any plan to throw an error when using logical replication like this?
>

It could be tricky to detect that but if you have ideas on how to do
that, I am all ears. I think you can wait for that other thread [1] to
reach conclusion and see if that might help your case as well.

Note - Please keep 'pgsql*' lists in cc while replying as others can
also help/provide suggestions.

[1] - https://www.postgresql.org/message-id/CAHut%2BPvMbCsL8PAz1Qc6LNoL0Ag0y3YJtPVJ8V0xVXJOPb%2B0xw%40mail.gmail.com

--
With Regards,
Amit Kapila.
Attachment

Re: How is this possible "publication does not exist"

From
operations i
Date:
sure, I've test the patch [1] using script provided before, and it seems ok 

On Wed, May 25, 2022 at 4:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, May 25, 2022 at 9:32 AM operations i <ioperations.c@gmail.com> wrote:
During a detailed test , I've found the order of replication slot creation and publication creation is not the key point , but after replication slot creation , there should not be any insertion to the table,so I wonder why the change ( my case insert) will lead to src/backend/catalog/pg_publication:1040 GetSysCacheOid1() cache miss.

It is due to the reason that both slot creation and 'insert' are before "CREATE PUBLICATION". Without inserts, it won't try to decode anything before the publication is created. Can you test it with the patch provided in the email [1] and if possible join that thread for discussion on this topic.


--
With Regards,
Amit Kapila.
Attachment

Re: How is this possible "publication does not exist"

From
Amit Kapila
Date:
On Thu, Jun 2, 2022 at 9:11 AM operations i <ioperations.c@gmail.com> wrote:
sorry, I still confused ,as we have already created the publication "pub_test" , but during database runtime, function GetPublicationByName() does not get system catalog cache, Is there any mechanism  that could make system catalog expand or shrink?

The system catalog doesn't expand or shrink in this case but rather it uses a snapshot to check the visibility of rows present. Here, we use historic snapshots to check the visibility of the publication row. Even though it is present in the catalog, it won't be visible, so you are getting an error. Here, the snapshot is built from WAL starting from the point where you created a slot. As the publication is created after Insert when you are trying to decode Insert, the required publication is not visible and hence you are getting the error. You might want to read src/backend/replication/logical/snapbuild.c to understand how the required snapshots are built from WAL.

--
With Regards,
Amit Kapila.

Re: How is this possible "publication does not exist"

From
operations i
Date:
after several times debugging, I 've realized the key point is the order of publication creation and insert statement, and both two have reached catcache.c:SearchCatCacheMiss() .and below has the difference explained
1. publication create first and then insert data into table
       the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return true

2.insert data into table and then create publication
      the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return false

so in case 2 , when decoding the first change , system table snapshot do not have Publication 'pub_test' , so throw the error
        " publication 'pub_test'" not exists 


Re: How is this possible "publication does not exist"

From
operations i
Date:
cool , you have send faster than me for about 5 seconds

On Thu, Jun 2, 2022 at 6:57 PM operations i <ioperations.c@gmail.com> wrote:
after several times debugging, I 've realized the key point is the order of publication creation and insert statement, and both two have reached catcache.c:SearchCatCacheMiss() .and below has the difference explained
1. publication create first and then insert data into table
       the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return true

2.insert data into table and then create publication
      the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return false

so in case 2 , when decoding the first change , system table snapshot do not have Publication 'pub_test' , so throw the error
        " publication 'pub_test'" not exists 


Re: How is this possible "publication does not exist"

From
operations i
Date:
Ok, I have got the real problem, It's very kind of you for your quick response.

On Thu, Jun 2, 2022 at 6:58 PM operations i <ioperations.c@gmail.com> wrote:
cool , you have send faster than me for about 5 seconds

On Thu, Jun 2, 2022 at 6:57 PM operations i <ioperations.c@gmail.com> wrote:
after several times debugging, I 've realized the key point is the order of publication creation and insert statement, and both two have reached catcache.c:SearchCatCacheMiss() .and below has the difference explained
1. publication create first and then insert data into table
       the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return true

2.insert data into table and then create publication
      the expression ```HeapTupleIsValid(ntp = systable_getnext(scandesc))``` return false

so in case 2 , when decoding the first change , system table snapshot do not have Publication 'pub_test' , so throw the error
        " publication 'pub_test'" not exists