Thread: How is this possible "publication does not exist"
The publication exists but for some reason the function can't find it
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/307D8E8
postgres=# select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate
-----------------+----------+--------------+-----------+-----------+-----------+-------------
dbz_publication | 10 | t | t | t | t | t
(1 row)
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/307D8E8
Dave Cramer
On Thu, 19 Dec 2019 at 11:59, Dave Cramer <davecramer@gmail.com> wrote:
The publication exists but for some reason the function can't find itSELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/307D8E8
postgres=# select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate
-----------------+----------+--------------+-----------+-----------+-----------+-------------
dbz_publication | 10 | t | t | t | t | t
(1 row)
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/307D8E8
It seems that if you drop the publication on an existing slot it needs to be recreated. Is this expected behaviour
drop publication dbz_publication ;
DROP PUBLICATION
postgres=# create publication dbz_publication for all tables;
CREATE PUBLICATION
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/4324180
DROP PUBLICATION
postgres=# create publication dbz_publication for all tables;
CREATE PUBLICATION
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium', NULL, NULL,'proto_version','1','publication_names','dbz_publication');
ERROR: publication "dbz_publication" does not exist
CONTEXT: slot "debezium", output plugin "pgoutput", in the change callback, associated LSN 0/4324180
Dave Cramer
On 2019-12-19 19:15, Dave Cramer wrote: > It seems that if you drop the publication on an existing slot it needs > to be recreated. Is this expected behaviour A publication is not associated with a slot. Only a subscription is associated with a slot. > drop publication dbz_publication ; > DROP PUBLICATION > postgres=# create publication dbz_publication for all tables; > CREATE PUBLICATION > postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('debezium', > NULL, NULL,'proto_version','1','publication_names','dbz_publication'); > ERROR: publication "dbz_publication" does not exist > CONTEXT: slot "debezium", output plugin "pgoutput", in the change > callback, associated LSN 0/4324180 This must be something particular to Debezium. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 19, 2019 at 07:19:56PM +0100, Peter Eisentraut wrote: >On 2019-12-19 19:15, Dave Cramer wrote: >>It seems that if you drop the publication on an existing slot it >>needs to be recreated. Is this expected behaviour > >A publication is not associated with a slot. Only a subscription is >associated with a slot. > >>drop publication dbz_publication ; >>DROP PUBLICATION >>postgres=# create publication dbz_publication for all tables; >>CREATE PUBLICATION >>postgres=# SELECT * FROM >>pg_logical_slot_get_binary_changes('debezium', NULL, >>NULL,'proto_version','1','publication_names','dbz_publication'); >>ERROR: publication "dbz_publication" does not exist >>CONTEXT: slot "debezium", output plugin "pgoutput", in the change >>callback, associated LSN 0/4324180 > >This must be something particular to Debezium. > Yeah, I don't see this error message anywhere in our sources on 11 or 12, so perhaps debezium does something funny? It's not clear to me where, though, as this suggests it uses the pgoutput module. While trying to reproduce this I however ran into a related issue with pgoutput/pg_logical_slot_get_binary_changes. If you call the function repeatedly (~10x) you'll get an error like this: FATAL: out of relcache_callback_list slots CONTEXT: slot "slot", output plugin "pgoutput", in the startup callback server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The reason is very simple - each call executes pgoutput_startup, which does CacheRegisterRelcacheCallback in init_rel_sync_cache. And we do this on each pg_logical_slot_get_binary_changes() call and never remove the callbacks, so we simply run out of MAX_RELCACHE_CALLBACKS slots. Not sure if this is a known issue/behavior, but it seems a bit annoying and possibly related to the issue reported by Dave. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 19 Dec 2019 19:19:56 +0100 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 2019-12-19 19:15, Dave Cramer wrote: > > It seems that if you drop the publication on an existing slot it needs > > to be recreated. Is this expected behaviour > > A publication is not associated with a slot. Only a subscription is > associated with a slot. Couldn't it be the same scenario I reported back in october? See: Subject: Logical replication dead but synching Date: Thu, 10 Oct 2019 11:57:52 +0200
I've been running into a similar issue and am a little puzzled by it, especially since it survives restarts. On Fri, Dec 20, 2019 at 2:39 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Yeah, I don't see this error message anywhere in our sources on 11 or > 12, so perhaps debezium does something funny? It's not clear to me > where, though, as this suggests it uses the pgoutput module. The error message comes from GetPublicationByName and the context is added by output_plugin_error_callback in logical.c. Stack trace of where the error occurs below. # SELECT * FROM pg_publication; pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate ----------------+----------+--------------+-----------+-----------+-----------+------------- migration_pub | 10 | f | t | t | t | t (1 row) # SELECT * FROM pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn ----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+--------------------- migration_slot | pgoutput | logical | 13121 | postgres | f | f | | | 17153 | 0/CDFC840 | 0/CDFC878 (1 row) # SELECT * FROM pg_logical_slot_get_binary_changes('migration_slot', NULL, NULL,'proto_version','1','publication_names','migration_pub'); ERROR: publication "migration_pub" does not exist CONTEXT: slot "migration_slot", output plugin "pgoutput", in the change callback, associated LSN 0/CDFC878 #0 errstart (elevel=elevel@entry=20, filename=filename@entry=0x5581958a6c70 "pg_publication.c", lineno=lineno@entry=401, funcname=funcname@entry=0x5581958a6ea0 <__func__.24991> "GetPublicationByName", domain=domain@entry=0x0) at elog.c:251 #1 0x00005581954771e5 in GetPublicationByName (pubname=0x558196d107a0 "migration_pub", missing_ok=missing_ok@entry=false) at pg_publication.c:401 #2 0x00007f77ba1cd704 in LoadPublications (pubnames=<optimized out>) at pgoutput.c:467 #3 0x00007f77ba1cd7e3 in get_rel_sync_entry (data=data@entry=0x558196cedee8, relid=<optimized out>) at pgoutput.c:559 #4 0x00007f77ba1cdb52 in pgoutput_change (ctx=0x558196d7b4f8, txn=<optimized out>, relation=0x7f77ba1e67c8, change=0x558196cdbab8) at pgoutput.c:315 #5 0x000055819566a2e6 in change_cb_wrapper (cache=<optimized out>, txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at logical.c:747 #6 0x0000558195675785 in ReorderBufferCommit (rb=0x558196d35d38, xid=xid@entry=17153, commit_lsn=215994160, end_lsn=<optimized out>, commit_time=commit_time@entry=662061745906576, origin_id=origin_id@entry=0, origin_lsn=0) at reorderbuffer.c:1592 #7 0x0000558195667407 in DecodeCommit (ctx=ctx@entry=0x558196d7b4f8, buf=buf@entry=0x7ffd61faae60, parsed=parsed@entry=0x7ffd61faacf0, xid=17153) at decode.c:641 #8 0x00005581956675a0 in DecodeXactOp (ctx=0x558196d7b4f8, buf=buf@entry=0x7ffd61faae60) at decode.c:249 #9 0x00005581956684cb in LogicalDecodingProcessRecord (ctx=ctx@entry=0x558196d7b4f8, record=<optimized out>) at decode.c:117 #10 0x000055819566c108 in pg_logical_slot_get_changes_guts (fcinfo=0x7ffd61fab120, confirm=confirm@entry=true, binary=binary@entry=true) at logicalfuncs.c:309 #11 0x000055819566c35d in pg_logical_slot_get_binary_changes (fcinfo=<optimized out>) at logicalfuncs.c:391 cheers, Marco
On 12/24/20 12:50 PM, Marco Slot wrote: > I've been running into a similar issue and am a little puzzled by it, > especially since it survives restarts. > Interesting. Which PostgreSQL version are you using? Any idea how to reproduce it? Were there any failures right before the issue appeared? I wonder if this might be a case of index corruption. Can you try forcing an index scan on pg_publication? SET enable_seqscan = false; SET enable_bitmapscan = off; SELECT * FROM pg_publication WHERE pubname = 'migration_pub'; Also, it might be helpful to know why get_rel_sync_entry ended up calling LoadPublications - did we just create the entry? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> 20 дек. 2019 г., в 06:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> написал(а): > > While trying to reproduce this I however ran into a related issue with > pgoutput/pg_logical_slot_get_binary_changes. If you call the function > repeatedly (~10x) you'll get an error like this: > > FATAL: out of relcache_callback_list slots > CONTEXT: slot "slot", output plugin "pgoutput", in the startup callback > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > The reason is very simple - each call executes pgoutput_startup, which > does CacheRegisterRelcacheCallback in init_rel_sync_cache. And we do > this on each pg_logical_slot_get_binary_changes() call and never remove > the callbacks, so we simply run out of MAX_RELCACHE_CALLBACKS slots. > > Not sure if this is a known issue/behavior, but it seems a bit annoying > and possibly related to the issue reported by Dave. Sorry for bumping old thread. I was involved in troubleshooting logical replication recently. And found out that it sometimes has a really annoying errorreporting. While source of the problem was allegedly in low max_replication_slots, users were getting only this error about relcache_callback_list. Maybe we could fix this particular error by deduplicating callbacks? Best regards, Andrey Borodin.
Attachment
Reviving this thread
2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" has started 2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started 2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished 2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0 2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code 1
select * from pg_publication; -[ RECORD 1 ]+------------------------ oid | 16415 pubname | sub_mycluster_alltables pubowner | 10 puballtables | t pubinsert | t pubupdate | t pubdelete | t pubtruncate | t
select * from pg_replication_slots; -[ RECORD 1 ]-------+-------------------------------- slot_name | mycluster_cjvq_68cf55677c_6vgcf plugin | slot_type | physical datoid | database | temporary | f active | t active_pid | 433 xmin | catalog_xmin | restart_lsn | 0/D000000 confirmed_flush_lsn | -[ RECORD 2 ]-------+-------------------------------- slot_name | sub_mycluster_alltables plugin | pgoutput slot_type | logical datoid | 16395 database | mycluster temporary | f active | t active_pid | 8799 xmin | catalog_xmin | 500 restart_lsn | 0/40011C0
confirmed_flush_lsn | 0/40011C0
I'm at a loss as to where to even look at this point.
Dave
On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote: > > Reviving this thread > > 2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" hasstarted > 2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables",table "t_random" has started > 2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables",table "t_random" has finished > 2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables"does not exist > CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0 > 2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code1 > > > select * from pg_publication; > -[ RECORD 1 ]+------------------------ > oid | 16415 > pubname | sub_mycluster_alltables > pubowner | 10 > puballtables | t > pubinsert | t > pubupdate | t > pubdelete | t > pubtruncate | t > By any chance, did you dropped and recreated this publication as mentioned in your first email? If so, I think this can happen because of our use of historical snapshots to consult system catalogs. -- With Regards, Amit Kapila.
On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote:
>
> Reviving this thread
>
> 2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" has started
> 2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started
> 2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished
> 2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist
> CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0
> 2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code 1
>
>
> select * from pg_publication;
> -[ RECORD 1 ]+------------------------
> oid | 16415
> pubname | sub_mycluster_alltables
> pubowner | 10
> puballtables | t
> pubinsert | t
> pubupdate | t
> pubdelete | t
> pubtruncate | t
>
By any chance, did you dropped and recreated this publication as
mentioned in your first email? If so, I think this can happen because
of our use of historical snapshots to consult system catalogs.
In this case, no.
I am suspecting this error comes from pgoutput though.
Dave
On Wed, Aug 11, 2021 at 4:57 PM Dave Cramer <davecramer@gmail.com> wrote: > > On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote: >> > >> > Reviving this thread >> > >> > 2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables"has started >> > 2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables",table "t_random" has started >> > 2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables",table "t_random" has finished >> > 2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables"does not exist >> > CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0 >> > 2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code1 >> > >> > >> > select * from pg_publication; >> > -[ RECORD 1 ]+------------------------ >> > oid | 16415 >> > pubname | sub_mycluster_alltables >> > pubowner | 10 >> > puballtables | t >> > pubinsert | t >> > pubupdate | t >> > pubdelete | t >> > pubtruncate | t >> > >> >> By any chance, did you dropped and recreated this publication as >> mentioned in your first email? If so, I think this can happen because >> of our use of historical snapshots to consult system catalogs. > > > In this case, no. > > I am suspecting this error comes from pgoutput though. > I think it is and the context is generated via output_plugin_error_callback. Is this reproducible for you and if so, can you share a test case or some steps to reproduce this? Does this work and suddenly start giving errors or it happens the very first time you tried to set up publication/subscription? I think some more details are required about your setup and steps to analyze this problem. You might want to check publication-side logs but not sure if get any better clue there. -- With Regards, Amit Kapila.
On Wed, 11 Aug 2021 at 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Aug 11, 2021 at 4:57 PM Dave Cramer <davecramer@gmail.com> wrote:
>
> On Wed, 11 Aug 2021 at 07:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Wed, Aug 11, 2021 at 1:18 AM Dave Cramer <davecramer@gmail.com> wrote:
>> >
>> > Reviving this thread
>> >
>> > 2021-08-10 19:05:09.096 UTC [3738] LOG: logical replication apply worker for subscription "sub_mycluster_alltables" has started
>> > 2021-08-10 19:05:09.107 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has started
>> > 2021-08-10 19:05:12.222 UTC [3739] LOG: logical replication table synchronization worker for subscription "sub_mycluster_alltables", table "t_random" has finished
>> > 2021-08-10 19:05:14.806 UTC [3738] ERROR: could not receive data from WAL stream: ERROR: publication "sub_mycluster_alltables" does not exist
>> > CONTEXT: slot "sub_mycluster_alltables", output plugin "pgoutput", in the change callback, associated LSN 0/4015DF0
>> > 2021-08-10 19:05:14.811 UTC [175] LOG: background worker "logical replication worker" (PID 3738) exited with exit code 1
>> >
>> >
>> > select * from pg_publication;
>> > -[ RECORD 1 ]+------------------------
>> > oid | 16415
>> > pubname | sub_mycluster_alltables
>> > pubowner | 10
>> > puballtables | t
>> > pubinsert | t
>> > pubupdate | t
>> > pubdelete | t
>> > pubtruncate | t
>> >
>>
>> By any chance, did you dropped and recreated this publication as
>> mentioned in your first email? If so, I think this can happen because
>> of our use of historical snapshots to consult system catalogs.
>
>
> In this case, no.
>
> I am suspecting this error comes from pgoutput though.
>
I think it is and the context is generated via
output_plugin_error_callback. Is this reproducible for you and if so,
can you share a test case or some steps to reproduce this? Does this
work and suddenly start giving errors or it happens the very first
time you tried to set up publication/subscription? I think some more
details are required about your setup and steps to analyze this
problem. You might want to check publication-side logs but not sure if
get any better clue there.
In this case I am the messenger. I will try to get a repeatable test case.
Dave
On Wed, Aug 11, 2021 at 1:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > I think it is and the context is generated via > output_plugin_error_callback. Is this reproducible for you and if so, > can you share a test case or some steps to reproduce this? Does this > work and suddenly start giving errors or it happens the very first > time you tried to set up publication/subscription? I think some more > details are required about your setup and steps to analyze this > problem. You might want to check publication-side logs but not sure if > get any better clue there. This seems to regularly reproduce the issue on PostgreSQL 14.4: drop subscription if exists local_sub; drop publication if exists local_pub; drop table if exists local; select pg_create_logical_replication_slot('test','pgoutput'); create table local (x int, y int); insert into local values (1,2); create publication local_pub for table local; create subscription local_sub connection 'host=localhost port=5432' publication local_pub with (create_slot = false, slot_name = 'test', copy_data = false); The log on the publisher then repeatedly shows: 2022-08-04 10:46:56.140 CEST [12785] ERROR: publication "local_pub" does not exist 2022-08-04 10:46:56.140 CEST [12785] CONTEXT: slot "test", output plugin "pgoutput", in the change callback, associated LSN 8/6C01A270 2022-08-04 10:46:56.140 CEST [12785] STATEMENT: START_REPLICATION SLOT "test" LOGICAL 0/0 (proto_version '2', publication_names '"local_pub"') (fails in the same way when setting up the subscription on a different node) The local_pub does appear in pg_publication, but it seems a bit like the change_cb is using an old snapshot when reading from the catalog in GetPublicationByName. cheers, Marco
At Thu, 4 Aug 2022 10:56:45 +0200, Marco Slot <marco.slot@gmail.com> wrote in > On Wed, Aug 11, 2021 at 1:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I think it is and the context is generated via > > output_plugin_error_callback. Is this reproducible for you and if so, > > can you share a test case or some steps to reproduce this? Does this > > work and suddenly start giving errors or it happens the very first > > time you tried to set up publication/subscription? I think some more > > details are required about your setup and steps to analyze this > > problem. You might want to check publication-side logs but not sure if > > get any better clue there. > > This seems to regularly reproduce the issue on PostgreSQL 14.4: > > drop subscription if exists local_sub; > drop publication if exists local_pub; > drop table if exists local; > > select pg_create_logical_replication_slot('test','pgoutput'); > create table local (x int, y int); > insert into local values (1,2); > create publication local_pub for table local; > create subscription local_sub connection 'host=localhost port=5432' > publication local_pub with (create_slot = false, slot_name = 'test', > copy_data = false); > The local_pub does appear in pg_publication, but it seems a bit like > the change_cb is using an old snapshot when reading from the catalog > in GetPublicationByName. Yes. So the slot should be created *after* the publication is created. A discussion [1] was held on allowing missing publications in such a case but it has not been rached a conclusion.. [1] https://www.postgresql.org/message-id/CAA4eK1LwQAEPJMTwVe3UYODeNMkK2QHf-WZF5aXp5ZcjDRcrUA%40mail.gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center