Logical replication dead but synching - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Logical replication dead but synching |
Date | |
Msg-id | 20191010115752.2d0f27af@firost Whole thread Raw |
List | pgsql-hackers |
Hello, While giving assistance to some customer with their broker procedure, I found a scenario where the subscription is failing but the table are sync'ed anyway. Here is bash script to reproduce it with versions 10, 11 and 12 (make sure to set PATH correctly): # env PUB=/tmp/pub SUB=/tmp/sub unset PGPORT PGHOST PGDATABASE PGDATA export PGUSER=postgres # cleanup kill %1 pg_ctl -w -s -D "$PUB" -m immediate stop; echo $? pg_ctl -w -s -D "$SUB" -m immediate stop; echo $? rm -r "$PUB" "$SUB" # cluster initdb -U postgres -N "$PUB" &>/dev/null; echo $? initdb -U postgres -N "$SUB" &>/dev/null; echo $? echo "wal_level=logical" >> "$PUB"/postgresql.conf echo "port=5433" >> "$SUB"/postgresql.conf pg_ctl -w -s -D $PUB -l "$PUB"-"$(date +%FT%T)".log start; echo $? pg_ctl -w -s -D $SUB -l "$SUB"-"$(date +%FT%T)".log start; echo $? pgbench -p 5432 -qi pg_dump -p 5432 -s | psql -qXp 5433 # fake activity pgbench -p 5432 -T 60 -c 2 & # replication setup cat<<SQL | psql -p 5432 -X SELECT * FROM pg_create_logical_replication_slot('sub','pgoutput'); CREATE PUBLICATION prov FOR ALL TABLES; SQL cat<<SQL | psql -p 5433 -X CREATE SUBSCRIPTION sub CONNECTION 'port=5432' PUBLICATION prov WITH (create_slot=false, slot_name=sub) SQL Here are part of the logs from the subscriber: LOG: logical replication apply worker for subscription "sub" has started LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has started LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_branches" has started ERROR: could not receive data from WAL stream: ERROR: publication "prov" does not exist CONTEXT: slot "sub", output plugin "pgoutput", in the change callback, associated LSN 0/22C0138 LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_branches" has finished LOG: logical replication table synchronization worker for subscription "sub", table "pgbench_accounts" has finished LOG: logical replication apply worker for subscription "sub" has started ERROR: could not receive data from WAL stream: ERROR: publication "prov" does not exist CONTEXT: slot "sub", output plugin "pgoutput", in the change callback, associated LSN 0/22C0138 All tables are synch'ed while the main worker for subscription is spawned again and again with the same failure. As far as I could find out, the problem here is that the slot is created manually before the publication is created. When the subscriber subscribe, it builds a catalog cache from the slot by the time it has been created. Then, it couldn't find the publication, because it didn't exists in this old version of the catalog. Is my understanding correct? Sadly, I couldn't find any documentation (neither official or in sources) about the fact the slot must be created after the related publication. Moreover, it's quite illogical to find some error about a non existing publication when the data are being synched AND the publication actually exists on the other side. I suppose this should be documented in user documentation. Plus, what about forbidding the data sync if the main worker for subscription fails? Regards, PS: the customer hit the following issue as well while messing around but I hadn't time to find out how they did yet: https://www.postgresql.org/message-id/flat/a9139c29-7ddd-973b-aa7f-71fed9c38d75%40minerva.info
pgsql-hackers by date: