Hi,
While developing pg_auto_failover, we found a bug in pg_basebackup: if the target replication slot given to it does not exist, a full copy of the source PGDATA is completed before it errors out. You could easily end up copying 100GB of data over the network just to see pg_basebackup remove it all at the end. And when using the --progress option, you have to scroll up to the very start of the output to see the error message.
Please find attached a patch that shows a way to fix the issue. The patch is missing Windows compatibility: I don’t know how to cast the WNOHANG spell on that platform. Please use the patch as you see fit, either as inspiration or as something you would like to commit to fix the bug.
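For readers without the attachment at hand, here is a minimal sketch of the kind of non-blocking check that WNOHANG allows. It is an illustration only, not the patch itself: the helper name child_has_exited is hypothetical, and it assumes a POSIX system where the process being watched is a child started with fork().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the attached patch).
 *
 * Returns true if the child process `pid` has already terminated, and
 * stores its wait status in *status. Returns false if the child is still
 * running or if waitpid() fails. Never blocks, thanks to WNOHANG.
 */
static bool
child_has_exited(pid_t pid, int *status)
{
    pid_t r;

    /* Retry in the unlikely case the call is interrupted by a signal. */
    do
    {
        r = waitpid(pid, status, WNOHANG);
    } while (r == -1 && errno == EINTR);

    /*
     * waitpid() returns 0 when the child has not changed state yet, the
     * child's pid once it has been reaped, and -1 on error.
     */
    return r == pid;
}
```

A caller could run this check periodically while copying data and, when it returns true, inspect WIFEXITED(status) and WEXITSTATUS(status) to stop early instead of finishing the whole copy first.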
Here’s what it looks like without the patch:
$ ./src/bin/pg_basebackup/pg_basebackup -p 5501 -D /tmp/bb -X stream -S SlotDoesNotExists -P
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "SlotDoesNotExists" does not exist
32971/32971 kB (100%), 1/1 tablespace
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bb"
Here’s what it looks like with the patch applied locally:
$ ./src/bin/pg_basebackup/pg_basebackup -p 5501 -D /tmp/bb -X stream -S SlotDoesNotExists -P
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "SlotDoesNotExists" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bb"
Regards,
--