Hi,
While jumping around partically decoded xacts questions [1], I've read
through the copy replication slots code (9f06d79ef) and found a couple
of issues.
1) It seems quite reckless to me to dive into
DecodingContextFindStartpoint without actual WAL reservation (donors
slot restart_lsn is used, but it is not acquired). Why risking erroring
out with WAL removal error if the function advances new slot position to
updated donors one in the end anyway?
2) In the end, restart_lsn of new slot is set to updated donors
one. However, confirmed_flush field is not updated. This is just wrong
-- we could start decoding too early and stream partially decoded
transaction.
I'd probably avoid doing DecodingContextFindStartpoint at all. Its only
purpose is to assemble consistent snapshot (and establish corresponding
<restart_lsn, confirmed_flush_lsn> pair), but donor slot must have
already done that and we could use it as well. Was this considered?
[1] https://www.postgresql.org/message-id/flat/AB5978B2-1772-4FEE-A245-74C91704ECB0%40amazon.com
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company