Hi,
here is another tidbit from our experience with using logical decoding.
The attached patch adds a callback to notify the output plugin of a
concurrent abort. Below, I describe the problem in more detail
and how this additional callback solves it.
Streamed transactions as well as two-phase commit transactions may get
decoded before they finish. At the point where begin_cb is invoked and
the first changes are delivered to the output plugin, it is not necessarily
known whether the transaction will commit or abort.
This leads to the possibility of the transaction getting aborted
concurrently with logical decoding. In that case, the decoder is likely
to error out on a catalog scan that conflicts with the abort of the
transaction. The reorderbuffer has a PG_CATCH block to clean up.
However, it does not currently inform the output plugin. From the
plugin's point of view, the transaction is left dangling until another
one comes along or until the final ROLLBACK or ROLLBACK PREPARED record
from WAL gets decoded. Therefore, what the output plugin might see in
this case is:
* filter_prepare_cb (txn A)
* begin_prepare_cb (txn A)
* apply_change (txn A)
* apply_change (txn A)
* apply_change (txn A)
* begin_cb (txn B)
In other words, in this example, only the begin_cb of the following
transaction implicitly tells the output plugin that txn A could not be
fully decoded. And there is no upper bound on how long that may
take. (It could also be another filter_prepare_cb, if the subsequent
transaction happens to be a two-phase transaction as well. Or an
explicit rollback_prepared_cb or stream_abort if there's no other
transaction in between.)
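
To make the implicit deduction concrete, here is a minimal standalone
sketch of the bookkeeping an output plugin would need today. It is not
PostgreSQL code: PluginState, on_txn_start and on_txn_end are
hypothetical names, and it only models the non-interleaved two-phase
case from the example above, not interleaved streamed chunks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plugin-side bookkeeping; not a PostgreSQL type. */
typedef struct PluginState
{
    uint32_t open_xid;      /* transaction currently being received, 0 if none */
    uint32_t dangling_xid;  /* last transaction inferred to be incomplete, 0 if none */
} PluginState;

/*
 * Called from every callback that starts delivering a transaction
 * (modelled on begin_cb / begin_prepare_cb). If a different transaction
 * is still open at this point, the plugin can only infer that decoding
 * of it was abandoned.
 */
static void
on_txn_start(PluginState *st, uint32_t xid)
{
    if (st->open_xid != 0 && st->open_xid != xid)
        st->dangling_xid = st->open_xid;
    st->open_xid = xid;
}

/* Called when a transaction was delivered completely (commit, prepare). */
static void
on_txn_end(PluginState *st, uint32_t xid)
{
    assert(st->open_xid == xid);
    st->open_xid = 0;
}
```

With this, txn A is only marked as dangling once on_txn_start is called
for txn B, which is exactly the unbounded delay described above.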
An alternative and arguably cleaner approach for streamed transactions
may be to directly invoke stream_abort. However, the lsn argument
passed could not be that of the abort record, as that's not known at the
point in time of the concurrent abort. Plus, this seems like a bad fit
for two-phase commit transactions.
Again, this callback is especially important for output plugins that
invoke further actions on downstream nodes that delay the COMMIT
PREPARED of a transaction upstream, e.g. until prepared on other nodes.
Up until now, the output plugin has had no way to learn about a
concurrent abort of the currently decoded (2PC or streamed) transaction
(perhaps short of continued polling on the transaction status).
I also think it generally improves the API by allowing the output plugin
to rely on such a callback, rather than having to implicitly deduce this
from other callbacks.
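
As a rough illustration of the explicit variant, the following
standalone sketch models a decoder that notifies the plugin as soon as
it gives up on a transaction. It is a toy model, not the attached patch:
SketchCallbacks, sketch_decode_txn, the concurrent_abort member and the
plugin_* functions are all assumed names, and the abort is simulated by
a parameter rather than by a failing catalog scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table; the real output plugin API differs. */
typedef struct SketchCallbacks
{
    void (*begin) (void *priv, uint32_t xid);
    void (*change) (void *priv, uint32_t xid);
    void (*concurrent_abort) (void *priv, uint32_t xid);  /* optional */
} SketchCallbacks;

/*
 * Stand-in for decoding one transaction with nchanges changes. If
 * abort_after is >= 0, a concurrent abort is simulated before the change
 * with that index. Returns true if all changes were delivered.
 */
static bool
sketch_decode_txn(const SketchCallbacks *cb, void *priv, uint32_t xid,
                  int nchanges, int abort_after)
{
    assert(cb->begin != NULL && cb->change != NULL);

    cb->begin(priv, xid);
    for (int i = 0; i < nchanges; i++)
    {
        if (abort_after >= 0 && i == abort_after)
        {
            /* Models the cleanup path: tell the plugin right away. */
            if (cb->concurrent_abort != NULL)
                cb->concurrent_abort(priv, xid);
            return false;
        }
        cb->change(priv, xid);
    }
    return true;
}

/* Hypothetical plugin state and callbacks used to exercise the sketch. */
typedef struct SketchPlugin
{
    uint32_t open_xid;      /* transaction being received, 0 if none */
    int      changes_seen;  /* changes received for open_xid */
    uint32_t aborted_xid;   /* last transaction reported as aborted */
} SketchPlugin;

static void
plugin_begin(void *priv, uint32_t xid)
{
    SketchPlugin *p = priv;

    p->open_xid = xid;
    p->changes_seen = 0;
}

static void
plugin_change(void *priv, uint32_t xid)
{
    SketchPlugin *p = priv;

    (void) xid;
    p->changes_seen++;
}

static void
plugin_concurrent_abort(void *priv, uint32_t xid)
{
    SketchPlugin *p = priv;

    p->aborted_xid = xid;
    p->open_xid = 0;        /* nothing is left dangling */
}
```

The point of the design is that the plugin's state is settled inside its
own callback, at the moment decoding stops, instead of whenever the next
unrelated callback happens to arrive.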
Thoughts or comments? If this is agreed on, I can look into adding
tests (concurrent aborts are not currently covered, it seems).
Regards
Markus