Hi,
So I've been trying to understand the "Introduce an option to make
logical replication database specific." patch and I have to confess I
just cannot.
As far as I can read, the point is that if we reach
SnapBuildProcessRunningXacts() when db_specific is true (which means
standby_decode is called in an output plugin that has set
need_shared_catalogs to false), _and_ we've not reached consistent state
yet, then we'll call LogStandbySnapshot with our DB oid to emit a new
xl_running_xacts message.
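
To check my reading, here is a toy model of that control flow. It is
only a sketch, not the patch's code: every toy_* / Toy* name is a
stand-in I made up, and toy_log_standby_snapshot() merely records the
request instead of writing WAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the snapshot builder's state. */
typedef enum { TOY_BUILDING, TOY_CONSISTENT } ToyState;

typedef struct ToyBuilder
{
    ToyState state;       /* have we reached consistent state yet? */
    bool     db_specific; /* plugin set need_shared_catalogs to false */
    uint32_t dboid;       /* database this decoder is attached to */
} ToyBuilder;

/* Stand-in for LogStandbySnapshot(dboid): only records the request. */
static int      toy_snapshots_logged = 0;
static uint32_t toy_last_dboid = 0;

static void
toy_log_standby_snapshot(uint32_t dboid)
{
    toy_snapshots_logged++;
    toy_last_dboid = dboid;
}

/*
 * Stand-in for the decision I believe is made while processing a
 * running-xacts record: ask for a database-specific record only when
 * db_specific is set and we are not yet consistent.  Returns true if a
 * new record was requested.
 */
static bool
toy_process_running_xacts(ToyBuilder *builder)
{
    if (builder->db_specific && builder->state != TOY_CONSISTENT)
    {
        toy_log_standby_snapshot(builder->dboid);
        return true;
    }
    return false;
}
```

In this model a decoder that is already consistent, or one that did not
ask for db_specific behavior, never requests an extra record.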
So the WAL-decoding process emits WAL. I don't know if in normal
conditions logical decoding processes emit WAL. If this is exceptional,
I think we should add a comment.
Now, this additional WAL message will be processed by all other
processes decoding WAL. Perhaps it will be ignored by most of them. But
most importantly, it will also reach back to ourselves, at which point
we can hopefully use it to see that we might have reached consistent
state within our database. Then we know our snapshot is ready to be
used.
Is this correct?
I think the reason it's safe to skip a lot of the processing caused by
this additional record is that xl_running_xacts messages are also
emitted in other places in a non-database-specific manner. So all the
other places that emit that message continue to exist and cause
logical decoders to operate in the same way as before.
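
Put as a toy model (again only a sketch under my own assumptions: the
record layout and toy_replay_xmin() are invented for illustration, and
xid wraparound is ignored), a decoder that skips database-specific
records ends up in the same state as long as the global records are
still in the stream:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_DBID 0u

/* Hypothetical, simplified running-xacts record. */
typedef struct ToyRunningXactsRec
{
    uint32_t dbid;               /* TOY_INVALID_DBID means "all databases" */
    uint32_t oldest_running_xid; /* plain integer compare, no wraparound */
} ToyRunningXactsRec;

/*
 * Replay a stream of records and return the resulting xmin.  Records
 * carrying a valid dbid are skipped; only global records may advance
 * xmin.
 */
static uint32_t
toy_replay_xmin(const ToyRunningXactsRec *recs, size_t nrecs, uint32_t xmin)
{
    for (size_t i = 0; i < nrecs; i++)
    {
        if (recs[i].dbid != TOY_INVALID_DBID)
            continue;           /* database-specific: ignored here */
        if (recs[i].oldest_running_xid > xmin)
            xmin = recs[i].oldest_running_xid;
    }
    return xmin;
}
```

Replaying a stream with or without an interleaved database-specific
record gives the same xmin, whereas a stream containing only
database-specific records never advances it. That is the case the
comments ought to warn about.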
I think we should sprinkle lots of comments in several places about
this. For example, I propose that standby_redo() should have something
like
* If 'dbid' is valid, only gather transactions running in that database.
+ * Such records should not be the only ones emitted, because this has
+ * potentially dangerous side-effects which make some places ignore them:
+ *
+ * 1. SnapBuildProcessRunningXacts will skip computing the xmin and restart
+ * point from its input record if the record's xmin is older than the
+ * snapbuilder's current xmin; this should normally be fine because that
+ * information will be updated from other xl_running_xacts records.
+ * 2. standby_redo will likewise skip processing such a record.
*
(are there other things that should be mentioned?)
Also, LogStandbySnapshot() should have a comment explaining that passing
a valid dboid is a weird corner case which is to be used with care, and
that functions X, Y and Z are going to ignore snapshots carrying a valid
dbid.
Why do we call SnapBuildFindSnapshot() to do this, instead of doing it
directly in SnapBuildProcessRunningXacts? Seems like it would be more
straightforward.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/