Simon Riggs wrote:
> Taking snapshots from primary has a few disadvantages
>
> ...
> * snapshots on primary prevent row removal (but this was also an
> advantage of this technique!)
That makes it an awful solution for high availability. A backend hung in
transaction-in-progress state in the slave will prevent row removal on
the master. Isolating the master from queries performed in the
slave is exactly the reason why people use hot standby. And running long
reporting queries in the standby is again a very typical use case.
And still we can't escape the scenario that the slave receives a WAL
record that vacuums away a tuple that's still visible according to a
snapshot used in the slave. Even with the proposed scheme, this can happen:
1. Slave receives a snapshot from master
2. A long-running transaction begins on the slave, using that snapshot
3. Network connection is lost
4. Master hits a timeout, and decides to discard the snapshot it sent to
the slave
5. A tuple visible to the snapshot is vacuumed
6. Network connection is re-established
7. Slave receives the vacuum WAL record, even though the long-running
transaction still needs the tuple.
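To make the sequence above concrete, here is a toy model of it in Python. This is only a sketch: the Master and Slave classes, the xid values and the simplified visibility rule (a tuple deleted by xid D is still needed by any snapshot whose xmin is <= D) are all made up for illustration, and xid wraparound is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # xmins of snapshots sent to the slave that the master still honors
    # (hypothetical bookkeeping for the proposed scheme)
    slave_snapshot_xmins: set = field(default_factory=set)
    local_oldest_xmin: int = 0

    def vacuum_horizon(self):
        # Tuples deleted by xids below this value may be vacuumed.
        return min(self.slave_snapshot_xmins | {self.local_oldest_xmin})


@dataclass
class Slave:
    active_snapshot_xmins: list = field(default_factory=list)

    def still_needs(self, deleter_xid):
        # Simplified rule: a snapshot with xmin <= deleter_xid may still
        # see the tuple that deleter_xid deleted.
        return any(xmin <= deleter_xid for xmin in self.active_snapshot_xmins)


def run_scenario():
    master = Master(local_oldest_xmin=200)
    slave = Slave()
    snapshot_xmin = 100

    master.slave_snapshot_xmins.add(snapshot_xmin)       # 1. snapshot sent
    slave.active_snapshot_xmins.append(snapshot_xmin)    # 2. long query starts
    # 3. network connection is lost
    master.slave_snapshot_xmins.discard(snapshot_xmin)   # 4. master times out
    deleter_xid = 150
    assert deleter_xid < master.vacuum_horizon()         # 5. tuple is vacuumed
    # 6. + 7. connection returns and the vacuum record reaches the slave
    return slave.still_needs(deleter_xid)
```

run_scenario() returns True here, i.e. the slave is asked to remove a tuple that its long-running transaction can still see.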
I like the idea of acquiring snapshots locally in the slave much more.
As you mentioned, the options there are to defer applying WAL, or cancel
queries. I think both options need the same ability to detect when
you're about to remove a tuple that's still visible to some snapshot;
only the action taken differs. We should probably provide a GUC to
control which you want.
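Roughly what I have in mind, as a Python sketch rather than backend code. The function name, the "defer"/"cancel" policy strings and the idea that a cleanup record carries the newest xid whose tuples it removes are assumptions for illustration, not an existing API; xid wraparound is ignored.

```python
def replay_cleanup(latest_removed_xid, snapshot_xmins, policy):
    """Decide how the slave handles one cleanup (vacuum) WAL record.

    latest_removed_xid: newest xid whose deleted tuples the record removes
                        (assumed to be carried in the record).
    snapshot_xmins:     xmin of every snapshot currently in use on the slave.
    policy:             "defer" or "cancel", the hypothetical GUC setting.

    Returns (action, conflicting), where action is "apply" or "wait" and
    conflicting lists the snapshot xmins that still need removed tuples.
    """
    # The detection step is the same for both policies.
    conflicting = [x for x in snapshot_xmins if x <= latest_removed_xid]
    if not conflicting:
        return ("apply", [])
    if policy == "defer":
        # Hold back WAL replay until the conflicting queries finish.
        return ("wait", conflicting)
    if policy == "cancel":
        # Cancel the conflicting queries, then apply the record.
        return ("apply", conflicting)
    raise ValueError(f"unknown policy: {policy!r}")
```

With snapshots at xmin 100 and 180 and a record with latest_removed_xid 150, only the xmin-100 snapshot conflicts: "defer" stalls replay on it, "cancel" kills it and carries on.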
However, if we still want to provide the behavior that "as long as the
network connection works, the master will not remove tuples still needed
in the slave" as an option, a much simpler implementation is to
periodically send the slave's oldest xmin to master. Master can take
that into account when calculating its own oldest xmin. That requires a
lot less communication than the proposed scheme to send snapshots back
and forth. A softer version of that is also possible, where the master
obeys the slave's oldest xmin, but only up to a point.
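A sketch of the master-side calculation, again in Python and again purely illustrative: the function name and the max_lag parameter (how many xids the master is willing to be held back by the slave) are made up, and xid wraparound is ignored.

```python
def master_oldest_xmin(local_oldest_xmin, slave_oldest_xmin=None, max_lag=None):
    """Return the xid horizon the master uses for tuple removal.

    slave_oldest_xmin: latest value reported by the slave, or None if no
                       report is available (e.g. the connection is down).
    max_lag:           None means obey the slave fully; otherwise the slave
                       can hold the horizon back by at most max_lag xids.
    """
    if slave_oldest_xmin is None:
        return local_oldest_xmin
    # Take the slave's oldest xmin into account.
    horizon = min(local_oldest_xmin, slave_oldest_xmin)
    if max_lag is not None:
        # Softer version: obey the slave, but only up to a point.
        horizon = max(horizon, local_oldest_xmin - max_lag)
    return horizon
```

For example, with a local oldest xmin of 200 and a slave reporting 100, the master would use 100; with max_lag=50 it would use 150, so slave snapshots older than that can still run into the conflict case and need the defer-or-cancel handling.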
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com