Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Transaction Snapshots and Hot Standby
Date
Msg-id 48C8B9B9.3000407@enterprisedb.com
In response to Transaction Snapshots and Hot Standby  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> Taking snapshots from primary has a few disadvantages
> 
>  ...
>       * snapshots on primary prevent row removal (but this was also an
>         advantage of this technique!)

That makes it an awful solution for high availability. A backend hung in 
transaction-in-progress state in the slave will prevent row removal on 
the master. Isolating the master from queries run on the slave is 
exactly the reason why people use hot standby. And running long 
reporting queries in the standby is again a very typical use case.

And still we can't escape the scenario that the slave receives a WAL 
record that vacuums away a tuple that's still visible according to a 
snapshot used in the slave. Even with the proposed scheme, this can happen:

1. Slave receives a snapshot from master
2. A long-running transaction begins on the slave, using that snapshot
3. Network connection is lost
4. Master hits a timeout, and decides to discard the snapshot it sent to 
the slave
5. A tuple visible to the snapshot is vacuumed
6. Network connection is re-established
7. Slave receives the vacuum WAL record, even though the long-running 
transaction still needs the tuple.
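
To make the sequence above concrete, here is a toy model of it. It is only a sketch: the `Master` class, its methods and `run_scenario` are hypothetical names invented for illustration, xids are plain integers with no wraparound, and "still visible" is approximated as "the deleting xid is not older than the snapshot's xmin".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Master:
    """Toy master that holds back cleanup for one snapshot sent to the slave."""
    slave_snapshot_xmin: Optional[int] = None
    wal: List[Tuple[str, int]] = field(default_factory=list)

    def send_snapshot(self, xmin: int) -> int:
        # Remember the snapshot so vacuum can respect it.
        self.slave_snapshot_xmin = xmin
        return xmin

    def timeout_slave_snapshot(self) -> None:
        # The master gives up on the slave and forgets the snapshot.
        self.slave_snapshot_xmin = None

    def vacuum(self, tuple_xmax: int) -> bool:
        # Refuse removal while a remembered snapshot could still see the tuple.
        held = self.slave_snapshot_xmin
        if held is not None and tuple_xmax >= held:
            return False
        self.wal.append(("vacuum", tuple_xmax))
        return True


def run_scenario():
    master = Master()
    slave_xmin = master.send_snapshot(100)    # step 1
    long_query_xmin = slave_xmin              # step 2
    # step 3: network connection is lost (nothing reaches the slave)
    master.timeout_slave_snapshot()           # step 4
    removed = master.vacuum(tuple_xmax=105)   # step 5
    # steps 6-7: connection is back, the slave replays the master's WAL
    conflicts = [rec for rec in master.wal if rec[1] >= long_query_xmin]
    return removed, conflicts
```

In this model `run_scenario()` reports that the tuple was removed on the master and that the resulting vacuum record conflicts with the query still running on the slave, which is the case the slave has to handle regardless of where snapshots come from.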

I like the idea of acquiring snapshots locally in the slave much more. 
As you mentioned, the options there are to defer applying WAL, or cancel 
queries. I think both options need the same ability to detect when 
you're about to remove a tuple that's still visible to some snapshot; 
only the action taken is different. We should probably provide a GUC to 
control which behavior you want.
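
A rough sketch of that shared detection step with a switchable action might look like the following. Everything here is an assumption made for illustration: `ConflictPolicy` stands in for the GUC, `latest_removed_xid` is assumed to be carried by the cleanup WAL record, and the visibility test is simplified to an integer comparison that ignores xid wraparound.

```python
from enum import Enum


class ConflictPolicy(Enum):
    """Hypothetical stand-in for the GUC choosing the conflict action."""
    DEFER_WAL = "defer"
    CANCEL_QUERIES = "cancel"


def conflicting_snapshots(latest_removed_xid, snapshot_xmins):
    # A snapshot may still see a removed tuple if its xmin is not newer
    # than the newest xid whose tuples the record removes.
    return [xmin for xmin in snapshot_xmins if xmin <= latest_removed_xid]


def handle_cleanup_record(latest_removed_xid, snapshot_xmins, policy):
    """Return (action, conflicting xmins) for one cleanup WAL record."""
    conflicts = conflicting_snapshots(latest_removed_xid, snapshot_xmins)
    if not conflicts:
        return ("apply", [])
    if policy is ConflictPolicy.DEFER_WAL:
        # Hold back replay until the conflicting snapshots are gone.
        return ("wait", conflicts)
    # Otherwise cancel the queries holding those snapshots, then replay.
    return ("cancel_then_apply", conflicts)
```

The detection code is identical for both settings; only the last branch differs, which is why a single setting would be enough to pick the behavior.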

However, if we still want to provide the behavior that "as long as the 
network connection works, the master will not remove tuples still needed 
in the slave" as an option, a much simpler implementation is to 
periodically send the slave's oldest xmin to the master. The master can 
take that into account when calculating its own oldest xmin. That 
requires a lot less communication than the proposed scheme to send 
snapshots back and forth. A softer version of that is also possible, 
where the master obeys the slave's oldest xmin, but only up to a point.
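
As a minimal sketch of the xmin calculation on the master: the function name and parameters below are hypothetical, xids are plain integers without wraparound, and "only up to a point" is interpreted here as a cap on how many xids behind `next_xid` the slave may hold the horizon. That cap is just one possible reading.

```python
def master_oldest_xmin(local_xmin, slave_xmin=None, next_xid=None, max_lag=None):
    """Cleanup horizon on the master, optionally honoring the slave's xmin.

    local_xmin: oldest xmin among the master's own transactions
    slave_xmin: oldest xmin last reported by the slave, or None if unknown
    next_xid, max_lag: if both are given, the slave's xmin is honored only
        down to next_xid - max_lag (the softer version)
    """
    if slave_xmin is None:
        return local_xmin
    effective = slave_xmin
    if next_xid is not None and max_lag is not None:
        # Ignore the part of the slave's request that lags too far behind.
        effective = max(slave_xmin, next_xid - max_lag)
    # The horizon can never be newer than what the master itself needs.
    return min(local_xmin, effective)
```

For example, with `local_xmin=500` and a reported `slave_xmin=300`, the horizon becomes 300; adding `next_xid=600, max_lag=200` limits it to 400, so a stuck query on the slave can delay cleanup on the master only by a bounded amount.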

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

