Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Transaction Snapshots and Hot Standby |
Date | |
Msg-id | 1221143446.3913.893.camel@ebony.2ndQuadrant |
In response to | Re: Transaction Snapshots and Hot Standby (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Transaction Snapshots and Hot Standby |
List | pgsql-hackers |
Thanks for the detailed thinking. At least one very good new idea here,
some debate on other points.

On Thu, 2008-09-11 at 09:24 +0300, Heikki Linnakangas wrote:

> And still we can't escape the scenario that the slave receives a WAL
> record that vacuums away a tuple that's still visible according to a
> snapshot used in the slave. Even with the proposed scheme, this can happen:
>
> 1. Slave receives a snapshot from master
> 2. A long-running transaction begins on the slave, using that snapshot
> 3. Network connection is lost
> 4. Master hits a timeout, and decides to discard the snapshot it sent to
> the slave
> 5. A tuple visible to the snapshot is vacuumed
> 6. Network connection is re-established
> 7. Slave receives the vacuum WAL record, even though the long-running
> transaction still needs the tuple.

Interesting point. (4) is a problem, though not for the reason you
suggest. If we were to stop and start master, that would be sufficient
to discard the snapshot that the standby is using and so cause problems.
So the standby *must* tell the master the recentxmin it is using, as you
suggest later, so good thinking. So part of the handshake between
primary and standby must be "what is your recentxmin?". The primary will
then use the lower/earliest of the two.

> I like the idea of acquiring snapshots locally in the slave much more.

Me too. We just need to know how, if at all.

> As you mentioned, the options there are to defer applying WAL, or cancel
> queries. I think both options need the same ability to detect when
> you're about to remove a tuple that's still visible to some snapshot,
> just the action is different. We should probably provide a GUC to
> control which you want.

I don't see any practical way of telling whether a tuple removal will
affect a snapshot or not. Each removed row would need to be checked
against each standby snapshot. Even if those were available, it would be
too costly.

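[Editor's sketch] The handshake rule stated above, where the primary takes the lower/earliest of the two recentxmin values, can be illustrated roughly as follows. This is only a sketch under stated assumptions, not code from any patch: the type `Xid` and the functions `xid_precedes` and `handshake_recentxmin` are hypothetical names, and the comparison assumes 32-bit transaction ids that wrap around and are never more than 2^31 apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id that wraps around. */
typedef uint32_t Xid;

/*
 * Circular comparison: true if a is logically older than b.
 * Assumes the two ids are less than 2^31 transactions apart, and relies
 * on the usual two's-complement conversion to int32_t.
 */
bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Handshake result (hypothetical helper): the primary must keep every
 * tuple still visible to whichever side reports the earlier recentxmin.
 */
Xid handshake_recentxmin(Xid primary_xmin, Xid standby_xmin)
{
    return xid_precedes(standby_xmin, primary_xmin) ? standby_xmin
                                                    : primary_xmin;
}
```

For example, with a primary recentxmin of 1000 and a standby recentxmin of 900, the primary would hold back cleanup to 900. The sketch ignores any reserved or special transaction id values.
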
And even if we can do that, ISTM that neither option is acceptable: if
we cancel queries then touching a frequently updated table is nearly
impossible, or if we delay applying WAL then the standby could fall
behind, impairing its ability for use in HA. (If there was a way, yes,
we should have a parameter for it).

It was also suggested we might take the removed rows and put them in a
side table, but that makes me think of the earlier ideas for HOT and so
I've steered clear of that.

You might detect blocks that have had tuples removed from them *after* a
query started by either

* keeping a hash table of changed blocks - it would be a very big data
  structure and hard to keep clean

* adding an additional "last cleaned LSN" onto every data block

* keeping an extra LSN on the bufhdr for each of the shared_buffers, plus
  keeping a hash table of blocks that have been cleaned and then paged
  out

Once detected, your only option is to cancel the query.

ISTM if we want to try to avoid making recentxmin the same on both
primary and standby then the only viable options are the 3 on the
original post.

> However, if we still want to provide the behavior that "as long as the
> network connection works, the master will not remove tuples still needed
> in the slave" as an option, a lot simpler implementation is to
> periodically send the slave's oldest xmin to master. Master can take
> that into account when calculating its own oldest xmin. That requires a
> lot less communication than the proposed scheme to send snapshots back
> and forth. A softer version of that is also possible, where the master
> obeys the slave's oldest xmin, but only up to a point.

I like this very much. Much simpler implementation and no need for a
delay in granting snapshots. I'll go for this as the default
implementation.

Thanks for the idea.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
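
[Editor's sketch] The "softer version" quoted above, where the master obeys the slave's oldest xmin only up to a point, might look something like the following. The message does not say what "up to a point" means; this sketch assumes it is a limit on how many transactions behind the master's own xmin the slave may hold cleanup. The names `Xid`, `xid_precedes`, `effective_oldest_xmin` and the `max_age` parameter are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id that wraps around. */
typedef uint32_t Xid;

/* Circular comparison: true if a is logically older than b
 * (assumes the ids are less than 2^31 transactions apart). */
bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Oldest xmin the master uses for cleanup, given the xmin most recently
 * reported by the slave. max_age is an assumed knob: the master honours
 * the slave's xmin only while it is at most max_age transactions older
 * than the master's own xmin.
 */
Xid effective_oldest_xmin(Xid master_xmin, Xid slave_xmin, uint32_t max_age)
{
    Xid limit = master_xmin - max_age;   /* oldest xmin the master will honour */

    if (!xid_precedes(slave_xmin, master_xmin))
        return master_xmin;              /* slave is not holding anything back */
    if (xid_precedes(slave_xmin, limit))
        return limit;                    /* slave too far behind: stop obeying it */
    return slave_xmin;                   /* within the limit: obey the slave */
}
```

With `max_age` set to 0 the master ignores the slave's xmin entirely. A slave whose xmin falls behind the limit can again see tuples removed underneath its snapshot, so the cancel-or-delay question discussed in the message still applies in that case.
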