Re: Synchronization levels in SR - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Synchronization levels in SR |
Date | |
Msg-id | 4C062380.8090108@enterprisedb.com |
In response to | Re: Synchronization levels in SR (Greg Smith <greg@2ndquadrant.com>) |
Responses | Re: Synchronization levels in SR |
List | pgsql-hackers |
On 02/06/10 10:22, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> The possibilities are endless... Your proposal above covers a pretty
>> good set of scenarios, but it's by no means complete. If we try to
>> solve everything the configuration will need to be written in a
>> Turing-complete Replication Description Language. We'll have to pick a
>> useful, easy-to-understand subset that covers the common scenarios. To
>> handle the more exotic scenarios, you can write a proxy that sits in
>> front of the master, and implements whatever rules you wish, with the
>> rules written in C.
>
> I was thinking about this a bit recently. As I see it, there are three
> fundamental parts of this:
>
> 1) We have a transaction that is being committed. The rest of the
> computations here are all relative to it.

Agreed.

> So in a 3 node case, the internal state table might look like this after
> a bit of data had been committed:
>
> node | location | state
> ----------------------------------
> a    | local    | fsync
> b    | remote   | recv
> c    | remote   | async
>
> This means that the local node has a fully persistent copy, but the best
> either remote one has done is received the data, it's not on disk at all
> yet at the remote data center. Still working its way through.
>
> 3) The decision about whether the data has been committed to enough
> places to be considered safe by the master is computed by a function
> that is passed this internal table as something like a SRF, and it
> returns a boolean. Once that returns true, saying it's satisfied, the
> transaction closes on the master and continues to percolate out from
> there. If it's false, we wait for another state change to come in and
> return to (2).

You can't implement "wait for X to ack the commit, but if that doesn't happen in Y seconds, time out and return true anyway" with that.
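Greg's proposal and Heikki's objection can be made concrete with a small Python sketch. It models the per-commit state table from the 3-node example as rows and gives a decision function nothing but that table, returning a boolean. The ordering async < recv < fsync, the reading of "acknowledged" as "at least recv", the specific rule in `quorum_safe`, and every name here are assumptions for illustration, not anything proposed in the thread. Because the function is only re-run when a state change arrives and sees only the table, a standby that never acknowledges leaves it returning False indefinitely; the timed variant works only because it is handed an extra `waited_secs` input, and a caller would also need a wakeup that is not driven by a state change.

```python
from dataclasses import dataclass

# Assumed ordering of per-node progress for one commit, weakest first.
RANK = {"async": 0, "recv": 1, "fsync": 2}


@dataclass(frozen=True)
class NodeState:
    node: str      # node name, e.g. "a"
    location: str  # "local" or "remote"
    state: str     # one of RANK's keys


def quorum_safe(table: list[NodeState]) -> bool:
    """Hypothetical table-only rule: the local copy is fsync'd and at
    least one remote node has at least received the commit."""
    local_ok = any(r.location == "local" and RANK[r.state] >= RANK["fsync"]
                   for r in table)
    remote_ok = any(r.location == "remote" and RANK[r.state] >= RANK["recv"]
                    for r in table)
    return local_ok and remote_ok


def wait_for_x_or_timeout(table: list[NodeState], x: str,
                          waited_secs: float, timeout_secs: float) -> bool:
    """Hypothetical rule: succeed once node x acks (state >= recv), or
    once timeout_secs have passed. It needs waited_secs, an input the
    table-only interface does not provide."""
    acked = any(r.node == x and RANK[r.state] >= RANK["recv"] for r in table)
    return acked or waited_secs >= timeout_secs


# The 3-node state table from the quoted example.
table = [
    NodeState("a", "local", "fsync"),
    NodeState("b", "remote", "recv"),
    NodeState("c", "remote", "async"),
]
print(quorum_safe(table))                           # True
print(wait_for_x_or_timeout(table, "c", 2.0, 5.0))  # False: c has not acked yet
print(wait_for_x_or_timeout(table, "c", 5.0, 5.0))  # True: gave up waiting on c
```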
> While exposing the local state and running this computation isn't free,
> in situations where there truly are remote nodes in here being
> communicated with the network overhead is going to dwarf that. If there
> were a fast path for the simplest cases and this complicated one for the
> rest, I think you could get the fully programmable behavior some people
> want using simple SQL, rather than having to write a new "Replication
> Description Language" or something so ambitious. This data about what's
> been replicated to where looks an awful lot like a set of rows you can
> operate on using features already in the database to me.

Yeah, if we want to provide full control over when a commit is acknowledged to the client, there's certainly no reason we can't expose that using a hook or something. It's pretty scary to call a user-defined function at that point in the transaction, though. Even if we document that you must refrain from doing nasty stuff like modifying tables in that function, it's still scary.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
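Heikki's hook alternative, combined with Greg's idea of a fast path for the simple cases, might look roughly like the Python sketch below. It imitates the style of PostgreSQL's C function-pointer hooks (for example `planner_hook`), which stay NULL until an extension installs one. `commit_ack_hook`, `builtin_sync_level_ok`, `should_ack_commit`, the `required_level` parameter and the state ordering are all hypothetical, not an existing PostgreSQL API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical: one (node, state) pair per standby for the committing
# transaction, with states assumed to be ordered async < recv < fsync.
RANK = {"async": 0, "recv": 1, "fsync": 2}
Row = tuple[str, str]

# Hypothetical hook slot: None means "use the built-in fast path".
commit_ack_hook: Optional[Callable[[Sequence[Row]], bool]] = None


def builtin_sync_level_ok(rows: Sequence[Row], required_level: str) -> bool:
    """Fast path: acknowledge once any standby reaches required_level."""
    return any(RANK[state] >= RANK[required_level] for _, state in rows)


def should_ack_commit(rows: Sequence[Row], required_level: str = "recv") -> bool:
    """Decide whether the commit can be acknowledged to the client."""
    if commit_ack_hook is None:
        return builtin_sync_level_ok(rows, required_level)
    # Hand the hook an immutable copy of the state. It still runs in the
    # middle of commit processing, so it must be cheap and side-effect free.
    return commit_ack_hook(tuple(rows))


rows = [("b", "recv"), ("c", "async")]
print(should_ack_commit(rows))           # True: b has received the commit
print(should_ack_commit(rows, "fsync"))  # False: no standby has fsync'd


def both_standbys_recv(rows: Sequence[Row]) -> bool:
    """Example user rule: every standby must have received the commit."""
    return all(RANK[state] >= RANK["recv"] for _, state in rows)


commit_ack_hook = both_standbys_recv
print(should_ack_commit(rows))           # False: c is only async
```

Passing an immutable tuple is only a small guard: as noted above, nothing stops a user-supplied function from doing something expensive or unsafe at that point in the commit.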