Re: Synchronization levels in SR - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Synchronization levels in SR
Msg-id 4C062380.8090108@enterprisedb.com
In response to Re: Synchronization levels in SR  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: Synchronization levels in SR  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 02/06/10 10:22, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> The possibilities are endless... Your proposal above covers a pretty
>> good set of scenarios, but it's by no means complete. If we try to
>> solve everything the configuration will need to be written in a
>> Turing-complete Replication Description Language. We'll have to pick a
>> useful, easy-to-understand subset that covers the common scenarios. To
>> handle the more exotic scenarios, you can write a proxy that sits in
>> front of the master, and implements whatever rules you wish, with the
>> rules written in C.
>
> I was thinking about this a bit recently. As I see it, there are three
> fundamental parts of this:
>
> 1) We have a transaction that is being committed. The rest of the
> computations here are all relative to it.

Agreed.

> So in a 3 node case, the internal state table might look like this after
> a bit of data had been committed:
>
> node | location | state
> ----------------------------------
> a    | local    | fsync
> b    | remote   | recv
> c    | remote   | async
>
> This means that the local node has a fully persistent copy, but the best
> either remote one has done is receive the data; it's not on disk at all
> yet at the remote data center. Still working its way through.
>
> 3) The decision about whether the data has been committed to enough
> places to be considered safe by the master is computed by a function
> that is passed this internal table as something like a SRF, and it
> returns a boolean. Once that returns true, saying it's satisfied, the
> transaction closes on the master and continues to percolate out from
> there. If it's false, we wait for another state change to come in and
> return to (2).

You can't implement "wait for X to ack the commit, but if that doesn't 
happen in Y seconds, time out and return true anyway" with that.
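
To make the limitation concrete, here is a minimal sketch in Python. It assumes the quoted state table is passed to a boolean policy function that is re-run only when a state change arrives. All names here (NodeState, LEVEL, the policy, the two ack_time_* helpers) are hypothetical and only illustrate the idea; none of this is existing PostgreSQL code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Durability levels, weakest first. The names come from the quoted table;
# treating them as an ordered scale is an assumption.
LEVEL = {"async": 0, "recv": 1, "fsync": 2}


@dataclass(frozen=True)
class NodeState:
    node: str
    location: str  # "local" or "remote"
    state: str     # a key of LEVEL


Table = Sequence[NodeState]
Policy = Callable[[Table], bool]


def local_fsync_and_one_remote_recv(table: Table) -> bool:
    """Hypothetical policy: local copy fsynced, some remote at 'recv' or better."""
    local_ok = any(r.location == "local" and LEVEL[r.state] >= LEVEL["fsync"]
                   for r in table)
    remote_ok = any(r.location == "remote" and LEVEL[r.state] >= LEVEL["recv"]
                    for r in table)
    return local_ok and remote_ok


def ack_time_event_driven(changes: List[Tuple[float, Table]],
                          policy: Policy) -> Optional[float]:
    """Re-run the policy only when a state change arrives.

    `changes` holds (seconds since commit started, table snapshot) pairs in
    time order. Returns when the commit is acknowledged, or None if never.
    """
    for when, table in changes:
        if policy(table):
            return when
    return None


def ack_time_with_deadline(changes: List[Tuple[float, Table]],
                           policy: Policy, timeout: float) -> float:
    """Same loop, but the caller also stops waiting after `timeout` seconds."""
    when = ack_time_event_driven(changes, policy)
    return timeout if when is None or when > timeout else when
```

If the remote nodes stall and no further state change ever arrives, ack_time_event_driven never gets another chance to run the policy and returns None. The "give up after Y seconds" behaviour only appears in ack_time_with_deadline, where the deadline sits in the waiting code around the policy rather than in the policy itself.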

> While exposing the local state and running this computation isn't free,
> in situations where remote nodes really are being communicated with,
> the network overhead is going to dwarf that. If there
> were a fast path for the simplest cases and this complicated one for the
> rest, I think you could get the fully programmable behavior some people
> want using simple SQL, rather than having to write a new "Replication
> Description Language" or something so ambitious. This data about what's
> been replicated to where looks an awful lot like a set of rows you can
> operate on using features already in the database to me.

Yeah, if we want to provide full control over when a commit is 
acknowledged to the client, there's certainly no reason we can't expose 
that using a hook or something.
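
As a rough illustration of what "a hook or something" could look like, here is a sketch in Python of a replaceable policy with a built-in default. Everything in it is an assumption made for illustration: the names, the dict-of-states argument, and the default rule (acknowledge once the local node has fsynced) are hypothetical and do not describe an existing PostgreSQL interface.

```python
from typing import Callable, Dict, Optional

# Hypothetical hook signature: maps node name -> state ("async", "recv", "fsync")
# and returns True once the commit may be acknowledged to the client.
CommitPolicy = Callable[[Dict[str, str]], bool]

_commit_policy_hook: Optional[CommitPolicy] = None


def default_policy(states: Dict[str, str]) -> bool:
    """Assumed built-in fast path: acknowledge once the local node has fsynced."""
    return states.get("local") == "fsync"


def set_commit_policy_hook(hook: Optional[CommitPolicy]) -> None:
    """Install a user-supplied policy, or pass None to restore the default."""
    global _commit_policy_hook
    _commit_policy_hook = hook


def commit_may_be_acknowledged(states: Dict[str, str]) -> bool:
    """Called by the (hypothetical) commit path on every state change."""
    policy = _commit_policy_hook or default_policy
    return policy(states)
```

The simple cases never leave default_policy; only installations that call set_commit_policy_hook pay for, and take responsibility for, running their own code at commit time.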

It's pretty scary to call a user-defined function at that point in the 
transaction. Even if we document that you must refrain from doing nasty 
stuff like modifying tables in that function, it's still scary.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

