Home > mailing lists

Re: Synchronization levels in SR - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Synchronization levels in SR
Date	June 2, 2010 06:25:41
Msg-id	4C062380.8090108@enterprisedb.com Whole thread Raw
In response to	Re: Synchronization levels in SR (Greg Smith <greg@2ndquadrant.com>)
Responses	Re: Synchronization levels in SR
List	pgsql-hackers

Tree view

On 02/06/10 10:22, Greg Smith wrote:
> Heikki Linnakangas wrote:
>> The possibilities are endless... Your proposal above covers a pretty
>> good set of scenarios, but it's by no means complete. If we try to
>> solve everything the configuration will need to be written in a
>> Turing-complete Replication Description Language. We'll have to pick a
>> useful, easy-to-understand subset that covers the common scenarios. To
>> handle the more exotic scenarios, you can write a proxy that sits in
>> front of the master, and implements whatever rules you wish, with the
>> rules written in C.
>
> I was thinking about this a bit recently. As I see it, there are three
> fundamental parts of this:
>
> 1) We have a transaction that is being committed. The rest of the
> computations here are all relative to it.

Agreed.

> So in a 3 node case, the internal state table might look like this after
> a bit of data had been committed:
>
> node | location | state
> ----------------------------------
> a | local | fsync b | remote | recv
> c | remote | async
>
> This means that the local node has a fully persistent copy, but the best
> either remote one has done is received the data, it's not on disk at all
> yet at the remote data center. Still working its way through.
>
> 3) The decision about whether the data has been committed to enough
> places to be considered safe by the master is computed by a function
> that is passed this internal table as something like a SRF, and it
> returns a boolean. Once that returns true, saying it's satisfied, the
> transaction closes on the master and continues to percolate out from
> there. If it's false, we wait for another state change to come in and
> return to (2).

You can't implement "wait for X to ack the commit, but if that doesn't 
happen in Y seconds, time out and return true anyway" with that.

> While exposing the local state and running this computation isn't free,
> in situations where there truly are remote nodes in here being
> communicated with the network overhead is going to dwarf that. If there
> were a fast path for the simplest cases and this complicated one for the
> rest, I think you could get the fully programmable behavior some people
> want using simple SQL, rather than having to write a new "Replication
> Description Language" or something so ambitious. This data about what's
> been replicated to where looks an awful lot like a set of rows you can
> operate on using features already in the database to me.

Yeah, if we want to provide full control over when a commit is 
acknowledged to the client, there's certainly no reason we can't expose 
that using a hook or something.

It's pretty scary to call a user-defined function at that point in 
transaction. Even if we document that you must refrain from doing nasty 
stuff like modifying tables in that function, it's still scary.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

From: Fujii Masao
Date: 02 June 2010, 04:40:03
Subject: obsolete comments in xlog.c

From: Heikki Linnakangas
Date: 02 June 2010, 06:29:15
Subject: Re: obsolete comments in xlog.c

Re: Synchronization levels in SR - Mailing list pgsql-hackers

Previous

Next