Re: Sync Rep: Second thoughts - Mailing list pgsql-hackers

From Emmanuel Cecchet
Subject Re: Sync Rep: Second thoughts
Date
Msg-id 494ED74C.1060003@frogthinker.org
In response to Re: Sync Rep: Second thoughts  (Markus Wanner <markus@bluegap.ch>)
Responses Re: Sync Rep: Second thoughts
List pgsql-hackers
Hi Markus,

I am happy to see that Postgres-R is alive again. The paper was written 
in 2007 (and published in 2008, the review process is longer than a 
CommitFest ;-)) and at the time of writing there was no version of 
Postgres-R available, hence the 'obsolete' mention, which referred to 
past versions.
I think it is legitimate for users to expect more guarantees from a 
replicated database than from a single database. Not knowing what happens 
when a failure occurs at commit time, while other nodes in the cluster 
are still up, is not intuitive for users.
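
Just to make the ambiguity concrete, here is a rough client-side sketch 
(the app_txn_status table and the client-generated gtxid are purely 
hypothetical, not anything that exists in Sync Rep or Postgres-R today): 
if the connection dies around COMMIT, the only way for the client to find 
out what happened is to ask a surviving node about a cluster-wide id it 
wrote as part of the transaction.

import uuid
import psycopg2

def commit_with_status_check(primary_dsn, standby_dsn, do_work):
    # Hypothetical pattern: tag each transaction with a client-generated
    # cluster-wide id so its fate can be checked after a failure.
    gtxid = str(uuid.uuid4())
    conn = psycopg2.connect(primary_dsn)
    try:
        cur = conn.cursor()
        do_work(cur)
        # Written in the same transaction, so it commits (or not)
        # atomically with the application's own changes.
        cur.execute("INSERT INTO app_txn_status (gtxid) VALUES (%s)", (gtxid,))
        conn.commit()
        return "committed"
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        # Connection lost around COMMIT: failure before, during and after
        # the commit all look the same from here.  Ask a surviving node
        # whether the id made it.
        standby = psycopg2.connect(standby_dsn)
        cur = standby.cursor()
        cur.execute("SELECT 1 FROM app_txn_status WHERE gtxid = %s", (gtxid,))
        committed = cur.fetchone() is not None
        standby.close()
        return "committed" if committed else "not visible (yet?) on standby"

Without that kind of state in the driver, 'error on commit' is all the 
application ever sees.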
I did not look at the source, but if Postgres-R continues to elaborate 
on Bettina's ideas with writeset extraction and a certification 
protocol, I think it would be a bad idea to try to mix it with Sync 
Rep (as mentioned in another thread). If you delay commits, you 
increase the window for transactions to conflict and therefore induce a 
higher abort rate (thus less scalability). Certification-based 
approaches already face multiple reliability issues in order to improve 
write performance over statement-based replication, and the outcome is 
very dependent on the system's capacity to limit the conflict window 
for concurrent transactions. The writeset extraction mechanisms have had 
too many limitations so far to allow the use of certification-based 
replication in production (AFAIK).
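
To illustrate why the conflict window worries me, here is a toy 
Monte-Carlo sketch (the uniform-access model and all numbers are made up 
for illustration; this is not Postgres-R's certification logic): each 
added millisecond of commit delay adds roughly TPS/1000 more transactions 
that every new writeset has to be certified against.

import random

HOT_ROWS = 10000     # rows that concurrent writers actually touch
WRITESET = 5         # rows written per transaction
TPS = 500            # committing transactions per second

def abort_rate(window_ms, trials=5000):
    aborts = 0
    for _ in range(trials):
        mine = set(random.sample(range(HOT_ROWS), WRITESET))
        # transactions certified ahead of us during our conflict window
        concurrent = int(TPS * window_ms / 1000.0)
        for _ in range(concurrent):
            theirs = set(random.sample(range(HOT_ROWS), WRITESET))
            if mine & theirs:      # overlapping writesets -> abort
                aborts += 1
                break
    return aborts / trials

for window_ms in (5, 20, 100, 500):
    print("%4d ms window: ~%.1f%% aborted" % (window_ms, 100 * abort_rate(window_ms)))

With these made-up numbers the abort rate goes from well under 1% at a 
few milliseconds of delay to tens of percent at half a second, which is 
the kind of cliff I would expect if certification has to wait for a 
synchronous standby.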

Good luck with Postgres-R.
Emmanuel

> Emmanuel Cecchet wrote:
>   
>> What the application is going to see is a failure when the postmaster it
>> is connected to is going down. If this happens at commit time, I think
>> that there is no guarantee for the application to know what happened:
>> 1. failure occurred before the request reached postmaster:  no instance
>> committed
>> 2. failure occurred during commit: might be committed on either node
>> 3. failure occurred while sending back ack of commit to client: both
>> instances have committed
>> But for the client, it will all look the same: an error on commit.
>>     
>
> This is very much the same for a single node database system, so I think
> current application developers are used to that behavior.
>
> A distributed database system just needs to make sure that committed
> transactions can and will eventually get committed on all nodes. So in
> case the client doesn't receive a COMMIT acknowledgment due to any kind
> of failure, it can only be sure the distributed database is in a
> consistent state. The client cannot tell whether it applied the transaction
> or not. Much like for single node systems.
>
> I agree that a distributed database system could theoretically do
> better, but that would require changes on the client side connection
> library as well, letting the client connect to two or even more nodes of
> the distributed database system.
>
>   
>> This is just to point out that despite all your efforts, the client
>> might think that some transactions have failed (error on commit) but
>> they are actually committed.
>>     
>
> As pointed out above, that would currently be an erroneous conclusion.
>
>   
>> If you don't put some state in the driver
>> that is able to check at failover time if the commit operation succeeded
>> or not, it does not really matter what happens for in-flight
>> transactions (or in-commit transactions) at failure time.
>>     
>
> Sure it does. The database system still needs to guarantee consistency.
>
> Hm.. well, you're right if there's only one single standby left (as is
> obviously the case for the proposed Sync Rep). Ensuring consistency is
> pretty simple in such a case. But imagine having two or more standby
> servers. Those would need to agree on a set of in-flight transactions
> from the master they all need to apply.
>
>   
>> Actually, if there was a way to query the database about the status of a
>> particular transaction by providing a cluster-wide unique id, that would
>> help a lot.
>>     
>
> You're certainly aware that Postgres-R features such a global
> transaction id... And I guess Sync Rep could easily add such an
> identifier as well.
>
>   
>> I wrote a paper on the issues with database replication at
>> Sigmod earlier this year (http://infoscience.epfl.ch/record/129042).
>> Even though it was targeted at middleware replication, I think that some
>> of it is still relevant for the problem at hand.
>>     
>
> Interesting read, thanks for the pointer. It's pretty obvious that I
> don't consider Postgres-R to be obsolete...
>
> I've linked your paper from www.postgres-r.org [1].
>
>   
>> Regarding the wording, if experts can't agree, you can be sure that
>> users won't either. Most of them don't have a clue about the different
>> flavors of replication. So as long as you state clearly how it behaves
>> and define all the terms you use that should be fine.
>>     
>
> I mostly agree with that, without repeating my concerns again here.
>
> Regards
>
> Markus Wanner
>
>
> [1]: Referenced Papers from Postgres-R website:
> http://www.postgres-r.org/documentation/references
>
>   


-- 
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet


