Re: Sync Rep Design - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Sync Rep Design |
Date | |
Msg-id | AANLkTinUVPKCTUSZAiqAv5YsLmANXC_v=RHLo=KqeG_8@mail.gmail.com |
In response to | Re: Sync Rep Design (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
On Sun, Jan 2, 2011 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sun, 2011-01-02 at 18:54 +0200, Heikki Linnakangas wrote:
>
>> I believe we all agree that there's different use cases that require
>> different setups. Both "first-past-the-post" and "wait-for-all-to-ack"
>> have their uses.
>
> Robert's analysis is that "first-past-the-post" doesn't actually improve
> the durability guarantee (according to his calcs). Which means that
> 1 primary, 2 sync standbys with first-past-the-post
> is actually worse than
> 1 primary, 1 sync and 1 async standby
> in terms of its durability guarantees.
>
> So ISTM that Robert does not agree that both have their uses.

I think it depends on what failure modes you want to protect against.

If you have a primary in New York, a secondary in Los Angeles, and another secondary in London, you might decide that the chances of two standbys being taken out by the same event are negligible, or alternatively that if one event does take out both of them, it'll be something like a meteor where you'll have bigger things to worry about than lost transactions. In that case, requiring one ACK but not two is pretty sensible. If the primary goes down, you'll look at the two remaining machines (which, by presumption, will still be up) and promote whichever one is ahead. In this setup, you get a performance benefit from waiting for either ACK rather than both ACKs, and you haven't compromised any of the cases you care about.

However, if you have the traditional close/far setup, things are different. Suppose you have a primary and a secondary in New York and another secondary in Los Angeles. Now it has to be viewed as a reasonable possibility that you could lose the New York site. If that happens, you need to be able to promote the LA standby *without reference to the NY standby*. So you really can't afford to do the 1-of-2 thing, because then when NY goes away you're not sure whether the LA standby is safe to promote.

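To make the two scenarios concrete, here is a rough Python sketch of the reasoning, not anything taken from a patch. The function name `safe_to_promote` and the standby labels are made up for illustration. It assumes each commit waits for `acks_required` standbys, and that WAL is a single linear stream, so the most advanced surviving standby holds everything any other survivor holds. Promotion is then guaranteed safe only if every possible set of acknowledging standbys includes at least one survivor.

```python
from itertools import combinations


def safe_to_promote(standbys, acks_required, survivors):
    """Return True if the most advanced survivor is guaranteed to hold
    every committed transaction.

    A commit is acknowledged by at least `acks_required` standbys, but we
    do not know which ones. The worst case is any subset of exactly that
    size, so we check that each such subset contains a survivor.
    """
    if not 1 <= acks_required <= len(standbys):
        raise ValueError("acks_required must be between 1 and len(standbys)")
    alive = set(survivors)
    return all(
        bool(alive & set(ackers))
        for ackers in combinations(standbys, acks_required)
    )


# Primary in New York, standbys in Los Angeles and London.
# The primary is lost, both standbys are still up.
print(safe_to_promote(["LA", "London"], 1, ["LA", "London"]))  # True

# Close/far: primary and one standby in New York, one standby in LA.
# The New York site is lost, so only the LA standby survives.
print(safe_to_promote(["NY-standby", "LA"], 1, ["LA"]))  # False
print(safe_to_promote(["NY-standby", "LA"], 2, ["LA"]))  # True
```

With 1-of-2 in the close/far case, the only ACK for some commit may have come from the New York standby, which is why the check fails there and passes once both ACKs are required.
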
So, IMHO, it just depends on what you want to do.

>> I'm not
>> sure what the point of such a timeout in general is, but people have
>> requested that.
>
> Again, this sounds like you think a timeout has no measurable benefit,
> other than to please some people's perceived needs.
>
>> The "wait-for-all-to-ack" looks a lot less ridiculous if you also
>> configure a timeout and don't wait for disconnected standbys
>
> Does it? Do Robert, Stefan and Aidan agree? What are the availability
> and durability percentages if we do that? Based on those, we may decide
> to do that instead. But I'd like to see some analysis of your ideas, not
> just a "we could". Since nobody has commented on my analysis, lets see
> someone else's.

Here's my take on this point. I think there is a use case for waiting for a disconnected standby and a use case for not waiting for a disconnected standby. The danger of NOT waiting for a disconnected standby is that if you then go on to irretrievably lose the primary, you lose transactions. But on the other hand, if you do wait, you've made the primary unavailable. I don't know that there's one right answer here. For some people, if they can't be certain of recording the transaction in two places, then it may be better to not process any transactions at all. For other people, it may be better to process transactions unprotected for a while while you get a new standby up. It's not for us to make that judgment; we're here to provide options.

Having said that, I am OK with whichever one we want to implement first so long as we keep the door open to doing the other one later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
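
The wait / don't-wait trade-off above can be written down as a tiny decision table. This is a hypothetical Python sketch only: `DisconnectPolicy` and `commit_outcome` are names invented here and do not correspond to any existing PostgreSQL setting or function.

```python
from enum import Enum


class DisconnectPolicy(Enum):
    """Hypothetical knob: what to do when no sync standby is connected."""
    WAIT = "wait"        # favour durability: block until a standby is back
    PROCEED = "proceed"  # favour availability: commit on the primary alone


def commit_outcome(standby_connected: bool, policy: DisconnectPolicy) -> str:
    """Describe what happens to a commit under the given policy."""
    if standby_connected:
        # The transaction is recorded in two places before the client
        # is told it committed.
        return "committed, protected"
    if policy is DisconnectPolicy.WAIT:
        # No transactions are lost, but the primary is effectively
        # unavailable for writes.
        return "blocked"
    # The primary stays available, but losing it now loses transactions.
    return "committed, unprotected"


for policy in DisconnectPolicy:
    print(policy.name, "->", commit_outcome(False, policy))
```

Neither branch is the right default for everyone, which is the argument for offering both.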