Re: Synchronous Standalone Master Redoux - Mailing list pgsql-hackers
From | Jose Ildefonso Camargo Tolosa |
---|---|
Subject | Re: Synchronous Standalone Master Redoux |
Date | |
Msg-id | CAETJ_S9Tr8aFhy9xDKExbawgMdnw8NaFkRKdBDsovU8i6nw+0w@mail.gmail.com |
In response to | Re: Synchronous Standalone Master Redoux (Dimitri Fontaine <dimitri@2ndQuadrant.fr>) |
Responses |
Re: Synchronous Standalone Master Redoux
|
List | pgsql-hackers |
On Thu, Jul 12, 2012 at 8:35 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> Jose Ildefonso Camargo Tolosa <ildefonso.camargo@gmail.com> writes:
>> environments. And no, it doesn't makes synchronous replication
>> meaningless, because it will work synchronous if it have someone to
>> sync to, and work async (or standalone) if it doesn't: that's perfect
>> for HA environment.
>
> You seem to want Service Availibility when we are providing Data
> Availibility. I'm not saying you shouldn't ask what you're asking, just
> that it is a different need.

Yes, and no: I don't see why we can't have an option to choose which one we want. I can see the point of "data availability": it is better to freeze the service than to risk losing transactions. However, try to explain that to some managers: "well, you know, the DB server froze the whole bank system because, well, the standby server died, and we didn't want to risk transaction loss, so we just froze the master... you know, in case the master were to die too before we had a reliable standby." I don't think a manager would really understand why you would block the whole company's system just because *the standby* server died (and why don't you block it when the master dies?!).

Now, maybe that's a bad example. I know a bank should have at least 3 or 4 servers, with some of them in different geographical areas, but just think of the typical boss.

With "Service Availability", you have data availability most of the time, until one of the servers fails (if you have just 2 nodes). What if you have more than two? Well, good for you! But you can keep going with a single server, understanding that you are at high risk and that it has to be fixed really soon (emergency).

>
> If you troll the archives, you will see that this debate has received
> much consideration already.
> The conclusion is that if you care about
> Service Availibility you should have 2 standby servers and set them both
> as candidates to being the synchronous one.

That costs more, and for most applications it isn't worth the extra cost. Really, I see your point, and I have *never* asked to remove the data guarantees, but to have an option to relax them if the particular situation requires it: "enough safety" for a given cost.

>
> That way, when you lose one standby the service is unaffected, the
> second standby is now the synchronous one, and it's possible to
> re-attach the failed standby live, with or without archiving (with is
> preferred so that the master isn't involved in the catch-up phase).
>
>> As synchronous standby currently is, it just doesn't fit the HA usage,
>
> It does actually allow both data high availability and service high
> availability, provided that you feed at least two standbys.

Still, it doesn't fit. You need more hardware, more power (and more money with it), a bigger carbon footprint... you get the point. Also, having 3 servers for your DB can be necessary (and possible) for some companies, but for others it isn't.

>
> What you seem to be asking is both data and service high availability
> with only two nodes. You're right that we can not provide that with
> current releases of PostgreSQL. I'm not sure anyone has a solid plan to
> make that happen.
>
>> and if you really want to keep it that way, it doesn't belong to the
>> HA chapter on the pgsql documentation, and should be moved. And NO
>> async replication will *not* work for HA, because the master can have
>> more transactions than standby, and if the master crashes, the standby
>> will have no way to recover these transactions, with synchronous
>> replication we have *exactly* what we need: the data in the standby,
>> after all, it will apply it once we promote it.
>
> Exactly. We want data availability first.
> Service availability is
> important too, and for that you need another standby.

Yeah, you need that with PostgreSQL, but not with DRBD, for example (sorry, but DRBD is one of the flagships of HA in the Linux world).

Also, I'm not convinced about the "2nd standby" thing... I mean, just read this in the docs, which is a little alarming:

"If primary restarts while commits are waiting for acknowledgement, those waiting transactions will be marked fully committed once the primary database recovers. There is no way to be certain that all standbys have received all outstanding WAL data at time of the crash of the primary. Some transactions may not show as committed on the standby, even though they show as committed on the primary. The guarantee we offer is that the application will not receive explicit acknowledgement of the successful commit of a transaction until the WAL data is known to be safely received by the standby."

So... there is no *real* guarantee here either... I don't know how I skipped that paragraph before today. This implies that a transaction could be marked as committed on the master while the app was never informed of it (and thus could try to send it again), and the transaction was NOT applied on the standby. How can this happen? When the master comes back, shouldn't the standby get the missing WAL pieces from the master and then apply the transaction? The standby part is the one I don't really get.

On the application side, well, there are several ways in which you can miss the "commit confirmation": connection issues at the worst moment, and the like. So I guess it is not *so* serious, and the app should have a way of checking its last transaction if it lost connectivity to the server before getting the commit confirmation.