Re: Postgresql replication - Mailing list pgsql-general

From William Yu
Subject Re: Postgresql replication
Date
Msg-id deqen3$miv$1@news.hub.org
In response to Re: Postgresql replication  (Chris Travers <chris@travelamericas.com>)
List pgsql-general
Our own personal IM :)

Chris Travers wrote:
> The delay will by definition defeat any guarantee of financial integrity
> if you are allowing read-write operations to the replica without
> checking with some sort of central authority.  At very least, the
> central authority should look for suspicious patterns.  Again, it may be
> possible to do some failover here, but I don't think you can do without
> *some* sort of centralized control.

Actually this is the easy part. When the home server finally issues
payments, it only issues what it knows about and what can be verified as
OK. Any transactions that are currently being entered on another server
will appear after the next replication cycle and it will be verified
afterwards. If the previous payment issuing cycle used up all the money,
the "new" requests are kept in pending until money is put in. This does
allow newer requests that happen to be executed on home servers to
take precedence over older requests, but there is no requirement in the
business process that payments must come out in any specific order.
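
A minimal sketch of one payment issuing cycle as described above, in
Python. The names (PaymentRequest, run_payment_cycle) and the use of a
single balance figure are assumptions made for illustration, not our
actual code. It only issues requests the home server already knows about
and has verified, and holds everything else as pending until more money
arrives.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PaymentRequest:
    # Hypothetical record shape; the real schema is not shown in this thread.
    request_id: str
    amount: Decimal
    verified: bool


def run_payment_cycle(balance, known_requests):
    """Issue what the home server knows about and can verify.

    Requests that are unverified, or that the remaining balance cannot
    cover, stay pending for a later cycle. Requests still being entered
    on other servers are simply absent from known_requests until the
    next replication cycle delivers them.
    """
    issued, pending = [], []
    for req in known_requests:
        if req.verified and req.amount <= balance:
            balance -= req.amount
            issued.append(req)
        else:
            pending.append(req)
    return issued, pending, balance
```

Because there is no ordering requirement, a smaller later request may be
issued while a larger earlier one waits in pending. That is a choice made
in this sketch, not something the business process dictates.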


> This doesn't make the security issue go away, but it may reduce it to an
> acceptable level. I.e. it is still possible for duplicates to be
> submitted just before and after a home server goes down, but this is a
> lot better than being able to have one transaction repeated on each
> server and then dealing with the massively overdrawn account.

The "home" server going down is the trickiest issue. Because when a
server disappears, is that because it went down temporarily? For good? A
temporary internet problem where nobody can get access to it? Or an
internet routing issue where just the connection between those two
servers is severed? If it's the last, users might still be doing stuff
on ServerA while ServerA is posting financials, but ServerB thinks the
server is down and decides to take over ServerA's duties. Of course, in
ServerA's view, it's ServerB and ServerC that are down -- not itself.

Maybe we can mitigate this by having more servers at more data centers
around the world so everybody can monitor everybody. At some point, if
you have N servers and N-1 servers say ServerA is down, it probably is
down. With a high enough N, ServerA could probably conclude that it
was the server severed from the internet and refuse to post any
financials until its connection to the outside world was restored, plus
some extra threshold.
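
A rough sketch of that idea in Python. Nothing like this is running
today; the function names, the strict-majority rule for the self-check,
and the cooldown parameter are all assumptions picked for illustration.

```python
from typing import Mapping


def peers_agree_down(votes: Mapping[str, bool]) -> bool:
    """votes maps each *other* server to whether it sees the target as down.

    Returns True only when every one of the N-1 peers reports the target
    as down. An empty vote set proves nothing, so it returns False.
    """
    return len(votes) > 0 and all(votes.values())


def may_post_financials(reachable_peers: int, total_peers: int,
                        seconds_since_restored: float,
                        cooldown_seconds: float) -> bool:
    """Self-check a server could run before posting financials.

    Assumption: a server that cannot reach a strict majority of its
    peers treats itself as the one that was severed. Once connectivity
    is back, it still waits out an extra cooldown before posting.
    """
    if 2 * reachable_peers <= total_peers:
        return False  # probably the isolated one; refuse to post
    return seconds_since_restored >= cooldown_seconds
```

With only two peers, reaching just one of them is not a strict majority,
so the server would hold off. That is the same reason two data centers
are not enough to automate this.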

This last problem, which luckily occurs rarely, we do by hand right now.
We're not ready to run this on full auto because we only have 2 data
centers (with multiple servers within each data center). The servers do
not have enough info to know which server is actually down in order to
auto-promote/demote. It does require staff that's not just in 1 location
though because our primary office going down w/ our local datacenter
would mean nobody there could do the switchover. (Assuming major natural
disaster that kept us from using our laptops at the local Starbucks to
do the work.)
