Re: 24x7x365 high-volume ops ideas - Mailing list pgsql-general

From Christopher Browne
Subject Re: 24x7x365 high-volume ops ideas
Date
Msg-id m3y8hdnfff.fsf@knuth.knuth.cbbrowne.com
In response to 24x7x365 high-volume ops ideas  ("Ed L." <pgsql@bluepolka.net>)
Responses Re: 24x7x365 high-volume ops ideas
List pgsql-general
A long time ago, in a galaxy far, far away, Karim.Nassar@NAU.EDU (Karim Nassar) wrote:
> On Wed, 2004-11-03 at 18:10, Ed L. wrote:
>> unfortunately, the requirement is 100% uptime all the time, and any
>> downtime at all is a liability.  Here are some of the issues:
>
> Seems like 100% uptime is always demanded, but it's not even close
> to reality. I think it's unreasonable to expect a single piece of
> software NEVER to be restarted. Never is a really long time.
>
> For this case, isn't replication sufficient? (FWIW, in 1 month I
> have to answer this same question). Would this work?
>
> * 'Main' db server up 99.78% of time
> * 'Replicant' up 99.78% of time (using slony, dbmirror)
> * When Main goes down (crisis, maintenance), Replicant answers for Main,
>   in a read-only fashion.
> * When Main comes back up, any waiting writes can now happen.
> * Likewise, Replicant can be taken down for maint, then Main syncs to it
>   when going back online.
>
> Is this how it's done?

The challenge lies in two places:

1.  You need some mechanism to detect that the "replica" should take
over, and to actually perform that takeover.

That "takeover" requires having some way for your application to
become aware of the new IP address of the DB host.
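One way to sidestep the "new IP address" problem on the client side is
to have the application probe an ordered list of candidate hosts and
connect to the first one that answers.  A minimal sketch, assuming a
plain TCP reachability check on the PostgreSQL port (the host names
and ports below are illustrative, not from this thread):

```python
import socket

def pick_live_host(candidates, timeout=1.0):
    """Return the first (host, port) pair accepting a TCP connection.

    `candidates` is an ordered list such as
    [("main.example.com", 5432), ("replica.example.com", 5432)];
    the order encodes the preference for the master.
    """
    for host, port in candidates:
        try:
            # If the connect succeeds, the host is at least listening;
            # close the probe socket immediately and hand back the pair.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused or timed out -- fall through to the next candidate.
            continue
    raise RuntimeError("no database host reachable")
```

An application would call this before (re)connecting, so a takeover
only requires the standby to start answering on its port; note that a
bare TCP check cannot tell a read-only replica from a writable master,
so a real probe would also run a query against the server.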

2.  Some changes need to take place in order to prepare the "replica"
to be treated as "master."

For instance, in the case of Slony-I, you can do a full-scale
"failover" where you tell it to treat the "main" database as dead.
At that point, the replica becomes the master, and the former
"master" is discarded as dead.

Alternatively, there's a "MOVE SET" which is suitable for predictable
maintenance; that shifts the "master" node from one node to another;
you can take MAIN out of service for a while, and add it back, perhaps
making it the "master" again.
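The planned-maintenance path above can be sketched as a slonik script.
This is only an illustration: the cluster name, node ids, and conninfo
strings are assumptions, not taken from this thread.

```
# Assumed two-node Slony-I cluster: node 1 = main, node 2 = replica.
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=main';
node 2 admin conninfo = 'dbname=app host=replica';

# MOVE SET requires the origin to be locked first, so no writes
# slip in while the set changes origin.
lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
```

For the crisis case, FAILOVER (ID = 1, BACKUP NODE = 2) would instead
abandon node 1 outright, as described above; unlike MOVE SET, that is
not reversible by simply bringing node 1 back.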

None of these systems _directly_ addresses how applications get
pointed at whichever server is currently the master.

A neat approach would be to make pgpool, a C-based connection-pool
manager, Slony-I-aware.  If pgpool itself submitted the MOVE SET or
FAILOVER, it would know which database to point connections at, so
applications passing their requests through pgpool would not need to
be aware of the change, beyond perhaps seeing some transactions
terminated.  That won't be ready tomorrow...

Something needs to be "smart enough" to point apps to the right place;
that's something to think about...
--
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/advocacy.html
"XFS might  (or might not)  come out before  the year 3000.  As far as
kernel patches go,  SGI are brilliant.  As far as graphics, especially
OpenGL,  go,  SGI is  untouchable.  As  far as   filing  systems go, a
concussed doormouse in a tarpit would move faster."  -- jd on Slashdot
