Thread: Advice on multi-machine high-reliability setup?

Advice on multi-machine high-reliability setup?

From: postgresql@phreaker.net
Hi,

I've done some work with databases, but nothing extreme.
I've just been assigned to set up what I'd call a
"high-reliability" site, as opposed to "high-availability" --
which I'd like too, of course :-)

I've got some ideas on how to achieve my goal, but I fear I'm
not quite up to date on the possibilities of replication and
such, and I haven't found much in-depth documentation on the
kind of setup I'd like (pointers, anyone?), so I'd appreciate
it if you could give my ideas the once-over and tell me if I'm
missing something.

Requirements:
=============

The basic model is that of a reference database: a repository
for data which is constantly consulted and regularly updated.
OK, that pretty much describes any database :-)  Anyway, a lot
more queries than updates.

I'll have one or more web servers using PHP to insert into the
database.  I want them to be limited to queries and inserts, so
that bugs in the PHP scripts can only insert bad data and not
remove good data.  I'd prefer any code that modifies the central
data to be server-side logic, rather than farming out "UPDATE"s
to PHP scripts.
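
Concretely, I'm imagining a dedicated database user for the web
side, something like this (only a sketch -- the user and object
names are made up):

    -- A role for the PHP front ends with no UPDATE/DELETE rights,
    -- so a buggy script can insert bad rows but not destroy good
    -- ones.  All names here are made up.
    CREATE USER webclient WITH PASSWORD 'secret';

    GRANT SELECT         ON data    TO webclient;  -- consulting
    GRANT SELECT, INSERT ON tickets TO webclient;  -- submitting updates

    -- the serial ticket column needs its sequence to be usable too
    -- (the name follows PostgreSQL's <table>_<column>_seq convention)
    GRANT SELECT, UPDATE ON tickets_ticket_seq TO webclient;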

I want to keep a record of all updates (made through client
scripts) for some time, and I do want to be able to query
the updates, which I suppose eliminates simple logging.  I
need a ticket number for each update (I'm avoiding the term
"transaction"), and the capability to see when and how the
update was fulfilled, which updates were made in a given
time-frame, etc.
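
For the record of updates, I'm picturing a table roughly along
these lines (again just a sketch, with made-up column names):

    -- Sketch of the update log; "ticket" doubles as the ticket
    -- number handed back to the client.
    CREATE TABLE tickets (
        ticket    serial PRIMARY KEY,
        time      timestamp DEFAULT now(),  -- when the request arrived
        who       text,                     -- which script/user submitted it
        name      text,                     -- key of the row to change
        newvalue  text,                     -- requested new value
        done      timestamp                 -- NULL until the update is applied
    );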

Consulting has to be immediate; updates don't have to go
through instantaneously, but I'd like them to be rather quick.
One or two minutes is OK, five minutes starts being a long time.

If the client scripts say "there was an error, try again",
it's not a severe problem.  If they say "OK" and the data
doesn't make it into the nightly backup, even once, that's
a showstopper, and nobody will use my system.  The smallest
failure that may be allowed to lose acknowledged data is the
near-simultaneous catastrophic loss of hard disks on two
separate machines.  (Yes, software RAID -- or maybe even
hardware if I get my hands on enough controllers -- will make
that four disks lost simultaneously :-))  I think a good way
would be for updating clients to write to one machine, and to
delay acknowledgement until the data can be read from the
second "slave" machine, possibly saying after some time "there
seems to be a delay, please recheck <URL with ticket number>
in a few minutes to see if your update went through".  Does
that sound feasible?

I need to be able to swap machines in and out.  I fully expect
my crummy hardware to break, and I know that sooner or later
I'll want to upgrade the OS in ways that require a reboot, or
upgrade the crummy hardware itself, and I don't want the
downtime.  I'm working on the premise that having multiple
cheap Linux boxes (I've got lots left over from layoffs) is
better than one or two really big expensive servers (no money
for that anyway).  I want to be able to take a new machine,
install it to specifications (or restore one from backup!),
switch it on, bring it up to date with the current database,
and let it take over as a hot backup and/or primary server.
Or as a web server, of course.

I don't have much of an idea on data size or update frequency,
which is one of the reasons I want to be able to plug in new
machines seamlessly; if load gets high, it means I'm popular,
and I'll be able to get better hardware :-)

My ideas:
=========

I'm thinking that the PHP scripts could write updates to a
"write-only" table/database/server.  I suppose there is a simple
way to make an insert into an auto-increment table and get back
the unique number to use as a ticket.
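
With a serial column, I believe something like this hands back
the ticket number (currval() is per-session, so it should be
safe even with several web servers inserting at once):

    -- Insert an update request and read back the generated ticket.
    INSERT INTO tickets (who, name, newvalue)
         VALUES ('Alice', 'XXX', 'YYY');

    -- currval() reports the value just generated in this session
    SELECT currval('tickets_ticket_seq') AS ticket;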

I'd use the newly released replication to keep the updates
and the data on a query-only server, an update only being
acknowledged to the user once it's registered on the query-only
server.

Once the update is registered, I'd use some kind of script
to convert "ticket=aaaaa, time=xx:xx who=Alice name=XXX
newvalue=YYY, done=NULL" into "UPDATE data SET value=YYY,
lastchange=aaaaa where name=XXX; UPDATE tickets set done=time
where ticket=aaaaa;".  Stored Procedure triggered on ticket
creation?  I've never used them . . . do they work across
machines?  That is, what would be the best way to have update
tables on one machine and data tables on another?  If I had
that, changing the data master would be transparent to users,
who'd just notice a five-minute delay before the update went
through instead of 30 (?) seconds.  It would be cool to use
transactions, but I don't think one transaction can refer to two
databases on two machines (yet)?
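
For the single-server case, I imagine the trigger would look
something like the sketch below, using the made-up tables from
above; as far as I know a trigger can only touch tables in the
same database, not tables on another machine:

    -- Sketch: apply a freshly inserted ticket to the data table and
    -- mark it done.  Assumes data(name, value, lastchange) as above.
    -- SECURITY DEFINER makes the updates run with the function
    -- owner's rights, so the restricted web user only needs INSERT
    -- on tickets.
    CREATE FUNCTION apply_ticket() RETURNS trigger AS '
    BEGIN
        UPDATE data
           SET value = NEW.newvalue,
               lastchange = NEW.ticket
         WHERE name = NEW.name;

        UPDATE tickets
           SET done = now()
         WHERE ticket = NEW.ticket;

        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql' SECURITY DEFINER;

    CREATE TRIGGER ticket_insert
        AFTER INSERT ON tickets
        FOR EACH ROW EXECUTE PROCEDURE apply_ticket();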

This should also enable me to restore a backup of the primary
data (in case of loss of the primary data), apply the updates
since the backup was made, and end up with an up-to-date
system. Hmm. Is this necessary if I have replication . . . is
replication necessary if I have this?
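
In other words, after restoring the backup I'd hope to get away
with something like this (a sketch; the ticket number of the
last update already contained in the backup is made up):

    -- Re-apply, per name, the newest ticket recorded after the
    -- backup was taken (41234 stands in for the last ticket that
    -- is already present in the restored backup).
    UPDATE data
       SET value = t.newvalue,
           lastchange = t.ticket
      FROM tickets t
     WHERE data.name = t.name
       AND t.ticket > 41234
       AND t.ticket = (SELECT max(t2.ticket)
                         FROM tickets t2
                        WHERE t2.name = t.name);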

My doubts:
==========

If I do manage to put updates and data on two different servers,
would it be possible to make a transaction on the data triggered
by the creation of an update ticket?  That way I could make
updates conditional on the data ("You wanted to update name=XXX
from ZZZ (the value when you queried) to YYY, but between your
query of the data and the update request another update went
through, setting the value to TTT").
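
If the ticket also carried the value the client saw, I imagine
the trigger sketch from above could be extended roughly like
this ("oldvalue" and "refused" would be extra, made-up columns
on the tickets table):

    -- Sketch: only apply the ticket if the value is still what the
    -- client saw; otherwise flag the ticket as refused.
    CREATE FUNCTION apply_ticket_checked() RETURNS trigger AS '
    BEGIN
        UPDATE data
           SET value = NEW.newvalue,
               lastchange = NEW.ticket
         WHERE name = NEW.name
           AND value = NEW.oldvalue;  -- no match if someone got there first

        IF FOUND THEN
            UPDATE tickets SET done = now() WHERE ticket = NEW.ticket;
        ELSE
            UPDATE tickets SET refused = true WHERE ticket = NEW.ticket;
        END IF;

        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql' SECURITY DEFINER;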

Could I maybe have several update databases (one per web server
for instance, each replicating their data onto a shared slave)
so that I can exchange the update master without user-perceived
downtime?

Can I do partial replication?  Querying updates is obviously a
totally different operation from querying the data, and it'll be
much less frequent, so could I have slaves for just the data?

(Sudden fundamental doubt) Replication doesn't just mean better
query time on the slave, it also means failover to the slave if
need be, right?

(Most important) Am I overcomplicating things?  I do want the
slightest data loss to require the complete and near-simultaneous
loss of two servers, and I do want to be able to add and remove
machines from my almost-cluster without users perceiving
downtime, but is there a simpler way to achieve that goal than
the multi-tiered cascading update system I've outlined?

Thanks a lot for any pointers or comments you can give me.

HAND

FATAL 2: (pg_clog ... no such file or directory)

From: kaolin fire
Where would I go to start tracking down recurring error messages of the
sort:

FATAL 2:  open of /usr/local/pgsql/data/pg_clog/06F7 failed: No such
file or directory
FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0707 failed: No such
file or directory

06F7 and 0707 do not exist.  Currently just looks like it goes from
0000 (May 14 2002) to 004F (Feb 6 2004, and counting).

I'm running 7.2.4

-kaolin fire


Re: FATAL 2: (pg_clog ... no such file or directory)

From: Tom Lane
kaolin fire <cognosco@tentacle.net> writes:
> Where would I go to start tracking down recurring error messages of the
> sort:

> FATAL 2:  open of /usr/local/pgsql/data/pg_clog/06F7 failed: No such
> file or directory
> FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0707 failed: No such
> file or directory

> 06F7 and 0707 do not exist.  Currently just looks like it goes from
> 0000 (May 14 2002) to 004F (Feb 6 2004, and counting).

Given those facts, you have corrupt data --- specifically, a wildly
out-of-range transaction number in some tuple header, causing the tuple
validity checker to try to fetch a nonexistent page of the CLOG.  The
odds are good that the corruption extends further than just the one
field; that just happens to be the one that gets checked first.

There are discussions in the mailing list archives about how to locate
and clean up corrupted data.  It's a pretty messy process but you can
usually get back everything except the rows on the particular corrupted
page (I'm optimistically assuming there's only one).  Looking for
threads mentioning pg_filedump might be the quickest way to find info.
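
(For illustration only -- this isn't the archived recipe itself,
and the table name is just a placeholder -- one crude way to
narrow it down is to read the suspect table in slices until the
error reproduces, then examine that area with pg_filedump:)

    -- When one of these slices trips the FATAL error, the damaged
    -- tuple lies somewhere in that range of rows; keep halving the
    -- offending slice until it is small enough to map onto a page.
    SELECT count(*) FROM (SELECT * FROM mytable LIMIT 50000 OFFSET 0)      AS s;
    SELECT count(*) FROM (SELECT * FROM mytable LIMIT 50000 OFFSET 50000)  AS s;
    SELECT count(*) FROM (SELECT * FROM mytable LIMIT 50000 OFFSET 100000) AS s;
    -- ...and so on through the table.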

            regards, tom lane

Re: FATAL 2: (pg_clog ... no such file or directory)

From: "scott.marlowe"
On Fri, 6 Feb 2004, Tom Lane wrote:

> kaolin fire <cognosco@tentacle.net> writes:
> > Where would I go to start tracking down recurring error messages of the
> > sort:
>
> > FATAL 2:  open of /usr/local/pgsql/data/pg_clog/06F7 failed: No such
> > file or directory
> > FATAL 2:  open of /usr/local/pgsql/data/pg_clog/0707 failed: No such
> > file or directory
>
> > 06F7 and 0707 do not exist.  Currently just looks like it goes from
> > 0000 (May 14 2002) to 004F (Feb 6 2004, and counting).
>
> Given those facts, you have corrupt data --- specifically, a wildly
> out-of-range transaction number in some tuple header, causing the tuple
> validity checker to try to fetch a nonexistent page of the CLOG.  The
> odds are good that the corruption extends further than just the one
> field; that just happens to be the one that gets checked first.
>
> There are discussions in the mailing list archives about how to locate
> and clean up corrupted data.  It's a pretty messy process but you can
> usually get back everything except the rows on the particular corrupted
> page (I'm optimistically assuming there's only one).  Looking for
> threads mentioning pg_filedump might be the quickest way to find info.

And don't forget: if you've got corrupt data, you need to check
all your hardware thoroughly.  A single bad bit in memory can
play more havoc with your system than almost anything else,
because it fails just often enough to cause problems, but seldom
enough that it's very hard to track down.