Thread: replication/redundancy

replication/redundancy

From
"Dave [Hawk-Systems]"
Date:
some searches on this have produced mixed results...

do we have a stable means to replicate transactions between two physical
servers, preferrably in master - master configuration where updates/inserts can
be done on either database and the results replicated to the other master.

Dave




Re: replication/redundancy

From
weigelt@metux.de
Date:
On Fri, Jun 27, 2003 at 08:05:02PM -0400, Dave [Hawk-Systems] wrote:
> some searches on this have produced mixed results...
>
> do we have a stable means to replicate transactions between two physical
> servers, preferrably in master - master configuration where updates/inserts can
> be done on either database and the results replicated to the other master.

I've integrated an replication support in my object abstraction layer in php.
(evrything is running over this layer - no direct queries in the application)

in short words:
each table has the fields inode_id (int4) and mtime (timestamp). on each write
operation, the mtime must be updated. from time to time (i.e. once per minute)
an script runs over the db, fetches out all new records and posts them to
the other nodes. this way, evry node can write anytime (masterless). but this
brings some problems on highly-written databases - conflicts may occour.

i didnt find an better solution yet, but still works for all of my apps,
even running over email :)

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS
 Webhosting ab 5 EUR/Monat.          UUCP, rawIP und vieles mehr.

 phone:     +49 36207 519931         www:       http://www.metux.de/
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481         smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 Diese Mail wurde mit UUCP versandt.      http://www.metux.de/uucp/

Re: replication/redundancy

From
Toni Schlichting
Date:

weigelt@metux.de wrote:

>On Fri, Jun 27, 2003 at 08:05:02PM -0400, Dave [Hawk-Systems] wrote:
>
>
>>some searches on this have produced mixed results...
>>
>>do we have a stable means to replicate transactions between two physical
>>servers, preferrably in master - master configuration where updates/inserts can
>>be done on either database and the results replicated to the other master.
>>
I fear up to date there is no such facility available. As far as I have
understood
your question you intend to copy all transactions from database-A to
database-B and
continously rollforward database-B, or so.

It seems pgreplication will be the tool you are looking. it is under way
but it is
not here yet. It looks as if pgreplication will do much more than just
enabling a
hot standby. What I have read up to now looks more like a
quorum-facility. I hope,
I didn't get this wrong, because I would have use for something like this.

Ciao, Toni



Re: replication/redundancy

From
Jonathan Gardner
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 29 June 2003 01:28, Toni Schlichting wrote:
> weigelt@metux.de wrote:
> It seems pgreplication will be the tool you are looking. it is under way
> but it is
> not here yet. It looks as if pgreplication will do much more than just
> enabling a
> hot standby. What I have read up to now looks more like a
> quorum-facility. I hope,
> I didn't get this wrong, because I would have use for something like this.
>

Yes, replication is going to be the next BIG thing in the database world.
Right now, no one (not even Oracle, and I've experience with their system)
has a good solution out there. The stuff that pgreplication is doing is
earth-shatteringly huge. Read through the research paper and see for yourself
what I mean.

    http://www.cs.mcgill.ca/~kemme/papers/vldb00.html

What the original poster is looking for is called "peer-to-peer eager
replication". "peer-to-peer" because there is no master server, "eager"
because the two databases are always exactly the same.

The really cool thing about pgreplication will be that it is hot-swappable. If
one of the servers goes down, the whole thing will still work. You can also
add databases to the system on-the-fly. That way, you can always keep one (or
two or three) extra stand-by databases.

- --
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/AFe9WgwF3QvpWNwRAvRxAKClx8twjsSyQ0uwqr1ukSUYeQtkuACgxxz3
tsl8mOP53S8PRwMPOMPPy24=
=yDXw
-----END PGP SIGNATURE-----

Re: replication/redundancy

From
weigelt@metux.de
Date:
On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:

<snip>
> The really cool thing about pgreplication will be that it is
> hot-swappable. If one of the servers goes down, the whole thing will
> still work. You can also add databases to the system on-the-fly.
> That way, you can always keep one (or two or three) extra stand-by databases.

This is also provided by my system, but with some strict limitations:

* all tables must have the same first three fields: inode_id, mtime, attr
  (im using inherited tables, perhaps thats not needed)
* on each write operation, the mtime field must be set accurately
  (im currently not using an rule, since i dont know if there's an
  way to disabled this within an query -- perhaps someone can help here)
* currently only an explicit sync-out is supported - from time to time
  evry table has to be scanned for new records
* currently no real conflict handling

perhaps we can improve this a little bit.

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS
 Webhosting ab 5 EUR/Monat.          UUCP, rawIP und vieles mehr.

 phone:     +49 36207 519931         www:       http://www.metux.de/
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481         smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 Diese Mail wurde mit UUCP versandt.      http://www.metux.de/uucp/

Re: replication/redundancy

From
weigelt@metux.de
Date:
On Tue, Jul 01, 2003 at 08:22:37AM -0700, Jonathan Gardner wrote:

<snip>
> So you are using "lazy" rather than "eager" replication.
correct.

<snip>
>   1) The data is not consistent. This means if you run the same select query
> at the same time on the two databases, you may get different results. For
> some situations, that is okay (like Usenet). For others, it is not. (like
> registrations -- you'll sign up on one database, but you won't appear on the
> other.)
Yes. For those cases, an explicit syncout should be triggered as fast
as possible.
What you of course cannot is to store session data into the clustered DB.
All requests to same session must go to the same host. In web applications
this can be done by redirecting the client to an per-server-vhost before
creating the session. (i.e. from www.metux.de to s[1..n].metux.de).
So the session data is always available for the application. In the case
of failure only those sessions are lost, which run on the failing server.

Another solution could be an NFS-shared session directory or an rpc-based
update notification system. Perhaps we could implement some remote locking
features direct into the postmaster.

<snip>
>   3) These two factors above make using the second database as a hot-swappable
> backup risky at best. You will lose some data when you switch to the backup,
> unless changes to the database are so rare that the backup is usually up to
> date. If that were the case, you probably don't need the backup in the first
> place, because databases that don't do much tend not to be very important.
For my current applications it seems ok that some DBs are one or two
minutes behind.

<snip>
> The most obvious problem with this comes from incrementing a column.
I had such an case. If i want to count objects, i normally put them
into an table and count the rows (i.e. from an view).

If both servers add an new object, there will be no conflict.

<snip>
> This would be bad for things like paypal, where your account would only
> increase by one or the other account transfers, rather than both, if two
> occured at the same time.
Well, for such applications, the actual account value must be calculated
from an checkpoint and the lasted transactions. Checkpoints may be made
only by an master (which may change from time to time)

cu
--
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux ITS
 Webhosting ab 5 EUR/Monat.          UUCP, rawIP und vieles mehr.

 phone:     +49 36207 519931         www:       http://www.metux.de/
 fax:       +49 36207 519932         email:     contact@metux.de
 cellphone: +49 174 7066481         smsgate:   sms.weigelt@metux.de
---------------------------------------------------------------------
 Diese Mail wurde mit UUCP versandt.      http://www.metux.de/uucp/

Re: replication/redundancy

From
Jonathan Gardner
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 30 June 2003 09:17, weigelt@metux.de wrote:
> On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:
>
> * currently only an explicit sync-out is supported - from time to time
>   evry table has to be scanned for new records

So you are using "lazy" rather than "eager" replication. I am sure you know
the limitations for lazy replication. Let me enumerate them here for those of
you who aren't familiar with this:

  1) The data is not consistent. This means if you run the same select query
at the same time on the two databases, you may get different results. For
some situations, that is okay (like Usenet). For others, it is not. (like
registrations -- you'll sign up on one database, but you won't appear on the
other.)

  2) The "other" process that does the synchronization is serial in nature.
The processes that change the database are parallel in nature. It is very
possible to have changes happening to the database faster than you can
replicate them. This was a real problem at a web company I recently worked
for that used lazy replication. Their backup database fell weeks behind the
live database. It almost got to the point where recreating the entire
database would've been faster than waiting for the replication process to
catch up.

  3) These two factors above make using the second database as a hot-swappable
backup risky at best. You will lose some data when you switch to the backup,
unless changes to the database are so rare that the backup is usually up to
date. If that were the case, you probably don't need the backup in the first
place, because databases that don't do much tend not to be very important.

>
> * currently no real conflict handling
>

What he is talking about here is what happens when two seperate processes are
working on the same rows. PostgreSQL uses transactions and locking right now,
so two processes on the same system cannot do this. However, his system
cannot handle this at all when the two processes are on seperate machines.

The most obvious problem with this comes from incrementing a column. If both
processes try to increment the same column, then they will end up with the
column incremented by one or the other, but not both. This would be bad for
things like paypal, where your account would only increase by one or the
other account transfers, rather than both, if two occured at the same time.

>
> perhaps we can improve this a little bit.
>

I would hope you spend some time researching what others have done. Relational
databases are an area that a tremendous amount of solid research has already
occured. Applying yourself to understand the research and projects that have
gone before you will save yourself a lot of time replicating their work. In
other words, "If I have seen farther, it is because I have stood on the
shoulders of giants" to (mis?)quote Newton.

Again, to re-emphasize why pgreplication is so cool and why everyone should be
excited about this:
  1) Database theory says that scaleable, eager replication is impossible.
This is true in practice.
  2) The Postgres-R team discovered a way to make scaleable, eager replication
work. The restriction is that locks, once granted, may be aborted or revoked.
  3) This means you will one day be able to setup a beowulf-type cluster of
postgres databases that will rival the most powerful databases on earth
today.

- --
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/Aac+WgwF3QvpWNwRAgFxAJ9Mxesnc6Q3wLrUcL1Zz62AGLLjGACcCYJp
zcV9rFm8TiqH90N6eSpRQnY=
=/bFm
-----END PGP SIGNATURE-----