Thread: Log-based replication?
I would eventually like to:

1) find a way to (stress) test a mockup database server.

2) set up near-realtime replication of our production database server
   for pulling out statistics and reports.  Minor bugs would be
   permissible in such a setup.  We don't need a full-fledged data
   warehouse solution just yet.

3) set up a failover solution *)

Now, why not just get the production database to log all the queries,
and feed them into a replica database?  I guess this solution could be
used for achieving any of the three things above.  The idea is very
trivial, but I never found any pointers while googling, so I assume
there are some problems with this approach?

Here are some of the problems I can think of at the moment:

- logging all queries at the production database will slow it down
  considerably (I haven't done much testing here)

- the transactional model can easily be broken (though postgres can log
  transaction starts, commits and rollbacks, and the transactions are
  also supposed to be serializable ... so I don't see the issue?)

- it could be disregarded due to the resource consumption on the
  replica server

- some of the transactions aren't really serializable, or rely on the
  exact timestamp of the operation **)

- unlike the WAL, the query log doesn't get synced for every
  transaction, and this will cause delays and potentially data loss

...anything else?

The simplest thing would be to open one connection towards the replica
for every connection made towards the production database, run every
query in order, and hope for the best - that should work well enough
for problems 1 and 2 above.

Still, it may be better (for 2) to filter out only the queries that
modify the tables and only the transactions ending with a commit - and
replay the transactions one by one, ordered by commit time.  That would
save quite some CPU cycles on the replica database compared to the
suggestion above.

I was administering a mysql database quite some years ago ... well, it
was a simple hobby project and we didn't even need transactional
operations.  However, we did need point-in-time recovery.  The mysql
way of solving this was to write a separate log containing only the
queries that write to the database.  This log was really nice to work
with, and it could easily be used for replication as well.

*) The boss has decided that a failover solution is important and
should be prioritized in 2007 ... even to the point where he's willing
to throw money at it.  If anyone has experience with failover solutions
built over a SAN, and can do consultancy services for us, please send
me a private email ;-)

**) We have had problems with transactions doing "if not exists in
database then insert".  Two such transactions can run simultaneously
and result in duplicates.
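The race described in **) can usually be closed on the client side by
relying on a UNIQUE constraint instead of a "check then insert"
sequence.  A minimal sketch of that approach follows; the items table,
its columns, and the use of psycopg2 are only assumptions made for the
illustration, not anything from the setup described above.

    import psycopg2

    conn = psycopg2.connect("dbname=production")

    def insert_once(key, value):
        """Insert (key, value) unless the key already exists.

        Relies on a UNIQUE constraint on items.key, so two concurrent
        callers cannot both create the row; the loser gets a
        duplicate-key error instead of silently producing a duplicate.
        """
        cur = conn.cursor()
        try:
            cur.execute("INSERT INTO items (key, value) VALUES (%s, %s)",
                        (key, value))
            conn.commit()
        except psycopg2.IntegrityError:
            conn.rollback()  # someone else inserted it first - not an error

The same idea works for any "insert if missing" pattern: let the
constraint arbitrate the race and treat the duplicate-key error as the
"already exists" branch.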
For your immediate needs I'd recommend Slony.

It's a reliable replication solution that works quite nicely.  It will
sync the replica in nearly real-time and you won't have any
transactional problems either.

Uwe

On Friday 20 October 2006 21:22, Tobias Brox wrote:
> I would eventually like to:
>
> 1) find a way to (stress) test a mockup database server.
>
> 2) set up near-realtime replication of our production database server
>    for pulling out statistics and reports.
>
> 3) set up a failover solution *)
>
> [rest of the original post snipped]

--
UC

--
Open Source Solutions 4U, LLC
1618 Kelly St                     Phone:  +1 707 568 3056
Santa Rosa, CA 95401              Cell:   +1 650 302 2405
United States                     Fax:    +1 707 568 6416
> For your immediate needs I'd recommend Slony.
> It's a reliable replication solution that works quite nicely.  It
> will sync the replica in nearly real-time and you won't have any
> transactional problems either.

It seems that Slony will pretty much do everything mentioned.  But can
it do point-in-time recovery?

Regards,
Richard Broersma Jr.
> *) The boss has decided that a failover solution is important and
>    should be prioritized in 2007 ... even to the point where he's
>    willing to throw money at it.  If anyone has experience with
>    failover solutions built over a SAN, and can do consultancy
>    services for us, please send me a private email ;-)
>
> **) We have had problems with transactions doing "if not exists in
>     database then insert".  Two such transactions can run
>     simultaneously and result in duplicates.

You have a couple of options:

1. Slony - trigger based
2. Mammoth Replicator - transaction log based
3. PITR log shipping on timed intervals

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
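For option 3, a rough sketch of what the "timed intervals" part can
look like: the primary archives WAL segments with archive_command into
a local directory, and a small loop (or a cron job) pushes that
directory to the standby, which keeps restoring the segments.  The
paths, host name and interval below are made up, and rsync is just one
way of shipping the files.

    import subprocess
    import time

    ARCHIVE_DIR = "/var/lib/pgsql/wal_archive/"   # filled by archive_command
    STANDBY = "standby.example.com:/var/lib/pgsql/wal_archive/"

    def ship_wal_forever(interval_seconds=60):
        """Push newly archived WAL segments to the standby once a minute."""
        while True:
            # rsync only transfers the segments the standby doesn't have yet
            subprocess.call(["rsync", "-a", ARCHIVE_DIR, STANDBY])
            time.sleep(interval_seconds)

    if __name__ == "__main__":
        ship_wal_forever()

The obvious trade-off is that the standby lags by up to one interval
plus whatever sits in the current, not-yet-archived WAL segment.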
[Joshua D. Drake - Sat at 08:14:51AM -0700]
> 1. Slony - trigger based

I don't know ... I'm not overly happy with the idea of using triggers
for the replication.  I'm a bit concerned about the overhead, both on
the primary database server and the administrative cost of setting it
up and maintaining it.  Still, Slony really seems to be the most
recommended route for postgres replication now, so I hope my concerns
are unfounded.

> 2. Mammoth Replicator - transaction log based

Hm ... I don't really like the smell of proprietary binary code.

Anyway, I was really happy with the mysql update log ... it was such a
simple multi-purpose tool: it could be used for debugging, for rescue
after a power failure or sysadmin mistakes, simple PITR, and much more.
The idea of using transaction logs for replication is appealing: the
replication logic is done outside the database itself and doesn't
interfere with it, and it works without significant resource
consumption on the master server (since the database has to write those
transaction logs anyway).

> 3. PITR log shipping on timed intervals

We already do that (archive_command), but it's not exactly a hot
standby failover solution.  Even if we automated everything, it would
still take quite a long time to get the replica up and running.  Also,
some transactions would be lost, even if we constantly rsynced the WAL
files.

I don't think any of the solutions above satisfies my immediate need
right now - stress testing.  Right now we have a twin brother of our
database server standing idle.  My boss wants some numbers on how much
more traffic our database server can handle, and I want to know what
effect I'll get from tuning the postgres config.  The best way I can
think of right now for measuring this is to log all our transactions
(including read-only and rolled-back transactions) for a while, and
"play them back" on the replica server at increased speed ...
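For the stress-testing idea in that last paragraph, the replay part can
be quite small.  The sketch below assumes the statements have already
been extracted from the server log (log_statement = 'all') into a plain
file with one "<seconds-since-start>TAB<statement>" line per query; the
file name, the speedup factor and the use of psycopg2 are all just
assumptions for illustration.

    import time
    import psycopg2

    SPEEDUP = 2.0   # replay twice as fast as the original traffic

    def replay(log_path, dsn="dbname=replica"):
        """Replay logged statements against the replica, compressing
        the original inter-statement gaps by SPEEDUP."""
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        start = time.time()
        for line in open(log_path):
            offset, statement = line.rstrip("\n").split("\t", 1)
            # run the statement when it is "due" in compressed time
            due = float(offset) / SPEEDUP
            delay = due - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            try:
                cur.execute(statement)
                conn.commit()
            except psycopg2.Error:
                conn.rollback()  # e.g. statements from rolled-back transactions

    if __name__ == "__main__":
        replay("statements.log")

To come closer to the "one replica connection per production
connection" idea from the first mail, the log could be split per
backend process and one such replay process started per file.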