Thread: Log-based replication?
I would eventually like to:

1) find a way to (stress) test a mockup database server.

2) set up near-realtime replication of our production database server
   for pulling out statistics and reports.  Minor bugs would be
   permissible in such a setup.  We don't need a full-fledged data
   warehouse solution just yet.

3) set up a failover solution *)

Now, why not just get the production database to log all the queries,
and feed them into a replica database?  I guess this solution could be
used for achieving any of the three things above.  The idea is very
trivial, but I never found any pointers while googling, so I assume
there are some problems with this approach?

Here are some of the problems I can think of at the moment:

- logging all queries at the production database will slow it down
  considerably (I haven't done much testing here)

- the transactional model can easily be broken (though postgres can log
  transaction starts, commits and rollbacks, and the transactions are
  also supposed to be serializable ... so I don't see the issue?)

- it could be disregarded due to the resource consumption on the
  replica server

- some of the transactions aren't really serializable, or rely on the
  exact timestamp of the operation **)

- unlike the WAL, the query log doesn't get synced for every
  transaction, and this will cause delays and potentially data loss

...anything else?

The simplest thing would be to open one connection towards the replica
for every connection made towards the production database, run every
query in order, and hope for the best - that should work well enough
for problems 1 and 2 above.

Still, it may be better (for 2) to filter out only the queries that
modify the tables and only the transactions ending with a commit - and
replay the transactions one by one, ordered by commit time.  That would
save quite some CPU cycles on the replica database compared to the
suggestion above.

I was administering a mysql database quite some years ago ... well, it
was a simple hobby project and we didn't even need transactional
operations.  However, we did need point-in-time recovery.  The mysql
way of solving this was to write a separate log containing only the
queries that write to the database.  This log was really nice to work
with, and it could easily be used for replication as well.

*) The boss has decided that a failover solution is important and
should be prioritized in 2007 ... even to the point where he's willing
to throw money at it.  If anyone has experience with failover solutions
built over a SAN, and can do consultancy services for us, please send
me a private email ;-)

**) We have had problems with transactions doing "if not exists in
database then insert".  Two such transactions can run simultaneously
and result in duplicates.
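The race described in **) can usually be closed on the client side by
relying on a UNIQUE constraint instead of a "check then insert"
sequence.  A minimal sketch of that approach follows; the items table,
its columns, and the use of psycopg2 are only assumptions made for the
illustration, not anything from the setup described above.

    import psycopg2

    conn = psycopg2.connect("dbname=production")

    def insert_once(key, value):
        """Insert (key, value) unless the key already exists.

        Relies on a UNIQUE constraint on items.key, so two concurrent
        callers cannot both create the row; the loser gets a
        duplicate-key error instead of silently producing a duplicate.
        """
        cur = conn.cursor()
        try:
            cur.execute("INSERT INTO items (key, value) VALUES (%s, %s)",
                        (key, value))
            conn.commit()
        except psycopg2.IntegrityError:
            conn.rollback()  # someone else inserted it first - not an error

The same idea works for any "insert if missing" pattern: let the
constraint arbitrate the race and treat the duplicate-key error as the
"already exists" branch.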
For your immediate needs I'd recommend Slony.

It's a reliable replication solution that works quite nicely.  It will
sync the replica in nearly real-time and you won't have any
transactional problems either.

Uwe

On Friday 20 October 2006 21:22, Tobias Brox wrote:
> I would eventually like to:
>
> 1) find a way to (stress) test a mockup database server.
>
> 2) set up near-realtime replication of our production database server
>    for pulling out statistics and reports.
>
> 3) set up a failover solution *)
>
> [rest of the original post snipped]

--
UC

--
Open Source Solutions 4U, LLC
1618 Kelly St                     Phone:  +1 707 568 3056
Santa Rosa, CA 95401              Cell:   +1 650 302 2405
United States                     Fax:    +1 707 568 6416
> For your immediate needs I'd recommend Slony.
> It's a reliable replication solution that works quite nicely.  It
> will sync the replica in nearly real-time and you won't have any
> transactional problems either.

It seems that Slony will pretty much do everything mentioned.  But can
it do point-in-time recovery?

Regards,
Richard Broersma Jr.
> *) The boss has decided that a failover solution is important and
>    should be prioritized in 2007 ... even to the point where he's
>    willing to throw money at it.  If anyone has experience with
>    failover solutions built over a SAN, and can do consultancy
>    services for us, please send me a private email ;-)
>
> **) We have had problems with transactions doing "if not exists in
>     database then insert".  Two such transactions can run
>     simultaneously and result in duplicates.

You have a couple of options:

1. Slony - trigger based
2. Mammoth Replicator - transaction log based
3. PITR log shipping on timed intervals

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
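For option 3, a rough sketch of what the "timed intervals" part can
look like: the primary archives WAL segments with archive_command into
a local directory, and a small loop (or a cron job) pushes that
directory to the standby, which keeps restoring the segments.  The
paths, host name and interval below are made up, and rsync is just one
way of shipping the files.

    import subprocess
    import time

    ARCHIVE_DIR = "/var/lib/pgsql/wal_archive/"   # filled by archive_command
    STANDBY = "standby.example.com:/var/lib/pgsql/wal_archive/"

    def ship_wal_forever(interval_seconds=60):
        """Push newly archived WAL segments to the standby once a minute."""
        while True:
            # rsync only transfers the segments the standby doesn't have yet
            subprocess.call(["rsync", "-a", ARCHIVE_DIR, STANDBY])
            time.sleep(interval_seconds)

    if __name__ == "__main__":
        ship_wal_forever()

The obvious trade-off is that the standby lags by up to one interval
plus whatever sits in the current, not-yet-archived WAL segment.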
[Joshua D. Drake - Sat at 08:14:51AM -0700]
> 1. Slony - trigger based

I don't know ... I'm not overly happy with the idea of using triggers
for the replication.  I'm a bit concerned about the overhead, both on
the primary database server and the administrative cost of setting it
up and maintaining it.  Still, Slony really seems to be the most
recommended route for postgres replication now, so I hope my concerns
are unfounded.

> 2. Mammoth Replicator - transaction log based

Hm ... I don't really like the smell of proprietary binary code.

Anyway, I was really happy with the mysql update log ... it was such a
simple multi-purpose tool: it could be used for debugging, for rescue
after a power failure or sysadmin mistakes, simple PITR, and much more.
The idea of using transaction logs for replication is appealing: the
replication logic is done outside the database itself and doesn't
interfere with it, and it works without significant resource
consumption on the master server (since the database has to write those
transaction logs anyway).

> 3. PITR log shipping on timed intervals

We already do that (archive_command), but it's not exactly a hot
standby failover solution.  Even if we automated everything, it would
still take quite a long time to get the replica up and running.  Also,
some transactions would be lost, even if we constantly rsynced the WAL
files.

I don't think any of the solutions above satisfies my immediate need
right now - stress testing.  Right now we have a twin brother of our
database server standing idle.  My boss wants some numbers on how much
more traffic our database server can handle, and I want to know what
effect I'll get from tuning the postgres config.  The best way I can
think of right now for measuring this is to log all our transactions
(including read-only and rolled-back transactions) for a while, and
"play them back" on the replica server at increased speed ...
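For the stress-testing idea in that last paragraph, the replay part can
be quite small.  The sketch below assumes the statements have already
been extracted from the server log (log_statement = 'all') into a plain
file with one "<seconds-since-start>TAB<statement>" line per query; the
file name, the speedup factor and the use of psycopg2 are all just
assumptions for illustration.

    import time
    import psycopg2

    SPEEDUP = 2.0   # replay twice as fast as the original traffic

    def replay(log_path, dsn="dbname=replica"):
        """Replay logged statements against the replica, compressing
        the original inter-statement gaps by SPEEDUP."""
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        start = time.time()
        for line in open(log_path):
            offset, statement = line.rstrip("\n").split("\t", 1)
            # run the statement when it is "due" in compressed time
            due = float(offset) / SPEEDUP
            delay = due - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            try:
                cur.execute(statement)
                conn.commit()
            except psycopg2.Error:
                conn.rollback()  # e.g. statements from rolled-back transactions

    if __name__ == "__main__":
        replay("statements.log")

To come closer to the "one replica connection per production
connection" idea from the first mail, the log could be split per
backend process and one such replay process started per file.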