Thread: Replication for a large database
Hello all,

I apologize if this has already been covered in the past, but I couldn't find an adequate solution to my problem in the archives.

I have a database that is used for a bandwidth tracking system at a broadband ISP. To make a long story short, I'm inserting over 800,000 records per day into this database. Suffice it to say, the uptime of this database is of paramount importance, so I would like to have a more up-to-date backup copy of my database in the event of a failure (more recent than my twice-per-day db_dump backup).

I have two servers, both dual 2GHz Xeons with 4GB of RAM, and would like to replicate between the two. I would like to have "live" replication, but I couldn't find a solution for that for PostgreSQL. I tried RServ but, after attempting it, I saw a mailing list posting saying that it is more or less useless for databases that have a large number of inserts (like mine).

When I perform a replication after a batch of data is inserted, the query literally runs for hours before it returns. I have never actually been present for an entire replication run, since it takes longer than my 8-12 hour days here at work.

Is there any replication solution that would fit my needs? I'm taking advantage of some PG 7.2 features, so "downgrading" to the 6.x version of Postgres that has replication support isn't an option.

Thanks.

--man
Ok, maybe this is just because I'm coming from a layman's perspective regarding enterprise-level databases, but couldn't you fake replication by inserting the data into both databases? (Granted, this involves having source access to the program doing the insertion.)

It may be a kludge, but it would work until something better came along.

Ryan
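Ryan's dual-write suggestion could be sketched roughly like this. This is a minimal illustration, not production code: sqlite3 stands in for the two PostgreSQL connections, and the `usage` table and values are made up.

```python
import sqlite3

# sqlite3 in-memory databases stand in for the two PostgreSQL servers,
# purely for illustration.
servers = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]
for db in servers:
    db.execute("CREATE TABLE usage (modem TEXT, bytes INTEGER)")

def insert_everywhere(sql, params):
    # "Fake replication": repeat the same write on every server.
    for db in servers:
        db.execute(sql, params)
        db.commit()

insert_everywhere("INSERT INTO usage VALUES (?, ?)", ("modem-001", 52428800))

# Both copies now hold the same row.
counts = [db.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
          for db in servers]
print(counts)  # [1, 1]
```

The obvious weakness is divergence: if the write succeeds on one server and fails on the other, nothing here brings the two copies back in sync.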
Here's something interesting: http://www.objectweb.org/c-jdbc/index.html

It's alpha quality, and it's not "true" replication, but it might be interesting to keep an eye on. I've only read about it, so I can't claim any personal experience. It sounds like a good solution for this situation, if it works as advertised.

MT

--- Ryan <pgsql-sql@seahat.com> wrote:
> Ok, maybe this is just because I'm coming from a layman's perspective
> regarding enterprise-level databases, but couldn't you fake replication
> by inserting the data into both databases?
I have thought about this. The problem I run into is data consistency. I have at least 8 different processes that harvest data, and an intranet website that can also manipulate the database (to assign customers to different packages, re-assign modems to different customers, etc.). Trying to maintain consistency across the entire application would be such a nightmare, I don't want to think about it.

If I go with a centralized middleware server that manages all database access, then I could perhaps do that in there... and then I could use transactions on both databases, and if either transaction fails I'll roll back the other. But this would make my entire framework very rigid.

On Monday 05 May 2003 09:16 am, Ryan wrote:
> Ok, maybe this is just because I'm coming from a layman's perspective
> regarding enterprise-level databases, but couldn't you fake replication
> by inserting the data into both databases? (Granted, this involves
> having source access to the program doing the insertion.)
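The middleware idea (run the same transaction on both databases, and roll back the other side if either fails) might look something like the following sketch. Again sqlite3 stands in for the two PostgreSQL connections, and the `customers` table is invented; note that a crash between the two commits can still lose consistency, so this is weaker than true two-phase commit.

```python
import sqlite3

a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
for db in (a, b):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, package TEXT)")

def coordinated(statements):
    """Run the same statements inside a transaction on both databases.
    If either side raises, roll both back so neither diverges."""
    try:
        for sql, params in statements:
            a.execute(sql, params)
            b.execute(sql, params)
    except sqlite3.Error:
        a.rollback()
        b.rollback()
        return False
    # Caveat: a failure between these two commits still leaves the
    # databases inconsistent; real middleware would need 2PC here.
    a.commit()
    b.commit()
    return True

ok = coordinated([("INSERT INTO customers VALUES (?, ?)", (1, "gold"))])
bad = coordinated([("INSERT INTO customers VALUES (?, ?)", (1, "dup"))])
print(ok, bad)  # True False -- the duplicate key was rolled back everywhere
```

This is roughly the rigidity being described: every writer in the system has to funnel through `coordinated()` (or something like it) for the scheme to hold.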
IIRC, pgsql.com's product (Postgres-R, I think it's called) is based on PostgreSQL 6.4.2, which won't work for me since I use a lot of features available only in the 7.x generation.

I'm using the Rserv code from the contrib section of Postgres, and that's what I'm having problems with.

On Monday 05 May 2003 10:30 am, Rajesh Kumar Mallah wrote:
> Have you tried the Rserv solution which pgsql.com provides
> commercially? I have heard it's pretty good.
>
> regds
> mallah.
I've thought some more about this, and I want to run this idea past you guys.

What do you think about a replication proxy: essentially a daemon that sits between a PostgreSQL client and server. Every SQL query, transaction statement, etc. that the proxy receives, it repeats to all the database servers. That way, if a back-end database server goes down, queries continue unabated (except that the downed server won't receive updates).

Basically, the proxy server could intercept these queries and place them in a queue (on a per-database basis), and when every server has acknowledged an update, that query is removed from the queue. Each database server can have its own position in the queue, so if servers A and B successfully run a query but C doesn't (e.g. it requires human intervention), C is removed from the list of acceptable servers while A and B keep moving through the queue.

What do you think? Also, should this discussion be moved to another mailing list?

On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote:
> If I go with a centralized middleware server that manages all database
> access, then I could perhaps do that in there... and then I could use
> transactions on both databases, and if either transaction fails I'll
> roll back the other. But this would make my entire framework very rigid.
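The per-server queue position being proposed can be sketched as a toy data structure. Everything here is illustrative (the class, the statements, the pass/fail executors are all made up); a real proxy would also have to speak the PostgreSQL wire protocol and persist the queue.

```python
class ReplicationProxy:
    """Toy sketch of the proposed proxy: every statement goes into a
    shared queue, and each backend server tracks its own position in
    it. A server whose apply fails is dropped from the pool (pending
    human intervention) rather than blocking the others."""

    def __init__(self, servers):
        self.queue = []                       # pending statements
        self.pos = {s: 0 for s in servers}    # per-server queue position

    def submit(self, statement):
        self.queue.append(statement)

    def apply(self, server, executor):
        """Replay outstanding statements to one server. `executor` is
        whatever actually runs the SQL; here it just returns True/False.
        (Trimming statements every server has passed is omitted.)"""
        while self.pos[server] < len(self.queue):
            if not executor(self.queue[self.pos[server]]):
                del self.pos[server]          # drop the failed server
                return False
            self.pos[server] += 1
        return True

proxy = ReplicationProxy(["A", "B", "C"])
proxy.submit("INSERT INTO usage VALUES ('modem-001', 1024)")
proxy.submit("INSERT INTO usage VALUES ('modem-002', 2048)")

proxy.apply("A", lambda sql: True)                    # A applies everything
proxy.apply("B", lambda sql: True)                    # so does B
proxy.apply("C", lambda sql: "modem-002" not in sql)  # C fails on the 2nd

print(sorted(proxy.pos))  # ['A', 'B'] -- C was removed, A and B kept going
```

A and B finish the queue independently of C's failure, which is exactly the behaviour described above.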
--- Michael A Nachbaur <mike@nachbaur.com> wrote:
> IIRC, pgsql.com's product (Postgres-R I think it's
> called) is based on PostgreSQL-6.4.2, which won't
> work for me since I use a lot of features available
> only in the 7.x generation.

You are thinking of the pgreplication project here. The pgsql.com product is (IIRC) an improved version of rserv. It is reported to work well in enterprise-level applications.
Postgres-R is not pgsql.com's product ... eRServer is ... the code that is in the product has been quite extensively pounded on by both the .org and .info registries ...

On Mon, 5 May 2003, Michael A Nachbaur wrote:
> IIRC, pgsql.com's product (Postgres-R I think it's called) is based on
> PostgreSQL-6.4.2, which won't work for me since I use a lot of features
> available only in the 7.x generation.
>
> I'm using the Rserv code from the contrib section of Postgres, and
> that's what I'm having problems with.

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org   secondary: scrappy@{freebsd|postgresql}.org