Thread: Re: pgsql Replication Proxy (was Re: Replication for a
I love this idea. The proxy could return immediately instead of making my program block on update. One note, though. Instead of a stack, you need a FIFO. For example: delete from sometable where field=value; insert into sometable (field) values (value1); insert into sometable (field) values (value2); .... This code breaks in a stack and only works in a fifo. Minor point, though. So do we have a volunteer to write such a tool? <grin> Mike Diehl. -----Original Message----- From: Michael A Nachbaur [mailto:mike@nachbaur.com] Sent: Monday, May 05, 2003 1:57 PM To: pgsql-sql@postgresql.org Subject: pgsql Replication Proxy (was Re: [SQL] Replication for a large database) I've thought some more about this, and I want to pass this idea past you guys. What do you think about a replication proxy, essentially a daemon that sits between a PostgreSQL client and server. Every single SQL query, transaction statement, etc that the proxy recieves it repeats back to all the database servers. In this way, if a back-end database server goes down queries will continue unabated (except the downed server won't recieve updates). Basically, the proxy server could intercept these queries and place them in a stack (on a per-database basis) and when every server in the queue acknowledges the update, the query is removed from the stack. Each database server can have their own position in the stack, so if servers A and B successfully run a query, but C doesn't (e.g. it requires human intervention), C is removed from the list of acceptable servers but A and B can keep moving through the queue. What do you think? Also, should this discussion be moved to another mailing list? On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote: > I have thought about this. The problem I come into is data consistancy. I > have at least 8 different processes that harvest data, and an intranet > website that can also manipulate the database (to assign customers to > different packages, re-assign modems to different customers, etc). Trying > to maintain consistancy across the entire application would be such a > nightmare, I don't want to think about it. > > If I go with a centralized middleware server that manages all database > access, then I could perhaps do that in there...and then I could use > transactions on both databases, and if either transaction fails then I'll > roll back the other. But this would make my entire framework very rigid. ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
LOL! Depending on how much FUD I can throw at the guys higher up in the food chain at my office, I might be able to get some budget space to develop something like this. There are some significant technical hurdles I have to overcome, but I think it's doable. The analogy I came up with is SCSI RAID for databases. You can rip a database server out, and the overall system will still function...toss it back in, and updates will still happen. I would also like to be able to throw a fresh database in place and have it mirror the existing database servers in the background so you don't have to go through the complicated procedure of dumping/restoring the database servers by hand. Re: FIFO, yeah, I realized that after I sent the message. Does anyone have any ideas for me on this? I think it might make sense to use PostgreSQL as the storage mechanism for the proxy server, but that sort of defeats the purpose of having a replication system. Maybe spread can be used to distribute the messages to different servers, but I'm not too familiar with it. Also, one final note, I'm a Perl programmer, so anything I build will be written in that. If anyone has objections, let me know and maybe we could work together on something. On Tuesday 06 May 2003 09:28 am, Diehl, Jeffrey wrote: > I love this idea. The proxy could return immediately instead of making my > program block on update. > > One note, though. Instead of a stack, you need a FIFO. For example: > > delete from sometable where field=value; > insert into sometable (field) values (value1); > insert into sometable (field) values (value2); > .... > > > This code breaks in a stack and only works in a fifo. Minor point, though. > > So do we have a volunteer to write such a tool? <grin> > > Mike Diehl. > > -----Original Message----- > From: Michael A Nachbaur [mailto:mike@nachbaur.com] > Sent: Monday, May 05, 2003 1:57 PM > To: pgsql-sql@postgresql.org > Subject: pgsql Replication Proxy (was Re: [SQL] Replication for a large > database) > > > I've thought some more about this, and I want to pass this idea past you > guys. > What do you think about a replication proxy, essentially a daemon that sits > between a PostgreSQL client and server. Every single SQL query, > transaction > > statement, etc that the proxy recieves it repeats back to all the database > servers. In this way, if a back-end database server goes down queries will > continue unabated (except the downed server won't recieve updates). > > Basically, the proxy server could intercept these queries and place them in > a > stack (on a per-database basis) and when every server in the queue > acknowledges the update, the query is removed from the stack. Each > database > > server can have their own position in the stack, so if servers A and B > successfully run a query, but C doesn't (e.g. it requires human > intervention), C is removed from the list of acceptable servers but A and B > can keep moving through the queue. > > What do you think? Also, should this discussion be moved to another > mailing > > list? > > On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote: > > I have thought about this. The problem I come into is data consistancy. > > I > > > have at least 8 different processes that harvest data, and an intranet > > website that can also manipulate the database (to assign customers to > > different packages, re-assign modems to different customers, etc). > > Trying to maintain consistancy across the entire application would be > > such a nightmare, I don't want to think about it. > > > > If I go with a centralized middleware server that manages all database > > access, then I could perhaps do that in there...and then I could use > > transactions on both databases, and if either transaction fails then I'll > > roll back the other. But this would make my entire framework very rigid. > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Postgres don't have the support ( yet ) for the 2 phase commit, so I think that is impossible do it now. What happen if the last server do an error in commit phase ? Regards Gaetano ----- Original Message ----- From: "Diehl, Jeffrey" <jdiehl@sandia.gov> To: "'Michael A Nachbaur'" <mike@nachbaur.com>; <pgsql-sql@postgresql.org> Sent: Tuesday, May 06, 2003 6:28 PM Subject: Re: pgsql Replication Proxy (was Re: [SQL] Replication for a > I love this idea. The proxy could return immediately instead of making my > program block on update. > > One note, though. Instead of a stack, you need a FIFO. For example: > > delete from sometable where field=value; > insert into sometable (field) values (value1); > insert into sometable (field) values (value2); > .... > > > This code breaks in a stack and only works in a fifo. Minor point, though. > > So do we have a volunteer to write such a tool? <grin> > > Mike Diehl. > > -----Original Message----- > From: Michael A Nachbaur [mailto:mike@nachbaur.com] > Sent: Monday, May 05, 2003 1:57 PM > To: pgsql-sql@postgresql.org > Subject: pgsql Replication Proxy (was Re: [SQL] Replication for a large > database) > > > I've thought some more about this, and I want to pass this idea past you > guys. > What do you think about a replication proxy, essentially a daemon that sits > between a PostgreSQL client and server. Every single SQL query, transaction > > statement, etc that the proxy recieves it repeats back to all the database > servers. In this way, if a back-end database server goes down queries will > continue unabated (except the downed server won't recieve updates). > > Basically, the proxy server could intercept these queries and place them in > a > stack (on a per-database basis) and when every server in the queue > acknowledges the update, the query is removed from the stack. Each database > > server can have their own position in the stack, so if servers A and B > successfully run a query, but C doesn't (e.g. it requires human > intervention), C is removed from the list of acceptable servers but A and B > can keep moving through the queue. > > What do you think? Also, should this discussion be moved to another mailing > > list? > > On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote: > > I have thought about this. The problem I come into is data consistancy. > I > > have at least 8 different processes that harvest data, and an intranet > > website that can also manipulate the database (to assign customers to > > different packages, re-assign modems to different customers, etc). Trying > > to maintain consistancy across the entire application would be such a > > nightmare, I don't want to think about it. > > > > If I go with a centralized middleware server that manages all database > > access, then I could perhaps do that in there...and then I could use > > transactions on both databases, and if either transaction fails then I'll > > roll back the other. But this would make my entire framework very rigid. > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Then it is removed from the list of "valid" servers, and it isn't included in updates, at least until a) a DBA fixes the issue, or b) the DBA tells the database server to re-mirror itself from the other available servers. Once two-phased commit support is available, this could be changed to be much more robust. Any alternative ideas? On Tuesday 06 May 2003 10:54 am, Mendola Gaetano wrote: > Postgres don't have the support ( yet ) for the 2 phase commit, so > I think that is impossible do it now. > What happen if the last server do an error in commit phase ? > > Regards > Gaetano > > ----- Original Message ----- > From: "Diehl, Jeffrey" <jdiehl@sandia.gov> > To: "'Michael A Nachbaur'" <mike@nachbaur.com>; <pgsql-sql@postgresql.org> > Sent: Tuesday, May 06, 2003 6:28 PM > Subject: Re: pgsql Replication Proxy (was Re: [SQL] Replication for a > > > I love this idea. The proxy could return immediately instead of making > > my program block on update. > > > > One note, though. Instead of a stack, you need a FIFO. For example: > > > > delete from sometable where field=value; > > insert into sometable (field) values (value1); > > insert into sometable (field) values (value2); > > .... > > > > > > This code breaks in a stack and only works in a fifo. Minor point, > > though. > > > So do we have a volunteer to write such a tool? <grin> > > > > Mike Diehl. > > > > -----Original Message----- > > From: Michael A Nachbaur [mailto:mike@nachbaur.com] > > Sent: Monday, May 05, 2003 1:57 PM > > To: pgsql-sql@postgresql.org > > Subject: pgsql Replication Proxy (was Re: [SQL] Replication for a large > > database) > > > > > > I've thought some more about this, and I want to pass this idea past you > > guys. > > What do you think about a replication proxy, essentially a daemon that > > sits > > > between a PostgreSQL client and server. Every single SQL query, > > transaction > > > statement, etc that the proxy recieves it repeats back to all the > > database servers. In this way, if a back-end database server goes down > > queries > > will > > > continue unabated (except the downed server won't recieve updates). > > > > Basically, the proxy server could intercept these queries and place them > > in > > > a > > stack (on a per-database basis) and when every server in the queue > > acknowledges the update, the query is removed from the stack. Each > > database > > > server can have their own position in the stack, so if servers A and B > > successfully run a query, but C doesn't (e.g. it requires human > > intervention), C is removed from the list of acceptable servers but A and > > B > > > can keep moving through the queue. > > > > What do you think? Also, should this discussion be moved to another > > mailing > > > list? > > > > On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote: > > > I have thought about this. The problem I come into is data > > > consistancy. > > > > I > > > > > have at least 8 different processes that harvest data, and an intranet > > > website that can also manipulate the database (to assign customers to > > > different packages, re-assign modems to different customers, etc). > > Trying > > > > to maintain consistancy across the entire application would be such a > > > nightmare, I don't want to think about it. > > > > > > If I go with a centralized middleware server that manages all database > > > access, then I could perhaps do that in there...and then I could use > > > transactions on both databases, and if either transaction fails then > > I'll > > > > roll back the other. But this would make my entire framework very > > rigid. > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly
Michael: 1: It will probably need a messaging system that can properly register/unregister a backend database system. a SIP like syntax comes into mind. (or a XML message, for the ease of future expansion) 2: What do you mean by "per-database basis"? Database as the concept of database schema? or seperate physical database server? 3: How does the proxy server know that a backend server failed in commit? Wei Michael A Nachbaur wrote: >LOL! Depending on how much FUD I can throw at the guys higher up in the food >chain at my office, I might be able to get some budget space to develop >something like this. There are some significant technical hurdles I have to >overcome, but I think it's doable. The analogy I came up with is SCSI RAID >for databases. You can rip a database server out, and the overall system >will still function...toss it back in, and updates will still happen. I >would also like to be able to throw a fresh database in place and have it >mirror the existing database servers in the background so you don't have to >go through the complicated procedure of dumping/restoring the database >servers by hand. > >Re: FIFO, yeah, I realized that after I sent the message. > >Does anyone have any ideas for me on this? I think it might make sense to use >PostgreSQL as the storage mechanism for the proxy server, but that sort of >defeats the purpose of having a replication system. Maybe spread can be used >to distribute the messages to different servers, but I'm not too familiar >with it. > >Also, one final note, I'm a Perl programmer, so anything I build will be >written in that. If anyone has objections, let me know and maybe we could >work together on something. > > >
On Tuesday 06 May 2003 12:10 pm, Wei Weng wrote: > Michael: > > 1: It will probably need a messaging system that can properly > register/unregister a backend database system. a SIP like syntax comes > into mind. (or a XML message, for the ease of future expansion) I'm thinking Spread would be a good idea here. To tell you the truth, I haven't had to deal too much with high-speed guarenteed messaging in a looong time, so I'm out of touch with this. > 2: What do you mean by "per-database basis"? Database as the concept of > database schema? or seperate physical database server? Per database schema. This is primarily to meet my requirements, but it makes sense for other scenarios. Essentially, I have a production database server and a development database server. Both have similar capabilities, and are always on (redundant power supplies, UPS, etc). My development machine will have a replicated copy of the production database schema, but I'd also like to have a "private" database schema on the same machine that I can do development work on, where I can break it without affecting production. I'd like to eventually do something similar to mod_backhand, where "select" statements are sent to a server with the least amount of load, so slower machines can still take part in a proxy setup without reducing performance (or hammering the poor little box). One could also give a server a handicap so it has more capacity for other uses (e.g. my development server). > 3: How does the proxy server know that a backend server failed in commit? I'm thinking of having a queue of statements that the proxy server maintains. Each server will have it's own pointer that refers to the location in the queue they are at. Each statement is then sent to the database server sequentially, and if a statement fails on a given server, its removed from the list list of valid servers, and it's index remains at the statement that failed. This is really starting to sound complicated... > Wei > > Michael A Nachbaur wrote: > >LOL! Depending on how much FUD I can throw at the guys higher up in the > > food chain at my office, I might be able to get some budget space to > > develop something like this. There are some significant technical > > hurdles I have to overcome, but I think it's doable. The analogy I came > > up with is SCSI RAID for databases. You can rip a database server out, > > and the overall system will still function...toss it back in, and updates > > will still happen. I would also like to be able to throw a fresh > > database in place and have it mirror the existing database servers in the > > background so you don't have to go through the complicated procedure of > > dumping/restoring the database servers by hand. > > > >Re: FIFO, yeah, I realized that after I sent the message. > > > >Does anyone have any ideas for me on this? I think it might make sense to > > use PostgreSQL as the storage mechanism for the proxy server, but that > > sort of defeats the purpose of having a replication system. Maybe spread > > can be used to distribute the messages to different servers, but I'm not > > too familiar with it. > > > >Also, one final note, I'm a Perl programmer, so anything I build will be > >written in that. If anyone has objections, let me know and maybe we could > >work together on something.
And if the first commit fail if a key is duplicated, or for another trigger issue? Should be possible understand why the commit fails and consider this a normal event too, I think anyway that is hard. Also the Remirror ( the point B) is hard to do without "Point in time recovery" how do a remirror of a live DB without have an exclusive lock on it ? Regards Gaetano On: Tuesday, May 06, 2003 8:16 PM "Michael A Nachbaur" <mike@nachbaur.com> wrote: > Then it is removed from the list of "valid" servers, and it isn't included in > updates, at least until a) a DBA fixes the issue, or b) the DBA tells the > database server to re-mirror itself from the other available servers. > > Once two-phased commit support is available, this could be changed to be much > more robust. > > Any alternative ideas? > > On Tuesday 06 May 2003 10:54 am, Mendola Gaetano wrote: > > Postgres don't have the support ( yet ) for the 2 phase commit, so > > I think that is impossible do it now. > > What happen if the last server do an error in commit phase ? > > > > Regards > > Gaetano > > > > ----- Original Message ----- > > From: "Diehl, Jeffrey" <jdiehl@sandia.gov> > > To: "'Michael A Nachbaur'" <mike@nachbaur.com>; <pgsql-sql@postgresql.org> > > Sent: Tuesday, May 06, 2003 6:28 PM > > Subject: Re: pgsql Replication Proxy (was Re: [SQL] Replication for a > > > > > I love this idea. The proxy could return immediately instead of making > > > my program block on update. > > > > > > One note, though. Instead of a stack, you need a FIFO. For example: > > > > > > delete from sometable where field=value; > > > insert into sometable (field) values (value1); > > > insert into sometable (field) values (value2); > > > .... > > > > > > > > > This code breaks in a stack and only works in a fifo. Minor point, > > > > though. > > > > > So do we have a volunteer to write such a tool? <grin> > > > > > > Mike Diehl. > > > > > > -----Original Message----- > > > From: Michael A Nachbaur [mailto:mike@nachbaur.com] > > > Sent: Monday, May 05, 2003 1:57 PM > > > To: pgsql-sql@postgresql.org > > > Subject: pgsql Replication Proxy (was Re: [SQL] Replication for a large > > > database) > > > > > > > > > I've thought some more about this, and I want to pass this idea past you > > > guys. > > > What do you think about a replication proxy, essentially a daemon that > > > > sits > > > > > between a PostgreSQL client and server. Every single SQL query, > > > > transaction > > > > > statement, etc that the proxy recieves it repeats back to all the > > > database servers. In this way, if a back-end database server goes down > > > queries > > > > will > > > > > continue unabated (except the downed server won't recieve updates). > > > > > > Basically, the proxy server could intercept these queries and place them > > > > in > > > > > a > > > stack (on a per-database basis) and when every server in the queue > > > acknowledges the update, the query is removed from the stack. Each > > > > database > > > > > server can have their own position in the stack, so if servers A and B > > > successfully run a query, but C doesn't (e.g. it requires human > > > intervention), C is removed from the list of acceptable servers but A and > > > > B > > > > > can keep moving through the queue. > > > > > > What do you think? Also, should this discussion be moved to another > > > > mailing > > > > > list? > > > > > > On Monday 05 May 2003 12:26 pm, Michael A Nachbaur wrote: > > > > I have thought about this. The problem I come into is data > > > > consistancy. > > > > > > I > > > > > > > have at least 8 different processes that harvest data, and an intranet > > > > website that can also manipulate the database (to assign customers to > > > > different packages, re-assign modems to different customers, etc). > > > > Trying > > > > > > to maintain consistancy across the entire application would be such a > > > > nightmare, I don't want to think about it. > > > > > > > > If I go with a centralized middleware server that manages all database > > > > access, then I could perhaps do that in there...and then I could use > > > > transactions on both databases, and if either transaction fails then > > > > I'll > > > > > > roll back the other. But this would make my entire framework very > > > > rigid. > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 3: if posting/reading through Usenet, please send an appropriate > > > subscribe-nomail command to majordomo@postgresql.org so that your > > > message can get through to the mailing list cleanly > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >