Thread: Replication & web apps
Hi, I still don't understand how replication can be used in web applications. Given this scenario: 1) user updates his profile -> update to the db (master) 2) web app redirects to the "profile page" -> select from db (slave) Since (2) is a select it is issued to the slave. How can one be sure that the master propagates the update (1) to the slave before data is requested from the slave (2)? And: suppose there is a method to understand that the user made a change to the db in the web request (as above) so that we have to issue all queries of the same web request to the master, that is: 1) user updates his profile -> update to the db (master) 2) web app redirects to the "profile page" -> select from db (master again because in this web-request user made a change to the db) what if the user ask AGAIN for the "profile page" BEFORE write propagates to the slave: 3) User ask for a refresh of the "profile page" -> select -> slave (because user didn't make any change during THIS web request) ??? In other words: how can asynchronous replication be used in an application???
Well, you've hit upon the difference between synchronous replication and asynchronous replication (and touched on the idea of clustering/load balancing) In a synchronous replication scheme, the database update is made to BOTH master and slave simultaneously....the transactions must commit to both or it is rolled back. In asynchronous replication, there can indeed be a delay between when the master makes update and the slave receives the notification of it. pg_cluster is an example of a synchronous replication method (although it's really considered multi-master...not master-slave) Slony is an example of asynchronous. There are other techniques to balance the load of the database calls so that some go to one box and some to others, yet keep the data in synch... Continuent makes a commercial p/cluster product as well as an open source product called Sequoia that sit in the JDBC layer and direct traffic and control the load balancing. Jeff Amiel Leonardo Francalanci wrote: > Hi, > > I still don't understand how replication can be used in web applications. > Given this scenario: > > 1) user updates his profile -> update to the db (master) > 2) web app redirects to the "profile page" -> select from db (slave) > > Since (2) is a select it is issued to the slave. > > How can one be sure that the master propagates the update (1) to the > slave before data is requested from the slave (2)? > And: suppose there is a method to understand that the user made a > change to the db in the web request (as above) so that we have to > issue all queries of the same web request to the master, that is: > > 1) user updates his profile -> update to the db (master) > 2) web app redirects to the "profile page" -> select from db (master > again because in this web-request user made a change to the db) > > what if the user ask AGAIN for the "profile page" BEFORE write > propagates to the slave: > > 3) User ask for a refresh of the "profile page" -> select -> slave > (because user didn't make any change during THIS web request) > > ??? > > In other words: how can asynchronous replication be used in an > application??? > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match
I just reread your post and see that you already understand that what you are describing is async replication. Duh. That being said, what you are trying to do is really combine the idea of load balancing and async replication....not too feasible I would think. Usually when people are using async replication, the slave nodes are only used as backups or for read-only reporting systems) Jeff Amiel wrote: > Well, you've hit upon the difference between synchronous replication > and asynchronous replication (and touched on the idea of > clustering/load balancing) > In a synchronous replication scheme, the database update is made to > BOTH master and slave simultaneously....the transactions must commit > to both or it is rolled back. > In asynchronous replication, there can indeed be a delay between when > the master makes update and the slave receives the notification of it. > > pg_cluster is an example of a synchronous replication method (although > it's really considered multi-master...not master-slave) > Slony is an example of asynchronous. > > There are other techniques to balance the load of the database calls > so that some go to one box and some to others, yet keep the data in > synch... > Continuent makes a commercial p/cluster product as well as an open > source product called Sequoia that sit in the JDBC layer and direct > traffic and control the load balancing. > > Jeff Amiel > > Leonardo Francalanci wrote: >> Hi, >> >> I still don't understand how replication can be used in web >> applications. >> Given this scenario: >> >> 1) user updates his profile -> update to the db (master) >> 2) web app redirects to the "profile page" -> select from db (slave) >> >> Since (2) is a select it is issued to the slave. >> >> How can one be sure that the master propagates the update (1) to the >> slave before data is requested from the slave (2)? >> And: suppose there is a method to understand that the user made a >> change to the db in the web request (as above) so that we have to >> issue all queries of the same web request to the master, that is: >> >> 1) user updates his profile -> update to the db (master) >> 2) web app redirects to the "profile page" -> select from db (master >> again because in this web-request user made a change to the db) >> >> what if the user ask AGAIN for the "profile page" BEFORE write >> propagates to the slave: >> >> 3) User ask for a refresh of the "profile page" -> select -> slave >> (because user didn't make any change during THIS web request) >> >> ??? >> >> In other words: how can asynchronous replication be used in an >> application??? >> >> >> >> >> >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match >
> pg_cluster is an example of a synchronous replication method (although > it's really considered multi-master...not master-slave) It looks like pgcluster is what I would need, I just don't understand how it works... aren't there some "good" docs about it? Which are its limits? And: is there a way to search the mailing list? It looks like Pgcluster-general and Pgpool-general aren't searchable... Thank you in advance
> There are other techniques to balance the load of the database calls so that > some go to one box and some to others, yet keep the data in synch... > Continuent makes a commercial p/cluster product as well as an open source > product called Sequoia that sit in the JDBC layer and direct traffic and > control the load balancing. We use continuent at work (albeit for mysql...) on a three node cluster. In their default setup they use what's called "strict" querying. By which they mean any read to any node will first check to make sure there aren't any pending writes before continuing. The other mode is "weak". Where reads don't bother checking to see if there are any pending writes before doing their read. In this mode, we run the risk that a read might happen before a previous write it's depending on. In our tests though the time for replication was really small (don't remember offhand) but certainly much much smaller than a round trip request for an end user would be. That all said, we had to go with weak because strict killed performance. Absolutely killed it. Anyway, just something to keep in mind. Kind of like raid... you lose in one area to gain in another.... -philip > > Jeff Amiel > > Leonardo Francalanci wrote: >> Hi, >> >> I still don't understand how replication can be used in web applications. >> Given this scenario: >> >> 1) user updates his profile -> update to the db (master) >> 2) web app redirects to the "profile page" -> select from db (slave) >> >> Since (2) is a select it is issued to the slave. >> >> How can one be sure that the master propagates the update (1) to the slave >> before data is requested from the slave (2)? >> And: suppose there is a method to understand that the user made a change to >> the db in the web request (as above) so that we have to issue all queries >> of the same web request to the master, that is: >> >> 1) user updates his profile -> update to the db (master) >> 2) web app redirects to the "profile page" -> select from db (master again >> because in this web-request user made a change to the db) >> >> what if the user ask AGAIN for the "profile page" BEFORE write propagates >> to the slave: >> >> 3) User ask for a refresh of the "profile page" -> select -> slave (because >> user didn't make any change during THIS web request) >> >> ??? >> >> In other words: how can asynchronous replication be used in an >> application??? >> >> >> >> >> >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
Jeff Amiel wrote: > There are other techniques to balance the load of the database calls so > that some go to one box and some to others, yet keep the data in synch... > Continuent makes a commercial p/cluster product as well as an open > source product called Sequoia that sit in the JDBC layer and direct > traffic and control the load balancing. pgpool allows you to load balance queries between two servers. While it doesn't solve the issue of trying to read a row that hasn't been replicated, it does offer you a bit of control that may be of use to you. Any statement that is in a transaction will automatically go to the master DB. If some queries are sensitive to this issue, and some aren't, then you might be able to make pgpool work for you by wrapping the sensitive ones in a transaction. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
On Thu, 2006-03-16 at 13:18, Brad Nicholson wrote: > Jeff Amiel wrote: > > > There are other techniques to balance the load of the database calls so > > that some go to one box and some to others, yet keep the data in synch... > > Continuent makes a commercial p/cluster product as well as an open > > source product called Sequoia that sit in the JDBC layer and direct > > traffic and control the load balancing. > > pgpool allows you to load balance queries between two servers. While it > doesn't solve the issue of trying to read a row that hasn't been > replicated, it does offer you a bit of control that may be of use to > you. Any statement that is in a transaction will automatically go to > the master DB. If some queries are sensitive to this issue, and some > aren't, then you might be able to make pgpool work for you by wrapping > the sensitive ones in a transaction. Note that it's also possible to use pgpool as a synchronous clustering too with automatic failover. Since it is statement based, you have to make sure both DBs are in exactly the same state when you start it up, and you can't rely on things like now() being exactly the same, nor on random() and a few other things. however, for many applications, this is plenty good enough, and it's dirt simple to setup and use.
In the last exciting episode, Leonardo.Francalanci@CommProve.com (Leonardo Francalanci) wrote: > In other words: how can asynchronous replication be used in an > application??? Yes, this is an issue. Asynchronous replication is NOT suitable in cases where you point applications that need forcibly up-to-date information to a replica that is (even just a little) behind. We *have to* point all of the applications that need up-to-the-instance balance information to the "master" system. But there are doubtless plenty of places where you could use data that isn't *forcibly* up to date. - For instance, running temporal reports against a replica should turn out fine. - We run the WHOIS service against replicas; we don't promise instantaneous propagation, and nobody worries too much if they get information that, in a few cases, is a few seconds out of date. - In a web sales system, I'd be inclined to run "normal" queries against a replica for the portion of the workflow that involves users assembling their "basket" of products. And move to pointing to the "master" when it comes time to "check out" the order, so that the final order gets the most up-to-date data. Asynchronous replication isn't good for *everything*, but you'll generally find ways to use replicas, when you have them... -- let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];; http://linuxfinances.info/info/lsf.html "The best design is not predicated on how brain-dead you can be and still operate it." -- David C. Wright
> We use continuent at work (albeit for mysql...) on a three node cluster. That's a good project, the only thing I don't like is that one is forced to use Java which is not what I'd like to do (I'd prefer Ruby). Thank you everybody for your answers. Leonardo