Thread: Replication & web apps

Replication & web apps

From
Leonardo Francalanci
Date:
Hi,

I still don't understand how replication can be used in web applications.
Given this scenario:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (slave)

Since (2) is a select it is issued to the slave.

How can one be sure that the master propagates the update (1) to the
slave before data is requested from the slave (2)?
And: suppose there is a method to understand that the user made a change
to the db in the web request (as above) so that we have to issue all
queries of the same web request to the master, that is:

1) user updates his profile -> update to the db (master)
2) web app redirects to the "profile page" -> select from db (master
again because in this web-request user made a change to the db)

what if the user ask AGAIN for the "profile page" BEFORE write
propagates to the slave:

3) User ask for a refresh of the "profile page" -> select -> slave
(because user didn't make any change during THIS web request)

???

In other words: how can asynchronous replication be used in an
application???








Re: Replication & web apps

From
Jeff Amiel
Date:
Well, you've hit upon the difference between synchronous replication and
asynchronous replication (and touched on the idea of clustering/load
balancing)
In a synchronous replication scheme, the database update is made to BOTH
master and slave simultaneously....the transactions must commit to both
or it is rolled back.
In asynchronous replication, there can indeed be a delay between when
the master makes update and the slave receives the notification of it.

pg_cluster is an example of a synchronous replication method (although
it's really considered multi-master...not master-slave)
Slony is an example of asynchronous.

There are other techniques to balance the load of the database calls so
that some go to one box and some to others, yet keep the data in synch...
Continuent makes a commercial p/cluster product  as well as an open
source product called Sequoia that sit in the JDBC layer and direct
traffic and control the load balancing.

Jeff Amiel

Leonardo Francalanci wrote:
> Hi,
>
> I still don't understand how replication can be used in web applications.
> Given this scenario:
>
> 1) user updates his profile -> update to the db (master)
> 2) web app redirects to the "profile page" -> select from db (slave)
>
> Since (2) is a select it is issued to the slave.
>
> How can one be sure that the master propagates the update (1) to the
> slave before data is requested from the slave (2)?
> And: suppose there is a method to understand that the user made a
> change to the db in the web request (as above) so that we have to
> issue all queries of the same web request to the master, that is:
>
> 1) user updates his profile -> update to the db (master)
> 2) web app redirects to the "profile page" -> select from db (master
> again because in this web-request user made a change to the db)
>
> what if the user ask AGAIN for the "profile page" BEFORE write
> propagates to the slave:
>
> 3) User ask for a refresh of the "profile page" -> select -> slave
> (because user didn't make any change during THIS web request)
>
> ???
>
> In other words: how can asynchronous replication be used in an
> application???
>
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match

Re: Replication & web apps

From
Jeff Amiel
Date:
I just reread your post and see that you already understand that what
you are describing is async replication.  Duh.
That being said, what you are trying to do is really combine the idea of
load balancing and async replication....not too feasible I would think.
Usually when people are using async replication, the slave nodes are
only used as backups or for read-only reporting systems)

Jeff Amiel wrote:
> Well, you've hit upon the difference between synchronous replication
> and asynchronous replication (and touched on the idea of
> clustering/load balancing)
> In a synchronous replication scheme, the database update is made to
> BOTH master and slave simultaneously....the transactions must commit
> to both or it is rolled back.
> In asynchronous replication, there can indeed be a delay between when
> the master makes update and the slave receives the notification of it.
>
> pg_cluster is an example of a synchronous replication method (although
> it's really considered multi-master...not master-slave)
> Slony is an example of asynchronous.
>
> There are other techniques to balance the load of the database calls
> so that some go to one box and some to others, yet keep the data in
> synch...
> Continuent makes a commercial p/cluster product  as well as an open
> source product called Sequoia that sit in the JDBC layer and direct
> traffic and control the load balancing.
>
> Jeff Amiel
>
> Leonardo Francalanci wrote:
>> Hi,
>>
>> I still don't understand how replication can be used in web
>> applications.
>> Given this scenario:
>>
>> 1) user updates his profile -> update to the db (master)
>> 2) web app redirects to the "profile page" -> select from db (slave)
>>
>> Since (2) is a select it is issued to the slave.
>>
>> How can one be sure that the master propagates the update (1) to the
>> slave before data is requested from the slave (2)?
>> And: suppose there is a method to understand that the user made a
>> change to the db in the web request (as above) so that we have to
>> issue all queries of the same web request to the master, that is:
>>
>> 1) user updates his profile -> update to the db (master)
>> 2) web app redirects to the "profile page" -> select from db (master
>> again because in this web-request user made a change to the db)
>>
>> what if the user ask AGAIN for the "profile page" BEFORE write
>> propagates to the slave:
>>
>> 3) User ask for a refresh of the "profile page" -> select -> slave
>> (because user didn't make any change during THIS web request)
>>
>> ???
>>
>> In other words: how can asynchronous replication be used in an
>> application???
>>
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>       choose an index scan if your joining column's datatypes do not
>>       match
>

Re: Replication & web apps

From
Leonardo Francalanci
Date:
> pg_cluster is an example of a synchronous replication method (although
> it's really considered multi-master...not master-slave)

It looks like pgcluster is what I would need, I just don't understand
how it works... aren't there some "good" docs about it?
Which are its limits?


And: is there a way to search the mailing list? It looks like
Pgcluster-general and Pgpool-general aren't searchable...



Thank you in advance


Re: Replication & web apps

From
Philip Hallstrom
Date:
> There are other techniques to balance the load of the database calls so that
> some go to one box and some to others, yet keep the data in synch...
> Continuent makes a commercial p/cluster product  as well as an open source
> product called Sequoia that sit in the JDBC layer and direct traffic and
> control the load balancing.

We use continuent at work (albeit for mysql...) on a three node cluster.

In their default setup they use what's called "strict" querying.  By which
they mean any read to any node will first check to make sure there aren't
any pending writes before continuing.

The other mode is "weak".  Where reads don't bother checking to see if
there are any pending writes before doing their read.  In this mode, we
run the risk that a read might happen before a previous write it's
depending on.  In our tests though the time for replication was really
small (don't remember offhand) but certainly much much smaller than a
round trip request for an end user would be.

That all said, we had to go with weak because strict killed performance.
Absolutely killed it.

Anyway, just something to keep in mind.  Kind of like raid... you lose in
one area to gain in another....

-philip

>
> Jeff Amiel
>
> Leonardo Francalanci wrote:
>> Hi,
>>
>> I still don't understand how replication can be used in web applications.
>> Given this scenario:
>>
>> 1) user updates his profile -> update to the db (master)
>> 2) web app redirects to the "profile page" -> select from db (slave)
>>
>> Since (2) is a select it is issued to the slave.
>>
>> How can one be sure that the master propagates the update (1) to the slave
>> before data is requested from the slave (2)?
>> And: suppose there is a method to understand that the user made a change to
>> the db in the web request (as above) so that we have to issue all queries
>> of the same web request to the master, that is:
>>
>> 1) user updates his profile -> update to the db (master)
>> 2) web app redirects to the "profile page" -> select from db (master again
>> because in this web-request user made a change to the db)
>>
>> what if the user ask AGAIN for the "profile page" BEFORE write propagates
>> to the slave:
>>
>> 3) User ask for a refresh of the "profile page" -> select -> slave (because
>> user didn't make any change during THIS web request)
>>
>> ???
>>
>> In other words: how can asynchronous replication be used in an
>> application???
>>
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>       choose an index scan if your joining column's datatypes do not
>>       match
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>              http://www.postgresql.org/docs/faq
>

Re: Replication & web apps

From
Brad Nicholson
Date:
Jeff Amiel wrote:

> There are other techniques to balance the load of the database calls so
> that some go to one box and some to others, yet keep the data in synch...
> Continuent makes a commercial p/cluster product  as well as an open
> source product called Sequoia that sit in the JDBC layer and direct
> traffic and control the load balancing.

pgpool allows you to load balance queries between two servers.  While it
doesn't solve the issue of trying to read a row that hasn't been
replicated, it does offer you a bit of control that may be of use to
you.  Any statement that is in a transaction will automatically go to
the master DB.  If some queries are sensitive to this issue, and some
aren't, then you might be able to make pgpool work for you by wrapping
the sensitive ones in a transaction.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: Replication & web apps

From
Scott Marlowe
Date:
On Thu, 2006-03-16 at 13:18, Brad Nicholson wrote:
> Jeff Amiel wrote:
>
> > There are other techniques to balance the load of the database calls so
> > that some go to one box and some to others, yet keep the data in synch...
> > Continuent makes a commercial p/cluster product  as well as an open
> > source product called Sequoia that sit in the JDBC layer and direct
> > traffic and control the load balancing.
>
> pgpool allows you to load balance queries between two servers.  While it
> doesn't solve the issue of trying to read a row that hasn't been
> replicated, it does offer you a bit of control that may be of use to
> you.  Any statement that is in a transaction will automatically go to
> the master DB.  If some queries are sensitive to this issue, and some
> aren't, then you might be able to make pgpool work for you by wrapping
> the sensitive ones in a transaction.

Note that it's also possible to use pgpool as a synchronous clustering
too with automatic failover.  Since it is statement based, you have to
make sure both DBs are in exactly the same state when you start it up,
and you can't rely on things like now() being exactly the same, nor on
random() and a few other things.

however, for many applications, this is plenty good enough, and it's
dirt simple to setup and use.

Re: Replication & web apps

From
Christopher Browne
Date:
In the last exciting episode, Leonardo.Francalanci@CommProve.com (Leonardo Francalanci) wrote:
> In other words: how can asynchronous replication be used in an
> application???

Yes, this is an issue.

Asynchronous replication is NOT suitable in cases where you point
applications that need forcibly up-to-date information to a replica
that is (even just a little) behind.

We *have to* point all of the applications that need
up-to-the-instance balance information to the "master" system.

But there are doubtless plenty of places where you could use data that
isn't *forcibly* up to date.

- For instance, running temporal reports against a replica should turn
  out fine.

- We run the WHOIS service against replicas; we don't promise
  instantaneous propagation, and nobody worries too much if they get
  information that, in a few cases, is a few seconds out of date.

- In a web sales system, I'd be inclined to run "normal" queries
  against a replica for the portion of the workflow that involves
  users assembling their "basket" of products.  And move to
  pointing to the "master" when it comes time to "check out" the
  order, so that the final order gets the most up-to-date data.

Asynchronous replication isn't good for *everything*, but you'll
generally find ways to use replicas, when you have them...
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/lsf.html
"The best  design is not predicated  on how brain-dead you  can be and
still operate it." -- David C. Wright

Re: Replication & web apps

From
Leonardo Francalanci
Date:
> We use continuent at work (albeit for mysql...) on a three node cluster.

That's a good project, the only thing I don't like is that one is forced
to use Java which is not what I'd like to do (I'd prefer Ruby).

Thank you everybody for your answers.


Leonardo