On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote:
> For an interesting look at scalability, clustering, caching, etc for a
> large site have a look at how livejournal did it.
> http://www.danga.com/words/2004_lisa/lisa04.pdf
>
> They have 2.6 Million active users, posting 200 new blog entries per
> minute, plus many comments and countless page views.
Neither of which is all that impressive. 200 TPM is less than 4 TPS.
While I haven't run high transaction rate databases under PostgreSQL, I
suspect others who have will say that 4 TPS isn't that big of a deal.
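To put a number on that, here is a quick back-of-the-envelope conversion. It only counts the 200 posts per minute quoted above; comments and page views aren't quantified in the quote, so they are left out.

```python
# Back-of-the-envelope: convert the quoted posting rate to a per-second rate.
posts_per_minute = 200                    # figure quoted from the LJ slides
posts_per_second = posts_per_minute / 60  # 60 seconds in a minute

print(f"{posts_per_second:.2f} TPS")      # prints "3.33 TPS"
```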
> Although this system is of a different sort to the type I work on it's
> interesting to see how they've made it scale.
>
> They use MySQL on Dell hardware! And found single master replication did
> not scale. There's a section on multimaster replication, not sure if
Probably didn't scale because they used to use MyISAM, which only does
table-level locking.
> they use it. The main approach they use is to partition users into
> specific database clusters. Caching is done using memcached at the
Which means they've got a huge amount of additional code complexity, not
to mention how many times you can't post something because 'that cluster
is down for maintenance'.
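For anyone who hasn't seen this kind of partitioning, here is a rough sketch of where the extra code comes from. This is not LJ's code; the cluster names, the lookup table, and the function names are all made up for illustration.

```python
# Hypothetical sketch of user-to-cluster partitioning; not LiveJournal's code.
# Every name here (CLUSTERS, USER_CLUSTER, cluster_for_user) is invented.

CLUSTERS = {
    1: {"dsn": "db-cluster-1", "up": True},
    2: {"dsn": "db-cluster-2", "up": False},  # e.g. down for maintenance
}

# A global lookup table mapping each user id to its home cluster.
USER_CLUSTER = {1001: 1, 1002: 2}


class ClusterDown(Exception):
    """Raised when a user's home cluster is unavailable."""


def cluster_for_user(user_id):
    """Return the DSN of the cluster holding this user's data."""
    cluster_id = USER_CLUSTER[user_id]
    cluster = CLUSTERS[cluster_id]
    if not cluster["up"]:
        # Only users homed on this cluster are affected; others can still post.
        raise ClusterDown(f"cluster {cluster_id} is down for maintenance")
    return cluster["dsn"]
```

Every query has to go through a lookup like this first, and every caller has to decide what to do when it raises ClusterDown.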
> application level to avoid hitting the db for rendered pageviews.
Memcached is about the only good thing I've seen come out of
livejournal.
> It's interesting that the solution livejournal have arrived at is quite
> similar in ways to the way Google is set up.
Except that unlike LJ, Google stays up and it's fast. Though granted, LJ
is quite a bit faster than it was 6 months ago.
--
Jim C. Nasby, Database Consultant decibel@decibel.org