Re: Partitioning / Clustering - Mailing list pgsql-performance

From Alex Stapleton
Subject Re: Partitioning / Clustering
Date
Msg-id 3F94C2F9-29D9-46F6-A873-3681B4F3E379@advfn.com
In response to Re: Partitioning / Clustering  ("David Roussel" <pgsql-performance@diroussel.xsmail.com>)
Responses Re: Partitioning / Clustering
List pgsql-performance
On 11 May 2005, at 08:57, David Roussel wrote:

> For an interesting look at scalability, clustering, caching, etc for a
> large site have a look at how livejournal did it.
> http://www.danga.com/words/2004_lisa/lisa04.pdf

I have implemented similar systems in the past. It's a pretty good
technique, but unfortunately it's not very "Plug-and-Play": you have to
base most of your API on memcached for it to work well (I imagine
MySQL's NDB tables might work as well, actually).

> They have 2.6 Million active users, posting 200 new blog entries per
> minute, plus many comments and countless page views.
>
> Although this system is of a different sort to the type I work on it's
> interesting to see how they've made it scale.
>
> They use MySQL on Dell hardware! And found single master replication
> did not scale.  There's a section on multimaster replication, not sure
> if they use it.  The main approach they use is to partition users into
> specific database clusters.  Caching is done using memcached at the
> application level to avoid hitting the db for rendered pageviews.

I don't think they are storing pre-rendered pages (or bits of) in
memcached, but are principally storing the data for the pages in it.
Gluing pages together is not a hugely intensive process usually :)
The only problem with memcached is that the client's
clustering/partitioning system will probably break if a node dies, and
will probably get confused if you add new nodes as well. Easily
extensible clustering (no complete redistribution of data required when
you add or remove nodes), with the data distributed across nodes, seems
to be nothing but a pipe dream right now.
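
To make the redistribution problem concrete, here is a minimal sketch. It
assumes the client picks a node with a plain "hash of the key modulo the
number of nodes" scheme. That scheme, the node names, and the helper names
node_for and fraction_moved are all assumptions made for illustration, not
a description of any particular memcached client.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node by hashing the key and taking it modulo the node count.

    Assumption: this mirrors a naive client-side partitioning scheme.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def fraction_moved(keys, before, after):
    """Fraction of keys whose node changes between two node lists."""
    moved = sum(1 for k in keys if node_for(k, before) != node_for(k, after))
    return moved / len(keys)


# Hypothetical key names and cache node names.
keys = ["user:%d" % i for i in range(10000)]
three = ["cache1", "cache2", "cache3"]
four = three + ["cache4"]

print(fraction_moved(keys, three, four))
```

Under this scheme, going from three nodes to four remaps roughly three
quarters of the keys, so most of the cache is effectively cold right after
the change.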

> It's interesting that the solution livejournal have arrived at is quite
> similar in ways to the way google is set up.

Don't Google use indexing servers which keep track of where data is?
That way you only need to update them when you add or move data. Deletes
don't even have to be propagated among the indexes immediately, because
you'll find out that data isn't there when you visit where it should be.
Or am I talking crap?
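
A rough sketch of that idea, with no claim that it reflects how Google
actually works: a directory maps each key to the node believed to hold it,
writes update the directory, and deletes only touch the storage node. The
stale directory entry is cleaned up the next time someone follows it and
finds nothing there. The Directory class, the put/delete/get helpers, and
the node names are all hypothetical.

```python
class Directory:
    """Hypothetical index server: maps a key to the node believed to hold it."""

    def __init__(self):
        self.location = {}

    def record(self, key, node_name):
        self.location[key] = node_name

    def lookup(self, key):
        return self.location.get(key)

    def forget(self, key):
        self.location.pop(key, None)


def put(directory, nodes, key, node_name, value):
    """Adding (or moving) data is the only time the directory is updated."""
    nodes[node_name][key] = value
    directory.record(key, node_name)


def delete(directory, nodes, key):
    """Remove the data from its node; the directory entry is left stale."""
    node_name = directory.lookup(key)
    if node_name is not None:
        nodes[node_name].pop(key, None)


def get(directory, nodes, key):
    """Follow the directory; drop the entry if the data is not there."""
    node_name = directory.lookup(key)
    if node_name is None:
        return None
    if key not in nodes[node_name]:
        directory.forget(key)  # stale entry discovered on visit
        return None
    return nodes[node_name][key]


# Storage nodes are modelled as plain dicts for the sake of the sketch.
nodes = {"node-a": {}, "node-b": {}}
directory = Directory()
put(directory, nodes, "doc:1", "node-a", "hello")
delete(directory, nodes, "doc:1")
stale = directory.lookup("doc:1")       # directory still points at node-a
result = get(directory, nodes, "doc:1")  # visit finds nothing, entry dropped
```

The trade-off is one wasted visit per stale entry in exchange for not having
to coordinate deletes across the index servers.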

> David

