Re: Partitioning / Clustering - Mailing list pgsql-performance

From Alex Stapleton
Subject Re: Partitioning / Clustering
Date
Msg-id 57AE769C-381F-4757-95FB-409001758AB6@advfn.com
In response to Re: Partitioning / Clustering  (Alex Stapleton <alexs@advfn.com>)
List pgsql-performance
On 11 May 2005, at 09:50, Alex Stapleton wrote:

>
> On 11 May 2005, at 08:57, David Roussel wrote:
>
>
>> For an interesting look at scalability, clustering, caching, etc. for a
>> large site, have a look at how LiveJournal did it.
>> http://www.danga.com/words/2004_lisa/lisa04.pdf
>>
>
> I have implemented similar systems in the past. It's a pretty good
> technique, but unfortunately it's not very "Plug-and-Play", as you have
> to base most of your API on memcached (I imagine MySQL's NDB tables
> might work as well, actually) for it to work well.
>
>
>> They have 2.6 Million active users, posting 200 new blog entries per
>> minute, plus many comments and countless page views.
>>
>> Although this system is of a different sort to the type I work on,
>> it's interesting to see how they've made it scale.
>>
>> They use MySQL on Dell hardware! And found single-master replication
>> did not scale.  There's a section on multimaster replication, not sure
>> if they use it.  The main approach they use is to partition users into
>> specific database clusters.  Caching is done using memcached at the
>> application level to avoid hitting the db for rendered pageviews.
>>
>
> I don't think they are storing pre-rendered pages (or bits of them) in
> memcached, but are principally storing the data for the pages in
> it. Gluing pages together is not a hugely intensive process usually :)
> The only problem with memcached is that the client's clustering/
> partitioning system will probably break if a node dies, and will
> probably get confused if you add new nodes to it as well. Easily
> extensible clustering (no complete redistribution of data required
> when you add/remove nodes) with the data distributed across nodes
> seems to be nothing but a pipe dream right now.
>
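
To illustrate the redistribution problem described above, here is a rough sketch that assumes the client picks a node as hash(key) modulo the number of nodes. That scheme is an assumption for illustration, not something confirmed about any particular memcached client, and the node names and `node_for` helper are made up:

```python
import hashlib


def node_for(key, nodes):
    """Pick a node by hashing the key and taking it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def fraction_moved(keys, old_nodes, new_nodes):
    """Fraction of keys whose node changes between two node lists."""
    moved = sum(
        1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10000)]
four = ["cache1", "cache2", "cache3", "cache4"]
five = four + ["cache5"]  # one node added
three = four[:3]          # one node lost

print("moved after adding a node:  ", fraction_moved(keys, four, five))
print("moved after removing a node:", fraction_moved(keys, four, three))
```

Under this assumption most keys land on a different node after either change, so most of the cache is effectively cold until it refills.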
>
>> It's interesting that the solution LiveJournal have arrived at is
>> quite similar in some ways to how Google is set up.
>>
>
> Don't Google use indexing servers which keep track of where data
> is? So that you only need to update them when you add or move data,
> deletes don't even have to be propagated among indexes immediately
> really because you'll find out if data isn't there when you visit
> where it should be. Or am I talking crap?

That will teach me to RTFA first ;) OK, so LJ maintain an index of
which cluster each user is on, kind of like Google do :)
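
A toy sketch of that kind of index, assuming an in-memory dict from user id to cluster name and a dict per cluster as storage. The class and method names are hypothetical and say nothing about how LJ actually implement it:

```python
class UserDirectory:
    """Index of user_id -> cluster name, plus toy per-cluster storage."""

    def __init__(self, cluster_names):
        self.clusters = {name: {} for name in cluster_names}
        self.cluster_of = {}

    def add_user(self, user_id, cluster, data):
        """Store the user's data on one cluster and record it in the index."""
        self.clusters[cluster][user_id] = data
        self.cluster_of[user_id] = cluster

    def fetch(self, user_id):
        """Consult the index first, then read from the cluster it names."""
        cluster = self.cluster_of[user_id]
        return self.clusters[cluster][user_id]

    def move_user(self, user_id, new_cluster):
        """Move one user; only that user's index entry changes."""
        old_cluster = self.cluster_of[user_id]
        if old_cluster == new_cluster:
            return
        data = self.clusters[old_cluster].pop(user_id)
        self.clusters[new_cluster][user_id] = data
        self.cluster_of[user_id] = new_cluster


directory = UserDirectory(["cluster_a", "cluster_b"])
directory.add_user(42, "cluster_a", {"journal": "hello"})
directory.add_user(7, "cluster_a", {"journal": "world"})
directory.move_user(42, "cluster_b")
print(directory.cluster_of)
```

Because placement is looked up rather than computed from the key, adding a cluster or moving a user does not disturb where anyone else lives.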

>
>> David
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 8: explain analyze is your friend

