Re: multimaster (was: Slightly OT.) - Mailing list pgsql-general

From Alexander Staubo
Subject Re: multimaster (was: Slightly OT.)
Msg-id 88daf38c0706011140t5380d50i1d37240add321025@mail.gmail.com
In response to Re: multimaster (was: Slightly OT.)  (Andrew Sullivan <ajs@crankycanuck.ca>)
List pgsql-general
On 6/1/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> These are all different solutions to different problems, so it's not
> surprising that they look different.  This was the reason I asked,
> "What is the problem you are trying to solve?"

You mean aside from the obvious one, scalability?

The database is becoming a bottleneck for a lot of so-called "Web
2.0" apps which use a shared-nothing architecture (such as Rails,
Django or PHP) in conjunction with a database. They generate lots of
ad-hoc database queries, which come not just from web hits but also
from somewhat awkwardly fitting an object model onto a relational
database.

These "new" apps are typically intensely personal and contextual:
every page is personalized for the visiting user, and rendering it takes
a whole bunch of crazy multijoin queries to fetch the latest posts, the
most recent recommendations from your friends, the most highly rated
stuff. In fact, merely doing something seemingly simple like
incrementing a row's counter every time a post has been viewed is
eventually going to have a negative performance impact on a
traditional OLTP-optimized relational database.

I'm sure some people would disagree with the significance of the above
(possibly by replying that a relational database is the wrong kind of
tool for such apps), or dispute that there is an urgent need to scale beyond
the single server, but I would hope that there would, at some point,
appear a solution that could enable a database to scale horizontally
with minimal impact on the application. In light of this need, I think
we could be more productive by rephrasing the question "how/when can
we implement multimaster replication?" as "how/when can we implement
horizontal scaling?".

As it stands today, horizontally partitioning a database into multiple
separate "shards" is incredibly invasive on the application
architecture, and typically relies on brittle and non-obvious hacks
such as configuring sequence generators with staggered starting
numbers, omitting referential integrity constraints, sacrificing
transactional semantics, and moving query aggregation into the app
level. On top of this, dumb caches such as Memcached are typically
layered to avoid hitting the database in the first place.
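
To make two of those hacks concrete, here is a rough sketch in Python of
staggered sequence generators and app-level query aggregation. It is an
illustration only, not anyone's production scheme: the Shard class is a
hypothetical in-memory stand-in for a database server, and NUM_SHARDS,
make_id_generator, shard_for_id and top_rated are names I made up for the
example. In PostgreSQL the staggering itself would be done with
CREATE SEQUENCE ... START WITH n INCREMENT BY k on each server.

```python
from itertools import count

NUM_SHARDS = 3  # assumed shard count, for illustration only


def make_id_generator(shard_index, num_shards=NUM_SHARDS):
    """Mimic a sequence with a staggered start and a shared increment.

    Shard 0 yields 1, 4, 7, ...; shard 1 yields 2, 5, 8, ...; and so on,
    so IDs generated independently on each shard never collide.
    """
    return count(start=shard_index + 1, step=num_shards)


def shard_for_id(row_id, num_shards=NUM_SHARDS):
    """Recover the owning shard from an ID produced by the scheme above."""
    return (row_id - 1) % num_shards


class Shard:
    """Hypothetical in-memory stand-in for one database server."""

    def __init__(self, index):
        self.index = index
        self.ids = make_id_generator(index)
        self.posts = {}  # post id -> (title, rating)

    def insert_post(self, title, rating):
        post_id = next(self.ids)
        self.posts[post_id] = (title, rating)
        return post_id


def top_rated(shards, limit):
    """App-level aggregation: query every shard, then merge and sort here.

    Each shard only needs to return its own top `limit` rows; the final
    ordering across shards is the application's job.
    """
    rows = []
    for shard in shards:
        per_shard = sorted(
            shard.posts.items(), key=lambda kv: kv[1][1], reverse=True
        )
        rows.extend(per_shard[:limit])
    rows.sort(key=lambda kv: kv[1][1], reverse=True)
    return [(post_id, title) for post_id, (title, _rating) in rows[:limit]]
```

Note what the sketch quietly gives up: nothing enforces a foreign key
that points at a row on another shard, the shard count is baked into
every ID so adding a server is painful, and what would be a single
ORDER BY ... LIMIT on one server becomes application code.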

Still, with MySQL and a bit of glue, guys like eBay, Flickr and
MySpace are partitioning their databases relatively successfully using
such tricks. These guys are not average database users, but they
are not the only ones that have suffered from database bottlenecks and
overcome them using clever, if desperate, measures. Cal Henderson (or
was it Stewart Butterfield?) of Flickr has famously said he would
never again start a project that didn't have partitioning from the
start.

I would love to see a discussion about how PostgreSQL could address
these issues.

Alexander.
