Thread: Planning a Large RDBMS

Planning a Large RDBMS

From
Alessandro Gagliardi
Date:
Hi folks,

We've been using MongoDB for the past couple years and are talking about moving to a more traditional (i.e. stable) database system for OLTP. I've been using RDBMS for OLAP already, but the demands are somewhat different. But we figure, if Facebook can do it, we can. The expertise around here is more suited for MySQL, but I, for one, am fond of the features in Postgres that you can't find in MySQL. Our Mongo system is sharded across 11 logical nodes on Amazon's EC2. Our database is in the hundreds of gigabytes (i.e. too big for memory on one machine) and we can expect it to exceed a terabyte before too long. We have the distinct advantage of being able to plan this from the ground up. I've read a bit about partitioning and table spaces, but from what I can tell, those solutions still require the database to be hosted by a single machine which may not suffice for our purposes.

I realize this is a big question, and possibly too big to answer here. But I would be grateful even for some website or chapter recommendations.

Thank you in advance,
-Alessandro

Re: Planning a Large RDBMS

From
"ktm@rice.edu"
Date:
On Fri, Jun 15, 2012 at 11:11:17AM -0700, Alessandro Gagliardi wrote:
> Hi folks,
>
> We've been using MongoDB for the past couple years and are talking about
> moving to a more traditional (i.e. stable) database system for OLTP. I've
> been using RDBMS for OLAP already, but the demands are somewhat different.
> But we figure, if Facebook can do it, we can. The expertise around here is
> more suited for MySQL, but I, for one, am fond of the features in Postgres
> that you can't find in MySQL. Our Mongo system is sharded across 11 logical
> nodes on Amazon's EC2. Our database is in the hundreds of gigabytes (i.e.
> too big for memory on one machine) and we can expect it to exceed
> a terabyte before too long. We have the distinct advantage of being able to
> plan this from the ground up. I've read a bit about partitioning and table
> spaces, but from what I can tell, those solutions still require the
> database to be hosted by a single machine which may not suffice for our
> purposes.
>
> I realize this is a big question, and possibly too big to answer here. But
> I would be grateful even for some website or chapter recommendations.
>
> Thank you in advance,
> -Alessandro

He Alessandro,

I just saw an announcement for the 1.0 release of Postgres-XC which
may fit your use case.

Regards,
Ken

Re: Planning a Large RDBMS

From
Alessandro Gagliardi
Date:
After the nightmares of dealing with Mongo, I think we want to avoid anything that's in "beta" but I will definitely take a look.

On Fri, Jun 15, 2012 at 11:42 AM, ktm@rice.edu <ktm@rice.edu> wrote:
He Alessandro,

I just saw an announcement for the 1.0 release of Postgres-XC which
may fit your use case.

Regards,
Ken

Re: Planning a Large RDBMS

From
Alessandro Gagliardi
Date:
I couldn't find anything on pgpool regarding sharding, but I did stumble upon this presentation that mentioned pl/proxy. PL/Proxy looks good for sharding, though it looks like it could really limit the ability to do ad hoc queries. But I suppose that would work for OLTP and then we could just maintain a separate OLAP database that may have to read from disk but would have the advantage of existing on a single machine.... 

-Alessandro

On Fri, Jun 15, 2012 at 12:17 PM, ktm@rice.edu <ktm@rice.edu> wrote:
On Fri, Jun 15, 2012 at 11:58:02AM -0700, Alessandro Gagliardi wrote:
> I pretty much consider anything v1.0 to be "beta" for all intents and purposes.
>
> That said, perhaps the intersection between "web scale" and "tried and
> true" is a null set. I hope it's not. I guess what I'm hoping for is a
> methodology that uses tried and true technology (as opposed to untested
> bleeding edge technology).
>

You may want to look at pgpool to shard across a set of databases.
Good luck and keep us updated with what you do and how it worked out.

Regards,
Ken

Re: Planning a Large RDBMS

From
Jeff Davis
Date:
On Fri, 2012-06-15 at 11:11 -0700, Alessandro Gagliardi wrote:
> We've been using MongoDB for the past couple years and are talking
> about moving to a more traditional (i.e. stable) database system for
> OLTP. I've been using RDBMS for OLAP already, but the demands are
> somewhat different. But we figure, if Facebook can do it, we can. The
> expertise around here is more suited for MySQL, but I, for one, am
> fond of the features in Postgres that you can't find in MySQL. Our
> Mongo system is sharded across 11 logical nodes on Amazon's EC2. Our
> database is in the hundreds of gigabytes (i.e. too big for memory on
> one machine) and we can expect it to exceed a terabyte before too
> long. We have the distinct advantage of being able to plan this from
> the ground up. I've read a bit about partitioning and table spaces,
> but from what I can tell, those solutions still require the database
> to be hosted by a single machine which may not suffice for our
> purposes.

I suggest first identifying what portions of the data are largely
independent versus closely tied together. Then use that to determine how
it makes sense to lay these out across multiple machines. Then, just use
different instances of postgres across the different machines. You'll
need to think carefully about how you interact with those different
instances, but you've already done a lot of that thinking for your
current setup.

One way to think about it is that one postgres instance on one machine
is the unit of consistency.

Regards,
    Jeff Davis





Re: Planning a Large RDBMS

From
Jeff Davis
Date:
On Fri, 2012-06-15 at 11:11 -0700, Alessandro Gagliardi wrote:
> We've been using MongoDB for the past couple years and are talking
> about moving to a more traditional (i.e. stable) database system for
> OLTP. I've been using RDBMS for OLAP already, but the demands are
> somewhat different. But we figure, if Facebook can do it, we can. The
> expertise around here is more suited for MySQL, but I, for one, am
> fond of the features in Postgres that you can't find in MySQL. Our
> Mongo system is sharded across 11 logical nodes on Amazon's EC2. Our
> database is in the hundreds of gigabytes (i.e. too big for memory on
> one machine) and we can expect it to exceed a terabyte before too
> long. We have the distinct advantage of being able to plan this from
> the ground up. I've read a bit about partitioning and table spaces,
> but from what I can tell, those solutions still require the database
> to be hosted by a single machine which may not suffice for our
> purposes.

I suggest first identifying what portions of the data are largely
independent versus closely tied together. Then use that to determine how
it makes sense to lay these out across multiple machines. Then, just use
different instances of postgres across the different machines. You'll
need to think carefully about how you interact with those different
instances, but you've already done a lot of that thinking for your
current setup.

One way to think about it is that one postgres instance on one machine
is the unit of consistency.

Regards,
    Jeff Davis