Thread: Planning a Large RDBMS
Hi folks,
We've been using MongoDB for the past couple years and are talking about moving to a more traditional (i.e. stable) database system for OLTP. I've been using RDBMS for OLAP already, but the demands are somewhat different. But we figure, if Facebook can do it, we can. The expertise around here is more suited for MySQL, but I, for one, am fond of the features in Postgres that you can't find in MySQL.

Our Mongo system is sharded across 11 logical nodes on Amazon's EC2. Our database is in the hundreds of gigabytes (i.e. too big for memory on one machine) and we can expect it to exceed a terabyte before too long. We have the distinct advantage of being able to plan this from the ground up.

I've read a bit about partitioning and table spaces, but from what I can tell, those solutions still require the database to be hosted by a single machine, which may not suffice for our purposes.
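For concreteness, here's a toy sketch of the kind of key-based routing our sharded setup implies; the key format and shard count are illustrative, not our real code:

```python
import hashlib

NUM_SHARDS = 11  # matches the 11 logical nodes mentioned above

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a record key to a shard number."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# The same key always routes to the same shard, so any
# application server can find a record without coordination:
assert shard_for("user:42") == shard_for("user:42")
assert 0 <= shard_for("user:42") < NUM_SHARDS
```

Whatever we move to would need an equivalent of this routing step, either in the application or in a proxy layer.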
I realize this is a big question, and possibly too big to answer here. But I would be grateful even for some website or chapter recommendations.
Thank you in advance,
-Alessandro
On Fri, Jun 15, 2012 at 11:11:17AM -0700, Alessandro Gagliardi wrote:
> [snip]

Hi Alessandro,

I just saw an announcement for the 1.0 release of Postgres-XC which
may fit your use case.

Regards,
Ken
After the nightmares of dealing with Mongo, I think we want to avoid anything that's in "beta" but I will definitely take a look.
On Fri, Jun 15, 2012 at 11:42 AM, ktm@rice.edu <ktm@rice.edu> wrote:
> Hi Alessandro,
>
> I just saw an announcement for the 1.0 release of Postgres-XC which
> may fit your use case.
>
> Regards,
> Ken
I couldn't find anything on pgpool regarding sharding, but I did stumble upon this presentation that mentioned PL/Proxy. PL/Proxy looks good for sharding, though it could seriously limit our ability to do ad hoc queries. But I suppose that would work for OLTP, and then we could maintain a separate OLAP database that might have to read from disk but would have the advantage of living on a single machine.
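To make that trade-off concrete, here's a toy Python sketch (purely illustrative, not PL/Proxy itself) of why single-key dispatch is cheap while ad hoc queries turn into scatter-gather across every shard:

```python
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class ShardedExecutor:
    """Toy stand-in for PL/Proxy-style dispatch by key hash."""

    def __init__(self, shards):
        self.shards = shards  # e.g. a list of connection-like objects

    def run_on_key(self, key, fn):
        # Single-shard call: one round trip, analogous to a PL/Proxy
        # function that routes on a hash of the key.
        shard = self.shards[_hash(key) % len(self.shards)]
        return fn(shard)

    def run_on_all(self, fn):
        # Ad hoc / cross-shard query: must fan out to every shard and
        # gather the results. This is where query flexibility suffers.
        return [fn(shard) for shard in self.shards]
```

OLTP traffic mostly fits the `run_on_key` shape; the OLAP side is what keeps hitting `run_on_all`, which is why keeping a separate single-machine OLAP copy is tempting.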
-Alessandro
On Fri, Jun 15, 2012 at 12:17 PM, ktm@rice.edu <ktm@rice.edu> wrote:
On Fri, Jun 15, 2012 at 11:58:02AM -0700, Alessandro Gagliardi wrote:
> > You may want to look at pgpool to shard across a set of databases.
> I pretty much consider anything v1.0 to be "beta" for all intents and purposes.
>
> That said, perhaps the intersection between "web scale" and "tried and
> true" is a null set. I hope it's not. I guess what I'm hoping for is a
> methodology that uses tried and true technology (as opposed to untested
> bleeding edge technology).
>
Good luck and keep us updated with what you do and how it worked out.
Regards,
Ken
On Fri, 2012-06-15 at 11:11 -0700, Alessandro Gagliardi wrote:
> We've been using MongoDB for the past couple years and are talking
> about moving to a more traditional (i.e. stable) database system for
> OLTP. I've been using RDBMS for OLAP already, but the demands are
> somewhat different. But we figure, if Facebook can do it, we can. The
> expertise around here is more suited for MySQL, but I, for one, am
> fond of the features in Postgres that you can't find in MySQL. Our
> Mongo system is sharded across 11 logical nodes on Amazon's EC2. Our
> database is in the hundreds of gigabytes (i.e. too big for memory on
> one machine) and we can expect it to exceed a terabyte before too
> long. We have the distinct advantage of being able to plan this from
> the ground up. I've read a bit about partitioning and table spaces,
> but from what I can tell, those solutions still require the database
> to be hosted by a single machine which may not suffice for our
> purposes.

I suggest first identifying what portions of the data are largely
independent versus closely tied together. Then use that to determine
how it makes sense to lay these out across multiple machines.

Then, just use different instances of postgres across the different
machines. You'll need to think carefully about how you interact with
those different instances, but you've already done a lot of that
thinking for your current setup. One way to think about it is that one
postgres instance on one machine is the unit of consistency.

Regards,
Jeff Davis
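As a small, hypothetical illustration of "one instance per machine is the unit of consistency": the application might keep a static map from each largely-independent data set to the instance that owns it (all names below are invented for the example):

```python
# Each largely-independent data set lives wholly on one Postgres
# instance, so transactions within a data set stay single-instance.
INSTANCE_FOR = {
    "users": "pg-a.internal",
    "events": "pg-b.internal",
    "reporting": "pg-c.internal",
}

def dsn_for(dataset: str) -> str:
    """Build a connection string for the instance owning a data set.

    Anything that must be transactionally consistent has to share an
    entry here; cross-instance work gets no such guarantee.
    """
    host = INSTANCE_FOR[dataset]
    return f"host={host} dbname={dataset}"
```

The hard design work is deciding the partitioning that makes such a map possible, i.e. which tables genuinely never need to join or transact with each other.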