Re: Millions of tables - Mailing list pgsql-performance

From Mike Sofen
Subject Re: Millions of tables
Msg-id 059501d217a5$5b17e120$1147a360$@runbox.com
In response to Millions of tables  (Greg Spiegelberg <gspiegelberg@gmail.com>)
Responses Re: Millions of tables
List pgsql-performance

From: Greg Spiegelberg  Sent: Sunday, September 25, 2016 7:50 PM
… Over the weekend, I created 8M tables with 16M indexes on those tables. 

… A system or database crash could take potentially hours to days to recover.  There are likely other issues ahead.


You may wonder, "why is Greg attempting such a thing?"  I looked at DynamoDB, BigTable, and Cassandra.  I like Greenplum but, let's face it, it's antiquated, and don't get me started on "Hadoop".  The problem with the "one big table" solution is that I anticipate 1,200 trillion records.  Random access is expected, and the customer expects <30ms reads for a single record fetch.


I'm not looking for alternatives yet but input to my test.

_________


Holy guacamole, Batman!  Ok, here’s my take: you’ve traded the risks/limitations of the known for the risks of the unknown.  The unknown being that, in the numerous places where Postgres’s historical development may have cut corners, you may be the first to exercise those corners and flame out like the recent SpaceX rocket.


Put it another way: you’re going to bet your career (perhaps), or a client’s future, on an architectural model that just doesn’t seem feasible.  I think you’ve got a remarkable design problem to solve, and I’m glad you’ve chosen to share that problem with us.


And I do think it will boil down to this: it’s not whether you CAN do it on Postgres (you clearly can), but once in production, assuming things are actually stable, how will you handle the data management aspects: inevitable breakage, data integrity issues, backups, restores, user contention for resources, fault tolerance, and disaster recovery?  Just listing the tables will take forever.  Add a column?  Never.  And the amount of testing you’ll need to do to prove that every normal data management function still works at that table count… that in itself is going to be no fun at all.
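Just to put the figures from the quoted post in perspective, here is a back-of-envelope sketch (my arithmetic, not anything measured in the thread) using the numbers Greg gave: 8M tables, 16M indexes, and 1,200 trillion anticipated records:

```python
# Figures quoted in the thread; the arithmetic below is illustrative only.
tables = 8_000_000           # tables created over the weekend
indexes = 16_000_000         # indexes on those tables
records = 1_200 * 10**12     # 1,200 trillion anticipated records

# Average rows per table if records spread evenly across all tables.
rows_per_table = records // tables

# Every table and every index is a row in pg_class, so the system
# catalog alone holds tens of millions of entries.
catalog_relations = tables + indexes

print(f"{rows_per_table:,} rows per table on average")        # 150,000,000
print(f"{catalog_relations:,} pg_class rows for these relations")  # 24,000,000
```

So even "just" enumerating relations means scanning a 24-million-row catalog, which is why ordinary maintenance operations stop being ordinary at this scale.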


This one hurts my head.  Ironically, the most logical destination for this type of data may actually be Hadoop – auto-scale, auto-shard, fault tolerant, etc…and I’m not a Hadoopie.


I am looking forward to hearing how this all plays out; it will be quite an adventure!  All the best,


Mike Sofen (Synthetic Genomics…on Postgres 9.5x)
