Thread: Largest PG database known to man!

From: Mark Jones

Hi all,

We are currently working with a customer who is looking at a database of between 200 and 400 TB! They are after any confirmation of PG working at this size or anywhere near it.
Has anyone out there worked on anything like this size in PG? If so, can you let me know more details?
----------------------------------------------------

   Mark Jones
   Principal Sales Engineer, EMEA

   http://www.enterprisedb.com/
   
   Email: Mark.Jones@enterprisedb.com
   Tel: 44 7711217186
   Skype: Mxjones121


Re: Largest PG database known to man!

From: John R Pierce

On 10/1/2013 2:49 PM, Mark Jones wrote:
> We are currently working with a customer who is looking at a database
> of between 200-400 TB! They are after any confirmation of PG working
> at this size or anywhere near it.


is that really 200-400TB of relational data, or is it 199-399TB of bulk
data (blobs or whatever) interspersed with some relational metadata?

what all is the usage pattern of this data?   that determines the
feasibility of something far more than just the raw size.




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: Largest PG database known to man!

From: Mark Jones

Thanks for your quick response John.

From the limited information, it is mostly relational.
As for usage patterns, I do not have that yet.
I was just after a general feel of what is out there size wise.

Regards



Re: Largest PG database known to man!

From: Jeff Ross

Maybe some of these folks can chime in?

http://cds.u-strasbg.fr/

SIMBAD (and I think VizieR) runs on PostgreSQL.  A friend of mine is a grad student in astronomy, and he told me about them.

Jeff Ross

Re: Largest PG database known to man!

From: John R Pierce

On 10/1/2013 3:00 PM, Mark Jones wrote:
> From the limited information, it is mostly relational.

phew.  that's going to be a monster.  400TB on 600GB 15000rpm SAS
drives in raid10 will require around 1400 drives.  at 25 disks per 2U
drive tray, that's two 6' racks of nothing but disks, and to maintain a
reasonable fanout to minimize IO bottlenecks, it would require on the
order of 25 SAS raid cards.  or, a really big SAN with some serious IOPS.
and naturally, you should have at least 2 of these for availability.

if we assume the tables average 1KB/record (which is a fairly large
record size even including indexing), you're looking at 400 billion
records.   if you can populate these at 5000 records/second, it would
take 2.5 years of 24/7 operation to populate that.
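The arithmetic behind those figures can be sketched as follows (the drive size, RAID factor, record size, and insert rate are all the assumptions stated above, not measurements):

```python
# Back-of-envelope check of the figures above; all inputs are assumptions.
TB = 1000 ** 4                     # decimal terabyte, as for drive capacities

data = 400 * TB                    # upper end of the customer's estimate
drive = 600 * 1000 ** 3            # 600GB SAS drive
drives = 2 * data / drive          # raid10 mirrors, so double the raw bytes
print(round(drives))               # ~1333, i.e. "around 1400 drives"

records = data / 1000              # at ~1KB per record including indexes
print(records / 1e9)               # 400.0 (billion records)

years = records / 5000 / (365 * 24 * 3600)   # at 5000 records/second
print(round(years, 1))             # 2.5 years of 24/7 loading
```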

this sort of big data system is probably more suitable for something
like hadoop+mongo or whatever on a cloud of 1000 nodes, not a monolithic
SQL relational database.





Re: Largest PG database known to man!

From: Julian

On 02/10/13 07:49, Mark Jones wrote:
> Hi all,
>
> We are currently working with a customer who is looking at a database
> of between 200-400 TB! They are after any confirmation of PG working
> at this size or anywhere near it.
> Anyone out there worked on anything like this size in PG please? If
> so, can you let me know more details etc..
> ----------------------------------------------------
>
Wow, that's awesome - but you know the difference between 200TB and 400TB
is quite significant (100%)? Like a whole bunch of cash
significant... unless we are talking GB.
But is that all we get? This isn't really fair - is this a test?
Jules


Re: Largest PG database known to man!

From: Stephen Frost

* John R Pierce (pierce@hogranch.com) wrote:
> if we assume the tables average 1KB/record (which is a fairly large
> record size even including indexing), you're looking at 400 billion
> records.   if you can populate these at 5000 records/second, it
> would take 2.5 years of 24/7 operation to populate that.

5000 1KB records per second is only 5MB/s or so, which is really quite
slow.  I can't imagine that they'd load all of this data by doing a
commit for each record, and you could load a *huge* amount of data *very*
quickly, in parallel, by using either unlogged tables or wal_level =
minimal and creating the tables in the same transaction that's loading
them.
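To put that in numbers: the gap between row-at-a-time commits and parallel bulk loading is the gap between years and days. The sustained rates below are illustrative assumptions, not benchmarks:

```python
# Time to load 400TB at various sustained rates (rates are illustrative).
data = 400 * 1000 ** 4  # bytes

for label, mb_per_s in [("row-at-a-time commits", 5),
                        ("one COPY stream", 100),
                        ("16 parallel COPY streams", 1600)]:
    days = data / (mb_per_s * 1000 ** 2) / 86400
    print(f"{label} at {mb_per_s}MB/s: {days:.0f} days")
# -> roughly 926 days, 46 days, and 3 days respectively
```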

> this sort of big data system is probably more suitable for something
> like hadoop+mongo or whatever on a cloud of 1000 nodes, not a
> monolithic SQL relational database.

Or a federated PG database using FDWs..

Sadly, I've not personally worked with a data system on the 100+TB range
w/ PG (we do have a Hadoop environment along that scale) but I've built
systems as large as 25TB which, built correctly, work very well.  Still,
I don't think I'd recommend building a single-image PG database on that
scale but rather would shard it.
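A federated or sharded layout usually routes each row by a hash of its distribution key. Here is a minimal sketch of such a router (the shard names and count are hypothetical; in PG, each shard could sit behind a postgres_fdw foreign table):

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate PG instance.
SHARDS = [f"pg_shard_{i:02d}" for i in range(8)]

def shard_for(key: str) -> str:
    """Deterministically map a distribution key to one shard."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always routes to the same shard, so single-key lookups
# need no cross-shard coordination:
assert shard_for("customer:42") == shard_for("customer:42")
```

Such routing pays off only when queries rarely need to cross shard boundaries.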

    Thanks,

        Stephen


Re: Largest PG database known to man!

From: John R Pierce

On 10/1/2013 6:53 PM, Stephen Frost wrote:
> I don't think I'd recommend building a single-image PG database on that
> scale but rather would shard it.

sharding only works well if your data has natural divisions and you're
not doing complex joins/aggregates across those divisions.






Re: Largest PG database known to man!

From: Chris Travers




On Tue, Oct 1, 2013 at 3:00 PM, Mark Jones <mark.jones@enterprisedb.com> wrote:
Thanks for your quick response John.

From the limited information, it is mostly relational.
As for usage patterns, I do not have that yet.
I was just after a general feel of what is out there size wise.

Usage patterns are going to be critical here.  There is a huge difference between a large amount of data being used in an OLTP workflow and in a DSS/OLAP workflow.  Additionally, I am concerned that your indexes are going to be very large.  Depending on your usage pattern, breaking things down carefully with tablespaces and partial indexes may be enough.  However, keep in mind that no table can be larger than 32TB.  At any rate, no matter what solution you use, I don't see a generally tuned database being what you want (which means you will be tuning for the workflow).

Now, I think your big limits in OLTP are going to be max table size (32TB) and index size.  These can all be managed (and managed carefully) but they are limits.
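The 32TB ceiling comes from PostgreSQL's default 8KB block size combined with 32-bit block numbers; simple arithmetic (assuming the default block size, and treating TB and TiB interchangeably at this scale) shows what that implies for partitioning:

```python
import math

# Default 8KB pages x 2^32 addressable blocks = the 32TB per-table ceiling.
block_size = 8 * 1024
max_table_bytes = block_size * 2 ** 32
assert max_table_bytes == 32 * 1024 ** 4      # 32 TiB

# Minimum partition counts if one logical table held all the data:
for total_tb in (200, 400):
    print(total_tb, "TB needs at least", math.ceil(total_tb / 32), "partitions")
# -> 7 partitions at 200TB, 13 at 400TB
```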

For OLAP you have a totally different set of concerns, and since you are talking about aggregating a lot of data, vanilla PostgreSQL is going to be a pain to get working as it is.  On the other hand, OLAP and large-db mixed workloads are where Postgres-XC might really shine.  The complexity costs there will likely be worth it in removing the limitations on disk I/O and the lack of intraquery parallelism.

200TB is a lot of data.  400TB is twice that.  Either way you are going to have a really complex set of problems to tackle regardless of what solution you choose. 

I have heard of db sizes in the 30-100TB range on PostgreSQL even before Postgres-XC.  I am not sure beyond that.

Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.