Thread: Largest PG database known to man!
Hi all,
We are currently working with a customer who is looking at a database of between 200-400 TB! They are after any confirmation of PG working at this size or anywhere near it.
Has anyone out there worked on anything like this size in PG? If so, can you share more details?
----------------------------------------------------
Mark Jones
Principal Sales Engineer EMEA
http://www.enterprisedb.com/
Email: Mark.Jones@enterprisedb.com
Tel: 44 7711217186
Skype: Mxjones121
On 10/1/2013 2:49 PM, Mark Jones wrote:
> We are currently working with a customer who is looking at a database
> of between 200-400 TB! They are after any confirmation of PG working
> at this size or anywhere near it.

is that really 200-400TB of relational data, or is it 199-399TB of bulk data (blobs or whatever) interspersed with some relational metadata?

what all is the usage pattern of this data? that determines the feasibility of something far more than just the raw size.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
Thanks for your quick response John.

From the limited information, it is mostly relational. As for usage patterns, I do not have that yet. I was just after a general feel of what is out there size wise.

Regards
----------------------------------------------------
Mark Jones
Principal Sales Engineer EMEA
http://www.enterprisedb.com/
Email: Mark.Jones@enterprisedb.com
Tel: 44 7711217186
Skype: Mxjones121

On 01/10/2013 22:56, "John R Pierce" <pierce@hogranch.com> wrote:
>On 10/1/2013 2:49 PM, Mark Jones wrote:
>> We are currently working with a customer who is looking at a database
>> of between 200-400 TB! They are after any confirmation of PG working
>> at this size or anywhere near it.
>
>is that really 200-400TB of relational data, or is it 199-399TB of bulk
>data (blobs or whatever) interspersed with some relational metadata?
>
>what all is the usage pattern of this data? that determines the
>feasibility of something far more than just the raw size.
>
>--
>john r pierce                                      37N 122W
>somewhere on the middle of the left coast
>
>--
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general
Maybe some of these folks can chime in?
http://cds.u-strasbg.fr/
Simbad (and I think VizieR) runs on PostgreSQL. A friend of mine is a grad student in astronomy and he told me about them.
Jeff Ross
On 10/1/13 3:49 PM, Mark Jones wrote:
> Hi all,
> We are currently working with a customer who is looking at a database
> of between 200-400 TB! They are after any confirmation of PG working
> at this size or anywhere near it.
> Anyone out there worked on anything like this size in PG please? If
> so, can you let me know more details etc..
> ----------------------------------------------------
> Mark Jones
> Principal Sales Engineer EMEA
> http://www.enterprisedb.com/
> Email: Mark.Jones@enterprisedb.com
> Tel: 44 7711217186
> Skype: Mxjones121
On 10/1/2013 3:00 PM, Mark Jones wrote:
> From the limited information, it is mostly relational.

phew. thats going to be a monster.

400TB on 600GB 15000rpm SAS drives in raid10 will require around 1400 drives. at 25 disks per 2U drive tray, thats 2 6' racks of nothing but disks, and to maintain a reasonable fanout to minimize IO bottlenecks, would require on the order of 25 SAS raid cards. or, a really big SAN with some serious IOPS. and naturally, you should have at least 2 of these for availability.

if we assume the tables average 1KB/record (which is a fairly large record size even including indexing), you're looking at 400 billion records. if you can populate these at 5000 records/second, it would take 2.5 years of 24/7 operation to populate that.

this sort of big data system is probably more suitable for something like hadoop+mongo or whatever on a cloud of 1000 nodes, not a monolithic SQL relational database.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
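[The arithmetic above checks out; a rough sketch, assuming decimal units (1 TB = 10^12 bytes), 1 KB average rows, and 2x mirroring overhead for RAID 10:]

```python
# Back-of-envelope check of the capacity and load-time estimates above.
# Assumes decimal units (1 TB = 1e12 bytes) and 2x storage for RAID 10 mirroring.

TB = 1e12
GB = 1e9

raw_data = 400 * TB        # upper end of the 200-400 TB range
drive_size = 600 * GB      # 600 GB 15k SAS drive
raid10_factor = 2          # mirrored pairs double the raw capacity needed

drives = raw_data * raid10_factor / drive_size
print(f"drives needed: {drives:.0f}")    # ~1333, i.e. "around 1400" with spares

record_size = 1000         # 1 KB/record including index overhead
records = raw_data / record_size         # 4e11 = 400 billion records
load_rate = 5000           # records/second
years = records / load_rate / (3600 * 24 * 365)
print(f"load time: {years:.1f} years")   # ~2.5 years of 24/7 loading
```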
On 02/10/13 07:49, Mark Jones wrote:
> Hi all,
>
> We are currently working with a customer who is looking at a database
> of between 200-400 TB! They are after any confirmation of PG working
> at this size or anywhere near it.
> Anyone out there worked on anything like this size in PG please? If
> so, can you let me know more details etc..
> ----------------------------------------------------

Wow that's awesome - but you know the difference between 200TB and 400TB is quite significant (100%)? Like a whole bunch of cash significant... unless we are talking GB.

But is that it? This isn't really fair, is this a test?

Jules
* John R Pierce (pierce@hogranch.com) wrote:
> if we assume the tables average 1KB/record (which is a fairly large
> record size even including indexing), you're looking at 400 billion
> records. if you can populate these at 5000 records/second, it
> would take 2.5 years of 24/7 operation to populate that.

5000 1KB records per second is only 5MB/s or so, which is really quite slow.. I can't imagine that they'd load all of this data by doing a commit for each record and you could load a *huge* amount of data *very* quickly, in parallel, by using either unlogged tables or wal_level = minimal and creating the tables in the same transaction that's loading them.

> this sort of big data system is probably more suitable for something
> like hadoop+mongo or whatever on a cloud of 1000 nodes, not a
> monolithic SQL relational database.

Or a federated PG database using FDWs.. Sadly, I've not personally worked with a data system on the 100+TB range w/ PG (we do have a Hadoop environment along that scale) but I've built systems as large as 25TB which, built correctly, work very well. Still, I don't think I'd recommend building a single-image PG database on that scale but rather would shard it.

Thanks,

Stephen
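[For a feel of what parallel bulk loading changes, here is the same arithmetic under assumed, not benchmarked, numbers: 8 parallel COPY streams at 20 MB/s each. Both figures are illustrative placeholders:]

```python
# Illustrative load-time math: parallel bulk loading vs. 5000 rows/sec serial.
# The stream count and per-stream rate below are assumptions, not benchmarks.

TB = 1e12
MB = 1e6

data = 400 * TB
row = 1000                 # 1 KB/row, as in the quoted estimate

serial_rate = 5000 * row   # 5000 rows/sec * 1 KB/row
print(f"serial: {serial_rate / MB:.0f} MB/s")   # ~5 MB/s, as noted above

streams = 8                # assumed number of parallel COPY streams
per_stream = 20 * MB       # assumed per-stream rate with unlogged tables
                           # or wal_level = minimal
days = data / (streams * per_stream) / 86400
print(f"parallel: {days:.0f} days")             # ~29 days instead of ~2.5 years
```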
On 10/1/2013 6:53 PM, Stephen Frost wrote:
> I don't think I'd recommend building a single-image PG database on that
> scale but rather would shard it.

sharding only works well if your data has natural divisions and you're not doing complex joins/aggregates across those divisions.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
On Tue, Oct 1, 2013 at 3:00 PM, Mark Jones <mark.jones@enterprisedb.com> wrote:
> Thanks for your quick response John.
>
> From the limited information, it is mostly relational.
> As for usage patterns, I do not have that yet.
> I was just after a general feel of what is out there size wise.
Usage patterns are going to be critical here. There is a huge difference between a large amount of data being used in an OLTP workflow and in a DSS/OLAP workflow. Additionally, I am concerned your indexes are going to be very large. Now, depending on your usage pattern, breaking things down carefully regarding tablespaces and partial indexes may be enough. However, keep in mind that no table can be larger than 32TB. At any rate, no matter what solution you use, I don't see a generally tuned database being what you want (which means you are tuning for the workload).
Now, I think your big limits in OLTP are going to be max table size (32TB) and index size. These can all be managed (and managed carefully) but they are limits.
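[Taking the 32TB per-table limit and the 400TB upper figure at face value (decimal units for simplicity), the minimum partition count falls out directly; a trivial sketch, and a real design would partition far more finely than this:]

```python
import math

TB = 1e12
table_limit = 32 * TB   # PostgreSQL's per-table size limit
data = 400 * TB         # upper end of the stated range

min_tables = math.ceil(data / table_limit)
print(min_tables)       # 13 - the floor, before counting indexes or headroom
```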
For OLAP you have a totally different set of concerns, and since you are talking about aggregating a lot of data, vanilla PostgreSQL is going to be a pain to get working as it is. On the other hand OLAP and large db mixed workloads is where Postgres-XC might really shine. The complexity costs there will likely be worth it in removing limitations on disk I/O and lack of intraquery parallelism.
200TB is a lot of data. 400TB is twice that. Either way you are going to have a really complex set of problems to tackle regardless of what solution you choose.
I have heard of db sizes in the 30-100TB range on PostgreSQL even before Postgres-XC. I am not sure beyond that.
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.