Thread: White paper on very big databases

White paper on very big databases

From
Jean-Paul Argudo
Date:
Hi all,

I'm planning to write a white paper telling the world that yes, it's
possible to manage terabytes of data with PostgreSQL.

The idea is to explain how PostgreSQL can manage this and what
infrastructure is needed to do so. We have to tell everyone that yes, it
is possible.

I know many places where this white paper would have been of real
interest when PostgreSQL faced detractors, let's say
pro-my-favorite-commercial-database (replace
"my-favorite-commercial-database" with any commercial RDBMS name...).

In France, we have at least one site running a 4 TB database: the
national weather-forecasting agency (Météo France).

I'm sure we all know other places around the world. I'd like to digest
all those examples and tell the world "look how they do it at X and Y:
architecture number one", "look how they do it at W and Z: architecture
number two", etc.

So I'm calling on everyone willing to work with me on this topic.

The study has to be based on real use cases, so the first requirement is
that every use case can become public knowledge. I don't want someone
telling me "we have 5 TB here, we do it this way... but shhh, don't tell
my name". That will not work, because detractors could then claim
everything in the study is wrong, false, and pro-PostgreSQL...

So if you have any comments regarding this project, it's time to say so
here!

I'll be at FOSDEM tonight, until Sunday. Be assured I'm open to any
discussion there, and by mail, of course.

Cheers!

--
Jean-Paul Argudo
www.PostgreSQLFr.org
www.Dalibo.com

Re: White paper on very big databases

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-04 at 10:20 +0100, Jean-Paul Argudo wrote:
> Hi all,

> The study has to be based on real use cases, so the first requirement is
> that every use case can become public knowledge. I don't want someone
> telling me "we have 5 TB here, we do it this way... but shhh, don't tell
> my name". That will not work, because detractors could then claim
> everything in the study is wrong, false, and pro-PostgreSQL...
>

Well, therein lies the problem: CMD has a customer with a multi-terabyte
table (not including the rest of the database), but we can't really talk
about it :(

> So if you have any comments regarding this project, its time to say it
> here!
>
> I'll be at FOSDEM this night, until sunday. Be sure I'm open for any
> discussion there, and by mail, off course.
>
> Cheers!
>
> --
> Jean-Paul Argudo
> www.PostgreSQLFr.org
> www.Dalibo.com
>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: White paper on very big databases

From
"Jonah H. Harris"
Date:
On Wed, Feb 4, 2009 at 12:09 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
Well therein lies the problem. CMD has a customer with a multi-terabyte
table (not including the rest of the database) but we can't really talk
about it :(

IIRC, EnterpriseDB had one customer with over 1 TB of data, but they too would have been hush-hush about it.  When I was consulting, I saw very few Postgres databases at or over 1 TB.  While Postgres can handle fairly large data sets, it lacks some fairly important VLDB features, which is probably why there are so few people with multi-terabyte PG databases.  Perhaps JD/Fetter know of more, but the ones I know of number fewer than 10.

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: White paper on very big databases

From
Robert Treat
Date:
On Wednesday 04 February 2009 04:20:11 Jean-Paul Argudo wrote:
> The study has to be based on real use cases, so the first requirement is
> that every use case can become public knowledge. I don't want someone
> telling me "we have 5 TB here, we do it this way... but shhh, don't tell
> my name". That will not work, because detractors could then claim
> everything in the study is wrong, false, and pro-PostgreSQL...
>

Do you have an idea of how much company-specific information you need? We
manage multi-TB DBs (we've done Postgres, Oracle, and MySQL, btw), and we
are able to talk about some of our clients' implementations, but for the
most part our customers would prefer to be left out of the conversation.
This was good enough for Sun:
http://www.sun.com/third-party/srsc/resources/postgresql/postgre_success_dwp.pdf
If that level of detail is good enough for you, please drop me a line; we'd
be happy to help out.

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

Re: White paper on very big databases

From
Josh Berkus
Date:
Jonah,

> IIRC, EnterpriseDB had one customer with over 1TB of data, but they too
> would have been hush-hush about it.  When I was consulting, I saw very
> few Postgres databases at or over 1TB.  While Postgres can handle fairly
> large data sets, it lacks some fairly important VLDB features which is
> probably why there are so few people with multi-terabyte PG databases.
> Perhaps JD/Fetter know of more, but I can count the ones I know of at < 10.

Odd, I worked on a bunch of multi-TB databases, one of them 75TB.

However, I'd agree that for *most* VLDB purposes, specialty DBMSes are
generally a better choice.  We tend to fill in when someone needs hybrid
OLTP/DW functionality.

--Josh


Re: White paper on very big databases

From
Grant Allen
Date:
If you're willing to consider hybrids/hacked versions based on PostgreSQL, then look at Aaron Harsh's presentation
about Rentrak's TV ad viewing/media monitoring database ... search YouTube on those names and it'll come up as the
first hit (called "Is your RDBMS letting you down?").  Their DB is petabyte scale, but it is a heavily modified
version of PostgreSQL.

Ciao
Fuzzy
:-)

------------------------------------------------
Dazed and confused about technology for 20 years
http://fuzzydata.wordpress.com/



Re: White paper on very big databases

From
Greg Smith
Date:
At Truviso I work on a 3.8TB database where the largest table is 1.3TB of
data and 0.8TB of index.  I have some paperwork to do before I can really
give out any details; I'll look into that.  The biggest challenge was
loading all that data in the first place (it was 2TB from day one), as
COPY isn't very fast.  That is particularly unfortunate because it's the
same path the dump/reload needed for a version upgrade would have to take.
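For anyone curious what a load like that looks like, here is a minimal sketch of the usual approach (the table, file path, and settings are hypothetical, not Truviso's actual setup):

```shell
# Hypothetical bulk-load sketch: load with COPY into an index-free table,
# then build indexes afterwards -- usually much faster than loading into
# an already-indexed table.
psql mydb <<'SQL'
SET maintenance_work_mem = '1GB';             -- speeds up the index build below
CREATE TABLE events (ts timestamptz, payload text);
COPY events FROM '/data/events.csv' WITH CSV;
CREATE INDEX events_ts_idx ON events (ts);
ANALYZE events;                               -- refresh planner statistics
SQL
```

Deferring index creation matters because maintaining a btree row-by-row during the load is far slower than one bulk index build at the end.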

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: White paper on very big databases

From
Brad Nicholson
Date:
On Wed, 2009-02-04 at 21:23 -0500, Jonah H. Harris wrote:
> On Wed, Feb 4, 2009 at 12:09 PM, Joshua D. Drake
> <jd@commandprompt.com> wrote:
>
>         Well therein lies the problem. CMD has a customer with a
>         multi-terabyte
>         table (not including the rest of the database) but we can't
>         really talk
>         about it :(
>
> IIRC, EnterpriseDB had one customer with over 1TB of data, but they
> too would have been hush-hush about it.  When I was consulting, I saw
> very few Postgres databases at or over 1TB.  While Postgres can handle
> fairly large data sets, it lacks some fairly important VLDB features
> which is probably why there are so few people with multi-terabyte PG
> databases.  Perhaps JD/Fetter know of more, but I can count the ones I
> know of at < 10.

We have one internal system that is over the 2TB mark.

One of the biggest issues with this is that we have to keep two
completely separate copies of this system up and running in parallel
for DR purposes.  Restoring a dump takes several days (even on high-end
hardware and disk subsystems), and the system cannot be down for that
long, so we need to keep two completely separate instances running in
case the primary fails.

Unfortunately, this system still runs on 7.4, largely due to the major
PITA it is to upgrade a database of this size (we finally have the
traction to upgrade it to 8.3).  Dump and restore is the only upgrade
option for us with this DB.  We can't drop our backup node for several
days, and if we want to do it safely we need 3x the allocated disk
space: space for production, the backup, and the newly minted 8.3
system.  That's a lot of high-end disk array real estate to justify.
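For reference, the dump-and-reload upgrade path described above is, in outline, something like the following (host and database names are hypothetical; on a multi-TB system each step can run for days):

```shell
# Hypothetical sketch of a dump/reload migration between major versions.
# Run the *newer* version's pg_dump against the old server.
pg_dump -h oldhost -Fc mydb -f mydb.dump    # custom-format dump of the old DB
createdb -h newhost mydb                    # empty DB on the new cluster
pg_restore -h newhost -d mydb mydb.dump     # reload schema and data
vacuumdb -h newhost --analyze mydb          # rebuild planner statistics
```

The 3x-disk problem follows directly: the dump file, the old cluster, and the new cluster all have to exist at once until the cutover is verified.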

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: White paper on very big databases

From
"Joshua D. Drake"
Date:
On Wed, 2009-02-04 at 21:23 -0500, Jonah H. Harris wrote:
> On Wed, Feb 4, 2009 at 12:09 PM, Joshua D. Drake
> <jd@commandprompt.com> wrote:
>
>         Well therein lies the problem. CMD has a customer with a
>         multi-terabyte
>         table (not including the rest of the database) but we can't
>         really talk
>         about it :(
>
> IIRC, EnterpriseDB had one customer with over 1TB of data, but they
> too would have been hush-hush about it.  When I was consulting, I saw
> very few Postgres databases at or over 1TB.  While Postgres can handle
> fairly large data sets, it lacks some fairly important VLDB features
> which is probably why there are so few people with multi-terabyte PG
> databases.  Perhaps JD/Fetter know of more, but I can count the ones I
> know of at < 10.

I know of a lot more than that.

Joshua D. Drake

>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: White paper on very big databases

From
Hans-Juergen Schoenig
Date:
Jonah H. Harris wrote:
> On Wed, Feb 4, 2009 at 12:09 PM, Joshua D. Drake <jd@commandprompt.com
> <mailto:jd@commandprompt.com>> wrote:
>
>     Well therein lies the problem. CMD has a customer with a
>     multi-terabyte
>     table (not including the rest of the database) but we can't really
>     talk
>     about it :(
>
>
> IIRC, EnterpriseDB had one customer with over 1TB of data, but they
> too would have been hush-hush about it.  When I was consulting, I saw
> very few Postgres databases at or over 1TB.  While Postgres can handle
> fairly large data sets, it lacks some fairly important VLDB features
> which is probably why there are so few people with multi-terabyte PG
> databases.  Perhaps JD/Fetter know of more, but I can count the ones I
> know of at < 10.
>
> --
> Jonah H. Harris, Senior DBA
> myYearbook.com
>

hello everybody,

i know GIS databases which are way bigger than 1 TB.
the biggest one i have seen personally recently was around 8 TB.
i had my hands on a 12 TB beast 3 years ago.

for me the database size is not the real problem; the problem is rather
getting the stuff in.
for most people the question is rather: is there anything we could still
store which would finally end up being 12 TB or more :). 80% of all people
will never get there, even if they store every little movement everywhere :).

    hans



--
Cybertec Schönig & Schönig GmbH
Professional PostgreSQL Consulting, Support, Training
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: www.postgresql-support.de


Re: White paper on very big databases

From
decibel
Date:
On Feb 4, 2009, at 3:20 AM, Jean-Paul Argudo wrote:
> The study has to be based on real use cases, so the first requirement is
> that every use case can become public knowledge. I don't want someone
> telling me "we have 5 TB here, we do it this way... but shhh, don't tell
> my name". That will not work, because detractors could then claim
> everything in the study is wrong, false, and pro-PostgreSQL...


Our largest database is currently at 850GB (yeah, not quite 1TB ;P),
but it's also OLTP. It's been a *long* time since I've looked, but
I'm pretty sure it's doing on the order of 100 TPS. I also rather
doubt that the company would have an issue doing a case study.
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828