Thread: table size/record limit

table size/record limit

From
Dennis Gearon
Date:
I am designing something that may be the size of yahoo, google, ebay, etc.

Just ONE many to many table could possibly have the following
characteristics:

    3,600,000,000 records
    each record is 9 fields of INT4/DATE

Other tables will have about 5 million records of about the same size.

There are lots of scenarios here to lessen this.

BUT, is postgres on linux, maybe necessarily a 64-bit system, capable of
this? And there'd be 4-5 indexes on that table.
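
A rough back-of-envelope, assuming about 32 bytes of per-row overhead on top of
the 36 bytes of column data (that overhead figure is an assumption and varies
with version and alignment), puts the heap alone at a couple hundred GB before
indexes:

    -- back-of-envelope: 9 x 4-byte fields plus an assumed ~32 bytes of per-row overhead
    SELECT (9 * 4 + 32) * 3600000000::bigint / (1024 * 1024 * 1024) AS approx_heap_gb;
    -- roughly 230 GB of heap, before the 4-5 indexes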

Re: table size/record limit

From
Tino Wildenhain
Date:
Hi,

On Thu, 21.10.2004 at 1:30, Dennis Gearon wrote:
> I am designing something that may be the size of yahoo, google, ebay, etc.
>
> Just ONE many to many table could possibly have the following
> characteristics:
>
>     3,600,000,000 records
>     each record is 9 fields of INT4/DATE
>
> Other tables will have about 5 million records of about the same size.
>
> There are lots of scenarios here to lessen this.
>
> BUT, is postgres on linux, maybe necessarily a 64-bit system, capable of
> this? And there'd be 4-5 indexes on that table.

Sure. Why not? 3-5 million records is not really a problem.
We had bigger tables with historic commercial transactions
(even on an old dual PIII/1000) with fine performance.
I bet however, yahoo, google at least are much bigger :-)

Regards
Tino



Re: table size/record limit

From
Dennis Gearon
Date:
Google probably is much bigger, and on mainframes, and probably Oracle or DB2.

But the table I am worried about is the one sized >= 3.6 GIGA records.

Tino Wildenhain wrote:

> Hi,
>
> On Thu, 21.10.2004 at 1:30, Dennis Gearon wrote:
>
>>I am designing something that may be the size of yahoo, google, ebay, etc.
>>
>>Just ONE many to many table could possibly have the following
>>characteristics:
>>
>>    3,600,000,000 records
>>    each record is 9 fields of INT4/DATE
>>
>>Other tables will have about 5 million records of about the same size.
>>
>>There are lots of scenarios here to lessen this.
>>
>>BUT, is postgres on linux, maybe necessarily a 64-bit system, capable of
>>this? And there'd be 4-5 indexes on that table.
>
>
> Sure. Why not? 3-5 million records is not really a problem.
> We had bigger tables with historic commercial transactions
> (even on an old dual PIII/1000) with fine performance.
> I bet however, yahoo, google at least are much bigger :-)
>
> Regards
> Tino
>
>
>


Re: table size/record limit

From
"Joshua D. Drake"
Date:
Dennis Gearon wrote:

> Google probably is much bigger, and on mainframes, and probably Oracle
> or DB2.

Google uses a Linux cluster and their database is HUGE. I do not know
which database
they use. I bet they built their own specifically for what they do.

Sincerely,

Joshua D. Drake



>
> But the table I am worried about is the one sized >= 3.6 GIGA records.
>
> Tino Wildenhain wrote:
>
>> Hi,
>>
>> On Thu, 21.10.2004 at 1:30, Dennis Gearon wrote:
>>
>>> I am designing something that may be the size of yahoo, google,
>>> ebay, etc.
>>>
>>> Just ONE many to many table could possibly have the following
>>> characteristics:
>>>
>>>    3,600,000,000 records
>>>    each record is 9 fields of INT4/DATE
>>>
>>> Other tables will have about 5 million records of about the same size.
>>>
>>> There are lots of scenarios here to lessen this.
>>>
>>> BUT, is postgres on linux, maybe necessarily a 64-bit system,
>>> capable of this? And there'd be 4-5 indexes on that table.
>>
>>
>>
>> Sure. Why not? 3-5 million records is not really a problem.
>> We had bigger tables with historic commercial transactions
>> (even on an old dual PIII/1000) with fine performance.
>> I bet however, yahoo, google at least are much bigger :-)
>>
>> Regards
>> Tino
>>
>>
>>
>
>



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Re: table size/record limit

From
Dennis Gearon
Date:
Actually, now that I think about it, they use a special table type where the INDEX is also the DATUM. It is possible to
recover the data out of the index listing. So go down the index, then decode the indexed value - voila, a whole step
saved. I have no idea what engine these table types are in, however.
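
To illustrate the idea only (the table and column names are made up, and this is
not a claim about what Google actually runs): if every column of a narrow
many-to-many table is part of its primary key, the key's index already contains
the whole row, which is what such index-organized layouts exploit:

    -- hypothetical example; stock PostgreSQL still keeps a separate heap,
    -- but an index-organized engine can answer lookups from the index alone
    CREATE TABLE term_doc (
        term_id int4 NOT NULL,
        doc_id  int4 NOT NULL,
        PRIMARY KEY (term_id, doc_id)
    );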

Joshua D. Drake wrote:

> Dennis Gearon wrote:
>
>> Google probably is much bigger, and on mainframes, and probably Oracle
>> or DB2.
>
>
> Google uses a Linux cluster and their database is HUGE. I do not know
> which database
> they use. I bet they built their own specifically for what they do.
>
> Sincerely,
>
> Joshua D. Drake
>
>
>
>>
>> But the table I am worried about is the one sized >= 3.6 GIGA records.
>>
>> Tino Wildenhain wrote:
>>
>>> Hi,
>>>
>>> On Thu, 21.10.2004 at 1:30, Dennis Gearon wrote:
>>>
>>>> I am designing something that may be the size of yahoo, google,
>>>> ebay, etc.
>>>>
>>>> Just ONE many to many table could possibly have the following
>>>> characteristics:
>>>>
>>>>    3,600,000,000 records
>>>>    each record is 9 fields of INT4/DATE
>>>>
>>>> Other tables will have about 5 million records of about the same size.
>>>>
>>>> There are lots of scenarios here to lessen this.
>>>>
>>>> BUT, is postgres on linux, maybe necessarily a 64-bit system,
>>>> capable of this? And there'd be 4-5 indexes on that table.
>>>
>>>
>>>
>>>
>>> Sure. Why not? 3-5 million records is not really a problem.
>>> We had bigger tables with historic commercial transactions
>>> (even on an old dual PIII/1000) with fine performance.
>>> I bet however, yahoo, google at least are much bigger :-)
>>>
>>> Regards
>>> Tino
>>>
>>>
>>>
>>
>>
>
>
>
>


Re: table size/record limit

From
Robby Russell
Date:
On Wed, 2004-10-20 at 23:01 -0700, Joshua D. Drake wrote:
> Dennis Gearon wrote:
>
> > Google probably is much bigger, and on mainframes, and probably Oracle
> > or DB2.
>
> Google uses a Linux cluster and their database is HUGE. I do not know
> which database
> they use. I bet they built their own specifically for what they do.

...actually, I heard they were running it off a flat-file database on seven
386 machines in some guy's garage off a DSL connection. I could be wrong
though. ;-)

-Robby

--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON  | www.planetargon.com
* Portland, OR  | robby@planetargon.com
* 503.351.4730  | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
****************************************/



Re: table size/record limit

From
David Helgason
Date:
On 21. okt 2004, at 01:30, Dennis Gearon wrote:

> I am designing something that may be the size of yahoo, google, ebay,
> etc.

Grrr. Geek wet-dream.

> Just ONE many to many table could possibly have the following
> characteristics:
>
>    3,600,000,000 records
>    each record is 9 fields of INT4/DATE

I don't do this myself (my data is only 3 gig, and most of that is in
blobs), but people have repeatedly reported such sizes on this list.

Check
    http://archives.postgresql.org/pgsql-admin/2001-01/msg00188.php

... but the best you can do is just to try it out. With a few commands
in the 'psql' query tool you can easily populate a ridiculously large
database ("insert into foo select * from foo" a few times).

In a few hours you'll have some feel for it.
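
For example, something along these lines (a throwaway table with placeholder
names) doubles the row count with every repetition of the last INSERT, so a few
dozen repetitions gets you into the billions of rows:

    -- throwaway test table; names are placeholders
    CREATE TABLE foo (a int4, b int4, c date);
    INSERT INTO foo VALUES (1, 1, current_date);
    INSERT INTO foo VALUES (2, 2, current_date);
    -- each run of this doubles the table; ~30 repetitions gives ~2 billion rows
    INSERT INTO foo SELECT * FROM foo;
    CREATE INDEX foo_a_idx ON foo (a);
    ANALYZE foo;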

> Other tables will have about 5 million records of about the same size.
>
> There are lots of scenarios here to lessson this.

What you'll have to worry about most is the access pattern, and update
frequency.

There's a lot of info out there. You may need any of the following:
  • clustering, the 'slony' project seems to be popular around here.
  • concurrency of updating
  • connection pooling, maybe via Apache or some java-thingey
  • securing yourself from hardware errors

This list is a goldmine of discussions. Search the archives for
discussions and pointers. Search interfaces at

    http://archives.postgresql.org/pgsql-general/
    http://archives.postgresql.org/pgsql-admin/

... or download the list archive mbox files into your mail-program and
use that (which is what I do).

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049



Re: table size/record limit

From
Gaetano Mendola
Date:
Dennis Gearon wrote:
> I am designing something that may be the size of yahoo, google, ebay, etc.
>
> Just ONE many to many table could possibly have the following
> characteristics:
>
>    3,600,000,000 records

This is a really huge monster one, and if you don't partition that
table in some way I think you'll have nightmares with it...



Regards
Gaetano Mendola


Re: table size/record limit

From
Gaetano Mendola
Date:

Dennis Gearon wrote:
| Gaetano Mendola wrote:
|
|> Dennis Gearon wrote:
|>
|>> I am designing something that may be the size of yahoo, google, ebay,
|>> etc.
|>>
|>> Just ONE many to many table could possibly have the following
|>> characteristics:
|>>
|>>    3,600,000,000 records
|>
|> This is a really huge monster one, and if you don't partition that
|> table in some way I think you'll have nightmares with it...
|>
|> Regards
|> Gaetano Mendola
|>
| thanks for the input, Gaetano.

By partitioning in some way I don't mean only splitting it into more tables. You
can use some available tools in postgres and continue to see this table
as one, but implemented behind the scenes with more tables.
One useful and effective way is to use inheritance in order to obtain
a horizontal partition:

0) Decide a partition policy (based on a timestamp, for example)
1) Create an empty base table with the name that you want to be seen as "public"
2) Create the partitions using the empty table as the base table
3) Create a rule on the base table so that an insert or update on it is
   performed as an insert or update on the right partition (using the partition
   policy from step 0)

In this way you are able to vacuum each partition, reindex each partition, and
so on in a more feasible way. I cannot imagine a VACUUM FULL or REINDEX on a
3,600,000,000-record table...
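
A minimal sketch of that scheme, with made-up table and column names, and only
the insert side shown (a real setup needs one rule per partition plus similar
handling for UPDATE and DELETE):

    -- step 1: the empty base table, the name that applications see
    CREATE TABLE link (
        a_id    int4 NOT NULL,
        b_id    int4 NOT NULL,
        created date NOT NULL
    );

    -- step 2: one child table per partition, inheriting the base table's columns
    CREATE TABLE link_2004 (
        CHECK (created >= '2004-01-01' AND created < '2005-01-01')
    ) INHERITS (link);
    CREATE TABLE link_2005 (
        CHECK (created >= '2005-01-01' AND created < '2006-01-01')
    ) INHERITS (link);

    -- step 3: redirect inserts on the base table into the matching child
    CREATE RULE link_insert_2004 AS ON INSERT TO link
        WHERE new.created >= '2004-01-01' AND new.created < '2005-01-01'
        DO INSTEAD
        INSERT INTO link_2004 VALUES (new.a_id, new.b_id, new.created);

    -- SELECT * FROM link still sees all partitions; VACUUM and REINDEX can be
    -- run per child table instead of against one 3.6-billion-row monster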



Regards
Gaetano Mendola





Re: table size/record limit

From
Dennis Gearon
Date:
Great Idea! When I get that far, I will try it.

Gaetano Mendola wrote:

<snip>

> By partitioning in some way I don't mean only splitting it into more tables. You
> can use some available tools in postgres and continue to see this table
> as one, but implemented behind the scenes with more tables.
> One useful and effective way is to use inheritance in order to obtain
> a horizontal partition:
>
> 0) Decide a partition policy (based on a timestamp, for example)
> 1) Create an empty base table with the name that you want to be seen as "public"
> 2) Create the partitions using the empty table as the base table
> 3) Create a rule on the base table so that an insert or update on it is
>    performed as an insert or update on the right partition (using the
>    partition policy from step 0)
>
> In this way you are able to vacuum each partition, reindex each
> partition, and so on in a more feasible way. I cannot imagine a VACUUM FULL
> or REINDEX on a 3,600,000,000-record table...