Thread: table size/record limit
I am designing something that may be the size of yahoo, google, ebay, etc.

Just ONE many-to-many table could possibly have the following characteristics:

   3,600,000,000 records
   each record is 9 fields of INT4/DATE

Other tables will have about 5 million records of about the same size.

There are lots of scenarios here to lessen this.

BUT, is postgres on linux, maybe necessarily a 64-bit system, capable of this? And there'd be 4-5 indexes on that table.
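A quick back-of-envelope calculation of what those numbers imply on disk (a sketch, assuming 4 bytes for both INT4 and DATE and a rough, assumed per-row overhead; the real PostgreSQL tuple header varies by version):

```python
# Rough sizing for the proposed 3.6-billion-row table.
# TUPLE_OVERHEAD is an assumption (header + item pointer), not an exact figure.
ROWS = 3_600_000_000
FIELDS = 9
BYTES_PER_FIELD = 4      # INT4 and DATE are both 4 bytes
TUPLE_OVERHEAD = 28      # assumed per-row overhead in bytes

raw = ROWS * FIELDS * BYTES_PER_FIELD
total = ROWS * (FIELDS * BYTES_PER_FIELD + TUPLE_OVERHEAD)

print(f"raw field data : {raw / 1e9:.1f} GB")   # 129.6 GB
print(f"with overhead  : {total / 1e9:.1f} GB") # 230.4 GB
```

Indexes come on top of that, so 4-5 of them could roughly double the footprint again.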
Hi,

On Thu, 21.10.2004 at 1:30, Dennis Gearon wrote:
> I am designing something that may be the size of yahoo, google, ebay, etc.
>
> Just ONE many-to-many table could possibly have the following
> characteristics:
>
>    3,600,000,000 records
>    each record is 9 fields of INT4/DATE
<snip>

Sure. Why not? 3...5 million records is not really a problem. We had bigger tables with historic commercial transactions (even on an old dual PIII/1000) with fine performance. I bet, however, that yahoo and google at least are much bigger :-)

Regards
Tino
Google probably is much bigger, and on mainframes, and probably Oracle or DB2.

But the table I am worried about is the one sized >= 3.6 GIGA records.

Tino Wildenhain wrote:
> Sure. Why not? 3...5mio records is not really a problem.
> We had bigger tables with historic commercial transactions
> (even on an old dual PIII/1000) with fine performance.
> I bet however, yahoo, google at least are much bigger :-)
<snip>
Dennis Gearon wrote:
> Google probably is much bigger, and on mainframes, and probably Oracle
> or DB2.

Google uses a Linux cluster and their database is HUGE. I do not know which database they use. I bet they built their own specifically for what they do.

Sincerely,

Joshua D. Drake

<snip>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>    http://www.postgresql.org/docs/faqs/FAQ.html

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL
Actually, now that I think about it, they use a special table type where the INDEX is also the DATUM. It is possible to recover the data out of the index listing. So go down the index, then decode the indexing value - voila, a whole step saved. I have no idea what engine these table types are in, however.

Joshua D. Drake wrote:
> Google uses a Linux cluster and their database is HUGE. I do not know
> which database they use. I bet they built their own specifically for
> what they do.
<snip>
On Wed, 2004-10-20 at 23:01 -0700, Joshua D. Drake wrote:
> Google uses a Linux cluster and their database is HUGE. I do not know
> which database they use. I bet they built their own specifically for
> what they do.

...actually, I heard they were running it off a flat-file database on 7 386 machines in some guy's garage off a DSL connection. I could be wrong though. ;-)

-Robby

--
/***************************************
 * Robby Russell | Owner.Developer.Geek
 * PLANET ARGON  | www.planetargon.com
 * Portland, OR  | robby@planetargon.com
 * 503.351.4730  | blog.planetargon.com
 * PHP/PostgreSQL Hosting & Development
 ****************************************/
On 21. okt 2004, at 01:30, Dennis Gearon wrote:
> I am designing something that may be the size of yahoo, google, ebay,
> etc.

Grrr. Geek wet-dream.

> Just ONE many to many table could possibly have the following
> characteristics:
>
>    3,600,000,000 records
>    each record is 9 fields of INT4/DATE

I don't do this myself (my data is only 3 gig, and most of that is in blobs), but people have repeatedly reported such sizes on this list. Check http://archives.postgresql.org/pgsql-admin/2001-01/msg00188.php

... but the best you can do is just to try it out. With a few commands in the 'psql' query tool you can easily populate a ridiculously large database ("insert into foo select * from foo" a few times). In a few hours you'll have some feel for it.

> Other tables will have about 5 million records of about the same size.
>
> There are lots of scenarios here to lessen this.

What you'll have to worry about most is the access pattern and update frequency. There's a lot of info out there. You may need any of the following:

• clustering; the 'slony' project seems to be popular around here
• concurrency of updating
• connection pooling, maybe via Apache or some java-thingey
• securing yourself against hardware errors

This list is a goldmine of discussions. Search the archives for discussions and pointers. Search interfaces at

   http://archives.postgresql.org/pgsql-general/
   http://archives.postgresql.org/pgsql-admin/

... or download the list archive mbox files into your mail program and use that (which is what I do).

d.
--
David Helgason,
Business Development et al.,
Over the Edge I/S (http://otee.dk)
Direct line +45 2620 0663
Main line +45 3264 5049
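The doubling trick David describes can be sketched like this in psql (the table `foo` and its columns are just placeholders matching the INT4/DATE row shape in question):

```sql
-- Seed a test table with one row, then double it repeatedly.
-- Each INSERT..SELECT doubles the row count, so roughly 32 iterations
-- take you from 1 row to over 3.6 billion.
CREATE TABLE foo (a int4, b int4, c date);
INSERT INTO foo VALUES (1, 2, current_date);

INSERT INTO foo SELECT * FROM foo;  -- 2 rows
INSERT INTO foo SELECT * FROM foo;  -- 4 rows
INSERT INTO foo SELECT * FROM foo;  -- 8 rows
-- ... repeat until the table is as large as you want to test with
SELECT count(*) FROM foo;
```

Build the 4-5 indexes only after the bulk loading is done, otherwise every doubling pays the index-maintenance cost too.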
Dennis Gearon wrote:
> I am designing something that may be the size of yahoo, google, ebay, etc.
>
> Just ONE many to many table could possibly have the following
> characteristics:
>
>    3,600,000,000 records

This is a really huge monster, and if you don't partition that table in some way I think you'll have nightmares with it...

Regards
Gaetano Mendola
Dennis Gearon wrote:
| Gaetano Mendola wrote:
|> This is a really huge monster one, and if you don't partition that
|> table in some way I think you'll have nightmares with it...
|
| thanks for the input, Gaetano.

By partitioning I don't mean only splitting it into more tables. You can use some available tools in postgres and continue to see this table as one, but implemented behind the scenes with more tables. One useful and impressive way is to use inheritance in order to obtain a horizontal partition:

0) Decide a partition policy (based on a time stamp, for example)
1) Create an empty base table with the name that you want to see as "public"
2) Create the partitions using the empty table as base table
3) Create a rule on the base table so an insert or update on it is performed as an insert or update on the right table (using the partition policy from step 0)

In this way you are able to vacuum each partition, reindex each partition and so on in a more feasible way. I do not imagine a vacuum full or reindex of a 3,600,000,000-record table...

Regards
Gaetano Mendola
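The four steps above can be sketched as follows (a minimal sketch; the table `events`, its columns, and the monthly policy are illustrative assumptions, not Dennis's actual schema):

```sql
-- Step 1: the empty "public" base table that queries will target.
CREATE TABLE events (
    user_id  int4,
    item_id  int4,
    created  date
);

-- Step 2: one child table per month, inheriting the base table.
-- The CHECK constraint documents the partition policy from step 0.
CREATE TABLE events_2004_10 (
    CHECK (created >= '2004-10-01' AND created < '2004-11-01')
) INHERITS (events);

CREATE TABLE events_2004_11 (
    CHECK (created >= '2004-11-01' AND created < '2004-12-01')
) INHERITS (events);

-- Step 3: a rule per partition redirects inserts on the base table
-- into the right child.
CREATE RULE events_insert_2004_10 AS
    ON INSERT TO events
    WHERE new.created >= '2004-10-01' AND new.created < '2004-11-01'
    DO INSTEAD
    INSERT INTO events_2004_10 VALUES (new.user_id, new.item_id, new.created);

-- A SELECT on "events" still sees all partitions, but each child can be
-- vacuumed, reindexed, or dropped independently:
--   VACUUM ANALYZE events_2004_10;
--   REINDEX TABLE events_2004_10;
```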
Great Idea! When I get that far, I will try it.

Gaetano Mendola wrote:
<snip>
> in this way you are able to vacuum each partition, reindex each
> partition and so on in a more feasible way. I do not imagine a vacuum
> full or reindex of a 3,600,000,000-record table...