Thread: Billions of records?

Billions of records?

From
John Bercik
Date:
Hi,

Can anyone tell me what the limitations are regarding records?
Say I have a table with 50 fields of 20 characters each.  If a table can
be 16 TB, how many records can it hold?

If you all have any other size benchmarks or such please include it.  Is
PG ready for huge db's?  How big is the biggest known?

--
Best regards,

John Bercik                E-Mail: bercikj@musc.edu
Systems Programmer            Office: 843/792-1715
Medical University of South Carolina    Fax: 843/792-0816
Room 826F Clinical Science Building
76 Jonathan Lucas Street
Charleston, SC, USA  29425


Re: Billions of records?

From
Shridhar Daithankar
Date:
On Tuesday 15 July 2003 19:15, John Bercik wrote:
> Hi,
>
> Can anyone tell me what the limitations are regarding records?
> Say I have a table with 50 fields of 20 characters each.  If a table can
> be 16 TB, how many records can it hold?

Each tuple in postgresql has around 28 bytes of overhead. Index tuple has 12
bytes. Besides varchar would include additional overhead. And char won't be
true char because it always stores length of string. Although I have
specialised char types developed by some kind hackers which are true chars.

Other than that there is no limit on size of table.  Check this
http://www.postgresql.org/docs/faqs/FAQ.html#4.5

BTW, is it coincidence that your figure of 16TB matches the limit there?

> If you all have any other size benchmarks or such please include it.  Is
> PG ready for huge db's?  How big is the biggest known?

I have had references of databases as big as 300GB. People routinely have more
than 10GB databases. You can dig archives for the same.

HTH

 Shridhar


Re: Billions of records?

From
Tom Lane
Date:
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Other than that there is no limit on size of table.  Check this
> http://www.postgresql.org/docs/faqs/FAQ.html#4.5
> BTW, is it coincidence that your figure of 16TB matches the limit there?

No, because he was correctly quoting the FAQ ;-).  The 16TB limit comes
from the fact that BlockNumber is 32 bits, so you cannot have a table
larger than 4 billion blocks.  (The FAQ is conservatively assuming that
the limit is 2 billion blocks; 2G blocks * 8KB block size = 16TB.  In
principle 4 billion should work, since BlockNumber is an unsigned int.
But there used to be places that sloppily used signed arithmetic on
block numbers, and we aren't entirely sure we've gotten rid of all of
'em.  If anyone has facilities to test behavior with tables between
16TB and 32TB in size, I'd be interested to hear the results ...)

>> If you all have any other size benchmarks or such please include it.  Is
>> PG ready for huge db's?  How big is the biggest known?

The 4TB database mentioned in the FAQ belongs to the American Chemical
Society (something about scanning all their journals back to the
mid-1800s...); you can dig in the PG list archives for details.

Another large database I'm aware of is the 2-micron sky survey:
http://pegasus.astro.umass.edu/
which covers a bit under half a billion stars; it is reportedly about
150GB when loaded into Postgres.  The UMass people seem to be happy with
the performance they get...

            regards, tom lane

Re: Billions of records?

From
Shridhar Daithankar
Date:
On Tuesday 15 July 2003 20:01, John Bercik wrote:
> Thanks for the reply.  So given my conditions, how many records can PG
> hold?  I got the 16TB from the limitations page but I don't see how to
> calculate how many records I can hold.

OK. Let's calculate. You have 50 fields of 20 char each. So each field would
go for 24 bytes at least assuming a 4 byte integer for length.

So a tuple size is 50*24+28=1228 bytes. So in a page of 8KB, you would get 6
tuples.

In 16TB you can accomodate, 2GB pages. So you can accomodate around 12 billion
tuples for your table schema.

If you get true char types, that would accomodate  7 tuples per page. So you
could get around 14 billion tuples in there.

HTH

BTW How much data you actually have?

 Shridhar


Re: Billions of records?

From
"Jim C. Nasby"
Date:
On Tue, Jul 15, 2003 at 07:53:37PM +0530, Shridhar Daithankar wrote:
> Each tuple in postgresql has around 28 bytes of overhead. Index tuple has 12

Is this accurate?
http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=page.html
indicates a per-tuple overhead of 23 bytes. Or have things changed in
7.4?

BTW, is there any documentation on the structure of index pages?
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Billions of records?

From
Tom Lane
Date:
"Jim C. Nasby" <jim@nasby.net> writes:
> On Tue, Jul 15, 2003 at 07:53:37PM +0530, Shridhar Daithankar wrote:
>> Each tuple in postgresql has around 28 bytes of overhead. Index tuple has 12

> Is this accurate?
> http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=page.html
> indicates a per-tuple overhead of 23 bytes. Or have things changed in
> 7.4?

He's probably assuming you are using OIDs (which is the default).

> BTW, is there any documentation on the structure of index pages?

Same as heap pages...
http://developer.postgresql.org/docs/postgres/page.html
The index tuple header layout is different from heap tuple headers,
but the other info on that page applies.

            regards, tom lane

Re: Billions of records?

From
"Jim C. Nasby"
Date:
On Wed, Jul 16, 2003 at 06:07:22PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > BTW, is there any documentation on the structure of index pages?
>
> Same as heap pages...
> http://developer.postgresql.org/docs/postgres/page.html
> The index tuple header layout is different from heap tuple headers,
> but the other info on that page applies.

Sorry, said one thing and meant another... is there documentation on
index *tuple* layout?
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Billions of records?

From
Bruce Momjian
Date:
I have an internals presentation on my home page below.  There is a PDF
that will give a general overview,and of course the source code has the
details.


---------------------------------------------------------------------------

Jim C. Nasby wrote:
> On Wed, Jul 16, 2003 at 06:07:22PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <jim@nasby.net> writes:
> > > BTW, is there any documentation on the structure of index pages?
> >
> > Same as heap pages...
> > http://developer.postgresql.org/docs/postgres/page.html
> > The index tuple header layout is different from heap tuple headers,
> > but the other info on that page applies.
>
> Sorry, said one thing and meant another... is there documentation on
> index *tuple* layout?
> --
> Jim C. Nasby, Database Consultant                  jim@nasby.net
> Member: Triangle Fraternity, Sports Car Club of America
> Give your computer some brain candy! www.distributed.net Team #1828
>
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073