Thread: 4G row table?

4G row table?

From
george young
Date:
[linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]
We're setting up a DB of IC test data, which is very simple and regular, but large.
One project (we get three or four per year) has ~4 giga bits, each specified by
a few small integer values, e.g.:
 Name      Type    Values
 -------------------------
 wafer     int     1-50
 die_row   int     2-7
 die_col   int     2-7
 testtype  string  (~10 different short strings)
 vdd       int     0-25
 bit_col   int     0-127
 bit_row   int     0-511
 value     bit     0 or 1

with 4 billion(4e9) rows.  I would guess to make wafer, die_row, etc. be of
type "char", probably testtype a char too with a separate testtype lookup table.
Even so, this will be a huge table.
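
For concreteness, the layout I have in mind is roughly this (table and
column names are just for illustration; "char" means the 1-byte internal
type, not char(1)):

  CREATE TABLE testtype_lookup (
      testtype "char" PRIMARY KEY,   -- ~10 distinct values
      name     text NOT NULL
  );

  CREATE TABLE test_data (
      wafer    "char" NOT NULL,  -- 1-50
      die_row  "char" NOT NULL,  -- 2-7
      die_col  "char" NOT NULL,  -- 2-7
      testtype "char" NOT NULL REFERENCES testtype_lookup,
      vdd      "char" NOT NULL,  -- 0-25
      bit_col  "char" NOT NULL,  -- 0-127
      bit_row  int2   NOT NULL,  -- 0-511 needs two bytes
      value    bool   NOT NULL,  -- the 1-bit result, shown here as boolean
      PRIMARY KEY (wafer, die_row, die_col, testtype, vdd, bit_col, bit_row)
  );
  -- 6 x 1 byte + 2 bytes + 1 byte = the 9 bytes of data mentioned below.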

Questions: How much overhead will there be in the table in addition to the
9 bytes of data I see?  How big will the primary index on the first seven columns
be?  Will this schema work at all?

Of course, we could pack 128 bits into a 16-byte "text" field (or should we use bit(128)?),
but lose some ease of use, especially for naive (but important) users.
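
The packed variant would look something like this (a sketch only; I'd
have to check that substring() on bit values behaves this way on 7.2):

  -- One row per (wafer, die, testtype, vdd, bit_row), with all 128
  -- bit_col values folded into a single bit(128).  Column types are a
  -- side issue here; int2 is used just to keep the sketch simple.
  CREATE TABLE test_data_packed (
      wafer    int2 NOT NULL,
      die_row  int2 NOT NULL,
      die_col  int2 NOT NULL,
      testtype int2 NOT NULL,
      vdd      int2 NOT NULL,
      bit_row  int2 NOT NULL,
      bits     bit(128) NOT NULL,
      PRIMARY KEY (wafer, die_row, die_col, testtype, vdd, bit_row)
  );

  -- Pulling one bit back out (bit_col 42 of one row); substring on bit
  -- strings is 1-based, hence the +1:
  SELECT substring(bits from 42 + 1 for 1) = B'1' AS value
    FROM test_data_packed
   WHERE wafer = 3 AND die_row = 2 AND die_col = 5
     AND testtype = 7 AND vdd = 12 AND bit_row = 17;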

Comments, suggestions?

-- George

--
 I cannot think why the whole bed of the ocean is
 not one solid mass of oysters, so prolific they seem. Ah,
 I am wandering! Strange how the brain controls the brain!
    -- Sherlock Holmes in "The Dying Detective"

Re: 4G row table?

From
Manfred Koizar
Date:
On Thu, 19 Dec 2002 14:10:58 -0500, george young <gry@ll.mit.edu>
wrote:
>with 4 billion(4e9) rows.  I would guess to make wafer, die_row, etc. be of
>type "char", probably testtype a char too with a separate testtype lookup table.
>Even so, this will be a huge table.

Don't know if you can store 0-127 in a "char" column ... Anyway, it
doesn't matter as long as it doesn't cause the tuple size to cross a
4-byte boundary, because the tuple size will be rounded up to a
multiple of 4.

>Questions: How much overhead will there be in the table in addition to the
>9 bytes of data I see?

There is a page header (ca. 20 bytes) per page (8K by default).  Then
you have a tuple header plus 4 bytes of ItemIdData (the line pointer)
per tuple.

PG 7.2: Without NULLs a tuple header is 32 bytes, add 4 bytes for each
tuple containing at least one NULL column.

PG 7.3: 24 bytes tuple header (with and without NULLs, because you
have only 8 columns).
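
To put rough numbers on it (illustrative arithmetic only, with the 9
bytes of data rounded up to 12 for alignment):

  -- ~20 byte page header per 8K page, 4 byte line pointer per tuple,
  -- 12 bytes of user data, plus the version-specific tuple header:
  SELECT pg_version,
         rows_per_page,
         ceil(4e9 / rows_per_page) * 8192
             / (1024.0 * 1024 * 1024) AS heap_gb
  FROM (SELECT pg_version,
               floor((8192 - 20) / (hdr + 12 + 4)) AS rows_per_page
        FROM (SELECT '7.2' AS pg_version, 32 AS hdr
              UNION ALL
              SELECT '7.3', 24) AS h) AS r;

That puts the heap alone somewhere around 150 to 180 GB, before any
indexes.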

>How big will the primary index on the first seven columns be?

Don't know offhand.  No time now to dig it out.  Will answer tomorrow,
if nobody else jumps in ...

Servus
 Manfred

Re: 4G row table?

From
Josh Berkus
Date:
George,

> [linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]

What kind of RAID?  How many drives?  Will you be updating the data
frequently, or mostly just running reports on it?

With 4G rows, you will have *heavy* disk access, so the configuration and
quality of your disk array is a big concern.  You also might think about
upping the memory if you can.

> We're setting up a DB of IC test data, which is very simple and regular, but large.
> One project (we get three or four per year) has ~4 giga bits, each specified by
> a few small integer values, e.g.:
>  Name       Type Values
>  ----------------------
>  wafer      int     1-50
>  die_row    int  2-7
>  die_col    int  2-7
>  testtype   string (~10 different short strings)
>  vdd        int  0-25
>  bit_col    int  0-127
>  bit_row    int  0-511
>  value      bit  0 or 1
>
> with 4 billion(4e9) rows.  I would guess to make wafer, die_row, etc. be of
> type "char", probably testtype a char too with a separate testtype lookup table.
> Even so, this will be a huge table.

1. Use INT2 and not INT for the INT values above.  If you can hire a
PostgreSQL hacker, have them design a new data type for you, an unsigned INT1
which will cut your storage space even further.

2. Do not use CHAR for wafer & die_row.  CHAR requires a minimum of 3 bytes of
storage; INT2 is only 2 bytes.

3. If you can use a lookup table for testtype, make it another INT2 and create
a numeric key for the lookup table; a sketch follows below.
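
Something along these lines (names are illustrative, not prescriptive):

  CREATE TABLE testtype (
      testtype_id int2 PRIMARY KEY,
      name        text NOT NULL UNIQUE
  );

  CREATE TABLE test_data (
      wafer       int2 NOT NULL,   -- 1-50
      die_row     int2 NOT NULL,   -- 2-7
      die_col     int2 NOT NULL,   -- 2-7
      testtype_id int2 NOT NULL REFERENCES testtype,
      vdd         int2 NOT NULL,   -- 0-25
      bit_col     int2 NOT NULL,   -- 0-127
      bit_row     int2 NOT NULL,   -- 0-511
      value       bool NOT NULL,   -- 0 or 1, shown here as boolean
      PRIMARY KEY (wafer, die_row, die_col, testtype_id, vdd, bit_col, bit_row)
  );

That is 14 bytes of data for the seven key columns, versus 28 with plain
INT4.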

> Questions: How much overhead will there be in the table in addition to the
> 9 bytes of data I see?

There's more than 9 bytes in the above.  Count again.

> How big will the primary index on the first seven columns
> be?  Will this schema work at all?

As large as the 7 columns themselves, plus a little more.   I suggest creating
a surrogate key as an int8 sequence to refer to most rows.
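
For example (a sketch only; a plain sequence feeding an int8 default keeps
it version-agnostic):

  CREATE SEQUENCE test_data_id_seq;

  CREATE TABLE test_data_keyed (
      id int8 NOT NULL DEFAULT nextval('test_data_id_seq') PRIMARY KEY
      -- ... plus the seven key columns and value, as sketched above ...
  );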

> Of course, we could pack 128 bits into a 16-byte "text" field (or should we use bit(128)?),
> but lose some ease of use, especially for naive (but important) users.

This is also unlikely to be more efficient, due to the packing/unpacking
conversion required to access the data when you query.

> Comments, suggestions?

Unless you have a *really* good RAID array, expect slow performance on this
hardware platform.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: 4G row table?

From
"Charles H. Woloszynski"
Date:
Josh:

Why do you say to expect slow performance on this hardware?   Is there
something specific about the configuration that worries you?  Or, just
lots of data in the database, so the data will be on disk and not in the
cache (system or postgresql)?

What do you classify as *slow*?  Obviously, he is dependent on the I/O
channel given the size of the tables.  So, good indexing will be
required to help on the queries.  No comments on the commit rate for
this data (I am guessing that it is slow, given the description of the
database), so that may or may not be an issue.

Depending on the type of queries, perhaps clustering will help, along
with some good partitioning indexes.

I just don't see the slow in the hardware.  Of course, if he is
targeting lots of concurrent queries, better add some additional
processors, or better yet, use ERSERVER and replicate the data to a farm
of machines.  [To avoid the I/O bottleneck of lots of concurrent queries
against these large tables].

I guess there are a lot of assumptions on the data's use to decide if
the hardware is adequate or not :-)

Charlie



Josh Berkus wrote:

>George,
>
>
>>[linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]
>>
>>
>
>What kind of RAID?  How many drives?  Will you be updating the data
>frequently, or mostly just running reports on it?
>
>
>Unless you have a *really* good RAID array, expect slow performance on this
>hardware platform.
>
>
>

--


Charles H. Woloszynski

ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com






Re: 4G row table?

From
Ron Johnson
Date:
On Thu, 2002-12-19 at 13:10, george young wrote:
> [linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]
> We're setting up a DB of IC test data, which is very simple and regular, but large.
> One project (we get three or four per year) has ~4 giga bits, each specified by
> a few small integer values, e.g.:
>  Name       Type Values
>  ----------------------
>  wafer      int     1-50
>  die_row    int  2-7
>  die_col    int  2-7
>  testtype   string (~10 different short strings)
>  vdd        int  0-25
>  bit_col    int  0-127
>  bit_row    int  0-511
>  value      bit  0 or 1
>
> with 4 billion(4e9) rows.  I would guess to make wafer, die_row, etc. be of
> type "char", probably testtype a char too with a separate testtype lookup table.
> Even so, this will be a huge table.

How many records per day will be inserted?

Will they ever be updated?

Do you have to have *ALL* 4 billion records in the same table at the
same time?  As Josh Berkus mentioned, wafer through bit_col can be
converted to INT2 if you make testtype use a lookup table; that shrinks
each tuple to 20 bytes of data plus 24 bytes of header (in v7.3), which
would make the table a minimum of ~189 billion bytes, not including
indexes!!!

Rethink your solution...

One possibility would be to have a set of tables, with names like:
TEST_DATA_200301
TEST_DATA_200302
TEST_DATA_200303
TEST_DATA_200304
TEST_DATA_200305
TEST_DATA_200306
TEST_DATA_200307
TEST_DATA_<etc>

Then, each month, do "CREATE VIEW TEST_DATA AS SELECT * FROM
TEST_DATA_yyyymm" for the current month.
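
A sketch of the mechanics (illustrative names, abbreviated column types):

  -- One table per month, all with the same definition:
  CREATE TABLE test_data_200301 (
      wafer       int2,
      die_row     int2,
      die_col     int2,
      testtype_id int2,
      vdd         int2,
      bit_col     int2,
      bit_row     int2,
      value       bool
  );

  -- The "current month" view that users actually query:
  CREATE VIEW test_data AS SELECT * FROM test_data_200301;

  -- At the start of February: create test_data_200302, then re-point
  -- the view:
  --   DROP VIEW test_data;
  --   CREATE VIEW test_data AS SELECT * FROM test_data_200302;
  -- (newer releases have CREATE OR REPLACE VIEW for a one-step swap)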


> Questions: How much overhead will there be in the table in addition to the
> 9 bytes of data I see?  How big will the primary index on the first seven columns
> be?  Will this schema work at all?
>
> Of course, we could pack 128 bits into a 16-byte "text" field (or should we use bit(128)?),
> but lose some ease of use, especially for naive (but important) users.
>
> Comments, suggestions?
>
> -- George
>
--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson  |
|                                                               |
| "My advice to you is to get married: If you find a good wife, |
| you will be happy; if not, you will become a philosopher."    |
|    Socrates                                                   |
+---------------------------------------------------------------+


Re: 4G row table?

From
Manfred Koizar
Date:
On Thu, 19 Dec 2002 14:10:58 -0500, george young <gry@ll.mit.edu>
wrote:
>with 4 billion(4e9) rows.
>How big will the primary index on the first seven columns be?

If you manage to pack the key into 8 bytes (by using a custom 1 byte
integer datatype) and if there are no NULLs:

  75 GB with a 100% fill factor,
 114 GB with a  66% fill factor,
realistically something in between.  Note that frequent updates can
cause index growth.
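
Roughly where those figures come from, if you want to adjust the
assumptions (about 20 bytes per leaf entry: 8 byte key, 8 byte index
tuple header, 4 byte line pointer; ~8150 usable bytes per 8K leaf page):

  -- Illustrative arithmetic only; upper tree levels add a bit on top.
  SELECT fill,
         ceil(4e9 / floor(8150 / 20.0 * fill)) * 8192
             / (1024.0 * 1024 * 1024) AS index_gb
  FROM (SELECT 1.00 AS fill UNION ALL SELECT 0.66) AS f;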

>Will this schema work at all?

You have a somewhat unusual identifier-to-payload ratio (8 bytes : 1 bit).  It
depends on the planned use, but I'm not sure if *any* database is the
right solution.  You have "only" 30670848000 (30G) possible different
key combinations, and more than 1/8 of them (4G) are actually used.  A
7-dimensional array of double-bits (1 bit to indicate a valid value
and 1 bit payload) would require no more than 8 GB.

If you plan to use a database because you have to answer ad-hoc
queries, you will almost certainly need additional indices.

Servus
 Manfred

Re: 4G row table?

From
"Josh Berkus"
Date:
Charlie,

> Why do you say to expect slow performance on this hardware?   Is
> there something specific about the configuration that worries you?
>  Or, just lots of data in the database, so the data will be on disk
> and not in the cache (system or postgresql)?
> What do you classify as *slow*?  Obviously, he is dependent on the
> I/O channel given the size of the tables.  So, good indexing will be
> required to help on the queries.  No comments on the commit rate for
> this data (I am guessing that it is slow, given the description of
> the database), so that may or may not be an issue.
> Depending on the type of queries, perhaps clustering will help, along
> with some good partitioning indexes.
> I just don't see the slow in the hardware.  Of course, if he is
> targeting lots of concurrent queries, better add some additional
> processors, or better yet, use ERSERVER and replicate the data to a
> farm of machines.  [To avoid the I/O bottleneck of lots of concurrent
> queries against these large tables].
>
> I guess there are a lot of assumptions on the data's use to decide if
> the hardware is adequate or not :-)

Well, slow is relative.   It may be fast enough for him.  Me, I'd be
screaming in frustration.

Take, for example, an index scan on the primary key.  Assuming that he
can get the primary key down to 12 bytes per node using custom data
types, that's still:

12 bytes * 4,000,000,000 rows = 48 GB for the index

As you can see, it's utterly impossible for him to load even the
primary key index into his 512 MB of RAM (of which no more than 200 MB
can go to Postgres anyway without risking conflicts over RAM).  A
sort-and-limit on the primary key, for example, would require swapping
the index between RAM and disk as much as 480 times! (though probably
more like 100 times on average)

With a slow RAID array and the hardware he described to us, this would
mean, most likely, that a simple sort-and-limit on primary key query
could take hours to execute.  Even with really fast disk access, we're
talking tens of minutes at least.

-Josh Berkus