Thread: database size estimates

database size estimates

From
Francois Deliege
Date:
Hi,

I am trying to estimate the size of a table composed of 51754000 rows.
Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)

So, the payload should be:
51754000 x ( 16 x 24 + 15 x 24 ) bits = 3115 MB

Now, from what I understand from postgresql manual is that the overhead
is composed of
32 bytes per row + 20 bytes per page.
This leads me to approx. 1700 MB overhead.
Therefore a total table size of 4900 MB.

However, when I load the table, the table reaches the size of 21500 MB,
i.e., 400% of my estimate.
The table has no toast table, no index.
So I am wondering if someone could give me a better estimate.

Cheers,

Francois

Attachment

Re: database size estimates

From
Heikki Linnakangas
Date:
Francois Deliege wrote:
> Hi,
> 
> I am trying to estimate the size of a table composed of 51754000 rows.
> Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)
> 
> So, the payload should be:
> 51754000 x ( 16 x 24 + 15 x 24 ) bits = 3115 MB

What data types are those exactly? If those 24-bit fields are in fact 
text, varchar, char(x) or other data types that are stored as variable 
length fields, the varlen header will take 4 bytes. And then there's 
alignment, those 24-bit fields are most almost certainly 4-byte aligned, 
which means that there'll be one byte of padding between them.

The upcoming 8.3 release will be much better in that respect, It'll use 
just a 1 byte varlen header per field instead of 4 bytes for small 
values like yours. You might want to test a CVS snapshot.

> Now, from what I understand from postgresql manual is that the overhead 
> is composed of
> 32 bytes per row + 20 bytes per page.
> This leads me to approx. 1700 MB overhead.
> Therefore a total table size of 4900 MB.

In addition, there will be on average 1/2 rows worth of wasted space on 
every page.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: database size estimates

From
Gregory Stark
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> Francois Deliege wrote:
>> Hi,
>>
>> I am trying to estimate the size of a table composed of 51754000 rows.
>> Each row has 31 attributes: 16 x bit(24) and 15 x bit(8)
>>
>> So, the payload should be:
>> 51754000 x ( 16 x 24 + 15 x 24 ) bits = 3115 MB
>
> What data types are those exactly? If those 24-bit fields are in fact text,
> varchar, char(x) or other data types that are stored as variable length fields,

And sadly that includes bit() if you're being literal.

As of Postgres 8.1 you can see how much space a column is taking up using the
pg_column_size() function. This won't include alignment padding but will
include the length header for that column.

You can see how much a given row is taking up by passing the entire row to
og_column_size with something like pg_column_size(tab.*)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com