Thread: Postgres table size

Postgres table size

From
SHARMILA JOTHIRAJAH
Date:
Hi
I have a table with  29384048 records  in oracle and postgresql. The table has 47 columns (16 numeric and 27 varchar and the rest timestamp). The tablesize in postgresql is twice as much than the tablesize in oracle (for the same number of rows and columns). There are no updates or deletes in this table. It is a test table that is used only for querying. The tables are vacuumed regularly

Even a simple seqscan query takes twice as much time in postgres than in oracle.
Does postgresql generally occupy more space than oracle tables?
Thanks
Sharmila


Get easy, one-click access to your favorites. Make Yahoo! your homepage.

Re: Postgres table size

From
Reg Me Please
Date:
Il Tuesday 13 November 2007 17:36:30 SHARMILA JOTHIRAJAH ha scritto:
> Hi
> I have a table with  29384048 records  in oracle and postgresql. The table
> has 47 columns (16 numeric and 27 varchar and the rest timestamp). The
> tablesize in postgresql is twice as much than the tablesize in oracle (for
> the same number of rows and columns). There are no updates or deletes in
> this table. It is a test table that is used only for querying. The tables
> are vacuumed regularly
>
> Even a simple seqscan query takes twice as much time in postgres than in
> oracle. Does postgresql generally occupy more space than oracle tables?
> Thanks
> Sharmila

This's an interesting point fore sure as far as the data types for the two
table are comparable.

If this yelds true, the more space an RDBMS occupies, the slower the access.
I think.


--
Reg me Please

Re: Postgres table size

From
Jeff Davis
Date:
On Tue, 2007-11-13 at 08:36 -0800, SHARMILA JOTHIRAJAH wrote:
> Hi
> I have a table with  29384048 records  in oracle and postgresql. The
> table has 47 columns (16 numeric and 27 varchar and the rest
> timestamp). The tablesize in postgresql is twice as much than the
> tablesize in oracle (for the same number of rows and columns). There
> are no updates or deletes in this table. It is a test table that is
> used only for querying. The tables are vacuumed regularly
>
> Even a simple seqscan query takes twice as much time in postgres than
> in oracle.
> Does postgresql generally occupy more space than oracle tables?
> Thanks
> Sharmila

PostgreSQL generally does occupy slightly more space. However, your case
is extreme due to the number of columns in the table. In PostgreSQL 8.2
and before, it would store a full 4 byte length header for every
variable-width field (which is any text type).

8.3 (currently in beta) will substantially reduce this overhead, usually
just storing one byte of overhead for every variable-width field (saving
3 bytes), and also reducing the per-row overhead by either 4 or 8 bytes
(depending on platform).

My quick calculations show that you could save up to (47*3 + 8)*29384048
= 4378223152. So you might save up to 4GB with 8.3! It would be fairly
easy for you to check for yourself exactly how much by downloading the
beta.

Regards,
    Jeff Davis



Re: Postgres table size

From
SHARMILA JOTHIRAJAH
Date:
Hi
Thanks. Ill post it on the mailing list when I get the results. Im trying to calculate and see how the tablesize works for a simple table.

I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of length 3
5 numeric - declared as numeric(22,0) but contains data of precision 10
There are 10000 rows

select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages

Calculation
varchar = (overhead) 4 + (actual length of string) 3  = 7 bytes
*for 5 varchar cols                                             = 5*7  = 35 bytes

numeric   (according to manual--- The actual storage requirement is two bytes for each group of four decimal digits, plus eight bytes overhead )
numeric = ( 10/4)*2 +8  = 13 bytes
*for 5 numeric cols         = 13 *5 = 65

*row overhead                = 32

So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows                      = 1320000

The manual says ---The first 20 bytes of each page consists of a page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes

Therefore Total = 1320000 (row cost) + 3780 (page header cost)=1323780 bytes for this table

Is this calculation right? But the size of the table according to pg_relation_size is 1548288 bytes

What am I missing in my calculation?

Thanks for your help.

Thanks
sharmila

----- Original Message ----
From: Jeff Davis <pgsql@j-davis.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Sent: Friday, November 16, 2007 2:12:46 PM
Subject: Re: [GENERAL] Postgres table size

On Fri, 2007-11-16 at 07:36 -0800, SHARMILA JOTHIRAJAH wrote:
> Hi,
>
> You were right. I installed beta2 and the table size now is
> 4682817536. Thanks
>
> How do you estimate the table size generally., ie.,
> what is the storage size of varchar, numeric and timestamp columns and
> What is the row overhead?
>
> For example, If I have a table (in postgres8.2.3 version) with 10 rows
> and 3 columns  (varchar,numeric(22,0),timestamp) , how do I estimate
> their storage size. How does 8.3Beta-2 handle this?
> this will be very helpful for me for allocating the space properly
> Thanks again for your help
> sharmila
>

That's good news. Please post to pgsql-advocacy to show the reduction in
table size that 8.3 brings for you, and the performance difference that
means for you. If it helps you to choose PostgreSQL instead of some
other database that would be great to mention too (however, some
databases don't like you to publish benchmarks, so be careful not to
violate your license).

The official docs are here:
http://developer.postgresql.org/pgdocs/postgres/storage.html  (8.3)
http://www.postgresql.org/docs/8.2/static/storage.html (8.2)

The way I think about it is simple:

In 8.2:
* 32 bytes of row overhead: 28 byte row header + 4 byte line pointer
* 4 bytes of overhead for every variable-width type: to store length

In 8.3:
* 28 bytes of row overhead: 24 byte row header + 4 byte line pointer
* 1-4 bytes of overhead for every variable-width type: only one byte of
overhead if length < 127 bytes, up to 4 bytes if it is longer.

Regards,
    Jeff Davis




Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now.

Re: Postgres table size

From
Erik Jones
Date:
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:

> Hi
> Thanks. Ill post it on the mailing list when I get the results. Im
> trying to calculate and see how the tablesize works for a simple
> table.
>
> I have a table with 10 cols
> 5 varchars _ it is declared as varchar(40) but contains data of
> length 3
> 5 numeric - declared as numeric(22,0) but contains data of
> precision 10
> There are 10000 rows
>
> select * from pg_relation_size gives 1548288 bytes as the table's size
> select relpages from pg_class for that table gives 189 pages
>
> Calculation
> varchar = (overhead) 4 + (actual length of string) 3  = 7 bytes
> *for 5 varchar cols                                             =
> 5*7  = 35 bytes
>
> numeric   (according to manual--- The actual storage requirement is
> two bytes for each group of four decimal digits, plus eight bytes
> overhead )
> numeric = ( 10/4)*2 +8  = 13 bytes
> *for 5 numeric cols         = 13 *5 = 65

Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
>
> *row overhead                = 32
>
> So
> Bytes per row = 35 + 65 +32 = 132 bytes
> for 10000 rows                      = 1320000
>
> The manual says ---The first 20 bytes of each page consists of a
> page header (PageHeaderData).
> There are 189 pages
> *page header cost = 20*189 = 3780 bytes
>
> Therefore Total = 1320000 (row cost) + 3780 (page header cost)
> =1323780 bytes for this table
>
> Is this calculation right? But the size of the table according to
> pg_relation_size is 1548288 bytes
>
> What am I missing in my calculation?

One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Postgres table size

From
SHARMILA JOTHIRAJAH
Date:
I vacuumed the table before I got this relpages count. When i make the change tat you mentioned the total table size as per my calculation is 1373780 and the result thro pg_relation_size is 1548288.
What other overheads are there for the table? Im not sure how to find the free space? But I didnot delete/insert/update any rows after the first insertion.

Thanks
sharmila

----- Original Message ----
From: Erik Jones <erik@myemma.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 11:38:44 AM
Subject: Re: [GENERAL] Postgres table size

On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:

> Hi
> Thanks. Ill post it on the mailing list when I get the results. Im 
> trying to calculate and see how the tablesize works for a simple 
> table.
>
> I have a table with 10 cols
> 5 varchars _ it is declared as varchar(40) but contains data of 
> length 3
> 5 numeric - declared as numeric(22,0) but contains data of 
> precision 10
> There are 10000 rows
>
> select * from pg_relation_size gives 1548288 bytes as the table's size
> select relpages from pg_class for that table gives 189 pages
>
> Calculation
> varchar = (overhead) 4 + (actual length of string) 3  = 7 bytes
> *for 5 varchar cols                                            = 
> 5*7  = 35 bytes
>
> numeric  (according to manual--- The actual storage requirement is 
> two bytes for each group of four decimal digits, plus eight bytes 
> overhead )
> numeric = ( 10/4)*2 +8  = 13 bytes
> *for 5 numeric cols        = 13 *5 = 65

Not that it will make much difference, but you need to round up in 
the 10/4 part so you get 14 bytes, not 13.
>
> *row overhead                = 32
>
> So
> Bytes per row = 35 + 65 +32 = 132 bytes
> for 10000 rows                      = 1320000
>
> The manual says ---The first 20 bytes of each page consists of a 
> page header (PageHeaderData).
> There are 189 pages
> *page header cost = 20*189 = 3780 bytes
>
> Therefore Total = 1320000 (row cost) + 3780 (page header cost)
> =1323780 bytes for this table
>
> Is this calculation right? But the size of the table according to 
> pg_relation_size is 1548288 bytes
>
> What am I missing in my calculation?

One thing to note is that the relpages value is only exact from the 
time of a vacuum until the next dml statement on the table, i.e. 
relpages is not updated after inserts, update, and deletes, just 
after vacuums.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



Get easy, one-click access to your favorites. Make Yahoo! your homepage.

Re: Postgres table size

From
Tom Lane
Date:
Erik Jones <erik@myemma.com> writes:
> On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
>> Calculation
>> varchar = (overhead) 4 + (actual length of string) 3  = 7 bytes
>> *for 5 varchar cols                                             =
>> 5*7  = 35 bytes
>>
>> numeric   (according to manual--- The actual storage requirement is
>> two bytes for each group of four decimal digits, plus eight bytes
>> overhead )
>> numeric = ( 10/4)*2 +8  = 13 bytes
>> *for 5 numeric cols         = 13 *5 = 65

> Not that it will make much difference, but you need to round up in
> the 10/4 part so you get 14 bytes, not 13.

Also, this calculation is ignoring the fact that (pre-8.3) varlena
values have to be int-aligned, so there's wasted pad space too.
The varchar values really need 8 bytes each, and the numeric values
16, so the actual data payload in each row is 120 bytes.  Then add
HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
So the rowsize would be either 148 or 152 bytes depending on if you
were on a machine with 8-byte MAXALIGN.  Then add the per-row item
pointer, giving total per-row space of 152 or 156 bytes.  That
means you can fit either 53 or 52 rows per page, giving either 188
or 192 pages as the minimum possible file size.  Evidently, Sharmila
is using a MAXALIGN=4 machine and has a few dead rows in there.

            regards, tom lane

Re: Postgres table size

From
SHARMILA JOTHIRAJAH
Date:


>>> Calculation
>> >varchar = (overhead) 4 + (actual length of string) 3  = 7 bytes
>> >*for 5 varchar cols                                            = 
>> >5*7  = 35 bytes
>> >
>> >numeric  (according to manual--- The actual storage requirement is  
>> >two bytes for each group of four decimal digits, plus eight bytes 
>>> overhead )
>>> numeric = ( 10/4)*2 +8  = 13 bytes
>>> *for 5 numeric cols        = 13 *5 = 65

> >Not that it will make much difference, but you need to round up in 
> >the 10/4 part so you get 14 bytes, not 13.

>Also, this calculation is ignoring the fact that (pre-8.3) varlena
>values have to be int-aligned, so there's wasted pad space too.
>The varchar values really need 8 bytes each, and the numeric values
>16, so the actual data payload in each row is 120 bytes.  Then add
>HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
>So the rowsize would be either 148 or 152 bytes depending on if you
>were on a machine with 8-byte MAXALIGN.  Then add the per-row item
>pointer, giving total per-row space of 152 or 156 bytes.  That
>means you can fit either 53 or 52 rows per page, giving either 188
>or 192 pages as the minimum possible file size.  Evidently, Sharmila
>is using a MAXALIGN=4 machine and has a few dead rows in there.

Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.

I have a few questions
1. How do you find  the MAXALIGN of the machine? And what is that used for?
2. How does null columns account for this space. For example, if I have the same table (10 cols --5 varchars and 5 numerics) in which 8 cols are null and 2 cols(1 varchar and 1 num) are not null, how is the storage space affected for this case?

Thanks again
sharmila



Never miss a thing. Make Yahoo your homepage.

Re: Postgres table size

From
Tom Lane
Date:
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
> 1. How do you find  the MAXALIGN of the machine? And what is that used for?

pg_controldata will show "maximum data alignment".  A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.

> 2. How does null columns account for this space.

If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space.  In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.

            regards, tom lane

Re: Postgres table size

From
SHARMILA JOTHIRAJAH
Date:
Thanks Tom and Erik and all the others who helped.
You guys really rock!!!
Sharmila

----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Erik Jones <erik@myemma.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 1:14:02 PM
Subject: Re: [GENERAL] Postgres table size

SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
> 1. How do you find  the MAXALIGN of the machine? And what is that used for?

pg_controldata will show "maximum data alignment".  A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.

> 2. How does null columns account for this space.

If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space.  In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.

            regards, tom lane



Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now.

Re: Postgres table size

From
Martijn van Oosterhout
Date:
On Wed, Nov 21, 2007 at 09:51:22AM -0800, SHARMILA JOTHIRAJAH wrote:
> Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.
>
> I have a few questions
> 1. How do you find  the MAXALIGN of the machine? And what is that used for?

MAXALIGN is the preferred alignment for structure objects, iirc. It's
generally less efficient to access unaligned objects than aligned ones,
though this is architechure specific. You should be able to find it in
your pg_config.h under MAXIMUM_ALIGNOF,.

> 2. How does null columns account for this space. For example, if I
> have the same table (10 cols --5 varchars and 5 numerics) in which 8
> cols are null and 2 cols(1 varchar and 1 num) are not null, how is
> the storage space affected for this case?

The cost is typically one bit per column in the table, unless there are
no NULLs in which case it costs nothing. 10 cols = 2 bytes, which may
or may not be swallowed by alignment.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment