Thread: Postgres table size
Hi
I have a table with 29384048 records in oracle and postgresql. The table has 47 columns (16 numeric and 27 varchar and the rest timestamp). The tablesize in postgresql is twice as much than the tablesize in oracle (for the same number of rows and columns). There are no updates or deletes in this table. It is a test table that is used only for querying. The tables are vacuumed regularly
Even a simple seqscan query takes twice as much time in postgres than in oracle.
Does postgresql generally occupy more space than oracle tables?
Thanks
Sharmila
I have a table with 29384048 records in oracle and postgresql. The table has 47 columns (16 numeric and 27 varchar and the rest timestamp). The tablesize in postgresql is twice as much than the tablesize in oracle (for the same number of rows and columns). There are no updates or deletes in this table. It is a test table that is used only for querying. The tables are vacuumed regularly
Even a simple seqscan query takes twice as much time in postgres than in oracle.
Does postgresql generally occupy more space than oracle tables?
Thanks
Sharmila
Get easy, one-click access to your favorites. Make Yahoo! your homepage.
Il Tuesday 13 November 2007 17:36:30 SHARMILA JOTHIRAJAH ha scritto: > Hi > I have a table with 29384048 records in oracle and postgresql. The table > has 47 columns (16 numeric and 27 varchar and the rest timestamp). The > tablesize in postgresql is twice as much than the tablesize in oracle (for > the same number of rows and columns). There are no updates or deletes in > this table. It is a test table that is used only for querying. The tables > are vacuumed regularly > > Even a simple seqscan query takes twice as much time in postgres than in > oracle. Does postgresql generally occupy more space than oracle tables? > Thanks > Sharmila This's an interesting point fore sure as far as the data types for the two table are comparable. If this yelds true, the more space an RDBMS occupies, the slower the access. I think. -- Reg me Please
On Tue, 2007-11-13 at 08:36 -0800, SHARMILA JOTHIRAJAH wrote: > Hi > I have a table with 29384048 records in oracle and postgresql. The > table has 47 columns (16 numeric and 27 varchar and the rest > timestamp). The tablesize in postgresql is twice as much than the > tablesize in oracle (for the same number of rows and columns). There > are no updates or deletes in this table. It is a test table that is > used only for querying. The tables are vacuumed regularly > > Even a simple seqscan query takes twice as much time in postgres than > in oracle. > Does postgresql generally occupy more space than oracle tables? > Thanks > Sharmila PostgreSQL generally does occupy slightly more space. However, your case is extreme due to the number of columns in the table. In PostgreSQL 8.2 and before, it would store a full 4 byte length header for every variable-width field (which is any text type). 8.3 (currently in beta) will substantially reduce this overhead, usually just storing one byte of overhead for every variable-width field (saving 3 bytes), and also reducing the per-row overhead by either 4 or 8 bytes (depending on platform). My quick calculations show that you could save up to (47*3 + 8)*29384048 = 4378223152. So you might save up to 4GB with 8.3! It would be fairly easy for you to check for yourself exactly how much by downloading the beta. Regards, Jeff Davis
Hi
Thanks. Ill post it on the mailing list when I get the results. Im trying to calculate and see how the tablesize works for a simple table.
I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of length 3
5 numeric - declared as numeric(22,0) but contains data of precision 10
There are 10000 rows
select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols = 5*7 = 35 bytes
numeric (according to manual--- The actual storage requirement is two bytes for each group of four decimal digits, plus eight bytes overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
*row overhead = 32
So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000
The manual says ---The first 20 bytes of each page consists of a page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes
Therefore Total = 1320000 (row cost) + 3780 (page header cost)=1323780 bytes for this table
Is this calculation right? But the size of the table according to pg_relation_size is 1548288 bytes
What am I missing in my calculation?
Thanks for your help.
Thanks
sharmila
Thanks. Ill post it on the mailing list when I get the results. Im trying to calculate and see how the tablesize works for a simple table.
I have a table with 10 cols
5 varchars _ it is declared as varchar(40) but contains data of length 3
5 numeric - declared as numeric(22,0) but contains data of precision 10
There are 10000 rows
select * from pg_relation_size gives 1548288 bytes as the table's size
select relpages from pg_class for that table gives 189 pages
Calculation
varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
*for 5 varchar cols = 5*7 = 35 bytes
numeric (according to manual--- The actual storage requirement is two bytes for each group of four decimal digits, plus eight bytes overhead )
numeric = ( 10/4)*2 +8 = 13 bytes
*for 5 numeric cols = 13 *5 = 65
*row overhead = 32
So
Bytes per row = 35 + 65 +32 = 132 bytes
for 10000 rows = 1320000
The manual says ---The first 20 bytes of each page consists of a page header (PageHeaderData).
There are 189 pages
*page header cost = 20*189 = 3780 bytes
Therefore Total = 1320000 (row cost) + 3780 (page header cost)=1323780 bytes for this table
Is this calculation right? But the size of the table according to pg_relation_size is 1548288 bytes
What am I missing in my calculation?
Thanks for your help.
Thanks
sharmila
----- Original Message ----
From: Jeff Davis <pgsql@j-davis.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Sent: Friday, November 16, 2007 2:12:46 PM
Subject: Re: [GENERAL] Postgres table size
On Fri, 2007-11-16 at 07:36 -0800, SHARMILA JOTHIRAJAH wrote:
> Hi,
>
> You were right. I installed beta2 and the table size now is
> 4682817536. Thanks
>
> How do you estimate the table size generally., ie.,
> what is the storage size of varchar, numeric and timestamp columns and
> What is the row overhead?
>
> For example, If I have a table (in postgres8.2.3 version) with 10 rows
> and 3 columns (varchar,numeric(22,0),timestamp) , how do I estimate
> their storage size. How does 8.3Beta-2 handle this?
> this will be very helpful for me for allocating the space properly
> Thanks again for your help
> sharmila
>
That's good news. Please post to pgsql-advocacy to show the reduction in
table size that 8.3 brings for you, and the performance difference that
means for you. If it helps you to choose PostgreSQL instead of some
other database that would be great to mention too (however, some
databases don't like you to publish benchmarks, so be careful not to
violate your license).
The official docs are here:
http://developer.postgresql.org/pgdocs/postgres/storage.html (8.3)
http://www.postgresql.org/docs/8.2/static/storage.html (8.2)
The way I think about it is simple:
In 8.2:
* 32 bytes of row overhead: 28 byte row header + 4 byte line pointer
* 4 bytes of overhead for every variable-width type: to store length
In 8.3:
* 28 bytes of row overhead: 24 byte row header + 4 byte line pointer
* 1-4 bytes of overhead for every variable-width type: only one byte of
overhead if length < 127 bytes, up to 4 bytes if it is longer.
Regards,
Jeff Davis
From: Jeff Davis <pgsql@j-davis.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Sent: Friday, November 16, 2007 2:12:46 PM
Subject: Re: [GENERAL] Postgres table size
On Fri, 2007-11-16 at 07:36 -0800, SHARMILA JOTHIRAJAH wrote:
> Hi,
>
> You were right. I installed beta2 and the table size now is
> 4682817536. Thanks
>
> How do you estimate the table size generally., ie.,
> what is the storage size of varchar, numeric and timestamp columns and
> What is the row overhead?
>
> For example, If I have a table (in postgres8.2.3 version) with 10 rows
> and 3 columns (varchar,numeric(22,0),timestamp) , how do I estimate
> their storage size. How does 8.3Beta-2 handle this?
> this will be very helpful for me for allocating the space properly
> Thanks again for your help
> sharmila
>
That's good news. Please post to pgsql-advocacy to show the reduction in
table size that 8.3 brings for you, and the performance difference that
means for you. If it helps you to choose PostgreSQL instead of some
other database that would be great to mention too (however, some
databases don't like you to publish benchmarks, so be careful not to
violate your license).
The official docs are here:
http://developer.postgresql.org/pgdocs/postgres/storage.html (8.3)
http://www.postgresql.org/docs/8.2/static/storage.html (8.2)
The way I think about it is simple:
In 8.2:
* 32 bytes of row overhead: 28 byte row header + 4 byte line pointer
* 4 bytes of overhead for every variable-width type: to store length
In 8.3:
* 28 bytes of row overhead: 24 byte row header + 4 byte line pointer
* 1-4 bytes of overhead for every variable-width type: only one byte of
overhead if length < 127 bytes, up to 4 bytes if it is longer.
Regards,
Jeff Davis
Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now.
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote: > Hi > Thanks. Ill post it on the mailing list when I get the results. Im > trying to calculate and see how the tablesize works for a simple > table. > > I have a table with 10 cols > 5 varchars _ it is declared as varchar(40) but contains data of > length 3 > 5 numeric - declared as numeric(22,0) but contains data of > precision 10 > There are 10000 rows > > select * from pg_relation_size gives 1548288 bytes as the table's size > select relpages from pg_class for that table gives 189 pages > > Calculation > varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes > *for 5 varchar cols = > 5*7 = 35 bytes > > numeric (according to manual--- The actual storage requirement is > two bytes for each group of four decimal digits, plus eight bytes > overhead ) > numeric = ( 10/4)*2 +8 = 13 bytes > *for 5 numeric cols = 13 *5 = 65 Not that it will make much difference, but you need to round up in the 10/4 part so you get 14 bytes, not 13. > > *row overhead = 32 > > So > Bytes per row = 35 + 65 +32 = 132 bytes > for 10000 rows = 1320000 > > The manual says ---The first 20 bytes of each page consists of a > page header (PageHeaderData). > There are 189 pages > *page header cost = 20*189 = 3780 bytes > > Therefore Total = 1320000 (row cost) + 3780 (page header cost) > =1323780 bytes for this table > > Is this calculation right? But the size of the table according to > pg_relation_size is 1548288 bytes > > What am I missing in my calculation? One thing to note is that the relpages value is only exact from the time of a vacuum until the next dml statement on the table, i.e. relpages is not updated after inserts, update, and deletes, just after vacuums. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
I vacuumed the table before I got this relpages count. When i make the change tat you mentioned the total table size as per my calculation is 1373780 and the result thro pg_relation_size is 1548288.
What other overheads are there for the table? Im not sure how to find the free space? But I didnot delete/insert/update any rows after the first insertion.
Thanks
sharmila
What other overheads are there for the table? Im not sure how to find the free space? But I didnot delete/insert/update any rows after the first insertion.
Thanks
sharmila
----- Original Message ----
From: Erik Jones <erik@myemma.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 11:38:44 AM
Subject: Re: [GENERAL] Postgres table size
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
> Hi
> Thanks. Ill post it on the mailing list when I get the results. Im
> trying to calculate and see how the tablesize works for a simple
> table.
>
> I have a table with 10 cols
> 5 varchars _ it is declared as varchar(40) but contains data of
> length 3
> 5 numeric - declared as numeric(22,0) but contains data of
> precision 10
> There are 10000 rows
>
> select * from pg_relation_size gives 1548288 bytes as the table's size
> select relpages from pg_class for that table gives 189 pages
>
> Calculation
> varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
> *for 5 varchar cols =
> 5*7 = 35 bytes
>
> numeric (according to manual--- The actual storage requirement is
> two bytes for each group of four decimal digits, plus eight bytes
> overhead )
> numeric = ( 10/4)*2 +8 = 13 bytes
> *for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
>
> *row overhead = 32
>
> So
> Bytes per row = 35 + 65 +32 = 132 bytes
> for 10000 rows = 1320000
>
> The manual says ---The first 20 bytes of each page consists of a
> page header (PageHeaderData).
> There are 189 pages
> *page header cost = 20*189 = 3780 bytes
>
> Therefore Total = 1320000 (row cost) + 3780 (page header cost)
> =1323780 bytes for this table
>
> Is this calculation right? But the size of the table according to
> pg_relation_size is 1548288 bytes
>
> What am I missing in my calculation?
One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
From: Erik Jones <erik@myemma.com>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 11:38:44 AM
Subject: Re: [GENERAL] Postgres table size
On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote:
> Hi
> Thanks. Ill post it on the mailing list when I get the results. Im
> trying to calculate and see how the tablesize works for a simple
> table.
>
> I have a table with 10 cols
> 5 varchars _ it is declared as varchar(40) but contains data of
> length 3
> 5 numeric - declared as numeric(22,0) but contains data of
> precision 10
> There are 10000 rows
>
> select * from pg_relation_size gives 1548288 bytes as the table's size
> select relpages from pg_class for that table gives 189 pages
>
> Calculation
> varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
> *for 5 varchar cols =
> 5*7 = 35 bytes
>
> numeric (according to manual--- The actual storage requirement is
> two bytes for each group of four decimal digits, plus eight bytes
> overhead )
> numeric = ( 10/4)*2 +8 = 13 bytes
> *for 5 numeric cols = 13 *5 = 65
Not that it will make much difference, but you need to round up in
the 10/4 part so you get 14 bytes, not 13.
>
> *row overhead = 32
>
> So
> Bytes per row = 35 + 65 +32 = 132 bytes
> for 10000 rows = 1320000
>
> The manual says ---The first 20 bytes of each page consists of a
> page header (PageHeaderData).
> There are 189 pages
> *page header cost = 20*189 = 3780 bytes
>
> Therefore Total = 1320000 (row cost) + 3780 (page header cost)
> =1323780 bytes for this table
>
> Is this calculation right? But the size of the table according to
> pg_relation_size is 1548288 bytes
>
> What am I missing in my calculation?
One thing to note is that the relpages value is only exact from the
time of a vacuum until the next dml statement on the table, i.e.
relpages is not updated after inserts, update, and deletes, just
after vacuums.
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Get easy, one-click access to your favorites. Make Yahoo! your homepage.
Erik Jones <erik@myemma.com> writes: > On Nov 21, 2007, at 9:17 AM, SHARMILA JOTHIRAJAH wrote: >> Calculation >> varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes >> *for 5 varchar cols = >> 5*7 = 35 bytes >> >> numeric (according to manual--- The actual storage requirement is >> two bytes for each group of four decimal digits, plus eight bytes >> overhead ) >> numeric = ( 10/4)*2 +8 = 13 bytes >> *for 5 numeric cols = 13 *5 = 65 > Not that it will make much difference, but you need to round up in > the 10/4 part so you get 14 bytes, not 13. Also, this calculation is ignoring the fact that (pre-8.3) varlena values have to be int-aligned, so there's wasted pad space too. The varchar values really need 8 bytes each, and the numeric values 16, so the actual data payload in each row is 120 bytes. Then add HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size. So the rowsize would be either 148 or 152 bytes depending on if you were on a machine with 8-byte MAXALIGN. Then add the per-row item pointer, giving total per-row space of 152 or 156 bytes. That means you can fit either 53 or 52 rows per page, giving either 188 or 192 pages as the minimum possible file size. Evidently, Sharmila is using a MAXALIGN=4 machine and has a few dead rows in there. regards, tom lane
>>> Calculation
>> >varchar = (overhead) 4 + (actual length of string) 3 = 7 bytes
>> >*for 5 varchar cols =
>> >5*7 = 35 bytes
>> >
>> >numeric (according to manual--- The actual storage requirement is
>> >two bytes for each group of four decimal digits, plus eight bytes
>>> overhead )
>>> numeric = ( 10/4)*2 +8 = 13 bytes
>>> *for 5 numeric cols = 13 *5 = 65
> >Not that it will make much difference, but you need to round up in
> >the 10/4 part so you get 14 bytes, not 13.
>Also, this calculation is ignoring the fact that (pre-8.3) varlena
>values have to be int-aligned, so there's wasted pad space too.
>The varchar values really need 8 bytes each, and the numeric values
>16, so the actual data payload in each row is 120 bytes. Then add
>HeapTupleHeader (28 bytes), and then MAXALIGN the whole row size.
>So the rowsize would be either 148 or 152 bytes depending on if you
>were on a machine with 8-byte MAXALIGN. Then add the per-row item
>pointer, giving total per-row space of 152 or 156 bytes. That
>means you can fit either 53 or 52 rows per page, giving either 188
>or 192 pages as the minimum possible file size. Evidently, Sharmila
>is using a MAXALIGN=4 machine and has a few dead rows in there.
Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189.
I have a few questions
1. How do you find the MAXALIGN of the machine? And what is that used for?
2. How does null columns account for this space. For example, if I have the same table (10 cols --5 varchars and 5 numerics) in which 8 cols are null and 2 cols(1 varchar and 1 num) are not null, how is the storage space affected for this case?
Thanks again
sharmila
Never miss a thing. Make Yahoo your homepage.
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes: > 1. How do you find the MAXALIGN of the machine? And what is that used for? pg_controldata will show "maximum data alignment". A rule of thumb is that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are exceptions. > 2. How does null columns account for this space. If there are any nulls in a row then you pay for a null bitmap with 1 bit/column, but the null columns themselves aren't stored and hence take zero space. In your example the bitmap needs 10 bits, but after allowing for alignment the effect is that the heap tuple header gets 4 bytes bigger if there's any nulls. regards, tom lane
Thanks Tom and Erik and all the others who helped.
You guys really rock!!!
Sharmila
You guys really rock!!!
Sharmila
----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Erik Jones <erik@myemma.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 1:14:02 PM
Subject: Re: [GENERAL] Postgres table size
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
> 1. How do you find the MAXALIGN of the machine? And what is that used for?
pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.
> 2. How does null columns account for this space.
If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
To: SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com>
Cc: Erik Jones <erik@myemma.com>; pgsql-general@postgresql.org
Sent: Wednesday, November 21, 2007 1:14:02 PM
Subject: Re: [GENERAL] Postgres table size
SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> writes:
> 1. How do you find the MAXALIGN of the machine? And what is that used for?
pg_controldata will show "maximum data alignment". A rule of thumb is
that it's 4 on 32-bit machines and 8 on 64-bit machines, but there are
exceptions.
> 2. How does null columns account for this space.
If there are any nulls in a row then you pay for a null bitmap with 1
bit/column, but the null columns themselves aren't stored and hence
take zero space. In your example the bitmap needs 10 bits, but after
allowing for alignment the effect is that the heap tuple header gets
4 bytes bigger if there's any nulls.
regards, tom lane
Be a better sports nut! Let your teams follow you with Yahoo Mobile. Try it now.
On Wed, Nov 21, 2007 at 09:51:22AM -0800, SHARMILA JOTHIRAJAH wrote: > Thanks for the detailed explanation Tom. You are rigt...The number of pages for this tables as per pg_class is 189. > > I have a few questions > 1. How do you find the MAXALIGN of the machine? And what is that used for? MAXALIGN is the preferred alignment for structure objects, iirc. It's generally less efficient to access unaligned objects than aligned ones, though this is architechure specific. You should be able to find it in your pg_config.h under MAXIMUM_ALIGNOF,. > 2. How does null columns account for this space. For example, if I > have the same table (10 cols --5 varchars and 5 numerics) in which 8 > cols are null and 2 cols(1 varchar and 1 num) are not null, how is > the storage space affected for this case? The cost is typically one bit per column in the table, unless there are no NULLs in which case it costs nothing. 10 cols = 2 bytes, which may or may not be swallowed by alignment. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Those who make peaceful revolution impossible will make violent revolution inevitable. > -- John F Kennedy