Re: Why the mismatch? {row size reported by VACUUM} vs. {table size on disk / number of rows} vs. {sum of column type sizes} - Mailing list pgsql-general

From Tom Lane
Subject Re: Why the mismatch? {row size reported by VACUUM} vs. {table size on disk / number of rows} vs. {sum of column type sizes}
Date
Msg-id 17255.1257645694@sss.pgh.pa.us
Whole thread Raw
In response to Why the mismatch? {row size reported by VACUUM} vs. {table size on disk / number of rows} vs. {sum of column type sizes}  ("Vlad Romascanu" <vromascanu@accurev.com>)
List pgsql-general
"Vlad Romascanu" <vromascanu@accurev.com> writes:
> PostgreSQL 8.3.3, on Win32.
> I have a table with 8 fixed-size (integer-type) columns: 6 x int8, 1 x
> integer, 1 x int2.  All columns have storage=plain.  Have explicitly set
> FILLFACTOR=100 and WITHOUT OIDS just so there's no misunderstanding.

> Expected row size, based solely on data types and minus any per-row (MVCC
> etc.) overhead: 54 bytes.

> However, row size reported by VACUUM FULL VERBOSE is 88 bytes.

> Also, table size (reported by pg_relation_size) divided by number of rows in
> the table: ~93 bytes per row.

> After altering the int2 column to int8, the table size (reported by
> pg_relation_size) does not change by one single byte.  Idem row size
> reported by VACCUM.

> Why is the observed-vs-theoretical row size overhead ~40 bytes/row?

You're not accounting for row-header overhead nor alignment padding.
There is a 24-byte row header, which already brings us to 78 bytes,
which would be rounded up to at least 80 bytes as a consequence of
rows being aligned on MAXALIGN boundaries.  However, given that you
have ordered the columns like this:

>   seq bigserial NOT NULL,
>   column1 integer NOT NULL,
>   column2 bigint NOT NULL,
>   column3 bigint NOT NULL,
>   column4 smallint NOT NULL,
>   column5 bigint NOT NULL,
>   column6 bigint NOT NULL,
>   column7 bigint NOT NULL,

it's even worse: bigint columns have to start on 8-byte boundaries
so there is alignment pad space after each of the two nominally-smaller
columns.  What you've actually got here is 64 bytes of data area, plus
24 gives the 88 bytes reported by VACUUM.  In addition to that there
is 4 bytes per row for an "item pointer" in the page header, plus
you have to consider that the page header is of nonzero size and there
will be wasted space at the end of each page because the row size
doesn't divide the page size exactly.  (It looks to me like this
particular example is nearly worst case --- almost a whole row's
worth of space will be wasted on each page.)

You can find more details in
http://www.postgresql.org/docs/8.3/static/storage-page-layout.html

            regards, tom lane

pgsql-general by date:

Previous
From: "Vlad Romascanu"
Date:
Subject: Why the mismatch? {row size reported by VACUUM} vs. {table size on disk / number of rows} vs. {sum of column type sizes}
Next
From: Martin Gainty
Date:
Subject: Re: pgp encryption functions