Thread: Why the mismatch? {row size reported by VACUUM} vs. {table size on disk / number of rows} vs. {sum of column type sizes}

PostgreSQL 8.3.3, on Win32.

I have a table with 8 fixed-size (integer-type) columns: 6 x int8, 1 x
integer, 1 x int2.  All columns have storage=plain.  Have explicitly set
FILLFACTOR=100 and WITHOUT OIDS just so there's no misunderstanding.

Expected row size, based solely on data types and minus any per-row (MVCC
etc.) overhead: 54 bytes.

However, row size reported by VACUUM FULL VERBOSE is 88 bytes.

Also, table size (reported by pg_relation_size) divided by number of rows in
the table: ~93 bytes per row.

After altering the int2 column to int8, the table size (reported by
pg_relation_size) does not change by one single byte.  Idem row size
reported by VACCUM.

Why is the observed-vs-theoretical row size overhead ~40 bytes/row?  Why
does VACUUM report a value that's 5 bytes less than
table_size/number_of_rows?  Why does the table size not change when altering
a column's type from int2 to int8?  Is there any place where I can read more
about this (other than the source code itself? :) )

Thanks!
V.



SQL> CREATE TABLE my_schema.my_table (
  seq bigserial NOT NULL,
  column1 integer NOT NULL,
  column2 bigint NOT NULL,
  column3 bigint NOT NULL,
  column4 smallint NOT NULL,
  column5 bigint NOT NULL,
  column6 bigint NOT NULL,
  column7 bigint NOT NULL,
  CONSTRAINT my_table_pkey PRIMARY KEY (column1, column6, column2, column3),
  CONSTRAINT my_table_seq_key UNIQUE (seq)
) WITH (FILLFACTOR=100, OIDS=FALSE);
CREATE INDEX my_table_el_tmndx ON my_schema.my_table USING btree(column2,
column3);
CREATE INDEX my_table_stream_el_tmndx ON my_schema.my_table USING btree
(column1, column2, column3);
CREATE INDEX my_table_tndx ON my_schema.my_table USING btree (column3);


...
populate table with 1245969 rows
...



SQL> SELECT pg_relation_size('my_schema.my_table');
115990528

SQL> SELECT COUNT(*) FROM my_schema.my_table;
1245969

SQL> SELECT 1.0*pg_relation_size('my_schema.my_table')/(SELECT COUNT(*) FROM
my_schema.my_table);
93.0926275051787003

SQL> VACUUM FULL VERBOSE my_schema.my_table;
INFO:  vacuuming "my_schema.my_table"INFO:  "my_table": found 0 removable,
1245969 nonremovable row versions in 14159 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 88 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1021564 bytes.
0 pages are or will become empty, including 0 at the end of the table.
1 pages containing 2188 free bytes are potential move destinations.
CPU 0.00s/0.13u sec elapsed 0.13 sec.INFO:  index "my_table_pkey" now
contains 1245969 row versions in 7567 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.07s/0.00u sec elapsed 0.07 sec.INFO:  index "my_table_seq_key" now
contains 1245969 row versions in 3419 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.00u sec elapsed 0.03 sec.
INFO:  index "my_table_el_tmndx" now contains 1245969 row versions in 4800
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.05s/0.00u sec elapsed 0.05 sec.INFO:  index "my_table_stream_el_tmndx"
now contains 1245969 row versions in 6179 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.04s/0.01u sec elapsed 0.05 sec.INFO:  index "my_table_tndx" now
contains 1245969 row versions in 3419 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.01u sec elapsed 0.04 sec.INFO:  "my_table": moved 0 row
versions, truncated 14159 to 14159 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
Query returned successfully with no result in 390 ms.

SQL> SELECT pg_relation_size('my_schema.my_table');
115990528

SQL> ALTER TABLE my_schema.my_table ALTER COLUMN column4 TYPE int8; -- from
int2
Query returned successfully with no result in 53937 ms.

SQL> SELECT pg_relation_size('my_schema.my_table');
115990528

SQL> VACUUM FULL VERBOSE my_schema.my_table;
INFO:  vacuuming "my_schema.my_table"INFO:  "my_table": found 0 removable,
1245969 nonremovable row versions in 14159 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 88 to 88 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1021564 bytes.
0 pages are or will become empty, including 0 at the end of the table.
1 pages containing 2188 free bytes are potential move destinations.
CPU 0.00s/0.18u sec elapsed 0.19 sec.
INFO:  index "my_table_pkey" now contains 1245969 row versions in 7567 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.04s/0.02u sec elapsed 0.06 sec.INFO:  index "my_table_seq_key" now
contains 1245969 row versions in 3419 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.02u sec elapsed 0.04 sec.INFO:  index "my_table_el_tmndx" now
contains 1245969 row versions in 4800 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.01u sec elapsed 0.04 sec.INFO:  index "my_table_stream_el_tmndx"
now contains 1245969 row versions in 6179 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.02s/0.04u sec elapsed 0.06 sec.INFO:  index "my_table_tndx" now
contains 1245969 row versions in 3419 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.03s/0.00u sec elapsed 0.03 sec.INFO:  "my_table": moved 0 row
versions, truncated 14159 to 14159 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
Query returned successfully with no result in 451 ms.

SQL> SELECT pg_relation_size('my_schema.my_table');
115990528


"Vlad Romascanu" <vromascanu@accurev.com> writes:
> PostgreSQL 8.3.3, on Win32.
> I have a table with 8 fixed-size (integer-type) columns: 6 x int8, 1 x
> integer, 1 x int2.  All columns have storage=plain.  Have explicitly set
> FILLFACTOR=100 and WITHOUT OIDS just so there's no misunderstanding.

> Expected row size, based solely on data types and minus any per-row (MVCC
> etc.) overhead: 54 bytes.

> However, row size reported by VACUUM FULL VERBOSE is 88 bytes.

> Also, table size (reported by pg_relation_size) divided by number of rows in
> the table: ~93 bytes per row.

> After altering the int2 column to int8, the table size (reported by
> pg_relation_size) does not change by one single byte.  Idem row size
> reported by VACCUM.

> Why is the observed-vs-theoretical row size overhead ~40 bytes/row?

You're not accounting for row-header overhead nor alignment padding.
There is a 24-byte row header, which already brings us to 78 bytes,
which would be rounded up to at least 80 bytes as a consequence of
rows being aligned on MAXALIGN boundaries.  However, given that you
have ordered the columns like this:

>   seq bigserial NOT NULL,
>   column1 integer NOT NULL,
>   column2 bigint NOT NULL,
>   column3 bigint NOT NULL,
>   column4 smallint NOT NULL,
>   column5 bigint NOT NULL,
>   column6 bigint NOT NULL,
>   column7 bigint NOT NULL,

it's even worse: bigint columns have to start on 8-byte boundaries
so there is alignment pad space after each of the two nominally-smaller
columns.  What you've actually got here is 64 bytes of data area, plus
24 gives the 88 bytes reported by VACUUM.  In addition to that there
is 4 bytes per row for an "item pointer" in the page header, plus
you have to consider that the page header is of nonzero size and there
will be wasted space at the end of each page because the row size
doesn't divide the page size exactly.  (It looks to me like this
particular example is nearly worst case --- almost a whole row's
worth of space will be wasted on each page.)

You can find more details in
http://www.postgresql.org/docs/8.3/static/storage-page-layout.html

            regards, tom lane