Re: Oracle migration : size on disk of data file greater in PG - Mailing list pgsql-general

From Gregory Stark
Subject Re: Oracle migration : size on disk of data file greater in PG
Msg-id 87wt809nsi.fsf@stark.xeocode.com
In response to Oracle migration : size on disk of data file greater in PG  (Benoit.Gerrienne@BULL.BE)
List pgsql-general
Benoit.Gerrienne@BULL.BE writes:

> At a customer site, we've made a migration from Oracle 8.1.5 to PGSQL
> 8.1.1. The migration happened without any problem and now the performance
> is better with PG than with Ora, but the customer noticed that the size
> of PG on disk was much greater than the size on disk of Oracle. And I'm
> not able to find an easy explanation.
>
> Is this normal, due to different internal data storage mechanisms in
> Oracle and PG?

To a certain degree, yes. It's one of the topics under active discussion for
improvement in the future. In particular, you'll see a big difference if you
have a lot of very small columns. You may also see some difference if you have
very narrow rows, because of the per-row transaction status overhead (each row
version carries a header recording which transactions created and deleted it).
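To see why narrow rows suffer most, the back-of-the-envelope calculation below
computes what fraction of each row's space goes to fixed per-row overhead. It
is only a rough sketch: the 28-byte header figure is an assumption chosen for
illustration (not the exact value for any particular release), and page
headers, alignment padding and NULL bitmaps are ignored.

```python
# Rough estimate of how much heap space is fixed per-row overhead.
# ASSUMPTIONS (illustrative only, not exact for any PostgreSQL version):
#   - 8192-byte pages; page header and alignment padding ignored
#   - ~28 bytes of tuple header per row (holds the transaction status info)
#   - 4-byte line pointer per row
PAGE_SIZE = 8192
TUPLE_HEADER = 28
LINE_POINTER = 4


def overhead_fraction(data_bytes: int) -> float:
    """Fraction of each row's on-page footprint spent on fixed overhead."""
    per_row = TUPLE_HEADER + LINE_POINTER
    return per_row / (per_row + data_bytes)


def rows_per_page(data_bytes: int) -> int:
    """How many rows with the given payload width fit in one page."""
    return PAGE_SIZE // (TUPLE_HEADER + LINE_POINTER + data_bytes)


if __name__ == "__main__":
    for width in (8, 32, 200):
        print(f"{width:4d}-byte rows: {rows_per_page(width):4d} per page, "
              f"{overhead_fraction(width):.0%} overhead")
```

Under these assumed numbers an 8-byte row spends about 80% of its footprint on
overhead, while a 200-byte row spends about 14%, which is the effect described
above.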

> Of course, we've run VACUUM on both DB before measuring the size on disk.

There are a couple of problems here, though. Firstly, plain VACUUM doesn't
usually shrink the actual size of the data on disk; it just notes where the
free space is so it can be reused.

To shrink the actual data on disk you would need VACUUM FULL, CLUSTER, or
ALTER TABLE ... ALTER COLUMN ... TYPE USING. However, under normal operation
Postgres expects to have some amount of free space anyway. Running VACUUM
FULL is usually pointless and actually hurts performance, unless you have an
unusual situation such as having recently done large batch updates or not
having run VACUUM regularly enough in the past.
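The difference between the two kinds of VACUUM can be illustrated with a toy
model. This is a simplification written for illustration, not PostgreSQL's
actual code: a table is a list of pages, each slot holds a live row, a dead
(obsolete) row version, or free space, and the free space map is modelled by
simply scanning for free slots. `SLOTS_PER_PAGE` is a hypothetical, tiny value
chosen for readability.

```python
# Toy model of plain VACUUM vs VACUUM FULL (NOT PostgreSQL's real code).
# A table is a list of pages; each page is a list of slots holding
# "live", "dead" (an obsolete row version) or None (free space).
from typing import List, Optional

Page = List[Optional[str]]
SLOTS_PER_PAGE = 4  # hypothetical, tiny for readability


def vacuum(table: List[Page]) -> List[Page]:
    """Plain VACUUM: turn dead slots into reusable free space in place.
    Only wholly empty pages at the very end of the file are cut off,
    so the file size usually stays the same."""
    pages = [[None if s == "dead" else s for s in page] for page in table]
    while pages and all(s is None for s in pages[-1]):
        pages.pop()
    return pages


def vacuum_full(table: List[Page]) -> List[Page]:
    """VACUUM FULL: pack the live rows into as few pages as possible."""
    live = [s for page in table for s in page if s == "live"]
    pages: List[Page] = []
    for i in range(0, len(live), SLOTS_PER_PAGE):
        chunk: Page = list(live[i:i + SLOTS_PER_PAGE])
        pages.append(chunk + [None] * (SLOTS_PER_PAGE - len(chunk)))
    return pages


def insert(table: List[Page]) -> None:
    """New row versions go into free space first; the file grows only
    when no free slot exists."""
    for page in table:
        for i, s in enumerate(page):
            if s is None:
                page[i] = "live"
                return
    table.append(["live"] + [None] * (SLOTS_PER_PAGE - 1))
```

In this model, a table of three pages holding four live rows stays at three
pages after `vacuum()`, and later inserts fill the freed slots instead of
growing the file; only `vacuum_full()` packs it down to one page.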

Because Postgres keeps free space around in the tables for new versions of
tuples, you should include Oracle's rollback segments in the comparison, since
they effectively correspond to the free space Postgres keeps. Alternatively,
you could do a VACUUM FULL before comparing, but I do not recommend VACUUM
FULL for regular operation.

> The database is used to store statistical data by month and therefore
> contains dozens of tables of the same layout, most of the time containing
> hundreds of thousands of records.

You may find it makes more sense to store all of this in one table, or in
tables that are children of the same parent table, as described in:

  http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html
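The 8.1 partitioning page linked above uses table inheritance: an empty parent
table plus one child table per range, each child carrying a CHECK constraint
on its date range. As a sketch, the Python helper below generates that DDL for
one month. The parent name `stats`, the column `logdate` and the
`_yYYYYmMM` naming scheme are hypothetical placeholders; check the generated
statements against the linked page before using them.

```python
# Sketch: generate DDL for a monthly child table of one parent table, in the
# inheritance + CHECK constraint style of the 8.1 partitioning docs.
# "stats" and "logdate" used below are hypothetical placeholder names.
from datetime import date


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def child_table_ddl(parent: str, column: str, year: int, month: int) -> str:
    """CREATE TABLE statement for the child holding one calendar month."""
    start = date(year, month, 1)
    end = next_month(start)
    name = f"{parent}_y{year}m{month:02d}"
    return (
        f"CREATE TABLE {name} (\n"
        f"    CHECK ( {column} >= DATE '{start.isoformat()}'"
        f" AND {column} < DATE '{end.isoformat()}' )\n"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    for m in (11, 12):
        print(child_table_ddl("stats", "logdate", 2006, m))
```

Queries against the parent then see every month, and with
`constraint_exclusion` enabled the planner can skip child tables whose CHECK
constraint rules them out.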

--
greg
