Home > mailing lists

Re: Tuple storage overhead - Mailing list pgsql-general

From	Merlin Moncure
Subject	Re: Tuple storage overhead
Date	April 16, 2010 15:28:49
Msg-id	n2lb42b73151004160828ifa0cbf54lc1839d4e36b18745@mail.gmail.com Whole thread Raw
In response to	Tuple storage overhead (Peter Bex <Peter.Bex@xs4all.nl>)
Responses	Re: Tuple storage overhead (Peter Bex <Peter.Bex@xs4all.nl>)
List	pgsql-general

Tree view

On Fri, Apr 16, 2010 at 5:41 AM, Peter Bex <Peter.Bex@xs4all.nl> wrote:
> Hi all,
>
> I have a table with three columns: one integer and two doubles.
> There are two indexes defined (one on the integer and one on one
> of the doubles).  This table stores 700000 records, which take up
> 30 Mb according to pg_relation_size(), and the total relation size
> is 66 Mb.
>
> I expected the disk space usage to be halved by changing the doubles
> to floats, but it only dropped by 5 MB!  (I tried various approaches,
> including dumping and restoring to make sure there was no uncollected
> garbage lying around)
>
> Someone on IRC told me the per-tuple storage overhead is pretty big,
> and asked me to create a similar table containing only integers:
>
> db=# create table testing (  x integer );
> db=# INSERT INTO testing (x) VALUES (generate_series(1, 700000));
> dacolt_development=# SELECT pg_size_pretty(pg_total_relation_size('testing'));
>  pg_size_pretty
>  ----------------
>  24 MB
>  (1 row)
> db=# SELECT pg_size_pretty(pg_relation_size('testing'));
>  pg_size_pretty
>  ----------------
>  24 MB
> db=# CREATE INDEX testing_1 ON testing (x);
> db=# CREATE INDEX testing_2 ON testing (x);
> db=# SELECT pg_size_pretty(pg_relation_size('testing'));
>  pg_size_pretty
>  ----------------
>  24 MB
>  (1 row)
> db=# SELECT pg_size_pretty(pg_total_relation_size('testing'));
>  pg_size_pretty
>  ----------------
>  54 MB
>  (1 row)
>
> Is there a way to reduce the per-tuple storage overhead?
>
> The reason I'm asking is that I have tons of tables like this,
> and some data sets are much bigger than this.  In a relatively
> simple testcase I'm importing data from text files which are
> 5.7 Gb in total, and this causes the db size to grow to 34Gb.
>
> This size is just one small sample of many such datasets that I
> need to import, so disk size is really an important factor.

If you are storing big data and want to keep the overhead low, the
first thing you need to examine is organizing your data into arrays.
This involves tradeoffs of course and may not work but it's worth a
shot!

merlin

pgsql-general by date:

From: Scott Mead
Date: 16 April 2010, 15:26:33
Subject: Re: readline library not found

From: John R Pierce
Date: 16 April 2010, 16:49:36
Subject: Re: Int64GetDatum

Re: Tuple storage overhead - Mailing list pgsql-general

Previous

Next