Re: database 1.2G, pg_dump 73M?!

From Joris Dobbelsteen
Subject Re: database 1.2G, pg_dump 73M?!
Msg-id E4953B65D9E5054AA6C227B410C56AA91B5F36@exchange1.joris2k.local
In response to database 1.2G, pg_dump 73M?!  (Ross Boylan <RossBoylan@stanfordalumni.org>)
Responses Re: database 1.2G, pg_dump 73M?!  (Ross Boylan <RossBoylan@stanfordalumni.org>)
List pgsql-general
>-----Original Message-----
>From: Ross Boylan [mailto:RossBoylan@stanfordalumni.org]
>Sent: Sunday, 30 March 2008 23:43
>To: Joris Dobbelsteen
>Cc: RossBoylan@stanfordalumni.org; pgsql-general@postgresql.org
>Subject: Re: [GENERAL] database 1.2G, pg_dump 73M?!
>
>On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
>> From the top contenders, about half are indexes, so you are stuck
>> with ~200 MB of data in the tables.
>> PostgreSQL has some wasted space due to placement of the tuples in a
>> block and overhead for each block and row. I don't know those values,
>> but they are in the range of 24 bytes per tuple, I believe. Secondly,
>> a block is 8 KB by default and a tuple cannot be stored across
>> multiple blocks (that's what TOAST works around).
>>
>> All in all: look up tuple sizes. If they are small, the overhead
>> from PostgreSQL can be a big factor; if they are huge, you lose on
>> portions of unoccupied space in blocks. I believe pg_statistics will
>> provide this information.
>There is a pg_statistic (no "s")  table, but I don't know how
>to get tuple size from it--the documentation refers to the
>source code to figure out the codes.  Backing up a step, I
>don't know what a tuple is in Postgres, and don't see an entry
>for it in the index.

It was pg_stats (a view; the pg_statistic table is its raw source).
It gives avg_width for each column.

So probably you want
SELECT tablename, SUM(avg_width)
FROM pg_stats
WHERE schemaname = 'public'
GROUP BY tablename;
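
If you want a rough total per table, you could join that against
pg_class and multiply by the row count (a sketch; it assumes table
names are unique across your schemas and ignores the per-tuple and
per-page header overhead mentioned above):

SELECT s.tablename,
       SUM(s.avg_width) AS est_bytes_per_row,
       c.reltuples::bigint AS row_count,
       (SUM(s.avg_width) * c.reltuples)::bigint AS est_data_bytes
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename AND c.relkind = 'r'
WHERE s.schemaname = 'public'
GROUP BY s.tablename, c.reltuples
ORDER BY est_data_bytes DESC;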

>Is a tuple just a row?  That's what the docs say for the following
>report:

Yes.
Also, where I typed "block", read "page"; they mean the same thing here.

># select distinct relname, reltuples, relpages from pg_class
>where relkind='r' and substring(relname, 1, 3) != 'pg_';
>
>         relname         | reltuples | relpages
>-------------------------+-----------+----------
> basefiles               |         0 |        0
> cdimages                |         0 |        0
> client                  |         2 |        1
> counters                |         1 |        1
> device                  |         0 |        0
> file                    |    650659 |    11558
> filename                |    623012 |     3958
> fileset                 |        22 |        1
> job                     |       384 |       10
> jobmedia                |       596 |        7
> location                |         0 |        0
> locationlog             |         0 |        0
> log                     |         0 |        0
> media                   |       245 |        9
> mediatype               |         2 |        1
> path                    |     67908 |      966
> pool                    |         5 |        1
> sql_features            |       439 |        6
> sql_implementation_info |        12 |        1
> sql_languages           |         4 |        1
> sql_packages            |        10 |        1
> sql_parts               |         9 |        1
> sql_sizing              |        23 |        1
> sql_sizing_profiles     |         0 |        0
> status                  |        19 |        1
> storage                 |         2 |        1
> unsavedfiles            |         0 |        0
> version                 |         1 |        1

Do relpages * 8192 / reltuples.
Default installs use 8 KB (8192-byte) pages/blocks.

For file that gives ~146 bytes/tuple (11558 * 8192 / 650659).
For filename it gives ~52 bytes/tuple (3958 * 8192 / 623012).

Probably you will see some significant differences here compared to the
avg_width sums.
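
If you'd rather let the database do that arithmetic, something like
this should work (a sketch; it assumes the default 8 KB page size and
skips empty tables to avoid dividing by zero):

SELECT relname,
       relpages::bigint * 8192 / reltuples::bigint AS bytes_per_tuple
FROM pg_class
WHERE relkind = 'r'
  AND reltuples > 0
  AND substring(relname, 1, 3) != 'pg_'
ORDER BY bytes_per_tuple DESC;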

>> Another factor is that the representation in the SQL dump might be
>> more efficient than in the database, but this highly depends on your
>> data set. For example, an int8 takes 8 bytes in a table, while it
>> takes between 1 and ~20 bytes in a SQL dump.
>>
>> How the plain SQL dump becomes this small I cannot explain without
>> much more detail.
>
>On Tom's point, Bacula regularly inserts entries into the
>tables and then, days to months later, deletes them.  As far
>as I know, the VACUUM FULLs I just did were the first ever; I
>did do several of them because I kept getting messages about
>needing more fsm_pages.
>
>I am still trying to figure out if the database was getting
>any automatic vacuuming at all.  The Postgres documentation
>(the database is 8.2, though I'm moving to 8.3 soon) sounds as
>if it's on automatically, but the Debian-specific
>documentation suggests I may need to do some additional things
>to enable it.
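
You can check from the database itself whether autovacuum is enabled (a
quick check; note that on 8.2 autovacuum also requires
stats_start_collector and stats_row_level to be on, while 8.3 enables
it out of the box):

SHOW autovacuum;
-- or inspect all the related settings at once:
SELECT name, setting FROM pg_settings
WHERE name LIKE '%vacuum%' OR name LIKE '%fsm%';
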
>
>Probably the fsm_pages being low also hurt disk usage, since
>the message accompanying the vacuum said that's what happens
>if fsm_pages is low.  It was 20k; vacuum said I needed 56k,
>and I upped it to 80k.
>I'm not sure if my recent cleaning has brought the needed
>fsm_pages down.
>
>I've only been doing partial backups for the last few months,
>so there's probably less info in the tables than under normal
>conditions.  I suppose it's possible the space I gained was
>just a temporary win.

Looks like some configuration changes are needed to tune your
installation to better suit your needs. max_fsm_pages (the actual
setting name) can be raised, but I don't know whether any other
parameters need changing along with it. The documentation can help, and
probably a lot of people here could do and explain it blindfolded.
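
For reference, the relevant postgresql.conf lines would look roughly
like this (a sketch based on the numbers you quoted; max_fsm_pages must
cover at least what VACUUM asked for, and changing it requires a server
restart):

max_fsm_pages = 80000   # VACUUM reported needing 56k; the old value was 20k
autovacuum = on         # off by default in 8.2, on by default in 8.3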

- Joris
