Re: database 1.2G, pg_dump 73M?! - Mailing list pgsql-general

From Joris Dobbelsteen
Subject Re: database 1.2G, pg_dump 73M?!
Date
Msg-id E4953B65D9E5054AA6C227B410C56AA91B5F37@exchange1.joris2k.local
Whole thread Raw
In response to database 1.2G, pg_dump 73M?!  (Ross Boylan <RossBoylan@stanfordalumni.org>)
List pgsql-general
>-----Original Message-----
>From: Ross Boylan [mailto:RossBoylan@stanfordalumni.org]
>Sent: Monday, 31 March 2008 0:23
>To: Joris Dobbelsteen
>Cc: RossBoylan@stanfordalumni.org; pgsql-general@postgresql.org
>Subject: RE: [GENERAL] database 1.2G, pg_dump 73M?!
>
>
>On Sun, 2008-03-30 at 22:59 +0100, Joris Dobbelsteen wrote:
>> >-----Original Message-----
>> >From: Ross Boylan [mailto:RossBoylan@stanfordalumni.org]
>> >Sent: Sunday, 30 March 2008 23:43
>> >To: Joris Dobbelsteen
>> >Cc: RossBoylan@stanfordalumni.org; pgsql-general@postgresql.org
>> >Subject: Re: [GENERAL] database 1.2G, pg_dump 73M?!
>> >
>> >On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
>> >>  From the top contenders, about half are indexes, so you
>are stuck
>> >> with ~200 MB of data in the tables.
>> >> Postgresql has some wasted space due to placement of the
>tuples in
>> >> a block and overhead for each block and row. I don't know
>> >those values,
>> >> but they are in the range of 24 bytes per tuple, I believe.
>> >Secondly a
>> >> block is 8 KB by default and tuples cannot be stored into
>multiple
>> >> blocks (thats what toast to work around).
>> >>
>> >> All in all: Lookup tuple sizes, if they are small than
>the overhead
>> >> from postgresql can be a big factor. If you are huge you loose on
>> >> portions of unoccupied space in blocks. I believe pg_statistics
>> >> will provide this information.
>> >There is a pg_statistic (no "s")  table, but I don't know
>how to get
>> >tuple size from it--the documentation refers to the source code to
>> >figure out the codes.  Backing up a step, I don't know what a tuple
>> >is in Postgres, and don't see an entry for it in the index.
>>
>> It was pg_stats.
>> You get avg_width. It gives this per column.
>>
>> So probably you want
>> SELECT tablename, SUM(avg_width)
>> FROM pg_stats
>> WHERE schemaname = 'public'
>[Thanks; I didn't know about using schemaname to limit it to
>interesting tables]
>> GROUP BY tablename [RB added] ORDER BY tablename;
> tablename | sum
>-----------+-----
> client    |  62
> counters  |  25
> file      | 109
> filename  |  18
> fileset   |  53
> job       | 149
> jobmedia  |  52
> media     | 226
> mediatype |  16
> path      |  82
> pool      | 179
> status    |  29
> storage   |  23
> version   |   4
>
>So, for example, if each tuple has 24 bytes of overhead, the
>overhead more than doubles the size of the file table (18
>bytes), which has a big record count.  Am I following correctly?

Yes.
(Note not to pin down on the 24 bytes, it varies between versions. I
think this is close enough however).

>Between the space taken up by indices and the other overhead,
>the size difference between the sql dump and the db disk
>useage is starting to seem more reasonable.
>
>The database uses SQL-ASCII encoding, so I'd expect the
>textual parts (filenames and paths) to take up the same space
>(just for the basic storage, not counting overhead/indices) in
>Postgres as on the dump, 1 byte/character.

That's the idea.

>I'm not sure what "If you are huge you loose on portions of
>unoccupied space in blocks" means.

A tuple has to be in exactly 1 page (or block). It cannot span multiple
pages. Hence if your tuples happens to be a half page size or slightly
larger, than only a single one will fit in a single page. So you waste
half the page. You don't have to worry about this, as your tuples are
significantly smaller than that.

(As a reference only: if a tuple is larger than a blocksize some
attributes will be evicted to the toast table or compressed. So that is
to overcome the limitations of 8000 bytes per tuple.)

[snip]
>>
>> ># select distinct relname, reltuples, relpages from pg_class where
>> >relkind='r' and substring(relname, 1, 3) != 'pg_';
>> >
>> >         relname         | reltuples | relpages
>> >-------------------------+-----------+----------
>> > basefiles               |         0 |        0
>> > cdimages                |         0 |        0
>> > client                  |         2 |        1
>> > counters                |         1 |        1
>> > device                  |         0 |        0
>> > file                    |    650659 |    11558
>> > filename                |    623012 |     3958
>> > fileset                 |        22 |        1
>> > job                     |       384 |       10
>> > jobmedia                |       596 |        7
>> > location                |         0 |        0
>> > locationlog             |         0 |        0
>> > log                     |         0 |        0
>> > media                   |       245 |        9
>> > mediatype               |         2 |        1
>> > path                    |     67908 |      966
>> > pool                    |         5 |        1
>> > sql_features            |       439 |        6
>> > sql_implementation_info |        12 |        1
>> > sql_languages           |         4 |        1
>> > sql_packages            |        10 |        1
>> > sql_parts               |         9 |        1
>> > sql_sizing              |        23 |        1
>> > sql_sizing_profiles     |         0 |        0
>> > status                  |        19 |        1
>> > storage                 |         2 |        1
>> > unsavedfiles            |         0 |        0
>> > version                 |         1 |        1
>>
>> Do relpages * 8096 / reltuples.
>> Default installs have 8 KB pages/blocks by default.
>>
>> For file it should be ~144 bytes/tuple.
>> For filename it should be ~51 bytes/tuple.
>
>Those tuple size estimates seem consistent with the tuple
>sizes reported earlier when the overhead is added in.  I'm impressed!
>
[snip]

- Joris

pgsql-general by date:

Previous
From: Ross Boylan
Date:
Subject: Re: database 1.2G, pg_dump 73M?!
Next
From: Dragan Zubac
Date:
Subject: Re: database 1.2G, pg_dump 73M?!