Re: database 1.2G, pg_dump 73M?! - Mailing list pgsql-general

From Ross Boylan
Subject Re: database 1.2G, pg_dump 73M?!
Date
Msg-id 1206903278.8239.212.camel@corn.betterworld.us
Whole thread Raw
In response to Re: database 1.2G, pg_dump 73M?!  (Joris Dobbelsteen <joris@familiedobbelsteen.nl>)
Responses Re: database 1.2G, pg_dump 73M?!  (Joris Dobbelsteen <joris@familiedobbelsteen.nl>)
Re: database 1.2G, pg_dump 73M?!  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Sun, 2008-03-30 at 20:27 +0200, Joris Dobbelsteen wrote:
> Ross Boylan wrote:
> > I have a postgres server for which du reports
> > 1188072    /var/lib/postgresql/8.2/main
> > on  Linux system.
> > The server has only one real database, which is for bacula.  When I dump
> > the database, it's 73Mg.
> >
> > This is immediately after I did a full vacuum and restarted the server.
> >
> > Also,
> > bacula=> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC
> > limit 15;
> >              relname             | relpages
> > ---------------------------------+----------
> >  file_jpfid_idx                  |    27122
> >  file_pathid_idx                 |    17969
> >  file_jobid_idx                  |    17948
> >  file_pkey                       |    14580
> >  file_fp_idx                     |    12714
> >  file                            |    11558
> >  file_filenameid_idx             |     9806
> >  filename                        |     3958
> >  filename_name_idx               |     2510
> >  filename_pkey                   |     1367
> >  path                            |      966
> >  path_name_idx                   |      950
> >  path_pkey                       |      151
> >  pg_attribute_relid_attnam_index |       46
> >  pg_proc                         |       45
> >
> > It seems very strange to me that there is such a difference in size
> > between the dump and the database: the data store is almost 15 time
> > larger than the dump.
> >
> > Is this to be expected (e.g., from the indices taking up disk space)?
> > Is there anything I can do to reclaim some disk space
> There are a few factors you need to take into account:
>
>     * Data storage in the database is packed into blocks and contains
>       header data. Since data needs to be put into blocks there is a
>       potential for waisting space. If you are unlucky it can become
>       nearly a single row in the worst case.
>     * You need to vacuum often, to ensure obsolete rows are removed and
>       space can be reused.
>     * Tables are not reduced in size and only grown. I thinks cluster
>       and vacuum full will reduce the size of your table.
>     * Indexes are not in the backup, they are derived from the table
>       data on a restore.
>       If you remove the indexes you are left with 150~200 MB of data (I
>       guessed).
>       Doing reindex will rebuild the index and get rid of all the bloat
>       it has been collected during use.  _I recommend you try this_, as
>       your indexes on the file table look quite huge.
reindexing had a huge effect.  After reindex the top tables (file,
filename and path) I now see
 SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 15;
             relname             | relpages
---------------------------------+----------
 file                            |    11558
 filename                        |     3958
 filename_name_idx               |     2383
 file_jpfid_idx                  |     2145
 file_fp_idx                     |     1787
 file_jobid_idx                  |     1427
 file_pathid_idx                 |     1427
 file_pkey                       |     1427
 file_filenameid_idx             |     1427
 filename_pkey                   |     1367
 path                            |      966
 path_name_idx                   |      871
 path_pkey                       |      151
 pg_attribute_relid_attnam_index |       46
 pg_proc                         |       45
and du now reports 451M.  That still seems a bit large, given the size
of the sql dump, but it's almost 2/3 lower than it was before.

Thanks so much!

I guess I need to figure out how to reindex automatically.
>
> But the most important factor for you will be the following:
>
>     * Backups are compressed. Since you store filenames and paths these
>       will have a very high amount of regularity and therefore are very
>       good targets for compression. This can save a huge amount of data.
>       If you take a compression factor of 50%~70% you will reach your 70 MB.
I don't see how this is relevant, since my dump file was plain text
(sql).


> Ow, server restarts will not help reduce your database size. In fact,
> nothing at all should change, except lower performance until sufficient
> cached data is back in the cache again.
>
> Hope this helps...

It was a huge help.
>
> - Joris

pgsql-general by date:

Previous
From: postgre@seznam.cz
Date:
Subject: Re: [GENERAL] Re: [GENERAL] postgreSQL multithreading
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Locale / Encoding mismatch