Re: db size - Mailing list pgsql-performance
From | Craig Ringer
---|---
Subject | Re: db size
Date | 
Msg-id | 480317E2.1050900@postnewspapers.com.au
In response to | db size (Adrian Moisey <adrian@careerjunction.co.za>)
List | pgsql-performance
Adrian Moisey wrote:
> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?

Reasons:

You're using a compact dump format designed to limit size and allow fast dump/restore. The database, by contrast, is laid out on disk for fast access.

The database can contain "dead space" that hasn't yet been reclaimed by a VACUUM. It can also have space allocated that it doesn't need, which you can reclaim with VACUUM FULL. This dead space can really add up, but it's the price of fast updates, inserts and deletes.

Your indexes take up disk space in the database, but are not dumped and take up no space in the dump file. Indexes can get very large, especially if you have lots of multi-column indexes. I'm told that under certain loads indexes can grow full of mostly empty pages, and a REINDEX every now and then can be useful to shrink them - see "\h reindex" in psql. That won't affect your dump sizes, since indexes aren't dumped, but it will affect the database size.

You can examine index (and relation) sizes using a query like:

select * from pg_class order by relpages desc

Data in the database is either not compressed, or (for larger fields) compressed with an algorithm that's very fast but doesn't achieve high compression ratios. The dumps, by contrast, are quite efficiently compressed.

One of my database clusters is 571MB on disk at the moment, just after being dropped, recreated, and repopulated from another data source. The repopulation process is quite complex. I found that running VACUUM FULL followed by REINDEX DATABASE dbname knocked 50MB off the database size, pushing it down to 521MB. That's on a basically brand-new DB.
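To make the size inspection above a bit more readable, a sketch of the same idea using the size functions available since 8.1 (pg_relation_size, pg_size_pretty); the table name "mytable" in the maintenance commands is a placeholder, and VACUUM FULL / REINDEX take exclusive locks, so run them in a quiet window:

```sql
-- Largest relations (tables and indexes) by on-disk size.
SELECT relname, relkind,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
ORDER BY pg_relation_size(oid) DESC
LIMIT 20;

-- Reclaim dead space and rebuild a bloated index for one table
-- ("mytable" is a placeholder; both commands lock the table).
VACUUM FULL VERBOSE mytable;
REINDEX TABLE mytable;
```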
Note, however, that 130MB of that space is in pg_xlog, and much of it will be wasted, as the DB has been under very light load but uses large xlogs because it needs to perform well under huge load spikes. The size of the "base" directory (the "real data", indexes, etc) is only 392MB.

If I dump that database using the same options you dumped yours with, I end up with a hilariously small 29MB dump file. That's less than 10% of the size of the main DB. The difference will be entirely due to compression, a more compact storage layout in the dump files, and the lack of index data in the dumps. The database has quite a few indexes, some of which are multicolumn indexes on tables with large numbers of tuples, so that bloats the "live" version a lot.

--
Craig Ringer
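To compare your own database's on-disk footprint against a dump the way the figures above do, you can ask the server directly; this is a sketch using pg_database_size (available since 8.1), where "mydb" is a placeholder database name. Note it measures the database's files under "base" and does not include the shared pg_xlog directory:

```sql
-- Total on-disk size of one database, human-readable.
-- "mydb" is a placeholder; substitute your database name.
SELECT pg_size_pretty(pg_database_size('mydb'));
```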