Thread: db size
Hi

We currently have a 16CPU 32GB box running postgres 8.2.

When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E UTF8 -F c -b" I get a file of 14GB in size.

But the database is 110GB in size on the disk. Why the big difference in size? Does this have anything to do with performance?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Hi Adrian,

> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.

From the man page of pg_dump:

  -F format, --format=format
    Selects the format of the output. format can be one of the following:

    c   Output a custom archive suitable for input into pg_restore. This is
        the most flexible format in that it allows reordering of data load
        as well as schema elements. This format is also compressed by default.

The output is compressed, and it is a dump of the database containing the SQL commands.

> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?

A VACUUM or VACUUM FULL of the entire database will reduce its size by reclaiming unused space, and you can then use a filesystem-based backup or a backup/restore strategy.
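For reference, a minimal sketch of those two maintenance commands (both also accept a single table name):

  -- marks dead space reusable; runs alongside normal reads and writes
  VACUUM VERBOSE;

  -- compacts tables and returns space to the OS, but takes an
  -- ACCESS EXCLUSIVE lock on each table while it works
  VACUUM FULL VERBOSE;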
Adrian Moisey wrote:
> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?

Reasons:

You're using a compact format designed to limit size and provide fast dump/restore. The database, by contrast, is designed for fast access.

The database can contain "dead space" that hasn't been reclaimed by a VACUUM. It can also have space allocated that it doesn't need, which you can reclaim with VACUUM FULL. This dead space can really add up, but it's the price of fast updates, inserts and deletes.

Your indexes take up disk space in the database, but are not dumped and do not take up space in the dump file. Indexes can get very large, especially if you have lots of multi-column indexes.

I'm told that under certain loads indexes can grow full of mostly empty pages, and a REINDEX every now and then can be useful to shrink them - see "\h reindex" in psql. That won't affect your dump sizes as indexes aren't dumped, but will affect the database size.

You can examine index (and relation) sizes using a query like:

  select * from pg_class order by relpages desc;

Data in the database is either not compressed, or (for larger fields) is compressed with an algorithm that's very fast but doesn't achieve high levels of compression. By contrast, the dumps are quite efficiently compressed.

One of my database clusters is 571MB on disk at the moment, just after being dropped, recreated, and populated from another data source. The repopulation process is quite complex. I found that running VACUUM FULL followed by REINDEX DATABASE dbname knocked 50MB off the database size, pushing it down to 521MB. That's on a basically brand new DB.

Note, however, that 130MB of that space is in pg_xlog, and much of it will be wasted as the DB has been under very light load but uses large xlogs because it needs to perform well under huge load spikes. The size of the `base' directory (the "real data", indexes, etc) is only 392MB.

If I dump that database using the same options you dumped yours with, I end up with a hilariously small 29MB dump file. That's less than 10% of the size of the main DB. The difference will be entirely due to compression, a more compact storage layout in the dump files, and to the lack of index data in the dumps. The database has quite a few indexes, some of which are multicolumn indexes on tables with large numbers of tuples, so that bloats the "live" version a lot.

--
Craig Ringer
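A slightly more readable variant of that query (just a sketch, using the built-in pg_relation_size() and pg_size_pretty() functions available since 8.1):

  SELECT relname, relkind,
         pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
  ORDER BY pg_relation_size(oid) DESC
  LIMIT 20;

This lists tables, indexes and TOAST tables alike, so oversized indexes show up right alongside their tables.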
> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?

I have a 2GB database, which dumps to a 340 MB file... Two reasons:

- I have lots of big fat but very necessary indexes (not included in dump)
- Dump is compressed with gzip, which really works well on database data.

If you suspect your tables or indexes are bloated, restore your dump to a test box. Use fsync=off during restore, you don't care about integrity on the test box. This will avoid slowing down your production database. Then look at the size of the restored database.

If it is much smaller than your production database, then you have bloat. Time to CLUSTER, or REINDEX, or VACUUM FULL (your choice) the tables that are bloated, and take note to vacuum those more often (and perhaps tune the autovacuum).

Judicious use of CLUSTER on that small, but extremely often updated, table can also be a very good option. 8.3 and its new HOT feature are also a good idea.
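Putting those suggestions into concrete commands (a sketch only - "table1" is a placeholder for whichever table turns out to be bloated, and both statements take an ACCESS EXCLUSIVE lock while they run):

  VACUUM FULL VERBOSE table1;   -- compact the table itself
  REINDEX TABLE table1;         -- rebuild its indexes, which old-style VACUUM FULL can leave bloated

CLUSTER rewrites the table in index order and rebuilds its indexes in one pass, so it is often the faster option if you can afford the lock.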
Hi

> If you suspect your tables or indexes are bloated, restore your dump
> to a test box.
> Use fsync=off during restore, you don't care about integrity on the
> test box.
> This will avoid slowing down your production database.
> Then look at the size of the restored database.
> If it is much smaller than your production database, then you have
> bloat.

I have done that, and I get the following:

the live one is 113G
the restored one is 78G

How should I get rid of the bloat? VACUUM FULL?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
>> If you suspect your tables or indexes are bloated, restore your
>> dump to a test box.
>> Use fsync=off during restore, you don't care about integrity on the
>> test box.
>> This will avoid slowing down your production database.
>> Then look at the size of the restored database.
>> If it is much smaller than your production database, then you have
>> bloat.
>
> I have done that, and I get the following:
>
> the live one is 113G
> the restored one is 78G

Ah. Good news for you is that you know that you can do something ;)

Now, is the bloat in the tables (which tables ?) or in the indexes (which indexes ?), or in the toast tables perhaps, or in the system catalogs or all of the above ? Or perhaps there is a long-forgotten process that got zombified while holding a huge temp table ? (not very likely, but who knows).

Use pg_relation_size() and its friends to get an idea of the size of stuff. Perhaps you have 1 extremely bloated table or index, or perhaps everything is bloated. The solution to your problem depends on which case you have.
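A sketch of what pg_relation_size() "and its friends" look like in practice (pg_total_relation_size() adds in indexes and TOAST data; both functions exist in 8.2):

  SELECT relname,
         pg_size_pretty(pg_relation_size(oid))       AS table_size,
         pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY pg_total_relation_size(oid) DESC
  LIMIT 20;

A large gap between table_size and total_size means most of the space is in indexes or TOAST, which narrows down where to look.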
Adrian Moisey wrote:
> Hi
>
>> If you suspect your tables or indexes are bloated, restore your
>> dump to a test box.
>> Use fsync=off during restore, you don't care about integrity on
>> the test box.
>> This will avoid slowing down your production database.
>> Then look at the size of the restored database.
>> If it is much smaller than your production database, then you have
>> bloat.
>
> I have done that, and I get the following:
>
> the live one is 113G
> the restored one is 78G
>
> How should I get rid of the bloat?
> VACUUM FULL?

And/or REINDEX if you're not satisfied with the results of a VACUUM FULL.

http://www.postgresql.org/docs/8.3/interactive/vacuum.html
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html

Of course, all of these will have performance consequences while they're running, and take out locks that prevent certain other operations, as shown in table 13-2:

http://www.postgresql.org/docs/8.3/static/explicit-locking.html

and the explanation following it. Note in particular:

----
ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way.

Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL commands. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.

Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement.
----

In other words, you won't be doing much with a table/index while a VACUUM FULL or a REINDEX is in progress on it.

Given that, you probably want to check your table/index sizes and see if there are particular problem tables or indexes, rather than just using a sledgehammer approach.

--
Craig Ringer
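For the index side of that check, something along these lines (a sketch; it just joins the standard pg_index and pg_class catalogs) lists the biggest indexes and the table each one belongs to:

  SELECT c.relname AS index_name,
         t.relname AS table_name,
         pg_size_pretty(pg_relation_size(c.oid)) AS index_size
  FROM pg_class c
  JOIN pg_index i ON i.indexrelid = c.oid
  JOIN pg_class t ON t.oid = i.indrelid
  ORDER BY pg_relation_size(c.oid) DESC
  LIMIT 20;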
Hi

>> the live one is 113G
>> the restored one is 78G
>
> Good news for you is that you know that you can do something ;)

:) Will this help with performance?

> Now, is the bloat in the tables (which tables ?) or in the indexes
> (which indexes ?), or in the toast tables perhaps, or in the system
> catalogs or all of the above ? Or perhaps there is a long-forgotten
> process that got zombified while holding a huge temp table ? (not very
> likely, but who knows).
> Use pg_relation_size() and its friends to get an idea of the size of
> stuff.

I'll look into that, thanks

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
> Will this help with performance?

Depends if the bloat is in part of your working set. If debloating can make the working set fit in RAM, or lower your IOs, you'll get a boost.

>> Now, is the bloat in the tables (which tables ?) or in the indexes
>> (which indexes ?), or in the toast tables perhaps, or in the system
>> catalogs or all of the above ? Or perhaps there is a long-forgotten
>> process that got zombified while holding a huge temp table ? (not very
>> likely, but who knows).
>> Use pg_relation_size() and its friends to get an idea of the size
>> of stuff.
>
> I'll look into that, thanks
In response to Adrian Moisey <adrian@careerjunction.co.za>:

> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk. Why the big difference
> in size? Does this have anything to do with performance?

In a dump, indexes are a single command. In the actual database, each index contains all the data it requires, which can be substantially larger than the command used to create the index.

Additionally, a running database has a certain amount of wasted space. If you're running vacuum on a proper schedule, this won't get out of hand. Read this page to understand better:
http://www.postgresql.org/docs/8.1/static/maintenance.html

And lastly, I expect that the pg_dump format is able to do more aggressive compression than the running database.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
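In other words, the dump only carries the definition - something like the following made-up example - while the live database stores every indexed value plus the btree structure built on top of it:

  -- what ends up in the dump: one short statement
  CREATE INDEX some_table_some_col_idx ON some_table (some_col);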
Hi

>>> Now, is the bloat in the tables (which tables ?) or in the
>>> indexes (which indexes ?), or in the toast tables perhaps, or in the
>>> system catalogs or all of the above ? Or perhaps there is a
>>> long-forgotten process that got zombified while holding a huge temp
>>> table ? (not very likely, but who knows).
>>> Use pg_relation_size() and its friends to get an idea of the size
>>> of stuff.

Can anybody give me some advice on the above? I'm not sure where or how to start looking.

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Adrian Moisey <adrian@careerjunction.co.za> wrote:
>
> Hi
>
> >>> Now, is the bloat in the tables (which tables ?) or in the
> >>> indexes (which indexes ?), or in the toast tables perhaps, or in the
> >>> system catalogs or all of the above ? Or perhaps there is a
> >>> long-forgotten process that got zombified while holding a huge temp
> >>> table ? (not very likely, but who knows).
> >>> Use pg_relation_size() and its friends to get an idea of the size
> >>> of stuff.
>
> Can anybody give me some advice on the above? I'm not sure where or
> how to start looking.

Running VACUUM VERBOSE will give you a detailed view of space usage of each individual table.

--
Bill Moran
Collaborative Fusion Inc.
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
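That is (a minimal sketch; without a table name it processes every table you have permission to vacuum):

  VACUUM VERBOSE;             -- per-table space report for the whole database
  VACUUM VERBOSE some_table;  -- or just one table ("some_table" is a placeholder)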
Hi

> Running VACUUM VERBOSE will give you a detailed view of space usage of
> each individual table.

I did that.

Not too sure what I'm looking for, can someone tell me what this means:

INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
INFO: free space map contains 4667977 pages in 1199 relations
DETAIL: A total of 4505344 page slots are in use (including overhead).
4505344 page slots are required to track all free space.
Current limits are: 15537488 page slots, 1200 relations, using 91172 kB.

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Adrian Moisey wrote:
> Hi
>
>> Running VACUUM VERBOSE will give you a detailed view of space usage of
>> each individual table.
>
> I did that.
>
> Not too sure what I'm looking for, can someone tell me what this means:
>
> INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and
> 0 dead rows; 1272 rows in sample, 1272 estimated total rows

This is a small table that takes up 27 pages and it scanned all of them. You have 1272 rows in it and none of them are dead (i.e. deleted/updated but still taking up space).

> INFO: free space map contains 4667977 pages in 1199 relations
> DETAIL: A total of 4505344 page slots are in use (including overhead).
> 4505344 page slots are required to track all free space.
> Current limits are: 15537488 page slots, 1200 relations, using 91172 kB.

You are tracking ~ 4.6 million pages and have space to track ~ 15.5 million, so that's fine. You are right up against your limit of relations (tables, indexes etc) being tracked though - 1200. You'll probably want to increase max_fsm_relations - see manual for details (server configuration / free space map).

--
Richard Huxton
Archonet Ltd
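To check the current limits (a sketch; in 8.2 both are postgresql.conf settings, and raising them needs a server restart):

  SHOW max_fsm_relations;  -- 1200 here, per the output above
  SHOW max_fsm_pages;      -- 15537488 here

Then raise max_fsm_relations in postgresql.conf (a few thousand costs very little memory) and restart.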
Hi

> You are tracking ~ 4.6 million pages and have space to track ~ 15.5
> million, so that's fine. You are right up against your limit of
> relations (tables, indexes etc) being tracked though - 1200. You'll
> probably want to increase max_fsm_relations - see manual for details
> (server configuration / free space map).

That is helpful, thanks. I did a grep on the output to find out more about max_fsm_relations:

INFO: free space map contains 2333562 pages in 832 relations
INFO: free space map contains 3012404 pages in 544 relations
INFO: free space map contains 3012303 pages in 654 relations
INFO: free space map contains 3012345 pages in 669 relations
INFO: free space map contains 3012394 pages in 678 relations
INFO: free space map contains 3017248 pages in 717 relations
INFO: free space map contains 2860737 pages in 824 relations
INFO: free space map contains 4667977 pages in 1199 relations
INFO: free space map contains 3140238 pages in 181 relations
INFO: free space map contains 3140322 pages in 182 relations
INFO: free space map contains 3140387 pages in 183 relations
INFO: free space map contains 3142781 pages in 184 relations

It doesn't go up close to 1200 often... should I still up that value?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Hi

>> INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows
>> and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
>
> This is a small table that takes up 27 pages and it scanned all of them.
> You have 1272 rows in it and none of them are dead (i.e. deleted/updated
> but still taking up space).

I had a look through a few other tables...:

INFO: "table1": scanned 22988 of 22988 pages, containing 2713446 live rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total rows

INFO: "table2": scanned 24600 of 24600 pages, containing 270585 live rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows

Are those dead rows an issue? Should I try to clean them out? Will it improve performance?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Adrian Moisey wrote:
> Hi
>
>>> INFO: "blahxxx": scanned 27 of 27 pages, containing 1272 live rows
>>> and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
>>
>> This is a small table that takes up 27 pages and it scanned all of
>> them. You have 1272 rows in it and none of them are dead (i.e.
>> deleted/updated but still taking up space).
>
> I had a look through a few other tables...:
>
> INFO: "table1": scanned 22988 of 22988 pages, containing 2713446 live
> rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated
> total rows
>
> INFO: "table2": scanned 24600 of 24600 pages, containing 270585 live
> rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows
>
> Are those dead rows an issue? Should I try to clean them out? Will it
> improve performance?

What you're hoping to see is that figure remain stable. The point of the free-space-map is to track these and allow the space to be re-used. If you find that the number of dead rows is increasing then either you are:
1. Just deleting rows
2. Not vacuuming enough - check your autovacuum settings

The effect on performance is that when you read in a page from disk you're reading dead rows along with the data you are after. Trying to keep 0 dead rows in a constantly updated table isn't worth the effort though - you'd end up wasting your disk I/O on maintenance rather than queries.

The figures above look high to me - roughly 900,000 dead rows against 2.7 million live in table1, and 65,000 against 270,000 in table2. Of course, if these tables have just had bulk updates/deletes then that's fine. If there's a steady stream of updates though, you probably want to up your autovacuum settings.

--
Richard Huxton
Archonet Ltd
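One way to keep an eye on dead-row counts over time without re-running VACUUM VERBOSE (a sketch - note the n_live_tup/n_dead_tup columns of pg_stat_user_tables only exist from 8.3 onwards, not in 8.2):

  SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 20;

You can also check the current autovacuum thresholds with SHOW autovacuum_vacuum_scale_factor; (and friends) before deciding how far to tighten them.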