Thread: db size

From: Adrian Moisey

Hi

We currently have a 16CPU 32GB box running postgres 8.2.

When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
UTF8 -F c -b" I get a file of 14GB in size.

But the database is 110GB in size on the disk.  Why the big difference
in size?  Does this have anything to do with performance?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: Vinubalaji Gopal

Hi Adrian,



> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.


From the man page of pg_dump:

"
-F format, --format=format

    Selects the format of the output. format can be one of the following:

    c
        Output a custom archive suitable for input into pg_restore. This
        is the most flexible format in that it allows reordering of data
        load as well as schema elements. This format is also compressed
        by default.
"

The output is compressed, and the dump contains just the SQL commands (and
data) needed to recreate the database.



>But the database is 110GB in size on the disk.  Why the big difference
>in size?  Does this have anything to do with performance?

Running VACUUM or VACUUM FULL on the entire database will reduce its size
by reclaiming unused space, and you can use a filesystem-based backup or a
backup/restore strategy.

Re: db size

From: Craig Ringer

Adrian Moisey wrote:
> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk.  Why the big difference
> in size?  Does this have anything to do with performance?


Reasons:

You're using a compact format designed to limit size and provide fast
dump/restore. The database, by contrast, is designed for fast access.

The database can contain "dead space" that hasn't been reclaimed by a
VACUUM. It can also have space allocated that it doesn't need, which you
can reclaim with VACUUM FULL. This dead space can really add up, but
it's the price of fast updates, inserts and deletes.

Your indexes take up disk space in the database, but are not dumped and
do not take up space in the dump file. Indexes can get very large,
especially if you have lots of multi-column indexes.

I'm told that under certain loads indexes can grow full of mostly empty
pages, and a REINDEX every now and then can be useful to shrink them -
see "\h reindex" in psql. That won't affect your dump sizes as indexes
aren't dumped, but will affect the database size.

You can examine index (and relation) sizes using a query like:

select * from pg_class order by relpages desc
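
For example, a slightly more readable variant (pg_size_pretty() and
pg_relation_size() are available in 8.1 and later; nothing here depends on
any particular schema):

    SELECT relname, relkind, relpages,
           pg_size_pretty(pg_relation_size(oid)) AS on_disk
    FROM pg_class
    ORDER BY relpages DESC
    LIMIT 20;

relkind tells you what each row is: 'r' for a table, 'i' for an index, 't'
for a TOAST table.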


Data in the database is either not compressed, or (for larger fields) is
compressed with an algorithm that's very fast but doesn't achieve high
levels of compression. By contrast, the dumps are quite efficiently
compressed.

One of my database clusters is 571MB on disk at the moment, just after
being dropped, recreated, and populated from another data source. The
repopulation process is quite complex. I found that running VACUUM FULL
followed by REINDEX DATABASE dbname knocked 50MB off the database size,
pushing it down to 521MB. That's on a basically brand new DB. Note,
however, that 130MB of that space is in pg_xlog, and much of it will be
wasted as the DB has been under very light load but uses large xlogs
because it needs to perform well under huge load spikes. The size of the
`base' directory (the "real data", indexes, etc) is only 392MB.

If I dump that database using the same options you dumped yours with, I
end up with a hilariously small 29MB dump file. That's less than 10% of
the size of the main DB. The difference will be entirely due to
compression, a more compact storage layout in the dump files, and to the
lack of index data in the dumps. The database has quite a few indexes,
some of which are multicolumn indexes on tables with large numbers of
tuples, so that bloats the "live" version a lot.

--
Craig Ringer

Re: db size

From: PFC

> Hi
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk.  Why the big difference
> in size?  Does this have anything to do with performance?

    I have a 2GB database, which dumps to a 340 MB file...
    Two reasons:

    - I have lots of big fat but very necessary indexes (not included in dump)
    - The dump is compressed with gzip, which works really well on database data.

    If you suspect your tables or indexes are bloated, restore your dump to a
test box.
    Use fsync=off during restore, you don't care about integrity on the test
box.
    This will avoid slowing down your production database.
    Then look at the size of the restored database (a query for this is
sketched below).
    If it is much smaller than your production database, then you have bloat.
    Time to CLUSTER, REINDEX, or VACUUM FULL (your choice) the tables that
are bloated, and make a note to vacuum them more often (and perhaps tune
the autovacuum).
    Judicious use of CLUSTER on a small but extremely frequently updated
table can also be a very good option.
    8.3 and its new HOT feature are also a good idea.
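
    To compare the two sizes (a sketch; pg_database_size() ships with 8.1
and later - run this on both the production and the test box and compare):

        SELECT pg_size_pretty(pg_database_size(current_database()));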

Re: db size

From: Adrian Moisey

Hi

>     If you suspect your tables or indexes are bloated, restore your dump
> to a test box.
>     Use fsync=off during restore, you don't care about integrity on the
> test box.
>     This will avoid slowing down your production database.
>     Then look at the size of the restored database.
>     If it is much smaller than your production database, then you have
> bloat.

I have done that, and I get the following:

the live one is 113G
the restored one is 78G

How should I get rid of the bloat?
VACUUM FULL?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: PFC

>
>>     If you suspect your tables or indexes are bloated, restore your
>> dump to a test box.
>>     Use fsync=off during restore, you don't care about integrity on the
>> test box.
>>     This will avoid slowing down your production database.
>>     Then look at the size of the restored database.
>>     If it is much smaller than your production database, then you have
>> bloat.
>
> I have done that, and I get the following:
>
> the live one is 113G
> the restored one is 78G

    Ah.
    The good news is that you know you can do something ;)

    Now, is the bloat in the tables (which tables?), in the indexes (which
indexes?), in the toast tables perhaps, in the system catalogs, or all of
the above? Or perhaps there is a long-forgotten process that got zombified
while holding a huge temp table? (Not very likely, but who knows.)
    Use pg_relation_size() and its friends to get an idea of the size of
stuff.
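
    For example (a sketch; these functions ship with 8.1 and later, and
nothing here is specific to your schema):

        SELECT relname,
               pg_size_pretty(pg_relation_size(oid)) AS heap_size,
               pg_size_pretty(pg_total_relation_size(oid)) AS total_size
        FROM pg_class
        WHERE relkind = 'r'
        ORDER BY pg_total_relation_size(oid) DESC
        LIMIT 20;

    A big gap between heap_size and total_size for a table points at its
indexes or its toast table; compare everything against the restored copy
on your test box.
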
    Perhaps you have 1 extremely bloated table or index, or perhaps
everything is bloated.
    The solution to your problem depends on which case you have.

Re: db size

From: Craig Ringer

Adrian Moisey wrote:
> Hi
>
>>     If you suspect your tables or indexes are bloated, restore your
>> dump to a test box.
>>     Use fsync=off during restore, you don't care about integrity on
>> the test box.
>>     This will avoid slowing down your production database.
>>     Then look at the size of the restored database.
>>     If it is much smaller than your production database, then you have
>> bloat.
>
> I have done that, and I get the following:
>
> the live one is 113G
> the restored one is 78G
>
> How should I get rid of the bloat?
> VACUUM FULL?

And/or REINDEX if you're not satisfied with the results of a VACUUM FULL.

http://www.postgresql.org/docs/8.3/interactive/vacuum.html
http://www.postgresql.org/docs/8.3/interactive/sql-reindex.html

Of course, all of these will have performance consequences while they're
running, and they take out locks that prevent certain other operations, as
shown in table 13-2:

http://www.postgresql.org/docs/8.3/static/explicit-locking.html

and the explanation following it.

Note in particular:

----
ACCESS EXCLUSIVE

    Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW
EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE,
EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder
is the only transaction accessing the table in any way.

    Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER,
and VACUUM FULL commands. This is also the default lock mode for LOCK
TABLE statements that do not specify a mode explicitly.

    Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR
UPDATE/SHARE) statement.
----

In other words, you won't be doing much with a table/index while a
VACUUM FULL or a REINDEX is in progress on it.

Given that, you probably want to check your table/index sizes and see if
there are particular problem tables or indexes, rather than just using a
sledgehammer approach.
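
If you do find a particular offender, a targeted cleanup might look like
this (a sketch; the table name is illustrative):

    VACUUM FULL VERBOSE some_bloated_table;  -- reclaims space; takes ACCESS EXCLUSIVE
    REINDEX TABLE some_bloated_table;        -- rebuilds all of that table's indexes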

--
Craig Ringer

Re: db size

From: Adrian Moisey

Hi

>> the live one is 113G
>> the restored one is 78G
>
>     The good news is that you know you can do something ;)

:)

Will this help with performance?

>     Now, is the bloat in the tables (which tables?), in the indexes
> (which indexes?), in the toast tables perhaps, in the system catalogs,
> or all of the above? Or perhaps there is a long-forgotten process that
> got zombified while holding a huge temp table? (Not very likely, but
> who knows.)
>     Use pg_relation_size() and its friends to get an idea of the size
> of stuff.

I'll look into that, thanks

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: PFC

> Will this help with performance?

    Depends on whether the bloat is in part of your working set. If
de-bloating can make the working set fit in RAM, or lower your I/O, you'll
get a boost.

>>     Now, is the bloat in the tables (which tables?), in the indexes
>> (which indexes?), in the toast tables perhaps, in the system catalogs,
>> or all of the above? Or perhaps there is a long-forgotten process that
>> got zombified while holding a huge temp table? (Not very likely, but
>> who knows.)
>>     Use pg_relation_size() and its friends to get an idea of the size
>> of stuff.
>
> I'll look into that, thanks

Re: db size

From: Bill Moran

In response to Adrian Moisey <adrian@careerjunction.co.za>:
>
> We currently have a 16CPU 32GB box running postgres 8.2.
>
> When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E
> UTF8 -F c -b" I get a file of 14GB in size.
>
> But the database is 110GB in size on the disk.  Why the big difference
> in size?  Does this have anything to do with performance?

In a dump, each index is just a single CREATE INDEX command.  In the
actual database, an index contains all the data it needs to cover the
table, which can be substantially larger than the command that creates it.
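
For example, the dump stores only a one-line statement like this (the
names are illustrative):

    CREATE INDEX idx_candidates_email ON candidates (email);

while the live database stores the complete index structure built over
every row of the table.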

Additionally, a running database has a certain amount of wasted space.
If you're running vacuum on a proper schedule, this won't get out of
hand.  Read this page to understand better:
http://www.postgresql.org/docs/8.1/static/maintenance.html

And lastly, I expect that the pg_dump format is able to do more aggressive
compression than the running database.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: db size

From: Adrian Moisey

Hi

>>>     Now, is the bloat in the tables (which tables?), in the indexes
>>> (which indexes?), in the toast tables perhaps, in the system catalogs,
>>> or all of the above? Or perhaps there is a long-forgotten process that
>>> got zombified while holding a huge temp table? (Not very likely, but
>>> who knows.)
>>>     Use pg_relation_size() and its friends to get an idea of the size
>>> of stuff.

Can anybody give me some advice on the above?  I'm not sure where or how
to start looking.

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: Bill Moran

Adrian Moisey <adrian@careerjunction.co.za> wrote:
>
> Hi
>
> >>>     Now, is the bloat in the tables (which tables?), in the indexes
> >>> (which indexes?), in the toast tables perhaps, in the system
> >>> catalogs, or all of the above? Or perhaps there is a long-forgotten
> >>> process that got zombified while holding a huge temp table? (Not
> >>> very likely, but who knows.)
> >>>     Use pg_relation_size() and its friends to get an idea of the
> >>> size of stuff.
>
> Can anybody give me some advice on the above?  I'm not sure where or how
> to start looking.

Running VACUUM VERBOSE will give you a detailed view of space usage of
each individual table.
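
For example (run it as a superuser or the table owner; "some_table" is an
illustrative name):

    VACUUM VERBOSE;              -- the whole database
    VACUUM VERBOSE some_table;   -- a single table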

--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: db size

From: Adrian Moisey

Hi

> Running VACUUM VERBOSE will give you a detailed view of space usage of
> each individual table.

I did that.

Not too sure what I'm looking for, can someone tell me what this means:

INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and
0 dead rows; 1272 rows in sample, 1272 estimated total rows
INFO:  free space map contains 4667977 pages in 1199 relations
DETAIL:  A total of 4505344 page slots are in use (including overhead).
4505344 page slots are required to track all free space.
Current limits are:  15537488 page slots, 1200 relations, using 91172 kB.



--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: Richard Huxton

Adrian Moisey wrote:
> Hi
>
>> Running VACUUM VERBOSE will give you a detailed view of space usage of
>> each individual table.
>
> I did that.
>
> Not too sure what I'm looking for, can someone tell me what this means:
>
> INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows and
> 0 dead rows; 1272 rows in sample, 1272 estimated total rows

This is a small table that takes up 27 pages and it scanned all of them.
You have 1272 rows in it and none of them are dead (i.e. deleted/updated
but still taking up space).

> INFO:  free space map contains 4667977 pages in 1199 relations
> DETAIL:  A total of 4505344 page slots are in use (including overhead).
> 4505344 page slots are required to track all free space.
> Current limits are:  15537488 page slots, 1200 relations, using 91172 kB.

You are tracking ~ 4.6 million pages and have space to track ~ 15.5
million, so that's fine. You are right up against your limit of
relations (tables, indexes etc) being tracked though - 1200. You'll
probably want to increase max_fsm_relations - see manual for details
(server configuration / free space map).
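
For a quick check of the current values (the suggested new value is
illustrative, not a recommendation):

    SHOW max_fsm_relations;   -- 1200 here, per the VACUUM output above
    SHOW max_fsm_pages;       -- 15537488 here

Both are set in postgresql.conf (e.g. max_fsm_relations = 2000) and
changing them requires a server restart.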

--
   Richard Huxton
   Archonet Ltd

Re: db size

From: Adrian Moisey

Hi

> You are tracking ~ 4.6 million pages and have space to track ~ 15.5
> million, so that's fine. You are right up against your limit of
> relations (tables, indexes etc) being tracked though - 1200. You'll
> probably want to increase max_fsm_relations - see manual for details
> (server configuration / free space map).

That is helpful, thanks.

I did a grep on the output to find out more about max_fsm_relations:

INFO:  free space map contains 2333562 pages in 832 relations
INFO:  free space map contains 3012404 pages in 544 relations
INFO:  free space map contains 3012303 pages in 654 relations
INFO:  free space map contains 3012345 pages in 669 relations
INFO:  free space map contains 3012394 pages in 678 relations
INFO:  free space map contains 3017248 pages in 717 relations
INFO:  free space map contains 2860737 pages in 824 relations
INFO:  free space map contains 4667977 pages in 1199 relations
INFO:  free space map contains 3140238 pages in 181 relations
INFO:  free space map contains 3140322 pages in 182 relations
INFO:  free space map contains 3140387 pages in 183 relations
INFO:  free space map contains 3142781 pages in 184 relations

It doesn't get close to 1200 often... should I still increase that value?

--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: Adrian Moisey

Hi

>> INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows
>> and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
>
> This is a small table that takes up 27 pages and it scanned all of them.
> You have 1272 rows in it and none of them are dead (i.e. deleted/updated
> but still taking up space).

I had a look through a few other tables...:

INFO:  "table1": scanned 22988 of 22988 pages, containing 2713446 live
rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total
rows

INFO:  "table2": scanned 24600 of 24600 pages, containing 270585 live
rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows

Are those dead rows an issue?  Should I try to clean them out?  Will it
improve performance?


--
Adrian Moisey
Systems Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: db size

From: Richard Huxton

Adrian Moisey wrote:
> Hi
>
>>> INFO:  "blahxxx": scanned 27 of 27 pages, containing 1272 live rows
>>> and 0 dead rows; 1272 rows in sample, 1272 estimated total rows
>>
>> This is a small table that takes up 27 pages and it scanned all of
>> them. You have 1272 rows in it and none of them are dead (i.e.
>> deleted/updated but still taking up space).
>
> I had a look through a few other tables...:
>
> INFO:  "table1": scanned 22988 of 22988 pages, containing 2713446 live
> rows and 895662 dead rows; 45000 rows in sample, 2713446 estimated total
> rows
>
> INFO:  "table2": scanned 24600 of 24600 pages, containing 270585 live
> rows and 65524 dead rows; 45000 rows in sample, 270585 estimated total rows
>
> Are those dead rows an issue?  Should I try to clean them out?  Will it
> improve performance?

What you're hoping to see is that this figure remains stable. The point of
the free space map is to track these dead rows and allow the space to be
re-used. If you find that the number of dead rows keeps increasing, then
either you are:
1. Just deleting rows
2. Not vacuuming enough - check your autovacuum settings

The effect on performance is that when you read in a page from disk
you're reading dead rows along with the data you are after. Trying to
keep 0 dead rows in a constantly updated table isn't worth the effort
though - you'd end up wasting your disk I/O on maintenance rather than
queries.

The figures above look high to me - roughly 900,000 dead out of 2.7
million, and 65,000 out of 270,000. Of course, if these tables have just
had bulk updates/deletes then that's fine. If there's a steady stream of
updates though, you probably want to up your autovacuum settings.

--
   Richard Huxton
   Archonet Ltd