Thread: big database with very small dump!?
Hello all,

I'm finding it very strange that my pg takes 9 GB on disk but pg_dumpall
produces a 250 MB dump. 'VACUUM FULL' was executed yesterday.

Is this normal? Should I be worried?

Details below:
------------------------------------------
# pg_dumpall --oids --clean > pg_dumpall.sql
# ls -lh
total 232M
-rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
# du -sh /var/pgsql/data/
9.4G    /var/pgsql/data
------------------------------------------

thx
joao
On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:

> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.

If you've been running VACUUM FULL, it's probably so-called "index bloat".
Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
figure out where all your space has gone inside the database.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Mon, Aug 11, 2008 at 10:30 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:

> Hello all,
>
> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal? Should I be worried?

It really depends. If you've got a lot of indexes then maybe that's
normal, but most of the time the database doesn't take up more than about
four times the space of the dump. It's likely you've got index bloat.

If you reload a pg_dump of the database in question into another server,
how much space does that take up?

Look into using CLUSTER or REINDEX to fix the space usage.
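[Editor's note: a minimal sketch of the two commands suggested here, run
against a hypothetical table named `timeslots`. The index name is assumed
for illustration, and the `CLUSTER ... USING` form is PostgreSQL 8.3+
syntax (older releases use `CLUSTER indexname ON tablename`). Both
commands take an exclusive lock on the table while they run.]

```sql
-- Rebuild every index on the table in place; the heap itself is left as-is.
REINDEX TABLE timeslots;

-- Or rewrite the whole table in the order of one index; this also rebuilds
-- all of the table's indexes and returns the freed space to the OS.
CLUSTER timeslots USING timeslots_timeslot_index;

-- Compare sizes before and after:
SELECT pg_size_pretty(pg_relation_size('public.timeslots'));
```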
I'd do:

du -sh /var/pgsql/data/base

rather than /var/pgsql/data. Depending on how your pgsql server logging is
set up, there are other folders and/or files under ./data/ that might take
considerable disk space; you may want to exclude those.

I find this query useful for something like this as well:

select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

V.

Joao Ferreira gmail wrote:
> Hello all,
>
> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal? Should I be worried?
>
> Details below:
> ------------------------------------------
> # pg_dumpall --oids --clean > pg_dumpall.sql
> # ls -lh
> total 232M
> -rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
> # du -sh /var/pgsql/data/
> 9.4G    /var/pgsql/data
> ------------------------------------------
>
> thx
> joao
On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> ./data/ you may want to exclude those. I find this query useful for
> something like this as well:
>
> select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

Hello Vlad,

I ran your query and I got the 9 GB!

I guess it should be related to index bloat, then. Do you agree?

thx
Joao

postgres=# select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;
    datname    | pg_size_pretty
---------------+----------------
 postgres      | 3617 kB
 egbert        | 9585 MB
 asterisk      | 3993 kB
 turba         | 3673 kB
 edgereporting | 3617 kB
 template1     | 3617 kB
 template0     | 3537 kB
(7 rows)

postgres=#
Hello Greg, Vlad, Scott and all,

thanks for the feedback. I forgot to mention that I execute REINDEX on all
tables and indexes every week (right after executing VACUUM FULL). Is this
enough to eliminate the possibility of "index bloat"?

And yes, my database has some crazy indexes. I use these indexes, and I
keep them REINDEXed to keep query execution time down. See below. Could
these indexes be the real reason for taking up all that space?

thanks
joao

egbert=# \d timeslots
         Table "public.timeslots"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 str1      | text    |
 str2      | text    |
 ...
 str20     | text    |
 val1      | real    |
 ...
 val6      | real    |
 var       | text    |
 count     | integer |
 total     | real    |
 timeslot  | integer | not null
 timestamp | integer | not null
 tsws      | integer | not null
 tses      | integer | not null
Indexes:
    "timeslots_strs_var_ts_key" UNIQUE, btree (str1, str2, str3, str4,
        str5, str6, str7, str8, str9, str10, str11, str12, str13, str14,
        str15, str16, str17, str18, str19, str20, var, timeslot) CLUSTER
    "timeslots_timeslot_index" btree (timeslot)
    "timeslots_timestamp_index" btree ("timestamp")
    "timeslots_var_index" btree (var)

------------------------------------------------------------
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:
>
> > I'm finding it very strange that my pg takes 9 GB on disk but
> > pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> > yesterday.
>
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.
>
> --
> * Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.

egbert=# SELECT nspname || '.' || relname AS "relation",
egbert-#     pg_size_pretty(pg_relation_size(nspname || '.' || relname)) AS "size"
egbert-#   FROM pg_class C
egbert-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
egbert-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
egbert-#     AND nspname !~ '^pg_toast'
egbert-#     AND pg_relation_size(nspname || '.' || relname) > 0
egbert-#   ORDER BY pg_relation_size(nspname || '.' || relname) DESC
egbert-#   LIMIT 20;

             relation             |  size
----------------------------------+---------
 public.timeslots_strs_var_ts_key | 5643 MB   # this is the UNIQUE index
 public.timeslots                 | 2660 MB   # this is the only table
 public.timeslots_timestamp_index |  583 MB   # this is an index
 public.timeslots_var_index       |  314 MB   # this is an index
 public.timeslots_timeslot_index  |  275 MB   # this is an index
(5 rows)

So it seems that the UNIQUE index is taking up more space than the data
itself...

Still, I have 2660 MB of data but the dump is about 10x smaller!!!

Any hints?

> --
> * Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:

> On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> > ./data/ you may want to exclude those. I find this query useful for
> > something like this as well:
> >
> > select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;
>
> I ran your query and I got the 9 GB!
>
> I guess it should be related to index bloat, then. Do you agree?

No, the index size is included in pg_database_size(). Perhaps do a

du -hd1 /var/pgsql/data

to see which directories have all the space
(or du -h --max-depth=1 /var/pgsql/data on Linux).

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> It's likely you've got index bloat. If you reload a pg_dump of the
> database in question into another server how much space does that take
> up?

Right. I just loaded the dump into a clean database and everything came
down about 10 times...

----------------------------------------------
NOW (dump loaded into a fresh Pg):

             relation             |  size
----------------------------------+--------
 public.timeslots                 | 549 MB
 public.timeslots_strs_var_ts_key | 482 MB
 public.timeslots_var_index       |  59 MB
 public.timeslots_timeslot_index  |  37 MB
 public.timeslots_timestamp_index |  37 MB
(5 rows)
------------------------------------------------
BEFORE:

             relation             |  size
----------------------------------+---------
 public.timeslots_strs_var_ts_key | 5643 MB
 public.timeslots                 | 2660 MB
 public.timeslots_timestamp_index |  583 MB
 public.timeslots_var_index       |  314 MB
 public.timeslots_timeslot_index  |  275 MB

I'm confused here... on the fresh database the whole set only takes
1.3 GB; on the original db, even after VACUUM FULL and REINDEX, it takes
9 GB. Can I really do anything about it?

If I try CLUSTER, I'm guessing I'll choose the big index and forget about
the smaller ones... is this right?

thanks
joao

> Look into using CLUSTER or REINDEX to fix the space usage.
Joao Ferreira gmail wrote:
> egbert=# SELECT nspname || '.' || relname AS "relation",
> egbert-#     pg_size_pretty(pg_relation_size(nspname || '.' || relname)) AS "size"
> egbert-#   FROM pg_class C
> egbert-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
> egbert-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
> egbert-#     AND nspname !~ '^pg_toast'
> egbert-#     AND pg_relation_size(nspname || '.' || relname) > 0
> egbert-#   ORDER BY pg_relation_size(nspname || '.' || relname) DESC
> egbert-#   LIMIT 20;
>
>              relation             |  size
> ----------------------------------+---------
>  public.timeslots_strs_var_ts_key | 5643 MB
>  public.timeslots                 | 2660 MB
>  public.timeslots_timestamp_index |  583 MB
>  public.timeslots_var_index       |  314 MB
>  public.timeslots_timeslot_index  |  275 MB
> (5 rows)
>
> So it seems that the UNIQUE index is taking up more space than the data
> itself...
>
> Still, I have 2660 MB of data but the dump is about 10x smaller!!!
>
> Any hints?

I would try running a CLUSTER on the table. This will usually clean things
up and free disk space, both in the table and the indexes. It does require
quite extensive locking though, so it might not be an option if you can't
afford having the database unavailable for a few (10-15) minutes.

--
Tommy Gildseth
Joao Ferreira gmail wrote:
> On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
>
> If I try CLUSTER, I'm guessing I'll choose the big index and forget
> about the smaller ones... is this right?

CLUSTER will sort out all the indexes, even though you're just clustering
on one.

--
Tommy Gildseth
DBA, Gruppe for databasedrift
Universitetet i Oslo, USIT
m: +47 45 86 38 50
t: +47 22 85 29 39
Hi guys,

I found the reason for this whole problem.

Explanation: the vacuum/reindex cron scripts were not being executed. I
executed the operations by hand and the values became normal.

Thank you all for the fine discussion.

joao

On Tue, 2008-08-12 at 13:49 +0200, Tommy Gildseth wrote:
> Joao Ferreira gmail wrote:
> > On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> >
> > If I try CLUSTER, I'm guessing I'll choose the big index and forget
> > about the smaller ones... is this right?
>
> CLUSTER will sort out all the indexes, even though you're just
> clustering on one.
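[Editor's note: the resolution was simply that the scheduled maintenance
had stopped running. A minimal sketch of such a weekly job follows; the
script path, crontab schedule, and the focus on the `timeslots` table are
all assumptions for illustration, and both commands take exclusive locks,
so a quiet window is advisable.]

```sh
#!/bin/sh
# /usr/local/bin/pg-maintenance.sh (hypothetical path)
# Weekly maintenance: VACUUM FULL the database, then rebuild the big
# table's indexes. Run from the postgres user's crontab, e.g.:
#   30 3 * * 0 /usr/local/bin/pg-maintenance.sh
psql -d egbert -c 'VACUUM FULL ANALYZE;'
psql -d egbert -c 'REINDEX TABLE timeslots;'
```

Monitoring that the job actually ran (e.g. logging its output and checking
pg_database_size afterwards) would catch a silent failure like this one
much earlier.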