Thread: big database with very small dump!?
Hello all,

I'm finding it very strange that my pg takes 9 GB on disk but pg_dumpall
produces a 250 MB dump. 'VACUUM FULL' was executed yesterday.

Is this normal? Should I be worried?

Details below:
------------------------------------------
# pg_dumpall --oids --clean > pg_dumpall.sql
# ls -lh
total 232M
-rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
# du -sh /var/pgsql/data/
9.4G    /var/pgsql/data
------------------------------------------

thx
joao
On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:

> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.

If you've been running VACUUM FULL, it's probably so-called "index bloat".
Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
figure out where all your space has gone inside the database.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Mon, Aug 11, 2008 at 10:30 AM, Joao Ferreira gmail
<joao.miguel.c.ferreira@gmail.com> wrote:

> Hello all,
>
> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal? Should I be worried?

It really depends. If you've got a lot of indexes then maybe that's
normal, but most of the time the database doesn't take up more than about
four times the space of the dump. It's likely you've got index bloat.

If you reload a pg_dump of the database in question into another server,
how much space does that take up?

Look into using CLUSTER or REINDEX to fix the space usage.
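[Editor's note: a minimal sketch of the two commands suggested here, run
against a hypothetical table named `timeslots`. The index name is assumed
for illustration, and the `CLUSTER ... USING` form is PostgreSQL 8.3+
syntax (older releases use `CLUSTER indexname ON tablename`). Both
commands take an exclusive lock on the table while they run.]

```sql
-- Rebuild every index on the table in place; the heap itself is left as-is.
REINDEX TABLE timeslots;

-- Or rewrite the whole table in the order of one index; this also rebuilds
-- all of the table's indexes and returns the freed space to the OS.
CLUSTER timeslots USING timeslots_timeslot_index;

-- Compare sizes before and after:
SELECT pg_size_pretty(pg_relation_size('public.timeslots'));
```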
I'd do:

du -sh /var/pgsql/data/base

rather than /var/pgsql/data. Depending on how your pgsql server logging is
set up, there are other folders and/or files under ./data/ that might take
considerable disk space; you may want to exclude those.

I find this query useful for something like this as well:

select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

V.

Joao Ferreira gmail wrote:
> Hello all,
>
> I'm finding it very strange that my pg takes 9 GB on disk but
> pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> yesterday.
>
> Is this normal? Should I be worried?
>
> Details below:
> ------------------------------------------
> # pg_dumpall --oids --clean > pg_dumpall.sql
> # ls -lh
> total 232M
> -rw-r--r-- 1 postgres postgres 231M Aug 11 15:46 pg_dumpall.sql
> # du -sh /var/pgsql/data/
> 9.4G    /var/pgsql/data
> ------------------------------------------
>
> thx
> joao
On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> ./data/ you may want to exclude those. I find this query useful for
> something like this as well:
>
> select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;

Hello Vlad,

I ran your query and I got the 9 GB!

I guess it should be related to index bloat, then. Do you agree?

thx
Joao

postgres=# select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;
    datname    | pg_size_pretty
---------------+----------------
 postgres      | 3617 kB
 egbert        | 9585 MB
 asterisk      | 3993 kB
 turba         | 3673 kB
 edgereporting | 3617 kB
 template1     | 3617 kB
 template0     | 3537 kB
(7 rows)

postgres=#
Hello Greg, Vlad, Scott and all,

thanks for the feedback. I forgot to mention that I execute REINDEX on all
tables and indexes every week (right after executing VACUUM FULL). Is this
enough to eliminate the possibility of "index bloat"?

And yes, my database has some crazy indexes. I use these indexes, and I
keep them REINDEXed to keep query execution time down. See below. Could
these indexes be the real reason for taking up all that space?

thanks
joao

egbert=# \d timeslots
         Table "public.timeslots"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 str1      | text    |
 str2      | text    |
 ...
 str20     | text    |
 val1      | real    |
 ...
 val6      | real    |
 var       | text    |
 count     | integer |
 total     | real    |
 timeslot  | integer | not null
 timestamp | integer | not null
 tsws      | integer | not null
 tses      | integer | not null
Indexes:
    "timeslots_strs_var_ts_key" UNIQUE, btree (str1, str2, str3, str4,
        str5, str6, str7, str8, str9, str10, str11, str12, str13, str14,
        str15, str16, str17, str18, str19, str20, var, timeslot) CLUSTER
    "timeslots_timeslot_index" btree (timeslot)
    "timeslots_timestamp_index" btree ("timestamp")
    "timeslots_var_index" btree (var)

------------------------------------------------------------
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> On Mon, 11 Aug 2008, Joao Ferreira gmail wrote:
>
> > I'm finding it very strange that my pg takes 9 GB on disk but
> > pg_dumpall produces a 250 MB dump. 'VACUUM FULL' was executed
> > yesterday.
>
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.
>
> --
> * Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
On Mon, 2008-08-11 at 12:45 -0400, Greg Smith wrote:
> If you've been running VACUUM FULL, it's probably so-called "index bloat".
> Try running the query at http://wiki.postgresql.org/wiki/Disk_Usage to
> figure out where all your space has gone inside the database.

egbert=# SELECT nspname || '.' || relname AS "relation",
egbert-#     pg_size_pretty(pg_relation_size(nspname || '.' || relname)) AS "size"
egbert-#   FROM pg_class C
egbert-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
egbert-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
egbert-#     AND nspname !~ '^pg_toast'
egbert-#     AND pg_relation_size(nspname || '.' || relname) > 0
egbert-#   ORDER BY pg_relation_size(nspname || '.' || relname) DESC
egbert-#   LIMIT 20;

             relation             |  size
----------------------------------+---------
 public.timeslots_strs_var_ts_key | 5643 MB   # this is the UNIQUE index
 public.timeslots                 | 2660 MB   # this is the only table
 public.timeslots_timestamp_index |  583 MB   # this is an index
 public.timeslots_var_index       |  314 MB   # this is an index
 public.timeslots_timeslot_index  |  275 MB   # this is an index
(5 rows)

So it seems that the UNIQUE index is taking up more space than the data
itself...

Still, I have 2660 MB of data but the dump is about 10x smaller!!!

Any hints?

> --
> * Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
In response to Joao Ferreira gmail <joao.miguel.c.ferreira@gmail.com>:

> On Mon, 2008-08-11 at 12:43 -0700, Vlad Kosilov wrote:
> > ./data/ you may want to exclude those. I find this query useful for
> > something like this as well:
> >
> > select datname, pg_size_pretty(pg_database_size(oid)) from pg_database;
>
> I ran your query and I got the 9 GB!
>
> I guess it should be related to index bloat, then. Do you agree?

No, the index size is included in pg_database_size(). Perhaps do a

du -hd1 /var/pgsql/data

to see which directories have all the space
(or du -h --max-depth=1 /var/pgsql/data on Linux).

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> It's likely you've got index bloat. If you reload a pg_dump of the
> database in question into another server how much space does that take
> up?

Right. I just loaded the dump into a clean database and everything came
down about 10 times...

----------------------------------------------
NOW (dump loaded into a fresh Pg):

             relation             |  size
----------------------------------+--------
 public.timeslots                 | 549 MB
 public.timeslots_strs_var_ts_key | 482 MB
 public.timeslots_var_index       |  59 MB
 public.timeslots_timeslot_index  |  37 MB
 public.timeslots_timestamp_index |  37 MB
(5 rows)
------------------------------------------------
BEFORE:

             relation             |  size
----------------------------------+---------
 public.timeslots_strs_var_ts_key | 5643 MB
 public.timeslots                 | 2660 MB
 public.timeslots_timestamp_index |  583 MB
 public.timeslots_var_index       |  314 MB
 public.timeslots_timeslot_index  |  275 MB

I'm confused here... on the fresh database the whole set only takes
1.3 GB; on the original db, even after VACUUM FULL and REINDEX, it takes
9 GB. Can I really do anything about it?

If I try CLUSTER, I'm guessing I'll choose the big index and forget about
the smaller ones... is this right?

thanks
joao

> Look into using CLUSTER or REINDEX to fix the space usage.
Joao Ferreira gmail wrote:
> egbert=# SELECT nspname || '.' || relname AS "relation",
> egbert-#     pg_size_pretty(pg_relation_size(nspname || '.' || relname)) AS "size"
> egbert-#   FROM pg_class C
> egbert-#   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
> egbert-#   WHERE nspname NOT IN ('pg_catalog', 'information_schema')
> egbert-#     AND nspname !~ '^pg_toast'
> egbert-#     AND pg_relation_size(nspname || '.' || relname) > 0
> egbert-#   ORDER BY pg_relation_size(nspname || '.' || relname) DESC
> egbert-#   LIMIT 20;
>
>              relation             |  size
> ----------------------------------+---------
>  public.timeslots_strs_var_ts_key | 5643 MB
>  public.timeslots                 | 2660 MB
>  public.timeslots_timestamp_index |  583 MB
>  public.timeslots_var_index       |  314 MB
>  public.timeslots_timeslot_index  |  275 MB
> (5 rows)
>
> So it seems that the UNIQUE index is taking up more space than the data
> itself...
>
> Still, I have 2660 MB of data but the dump is about 10x smaller!!!
>
> Any hints?

I would try running a CLUSTER on the table. This will usually clean things
up and free disk space, both in the table and the indexes. It does require
quite extensive locking though, so it might not be an option if you can't
afford having the database unavailable for a few (10-15) minutes.

--
Tommy Gildseth
Joao Ferreira gmail wrote:
> On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
>
> If I try CLUSTER, I'm guessing I'll choose the big index and forget
> about the smaller ones... is this right?

CLUSTER will sort out all the indexes, even though you're just clustering
on one.

--
Tommy Gildseth
DBA, Gruppe for databasedrift
Universitetet i Oslo, USIT
m: +47 45 86 38 50
t: +47 22 85 29 39
Hi guys,

I found the reason for this whole problem.

Explanation: the vacuum/reindex cron scripts were not being executed. I
executed the operations by hand and the values became normal.

Thank you all for the fine discussion.

joao

On Tue, 2008-08-12 at 13:49 +0200, Tommy Gildseth wrote:
> Joao Ferreira gmail wrote:
> > On Mon, 2008-08-11 at 10:58 -0600, Scott Marlowe wrote:
> >
> > If I try CLUSTER, I'm guessing I'll choose the big index and forget
> > about the smaller ones... is this right?
>
> CLUSTER will sort out all the indexes, even though you're just
> clustering on one.
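[Editor's note: the resolution was simply that the scheduled maintenance
had stopped running. A minimal sketch of such a weekly job follows; the
script path, crontab schedule, and the focus on the `timeslots` table are
all assumptions for illustration, and both commands take exclusive locks,
so a quiet window is advisable.]

```sh
#!/bin/sh
# /usr/local/bin/pg-maintenance.sh (hypothetical path)
# Weekly maintenance: VACUUM FULL the database, then rebuild the big
# table's indexes. Run from the postgres user's crontab, e.g.:
#   30 3 * * 0 /usr/local/bin/pg-maintenance.sh
psql -d egbert -c 'VACUUM FULL ANALYZE;'
psql -d egbert -c 'REINDEX TABLE timeslots;'
```

Monitoring that the job actually ran (e.g. logging its output and checking
pg_database_size afterwards) would catch a silent failure like this one
much earlier.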