Thread: Data files became huge with no apparent reason

Data files became huge with no apparent reason

From: Dario Fumagalli
[(Please disregard my previous post; this is the updated version.)]

Hello to all,

I've just returned from a two-week vacation and I'm now facing a problem that
is effectively blocking a critical database server.

I have a main database (proj_store) that holds 34 objects (tables and
sequences). Only one table has more than a few hundred records (the products
table, totalling 975 records). The backend is PostgreSQL 7.1.1 on a
Caldera Linux OpenServer, compiled from sources.

A full, uncompressed SQL dump (with full INSERTs) is about 3.4 MB.

Each day, a cron-driven program I wrote one year ago updates the
database from a MS-SQL 7.0 server (inserts and updates only, in the
range of 2-5 each day).

So this database server (hosted on its own stand-alone, fairly high-end
corporate server, a Compaq ML-370 PIII PC with RAID, 1 GB RAM, etc.) should
run very smoothly. And in fact it has done so for at least a year.

But this is not the case any more. In fact it has grown in some abnormal
fashion and now the "du -h" command reports the following:


1.6M    ./data/base/1
1.5M    ./data/base/18719
470M    ./data/base/242014
11M     ./data/base/46821
1.7M    ./data/base/197097
2.3M    ./data/base/279236
488M    ./data/base
750k    ./data/global
16M     ./data/pg_xlog
505M    ./data

The ls -l reports:

drwx------   2 postgres database     2048 Jul 17 16:42 1
drwx------   2 postgres database     2048 May 22  2001 18719
drwx------   2 postgres database     2048 Feb 14  2002 197097
drwx------   2 postgres database     3072 Jul  3 08:35 242014
drwx------   2 postgres database     2048 Jul 17 16:16 279236
drwx------   2 postgres database     3072 Jan 29  2002 46821


That is, this DB is now 500+ MB!

And to think that I perform VACUUM ANALYZE at least weekly!
The only period I didn't VACUUM the database was during the aforementioned
two-week vacation.
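(For the record, the weekly maintenance is just a cron job along these
lines - the install path is only an example from my source build:)

# crontab of the postgres user: VACUUM ANALYZE every Sunday at 03:00
0 3 * * 0  /usr/local/pgsql/bin/vacuumdb --analyze proj_store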

The database filled its partition and the backend crashed.
This morning, when I was told of the malfunction, I tried to restart the
daemon with:

/etc/rc.d/init.d/postgres start

as always, and it blew up.
Then I tried manually and it reported a FATAL 2 error (searching the
archives revealed that it means "disk full", and from that I understood
that the database had grown too large).

So I freed some disk space (52 MB) and the backend finally restarted.


By the way, I have a mirrored development machine, last updated just
before I left for vacation.

There, the "du -h" command reports the following "reassuring" stats:

1.6M    ./1
1.5M    ./18719
7.3M    ./250600
11M     ./46821
1.7M    ./197097
2.2M    ./259865
26M     .

(Some directories have different names, but I assure you the contents are
the same.)

Now, the questions are:
- How is this exceptional growth possible (N.B. the database server
has all the default values set in its configuration files - i.e. it was
not "optimized")?
- More urgently: why does it now take 30 seconds to perform a select
count(*) on 900 records (no other clients connected), while in the past
it was almost instantaneous? All database operations are now slow as
dogs. And I have to live with this 52 MB until the technician comes with
a new disk (1 week... he is on vacation now).
- Why does the backend crash immediately if I try to VACUUM (ANALYZE)
it? I think the backend claims additional disk space for this operation
and fills the disk again, but I'm not sure.
- And last, but not least... is it possible to restore the situation
without losing data (the backup is 3 weeks old)? I'm able to start the
daemon and perform SQL operations, but I don't know how to make
PostgreSQL release the disk space after I dump the database in order
to reload it. And I fear an initdb will destroy the db users and their
privileges.


Thanks in advance,
Dario Fumagalli


Re: Data files became huge with no apparent reason

From: Stephan Szabo
On Wed, 28 Aug 2002, Dario Fumagalli wrote:

> I have a main database (proj_store) that holds 34 objects (tables and
> sequences). Only one table has more than a few hundred records (the products
> table, totalling 975 records). The backend is PostgreSQL 7.1.1 on a
> Caldera Linux OpenServer, compiled from sources.
>
> A full, uncompressed SQL dump (with full INSERTs) is about 3.4 MB.
>
> Each day, a cron-driven program I wrote one year ago updates the
> database from a MS-SQL 7.0 server (inserts and updates only, in the
> range of 2-5 each day).

> - How is this exceptional growth possible (N.B. the database server
> has all the default values set in its configuration files - i.e. it was
> not "optimized")?

Well, updates are effectively similar to delete/insert for space usage,
and the deleted space isn't freed until vacuum.  With 2-5 updates a day I'm
not sure why you'd get into that state, unless of course each one was,
say, updating every row.
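As a rough illustration (the column name here is purely hypothetical), a
single UPDATE that touches every row leaves a complete set of dead row
versions on disk until the next vacuum reclaims them:

-- hypothetical example: every updated row leaves its old version behind
UPDATE products SET price = price;   -- rewrites all 975 rows
VACUUM ANALYZE products;             -- reclaims the space held by the dead versions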

> - More urgently: why does it now take 30 seconds to perform a select
> count(*) on 900 records (no other clients connected), while in the past
> it was almost instantaneous? All database operations are now slow as
> dogs. And I have to live with this 52 MB until the technician comes with
> a new disk (1 week... he is on vacation now).

There are almost certainly a huge number of dead rows, and the scan has to
check every one of them to see whether it is visible to you.
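If you want a rough idea of how bloated each table's files are, the page
counts in pg_class should show it (they are only refreshed by
VACUUM/ANALYZE, so they may lag reality a bit):

-- relpages = 8 kB pages on disk per table; compare with reltuples
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC;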

> - Why does the backend crash immediately if I try to VACUUM (ANALYZE)
> it? I think the backend claims additional disk space for this operation
> and fills the disk again, but I'm not sure.

Possibly from the logs it writes, I'd guess, but I'm not sure.

> - And last, but not least... is it possible to restore the situation
> without losing data (the backup is 3 weeks old)? I'm able to start the

Either dump and restore, or once you can vacuum, vacuum.
I don't remember if 7.1.1 has VACUUM VERBOSE, but that'd give a lot
of info on how many rows were erased and such.

If you decide to do a dump and restore, you might want to look at
7.2.2, since it can do vacuums that do not block tables.
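If you go the dump-and-restore route, a rough sketch of the commands (run
as the postgres user; the dump path is only a placeholder, put it on a
partition that still has room) would be something like:

pg_dump proj_store > /mnt/spare/proj_store.sql   # plain SQL dump
dropdb proj_store                                # removes only this database's files
createdb proj_store
psql proj_store < /mnt/spare/proj_store.sql      # reload the dump

Dropping and recreating a single database leaves the cluster-wide users
alone, so no initdb should be needed.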


Re: Data files became huge with no apparent reason

From: Jochem van Dieten
Dario Fumagalli wrote:
>
> Now, the questions are:
> - How is this exceptional growth possible (N.B. the database server
> has all the default values set in its configuration files - i.e. it was
> not "optimized")?

Too many updates without vacuuming.


> - More urgently: why does it now take 30 seconds to perform a select
> count(*) on 900 records (no other clients connected), while in the past
> it was almost instantaneous?

Because a count uses a full table scan, on a table that is now pretty
large on the disk.
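You can confirm that with EXPLAIN; the plan should show a sequential scan
whose cost estimate tracks the table's size on disk rather than the 975
live rows:

EXPLAIN SELECT count(*) FROM products;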


> - Why does the backend crash immediately if I try to VACUUM (ANALYZE)
> it? I think the backend claims additional disk space for this operation
> and fills the disk again, but I'm not sure.
> - And last, but not least... is it possible to restore the situation
> without losing data (the backup is 3 weeks old)? I'm able to start the
> daemon and perform SQL operations, but I don't know how to make
> PostgreSQL release the disk space after I dump the database in order
> to reload it. And I fear an initdb will destroy the db users and their
> privileges.

What I would do is:
1. Backup.
2. Drop any index that is not unique.
3. "VACUUM VERBOSE tablename" on the smallest table. Try the next
smallest table, etc. If you are using PostgreSQL 7.2.x, use "VACUUM FULL
VERBOSE tablename". A rough command sketch for steps 2 and 3 follows.
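The index and table names below are placeholders; substitute the ones
from your own schema:

DROP INDEX products_descr_idx;      -- repeat for every non-unique index
VACUUM VERBOSE some_small_table;    -- then the next smallest table, and so on
-- on PostgreSQL 7.2.x use instead:
-- VACUUM FULL VERBOSE some_small_table;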

In the end, either you have vacuumed them all and can restore your
indexes, or you have an error to post here.

Jochem