Re: Call closed: Almost happy ending (from "Data files became huge with no apparent reason" thread) - Mailing list pgsql-general

From Dario Fumagalli
Subject Re: Call closed: Almost happy ending (from "Data files became huge with no apparent reason" thread)
Date
Msg-id 3D74D7E9.5000500@tin.it
Whole thread Raw
In response to Re: Almost happy ending (from "Data files became huge with  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Call closed: Almost happy ending (from "Data files became huge with no apparent reason" thread)
List pgsql-general

Martijn van Oosterhout wrote:

> On Tue, Sep 03, 2002 at 04:08:18PM +0200, Dario Fumagalli wrote:
>
>>Martijn van Oosterhout wrote:
>>
>>>What happens if you say "REINDEX INDEX products_pkey" or "REINDEX INDEX
>>>det_prod_dep_consumpt_pkey". Do those numbers change?
>>>
>>>
>>
>>Yes, it did it!
>>
>>proj_store=# REINDEX INDEX products_pkey;
>>REINDEX
>>proj_store=# REINDEX INDEX det_prod_dep_consumpt_pkey
>>REINDEX
>>proj_store=# select relname, relfilenode as node, reltoastrelid as relid,
>>proj_store=# reltoastidxid as idxid, relkind as k, relpages as pag from
>>pg_class order by relname;
>>
>>[Snip]
>>             relname             |  node   | relid  | idxid  | k | pag
>>--------------------------------+---------+--------+--------+---+------
>>det_prod_dep_consumpt_pkey      | 1224634 |      0 |      0 | i |    6
>>[Snip]
>>products_pkey                   | 1224633 |      0 |      0 | i |    5
>>
>>BUT... there is a but...
>>
>>The du -h command says that, despite the reported index reduction, the
>>overall database size has increased to 105 MB (?)!.
>>
>>And this raises a question: why a reindex proj_store force in single
>>user mode did not packed the indexes while from psql / multiuser
>>postmaster it did? Bah!
>>
>
>
> It sounds like there is actually an awful lot of activity going on.
>
>
>>NOTICE:  Index idx_products_csc: Pages 832; Tuples 976: Deleted 90768.
>>CPU 0.59s
>>/1.12u sec.
>>
>
> There you go. 90000 deleted tuples, 900 active. When was the last time you
> did a vacuum? Basically, you need to do a vacuum whenever you want to reuse
> space. Especially prior to 7.2.
>


As I stated in an earlier post, i vacuumed was about 2 - 3 weeks earlier
(my vacation duration).


>
>>NOTICE:  --Relation pg_toast_258417--
>>NOTICE:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
>>Keep/VTL 0/
>>0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space
>>0/0; EndEm
>>pty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
>>NOTICE:  Index pg_toast_258417_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
>>
>
> Toast table/index is empty, so it's not that.
>
>
>>I finally made a VACUUM VERBOSE ANALYZE.
>>
>>It did things (lots of deletions and some moves) on the affected tables.
>>But now, if I reissue the pg_class query, I get figures like the following:
>>              relname             |  node   | relid  | idxid  | k | pag
>>---------------------------------+---------+--------+--------+---+------
>>  idx_det_prod_comp_vid_connsp    | 1094065 |      0 |      0 | i |   1
>>  idx_det_prod_dep_consumptdp     | 1094068 |      0 |      0 | i | 479
>>  idx_det_prod_vid_connsp         | 1094059 |      0 |      0 | i |   1
>>  idx_det_prod_vid_resp           | 1094053 |      0 |      0 | i |   1
>>  idx_det_prod_vid_stdsp          | 1094056 |      0 |      0 | i |   1
>>  idx_det_turnover_c              | 1094095 |      0 |      0 | i |   4
>>  idx_det_turnover_t              | 1094098 |      0 |      0 | i |   4
>>  idx_products_b                  | 1094044 |      0 |      0 | i | 733
>>  idx_products_b2                 | 1094047 |      0 |      0 | i | 281
>>  idx_products_csc                | 1094041 |      0 |      0 | i | 832
>>  idx_products_v                  | 1094050 |      0 |      0 | i | 270
>>
>>So, new big index file have born!
>>After this vacuum, du -h reports:
>>
>>[postgres@web base]$ du -h
>>1.6M    ./1
>>1.5M    ./18719
>>24M     ./242014
>>11M     ./46821
>>1.7M    ./197097
>>2.3M    ./279236
>>43M
>>
>>A lower disk usage than after the reindex above (still more than the
>>expected 26MB). The development machine (with 3 weeks old outdated data,
>>unfortunately, but still with the same record figures (+-5%)):
>>
>
> Your disk space seems to fluctuate an awful lot. Are you doing any queries
> that UPDATE rows, even if the values are not changing?
>


Yes, in this database insert / upate ratio is 1 : 180, and db updates
are performed 4 times a day by an external PHP console script.


>
>>This question is about a different database server for another (very
>>big) company. Here records are not hundreds, they are 10 millions up.
>>And a vacuum I fear will last for more that some seconds.
>>
>
> Quite true. Hence the non-blocking vacuum in 7.2. Similarly, scheduling a
> VACUUM at 4am and no-one will notice :)


It depends... This company intranet is accessible from working units in:
USA
Germany
Italy
UK

So 4am is somewhat relative here...


>
>
>>I hoped to be able to use a Debian. I had to mediate between Open Source
>>(Debian, the preferred for us the developers) and a closed, commercial
>>solution (loved by management... sigh). The solution was a Caldera... a
>>Linux solution but with some commercial bells and whistles on it to make
>>everyone happy. But I don't like it as Debian for programming purposes.
>>It is too Red-Hattish :) and comes with ancient tools and libraries.
>>
>
> Hey, if it works, it works, right :)
>
>
>>- Given that single user REINDEX [database name] FORCE in single user
>>modedid rebuild the indexes but did not optimize them;
>>- Given that a REINDEX INDEX [index name] did optimize and shrink a
>>single index but increased the overall data size;
>>- Given that a VACUUM VERBOSE ANALYZE somehow alters the overall
>>database files size (seems to re-distribute the file space evenly across
>>indexes);
>>- Given that I was at a loss and ready to do anything with a logical
>>sense (or not);
>>
>
> Hmm. BTW, I'd never heard of REINDEX DATABASE before yesterday, so I'm not
> quite clear on what it does. REINDEX INDEX has always worked for me.
>


I found it in the official html pages. It is one proprietary SQL command
that stating the original page in

http://www.postgresql.org/idocs/index.php?sql-reindex.html

Description
REINDEX is used to rebuild corrupted indexes. Although in theory this
should never be necessary, in practice indexes may become corrupted due
to software bugs or hardware failures. REINDEX provides a recovery method.

If you suspect corruption of an index on a user table, you can simply
rebuild that index, or all indexes on the table, using REINDEX INDEX or
REINDEX TABLE.

Note: Another approach to dealing with a corrupted user-table index is
just to drop and recreate it. This may in fact be preferable if you
would like to maintain some semblance of normal operation on the table
meanwhile. REINDEX acquires exclusive lock on the table, while CREATE
INDEX only locks out writes not reads of the table.

Things are more difficult if you need to recover from corruption of an
index on a system table. In this case it's important for the backend
doing the recovery to not have used any of the suspect indexes itself.
(Indeed, in this sort of scenario you may find that backends are
crashing immediately at startup, due to reliance on the corrupted
indexes.) To recover safely, the postmaster must be shut down and a
stand-alone PostgreSQL backend must be started instead, giving it the
command-line options -O and -P (these options allow system table
modifications and prevent use of system indexes, respectively). Then
issue REINDEX INDEX, REINDEX TABLE, or REINDEX DATABASE depending on how
much you want to reconstruct. If in doubt, use REINDEX DATABASE FORCE to
force reconstruction of all system indexes in the database. Then quit
the standalone backend and restart the postmaster.

Since this is likely the only situation when most people will ever use a
standalone backend, some usage notes might be in order:



Start the backend with a command like

postgres -D $PGDATA -O -P my_database
Provide the correct path to the database area with -D, or make sure that
the environment variable PGDATA is set. Also specify the name of the
particular database you want to work in.

You can issue any SQL command, not only REINDEX.

Be aware that the standalone backend treats newline as the command entry
terminator; there is no intelligence about semicolons, as there is in
psql. To continue a command across multiple lines, you must type
backslash just before each newline except the last one. Also, you won't
have any of the conveniences of readline processing (no command history,
for example).

To quit the backend, type EOF (control-D, usually).

See the postgres reference page for more information.

Usage
Recreate the indexes on the table mytable:

      REINDEX TABLE mytable;


Rebuild a single index:

     REINDEX INDEX my_index;


Rebuild all system indexes (this will only work in a standalone backend):

     REINDEX DATABASE my_database FORCE;

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And yes, only now I saw that it says that the command will "Rebuild all
system indexes" and not ALL indexes... GRR..


>
>>1) I reindexed all excessively big indexes one by one in psql:
>>
>>2) I performed a VACUUM VERBOSE ANALYZE, that having nothing to
>>re-distribute, simply "ate" all the wasted space.
>>
>>I know this is probably the weirdest and uninformed / untechnical
>>"reasoning" you saw in this list from years, but it has one small,
>>almost unnoticeable advantage: IT WORKED AND NOW THE DATABASE IS ITS
>>ORIGINAL SIZE AGAIN (even less).
>>
>
> Actually, that seems to me to be the way to squeeze the most space out of
> the DB. It should work very reliably.
>
> Anyway, you should be up and running now. Still odd though.
>

Best regards,
Dario Fumagalli


pgsql-general by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: DELETE SQL too slow.
Next
From: Dan Ostrowski
Date:
Subject: Re: [PHP] Info into Class