Re: serious problems with vacuuming databases - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: serious problems with vacuuming databases
Date
Msg-id 44396FB6.6000900@fuzzy.cz
Whole thread Raw
In response to Re: serious problems with vacuuming databases  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> 1) drop, create and fill table B (aggregated data from A, C, D)
>> 2) copy 'old' data from A, C and D to A_old, C_old a D_old
>> 3) delete old data from A, C, D
>> 4) dump data from A_old, C_old and D_old
>> 5) truncate tables A, C, D
>> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
>
> Steps 3/5/6 make no sense at all to me: why bother deleting data retail
> when you are about to truncate the tables, and why bother vacuuming a
> table you just truncated?  Is the above *really* what you did?

Yes, the above is exactly what I did with the exception that there's an
error in the step (5) - there should be truncation of the _old tables.
The reasons that led me to this particular steps are two:

(a) I don't want to delete all the data, just data older than two days.
    Until today we've kept all the data (containing two years access log
    for one of our production websites), but now we've decided to remove
    the data we don't need and leave just the aggregated version. That's
    why I have used DELETE rather than TRUNCATE.

(b) I want to create 'incremental' backups, so once I'll need the data
    I can take several packages (dumps of _old tables) and import them
    one after another. Using pg_dump doesn't allow me this - dumping the
    whole tables A, C and D is not an option, because I want to leave
    some of the data in the tables.

    From now on, the tables will be cleared on a daily (or maybe weekly)
    basis, which means much smaller amount of data (about 50.000 rows
    a day).
>
>> The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
>> the whole database, and it runs for about 10 hours already, which is
>> much more than usual (and it is still running).
>
> Is it actually grinding the disk, or is it just blocked waiting for
> someone's lock?  If it's actually doing work, which table is it working
> on?  (You should be able to figure that out by looking in pg_locks,
> or by strace'ing the process to see which files it's touching.)

Thanks for the hint, I'll try to figure that in case the dump/reload
recommended by Alvaro Herrera doesn't help. But as far as I know the
disks are not grinded right now, so I guess it's the problem with indexes.

t.v.

pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Scaling up PostgreSQL in Multiple CPU / Dual Core
Next
From: Tomas Vondra
Date:
Subject: Re: serious problems with vacuuming databases