Thread: Is my vacuumdb stuck in a loop?

Is my vacuumdb stuck in a loop?

From: Michael Goldner

I'm running Postgres 8.1.8 on RHEL.  I have a very large database that
consists of a single table with a large object column.  The table has just
under 2 million rows, and the database looks to be about 700GB in size.

I have autovacuum running on the system and performance had been good.  I'm
not sure at what size it began to deteriorate, maybe around 500GB.  In any
case, we are experiencing some performance issues right now, so I am
archiving data to bring the size down to about 250GB.

Last night, my archiving script slowed by a factor of 2 or 3, so I figured
that enough rows had been removed that a manual vacuum was necessary.  From
the command prompt (as the postgres user) I ran the following:

-bash-3.00$ vacuumdb --verbose --analyze <dbname>

It has been running for close to 12 hours.  The strange thing is that it
seems to be "looping".  I see repeating "INFO" messages for pg_largeobject
and pg_largeobject_loid_pn_index relations.  I've included the output below
my sig.  This database is in constant production use, so records are being
added during the vacuum.

Am I stuck in a loop, or is this happening because the size of the relation
is so large that postgres is operating on smaller chunks?

Thanks,

Mike

LOG DETAIL:

INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  index "pg_largeobject_loid_pn_index" now contains 274033339 row
versions in 1109450 pages
DETAIL:  11184556 index row versions were removed.
7414 index pages have been deleted, 0 are currently reusable.
CPU 41.53s/111.63u sec elapsed 650.60 sec.
INFO:  "pg_largeobject": removed 11184556 row versions in 3753500 pages
DETAIL:  CPU 168.15s/37.34u sec elapsed 1498.92 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 262883150 row
versions in 1109575 pages
DETAIL:  11184556 index row versions were removed.
15907 index pages have been deleted, 0 are currently reusable.
CPU 44.25s/107.33u sec elapsed 654.02 sec.
INFO:  "pg_largeobject": removed 11184556 row versions in 3755190 pages
DETAIL:  CPU 205.56s/42.92u sec elapsed 1604.51 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 251843838 row
versions in 1110114 pages
DETAIL:  11184557 index row versions were removed.
22473 index pages have been deleted, 0 are currently reusable.
CPU 47.55s/104.48u sec elapsed 642.06 sec.
INFO:  "pg_largeobject": removed 11184557 row versions in 3758621 pages
DETAIL:  CPU 124.28s/39.16u sec elapsed 1347.28 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 240671903 row
versions in 1110161 pages
DETAIL:  11184558 index row versions were removed.
29228 index pages have been deleted, 0 are currently reusable.
CPU 46.78s/100.50u sec elapsed 663.27 sec.
INFO:  "pg_largeobject": removed 11184558 row versions in 3757819 pages
DETAIL:  CPU 218.24s/48.46u sec elapsed 1573.08 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 229491795 row
versions in 1110179 pages
DETAIL:  11184557 index row versions were removed.
47283 index pages have been deleted, 0 are currently reusable.
CPU 52.15s/95.28u sec elapsed 705.06 sec.
INFO:  "pg_largeobject": removed 11184557 row versions in 3741040 pages
DETAIL:  CPU 188.51s/47.72u sec elapsed 1406.88 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 218339098 row
versions in 1110300 pages
DETAIL:  11184557 index row versions were removed.
90877 index pages have been deleted, 0 are currently reusable.
CPU 63.27s/90.99u sec elapsed 727.05 sec.
INFO:  "pg_largeobject": removed 11184557 row versions in 3708216 pages
DETAIL:  CPU 114.99s/41.80u sec elapsed 935.06 sec.

Re: Is my vacuumdb stuck in a loop?

From: Tom Lane

"Michael Goldner" <MGoldner@agmednet.com> writes:
> Am I stuck in a loop, or is this happening because the size of the relation
> is so large that postgres is operating on smaller chunks?

It's removing as many dead rows at a time as it can handle.  Arithmetic
suggests that you've got maintenance_work_mem set to 64MB, which would
be enough room to process 11184810 rows per index scanning cycle.
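
To spell out the arithmetic (purely illustrative): VACUUM remembers each
dead tuple as a 6-byte item pointer, so 64MB worth of them works out to

    -- 64MB of 6-byte item pointers per index-cleaning pass:
    SELECT (64 * 1024 * 1024) / 6 AS dead_rows_per_pass;   -- 11184810
    -- which matches the ~11.18 million rows removed in each cycle of your log.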

The fact that there are so many dead large objects is what I'd be
worrying about.  Does that square with your sense of what you've
removed, or does it suggest you've got a large object leak?  Do you
use contrib/lo and/or contrib/vacuumlo to manage them?
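
If you want to look for leaked large objects by hand, a query along these
lines shows the idea ("mytable" and "lo_col" are placeholders for your own
table and its large object column; contrib/vacuumlo does essentially the
same check and then lo_unlink()s the orphans):

    -- placeholders: replace mytable/lo_col with your table and its lo column
    SELECT DISTINCT lo.loid
    FROM pg_catalog.pg_largeobject lo
    LEFT JOIN mytable t ON t.lo_col = lo.loid
    WHERE t.lo_col IS NULL;
    -- any loid returned is a large object that no row references anymore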

The numbers also suggest that you might be removing all or nearly
all of the rows in pg_largeobject.  If so, a CLUSTER on it might
be more effective than VACUUM as a one-shot cleanup method.
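
For reference, the one-shot version would look something like this (8.1
CLUSTER syntax; it takes an exclusive lock on pg_largeobject for the
duration and needs enough free disk to write the surviving rows to a new
file):

    -- 8.1 syntax: CLUSTER indexname ON tablename
    CLUSTER pg_largeobject_loid_pn_index ON pg_catalog.pg_largeobject;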

            regards, tom lane

Re: Is my vacuumdb stuck in a loop?

From: Michael Goldner

On 3/2/08 11:15 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Michael Goldner" <MGoldner@agmednet.com> writes:
>> Am I stuck in a loop, or is this happening because the size of the relation
>> is so large that postgres is operating on smaller chunks?
>
> It's removing as many dead rows at a time as it can handle.  Arithmetic
> suggests that you've got maintenance_work_mem set to 64MB, which would
> be enough room to process 11184810 rows per index scanning cycle.
>
That is exactly right: maintenance_work_mem is set to 64MB.  Maybe I should
double it, since I think the server has sufficient memory.  Is there any
downside?
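
Something like this is what I have in mind (just a sketch; 8.1 takes the
value in plain kilobytes, so 128MB is 131072):

    SET maintenance_work_mem = 131072;   -- 128MB; 8.1 has no 'MB' suffix
    VACUUM VERBOSE pg_largeobject;
    -- set per-session like this it only affects the manual vacuum; raising it
    -- in postgresql.conf also raises the ceiling for autovacuum and CREATE INDEX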

> The fact that there are so many dead large objects is what I'd be
> worrying about.  Does that square with your sense of what you've
> removed, or does it suggest you've got a large object leak?  Do you
> use contrib/lo and/or contrib/vacuumlo to manage them?

I am in the process of archiving off about 1.4 million large objects from a
total of 1.7 million, and I removed about 700,000 before running the vacuum.
Each large object averages about 256kB, with outliers as big as 100MB.  At
2kB per row in pg_largeobject (if I understand correctly), the number of
dead rows doesn't seem unreasonable.
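
Rough arithmetic, just to sanity-check that (assuming ~256kB per object and
2kB per pg_largeobject row):

    -- ~700,000 objects already removed, ~128 rows each at 2kB per row:
    SELECT 700000 * (256 / 2) AS approx_dead_rows;   -- roughly 90 million
    -- the log shows six index passes at ~11.18 million rows each (~67 million
    -- removed so far), so the vacuum has a way to go but isn't looping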

>
> The numbers also suggest that you might be removing all or nearly
> all of the rows in pg_largeobject.  If so, a CLUSTER on it might
> be more effective than VACUUM as a one-shot cleanup method.
>

My understanding is that CLUSTER is a blocking operation.  My database
supports a 24x7 operation, so I cannot bring the system offline for extended
periods.  How fast is the "CLUSTER" operation?  Given my database, am I
looking at minutes, hours, or days?  Can a CLUSTER be cancelled without
negative consequences if I see it is taking longer than I can allow?
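
If it helps frame the answer, here is the kind of size check I can run (I
assume the CLUSTER cost scales with whatever these report):

    -- both functions exist in 8.1
    SELECT pg_size_pretty(pg_relation_size('pg_catalog.pg_largeobject')) AS table_size,
           pg_size_pretty(pg_relation_size('pg_catalog.pg_largeobject_loid_pn_index')) AS index_size;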

Thanks

--
Mike Goldner


Re: Is my vacuumdb stuck in a loop?

From: Tom Lane

Michael Goldner <mgoldner@agmednet.com> writes:
>> The fact that there are so many dead large objects is what I'd be
>> worrying about.  Does that square with your sense of what you've
>> removed, or does it suggest you've got a large object leak?  Do you
>> use contrib/lo and/or contrib/vacuumlo to manage them?

> I am in the process of archiving off about 1.4 million large objects from a
> total of 1.7 million.

OK, and no doubt they're the 1.4 million oldest ones?  So what we're
seeing is VACUUM removing all of the rows in the earlier part of the
table, but when it eventually gets to the end there will be a lot of
non-removed rows.

>> The numbers also suggest that you might be removing all or nearly
>> all of the rows in pg_largeobject.  If so, a CLUSTER on it might
>> be more effective than VACUUM as a one-shot cleanup method.

> My understanding is that CLUSTER is a blocking operation.  My database
> supports a 24x7 operation, so I cannot bring the system offline for extended
> periods.

You wouldn't want to do a CLUSTER then.  But you're going to have an
awful lot of dead space in pg_largeobject if you don't.  Might want to
think about it during your next scheduled maintenance window (you do
have some I hope).

The way to avoid getting into this situation in future is to archive on
a more routine basis.  Removing 10% of the rows at a time doesn't put
you in a situation where you desperately need to reclaim that space.
You can just wait for it to get used up again during normal operations.
It's a bit harder to adopt that attitude when you know that 80% of
pg_largeobject is dead space.  What's more, you're going to have to
boost max_fsm_pages quite a lot, or the dead space won't get reused
very effectively at all ...
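
As a concrete starting point (illustrative only: the log above shows
pg_largeobject alone spanning about 3.7 million pages, and in 8.1 changing
max_fsm_pages means editing postgresql.conf and restarting the postmaster):

    SHOW max_fsm_pages;       -- the 8.1 default is far below the ~3.7M pages above
    SHOW max_fsm_relations;
    -- the last few lines of a database-wide VACUUM VERBOSE report how many page
    -- slots the free space map actually needed, which is the number to size against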

            regards, tom lane