Thread: VACUUM FULL performance issues with pg_largeobject table

From: PG User 2010
Date:
Hi there,

I originally posted these questions to the pgsql-performance mailing list, but given the lack of response there, I think they may be more general in nature--so I'm re-posting them here.  Apologies in advance for the cross-posting.

We are having real trouble reclaiming dead blob space via VACUUM FULL in Postgres 8.4.2 on Linux.  We generated a large amount of blob garbage through application code and by not running vacuumlo often enough.  We have since run vacuumlo and "VACUUM ANALYZE pg_largeobject", but when we run "VACUUM FULL pg_largeobject" to reclaim the many gigabytes of dead disk space now in there, the vacuum process runs for days without finishing: it becomes bound to a single CPU, does barely any I/O, and of course holds an exclusive lock on pg_largeobject the whole time.
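For reference, the cleanup we've been running amounts to roughly this (the database name here is a placeholder for our real one):

```shell
# Placeholder database name "mydb".  vacuumlo unlinks large objects that
# are no longer referenced from any table; the vacuums then act on
# pg_largeobject itself.
vacuumlo -v mydb
psql -d mydb -c 'VACUUM ANALYZE pg_largeobject;'
psql -d mydb -c 'VACUUM FULL pg_largeobject;'   # this is the step that never finishes
```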

Therefore, we have the following questions:

1) Is there any easy way to tune the vacuum process so that it is not CPU-bound while doing so little I/O?  Why would VACUUM FULL be CPU-bound in the first place?

2) Is it possible to interrupt VACUUM FULL and then restart it later, having it pick up where it left off?  That way we could do the cleanup in pieces.

3) Are there any workable alternatives, such as CLUSTER?  That doesn't seem to be allowed, since pg_largeobject is a system table; trying CLUSTER on it yields: ERROR:  "pg_largeobject" is a system catalog

If anybody could help with these questions, it would be GREATLY appreciated.

Sam

Re: VACUUM FULL performance issues with pg_largeobject table

From: Tom Lane
Date: Fri, Jan 22, 2010
PG User 2010 <pguser2010@gmail.com> writes:
> 1) is there any easy way to fiddle with the vacuum process so that it is not
> CPU bound and doing very little I/O?  Why would vacuum full be CPU bound
> anyway???

The only part of VAC FULL that seems like it could be CPU-bound is index
cleanup.  If the table is sufficiently bloated with dead tuples, that
could take a while.  It might be useful to try this:

1. REINDEX TABLE pg_largeobject;
2. VACUUM pg_largeobject;
3. VACUUM FULL pg_largeobject;

I have never tried this in a serious bloat situation, but in principle
I think it should improve matters.  The idea is to get rid of as many dead
index and heap entries as you can before letting VAC FULL loose on it, and
to do as much of the work as possible under a less-than-exclusive lock.
Don't forget that a large maintenance_work_mem will help the first two
steps, as long as you don't set it so high that you drive the machine into
swapping.
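Spelled out in psql, the suggested sequence might look like this (the memory figure below is just an illustration--size it to the machine):

```sql
-- Illustrative sketch; '512MB' is an assumed value, not a recommendation.
SET maintenance_work_mem = '512MB';  -- helps the REINDEX and lazy VACUUM steps

REINDEX TABLE pg_largeobject;  -- rebuild the bloated indexes
VACUUM pg_largeobject;         -- lazy vacuum: prunes dead entries, weaker lock
VACUUM FULL pg_largeobject;    -- exclusive lock; do this last
```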

> 2) is it possible to interrupt VACUUM FULL, then re-start it later on and
> have it pick up where it was working before?

NO.  Doing that will in fact make things worse --- a failed VAC FULL
results in even more dead entries to be cleaned up.

            regards, tom lane

Re: VACUUM FULL performance issues with pg_largeobject table

From: PG User 2010
Date: Fri, Jan 22, 2010
Hi Tom,

As always, your insight is VERY helpful.  We'll try your suggestions and see if that helps things out...  Thanks!

Sam

On Fri, Jan 22, 2010 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Re: VACUUM FULL performance issues with pg_largeobject table

From: PG User 2010
Date:
Hi Tom,

Unfortunately, I've tried your advice, and I think we're still CPU-bound even after the reindexing and re-vacuuming.

Fortunately, though, I've just learned about a poor man's profiler for Linux named pstack, and it's telling me that the vacuum process spends most of its time in the enough_space() function in vacuum.c (somewhere around line 2574 in the most recent version in CVS, I believe).

When I run pstack, I almost always see an execution stack like this:

% pstack 12158
#0  0x0000000000527478 in enough_space ()
#1  0x0000000000528b60 in repair_frag ()
#2  0x000000000052b130 in full_vacuum_rel ()
#3  0x000000000052bab8 in vacuum_rel ()
#4  0x000000000052bf12 in vacuum ()
#5  0x00000000005d69b9 in PortalRunUtility ()
#6  0x00000000005d79d5 in PortalRunMulti ()
#7  0x00000000005d80e5 in PortalRun ()
#8  0x00000000005d3e0b in exec_simple_query ()
#9  0x00000000005d4a6b in PostgresMain ()
#10 0x00000000005abe0b in ServerLoop ()
#11 0x00000000005ac9bd in PostmasterMain ()
#12 0x000000000055ae7e in main ()

Every now and then I can see a call to ReadBufferExtended from repair_frag(), so the CPU-intensive part does eventually find what it's looking for, but I'm hopeful this work could somehow be done more efficiently, leaving the process limited only by I/O.
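To make those spot checks a bit more systematic, one could sample the stack in a loop and tally the innermost frame (the PID and sample count here are just the ones from my session):

```shell
# Sample the backend's stack 100 times and count which function sits in
# frame #0.  pstack prints lines like "#0  0x... in enough_space ()",
# so the function name is the 4th whitespace-separated field.
PID=12158   # the VACUUM FULL backend from the pstack output above
for i in $(seq 1 100); do
  pstack "$PID" | awk '/^#0/ {print $4}'
  sleep 0.1
done | sort | uniq -c | sort -rn
```

The function with the highest count is where the backend is spending most of its CPU time--in our case, enough_space().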

So...I guess my question is this: is it a known issue that this code path can become very CPU-bound with large amounts of blob garbage?  Looking at the code for enough_space(), it is almost hard to believe that so much time is being spent there, but perhaps it's just getting called a LOT from repair_frag().  I'm no stranger to hacking on code if I have to....thanks!

Sam
