Thread: Vacuum stats interpreted?

Vacuum stats interpreted?

From
Jeff Boes
Date:
Our largest (by total byte count) table has a nearly-equal flow of data in and
out on a daily basis (that is, we add some 40k rows during the day, and then
every night expire stuff that is timestamped beyond our cutoff, which results in
about 40k deletions).

After the deletions, the table gets vacuumed (not FULL):

INFO:  --Relation public.articles_content--
INFO:  Index ix_articles_content_pk: Pages 398; Tuples 180175: Deleted 0.
        CPU 0.00s/0.08u sec elapsed 0.08 sec.
INFO:  Removed 122 tuples in 77 pages.
        CPU 0.00s/0.01u sec elapsed 0.02 sec.
INFO:  Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437, UnUsed
135721.
        Total CPU 0.02s/0.17u sec elapsed 0.19 sec.
INFO:  --Relation pg_toast.pg_toast_634643688--
INFO:  Index pg_toast_634643688_index: Pages 27156; Tuples 1256923: Deleted 732.
        CPU 2.32s/0.80u sec elapsed 27.93 sec.
INFO:  Removed 732 tuples in 250 pages.
        CPU 0.01s/0.04u sec elapsed 0.05 sec.
INFO:  Pages 1667633: Changed 71514, Empty 0; Tup 1256923: Vac 732, Keep 172914,
UnUsed 5499031.
        Total CPU 148.07s/12.52u sec elapsed 809.71 sec.
INFO:  Analyzing public.articles_content

The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can
anyone interpret these and tell me anything about what we could do with this
table to make it "look" better? (I'm not convinced that the extra space affects
anything but sequential scans, and we don't do that to this table.)

If it helps: I have a theory that database connections from our webserver (which
is running PHP and mod_perl) are in "idle in transaction" states (in other
words, they issue "Commit; Begin;" and then sleep), which causes the tables to
retain bulk (nice image, eh?).


Re: Vacuum stats interpreted?

From
Jeff Boes
Date:
Ah, I should have added: we are using PG 7.3.4.


Re: Vacuum stats interpreted?

From
"nobody"
Date:
To test your theory about COMMIT; BEGIN; you could check the PostgreSQL log,
it is likely to contain a line:

WARNING: there is no transaction in progress

"Jeff Boes" <jboes@qtm.net> wrote in message
news:6b79e0aa2a1c3668937e1ce087c4a819@news.teranews.com...
> Our largest (by total byte count) table has a nearly-equal flow of data in
and
> out on a daily basis (that is, we add some 40k rows during the day, and
then
> every night expire stuff that is timestamped beyond our cutoff, which
results in
> about 40k deletions).
>
> After the deletions, the table gets vacuumed (not FULL):
>
> INFO:  --Relation public.articles_content--
> INFO:  Index ix_articles_content_pk: Pages 398; Tuples 180175: Deleted 0.
>         CPU 0.00s/0.08u sec elapsed 0.08 sec.
> INFO:  Removed 122 tuples in 77 pages.
>         CPU 0.00s/0.01u sec elapsed 0.02 sec.
> INFO:  Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437,
UnUsed
> 135721.
>         Total CPU 0.02s/0.17u sec elapsed 0.19 sec.
> INFO:  --Relation pg_toast.pg_toast_634643688--
> INFO:  Index pg_toast_634643688_index: Pages 27156; Tuples 1256923:
Deleted 732.
>         CPU 2.32s/0.80u sec elapsed 27.93 sec.
> INFO:  Removed 732 tuples in 250 pages.
>         CPU 0.01s/0.04u sec elapsed 0.05 sec.
> INFO:  Pages 1667633: Changed 71514, Empty 0; Tup 1256923: Vac 732, Keep
172914,
> UnUsed 5499031.
>         Total CPU 148.07s/12.52u sec elapsed 809.71 sec.
> INFO:  Analyzing public.articles_content
>
> The "Keep" and "UnUsed" numbers seem high to me, compared to other tables.
Can
> anyone interpret these and tell me anything about what we could do with
this
> table to make it "look" better? (I'm not convinced that the extra space
affects
> anything but sequential scans, and we don't do that to this table.)
>
> If it helps: I have a theory that database connections from our webserver
(which
> is running PHP and mod_perl) are in "idle in transaction" states (in other
> words, they issue "Commit; Begin;" and then sleep), which causes the
tables to
> retain bulk (nice image, eh?).
>



Re: Vacuum stats interpreted?

From
Tom Lane
Date:
Jeff Boes <jboes@qtm.net> writes:
> The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can
> anyone interpret these and tell me anything about what we could do with this
> table to make it "look" better?

"Keep" is the number of rows that are committed dead but had to be kept
anyway, because there are open transactions old enough to still
potentially see them.  The only way to reduce that is to not have old
transactions hanging 'round while you vacuum.

UnUsed is the number of empty line-pointer slots.  At 4 bytes apiece,
this would have to vastly exceed the number of live tuples before you
should worry much.

7.4 labels these numbers in a hopefully more user-friendly way, btw.

            regards, tom lane

Re: Vacuum stats interpreted?

From
Jeff Boes
Date:
At some point in time, tgl@sss.pgh.pa.us (Tom Lane) wrote:

>Jeff Boes <jboes@qtm.net> writes:
>> The "Keep" and "UnUsed" numbers seem high to me, compared to other tables. Can
>> anyone interpret these and tell me anything about what we could do with this
>> table to make it "look" better?
>
>"Keep" is the number of rows that are committed dead but had to be kept
>anyway, because there are open transactions old enough to still
>potentially see them.  The only way to reduce that is to not have old
>transactions hanging 'round while you vacuum.
>
>UnUsed is the number of empty line-pointer slots.  At 4 bytes apiece,
>this would have to vastly exceed the number of live tuples before you
>should worry much.

For which values of "vastly"? I have a small table (1-2k rows) which has a ratio
of UnUsed:Tuples of 50-500.

The table in question has a ratio of about 10 or 11:1.

For some tables (not this one), we find that it significantly improves
performance (of non-indexed queries) to pg_dump and reload the table
periodically. I've been asked to try to quantify (from these vacuum numbers)
when we can predict that a dump-and-reload would be valuable.

Re: Vacuum stats interpreted?

From
Tom Lane
Date:
Jeff Boes <jboes@nexcerpt.com> writes:
> At some point in time, tgl@sss.pgh.pa.us (Tom Lane) wrote:
>> UnUsed is the number of empty line-pointer slots.  At 4 bytes apiece,
>> this would have to vastly exceed the number of live tuples before you
>> should worry much.

> For which values of "vastly"? I have a small table (1-2k rows) which has a ratio
> of UnUsed:Tuples of 50-500.

That sounds like a lot to me too.  You should probably VACUUM FULL and
then try to increase the frequency of regular vacuums to cut down on
the accumulation of deadwood.

> The table in question has a ratio of about 10 or 11:1.

It did?  I saw about 1:1:

>> INFO:  Pages 3886: Changed 0, Empty 0; Tup 180175: Vac 122, Keep 26437, UnUsed 135721.

which is why I didn't feel a need to panic about it.

> For some tables (not this one), we find that it significantly improves
> performance (of non-indexed queries) to pg_dump and reload the table
> periodically. I've been asked to try to quantify (from these vacuum numbers)
> when we can predict that a dump-and-reload would be valuable.

For non-indexed scans I would think that the fraction of free space
(hence, useless I/O) would be the number you are after.  VACUUM does not
really offer this, but see contrib/pgstattuple.

            regards, tom lane