Thread: Why does autovacuum clean fewer rows than I expect?

Why does autovacuum clean fewer rows than I expect?

From
Nick Cabatoff
Date:
Hi,

I'm having trouble making sense of the these two autovacuum log entries.  
I'm running PostgreSQL 8.4.

[2014-05-22 04:56:43.486 EST]  {537cf2c6.30f9}  LOG:  automatic vacuum of table "postgres.globaldicom.requests": index scans: 1
        pages: 0 removed, 163600 remain
        tuples: 5916 removed, 8423598 remain
        system usage: CPU 3.62s/9.84u sec elapsed 1060.49 sec

[2014-05-24 05:02:12.707 EST]  {537f90c3.7144}  LOG:  automatic vacuum of table "postgres.globaldicom.requests": index scans: 2
        pages: 0 removed, 510725 remain
        tuples: 1554 removed, 8583343 remain
        system usage: CPU 9.19s/16.86u sec elapsed 2673.32 sec

I've verified that no vacuum (manual or auto) took place between them unless
they were basically no-ops, since log_autovacuum_min_duration=250.  Otherwise
we're using the stock autovacuum config settings, thus 3 workers, 1min naptime,
autovacuum_vacuum_scale_factor=0.2, vacuum_cost_delay=0ms, etc.

First question: how do we get a 3x page usage increase with a <1% tuple
increase?  The only thing I can think of is that a lot of very big values were
stored.  To my knowledge that didn't happen, but I can't rule it out entirely.
Since I judge it quite unlikely I'm looking for alternate explanations of what
might have happened.

Second question: What triggered the above autovacuums?  From what I understand
it must be one of imminent txid wraparound or exceeding
autovacuum_vacuum_scale_factor.  If it was the latter, suggesting that 20% of
the 8.5M tuples had changed, then why were only a few thousand tuples removed?
I'm not sure how to eliminate the possibility that it was the former after the
fact.  However, I know that there was a lot of activity, going by the
auto-analyzes that took place (autovacuum_vacuum_analyze_factor=0.1):

[2014-05-20 11:08:59.632 EST]  {537aab0a.b704}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.36s/0.45u sec elapsed 32.98 sec
[2014-05-23 21:12:49.164 EST]  {537f2d12.459b}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.08s/0.36u sec elapsed 30.60 sec
[2014-05-24 00:56:00.927 EST]  {537f6161.2b77}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.16s/0.38u sec elapsed 31.42 sec
[2014-05-24 06:33:16.972 EST]  {537fb06d.e9e3}  LOG:  automatic analyze of table "postgres.globaldicom.requests" system usage: CPU 0.15s/0.35u sec elapsed 31.13 sec

This isn't an isolated incident, I see many autovacuum log entries where only a
few tuples are removed from a very large table, and I can't figure out why it
was being vacuumed if that's all there was to be removed.

Re: Why does autovacuum clean fewer rows than I expect?

From
Tom Lane
Date:
Nick Cabatoff <nick.cabatoff@gmail.com> writes:
> I'm having trouble making sense of the these two autovacuum log entries.
> I'm running PostgreSQL 8.4.

8.4.what?

It'd probably be a good idea to monitor the counts in pg_stat_all_tables
for the troublesome table(s).  I suspect there are a whole lot of HOT
updates going on in this table, which'd account for autoanalyze activity
happening even though autovacuum finds few dead tuples to remove.  That
still doesn't directly explain physical table bloat, but it might be
a piece of the puzzle.

Also, have you had any database crashes in between these autovacuums?
I'm wondering if the WAL-replay issue fixed in commits 6f2aead1f and
siblings could have produced the bloat.

            regards, tom lane


Re: Why does autovacuum clean fewer rows than I expect?

From
Nick Cabatoff
Date:
Hi Tom,

Thanks for the quick reply.  8.4.9.  No db crashes, in fact not even a restart in May until the 28th.  I don't believe any HOT updates were occurring in that table: all updates modify one or more indexed fields (update status from pending to done, or update to store a stack trace and bump the time at which to retry the row into future).  We don't delete from the table, just insert and update.

During the time interval in question a bug was leading to an unusually high level of activity, both inserts and updates.   I'm thinking this is what was responsible for a huge amount of table and index bloat, but haven't yet connected all the dots.  Since I don't see autovacuum failing, or failing to run, it's a bit of a mystery.

I guess the next step is to attempt to reproduce the bug on a test system and see if we get the bloat.  I'll let you know what turns up.


On Thu, Jul 3, 2014 at 5:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nick Cabatoff <nick.cabatoff@gmail.com> writes:
> I'm having trouble making sense of the these two autovacuum log entries.
> I'm running PostgreSQL 8.4.

8.4.what?

It'd probably be a good idea to monitor the counts in pg_stat_all_tables
for the troublesome table(s).  I suspect there are a whole lot of HOT
updates going on in this table, which'd account for autoanalyze activity
happening even though autovacuum finds few dead tuples to remove.  That
still doesn't directly explain physical table bloat, but it might be
a piece of the puzzle.

Also, have you had any database crashes in between these autovacuums?
I'm wondering if the WAL-replay issue fixed in commits 6f2aead1f and
siblings could have produced the bloat.

                        regards, tom lane

Re: Why does autovacuum clean fewer rows than I expect?

From
Michael Paquier
Date:
On Fri, Jul 4, 2014 at 7:41 AM, Nick Cabatoff <nick.cabatoff@gmail.com> wrote:
> 8.4.9.
This has been released in 2011, so you are missing 3 years worth of
bug fixes. It may be a good idea to first update to 8.4.21 before
trying to work more on that (consider as well an upgrade as 8.4 will
be EOL soon).

Regards,
--
Michael


Re: Why does autovacuum clean fewer rows than I expect?

From
Nick Cabatoff
Date:
We've been running 9.2 on our later branches for a while now.  We're overdue to bump the older branches to a later 8.4 though, you're right.  Thanks for the reminder.


On Thu, Jul 3, 2014 at 7:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jul 4, 2014 at 7:41 AM, Nick Cabatoff <nick.cabatoff@gmail.com> wrote:
> 8.4.9.
This has been released in 2011, so you are missing 3 years worth of
bug fixes. It may be a good idea to first update to 8.4.21 before
trying to work more on that (consider as well an upgrade as 8.4 will
be EOL soon).

Regards,
--
Michael