Thread: Unexplained Major Vacuum Archive Activity During Vacuum

Unexplained Major Vacuum Archive Activity During Vacuum

From
Shaun Thomas
Date:
Hey guys,

I don't notice any errors, which just makes this even more strange. But
after weeks of operating normally, our 10pm manual vacuum job generated
transaction logs basically equivalent to 3/4 of our database, and I
can't find any explanation. This amount is about 6x higher than usual.

Before I go crazy and tear the box apart, does anyone know of some
internal change that may cause intermittent marking of pages to increase
without related database activity?

Or did I just vacuum a database with a corrupt CPU or piece of RAM? (Let
me say again, I see no errors anywhere in the database logs.)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
"Kevin Grittner"
Date:
Shaun Thomas wrote:

> I don't notice any errors, which just makes this even more strange.
> But after weeks of operating normally, our 10pm manual vacuum job
> generated transaction logs basically equivalent to 3/4 of our
> database, and I can't find any explanation. This amount is about 6x
> higher than usual.
>
> Before I go crazy and tear the box apart, does anyone know of some
> internal change that may cause intermittent marking of pages to
> increase without related database activity?

Yes.

Did you bulk load this data (possibly through restoring pg_dump
output)? If so, and you have not explicitly run VACUUM FREEZE
afterward, the vacuum noticed that it was time to freeze all of these
tuples. When I use pg_dump output to create a database, I always
VACUUM FREEZE ANALYZE immediately afterward -- before I start
archiving.

Just be glad you got it with a manual vacuum during off-peak hours,
rather than having this kick in via autovacuum during peak OLTP load.

> Or did I just vacuum a database with a corrupt CPU or piece of RAM?
> (Let me say again, I see no errors anywhere in the database logs.)

You haven't mentioned anything that should be taken as evidence of
corruption or any unusual behavior on the part of PostgreSQL.

-Kevin


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Shaun Thomas
Date:
On 11/01/2012 09:18 AM, Kevin Grittner wrote:

> Did you bulk load this data (possibly through restoring pg_dump
> output)? If so, and you have not explicitly run VACUUM FREEZE
> afterward, the vacuum noticed that it was time to freeze all of these
> tuples.

Ok, that might explain it, then. We did in fact just upgrade from 8.2 to
9.1 about 2 weeks ago. And no, I didn't do a VACUUM FREEZE, just a
VACUUM ANALYZE to make sure stats were ready. I'm still a little
uncertain what the tangible difference is between a FREEZE and a regular
VACUUM. I get that it sets freeze_min_age to 0, but why does that even
matter? Is 50M out of 2B not good enough? Every VACUUM knocks the
counter back to the minimum, so I guess I don't get the justification
for magically forcing the minimum to be lower.

Of course, all that page marking would definitely produce a butt-ton of
transaction logs. So at least that makes sense. :)

Thanks, Keven!

> You haven't mentioned anything that should be taken as evidence of
> corruption or any unusual behavior on the part of PostgreSQL.

No, but I was a little freaked out by the unexplained activity.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
"Kevin Grittner"
Date:
Shaun Thomas wrote:

> Ok, that might explain it, then. We did in fact just upgrade from 8.2 to
> 9.1 about 2 weeks ago. And no, I didn't do a VACUUM FREEZE, just a
> VACUUM ANALYZE to make sure stats were ready. I'm still a little
> uncertain what the tangible difference is between a FREEZE and a regular
> VACUUM. I get that it sets freeze_min_age to 0, but why does that even
> matter? Is 50M out of 2B not good enough? Every VACUUM knocks the
> counter back to the minimum, so I guess I don't get the justification
> for magically forcing the minimum to be lower.

You might find this section of the docs illuminating:

http://www.postgresql.org/docs/9.2/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

-Kevin


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Shaun Thomas
Date:
On 11/01/2012 10:28 AM, Kevin Grittner wrote:

> http://www.postgresql.org/docs/9.2/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

I read that several times, and I still don't get how it applies to this
case. Based on my past experience with 8.2, and my understanding of 9.1,
I moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day
freeze. And the default for vacuum_freeze_table_age is 150M, which I
hadn't changed.

So here's what I don't get:

* A manual vacuum vacuums a table.
* If the age of that table is > 150M (by default), also freeze.
* Counters are reset to autovacuum_freeze_min_age or 0... eh.

If that's the case, the freeze bit shouldn't have affected us, because I
already basically crippled autovacuum from freezing anything. The
nightly vacuum would freeze because we do more than 150M transactions
per day.

So with the new settings, we've been effectively doing a VACUUM FREEZE
every night.  And this has been going on for weeks without issue.

But last night? Total pandemonium. I suppose it could be related to the
market being closed for 2 extra days, but we kept running our accounting
jobs. The volume is just very suspicious.

Either I'm totally misunderstanding a fundamental issue, or something
still seems fishy here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Alban Hertroys
Date:
On 1 November 2012 17:19, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 11/01/2012 10:28 AM, Kevin Grittner wrote:
> Based on my past experience with 8.2, and my understanding of 9.1, I
> moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day
> freeze. And the default for vacuum_freeze_table_age is 150M, which I hadn't
> changed.

Instead of attempting to postpone freeze until beyond the life
expectancy of our universe, what you probably should have done is
vacuum more often so that vacuum has less work to do.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Shaun Thomas
Date:
On 11/01/2012 11:40 AM, Alban Hertroys wrote:

> Instead of attempting to postpone freeze until beyond the life
> expectancy of our universe, what you probably should have done is
> vacuum more often so that vacuum has less work to do.

More often than every night, with autovacuum running in the background
to get regular stuff that happens during the day? 650M transactions is 3
or 4 days for us. That's hardly the lifetime of the universe. And since
I didn't modify vacuum_freeze_table_age, any table vacuumed after 150M
transactions is given a vacuum freeze anyway. No harm done.

It's my understanding you *don't* want to freeze excessively. I think
once every day is bad enough, honestly.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Alban Hertroys
Date:
On 1 Nov 2012, at 17:44, Shaun Thomas wrote:

> On 11/01/2012 11:40 AM, Alban Hertroys wrote:
>
>> Instead of attempting to postpone freeze until beyond the life
>> expectancy of our universe, what you probably should have done is
>> vacuum more often so that vacuum has less work to do.
>
> More often than every night, with autovacuum running in the background to get regular stuff that happens during the
day?650M transactions is 3 or 4 days for us. That's hardly the lifetime of the universe. And since I didn't modify
vacuum_freeze_table_age,any table vacuumed after 150M transactions is given a vacuum freeze anyway. No harm done. 

150M database transactions a day sounds excessive, is there no way to reduce that number?

That aside, 650M transactions in 3 at 4 days is not equal to 150M transactions a day. It can be quite a few more. Since
youmentioned that the market halted for 2 days there were probably a lot more transactions waiting than usual; not just
piledup work, but lots of attempts at corrections as well. It wouldn't surprise me if you went over 650M transactions
thatday. 

> It's my understanding you *don't* want to freeze excessively. I think once every day is bad enough, honestly.


That's not what I was suggesting. I wasn't talking about vacuum freeze but normal autovacuum with more aggressive
parameters.
That should handle transaction wrap-around automatically when it looks like you're getting close to the transaction
wrap-aroundid. As per the docs in 8.2, vacuum freeze was deprecated back then already. Knowing the devs a bit, there
wasa good reason to do so. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Shaun Thomas
Date:
On 11/02/2012 03:08 AM, Alban Hertroys wrote:

> 150M database transactions a day sounds excessive, is there no way to
> reduce that number?

I wish. 150M is actually a conservative estimate. In fact, we average
141M, but have been as high as 270M. It's all market dependent.

Here's a quick look at the last two weeks of activity:

  reading_date | transactions
--------------+--------------
  2012-10-19   |    188548680
  2012-10-20   |     16722600
  2012-10-21   |     18326700
  2012-10-22   |    183141540
  2012-10-23   |    183269580
  2012-10-24   |    177945540
  2012-10-25   |    180901740
  2012-10-26   |    170482920
  2012-10-27   |     16103820
  2012-10-28   |     15026340
  2012-10-29   |     27431160
  2012-10-30   |     20299500
  2012-10-31   |    165263760
  2012-11-01   |    175540020

So you can see that even in the "off" days, we handle over 15M
transactions per day. Monday and Tuesday were admittedly slow, but that
means there was even less reason for VACUUM to go bonkers. We run it
*every night*. Twice, in fact. Once after the main part of the day is
done, and once after a lot of our nightly reconciliation scripts run
because of all the rows they touch.

We can probably disable that second vacuum now that we are using
autovacuum, but the first one has to stay.

> It wouldn't surprise me if you went over 650M transactions that day.

Yeah, and I thought that might be the case too. Until I looked at our
graph of transaction totals for the week. The last vacuum that would
have caused an automatic freeze happened on the 26th. So by the time the
"bad" vacuum happened on the 31st, we'd accumulated about 244M
transactions. Sure, that's quite a few, but not 6x more than usual, as
the amount of transaction logs generated during the vacuum might suggest.

> That's not what I was suggesting. I wasn't talking about vacuum
> freeze but normal autovacuum with more aggressive parameters.

It's already pretty aggressive. I cut autovacuum_vacuum_scale_factor and
autovacuum_analyze_scale_factor in half from the defaults, and tweaked a
couple tables that were getting vacuumed or analyzed every minute.

What I don't want is for it to be so aggressive that it decides to
freeze a 50M row table in the middle of an active trading day. That's
why the 10PM vacuum stays. If I just depended on
autovacuum_freeze_max_age, that would be a distinct possibility.

> Vacuum freeze was deprecated back then already.
> Knowing the devs a bit, there was a good reason to do so.

VACUUM FREEZE is no longer deprecated according to the 9.1 and 9.2 docs.
I've never used it myself, though. Besides that, autovacuum does a
FREEZE automatically if you go over autovacuum_freeze_max_age. Due to
the free space map, they also added vacuum_freeze_table_age, which
causes a regular VACUUM to be promoted to a VACUUM FREEZE if the age is
higher than that value. By default, that's 150M; just about perfect for
our system, honestly.

In pre-8.4, before the free space map, you don't need FREEZE at all. A
regular vacuum could reset all XID counters down to some minimum value.
That minimum was vacuum_freeze_min_age. But because of the visibility
map, and regular vacuums using it, older XIDs could be missed during a
regular VACUUM because it wasn't launched by the autovacuum thread. They
introduced vacuum_freeze_table_age so you could get the old
functionality back (always resetting down to vacuum_freeze_min_age
during any VACUUM) if so desired.

Basically VACUUM FREEZE doesn't mean what it once did. The free space
map made it a necessity because VACUUM doesn't always recliam XIDs anymore.

At least, that's the impression I got from the docs. I could be way off.
It bears out, though. I've got plenty of tables in our stage setup where
I'll vacuum them after setting vacuum_freeze_min_age to 100 or
something, and they just keep climbing. VACUUM FREEZE *always* resets
the value, though.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Unexplained Major Vacuum Archive Activity During Vacuum

From
Andres Freund
Date:
On Thursday, November 01, 2012 05:40:23 PM Alban Hertroys wrote:
> On 1 November 2012 17:19, Shaun Thomas <sthomas@optionshouse.com> wrote:
> > On 11/01/2012 10:28 AM, Kevin Grittner wrote:
> > Based on my past experience with 8.2, and my understanding of 9.1, I
> > moved autovacuum_freeze_max_age up to 650M so we'd never get a mid-day
> > freeze. And the default for vacuum_freeze_table_age is 150M, which I
> > hadn't changed.
>
> Instead of attempting to postpone freeze until beyond the life
> expectancy of our universe, what you probably should have done is
> vacuum more often so that vacuum has less work to do.

Thats not really possible with freeze vacuums. When the table is older than
the applicable freeze age its scanned completely instead of only the parts
that are sensible according to the vacuum map. The more expensive scans really
only happen when they make sense...

Andres
--
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services