Thread: template0 needing vacuum freeze?

template0 needing vacuum freeze?

From
Don Seiler
Date:
PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both over 50% towards TXID wraparound. I could vacuum template1 but couldn't vacuum template0 without first allowing connections. This is what it looked like before:

# SELECT datname
    , age(datfrozenxid)
    , current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
      datname       |    age     | current_setting
--------------------+------------+-----------------
 foo_db             | 1022106099 | 200000000
 template0          | 1000278345 | 200000000
 postgres           |     643729 | 200000000
 template1          |     643729 | 200000000
(4 rows)


I've since allowed connections and ran "vacuumdb --freeze" on it and then immediately disabled the connections to it again. But I'm curious how template0 would be growing in age like this. Even now I see the template0 age growing. I can say that these DB has previously been altered for locale changes as well.

I'm also running a long "vacuum freeze" on foo_db that will take a few days after seeing that autovacuum on a big table had been running on it since Feb 2 and making no progress, with over 850M dead tuples according to pg_stat_all_tables. I estimate 3-4 more days to go on that one. Once that's done I'll be scheduling manual vacuum jobs. Just wondering if that would somehow affect regular template0 cleanup though.

I don't see anything in postgres log related to template0 other than my manual interactions today.

--
Don Seiler
www.seiler.us

Re: template0 needing vacuum freeze?

From
Tom Lane
Date:
Don Seiler <don@seiler.us> writes:
> PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both
> over 50% towards TXID wraparound. I could vacuum template1 but couldn't
> vacuum template0 without first allowing connections. This is what it looked
> like before:

template0 shouldn't really need freezing, if it's unchanged since initdb,
but the autovacuum logic doesn't know that and will periodically scan it
anyway.  That should be pretty cheap (since that DB is small and there's
not really any work to do), so we haven't considered it to be something
to prevent --- especially since it is a good safety valve in case
someone does change template0.

So it's unsurprising that the freeze age increases until autovacuum
decides to do something about it.  I'm suspicious that your alert settings
are too aggressive and are notifying you before autovacuum kicks in.
You should *not* have had to do anything manual about this, unless you
have frobbed your autovac settings to the point of brokenness.

            regards, tom lane



Re: template0 needing vacuum freeze?

From
Don Seiler
Date:
On Sat, May 16, 2020 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So it's unsurprising that the freeze age increases until autovacuum
decides to do something about it.  I'm suspicious that your alert settings 
are too aggressive and are notifying you before autovacuum kicks in.
You should *not* have had to do anything manual about this, unless you
have frobbed your autovac settings to the point of brokenness.

Shouldn't autovacuum have kicked in when the age of a table reaches 200M (our autovacuum_freeze_max_age is left at that default)? I see other tables in our app DB triggering the autovacuum "to prevent wrap-around" when they reach 200M. That's what had me concerned to see template0 with an age over 1B and no autovacuum even trying to clean up for it.

Don.

--
Don Seiler
www.seiler.us

Re: template0 needing vacuum freeze?

From
Laurenz Albe
Date:
On Sat, 2020-05-16 at 12:19 -0500, Don Seiler wrote:
> PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both over 50%
> towards TXID wraparound. I could vacuum template1 but couldn't vacuum template0
> without first allowing connections. This is what it looked like before:
> 
> # SELECT datname
>     , age(datfrozenxid)
>     , current_setting('autovacuum_freeze_max_age')
> FROM pg_database
> ORDER BY 2 DESC;
>       datname       |    age     | current_setting
> --------------------+------------+-----------------
>  foo_db             | 1022106099 | 200000000
>  template0          | 1000278345 | 200000000
>  postgres           |     643729 | 200000000
>  template1          |     643729 | 200000000
> (4 rows)
> 
> I've since allowed connections and ran "vacuumdb --freeze" on it and then immediately
> disabled the connections to it again. But I'm curious how template0 would be growing
> in age like this. Even now I see the template0 age growing.

That is indeed strange.

Did you see any weird messages when you vacuumed "template0"?
Did "datfrozenxid" shrink after the operation?

"foo_db" seems to be the bigger problem.
Perhaps autovacuum never handled "template0" because it concluded (rightly) that
it has to deal with "foo_db" first.

> I can say that these DB
> has previously been altered for locale changes as well.

Would you care to explain that?  You changed "template0"?  How?

> I'm also running a long "vacuum freeze" on foo_db that will take a few days after
> seeing that autovacuum on a big table had been running on it since Feb 2 and making
> no progress, with over 850M dead tuples according to pg_stat_all_tables.
> I estimate 3-4 more days to go on that one. Once that's done I'll be scheduling
> manual vacuum jobs. Just wondering if that would somehow affect regular template0
> cleanup though.

As I said, perhaps.

What are your non-default autovacuum settings?  Perhaps you should speed up autovacuum
by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and by increasing
"maintenance_work_mem".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: template0 needing vacuum freeze?

From
Don Seiler
Date:
On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Did you see any weird messages when you vacuumed "template0"?

No.
 
Did "datfrozenxid" shrink after the operation?

"foo_db" seems to be the bigger problem. 
Perhaps autovacuum never handled "template0" because it concluded (rightly) that
it has to deal with "foo_db" first.

Yes this DB had a table in it that had been autovacuuming since Feb 2. It's age is half way to wraparound so I'm in the middle of a manual VACUUM FREEZE on it. I'd be interested in knowing if that prevents template0 from autovacuuming itself. There are no other autovacuum jobs running.
 

> I can say that these DB
> has previously been altered for locale changes as well.

Would you care to explain that?  You changed "template0"?  How?

It was changed before my time here to change the encoding from LATIN1 to UTF manually. One of the fun treats I'm working to correct as I also prepare these for upgrade to PG12.

What are your non-default autovacuum settings?  Perhaps you should speed up autovacuum
by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and by increasing
"maintenance_work_mem". 

All autovacuum settings on this DB are default. Cost delay is at the default 20ms. maintenance_work_mem I've already increased to 512MB (this VM has 8GB RAM).

--
Don Seiler
www.seiler.us

Re: template0 needing vacuum freeze?

From
Tom Lane
Date:
Don Seiler <don@seiler.us> writes:
> On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
> wrote:
>> Perhaps autovacuum never handled "template0" because it concluded (rightly)
>> that it has to deal with "foo_db" first.

> Yes this DB had a table in it that had been autovacuuming since Feb 2. It's
> age is half way to wraparound so I'm in the middle of a manual VACUUM
> FREEZE on it. I'd be interested in knowing if that prevents template0 from
> autovacuuming itself. There are no other autovacuum jobs running.

I think we did put in a change that would prevent any one database from
completely consuming autovacuum's attention, even in wraparound-hazard
situations.  Don't recall when.

Do you have an idea why autovac was failing to clear the issue on that one
problem table, though?

            regards, tom lane



Re: template0 needing vacuum freeze?

From
Don Seiler
Date:
On Mon, May 18, 2020 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have an idea why autovac was failing to clear the issue on that one
problem table, though?

Before I intervened, the maintenance_work_mem was only 16MB (they had encoded and carried over PG 8 defaults in the chef recipe). I bumped it to 512MB before kicking off my freezeThat's the big factor I can think of. This is a huge table obviously as well. By the time it caught my attention there were 850M dead tuples to be cleaned up. My VACUUM FREEZE has been running just just about 5 days and is probably half way done. We shouldn't be at risk of hitting wraparound though (only 52% there).

Don.

--
Don Seiler
www.seiler.us

Re: template0 needing vacuum freeze?

From
Adrian Klaver
Date:
On 5/18/20 6:51 PM, Tom Lane wrote:
> Don Seiler <don@seiler.us> writes:
>> On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
>> wrote:
>>> Perhaps autovacuum never handled "template0" because it concluded (rightly)
>>> that it has to deal with "foo_db" first.
> 
>> Yes this DB had a table in it that had been autovacuuming since Feb 2. It's
>> age is half way to wraparound so I'm in the middle of a manual VACUUM
>> FREEZE on it. I'd be interested in knowing if that prevents template0 from
>> autovacuuming itself. There are no other autovacuum jobs running.
> 
> I think we did put in a change that would prevent any one database from
> completely consuming autovacuum's attention, even in wraparound-hazard
> situations.  Don't recall when.

This?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dd9ac7d5d80608a640bb82cffb6a805ce84cf112

which I believe is only for 12 and is in the 12.3 release.


> 
> Do you have an idea why autovac was failing to clear the issue on that one
> problem table, though?
> 
>             regards, tom lane
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com