Re: PG12 autovac issues - Mailing list pgsql-general

From: Michael Paquier
Subject: Re: PG12 autovac issues
Msg-id: 20200324052606.GE1836@paquier.xyz
In response to: Re: PG12 autovac issues (Andres Freund <andres@anarazel.de>)
List: pgsql-general
On Mon, Mar 23, 2020 at 01:00:51PM -0700, Andres Freund wrote:
> On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote:
>>> - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest database
>>>   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE relfrozenxid <> 0
>>>   ORDER BY age(relfrozenxid) DESC LIMIT 1;
>>
>> The VM has been trashed since, and I don't have that level of detail available
>> in the gathered stats unfortunately (and the available information I have is a
>> little bit messy, sorry for that).
>>
>>> - Oldest backend xmin
>>>   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE backend_xmin <> 0
>>>   ORDER BY age(backend_xmin) DESC LIMIT 3;
>>> - oldest replication xmin:
>>>   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication WHERE backend_xmin <> 0
>>>   ORDER BY age(backend_xmin) DESC LIMIT 3;
>>> - oldest slot xmin:
>>>   SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) FROM pg_replication_slots
>>>   WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
>>> - oldest prepared transaction (unfortunately xmin not available)
>>>   SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY age(transaction) LIMIT 3;
>>
>> I have output from this query:
>> https://github.com/OPMDG/check_pgactivity/blob/oldest_xmin/check_pgactivity#L5669-L5732
>> (which may be dumb).  Here are the non-null rows after 8AM GMT+1:

This setup does not use 2PC or logical slots (there is one physical
slot for a WAL archiver), but we got a dump of pg_stat_activity.
Gathering more info about the evolution of pg_database and pg_class is
on our list.
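
For the pg_database side of it, the kind of periodic snapshot we have
in mind would be something like this (just a sketch):
  SELECT datname, datfrozenxid, age(datfrozenxid)
    FROM pg_database ORDER BY age(datfrozenxid) DESC;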

> Could you share what the config of the server was?

Nothing really fancy:
- autovacuum_vacuum_cost_delay = 2ms (the default in v12, but we used
  it in v11 as well)
- autovacuum_naptime = 15s
- autovacuum_max_workers = 6
- log_autovacuum_min_duration = 0
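
We can also double-check the values effective at run time on the next
round with something like that:
  SELECT name, setting, unit FROM pg_settings
    WHERE name LIKE 'autovacuum%' OR name = 'log_autovacuum_min_duration';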

>>> In that case you'd likely see DEBUG1 output, right? Did you try running
>>> with that?
>>
>> That's unfortunately not an option, as while the issue is reproducible, it
>> happens after roughly 16h with an average of 6.5ktps, and we definitely don't
>> have enough disk space for that amount of logs.  We'll try next time with a
>> recompiled version with the DEBUG1 promoted to WARNING, and maybe other elogs.
>
> You wouldn't need to run it with DEBUG1 the whole time, you could just
> change the config and reload config once you hit the issue.

Sure, though that would reduce the window during which the cluster is
still up and running.  One option we have is to just increase the
elevel from DEBUG1 to WARNING for the log added in 2aa6e33, which
keeps the amount of logs down without losing information, and we have
no issue doing runs with custom patches.
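
For reference, the reload route would roughly be the following on the
running cluster, no restart needed:
  ALTER SYSTEM SET log_min_messages = 'debug1';
  SELECT pg_reload_conf();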

>> Not for now, because it's hard to be around for the ~90min interval the system
>> stays alive after autovacuum gets stuck, especially since it's a weekend bench.

Yeah, our room is very limited here.  I would likely not be able to do
that unless I got very lucky with my timezone.  Not sure about Julien.

>>> It sounds like what might be happening is that you have something
>>> holding back the "oldest needed transaction" horizon. Before
>>> 2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
>>> vacuums unable to increase the horizon, but afterwards they'll all
>>> immediately exit without any messages.
>>
>> The xmin horizon is increasing AFAICS.  Grepping the autovacuum logs, I can see
>> for the 08:00 AM UTC - 08:59 AM UTC interval the oldest xmin going from
>> 200069684 to 210363848.  E.g.:
>
> Which database is this on?

In this case it was the database used by the application, FWIW.  But I
can see from those logs that the xmin horizon kept increasing for all
the other databases as well, like postgres or template1, while this
set of aggressive jobs happened.

>> 266621:2020-03-14 08:49:59.945 UTC LOG:  automatic aggressive vacuum to prevent wraparound of table
>> "pg_toast.pg_toast_27712": index scans: 0
>> 266622-    pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped frozen
>> 266623-    tuples: 0 removed, 0 remain, 0 are dead but not yet removable, oldest xmin: 209635953
>> [...]
>
> Do you have any non-toast ones?

Yep, this includes catalogs and normal tables; based on the logs, all
the relations triggered aggressive and anti-wraparound jobs.

>>> I wonder if what might be happening is that we've somehow missed/failed
>>> to update relfrozenxid and/or datfrozenxid. If you manually vacuum some
>>> table in the oldest database, but that is *NOT* the oldest table itself,
>>> does the problem "resolve" itself?
>>
>> I'll also add pg_class snapshot for next time we run the bench.  I'm not sure
>> if we'll be able to catch the 1h45 interval when the system stays alive
>> after the issue though.
>
> Could you just script something to stop the benchmark once the disk is
> 90% full or so?

Hmm.  I think that this one is possible, still a bit tricky, as there
are already some alarms in place in this system.

> Did you see any errors / fatals around the time autovacuum stopped
> working?

Before things went rogue (we are not sure if autovacuum didn't launch
any workers or if the workers were spawned and exited early, as we did
not capture any worker information in pg_stat_activity), we saw a
bunch of aggressive anti-wraparound jobs.  Even after that, we have
traces in the logs of one autovacuum analyze happening at regular
intervals (every 17 minutes) on one single table, which is...  Er...
uncommon to say the least.
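
A snapshot of pg_stat_all_tables would help confirming that pattern on
the next run, something like that:
  SELECT relname, last_autovacuum, last_autoanalyze,
         autovacuum_count, autoanalyze_count
    FROM pg_stat_all_tables ORDER BY autoanalyze_count DESC LIMIT 10;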
--
Michael
