Thread: datfrozenxid not dropping after vacuum

datfrozenxid not dropping after vacuum

From: Matthew Tice
Hi,

Starting this morning at 0830 local time I noticed that the age of my
datfrozenxid started moving past the `autovacuum_freeze_max_age` value
of 200000000.  When we've encountered this in the past, the solution
has been one of the following:

1. It's related to an error similar to
```
found xmin 2675436435 from before relfrozenxid 321165377
```
where the solution has been to move the `pg_internal.init` file out of
the way and let PostgreSQL recreate it.  Or;

2. A long-running transaction.  Typically I'll just find the `idle in
transaction` sessions whose `query_start` is around the time my alarm
went off notifying me that `datfrozenxid` had breached
`autovacuum_freeze_max_age`, using a query similar to
```
SELECT pid, query_start, datname, usename, state, backend_xmin,
age(backend_xmin)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

3. The autovacuum process seemed to be "stuck" on a particular table,
in which case we would kill the pid of the autovacuum worker (a rough
sketch of what we run follows this list).
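
For #3, this is roughly the kind of thing we run (a sketch only; the
`<pid>` is whatever the first query reports as the stuck worker):
```
-- spot autovacuum workers and how long they've been running
SELECT pid, query_start, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';

-- then, for the stuck worker:
-- SELECT pg_cancel_backend(<pid>);     -- polite cancel first
-- SELECT pg_terminate_backend(<pid>);  -- last resort
```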

The problem is that none of these approaches has dropped
`datfrozenxid` back down, and there is one specific database in this
cluster that's holding onto it.

Using these queries from CrunchyData:

# Show oldest current xid
WITH max_age AS (
    SELECT 2000000000 as max_old_xid
        , setting AS autovacuum_freeze_max_age
        FROM pg_catalog.pg_settings
        WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
    SELECT datname
        , m.max_old_xid::int
        , m.autovacuum_freeze_max_age::int
        , age(d.datfrozenxid) AS oldest_current_xid
    FROM pg_catalog.pg_database d
    JOIN max_age m ON (true)
    WHERE d.datallowconn )
SELECT max(oldest_current_xid) AS oldest_current_xid
    , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
    , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats;

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          230935699 |                         12 |                               115

# Top 8 individual databases
SELECT datname
    , age(datfrozenxid)
    , current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC LIMIT 8;

       datname       |    age    | current_setting
---------------------+-----------+-----------------
 siteservice         | 230935699 | 200000000
 coupon-ws           | 217613246 | 200000000
 contextchangestore  | 211343280 | 200000000
 template0           | 210351639 | 200000000
 productmanager      | 207876167 | 200000000
 rhistory            | 207876167 | 200000000
 smsservice          | 207876167 | 200000000
 techservice         | 205432524 | 200000000

That top database, `siteservice`, is the "problematic" one: a manual
vacuum won't coerce it into freeing up the `datfrozenxid`.
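
For reference, the kind of manual vacuum I've been running against it
is roughly along the lines of (a database-wide freeze):
```
\c siteservice
VACUUM (FREEZE, VERBOSE);
```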

Looking at the tables in that database:

# SELECT c.oid::regclass
    , age(c.relfrozenxid)
    , pg_size_pretty(pg_total_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n on c.relnamespace = n.oid
WHERE relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC LIMIT 10;

       oid         |    age    | pg_size_pretty
--------------------+-----------+----------------
 pg_database        | 230935699 | 4264 kB
 pg_proc            |  93543215 | 976 kB
 pg_collation       |  93543215 | 560 kB
 pg_attribute       |  93543215 | 600 kB
 pg_shdepend        |  59515320 | 15 MB
 pg_statistic       |  53828900 | 464 kB
 pg_subscription    |  53172718 | 16 kB
 pg_pltemplate      |  53172718 | 56 kB
 pg_authid          |  53172718 | 8616 kB
 pg_db_role_setting |  53172718 | 64 kB

I thought maybe it had to do with my replication slots somehow:

# select slot_name, slot_type, database, active, catalog_xmin,
    restart_lsn, confirmed_flush_lsn
  from pg_replication_slots;

        slot_name         | slot_type |        database        | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
--------------------------+-----------+------------------------+--------+--------------+---------------+---------------------
 dbs1db02                 | physical  |                        | f      |              |               |
 dbs1db01                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs1db03                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs2db01                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 debezium_cmanager        | logical   | campaign-manager       | t      |   2152258063 | 4D25/A421A6C8 | 4D25/ABC18C88
 debezium_rservice        | logical   | retail-content-service | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A6105DF8
 debezium_partnerservice  | logical   | partnerservice         | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A5446630
 dbs1_dr                  | physical  |                        | f      |              |               |

So I either restarted the physical standbys or restarted the logical
connections (Debezium -> Kafka).
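
A rough way to see whether a slot's xmin horizon might be what's
pinning things (the xmin / catalog_xmin columns wrapped in age(); a
sketch, nothing authoritative):
```
SELECT slot_name, slot_type, active
    , age(xmin) AS xmin_age
    , age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY 5 DESC NULLS LAST;
```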

I'm also working on a:
# vacuumdb --all --jobs=10 --echo --analyze

But this is still running (and honestly I'm not really hopeful it's
going to do the trick).

My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
standby databases have been patched to 10.17.

Any thoughts on this?

Thanks,
Matt



Re: datfrozenxid not dropping after vacuum

From: Alvaro Herrera
On 2021-Sep-01, Matthew Tice wrote:

[ problem table is pg_database ]

> My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> standby databases have been patched to 10.17.

Hmm, I think there was a bug in the early 10.x versions where advancing
the xid age of shared tables would not work correctly for some reason ...
Ah yes, this was fixed in 10.5, a mere three years ago:

Author: Andres Freund <andres@anarazel.de>
Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700

    Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
    
    When vacuum processes a relation it uses the corresponding relcache
    entry's relfrozenxid / relminmxid as a cutoff for when to remove
    tuples etc. Unfortunately for nailed relations (i.e. critical system
    catalogs) bugs could frequently lead to the corresponding relcache
    entry being stale.
    
    This set of bugs could cause actual data corruption as vacuum would
    potentially not remove the correct row versions, potentially reviving
    them at a later point.  After 699bf7d05c some corruptions in this vein
    were prevented, but the additional error checks could also trigger
    spuriously. Examples of such errors are:
      ERROR: found xmin ... from before relfrozenxid ...
    and
      ERROR: found multixact ... from before relminmxid ...
    To be caused by this bug the errors have to occur on system catalog
    tables.
    
    The two bugs are:
    
    1) Invalidations for nailed relations were ignored, based on the
       theory that the relcache entry for such tables doesn't
       change. Which is largely true, except for fields like relfrozenxid
       etc.  This means that changes to relations vacuumed in other
       sessions weren't picked up by already existing sessions.  Luckily
       autovacuum doesn't have particularly longrunning sessions.
    
    2) For shared *and* nailed relations, the shared relcache init file
       was never invalidated while running.  That means that for such
       tables (e.g. pg_authid, pg_database) it's not just already existing
       sessions that are affected, but even new connections are as well.
       That explains why the reports usually were about pg_authid et. al.
    
    To fix 1), revalidate the rd_rel portion of a relcache entry when
    invalid. This implies a bit of extra complexity to deal with
    bootstrapping, but it's not too bad.  The fix for 2) is simpler,
    simply always remove both the shared and local init files.
    
    Author: Andres Freund
    Reviewed-By: Alvaro Herrera
    Discussion:
        https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
        https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
        https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
        https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
    Backpatch: 9.3-


-- 
Álvaro Herrera           39°49'30"S 73°17'W
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)



Re: datfrozenxid not dropping after vacuum

From: Matthew Tice
Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month -
but at this point I don't think I'm going to make it.

Other than patching, is there a workaround?  For example, in #2 above:
>The fix for 2) is simpler,
>    simply always remove both the shared and local init files.

I'm not familiar with the difference between the 'shared' and 'local'
init files (I'd imagine the one I referenced in my original post is a
'local' one)?



Thanks!

Matt




Re: datfrozenxid not dropping after vacuum

From: Matthew Tice
Interestingly enough, I hopped on the database system this morning and
found that the age of `datfrozenxid` had dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact).

Looking through the Postgresql logs I don't see anything standing out
at that time.

I still plan on patching to 10.17 tonight.

Matt




Re: datfrozenxid not dropping after vacuum

From: Alvaro Herrera
On 2021-Sep-01, Matthew Tice wrote:

> Hi Alvaro, thanks for the quick reply.

Hi.  Glad to hear that your problem is now behind you.

> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
> 
> Other than patching, is there a work around?

Hm, in my previous reply I had written a suggestion to vacuum
pg_database in the offending database after deleting the
pg_internal.init file, but evidently I edited it out before sending.
(Probably because I wasn't sure whether you need to delete the file,
connect, vacuum, or rather connect, delete the file, vacuum.)

> For example, in #2 above:
> >The fix for 2) is simpler,
> >    simply always remove both the shared and local init files.
> 
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?

The global file is in the global/ subdirectory of the data directory,
and the "local" ones are each in the corresponding database directory:

$ cd $PGDATA
$ find . -name pg_internal.init
./base/12758/pg_internal.init
./base/46212/pg_internal.init
./global/pg_internal.init

$ psql -c "select oid, datname from pg_database"
  oid  |  datname   
-------+------------
 12757 | postgres
 12758 | alvherre
     1 | template1
 12756 | template0
 46212 | regression
(5 rows)

So in the above there are cache files for databases regression and
alvherre, plus the global one.
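
So the workaround I had in mind, sketched out (untested, and per the
above I'm not certain whether the file removal should happen before or
after you open the connection), would be something like:

$ cd $PGDATA
$ rm global/pg_internal.init base/<oid-of-siteservice>/pg_internal.init
$ psql -d siteservice -c "VACUUM FREEZE pg_catalog.pg_database;"

where <oid-of-siteservice> is whatever pg_database reports for that
database.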

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)