Re: BUG #13373: Imminent transaction id wrap around in spite of daily VACUUM - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #13373: Imminent transaction id wrap around in spite of daily VACUUM
Date
Msg-id CA+TgmoZdOQatWcg3SsuNdGPHZijk1CcdOa96DdvJP6273WddpQ@mail.gmail.com
Whole thread Raw
In response to BUG #13373: Imminent transaction id wrap around in spite of daily VACUUM  (ms@clickware.de)
List pgsql-bugs
On Thu, May 28, 2015 at 6:52 PM,  <ms@clickware.de> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13373
> Logged by:          Marc Schablewski
> Email address:      ms@clickware.de
> PostgreSQL version: 9.3.4
> Operating system:   SuSE Enterprise Linux 11.3
> Description:
>
> Today one of our servers started complaining about getting close to the
> transaction limit and that our database "mydb" should be VACUUMed to prevent
> shutdown and transaction ID wrap around, although the database is VACUUMed
> every night and autovacuum is enabled.
>
> 2015-05-28 01:42:04 CEST ::1(44588) 40015 WARNING:  database "mydb" must be
> vacuumed within 11000000 transactions
> 2015-05-28 01:42:04 CEST ::1(44588) 40015 HINT:  To avoid a database
> shutdown, execute a database-wide VACUUM in that database.
>         You might also need to commit or roll back old prepared
> transactions.
>
> Sadly this condition remained undetected until the automatic shutdown.
>
> Since the docs are a bit unclear on what kind of VACUUM one should run in
> this case, I started the database in single user mode and executed a regular
> VACUUM and a VACUUM FREEZE. Both commands ran for a while (~45 min) before
> returning and printing the same warning message about VACUUMing the
> database. Each time the remaining transaction count was reduced by one.
>
> As stated before, the database is VACUUMed once every night and autovacuum
> is enabled. We also do not use prepared transaction. So I wonder how the
> database got into this state.
>
> While analyzing the problem I noticed some other strange things about the
> database that might or might not be connected to our problem. There are
> still a lot (~100) of temporary tables left, even after closing all
> connections / shutting down the database.
>
> When I use this statement, taken from chapter 23.1.4. of the docs:
>
>         SELECT c.oid::regclass as table_name,
>                    greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age
>         FROM pg_class c
>         LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
>         WHERE c.relkind = 'r';
>
> all tables have an "age" of "3", except for most of the temp tables. Their
> "age" ranges from a couple of hundreds to millions:
>
>         ----
>          1: table_name = "pg_temp_3.temp_table0" (typeid = 2205, len = 4,
> typmod = -1, byval = t)
>          2: age = "2146483651"  (typeid = 23, len = 4, typmod = -1, byval =
> t)
>         ----
>          1: table_name = "pg_temp_48.temp_table1" (typeid = 2205, len = 4,
> typmod = -1, byval = t)
>          2: age = "901" (typeid = 23, len = 4, typmod = -1, byval = t)
>         ----
>
> The highest "age" is also the "age" of the database "mydb" (statement also
> taken from chapter 23.1.4.):
>
>                 SELECT datname, age(datfrozenxid) FROM pg_database;
>
>         ----
>          1: datname = "template1"       (typeid = 19, len = 64, typmod = -1,
> byval = f)
>          2: age = "50000001"    (typeid = 23, len = 4, typmod = -1, byval =
> t)
>         ----
>          1: datname = "template0"       (typeid = 19, len = 64, typmod = -1,
> byval = f)
>          2: age = "2068988610"  (typeid = 23, len = 4, typmod = -1, byval =
> t)
>         ----
>          1: datname = "postgres"        (typeid = 19, len = 64, typmod = -1,
> byval = f)
>          2: age = "111784253"   (typeid = 23, len = 4, typmod = -1, byval =
> t)
>         ----
>          1: datname = "mydb"     (typeid = 19, len = 64, typmod = -1, byval
> = f)
>          2: age = "2146483651"  (typeid = 23, len = 4, typmod = -1, byval =
> t)
>         ----
>
> I have to admit that I am not sure how to interpret these "age" values.
>
> Secondly, there are a lot of files in pg_clog. The oldest ones are going
> back to October 2014 and they seem to have been created continuously since
> then. The whole directory is about 512MB in size. The docs state that it
> should only be about 1/10th of that size with the default setting of
> autovacuum_freeze_max_age.
>
> All parameters relating to vacuum or autovacuum are unchanged in our
> postgresql.conf, i.e we are using their default values.
>
> The next thing I'll try is to drop all those temporary schemas and tables
> and doing another VACUUM afterwards. I suspect one of them might cause the
> issue, but I would be thankful for other ideas.

Have you checked for prepared transactions?

Dropping old temporary schemas (pg_temp_NNN) might be something to
try, although in theory autovacuum should have done that for you
automatically.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #13413: pg_stat_statements don't statistics "DEALLOCATE ALL" statements
Next
From: Norbert Kiam Maclang
Date:
Subject: postgresql core dump issue