Thread: BUG #11264: Auto vacuum wraparound job blocking everything
The following bug has been logged on the website:

Bug reference:      11264
Logged by:          Dinesh Bhandary
Email address:      dbhandary@switchfly.com
PostgreSQL version: 9.3.5
Operating system:   CentOS 6.5

Description:

We recently upgraded to PostgreSQL 9.3.5 from 9.1 using pg_upgrade. We are
running streaming replication. The database was functioning without any
issues until today, when we discovered an autovacuum wraparound job on one
of our tables that was blocking everything. There were no waiting jobs when
we queried pg_stat_activity, but incoming requests would just hang, so the
database was unusable. We tried to disable autovacuum for the table in
question and tried to kill the autovacuum job, but that did not work. We
also had pg_dump running for several hours against that particular table
while the autovacuum was running; we tried killing the pg_dump job, but it
did not succeed. We eventually had to force-shut down the database, disable
autovacuum, and increase the vacuum freeze parameter so that the autovacuum
wraparound job won't trigger until we figure out what the issue is. I am
including debug traces here. It seems like autovacuum is running in an
infinite loop. Will be happy to provide more info if necessary.

AUTOVACUUM:

(gdb) bt
#0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
#1  0x000000000075ad7a in pg_usleep ()
#2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
#3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
#4  0x000000000057d036 in lazy_vacuum_rel ()
#5  0x000000000057b346 in ?? ()
#6  0x000000000057b67c in vacuum ()
#7  0x000000000060cf8a in ?? ()
#8  0x000000000060d426 in ?? ()
#9  0x000000000060d4f6 in StartAutoVacWorker ()
#10 0x0000000000617d42 in ?? ()
#11 <signal handler called>
#12 0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
#13 0x00000000006191fd in PostmasterMain ()
#14 0x00000000005b5e60 in main ()
(gdb)

pg_stat_activity row for the autovacuum worker:

 16423 | jetblue | 21972 | 10 | postgres | | | | | 2014-08-24 18:39:38.134514+00 | 2014-08-24 18:39:38.161612+00 | 2014-08-24 18:39:38.161612+00 | 2014-08-24 18:39:38.161614+00 | f | active | autovacuum: VACUUM settings.room_contract_service_code (to prevent wraparound)

COPY:

jetblue=# select * from pg_stat_activity where query ilike '%copy%';

 datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | query
-------+---------+-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+---------+-------+-------
 16423 | jetblue | 30246 | 20350 | m84 | pg_dump | | | -1 | 2014-08-25 03:48:59.807689+00 | 2014-08-25 03:48:59.810871+00 | 2014-08-25 04:26:14.95753+00 | 2014-08-25 04:26:14.957534+00 | f | active | COPY settings.room_contract_service_code (room_contract_service_code, room_contract_service_description, room_contract_service_code_id, supplierid, inactive) TO stdout;
 16423 | jetblue | 31999 | 20350 | m84 | | 10.33.11.11 | | 40735 | 2014-08-25 10:38:02.944083+00 | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25 10:38:22.805861+00 | f | active | COPY settings.room_contract_service_code (room_contract_service_code, room_contract_service_description, room_contract_service_code_id, supplierid, inactive) TO stdout;

(gdb) bt
#0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
#1  0x0000000000609397 in PGSemaphoreLock ()
#2  0x000000000064e871 in LWLockAcquire ()
#3  0x000000000047e1a9 in ?? ()
#4  0x000000000047e66a in ?? ()
#5  0x000000000047f0a6 in heap_getnext ()
#6  0x00000000005343e6 in ?? ()
#7  0x000000000053521a in DoCopy ()
#8  0x000000000066112d in standard_ProcessUtility ()
#9  0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so
#10 0x000000000065db97 in ?? ()
#11 0x000000000065eb2d in ?? ()
#12 0x000000000065f223 in PortalRun ()
#13 0x000000000065b91e in ?? ()
#14 0x000000000065d048 in PostgresMain ()
#15 0x0000000000619c36 in PostmasterMain ()
#16 0x00000000005b5e60 in main ()

SELECTS:

datid            | 16423
datname          | jetblue
pid              | 15599
usesysid         | 20354
usename          | developers
application_name |
client_addr      | 10.33.11.11
client_hostname  |
client_port      | 60741
backend_start    | 2014-08-25 20:20:38.985486+00
xact_start       | 2014-08-25 20:39:13.426937+00
query_start      | 2014-08-25 20:39:13.497362+00
state_change     | 2014-08-25 20:39:13.497365+00
waiting          | f
state            | active
query            | SELECT room_contract_service_code, room_contract_service_description, room_contract_service_code_id
                 | FROM room_contract_service_code
                 | WHERE room_contract_service_code.supplierid = $1
                 | AND NOT inactive
                 | ORDER by room_contract_service_description

(gdb) bt
#0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
#1  0x0000000000609397 in PGSemaphoreLock ()
#2  0x000000000064e871 in LWLockAcquire ()
#3  0x000000000048d05d in index_fetch_heap ()
#4  0x000000000048d22e in index_getnext ()
#5  0x0000000000595e96 in ?? ()
#6  0x000000000058aabe in ExecScan ()
#7  0x0000000000583c08 in ExecProcNode ()
#8  0x000000000059c4e9 in ExecSort ()
#9  0x0000000000583b28 in ExecProcNode ()
#10 0x0000000000582a42 in standard_ExecutorRun ()
#11 0x00007fb96c4df4db in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so
#12 0x000000000065df67 in ?? ()
#13 0x000000000065f191 in PortalRun ()
#14 0x000000000065d3ed in PostgresMain ()
#15 0x0000000000619c36 in PostmasterMain ()
#16 0x00000000005b5e60 in main ()
(gdb)
dbhandary@switchfly.com wrote:

> I am including debug trace here. It seems like auto vacuum is running in an
> infinite loop. Will be happy to provide more info if necessary.
>
> (gdb) bt
> #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000075ad7a in pg_usleep ()
> #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4  0x000000000057d036 in lazy_vacuum_rel ()

Can you please provide pg_controldata output?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Alvaro,

Here is the output of pg_controldata. We also looked at the latest 9.3.5
fix for pg_upgrade and ran the query to check if we have to remove
$PGDATA/pg_multixact/offsets/0000, but the query returned false.

pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           6037642065780369403
Database cluster state:               in production
pg_control last modified:             Tue 26 Aug 2014 02:40:19 PM UTC
Latest checkpoint location:           1054/17E0BE30
Prior checkpoint location:            1054/15B2DEB8
Latest checkpoint's REDO location:    1054/16EC6028
Latest checkpoint's REDO WAL file:    000000010000105400000016
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1004399358
Latest checkpoint's NextOID:          1303270531
Latest checkpoint's NextMultiXactId:  23431
Latest checkpoint's NextMultiOffset:  5678
Latest checkpoint's oldestXID:        801997459
Latest checkpoint's oldestXID's DB:   16428
Latest checkpoint's oldestActiveXID:  1004399358
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16423
Time of latest checkpoint:            Tue 26 Aug 2014 02:37:48 PM UTC
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      1024
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

Thanks
Dinesh

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: Monday, August 25, 2014 9:03 PM
To: Dinesh Bhandary
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #11264: Auto vacuum wraparound job blocking everything

dbhandary@switchfly.com wrote:

> I am including debug trace here. It seems like auto vacuum is running
> in an infinite loop. Will be happy to provide more info if necessary.
>
> (gdb) bt
> #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000075ad7a in pg_usleep ()
> #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4  0x000000000057d036 in lazy_vacuum_rel ()

Can you please provide pg_controldata output?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
dbhandary@switchfly.com wrote:

> We recently upgraded to postgressql 9.3.5 from postgres 9.1 using
> pg_upgrade. We are running streaming replication. DB was functioning without
> any issues till today where we discovered an auto vacuum wraparound job on
> one of our tables which was blocking everything. There were no waiting jobs
> when we queried pg_stat_activity, but incoming request would just hang, so
> database was unusable.

Can you please paste the output of

select oid::regclass, relfrozenxid, relminmxid from pg_class
  where relname = 'room_contract_service_code';
select datname, datfrozenxid, datminmxid from pg_database
  where datname = 'jetblue';

> (gdb) bt
> #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000075ad7a in pg_usleep ()
> #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4  0x000000000057d036 in lazy_vacuum_rel ()

This stall in GetMultiXactIdMembers should only occur when somebody is
writing the multixact immediately following. It certainly should not
sleep for long. I'm not sure what's happening here, but this is probably
where the problem is. While waiting, it holds the buffer content lock
in exclusive mode.

Can you install debug symbols and pageinspect? I'm curious why this is
stalling. If the tuple has a FOR SHARE or FOR UPDATE marking from before
the upgrade, it shouldn't be inquiring about the multixact members at all.

I'm curious about

Latest checkpoint's NextMultiXactId:  23431
Latest checkpoint's NextMultiOffset:  5678
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16423

It says the oldest multi is 1, so the database should not have any values
between 1 and 23431 that correspond to pg_upgrade'd multixact values ...
so what is going on here? Unless the recent mucking with pg_upgrade to
handle multixacts got something wrong.
> 10.33.11.11 | | 40735 | 2014-08-25 10:38:02.944083+00
> | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25
> 10:38:22.805861+00 | f | active | COPY
> settings.room_contract_service_code (room_contract_service_code,
> room_contract_service_description,
> room_contract_service_code_id, supplierid, inactive) TO stdout;
>
> (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000047e1a9 in ?? ()
> #4  0x000000000047e66a in ?? ()
> #5  0x000000000047f0a6 in heap_getnext ()
> #6  0x00000000005343e6 in ?? ()
> #7  0x000000000053521a in DoCopy ()
> #8  0x000000000066112d in standard_ProcessUtility ()
> #9  0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so

This one is stalling on the buffer content lock while trying to read the
table to dump it.

> SELECTS:
>
> query | SELECT room_contract_service_code,
>       |   room_contract_service_description, room_contract_service_code_id
>       | FROM room_contract_service_code
>       | WHERE room_contract_service_code.supplierid = $1
>       |   AND NOT inactive
>       | ORDER by room_contract_service_description
>
> (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000048d05d in index_fetch_heap ()
> #4  0x000000000048d22e in index_getnext ()

And this one is most likely stalling on the same buffer lock.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote:

> > (gdb) bt
> > #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> > #1  0x000000000075ad7a in pg_usleep ()
> > #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> > #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> > #4  0x000000000057d036 in lazy_vacuum_rel ()
>
> This stall in GetMultiXactIdMembers should only occur when somebody is
> writing the multixact immediately following. It certainly should not
> sleep for long. I'm not sure what's happening here but this is probably
> where the problem is. While waiting, it holds the buffer content lock
> in exclusive mode.
>
> Can you install debug symbols and pageinspect?

If you have debug symbols, please attach to this process, then do in GDB:

bt full

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Dinesh Bhandary wrote:

> We used psotgres 9.3.4 to pg_upgrade from 9.1. We just upgraded
> binaries to 9.3.5 just last week ? Would this cause any issues with
> MultiXid issue.

So you ran 200 million write transactions in one week?

Do you still have the 9.1 data from before the upgrade? If so, would you
paste pg_controldata output from that?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Alvaro,

We did a manual vacuum and reindex of room_contract_service_code and it
finished quickly. We still have autovacuum disabled.

1) jetblue=# select oid::regclass, relfrozenxid, relminmxid from pg_class
   where relname = 'room_contract_service_code';

            oid             | relfrozenxid | relminmxid
----------------------------+--------------+------------
 room_contract_service_code |    953567787 |      20784
(1 row)

jetblue=# select datname, datfrozenxid, datminmxid from pg_database
   where datname = 'jetblue';

 datname | datfrozenxid | datminmxid
---------+--------------+------------
 jetblue |    870700471 |          1
(1 row)

2) > Latest checkpoint's NextMultiXactId:  23431
   > Latest checkpoint's NextMultiOffset:  5678
   > Latest checkpoint's oldestMultiXid:   1
   > Latest checkpoint's oldestMulti's DB: 16423
   >
   > It says the oldest multi is 1, so the database should not have any
   > values between 1 and 23431 that correspond to pg_upgrade'd multixact
   > values ... so what is going on here? Unless the recent mucking with
   > pg_upgrade to handle multixacts got something wrong.

We used postgres 9.3.4 to pg_upgrade from 9.1; we upgraded the binaries
to 9.3.5 just last week. Would this cause any MultiXactId issues?

Thanks
Dinesh
Dinesh Bhandary wrote:

> Here is the pg_controldata output from 9.1. Thanks. Dinesh
>
> $ /usr/pgsql-9.1/bin/pg_controldata data
> Time of latest checkpoint:            Sat 19 Jul 2014 07:02:38 AM GMT

Ah, so you upgraded a month ago, not one week ago, right?

> Latest checkpoint's NextMultiXactId:  20783

I'm betting this is the problem. The 9.3 cluster should have
oldestMulti=20783, but instead it has oldestMulti=1. Either pg_upgrade
is misreading the nextMulti value from the old cluster, or pg_resetxlog
is failing to set oldestMulti appropriately ... or something funny
happened to pg_control in the meantime. I just ran a quick test, and I
see that pg_upgrade is setting oldestMulti correctly (to the old
cluster's NextMultiXactId value) in the new cluster when upgrading from
9.1.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Here is the pg_controldata output from 9.1. Thanks.

Dinesh

$ /usr/pgsql-9.1/bin/pg_controldata data
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5720710153987476471
Database cluster state:               in production
pg_control last modified:             Sat 19 Jul 2014 07:05:08 AM GMT
Latest checkpoint location:           FC0/25643788
Prior checkpoint location:            FC0/169F2238
Latest checkpoint's REDO location:    FC0/1D919830
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/950483638
Latest checkpoint's NextOID:          1146307754
Latest checkpoint's NextMultiXactId:  20783
Latest checkpoint's NextMultiOffset:  42669
Latest checkpoint's oldestXID:        750608965
Latest checkpoint's oldestXID's DB:   16411
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Sat 19 Jul 2014 07:02:38 AM GMT
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            minimal
Current max_connections setting:      1024
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   256
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: Wednesday, August 27, 2014 8:48 AM
To: Dinesh Bhandary
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #11264: Auto vacuum wraparound job blocking everything

Dinesh Bhandary wrote:

> We used psotgres 9.3.4 to pg_upgrade from 9.1. We just upgraded
> binaries to 9.3.5 just last week ? Would this cause any issues with
> MultiXid issue.

So you ran 200 million write transactions in one week?

Do you still have the 9.1 data from before the upgrade? If so, would you
paste pg_controldata output from that?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Dinesh Bhandary wrote:

> Yes, it was upgraded a month ago using postgresql 9.3.4 binaries. It was
> just patched up last week with 9.3.5.

Ok, that makes a lot more sense.

> I checked all the other servers which we upgraded using postgres 9.3.5
> binaries, oldestMultiXid is other than 1 and it seems to be have picked
> the right number. But the server in question was upgraded using postgres
> 9.3.4 binaries. This sounds suspicious. Could it be pg_upgrade in 9.3.4 is
> buggy.

Well, yes, 9.3.4 had a bug fixed by this commit:

Author: Bruce Momjian <bruce@momjian.us>
Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38 -0400

    pg_upgrade: preserve database and relation minmxid values

    Also set these values for pre-9.3 old clusters that don't have values to
    preserve.

    Analysis by Alvaro

    Backpatch through 9.3

> How do we fix the current issue with this one server? Is there an easy fix?
> Thanks.

As far as I am aware, you should

UPDATE pg_database SET datminmxid=20783

and that should fix it. The oldestMulti value in pg_control would get
updated by itself some time later. If you experience stalls before
oldestMulti fixes itself, you could stop the server (cleanly!) and then
pg_resetxlog -m x,y where x is the correct nextMulti value from
pg_controldata and y is 20783.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
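[For readers following along: a sketch of that first step, run as superuser. The value 20783 is specific to this thread's cluster (the 9.1 cluster's NextMultiXactId at upgrade time); on any other cluster the correct number must be determined first, so treat it as an assumption.]

```sql
-- Hedged sketch, not a general recipe: 20783 applies only to this cluster.
-- Run as superuser in the affected cluster.
UPDATE pg_database SET datminmxid = '20783';

-- Verify the change took:
SELECT datname, datminmxid FROM pg_database ORDER BY datname;
```

This requires a running server; the pg_resetxlog alternative mentioned above is for a cleanly stopped one.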
Yes, it was upgraded a month ago using postgresql 9.3.4 binaries. It was
just patched up last week with 9.3.5.

I checked all the other servers which we upgraded using postgres 9.3.5
binaries; oldestMultiXid is other than 1 and seems to have picked up the
right number. But the server in question was upgraded using postgres
9.3.4 binaries. This sounds suspicious. Could it be that pg_upgrade in
9.3.4 is buggy?

How do we fix the current issue with this one server? Is there an easy fix?

Thanks.
Dinesh

On 8/27/14, 9:45 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

> Dinesh Bhandary wrote:
>> Here is the pg_controldata output from 9.1. Thanks. Dinesh
>>
>> $ /usr/pgsql-9.1/bin/pg_controldata data
>> Time of latest checkpoint:            Sat 19 Jul 2014 07:02:38 AM GMT
>
> Ah, so you upgraded a month ago, not one week ago, right?
>
>> Latest checkpoint's NextMultiXactId:  20783
>
> I'm betting this is the problem. The 9.3 cluster should have
> oldestMulti=20783, but instead it has oldestMulti=1. Either pg_upgrade
> is misreading the nextMulti value from the old cluster, or pg_resetxlog
> is failing to set oldestMulti appropriately ... or something funny
> happened to pg_control in the meantime. I just ran a quick test, and I
> see that pg_upgrade is setting oldestMulti correctly (to the old
> cluster's NextMultiXactId value) in the new cluster when upgrading from
> 9.1.
>
> -- 
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Hi Alvaro,

Thank you so much for jumping onto this issue. I greatly appreciate it.
We will test the fix you mentioned in a test environment first; I will
let you know when we are done with testing. Then we can update the
production db.

Dinesh

-----Original Message-----
From: Alvaro Herrera [mailto:alvherre@2ndquadrant.com]
Sent: Wednesday, August 27, 2014 10:51 AM
To: Dinesh Bhandary
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #11264: Auto vacuum wraparound job blocking everything

Dinesh Bhandary wrote:

> Yes, it was upgraded a month ago using postgresql 9.3.4 binaries. It
> was just patched up last week with 9.3.5.

Ok, that makes a lot more sense.

> I checked all the other servers which we upgraded using postgres
> 9.3.5 binaries, oldestMultiXid is other than 1 and it seems to be have
> picked the right number. But the server in question was upgraded using
> postgres 9.3.4 binaries. This sounds suspicious. Could it be pg_upgrade
> in 9.3.4 is buggy.

Well, yes, 9.3.4 had a bug fixed by this commit:

Author: Bruce Momjian <bruce@momjian.us>
Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38 -0400

    pg_upgrade: preserve database and relation minmxid values

    Also set these values for pre-9.3 old clusters that don't have values to
    preserve.

    Analysis by Alvaro

    Backpatch through 9.3

> How do we fix the current issue with this one server? Is there an easy fix?
> Thanks.

As far as I am aware, you should

UPDATE pg_database SET datminmxid=20783

and that should fix it. The oldestMulti value in pg_control would get
updated by itself some time later. If you experience stalls before
oldestMulti fixes itself, you could stop the server (cleanly!) and then
pg_resetxlog -m x,y where x is the correct nextMulti value from
pg_controldata and y is 20783.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Dinesh Bhandary wrote:

> > and that should fix it. The oldestMulti value in pg_control would get
> > updated by itself some time later. If you experience stalls before
> > oldestMulti fixes itself, you could stop the server (cleanly!) and then
> > pg_resetxlog -m x,y where x is the correct nextMulti value from
> > pg_controldata and y is 20783.
>
> Are you referreing to NextMultiXactId of current pg_controldata (from
> postgres 9.3.5) for x?
> There is also NextMultiOffset, not sure if you meant to reset this.
> Please confirm.

No, leave the offset alone. I meant that the "x" value (i.e. the part
before the comma) should stay unchanged; you only need to change the
"oldest" value, which is "y" (the part after the comma).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
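[Spelled out as a shell sketch against a cleanly stopped 9.3 cluster. The data-directory path is an assumption, and 20783 is this thread's value; both must be adjusted for any other installation.]

```shell
# Hedged sketch, not a recipe: requires the server to be stopped cleanly first,
#   e.g.  pg_ctl -D "$PGDATA" stop -m fast
PGDATA=/var/lib/pgsql/9.3/data          # assumed data directory -- adjust

# x = the cluster's current NextMultiXactId (left unchanged),
# y = the correct oldestMulti (20783 in this thread).
NEXT_MULTI=$(/usr/pgsql-9.3/bin/pg_controldata "$PGDATA" |
             awk -F': *' '/NextMultiXactId/ {print $2}')
/usr/pgsql-9.3/bin/pg_resetxlog -m "${NEXT_MULTI},20783" "$PGDATA"
```

The awk line simply extracts the number after "NextMultiXactId:" so that x is passed through unchanged, as Álvaro describes.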
Dinesh Bhandary wrote:

> I think the fix is working. Now I am able to kill copy job while auto
> vacuum job is going on that table.

I hope this killing of autovacuum jobs is not continued operational
practice. You need whole-table vacuums of all your tables once in a
while. (Note you can increase the "freeze_table_age" parameters if you
want to be forced to do that less frequently. See the docs.)

> Since I am setting pg_resetxlog on the primary, do I need to do that on
> the slave too? Please let me know

No, the slave should fix itself once the checkpoint record is replayed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> and that should fix it. The oldestMulti value in pg_control would get
> updated by itself some time later. If you experience stalls before
> oldestMulti fixes itself, you could stop the server (cleanly!) and then
> pg_resetxlog -m x,y where x is the correct nextMulti value from
> pg_controldata and y is 20783.

Are you referring to the NextMultiXactId of the current pg_controldata
(from postgres 9.3.5) for x?
There is also NextMultiOffset; not sure if you meant to reset this.
Please confirm.

Thanks,
Dinesh
I think the fix is working. Now I am able to kill the copy job while an
autovacuum job is running on that table.

Since I am running pg_resetxlog on the primary, do I need to do that on
the slave too? Please let me know.

Thanks.
Dinesh

On 8/29/14, 12:17 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

> Dinesh Bhandary wrote:
>> and that should fix it. The oldestMulti value in pg_control would get
>> updated by itself some time later. If you experience stalls before
>> oldestMulti fixes itself, you could stop the server (cleanly!) and then
>> pg_resetxlog -m x,y where x is the correct nextMulti value from
>> pg_controldata and y is 20783.
>>
>> Are you referreing to NextMultiXactId of current pg_controldata (from
>> postgres 9.3.5) for x?
>> There is also NextMultiOffset, not sure If you meant to reset this.
>> Please confirm.
>
> No, leave the offset alone. I meant that the "x" value (i.e. the part
> before the comma) should stay unchanged; you only need to change the
> "oldest" value, which is "y" (part after the comma).
>
> -- 
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Dinesh Bhandary wrote:

> Hi Alvaro,
>
> I was killing to reproduce the bug. Once it is fixed we have no need to
> kill these jobs.
> Now we still have this production server not running in auto vacuum till
> we apply the fix during next maintenance window. Is running vacuum
> manually safe, or we may run into the same issue as auto vacuum tx
> wraparound job? I am thinking max_freeze set to 500 million, manual
> vacuum won't freeze xids, and it may be ok to run. But I am not completely
> sure and worried about the same issue happening in production even with
> manual vacuum. Please let me know.

Yes, that seems correct. I think you're only running into
for-wraparound vacuums because the relfrozenxid for that table is above
the 200 million that IIRC is the default for max_freeze_age. You can
just set it to the maximum of 2000 million; the only downside is disk
space used by pg_clog and pg_multixact.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
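[For reference, the settings being discussed look like this as a postgresql.conf fragment. The 500 million figure mirrors what Dinesh set as a stopgap, not a recommendation; note that changing autovacuum_freeze_max_age takes effect only at server start.]

```
# postgresql.conf fragment -- values from this thread, not recommendations
autovacuum_freeze_max_age = 500000000   # default 200000000, max 2000000000; restart required
vacuum_freeze_table_age   = 150000000   # default; governs when vacuum scans the whole table
```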
Dinesh Bhandary wrote:

> What you are saying is vacuum without tx wraparound won't cause the
> blocking issue we had experienced.

Yep, that's correct. You're only seeing an issue because vacuum (manual
or auto) is trying to freeze the tuple. If the tuple is not considered
old enough to need freezing, it will not cause the system to stall.
Once you get the oldestMulti value fixed, the freezing routine will be
able to look at the tuple, notice that the multixact in it is older than
the upgrade, and decide it can just rip it out without trying to
"resolve" it.

> If that's the case I can just change the auto vacuum on (which can be
> changed on the fly), and I had already bumped up max_freeze_settings to
> 500 mill for now. We will change that to lower number in the next
> maintenance window.

Right -- AFAICS that should work.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
What you are saying is vacuum without tx wraparound won't cause the
blocking issue we had experienced.

If that's the case I can just change the auto vacuum on (which can be
changed on the fly), and I had already bumped up max_freeze_settings to
500 mill for now. We will change that to lower number in the next
maintenance window.

Thanks
Dinesh

On 8/29/14, 2:25 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

> Dinesh Bhandary wrote:
>> Hi Alvaro,
>>
>> I was killing to reproduce the bug. Once it is fixed we have no need to
>> kill these jobs.
>> Now we still have this production server not running in auto vacuum till
>> we apply the fix during next maintenance window. Is running vacuum
>> manually safe, or we may run into the same issue as auto vacuum tx
>> wraparound job? I am thinking max_freeze set to 500 million, manual
>> vacuum won't freeze xids, and it may be ok to run. But I am not completely
>> sure and worried about the same issue happening in production even with
>> manual vacuum. Please let me know.
>
> Yes, that seems correct. I think you're only running into
> for-wraparound vacuums because the relfrozenxid for that table is above
> the 200 millions that IIRC are the default for max_freeze_age. You can
> just set it to the maximum of 2000 million; the only downside is disk
> space used by pg_clog and pg_multixact.
>
> -- 
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Hi Alvaro,

I was killing to reproduce the bug. Once it is fixed we have no need to
kill these jobs.
We still have this production server running without autovacuum until we
apply the fix during the next maintenance window. Is running vacuum
manually safe, or may we run into the same issue as the autovacuum tx
wraparound job? I am thinking that with max_freeze set to 500 million, a
manual vacuum won't freeze xids, and it may be ok to run. But I am not
completely sure and worried about the same issue happening in production
even with manual vacuum. Please let me know.

Thanks a lot for your help. Much appreciated.

Dinesh

On 8/29/14, 2:07 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

> Dinesh Bhandary wrote:
>> I think the fix is working. Now I am able to kill copy job while auto
>> vacuum job is going on that table.
>
> I hope this killing of autovacuum jobs is not continued operational
> practice. You need whole-table vacuums of all your tables once in a
> while. (Note you can increase the "freeze_table_age" parameters if you
> want to be forced to do that less frequently. See the docs.)
>
>> Since I am setting pg_resetxlog on the primary, do I need to do that on
>> the slave too? Please let me know
>
> No, the slave should fix itself once the checkpoint record is replayed.
>
> -- 
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:

> Well, yes, 9.3.4 had a bug fixed by this commit:
>
> Author: Bruce Momjian <bruce <at> momjian.us>
> Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38 -0400
>
>     pg_upgrade: preserve database and relation minmxid values
>
>     Also set these values for pre-9.3 old clusters that don't have values to
>     preserve.
>
>     Analysis by Alvaro
>
>     Backpatch through 9.3
>
> > How do we fix the current issue with this one server? Is there an easy fix?
> > Thanks.
>
> As far as I am aware, you should
>
> UPDATE pg_database SET datminmxid=20783
>
> and that should fix it. The oldestMulti value in pg_control would get
> updated by itself some time later. If you experience stalls before
> oldestMulti fixes itself, you could stop the server (cleanly!) and then
> pg_resetxlog -m x,y where x is the correct nextMulti value from
> pg_controldata and y is 20783.

We ran into this on two instances last night and I'm starting to worry
that many others may hit it as well. As far as I can tell, any postgres
instance that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk,
as they all will have:

Latest checkpoint's oldestMultiXid:   1

The workaround above is good if you still have the old cluster around
from before the upgrade. We did not; that was all cleaned up months ago.
Which raises the question: how do you fix a 9.3 instance that has
oldestMultiXid = 1 without access to the pre-upgrade instance? That is,
where do you get the correct value of oldestMultiXid to set into
pg_database.datminmxid?

I took a guess that the oldest pg_class.relminmxid for all the tables in
all the databases would be ok and updated pg_database.datminmxid with
that.
That is, in each database I ran: select relname, relminmxid, age(relminmxid) from pg_class where relkind = 'r' and age(relminmxid) < 2147483647 order by 3 desc limit 1 ; And then used the oldest one to update pg_database.datminmxid. After a checkpoint and shutdown/restart I see that it was written to pg_controldata too. Afterwards I was able to run a vacuum freeze on the problem table. Questions: 1) Is the procedure above safe and effective for this, or did I just hose my databases? 2) If the procedure above is not safe, what do we do instead? 3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we fix it before too much damage happens in the wider world? -dg -- David Gould daveg <at> sonic.net If simplicity worked, the world would be overrun with insects.
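The at-risk condition David describes shows up as "Latest checkpoint's oldestMultiXid: 1" in pg_controldata output. A minimal sketch of checking for it by parsing that output (the field labels follow the pg_controldata lines quoted in this thread; the sample text here is invented for illustration, and in practice you would capture the real output of `pg_controldata $PGDATA`):

```python
import re

def parse_controldata(text):
    """Parse the 'Name:  value' lines of pg_controldata output into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(.+?):\s+(.*)$", line)
        if m:
            fields[m.group(1).strip()] = m.group(2).strip()
    return fields

# Invented sample resembling pg_controldata output on an affected cluster.
sample = """\
Latest checkpoint's NextMultiXactId:  20783
Latest checkpoint's oldestMultiXid:   1
"""

fields = parse_controldata(sample)
# oldestMultiXid stuck at 1 is the signature of the pre-9.3.5 pg_upgrade bug.
at_risk = fields["Latest checkpoint's oldestMultiXid"] == "1"
```

This only flags a suspect cluster; it does not by itself prove damage, since a cluster with essentially no multixact history could legitimately still be at 1.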
Re: BUG #11264: Auto vacuum wraparound job blocking everything - trying again
From
David Gould
Date:
I sent the following a couple of days ago but it seems to have been overlooked. I think this issue may impact a lot of 9.3 sites that have upgraded from earlier versions. Please take a look at this.

-dg

On Tue, 30 Sep 2014 00:44:11 +0000 (UTC) David Gould <daveg@sonic.net> wrote:

> Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:
>
> > Well, yes, 9.3.4 had a bug fixed by this commit:
> >
> > Author: Bruce Momjian <bruce <at> momjian.us>
> > Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> > Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> > Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38 -0400
> >
> > pg_upgrade: preserve database and relation minmxid values
> >
> > Also set these values for pre-9.3 old clusters that don't have values to
> > preserve.
> >
> > Analysis by Alvaro
> >
> > Backpatch through 9.3
> >
> > > How do we fix the current issue with this one server? Is there an easy fix?
> > > Thanks.
> >
> > As far as I am aware, you should
> >
> > UPDATE pg_database SET datminmxid=20783
> >
> > and that should fix it. The oldestMulti value in pg_control would get
> > updated by itself some time later. If you experience stalls before
> > oldestMulti fixes itself, you could stop the server (cleanly!) and then
> > pg_resetxlog -m x,y where x is the correct nextMulti value from
> > pg_controldata and y is 20783.
>
> We ran into this on two instances last night and I'm starting to worry that
> many others may hit it as well. As far as I can tell any Postgres instance
> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk, as they all
> will have:
>
> Latest checkpoint's oldestMultiXid: 1
>
> The workaround above is good if you still have the old cluster around from
> before the upgrade. We did not; that was all cleaned up months ago. Which
> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
> 1 without access to the pre-upgrade instance? That is, where do you get the
> correct value of oldestMultiXid to set into pg_database.datminmxid?
>
> I took a guess that the oldest pg_class.relminmxid for all the tables in all
> the databases would be ok and updated pg_database.datminmxid with that. That
> is, in each database I ran:
>
> select relname, relminmxid, age(relminmxid)
> from pg_class
> where relkind = 'r'
> and age(relminmxid) < 2147483647
> order by 3 desc limit 1;
>
> And then used the oldest one to update pg_database.datminmxid. After a
> checkpoint and shutdown/restart I see that it was written to pg_controldata
> too. Afterwards I was able to run a vacuum freeze on the problem table.
>
> Questions:
>
> 1) Is the procedure above safe and effective for this, or did I just hose my
> databases?
>
> 2) If the procedure above is not safe, what do we do instead?
>
> 3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we
> fix it before too much damage happens in the wider world?

--
David Gould daveg <at> sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould wrote:

> We ran into this on two instances last night and I'm starting to worry that
> many others may hit it as well. As far as I can tell any postgres instance
> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
> will have:
>
> Latest checkpoint's oldestMultiXid: 1
>
> The workaround above is good if you still have the old cluster around from
> before the upgrade. We did not, that was all cleaned up months ago. Which
> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
> 1 without access to the pre-upgrade instance. That is, where do you get the
> correct value of oldestMultiXid to set into pg_database.datminmxid?
>
> I took a guess that the oldest pg_class.relminmxid for all the tables in all
> the databases would be ok and updated pg_database.datminmxid with that. That
> is, in each database I ran:
>
> select relname, relminmxid, age(relminmxid)
> from pg_class
> where relkind = 'r'
> and age(relminmxid) < 2147483647
> order by 3 desc limit 1;

Unfortunately, the age() function is only valid for transaction IDs, not multixacts. In certain cases the values returned might be sensible, but they don't have to be, and if the counters are not relatively close to one another, most likely they won't be sensible. A new function was added to 9.4, but that doesn't help those on 9.3:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ad1a87a368d837d5c2260b2fd2ff83e976425a
(I guess you could have an extension containing that function.) My bet is that this is the reason you had to add the condition that the value is less than 2^31 - 1.

Offhand, other than the way to obtain the multixact age, the procedure seems sensible.

A better way, not involving mxid_age(), would be to use pg_controldata to extract the current value of the mxid counter, then subtract the current relminmxid from that value.

> 3) Is this likely to affect all 9.3.x<5 upgraded databases?

Yes.
-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
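The subtraction Alvaro suggests has to be done with modulo-2^32 arithmetic, because MultiXactIds are unsigned 32-bit counters that wrap around. A minimal sketch of that calculation (my own illustration, not code from the thread; the server additionally skips special low values at wraparound, so treat this as an approximation of the true age):

```python
MXID_MASK = 0xFFFFFFFF  # MultiXactIds are unsigned 32-bit counters

def mxid_age(next_mxid: int, relminmxid: int) -> int:
    """Distance from relminmxid up to the current NextMultiXactId
    (from pg_controldata), computed modulo 2^32 so the result stays
    meaningful even after the counter has wrapped."""
    return (next_mxid - relminmxid) & MXID_MASK

# Ordinary case: the counter is well ahead of the relation's minimum.
print(mxid_age(20783, 20000))        # 783

# Wraparound case: the counter has passed 2^32 and started over.
print(mxid_age(5, 0xFFFFFFFF - 2))   # 8
```

A bogus relminmxid (one "in the future" relative to NextMultiXactId) shows up as an implausibly huge age close to 2^32, which matches David's filter excluding ages of 2^31 - 1 and above.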
On 01.10.2014 23:32, Alvaro Herrera wrote:

> David Gould wrote:
>
>> We ran into this on two instances last night and I'm starting to worry that
>> many others may hit it as well. As far as I can tell any postgres instance
>> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
>> will have:
>>
>> Latest checkpoint's oldestMultiXid: 1
>>
>> The workaround above is good if you still have the old cluster around from
>> before the upgrade. We did not, that was all cleaned up months ago. Which
>> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
>> 1 without access to the pre-upgrade instance. That is, where do you get the
>> correct value of oldestMultiXid to set into pg_database.datminmxid?
>>
>> I took a guess that the oldest pg_class.relminmxid for all the tables in all
>> the databases would be ok and updated pg_database.datminmxid with that. That
>> is, in each database I ran:
>>
>> select relname, relminmxid, age(relminmxid)
>> from pg_class
>> where relkind = 'r'
>> and age(relminmxid) < 2147483647
>> order by 3 desc limit 1;
>
> Unfortunately, the age() function is only valid for transaction IDs, not
> multixacts. In certain cases the values returned might be sensible, but
> they don't have to be, and if the counters are not relatively close to
> one another, most likely they won't be sensible. A new function was
> added to 9.4, but that doesn't help those on 9.3:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ad1a87a368d837d5c2260b2fd2ff83e976425a
> (I guess you could have an extension containing that function.)
> My bet is that this is the reason you had to add the condition that the
> value is less than 2^31 - 1.
>
> Offhand, other than the way to obtain the multixact age, the procedure
> seems sensible.
>
> A better way, not involving mxid_age(), would be to use pg_controldata to
> extract the current value of the mxid counter, then subtract the current
> relminmxid from that value.
>> 3) Is this likely to affect all 9.3.x<5 upgraded databases?
>
> Yes.

Hi,

I have 700 databases which could be affected, and I don't want to dump and restore all of them, so is there a proper way to detect whether a given one is affected or not? E.g. a vacuum of all tables, or something else? And is a dump, init, restore cycle a way to make 100% sure this bug is fixed?

regards
Bernhard Schrader
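Along the lines of the detection question: since the pre-9.3.5 pg_upgrade left minmxid values unpreserved at 1, one rough heuristic is to scan pg_database for datminmxid = 1. A hypothetical sketch, assuming you have already fetched the (datname, datminmxid) pairs, e.g. with `psql -Atc "select datname, datminmxid from pg_database"` (this flags candidates only; datminmxid = 1 can also be legitimate on a cluster with essentially no multixact activity, so it is a prompt to investigate, not proof of damage):

```python
def suspect_databases(rows):
    """Given (datname, datminmxid) pairs from pg_database, return the
    names whose datminmxid is 1 -- the value an unpreserving pg_upgrade
    left behind, and a candidate for the UPDATE fix discussed above."""
    return [name for name, minmxid in rows if int(minmxid) == 1]

# Invented sample rows for illustration.
rows = [("template1", "1"), ("jetblue", "1"), ("healthy", "20783")]
print(suspect_databases(rows))  # ['template1', 'jetblue']
```

For any database this flags, the thread's remedy is to set pg_database.datminmxid to a defensible value (oldest valid relminmxid, or the pg_resetxlog -m route) rather than to dump and restore everything.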