Thread: MultiXact member wraparound protections are disabled
Hi,
We run postgres 9.4.5.
This morning we started seeing messages like the below:
Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Our autovacuum_freeze_max_age = 1750000000.
site=# SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
-----------+------------
site | 1645328344
template0 | 1274558807
bench | 1274558807
postgres | 1324283514
template1 | 1274558807
So we’re about 100 million transactions away from the point where anti-wraparound vacuuming kicks in.
We’re running precautionary vacuums on our largest offenders to try to bring our transaction ID ages down.
What I’d like some clarity on is the message above. What does it mean that the "oldest checkpointed MultiXact does not exist on disk"? Would we lose data if we did have to wrap around?
Is this telling us we’re not vacuuming effectively enough?
Thanks,
Karthik
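(For reference, a per-table variant of the age check helps locate the largest offenders; a sketch - adjust the LIMIT as needed:)

site=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC LIMIT 10;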
AnandKumar, Karthik wrote:
> What I’d like some clarity on is the message above. What does it mean that the "oldest checkpointed MultiXact does not exist on disk"? Would we lose data if we did have to wrap around?
>
> Is this telling us we’re not vacuuming effectively enough?

Ugh. Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offsets?

The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmxid rather than
datfrozenxid. It may be that multixact wraparound has already
occurred.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
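A note on that check: mxid_age() only arrived in later releases, so on 9.4 the comparison has to be done by hand against "Latest checkpoint's NextMultiXactId" from pg_controldata; a minimal sketch:

site=# SELECT datname, datminmxid FROM pg_database;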
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B

postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6228991221455883206
Database cluster state: in production
pg_control last modified: Wed 12 Oct 2016 05:22:45 PM PDT
Latest checkpoint location: 62D0/BDE939F8
Prior checkpoint location: 62CF/F039BFD0
Latest checkpoint's REDO location: 62D0/8A060220
Latest checkpoint's REDO WAL file: 00000001000062D00000008A
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1/1834305762
Latest checkpoint's NextOID: 19540327
Latest checkpoint's NextMultiXactId: 784503
Latest checkpoint's NextMultiOffset: 1445264
Latest checkpoint's oldestXID: 226141373
Latest checkpoint's oldestXID's DB: 16457
Latest checkpoint's oldestActiveXID: 1834302410
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint: Wed 12 Oct 2016 05:22:05 PM PDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1000
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0

On 10/13/16, 5:28 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>Ugh. Can you share the output of pg_controldata and the list of files
>in pg_multixact/members and pg_multixact/offsets?
>[...]
Sharing output
postgres@site-db01a:~/cmates/data/pg_multixact/members $ ls
0000 0002 0004 0006 0008 000A 000C 000E 0010 0012 0014 0016 0018 001A
0001 0003 0005 0007 0009 000B 000D 000F 0011 0013 0015 0017 0019 001B
postgres@site-db01a:~/cmates/data/pg_multixact/offsets $ ls
0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B
postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata -D /var/lib/pgsql/cmates/data
pg_controldata: could not open file "-D/global/pg_control" for reading: No such file or directory
pg_controldata is not working here, even though the pg_control file is there inside global; it is not reading it:
postgres@site-db01a:~/cmates/data/global $ ls -la pg_control
-rw-------. 1 postgres postgres 8192 Oct 12 18:55 pg_control
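The -D flag is the problem here: the 9.4 pg_controldata takes the data directory as a plain positional argument, so the invocation that works (and is used successfully further down the thread) is:

postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data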
On Wed, Oct 12, 2016 at 4:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Ugh. Can you share the output of pg_controldata and the list of files
> in pg_multixact/members and pg_multixact/offsets?
> [...]
Got the output of pg_controldata:
postgres@site-db01a:~/cmates/data/global $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6228991221455883206
Database cluster state: in production
pg_control last modified: Wed 12 Oct 2016 07:08:14 PM PDT
Latest checkpoint location: 62E1/890DA8D8
Prior checkpoint location: 62E0/550B2178
Latest checkpoint's REDO location: 62E1/4F054A08
Latest checkpoint's REDO WAL file: 00000001000062E10000004F
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1/1834454859
Latest checkpoint's NextOID: 19540816
Latest checkpoint's NextMultiXactId: 784527
Latest checkpoint's NextMultiOffset: 1445313
Latest checkpoint's oldestXID: 226141373
Latest checkpoint's oldestXID's DB: 16457
Latest checkpoint's oldestActiveXID: 1834454859
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint: Wed 12 Oct 2016 07:06:45 PM PDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current wal_log_hints setting: off
Current max_connections setting: 1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 1000
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
On Wed, Oct 12, 2016 at 7:10 PM, avi Singh <avisingh19811981@gmail.com> wrote:
> Sharing output. pg_controldata is not working here, even though the
> pg_control file is there inside global; it is not reading it. [...]
We are also seeing this in our log file
Oct 12 19:08:14 site-db01a postgres[6117]: [7589-1] app=,user=,db=,ip=LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
On Wed, Oct 12, 2016 at 7:13 PM, avi Singh <avisingh19811981@gmail.com> wrote:
> Got the output of pg_controldata:
> postgres@site-db01a:~/cmates/data/global $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
> [...]
AnandKumar, Karthik wrote:
> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
> 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B
> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
> 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B

> postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data

> Latest checkpoint's NextMultiXactId: 784503
> Latest checkpoint's NextMultiOffset: 1445264
> Latest checkpoint's oldestMultiXid: 1
> Latest checkpoint's oldestMulti's DB: 16457

This looks perfectly normal, except that the pg_multixact/offsets/0000
file is gone. oldestMultiXid is 1, so I don't see how the file could have
gotten removed. Has this been upgraded recently from a previous
9.3 or 9.4 version? There have been bugs in this area, but they've been
fixed now for some time.

The 0000 file could have been removed manually, perhaps?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
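As a cross-check, the segment numbers line up with those values: each pg_multixact/offsets segment covers 32 pages of 2048 four-byte entries (assuming the default 8 kB block size), i.e. 65536 multixacts per segment. Multixact 1 therefore lives in segment 0000, and NextMultiXactId 784503 falls in segment 784503 / 65536 ≈ 11 (hex 000B) - exactly the 0001-000B range listed above, with only 0000 missing.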
Thanks. We started seeing this error right after a SAN FC re-cabling effort - so yes, that would make sense.
We’ll do a little more digging to see if the 0000 file could have gotten removed.
If that’s an older file that we have in our filesystem backups, is it safe to restore from there?

On 10/13/16, 3:30 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>This looks perfectly normal, except that the pg_multixact/offsets/0000
>file is gone. [...]
AnandKumar, Karthik wrote:
> Thanks. We started seeing this error right after a SAN FC re-cabling effort - so yes, that would make sense.
> We’ll do a little more digging to see if the 0000 file could have gotten removed.
> If that’s an older file that we have in our filesystem backups, is it safe to restore from there?

Sure, the files are immutable after they are completed. I worry that if
the system removed it automatically, it would just remove it again,
though. Shouldn't happen on 9.4.5, but it seems just too much of a
coincidence that that file was removed.

Changes such as FC recabling should not cause anything like this. I
mean, why a pg_multixact file and not a table data file? Very fishy.

I'd advise checking your older logs from around restarts to see whether the
"member wraparound protections are now enabled" message has ever appeared,
or whether it has always been "protections are disabled". Maybe you've had
the problem for ages and just never noticed ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
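A search along these lines would answer that (the syslog path is a guess - point it at wherever the server logs actually go):

root@site-db01a:~ # grep -E "member wraparound protections are (now enabled|disabled)" /var/log/messages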
You're right - we looked back in our old logs and we do see the messages there as well. What I'm still not getting is that since we restarted the database after the SAN FC re-cabling effort, autovacuum has been running on all threads continuously. I have never seen autovacuum using all the threads 24x7 on this database. Any thoughts?
On Thu, Oct 13, 2016 at 8:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I'd advise checking your older logs from around restarts to see whether the
> "member wraparound protections are now enabled" message has ever appeared [...]
avi Singh wrote:
> You're right - we looked back in our old logs and we do see the messages
> there as well. What I'm still not getting is that since we restarted the
> database after the SAN FC re-cabling effort, autovacuum has been running
> on all threads continuously. I have never seen autovacuum using all the
> threads 24x7 on this database. Any thoughts?

It's trying to ensure all tables are correctly frozen. As I recall,
that's working per spec and you should just let it run until it's done.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
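The workers are visible while this runs; on 9.4 they show up in pg_stat_activity with a query of the form "autovacuum: VACUUM ... (to prevent wraparound)". A quick way to watch them:

site=# SELECT datname, query FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';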
Thank you for your help Alvaro - we really appreciate it.
The error in fact stopped this morning - we took downtime and ran a vacuum across all of our tables, and saw increased autovacuum activity as well. It looks like that bumped the oldest multixact ID up to something other than 1 now:

postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data | grep -i multi
Latest checkpoint's NextMultiXactId: 785051
Latest checkpoint's NextMultiOffset: 1446371
Latest checkpoint's oldestMultiXid: 575211
Latest checkpoint's oldestMulti's DB: 12998
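Following the earlier arithmetic, with oldestMultiXid at 575211 the oldest offsets segment that still has to exist is 575211 / 65536 ≈ 8 (hex 0008), so the missing 0000 file is now outside the live range, and the member wraparound protections should presumably come back on once a checkpoint re-verifies the oldest segment.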