Thread: MultiXact member wraparound protections are disabled

MultiXact member wraparound protections are disabled

From
"AnandKumar, Karthik"
Date:
Hi,

We run postgres 9.4.5.

Starting this morning, we started seeing messages like the below:
Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk

Our autovacuum_freeze_max_age = 1750000000. 

site=# SELECT datname, age(datfrozenxid) FROM pg_database;
 datname  |    age
-----------+------------
site      | 1645328344
template0 | 1274558807
bench     | 1274558807
postgres  | 1324283514
template1 | 1274558807

So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.

We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids

What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does not exist on disk”? Would we lose data if we did have to wrap around? 

Is this telling us we’re not vacuuming effectively enough?

Thanks,
Karthik

Re: MultiXact member wraparound protections are disabled

From
Alvaro Herrera
Date:
AnandKumar, Karthik wrote:
> Hi,
>
> We run postgres 9.4.5.
>
> Starting this morning, we started seeing messages like the below:
> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk 
> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk 
> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk 
>
> Our autovacuum_freeze_max_age = 1750000000.
>
> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>  datname  |    age
> -----------+------------
> site      | 1645328344
> template0 | 1274558807
> bench     | 1274558807
> postgres  | 1324283514
> template1 | 1274558807
>
> So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.
>
> We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids
>
> What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does not
existon disk”? Would we lose data if we did have to wrap around? 
>
> Is this telling us we’re not vacuuming effectively enough?

Ugh.  Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offset?

The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmulti rather than
datfrozenxid.  It may be that multixact wrap around has already
occurred.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MultiXact member wraparound protections are disabled

From
"AnandKumar, Karthik"
Date:
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
0000  0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B  000C  000D  000E  000F  0010  0011  0012  0013
0014 0015  0016  0017  0018  0019  001A  001B
 
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B



postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6228991221455883206
Database cluster state:               in production
pg_control last modified:             Wed 12 Oct 2016 05:22:45 PM PDT
Latest checkpoint location:           62D0/BDE939F8
Prior checkpoint location:            62CF/F039BFD0
Latest checkpoint's REDO location:    62D0/8A060220
Latest checkpoint's REDO WAL file:    00000001000062D00000008A
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1/1834305762
Latest checkpoint's NextOID:          19540327
Latest checkpoint's NextMultiXactId:  784503
Latest checkpoint's NextMultiOffset:  1445264
Latest checkpoint's oldestXID:        226141373
Latest checkpoint's oldestXID's DB:   16457
Latest checkpoint's oldestActiveXID:  1834302410
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint:            Wed 12 Oct 2016 05:22:05 PM PDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current wal_log_hints setting:        off
Current max_connections setting:      1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   1000
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0






On 10/13/16, 5:28 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

>AnandKumar, Karthik wrote:
>> Hi,
>> 
>> We run postgres 9.4.5.
>> 
>> Starting this morning, we started seeing messages like the below:
>> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk
 
>> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk
 
>> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound
protectionsare disabled because oldest checkpointed MultiXact 1 does not exist on disk
 
>> 
>> Our autovacuum_freeze_max_age = 1750000000.
>> 
>> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>>  datname  |    age
>> -----------+------------
>> site      | 1645328344
>> template0 | 1274558807
>> bench     | 1274558807
>> postgres  | 1324283514
>> template1 | 1274558807
>> 
>> So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.
>> 
>> We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids
>> 
>> What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does
notexist on disk”? Would we lose data if we did have to wrap around?
 
>> 
>> Is this telling us we’re not vacuuming effectively enough?
>
>Ugh.  Can you share the output of pg_controldata and the list of files
>in pg_multixact/members and pg_multixact/offset?
>
>The problem here is that multixact vacuuming is separate from xid
>vacuuming, so you need to be looking at datminmulti rather than
>datfrozenxid.  It may be that multixact wrap around has already
>occurred.
>
>-- 
>Álvaro Herrera                https://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MultiXact member wraparound protections are disabled

From
avi Singh
Date:
Sharing output

postgres@site-db01a:~/cmates/data/pg_multixact/members $ ls
0000  0002  0004  0006  0008  000A  000C  000E  0010  0012  0014  0016  0018  001A
0001  0003  0005  0007  0009  000B  000D  000F  0011  0013  0015  0017  0019  001B

postgres@site-db01a:~/cmates/data/pg_multixact/offsets $ ls
0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B


postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata -D /var/lib/pgsql/cmates/data
pg_controldata: could not open file "-D/global/pg_control" for reading: No such file or directory

pg_controldata is not working in here even though the file is there inside global but it is not reading from it

postgres@site-db01a:~/cmates/data/global $ ls -la pg_control
-rw-------. 1 postgres postgres 8192 Oct 12 18:55 pg_control


On Wed, Oct 12, 2016 at 4:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
AnandKumar, Karthik wrote:
> Hi,
>
> We run postgres 9.4.5.
>
> Starting this morning, we started seeing messages like the below:
> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
>
> Our autovacuum_freeze_max_age = 1750000000.
>
> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>  datname  |    age
> -----------+------------
> site      | 1645328344
> template0 | 1274558807
> bench     | 1274558807
> postgres  | 1324283514
> template1 | 1274558807
>
> So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.
>
> We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids
>
> What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does not exist on disk”? Would we lose data if we did have to wrap around?
>
> Is this telling us we’re not vacuuming effectively enough?

Ugh.  Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offset?

The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmulti rather than
datfrozenxid.  It may be that multixact wrap around has already
occurred.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: MultiXact member wraparound protections are disabled

From
avi Singh
Date:
Got the output of pg_control

postgres@site-db01a:~/cmates/data/global $ /usr/pgsql-9.4/bin/pg_controldata  /var/lib/pgsql/cmates/data
pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6228991221455883206
Database cluster state:               in production
pg_control last modified:             Wed 12 Oct 2016 07:08:14 PM PDT
Latest checkpoint location:           62E1/890DA8D8
Prior checkpoint location:            62E0/550B2178
Latest checkpoint's REDO location:    62E1/4F054A08
Latest checkpoint's REDO WAL file:    00000001000062E10000004F
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1/1834454859
Latest checkpoint's NextOID:          19540816
Latest checkpoint's NextMultiXactId:  784527
Latest checkpoint's NextMultiOffset:  1445313
Latest checkpoint's oldestXID:        226141373
Latest checkpoint's oldestXID's DB:   16457
Latest checkpoint's oldestActiveXID:  1834454859
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint:            Wed 12 Oct 2016 07:06:45 PM PDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current wal_log_hints setting:        off
Current max_connections setting:      1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   1000
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

On Wed, Oct 12, 2016 at 7:10 PM, avi Singh <avisingh19811981@gmail.com> wrote:
Sharing output

postgres@site-db01a:~/cmates/data/pg_multixact/members $ ls
0000  0002  0004  0006  0008  000A  000C  000E  0010  0012  0014  0016  0018  001A
0001  0003  0005  0007  0009  000B  000D  000F  0011  0013  0015  0017  0019  001B

postgres@site-db01a:~/cmates/data/pg_multixact/offsets $ ls
0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B


postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata -D /var/lib/pgsql/cmates/data
pg_controldata: could not open file "-D/global/pg_control" for reading: No such file or directory

pg_controldata is not working in here even though the file is there inside global but it is not reading from it

postgres@site-db01a:~/cmates/data/global $ ls -la pg_control
-rw-------. 1 postgres postgres 8192 Oct 12 18:55 pg_control


On Wed, Oct 12, 2016 at 4:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
AnandKumar, Karthik wrote:
> Hi,
>
> We run postgres 9.4.5.
>
> Starting this morning, we started seeing messages like the below:
> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
>
> Our autovacuum_freeze_max_age = 1750000000.
>
> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>  datname  |    age
> -----------+------------
> site      | 1645328344
> template0 | 1274558807
> bench     | 1274558807
> postgres  | 1324283514
> template1 | 1274558807
>
> So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.
>
> We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids
>
> What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does not exist on disk”? Would we lose data if we did have to wrap around?
>
> Is this telling us we’re not vacuuming effectively enough?

Ugh.  Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offset?

The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmulti rather than
datfrozenxid.  It may be that multixact wrap around has already
occurred.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: MultiXact member wraparound protections are disabled

From
avi Singh
Date:
We are also seeing this in our log file


Oct 12 19:08:14 site-db01a postgres[6117]: [7589-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk

On Wed, Oct 12, 2016 at 7:13 PM, avi Singh <avisingh19811981@gmail.com> wrote:
Got the output of pg_control

postgres@site-db01a:~/cmates/data/global $ /usr/pgsql-9.4/bin/pg_controldata  /var/lib/pgsql/cmates/data
pg_control version number:            942
Catalog version number:               201409291
Database system identifier:           6228991221455883206
Database cluster state:               in production
pg_control last modified:             Wed 12 Oct 2016 07:08:14 PM PDT
Latest checkpoint location:           62E1/890DA8D8
Prior checkpoint location:            62E0/550B2178
Latest checkpoint's REDO location:    62E1/4F054A08
Latest checkpoint's REDO WAL file:    00000001000062E10000004F
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1/1834454859
Latest checkpoint's NextOID:          19540816
Latest checkpoint's NextMultiXactId:  784527
Latest checkpoint's NextMultiOffset:  1445313
Latest checkpoint's oldestXID:        226141373
Latest checkpoint's oldestXID's DB:   16457
Latest checkpoint's oldestActiveXID:  1834454859
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint:            Wed 12 Oct 2016 07:06:45 PM PDT

Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current wal_log_hints setting:        off
Current max_connections setting:      1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   1000
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

On Wed, Oct 12, 2016 at 7:10 PM, avi Singh <avisingh19811981@gmail.com> wrote:
Sharing output

postgres@site-db01a:~/cmates/data/pg_multixact/members $ ls
0000  0002  0004  0006  0008  000A  000C  000E  0010  0012  0014  0016  0018  001A
0001  0003  0005  0007  0009  000B  000D  000F  0011  0013  0015  0017  0019  001B

postgres@site-db01a:~/cmates/data/pg_multixact/offsets $ ls
0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B


postgres@site-db01a:/tmp $ /usr/pgsql-9.4/bin/pg_controldata -D /var/lib/pgsql/cmates/data
pg_controldata: could not open file "-D/global/pg_control" for reading: No such file or directory

pg_controldata is not working in here even though the file is there inside global but it is not reading from it

postgres@site-db01a:~/cmates/data/global $ ls -la pg_control
-rw-------. 1 postgres postgres 8192 Oct 12 18:55 pg_control


On Wed, Oct 12, 2016 at 4:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
AnandKumar, Karthik wrote:
> Hi,
>
> We run postgres 9.4.5.
>
> Starting this morning, we started seeing messages like the below:
> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact 1 does not exist on disk
>
> Our autovacuum_freeze_max_age = 1750000000.
>
> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>  datname  |    age
> -----------+------------
> site      | 1645328344
> template0 | 1274558807
> bench     | 1274558807
> postgres  | 1324283514
> template1 | 1274558807
>
> So we’re about 100 mil transactions away before we start vacuuming to prevent wraparound.
>
> We’re running precautionary vacuums on our largest offenders to try and drop our transaction ids
>
> What I’d request some clarity on is the message above. What does it mean that "oldest checkpointed MultiXact does not exist on disk”? Would we lose data if we did have to wrap around?
>
> Is this telling us we’re not vacuuming effectively enough?

Ugh.  Can you share the output of pg_controldata and the list of files
in pg_multixact/members and pg_multixact/offset?

The problem here is that multixact vacuuming is separate from xid
vacuuming, so you need to be looking at datminmulti rather than
datfrozenxid.  It may be that multixact wrap around has already
occurred.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: MultiXact member wraparound protections are disabled

From
Alvaro Herrera
Date:
AnandKumar, Karthik wrote:
> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
> 0000  0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B  000C  000D  000E  000F  0010  0011  0012
0013 0014  0015  0016  0017  0018  0019  001A  001B 
> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
> 0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B

> postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data

> Latest checkpoint's NextMultiXactId:  784503
> Latest checkpoint's NextMultiOffset:  1445264
> Latest checkpoint's oldestMultiXid:   1
> Latest checkpoint's oldestMulti's DB: 16457

This looks perfectly normal, except that the pg_multixact/offsets/0000
file is gone.  oldestMultiXid is 1 so I don't see how could have the
file gotten removed.  Has this been upgraded recently from a previous
9.3 or 9.4 version?  There have been bugs in this area but they've been
fixed now for some time.

The 0000 file could have been removed manually, perhaps?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MultiXact member wraparound protections are disabled

From
"AnandKumar, Karthik"
Date:
Thanks. We started seeing this error right after a SAN FC re-cable effort - so yes, that would make sense. 
We’ll do a little more digging to see if the 0000 could have gotten removed.
If that’s an older file that we have in our filesystem backups, is it safe to restore from there?




On 10/13/16, 3:30 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:

>AnandKumar, Karthik wrote:
>> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
>> 0000  0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B  000C  000D  000E  000F  0010  0011  0012
0013 0014  0015  0016  0017  0018  0019  001A  001B
 
>> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
>> 0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B
>
>> postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data
>
>> Latest checkpoint's NextMultiXactId:  784503
>> Latest checkpoint's NextMultiOffset:  1445264
>> Latest checkpoint's oldestMultiXid:   1
>> Latest checkpoint's oldestMulti's DB: 16457
>
>This looks perfectly normal, except that the pg_multixact/offsets/0000
>file is gone.  oldestMultiXid is 1 so I don't see how could have the
>file gotten removed.  Has this been upgraded recently from a previous
>9.3 or 9.4 version?  There have been bugs in this area but they've been
>fixed now for some time.
>
>The 0000 file could have been removed manually, perhaps?
>
>-- 
>Álvaro Herrera                https://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MultiXact member wraparound protections are disabled

From
Alvaro Herrera
Date:
AnandKumar, Karthik wrote:
> Thanks. We started seeing this error right after a SAN FC re-cable effort - so yes, that would make sense.
> We’ll do a little more digging to see if the 0000 could have gotten removed.
> If that’s an older file that we have in our filesystem backups, is it safe to restore from there?

Sure, the files are immutable after they are completed.  I worry that if
the system removed it automatically, it would just remove it again,
though.  Shouldn't happen on 9.4.5, but it seems just too much of a
coincidence that that file was removed.

Changes such as FC recabling should not cause anything like this.  I
mean, why a pg_multixact file and not a table data file?  Very fishy.

I'd advise to verify your older logs at the time of restarts whether the
"multixact protections are enabled" message has ever appeared, or it has
always been "protections are disabled".  Maybe you've had the problem
for ages and just never noticed ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MultiXact member wraparound protections are disabled

From
avi Singh
Date:
Your right we looked back in our old logs and we do see the messages there as well. Still what I'm not getting is since we restarted the database after SAN FC re-cable effort auto-vacuum is running on all the threads continuous. I have never seen auto-vacuum using all the threads 24*7 on this database. Any thoughts ?



On Thu, Oct 13, 2016 at 8:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
AnandKumar, Karthik wrote:
> Thanks. We started seeing this error right after a SAN FC re-cable effort - so yes, that would make sense.
> We’ll do a little more digging to see if the 0000 could have gotten removed.
> If that’s an older file that we have in our filesystem backups, is it safe to restore from there?

Sure, the files are immutable after they are completed.  I worry that if
the system removed it automatically, it would just remove it again,
though.  Shouldn't happen on 9.4.5, but it seems just too much of a
coincidence that that file was removed.

Changes such as FC recabling should not cause anything like this.  I
mean, why a pg_multixact file and not a table data file?  Very fishy.

I'd advise to verify your older logs at the time of restarts whether the
"multixact protections are enabled" message has ever appeared, or it has
always been "protections are disabled".  Maybe you've had the problem
for ages and just never noticed ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: MultiXact member wraparound protections are disabled

From
Alvaro Herrera
Date:
avi Singh wrote:
> Your right we looked back in our old logs and we do see the messages there
> as well. Still what I'm not getting is since we restarted the database
> after SAN FC re-cable effort auto-vacuum is running on all the threads
> continuous. I have never seen auto-vacuum using all the threads 24*7 on
> this database. Any thoughts ?

It's trying to ensure all tables are correctly frozen.  As I recall,
that's working per spec and you should just let it run until it's done.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MultiXact member wraparound protections are disabled

From
"AnandKumar, Karthik"
Date:
Thank you for your help Alvaro - we really appreciate it.
The error in fact stopped this morning - we took downtime and ran a vacuum across all of our tables, and saw increased
autovacuum activity as well.
 

It looks like it bumped up the oldest multitxid to something other than 1 now:

postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata /var/lib/pgsql/cmates/data | grep -i multi 
Latest checkpoint's NextMultiXactId: 785051 
Latest checkpoint's NextMultiOffset: 1446371 
Latest checkpoint's oldestMultiXid: 575211 
Latest checkpoint's oldestMulti's DB: 12998