Re: found xmin from before relfrozenxid on pg_catalog.pg_authid - Mailing list pgsql-general

From Jeremy Finzel
Subject Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Msg-id CAMa1XUjOhoxrcdtTPyxBU_CUwjii2Bkc1Fdnto4K-oATDAv4CA@mail.gmail.com
In response to Re: found xmin from before relfrozenxid on pg_catalog.pg_authid (Maxim Boguk <maxim.boguk@gmail.com>)
List pgsql-general


On Tue, May 22, 2018 at 2:42 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Tue, May 22, 2018 at 10:30 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
> On Tue, May 22, 2018 at 9:47 PM, Andres Freund <andres@anarazel.de> wrote:
> > >  select relfrozenxid from pg_class where relname='pg_authid';
> > >  relfrozenxid
> > > --------------
> > >    2863429136

> select txid_current();
>  txid_current
> --------------
>   41995913769

So that's an xid of 3341208114, if you leave the epoch out. What's
autovacuum_freeze_max_age set to in that cluster?

postgres=# show autovacuum_freeze_max_age;
 autovacuum_freeze_max_age
---------------------------
 200000000
(default value, I think)
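
For anyone reproducing the arithmetic, both checks are plain SQL. A minimal sketch, assuming it is run in the affected database (4294967296 is 2^32; age() reports how many transactions old a given xid is):

-- strip the 32-bit epoch from the 64-bit txid_current() value;
-- the high bits count how many times the xid counter has wrapped
select 41995913769 % 4294967296 as xid_without_epoch;

-- how far pg_authid's relfrozenxid lags behind the current transaction;
-- autovacuum forces a freeze once this exceeds autovacuum_freeze_max_age
select relname, relfrozenxid, age(relfrozenxid) as xid_age
from pg_class
where relname = 'pg_authid';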

 
Can you show pg_controldata output, and relminmxid from that cluster?

postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata -D /var/lib/postgresql/9.6/main
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6469368654711450114
Database cluster state:               in production
pg_control last modified:             Tue 22 May 2018 10:20:14 PM MSK
Latest checkpoint location:           CCB5/F9C37950
Prior checkpoint location:            CCB0/43F316B0
Latest checkpoint's REDO location:    CCB1/6706BD88
Latest checkpoint's REDO WAL file:    000000010000CCB100000067
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          9:3341161759
Latest checkpoint's NextOID:          190071899
Latest checkpoint's NextMultiXactId:  59416233
Latest checkpoint's NextMultiOffset:  215588532
Latest checkpoint's oldestXID:        2814548646
Latest checkpoint's oldestXID's DB:   16400
Latest checkpoint's oldestActiveXID:  3341161759
Latest checkpoint's oldestMultiXid:   54264778
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:2814548646
Latest checkpoint's newestCommitTsXid:3341161758
Time of latest checkpoint:            Tue 22 May 2018 10:05:16 PM MSK
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              2000
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       on
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
postgres=# select datname,datfrozenxid,datminmxid from pg_database order by datname;
  datname  | datfrozenxid | datminmxid
-----------+--------------+------------
 **        |   2815939794 |   54265194
 postgres  |   2863429136 |   54280819
 template0 |   3148297669 |   59342106
 template1 |   2816765546 |   59261794
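
The same numbers are easier to read as ages relative to the current transaction; a minimal sketch (mxid_age() is available in 9.5 and later):

select datname,
       age(datfrozenxid) as xid_age,
       mxid_age(datminmxid) as mxid_age
from pg_database
order by age(datfrozenxid) desc;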


I might be daft here, but it's surely curious that the relfrozenxid from
the error and the one in pg_catalog are so different (they don't even
have the same number of digits):
catalog: 2863429136
error:    248712603
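
One wrinkle worth keeping in mind while comparing these: pg_authid is a shared relation, but every database holds its own pg_class row for it, so the value can be read from more than one database. A sketch for comparing them side by side, assuming the database names from the pg_database output above (\c is psql's connect meta-command):

\c postgres
select current_database(), relfrozenxid, relminmxid
from pg_class where relname = 'pg_authid';

\c template1
select current_database(), relfrozenxid, relminmxid
from pg_class where relname = 'pg_authid';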


> About the gdb bt - it's tricky because it is the mission-critical master DB
> of a huge project.
> I'll try promoting a backup replica and check whether the issue persists
> there; if yes, we will have our playground for a while, but it will require
> some time to arrange.

You should be ok to just bt that in the running cluster, but I
definitely understand if you don't want to do that...  I'd appreciate it
if you set up a playground, because this seems like something that'll
reappear.

My gdb kung-fu is exceptionally rusty now, so I'd better play in a sandbox.
And if the error doesn't exist on a freshly promoted replica, that will give us useful info as well.
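
Once the playground is up, a minimal re-check on the promoted replica could look like the following sketch (it assumes the error reproduces under a plain VACUUM of the catalog):

-- confirm the node has actually been promoted and is writable
select pg_is_in_recovery();

-- try to reproduce the error; this is the operation that reports
-- "found xmin ... from before relfrozenxid ..."
vacuum pg_catalog.pg_authid;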

--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"


All - 

We just noticed that this exact problem has returned for us, with exactly the same symptoms as in my original post (including that a snapshot does not have the issue).

I am open to any suggestions for troubleshooting this further!

Thank you,
Jeremy
