Thread: found xmin from before relfrozenxid on pg_catalog.pg_authid
Getting some concerning errors in one of our databases that is on 9.5.11, on autovacuum from template0 database pg_authid and pg_auth_members. I only saw some notes on the list about this error related to materialized views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to 9.5. Here is an example:
2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 12:08:33 CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before relfrozenxid 740087784",,,,,"automatic vacuum of table ""template0.pg_catalog.pg_authid""",,,,""
2018-03-19 12:08:33.957 CDT,,,14892,,5aafee91.3a2c,2,,2018-03-19 12:08:33 CDT,59/340953,0,ERROR,XX001,"found xmin 2906288383 from before relfrozenxid 740087784",,,,,"automatic vacuum of table ""template0.pg_catalog.pg_auth_members""",,,,""
Any insight would be much appreciated.
Thanks,
Jeremy
Jeremy Finzel wrote: > Getting some concerning errors in one of our databases that is on 9.5.11, > on autovacuum from template0 database pg_authid and pg_auth_members. I > only saw some notes on the list about this error related to materialized > views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to > 9.5. Here is an example: > > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 12:08:33 > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before relfrozenxid > 740087784",,,,,"automatic vacuum of table > ""template0.pg_catalog.pg_authid""",,,,"" Can you please supply output of pg_controldata? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pg_control version number: 942
Catalog version number: 201510051
Database system identifier: 6351536019599012028
Database cluster state: in production
pg_control last modified: Mon 19 Mar 2018 12:56:10 PM CDT
Latest checkpoint location: 262BE/FE96240
Prior checkpoint location: 262BA/623D5E40
Latest checkpoint's REDO location: 262BA/F5499E98
Latest checkpoint's REDO WAL file: 00000001000262BA000000F5
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 16/3132524419
Latest checkpoint's NextOID: 1090653331
Latest checkpoint's NextMultiXactId: 2142
Latest checkpoint's NextMultiOffset: 5235
Latest checkpoint's oldestXID: 1829964553
Latest checkpoint's oldestXID's DB: 12376
Latest checkpoint's oldestActiveXID: 3131774441
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:1829964553
Latest checkpoint's newestCommitTsXid:3132524418
Time of latest checkpoint: Mon 19 Mar 2018 12:54:08 PM CDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: off
max_connections setting: 2000
max_worker_processes setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: on
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Jeremy Finzel wrote:
> Getting some concerning errors in one of our databases that is on 9.5.11,
> on autovacuum from template0 database pg_authid and pg_auth_members. I
> only saw some notes on the list about this error related to materialized
> views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
> 9.5. Here is an example:
>
> 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 12:08:33
> CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before relfrozenxid
> 740087784",,,,,"automatic vacuum of table
> ""template0.pg_catalog.pg_authid""",,,,""
Can you please supply output of pg_controldata?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Jeremy Finzel wrote:
> Getting some concerning errors in one of our databases that is on 9.5.11,
> on autovacuum from template0 database pg_authid and pg_auth_members. I
> only saw some notes on the list about this error related to materialized
> views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
> 9.5. Here is an example:
>
> 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 12:08:33
> CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before relfrozenxid
> 740087784",,,,,"automatic vacuum of table
> ""template0.pg_catalog.pg_authid""",,,,""
Can you please supply output of pg_controldata?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Please forgive my accidental top-post. Here:
pg_control version number: 942
Catalog version number: 201510051
Database system identifier: 6351536019599012028
Database cluster state: in production
pg_control last modified: Mon 19 Mar 2018 12:56:10 PM CDT
Latest checkpoint location: 262BE/FE96240
Prior checkpoint location: 262BA/623D5E40
Latest checkpoint's REDO location: 262BA/F5499E98
Latest checkpoint's REDO WAL file: 00000001000262BA000000F5
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 16/3132524419
Latest checkpoint's NextOID: 1090653331
Latest checkpoint's NextMultiXactId: 2142
Latest checkpoint's NextMultiOffset: 5235
Latest checkpoint's oldestXID: 1829964553
Latest checkpoint's oldestXID's DB: 12376
Latest checkpoint's oldestActiveXID: 3131774441
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:1829964553
Latest checkpoint's newestCommitTsXid:3132524418
Time of latest checkpoint: Mon 19 Mar 2018 12:54:08 PM CDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: off
max_connections setting: 2000
max_worker_processes setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: on
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Thanks,
Jeremy
Hi Jeremy, Alvaro, On 2018-03-19 13:00:13 -0500, Jeremy Finzel wrote: > On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> > wrote: > > > Jeremy Finzel wrote: > > > Getting some concerning errors in one of our databases that is on 9.5.11, > > > on autovacuum from template0 database pg_authid and pg_auth_members. I > > > only saw some notes on the list about this error related to materialized > > > views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to > > > 9.5. Here is an example: > > > > > > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19 > > 12:08:33 > > > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before > > relfrozenxid > > > 740087784",,,,,"automatic vacuum of table > > > ""template0.pg_catalog.pg_authid""",,,,"" > > > > Can you please supply output of pg_controldata? > Latest checkpoint's NextXID: 16/3132524419 > Latest checkpoint's NextMultiXactId: 2142 > Latest checkpoint's NextMultiOffset: 5235 > Latest checkpoint's oldestXID: 1829964553 > Latest checkpoint's oldestXID's DB: 12376 > Latest checkpoint's oldestActiveXID: 3131774441 > Latest checkpoint's oldestMultiXid: 1 > Latest checkpoint's oldestMulti's DB: 16400 Hm, based on these it doesn't look like multixacts were involved (based on oldestMultiXid it's highly unlikley there've multi wraparound, and there's not much multixact usage on system tables anyway). Which suggests that there might have been actual corrpution here. Jeremy: - which version of 9.4 and 9.5 ran on this? Do you know? - Can you install the pageinspect extension? If so, it might be a CREATE EXTENSION pageinspect; CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT lp int2, OUT xmin xid) RETURNS SETOF RECORD LANGUAGE SQL AS $$ SELECT blockno, lp, t_xmin FROM generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno, -- every block in the relation heap_page_items(get_raw_page($1::text, blockno::int4)) -- every item on the page WHERE t_xmin IS NOT NULL -- filter out empty items AND t_xmin != 1 -- filter out bootstrap AND t_xmin != 2 -- filter out frozen transaction id AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' | x'0200')::int) -- filter out frozen rows with xid present AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid = $1)) -- xid cutoff filter $$; SELECT * FROM check_rel('pg_authid') LIMIT 100; and then display all items for one of the affected pages like SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343)); Alvaro: - Hm, we talked about code adding context for these kind of errors, right? Is that just skipped for csvlog? - Alvaro, does the above check_rel() function make sense? Greetings, Andres Freund
On Mon, Mar 19, 2018 at 1:17 PM, Andres Freund <andres@anarazel.de> wrote:
Hi Jeremy, Alvaro,
On 2018-03-19 13:00:13 -0500, Jeremy Finzel wrote:
> On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:
>
> > Jeremy Finzel wrote:
> > > Getting some concerning errors in one of our databases that is on 9.5.11,
> > > on autovacuum from template0 database pg_authid and pg_auth_members. I
> > > only saw some notes on the list about this error related to materialized
> > > views. FWIW, we did use pg_upgrade to upgrade this database from 9.4 to
> > > 9.5. Here is an example:
> > >
> > > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19
> > 12:08:33
> > > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before
> > relfrozenxid
> > > 740087784",,,,,"automatic vacuum of table
> > > ""template0.pg_catalog.pg_authid""",,,,""
> >
> > Can you please supply output of pg_controldata?
> Latest checkpoint's NextXID: 16/3132524419
> Latest checkpoint's NextMultiXactId: 2142
> Latest checkpoint's NextMultiOffset: 5235
> Latest checkpoint's oldestXID: 1829964553
> Latest checkpoint's oldestXID's DB: 12376
> Latest checkpoint's oldestActiveXID: 3131774441
> Latest checkpoint's oldestMultiXid: 1
> Latest checkpoint's oldestMulti's DB: 16400
Hm, based on these it doesn't look like multixacts were involved (based
on oldestMultiXid it's highly unlikley there've multi wraparound, and
there's not much multixact usage on system tables anyway). Which
suggests that there might have been actual corrpution here.
Jeremy:
- which version of 9.4 and 9.5 ran on this? Do you know?
We upgraded to 9.5.5, and today we are running 9.5.11. And actually we upgraded from 9.3, not 9.4. We are still trying to figure out which point release we were on at 9.3.
- Can you install the pageinspect extension? If so, it might be a
CREATE EXTENSION pageinspect;
CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT lp int2, OUT xmin xid)
RETURNS SETOF RECORD
LANGUAGE SQL
AS $$
SELECT blockno, lp, t_xmin
FROM
generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno, -- every block in the relation
heap_page_items(get_raw_page($1::text, blockno::int4)) -- every item on the page
WHERE
t_xmin IS NOT NULL -- filter out empty items
AND t_xmin != 1 -- filter out bootstrap
AND t_xmin != 2 -- filter out frozen transaction id
AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' | x'0200')::int) -- filter out frozen rows with xid present
AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid = $1)) -- xid cutoff filter
$$;
SELECT * FROM check_rel('pg_authid') LIMIT 100;
Small note - Needs to be this because != is not supported for xid:
AND NOT t_xmin = 1 -- filter out bootstrap
AND NOT t_xmin = 2 -- filter out frozen transaction id
AND NOT t_xmin = 2 -- filter out frozen transaction id
and then display all items for one of the affected pages like
SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343));
Alvaro:
- Hm, we talked about code adding context for these kind of errors,
right? Is that just skipped for csvlog?
- Alvaro, does the above check_rel() function make sense?
Greetings,
Andres Freund
The function does NOT show any issue with either of those tables.
One very interesting thing that is puzzling us - we have taken several san snapshots of the system real time that are running on the exact same version 9.5.11, and they do NOT show the same error when we vacuum these tables. It makes us wonder if simply a db restart would solve the issue.
We will continue to investigate but interested in your feedback about what we have seen thus far.
Thanks,
Jeremy
On 2018-03-19 14:37:24 -0500, Jeremy Finzel wrote: > We upgraded to 9.5.5, and today we are running 9.5.11. And actually we > upgraded from 9.3, not 9.4. We are still trying to figure out which point > release we were on at 9.3. Ok. IIRC there used to be a bug a few years back that sometimes lead to highly contended pages being skipped during vacuum, and we'd still update relfrozenxid. IIRC it required the table to be extended at the same time or something? > > > - Can you install the pageinspect extension? If so, it might be a > > CREATE EXTENSION pageinspect; > > CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT > > lp int2, OUT xmin xid) > > RETURNS SETOF RECORD > > LANGUAGE SQL > > AS $$ > > SELECT blockno, lp, t_xmin > > FROM > > generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno, > > -- every block in the relation > > heap_page_items(get_raw_page($1::text, blockno::int4)) -- every > > item on the page > > WHERE > > t_xmin IS NOT NULL -- filter out empty items > > AND t_xmin != 1 -- filter out bootstrap > > AND t_xmin != 2 -- filter out frozen transaction id > > AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' | > > x'0200')::int) -- filter out frozen rows with xid present > > AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid > > = $1)) -- xid cutoff filter > > $$; > > SELECT * FROM check_rel('pg_authid') LIMIT 100; > > > > Small note - Needs to be this because != is not supported for xid: > > AND NOT t_xmin = 1 -- filter out bootstrap > AND NOT t_xmin = 2 -- filter out frozen transaction id Only on older releases ;). But yea, that looks right. > > and then display all items for one of the affected pages like > > SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343)); > > > > > > Alvaro: > > - Hm, we talked about code adding context for these kind of errors, > > right? Is that just skipped for csvlog? > > - Alvaro, does the above check_rel() function make sense? > > > > Greetings, > > > > Andres Freund > > > > The function does NOT show any issue with either of those tables. Uh, huh? Alvaro, do you see a bug in my query? Greetings, Andres Freund
On Mon, Mar 19, 2018 at 2:41 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-03-19 14:37:24 -0500, Jeremy Finzel wrote:
> We upgraded to 9.5.5, and today we are running 9.5.11. And actually we
> upgraded from 9.3, not 9.4. We are still trying to figure out which point
> release we were on at 9.3.
Ok. IIRC there used to be a bug a few years back that sometimes lead to
highly contended pages being skipped during vacuum, and we'd still
update relfrozenxid. IIRC it required the table to be extended at the
same time or something?Only on older releases ;). But yea, that looks right.
>
> > - Can you install the pageinspect extension? If so, it might be a
> > CREATE EXTENSION pageinspect;
> > CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT
> > lp int2, OUT xmin xid)
> > RETURNS SETOF RECORD
> > LANGUAGE SQL
> > AS $$
> > SELECT blockno, lp, t_xmin
> > FROM
> > generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno,
> > -- every block in the relation
> > heap_page_items(get_raw_page($1::text, blockno::int4)) -- every
> > item on the page
> > WHERE
> > t_xmin IS NOT NULL -- filter out empty items
> > AND t_xmin != 1 -- filter out bootstrap
> > AND t_xmin != 2 -- filter out frozen transaction id
> > AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' |
> > x'0200')::int) -- filter out frozen rows with xid present
> > AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid
> > = $1)) -- xid cutoff filter
> > $$;
> > SELECT * FROM check_rel('pg_authid') LIMIT 100;
> >
>
> Small note - Needs to be this because != is not supported for xid:
>
> AND NOT t_xmin = 1 -- filter out bootstrap
> AND NOT t_xmin = 2 -- filter out frozen transaction id
> > and then display all items for one of the affected pages like
> > SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343));
> >
> >
> > Alvaro:
> > - Hm, we talked about code adding context for these kind of errors,
> > right? Is that just skipped for csvlog?
> > - Alvaro, does the above check_rel() function make sense?
> >
> > Greetings,
> >
> > Andres Freund
> >
>
> The function does NOT show any issue with either of those tables.
Uh, huh? Alvaro, do you see a bug in my query?
Greetings,
Andres Freund
FWIW, if I remove the last filter, I get these rows and I believe row 7/57/2906288382 is the one generating error:
SELECT * FROM check_rel('pg_authid') LIMIT 100;
blockno | lp | xmin
---------+----+------------
7 | 4 | 2040863716
7 | 5 | 2040863716
7 | 8 | 2041172882
7 | 9 | 2041172882
7 | 12 | 2041201779
7 | 13 | 2041201779
7 | 16 | 2089742733
7 | 17 | 2090021318
7 | 18 | 2090021318
7 | 47 | 2090021898
7 | 48 | 2090021898
7 | 49 | 2102749003
7 | 50 | 2103210571
7 | 51 | 2103210571
7 | 54 | 2154640913
7 | 55 | 2163849781
7 | 56 | 2295315714
7 | 57 | 2906288382
7 | 58 | 2906329443
7 | 60 | 3131766386
8 | 1 | 2089844462
8 | 2 | 2089844462
8 | 3 | 2089844463
8 | 6 | 2089844463
8 | 9 | 2295318868
(25 rows)
Hi, On 2018-03-19 14:53:58 -0500, Jeremy Finzel wrote: > FWIW, if I remove the last filter, I get these rows and I believe row 7/57/ > 2906288382 is the one generating error: Oh, yea, that makes sense. It's wrapped around and looks like it's from the future. > SELECT * FROM check_rel('pg_authid') LIMIT 100; > blockno | lp | xmin > ---------+----+------------ > 7 | 4 | 2040863716 > 7 | 5 | 2040863716 > 7 | 8 | 2041172882 > 7 | 9 | 2041172882 > 7 | 12 | 2041201779 > 7 | 13 | 2041201779 > 7 | 16 | 2089742733 > 7 | 17 | 2090021318 > 7 | 18 | 2090021318 > 7 | 47 | 2090021898 > 7 | 48 | 2090021898 > 7 | 49 | 2102749003 > 7 | 50 | 2103210571 > 7 | 51 | 2103210571 > 7 | 54 | 2154640913 > 7 | 55 | 2163849781 > 7 | 56 | 2295315714 > 7 | 57 | 2906288382 > 7 | 58 | 2906329443 > 7 | 60 | 3131766386 > 8 | 1 | 2089844462 > 8 | 2 | 2089844462 > 8 | 3 | 2089844463 > 8 | 6 | 2089844463 > 8 | 9 | 2295318868 > (25 rows) Could you show the contents of those two pages with a query like I had in an earlier email? Greetings, Andres Freund
On Mon, Mar 19, 2018 at 2:56 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-03-19 14:53:58 -0500, Jeremy Finzel wrote:
> FWIW, if I remove the last filter, I get these rows and I believe row 7/57/
> 2906288382 is the one generating error:
Oh, yea, that makes sense. It's wrapped around and looks like it's from
the future.Could you show the contents of those two pages with a query like I had
> SELECT * FROM check_rel('pg_authid') LIMIT 100;
> blockno | lp | xmin
> ---------+----+------------
> 7 | 4 | 2040863716
> 7 | 5 | 2040863716
> 7 | 8 | 2041172882
> 7 | 9 | 2041172882
> 7 | 12 | 2041201779
> 7 | 13 | 2041201779
> 7 | 16 | 2089742733
> 7 | 17 | 2090021318
> 7 | 18 | 2090021318
> 7 | 47 | 2090021898
> 7 | 48 | 2090021898
> 7 | 49 | 2102749003
> 7 | 50 | 2103210571
> 7 | 51 | 2103210571
> 7 | 54 | 2154640913
> 7 | 55 | 2163849781
> 7 | 56 | 2295315714
> 7 | 57 | 2906288382
> 7 | 58 | 2906329443
> 7 | 60 | 3131766386
> 8 | 1 | 2089844462
> 8 | 2 | 2089844462
> 8 | 3 | 2089844463
> 8 | 6 | 2089844463
> 8 | 9 | 2295318868
> (25 rows)
in an earlier email?
Greetings,
Andres Freund
SELECT heap_page_items(get_raw_page('pg_authid', 7));
heap_page_items
--------------------------------------------------------------------------------------------------------------------------------------------
(1,4720,1,108,1897434979,0,0,"(7,1)",11,2825,32,111111111000000000000000000000000000000001011000010011111100001001111000,507769370)
(2,4608,1,108,1897442758,0,18,"(7,2)",11,2825,32,111111111000000000000000000000000000000011000011101100000010001001111000,507776451)
(3,4496,1,108,1897442758,0,20,"(7,3)",11,2825,32,111111111000000000000000000000000000000000100011101100000010001001111000,507776452)
(4,4384,1,108,2040863716,0,37,"(7,4)",11,2313,32,111111111000000000000000000000000000000000110110111011100011001011111000,525105004)
(5,4272,1,108,2040863716,0,39,"(7,5)",11,2313,32,111111111000000000000000000000000000000010110110111011100011001011111000,525105005)
(6,0,3,0,,,,,,,,,)
(7,0,3,0,,,,,,,,,)
(8,4160,1,108,2041172882,0,49,"(7,8)",11,2313,32,111111111000000000000000000000000000000001110100101011000111001011111000,525219118)
(9,4048,1,108,2041172882,0,51,"(7,9)",11,2313,32,111111111000000000000000000000000000000011110100101011000111001011111000,525219119)
(10,0,3,0,,,,,,,,,)
(11,0,3,0,,,,,,,,,)
(12,3936,1,108,2041201779,0,181,"(7,12)",11,2313,32,111111111000000000000000000000000000000011010100010111100111001011111000,525236779)
(13,3824,1,108,2041201779,0,183,"(7,13)",11,2313,32,111111111000000000000000000000000000000000110100010111100111001011111000,525236780)
(14,0,3,0,,,,,,,,,)
(15,0,3,0,,,,,,,,,)
(16,3712,1,108,2089742733,0,0,"(7,16)",11,2313,32,111111111000000000000000000000000000000001000101110011100000001111111000,532706210)
(17,3600,1,108,2090021318,0,1,"(7,17)",11,2313,32,111111111000000000000000000000000000000001001100001101001000001111111000,532753458)
(18,3488,1,108,2090021318,0,3,"(7,18)",11,2313,32,111111111000000000000000000000000000000011001100001101001000001111111000,532753459)
(19,8080,1,108,753125991,0,2,"(7,19)",11,2825,32,111111111000000000000000000000000000000000110001010111001011100001110000,236796556)
(20,7968,1,108,753125991,0,4,"(7,20)",11,2825,32,111111111000000000000000000000000000000010110001010111001011100001110000,236796557)
(21,7856,1,108,753125992,0,19,"(7,21)",11,2825,32,111111111000000000000000000000000000000011001001010111001011100001110000,236796563)
(22,7744,1,108,753125992,0,21,"(7,22)",11,2825,32,111111111000000000000000000000000000000000101001010111001011100001110000,236796564)
(23,7632,1,108,753125993,0,36,"(7,23)",11,2825,32,111111111000000000000000000000000000000001011001010111001011100001110000,236796570)
(24,7520,1,108,753125993,0,38,"(7,24)",11,2825,32,111111111000000000000000000000000000000011011001010111001011100001110000,236796571)
(25,7408,1,108,753125994,0,53,"(7,25)",11,2825,32,111111111000000000000000000000000000000010000101010111001011100001110000,236796577)
(26,7296,1,108,753125994,0,55,"(7,26)",11,2825,32,111111111000000000000000000000000000000001000101010111001011100001110000,236796578)
(27,7184,1,108,753125995,0,70,"(7,27)",11,2825,32,111111111000000000000000000000000000000000010101010111001011100001110000,236796584)
(28,7072,1,108,753125995,0,72,"(7,28)",11,2825,32,111111111000000000000000000000000000000010010101010111001011100001110000,236796585)
(29,6960,1,108,753125997,0,87,"(7,29)",11,2825,32,111111111000000000000000000000000000000011001101010111001011100001110000,236796595)
(30,6848,1,108,753125997,0,89,"(7,30)",11,2825,32,111111111000000000000000000000000000000000101101010111001011100001110000,236796596)
(31,6736,1,108,753125998,0,104,"(7,31)",11,2825,32,111111111000000000000000000000000000000001011101010111001011100001110000,236796602)
(32,6624,1,108,753125998,0,106,"(7,32)",11,2825,32,111111111000000000000000000000000000000011011101010111001011100001110000,236796603)
(33,6512,1,108,753125999,0,121,"(7,33)",11,2825,32,111111111000000000000000000000000000000010000011010111001011100001110000,236796609)
(34,6400,1,108,753125999,0,123,"(7,34)",11,2825,32,111111111000000000000000000000000000000001000011010111001011100001110000,236796610)
(35,6288,1,108,753126000,0,138,"(7,35)",11,2825,32,111111111000000000000000000000000000000000010011010111001011100001110000,236796616)
(36,6176,1,108,753126000,0,140,"(7,36)",11,2825,32,111111111000000000000000000000000000000010010011010111001011100001110000,236796617)
(37,6064,1,108,753126001,0,155,"(7,37)",11,2825,32,111111111000000000000000000000000000000011110011010111001011100001110000,236796623)
(38,5952,1,108,753126001,0,157,"(7,38)",11,2825,32,111111111000000000000000000000000000000000001011010111001011100001110000,236796624)
(39,5840,1,108,753126002,0,172,"(7,39)",11,2825,32,111111111000000000000000000000000000000001101011010111001011100001110000,236796630)
(40,5728,1,108,753126002,0,174,"(7,40)",11,2825,32,111111111000000000000000000000000000000011101011010111001011100001110000,236796631)
(41,5616,1,108,753126003,0,189,"(7,41)",11,2825,32,111111111000000000000000000000000000000010111011010111001011100001110000,236796637)
(42,5504,1,108,753126003,0,191,"(7,42)",11,2825,32,111111111000000000000000000000000000000001111011010111001011100001110000,236796638)
(43,5392,1,108,753126004,0,206,"(7,43)",11,2825,32,111111111000000000000000000000000000000000100111010111001011100001110000,236796644)
(44,5280,1,108,753126004,0,208,"(7,44)",11,2825,32,111111111000000000000000000000000000000010100111010111001011100001110000,236796645)
(45,5168,1,108,753126005,0,223,"(7,45)",11,2825,32,111111111000000000000000000000000000000011010111010111001011100001110000,236796651)
(46,5056,1,108,753126005,0,225,"(7,46)",11,2825,32,111111111000000000000000000000000000000000110111010111001011100001110000,236796652)
(47,3376,1,108,2090021898,0,18,"(7,47)",11,2313,32,111111111000000000000000000000000000000001000111001101001000001111111000,532753634)
(48,3264,1,108,2090021898,0,20,"(7,48)",11,2313,32,111111111000000000000000000000000000000000100111001101001000001111111000,532753636)
(49,3152,1,108,2102749003,0,0,"(7,49)",11,2313,32,111111111000000000000000000000000000000001111011101110010000110100000100,548445662)
(50,3040,1,108,2103210571,0,107,"(7,50)",11,2313,32,111111111000000000000000000000000000000000010110101100100100110100000100,548556136)
(51,2928,1,108,2103210571,0,109,"(7,51)",11,2313,32,111111111000000000000000000000000000000010010110101100100100110100000100,548556137)
(52,0,3,0,,,,,,,,,)
(53,0,3,0,,,,,,,,,)
(54,2816,1,108,2154640913,0,0,"(7,54)",11,2313,32,111111111000000000000000000000000000000010111111010010000000100010000100,554701565)
(55,2704,1,108,2163849781,0,0,"(7,55)",11,2313,32,111111111000000000000000000000000000000000110000001011101001100010000100,555316236)
(56,2592,1,108,2295315714,0,0,"(7,56)",11,2313,32,111111111000000000000000000000000000000000100101010100001000010001000100,572590756)
(57,2480,1,108,2906288382,0,0,"(7,57)",11,2313,32,111111111000000000000000000000000000000000001000101101000100111001111100,1047670032)
(58,2368,1,108,2906329443,0,0,"(7,58)",11,2313,32,111111111000000000000000000000000000000001111001101101000100111001111100,1047670174)
(59,60,2,0,,,,,,,,,)
(60,2224,1,144,3131766386,0,0,"(7,60)",32779,10507,32,111111111100000000000000000000000000000000001101010001010011111100000010,1090298544)
(61,0,0,0,,,,,,,,,)
(62,0,0,0,,,,,,,,,)
(63,0,0,0,,,,,,,,,)
(64,0,0,0,,,,,,,,,)
(65,0,0,0,,,,,,,,,)
(66,0,0,0,,,,,,,,,)
(67,0,0,0,,,,,,,,,)
(68,0,0,0,,,,,,,,,)
(69,4944,1,108,1034607755,0,1,"(7,69)",11,2825,32,111111111000000000000000000000000000000000010111011000010010010100001000,279217896)
(70,4832,1,108,1034607755,0,3,"(7,70)",11,2825,32,111111111000000000000000000000000000000010010111011000010010010100001000,279217897)
(70 rows)
On Mon, Mar 19, 2018 at 3:01 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
On Mon, Mar 19, 2018 at 2:56 PM, Andres Freund <andres@anarazel.de> wrote:Hi,
On 2018-03-19 14:53:58 -0500, Jeremy Finzel wrote:
> FWIW, if I remove the last filter, I get these rows and I believe row 7/57/
> 2906288382 is the one generating error:
Oh, yea, that makes sense. It's wrapped around and looks like it's from
the future.Could you show the contents of those two pages with a query like I had
> SELECT * FROM check_rel('pg_authid') LIMIT 100;
> blockno | lp | xmin
> ---------+----+------------
> 7 | 4 | 2040863716
> 7 | 5 | 2040863716
> 7 | 8 | 2041172882
> 7 | 9 | 2041172882
> 7 | 12 | 2041201779
> 7 | 13 | 2041201779
> 7 | 16 | 2089742733
> 7 | 17 | 2090021318
> 7 | 18 | 2090021318
> 7 | 47 | 2090021898
> 7 | 48 | 2090021898
> 7 | 49 | 2102749003
> 7 | 50 | 2103210571
> 7 | 51 | 2103210571
> 7 | 54 | 2154640913
> 7 | 55 | 2163849781
> 7 | 56 | 2295315714
> 7 | 57 | 2906288382
> 7 | 58 | 2906329443
> 7 | 60 | 3131766386
> 8 | 1 | 2089844462
> 8 | 2 | 2089844462
> 8 | 3 | 2089844463
> 8 | 6 | 2089844463
> 8 | 9 | 2295318868
> (25 rows)
in an earlier email?
Greetings,
Andres FreundSELECT heap_page_items(get_raw_page('pg_authid', 7)); heap_page_items------------------------------------------------------------ ------------------------------ ------------------------------ -------------------- (1,4720,1,108,1897434979,0,0,"(7,1)",11,2825,32, 111111111000000000000000000000 000000000001011000010011111100 001001111000,507769370) (2,4608,1,108,1897442758,0,18,"(7,2)",11,2825,32, 111111111000000000000000000000 000000000011000011101100000010 001001111000,507776451) (3,4496,1,108,1897442758,0,20,"(7,3)",11,2825,32, 111111111000000000000000000000 000000000000100011101100000010 001001111000,507776452) (4,4384,1,108,2040863716,0,37,"(7,4)",11,2313,32, 111111111000000000000000000000 000000000000110110111011100011 001011111000,525105004) (5,4272,1,108,2040863716,0,39,"(7,5)",11,2313,32, 111111111000000000000000000000 000000000010110110111011100011 001011111000,525105005) (6,0,3,0,,,,,,,,,)(7,0,3,0,,,,,,,,,)(8,4160,1,108,2041172882,0,49,"(7,8)",11,2313,32, 111111111000000000000000000000 000000000001110100101011000111 001011111000,525219118) (9,4048,1,108,2041172882,0,51,"(7,9)",11,2313,32, 111111111000000000000000000000 000000000011110100101011000111 001011111000,525219119) (10,0,3,0,,,,,,,,,)(11,0,3,0,,,,,,,,,)(12,3936,1,108,2041201779,0,181,"(7,12)",11,2313,32, 111111111000000000000000000000 000000000011010100010111100111 001011111000,525236779) (13,3824,1,108,2041201779,0,183,"(7,13)",11,2313,32, 111111111000000000000000000000 000000000000110100010111100111 001011111000,525236780) (14,0,3,0,,,,,,,,,)(15,0,3,0,,,,,,,,,)(16,3712,1,108,2089742733,0,0,"(7,16)",11,2313,32, 111111111000000000000000000000 000000000001000101110011100000 001111111000,532706210) (17,3600,1,108,2090021318,0,1,"(7,17)",11,2313,32, 111111111000000000000000000000 000000000001001100001101001000 001111111000,532753458) (18,3488,1,108,2090021318,0,3,"(7,18)",11,2313,32, 111111111000000000000000000000 000000000011001100001101001000 001111111000,532753459) (19,8080,1,108,753125991,0,2,"(7,19)",11,2825,32, 111111111000000000000000000000 000000000000110001010111001011 100001110000,236796556) (20,7968,1,108,753125991,0,4,"(7,20)",11,2825,32, 111111111000000000000000000000 000000000010110001010111001011 100001110000,236796557) (21,7856,1,108,753125992,0,19,"(7,21)",11,2825,32, 111111111000000000000000000000 000000000011001001010111001011 100001110000,236796563) (22,7744,1,108,753125992,0,21,"(7,22)",11,2825,32, 111111111000000000000000000000 000000000000101001010111001011 100001110000,236796564) (23,7632,1,108,753125993,0,36,"(7,23)",11,2825,32, 111111111000000000000000000000 000000000001011001010111001011 100001110000,236796570) (24,7520,1,108,753125993,0,38,"(7,24)",11,2825,32, 111111111000000000000000000000 000000000011011001010111001011 100001110000,236796571) (25,7408,1,108,753125994,0,53,"(7,25)",11,2825,32, 111111111000000000000000000000 000000000010000101010111001011 100001110000,236796577) (26,7296,1,108,753125994,0,55,"(7,26)",11,2825,32, 111111111000000000000000000000 000000000001000101010111001011 100001110000,236796578) (27,7184,1,108,753125995,0,70,"(7,27)",11,2825,32, 111111111000000000000000000000 000000000000010101010111001011 100001110000,236796584) (28,7072,1,108,753125995,0,72,"(7,28)",11,2825,32, 111111111000000000000000000000 000000000010010101010111001011 100001110000,236796585) (29,6960,1,108,753125997,0,87,"(7,29)",11,2825,32, 111111111000000000000000000000 000000000011001101010111001011 100001110000,236796595) (30,6848,1,108,753125997,0,89,"(7,30)",11,2825,32, 111111111000000000000000000000 000000000000101101010111001011 100001110000,236796596) (31,6736,1,108,753125998,0,104,"(7,31)",11,2825,32, 111111111000000000000000000000 000000000001011101010111001011 100001110000,236796602) (32,6624,1,108,753125998,0,106,"(7,32)",11,2825,32, 111111111000000000000000000000 000000000011011101010111001011 100001110000,236796603) (33,6512,1,108,753125999,0,121,"(7,33)",11,2825,32, 111111111000000000000000000000 000000000010000011010111001011 100001110000,236796609) (34,6400,1,108,753125999,0,123,"(7,34)",11,2825,32, 111111111000000000000000000000 000000000001000011010111001011 100001110000,236796610) (35,6288,1,108,753126000,0,138,"(7,35)",11,2825,32, 111111111000000000000000000000 000000000000010011010111001011 100001110000,236796616) (36,6176,1,108,753126000,0,140,"(7,36)",11,2825,32, 111111111000000000000000000000 000000000010010011010111001011 100001110000,236796617) (37,6064,1,108,753126001,0,155,"(7,37)",11,2825,32, 111111111000000000000000000000 000000000011110011010111001011 100001110000,236796623) (38,5952,1,108,753126001,0,157,"(7,38)",11,2825,32, 111111111000000000000000000000 000000000000001011010111001011 100001110000,236796624) (39,5840,1,108,753126002,0,172,"(7,39)",11,2825,32, 111111111000000000000000000000 000000000001101011010111001011 100001110000,236796630) (40,5728,1,108,753126002,0,174,"(7,40)",11,2825,32, 111111111000000000000000000000 000000000011101011010111001011 100001110000,236796631) (41,5616,1,108,753126003,0,189,"(7,41)",11,2825,32, 111111111000000000000000000000 000000000010111011010111001011 100001110000,236796637) (42,5504,1,108,753126003,0,191,"(7,42)",11,2825,32, 111111111000000000000000000000 000000000001111011010111001011 100001110000,236796638) (43,5392,1,108,753126004,0,206,"(7,43)",11,2825,32, 111111111000000000000000000000 000000000000100111010111001011 100001110000,236796644) (44,5280,1,108,753126004,0,208,"(7,44)",11,2825,32, 111111111000000000000000000000 000000000010100111010111001011 100001110000,236796645) (45,5168,1,108,753126005,0,223,"(7,45)",11,2825,32, 111111111000000000000000000000 000000000011010111010111001011 100001110000,236796651) (46,5056,1,108,753126005,0,225,"(7,46)",11,2825,32, 111111111000000000000000000000 000000000000110111010111001011 100001110000,236796652) (47,3376,1,108,2090021898,0,18,"(7,47)",11,2313,32, 111111111000000000000000000000 000000000001000111001101001000 001111111000,532753634) (48,3264,1,108,2090021898,0,20,"(7,48)",11,2313,32, 111111111000000000000000000000 000000000000100111001101001000 001111111000,532753636) (49,3152,1,108,2102749003,0,0,"(7,49)",11,2313,32, 111111111000000000000000000000 000000000001111011101110010000 110100000100,548445662) (50,3040,1,108,2103210571,0,107,"(7,50)",11,2313,32, 111111111000000000000000000000 000000000000010110101100100100 110100000100,548556136) (51,2928,1,108,2103210571,0,109,"(7,51)",11,2313,32, 111111111000000000000000000000 000000000010010110101100100100 110100000100,548556137) (52,0,3,0,,,,,,,,,)(53,0,3,0,,,,,,,,,)(54,2816,1,108,2154640913,0,0,"(7,54)",11,2313,32, 111111111000000000000000000000 000000000010111111010010000000 100010000100,554701565) (55,2704,1,108,2163849781,0,0,"(7,55)",11,2313,32, 111111111000000000000000000000 000000000000110000001011101001 100010000100,555316236) (56,2592,1,108,2295315714,0,0,"(7,56)",11,2313,32, 111111111000000000000000000000 000000000000100101010100001000 010001000100,572590756) (57,2480,1,108,2906288382,0,0,"(7,57)",11,2313,32, 111111111000000000000000000000 000000000000001000101101000100 111001111100,1047670032) (58,2368,1,108,2906329443,0,0,"(7,58)",11,2313,32, 111111111000000000000000000000 000000000001111001101101000100 111001111100,1047670174) (59,60,2,0,,,,,,,,,)(60,2224,1,144,3131766386,0,0,"(7,60)",32779,10507,32, 111111111100000000000000000000 000000000000001101010001010011 111100000010,1090298544) (61,0,0,0,,,,,,,,,)(62,0,0,0,,,,,,,,,)(63,0,0,0,,,,,,,,,)(64,0,0,0,,,,,,,,,)(65,0,0,0,,,,,,,,,)(66,0,0,0,,,,,,,,,)(67,0,0,0,,,,,,,,,)(68,0,0,0,,,,,,,,,)(69,4944,1,108,1034607755,0,1,"(7,69)",11,2825,32, 111111111000000000000000000000 000000000000010111011000010010 010100001000,279217896) (70,4832,1,108,1034607755,0,3,"(7,70)",11,2825,32, 111111111000000000000000000000 000000000010010111011000010010 010100001000,279217897) (70 rows)
Does the fact that a snapshot does not have this issue suggest it could be memory-related corruption and a db restart could clear it up?
Thanks,
Jeremy
Hi, On 2018-03-19 15:37:51 -0500, Jeremy Finzel wrote: > Does the fact that a snapshot does not have this issue suggest it could be > memory-related corruption and a db restart could clear it up? Could you show the page from the snapshot? I suspect it might just be a problem that's temporarily not visible as corrupted. Greetings, Andres Freund
On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > SELECT heap_page_items(get_raw_page('pg_authid', 7)); Can you post this? SELECT * FROM page_header(get_raw_page('pg_authid', 7)); -- Peter Geoghegan
On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> SELECT heap_page_items(get_raw_page('pg_authid', 7));
Can you post this?
SELECT * FROM page_header(get_raw_page('pg_authid', 7));
--
Peter Geoghegan
@Peter :
staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
----------------+----------+-------+-------+-------+---------+----------+---------+-----------
262B4/10FDC478 | 0 | 1 | 304 | 2224 | 8192 | 8192 | 4 | 0
(1 row)
@Andres :
This is from snapshot (on 9.5.12, but we didn't have the error either on a 9.5.11 snap):
heap_page_items
-----------------------------------------------------------------------------------------------------------------------------------------
(1,0,0,0,,,,,,,,,)
(2,0,0,0,,,,,,,,,)
(3,0,0,0,,,,,,,,,)
(4,0,0,0,,,,,,,,,)
(5,0,0,0,,,,,,,,,)
(6,8080,1,108,3137434815,0,0,"(7,6)",11,10505,32,111111111000000000000000000000000000000001111110111100001000010010000001,2166427518)
(7,7936,1,144,3137434816,0,0,"(7,7)",11,10507,32,111111111100000000000000000000000000000001110111000001110101101100001011,3504005358)
(8,0,0,0,,,,,,,,,)
(9,0,0,0,,,,,,,,,)
(10,7792,1,144,3137434817,0,0,"(7,10)",11,10507,32,111111111100000000000000000000000000000001010110001101000011011111101000,401353834)
(11,7680,1,108,3137434818,0,0,"(7,11)",11,10505,32,111111111000000000000000000000000000000001100011010010010001000001100001,2248708806)
(12,0,0,0,,,,,,,,,)
(13,0,0,0,,,,,,,,,)
(14,7568,1,108,3137434819,0,0,"(7,14)",11,10505,32,111111111000000000000000000000000000000001110011110011011011100010100101,2770187214)
(15,7456,1,108,3137434820,0,0,"(7,15)",11,10505,32,111111111000000000000000000000000000000011110001010001010011110010100001,2235343503)
(16,0,0,0,,,,,,,,,)
(17,0,0,0,,,,,,,,,)
(18,0,0,0,,,,,,,,,)
(19,0,0,0,,,,,,,,,)
(20,0,0,0,,,,,,,,,)
(21,0,0,0,,,,,,,,,)
(22,0,0,0,,,,,,,,,)
(23,0,0,0,,,,,,,,,)
(24,0,0,0,,,,,,,,,)
(25,0,0,0,,,,,,,,,)
(26,0,0,0,,,,,,,,,)
(27,0,0,0,,,,,,,,,)
(28,0,0,0,,,,,,,,,)
(29,0,0,0,,,,,,,,,)
(30,0,0,0,,,,,,,,,)
(31,0,0,0,,,,,,,,,)
(32,0,0,0,,,,,,,,,)
(33,0,0,0,,,,,,,,,)
(34,0,0,0,,,,,,,,,)
(35,0,0,0,,,,,,,,,)
(36,0,0,0,,,,,,,,,)
(37,0,0,0,,,,,,,,,)
(38,0,0,0,,,,,,,,,)
(39,0,0,0,,,,,,,,,)
(40,0,0,0,,,,,,,,,)
(41,0,0,0,,,,,,,,,)
(42,0,0,0,,,,,,,,,)
(43,0,0,0,,,,,,,,,)
(44,0,0,0,,,,,,,,,)
(45,0,0,0,,,,,,,,,)
(46,0,0,0,,,,,,,,,)
(47,0,0,0,,,,,,,,,)
(48,0,0,0,,,,,,,,,)
(49,0,0,0,,,,,,,,,)
(50,0,0,0,,,,,,,,,)
(51,0,0,0,,,,,,,,,)
(52,7344,1,108,3137434821,0,0,"(7,52)",11,10505,32,111111111000000000000000000000000000000011011011111110001010010101000001,2191859675)
(53,7232,1,108,3137434822,0,0,"(7,53)",11,10505,32,111111111000000000000000000000000000000001101011010111100110011011100100,661027542)
(54,0,0,0,,,,,,,,,)
(55,0,0,0,,,,,,,,,)
(56,0,0,0,,,,,,,,,)
(57,0,0,0,,,,,,,,,)
(58,0,0,0,,,,,,,,,)
(59,0,0,0,,,,,,,,,)
(60,0,0,0,,,,,,,,,)
(61,7120,1,108,3137434823,0,0,"(7,61)",11,10505,32,111111111000000000000000000000000000000000010111010110000101010111010100,732568296)
(62,6976,1,144,3137434824,0,0,"(7,62)",11,10507,32,111111111100000000000000000000000000000010100100001001001010110000010100,674571301)
(63,6864,1,108,3137434825,0,0,"(7,63)",11,10505,32,111111111000000000000000000000000000000010011011011111011100001001010001,2319695577)
(64,6720,1,144,3137434826,0,0,"(7,64)",11,10507,32,111111111100000000000000000000000000000001000010011001111011100100101000,345892418)
(65,6608,1,108,3137434827,0,0,"(7,65)",11,10505,32,111111111000000000000000000000000000000001000001010100010010101011001010,1398049410)
(66,6496,1,108,3137434828,0,0,"(7,66)",11,10505,32,111111111000000000000000000000000000000000101000111100110101001000101010,1414188820)
(67,6384,1,108,3137434829,0,0,"(7,67)",11,10505,32,111111111000000000000000000000000000000000110101011101111011001110101001,2513301164)
(68,0,0,0,,,,,,,,,)
(69,0,0,0,,,,,,,,,)
(70,0,0,0,,,,,,,,,)
(70 rows)
Thanks,
Jeremy
On Mon, Mar 19, 2018 at 1:55 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > @Peter : > > staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7)); > lsn | checksum | flags | lower | upper | special | pagesize | > version | prune_xid > ----------------+----------+-------+-------+-------+---------+----------+---------+----------- > 262B4/10FDC478 | 0 | 1 | 304 | 2224 | 8192 | 8192 | > 4 | 0 > (1 row) Thanks. That looks normal. I wonder if the contents of that page looks consistent with the rest of the table following manual inspection, though. I recently saw system catalog corruption on a 9.5 instance where an entirely different relation's page ended up in pg_attribute and pg_depend. They were actually pristine index pages from an application index. I still have no idea why this happened. This is very much a guess, but it can't hurt to check if the contents of the tuples themselves are actually sane by inspecting them with "SELECT * FROM pg_authid". heap_page_items() doesn't actually care about the shape of the tuples in the page, so this might have been missed. -- Peter Geoghegan
On Mon, Mar 19, 2018 at 4:12 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Mar 19, 2018 at 1:55 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> @Peter :
>
> staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7));
> lsn | checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> ----------------+----------+-------+-------+-------+-------- -+----------+---------+------- ----
> 262B4/10FDC478 | 0 | 1 | 304 | 2224 | 8192 | 8192 |
> 4 | 0
> (1 row)
Thanks.
That looks normal. I wonder if the contents of that page looks
consistent with the rest of the table following manual inspection,
though. I recently saw system catalog corruption on a 9.5 instance
where an entirely different relation's page ended up in pg_attribute
and pg_depend. They were actually pristine index pages from an
application index. I still have no idea why this happened.
This is very much a guess, but it can't hurt to check if the contents
of the tuples themselves are actually sane by inspecting them with
"SELECT * FROM pg_authid". heap_page_items() doesn't actually care
about the shape of the tuples in the page, so this might have been
missed.
--
Peter Geoghegan
The data all looks fine. I even forced the index scan on both indexes which also looks fine.
Thanks,
Jeremy
On Mon, Mar 19, 2018 at 3:55 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> SELECT heap_page_items(get_raw_page('pg_authid', 7));
Can you post this?
SELECT * FROM page_header(get_raw_page('pg_authid', 7));
--
Peter Geoghegan@Peter :staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid----------------+----------+-------+-------+-------+-------- -+----------+---------+------- ---- 262B4/10FDC478 | 0 | 1 | 304 | 2224 | 8192 | 8192 | 4 | 0(1 row)@Andres :This is from snapshot (on 9.5.12, but we didn't have the error either on a 9.5.11 snap):heap_page_items------------------------------------------------------------ ------------------------------ ------------------------------ ----------------- (1,0,0,0,,,,,,,,,)(2,0,0,0,,,,,,,,,)(3,0,0,0,,,,,,,,,)(4,0,0,0,,,,,,,,,)(5,0,0,0,,,,,,,,,)(6,8080,1,108,3137434815,0,0,"(7,6)",11,10505,32, 111111111000000000000000000000 000000000001111110111100001000 010010000001,2166427518) (7,7936,1,144,3137434816,0,0,"(7,7)",11,10507,32, 111111111100000000000000000000 000000000001110111000001110101 101100001011,3504005358) (8,0,0,0,,,,,,,,,)(9,0,0,0,,,,,,,,,)(10,7792,1,144,3137434817,0,0,"(7,10)",11,10507,32, 111111111100000000000000000000 000000000001010110001101000011 011111101000,401353834) (11,7680,1,108,3137434818,0,0,"(7,11)",11,10505,32, 111111111000000000000000000000 000000000001100011010010010001 000001100001,2248708806) (12,0,0,0,,,,,,,,,)(13,0,0,0,,,,,,,,,)(14,7568,1,108,3137434819,0,0,"(7,14)",11,10505,32, 111111111000000000000000000000 000000000001110011110011011011 100010100101,2770187214) (15,7456,1,108,3137434820,0,0,"(7,15)",11,10505,32, 111111111000000000000000000000 000000000011110001010001010011 110010100001,2235343503) (16,0,0,0,,,,,,,,,)(17,0,0,0,,,,,,,,,)(18,0,0,0,,,,,,,,,)(19,0,0,0,,,,,,,,,)(20,0,0,0,,,,,,,,,)(21,0,0,0,,,,,,,,,)(22,0,0,0,,,,,,,,,)(23,0,0,0,,,,,,,,,)(24,0,0,0,,,,,,,,,)(25,0,0,0,,,,,,,,,)(26,0,0,0,,,,,,,,,)(27,0,0,0,,,,,,,,,)(28,0,0,0,,,,,,,,,)(29,0,0,0,,,,,,,,,)(30,0,0,0,,,,,,,,,)(31,0,0,0,,,,,,,,,)(32,0,0,0,,,,,,,,,)(33,0,0,0,,,,,,,,,)(34,0,0,0,,,,,,,,,)(35,0,0,0,,,,,,,,,)(36,0,0,0,,,,,,,,,)(37,0,0,0,,,,,,,,,)(38,0,0,0,,,,,,,,,)(39,0,0,0,,,,,,,,,)(40,0,0,0,,,,,,,,,)(41,0,0,0,,,,,,,,,)(42,0,0,0,,,,,,,,,)(43,0,0,0,,,,,,,,,)(44,0,0,0,,,,,,,,,)(45,0,0,0,,,,,,,,,)(46,0,0,0,,,,,,,,,)(47,0,0,0,,,,,,,,,)(48,0,0,0,,,,,,,,,)(49,0,0,0,,,,,,,,,)(50,0,0,0,,,,,,,,,)(51,0,0,0,,,,,,,,,)(52,7344,1,108,3137434821,0,0,"(7,52)",11,10505,32, 111111111000000000000000000000 000000000011011011111110001010 010101000001,2191859675) (53,7232,1,108,3137434822,0,0,"(7,53)",11,10505,32, 111111111000000000000000000000 000000000001101011010111100110 011011100100,661027542) (54,0,0,0,,,,,,,,,)(55,0,0,0,,,,,,,,,)(56,0,0,0,,,,,,,,,)(57,0,0,0,,,,,,,,,)(58,0,0,0,,,,,,,,,)(59,0,0,0,,,,,,,,,)(60,0,0,0,,,,,,,,,)(61,7120,1,108,3137434823,0,0,"(7,61)",11,10505,32, 111111111000000000000000000000 000000000000010111010110000101 010111010100,732568296) (62,6976,1,144,3137434824,0,0,"(7,62)",11,10507,32, 111111111100000000000000000000 000000000010100100001001001010 110000010100,674571301) (63,6864,1,108,3137434825,0,0,"(7,63)",11,10505,32, 111111111000000000000000000000 000000000010011011011111011100 001001010001,2319695577) (64,6720,1,144,3137434826,0,0,"(7,64)",11,10507,32, 111111111100000000000000000000 000000000001000010011001111011 100100101000,345892418) (65,6608,1,108,3137434827,0,0,"(7,65)",11,10505,32, 111111111000000000000000000000 000000000001000001010100010010 101011001010,1398049410) (66,6496,1,108,3137434828,0,0,"(7,66)",11,10505,32, 111111111000000000000000000000 000000000000101000111100110101 001000101010,1414188820) (67,6384,1,108,3137434829,0,0,"(7,67)",11,10505,32, 111111111000000000000000000000 000000000000110101011101111011 001110101001,2513301164) (68,0,0,0,,,,,,,,,)(69,0,0,0,,,,,,,,,)(70,0,0,0,,,,,,,,,)(70 rows)Thanks,Jeremy
Any suggestions as to what I can do from this point? Is it feasible that a server restart would fix this, or are you saying it would just mask the problem? I can't reproduce it on a snapshot which is quite odd.
FWIW, maybe this is obvious, but the pages look identical on the streamers we have to the master.
Thanks,
Jeremy
On Tue, Mar 20, 2018 at 11:19 AM, Jeremy Finzel <finzelj@gmail.com> wrote:
On Mon, Mar 19, 2018 at 3:55 PM, Jeremy Finzel <finzelj@gmail.com> wrote:On Mon, Mar 19, 2018 at 3:53 PM, Peter Geoghegan <pg@bowt.ie> wrote:On Mon, Mar 19, 2018 at 1:01 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> SELECT heap_page_items(get_raw_page('pg_authid', 7));
Can you post this?
SELECT * FROM page_header(get_raw_page('pg_authid', 7));
--
Peter Geoghegan@Peter :staging=# SELECT * FROM page_header(get_raw_page('pg_authid', 7)); lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid----------------+----------+-------+-------+-------+-------- -+----------+---------+------- ---- 262B4/10FDC478 | 0 | 1 | 304 | 2224 | 8192 | 8192 | 4 | 0(1 row)@Andres :This is from snapshot (on 9.5.12, but we didn't have the error either on a 9.5.11 snap):heap_page_items------------------------------------------------------------ ------------------------------ ------------------------------ ----------------- (1,0,0,0,,,,,,,,,)(2,0,0,0,,,,,,,,,)(3,0,0,0,,,,,,,,,)(4,0,0,0,,,,,,,,,)(5,0,0,0,,,,,,,,,)(6,8080,1,108,3137434815,0,0,"(7,6)",11,10505,32,1111111110 000000000000000000000000000000 011111101111000010000100100000 01,2166427518) (7,7936,1,144,3137434816,0,0,"(7,7)",11,10507,32,1111111111 000000000000000000000000000000 011101110000011101011011000010 11,3504005358) (8,0,0,0,,,,,,,,,)(9,0,0,0,,,,,,,,,)(10,7792,1,144,3137434817,0,0,"(7,10)",11,10507,32,11111111 110000000000000000000000000000 000101011000110100001101111110 1000,401353834) (11,7680,1,108,3137434818,0,0,"(7,11)",11,10505,32,11111111 100000000000000000000000000000 000110001101001001000100000110 0001,2248708806) (12,0,0,0,,,,,,,,,)(13,0,0,0,,,,,,,,,)(14,7568,1,108,3137434819,0,0,"(7,14)",11,10505,32,11111111 100000000000000000000000000000 000111001111001101101110001010 0101,2770187214) (15,7456,1,108,3137434820,0,0,"(7,15)",11,10505,32,11111111 100000000000000000000000000000 001111000101000101001111001010 0001,2235343503) (16,0,0,0,,,,,,,,,)(17,0,0,0,,,,,,,,,)(18,0,0,0,,,,,,,,,)(19,0,0,0,,,,,,,,,)(20,0,0,0,,,,,,,,,)(21,0,0,0,,,,,,,,,)(22,0,0,0,,,,,,,,,)(23,0,0,0,,,,,,,,,)(24,0,0,0,,,,,,,,,)(25,0,0,0,,,,,,,,,)(26,0,0,0,,,,,,,,,)(27,0,0,0,,,,,,,,,)(28,0,0,0,,,,,,,,,)(29,0,0,0,,,,,,,,,)(30,0,0,0,,,,,,,,,)(31,0,0,0,,,,,,,,,)(32,0,0,0,,,,,,,,,)(33,0,0,0,,,,,,,,,)(34,0,0,0,,,,,,,,,)(35,0,0,0,,,,,,,,,)(36,0,0,0,,,,,,,,,)(37,0,0,0,,,,,,,,,)(38,0,0,0,,,,,,,,,)(39,0,0,0,,,,,,,,,)(40,0,0,0,,,,,,,,,)(41,0,0,0,,,,,,,,,)(42,0,0,0,,,,,,,,,)(43,0,0,0,,,,,,,,,)(44,0,0,0,,,,,,,,,)(45,0,0,0,,,,,,,,,)(46,0,0,0,,,,,,,,,)(47,0,0,0,,,,,,,,,)(48,0,0,0,,,,,,,,,)(49,0,0,0,,,,,,,,,)(50,0,0,0,,,,,,,,,)(51,0,0,0,,,,,,,,,)(52,7344,1,108,3137434821,0,0,"(7,52)",11,10505,32,11111111 100000000000000000000000000000 001101101111111000101001010100 0001,2191859675) (53,7232,1,108,3137434822,0,0,"(7,53)",11,10505,32,11111111 100000000000000000000000000000 000110101101011110011001101110 0100,661027542) (54,0,0,0,,,,,,,,,)(55,0,0,0,,,,,,,,,)(56,0,0,0,,,,,,,,,)(57,0,0,0,,,,,,,,,)(58,0,0,0,,,,,,,,,)(59,0,0,0,,,,,,,,,)(60,0,0,0,,,,,,,,,)(61,7120,1,108,3137434823,0,0,"(7,61)",11,10505,32,11111111 100000000000000000000000000000 000001011101011000010101011101 0100,732568296) (62,6976,1,144,3137434824,0,0,"(7,62)",11,10507,32,11111111 110000000000000000000000000000 001010010000100100101011000001 0100,674571301) (63,6864,1,108,3137434825,0,0,"(7,63)",11,10505,32,11111111 100000000000000000000000000000 001001101101111101110000100101 0001,2319695577) (64,6720,1,144,3137434826,0,0,"(7,64)",11,10507,32,11111111 110000000000000000000000000000 000100001001100111101110010010 1000,345892418) (65,6608,1,108,3137434827,0,0,"(7,65)",11,10505,32,11111111 100000000000000000000000000000 000100000101010001001010101100 1010,1398049410) (66,6496,1,108,3137434828,0,0,"(7,66)",11,10505,32,11111111 100000000000000000000000000000 000010100011110011010100100010 1010,1414188820) (67,6384,1,108,3137434829,0,0,"(7,67)",11,10505,32,11111111 100000000000000000000000000000 000011010101110111101100111010 1001,2513301164) (68,0,0,0,,,,,,,,,)(69,0,0,0,,,,,,,,,)(70,0,0,0,,,,,,,,,)(70 rows)Thanks,JeremyAny suggestions as to what I can do from this point? Is it feasible that a server restart would fix this, or are you saying it would just mask the problem? I can't reproduce it on a snapshot which is quite odd.FWIW, maybe this is obvious, but the pages look identical on the streamers we have to the master.Thanks,Jeremy
A server restart and upgrade to 9.5.12 (at the same time), as expected, made the issue go away. Still doesn't give us any answers as to what happened or if it would happen again! Thanks for the feeback.
Jeremy
On Wed, Mar 21, 2018 at 1:38 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > A server restart and upgrade to 9.5.12 (at the same time), as expected, made > the issue go away. Still doesn't give us any answers as to what happened or > if it would happen again! Thanks for the feeback. You may still want to use amcheck to look for problems. The version on Github works with 9.5, and there are Redhat and Debian pgdg packages. See: https://github.com/petergeoghegan/amcheck The "heapallindexed" option will be of particular interest to you - that option verifies that the table has matching rows for a target index (in addition to testing the structure of a target B-Tree index itself). This is probably the best general test for corruption that is available. There is a fair chance that this will reveal new information. -- Peter Geoghegan
On Wed, Mar 21, 2018 at 4:29 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Mar 21, 2018 at 1:38 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> A server restart and upgrade to 9.5.12 (at the same time), as expected, made
> the issue go away. Still doesn't give us any answers as to what happened or
> if it would happen again! Thanks for the feeback.
You may still want to use amcheck to look for problems. The version on
Github works with 9.5, and there are Redhat and Debian pgdg packages.
See:
https://github.com/petergeoghegan/amcheck
The "heapallindexed" option will be of particular interest to you -
that option verifies that the table has matching rows for a target
index (in addition to testing the structure of a target B-Tree index
itself). This is probably the best general test for corruption that is
available. There is a fair chance that this will reveal new
information.
--
Peter Geoghegan
Thank you for the recommendation. I ran both amcheck functions on all 4 indexes of those 2 tables with heapallindexed = true, but no issues were found.
Thanks,
Jeremy
I admit I'm pretty surprised by this whole episode. I have no useful advice to offer here. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 22, 2018 at 12:27 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > Thank you for the recommendation. I ran both amcheck functions on all 4 > indexes of those 2 tables with heapallindexed = true, but no issues were > found. Probably wouldn't hurt to run it against all indexes, if you can make time for that. If you can generalize from the example query that calls the bt_index_check() function, but set "heapallindexed=>i.indisprimary" and remove "n.nspname = 'pg_catalog'", as well as "LIMIT 10". This will test tables and indexes from all schemas, which might be interesting. -- Peter Geoghegan
On Thu, Mar 22, 2018 at 3:20 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Mar 22, 2018 at 12:27 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> Thank you for the recommendation. I ran both amcheck functions on all 4
> indexes of those 2 tables with heapallindexed = true, but no issues were
> found.
Probably wouldn't hurt to run it against all indexes, if you can make
time for that. If you can generalize from the example query that calls
the bt_index_check() function, but set
"heapallindexed=>i.indisprimary" and remove "n.nspname =
'pg_catalog'", as well as "LIMIT 10". This will test tables and
indexes from all schemas, which might be interesting.
--
Peter Geoghegan
I am running this on a san snapshot of our production system. I assume that this will give me a valid check for file-system-level corruption. I am going to kick it off and see if I find anything interesting.
Thanks,
Jeremy
On Thu, Mar 22, 2018 at 2:24 PM, Jeremy Finzel <finzelj@gmail.com> wrote: > I am running this on a san snapshot of our production system. I assume that > this will give me a valid check for file-system-level corruption. I am > going to kick it off and see if I find anything interesting. It might. Note that SAN snapshots might have corruption hazards, though. Postgres expects crash consistency across all filesystems, so you might run into trouble if you had a separate filesystem for WAL, for example. I know that LVM snapshots only provide a consistent view of a single logical volume, even though many LVM + Postgres setups will involve multiple logical volumes. This makes it possible for a small inconsistency across logical volumes to corrupt data. I don't know anything about your SAN snapshotting, but this is at least something to consider. -- Peter Geoghegan
Hi everyone,
I just got the same issue on 9.6.8:
2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] ERROR: found xmin 2808837517 from before relfrozenxid 248712603
2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] CONTEXT: automatic vacuum of table "template0.pg_catalog.pg_authid"
2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] ERROR: found xmin 2808837517 from before relfrozenxid 248712603
2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] CONTEXT: automatic vacuum of table "template0.pg_catalog.pg_authid"
Additional details:
1) bt_index_check - no errors on both indexes.
1) bt_index_check - no errors on both indexes.
2) SELECT pg_truncate_visibility_map('pg_authid'::regclass); + vacuum freeze - doesn't help.
3)Total 6 rows affected:
template1=# select oid, ctid, xmin, xmax, cmin, cmax from pg_authid where xmin::text::bigint > (select relfrozenxid::text::bigint from pg_class where relname='pg_authid');
oid | ctid | xmin | xmax | cmin | cmax
-----------+--------+------------+------+------+------
183671986 | (0,90) | 3039161773 | 0 | 0 | 0
183106699 | (1,48) | 2576823237 | 0 | 0 | 0
183921770 | (1,50) | 3265971811 | 0 | 0 | 0
183921914 | (1,52) | 3266122344 | 0 | 0 | 0
187988173 | (1,58) | 4258893789 | 0 | 0 | 0
182424977 | (1,62) | 2808837517 | 0 | 0 | 0
template1=# select oid, ctid, xmin, xmax, cmin, cmax from pg_authid where xmin::text::bigint > (select relfrozenxid::text::bigint from pg_class where relname='pg_authid');
oid | ctid | xmin | xmax | cmin | cmax
-----------+--------+------------+------+------+------
183671986 | (0,90) | 3039161773 | 0 | 0 | 0
183106699 | (1,48) | 2576823237 | 0 | 0 | 0
183921770 | (1,50) | 3265971811 | 0 | 0 | 0
183921914 | (1,52) | 3266122344 | 0 | 0 | 0
187988173 | (1,58) | 4258893789 | 0 | 0 | 0
182424977 | (1,62) | 2808837517 | 0 | 0 | 0
on total two pages.
4) template1=# select relfrozenxid from pg_catalog.pg_class where relname = 'pg_authid';
relfrozenxid
--------------
2548304492
4) template1=# select relfrozenxid from pg_catalog.pg_class where relname = 'pg_authid';
relfrozenxid
--------------
2548304492
5)Rows itself looks pretty valid and correspond to the actual database users.
7)No database/server crash happened last few years, no disk errors/problems.
I feel it could be related with vacuum skip locked pages patch + freeze + shared catalog combination, but cannot prove it yet.
Looking for possible course of action.
Probably simplest fix - drop and recreate these 6 affected users, but so far I willing spent some time research into this issue.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
On 2018-05-15 11:06:38 +0200, Maxim Boguk wrote: > Hi everyone, > > I just got the same issue on 9.6.8: > > 2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] ERROR: > found xmin 2808837517 from before relfrozenxid 248712603 > 2018-05-15 11:52:01 MSK 33558 @ from [vxid:317/92895305 txid:0] [] > CONTEXT: automatic vacuum of table "template0.pg_catalog.pg_authid" > > Additional details: > > 1) bt_index_check - no errors on both indexes. > > 2) SELECT pg_truncate_visibility_map('pg_authid'::regclass); + vacuum > freeze - doesn't help. Yea, too late. > 3)Total 6 rows affected: > template1=# select oid, ctid, xmin, xmax, cmin, cmax from pg_authid where > xmin::text::bigint > (select relfrozenxid::text::bigint from pg_class where > relname='pg_authid'); > oid | ctid | xmin | xmax | cmin | cmax > -----------+--------+------------+------+------+------ > 183671986 | (0,90) | 3039161773 | 0 | 0 | 0 > 183106699 | (1,48) | 2576823237 | 0 | 0 | 0 > 183921770 | (1,50) | 3265971811 | 0 | 0 | 0 > 183921914 | (1,52) | 3266122344 | 0 | 0 | 0 > 187988173 | (1,58) | 4258893789 | 0 | 0 | 0 > 182424977 | (1,62) | 2808837517 | 0 | 0 | 0 > on total two pages. > > 4) template1=# select relfrozenxid from pg_catalog.pg_class where relname = > 'pg_authid'; > relfrozenxid > -------------- > 2548304492 > > > 5)Rows itself looks pretty valid and correspond to the actual database > users. > 7)No database/server crash happened last few years, no disk errors/problems. > > I feel it could be related with vacuum skip locked pages patch + freeze + > shared catalog combination, but cannot prove it yet. > > Looking for possible course of action. > Probably simplest fix - drop and recreate these 6 affected users, but so > far I willing spent some time research into this issue. Could you use pageinspect to get the infomasks for the affected tuples? Greetings, Andres Freund
Hi Andres,
Could you use pageinspect to get the infomasks for the affected tuples?>
> Looking for possible course of action.
> Probably simplest fix - drop and recreate these 6 affected users, but so
> far I willing spent some time research into this issue.
Greetings,
Andres Freund
For sample:
postgres=# vacuum pg_catalog.pg_authid;
ERROR: found xmin 2894889518 from before relfrozenxid 248712603
ERROR: found xmin 2894889518 from before relfrozenxid 248712603
select ctid, xmin, xmax, cmin, cmax from pg_catalog.pg_authid where xmin::text::bigint=2894889518;
ctid | xmin | xmax | cmin | cmax
--------+------------+------+------+------
(1,26) | 2894889518 | 0 | 0 | 0
ctid | xmin | xmax | cmin | cmax
--------+------------+------+------+------
(1,26) | 2894889518 | 0 | 0 | 0
postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_authid', 1)) where t_ctid::text='(1,26)';
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lp | 26
lp_off | 4656
lp_flags | 1
lp_len | 144
t_xmin | 2894889518
t_xmax | 0
t_field3 | 0
t_ctid | (1,26)
t_infomask2 | 32779
t_infomask | 10507
t_hoff | 32
t_bits | 1111111111000000
t_oid | 189787727
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lp | 26
lp_off | 4656
lp_flags | 1
lp_len | 144
t_xmin | 2894889518
t_xmax | 0
t_field3 | 0
t_ctid | (1,26)
t_infomask2 | 32779
t_infomask | 10507
t_hoff | 32
t_bits | 1111111111000000
t_oid | 189787727
Any new role created in DB instantly affected by this issue.
In the same time:
select relfrozenxid from pg_class where relname='pg_authid';
relfrozenxid
--------------
2863429136
relfrozenxid
--------------
2863429136
So it's interesting where value of " from before relfrozenxid 248712603" come from.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Hi, On 2018-05-22 21:30:43 +0300, Maxim Boguk wrote: > For sample: > > postgres=# vacuum pg_catalog.pg_authid; > ERROR: found xmin 2894889518 from before relfrozenxid 248712603 > > select ctid, xmin, xmax, cmin, cmax from pg_catalog.pg_authid where > xmin::text::bigint=2894889518; > ctid | xmin | xmax | cmin | cmax > --------+------------+------+------+------ > (1,26) | 2894889518 | 0 | 0 | 0 > > > postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_authid', 1)) > where t_ctid::text='(1,26)'; > -[ RECORD 1 > ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > lp | 26 > lp_off | 4656 > lp_flags | 1 > lp_len | 144 > t_xmin | 2894889518 > t_xmax | 0 > t_field3 | 0 > t_ctid | (1,26) > t_infomask2 | 32779 > t_infomask | 10507 > t_hoff | 32 > t_bits | 1111111111000000 > t_oid | 189787727 > So this row has, if I didn't screw up decoding the following infomask bits set:: HEAP_HASNULL HEAP_HASVARWIDTH HEAP_HASOID HEAP_XMIN_COMMITTED HEAP_XMAX_INVALID HEAP_UPDATED So it's not been previously frozen, which I was wondering about. > Which indeed makes it > Any new role created in DB instantly affected by this issue. What's txid_current()? > > In the same time: > > select relfrozenxid from pg_class where relname='pg_authid'; > relfrozenxid > -------------- > 2863429136 > > So it's interesting where value of " from before relfrozenxid 248712603" > come from. Hm. That's indeed odd. Could you get a backtrace of the error with "bt full" of the error? Greetings, Andres Freund
On Tue, May 22, 2018 at 9:47 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,So this row has, if I didn't screw up decoding the following infomask
On 2018-05-22 21:30:43 +0300, Maxim Boguk wrote:
> For sample:
>
> postgres=# vacuum pg_catalog.pg_authid;
> ERROR: found xmin 2894889518 from before relfrozenxid 248712603
>
> select ctid, xmin, xmax, cmin, cmax from pg_catalog.pg_authid where
> xmin::text::bigint=2894889518;
> ctid | xmin | xmax | cmin | cmax
> --------+------------+------+------+------
> (1,26) | 2894889518 | 0 | 0 | 0
>
>
> postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_authid', 1))
> where t_ctid::text='(1,26)';
> -[ RECORD 1
> ]----------------------------------------------------------- ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------
> lp | 26
> lp_off | 4656
> lp_flags | 1
> lp_len | 144
> t_xmin | 2894889518
> t_xmax | 0
> t_field3 | 0
> t_ctid | (1,26)
> t_infomask2 | 32779
> t_infomask | 10507
> t_hoff | 32
> t_bits | 1111111111000000
> t_oid | 189787727
>
bits set::
HEAP_HASNULL
HEAP_HASVARWIDTH
HEAP_HASOID
HEAP_XMIN_COMMITTED
HEAP_XMAX_INVALID
HEAP_UPDATED
So it's not been previously frozen, which I was wondering about.
> Which indeed makes it
> Any new role created in DB instantly affected by this issue.
What's txid_current()?
>
> In the same time:
>
> select relfrozenxid from pg_class where relname='pg_authid';
> relfrozenxid
> --------------
> 2863429136
>
> So it's interesting where value of " from before relfrozenxid 248712603"
> come from.
Hm. That's indeed odd. Could you get a backtrace of the error with "bt
full" of the error?
Greetings,
Andres Freund
select txid_current();
txid_current
--------------
41995913769
txid_current
--------------
41995913769
About gdb bt - it's tricky because it is mission critical master db of huge project.
I'll will try promote backup replica and check is issue persist there and if yes - we will have our playground for a while, but it will require sometime to arrange.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Hi, On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote: > On Tue, May 22, 2018 at 9:47 PM, Andres Freund <andres@anarazel.de> wrote: > > > select relfrozenxid from pg_class where relname='pg_authid'; > > > relfrozenxid > > > -------------- > > > 2863429136 > select txid_current(); > txid_current > -------------- > 41995913769 So that's an xid of 3341208114, if you leave the epoch out. What's autovacuum_freeze_max_age set to in that cluster? Can you show pg_controldata output, and relminmxid from that cluster? I might be daft here, but it's surely curious that the relfrozenxid from the error and pg_catalog are really different (number of digits): catalog: 2863429136 error: 248712603 > About gdb bt - it's tricky because it is mission critical master db of > huge project. > I'll will try promote backup replica and check is issue persist there and > if yes - we will have our playground for a while, but it will require > sometime to arrange. You should be ok to just bt that in the running cluster, but I definitely understand if you don't want to do that... I'd appreciate if you set up the a playground, because this seems like something that'll reappear. Greetings, Andres Freund
On Tue, May 22, 2018 at 10:30 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
> On Tue, May 22, 2018 at 9:47 PM, Andres Freund <andres@anarazel.de> wrote:
> > > select relfrozenxid from pg_class where relname='pg_authid';
> > > relfrozenxid
> > > --------------
> > > 2863429136
> select txid_current();
> txid_current
> --------------
> 41995913769
So that's an xid of 3341208114, if you leave the epoch out. What'sautovacuum_freeze_max_age set to in that cluster?
postgres=# show autovacuum_freeze_max_age;
autovacuum_freeze_max_age
---------------------------
200000000
autovacuum_freeze_max_age
---------------------------
200000000
(default value I think)
Can you show pg_controldata output, andrelminmxid from that cluster?
postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata -D /var/lib/postgresql/9.6/main
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6469368654711450114
Database cluster state: in production
pg_control last modified: Tue 22 May 2018 10:20:14 PM MSK
Latest checkpoint location: CCB5/F9C37950
Prior checkpoint location: CCB0/43F316B0
Latest checkpoint's REDO location: CCB1/6706BD88
Latest checkpoint's REDO WAL file: 000000010000CCB100000067
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 9:3341161759
Latest checkpoint's NextOID: 190071899
Latest checkpoint's NextMultiXactId: 59416233
Latest checkpoint's NextMultiOffset: 215588532
Latest checkpoint's oldestXID: 2814548646
Latest checkpoint's oldestXID's DB: 16400
Latest checkpoint's oldestActiveXID: 3341161759
Latest checkpoint's oldestMultiXid: 54264778
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:2814548646
Latest checkpoint's newestCommitTsXid:3341161758
Time of latest checkpoint: Tue 22 May 2018 10:05:16 PM MSK
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 2000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: on
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6469368654711450114
Database cluster state: in production
pg_control last modified: Tue 22 May 2018 10:20:14 PM MSK
Latest checkpoint location: CCB5/F9C37950
Prior checkpoint location: CCB0/43F316B0
Latest checkpoint's REDO location: CCB1/6706BD88
Latest checkpoint's REDO WAL file: 000000010000CCB100000067
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 9:3341161759
Latest checkpoint's NextOID: 190071899
Latest checkpoint's NextMultiXactId: 59416233
Latest checkpoint's NextMultiOffset: 215588532
Latest checkpoint's oldestXID: 2814548646
Latest checkpoint's oldestXID's DB: 16400
Latest checkpoint's oldestActiveXID: 3341161759
Latest checkpoint's oldestMultiXid: 54264778
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:2814548646
Latest checkpoint's newestCommitTsXid:3341161758
Time of latest checkpoint: Tue 22 May 2018 10:05:16 PM MSK
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 2000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: on
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
postgres=# select datname,datfrozenxid,datminmxid from pg_database order by datname;
datname | datfrozenxid | datminmxid
-----------+--------------+------------
** | 2815939794 | 54265194
postgres | 2863429136 | 54280819
template0 | 3148297669 | 59342106
template1 | 2816765546 | 59261794
datname | datfrozenxid | datminmxid
-----------+--------------+------------
** | 2815939794 | 54265194
postgres | 2863429136 | 54280819
template0 | 3148297669 | 59342106
template1 | 2816765546 | 59261794
I might be daft here, but it's surely curious that the relfrozenxid from
the error and pg_catalog are really different (number of digits):
catalog: 2863429136
error: 248712603
> About gdb bt - it's tricky because it is mission critical master db of
> huge project.
> I'll will try promote backup replica and check is issue persist there and
> if yes - we will have our playground for a while, but it will require
> sometime to arrange.
You should be ok to just bt that in the running cluster, but I
definitely understand if you don't want to do that... I'd appreciate if
you set up the a playground, because this seems like something that'll
reappear.
My gdb kunf-fu exceptionally rusty now, so I better play in sandbox.
And if error did't exist on fresh promoted replica it will give us useful info as well.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
On Tue, May 22, 2018 at 2:42 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Tue, May 22, 2018 at 10:30 PM, Andres Freund <andres@anarazel.de> wrote:Hi,
On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
> On Tue, May 22, 2018 at 9:47 PM, Andres Freund <andres@anarazel.de> wrote:
> > > select relfrozenxid from pg_class where relname='pg_authid';
> > > relfrozenxid
> > > --------------
> > > 2863429136
> select txid_current();
> txid_current
> --------------
> 41995913769
So that's an xid of 3341208114, if you leave the epoch out. What'sautovacuum_freeze_max_age set to in that cluster?postgres=# show autovacuum_freeze_max_age;
autovacuum_freeze_max_age
---------------------------
200000000(default value I think)Can you show pg_controldata output, andrelminmxid from that cluster?postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata -D /var/lib/postgresql/9.6/main
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6469368654711450114
Database cluster state: in production
pg_control last modified: Tue 22 May 2018 10:20:14 PM MSK
Latest checkpoint location: CCB5/F9C37950
Prior checkpoint location: CCB0/43F316B0
Latest checkpoint's REDO location: CCB1/6706BD88
Latest checkpoint's REDO WAL file: 000000010000CCB100000067
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 9:3341161759
Latest checkpoint's NextOID: 190071899
Latest checkpoint's NextMultiXactId: 59416233
Latest checkpoint's NextMultiOffset: 215588532
Latest checkpoint's oldestXID: 2814548646
Latest checkpoint's oldestXID's DB: 16400
Latest checkpoint's oldestActiveXID: 3341161759
Latest checkpoint's oldestMultiXid: 54264778
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid:2814548646
Latest checkpoint's newestCommitTsXid:3341161758
Time of latest checkpoint: Tue 22 May 2018 10:05:16 PM MSK
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 2000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: on
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
postgres=# select datname,datfrozenxid,datminmxid from pg_database order by datname;
datname | datfrozenxid | datminmxid
-----------+--------------+------------
** | 2815939794 | 54265194
postgres | 2863429136 | 54280819
template0 | 3148297669 | 59342106
template1 | 2816765546 | 59261794
I might be daft here, but it's surely curious that the relfrozenxid from
the error and pg_catalog are really different (number of digits):
catalog: 2863429136
error: 248712603
> About gdb bt - it's tricky because it is mission critical master db of
> huge project.
> I'll will try promote backup replica and check is issue persist there and
> if yes - we will have our playground for a while, but it will require
> sometime to arrange.
You should be ok to just bt that in the running cluster, but I
definitely understand if you don't want to do that... I'd appreciate if
you set up the a playground, because this seems like something that'll
reappear.My gdb kunf-fu exceptionally rusty now, so I better play in sandbox.And if error did't exist on fresh promoted replica it will give us useful info as well.
--Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
All -
We just noticed that this exact same problem has now returned for us. It has exactly the same symptoms as my original post (including the fact that a snapshot does not have the issue).
I am open to all suggestions as to troubleshooting this further!
Thank you,
Jeremy
> About gdb bt - it's tricky because it is mission critical master db of
> huge project.
> I'll will try promote backup replica and check is issue persist there and
> if yes - we will have our playground for a while, but it will require
> sometime to arrange.
You should be ok to just bt that in the running cluster, but I
definitely understand if you don't want to do that... I'd appreciate if
you set up the a playground, because this seems like something that'll
reappear.
Greetings,
Andres Freund
Ok this issue reproducible on promoted replica.
So now I have my playground.
Will provide gdb info in few hours.
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
On Thu, May 24, 2018 at 12:38 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> About gdb bt - it's tricky because it is mission critical master db of
> huge project.
> I'll will try promote backup replica and check is issue persist there and
> if yes - we will have our playground for a while, but it will require
> sometime to arrange.
You should be ok to just bt that in the running cluster, but I
definitely understand if you don't want to do that... I'd appreciate if
you set up the a playground, because this seems like something that'll
reappear.
Greetings,
Andres Freundbt full :
#0 errmsg_internal (fmt=0x555b62e6eb70 "found xmin %u from before relfrozenxid %u") at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/utils/error/elog.c:828
edata = <optimized out>
__func__ = "errmsg_internal"
#1 0x0000555b62ad1cb9 in heap_prepare_freeze_tuple (tuple=<optimized out>, relfrozenxid=relfrozenxid@entry=248720453, relminmxid=relminmxid@entry=53644256, cutoff_xid=3485221679,
cutoff_multi=<optimized out>, frz=frz@entry=0x555b640d1988, totally_frozen_p=0x7ffca32c0e90 "\001")
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/access/heap/heapam.c:6685
changed = <optimized out>
freeze_xmax = 0 '\000'
xid = <optimized out>
totally_frozen = <optimized out>
__func__ = "heap_prepare_freeze_tuple"
#2 0x0000555b62bfd2b5 in lazy_scan_heap (aggressive=0 '\000', nindexes=2, Irel=0x555b64095948, vacrelstats=<optimized out>, options=26, onerel=0x555b64029498)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:1090
tuple_totally_frozen = 1 '\001'
itemid = 0x7f2b7bc5d4fc
buf = 18138842
page = <optimized out>
offnum = 26
maxoff = 26
hastup = 1 '\001'
nfrozen = 0
freespace = <optimized out>
all_frozen = 1 '\001'
tupgone = 0 '\000'
prev_dead_count = 0
all_visible_according_to_vm = 0 '\000'
all_visible = 1 '\001'
has_dead_tuples = 0 '\000'
visibility_cutoff_xid = 3490221678
relname = 0x555b640296a8 "pg_authid"
relminmxid = 53644256
tups_vacuumed = 0
indstats = 0x555b64095928
relfrozenxid = 248720453
skipping_blocks = 0 '\000'
frozen = 0x555b640d1988
initprog_val = {1, 2, 582}
vacuumed_pages = 0
num_tuples = 80
nkeep = 0
ru0 = {tv = {tv_sec = 1527175061, tv_usec = 739743}, ru = {ru_utime = {tv_sec = 0, tv_usec = 60000}, ru_stime = {tv_sec = 1, tv_usec = 416000}, {ru_maxrss = 9704,
__ru_maxrss_word = 9704}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 33982,
__ru_minflt_word = 33982}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 8,
__ru_oublock_word = 8}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 38446,
__ru_nvcsw_word = 38446}, {ru_nivcsw = 2, __ru_nivcsw_word = 2}}}
vmbuffer = 8763411
empty_pages = 0
nunused = 0
i = <optimized out>
next_unskippable_block = <optimized out>
buf = {data = 0x1 <error: Cannot access memory at address 0x1>, len = -1557393520, maxlen = 32764, cursor = -1557393616}
nblocks = 2
blkno = <optimized out>
tuple = {t_len = 144, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 26}, t_tableOid = 1260, t_data = 0x7f2b7bc5e6b0}
initprog_index = {0, 1, 5}
#3 lazy_vacuum_rel (onerel=onerel@entry=0x555b64029498, options=options@entry=1, params=params@entry=0x7ffca32c11b0, bstrategy=<optimized out>)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:253
vacrelstats = <optimized out>
Irel = 0x555b64095948
nindexes = 2
ru0 = {tv = {tv_sec = 93850993708032, tv_usec = 23936}, ru = {ru_utime = {tv_sec = 128, tv_usec = 93851007694584}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 140723046059424,
__ru_maxrss_word = 140723046059424}, {ru_ixrss = 93850989549501, __ru_ixrss_word = 93850989549501}, {ru_idrss = 16, __ru_idrss_word = 16}, {ru_isrss = 93851007694584,
__ru_isrss_word = 93851007694584}, {ru_minflt = 140723046059472, __ru_minflt_word = 140723046059472}, {ru_majflt = 93850989559601, __ru_majflt_word = 93850989559601}, {
ru_nswap = 93850994010032, __ru_nswap_word = 93850994010032}, {ru_inblock = 139833259949608, __ru_inblock_word = 139833259949608}, {ru_oublock = 93850994010032,
__ru_oublock_word = 93850994010032}, {ru_msgsnd = 93851008239032, __ru_msgsnd_word = 93851008239032}, {ru_msgrcv = 140723046059904, __ru_msgrcv_word = 140723046059904}, {
ru_nsignals = 93850989313526, __ru_nsignals_word = 93850989313526}, {ru_nvcsw = 345176855416, __ru_nvcsw_word = 345176855416}, {ru_nivcsw = 140723046059551,
__ru_nivcsw_word = 140723046059551}}}
starttime = 0
secs = 1527175061
usecs = 144
read_rate = <optimized out>
write_rate = <optimized out>
aggressive = 0 '\000'
scanned_all_unfrozen = <optimized out>
xidFullScanLimit = 3440221679
mxactFullScanLimit = 4204418425
new_rel_pages = <optimized out>
new_rel_tuples = <optimized out>
new_rel_allvisible = 2737573328
new_live_tuples = <optimized out>
new_frozen_xid = <optimized out>
new_min_multi = <optimized out>
__func__ = "lazy_vacuum_rel"
#4 0x0000555b62bfa54d in vacuum_rel (relid=relid@entry=1260, relation=relation@entry=0x555b64047100, options=options@entry=1, params=params@entry=0x7ffca32c11b0)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:1391
lmode = 4
onerel = 0x555b64029498
onerelid = {relId = 1260, dbId = 0}
toast_relid = 0
save_userid = 10
save_sec_context = 0
save_nestlevel = 2
__func__ = "vacuum_rel"
#5 0x0000555b62bfb545 in vacuum (options=1, relation=0x555b64047100, relid=relid@entry=0, params=params@entry=0x7ffca32c11b0, va_cols=0x0, bstrategy=<optimized out>, bstrategy@entry=0x0,
isTopLevel=1 '\001') at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:303
relid = 1260
cur = 0x555b640998b8
save_exception_stack = 0x7ffca32c1590
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {93851008735448, -4513934124599105488, 93850994009984, 93851008619616, 93851008619616, 93851008398480, -4513934124655728592, -7776366411743385552},
__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 0, 0, 1, 1, 1, 3473119737632784380, 139833388464563, 140723046060384, 12, 140723046060912, 93850990837571,
4049721212982268568, 1, 140723046060432}}}}
stmttype = <optimized out>
in_outer_xact = 0 '\000'
use_own_xacts = 1 '\001'
relations = 0x555b640998d8
in_vacuum = 1 '\001'
__func__ = "vacuum"
#6 0x0000555b62bfb8ca in ExecVacuum (vacstmt=vacstmt@entry=0x555b64047150, isTopLevel=isTopLevel@entry=1 '\001')
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:121
params = {freeze_min_age = -1, freeze_table_age = -1, multixact_freeze_min_age = -1, multixact_freeze_table_age = -1, is_wraparound = 0 '\000', log_min_duration = -1}
#7 0x0000555b62d27787 in standard_ProcessUtility (parsetree=0x555b64047150, queryString=0x555b64046688 "vacuum pg_catalog.pg_authid;", context=<optimized out>, params=0x0,
dest=0x555b64047490, completionTag=0x7ffca32c19c0 "") at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/tcop/utility.c:655
stmt = 0x555b64047150
isTopLevel = <optimized out>
__func__ = "standard_ProcessUtility"
edata = <optimized out>
__func__ = "errmsg_internal"
#1 0x0000555b62ad1cb9 in heap_prepare_freeze_tuple (tuple=<optimized out>, relfrozenxid=relfrozenxid@entry=248720453, relminmxid=relminmxid@entry=53644256, cutoff_xid=3485221679,
cutoff_multi=<optimized out>, frz=frz@entry=0x555b640d1988, totally_frozen_p=0x7ffca32c0e90 "\001")
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/access/heap/heapam.c:6685
changed = <optimized out>
freeze_xmax = 0 '\000'
xid = <optimized out>
totally_frozen = <optimized out>
__func__ = "heap_prepare_freeze_tuple"
#2 0x0000555b62bfd2b5 in lazy_scan_heap (aggressive=0 '\000', nindexes=2, Irel=0x555b64095948, vacrelstats=<optimized out>, options=26, onerel=0x555b64029498)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:1090
tuple_totally_frozen = 1 '\001'
itemid = 0x7f2b7bc5d4fc
buf = 18138842
page = <optimized out>
offnum = 26
maxoff = 26
hastup = 1 '\001'
nfrozen = 0
freespace = <optimized out>
all_frozen = 1 '\001'
tupgone = 0 '\000'
prev_dead_count = 0
all_visible_according_to_vm = 0 '\000'
all_visible = 1 '\001'
has_dead_tuples = 0 '\000'
visibility_cutoff_xid = 3490221678
relname = 0x555b640296a8 "pg_authid"
relminmxid = 53644256
tups_vacuumed = 0
indstats = 0x555b64095928
relfrozenxid = 248720453
skipping_blocks = 0 '\000'
frozen = 0x555b640d1988
initprog_val = {1, 2, 582}
vacuumed_pages = 0
num_tuples = 80
nkeep = 0
ru0 = {tv = {tv_sec = 1527175061, tv_usec = 739743}, ru = {ru_utime = {tv_sec = 0, tv_usec = 60000}, ru_stime = {tv_sec = 1, tv_usec = 416000}, {ru_maxrss = 9704,
__ru_maxrss_word = 9704}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 33982,
__ru_minflt_word = 33982}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 8,
__ru_oublock_word = 8}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 38446,
__ru_nvcsw_word = 38446}, {ru_nivcsw = 2, __ru_nivcsw_word = 2}}}
vmbuffer = 8763411
empty_pages = 0
nunused = 0
i = <optimized out>
next_unskippable_block = <optimized out>
buf = {data = 0x1 <error: Cannot access memory at address 0x1>, len = -1557393520, maxlen = 32764, cursor = -1557393616}
nblocks = 2
blkno = <optimized out>
tuple = {t_len = 144, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 26}, t_tableOid = 1260, t_data = 0x7f2b7bc5e6b0}
initprog_index = {0, 1, 5}
#3 lazy_vacuum_rel (onerel=onerel@entry=0x555b64029498, options=options@entry=1, params=params@entry=0x7ffca32c11b0, bstrategy=<optimized out>)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:253
vacrelstats = <optimized out>
Irel = 0x555b64095948
nindexes = 2
ru0 = {tv = {tv_sec = 93850993708032, tv_usec = 23936}, ru = {ru_utime = {tv_sec = 128, tv_usec = 93851007694584}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 140723046059424,
__ru_maxrss_word = 140723046059424}, {ru_ixrss = 93850989549501, __ru_ixrss_word = 93850989549501}, {ru_idrss = 16, __ru_idrss_word = 16}, {ru_isrss = 93851007694584,
__ru_isrss_word = 93851007694584}, {ru_minflt = 140723046059472, __ru_minflt_word = 140723046059472}, {ru_majflt = 93850989559601, __ru_majflt_word = 93850989559601}, {
ru_nswap = 93850994010032, __ru_nswap_word = 93850994010032}, {ru_inblock = 139833259949608, __ru_inblock_word = 139833259949608}, {ru_oublock = 93850994010032,
__ru_oublock_word = 93850994010032}, {ru_msgsnd = 93851008239032, __ru_msgsnd_word = 93851008239032}, {ru_msgrcv = 140723046059904, __ru_msgrcv_word = 140723046059904}, {
ru_nsignals = 93850989313526, __ru_nsignals_word = 93850989313526}, {ru_nvcsw = 345176855416, __ru_nvcsw_word = 345176855416}, {ru_nivcsw = 140723046059551,
__ru_nivcsw_word = 140723046059551}}}
starttime = 0
secs = 1527175061
usecs = 144
read_rate = <optimized out>
write_rate = <optimized out>
aggressive = 0 '\000'
scanned_all_unfrozen = <optimized out>
xidFullScanLimit = 3440221679
mxactFullScanLimit = 4204418425
new_rel_pages = <optimized out>
new_rel_tuples = <optimized out>
new_rel_allvisible = 2737573328
new_live_tuples = <optimized out>
new_frozen_xid = <optimized out>
new_min_multi = <optimized out>
__func__ = "lazy_vacuum_rel"
#4 0x0000555b62bfa54d in vacuum_rel (relid=relid@entry=1260, relation=relation@entry=0x555b64047100, options=options@entry=1, params=params@entry=0x7ffca32c11b0)
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:1391
lmode = 4
onerel = 0x555b64029498
onerelid = {relId = 1260, dbId = 0}
toast_relid = 0
save_userid = 10
save_sec_context = 0
save_nestlevel = 2
__func__ = "vacuum_rel"
#5 0x0000555b62bfb545 in vacuum (options=1, relation=0x555b64047100, relid=relid@entry=0, params=params@entry=0x7ffca32c11b0, va_cols=0x0, bstrategy=<optimized out>, bstrategy@entry=0x0,
isTopLevel=1 '\001') at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:303
relid = 1260
cur = 0x555b640998b8
save_exception_stack = 0x7ffca32c1590
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {93851008735448, -4513934124599105488, 93850994009984, 93851008619616, 93851008619616, 93851008398480, -4513934124655728592, -7776366411743385552},
__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 0, 0, 1, 1, 1, 3473119737632784380, 139833388464563, 140723046060384, 12, 140723046060912, 93850990837571,
4049721212982268568, 1, 140723046060432}}}}
stmttype = <optimized out>
in_outer_xact = 0 '\000'
use_own_xacts = 1 '\001'
relations = 0x555b640998d8
in_vacuum = 1 '\001'
__func__ = "vacuum"
#6 0x0000555b62bfb8ca in ExecVacuum (vacstmt=vacstmt@entry=0x555b64047150, isTopLevel=isTopLevel@entry=1 '\001')
at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuum.c:121
params = {freeze_min_age = -1, freeze_table_age = -1, multixact_freeze_min_age = -1, multixact_freeze_table_age = -1, is_wraparound = 0 '\000', log_min_duration = -1}
#7 0x0000555b62d27787 in standard_ProcessUtility (parsetree=0x555b64047150, queryString=0x555b64046688 "vacuum pg_catalog.pg_authid;", context=<optimized out>, params=0x0,
dest=0x555b64047490, completionTag=0x7ffca32c19c0 "") at /build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/tcop/utility.c:655
stmt = 0x555b64047150
isTopLevel = <optimized out>
__func__ = "standard_ProcessUtility"
--Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
--
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Senior Postgresql DBA
http://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"
Hmm .. surely diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 5016181fd7..5d7fa1fb45 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6690,7 +6690,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, xid = HeapTupleHeaderGetXmin(tuple); xmin_frozen = ((xid == FrozenTransactionId) || HeapTupleHeaderXminFrozen(tuple)); - if (TransactionIdIsNormal(xid)) + if (!xmin_frozen && TransactionIdIsNormal(xid)) { if (TransactionIdPrecedes(xid, relfrozenxid)) ereport(ERROR, ?? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-05-24 13:08:53 -0400, Alvaro Herrera wrote: > Hmm .. surely > > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c > index 5016181fd7..5d7fa1fb45 100644 > --- a/src/backend/access/heap/heapam.c > +++ b/src/backend/access/heap/heapam.c > @@ -6690,7 +6690,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, > xid = HeapTupleHeaderGetXmin(tuple); > xmin_frozen = ((xid == FrozenTransactionId) || > HeapTupleHeaderXminFrozen(tuple)); > - if (TransactionIdIsNormal(xid)) > + if (!xmin_frozen && TransactionIdIsNormal(xid)) > { > if (TransactionIdPrecedes(xid, relfrozenxid)) > ereport(ERROR, > > > ?? I don't think that's necesary - HeapTupleHeaderGetXmin() returns FrozenTransactionId if the tuple is frozen (note the HeapTupleHeaderXminFrozen() within). Greetings, Andres Freund
On 2018-05-24 13:30:54 -0700, Andres Freund wrote: > On 2018-05-24 13:08:53 -0400, Alvaro Herrera wrote: > > Hmm .. surely > > > > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c > > index 5016181fd7..5d7fa1fb45 100644 > > --- a/src/backend/access/heap/heapam.c > > +++ b/src/backend/access/heap/heapam.c > > @@ -6690,7 +6690,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, > > xid = HeapTupleHeaderGetXmin(tuple); > > xmin_frozen = ((xid == FrozenTransactionId) || > > HeapTupleHeaderXminFrozen(tuple)); > > - if (TransactionIdIsNormal(xid)) > > + if (!xmin_frozen && TransactionIdIsNormal(xid)) > > { > > if (TransactionIdPrecedes(xid, relfrozenxid)) > > ereport(ERROR, > > > > > > ?? > > I don't think that's necesary - HeapTupleHeaderGetXmin() returns > FrozenTransactionId if the tuple is frozen (note the > HeapTupleHeaderXminFrozen() within). FWIW, even if that weren't the case: a) there'd be a lot more wrong with this routine imo. b) some of the tuples affected clearly weren't frozen... Greetings, Andres Freund
On 2018-May-24, Andres Freund wrote: > On 2018-05-24 13:08:53 -0400, Alvaro Herrera wrote: > > Hmm .. surely > > xid = HeapTupleHeaderGetXmin(tuple); > > xmin_frozen = ((xid == FrozenTransactionId) || > > HeapTupleHeaderXminFrozen(tuple)); > > - if (TransactionIdIsNormal(xid)) > > + if (!xmin_frozen && TransactionIdIsNormal(xid)) > I don't think that's necesary - HeapTupleHeaderGetXmin() returns > FrozenTransactionId if the tuple is frozen (note the > HeapTupleHeaderXminFrozen() within). Ah, yeah ... I probably thought about this when writing it and removed it for that reason. BTW I think the definition of HeapTupleHeaderXminFrozen is seriously confusing, by failing to return true if the xmin is numerically FrozenXid (which it'll be if the database was pg_upgraded). I wonder about this one in HeapTupleSatisfiesMVCC: else { /* xmin is committed, but maybe not according to our snapshot */ if (!HeapTupleHeaderXminFrozen(tuple) && XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot)) return false; /* treat as still in progress */ } I think this is not a bug only because XidInMVCCSnapshot does this /* Any xid < xmin is not in-progress */ if (TransactionIdPrecedes(xid, snapshot->xmin)) return false; which makes it return false for FrozenXid, but seems more of an accident than explicitly designed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-May-24, Andres Freund wrote: > FWIW, even if that weren't the case: a) there'd be a lot more wrong with > this routine imo. b) some of the tuples affected clearly weren't > frozen... Right. BTW is it just a coincidence or are all the affected tables pg_authid? Maybe the problem is shared relations ..? Maybe the fact that they have separate relfrozenxid (!?) in different databases? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-05-24 16:49:40 -0400, Alvaro Herrera wrote: > BTW is it just a coincidence or are all the affected tables pg_authid? > Maybe the problem is shared relations ..? Maybe the fact that they have > separate relfrozenxid (!?) in different databases? Yes, that appears to be part of the problem. I've looked at a number of shared relation related codepaths, but so far my theory is that the relcache is wrong. Note that one of the reports in this thread clearly had a different relcache relfrozenxid than in the catalog. Then there's also: http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com Greetings, Andres Freund
BTW is it just a coincidence or are all the affected tables pg_authid?
Maybe the problem is shared relations ..? Maybe the fact that they have
separate relfrozenxid (!?) in different databases?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
We have had this problem twice and both times on both pg_authid and pg_auth_members. Thanks,
Jeremy
On 2018-May-24, Andres Freund wrote: > On 2018-05-24 16:49:40 -0400, Alvaro Herrera wrote: > > BTW is it just a coincidence or are all the affected tables pg_authid? > > Maybe the problem is shared relations ..? Maybe the fact that they have > > separate relfrozenxid (!?) in different databases? > > Yes, that appears to be part of the problem. I've looked at a number of > shared relation related codepaths, but so far my theory is that the > relcache is wrong. Note that one of the reports in this thread clearly > had a different relcache relfrozenxid than in the catalog. Hmm ... is that because they read the values on different databases? Are you referring to the reports by Maxim Boguk? I see one value from template1, another value from template0. > Then there's also: > http://archives.postgresql.org/message-id/1527193504642.36340%40amazon.com ah, so deleting the relcache file makes the problem to go away? That's definitely pretty strange. I see no reason for the value in relcache to become out of step with the catalogued value in the same database ... I don't think we transmit in any way values of one database to another. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote: > On 2018-May-24, Andres Freund wrote: > > > On 2018-05-24 16:49:40 -0400, Alvaro Herrera wrote: > > > BTW is it just a coincidence or are all the affected tables pg_authid? > > > Maybe the problem is shared relations ..? Maybe the fact that they have > > > separate relfrozenxid (!?) in different databases? > > > > Yes, that appears to be part of the problem. I've looked at a number of > > shared relation related codepaths, but so far my theory is that the > > relcache is wrong. Note that one of the reports in this thread clearly > > had a different relcache relfrozenxid than in the catalog. > > Hmm ... is that because they read the values on different databases? > Are you referring to the reports by Maxim Boguk? I see one value from > template1, another value from template0. I was referring to https://www.postgresql.org/message-id/20180522193007.4bi5oluqb7c72oq2@alap3.anarazel.de but you're right, it's possible that that's just caused by time passing or different databases. Greetings, Andres Freund
On 2018-05-24 16:46:24 -0400, Alvaro Herrera wrote: > On 2018-May-24, Andres Freund wrote: > > > On 2018-05-24 13:08:53 -0400, Alvaro Herrera wrote: > > > Hmm .. surely > > > > xid = HeapTupleHeaderGetXmin(tuple); > > > xmin_frozen = ((xid == FrozenTransactionId) || > > > HeapTupleHeaderXminFrozen(tuple)); > > > - if (TransactionIdIsNormal(xid)) > > > + if (!xmin_frozen && TransactionIdIsNormal(xid)) > > > I don't think that's necesary - HeapTupleHeaderGetXmin() returns > > FrozenTransactionId if the tuple is frozen (note the > > HeapTupleHeaderXminFrozen() within). > > Ah, yeah ... I probably thought about this when writing it and removed > it for that reason. > > BTW I think the definition of HeapTupleHeaderXminFrozen is seriously > confusing, by failing to return true if the xmin is numerically > FrozenXid (which it'll be if the database was pg_upgraded). I wonder > about this one in HeapTupleSatisfiesMVCC: I suggest raising this on -hackers. I agree that it's unfortunate. Greetings, Andres Freund
On Fri, May 25, 2018 at 1:38 PM, Andres Freund <andres@anarazel.de> wrote: >> BTW I think the definition of HeapTupleHeaderXminFrozen is seriously >> confusing, by failing to return true if the xmin is numerically >> FrozenXid (which it'll be if the database was pg_upgraded). I wonder >> about this one in HeapTupleSatisfiesMVCC: > > I suggest raising this on -hackers. I agree that it's unfortunate. I wonder if BootstrapTransactionId also needs to be considered here. -- Peter Geoghegan