Thread: Getting rid of duplicate tables.
First I wish I knew how this was caused but here is our problem. Sometime in the recent past we got a duplicate table. Here is the result of a pg_dump with a pg_restore for just that table. -- -- TOC entry 59 (OID 11462032) -- Name: order_to_do; Type: TABLE; Schema: public; Owner: www -- Data Pos: 0 -- CREATE TABLE order_to_do ( order_id integer DEFAULT 0 NOT NULL, to_do_id text DEFAULT ''::text NOT NULL, date_time timestamp without time zone DEFAULT '0001-01-01 00:00:00'::timestamp without time zone NOT NULL, csr_id integer DEFAULT 0 NOT NULL, supplement text DEFAULT ''::text NOT NULL ); -- -- TOC entry 60 (OID 11462032) -- Name: order_to_do; Type: TABLE; Schema: public; Owner: www -- Data Pos: 0 -- CREATE TABLE order_to_do ( order_id integer DEFAULT 0 NOT NULL, to_do_id text DEFAULT ''::text NOT NULL, date_time timestamp without time zone DEFAULT '0001-01-01 00:00:00'::timestamp without time zone NOT NULL, csr_id integer DEFAULT 0 NOT NULL, supplement text DEFAULT ''::text NOT NULL ); So somehow it appears that we managed to get an *exact* duplicate of the table well at least a duplicate reference to the table somewhere important. First of all how do we get rid of this "extra" table? Note: live=# drop table order_to_do; ERROR: duplicate key violates unique constraint "pg_class_oid_index" Secondly any ideas on how it got there in the first place? Can this just happen? Or did someone here have to accidentally do something to create it. Sorry for not having any idea on what caused this, but thanks for any help you can give. Jared
Jared Carr <jared@89glass.com> writes: > First I wish I knew how this was caused but here is our problem. > Sometime in the recent past we got a duplicate table. Here is the > result of a pg_dump with a pg_restore for just that table. This should be fairly easy to fix: manually DELETE the extra pg_class entry (selecting it by ctid, since there isn't any other way to select just one of them). However, please don't do that until we've exhausted the opportunity to figure out how this happened. For starters, what PG version is this? And how many entries are there in pg_class at the moment? (The output from "VACUUM VERBOSE pg_class" would be useful to look at.) regards, tom lane
Tom Lane wrote: >Jared Carr <jared@89glass.com> writes: > > >>First I wish I knew how this was caused but here is our problem. >>Sometime in the recent past we got a duplicate table. Here is the >>result of a pg_dump with a pg_restore for just that table. >> >> > >This should be fairly easy to fix: manually DELETE the extra pg_class >entry (selecting it by ctid, since there isn't any other way to select >just one of them). However, please don't do that until we've exhausted >the opportunity to figure out how this happened. > >For starters, what PG version is this? And how many entries are there >in pg_class at the moment? (The output from "VACUUM VERBOSE pg_class" >would be useful to look at.) > > regards, tom lane > > This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago). INFO: vacuuming "pg_catalog.pg_class" INFO: index "pg_class_oid_index" now contains 412 row versions in 7 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_class_relname_nsp_index" now contains 412 row versions in 25 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_class": found 0 removable, 412 nonremovable row versions in 31 pages DETAIL: 0 dead row versions cannot be removed yet. There were 1216 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. Query returned successfully with no result in 20 ms. And the contents of pg_class relating to this table. live=# select * from pg_class where relname='order_to_do'; relname | relnamespace | reltype | relowner | relam | relfilenode | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl -------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+-------- order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 506 | 59401 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 453 | 53407 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | (2 rows) Thanks for the help, and let me know if you need any more information. Jared
Jared Carr <jared@89glass.com> writes: >> For starters, what PG version is this? And how many entries are there >> in pg_class at the moment? (The output from "VACUUM VERBOSE pg_class" >> would be useful to look at.) > This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago). Okay, that shoots down my faint hope that this was an old bug. I am growing increasingly suspicious that there is some corner-case problem with duplicate rows in 7.4, because we've seen two or three reports now of problems with unexpected duplicates. The reports all have different symptoms but I sense a common theme ... > And the contents of pg_class relating to this table. > > live=# select * from pg_class where relname='order_to_do'; I was just about to ask for that, but I need to see the system columns too: select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where relname='order_to_do'; Also, would you check whether you get the same results from this query with enable_indexscan switched off? regards, tom lane
Tom Lane wrote: >Jared Carr <jared@89glass.com> writes: > > >>>For starters, what PG version is this? And how many entries are there >>>in pg_class at the moment? (The output from "VACUUM VERBOSE pg_class" >>>would be useful to look at.) >>> >>> > > > >>This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago). >> >> > >Okay, that shoots down my faint hope that this was an old bug. I am >growing increasingly suspicious that there is some corner-case problem >with duplicate rows in 7.4, because we've seen two or three reports now >of problems with unexpected duplicates. The reports all have different >symptoms but I sense a common theme ... > > Well, hopefully I can provide enough information to find out what that theme is. >>And the contents of pg_class relating to this table. >> >>live=# select * from pg_class where relname='order_to_do'; >> >> > >I was just about to ask for that, but I need to see the system columns >too: > >select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class > where relname='order_to_do'; > > live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where relname='order_to_do'; oid | ctid | cmax | xmax | cmin | xmin | relname | relnamespace | reltype | relowner | relam | relfilenode | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl ----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+-------- 11462032 | (27,2) | 0 | 2 | 2 | 46034931 | order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 506 | 59401 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | 11462032 | (27,43) | 2 | 46034931 | 46034931 | 8051642 | order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 453 | 53407 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | (2 rows) >Also, would you check whether you get the same results from this query >with enable_indexscan switched off? > > I am pretty sure that they are the same with index scan turned off, but here is the result for verification. live=# set enable_indexscan=off; SET live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where relname='order_to_do'; oid | ctid | cmax | xmax | cmin | xmin | relname | relnamespace | reltype | relowner | relam | relfilenode | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl ----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+-------- 11462032 | (27,2) | 0 | 2 | 2 | 46034931 | order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 506 | 59401 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | 11462032 | (27,43) | 2 | 46034931 | 46034931 | 8051642 | order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 | 453 | 53407 | 11462039 | 0 | t | f | r | 5 | 0 | 1 | 0 | 0 | 0 | t | t | f | f | (2 rows) thanks Jared Carr
Jared Carr <jared@89glass.com> writes: > live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where > relname='order_to_do'; > oid | ctid | cmax | xmax | cmin | xmin | > relname | relnamespace | reltype | relowner | relam | relfilenode | > relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | > relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | > relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | > relhassubclass | relacl > ----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+-------- > 11462032 | (27,2) | 0 | 2 | 2 | 46034931 | > order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 > | 506 | 59401 | 11462039 | 0 | t | > f | r | 5 | 0 | 1 | 0 > | 0 | 0 | t | t | f | > f | > 11462032 | (27,43) | 2 | 46034931 | 46034931 | 8051642 | > order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 > | 453 | 53407 | 11462039 | 0 | t | > f | r | 5 | 0 | 1 | 0 > | 0 | 0 | t | t | f | > f | > (2 rows) Hmm. This does not look like an index problem --- it looks like a tuple status problem. Apparently transaction 46034931 updated this row, marking the tuple at (27,43) invalid and inserting a new version at (27,2). That's fine and is not a unique-index violation by itself, but we should only be able to see one of these versions of the row. Clearly, both of them appear good since both are visible to your current transaction. There is no way that both of these tuples should be considered good; one or the other ought to be invalid from the point of view of any observing transaction. I think somebody dropped a status update somewhere. Have you had any system crashes recently? What kind of disk hardware is this running on --- is it IDE, and if so, do you have write caching properly disabled? To really dig into this, we need to get a physical dump of these rows. The best tool for this is pg_filedump, which you can get from http://sources.redhat.com/rhdb/utilities.html. You can run it with something like pg_filedump -i -f -R 27 $PGDATA/base/nnn/1259 where nnn is the OID of the database with the problem (look in pg_database to determine this). 1259 is the known OID of pg_class. Both of the interesting tuples are on the same page 27, so this should get the data we need. Also, please show the output of "ls -l $PGDATA/pg_clog". If the relevant segments of clog are still around, we might want to look at what it says about the commit states of these two transactions. regards, tom lane
Tom Lane wrote: >Jared Carr <jared@89glass.com> writes: > > >>live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where >>relname='order_to_do'; >> oid | ctid | cmax | xmax | cmin | xmin | >>relname | relnamespace | reltype | relowner | relam | relfilenode | >>relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | >>relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | >>relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | >>relhassubclass | relacl >>----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+-------- >> 11462032 | (27,2) | 0 | 2 | 2 | 46034931 | >>order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 >>| 506 | 59401 | 11462039 | 0 | t | >>f | r | 5 | 0 | 1 | 0 >>| 0 | 0 | t | t | f | >>f | >> 11462032 | (27,43) | 2 | 46034931 | 46034931 | 8051642 | >>order_to_do | 2200 | 11462033 | 101 | 0 | 11462032 >>| 453 | 53407 | 11462039 | 0 | t | >>f | r | 5 | 0 | 1 | 0 >>| 0 | 0 | t | t | f | >>f | >>(2 rows) >> >> > >Hmm. This does not look like an index problem --- it looks like a tuple >status problem. Apparently transaction 46034931 updated this row, >marking the tuple at (27,43) invalid and inserting a new version at >(27,2). That's fine and is not a unique-index violation by itself, >but we should only be able to see one of these versions of the row. >Clearly, both of them appear good since both are visible to your current >transaction. There is no way that both of these tuples should be >considered good; one or the other ought to be invalid from the point of >view of any observing transaction. I think somebody dropped a status >update somewhere. Have you had any system crashes recently? What kind >of disk hardware is this running on --- is it IDE, and if so, do you >have write caching properly disabled > > Hmm...there haven't been any crashes for a good long time, there were some power oddities around a month ago, but I think we would have noticed this by now. The application that was built on it just died yesterday, so...unfortunately I don't know for sure how long this has been this way. Next it is running scsi disks in a raid 5 array. >To really dig into this, we need to get a physical dump of these rows. >The best tool for this is pg_filedump, which you can get from >http://sources.redhat.com/rhdb/utilities.html. You can run it with >something like > pg_filedump -i -f -R 27 $PGDATA/base/nnn/1259 >where nnn is the OID of the database with the problem (look in >pg_database to determine this). 1259 is the known OID of pg_class. >Both of the interesting tuples are on the same page 27, so this should >get the data we need. > > The filedump is pasted in at the end. >Also, please show the output of "ls -l $PGDATA/pg_clog". If the >relevant segments of clog are still around, we might want to look at >what it says about the commit states of these two transactions. > > > ls -l $PGDATA/pg_clog gives -rw------- 1 postgres postgres 262144 Nov 26 11:58 0000 -rw------- 1 postgres postgres 262144 Nov 28 11:15 0001 -rw------- 1 postgres postgres 262144 Nov 30 04:27 0002 -rw------- 1 postgres postgres 262144 Dec 1 07:23 0003 -rw------- 1 postgres postgres 262144 Dec 1 11:09 0004 -rw------- 1 postgres postgres 262144 Dec 1 12:50 0005 -rw------- 1 postgres postgres 262144 Dec 1 15:37 0006 -rw------- 1 postgres postgres 262144 Dec 2 05:58 0007 -rw------- 1 postgres postgres 262144 Dec 2 11:25 0008 -rw------- 1 postgres postgres 262144 Dec 2 16:54 0009 -rw------- 1 postgres postgres 262144 Dec 3 10:29 000A -rw------- 1 postgres postgres 262144 Dec 3 16:31 000B -rw------- 1 postgres postgres 262144 Dec 4 10:34 000C -rw------- 1 postgres postgres 262144 Dec 5 00:00 000D -rw------- 1 postgres postgres 262144 Dec 5 11:50 000E -rw------- 1 postgres postgres 262144 Dec 6 14:41 000F -rw------- 1 postgres postgres 262144 Dec 8 05:58 0010 -rw------- 1 postgres postgres 262144 Dec 8 09:30 0011 -rw------- 1 postgres postgres 262144 Dec 8 13:33 0012 -rw------- 1 postgres postgres 262144 Dec 9 06:15 0013 -rw------- 1 postgres postgres 262144 Dec 9 11:47 0014 -rw------- 1 postgres postgres 262144 Dec 10 05:45 0015 -rw------- 1 postgres postgres 262144 Dec 10 12:43 0016 -rw------- 1 postgres postgres 262144 Dec 11 08:11 0017 -rw------- 1 postgres postgres 262144 Dec 11 13:48 0018 -rw------- 1 postgres postgres 262144 Dec 12 06:20 0019 -rw------- 1 postgres postgres 262144 Dec 12 15:44 001A -rw------- 1 postgres postgres 262144 Dec 14 19:30 001B -rw------- 1 postgres postgres 262144 Dec 15 10:22 001C -rw------- 1 postgres postgres 262144 Dec 15 20:41 001D -rw------- 1 postgres postgres 262144 Dec 16 07:51 001E -rw------- 1 postgres postgres 262144 Dec 16 13:18 001F -rw------- 1 postgres postgres 262144 Dec 17 09:57 0020 -rw------- 1 postgres postgres 262144 Dec 18 06:00 0021 -rw------- 1 postgres postgres 262144 Dec 18 15:09 0022 -rw------- 1 postgres postgres 262144 Dec 19 12:25 0023 -rw------- 1 postgres postgres 262144 Dec 21 16:37 0024 -rw------- 1 postgres postgres 262144 Dec 22 13:30 0025 -rw------- 1 postgres postgres 262144 Dec 23 12:04 0026 -rw------- 1 postgres postgres 262144 Dec 25 17:07 0027 -rw------- 1 postgres postgres 262144 Dec 27 11:50 0028 -rw------- 1 postgres postgres 262144 Dec 29 08:27 0029 -rw------- 1 postgres postgres 262144 Dec 29 12:42 002A -rw------- 1 postgres postgres 262144 Dec 29 17:40 002B -rw------- 1 postgres postgres 262144 Dec 30 09:26 002C -rw------- 1 postgres postgres 262144 Dec 30 14:18 002D -rw------- 1 postgres postgres 262144 Dec 31 08:28 002E -rw------- 1 postgres postgres 262144 Jan 1 00:05 002F -rw------- 1 postgres postgres 262144 Jan 1 09:10 0030 -rw------- 1 postgres postgres 262144 Jan 2 10:04 0031 -rw------- 1 postgres postgres 262144 Jan 3 08:34 0032 -rw------- 1 postgres postgres 262144 Jan 5 08:30 0033 -rw------- 1 postgres postgres 262144 Jan 5 16:29 0034 -rw------- 1 postgres postgres 262144 Jan 6 10:50 0035 -rw------- 1 postgres postgres 262144 Jan 6 21:50 0036 -rw------- 1 postgres postgres 262144 Jan 7 13:11 0037 -rw------- 1 postgres postgres 262144 Jan 8 07:34 0038 -rw------- 1 postgres postgres 262144 Jan 8 14:47 0039 -rw------- 1 postgres postgres 262144 Jan 9 10:00 003A -rw------- 1 postgres postgres 262144 Jan 10 16:58 003B -rw------- 1 postgres postgres 262144 Jan 12 08:20 003C -rw------- 1 postgres postgres 262144 Jan 12 15:13 003D -rw------- 1 postgres postgres 262144 Jan 13 11:25 003E -rw------- 1 postgres postgres 262144 Jan 14 08:23 003F -rw------- 1 postgres postgres 262144 Jan 14 16:47 0040 -rw------- 1 postgres postgres 262144 Jan 15 13:06 0041 -rw------- 1 postgres postgres 262144 Jan 16 07:39 0042 -rw------- 1 postgres postgres 262144 Jan 16 14:28 0043 -rw------- 1 postgres postgres 90112 Jan 19 14:30 0044 And finally the dump. ******************************************************************* * PostgreSQL File/Block Formatted Dump Utility - Version 3.0-Beta * * File: /var/tmp/pgsql/data74/base/11462031/1259 * Options used: -i -f -R 27 * * Dump created on: Tue Jan 20 08:17:53 2004 ******************************************************************* Block 27 ******************************************************** <Header> ----- Block Offset: 0x00036000 Offsets: Lower 232 (0x00e8) Block: Size 8192 Version 1 Upper 6416 (0x1910) LSN: logid 7 recoff 0x2e369840 Special 8192 (0x2000) Items: 53 Free Space: 6184 Length (including item array): 236 0000: 07000000 4098362e 18000000 e8001019 ....@.6......... 0010: 00200120 cc1a0000 cc9a2801 389a2801 . . ......(.8.(. 0020: 10190000 a4992801 e8170000 10992801 ......(.......(. 0030: 601b0000 cc1a0000 381a0000 a4190000 `.......8....... 0040: 10190000 7c180000 e8170000 54170000 ....|.......T... 0050: c0160000 2c160000 98150000 04150000 ....,........... 0060: 70140000 dc130000 48130000 b4120000 p.......H....... 0070: 20120000 8c110000 f8100000 64100000 ...........d... 0080: d00f0000 3c0f0000 a80e0000 140e0000 ....<........... 0090: 800d0000 ec0c0000 580c0000 c40b0000 ........X....... 00a0: 300b0000 9c0a0000 080a0000 74090000 0...........t... 00b0: e0080000 4c080000 b8070000 6c9f2801 ....L.......l.(. 00c0: d89e2801 fc050000 449e2801 b09d2801 ..(.....D.(...(. 00d0: 40040000 1c9d2801 889c2801 f49b2801 @.....(...(...(. 00e0: f0010000 609b2801 00000000 ....`.(..... <Data> ------ Item 1 -- Length: 0 Offset: 6860 (0x1acc) Flags: 0x00 Item 2 -- Length: 148 Offset: 6860 (0x1acc) Flags: USED XID: min (46034931) CMIN|XMAX: 2 CMAX|XVAC: 0 Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1acc: f36fbe02 02000000 00000000 00001b00 .o.............. 1adc: 02001700 10291c00 90e5ae00 6f726465 .....)......orde 1aec: 725f746f 5f646f00 00000000 00000000 r_to_do......... 1afc: 00000000 00000000 00000000 00000000 ................ 1b0c: 00000000 00000000 00000000 00000000 ................ 1b1c: 00000000 00000000 00000000 98080000 ................ 1b2c: 91e5ae00 65000000 00000000 90e5ae00 ....e........... 1b3c: fa010000 00096847 97e5ae00 00000000 ......hG........ 1b4c: 01007200 05000000 01000000 00000000 ..r............. 1b5c: 01010000 .... Item 3 -- Length: 148 Offset: 6712 (0x1a38) Flags: USED XID: min (46034931) CMIN|XMAX: 3 CMAX|XVAC: 0 Block Id: 27 linp Index: 3 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1a38: f36fbe02 03000000 00000000 00001b00 .o.............. 1a48: 03001700 10291c00 9ae5ae00 6f726465 .....)......orde 1a58: 725f746f 5f646f5f 706b6579 00000000 r_to_do_pkey.... 1a68: 00000000 00000000 00000000 00000000 ................ 1a78: 00000000 00000000 00000000 00000000 ................ 1a88: 00000000 00000000 00000000 98080000 ................ 1a98: 00000000 65000000 93010000 72c7b501 ....e.......r... 1aa8: 1d010000 00096847 00000000 00000000 ......hG........ 1ab8: 00006900 02000000 00000000 00000000 ..i............. 1ac8: 00000000 .... Item 4 -- Length: 0 Offset: 6416 (0x1910) Flags: 0x00 Item 5 -- Length: 148 Offset: 6564 (0x19a4) Flags: USED XID: min (46034931) CMIN|XMAX: 7 CMAX|XVAC: 0 Block Id: 27 linp Index: 5 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 19a4: f36fbe02 07000000 00000000 00001b00 .o.............. 19b4: 05001700 10291c00 bb7e2101 746f5f64 .....)...~!.to_d 19c4: 6f5f6964 5f627472 65655f69 6e646578 o_id_btree_index 19d4: 00000000 00000000 00000000 00000000 ................ 19e4: 00000000 00000000 00000000 00000000 ................ 19f4: 00000000 00000000 00000000 98080000 ................ 1a04: 00000000 64000000 93010000 73c7b501 ....d.......s... 1a14: fd000000 00096847 00000000 00000000 ......hG........ 1a24: 00006900 01000000 00000000 00000000 ..i............. 1a34: 00000000 .... Item 6 -- Length: 0 Offset: 6120 (0x17e8) Flags: 0x00 Item 7 -- Length: 148 Offset: 6416 (0x1910) Flags: USED XID: min (46034931) CMIN|XMAX: 11 CMAX|XVAC: 0 Block Id: 27 linp Index: 7 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1910: f36fbe02 0b000000 00000000 00001b00 .o.............. 1920: 07001700 10291c00 99e5ae00 70675f74 .....)......pg_t 1930: 6f617374 5f313134 36323033 325f696e oast_11462032_in 1940: 64657800 00000000 00000000 00000000 dex............. 1950: 00000000 00000000 00000000 00000000 ................ 1960: 00000000 00000000 00000000 63000000 ............c... 1970: 00000000 65000000 93010000 74c7b501 ....e.......t... 1980: 01000000 00000000 00000000 00000000 ................ 1990: 00006900 02000000 00000000 00000000 ..i............. 19a0: 00000000 .... Item 8 -- Length: 0 Offset: 7008 (0x1b60) Flags: 0x00 Item 9 -- Length: 0 Offset: 6860 (0x1acc) Flags: 0x00 Item 10 -- Length: 0 Offset: 6712 (0x1a38) Flags: 0x00 Item 11 -- Length: 0 Offset: 6564 (0x19a4) Flags: 0x00 Item 12 -- Length: 0 Offset: 6416 (0x1910) Flags: 0x00 Item 13 -- Length: 0 Offset: 6268 (0x187c) Flags: 0x00 Item 14 -- Length: 0 Offset: 6120 (0x17e8) Flags: 0x00 Item 15 -- Length: 0 Offset: 5972 (0x1754) Flags: 0x00 Item 16 -- Length: 0 Offset: 5824 (0x16c0) Flags: 0x00 Item 17 -- Length: 0 Offset: 5676 (0x162c) Flags: 0x00 Item 18 -- Length: 0 Offset: 5528 (0x1598) Flags: 0x00 Item 19 -- Length: 0 Offset: 5380 (0x1504) Flags: 0x00 Item 20 -- Length: 0 Offset: 5232 (0x1470) Flags: 0x00 Item 21 -- Length: 0 Offset: 5084 (0x13dc) Flags: 0x00 Item 22 -- Length: 0 Offset: 4936 (0x1348) Flags: 0x00 Item 23 -- Length: 0 Offset: 4788 (0x12b4) Flags: 0x00 Item 24 -- Length: 0 Offset: 4640 (0x1220) Flags: 0x00 Item 25 -- Length: 0 Offset: 4492 (0x118c) Flags: 0x00 Item 26 -- Length: 0 Offset: 4344 (0x10f8) Flags: 0x00 Item 27 -- Length: 0 Offset: 4196 (0x1064) Flags: 0x00 Item 28 -- Length: 0 Offset: 4048 (0x0fd0) Flags: 0x00 Item 29 -- Length: 0 Offset: 3900 (0x0f3c) Flags: 0x00 Item 30 -- Length: 0 Offset: 3752 (0x0ea8) Flags: 0x00 Item 31 -- Length: 0 Offset: 3604 (0x0e14) Flags: 0x00 Item 32 -- Length: 0 Offset: 3456 (0x0d80) Flags: 0x00 Item 33 -- Length: 0 Offset: 3308 (0x0cec) Flags: 0x00 Item 34 -- Length: 0 Offset: 3160 (0x0c58) Flags: 0x00 Item 35 -- Length: 0 Offset: 3012 (0x0bc4) Flags: 0x00 Item 36 -- Length: 0 Offset: 2864 (0x0b30) Flags: 0x00 Item 37 -- Length: 0 Offset: 2716 (0x0a9c) Flags: 0x00 Item 38 -- Length: 0 Offset: 2568 (0x0a08) Flags: 0x00 Item 39 -- Length: 0 Offset: 2420 (0x0974) Flags: 0x00 Item 40 -- Length: 0 Offset: 2272 (0x08e0) Flags: 0x00 Item 41 -- Length: 0 Offset: 2124 (0x084c) Flags: 0x00 Item 42 -- Length: 0 Offset: 1976 (0x07b8) Flags: 0x00 Item 43 -- Length: 148 Offset: 8044 (0x1f6c) Flags: USED XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 2 Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1f6c: badb7a00 f36fbe02 02000000 00001b00 ..z..o.......... 1f7c: 02001700 10291c00 90e5ae00 6f726465 .....)......orde 1f8c: 725f746f 5f646f00 00000000 00000000 r_to_do......... 1f9c: 00000000 00000000 00000000 00000000 ................ 1fac: 00000000 00000000 00000000 00000000 ................ 1fbc: 00000000 00000000 00000000 98080000 ................ 1fcc: 91e5ae00 65000000 00000000 90e5ae00 ....e........... 1fdc: c5010000 009f5047 97e5ae00 00000000 ......PG........ 1fec: 01007200 05000000 01000000 00000000 ..r............. 1ffc: 01010000 .... Item 44 -- Length: 148 Offset: 7896 (0x1ed8) Flags: USED XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 0 Block Id: 27 linp Index: 1 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1ed8: badb7a00 f36fbe02 00000000 00001b00 ..z..o.......... 1ee8: 01001700 10291c00 9ae5ae00 6f726465 .....)......orde 1ef8: 725f746f 5f646f5f 706b6579 00000000 r_to_do_pkey.... 1f08: 00000000 00000000 00000000 00000000 ................ 1f18: 00000000 00000000 00000000 00000000 ................ 1f28: 00000000 00000000 00000000 98080000 ................ 1f38: 00000000 65000000 93010000 54722401 ....e.......Tr$. 1f48: 03010000 009f5047 00000000 00000000 ......PG........ 1f58: 00006900 02000000 00000000 00000000 ..i............. 1f68: 00000000 .... Item 45 -- Length: 0 Offset: 1532 (0x05fc) Flags: 0x00 Item 46 -- Length: 148 Offset: 7748 (0x1e44) Flags: USED XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 4 Block Id: 27 linp Index: 4 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1e44: badb7a00 f36fbe02 04000000 00001b00 ..z..o.......... 1e54: 04001700 10291c00 bb7e2101 746f5f64 .....)...~!.to_d 1e64: 6f5f6964 5f627472 65655f69 6e646578 o_id_btree_index 1e74: 00000000 00000000 00000000 00000000 ................ 1e84: 00000000 00000000 00000000 00000000 ................ 1e94: 00000000 00000000 00000000 98080000 ................ 1ea4: 00000000 64000000 93010000 55722401 ....d.......Ur$. 1eb4: e1000000 009f5047 00000000 00000000 ......PG........ 1ec4: 00006900 01000000 00000000 00000000 ..i............. 1ed4: 00000000 .... Item 47 -- Length: 148 Offset: 7600 (0x1db0) Flags: USED XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 8 Block Id: 27 linp Index: 6 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1db0: badb7a00 f36fbe02 08000000 00001b00 ..z..o.......... 1dc0: 06001700 10291c00 99e5ae00 70675f74 .....)......pg_t 1dd0: 6f617374 5f313134 36323033 325f696e oast_11462032_in 1de0: 64657800 00000000 00000000 00000000 dex............. 1df0: 00000000 00000000 00000000 00000000 ................ 1e00: 00000000 00000000 00000000 63000000 ............c... 1e10: 00000000 65000000 93010000 62722401 ....e.......br$. 1e20: 01000000 00000000 00000000 00000000 ................ 1e30: 00006900 02000000 00000000 00000000 ..i............. 1e40: 00000000 .... Item 48 -- Length: 0 Offset: 1088 (0x0440) Flags: 0x00 Item 49 -- Length: 148 Offset: 7452 (0x1d1c) Flags: USED XID: min (8051658) CMIN|XMAX: 3 CMAX|XVAC: 0 Block Id: 27 linp Index: 49 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1d1c: cadb7a00 03000000 00000000 00001b00 ..z............. 1d2c: 31001700 10291c00 9ce5ae00 746f5f64 1....)......to_d 1d3c: 6f5f6974 656d7300 00000000 00000000 o_items......... 1d4c: 00000000 00000000 00000000 00000000 ................ 1d5c: 00000000 00000000 00000000 00000000 ................ 1d6c: 00000000 00000000 00000000 98080000 ................ 1d7c: 9de5ae00 65000000 00000000 9ce5ae00 ....e........... 1d8c: 01000000 00004041 a8e5ae00 00000000 ......@A........ 1d9c: 01007200 0b000300 01000000 00000000 ..r............. 1dac: 01010000 .... Item 50 -- Length: 148 Offset: 7304 (0x1c88) Flags: USED XID: min (8051658) CMIN|XMAX: 4 CMAX|XVAC: 0 Block Id: 27 linp Index: 50 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1c88: cadb7a00 04000000 00000000 00001b00 ..z............. 1c98: 32001700 10291c00 abe5ae00 746f5f64 2....)......to_d 1ca8: 6f5f6974 656d735f 706b6579 00000000 o_items_pkey.... 1cb8: 00000000 00000000 00000000 00000000 ................ 1cc8: 00000000 00000000 00000000 00000000 ................ 1cd8: 00000000 00000000 00000000 98080000 ................ 1ce8: 00000000 65000000 93010000 63722401 ....e.......cr$. 1cf8: 02000000 00004041 00000000 00000000 ......@A........ 1d08: 00006900 01000000 00000000 00000000 ..i............. 1d18: 00000000 .... Item 51 -- Length: 148 Offset: 7156 (0x1bf4) Flags: USED XID: min (8051658) CMIN|XMAX: 5 CMAX|XVAC: 0 Block Id: 27 linp Index: 51 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1bf4: cadb7a00 05000000 00000000 00001b00 ..z............. 1c04: 33001700 10291c00 aae5ae00 70675f74 3....)......pg_t 1c14: 6f617374 5f313134 36323034 345f696e oast_11462044_in 1c24: 64657800 00000000 00000000 00000000 dex............. 1c34: 00000000 00000000 00000000 00000000 ................ 1c44: 00000000 00000000 00000000 63000000 ............c... 1c54: 00000000 65000000 93010000 64722401 ....e.......dr$. 1c64: 01000000 00000000 00000000 00000000 ................ 1c74: 00006900 02000000 00000000 00000000 ..i............. 1c84: 00000000 .... Item 52 -- Length: 0 Offset: 496 (0x01f0) Flags: 0x00 Item 53 -- Length: 148 Offset: 7008 (0x1b60) Flags: USED XID: min (8051660) CMIN|XMAX: 4 CMAX|XVAC: 0 Block Id: 27 linp Index: 53 Attributes: 23 Size: 28 infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) 1b60: ccdb7a00 04000000 00000000 00001b00 ..z............. 1b70: 35001700 10291c00 b8e5ae00 61646d69 5....)......admi 1b80: 6e5f7365 7373696f 6e735f70 6b657900 n_sessions_pkey. 1b90: 00000000 00000000 00000000 00000000 ................ 1ba0: 00000000 00000000 00000000 00000000 ................ 1bb0: 00000000 00000000 00000000 98080000 ................ 1bc0: 00000000 65000000 93010000 65722401 ....e.......er$. 1bd0: 49000000 00008841 00000000 00000000 I......A........ 1be0: 00006900 01000000 00000000 00000000 ..i............. 1bf0: 00000000 .... *** End of Requested Range Encountered. Last Block Read: 27 *** Thanks, Jared Carr
Jared Carr <jared@89glass.com> writes: > Item 2 -- Length: 148 Offset: 6860 (0x1acc) Flags: USED > XID: min (46034931) CMIN|XMAX: 2 CMAX|XVAC: 0 > Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 > infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) > Item 43 -- Length: 148 Offset: 8044 (0x1f6c) Flags: USED > XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 2 > Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 > infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) Well, there's the smoking gun ... somebody marked (27,2) as XMIN_COMMITTED, showing that they thought 46034931 was committed, while someone else marked (27,43) as XMAX_INVALID, showing that they thought 46034931 was aborted. So we have some kind of very-infrequent breakage in transaction commit-state lookup. Or a hardware problem, but I suspect we are looking at a bug. Could you check out what pg_clog has for transaction 46034931? This would be pg_clog/002B (which dates your problem to Dec 29 BTW), byte at offset 39BFC hex or 236540 decimal. I forget which way the bits run within the byte but will look it up if you can get me the value of that byte. I'm off to take a real close look at what was done to the pg_clog code during the 7.4 cycle ... regards, tom lane
Tom Lane wrote: >Jared Carr <jared@89glass.com> writes: > > >> Item 2 -- Length: 148 Offset: 6860 (0x1acc) Flags: USED >> XID: min (46034931) CMIN|XMAX: 2 CMAX|XVAC: 0 >> Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 >> infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) >> >> > > > >> Item 43 -- Length: 148 Offset: 8044 (0x1f6c) Flags: USED >> XID: min (8051642) CMIN|XMAX: 46034931 CMAX|XVAC: 2 >> Block Id: 27 linp Index: 2 Attributes: 23 Size: 28 >> infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED) >> >> > >Well, there's the smoking gun ... somebody marked (27,2) as >XMIN_COMMITTED, showing that they thought 46034931 was committed, while >someone else marked (27,43) as XMAX_INVALID, showing that they thought >46034931 was aborted. So we have some kind of very-infrequent >breakage in transaction commit-state lookup. Or a hardware problem, >but I suspect we are looking at a bug. > >Could you check out what pg_clog has for transaction 46034931? >This would be pg_clog/002B (which dates your problem to Dec 29 BTW), >byte at offset 39BFC hex or 236540 decimal. I forget which way the >bits run within the byte but will look it up if you can get me the >value of that byte. > > Here is the appropriate "line" (line is used *very* loosely there) 00039BF0 04 10 00 00 44 00 14 44 50 00 10 01 00 40 04 40 ....D..DP....@.@ 39BFC = 0 Jared Carr
Jared Carr <jared@89glass.com> writes: > Tom Lane wrote: >> Could you check out what pg_clog has for transaction 46034931? >> This would be pg_clog/002B (which dates your problem to Dec 29 BTW), >> byte at offset 39BFC hex or 236540 decimal. I forget which way the >> bits run within the byte but will look it up if you can get me the >> value of that byte. >> > Here is the appropriate "line" (line is used *very* loosely there) > 00039BF0 04 10 00 00 44 00 14 44 50 00 10 01 00 40 04 40 ....D..DP....@.@ > 39BFC = 0 [ blinks... ] Hm, no need to check the bit direction on that one. Zero means that the transaction was never recorded as *either* committed or aborted. Which is certainly not the state that whoever marked (27,2) saw. So what we've got here apparently is active loss of transaction commit bits :-(. Can you tell whether you had any backend crashes on 29 December? It's barely possible that the transaction really did crash and this status is correct, in which case we have only a read failure to explain rather than an after-the-fact change in recorded transaction status ... regards, tom lane
Jared Carr <jared@89glass.com> writes: > Tom Lane wrote: >>> Yes, it does appear that there was a backend crash/(operator stupidly >>> kill -9 'ing possibly) on the 29th. >> >> Hmm ... could you send me that area of the log? > Dec 29 16:31:54 penguin postgres[1714]: [3-1] LOG: received smart > shutdown request What it looks like to me is that this shutdown never completed, probably because some client was hanging onto an open connection ("smart shutdown" doesn't forcibly abort active sessions). The subsequent log entries show a couple of failed attempts to start a postmaster. I would guess the sequence was roughly this: pg_ctl stop ... wait a couple minutes, get bored of waiting ... pg_ctl start ... fails with > Dec 29 16:33:44 penguin postgres[5379]: [1-1] FATAL: lock file > "/var/lib/pgsql/data74/postmaster.pid" already exists > Dec 29 16:33:44 penguin postgres[5379]: [1-2] HINT: Is another > postmaster (PID 1714) running in data directory "/var/lib/pgsql/data74"? kill -9 old postmaster pg_ctl start ... fails with > Dec 29 16:34:12 penguin postgres[5395]: [1-1] FATAL: pre-existing > shared memory block (key 5432001, ID 0) is still in use > Dec 29 16:34:12 penguin postgres[5395]: [1-2] HINT: If you're sure > there are no old server processes still running, remove the shared > memory block with the command "ipcrm", > Dec 29 16:34:12 penguin postgres[5395]: [1-3] or just delete the file > "/var/lib/pgsql/data74/postmaster.pid". rm /var/lib/pgsql/data74/postmaster.pid pg_ctl start If those were the only steps taken, then the old postmaster's child backends would still have been running. At that point you're pretty well hosed because there are two independent sets of backends with separate shared memory blocks, and any subsequent transaction status updates from the old backends could overwrite pg_clog data from the new ones. So I now think it was operator error and not a software bug at all. You need to educate whoever did this about proper database shutdown procedures. As a Postgres maintainer, the only thing that troubles me about this is that the error messages from the failed postmaster start attempts could be read as having encouraged the operator to do exactly the worst possible things. I'm cc'ing this back to pgsql-general to see if anyone has any thoughts about rewording these messages. In particular it seems like the HINT for the second failure is really disastrous; it should tell you to kill off the old backends, not to zap the lockfile. regards, tom lane PS: do you know why the database was being shut down in the first place? Was there a pre-existing problem? PPS: at this point I think we've learned all we can from your database, so you can go ahead and repair the damage: delete from pg_class where ctid = '(27,43)'; should do it, at least for the one known problem. I'd encourage you to do what you can to look for other inconsistencies that may have been introduced.
On Tuesday 20 January 2004 19:08, Tom Lane wrote: > Jared Carr <jared@89glass.com> writes: > > Dec 29 16:33:44 penguin postgres[5379]: [1-1] FATAL: lock file > > "/var/lib/pgsql/data74/postmaster.pid" already exists > > Dec 29 16:33:44 penguin postgres[5379]: [1-2] HINT: Is another > > postmaster (PID 1714) running in data directory "/var/lib/pgsql/data74"? > > > Dec 29 16:34:12 penguin postgres[5395]: [1-1] FATAL: pre-existing > > shared memory block (key 5432001, ID 0) is still in use > > Dec 29 16:34:12 penguin postgres[5395]: [1-2] HINT: If you're sure > > there are no old server processes still running, remove the shared > > memory block with the command "ipcrm", > > Dec 29 16:34:12 penguin postgres[5395]: [1-3] or just delete the file > > "/var/lib/pgsql/data74/postmaster.pid". > As a Postgres maintainer, the only thing that troubles me about this > is that the error messages from the failed postmaster start attempts > could be read as having encouraged the operator to do exactly the > worst possible things. I'm cc'ing this back to pgsql-general to see > if anyone has any thoughts about rewording these messages. In > particular it seems like the HINT for the second failure is really > disastrous; it should tell you to kill off the old backends, not to > zap the lockfile. Should we not support something like "pg_ctl cleanup" which does one or more (as necessary) of: 1. kills the backends 2. runs ipcrm 3. rm postmater.pid Why leave it to the operator at all? Actually, maybe it should be "pg_ctl check" and be able to support a wider range of checks/fixes. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > On Tuesday 20 January 2004 19:08, Tom Lane wrote: >> ... the error messages from the failed postmaster start attempts >> could be read as having encouraged the operator to do exactly the >> worst possible things. I'm cc'ing this back to pgsql-general to see >> if anyone has any thoughts about rewording these messages. In >> particular it seems like the HINT for the second failure is really >> disastrous; it should tell you to kill off the old backends, not to >> zap the lockfile. > Should we not support something like "pg_ctl cleanup" which does one or more > (as necessary) of: > 1. kills the backends > 2. runs ipcrm > 3. rm postmater.pid > Why leave it to the operator at all? Well, given that the proximate cause of Jared's whole problem was incorrect use of pg_ctl (viz, "pg_ctl stop" when "pg_ctl stop -m fast" was wanted), I'm not sure that adding an even-more-destructive stop mode to pg_ctl would have helped. Someone who doesn't know when to use -m fast isn't likely to know when to use this either. Also, I doubt that pg_ctl can do a better job of this than the existing code in the postmaster. The reason that code failed in this case was that the safety interlocks were deliberately defeated (by removing the lockfile). There isn't anything that pg_ctl could do that wouldn't be subject to similar problems. So I still think what we mostly need is to improve the messages to not encourage people to do something wrong. regards, tom lane PS: this whole saga is an excellent illustration of the reasons for the rule "don't kill -9 the postmaster" ...