Thread: Getting rid of duplicate tables.

Getting rid of duplicate tables.

From
Jared Carr
Date:
First I wish I knew how this was caused but here is our problem.

Sometime in the recent past we got a duplicate table.  Here is the
result of a pg_dump with a pg_restore for just that table.

--
-- TOC entry 59 (OID 11462032)
-- Name: order_to_do; Type: TABLE; Schema: public; Owner: www
-- Data Pos: 0
--

CREATE TABLE order_to_do (
    order_id integer DEFAULT 0 NOT NULL,
    to_do_id text DEFAULT ''::text NOT NULL,
    date_time timestamp without time zone DEFAULT '0001-01-01 00:00:00'::timestamp without time zone NOT NULL,
    csr_id integer DEFAULT 0 NOT NULL,
    supplement text DEFAULT ''::text NOT NULL
);


--
-- TOC entry 60 (OID 11462032)
-- Name: order_to_do; Type: TABLE; Schema: public; Owner: www
-- Data Pos: 0
--

CREATE TABLE order_to_do (
    order_id integer DEFAULT 0 NOT NULL,
    to_do_id text DEFAULT ''::text NOT NULL,
    date_time timestamp without time zone DEFAULT '0001-01-01 00:00:00'::timestamp without time zone NOT NULL,
    csr_id integer DEFAULT 0 NOT NULL,
    supplement text DEFAULT ''::text NOT NULL
);

So somehow it appears that we managed to get an *exact* duplicate of the table well at least
a duplicate reference to the table somewhere important.

First of all how do we get rid of this "extra" table?

Note:
  live=# drop table order_to_do;
  ERROR: duplicate key violates unique constraint "pg_class_oid_index"

Secondly any ideas on how it got there in the first place?

Can this just happen? Or did someone here have to accidentally do something
to create it.

Sorry for not having any idea on what caused this, but thanks for any help you can
give.

Jared



Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
> First I wish I knew how this was caused but here is our problem.
> Sometime in the recent past we got a duplicate table.  Here is the
> result of a pg_dump with a pg_restore for just that table.

This should be fairly easy to fix: manually DELETE the extra pg_class
entry (selecting it by ctid, since there isn't any other way to select
just one of them).  However, please don't do that until we've exhausted
the opportunity to figure out how this happened.

For starters, what PG version is this?  And how many entries are there
in pg_class at the moment?  (The output from "VACUUM VERBOSE pg_class"
would be useful to look at.)

            regards, tom lane

Re: Getting rid of duplicate tables.

From
Jared Carr
Date:
Tom Lane wrote:

>Jared Carr <jared@89glass.com> writes:
>
>
>>First I wish I knew how this was caused but here is our problem.
>>Sometime in the recent past we got a duplicate table.  Here is the
>>result of a pg_dump with a pg_restore for just that table.
>>
>>
>
>This should be fairly easy to fix: manually DELETE the extra pg_class
>entry (selecting it by ctid, since there isn't any other way to select
>just one of them).  However, please don't do that until we've exhausted
>the opportunity to figure out how this happened.
>
>For starters, what PG version is this?  And how many entries are there
>in pg_class at the moment?  (The output from "VACUUM VERBOSE pg_class"
>would be useful to look at.)
>
>            regards, tom lane
>
>
This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago).

INFO:  vacuuming "pg_catalog.pg_class"
INFO:  index "pg_class_oid_index" now contains 412 row versions in 7 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_class_relname_nsp_index" now contains 412 row versions
in 25 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_class": found 0 removable, 412 nonremovable row versions in
31 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 1216 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

Query returned successfully with no result in 20 ms.

And the contents of pg_class relating to this table.

live=# select * from pg_class where relname='order_to_do';
   relname   | relnamespace | reltype  | relowner | relam | relfilenode
| relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex |
relisshared | relkind | relnatts | relchecks | reltriggers | relukeys |
relfkeys | relrefs | relhasoids | relhaspkey | relhasrules |
relhassubclass | relacl

-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
 order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      506 |     59401 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
 order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      453 |     53407 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
(2 rows)

Thanks for the help, and let me know if you need any more information.

Jared

Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
>> For starters, what PG version is this?  And how many entries are there
>> in pg_class at the moment?  (The output from "VACUUM VERBOSE pg_class"
>> would be useful to look at.)

> This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago).

Okay, that shoots down my faint hope that this was an old bug.  I am
growing increasingly suspicious that there is some corner-case problem
with duplicate rows in 7.4, because we've seen two or three reports now
of problems with unexpected duplicates.  The reports all have different
symptoms but I sense a common theme ...

> And the contents of pg_class relating to this table.
>
> live=# select * from pg_class where relname='order_to_do';

I was just about to ask for that, but I need to see the system columns
too:

select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class
  where relname='order_to_do';

Also, would you check whether you get the same results from this query
with enable_indexscan switched off?

            regards, tom lane

Re: Getting rid of duplicate tables.

From
Jared Carr
Date:
Tom Lane wrote:

>Jared Carr <jared@89glass.com> writes:
>
>
>>>For starters, what PG version is this?  And how many entries are there
>>>in pg_class at the moment?  (The output from "VACUUM VERBOSE pg_class"
>>>would be useful to look at.)
>>>
>>>
>
>
>
>>This is version 7.4.1 (upgraded from 7.4 about 3 weeks ago).
>>
>>
>
>Okay, that shoots down my faint hope that this was an old bug.  I am
>growing increasingly suspicious that there is some corner-case problem
>with duplicate rows in 7.4, because we've seen two or three reports now
>of problems with unexpected duplicates.  The reports all have different
>symptoms but I sense a common theme ...
>
>

Well, hopefully I can provide enough information to find out what that
theme is.

>>And the contents of pg_class relating to this table.
>>
>>live=# select * from pg_class where relname='order_to_do';
>>
>>
>
>I was just about to ask for that, but I need to see the system columns
>too:
>
>select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class
>  where relname='order_to_do';
>
>

live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where
relname='order_to_do';
   oid    |  ctid   | cmax |   xmax   |   cmin   |   xmin   |
relname   | relnamespace | reltype  | relowner | relam | relfilenode |
relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex |
relisshared | relkind | relnatts | relchecks | reltriggers | relukeys |
relfkeys | relrefs | relhasoids | relhaspkey | relhasrules |
relhassubclass | relacl

----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
 11462032 |  (27,2) |    0 |        2 |        2 | 46034931 |
order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      506 |     59401 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
 11462032 | (27,43) |    2 | 46034931 | 46034931 |  8051642 |
order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      453 |     53407 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
(2 rows)


>Also, would you check whether you get the same results from this query
>with enable_indexscan switched off?
>
>
I am pretty sure that they are the same with index scan turned off, but
here is the result for verification.

live=# set enable_indexscan=off;
SET
live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where
relname='order_to_do';
   oid    |  ctid   | cmax |   xmax   |   cmin   |   xmin   |
relname   | relnamespace | reltype  | relowner | relam | relfilenode |
relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex |
relisshared | relkind | relnatts | relchecks | reltriggers | relukeys |
relfkeys | relrefs | relhasoids | relhaspkey | relhasrules |
relhassubclass | relacl

----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
 11462032 |  (27,2) |    0 |        2 |        2 | 46034931 |
order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      506 |     59401 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
 11462032 | (27,43) |    2 | 46034931 | 46034931 |  8051642 |
order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
|      453 |     53407 |      11462039 |             0 | t           |
f           | r       |        5 |         0 |           1 |        0
|        0 |       0 | t          | t          | f           |
f              |
(2 rows)

thanks
Jared Carr

Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
> live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where
> relname='order_to_do';
>    oid    |  ctid   | cmax |   xmax   |   cmin   |   xmin   |
> relname   | relnamespace | reltype  | relowner | relam | relfilenode |
> relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex |
> relisshared | relkind | relnatts | relchecks | reltriggers | relukeys |
> relfkeys | relrefs | relhasoids | relhaspkey | relhasrules |
> relhassubclass | relacl
>
----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
>  11462032 |  (27,2) |    0 |        2 |        2 | 46034931 |
> order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
> |      506 |     59401 |      11462039 |             0 | t           |
> f           | r       |        5 |         0 |           1 |        0
> |        0 |       0 | t          | t          | f           |
> f              |
>  11462032 | (27,43) |    2 | 46034931 | 46034931 |  8051642 |
> order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
> |      453 |     53407 |      11462039 |             0 | t           |
> f           | r       |        5 |         0 |           1 |        0
> |        0 |       0 | t          | t          | f           |
> f              |
> (2 rows)

Hmm.  This does not look like an index problem --- it looks like a tuple
status problem.  Apparently transaction 46034931 updated this row,
marking the tuple at (27,43) invalid and inserting a new version at
(27,2).  That's fine and is not a unique-index violation by itself,
but we should only be able to see one of these versions of the row.
Clearly, both of them appear good since both are visible to your current
transaction.  There is no way that both of these tuples should be
considered good; one or the other ought to be invalid from the point of
view of any observing transaction.  I think somebody dropped a status
update somewhere.  Have you had any system crashes recently?  What kind
of disk hardware is this running on --- is it IDE, and if so, do you
have write caching properly disabled?

To really dig into this, we need to get a physical dump of these rows.
The best tool for this is pg_filedump, which you can get from
http://sources.redhat.com/rhdb/utilities.html.  You can run it with
something like
    pg_filedump -i -f -R 27 $PGDATA/base/nnn/1259
where nnn is the OID of the database with the problem (look in
pg_database to determine this).  1259 is the known OID of pg_class.
Both of the interesting tuples are on the same page 27, so this should
get the data we need.

Also, please show the output of "ls -l $PGDATA/pg_clog".  If the
relevant segments of clog are still around, we might want to look at
what it says about the commit states of these two transactions.

            regards, tom lane

Re: Getting rid of duplicate tables.

From
Jared Carr
Date:
Tom Lane wrote:

>Jared Carr <jared@89glass.com> writes:
>
>
>>live=# select oid,ctid,cmax,xmax,cmin,xmin,* from pg_class where
>>relname='order_to_do';
>>   oid    |  ctid   | cmax |   xmax   |   cmin   |   xmin   |
>>relname   | relnamespace | reltype  | relowner | relam | relfilenode |
>>relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex |
>>relisshared | relkind | relnatts | relchecks | reltriggers | relukeys |
>>relfkeys | relrefs | relhasoids | relhaspkey | relhasrules |
>>relhassubclass | relacl

>>----------+---------+------+----------+----------+----------+-------------+--------------+----------+----------+-------+-------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
>> 11462032 |  (27,2) |    0 |        2 |        2 | 46034931 |
>>order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
>>|      506 |     59401 |      11462039 |             0 | t           |
>>f           | r       |        5 |         0 |           1 |        0
>>|        0 |       0 | t          | t          | f           |
>>f              |
>> 11462032 | (27,43) |    2 | 46034931 | 46034931 |  8051642 |
>>order_to_do |         2200 | 11462033 |      101 |     0 |    11462032
>>|      453 |     53407 |      11462039 |             0 | t           |
>>f           | r       |        5 |         0 |           1 |        0
>>|        0 |       0 | t          | t          | f           |
>>f              |
>>(2 rows)
>>
>>
>
>Hmm.  This does not look like an index problem --- it looks like a tuple
>status problem.  Apparently transaction 46034931 updated this row,
>marking the tuple at (27,43) invalid and inserting a new version at
>(27,2).  That's fine and is not a unique-index violation by itself,
>but we should only be able to see one of these versions of the row.
>Clearly, both of them appear good since both are visible to your current
>transaction.  There is no way that both of these tuples should be
>considered good; one or the other ought to be invalid from the point of
>view of any observing transaction.  I think somebody dropped a status
>update somewhere.  Have you had any system crashes recently?  What kind
>of disk hardware is this running on --- is it IDE, and if so, do you
>have write caching properly disabled
>
>

Hmm...there haven't been any crashes for a good long time, there were
some power oddities around
a month ago, but I think we would have noticed this by now. The
application that was built on it just
died yesterday, so...unfortunately I don't know for sure how long this
has been this way.

Next it is running scsi disks in a raid 5 array.

>To really dig into this, we need to get a physical dump of these rows.
>The best tool for this is pg_filedump, which you can get from
>http://sources.redhat.com/rhdb/utilities.html.  You can run it with
>something like
>    pg_filedump -i -f -R 27 $PGDATA/base/nnn/1259
>where nnn is the OID of the database with the problem (look in
>pg_database to determine this).  1259 is the known OID of pg_class.
>Both of the interesting tuples are on the same page 27, so this should
>get the data we need.
>
>

The filedump is pasted in at the end.

>Also, please show the output of "ls -l $PGDATA/pg_clog".  If the
>relevant segments of clog are still around, we might want to look at
>what it says about the commit states of these two transactions.
>
>
>

ls -l $PGDATA/pg_clog gives

-rw-------    1 postgres postgres   262144 Nov 26 11:58 0000
-rw-------    1 postgres postgres   262144 Nov 28 11:15 0001
-rw-------    1 postgres postgres   262144 Nov 30 04:27 0002
-rw-------    1 postgres postgres   262144 Dec  1 07:23 0003
-rw-------    1 postgres postgres   262144 Dec  1 11:09 0004
-rw-------    1 postgres postgres   262144 Dec  1 12:50 0005
-rw-------    1 postgres postgres   262144 Dec  1 15:37 0006
-rw-------    1 postgres postgres   262144 Dec  2 05:58 0007
-rw-------    1 postgres postgres   262144 Dec  2 11:25 0008
-rw-------    1 postgres postgres   262144 Dec  2 16:54 0009
-rw-------    1 postgres postgres   262144 Dec  3 10:29 000A
-rw-------    1 postgres postgres   262144 Dec  3 16:31 000B
-rw-------    1 postgres postgres   262144 Dec  4 10:34 000C
-rw-------    1 postgres postgres   262144 Dec  5 00:00 000D
-rw-------    1 postgres postgres   262144 Dec  5 11:50 000E
-rw-------    1 postgres postgres   262144 Dec  6 14:41 000F
-rw-------    1 postgres postgres   262144 Dec  8 05:58 0010
-rw-------    1 postgres postgres   262144 Dec  8 09:30 0011
-rw-------    1 postgres postgres   262144 Dec  8 13:33 0012
-rw-------    1 postgres postgres   262144 Dec  9 06:15 0013
-rw-------    1 postgres postgres   262144 Dec  9 11:47 0014
-rw-------    1 postgres postgres   262144 Dec 10 05:45 0015
-rw-------    1 postgres postgres   262144 Dec 10 12:43 0016
-rw-------    1 postgres postgres   262144 Dec 11 08:11 0017
-rw-------    1 postgres postgres   262144 Dec 11 13:48 0018
-rw-------    1 postgres postgres   262144 Dec 12 06:20 0019
-rw-------    1 postgres postgres   262144 Dec 12 15:44 001A
-rw-------    1 postgres postgres   262144 Dec 14 19:30 001B
-rw-------    1 postgres postgres   262144 Dec 15 10:22 001C
-rw-------    1 postgres postgres   262144 Dec 15 20:41 001D
-rw-------    1 postgres postgres   262144 Dec 16 07:51 001E
-rw-------    1 postgres postgres   262144 Dec 16 13:18 001F
-rw-------    1 postgres postgres   262144 Dec 17 09:57 0020
-rw-------    1 postgres postgres   262144 Dec 18 06:00 0021
-rw-------    1 postgres postgres   262144 Dec 18 15:09 0022
-rw-------    1 postgres postgres   262144 Dec 19 12:25 0023
-rw-------    1 postgres postgres   262144 Dec 21 16:37 0024
-rw-------    1 postgres postgres   262144 Dec 22 13:30 0025
-rw-------    1 postgres postgres   262144 Dec 23 12:04 0026
-rw-------    1 postgres postgres   262144 Dec 25 17:07 0027
-rw-------    1 postgres postgres   262144 Dec 27 11:50 0028
-rw-------    1 postgres postgres   262144 Dec 29 08:27 0029
-rw-------    1 postgres postgres   262144 Dec 29 12:42 002A
-rw-------    1 postgres postgres   262144 Dec 29 17:40 002B
-rw-------    1 postgres postgres   262144 Dec 30 09:26 002C
-rw-------    1 postgres postgres   262144 Dec 30 14:18 002D
-rw-------    1 postgres postgres   262144 Dec 31 08:28 002E
-rw-------    1 postgres postgres   262144 Jan  1 00:05 002F
-rw-------    1 postgres postgres   262144 Jan  1 09:10 0030
-rw-------    1 postgres postgres   262144 Jan  2 10:04 0031
-rw-------    1 postgres postgres   262144 Jan  3 08:34 0032
-rw-------    1 postgres postgres   262144 Jan  5 08:30 0033
-rw-------    1 postgres postgres   262144 Jan  5 16:29 0034
-rw-------    1 postgres postgres   262144 Jan  6 10:50 0035
-rw-------    1 postgres postgres   262144 Jan  6 21:50 0036
-rw-------    1 postgres postgres   262144 Jan  7 13:11 0037
-rw-------    1 postgres postgres   262144 Jan  8 07:34 0038
-rw-------    1 postgres postgres   262144 Jan  8 14:47 0039
-rw-------    1 postgres postgres   262144 Jan  9 10:00 003A
-rw-------    1 postgres postgres   262144 Jan 10 16:58 003B
-rw-------    1 postgres postgres   262144 Jan 12 08:20 003C
-rw-------    1 postgres postgres   262144 Jan 12 15:13 003D
-rw-------    1 postgres postgres   262144 Jan 13 11:25 003E
-rw-------    1 postgres postgres   262144 Jan 14 08:23 003F
-rw-------    1 postgres postgres   262144 Jan 14 16:47 0040
-rw-------    1 postgres postgres   262144 Jan 15 13:06 0041
-rw-------    1 postgres postgres   262144 Jan 16 07:39 0042
-rw-------    1 postgres postgres   262144 Jan 16 14:28 0043
-rw-------    1 postgres postgres    90112 Jan 19 14:30 0044


And finally the dump.

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0-Beta
*
* File: /var/tmp/pgsql/data74/base/11462031/1259
* Options used: -i -f -R 27
*
* Dump created on: Tue Jan 20 08:17:53 2004
*******************************************************************

Block   27 ********************************************************
<Header> -----
 Block Offset: 0x00036000         Offsets: Lower     232 (0x00e8)
 Block: Size 8192  Version    1            Upper    6416 (0x1910)
 LSN:  logid      7 recoff 0x2e369840      Special  8192 (0x2000)
 Items:   53                   Free Space: 6184
 Length (including item array): 236

  0000: 07000000 4098362e 18000000 e8001019  ....@.6.........
  0010: 00200120 cc1a0000 cc9a2801 389a2801  . . ......(.8.(.
  0020: 10190000 a4992801 e8170000 10992801  ......(.......(.
  0030: 601b0000 cc1a0000 381a0000 a4190000  `.......8.......
  0040: 10190000 7c180000 e8170000 54170000  ....|.......T...
  0050: c0160000 2c160000 98150000 04150000  ....,...........
  0060: 70140000 dc130000 48130000 b4120000  p.......H.......
  0070: 20120000 8c110000 f8100000 64100000   ...........d...
  0080: d00f0000 3c0f0000 a80e0000 140e0000  ....<...........
  0090: 800d0000 ec0c0000 580c0000 c40b0000  ........X.......
  00a0: 300b0000 9c0a0000 080a0000 74090000  0...........t...
  00b0: e0080000 4c080000 b8070000 6c9f2801  ....L.......l.(.
  00c0: d89e2801 fc050000 449e2801 b09d2801  ..(.....D.(...(.
  00d0: 40040000 1c9d2801 889c2801 f49b2801  @.....(...(...(.
  00e0: f0010000 609b2801 00000000           ....`.(.....

<Data> ------
 Item   1 -- Length:    0  Offset: 6860 (0x1acc)  Flags: 0x00
 Item   2 -- Length:  148  Offset: 6860 (0x1acc)  Flags: USED
  XID: min (46034931)  CMIN|XMAX: 2  CMAX|XVAC: 0
  Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1acc: f36fbe02 02000000 00000000 00001b00  .o..............
  1adc: 02001700 10291c00 90e5ae00 6f726465  .....)......orde
  1aec: 725f746f 5f646f00 00000000 00000000  r_to_do.........
  1afc: 00000000 00000000 00000000 00000000  ................
  1b0c: 00000000 00000000 00000000 00000000  ................
  1b1c: 00000000 00000000 00000000 98080000  ................
  1b2c: 91e5ae00 65000000 00000000 90e5ae00  ....e...........
  1b3c: fa010000 00096847 97e5ae00 00000000  ......hG........
  1b4c: 01007200 05000000 01000000 00000000  ..r.............
  1b5c: 01010000                             ....

 Item   3 -- Length:  148  Offset: 6712 (0x1a38)  Flags: USED
  XID: min (46034931)  CMIN|XMAX: 3  CMAX|XVAC: 0
  Block Id: 27  linp Index: 3   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1a38: f36fbe02 03000000 00000000 00001b00  .o..............
  1a48: 03001700 10291c00 9ae5ae00 6f726465  .....)......orde
  1a58: 725f746f 5f646f5f 706b6579 00000000  r_to_do_pkey....
  1a68: 00000000 00000000 00000000 00000000  ................
  1a78: 00000000 00000000 00000000 00000000  ................
  1a88: 00000000 00000000 00000000 98080000  ................
  1a98: 00000000 65000000 93010000 72c7b501  ....e.......r...
  1aa8: 1d010000 00096847 00000000 00000000  ......hG........
  1ab8: 00006900 02000000 00000000 00000000  ..i.............
  1ac8: 00000000                             ....

 Item   4 -- Length:    0  Offset: 6416 (0x1910)  Flags: 0x00
 Item   5 -- Length:  148  Offset: 6564 (0x19a4)  Flags: USED
  XID: min (46034931)  CMIN|XMAX: 7  CMAX|XVAC: 0
  Block Id: 27  linp Index: 5   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  19a4: f36fbe02 07000000 00000000 00001b00  .o..............
  19b4: 05001700 10291c00 bb7e2101 746f5f64  .....)...~!.to_d
  19c4: 6f5f6964 5f627472 65655f69 6e646578  o_id_btree_index
  19d4: 00000000 00000000 00000000 00000000  ................
  19e4: 00000000 00000000 00000000 00000000  ................
  19f4: 00000000 00000000 00000000 98080000  ................
  1a04: 00000000 64000000 93010000 73c7b501  ....d.......s...
  1a14: fd000000 00096847 00000000 00000000  ......hG........
  1a24: 00006900 01000000 00000000 00000000  ..i.............
  1a34: 00000000                             ....

 Item   6 -- Length:    0  Offset: 6120 (0x17e8)  Flags: 0x00
 Item   7 -- Length:  148  Offset: 6416 (0x1910)  Flags: USED
  XID: min (46034931)  CMIN|XMAX: 11  CMAX|XVAC: 0
  Block Id: 27  linp Index: 7   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1910: f36fbe02 0b000000 00000000 00001b00  .o..............
  1920: 07001700 10291c00 99e5ae00 70675f74  .....)......pg_t
  1930: 6f617374 5f313134 36323033 325f696e  oast_11462032_in
  1940: 64657800 00000000 00000000 00000000  dex.............
  1950: 00000000 00000000 00000000 00000000  ................
  1960: 00000000 00000000 00000000 63000000  ............c...
  1970: 00000000 65000000 93010000 74c7b501  ....e.......t...
  1980: 01000000 00000000 00000000 00000000  ................
  1990: 00006900 02000000 00000000 00000000  ..i.............
  19a0: 00000000                             ....

 Item   8 -- Length:    0  Offset: 7008 (0x1b60)  Flags: 0x00
 Item   9 -- Length:    0  Offset: 6860 (0x1acc)  Flags: 0x00
 Item  10 -- Length:    0  Offset: 6712 (0x1a38)  Flags: 0x00
 Item  11 -- Length:    0  Offset: 6564 (0x19a4)  Flags: 0x00
 Item  12 -- Length:    0  Offset: 6416 (0x1910)  Flags: 0x00
 Item  13 -- Length:    0  Offset: 6268 (0x187c)  Flags: 0x00
 Item  14 -- Length:    0  Offset: 6120 (0x17e8)  Flags: 0x00
 Item  15 -- Length:    0  Offset: 5972 (0x1754)  Flags: 0x00
 Item  16 -- Length:    0  Offset: 5824 (0x16c0)  Flags: 0x00
 Item  17 -- Length:    0  Offset: 5676 (0x162c)  Flags: 0x00
 Item  18 -- Length:    0  Offset: 5528 (0x1598)  Flags: 0x00
 Item  19 -- Length:    0  Offset: 5380 (0x1504)  Flags: 0x00
 Item  20 -- Length:    0  Offset: 5232 (0x1470)  Flags: 0x00
 Item  21 -- Length:    0  Offset: 5084 (0x13dc)  Flags: 0x00
 Item  22 -- Length:    0  Offset: 4936 (0x1348)  Flags: 0x00
 Item  23 -- Length:    0  Offset: 4788 (0x12b4)  Flags: 0x00
 Item  24 -- Length:    0  Offset: 4640 (0x1220)  Flags: 0x00
 Item  25 -- Length:    0  Offset: 4492 (0x118c)  Flags: 0x00
 Item  26 -- Length:    0  Offset: 4344 (0x10f8)  Flags: 0x00
 Item  27 -- Length:    0  Offset: 4196 (0x1064)  Flags: 0x00
 Item  28 -- Length:    0  Offset: 4048 (0x0fd0)  Flags: 0x00
 Item  29 -- Length:    0  Offset: 3900 (0x0f3c)  Flags: 0x00
 Item  30 -- Length:    0  Offset: 3752 (0x0ea8)  Flags: 0x00
 Item  31 -- Length:    0  Offset: 3604 (0x0e14)  Flags: 0x00
 Item  32 -- Length:    0  Offset: 3456 (0x0d80)  Flags: 0x00
 Item  33 -- Length:    0  Offset: 3308 (0x0cec)  Flags: 0x00
 Item  34 -- Length:    0  Offset: 3160 (0x0c58)  Flags: 0x00
 Item  35 -- Length:    0  Offset: 3012 (0x0bc4)  Flags: 0x00
 Item  36 -- Length:    0  Offset: 2864 (0x0b30)  Flags: 0x00
 Item  37 -- Length:    0  Offset: 2716 (0x0a9c)  Flags: 0x00
 Item  38 -- Length:    0  Offset: 2568 (0x0a08)  Flags: 0x00
 Item  39 -- Length:    0  Offset: 2420 (0x0974)  Flags: 0x00
 Item  40 -- Length:    0  Offset: 2272 (0x08e0)  Flags: 0x00
 Item  41 -- Length:    0  Offset: 2124 (0x084c)  Flags: 0x00
 Item  42 -- Length:    0  Offset: 1976 (0x07b8)  Flags: 0x00
 Item  43 -- Length:  148  Offset: 8044 (0x1f6c)  Flags: USED
  XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 2
  Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1f6c: badb7a00 f36fbe02 02000000 00001b00  ..z..o..........
  1f7c: 02001700 10291c00 90e5ae00 6f726465  .....)......orde
  1f8c: 725f746f 5f646f00 00000000 00000000  r_to_do.........
  1f9c: 00000000 00000000 00000000 00000000  ................
  1fac: 00000000 00000000 00000000 00000000  ................
  1fbc: 00000000 00000000 00000000 98080000  ................
  1fcc: 91e5ae00 65000000 00000000 90e5ae00  ....e...........
  1fdc: c5010000 009f5047 97e5ae00 00000000  ......PG........
  1fec: 01007200 05000000 01000000 00000000  ..r.............
  1ffc: 01010000                             ....

 Item  44 -- Length:  148  Offset: 7896 (0x1ed8)  Flags: USED
  XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 0
  Block Id: 27  linp Index: 1   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1ed8: badb7a00 f36fbe02 00000000 00001b00  ..z..o..........
  1ee8: 01001700 10291c00 9ae5ae00 6f726465  .....)......orde
  1ef8: 725f746f 5f646f5f 706b6579 00000000  r_to_do_pkey....
  1f08: 00000000 00000000 00000000 00000000  ................
  1f18: 00000000 00000000 00000000 00000000  ................
  1f28: 00000000 00000000 00000000 98080000  ................
  1f38: 00000000 65000000 93010000 54722401  ....e.......Tr$.
  1f48: 03010000 009f5047 00000000 00000000  ......PG........
  1f58: 00006900 02000000 00000000 00000000  ..i.............
  1f68: 00000000                             ....

 Item  45 -- Length:    0  Offset: 1532 (0x05fc)  Flags: 0x00
 Item  46 -- Length:  148  Offset: 7748 (0x1e44)  Flags: USED
  XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 4
  Block Id: 27  linp Index: 4   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1e44: badb7a00 f36fbe02 04000000 00001b00  ..z..o..........
  1e54: 04001700 10291c00 bb7e2101 746f5f64  .....)...~!.to_d
  1e64: 6f5f6964 5f627472 65655f69 6e646578  o_id_btree_index
  1e74: 00000000 00000000 00000000 00000000  ................
  1e84: 00000000 00000000 00000000 00000000  ................
  1e94: 00000000 00000000 00000000 98080000  ................
  1ea4: 00000000 64000000 93010000 55722401  ....d.......Ur$.
  1eb4: e1000000 009f5047 00000000 00000000  ......PG........
  1ec4: 00006900 01000000 00000000 00000000  ..i.............
  1ed4: 00000000                             ....

 Item  47 -- Length:  148  Offset: 7600 (0x1db0)  Flags: USED
  XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 8
  Block Id: 27  linp Index: 6   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1db0: badb7a00 f36fbe02 08000000 00001b00  ..z..o..........
  1dc0: 06001700 10291c00 99e5ae00 70675f74  .....)......pg_t
  1dd0: 6f617374 5f313134 36323033 325f696e  oast_11462032_in
  1de0: 64657800 00000000 00000000 00000000  dex.............
  1df0: 00000000 00000000 00000000 00000000  ................
  1e00: 00000000 00000000 00000000 63000000  ............c...
  1e10: 00000000 65000000 93010000 62722401  ....e.......br$.
  1e20: 01000000 00000000 00000000 00000000  ................
  1e30: 00006900 02000000 00000000 00000000  ..i.............
  1e40: 00000000                             ....

 Item  48 -- Length:    0  Offset: 1088 (0x0440)  Flags: 0x00
 Item  49 -- Length:  148  Offset: 7452 (0x1d1c)  Flags: USED
  XID: min (8051658)  CMIN|XMAX: 3  CMAX|XVAC: 0
  Block Id: 27  linp Index: 49   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1d1c: cadb7a00 03000000 00000000 00001b00  ..z.............
  1d2c: 31001700 10291c00 9ce5ae00 746f5f64  1....)......to_d
  1d3c: 6f5f6974 656d7300 00000000 00000000  o_items.........
  1d4c: 00000000 00000000 00000000 00000000  ................
  1d5c: 00000000 00000000 00000000 00000000  ................
  1d6c: 00000000 00000000 00000000 98080000  ................
  1d7c: 9de5ae00 65000000 00000000 9ce5ae00  ....e...........
  1d8c: 01000000 00004041 a8e5ae00 00000000  ......@A........
  1d9c: 01007200 0b000300 01000000 00000000  ..r.............
  1dac: 01010000                             ....

 Item  50 -- Length:  148  Offset: 7304 (0x1c88)  Flags: USED
  XID: min (8051658)  CMIN|XMAX: 4  CMAX|XVAC: 0
  Block Id: 27  linp Index: 50   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1c88: cadb7a00 04000000 00000000 00001b00  ..z.............
  1c98: 32001700 10291c00 abe5ae00 746f5f64  2....)......to_d
  1ca8: 6f5f6974 656d735f 706b6579 00000000  o_items_pkey....
  1cb8: 00000000 00000000 00000000 00000000  ................
  1cc8: 00000000 00000000 00000000 00000000  ................
  1cd8: 00000000 00000000 00000000 98080000  ................
  1ce8: 00000000 65000000 93010000 63722401  ....e.......cr$.
  1cf8: 02000000 00004041 00000000 00000000  ......@A........
  1d08: 00006900 01000000 00000000 00000000  ..i.............
  1d18: 00000000                             ....

 Item  51 -- Length:  148  Offset: 7156 (0x1bf4)  Flags: USED
  XID: min (8051658)  CMIN|XMAX: 5  CMAX|XVAC: 0
  Block Id: 27  linp Index: 51   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1bf4: cadb7a00 05000000 00000000 00001b00  ..z.............
  1c04: 33001700 10291c00 aae5ae00 70675f74  3....)......pg_t
  1c14: 6f617374 5f313134 36323034 345f696e  oast_11462044_in
  1c24: 64657800 00000000 00000000 00000000  dex.............
  1c34: 00000000 00000000 00000000 00000000  ................
  1c44: 00000000 00000000 00000000 63000000  ............c...
  1c54: 00000000 65000000 93010000 64722401  ....e.......dr$.
  1c64: 01000000 00000000 00000000 00000000  ................
  1c74: 00006900 02000000 00000000 00000000  ..i.............
  1c84: 00000000                             ....

 Item  52 -- Length:    0  Offset:  496 (0x01f0)  Flags: 0x00
 Item  53 -- Length:  148  Offset: 7008 (0x1b60)  Flags: USED
  XID: min (8051660)  CMIN|XMAX: 4  CMAX|XVAC: 0
  Block Id: 27  linp Index: 53   Attributes: 23   Size: 28
  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1b60: ccdb7a00 04000000 00000000 00001b00  ..z.............
  1b70: 35001700 10291c00 b8e5ae00 61646d69  5....)......admi
  1b80: 6e5f7365 7373696f 6e735f70 6b657900  n_sessions_pkey.
  1b90: 00000000 00000000 00000000 00000000  ................
  1ba0: 00000000 00000000 00000000 00000000  ................
  1bb0: 00000000 00000000 00000000 98080000  ................
  1bc0: 00000000 65000000 93010000 65722401  ....e.......er$.
  1bd0: 49000000 00008841 00000000 00000000  I......A........
  1be0: 00006900 01000000 00000000 00000000  ..i.............
  1bf0: 00000000                             ....



*** End of Requested Range Encountered. Last Block Read: 27 ***

Thanks,
Jared Carr



Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
>  Item   2 -- Length:  148  Offset: 6860 (0x1acc)  Flags: USED
>   XID: min (46034931)  CMIN|XMAX: 2  CMAX|XVAC: 0
>   Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
>   infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

>  Item  43 -- Length:  148  Offset: 8044 (0x1f6c)  Flags: USED
>   XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 2
>   Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
>   infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

Well, there's the smoking gun ... somebody marked (27,2) as
XMIN_COMMITTED, showing that they thought 46034931 was committed, while
someone else marked (27,43) as XMAX_INVALID, showing that they thought
46034931 was aborted.  So we have some kind of very-infrequent
breakage in transaction commit-state lookup.  Or a hardware problem,
but I suspect we are looking at a bug.

Could you check out what pg_clog has for transaction 46034931?
This would be pg_clog/002B (which dates your problem to Dec 29 BTW),
byte at offset 39BFC hex or 236540 decimal.  I forget which way the
bits run within the byte but will look it up if you can get me the
value of that byte.

I'm off to take a real close look at what was done to the pg_clog code
during the 7.4 cycle ...

            regards, tom lane

Re: Getting rid of duplicate tables.

From
Jared Carr
Date:
Tom Lane wrote:

>Jared Carr <jared@89glass.com> writes:
>
>
>> Item   2 -- Length:  148  Offset: 6860 (0x1acc)  Flags: USED
>>  XID: min (46034931)  CMIN|XMAX: 2  CMAX|XVAC: 0
>>  Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
>>  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
>>
>>
>
>
>
>> Item  43 -- Length:  148  Offset: 8044 (0x1f6c)  Flags: USED
>>  XID: min (8051642)  CMIN|XMAX: 46034931  CMAX|XVAC: 2
>>  Block Id: 27  linp Index: 2   Attributes: 23   Size: 28
>>  infomask: 0x2910 (HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
>>
>>
>
>Well, there's the smoking gun ... somebody marked (27,2) as
>XMIN_COMMITTED, showing that they thought 46034931 was committed, while
>someone else marked (27,43) as XMAX_INVALID, showing that they thought
>46034931 was aborted.  So we have some kind of very-infrequent
>breakage in transaction commit-state lookup.  Or a hardware problem,
>but I suspect we are looking at a bug.
>
>Could you check out what pg_clog has for transaction 46034931?
>This would be pg_clog/002B (which dates your problem to Dec 29 BTW),
>byte at offset 39BFC hex or 236540 decimal.  I forget which way the
>bits run within the byte but will look it up if you can get me the
>value of that byte.
>
>
Here is the appropriate "line" (line is used *very* loosely there)

00039BF0  04 10 00 00 44 00 14 44 50 00 10 01 00 40 04 40 ....D..DP....@.@

39BFC = 0

Jared Carr

Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
> Tom Lane wrote:
>> Could you check out what pg_clog has for transaction 46034931?
>> This would be pg_clog/002B (which dates your problem to Dec 29 BTW),
>> byte at offset 39BFC hex or 236540 decimal.  I forget which way the
>> bits run within the byte but will look it up if you can get me the
>> value of that byte.
>>
> Here is the appropriate "line" (line is used *very* loosely there)

> 00039BF0  04 10 00 00 44 00 14 44 50 00 10 01 00 40 04 40 ....D..DP....@.@

> 39BFC = 0

[ blinks... ]  Hm, no need to check the bit direction on that one.
Zero means that the transaction was never recorded as *either* committed
or aborted.  Which is certainly not the state that whoever marked (27,2)
saw.  So what we've got here apparently is active loss of transaction
commit bits :-(.

Can you tell whether you had any backend crashes on 29 December?
It's barely possible that the transaction really did crash and this
status is correct, in which case we have only a read failure to explain
rather than an after-the-fact change in recorded transaction status ...

            regards, tom lane

Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Jared Carr <jared@89glass.com> writes:
> Tom Lane wrote:
>>> Yes, it does appear that there was a backend crash/(operator stupidly
>>> kill -9 'ing possibly) on the 29th.
>>
>> Hmm ... could you send me that area of the log?

> Dec 29 16:31:54 penguin postgres[1714]: [3-1] LOG:  received smart
> shutdown request

What it looks like to me is that this shutdown never completed, probably
because some client was hanging onto an open connection ("smart
shutdown" doesn't forcibly abort active sessions).  The subsequent log
entries show a couple of failed attempts to start a postmaster.  I would
guess the sequence was roughly this:

    pg_ctl stop
    ... wait a couple minutes, get bored of waiting ...
    pg_ctl start
    ... fails with

> Dec 29 16:33:44 penguin postgres[5379]: [1-1] FATAL:  lock file
> "/var/lib/pgsql/data74/postmaster.pid" already exists
> Dec 29 16:33:44 penguin postgres[5379]: [1-2] HINT:  Is another
> postmaster (PID 1714) running in data directory "/var/lib/pgsql/data74"?

    kill -9 old postmaster
    pg_ctl start
    ... fails with

> Dec 29 16:34:12 penguin postgres[5395]: [1-1] FATAL:  pre-existing
> shared memory block (key 5432001, ID 0) is still in use
> Dec 29 16:34:12 penguin postgres[5395]: [1-2] HINT:  If you're sure
> there are no old server processes still running, remove the shared
> memory block with the command "ipcrm",
> Dec 29 16:34:12 penguin postgres[5395]: [1-3]  or just delete the file
> "/var/lib/pgsql/data74/postmaster.pid".

    rm /var/lib/pgsql/data74/postmaster.pid
    pg_ctl start

If those were the only steps taken, then the old postmaster's child
backends would still have been running.  At that point you're pretty
well hosed because there are two independent sets of backends with
separate shared memory blocks, and any subsequent transaction status
updates from the old backends could overwrite pg_clog data from the
new ones.

So I now think it was operator error and not a software bug at all.
You need to educate whoever did this about proper database shutdown
procedures.

As a Postgres maintainer, the only thing that troubles me about this
is that the error messages from the failed postmaster start attempts
could be read as having encouraged the operator to do exactly the
worst possible things.  I'm cc'ing this back to pgsql-general to see
if anyone has any thoughts about rewording these messages.  In
particular it seems like the HINT for the second failure is really
disastrous; it should tell you to kill off the old backends, not to
zap the lockfile.

            regards, tom lane

PS: do you know why the database was being shut down in the first place?
Was there a pre-existing problem?

PPS: at this point I think we've learned all we can from your database,
so you can go ahead and repair the damage:
    delete from pg_class where ctid = '(27,43)';
should do it, at least for the one known problem.  I'd encourage you
to do what you can to look for other inconsistencies that may have been
introduced.

Re: Getting rid of duplicate tables.

From
Richard Huxton
Date:
On Tuesday 20 January 2004 19:08, Tom Lane wrote:
> Jared Carr <jared@89glass.com> writes:
> > Dec 29 16:33:44 penguin postgres[5379]: [1-1] FATAL:  lock file
> > "/var/lib/pgsql/data74/postmaster.pid" already exists
> > Dec 29 16:33:44 penguin postgres[5379]: [1-2] HINT:  Is another
> > postmaster (PID 1714) running in data directory "/var/lib/pgsql/data74"?
>
> > Dec 29 16:34:12 penguin postgres[5395]: [1-1] FATAL:  pre-existing
> > shared memory block (key 5432001, ID 0) is still in use
> > Dec 29 16:34:12 penguin postgres[5395]: [1-2] HINT:  If you're sure
> > there are no old server processes still running, remove the shared
> > memory block with the command "ipcrm",
> > Dec 29 16:34:12 penguin postgres[5395]: [1-3]  or just delete the file
> > "/var/lib/pgsql/data74/postmaster.pid".

> As a Postgres maintainer, the only thing that troubles me about this
> is that the error messages from the failed postmaster start attempts
> could be read as having encouraged the operator to do exactly the
> worst possible things.  I'm cc'ing this back to pgsql-general to see
> if anyone has any thoughts about rewording these messages.  In
> particular it seems like the HINT for the second failure is really
> disastrous; it should tell you to kill off the old backends, not to
> zap the lockfile.

Should we not support something like "pg_ctl cleanup" which does one or more
(as necessary) of:
  1. kills the backends
  2. runs ipcrm
  3. rm postmater.pid
Why leave it to the operator at all?

Actually, maybe it should be "pg_ctl check" and be able to support a wider
range of checks/fixes.

--
  Richard Huxton
  Archonet Ltd

Re: Getting rid of duplicate tables.

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> On Tuesday 20 January 2004 19:08, Tom Lane wrote:
>> ... the error messages from the failed postmaster start attempts
>> could be read as having encouraged the operator to do exactly the
>> worst possible things.  I'm cc'ing this back to pgsql-general to see
>> if anyone has any thoughts about rewording these messages.  In
>> particular it seems like the HINT for the second failure is really
>> disastrous; it should tell you to kill off the old backends, not to
>> zap the lockfile.

> Should we not support something like "pg_ctl cleanup" which does one or more
> (as necessary) of:
>   1. kills the backends
>   2. runs ipcrm
>   3. rm postmater.pid
> Why leave it to the operator at all?

Well, given that the proximate cause of Jared's whole problem was
incorrect use of pg_ctl (viz, "pg_ctl stop" when "pg_ctl stop -m fast"
was wanted), I'm not sure that adding an even-more-destructive stop mode
to pg_ctl would have helped.  Someone who doesn't know when to use -m fast
isn't likely to know when to use this either.

Also, I doubt that pg_ctl can do a better job of this than the existing
code in the postmaster.  The reason that code failed in this case was
that the safety interlocks were deliberately defeated (by removing the
lockfile).  There isn't anything that pg_ctl could do that wouldn't be
subject to similar problems.

So I still think what we mostly need is to improve the messages to not
encourage people to do something wrong.

            regards, tom lane

PS: this whole saga is an excellent illustration of the reasons for the
rule "don't kill -9 the postmaster" ...