Thread: "too many trigger records found for relation xyz"
Hello all
Postgres version 8.3.1
I just created a bunch of tables (~10) with identical structure; all tables have 6 foreign key references to other tables and a primary key. To my surprise, some of the tables were created ok, some missed primary key and some didn't get created at all.
Postgres complained:
2008-04-08 08:50:41 EEST [721]: [27-1] ERROR: relation 543058 deleted while still in use
and
2008-04-08 08:50:41 EEST [721]: [36-1] ERROR: could not find tuple for relation 54309
and
2008-04-08 08:50:42 EEST [721]: [114-1] ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
When I tried to remove the tables, the same errors kept coming up
2008-04-08 08:53:15 EEST [27794]: [5-1] ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
2008-04-08 08:53:15 EEST [27794]: [6-1] STATEMENT: drop table xyz ;
I searched google for a while, and then tried again and this time I was able to drop all but one of the tables. When I try to drop the one remaining table, I get this error message:
2008-04-08 10:09:31 EEST [27794]: [95-1] ERROR: too many trigger records found for relation "xyz"
The database had a bunch of autovacuum runs and an import process copying data to temporary tables at the time I tried to create the tables. What could cause such behaviour, and how can I remove the one remaining table?
Regards
Mikko
Postgres version 8.3.1
I just created a bunch of tables (~10) with identical structure; all tables have 6 foreign key references to other tables and a primary key. To my surprise, some of the tables were created ok, some missed primary key and some didn't get created at all.
Postgres complained:
2008-04-08 08:50:41 EEST [721]: [27-1] ERROR: relation 543058 deleted while still in use
and
2008-04-08 08:50:41 EEST [721]: [36-1] ERROR: could not find tuple for relation 54309
and
2008-04-08 08:50:42 EEST [721]: [114-1] ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
When I tried to remove the tables, the same errors kept coming up
2008-04-08 08:53:15 EEST [27794]: [5-1] ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
2008-04-08 08:53:15 EEST [27794]: [6-1] STATEMENT: drop table xyz ;
I searched google for a while, and then tried again and this time I was able to drop all but one of the tables. When I try to drop the one remaining table, I get this error message:
2008-04-08 10:09:31 EEST [27794]: [95-1] ERROR: too many trigger records found for relation "xyz"
The database had a bunch of autovacuum runs and an import process copying data to temporary tables at the time I tried to create the tables. What could cause such behaviour, and how can I remove the one remaining table?
Regards
Mikko
Mikko Partio napsal(a): > Hello all > > Postgres version 8.3.1 > > I just created a bunch of tables (~10) with identical structure; all tables > have 6 foreign key references to other tables and a primary key. To my > surprise, some of the tables were created ok, some missed primary key and > some didn't get created at all. Can you provide self contained reproduction scenario? Zdenek
On Tue, Apr 8, 2008 at 11:27 AM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
Unfortunately not -- if I try to dump one the referenced tables I get this error:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: more than one row returned by a subquery used as an expression
pg_dump: The command was: SELECT tableoid, oid, typname, typnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) as rolname, typinput::oid as typinput, typoutput::oid as typoutput, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END as typrelkind, typtype, typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type
Regards
Mikko
Mikko Partio napsal(a):Can you provide self contained reproduction scenario?Hello all
Postgres version 8.3.1
I just created a bunch of tables (~10) with identical structure; all tables
have 6 foreign key references to other tables and a primary key. To my
surprise, some of the tables were created ok, some missed primary key and
some didn't get created at all.
Unfortunately not -- if I try to dump one the referenced tables I get this error:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: more than one row returned by a subquery used as an expression
pg_dump: The command was: SELECT tableoid, oid, typname, typnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) as rolname, typinput::oid as typinput, typoutput::oid as typoutput, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END as typrelkind, typtype, typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type
Regards
Mikko
On Tue, Apr 8, 2008 at 12:20 PM, Mikko Partio <mpartio@gmail.com> wrote:
On Tue, Apr 8, 2008 at 11:27 AM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
Mikko Partio napsal(a):Can you provide self contained reproduction scenario?Hello all
Postgres version 8.3.1
I just created a bunch of tables (~10) with identical structure; all tables
have 6 foreign key references to other tables and a primary key. To my
surprise, some of the tables were created ok, some missed primary key and
some didn't get created at all.
Unfortunately not -- if I try to dump one the referenced tables I get this error:
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: more than one row returned by a subquery used as an expression
pg_dump: The command was: SELECT tableoid, oid, typname, typnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) as rolname, typinput::oid as typinput, typoutput::oid as typoutput, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END as typrelkind, typtype, typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type
I little investigation showed that there is duplicate row in pg_class:
# select oid from pg_class group by oid having count(*) > 1 ;
oid
--------
294397
(1 row)
# \x
Expanded display is on.
# select * from pg_class where oid = 294397;
-[ RECORD 1 ]--+-----------------------------------------
relname | abc
relnamespace | 2200
reltype | 294399
relowner | 16385
relam | 0
relfilenode | 294397
reltablespace | 0
relpages | 1
reltuples | 3
reltoastrelid | 294400
reltoastidxid | 0
relhasindex | t
relisshared | f
relkind | r
relnatts | 4
relchecks | 0
reltriggers | 6
relukeys | 0
relfkeys | 0
relrefs | 0
relhasoids | f
relhaspkey | t
relhasrules | f
relhassubclass | f
relfrozenxid | 3840767676
relacl | {removed}
reloptions |
-[ RECORD 2 ]--+-----------------------------------------
relname | abc
relnamespace | 2200
reltype | 294399
relowner | 16385
relam | 0
relfilenode | 294397
reltablespace | 0
relpages | 1
reltuples | 3
reltoastrelid | 294400
reltoastidxid | 0
relhasindex | t
relisshared | f
relkind | r
relnatts | 4
relchecks | 0
reltriggers | 12
relukeys | 0
relfkeys | 0
relrefs | 0
relhasoids | f
relhaspkey | t
relhasrules | f
relhassubclass | f
relfrozenxid | 3840767676
relacl | {removed}
reloptions |
Regards
Mikko
Mikko Partio napsal(a): > > > I little investigation showed that there is duplicate row in pg_class: > > # select oid from pg_class group by oid having count(*) > 1 ; > oid > -------- > 294397 > (1 row) Could you check if pg_attribute is doubled for this relation? Zdenek
On Tue, Apr 8, 2008 at 1:19 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
It looks like pg_attribute has only a single value:
# select * from pg_attribute where attrelid = 294397 ;
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount
----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+-------------
294397 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | f | t | 0
294397 | id | 21 | -1 | 2 | 1 | 0 | -1 | -1 | t | p | s | t | f | f | t | 0
294397 | name | 1043 | -1 | -1 | 2 | 0 | -1 | 16 | f | x | i | t | f | f | t | 0
294397 | desc_fi | 25 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | t | f | f | t | 0
294397 | desc_en | 25 | -1 | -1 | 4 | 0 | -1 | -1 | f | x | i | t | f | f | t | 0
(10 rows)
Should I try remove one of the duplicate rows from pg_class?
Regards
Mikko
Mikko Partio napsal(a):Could you check if pg_attribute is doubled for this relation?
I little investigation showed that there is duplicate row in pg_class:
# select oid from pg_class group by oid having count(*) > 1 ;
oid
--------
294397
(1 row)
It looks like pg_attribute has only a single value:
# select * from pg_attribute where attrelid = 294397 ;
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount
----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+-------------
294397 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
294397 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | f | t | 0
294397 | id | 21 | -1 | 2 | 1 | 0 | -1 | -1 | t | p | s | t | f | f | t | 0
294397 | name | 1043 | -1 | -1 | 2 | 0 | -1 | 16 | f | x | i | t | f | f | t | 0
294397 | desc_fi | 25 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | t | f | f | t | 0
294397 | desc_en | 25 | -1 | -1 | 4 | 0 | -1 | -1 | f | x | i | t | f | f | t | 0
(10 rows)
Should I try remove one of the duplicate rows from pg_class?
Regards
Mikko
Mikko Partio napsal(a): > Should I try remove one of the duplicate rows from pg_class? Try it with caution. You should use ctid column to refer to exact row. Try before: select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid = 294397; If both row are identical or not. Zdenek
On Tue, Apr 8, 2008 at 1:45 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
# select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid = 294397;
oid | ctid | xmin | xmax | cmin | cmax | relname
--------+-----------+------------+------------+------+------+--------------
294397 | (1125,36) | 3944654557 | 0 | 35 | 35 | abc
294397 | (1124,55) | 3943984380 | 3943984642 | 3 | 3 | abc
(2 rows)
Which one should I remove? What is the worst case scenario for the delete? The database is 1,5TB so I would rather not restore it from backups :-)
Regards
Mikko
Mikko Partio napsal(a):Try it with caution. You should use ctid column to refer to exact row.Should I try remove one of the duplicate rows from pg_class?
Try before:
select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid = 294397;
If both row are identical or not.
# select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid = 294397;
oid | ctid | xmin | xmax | cmin | cmax | relname
--------+-----------+------------+------------+------+------+--------------
294397 | (1125,36) | 3944654557 | 0 | 35 | 35 | abc
294397 | (1124,55) | 3943984380 | 3943984642 | 3 | 3 | abc
(2 rows)
Which one should I remove? What is the worst case scenario for the delete? The database is 1,5TB so I would rather not restore it from backups :-)
Regards
Mikko
Mikko Partio napsal(a): > On Tue, Apr 8, 2008 at 1:45 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote: > >> Mikko Partio napsal(a): >> >> Should I try remove one of the duplicate rows from pg_class? >> Try it with caution. You should use ctid column to refer to exact row. >> >> Try before: >> >> select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid >> = 294397; >> >> If both row are identical or not. >> > > > # select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid > = 294397; > oid | ctid | xmin | xmax | cmin | cmax | relname > --------+-----------+------------+------------+------+------+-------------- > 294397 | (1125,36) | 3944654557 | 0 | 35 | 35 | abc > 294397 | (1124,55) | 3943984380 | 3943984642 | 3 | 3 | abc > (2 rows) > > Which one should I remove? What is the worst case scenario for the delete? > The database is 1,5TB so I would rather not restore it from backups :-) (1124,55) is deleted one. This row should not be visible. But it seems a problem with visibility - transaction wraparound. Do you have any warning message in the log about vacuuming? Or clog corruption when transaction 3943984642 is marked as rollbacked. Zdenek
On Tue, Apr 8, 2008 at 2:49 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
There are no such messages in the logfile, but autovacuum is working hard to prevent xid wraparound (we have an import process which consums huge amounts of xids).
Mikko Partio napsal(a):(1124,55) is deleted one. This row should not be visible. But it seems a problem with visibility - transaction wraparound. Do you have any warning message in the log about vacuuming? Or clog corruption when transaction 3943984642 is marked as rollbacked.On Tue, Apr 8, 2008 at 1:45 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:Mikko Partio napsal(a):
Should I try remove one of the duplicate rows from pg_class?
Try it with caution. You should use ctid column to refer to exact row.
Try before:
select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid
= 294397;
If both row are identical or not.
# select oid, ctid, xmin, xmax, cmin, cmax, relname from pg_class where oid
= 294397;
oid | ctid | xmin | xmax | cmin | cmax | relname
--------+-----------+------------+------------+------+------+--------------
294397 | (1125,36) | 3944654557 | 0 | 35 | 35 | abc
294397 | (1124,55) | 3943984380 | 3943984642 | 3 | 3 | abc
(2 rows)
Which one should I remove? What is the worst case scenario for the delete?
The database is 1,5TB so I would rather not restore it from backups :-)
There are no such messages in the logfile, but autovacuum is working hard to prevent xid wraparound (we have an import process which consums huge amounts of xids).
# select max(age(relfrozenxid)) from pg_class where relkind = 'r' ;
max
------------
204500712
(1 row)
How can I tell if there is clog corruption?
Thanks a lot for your help.
Regards
Mikko
Mikko Partio napsal(a): > How can I tell if there is clog corruption? > Unfortunatelly, I'm afraid that there is not way how to check it. CLOG file is only array of bits and it does not contains any CRC or backup copy. Try to check filesystem integrity. Zdenek
On Tue, Apr 8, 2008 at 1:45 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
Mikko Partio napsal(a):Try it with caution. You should use ctid column to refer to exact row.Should I try remove one of the duplicate rows from pg_class?
Ok I removed the faulty tuple and nothing catastrophical happened. I can do a pg_dump now, but I still can't remove the one remaining table:
# drop table xyz ;
ERROR: too many trigger records found for relation "xyz"
Any insight for this problem?
Regards
Mikko
On Tue, Apr 8, 2008 at 5:26 PM, Mikko Partio <mpartio@gmail.com> wrote:
Ok I removed the faulty tuple and nothing catastrophical happened. I can do a pg_dump now, but I still can't remove the one remaining table:
# drop table xyz ;Any insight for this problem?
ERROR: too many trigger records found for relation "xyz"
I managed to solve the situation with the help of this thread: http://archives.postgresql.org/pgsql-bugs/2007-02/msg00167.php
Everything seems to be ok for now, but I'm still curious what caused to whole mess to begin with.
Thanks a lot for your help again.
Regards
Mikko
Mikko Partio napsal(a): > On Tue, Apr 8, 2008 at 5:26 PM, Mikko Partio <mpartio@gmail.com> wrote: > >> Ok I removed the faulty tuple and nothing catastrophical happened. I can >> do a pg_dump now, but I still can't remove the one remaining table: >> >> # drop table xyz ; >> ERROR: too many trigger records found for relation "xyz" >> >> Any insight for this problem? >> >> > > I managed to solve the situation with the help of this thread: > http://archives.postgresql.org/pgsql-bugs/2007-02/msg00167.php > > Everything seems to be ok for now, but I'm still curious what caused to > whole mess to begin with. It is really strange. It would be nice to have reproduce scenario. Can you run same DDL command which invoke a catalog corruption on test database cluster? Zdenek
On Tue, Apr 8, 2008 at 6:38 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:
I tried it on a fresh instance but it went through with no errors :(
Regards
Mikko
Mikko Partio napsal(a):It is really strange. It would be nice to have reproduce scenario. Can you run same DDL command which invoke a catalog corruption on test database cluster?On Tue, Apr 8, 2008 at 5:26 PM, Mikko Partio <mpartio@gmail.com> wrote:Ok I removed the faulty tuple and nothing catastrophical happened. I can
do a pg_dump now, but I still can't remove the one remaining table:
# drop table xyz ;
ERROR: too many trigger records found for relation "xyz"
Any insight for this problem?
I managed to solve the situation with the help of this thread:
http://archives.postgresql.org/pgsql-bugs/2007-02/msg00167.php
Everything seems to be ok for now, but I'm still curious what caused to
whole mess to begin with.
I tried it on a fresh instance but it went through with no errors :(
Regards
Mikko
"Mikko Partio" <mpartio@gmail.com> writes: > On Tue, Apr 8, 2008 at 6:38 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote: >> It is really strange. It would be nice to have reproduce scenario. Can you >> run same DDL command which invoke a catalog corruption on test database >> cluster? > I tried it on a fresh instance but it went through with no errors :( Is it possible that anything was doing a VACUUM FULL on pg_class concurrently with your schema changes on these tables? The behavior looks suspiciously like some problems we've seen before wherein rows ended up with the wrong commit-bit markings... regards, tom lane
On Tue, Apr 8, 2008 at 9:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I may have had a VACUUM FULL on some of my own tables but definitely not on any system tables. There was also a bunch (maybe 10) autovacuum processes working at that time.
Regards
Mikko
"Mikko Partio" <mpartio@gmail.com> writes:
> On Tue, Apr 8, 2008 at 6:38 PM, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:>> It is really strange. It would be nice to have reproduce scenario. Can youIs it possible that anything was doing a VACUUM FULL on pg_class
>> run same DDL command which invoke a catalog corruption on test database
>> cluster?
> I tried it on a fresh instance but it went through with no errors :(
concurrently with your schema changes on these tables? The behavior
looks suspiciously like some problems we've seen before wherein rows
ended up with the wrong commit-bit markings...
I may have had a VACUUM FULL on some of my own tables but definitely not on any system tables. There was also a bunch (maybe 10) autovacuum processes working at that time.
Regards
Mikko