Thread: SELECT returning too many rows (?)
Hi list, pgsql version 7.4.2 linux version 2.4.20 distro redhat 9 We appear to have some kind of problem on one of our internal production systems and I can't think were first to look. What follows is some selects. Note the OID. The column 'id' is primary key. The production system makes selects based on status. As you can see the same row is apparently returned when selecting on any column except the primary key, however the EXPLAIN seems correct. Could this be an INDEX corruption or a fixed bug in postgres ? processing=3D# select oid, id, aid, status from q_certs where id =3D '224'; oid | id | aid | status=20 ----------+-----+-------+-------- 15282219 | 224 | 23604 | 1 (1 row) processing=3D# select oid, id, aid, status from q_certs where oid =3D 15282219 ; oid | id | aid | status=20 ----------+-----+-------+-------- 15282219 | 224 | 23604 | 1 15282219 | 224 | 23604 | 1 15282219 | 224 | 23604 | 1 processing=3D# explain select oid, id, aid, date, status from q_certs where oid =3D 15282219 ; QUERY PLAN=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20 --------------------------------------------------------- Seq Scan on q_certs (cost=3D0.00..57.91 rows=3D1 width=3D26) Filter: (oid =3D 15282219::oid) (2 rows) processing=3D# select oid, id, aid, date, status from q_certs where oid = =3D 15282219 ; oid | id | aid | date | status=20 ----------+-----+-------+-------------------------------+-------- 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1 (3 rows) processing=3D# select count(*)from q_certs where oid =3D 15282219 ; count=20 ------- 3 (1 row) processing=3D# Segmentation fault $ Yikes!!!!! The above segfault has subsequently been unreproducible and has never happened before. processing=3D# \d q_certs; Table "public.q_certs" Column | Type | Modifiers=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 -----------+-------------------------- +---------------------------------------------------- id | integer | not null default nextval ('"q_certs_id_seq"'::text) aid | integer | not null intcode | text | not null seqid | integer | not null objid | integer | not null type | text | not null status | text | not null lastevent | text |=20 date | timestamp with time zone | not null data | text | not null Indexes: "q_certs_pkey" primary key, btree (id) Triggers: tr_trans BEFORE INSERT OR UPDATE ON q_certs FOR EACH ROW EXECUTE PROCEDURE tr_trans() Inherits: _q --=20 Rob Fielding rob@dsvr.net www.dsvr.co.uk Development Designer Servers Business Serve Plc "I don't pretend to understand Brannigans Law. I merely enforce it" - Zapp Brannigan
To summarise: I believe there to be one row stored, which is the reason why the constraints on the primary key (id) were met, and that for reasons as yet unknown a select returns three rows. An addendum: To attempt to discover the database wide implications of this, I have done a pg_dump into a pgsql 7.4.6 database on my FC3 based workstation. I was expecting a successful import, for it to insert duplicate rows and for it to error on creating unique indexes (which it does after the inserts). On my workstation: $ pg_dump -Uusername -h databasehost -O processing | psql processing Thankfully this incident appears localised to this one row. I don't know why, I don't know what's so special about this row and subsequent transactions have been successfully made since.=20 As expected, the pg_dump did contain three rows, which were inserted into my local database. The unique index did fail because there were duplicated values. What was weird, and prompted me to reply to myself was this : I did a pg_dump which didn't preserve oids. Strangely the new oids for the 'problem' row appear to change by a factor of 10. Now that's either some bizarre coincidence or some kind of omen.... processing=3D# select oid, id from q_certs where id =3D '224'; oid | id=20=20 ---------+----- 1246343 | 224 1246353 | 224 1246363 | 224 (3 rows) Directly previous rows appear to follow sequentially. This becomes more scattered as I decrement id. processing=3D# select oid, id from q_certs where id =3D '223'; oid | id=20=20 ---------+----- 1246547 | 223 processing=3D# select oid, id from q_certs where id =3D '222'; oid | id=20=20 ---------+----- 1246546 | 222 processing=3D# select oid, id from q_certs where id =3D '221'; oid | id=20=20 ---------+----- 1246545 | 221 Additional information supplied on request. I think what I'm going to have to do, save todays back up being corrupted with this superfluous row and so that this guys order can be processed is to delete the (all three) entries and add a new row by hand. This will effectively cut off any attempt to get to the bottom of this. I'll hang fire for a few hours on this incase any of the pg devel team happen upon this message. Cheers, --=20 Rob Fielding rob@dsvr.net www.dsvr.co.uk Development Designer Servers Business Serve Plc "I don't pretend to understand Brannigans Law. I merely enforce it" - Zapp Brannigan
On Tue, 8 Feb 2005, rob wrote: > We appear to have some kind of problem on one of our internal production > systems and I can't think were first to look. > > What follows is some selects. Note the OID. The column 'id' is primary > key. The production system makes selects based on status. As you can see > the same row is apparently returned when selecting on any column except > the primary key, however the EXPLAIN seems correct. > > Could this be an INDEX corruption or a fixed bug in postgres ? > > processing=# select oid, id, aid, status from q_certs where id = '224'; > oid | id | aid | status > ----------+-----+-------+-------- > 15282219 | 224 | 23604 | 1 > (1 row) > > > processing=# select oid, id, aid, status from q_certs where oid = > 15282219 ; > oid | id | aid | status > ----------+-----+-------+-------- > 15282219 | 224 | 23604 | 1 > 15282219 | 224 | 23604 | 1 > 15282219 | 224 | 23604 | 1 Hmm, could you run the above including the system-columns ctid, xmin, cmin, xmax, cmax? It doesn't look like plain index corruption to me since it looks like that was from a seq scan of the table. > id | integer | not null default nextval > ('"q_certs_id_seq"'::text) > aid | integer | not null > intcode | text | not null > seqid | integer | not null > objid | integer | not null > type | text | not null > status | text | not null > lastevent | text | > date | timestamp with time zone | not null > data | text | not null > Indexes: > "q_certs_pkey" primary key, btree (id) > Triggers: > tr_trans BEFORE INSERT OR UPDATE ON q_certs FOR EACH ROW EXECUTE > PROCEDURE tr_trans() > Inherits: _q Just for completeness sake, what is _q and what does the trigger do?
> Hmm, could you run the above including the system-columns ctid, xmin, > cmin, xmax, cmax? processing=3D# select oid, ctid, xmin, cmin, xmax, cmax, id, aid, status from q_certs where oid =3D 15282219 ; oid | ctid | xmin | cmin | xmax | cmax | id | aid | status=20 ----------+-------+-----------+-----------+-----------+-----------+----- +-------+-------- 15282219 | (3,5) | 174011432 | 1 | 1 | 174214469 | 224 | 23604 | 1 15282219 | (5,5) | 174011432 | 1 | 1 | 174214469 | 224 | 23604 | 1 15282219 | (7,5) | 174011432 | 174700216 | 174700216 | 1 | 224 | 23604 | 1 (3 rows) Since it wraps, I'll just select those columns : processing=3D# select oid, ctid, xmin, cmin, xmax, cmax from q_certs where oid =3D 15282219 ; oid | ctid | xmin | cmin | xmax | cmax=20=20=20=20 ----------+-------+-----------+-----------+-----------+----------- 15282219 | (3,5) | 174011432 | 1 | 1 | 174214469 15282219 | (5,5) | 174011432 | 1 | 1 | 174214469 15282219 | (7,5) | 174011432 | 174700216 | 174700216 | 1 Different results, that's interesting. I'm afraid I don't know what those columns are for. > Just for completeness sake, what is _q and what does the trigger do? _q is the same table structure. I have used inheritance to create new 'queues' yet able to select across them all. I don't actually use this in production since indexes don't work across inherited tables and it turned out it was quicker to foreach through all my queues in perl/php to utilise each tables native indexes and munge the data together programmatically. That's all by-the-by. The trigger logs insert and update into another table - my own home brewed transaction and change log. The transaction table is foreign keyed to 'id' and includes only one INSERT log, as expected. It was a good job I didn't foreign key on 'oid' instead :) Thanks for the info. --=20 Rob Fielding rob@dsvr.net www.dsvr.co.uk Development Designer Servers Business Serve Plc "I don't pretend to understand Brannigans Law. I merely enforce it" - Zapp Brannigan
rob <rob@dsvr.net> writes: > processing=# select oid, ctid, xmin, cmin, xmax, cmax from q_certs > where oid = 15282219 ; > oid | ctid | xmin | cmin | xmax | cmax > ----------+-------+-----------+-----------+-----------+----------- > 15282219 | (3,5) | 174011432 | 1 | 1 | 174214469 > 15282219 | (5,5) | 174011432 | 1 | 1 | 174214469 > 15282219 | (7,5) | 174011432 | 174700216 | 174700216 | 1 This looks to me like malfeasance of a VACUUM FULL: the reason there are multiple copies is that VACUUM FULL was trying to move the row around, and somehow you ended up with all three copies marked "good", when there should have been only one "good" copy at any instant. So: (1) have you had any system crashes recently? (2) what sort of disk hardware is this running on? I'm wondering about IDE drives with write caching enabled :-( regards, tom lane
rob <rob@dsvr.net> writes: > I believe there to be one row stored, which is the reason why the > constraints on the primary key (id) were met, and that for reasons as > yet unknown a select returns three rows. Just for the record: you do have three physical rows. The reason a select on the primary key returns only one is that Postgres "knows" a unique index can only return one hit for a given key, and so it stops the indexscan after getting the first result. regards, tom lane
> This looks to me like malfeasance of a VACUUM FULL: the reason there > are multiple copies is that VACUUM FULL was trying to move the row > around, and somehow you ended up with all three copies marked "good", > when there should have been only one "good" copy at any instant. > So: (1) have you had any system crashes recently? (2) what sort of > disk hardware is this running on? I'm wondering about IDE drives > with write caching enabled :-( For the record there is only supposed to be one row, and we do run a vacuum full each night, and it happened *after* this record was inserted. 1) No. Up 377 days. 2) SCSI - configured as a hardware RAID 5 thru a LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI adapter Can I now remove these two errant records or do you think there is anything to gain from continuing with this investigation ? Kind regards Rob
robf <rob@dsvr.net> writes: > Can I now remove these two errant records or do you think there is > anything to gain from continuing with this investigation ? If you have the time, it would be interesting to dump out the rows with pg_filedump (see http://sources.redhat.com/rhdb/). I usually use "pg_filedump -i -f" to get the most useful display. Looks like you need to dump pages 3, 5, and 7 of that table to see all three rows. regards, tom lane
> If you have the time, it would be interesting to dump out the rows with > pg_filedump (see http://sources.redhat.com/rhdb/). I usually use > "pg_filedump -i -f" to get the most useful display. Looks like you need > to dump pages 3, 5, and 7 of that table to see all three rows. I have the results however they wrap rather badly when pasted into an email. I can provide them on request. I noted in the log an entry from the vacuum that night : 2005-02-08 01:32:36 [17212] WARNING: index "q_certs_pkey" contains 235 row versions, but table contains 233 row versions HINT: Rebuild the index with REINDEX. Doing so failed with a 'duplicated values' error. Instead I have had to remove two of the dupe rows as time is pressing on. Thanks anyway. Regards, --=20 Rob Fielding rob@dsvr.net www.dsvr.co.uk Development Designer Servers Business Serve Plc "I don't pretend to understand Brannigans Law. I merely enforce it" - Zapp Brannigan