Thread: SELECT returning too many rows (?)

SELECT returning too many rows (?)

From
rob
Date:
Hi list,

pgsql version 7.4.2
linux version 2.4.20
distro redhat 9

We appear to have some kind of problem on one of our internal production
systems and I can't think were first to look.

What follows is some selects. Note the OID. The column 'id' is primary
key. The production system makes selects based on status. As you can see
the same row is apparently returned when selecting on any column except
the primary key, however the EXPLAIN seems correct.

Could this be an INDEX corruption or a fixed bug in postgres ?

processing=3D#  select oid, id, aid, status from q_certs where id =3D '224';
   oid    | id  |  aid  | status=20
----------+-----+-------+--------
 15282219 | 224 | 23604 | 1
(1 row)


processing=3D#  select oid, id, aid, status from q_certs where oid =3D
15282219 ;
   oid    | id  |  aid  | status=20
----------+-----+-------+--------
 15282219 | 224 | 23604 | 1
 15282219 | 224 | 23604 | 1
 15282219 | 224 | 23604 | 1

processing=3D# explain select oid, id, aid, date, status from q_certs
where oid =3D 15282219 ;
                       QUERY PLAN=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20
---------------------------------------------------------
 Seq Scan on q_certs  (cost=3D0.00..57.91 rows=3D1 width=3D26)
   Filter: (oid =3D 15282219::oid)
(2 rows)

processing=3D#  select oid, id, aid, date, status from q_certs where oid =
=3D
15282219 ;
   oid    | id  |  aid  |             date              | status=20
----------+-----+-------+-------------------------------+--------
 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1
 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1
 15282219 | 224 | 23604 | 2005-02-07 17:33:08.404069+00 | 1
(3 rows)

processing=3D#  select count(*)from q_certs where oid =3D 15282219 ;
 count=20
-------
     3
(1 row)

processing=3D# Segmentation fault
$


Yikes!!!!! The above segfault has subsequently been unreproducible and
has never happened before.


processing=3D# \d q_certs;
                                  Table "public.q_certs"
  Column   |           Type           |
Modifiers=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
-----------+--------------------------
+----------------------------------------------------
 id        | integer                  | not null default nextval
('"q_certs_id_seq"'::text)
 aid       | integer                  | not null
 intcode   | text                     | not null
 seqid     | integer                  | not null
 objid     | integer                  | not null
 type      | text                     | not null
 status    | text                     | not null
 lastevent | text                     |=20
 date      | timestamp with time zone | not null
 data      | text                     | not null
Indexes:
    "q_certs_pkey" primary key, btree (id)
Triggers:
    tr_trans BEFORE INSERT OR UPDATE ON q_certs FOR EACH ROW EXECUTE
PROCEDURE tr_trans()
Inherits: _q


--=20
Rob Fielding
rob@dsvr.net

www.dsvr.co.uk                                             Development
Designer Servers                                    Business Serve Plc

    "I don't pretend to understand Brannigans Law. I merely enforce it"
        - Zapp Brannigan

Re: SELECT returning too many rows (?) [7.4.2]

From
rob
Date:
To summarise:

I believe there to be one row stored, which is the reason why the
constraints on the primary key (id) were met, and that for reasons as
yet unknown a select returns three rows.

An addendum:

To attempt to discover the database wide implications of this, I have
done a pg_dump into a pgsql 7.4.6 database on my FC3 based workstation.
I was expecting  a successful import, for it to insert duplicate rows
and for it to error on creating unique indexes (which it does after the
inserts).

On my workstation:

$ pg_dump -Uusername -h databasehost -O processing | psql processing

Thankfully this incident appears localised to this one row. I don't know
why, I don't know what's so special about this row and subsequent
transactions have been successfully made since.=20

As expected, the pg_dump did contain three rows, which were inserted
into my local database. The unique index did fail because there were
duplicated values. What was weird, and prompted me to reply to myself
was this :

I did a pg_dump which didn't preserve oids. Strangely the new oids for
the 'problem' row appear to change by a factor of 10. Now that's either
some bizarre coincidence or some kind of omen....

processing=3D# select oid, id from q_certs where id =3D '224';
   oid   | id=20=20
---------+-----
 1246343 | 224
 1246353 | 224
 1246363 | 224
(3 rows)

Directly previous rows appear to follow sequentially. This becomes more
scattered as I decrement id.

processing=3D# select oid, id from q_certs where id =3D '223';
   oid   | id=20=20
---------+-----
 1246547 | 223

processing=3D# select oid, id from q_certs where id =3D '222';
   oid   | id=20=20
---------+-----
 1246546 | 222
processing=3D# select oid, id from q_certs where id =3D '221';
   oid   | id=20=20
---------+-----
 1246545 | 221

Additional information supplied on request.

I think what I'm going to have to do, save todays back up being
corrupted with this superfluous row and so that this guys order can be
processed is to delete the (all three) entries and add a new row by
hand. This will effectively cut off any attempt to get to the bottom of
this. I'll hang fire for a few hours on this incase any of the pg devel
team happen upon this message.

Cheers,

--=20
Rob Fielding
rob@dsvr.net

www.dsvr.co.uk                                             Development
Designer Servers                                    Business Serve Plc

    "I don't pretend to understand Brannigans Law. I merely enforce it"
        - Zapp Brannigan

Re: SELECT returning too many rows (?)

From
Stephan Szabo
Date:
On Tue, 8 Feb 2005, rob wrote:

> We appear to have some kind of problem on one of our internal production
> systems and I can't think were first to look.
>
> What follows is some selects. Note the OID. The column 'id' is primary
> key. The production system makes selects based on status. As you can see
> the same row is apparently returned when selecting on any column except
> the primary key, however the EXPLAIN seems correct.
>
> Could this be an INDEX corruption or a fixed bug in postgres ?
>
> processing=#  select oid, id, aid, status from q_certs where id = '224';
>    oid    | id  |  aid  | status
> ----------+-----+-------+--------
>  15282219 | 224 | 23604 | 1
> (1 row)
>
>
> processing=#  select oid, id, aid, status from q_certs where oid =
> 15282219 ;
>    oid    | id  |  aid  | status
> ----------+-----+-------+--------
>  15282219 | 224 | 23604 | 1
>  15282219 | 224 | 23604 | 1
>  15282219 | 224 | 23604 | 1

Hmm, could you run the above including the system-columns ctid, xmin,
cmin, xmax, cmax?

It doesn't look like plain index corruption to me since it looks like that
was from a seq scan of the table.

>  id        | integer                  | not null default nextval
> ('"q_certs_id_seq"'::text)
>  aid       | integer                  | not null
>  intcode   | text                     | not null
>  seqid     | integer                  | not null
>  objid     | integer                  | not null
>  type      | text                     | not null
>  status    | text                     | not null
>  lastevent | text                     |
>  date      | timestamp with time zone | not null
>  data      | text                     | not null
> Indexes:
>     "q_certs_pkey" primary key, btree (id)
> Triggers:
>     tr_trans BEFORE INSERT OR UPDATE ON q_certs FOR EACH ROW EXECUTE
> PROCEDURE tr_trans()
> Inherits: _q

Just for completeness sake, what is _q and what does the trigger do?

Re: SELECT returning too many rows (?)

From
rob
Date:
> Hmm, could you run the above including the system-columns ctid, xmin,
> cmin, xmax, cmax?

processing=3D# select oid,  ctid, xmin, cmin, xmax, cmax, id, aid, status
from q_certs where oid =3D 15282219 ;
   oid    | ctid  |   xmin    |   cmin    |   xmax    |   cmax    | id
|  aid  | status=20
----------+-------+-----------+-----------+-----------+-----------+-----
+-------+--------
 15282219 | (3,5) | 174011432 |         1 |         1 | 174214469 | 224
| 23604 | 1
 15282219 | (5,5) | 174011432 |         1 |         1 | 174214469 | 224
| 23604 | 1
 15282219 | (7,5) | 174011432 | 174700216 | 174700216 |         1 | 224
| 23604 | 1
(3 rows)

Since it wraps, I'll just select those columns :

processing=3D# select oid,  ctid, xmin, cmin, xmax, cmax from q_certs
where oid =3D 15282219 ;
   oid    | ctid  |   xmin    |   cmin    |   xmax    |   cmax=20=20=20=20
----------+-------+-----------+-----------+-----------+-----------
 15282219 | (3,5) | 174011432 |         1 |         1 | 174214469
 15282219 | (5,5) | 174011432 |         1 |         1 | 174214469
 15282219 | (7,5) | 174011432 | 174700216 | 174700216 |         1

Different results, that's interesting. I'm afraid I don't know what
those columns are for.


> Just for completeness sake, what is _q and what does the trigger do?

_q is the same table structure. I have used inheritance to create new
'queues' yet able to select across them all. I don't actually use this
in production since indexes don't work across inherited tables and it
turned out it was quicker to foreach through all my queues in perl/php
to utilise each tables native indexes and munge the data together
programmatically. That's all by-the-by.

The trigger logs insert and update into another table - my own home
brewed transaction and change log. The transaction table is foreign
keyed to 'id' and includes only one INSERT log, as expected. It was a
good job I didn't foreign key on 'oid' instead :)

Thanks for the info.

--=20
Rob Fielding
rob@dsvr.net

www.dsvr.co.uk                                             Development
Designer Servers                                    Business Serve Plc

    "I don't pretend to understand Brannigans Law. I merely enforce it"
        - Zapp Brannigan

Re: SELECT returning too many rows (?)

From
Tom Lane
Date:
rob <rob@dsvr.net> writes:
> processing=# select oid,  ctid, xmin, cmin, xmax, cmax from q_certs
> where oid = 15282219 ;
>    oid    | ctid  |   xmin    |   cmin    |   xmax    |   cmax
> ----------+-------+-----------+-----------+-----------+-----------
>  15282219 | (3,5) | 174011432 |         1 |         1 | 174214469
>  15282219 | (5,5) | 174011432 |         1 |         1 | 174214469
>  15282219 | (7,5) | 174011432 | 174700216 | 174700216 |         1

This looks to me like malfeasance of a VACUUM FULL: the reason there
are multiple copies is that VACUUM FULL was trying to move the row
around, and somehow you ended up with all three copies marked "good",
when there should have been only one "good" copy at any instant.

So: (1) have you had any system crashes recently?  (2) what sort of
disk hardware is this running on?  I'm wondering about IDE drives
with write caching enabled :-(

            regards, tom lane

Re: SELECT returning too many rows (?) [7.4.2]

From
Tom Lane
Date:
rob <rob@dsvr.net> writes:
> I believe there to be one row stored, which is the reason why the
> constraints on the primary key (id) were met, and that for reasons as
> yet unknown a select returns three rows.

Just for the record: you do have three physical rows.  The reason a
select on the primary key returns only one is that Postgres "knows"
a unique index can only return one hit for a given key, and so it stops
the indexscan after getting the first result.

            regards, tom lane

Re: SELECT returning too many rows (?)

From
robf
Date:
> This looks to me like malfeasance of a VACUUM FULL: the reason there
> are multiple copies is that VACUUM FULL was trying to move the row
> around, and somehow you ended up with all three copies marked "good",
> when there should have been only one "good" copy at any instant.

> So: (1) have you had any system crashes recently?  (2) what sort of
> disk hardware is this running on?  I'm wondering about IDE drives
> with write caching enabled :-(

For the record there is only supposed to be one row, and we do run a
vacuum full each night, and it happened *after* this record was
inserted.

1) No. Up 377 days.
2) SCSI - configured as a hardware RAID 5 thru a LSI Logic / Symbios
Logic 53c1010 Ultra3 SCSI  adapter

Can I now remove these two errant records or do you think there is
anything to gain from continuing with this investigation ?

Kind regards

Rob

Re: SELECT returning too many rows (?)

From
Tom Lane
Date:
robf <rob@dsvr.net> writes:
> Can I now remove these two errant records or do you think there is
> anything to gain from continuing with this investigation ?

If you have the time, it would be interesting to dump out the rows with
pg_filedump (see http://sources.redhat.com/rhdb/).  I usually use
"pg_filedump -i -f" to get the most useful display.  Looks like you need
to dump pages 3, 5, and 7 of that table to see all three rows.

            regards, tom lane

Re: SELECT returning too many rows (?)

From
rob
Date:
> If you have the time, it would be interesting to dump out the rows with
> pg_filedump (see http://sources.redhat.com/rhdb/).  I usually use
> "pg_filedump -i -f" to get the most useful display.  Looks like you need
> to dump pages 3, 5, and 7 of that table to see all three rows.

I have the results however they wrap rather badly when pasted into an
email. I can provide them on request.

I noted in the log an entry from the vacuum that night :

2005-02-08 01:32:36 [17212] WARNING:  index "q_certs_pkey" contains 235
row versions, but table contains 233 row versions
HINT:  Rebuild the index with REINDEX.

Doing so failed with a 'duplicated values' error. Instead I have had to
remove two of the dupe rows as time is pressing on.

Thanks anyway.

Regards,




--=20
Rob Fielding
rob@dsvr.net

www.dsvr.co.uk                                             Development
Designer Servers                                    Business Serve Plc

    "I don't pretend to understand Brannigans Law. I merely enforce it"
        - Zapp Brannigan