Thread: ACK table corrupted, unique index violated.

ACK table corrupted, unique index violated.

From
"Brian Hirt"
Date:
I'm having a very scary problem.

First, here's my system:  smp dual PIII800 512MB memory running redhat 6.2
kernel 2.2.18
PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66

In a nutshell, my primary key index got a NOTICE to recreate when the
database was vacuumed.  I dropped the index and tried to recreate it.  I get
a key violation when i try to do this.    I find there are some 200 rows
with the exact same developer_id and oid.

This is a very serious problem.  1) the uinque index should have prevented
this from happening.  2) i looked at my code and there is absolutely no way
my code inserted multiple rows with the same id.

This leads me to believe that there is a big problem with postgres; possibly
in vacuum.  This has also underminded my confidence in postgres data
integrity that such a basic concept can be violated.

I want to help you guys find this problem because I have a lot invested in
postgres and overall have been very happy with it.  I've included
information that i think might be useful.  If there is more that i can
supply, let me know and I will provide it if I can.

My system runs a vacuum every day at 4AMCST.  I've checked all of my
application logs for the day before the NOTICE appeared and the day that it
appeared.  I see no SQL errors logged (and all SQL errors are logged by my
applications) for either day.  I've checked the postmaster logs for both
days and don't see any ERROR's logged.  There are some NOTICES the day
before that i don't know what they mean, but don't look good.

NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19833 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19839 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19835 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19834 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE:  Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19837 to
/moby/pgsql/base/156130/pg_internal.init: No such file or direc

however, on further inspection, the error above appears fairly frequently.
In fact, the first occurance was over 6 weeks before the corruption.

Here is the DEBUG notices for vacuum the day before the corruption:

DEBUG:  --Relation developer--
DEBUG:  Pages 514: Changed 29, reaped 39, Empty 0, New 0; Tup 47971: Vac 52,
Keep/VTL 0/0, Crash 0, UnUsed 89, MinLen 65, MaxLen 133; Re-using:
Free/Avail. Space 6812/1768; EndEmpty/Avail. Pages 0/10. CPU 0.04s/0.00u
sec.
DEBUG:  Index developer_primary_key: Pages 120; Tuples 47971: Deleted 52.
CPU 0.00s/0.09u sec.
DEBUG:  Index developer_recent_mod_key: Pages 119; Tuples 47971: Deleted 52.
CPU 0.00s/0.05u sec.
DEBUG:  Index developer_approved: Pages 121; Tuples 47971: Deleted 52. CPU
0.03s/0.06u sec.
DEBUG:  Index developer_search_idx: Pages 204; Tuples 47971: Deleted 52. CPU
0.00s/0.04u sec.
DEBUG:  Rel developer: Pages: 514 --> 514; Tuple(s) moved: 15. CPU
0.01s/0.02u sec.
DEBUG:  Index developer_primary_key: Pages 120; Tuples 47971: Deleted 15.
CPU 0.00s/0.05u sec.
DEBUG:  Index developer_recent_mod_key: Pages 119; Tuples 47971: Deleted 15.
CPU 0.00s/0.04u sec.
DEBUG:  Index developer_approved: Pages 121; Tuples 47971: Deleted 15. CPU
0.00s/0.05u sec.
DEBUG:  Index developer_search_idx: Pages 204; Tuples 47971: Deleted 15. CPU
0.00s/0.05u sec.

Here is the vaccum DEBUG messages the day of the corruption:

DEBUG:  --Relation developer--
DEBUG:  Pages 515: Changed 25, reaped 39, Empty 0, New 0; Tup 48038: Vac 53,
Keep/VTL 0/0, Crash 0, UnUsed 89, MinLen 65, MaxLen 133; Re-using:
Free/Avail. Space 9144/3348; EndEmpty/Avail. Pages 0/9. CPU 0.02s/0.01u sec.
DEBUG:  Index developer_primary_key: Pages 120; Tuples 48023: Deleted 53.
CPU 0.00s/0.04u sec.
NOTICE:  Index developer_primary_key: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_recent_mod_key: Pages 119; Tuples 48023: Deleted 53.
CPU 0.00s/0.04u sec.
NOTICE:  Index developer_recent_mod_key: NUMBER OF INDEX' TUPLES (48023) IS
NOT THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_approved: Pages 121; Tuples 48023: Deleted 53. CPU
0.00s/0.04u sec.
NOTICE:  Index developer_approved: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_search_idx: Pages 204; Tuples 48023: Deleted 53. CPU
0.00s/0.04u sec.
NOTICE:  Index developer_search_idx: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Rel developer: Pages: 515 --> 515; Tuple(s) moved: 34. CPU
0.00s/0.03u sec.
DEBUG:  Index developer_primary_key: Pages 120; Tuples 48023: Deleted 34.
CPU 0.00s/0.04u sec.
NOTICE:  Index developer_primary_key: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_recent_mod_key: Pages 119; Tuples 48023: Deleted 34.
CPU 0.00s/0.04u sec.
NOTICE:  Index developer_recent_mod_key: NUMBER OF INDEX' TUPLES (48023) IS
NOT THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_approved: Pages 121; Tuples 48023: Deleted 34. CPU
0.00s/0.03u sec.
NOTICE:  Index developer_approved: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.
DEBUG:  Index developer_search_idx: Pages 204; Tuples 48023: Deleted 34. CPU
0.00s/0.06u sec.
NOTICE:  Index developer_search_idx: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
        Recreate the index.

I also appear to be getting this quite often:

NOTICE:  RegisterSharedInvalid: SI buffer overflow
NOTICE:  InvalidateSharedInvalid: cache state reset


Here is output from me trying to re-create an index.

basement=# drop index developer_primary_key;
DROP
basement=# create unique index developer_primary_key on
developer(developer_id);ERROR:  Cannot create unique index. Table contains
non-unique values
basement=# select developer_id,count(*) from developer group by developer_id
having count(*) > 1;
 developer_id | count
--------------+-------
        11107 |     2
        18493 |     2
        50983 |     2
        50984 |     2
        50985 |     2
        50986 |     2
        50987 |     2
        50988 |     2
        50989 |     2
        50990 |     2
        50991 |     2
        50992 |     2
        50993 |     2
        50994 |     2
        50995 |     2
        50996 |     2
        50997 |     2
        51020 |     2
        51021 |     2
        51022 |     2
        51023 |     2
        51024 |     2
        51025 |     2
        51026 |     2
        51029 |     2
        51030 |     2
        51031 |     2
        51032 |     2
        51033 |     2
        51034 |     2
        51035 |     2
        51036 |     2
        51037 |     2
        51038 |     2
        51039 |     2
        51040 |     2
        51041 |     2
        51042 |     2
        51043 |     3
        51044 |     3
        51045 |     3
        51046 |     3
        51047 |     3
        51048 |     3
        51049 |     3
        51050 |     3
        51051 |     3
        51052 |     3
        51053 |     3
        51054 |     3
        51055 |     3
        51056 |     3
        51057 |     3
        51058 |     3
        51059 |     3
        51060 |     3
        51061 |     3
        51062 |     2
        51063 |     2
        51064 |     2
        51065 |     2
        51066 |     2
        51067 |     2
        51068 |     2
        51069 |     2
        51070 |     2
        51071 |     2
        51072 |     2
        51073 |     2
        51074 |     2
        51075 |     2
        51076 |     2
        51152 |     2
        51614 |     2
        51615 |     2
        51616 |     2
        51617 |     2
        51618 |     2
        51619 |     2
        51620 |     2
        51621 |     2
        51622 |     2
        51623 |     2
        51678 |     2
        51680 |     2
        51681 |     2
        51682 |     2
        51683 |     2
        51768 |     2
        51862 |     2
        51863 |     2
        51864 |     2
        51950 |     2
        52094 |     2
        52095 |     2
        52096 |     2
        52097 |     2
        52098 |     2
        52099 |     2
        52100 |     2
        52101 |     2
        52103 |     2
        52104 |     2
        52105 |     2
        52106 |     2
        52107 |     2
        52108 |     2
        52109 |     2
        52110 |     2
        52111 |     2
        52112 |     2
        52113 |     2
        52114 |     2
        52115 |     2
        52116 |     2
        52117 |     2
        52118 |     2
        52119 |     2
        52120 |     2
        52121 |     2
        52122 |     2
        52123 |     2
        52124 |     2
        52125 |     2
        52126 |     2
        52127 |     2
        52128 |     2
        52129 |     2
        52130 |     2
        52131 |     2
        52132 |     2
        52133 |     2
        52134 |     2
        52135 |     2
        52136 |     2
        52137 |     2
        52167 |     2
        52168 |     2
        52169 |     2
        52170 |     2
        52171 |     2
        52172 |     2
        52173 |     2
        52174 |     2
        52175 |     2
        52180 |     2
        52181 |     2
        52182 |     2
        52222 |     2
        52223 |     2
        52224 |     2
        52225 |     2
        52226 |     2
        52227 |     2
        52228 |     2
        52229 |     2
        52230 |     2
        52232 |     2
        52233 |     2
        52234 |     2
        52235 |     2
        52236 |     2
        52237 |     2
        52238 |     2
        52239 |     2
        52240 |     2
        52241 |     2
        52242 |     2
        52243 |     2
        52244 |     2
        52245 |     2
        52246 |     2
        52247 |     2
        52248 |     2
        52249 |     2
        52250 |     2
        52251 |     2
        52465 |     2
        52466 |     2
        52467 |     2
        52468 |     2
        52469 |     2
        52470 |     2
        52471 |     2
        52472 |     2
        52473 |     2
        52474 |     2
        52475 |     2
        52476 |     2
        52477 |     2
        52478 |     2
        52479 |     2
        52480 |     2
        52481 |     2
        52482 |     2
        52483 |     2
        52484 |     2
        52485 |     2
        52486 |     2
        52487 |     2
(200 rows)

basement=# select oid,developer_id from developer where developer_id =52487;
    oid    | developer_id
-----------+--------------
 401440180 |        52487
 401440180 |        52487




Re: [GENERAL] ACK table corrupted, unique index violated.

From
Tom Lane
Date:
"Brian Hirt" <bhirt@mobygames.com> writes:
> In a nutshell, my primary key index got a NOTICE to recreate when the
> database was vacuumed.  I dropped the index and tried to recreate it.  I get
> a key violation when i try to do this.    I find there are some 200 rows
> with the exact same developer_id and oid.

You're the third person to have reported something like this, so there's
something strange going on.  Can you give access to your system to
someone who can poke into it (probably me or Vadim)?

> There are some NOTICES the day
> before that i don't know what they mean, but don't look good.

> NOTICE:  Cannot rename init file
> /moby/pgsql/base/156130/pg_internal.init.19833 to
> /moby/pgsql/base/156130/pg_internal.init: No such file or directory

These seem extremely bizarre.

            regards, tom lane

Re: [GENERAL] ACK table corrupted, unique index violated.

From
"Brian Hirt"
Date:
Tom,

I'm a little uncomfortable about giving ssh access to our box.  We have a
lot of sensitive information in the database, and we would be violating our
privacy policy by giving someone access.  If there is some way I could give
you any information, or help you out that would be better.  I ended up
shutting down postgres and copying the pgdata directory somewhere else and
re-creating the database -- so i do have a copy of the corrupted database.

I've been doing a little investigating and i might have a possible lead.
The two tables that were corrupted recently had new indexes put on them that
are based on a plpgsql function.   Basically in the form "create index blah
on table(myfunction(blah_id))"  These are the only two tables in my system
that have an index using a plpgsql function.  Both tables became corrupt on
the same day, and the corruption happened the night that i added the
indexes.  I have no imperical evidence to support that this is the cause,
but it seems possible.

One other note, even after recreating the database, I'm getting NOTICE
"InvalidateShardeInvalid" and "RegisterSharedInvalid: SI buffer overflow".
I never used to get them and now I'm getting tons of them.   Should this
concern me?  I don't understand the implications.

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Brian Hirt" <bhirt@mobygames.com>
Cc: "Postgres Hackers" <pgsql-hackers@postgresql.org>;
<pgsql-general@postgresql.org>; "Brian A Hirt" <bhirt@berkhirt.com>
Sent: Wednesday, December 12, 2001 11:01 AM
Subject: Re: [GENERAL] ACK table corrupted, unique index violated.


> "Brian Hirt" <bhirt@mobygames.com> writes:
> > In a nutshell, my primary key index got a NOTICE to recreate when the
> > database was vacuumed.  I dropped the index and tried to recreate it.  I
get
> > a key violation when i try to do this.    I find there are some 200 rows
> > with the exact same developer_id and oid.
>
> You're the third person to have reported something like this, so there's
> something strange going on.  Can you give access to your system to
> someone who can poke into it (probably me or Vadim)?
>
> > There are some NOTICES the day
> > before that i don't know what they mean, but don't look good.
>
> > NOTICE:  Cannot rename init file
> > /moby/pgsql/base/156130/pg_internal.init.19833 to
> > /moby/pgsql/base/156130/pg_internal.init: No such file or directory
>
> These seem extremely bizarre.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>


Re: [GENERAL] ACK table corrupted, unique index violated.

From
Gavin Sherry
Date:
Brian,

> I've been doing a little investigating and i might have a possible lead.
> The two tables that were corrupted recently had new indexes put on them that
> are based on a plpgsql function.   Basically in the form "create index blah
> on table(myfunction(blah_id))"  These are the only two tables in my system
> that have an index using a plpgsql function.  Both tables became corrupt on
> the same day, and the corruption happened the night that i added the
> indexes.  I have no imperical evidence to support that this is the cause,
> but it seems possible.

I've actually seen a very similiar problem, but in a controlled
'situation'. That is, I was intentionally modifying the
source. Unfortunately, I turfed code which produced these kinds of errors
on vacuum but it had to do with recreating heaps and notflushing old heap
data from the cache.

Perhaps you could include the plpgsql code so that other people could
recreate?

> 
> One other note, even after recreating the database, I'm getting NOTICE
> "InvalidateShardeInvalid" and "RegisterSharedInvalid: SI buffer overflow".
> I never used to get them and now I'm getting tons of them.   Should this
> concern me?  I don't understand the implications.

No problem here. These messages really should be surpressed -- they seem
muchof a problem than they are.

Gavin