Thread: int2 unique index malfunction (btree corrupt)

From: Christof Petig
Your name               : Christof Petig
Your email address      : christof.petig@wtal.de


System Configuration
---------------------
  Architecture (example: Intel Pentium)         : Intel dual-Pentium II

  Operating System (example: Linux 2.0.26 ELF)  : Linux 2.2.11 ELF

  PostgreSQL version (example: PostgreSQL-6.5.1):   PostgreSQL-6.5, 6.5.1, 6.6

  Compiler used (example:  gcc 2.8.0)           : egcs-1.1.1


Dear PostgreSQL wizards,

During development of a CIM program I frequently updated a table by its
primary key (int2 or numeric(3)). A lot of strange messages
'NOTICE:  (transaction aborted): queries ignored until END' alerted me
that something was going wrong.

update foo set val=val+1 where key=:n
generated 'Cannot insert a duplicate key into a unique index',
which is definitely wrong.

Included is a very small C++ program (Linux, egcs-1.1.1) which
generates the mentioned errors. If it would help debugging, I could
port it to plain C (contact me).
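For reference, a hypothetical sketch of the kind of loop that triggered the failures (table and column names are illustrative; the actual attached C++ program differs):

```sql
-- Illustrative reproduction sketch, not the attached test program.
create table foo (key int2 primary key, val int4);
insert into foo values (1, 0);
-- Repeating this update many times against the int2 primary key
-- produced the duplicate-key NOTICE and, later, the btree FATAL:
update foo set val = val + 1 where key = 1;
```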

It occurred with V6.5 as well as with August 2nd's CVS tree.

Sometimes even the initial 'insert's fail; one time (with August 2nd's
CVS tree) I got a
'FATAL 1:  btree: items are out of order (leftmost 0, stack 10, update 1)' error.

Now with 6.5.1, I don't get the NOTICE messages but immediately a
FATAL 1:  btree: items are out of order (leftmost 0, stack 3, update 1)
FATAL 1:  btree: items are out of order (leftmost 0, stack 2, update 1)

It seems there is something wrong with PostgreSQL's internals. Perhaps
this program helps track it down.

Best Regards,
    Christof

PS: vacuum after the FATAL (6.6):
DEBUG:  --Relation test--
DEBUG:  Pages 13: Changed 2, Reapped 13, Empty 0, New 0; Tup 128: Vac 1823, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 48, MaxLen 48; Re-using: Free/Avail. Space 92444/86784; EndEmpty/Avail. Pages 0/12. Elapsed 0/0 sec.
DEBUG:  Index test_pkey: Pages 13; Tuples 128: Deleted 1819. Elapsed 0/0 sec.
DEBUG:  Rel test: Pages: 13 --> 1; Tuple(s) moved: 128. Elapsed 0/0 sec.

DEBUG:  Index test_pkey: Pages 13; Tuples 128: Deleted 128. Elapsed 0/0 sec.

I haven't seen the corruption on 6.5.

Attachment

Re: [BUGS] int2 unique index malfunction (btree corrupt)

From: Tom Lane
Christof Petig <christof.petig@wtal.de> writes:
> During development of a CIM program I frequently updated a table by its
> primary key (int2 or numeric(3)). A lot of strange messages
> 'NOTICE:  (transaction aborted): queries ignored until END' alerted me
> that something is going wrong.
> [ details snipped ]

FWIW, the test program you supplied seems to run without errors for me.
I'm using current CVS sources on an HPUX box.

There was a fix applied on 8/8 to clean up a problem with btrees not
recovering from an aborted transaction properly, but I'm not sure
whether that has anything to do with your example...

            regards, tom lane

Re: [BUGS] int2 unique index malfunction (btree corrupt)

From: Christof Petig
Tom Lane wrote:

> Christof Petig <christof.petig@wtal.de> writes:
> > During development of a CIM program I frequently updated a table by its
> > primary key (int2 or numeric(3)). A lot of strange messages
> > 'NOTICE:  (transaction aborted): queries ignored until END' alerted me
> > that something is going wrong.
> > [ details snipped ]
>
> FWIW, the test program you supplied seems to run without errors for me.
> I'm using current CVS sources on an HPUX box.
>
> There was a fix applied on 8/8 to clean up a problem with btrees not
> recovering from an aborted transaction properly, but I'm not sure
> whether that has anything to do with your example...

My example fails desperately (within one to two seconds) on 6.5.1; however, I
tested it with today's CVS sources and it runs cleanly (disable the
debugging output to test at full speed).
So the bugfix seems to cover my problem.

However ...
- if I vacuum the database while my test program runs, all sorts of strange
things happen:
   -- all goes well (90% chance, better if the database had recently shrunk)

   -- the vacuum backend crashes:
/home/christof> vacuumdb test
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible.
vacuumdb: database vacuum failed on test.

   -- see yourself:
/home/christof> vacuumdb test
ERROR:  Cannot insert a duplicate key into a unique index
vacuumdb: database vacuum failed on test.

   -- postmaster goes into an endless loop; you can kill neither test nor
vacuumdb (this happened once after a long run, when test (the table's file)
had reached about 4MB). Killing postmaster helps ...

- vacuum never shrinks primary indices, and the index's file continues to
grow (even at 7MB+).
  It seems the only choice for long-running databases is either (drop
index/create index) or (dump/delete/restore).
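The drop/recreate workaround mentioned above might look roughly like this (a sketch only; the index and table names are taken from the vacuum output above, and the key column name is an assumption):

```sql
-- Rebuild the bloated primary-key index by hand, since vacuum does
-- not shrink it. The column name 'key' is assumed for illustration.
drop index test_pkey;
create unique index test_pkey on test (key);
```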

Regards
     Christof

PS: Besides these issues Postgres works rather well!
  I like datetime_part('epoch', ...) and timespan_part('epoch', ...), which
cover functionality not available on our closed source (aka commercial)
database.
Calculating the speed of a running machine in SQL is nearly trivial
(start_time, current_time, produced_amount).
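That speed calculation might be sketched like this (table and column names are assumed from the parenthetical; datetime_part as named above):

```sql
-- Units produced per second since the machine started.
-- 'machines', 'start_time', and 'produced_amount' are illustrative names.
select produced_amount /
       (datetime_part('epoch', 'now'::datetime)
        - datetime_part('epoch', start_time)) as units_per_sec
from machines;
```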

PPS: I modified the test program to not drop and recreate the table on
start. This allows many runs (even concurrent) on the same tables.
Simply invoke it as
./test something_which_doesnt_matter


Attachment

Re: [BUGS] int2 unique index malfunction (btree corrupt)

From: Christof Petig
As a side note:

Once test_pkey is beyond 13MB in size, a failure of vacuumdb is nearly
certain.
The size of a clean (!) test table increases steadily (due to a bigger pkey
file?).

Regards
     Christof