Thread: BUG #4838: Database corruption after btree_gin index creation

BUG #4838: Database corruption after btree_gin index creation

From
"Daniele Bortoluzzi"
Date:
The following bug has been logged online:

Bug reference:      4838
Logged by:          Daniele Bortoluzzi
Email address:      bortoluz@gmail.com
PostgreSQL version: 8.4beta2
Operating system:   Linux amd64 2.6.24 (Debian 4.0)
Description:        Database corruption after btree_gin index creation
Details:

I am testing this db
I created a multicolumn GIN index with btree_gin functionality (fulltext
column + timestamp). After creating the index the db segfaulted:

LOG:  server process (PID 14195) was terminated by signal 11: Segmentation
fault
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and
 possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.

The WARNING-DETAIL-HINT messages repeated 4 times, then postgres restarted:

LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2009-06-04 12:47:19
CEST
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 2/778687D0
LOG:  record with zero length at 2/779392A8
LOG:  redo done at 2/77938E20
LOG:  last completed transaction was at log time 2009-06-04
12:47:35.55392+02
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

but it segfaulted two more times.

Then I launched a VACUUM FULL ANALYZE, no segmentation faults, it completed
successfully, but now it throws this error:

ERROR:  tuple offset out of range: 48090

or

ERROR:  tuple offset out of range: 0

when doing fulltext queries.

I was using postgres 8.4devel (SVN revision 28901) happily...

Re: BUG #4838: Database corruption after btree_gin index creation

From
Tom Lane
Date:
"Daniele Bortoluzzi" <bortoluz@gmail.com> writes:
> Description:        Database corruption after btree_gin index creation

Can you provide a self-contained test case to reproduce this problem?
We had a similar report yesterday but no one can reproduce it.

            regards, tom lane

Re: BUG #4838: Database corruption after btree_gin index creation

From
Tom Lane
Date:
"Daniele Bortoluzzi" <bortoluz@gmail.com> writes:
> I created a multicolumn GIN index with btree_gin functionality (fulltext
> column + timestamp). After creating the index the db segfaulted:

> LOG:  server process (PID 14195) was terminated by signal 11: Segmentation
> fault

I cannot replicate this problem based on the little information
provided.  The GIN bug we found a couple of days ago would explain
the "tuple offset out of range" errors, and if you had had Asserts
enabled it would explain Assert failures; but I don't see that it
explains a segfault.  Can you still reproduce this with CVS HEAD,
and if so would you submit a test case?  Or at least a stack trace
from the crash?

            regards, tom lane

Re: BUG #4838: Database corruption after btree_gin index creation

From
Daniele Bortoluzzi
Date:
2009/6/10 Tom Lane <tgl@sss.pgh.pa.us>:
[...]
> I cannot replicate this problem based on the little information
> provided.  The GIN bug we found a couple of days ago would explain
> the "tuple offset out of range" errors, and if you had had Asserts
> enabled it would explain Assert failures; but I don't see that it
> explains a segfault.  Can you still reproduce this with CVS HEAD,
> and if so would you submit a test case?  Or at least a stack trace
> from the crash?

I tried to replicate the error with a small set of data (our db
weighs ~700MB) but I could not manage it.
Now I'm checking out from the CVS server; I will post a new message
today or tomorrow at the latest.

If I cannot reproduce the error, what is the best way to catch the
stack trace? Do I have to recompile with --enable-debug?
I read this article:
http://wiki.postgresql.org/wiki/Developer_FAQ#What_debugging_features_are_available.3F
but I have never debugged PostgreSQL with gdb. Can you give me some hints?

I am sorry for the long delay. Thank you for your support.

Re: BUG #4838: Database corruption after btree_gin index creation

From
Tom Lane
Date:
Daniele Bortoluzzi <bortoluz@gmail.com> writes:
> If I cannot reproduce the error, what is the best way to catch the
> stack trace? Do I have to recompile with --enable-debug?

Yes, that would be the best thing.  If you are using gcc there is no
harm in using --enable-debug all the time; it just makes the executable
files a bit bigger, there's no performance change.

Make sure the postmaster is started with "ulimit -c unlimited", else
the crash might not drop a core file.  The core file will normally
appear in $PGDATA, but sometimes in a system-dependent special place
such as /cores/.

Once you've got a core file, do

    $ gdb /path/to/postgres-executable /path/to/core-file
    gdb> bt
    ... stack trace ...
    gdb> quit

and send the whole output of gdb.

            regards, tom lane
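
The interactive gdb session described above can also be run in one
non-interactive step, which makes it easier to capture the whole output
for a reply. The following is only a sketch: the function name
print_backtrace is hypothetical (it is not part of PostgreSQL or gdb),
and the paths are the same placeholders used in the message. It relies
only on gdb's standard -batch and -ex options.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL or gdb): print a stack trace
# from a core file without entering an interactive gdb session.
#   $1 = postgres executable that crashed
#   $2 = core file it dropped
print_backtrace() {
    exe=$1
    core=$2
    # Refuse to run gdb if either file is missing or unusable.
    if [ ! -x "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: print_backtrace /path/to/postgres-executable /path/to/core-file" >&2
        return 1
    fi
    # -batch: exit once the commands are done; -ex bt: run the "bt" command.
    gdb -batch -ex bt "$exe" "$core"
}

# Example usage (placeholder paths, as in the message above):
#   ulimit -c unlimited        # in the shell that then starts the postmaster
#   print_backtrace /path/to/postgres-executable /path/to/core-file \
#       > gdb-output.txt 2>&1
```

Redirecting both stdout and stderr into a file, as in the commented
example, keeps gdb's warnings together with the trace so the whole output
can be sent.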

Re: BUG #4838: Database corruption after btree_gin index creation

From
Daniele Bortoluzzi
Date:
[...]
>Can you still reproduce this with CVS HEAD,

with CVS HEAD the error is not occurring. Did you fix some GIN bug in
this version?

Thank you for your support

Re: BUG #4838: Database corruption after btree_gin index creation

From
Tom Lane
Date:
Daniele Bortoluzzi <bortoluz@gmail.com> writes:
>> Can you still reproduce this with CVS HEAD,

> with CVS HEAD the error is not occurring. Did you fix some GIN bug in
> this version?

Yes, I told you so.
http://archives.postgresql.org/pgsql-committers/2009-06/msg00081.php

But I don't see how that bug would've led to a segfault.  Bogus TIDs
in the index should be caught without that.

            regards, tom lane