Thread: BUG #1173: Sig 11 in insert

BUG #1173: Sig 11 in insert

From: "PostgreSQL Bugs List"

The following bug has been logged online:

Bug reference:      1173
Logged by:          Tomi Orava

Email address:      tomi.orava@ncircle.nullnet.fi

PostgreSQL version: 7.4

Operating system:   Fedora Core 2

Description:        Sig 11 in insert

Details:


Hi,

It seems that MnoGoSearch 3.2.17 is capable of crashing PostgreSQL 7.4.2
repeatedly, every once in a while, on an AMD Athlon running Fedora Core 2.

The server log in question:
Jun 16 15:54:27 alderan postgres[2872]: [663-1] DEBUG:  child process (PID 11304) exited with exit code 0
Jun 16 15:55:57 alderan postgres[2872]: [664-1] DEBUG:  reaping dead processes
Jun 16 15:55:57 alderan postgres[2872]: [665-1] DEBUG:  child process (PID 2880) was terminated by signal 11
Jun 16 15:55:57 alderan postgres[2872]: [666-1] LOG:  server process (PID 2880) was terminated by signal 11
Jun 16 15:55:57 alderan postgres[2872]: [667-1] LOG:  terminating any other active server processes
Jun 16 15:55:57 alderan postgres[2872]: [668-1] LOG:  all server processes terminated; reinitializing
Jun 16 15:55:57 alderan postgres[2872]: [669-1] DEBUG:  shmem_exit(0)
Jun 16 15:55:57 alderan postgres[2872]: [670-1] DEBUG:  invoking IpcMemoryCreate(size=10436608)
Jun 16 15:55:57 alderan postgres[11512]: [671-1] LOG:  database system was interrupted at 2004-06-16 15:54:27 EEST
Jun 16 15:55:57 alderan postgres[11512]: [672-1] LOG:  checkpoint record is at C/8B77675C
Jun 16 15:55:57 alderan postgres[11512]: [673-1] LOG:  redo record is at C/8B766394; undo record is at 0/0; shutdown FALSE
Jun 16 15:55:57 alderan postgres[11512]: [674-1] LOG:  next transaction ID: 2418885; next OID: 20631665
Jun 16 15:55:57 alderan postgres[11512]: [675-1] LOG:  database system was not properly shut down; automatic recovery in progress
Jun 16 15:55:58 alderan postgres[11512]: [676-1] LOG:  redo starts at C/8B766394
Jun 16 15:55:58 alderan postgres[11512]: [677-1] LOG:  unexpected pageaddr C/7982A000 in log file 12, segment 139, offset 8560640
Jun 16 15:55:58 alderan postgres[11512]: [678-1] LOG:  redo done at C/8B8298B4
Jun 16 15:56:00 alderan postgres[11512]: [679-1] LOG:  database system is ready

The gdb trace:

(gdb) where
#0  0x006af8b3 in memcpy () from /lib/libc.so.6
#1  0x081c7f48 in varstr_cmp (arg1=0x83c8f74 "51", len1=2, arg2=0x1f50e054
"..", len2=33685512) at varlena.c:866
#2  0x081c8071 in text_cmp (arg1=0x2020008, arg2=0x522017) at varlena.c:905
#3  0x081c8398 in bttextcmp (fcinfo=0xbfe750e0) at varlena.c:1021
#4  0x081eb96d in FunctionCall2 (flinfo=0x1d400020, arg1=33685512,
arg2=33685512) at fmgr.c:993
#5  0x08089351 in _bt_compare (rel=0x2020008, keysz=2, scankey=0x83c8fa8,
page=0x522017 "", offnum=32)
    at nbtsearch.c:351
#6  0x08089227 in _bt_binsrch (rel=0x1f442790, buf=490733600, keysz=2,
scankey=0x83c8fa8) at nbtsearch.c:243
#7  0x08084aa7 in _bt_insertonpg (rel=0x1f442790, buf=43, stack=0x1d400020,
keysz=2, scankey=0x83c8fa8,
    btitem=0x83c8f68, afteritem=0, split_only_page=0 '\0') at
nbtinsert.c:481
#8  0x08084016 in _bt_doinsert (rel=0x1f442790, btitem=0x83c8f68,
index_is_unique=0 '\0', heapRel=0x1f4415d0)
    at nbtinsert.c:141
#9  0x08088046 in btinsert (fcinfo=0x2020008) at nbtree.c:264
#10 0x081ec4bb in OidFunctionCall6 (functionId=33685512, arg1=33685512,
arg2=33685512, arg3=33685512, arg4=33685512,
    arg5=33685512, arg6=33685512) at fmgr.c:1345
#11 0x080832d5 in index_insert (indexRelation=0x1f442790, datums=0x2020008,
    nulls=0x2020008 <Address 0x2020008 out of bounds>,
heap_t_ctid=0x2020008, heapRelation=0x2020008,
    check_uniqueness=0 '\0') at indexam.c:226
#12 0x0810fc9a in ExecInsertIndexTuples (slot=0x2, tupleid=0x83c8de4,
estate=0x83c86c8, is_vacuum=0 '\0')
    at execUtils.c:852
#13 0x08109869 in ExecInsert (slot=0x83c89d0, tupleid=0x0, estate=0x83c86c8)
at execMain.c:1431
#14 0x08109461 in ExecutePlan (estate=0x83c86c8, planstate=0x83c8a18,
operation=CMD_INSERT, numberTuples=0,
    direction=33685512, dest=0x83bc428) at execMain.c:1250
#15 0x081085c8 in ExecutorRun (queryDesc=0x83c82c0,
direction=ForwardScanDirection, count=33685512) at execMain.c:249
#16 0x0817f4ad in ProcessQuery (parsetree=0x2020008, plan=0x83c82c0,
params=0x522017, dest=0x1d400020,
    completionTag=0xbfe75780 "") at pquery.c:139
#17 0x08180057 in PortalRunMulti (portal=0x83c3aa0, dest=0x83bc428,
altdest=0x83bc428, completionTag=0xbfe75780 "")
    at pquery.c:860
#18 0x0817f987 in PortalRun (portal=0x83c3aa0, count=2147483647,
dest=0x83bc428, altdest=0x2020008,
    completionTag=0xbfe75780 "") at pquery.c:494
#19 0x0817c1f4 in exec_simple_query (
    query_string=0x83aa6b0 "INSERT INTO dict (url_id,word,intag)
VALUES('182576','51',5964032)") at postgres.c:873
#20 0x0817e760 in PostgresMain (argc=5, argv=0x835f610, username=0x835f5e0
"indexer") at postgres.c:2868
#21 0x0815833b in BackendFork (port=0x836d7e8) at postmaster.c:2564
#22 0x08157d23 in BackendStartup (port=0x836d7e8) at postmaster.c:2207
#23 0x08156208 in ServerLoop () at postmaster.c:1119
#24 0x08155899 in PostmasterMain (argc=10, argv=0x835e678) at
postmaster.c:897
#25 0x081258b6 in main (argc=10, argv=0xbfe767e4) at main.c:214

As the crash seems to be repeatable, I'm happy to provide more detailed
information if needed.

Regards,
Tomi Orava

PS. Does anyone have information where to find source code for ASPSeek's
postgres backend (doesn't seem to be publicly available ...) ?

Re: BUG #1173: Sig 11 in insert

From: Tom Lane

"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> It seems that MnoGoSearch 3.2.17 is capable of crashing postgresql 7.4.2
> repeatedly every once in a while on Amd Athlon/Fedora Core 2.

Can you provide a self-contained test case for this?

> The gdb trace:
> (gdb) where
> #0  0x006af8b3 in memcpy () from /lib/libc.so.6
> #1  0x081c7f48 in varstr_cmp (arg1=0x83c8f74 "51", len1=2, arg2=0x1f50e054
> "..", len2=33685512) at varlena.c:866
> #2  0x081c8071 in text_cmp (arg1=0x2020008, arg2=0x522017) at varlena.c:905
> #3  0x081c8398 in bttextcmp (fcinfo=0xbfe750e0) at varlena.c:1021
> #4  0x081eb96d in FunctionCall2 (flinfo=0x1d400020, arg1=33685512,
> arg2=33685512) at fmgr.c:993
> #5  0x08089351 in _bt_compare (rel=0x2020008, keysz=2, scankey=0x83c8fa8,
> page=0x522017 "", offnum=32)
>     at nbtsearch.c:351

This isn't super helpful since some of the passed arguments are
evidently clobbered already; it's hard to tell which values to trust.
I was initially going to say that the ridiculous len2 argument to
varstr_cmp suggests corrupt data, but seeing that the same value appears
in several places on the trace (some in hex), I'm not sure if it's real
or if gdb is confused.  You might try rebuilding the backend with a
lower optimization level so as to get a more reliable stack trace.
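
The repeated value mentioned above can be checked directly. The short Python sketch below only does the base conversion, using numbers copied from the trace; it shows that the decimal 33685512 and the hex 0x2020008 are the same number, and it says nothing about whether the value is real or a gdb artifact.

```python
# Values copied from the gdb backtrace earlier in this thread.
decimal_value = 33685512      # len2 in frame #1; arg1/arg2 in frame #4
hex_value = "0x2020008"       # arg1 in frame #2; rel in frame #5

# Convert the hex string to an integer and compare with the decimal form.
same_value = int(hex_value, 16) == decimal_value

print(hex(decimal_value))     # hex spelling of the decimal value
print(same_value)             # True when both spellings denote one number
```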

Note that what is happening here is a comparison between an index key
value being inserted and a key value already in the index.  So one
possible explanation is that the previously stored value is corrupt
due to memory or disk problems.  Have you run any hardware diagnostics?
Can you reproduce the problem on another machine?
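
To illustrate the "corrupt stored value" explanation, here is a toy model in Python. It is not PostgreSQL code: the 4-byte little-endian length word that counts itself is an assumption made for the sketch, and make_datum, payload_length and checked_payload are hypothetical helper names. The point is only that a damaged length word turns a 2-byte value into one that claims tens of megabytes, which an unchecked copy would then try to read.

```python
import struct

HEADER_SIZE = 4  # assumption for this sketch: 4-byte length word, counts itself


def make_datum(text: bytes) -> bytes:
    """Build a length-prefixed value: [total length][payload]."""
    return struct.pack("<i", HEADER_SIZE + len(text)) + text


def payload_length(datum: bytes) -> int:
    """Return the payload length claimed by the header, without checking it."""
    (total,) = struct.unpack_from("<i", datum, 0)
    return total - HEADER_SIZE


def checked_payload(datum: bytes) -> bytes:
    """Return the payload, refusing headers that point past the buffer."""
    n = payload_length(datum)
    if n < 0 or n > len(datum) - HEADER_SIZE:
        raise ValueError(f"claimed length {n} exceeds stored bytes")
    return datum[HEADER_SIZE:HEADER_SIZE + n]


# A healthy key, like the '51' being inserted in the report.
good = make_datum(b"51")

# A stored key whose header was damaged; 33685512 is the len2 from the trace.
bad = struct.pack("<i", 33685512 + HEADER_SIZE) + b".."

print(payload_length(good))   # length claimed by the healthy value
print(payload_length(bad))    # length claimed by the damaged value
try:
    checked_payload(bad)
except ValueError as err:
    print("rejected:", err)
```

A C routine that trusts the claimed length and copies that many bytes has no such check, so it would read far beyond the stored value, which is consistent with a signal 11 inside memcpy.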

            regards, tom lane

Re: BUG #1173: Sig 11 in insert

From: Tomi Orava

Tom Lane wrote:
> "PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
>
>>It seems that MnoGoSearch 3.2.17 is capable of crashing postgresql 7.4.2
>>repeatedly every once in a while on Amd Athlon/Fedora Core 2.

>>The gdb trace:

<snip>
>
>
> This isn't super helpful since some of the passed arguments are
> evidently clobbered already; it's hard to tell which values to trust.
> I was initially going to say that the ridiculous len2 argument to
> varstr_cmp suggests corrupt data, but seeing that the same value appears
> in several places on the trace (some in hex), I'm not sure if it's real
> or if gdb is confused.  You might try rebuilding the backend with a
> lower optimization level so as to get a more reliable stack trace.
>
> Note that what is happening here is a comparison between an index key
> value being inserted and a key value already in the index.  So one
> possible explanation is that the previously stored value is corrupt
> due to memory or disk problems.  Have you run any hardware diagnostics?
> Can you reproduce the problem on another machine?

First of all, thank you very much for your extremely fast response.
It seems that you were right in suspecting broken hardware: I have
replaced an old Silicon Image 680 PCI IDE card (which had been in use for
several years, though) with a Promise 20267 IDE card and, interestingly,
there has not been a single crash since last Friday evening.
(I had already swapped the IDE disks without any success, but just couldn't
believe that the problem might be in the IDE controller.)

I'm very happy to see that PostgreSQL is OK and the system is running :)
Unfortunately, I do think that there is something wrong with the Sil680 driver
in 2.4.2x Linux kernels, but that's about it as far as PostgreSQL is concerned.

Sincerely,
Tomi Orava