Re: BUG #5238: frequent signal 11 segfaults - Mailing list pgsql-bugs

From Nagy Daniel
Subject Re: BUG #5238: frequent signal 11 segfaults
Date
Msg-id 4B24B48A.8090701@telekom.hu
In response to Re: BUG #5238: frequent signal 11 segfaults  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #5238: frequent signal 11 segfaults  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: BUG #5238: frequent signal 11 segfaults  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I ran "select * from" on both tables. All rows were returned
successfully, no error logs were produced during the selects.

However, there are usually many 23505 errors on indexes, like:
Dec 13 10:02:13 goldbolt postgres[21949]: [26-1]
user=randirw,db=lovehunter ERROR:  23505: duplicate key value violates
unique constraint "kepek_eredeti_uid_meret_idx"
Dec 13 10:02:13 goldbolt postgres[21949]: [26-2]
user=randirw,db=lovehunter LOCATION:  _bt_check_unique, nbtinsert.c:301

There are also many 58P01 errors, like:
Dec 13 10:05:18 goldbolt postgres[7931]: [23-1] user=munin,db=lovehunter
ERROR:  58P01: could not open segment 1 of relation base/16
400/19856 (target block 3014766): No such file or directory
Dec 13 10:05:18 goldbolt postgres[7931]: [23-2] user=munin,db=lovehunter
LOCATION:  _mdfd_getseg, md.c:1572
Dec 13 10:05:18 goldbolt postgres[7931]: [23-3] user=munin,db=lovehunter
STATEMENT:  SELECT count(*) FROM users WHERE nem='t'

Reindexing sometimes helps, but the error logs appear again within
hours.

Recently a new error appeared:

Dec 13 03:46:55 goldbolt postgres[18628]: [15-1]
user=randir,db=lovehunter ERROR:  XX000: tuple offset out of range: 0
Dec 13 03:46:55 goldbolt postgres[18628]: [15-2]
user=randir,db=lovehunter LOCATION:  tbm_add_tuples, tidbitmap.c:286
Dec 13 03:46:55 goldbolt postgres[18628]: [15-3]
user=randir,db=lovehunter STATEMENT:  SELECT * FROM valogatas WHERE
uid!='16208' AND eletkor BETWEEN 39 AND 55 AND megyeid='1' AND
keresettnem='f' AND dom='iwiw.hu' AND appid='2001434963' AND nem='t'
ORDER BY random() DESC
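
To see how often each of the errors above occurs, the log can be tallied by SQLSTATE code. This is only a minimal sketch: it assumes every error line keeps the format shown above (a severity, then a five-character code, then a colon), and the name count_sqlstates and the regular expression are illustrative, not part of PostgreSQL.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpts above:
#   "... ERROR:  23505: duplicate key value violates ..."
# i.e. a severity, whitespace, a five-character SQLSTATE, then a colon.
SQLSTATE_RE = re.compile(r"\b(ERROR|FATAL|PANIC):\s+([0-9A-Z]{5}):")


def count_sqlstates(lines):
    """Return a Counter mapping each SQLSTATE code to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SQLSTATE_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the excerpts above (syslog prefix omitted).
    sample = [
        "user=randirw,db=lovehunter ERROR:  23505: duplicate key value violates",
        "user=randirw,db=lovehunter LOCATION:  _bt_check_unique, nbtinsert.c:301",
        "ERROR:  58P01: could not open segment 1 of relation base/16",
        "user=randir,db=lovehunter ERROR:  XX000: tuple offset out of range: 0",
    ]
    for code, n in count_sqlstates(sample).most_common():
        print(code, n)
```

LOCATION and STATEMENT continuation lines are skipped because they carry no severity/SQLSTATE pair, so each error is counted once.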



If there is on-disk corruption, would a complete dump and
restore to another directory fix it?

Apart from that, I think that pg shouldn't crash when it hits
on-disk corruption, but should log an error message instead.
I'm sure that it's not as easy to implement as it seems,
but nothing is impossible :)


Regards,

Daniel


Tom Lane wrote:
> Nagy Daniel <nagy.daniel@telekom.hu> writes:
>> Here's a better backtrace:
>
> The crash location suggests a problem with a corrupted tuple, but it's
> impossible to guess where the tuple came from.  In particular I can't
> guess whether this reflects on-disk data corruption or some internal
> bug.  Now that you have (some of) the query, can you put together a test
> case?  Or try "select * from" each of the tables used in the query to
> check for on-disk corruption.
>
>             regards, tom lane
