The following bug has been logged on the website:
Bug reference: 6425
Logged by: orval
Email address: postgres@dunquino.com
PostgreSQL version: 9.0.6
Operating system: Solaris 10 u9
Description:
This is intermittent and hard to reproduce but crashes consistently in the
same place. That place is backend/access/common/heaptuple.c line 1104:
values[attnum] = fetchatt(thisatt, tp + off);
off is always 0, tp is an unaligned address (not divisible by 4 -- this is
Sparc BTW.) I've seen tup->t_hoff set to 0x62 and 0x82 in different core
files.
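
To make the arithmetic concrete, here is a minimal standalone sketch, not PostgreSQL code. It assumes the tuple header itself starts at an aligned address and that tp is computed as the header start plus t_hoff. The helper names (is_aligned, data_address, assert_data_aligned) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align. */
static bool
is_aligned(uintptr_t addr, size_t align)
{
    return align != 0 && (addr % align) == 0;
}

/*
 * Hypothetical helper: address of the tuple data area, assuming it is
 * the header start plus the one-byte header offset (t_hoff).
 */
static uintptr_t
data_address(uintptr_t header_start, uint8_t hoff)
{
    return header_start + hoff;
}

/* Hypothetical read-side check: abort if the data area is misaligned. */
static void
assert_data_aligned(uintptr_t header_start, uint8_t hoff, size_t align)
{
    assert(is_aligned(data_address(header_start, hoff), align));
}
```

With an aligned header start, both observed offsets (0x62 and 0x82) leave a remainder of 2 modulo 4, so any 4-byte load from tp + 0 would fault on SPARC.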
This system is using streaming replication, and the problem always occurs
on the secondary. The system is under heavy load, both in terms of queries
and DML on the primary. There are usually quite a lot of deadlocks going
on.
The query in question each time is a join between a table called preferences
and one called preference_fields. The tuple is in preference_fields. I have
not confirmed this is a cause, but the following statement does appear in
one of the scripts in action:
DELETE FROM preference_fields WHERE preference_field_id NOT IN (SELECT
DISTINCT preference_field_id FROM preferences);
There is also this kind of nasty stuff going on:
ALTER TABLE preferences RENAME TO preferences_old;
ALTER TABLE preferences_1326144465 RENAME TO preferences;
Where preferences_1326144465 is a copy of preferences that is used during a
data import process.
At the moment I have asserts in the places where t_hoff is set, looking for
(address % 4 != 0) but it's been going for a couple of days and it hasn't
happened yet. Any advice on where better to put some debugging would be
gratefully received.