How severe this bug is for you depends heavily on how free of bugs your
own programs are.
Short description:
Long-standing open transactions, combined with high-traffic updates and
regular vacuums, eventually corrupt memory.
Long description:
Due to a design flaw in our ecpg programs (I don't recommend designing
for autocommit off!), some transactions stayed open for several days.
At the same time, a process data collection system generates many
status-change updates (3 MB a day) to about 110 rows of one table.
After every 1024 updates I vacuum this high-traffic table, which should
then shrink to 16 kB. The first thing I noticed was that vacuum did not
free the old tuples. This put me on the track of the real cause.
For the past three weeks (with more buggy long-standing transactions) I
have seen one major crash of the program system per week. For months I
have seen some strange NOTICEs that went away after another vacuum. And
this morning I found a 'possible memory corruption, killing other
backends' message.
The situation got better and better during the 7.0 development cycle (I
started with a pre-beta version this January and reported some
concurrent-vacuum oddities at that time), but it got worse the more
interactive programs we added.
But until now I had not seen the extra ingredient that causes the pain:
long-standing transactions.
It's not very bad; it seems to happen only under rare conditions. Until
this week I thought of it as a minor oddity, a temporary nuisance.
And: this is the current stable CVS tree, running on a 233 MHz
Pentium II under Linux 2.2.14 (?).
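Sample code:

  -- status update, repeated every second
  update bn_actual set meter = meter + 1 where machine = ?;

  -- combined with a transaction that is opened and held open
  begin transaction;
  select something;   -- placeholder query; no commit follows

  -- and
  vacuum analyze;     -- once a day

  -- and
  vacuum bn_actual;   -- after every 1024 updates

  -- and some other queries.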
PS: Of course I'm currently fixing the long-transaction problem. I'll
let you know once the system has run for four weeks again without any
strange occurrence.
PPS: Yes, I'm following the hackers list.
P3S: No, I don't believe in a hardware bug.