Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",) - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
Date
Msg-id 20051031174238.GF20349@pervasive.com
Whole thread Raw
In response to slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)
List pgsql-hackers
On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> I'd like Jim to test this theory by seeing if it helps to reverse the
> order of the if-test elements at lines 294/295, ie make it look like
> 
>         if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
>             shared->page_number[slotno] != pageno)
> 
> This won't do as a permanent patch, because it isn't guaranteed to fix
> the problem on machines that don't strongly order writes, but it should
> work on Opterons, at least well enough to confirm the diagnosis.

Given your proposed fix on -patches, do you still need me to test this?
Also, is there any heap corruption risk associated with this patch?

I'm also wondering what the effect of this is when assertions are turned
off. My client had to go back to running with assertions turned off
because of the performance impact. Are they now risking data corruption?
Is there a way to turn on the assertion just in this code segment?

This incident has made me wonder if it's worth creating two classes of
assertions. The (hopefully more common) set of assertions would be for
things that shouldn't happen, but if go un-caught won't result in heap
corruption. A new set (well, existing asserts, but just re-classified)
would be for things that if uncaught could result in heap corruption. My
hope is that the set of critical assertions could be turned on by
default, helping to identify race conditions and other bugs that
conventional testing is unlikely to find.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


pgsql-hackers by date:

Previous
From: Honda Shigehiro
Date:
Subject: Re: 8.1RC1 on Tru64
Next
From: "Jim C. Nasby"
Date:
Subject: Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)