On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> I'd like Jim to test this theory by seeing if it helps to reverse the
> order of the if-test elements at lines 294/295, ie make it look like
>
> if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
> shared->page_number[slotno] != pageno)
>
> This won't do as a permanent patch, because it isn't guaranteed to fix
> the problem on machines that don't strongly order writes, but it should
> work on Opterons, at least well enough to confirm the diagnosis.

Given your proposed fix on -patches, do you still need me to test this?
Also, is there any heap corruption risk associated with this patch?

I'm also wondering what the effect of this is when assertions are turned
off. My client had to go back to running with assertions disabled
because of the performance impact. Are they now risking data corruption?
Is there a way to enable the assertion in just this code segment?

This incident has made me wonder whether it's worth creating two classes
of assertions. The (hopefully more common) class would cover things that
shouldn't happen but that won't result in heap corruption if they go
uncaught. A new class (really just existing asserts, reclassified) would
cover things that could result in heap corruption if uncaught. My hope
is that the critical assertions could be turned on by default, helping
to identify race conditions and other bugs that conventional testing is
unlikely to find.
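
A rough sketch of the two-class idea, to make it concrete. All names here
(CriticalAssert, DebugAssert, DEBUG_ASSERTS, check_slot) are made up for
illustration and are not existing PostgreSQL symbols. The failure handler
only counts and logs so the sketch can be run standalone; a real server
would more likely stop the backend at that point.

```c
#include <stdio.h>

/* Number of failed checks. Counting instead of aborting keeps this
 * sketch runnable; it is an assumption, not how a server would react. */
static int failed_checks = 0;

static void
report_failed_check(const char *kind, const char *expr,
                    const char *file, int line)
{
    failed_checks++;
    fprintf(stderr, "%s assertion failed: %s (%s:%d)\n",
            kind, expr, file, line);
}

/* Critical class: always compiled in. Intended for conditions whose
 * violation could lead to heap corruption if it went unnoticed. */
#define CriticalAssert(cond) \
    do { \
        if (!(cond)) \
            report_failed_check("critical", #cond, __FILE__, __LINE__); \
    } while (0)

/* Ordinary class: compiled in only when the (hypothetical) DEBUG_ASSERTS
 * symbol is defined, so production builds pay nothing for it. */
#ifdef DEBUG_ASSERTS
#define DebugAssert(cond) \
    do { \
        if (!(cond)) \
            report_failed_check("debug", #cond, __FILE__, __LINE__); \
    } while (0)
#else
#define DebugAssert(cond) ((void) 0)
#endif

/* Hypothetical caller: returns how many checks failed for this call. */
static int
check_slot(int slot_page, int wanted_page, int slotno)
{
    int before = failed_checks;

    DebugAssert(slotno >= 0);                 /* sanity check only */
    CriticalAssert(slot_page == wanted_page); /* mismatch could damage data */

    return failed_checks - before;
}
```

Built without DEBUG_ASSERTS, only the CriticalAssert line costs anything
at run time, which is the trade-off I'm after: a small set of checks cheap
enough to leave on in production.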
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461