Thread: Any MIPS assembly experts in the house?
I see the latest buildfarm result from a mipsel machine is failing: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-26%2005:30:07 and the failure is this: TRAP: FailedAssertion("!(lock->shared > 0)", File: "lwlock.c", Line: 456) LOG: server process (PID 10112) was terminated by signal 6 which makes it seem highly probable that this recently committed patch to convert the MIPS out-of-line spinlock code into inline assembler isn't right: http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php Can anyone spot the problem? If not I fear we'll have to revert this. Also, upon looking more closely at the new inline code, it looks like it's designed to *loop* if it fails to get the lock, which is *wrong*. The assembly fragment should test once and fall out. This doesn't explain the regression test failure, it would just cause stuck-lock detection to not work. regards, tom lane
Tom Lane wrote: > I see the latest buildfarm result from a mipsel machine is failing: > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-26%2005:30:07 > > and the failure is this: > > TRAP: FailedAssertion("!(lock->shared > 0)", File: "lwlock.c", Line: 456) > LOG: server process (PID 10112) was terminated by signal 6 > > which makes it seem highly probable that this recently committed patch > to convert the MIPS out-of-line spinlock code into inline assembler > isn't right: > http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php > > Can anyone spot the problem? If not I fear we'll have to revert this. As the owner of said machine I was about to report the problem - but on a subsequent run of the buildfarm-script(to get access to the compiled source for further debugging and testing) it completed without an error. Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > Tom Lane wrote: >> which makes it seem highly probable that this recently committed patch >> to convert the MIPS out-of-line spinlock code into inline assembler >> isn't right: >> http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php > As the owner of said machine I was about to report the problem - but on > a subsequent run of the buildfarm-script(to get access to the compiled > source for further debugging and testing) it completed without an error. Unfortunately, that just says that the failure is intermittent, which is pretty much exactly what you'd expect from not-quite-right locking code. Try running the parallel regression tests several times in a row. (The serial test mode is unlikely to exhibit any problem.) I tend to do this by installing the test postmaster and then doing "make installcheck-parallel" over and over, rather than repeating "make check"; repeatedly building a test installation is just wasting time for this sort of thing. regards, tom lane
I wrote: > Can anyone spot the problem? If not I fear we'll have to revert this. After a bit of reading MIPS documentation, I found out that the proposed patch is exactly backward: it returns 1 if it gets the lock and 0 if the lock is already held :-( Because callers will loop on a nonzero return, the second iteration falls through, which is why the thing isn't an infinite loop. Only problem is when we hit the lock at an instant when somebody else already has it. Given the short duration of our spinlock holds, it was probably quite a coincidence that Stefan's machine got a failure almost immediately. We might have had the problem lurking for awhile. I'll try to commit something that really works in a little bit. regards, tom lane
Tom Lane wrote: > I wrote: > >>Can anyone spot the problem? If not I fear we'll have to revert this. > > > After a bit of reading MIPS documentation, I found out that the proposed > patch is exactly backward: it returns 1 if it gets the lock and 0 if the > lock is already held :-( > > Because callers will loop on a nonzero return, the second iteration > falls through, which is why the thing isn't an infinite loop. Only > problem is when we hit the lock at an instant when somebody else > already has it. > > Given the short duration of our spinlock holds, it was probably quite > a coincidence that Stefan's machine got a failure almost immediately. > We might have had the problem lurking for awhile. > > I'll try to commit something that really works in a little bit. well not sure if that counts as "really works" :-) http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-27%2006:33:05 Stefan
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes: > well not sure if that counts as "really works" :-) > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-27%2006:33:05 Nope, apparently the example I looked at was wrong about how to write constants in MIPS assembler. Sigh. Newer version committed. regards, tom lane