Robert_Creager@LogicalChaos.org (Robert Creager) writes:
> When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
> Andrew Sullivan <andrew@libertyrms.info> confessed:
>
>> We have lately had a couple of cases where machines either locked
>> up, slowed down to the point of complete unusability, or died
>> completely while using jfs. We are _not_ sure that jfs is in fact
>> the culprit. In one case, a kernel panic appeared to be referring
>> to the jfs kernel module, but I can't be sure as I lost the output
>> immediately thereafter. Yesterday, we had a problem of data
>> corruption on a failed jfs volume.
>>
>> None of this is to say that jfs is in fact to blame, nor even that,
>> if it is, it does not have something to do with the age of our
>> installations, &c. (these are all RH 8). In fact, I suspect
>> hardware in both cases. But I thought I'd mention it just in case
>> other people are seeing strange behaviour, on the principle of
>> "better safe than sorry."
>
> Interestingly enough, I'm using JFS on a new scsi disk with Mandrake
> 9.1 and was having similar problems. I was generating heavy disk
> usage through database and astronomical data reductions. My machine
> (dual AMD) would suddenly hang. No new jobs would run, just
> increase the load, until I reboot the machine.
>
> I solved my problems by creating a 128Mb ram disk (using EXT2) for
> the temp data produced by my reduction runs.
>
> I believe JFS was to blame, not hardware, but you never know...

Interesting.

The set of concurrent factors that were present whenever this
happened "consistently" was:

 1. Heavy DB updates taking place on JFS filesystems;

 2. SMP (we suspected Xeon hyperthreading as a possible factor, but
    shut it off and still saw the same problem);

 3. Copying, via scp, a file > 2GB in size onto the system, which
    appeared to be the catalyst.

The third piece was particularly interesting: the file would get
copied over successfully, and the scp process would hang (to the
point of "kill -9" being unable to touch it) immediately thereafter.

At that point, processes on the system that were accessing files on
the hung-up filesystem were locked, also unkillable by "kill -9".

That's certainly consistent with JFS being involved in the problem,
whether or not it was the underlying cause...
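
Processes that cannot be touched by "kill -9" are typically stuck in
uninterruptible sleep (state "D" in ps output on Linux), though that
this is what happened here is an assumption rather than something
recorded above. A minimal sketch, assuming a Linux /proc filesystem,
that lists such processes by reading /proc/<pid>/stat; the function
names are hypothetical and only for illustration:

```python
import os


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for processes currently in state "D"."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or unreadable entry
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A hung scp, and anything else blocked on the same filesystem, would be
expected to show up in this list if the assumption holds; a process
that stays in "D" across repeated runs is a stronger hint than a
single sighting, since short "D" states are normal during disk I/O.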
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)