Re: failures on machines using jfs - Mailing list pgsql-performance
From | Spiegelberg, Greg |
---|---|
Subject | Re: failures on machines using jfs |
Date | |
Msg-id | 387C22290D3FD71195D300508BF7DB5238AE92@colmail01.cranel.com |
In response to | failures on machines using jfs (Andrew Sullivan <andrew@libertyrms.info>) |
Responses |
Re: failures on machines using jfs
Re: failures on machines using jfs |
List | pgsql-performance |
It would seem we're experiencing something similar with our scratch volume (JFS mounted with noatime). It is still much faster than our experiments with ext2, ext3, and reiserfs, but occasionally during large loads it will hiccup for a couple of seconds. I'm reluctant to switch back to any other file system because the data import used to take a little over 1.5 hours but now takes just under 20 minutes, and we haven't crashed yet.

For future reference:

RedHat 7.3 w/2.4.18-18.7smp
PostgreSQL 7.3.3 from source
jfsutils 1.0.17-1
Dual PIII Intel 1.4GHz & 2GB ECC
Internal disk: 2xU160 SCSI, mirrored, location of our JFS file system
External disk: Qlogic 2310 attached to FC-SW @2Gbps with ext3 on those LUNs

Greg

-----Original Message-----
From: Christopher Browne
To: pgsql-performance@postgresql.org
Sent: 1/10/04 9:08 PM
Subject: Re: [PERFORM] failures on machines using jfs

Robert_Creager@LogicalChaos.org (Robert Creager) writes:
> When grilled further on (Wed, 7 Jan 2004 18:06:08 -0500),
> Andrew Sullivan <andrew@libertyrms.info> confessed:
>
>> We have lately had a couple of cases where machines either locked
>> up, slowed down to the point of complete unusability, or died
>> completely while using jfs. We are _not_ sure that jfs is in fact
>> the culprit. In one case, a kernel panic appeared to be referring
>> to the jfs kernel module, but I can't be sure as I lost the output
>> immediately thereafter. Yesterday, we had a problem of data
>> corruption on a failed jfs volume.
>>
>> None of this is to say that jfs is in fact to blame, nor even that,
>> if it is, it does not have something to do with the age of our
>> installations, &c. (these are all RH 8). In fact, I suspect
>> hardware in both cases. But I thought I'd mention it just in case
>> other people are seeing strange behaviour, on the principle of
>> "better safe than sorry."
>
> Interestingly enough, I'm using JFS on a new scsi disk with Mandrake
> 9.1 and was having similar problems. I was generating heavy disk
> usage through database and astronomical data reductions. My machine
> (dual AMD) would suddenly hang. No new jobs would run, just
> increase the load, until I reboot the machine.
>
> I solved my problems by creating a 128Mb ram disk (using EXT2) for
> the temp data produced by my reduction runs.
>
> I believe JFS was to blame, not hardware, but you never know...

Interesting.

The set of concurrent factors that came together to appear when this happened "consistently" were thus:

1. Heavy DB updates taking place on JFS filesystems;

2. SMP (we suspected Xeon hyperthreading as a possible factor, but shut it off and still saw the same problem...)

3. The third factor that appeared a catalyst was copying, via scp, a file > 2GB in size onto the system.

The third piece was a particularly interesting aspect; the file would get copied over successfully, and the scp process would hang (to the point of "kill -9" being unable to touch it) immediately thereafter.

At that point, processes on the system that were accessing files on the hung-up filesystem were locked, also unkillable by "kill -9."

That's certainly consistent with JFS being at the root of the problem, whether it was the cause or not...
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)