Thread: ext3 filesystem / linux 7.3
hi there,

I was reading Bruce's 'PostgreSQL hardware performance tuning' article, and he suggests the ext3 filesystem with data mode = writeback for high performance.

I would really appreciate it if anyone could share their experiences with ext3 from a production standpoint, or any other suggestions for the best read/write performance.

Our application is a hybrid of heavy inserts/updates and DSS queries.

version - postgres 7.3.2
hardware - raid 5 (5 x 73 GB hardware raid), 4 GB RAM, 2 * 2.8 GHz CPU, redhat 7.3

Note: we don't have the luxury of raid 1+0 (dedicated disks) for the xlog and clog files to start with. We might look into those options down the line, but for now I've planned on putting them on local drives rather than on the raid 5.

thanks for any inputs,
Shankar
What is the URL of that article? I understood that ext2 was faster with PG, so I went to a lot of trouble creating an ext2 partition just for PG and gave up the journalling to do that. Something about double effort, since PG already does a lot of that.

Bruce, is there a final determination of which is faster/safer?

Jeff

----- Original Message -----
From: "Shankar K" <shan0075@yahoo.com>
To: <pgsql-performance@postgresql.org>
Sent: Monday, March 31, 2003 3:55 PM
Subject: [PERFORM] ext3 filesystem / linux 7.3
hi jeff,

go to http://www.ca.postgresql.org/docs/momjian/hw_performance/ under the 'filesystems' slide.

snip

File system choice is particularly difficult on Linux because there are so many file system choices, and none of them are optimal: ext2 is not entirely crash-safe, ext3, XFS, and JFS are journal-based, and Reiser is optimized for small files and does journalling. The journalling file systems can be significantly slower than ext2 but when crash recovery is required, ext2 isn't an option. If ext2 must be used, mount it with sync enabled. Some people recommend XFS or an ext3 filesystem mounted with data=writeback.

/snip

--- "Jeffrey D. Brower" <jeff@pointhere.net> wrote:
> What is the URL of that article?
On Tue, 1 Apr 2003 09:39:17 -0800 (PST) in message <20030401173917.19476.qmail@web21101.mail.yahoo.com>, Shankar K <shan0075@yahoo.com> wrote:
> hi jeff,
>
> go to
> http://www.ca.postgresql.org/docs/momjian/hw_performance/
> under 'filesystems' slide.
>

I suspect that is what he's seen.

From my experience, ext3 is only a percent or two slower than ext2 under pgbench. It saves an amazing amount of time on startup after a failure by not having to fsck to confirm that the filesystem is in a consistent state.

I believe that ext3 is a metadata journaling system, and not a data journaling system. This would indicate that the PG transactioning is complementary to the filesystem journaling, not duplication.

eric
I have heard XFS with the mount option is fastest.

---------------------------------------------------------------------------

Shankar K wrote:
> Some people recommend XFS or an ext3 filesystem mounted with
> data=writeback.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Tue, Apr 01, 2003 at 12:33:15PM -0500, Jeffrey D. Brower wrote:
> What is the URL of that article? I understood that ext2 was faster with PG
> and so I went to a lot of trouble of creating an ext2 partition just for PG
> and gave up the journalling to do that. Something about double effort since
> PG already does a lot of that.

I don't know how ext3 could be faster than ext2, since it has to do more work. But ext2 is not crash-safe, so your data could well be hosed if you come back from a crash on ext2.

Actually, I have my doubts about _any_ of the journaling filesystems for Linux: ext3 has a reputation for being slow if you journal in the real-safe mode, and there have been so many unrepeatable reiserfs problem reports that I'm loath to use it for real systems. I had exceptionally good experiences with xfs when I was administering SGI boxes, but it's not part of the standard Linux kernel distribution, and, for reasons I can't quite pin down, I think my managers would get grumpy with me for using it.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
eric soroos wrote:
> From my experience, ext3 is only a percent or two slower than ext2 under pg_bench. It saves an amazing amount of time on startup after a failure by not having to fsck to confirm that the filesystem is in a consistent state.
>
> I believe that ext3 is a metadata journaling system, and not a
> data journaling system. This would indicate that the PG
> transactioning is complimentary to the filesystem journaling,
> not duplication.

Ext3 is only metadata journaling if you set the mount flags as described.

I also don't think pgbench is the best test for testing file system performance.

-- 
  Bruce Momjian
OK so am I hearing:

XFS is the fastest (but is it the safest?) but does not come on Linux.

Ext2 does less work than Ext3 so is fastest among what DOES come with Linux - but if you have a crash that fsck can't fix you're hosed.

Ext3 is quite a bit slower if set to be real safe, a wee bit slower if run with standard options which makes it more crash-safe, and much slower if the mount flags are set to metadata journaling but that is much safer as a file system because the metadata journaling is complementary to the PG transactioning.

To determine which you want you must choose which one feels to you like the right balance of speed and the setup work you are willing to perform and maintain.

Do I have it right?

Jeff
FYI, I believe that XFS will be included in the 2.6 kernel.

Keith Bottner
kbottner@istation.com

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Andrew Sullivan
Sent: Tuesday, April 01, 2003 11:55 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3

I had exceptionally good experiences with xfs when I was admining SGI boxes, but that's not part of the standard Linux kernel distribution, and with no idea why, I think my managers would get grumpy with me for using it.
Just switch to FreeBSD and use UFS ;)

Chris

----- Original Message -----
From: "Jeffrey D. Brower" <jeff@pointhere.net>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>; "eric soroos" <eric-psql@soroos.net>
Cc: "Shankar K" <shan0075@yahoo.com>; <pgsql-performance@postgresql.org>
Sent: Wednesday, April 02, 2003 4:42 AM
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3

> OK so am I hearing:
> Just switch to FreeBSD and use UFS ;)

I must say, I found this whole discussion rather amusing on the sidelines, given it's largely a non-problem for non-Linux users.  :)

"Better performance through engineering elegance."

-sc

-- 
Sean Chittenden
seanc@FreeBSD.org
On Wednesday 02 April 2003 07:19, you wrote:
> > Just switch to FreeBSD and use UFS ;)
>
> I must say, I found this whole discussion rather amusing on the
> sidelines given it's largely a non-problem for non-Linux users. :)
>
> "Better performance through engineering elegance."

Well, this may sound like a troll, but I have said this before and will say it again: I found reiserfs to be faster than ext2, by up to 40% at times, when we tried a quasi closed-source benchmark on a quad Xeon machine with SCSI RAID. Everything else being the same, and with out-of-the-box defaults, reiserfs on Mandrake 9 was far faster than ext2 in every respect.

I personally find FreeBSD UFS to be a better combo based on my workstation tests. I believe FreeBSD has a better I/O scheduler that utilises disk bandwidth in an optimal manner. Scratching my (poor IDE) disk like mad does not happen with FreeBSD, but Linux does it plenty. But I didn't benchmark it for throughput.

Shridhar
On Tue, 2003-04-01 at 19:53, eric soroos wrote:
> I believe that ext3 is a metadata journaling system, and not a data journaling system. This would indicate that the PG transactioning is complimentary to the filesystem journaling, not duplication.

It's both. See the -o data=journal|data=ordered|data=writeback mount time option.

Andreas
On Tue, 2003-04-01 at 19:55, Andrew Sullivan wrote:
> I don't know how ext3 could be faster than ext2, since it has to do
> more work.

Depending upon certain parameters, it can be faster, because it writes the data to the journal serially, without head movement. The kernel might be able to write that data to its real location later, when the disk is idle. So yes, in certain cases, ext3 might be faster than ext2.

>
> Actually, I have my doubts about _any_ of the journaling filesystems
> for Linux: ext3 has a reputation for being slow if you journal in the

Well, "journaled filesystem" usually means only metadata journaling. ext3 is the only Linux filesystem (AFAIK) that offers full data journaling.

> real-safe mode, and there have been so many unrepeatable reiserfs
> problem reports that I'm loathe to use it for real systems. I had

Well, I've been using ReiserFS now for years, and never had any problems with it.

Andreas

-- 
Andreas Kostyrka
Josef-Mayer-Strasse 5
83043 Bad Aibling
... and what *exactly* is the difference?
On Wed, 2003-04-02 at 17:37, Jeffrey D. Brower wrote:
> ... and what *exactly* is the difference?

Between what? (How about a bit more context?)

Andreas
On Wed, Apr 02, 2003 at 05:18:26PM +0200, Andreas Kostyrka wrote:
> Well, I've been using ReiserFS now for years, and never had any problems
> with it.

Me too. But the "known failure modes" that people keep reporting have to do with completely trashing, say, a whole page of data. Your directories are fine, but the data is all hosed. I've never had it happen. I've never seen anyone who can consistently reproduce it. But I've certainly read about it often enough to have pretty serious reservations about relying on the filesystem for data I can't afford to lose.

A

-- 
Andrew Sullivan
Are there any comments on JFS regarding real-life safety and speed?
>> This would indicate that the PG transactioning is complementary to the filesystem journaling, not duplication.

> It's both. See the -o data=journal|data=ordered|data=writeback mount
> time option.

I did RTFM on that, but I am now confused again. I am wondering what the *best* setting is with ext3.

When I RTFM the man page for mount, the data=writeback option says plainly that it is the fastest, but that after a crash old data can quite possibly appear in files. The safest *looks* to be data=journal, since the journaling happens before writes are committed to the file (and presumably the journal is then used to apply each entry to the file on disk?). The default is data=ordered, which says write to the disk AND THEN to the journal (which seems bizarre to me).

How all of that works WITH and/or AGAINST PostgreSQL, and what metadata REALLY means, is my bottom-line quandary. Obviously that is where the warm and fuzzy place between speed and safety is to be found.

Jeff
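The three ext3 modes Jeff is weighing can be summarised in a small Python sketch. The helper name `effective_data_mode` and the one-line summaries are illustrative only, not taken from any tool; the one behaviour it relies on is the one Jeff read in the mount man page, that ext3 uses data=ordered when no data= option is given.

```python
# Sketch only: a hypothetical helper, not part of PostgreSQL or any Linux tool.
# The one-line summaries paraphrase the mount(8) descriptions of ext3's data= modes.
EXT3_DATA_MODES = {
    "journal": "file data and metadata are both written to the journal first",
    "ordered": "only metadata is journaled; file data is forced to disk "
               "before the related metadata is committed to the journal",
    "writeback": "only metadata is journaled; file data is not ordered "
                 "against the journal, so stale data can show up in files "
                 "after a crash",
}
DEFAULT_MODE = "ordered"  # what ext3 uses when no data= option is given


def effective_data_mode(options: str) -> str:
    """Return the ext3 data= mode implied by a comma-separated option string.

    If data= appears more than once, this sketch keeps the last value.
    """
    mode = DEFAULT_MODE
    for opt in options.split(","):
        opt = opt.strip()
        if opt.startswith("data="):
            value = opt[len("data="):]
            if value not in EXT3_DATA_MODES:
                raise ValueError(f"unknown ext3 data mode: {value!r}")
            mode = value
    return mode


if __name__ == "__main__":
    for opts in ("defaults", "noatime,data=writeback", "rw,data=journal"):
        mode = effective_data_mode(opts)
        print(f"{opts:24} -> {mode}: {EXT3_DATA_MODES[mode]}")
```

Note that ordered and writeback both journal only metadata; they differ in when file data reaches the disk relative to the journal commit, which is why writeback is the faster but less careful of the two.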
Jeff,

> How all of that works WITH and/or AGAINST PostgreSQL and what metadata
> REALLY means is my bottom line quandary. Obviously that is where finding
> the warm and fuzzy place between speed and safety is found.

For your $PGDATA directory, your only need for filesystem journaling is to prevent a painful fsck process on an unexpected power-out. You are not, as a rule, terribly concerned with journaling the data, as PostgreSQL already provides some data recovery protection through WAL.

As a result, on my one server where I have to use Ext3 (I use Reiser on most machines, and have never had a problem except for one disaster when upgrading Reiser versions), the $PGDATA is mounted "noatime,data=writeback".

(BTW, I found that combining "data=writeback" with Linux LVM on RedHat 8.0 resulted in system-fatal mounting errors. Anyone else have this problem?)

Of course, if you have a machine with a $60,000 disk array and disk I/O is unlimited, then maybe you want to enable data=journal just for the protection against corruption of the WAL and clog files.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco
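To confirm that the filesystem holding $PGDATA really carries options like Josh's "noatime,data=writeback", one could look it up in the kernel's mount table. This is a minimal Python sketch assuming the /proc/mounts line format (device, mount point, type, options, dump, pass); the device names, the /var/lib/pgsql paths and the `mount_for` helper are hypothetical, and whether a particular kernel lists the data= option there at all is not something the sketch can promise.

```python
import os

# Hypothetical sample in /proc/mounts format:
# device, mount point, fs type, options, dump, pass
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /var/lib/pgsql ext3 rw,noatime,data=writeback 0 0
/dev/sdc1 /var/lib/pgsql-wal ext3 rw,data=journal 0 0
"""


def mount_for(path, mounts_text):
    """Return (mount_point, fstype, options) for the mount holding path.

    Picks the longest mount point that is a path prefix of `path`.
    Symlinks are not resolved, and octal escapes such as \\040 in mount
    points are not decoded; both are simplifications in this sketch.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mnt, fstype, options = fields[:4]
        inside = mnt == "/" or path == mnt or path.startswith(mnt + "/")
        if inside and (best is None or len(mnt) > len(best[0])):
            best = (mnt, fstype, options.split(","))
    return best


if __name__ == "__main__":
    mnt, fstype, opts = mount_for("/var/lib/pgsql/data", SAMPLE_MOUNTS)
    print(mnt, fstype, "noatime" in opts, "data=writeback" in opts)
    # prints: /var/lib/pgsql ext3 True True
    # On a live system, pass open("/proc/mounts").read() instead of SAMPLE_MOUNTS.
```

Matching on a whole path component (`mnt + "/"`) rather than a bare string prefix keeps /var/lib/pgsql-wal from being mistaken for a parent of /var/lib/pgsql/data.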
Thanks for that Josh.

I had previously understood that ext3 was a bad thing with PostgreSQL, and I went way above and beyond to create it on an ext2 filesystem (the only one on the server) and mount that.

Should I undo that work and go back to ext3?

Jeff

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: "Jeffrey D. Brower" <jeff@pointhere.net>; "Andreas Kostyrka" <andreas@mtg.co.at>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; <pgsql-performance@postgresql.org>; "Shankar K" <shan0075@yahoo.com>; "eric soroos" <eric-psql@soroos.net>
Sent: Wednesday, April 02, 2003 3:05 PM
Subject: Re: [PERFORM] ext3 filesystem / linux 7.3

> For your $PGDATA directory, your only need for filesystem journaling is to
> prevent a painful fsck process on an unexpected power-out.
Jeff,

> Thanks for that Josh.

Welcome

> I had previously understood that ext3 was a bad thing with PostgreSQL and I
> went way above and beyond to create it on an Ext2 filesystem (the only one
> on the server) and mount that.
>
> Should I undo that work and go back to Ext3?

I would. Not necessarily Ext3, mind you; you might want to consider Reiser or JFS, too. My experience has been better with Reiser than Ext3 with Postgres, but I can't back that up with any statistics.

(DISCLAIMER: This is not professional advice, and comes with no warranty. If you want professional advice, pay me.)

-- 
-Josh Berkus
On Tuesday, April 1, 2003, at 03:42 PM, Jeffrey D. Brower wrote:

> OK so am I hearing:

Enough...

...there is waaay too much hearsay going on in this thread. Let's come up with an acceptable test battery and actually settle it once and for all with good hard numbers. It would be worth my while to spend some time on this, since the developers I support currently hate pgsql due to performance complaints (on servers that predate my employment there). If I am going to move them to better servers, I should do some homework on which OS and FS is best.

I'm not qualified at all to define the tests. I am willing to try it on any OS that will run on a Sun Ultra 5, which would include Linux, several BSDs and Solaris, to name a few. It also runs the gamut of filesystems that have been talked about here. The machine isn't a barnstormer, but I'm willing to put in an 18GB SCSI drive and try this with many different OSes and FSes if someone qualified will put together an acceptable test suite and it doesn't meet with too much opposition from the gurus here.

The test machine:
Sun UltraSPARC 5
333MHz UltraSPARC CPU, 2MB cache
256MB RAM
whatever SCSI card I can find most quickly
either a 9GB or 18GB SCSI drive (whichever I can find most quickly)

The test client would likely be an Apple PowerBook G4 800MHz, 512MB, running OS X 10.2.4. Yes, the client runs rings around the server, but I can afford to abuse the server.

While the server is admittedly an older machine, for the purpose of this test it should not matter as long as the hardware configuration is equal for all tests. If we agree on a test suite, there is nothing to stop someone from running the same suite on their own hardware and reporting their own results.

Anyone game to give this a go?

-- 
"What difference does it make to the dead, the orphans and the homeless, whether the mad destruction is wrought under the name of totalitarianism or the holy name of liberty or democracy?"
- Mahatma Gandhi
On Wed, Apr 02, 2003 at 09:44:31PM -0500, Chris Hedemark wrote:
> While the server is admittedly an older machine, for the purpose of
> this test it should not matter as long as the hardware configuration is
> equal for all tests. If we agree on a test suite there is nothing to

That's false. One of the big problems with a lot of tuning info is that it tends not to take hardware, &c., into consideration. I can tell you for sure that if you have a giant-cache array connected by fibre channel, _it makes no difference_ what the filesystem is. The array is so fast that you can't really fill the cache under normal load anyway. Similarly, if you have enough memory, every read test is going to be as fast as any other: you'll get 100% cache hits, and the same memory configured the same way will always respond at about the same speed.

That said, I think you're right to demand some tests, and to say that holding the machine constant and changing filesystems is a good filesystem test. So here are some suggested things, in no real order:

1.  Make sure you run out of buffers before you start to read (for read filesystem speed tests).

2.  Pull the power plug repeatedly while the server is under load. Judge robustness.

3.  Put WAL and data area on different filesystems (to be fair, this should probably be different spindles, but I'll take what I can get) and configure the filesystems in various ways (including, say, writeback for data and full journalling for WAL). See tests above.

4.  Make sure your controller doesn't lie about fsync.

5.  Test under different loads: 10% writes vs. 90% reads; 20% writes; &c. Compare simple INSERT writes with UPDATE writes. Compare UPDATE writes where the UPDATEd row is the same one over and over. Make sure you do (2) several times.

Lots of these are artificial. But it seems they might reveal something. I'd be particularly keen to hear about what _really_ is up with reiserfs.
A

-- 
Andrew Sullivan
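Andrew's point 4 can be probed with a crude timing test: rewrite one block and fsync it repeatedly, then compare the rate with the disk's rotation speed. Rewriting the same block durably costs roughly one platter rotation, so a bare disk honouring fsync should manage no more than about rpm / 60 of them per second. This is a Python sketch under those assumptions; the helper names and the 7,200 rpm default are hypothetical, and on ext3 each fsync also touches the journal, so treat the numbers as rough.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8 kB block, PostgreSQL's default page size


def fsyncs_per_second(directory, count=200):
    """Rewrite the same 8 kB block `count` times, fsync()ing after each write.

    Returns the observed fsync rate. Run it with `directory` on the
    filesystem (and controller) under test.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, BLOCK)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / max(elapsed, 1e-9)


def looks_cached(rate, rpm=7200):
    """Rough heuristic: a bare disk that honours fsync should not exceed
    about rpm / 60 rewrites of one block per second. Much higher rates
    point to a write cache somewhere (fine if battery-backed, a lie
    about fsync if not)."""
    return rate > rpm / 60.0
```

For example, run `fsyncs_per_second` against a directory on the array under test (say, the parent of $PGDATA). A 7,200 rpm disk reporting thousands of fsyncs per second is caching writes somewhere, and the next question is whether that cache survives a pulled plug, which is Andrew's test 2.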
Chris,

> ...there is waaay too much hearsay going on in this thread. Let's come
> up with an acceptable test battery and actually settle it once and for
> all with good hard numbers. It would be worth my while to spend some
> time on this since the developers I support currently hate pgsql due to
> performance complaints (on servers that predate my employment there).
> So if I am going to move them to better servers it would be worth my
> while to do some homework on what OS and FS is best.

You're not going to be able to determine this for certain, but at least you should be able to debunk some myths. Here are my suggested tests:

1) Read-only test -- numerous small rapid-fire queries in the fashion of a PHP web application. PGBench already does this one test OK; maybe you could use that.

2) Complex query test -- run a few 12-table queries with CASE statements, custom functions and subselects and/or UNIONs.

3) Transaction test -- hit the database with numerous rapid-fire single-row updates to a few tables.

4) OLAP test -- do a few massive updates to thousands of rows based on related data and/or cascading updates to multiple tables and dozens-hundreds of rows. Create large temp tables based on Joe Conway's crosstab.

5) Mixed use test -- combine 1, 2, & 3 in a ratio of 70% 10% 20% on several simultaneous connections.

Of course, this requires us to have a sample database with at least 100,000 rows of data in one or two tables plus at least 5-10 additional tables with realistically complex relationships. Donor, anyone?

Also, we'll have to talk about .conf files ...

-- 
-Josh Berkus
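Test 5's 70/10/20 mix could be driven by a seeded scheduler, so that every OS/filesystem run replays exactly the same sequence of requests. This is a Python sketch; the labels stand in for the test queries Josh offered to write, and `workload_schedule` is a hypothetical name.

```python
import random
from collections import Counter

# Josh's test 5: read-only (1), complex query (2) and transaction (3)
# workloads mixed 70% / 10% / 20%. The labels are placeholders for the
# real test queries, which have not been written yet.
MIX = {"read_only": 70, "complex_query": 10, "transaction": 20}


def workload_schedule(n, mix=MIX, seed=0):
    """Return n workload labels drawn independently with the given weights.

    A fixed seed makes the sequence repeatable, so every OS/filesystem
    combination replays exactly the same mix.
    """
    rng = random.Random(seed)
    kinds = list(mix)
    return rng.choices(kinds, weights=[mix[k] for k in kinds], k=n)


if __name__ == "__main__":
    # One schedule per simulated connection, each with its own seed.
    schedules = [workload_schedule(1000, seed=conn) for conn in range(4)]
    totals = Counter(label for s in schedules for label in s)
    print(totals)
```

Drawing each request independently gives the 70/10/20 split on average rather than exactly; shuffling a fixed list of 70, 10 and 20 labels would make it exact, if that matters for comparing runs.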
In message <200304022133.44511.josh@agliodbs.com>, Josh Berkus writes:

> Chris,
>
> You're not going to be able to determine this for certain, but at least you
> should be able to debunk some myths. Here's my suggested tests:
> [...]
> Also, we'll have to talk about .conf files ...

When I installed my postgres, I tried a test program I wrote with all four values of wal_sync_method. On my RedHat Linux 8.0 ext3 filesystem (default mount options), with my toy test, open_sync performed the best.

Thus, I would suggest adding wal_sync_method as another axis for your testing.

-Seth Robertson
 seth@sysd.com
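Seth's suggestion turns the battery into a grid: each filesystem and mount-option combination crossed with each wal_sync_method value. This is a Python sketch of enumerating those runs; the filesystem list is drawn from the ones discussed in this thread, and the shape of each run description (and the `build_matrix` name) is an assumption.

```python
from itertools import product

# Hypothetical axes for the test battery discussed in this thread.
FILESYSTEMS = [
    ("ext2", "defaults"),
    ("ext3", "data=ordered"),
    ("ext3", "noatime,data=writeback"),
    ("ext3", "data=journal"),
    ("reiserfs", "defaults"),
    ("xfs", "defaults"),
    ("jfs", "defaults"),
]
# The four wal_sync_method settings Seth tried; not every platform
# supports all of them.
WAL_SYNC_METHODS = ["fsync", "fdatasync", "open_sync", "open_datasync"]


def build_matrix(filesystems=FILESYSTEMS, methods=WAL_SYNC_METHODS):
    """Return one run description per (filesystem, wal_sync_method) pair."""
    return [
        {
            "fs": fs,
            "mount_options": opts,
            "conf_line": f"wal_sync_method = {method}",
        }
        for (fs, opts), method in product(filesystems, methods)
    ]


if __name__ == "__main__":
    for run in build_matrix():
        print(f"{run['fs']:9} {run['mount_options']:24} {run['conf_line']}")
```

Each run would then mean remounting the data area with the listed options, putting the `conf_line` into postgresql.conf and restarting the postmaster before running the tests. Seven filesystem setups times four sync methods is already 28 runs, before varying load or hardware.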
On Thursday, April 3, 2003, at 12:33 AM, Josh Berkus wrote:

> You're not going to be able to determine this for certain, but at least you
> should be able to debunk some myths. Here's my suggested tests:

[snip]

Being a mere sysadmin, I'll have to ask someone else with more of a development bent to help with creating the test cases (perl script, maybe?). My talent is more along the lines of system administration. Plus, I am willing to take the time to go through these tests over & over with a different OS, or different tuning parameters on the same OS, different FSes, etc. Someone else needs to come up with the test code.

The client machine also has pgsql on it, if the results are going into a db that won't go away after every test. :)
On Wed, 2 Apr 2003, Josh Berkus wrote: > > I had previously understood that ext3 was a bad thing with PostgreSQL and I > > went way above and beyond to create it on an Ext2 filesystem (the only one > > on the server) and mount that. We recently started using Postgres on a new database server running RH 7.3 and ext3. Due to some kernel problems the machine would crash at random times. Each time it crashed, it came back up extremely easily with no data loss. If we had been on ext2, coming back up after a crash probably wouldn't have been quite as easy. We have since given up on RH 7.3 and gone with RH Enterprise ES. Just an FYI for any of you out there thinking about moving to RH 7.3, or those who are having problems with 7.3 and ext3. Chris
Chris, > Being a mere sysadmin, it is creation of the test cases (perl script, > maybe?) that I'll have to ask someone else with more of a development > bent to help with. I'll write the test queries and perl scripts if someone else can supply the database. Unfortunately, while I have a few databases that meet the criteria, they are all NDA. Criteria again: Must have at least 100,000 rows with 12+ columns in "main" table. Must have at least 10-12 additional tables, some with FK relationships to the main table and each other. Must be OK to make contents public. More is better up to 500MB. -- Josh Berkus Aglio Database Solutions San Francisco
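The criteria above can be written down as a checklist that anyone offering a database could run against it. A minimal sketch, assuming a hypothetical TableInfo summary per table (row count, column count, names of the tables it references) collected however is convenient. Whether the contents may be made public can only be answered by a person, so it is passed in as a flag, and the 500MB figure is a preference rather than a requirement, so it is not checked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableInfo:
    """Hypothetical per-table summary; gather it however is convenient."""
    name: str
    rows: int
    columns: int
    fk_targets: List[str] = field(default_factory=list)  # tables this one references


def check_criteria(tables: List[TableInfo], main: str, public_ok: bool) -> List[str]:
    """Return the criteria a candidate database fails (empty list = suitable)."""
    by_name = {t.name: t for t in tables}
    if main not in by_name:
        return [f"main table {main!r} not found"]
    m = by_name[main]
    others = [t for t in tables if t.name != main]
    problems = []
    if m.rows < 100_000:
        problems.append(f"main table has {m.rows} rows (need 100,000+)")
    if m.columns < 12:
        problems.append(f"main table has {m.columns} columns (need 12+)")
    if len(others) < 10:
        problems.append(f"only {len(others)} additional tables (need 10-12)")
    if not any(main in t.fk_targets for t in others):
        problems.append("no additional table has a foreign key to the main table")
    # "and each other": at least one FK between two of the additional tables
    if not any(ref not in (main, t.name) for t in others for ref in t.fk_targets):
        problems.append("no foreign keys between the additional tables")
    if not public_ok:
        problems.append("contents cannot be made public")
    return problems
```

An empty list means the candidate meets every checkable criterion; otherwise each string names one shortfall.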
On Thursday, April 3, 2003, at 11:52 AM, Josh Berkus wrote: > Unfortunately, while I have a few databases that meet the > criteria, they are all NDA. I'm in the same boat.
Can't we generate data? Random data stored in random formats at random sizes would stress the file system, wouldn't it?
On Thu, 3 Apr 2003, Chris Sutton wrote: [snip] > We have since given up on RH 7.3 and gone with RH Enterprise ES. We're still running RH 7.2 due to issues we had with 7.3 as well.
Jeffery, > Can't we generate data? Random data stored in random formats at random > sizes would stress the file system wouldn't it? In my experience, randomly generated data tends to resemble real data very little in distribution patterns and data types. This is one of the limitations of PGBench. Surely there must be an OSS project out there with a medium-large PG database which is OSS-licensed? I'll post on GENERAL. -- -Josh Berkus Aglio Database Solutions San Francisco
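To make the distribution point concrete, here is a rough Python comparison of a uniformly random column with a skewed one. The Zipf-like weighting (probability proportional to 1/rank) is an assumed stand-in for real-world skew, not something measured from any database mentioned in this thread. With 100,000 rows over 1,000 distinct values, the ten most common values cover only about 1% of the uniform column but close to 40% of the skewed one, which changes index selectivity and which pages stay hot in cache.

```python
import random
from collections import Counter
from typing import List


def top_k_share(values: List[int], k: int = 10) -> float:
    """Fraction of all rows taken up by the k most common values."""
    counts = Counter(values)
    return sum(c for _, c in counts.most_common(k)) / len(values)


def uniform_column(n_rows: int, n_distinct: int, rng: random.Random) -> List[int]:
    """Every value equally likely: the naive 'random data' approach."""
    return [rng.randrange(n_distinct) for _ in range(n_rows)]


def zipf_column(n_rows: int, n_distinct: int, rng: random.Random, s: float = 1.0) -> List[int]:
    """Value i drawn with probability proportional to 1 / (i + 1) ** s,
    an assumed stand-in for the skew found in real tables."""
    weights = [1.0 / (i + 1) ** s for i in range(n_distinct)]
    return rng.choices(range(n_distinct), weights=weights, k=n_rows)


if __name__ == "__main__":
    rng = random.Random(42)
    n_rows, n_distinct = 100_000, 1_000
    print("uniform top-10 share:", round(top_k_share(uniform_column(n_rows, n_distinct, rng)), 3))
    print("skewed  top-10 share:", round(top_k_share(zipf_column(n_rows, n_distinct, rng)), 3))
```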
Hi Scott, Could you please share with us the problems you had with Red Hat Linux 7.3? I would be really interested to know the kernel configs and ext3 filesystem modes. Shankar --- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > We're still running RH 7.2 due to issues we had with > 7.3 as well.
On Thu, 3 Apr 2003, Shankar K wrote: > Hi Scott, > > Could you please share with us the problems you had > with linux 7.3 [snip] Actually, I had a couple of problems with it, one of which was that I couldn't get it to boot with ext3 file systems properly. I think it had something to do with ext3 on Linux kernel RAID sets not working right. There's probably a fix for it, but 7.2 is pretty stable, and we can wait for 8.0 or maybe look at another distro. I remember there being some other configuration issues like this, but it's been many months since I played with it and I can't remember them all. My personal problem was that Red Hat stopped including linuxconf as an RPM package, and the only configuration programs they include don't seem to work well from a command line; they seem to prefer being used in X11.
Hey guys, On Thu, 2003-04-03 at 13:19, scott.marlowe wrote: [snip] Normally I stay far, far away from the distro wars / filesystem discussions. However, I'd like to offer information about the systems we use here at OFS. The two core database servers are a matched pair of systems with the following specifications: Dual AMD MP 1800s; Tyan Thunder K7x motherboard; LSI MegaRAID Elite 1650 controller w/ battery pack & 128 MB cache; 5 Seagate Cheetah 10k 36 GB drives, configured in RAID 1+0 w/ hot spare. Both are using the stock Red Hat 7.3 kernel w/ the latest LSI MegaRAID drivers and firmware. The PostgreSQL cluster itself contains the records and information necessary to process loans and loan applications. We are using rserv (from contrib) to replicate data from three databases in the cluster between the two servers. (Hahah, I think we may be the only people using this in production or something.) At any rate, we use ext3 on the filesystems and we've had no problems at all with the systems. Everything is stable and runs. We keep the machines running and available 24/7, with scheduled downtime transitions to the redundant servers as we need them for whatever kind of enhancements. The largest table in the cluster, btw, has 4.2 million tuples in it, and it's the rserv log table. Hope this gives you some additional information to base your decisions on. Sincerely, Will LaShell
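For comparison with the original poster's RAID 5 box, a quick capacity and write-cost calculation. The write-penalty figures are the usual textbook ones for a single small random write with no cache (RAID 1+0 writes both mirrors; RAID 5 reads old data and old parity, then writes new data and new parity). A battery-backed controller cache like the one on these MegaRAID cards can hide much of that difference in practice. The function name is illustrative only.

```python
# Disk I/Os needed for one small random write with no controller cache
# (textbook figures: RAID 1+0 writes both mirrors; RAID 5 reads old data and
# old parity, then writes new data and new parity).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def usable_gb(level: str, drives: int, size_gb: int, hot_spares: int = 0) -> int:
    """Usable capacity of an array, ignoring formatting overhead."""
    active = drives - hot_spares
    if level == "raid10":
        if active < 4 or active % 2:
            raise ValueError("RAID 1+0 needs an even number (4+) of active drives")
        return active // 2 * size_gb   # half the drives hold mirror copies
    if level == "raid5":
        if active < 3:
            raise ValueError("RAID 5 needs at least 3 active drives")
        return (active - 1) * size_gb  # one drive's worth of space holds parity
    raise ValueError(f"unknown RAID level {level!r}")


if __name__ == "__main__":
    print("OFS, 5 x 36 GB RAID 1+0 + 1 spare:", usable_gb("raid10", 5, 36, hot_spares=1), "GB")
    print("Shankar, 5 x 73 GB RAID 5:", usable_gb("raid5", 5, 73), "GB")
```

So the OFS arrays trade capacity (72 GB usable from 180 GB raw) for half the per-write disk work of the 292 GB RAID 5 array.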
Will, > At any rate we use ext3 on the filesystems and we've had no problems at > all with the systems. Everything is stable and runs. We keep the > machines running and available 24/7 with scheduled downtime transitions > to the redundant servers as we need to for whatever kind of > enhancements. Hey, can we use you as a case study for advocacy.openoffice.org? -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
On Wed, 2003-04-02 at 17:56, Andrew Sullivan wrote: > On Wed, Apr 02, 2003 at 05:18:26PM +0200, Andreas Kostyrka wrote: > > > Well, I've been using ReiserFS now for years, and never had any problems > > with it. > > Me too. But the "known failure modes" that people keep reporting > about have to do with completely trashing, say, a whole page of data. > Your directories are fine, but the data is all hosed. > > I've never had it happen. I've never seen anyone who can > consistently reproduce it. But I've certainly read about it often > enough to have pretty serious reservations about relying on the > filesystem for data I can't afford to lose. Well, then backups and statistics are your only solution. The only way to know if something works is to test it for some time. (You never know whether something in your usage will trigger some border case of malfunction in the kernel.) Andreas -- Andreas Kostyrka Josef-Mayer-Strasse 5 83043 Bad Aibling
Yes, I think we'd be willing to do that. (480 967 7530) is the phone contact for the company; the IT manager is Trevor Mantle, and you can ask for me as well. wlashell@outsourcefinancial.com is my work email; feel free to use it. Sincerely, Will LaShell On Thu, 2003-04-03 at 16:12, Josh Berkus wrote: > Will, > [snip] > Hey, can we use you as a case study for advocacy.openoffice.org?
We've had two crashes on Red Hat 7.3 in about nine months of running. Both instances required a manual power off/on of the server, but everything came up nice and ready to go. The problems seemed to stem from I/O load in the kernel (not PostgreSQL-specific), but should be resolved with the latest Red Hat kernel. If you search for buffer_jdirty in Bugzilla you'll see a couple of reports. Robert Treat On Thu, 2003-04-03 at 14:45, Shankar K wrote: > Hi Scott, > > Could you please share with us the problems you had > with linux 7.3 > > would be really interested to know the kernel configs > and ext3 filesystem modes [snip]
Josh Berkus wrote: > Jeffery, > > > Can't we generate data? Random data stored in random formats at random > > sizes would stress the file system wouldn't it? > > In my experience, randomly generated data tends to resemble real data very > little in distribution patterns and data types. This is one of the > limitations of PGBench. Okay, from this it sounds like what we need is information on the data types typically used for real-world applications and information on the distribution patterns for each type (the latter could get quite complex and varied, I'm sure, but since we're after something that's typical, we only need a few examples). So perhaps the first step in this is to write something that will show what the distribution pattern for data in a table is? With that information, we *could* randomly generate data that would conform to the statistical patterns seen in the real world. In fact, even though the databases you have access to are all proprietary, I'm pretty sure their owners would agree to let you run a program that would gather statistical distribution information about them. Then (as long as they agree) you could copy the schema itself, recreate it on the test system, and randomly generate the data. -- Kevin Brown kevin@sysexperts.com
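A minimal sketch of Kevin's two-step idea for a single categorical column, assuming the owner agrees to run the profiling step on the real data. Only the relative frequencies of the most common values (plus the size of the long tail) leave the building; the generator on the test system then emits synthetic labels with the same frequencies. The function names and the profile format are hypothetical. PostgreSQL's planner statistics keep a broadly similar most-common-values list per column, but a real tool would also need histograms for numeric columns and correlations between columns, which this sketch ignores.

```python
import random
from collections import Counter


def profile_column(values: list, top_n: int = 100) -> dict:
    """Run on the real data. Keeps only the relative frequencies of the top_n
    values and the size of the remaining tail, never the values themselves."""
    if not values:
        raise ValueError("cannot profile an empty column")
    ranked = Counter(values).most_common()
    total = len(values)
    head = [count / total for _, count in ranked[:top_n]]
    tail_rows = sum(count for _, count in ranked[top_n:])
    return {"head": head, "tail_share": tail_rows / total,
            "tail_distinct": len(ranked) - len(head)}


def generate_column(profile: dict, n_rows: int, rng: random.Random) -> list:
    """Run on the test system. Emits synthetic labels ('v0', 'v1', ... for the
    common values, 't0', 't1', ... for the tail) with the profiled frequencies."""
    labels = [f"v{i}" for i in range(len(profile["head"]))]
    weights = list(profile["head"])
    if profile["tail_distinct"]:
        labels.append(None)  # marker meaning "some value from the tail"
        weights.append(profile["tail_share"])
    out = []
    for label in rng.choices(labels, weights=weights, k=n_rows):
        if label is None:
            # Assumption: tail values are spread evenly across the tail.
            label = f"t{rng.randrange(profile['tail_distinct'])}"
        out.append(label)
    return out
```

The dictionary returned by profile_column is the only thing that would need to be handed over; it reveals the shape of the data but not its contents.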
Kevin, > So perhaps the first step in this is to write something that will show > what the distribution pattern for data in a table is? With that > information, we *could* randomly generate data that would conform to > the statistical patterns seen in the real world. Sure. But I think it'll be *much* easier just to use portions of the FCC database. You want to start working on converting it to PostgreSQL? -- Josh Berkus Aglio Database Solutions San Francisco