Thread: Raw devices vs. Filesystems
Can you tell me (or at least guide me to a place where I can find the answer) what the benefits of filesystems over raw devices are? And what filesystem is the best for postgresql performance?
Hello Jaime,

I think you're on the right track but may have gotten some concepts confused. As I remember, the original email asked if Postgres could be run in a "raw" mode. Another submitter told us that it cannot. (Did I read that correctly, everyone?) This means running Postgres in a raw or character mode versus the "standard" block mode, as you would handle other files in a filesystem. The primary advantage of a raw mode is speed. ORACLE allows this method of operation.

So ... your last question is now rather moot. For best performance, ensure your setup parameters and Linux kernel parameters are optimized. Help with that is a few keystrokes from anywhere.

Terry

Jaime Casanova wrote:
> Can you tell me (or at least guide me to a place where I can find the
> answer) what are the benefits of filesystems over raw devices?
>
> And what filesystem is the best for postgresql performance?

--
Terry L. Hampton
Project Manager
LimaCorp, LLC
www.limacorp.com
513.587.1874
After takin a swig o' Arrakan spice grog, el_vigia_ec@hotmail.com ("Jaime Casanova") belched out:
> Can you tell me (or at least guide me to a place where I can find the
> answer) what are the benefits of filesystems over raw devices?

For PostgreSQL, filesystems have the merit that you can actually use them. PostgreSQL doesn't support use of "raw devices."

Two major benefits of using filesystems as opposed to raw devices are that:

a) The use of raw devices is dramatically non-portable; you have to reimplement data access on every platform you are trying to support;

b) The use of raw devices essentially mandates that you implement some form of generic filesystem on top of them, which adds considerable complexity to your code.

Two benefits to raw devices are claimed...

c) It's faster. But that assumes that the "cooked" filesystems are implemented fairly badly. That was typically true a dozen years ago, but it isn't so typical now, particularly with a fancy caching controller.

d) It guarantees application control of update ordering. Of course, with a caching controller, or disk drives that lie to one degree or another, those guarantees might be gone anyway.

There are other filesystem advantages, such as

e) Shifting "cooked" data around may be as simple as a "mv," whereas reorganizing on raw disk requires DB-specific tools...

> And what filesystem is the best for postgresql performance?

That would depend, assortedly, on what OS you are using, what kind of hardware you are running on, what kind of usage patterns you have, as well as on how you define the notion of "best."

Absent any indication of any of those things, the best that can be said is "that depends..."
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://cbbrowne.com/info/languages.html
TTY Message from The-XGP at MIT-AI:
The-XGP@AI 02/59/69 02:59:69
Your XGP output is startling.
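Point (d) in the message above, application control of update ordering, is something a program sitting on a filesystem can still approximate by flushing one file before touching the next. What follows is only a minimal sketch of that idea in Python, not a description of how PostgreSQL or any other database does it: the function name `ordered_write` and the file names are hypothetical, and, as the message notes, a caching controller or a drive that lies about completion can still undo the ordering that `os.fsync` asks for.

```python
import os
import tempfile


def ordered_write(log_path: str, data_path: str, record: bytes) -> None:
    """Append record to a log file, flush it, and only then append it to a data file."""
    # Step 1: write the log record and ask the OS to push it to stable storage.
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()             # drain Python's userspace buffer
        os.fsync(log.fileno())  # ask the kernel to flush its cache for this file

    # Step 2: touch the data file only after the log flush has returned.
    with open(data_path, "ab") as data:
        data.write(record)
        data.flush()
        os.fsync(data.fileno())


if __name__ == "__main__":
    # Hypothetical file names inside a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        log_file = os.path.join(workdir, "example.log")
        data_file = os.path.join(workdir, "example.dat")
        ordered_write(log_file, data_file, b"row 1\n")
        with open(data_file, "rb") as fh:
            print(fh.read())
```

The ordering here comes purely from the program waiting for the first `fsync` to return before issuing the second write; whether the hardware honours that is outside the program's control.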
No point to beating a dead horse (other than the sheer joy of the thing) since postgres does not have raw device support, but ...

Raw devices, at least on Solaris, are about 10 times as fast as cooked file systems for Informix. This might still be a gain for postgres' performance, but the portability issues remain.

Raw device use in Informix is safer in terms of data because Informix does not ever have to use the regular file system, and so issues of buffering and so on go away. My understanding -- fortunately not ever tried in the real world -- is that postgres' WAL log system is as reliable as Informix writing to raw devices.

Raw devices can't be copied or tampered with using regular file tools (mv, cp etc.); this changes how backups get done but also adds a layer of insulation between valuable data and users.

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: Christopher Browne [mailto:cbbrowne@acm.org]
Sent: Mon 3/29/2004 10:28 AM
To: pgsql-admin@postgresql.org
Cc:
Subject: Re: [ADMIN] Raw devices vs. Filesystems

> For PostgreSQL, filesystems have the merit that you can actually use
> them. PostgreSQL doesn't support use of "raw devices."
> [...]
gsw@globexplorer.com ("Gregory S. Williamson") writes: > No point to beating a dead horse (other than the sheer joy of the > thing) since postgres does not have raw device support, but ... raw > devices, at least on solaris, are about 10 times as fast as cooked > file systems for Informix. This might still be a gain for postgres' > performance, but the portability issues remain. That claim seems really rather remarkable. It implies an entirely stunning degree of inefficiency in the implementation of filesystems on Solaris. The amount of indirection involved in walking through i-nodes and such is something I would expect to introduce some percentage of performance loss, but for it to introduce overhead of over 900% presumably implies that Sun (and/or Veritas) got something really horribly wrong. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://www.ntlug.org/~cbbrowne/nonrdbms.html Rules of the Evil Overlord #1. "My Legions of Terror will have helmets with clear plexiglass visors, not face-concealing ones." <http://www.eviloverlord.com/>
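The step from "about 10 times as fast" to "overhead of over 900%" is plain arithmetic: if the raw path takes time t, the cooked path takes 10t, so the extra cost relative to raw is (10t - t) / t = 9, or 900%. A small Python sketch of that calculation follows; the function name `overhead_percent` is made up for illustration, and the factor of 10 is simply the figure claimed in the quoted message, not a measurement.

```python
def overhead_percent(speedup: float) -> float:
    """Extra time the slower path costs, as a percentage of the faster path's time.

    speedup is how many times faster the fast path is (10 for "10 times as fast").
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (speedup - 1.0) * 100.0


if __name__ == "__main__":
    # The claimed factor from the thread: raw devices about 10x faster.
    print(overhead_percent(10))  # 900.0
```

A factor only slightly above 10 is what pushes the figure "over" 900%.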
Remarkable, perhaps, to you. Not in the Informix world. But irrelevant to postgres, no?

-----Original Message-----
From: Chris Browne [mailto:cbbrowne@acm.org]
Sent: Tuesday, April 06, 2004 1:57 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Raw devices vs. Filesystems

> That claim seems really rather remarkable.
> [...]
Note that the inefficiency could as easily lie with Informix's file system interfacing as with the operating system. Do they charge extra for being able to access raw devices or somehow make more money by supporting them? If so, there could be a clear business case for lots of uwaits() in the code path that handles file systems. I'm just saying it's a possibility.

On Tue, 6 Apr 2004, Gregory S. Williamson wrote:
> Remarkable, perhaps, to you. Not in the Informix world. But irrelevant to postgres, no ?
> [...]
> gsw@globexplorer.com ("Gregory S. Williamson") writes: >>No point to beating a dead horse (other than the sheer joy of the >>thing) since postgres does not have raw device support, but ... raw >>devices, at least on solaris, are about 10 times as fast as cooked >>file systems for Informix. This might still be a gain for postgres' >>performance, but the portability issues remain. > From: Chris Browne [mailto:cbbrowne@acm.org] > That claim seems really rather remarkable. > It implies an entirely stunning degree of inefficiency in the > implementation of filesystems on Solaris. > The amount of indirection involved in walking through i-nodes and such > is something I would expect to introduce some percentage of > performance loss, but for it to introduce overhead of over 900% > presumably implies that Sun (and/or Veritas) got something really > horribly wrong. Gregory S. Williamson wrote: > Remarkable, perhaps, to you. Not in the Informix world. But > irrelevant to postgres, no ? I too am a little surprised by those numbers, but I think the potential for a performance gain of that order is relevant. As I once heard someone remark: "When show up at a pool hall talking those kind of odds, well, people start making phone calls." - Marsh
Chris Browne <cbbrowne@acm.org> writes: > That claim seems really rather remarkable. > It implies an entirely stunning degree of inefficiency in the > implementation of filesystems on Solaris. Solaris has a reputation for having stunning degrees of inefficiency in a number of places :-(. On the other hand I've also heard it praised for its ability to survive partial hardware failures (eg, N out of M CPUs down), so maybe that's the price you gotta pay. But to get back to the point of this discussion: to allow PG to use raw devices instead of filesystems, we'd first have to do a ton of portability work (since raw disk access is nowhere standard), and abandon our principle that Postgres does not run as root (since raw disk access is not permitted to non-root processes by any sane sysadmin). But that last is a mighty comforting principle to have, anytime someone complains that their el cheapo whitebox PC locks up as soon as they start to stress the database. I know I'd have wasted a lot more time chasing random hardware breakages if I couldn't say "system freezes and filesystem corruption are Clearly Not Our Fault". After that, we get to implement our own filesystem-equivalent management of disk space allocation, disk I/O scheduling, etc. Are we really smarter than all those kernel hackers doing this for a living? I doubt it. After that, we get to re-optimize all the existing Postgres behaviors that are designed to sit on top of a standard Unix buffering filesystem layer. After that, we might reap some performance benefits. Or maybe not. There's not a heck of a lot of hard evidence that we would --- and what there is traces to twenty-year-old assumptions about disk drive and OS behavior, which are quite unlikely to still apply today. Personally, I have a lot of more-promising projects to pursue... regards, tom lane
...and on Wed, Apr 07, 2004 at 01:26:02AM -0400, Tom Lane used the keyboard: > > After that, we get to implement our own filesystem-equivalent management > of disk space allocation, disk I/O scheduling, etc. Are we really > smarter than all those kernel hackers doing this for a living? I doubt it. > > After that, we get to re-optimize all the existing Postgres behaviors > that are designed to sit on top of a standard Unix buffering filesystem > layer. > > After that, we might reap some performance benefits. Or maybe not. > There's not a heck of a lot of hard evidence that we would --- and > what there is traces to twenty-year-old assumptions about disk drive > and OS behavior, which are quite unlikely to still apply today. > > Personally, I have a lot of more-promising projects to pursue... > Has anyone tried PostgreSQL on top of OCFS? Personally, I'm not sure it would even work, as Oracle clearly state that OCFS was _never_ meant to be a fully fledged UNIX filesystem with POSIX features such as correct timestamp updates, inode changes, etc., but OCFSv2 brings some features that might lead one into thinking they're about to make it suitable for uses beyond that of just having Oracle databases sitting on top of it. Furthermore, this filesystem would be a blazing one stop solution for all replication issues PostgreSQL currently suffers from, as its main design goal was to present "a consistent file system image across the servers in a cluster". Now, if both goals can be achieved in one go, hell, I'm willing to try it out myself in an attempt to extract off of it, some performance indicators that could be compared to other database performance tests sent to both this and the PERFORM mailing list. So, anyone? :) Cheers, -- Grega Bremec Senior Administrator Noviforum Ltd., Software & Media http://www.noviforum.si/
In article <5719.1081315562@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> writes: > But to get back to the point of this discussion: to allow PG to use raw > devices instead of filesystems, we'd first have to do a ton of > portability work (since raw disk access is nowhere standard), and > abandon our principle that Postgres does not run as root (since raw disk > access is not permitted to non-root processes by any sane sysadmin). Why not? In MySQL/InnoDB, you do a "chown mysql.daemon /dev/raw/raw1" (or whatever raw disk you want to access), and that's all. > After that, we get to implement our own filesystem-equivalent management > of disk space allocation, disk I/O scheduling, etc. Are we really > smarter than all those kernel hackers doing this for a living? I doubt it. Ditto. I don't have hard numbers for MySQL, but I didn't see any noticeable improvement when messing with raw disks (at least under Linux).
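The `chown` approach described above comes down to whether the database's own user can open the device node for reading and writing. A hedged Python sketch of such a check is shown here; the helper name `raw_device_usable` is hypothetical, `/dev/raw/raw1` is just the example path from the message, and nothing here reflects how MySQL/InnoDB or PostgreSQL actually probe devices.

```python
import os
import stat


def raw_device_usable(path: str) -> bool:
    """Return True if path is a block or character device the current user can read and write."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it: treat as unusable.
        return False
    is_device = stat.S_ISBLK(mode) or stat.S_ISCHR(mode)
    # os.access checks against the real user id of the calling process.
    return is_device and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # Example path taken from the message; it may not exist on this machine.
    print(raw_device_usable("/dev/raw/raw1"))
```

Run as the database user, a check like this would come back false until an administrator changes ownership or permissions on the device node, which is the sysadmin decision Tom Lane's objection is about.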
Grega, > Furthermore, this filesystem would be a blazing one stop solution for > all replication issues PostgreSQL currently suffers from, as its main > design goal was to present "a consistent file system image across the > servers in a cluster". Does it work, though? Without Oracle admin tools? > Now, if both goals can be achieved in one go, hell, I'm willing to try > it out myself in an attempt to extract off of it, some performance > indicators that could be compared to other database performance tests > sent to both this and the PERFORM mailing list. Hey, any test you wanna run is fine with us. I'm pretty sure that OCFS belongs to Oracle, though, patent & copyright, so we couldn't actually use it in practice. If your intention in this test is to show the superiority of raw devices, let me give you a reality check: barring some major corporate backing getting involved, we can't possibly implement our own PG-FS for database support. We already have a TODO list which is far too long for our developer pool, and implementing a custom FS either takes a large team (OCFS) or several years of development (Reiser). Now, if you know somebody who might pay for one, then great .... -- Josh Berkus Aglio Database Solutions San Francisco
On Wed, Apr 07, 2004 at 09:09:16AM -0700, Josh Berkus wrote: > If your intention in this test is to show the superiority of raw devices, let > me give you a reality check: barring some major corporate backing getting > involved, we can't possibly implement our own PG-FS for database support. We > already have a TODO list which is far too long for our developer pool, and > implementing a custom FS either takes a large team (OCFS) or several years of > development (Reiser). Is there any documentation as to what guarantees PostgreSQL requires from the filesystem, or what posix semantics can be relaxed? Cheers, Steve
On Wednesday, April 07, 2004 1:26 AM Tom Lane wrote:
>
> But to get back to the point of this discussion: to allow PG
> to use raw devices instead of filesystems, we'd first have to do a ton of
> portability work ...

[The following is said in a low, tentative voice :) ]

I wonder whether writing the postgresql data structures as HDF5 data structures (http://hdf.ncsa.uiuc.edu/whatishdf5.html) within a single HDF5 file (perhaps the WAL files would still reside elsewhere) would improve performance while letting HDF5 handle portability and other useful features, and so be a better solution than relying on filesystem features.

HDF5 actually provides an added portability advantage that postgresql does not currently enjoy: "a completely portable file format, so that a file can be written on any system and read on any other" (see http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf).

The HDF5 "distribution" includes tools for dumping data structures, etc., so if you're hooked on filesystem-level operations, you have the ability to inspect postgresql data structures within the HDF5 file, i.e., "outside postgresql".

HDF5 is also designed for clustered/grid computing systems: "The HDF5 format and library provide a powerful means of organizing and accessing data in a manner that allows scientists to share, process, and manipulate data in today's heterogeneous and quickly-evolving high-performance computational environment, including the emerging computational GRIDs." (http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf, p. 3).

So, the main purpose of this post is to suggest that HDF5's design moves a postgresql version built on an HDF5 datastore that much closer to being ready for cluster-computing environments, with respect to the datastore (there's still the shared memory, etc., that need to be addressed, but ...).

We're playing with HDF5 from Python (see the pytables project) for our "analytics" work, but that requires moving data out of postgresql. I suspect that an SQL interface to HDF5 data structures using postgresql would be a lot more convenient, and that postgresql would gain multiple benefits from having all its data structures in a single HDF5 file. OTOH, maybe us analytics types are better off with Python over HDF5 and "postgresql on HDF5" is not a net win for postgresql. Still, there seems to be a great advantage to having rich data structures to operate on rather than just "files", and allowing the HDF5 library to deal with portability, I/O efficiency, and clustering.

Hope my $0.02 worth was.

Cheers,
Murthy
...and on Wed, Apr 07, 2004 at 09:09:16AM -0700, Josh Berkus used the keyboard:
>
> Does it work, though? Without Oracle admin tools?

Hello, Josh. :)

Well, as I said, that's why I was asking - I'm willing to give it a go if nobody can prove me wrong. :)

> > Now, if both goals can be achieved in one go, hell, I'm willing to try
> > it out myself in an attempt to extract off of it, some performance
> > indicators that could be compared to other database performance tests
> > sent to both this and the PERFORM mailing list.
>
> Hey, any test you wanna run is fine with us. I'm pretty sure that OCFS
> belongs to Oracle, though, patent & copyright, so we couldn't actually use it
> in practice.

I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been open-source for quite a while now - they're released under the GPL.

http://oss.oracle.com/projects/ocfs/
http://oss.oracle.com/projects/ocfs-tools/
http://oss.oracle.com/projects/ocfs2/

I don't know what that means to you (probably nothing good, as PostgreSQL is released under the BSD license), but it most definitely can be considered a good thing for the end user, as she can download it, compile, and set it up on her disks, without the need to pay Oracle royalties. :)

> If your intention in this test is to show the superiority of raw devices, let
> me give you a reality check: barring some major corporate backing getting
> involved, we can't possibly implement our own PG-FS for database support. We
> already have a TODO list which is far too long for our developer pool, and
> implementing a custom FS either takes a large team (OCFS) or several years of
> development (Reiser).

Not really - I was just thinking about something not-entirely-a-filesystem and POK!, OCFS sprang to mind. It omits many POSIX features that slow down a traditional filesystem, yet it does know the concept of inodes and most of all, it's _really_ heavy on caching. As such, it sounded quite promising to me, but trial, I think, is the best test.

The question does spring up though, that Steve raised in another post - just for the record, what POSIX semantics can a postmaster live without in a filesystem? Cheers, -- Grega Bremec Senior Administrator Noviforum Ltd., Software & Media http://www.noviforum.si/
Grega,

> Well, as I said, that's why I was asking - I'm willing to give it a go
> if nobody can prove me wrong. :)

Why not? If you have time?

> I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been
> open-source for quite a while now - they're released under the GPL.

Keen! Wonder if we can make them regret it.

Seriously, if Oracle opened this stuff, it's probably because they used some GPL components in it. It also probably means that it won't work for anything but Oracle ...

> I don't know what that means to you (probably nothing good, as PostgreSQL
> is released under the BSD license),

Well, it just means that we can't ship OCFS with PostgreSQL.

> The question does spring up though, that Steve raised in another post -
> just for the record, what POSIX semantics can a postmaster live without in
> a filesystem?

You might want to ask that question again on Hackers. I don't know the answer, myself.

--
Josh Berkus
Aglio Database Solutions
San Francisco
josh@agliodbs.com (Josh Berkus) wrote:
>> Well, as I said, that's why I was asking - I'm willing to give it a go
>> if nobody can prove me wrong. :)
>
> Why not? If you have time?

True enough.

>> I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been
>> open-source for quite a while now - they're released under the
>> GPL.
>
> Keen! Wonder if we can make them regret it.
>
> Seriously, if Oracle opened this stuff, it's probably because they
> used some GPL components in it. It also probably means that it
> won't work for anything but Oracle ...

It could be that the experiment shows that OCFS isn't all that helpful. Or that it helps cover inadequacies in certain aspects of how Oracle accesses filesystems.

If it _does_ show that it is helpful, then that may suggest a filesystem implementation strategy useful for the BSD folks.

The main "failure case" would be if the exercise shows that using OCFS is pretty futile.
--
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/linux.html
Do you know where your towel is?