Re: optimization by removing the file system layer? - Mailing list pgsql-general
From | Giles Lean |
---|---|
Subject | Re: optimization by removing the file system layer? |
Date | |
Msg-id | 10210.961149148@nemeton.com.au |
In response to | Re: optimization by removing the file system layer? (Jurgen Defurne <defurnj@glo.be>) |
List | pgsql-general |
> I think that the Un*x filesystem is one of the reasons that large
> database vendors rather use raw devices, than filesystem storage
> files.

This used to be the preference, back in the late 80s and possibly
early 90s. I'm seeing a preference toward using the filesystem now,
possibly with some sort of async I/O and co-operation from the OS
filesystem about interactions with the filesystem cache.

Performance preferences don't stand still. The hardware changes, the
software changes, the volume of data changes, and different solutions
become preferable.

> Using a raw device on the disk gives them the possibility to have
> complete control over their files, indices and objects without being
> bothered by the operating system.
>
> This speeds up things in several ways :
> - the least possible OS intervention

Not that this is especially useful, necessarily. If the "raw" device
is in fact managed by a logical volume manager doing mirroring onto
some sort of storage array, there is still plenty of OS code involved.

The cost of using a filesystem in addition may not be much, if
anything, and of course a filesystem is considerably more flexible to
administer (backup, move, change size, check integrity, etc.)

> - choose block sizes according to applications
> - reducing fragmentation
> - packing data in nearby cylinders

... but when this storage area is spread over multiple mechanisms in a
smart storage array with write caching, you've no idea what is where
anyway. Better to let the hardware or at least the OS manage this;
there are so many levels of caching between a database and the
magnetic media that working hard to influence layout is almost
certainly a waste of time.

Kirk McKusick tells a lovely story that once upon a time it used to be
sensible to check some registers on a particular disk controller to
find out where the heads were when scheduling I/O. Needless to say,
that is history now!

There's a considerable cost in complexity and code in using "raw"
storage too, and it's not a one-off cost: as the technologies change,
the "fast" way to do things will change and the code will have to be
updated to match. Better to leave this to the OS vendor where
possible, and take advantage of the tuning they do.

> - Anyone other ideas -> the sky is the limit here

> It also aids portability, at least on platforms that have an
> equivalent of a raw device.

I don't understand that claim. Not much is portable about raw devices,
and they're typically not nearly as well documented as the filesystem
interfaces.

> It is also independent of the standard implemented Un*x filesystems,
> for which you will have to pay extra if you want to take extra
> measures against power loss.

Rather, it is worse. With a Unix filesystem you get quite defined
semantics about what is written when.

> The problem with e.g. e2fs, is that it is not robust enough if a CPU
> fails.

ext2fs doesn't even claim to have Unix filesystem semantics.

Regards,

Giles
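
As an illustration of the "defined semantics about what is written when" point, here is a minimal sketch of how an application can ask a Unix filesystem to push data to stable storage before it carries on. It is not taken from PostgreSQL; the helper name durable_write and the file name are hypothetical. It assumes a POSIX-like system where fsync() is honoured; how durable the result really is still depends on the OS, the filesystem and any write cache in the drive or array.

```python
import os
import tempfile


def durable_write(path, data):
    """Write `data` to `path`, returning only after fsync() has succeeded.

    Hypothetical helper for illustration: it truncates and rewrites the file.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush this file's data to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "record.dat")
        durable_write(target, b"commit record\n")
        with open(target, "rb") as f:
            print(f.read())
```

With a raw device the application would have to provide the equivalent ordering and flushing guarantees itself, which is part of the complexity cost mentioned above.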