Re: POSIX file updates - Mailing list pgsql-performance
From: James Mansion
Subject: Re: POSIX file updates
Date:
Msg-id: 47F46BFF.1040805@mansionfamily.plus.com
In response to: Re: POSIX file updates (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-performance
Greg Smith wrote:
> You turn on direct I/O differently under Solaris than everywhere else,
> and nobody has bothered to write the patch (trivial) and OS-specific
> code to turn it on only when appropriate (slightly trickier) to handle
> this case. There's not a lot of pressure on PostgreSQL to handle this
> case correctly when Solaris admins are used to doing direct I/O tricks
> on filesystems already, so they don't complain about it much.

I'm not sure that this will survive use of PostgreSQL on Solaris once there are more users on Indiana, though. Which I'm hoping will happen.

> RPM of the drive. Seen it on UFS and ZFS, both seem to do the right
> thing here.

But ZFS *is* smart enough to manage the cache, albeit sometimes with unexpected consequences, as with the 2530 here: http://milek.blogspot.com/.

> You seem to feel that there is an alternative here that PostgreSQL
> could take but doesn't. There is not. You either wait until writes
> hit disk, which by physical limitations only happens at RPM speed and
> therefore is too slow to commit for many cases, or you cache in the
> most reliable memory you've got and hope for the best. No software
> approach can change any of that.

Indeed I do. The issue I have is that some popular operating systems (let's try to avoid the flame war) fail to expose control of disk caches, so the code assumes that the onus is on the admin, and the documentation rightly says so. But this is as much a failure of the POSIX API and of the operating systems to expose something that's necessary, and it seems to me rather valuable that the application be able to work with such facilities as they become available. Exposing the flush-cache mechanisms isn't dangerous, and it can improve performance for non-DBMS users of the same drives.

I think manipulation of this stuff is a major concern for a DBMS that might be used by amateur SAs, and if at all possible it should work out of the box on common hardware.
So far as I can tell, SQL Server Express makes a pretty good attempt at it, for example.

It might be enough for initdb to whinge and fail if it thinks the disks are behaving insanely, unless the would-be DBA sets a 'my_disks_really_are_that_fast' flag in the config. At the moment anyone can apt-get themselves a DBMS which may become a liability. At the moment:

- casual use is likely to be unreliable
- uncontrolled deferred IO can result in almost DOS-like checkpoints

These affect systems other than PostgreSQL too, but they would be avoidable if the drive cache flush were better exposed and the IO staged to use it. There's no reason to block on anything but the final IO in a WAL commit, after all, and with the deferred commit feature (which I really like for workflow engines) intermediate WAL writes of a configured chunk size could let the WAL drives get on with it. Admittedly I'm assuming a non-blocking write-through: direct IO from a background thread (process if you must) or aio.

> There are plenty of cases where the so-called "lying" drives
> themselves are completely stupid on their own regardless of operating
> system.

With modern NCQ-capable drive firmware? Or just with older PATA stuff? There's an awful lot of FUD out there about SCSI vs IDE still.

James