Re: POSIX file updates - Mailing list pgsql-performance
From: James Mansion
Subject: Re: POSIX file updates
Date:
Msg-id: 47F46BFF.1040805@mansionfamily.plus.com
In response to: Re: POSIX file updates (Greg Smith <gsmith@gregsmith.com>)
List: pgsql-performance
Greg Smith wrote:
> You turn on direct I/O differently under Solaris than everywhere else,
> and nobody has bothered to write the patch (trivial) and OS-specific
> code to turn it on only when appropriate (slightly trickier) to handle
> this case. There's not a lot of pressure on PostgreSQL to handle this
> case correctly when Solaris admins are used to doing direct I/O tricks
> on filesystems already, so they don't complain about it much.

I'm not sure that this will survive use of PostgreSQL on Solaris once there are more users on Indiana, though. Which I'm hoping will happen.

> RPM of the drive. Seen it on UFS and ZFS, both seem to do the right
> thing here.

But ZFS *is* smart enough to manage the cache, albeit sometimes with unexpected consequences, as with the 2530 here: http://milek.blogspot.com/.

> You seem to feel that there is an alternative here that PostgreSQL
> could take but doesn't. There is not. You either wait until writes
> hit disk, which by physical limitations only happens at RPM speed and
> therefore is too slow to commit for many cases, or you cache in the
> most reliable memory you've got and hope for the best. No software
> approach can change any of that.

Indeed I do. The issue I have is that some popular operating systems (let's try to avoid the flame war) fail to expose control of disk caches, so the code assumes that the onus is on the admin, and the documentation rightly says so. But this is as much a failure of the POSIX API and of the operating systems to expose something that's necessary, and it seems to me rather valuable that the application be able to work with such facilities as they become available. Exposing the flush-cache mechanisms isn't dangerous, and it can improve performance for non-DBMS users of the same drives.

I think manipulation of this stuff is a major concern for a DBMS that might be used by amateur SAs, and if at all possible it should work out of the box on common hardware.
So far as I can tell, SQL Server Express makes a pretty good attempt at it, for example.

It might be enough for initdb to whinge and fail if it thinks the disks are behaving insanely, unless the would-be DBA sets a 'my_disks_really_are_that_fast' flag in the config. At the moment anyone can apt-get themselves a DBMS which may become a liability. At the moment:

- casual use is likely to be unreliable
- uncontrolled deferred IO can result in almost DOS-like checkpoints

These affect systems other than PostgreSQL too, but they would be avoidable if the drive cache flush were better exposed and the IO staged to use it. There's no reason to block on anything but the final IO in a WAL commit, after all, and with the deferred commit feature (which I really like for workflow engines) intermediate WAL writes of a configured chunk size could let the WAL drives get on with it. Admittedly I'm assuming a non-blocking write-through: direct IO from a background thread (process if you must) or aio.

> There are plenty of cases where the so-called "lying" drives
> themselves are completely stupid on their own regardless of operating
> system.

With modern NCQ-capable drive firmware? Or just with older PATA stuff? There's an awful lot of FUD out there about SCSI vs IDE still.

James