Re: a question about Direct I/O and double buffering

From: david@lang.hm
Subject: Re: a question about Direct I/O and double buffering
Msg-id: Pine.LNX.4.64.0704052128250.28411@asgard.lang.hm
In response to: Re: a question about Direct I/O and double buffering  (Erik Jones)
List: pgsql-performance

On Thu, 5 Apr 2007, Erik Jones wrote:

> On Apr 5, 2007, at 3:33 PM,  wrote:
>
>> On Thu, 5 Apr 2007, Xiaoning Ding wrote:
>>
>> > >
>> > >  To the best of my knowledge, Postgres itself does not have a direct IO
>> > >  option (although it would be a good addition).  So, in order to use
>> > >  direct IO with postgres you'll need to consult your filesystem docs
>> > >  for how to set the forcedirectio mount option.  I believe it can be
>> > >  set dynamically, but if you want it to be permanent you'll need to
>> > >  add it to your fstab/vfstab file.
>> >
>> > I use Linux.  It supports direct I/O on a per-file basis only.  To bypass
>> > the OS buffer cache, files should be opened with the O_DIRECT option.
>> > I'm afraid that I have to modify PG.
>>
>> as someone who has been reading the linux-kernel mailing list for 10 years,
>> let me comment on this a bit.
>>
>> linux does have a direct i/o option,
>
> Yes, I know applications can request direct i/o with the O_DIRECT flag to
> open(), but can this be set to be forced for all applications or for
> individual applications from "outside" the application (not that I've ever
> heard of something like the second)?

no, it can't, because direct i/o has additional requirements for what you
can use for buffers that don't apply to normal i/o

>> but it has significant limits on when and how you can use it (buffers must
>> be 512-byte aligned and multiples of 512 bytes, things like that).
>
> That's a standard limit imposed by the sector size of hard drives, and is
> present in all direct i/o implementations, not just Linux.

right, but you don't have those limits for normal i/o
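
To make the buffer restriction concrete, here is a minimal sketch of what an
application has to do before it can issue a direct read on Linux. The 512-byte
figure is the one quoted above (the real requirement depends on the device and
filesystem), and the helper names is_aligned, aligned_buffer, and read_direct
are made up for illustration; none of this is code from Postgres.

```python
import mmap
import os

SECTOR = 512  # alignment quoted in the thread; an assumption, not a universal value


def is_aligned(value, alignment=SECTOR):
    """True when value is a whole multiple of alignment."""
    return value % alignment == 0


def aligned_buffer(size):
    """Return a writable buffer whose start address is page-aligned.

    Anonymous mmap memory begins on a page boundary, which satisfies a
    512-byte alignment requirement.
    """
    if size <= 0 or not is_aligned(size):
        raise ValueError("size must be a positive multiple of %d bytes" % SECTOR)
    return mmap.mmap(-1, size)


def read_direct(path, offset, size):
    """Read size bytes at offset with O_DIRECT (Linux only).

    Offset and length are checked before the file is opened; the open itself
    can still fail with EINVAL on filesystems that do not support direct i/o.
    """
    if not (is_aligned(offset) and is_aligned(size)):
        raise ValueError("offset and size must be multiples of %d bytes" % SECTOR)
    buf = aligned_buffer(size)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        nread = os.preadv(fd, [buf], offset)
    finally:
        os.close(fd)
    data = buf[:nread]  # copy out before releasing the mapping
    buf.close()
    return data
```

With normal i/o none of these checks are needed: any buffer, offset, and
length will do, which is why the flag cannot simply be forced onto a program
that was not written for it.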

>> Also, in many cases testing has shown that there is a fairly significant
>> performance hit for this, not a performance gain.
>
> Those performance hits have been noticed for high i/o transaction databases?
> The idea here is that these kinds of database manage their own caches and
> having a separate filesystem cache in virtual memory that works with system
> memory page sizes is an unneeded level of indirection.

ahh, you're proposing a re-think of how postgres interacts with the O/S,
not just an optimization to be applied to the current architecture.

unlike Oracle, Postgres doesn't try to be an OS itself, it tries very hard
to rely on the OS to properly implement things rather than doing its own
implementation.

> Yes, you should
> expect other "normal" utilities will suffer a performance hit as if you are
> trying to cp a 500 byte file you'll still have to work with 8K writes and
> reads whereas with the filesystem cache you can just write/read part of a
> page in memory and let the cache decide when it needs to write and read from
> disk.  If there are other caveats to direct i/o on Linux I'd love to hear
> them.

other than bad interactions with "normal" utilities not compiled for
direct i/o I don't remember them offhand.
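
The cost for small files in the quoted cp example is just rounding
arithmetic. A rough sketch, assuming the 8K transfer unit mentioned there
(round_up and padding are illustrative names, not part of any tool):

```python
BLOCK = 8192  # the 8K unit from the quoted example; an assumption for illustration


def round_up(nbytes, block=BLOCK):
    """Smallest multiple of block that can hold nbytes."""
    return -(-nbytes // block) * block


def padding(nbytes, block=BLOCK):
    """Bytes transferred beyond the payload when i/o must use whole blocks."""
    return round_up(nbytes, block) - nbytes


# A 500-byte file still costs one full 8K transfer.
print(round_up(500), padding(500))
```

So for the 500-byte file, 7692 of the 8192 bytes moved are padding, which the
filesystem cache would normally absorb in memory.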

David Lang