Re: O_DIRECT in freebsd - Mailing list pgsql-hackers

From Manfred Spraul
Subject Re: O_DIRECT in freebsd
Date
Msg-id 3FA14855.5020103@colorfullife.com
In response to Re: O_DIRECT in freebsd  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark wrote:

>Manfred Spraul <manfred@colorfullife.com> writes:
>
>>One problem for WAL is that O_DIRECT would disable the write cache -
>>each operation would block until the data arrived on disk, and that might block
>>other backends that try to access WALWriteLock.
>>Perhaps a dedicated backend that does the writeback could fix that.
>
>aio seems a better fit.
>
>>Has anyone tried to use posix_fadvise for the wal logs?
>>http://www.opengroup.org/onlinepubs/007904975/functions/posix_fadvise.html
>>
>>Linux supports posix_fadvise, it seems to be part of xopen2k.
>
>Odd, I don't see it anywhere in the kernel. I don't know what syscall it's
>using to do this tweaking.
>
At least in 2.6 it is implemented in linux/mm/fadvise.c; the syscall is
fadvise64 or fadvise64_64.

>This is the only option that seems useful for postgres for both the WAL and
>vacuum (though in other threads it seems the problems with vacuum lie
>elsewhere):
>
>       POSIX_FADV_DONTNEED attempts to free cached pages associated with the
>       specified region. This is useful, for example, while streaming large
>       files. A program may periodically request the kernel to free cached
>       data that has already been used, so that more useful cached pages are
>       not discarded instead.
>
>       Pages that have not yet been written out will be unaffected, so if the
>       application wishes to guarantee that pages will be released, it should
>       call fsync or fdatasync first.
>
I agree. The call could go either immediately after each flush syscall,
or just before closing a log file and switching to the next.
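
A minimal sketch of the "flush, then drop" ordering, assuming a plain file
descriptor for the log segment. The helper name flush_and_drop_cache is
hypothetical and is not taken from the postgres sources; it only illustrates
the ordering that the quoted man page text asks for:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not PostgreSQL code): flush a log file, then hint
 * to the kernel that its cached pages are no longer needed.
 * Returns 0 on success, -1 with errno set on failure. */
int flush_and_drop_cache(int fd)
{
    assert(fd >= 0);

    /* Pages that are still dirty are unaffected by POSIX_FADV_DONTNEED,
     * so the data has to reach the disk first. */
    if (fdatasync(fd) != 0)
        return -1;

    /* offset 0 with len 0 covers the whole file. posix_fadvise returns
     * the error number directly instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    return 0;
}
```

Calling this once per segment switch would be the cheaper of the two
variants; calling it after every flush keeps fewer log pages cached at any
one time, at the cost of an extra syscall per flush.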

>Perhaps POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL could be useful in a
>backend before starting a sequential scan or index scan, but I kind of doubt
>it.
>
IIRC the recommendation is ~20% of total memory for the postgres user-space
buffers. That's quite a lot, so it might be sufficient to protect that
cache from vacuum or sequential scans. AddBufferToFreeList already
contains a comment that this is the right place to try buffer
replacement strategies.

--   Manfred


