Re: O_DIRECT support for Windows - Mailing list pgsql-patches

From Magnus Hagander
Subject Re: O_DIRECT support for Windows
Date
Msg-id 20070116090946.GA1564@svr2.hagander.net
In response to Re: O_DIRECT support for Windows  ("Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com>)
List pgsql-patches
On Tue, Jan 16, 2007 at 10:59:11AM +0900, Takayuki Tsunakawa wrote:
> From: "Magnus Hagander" <magnus@hagander.net>
> > ITAGAKI Takahiro wrote:
> >> Do you mean there are drives that have larger sector size than 8kB?
> >> We've already put the xlog buffer along the alignment of
> >> ALIGNOF_XLOG_BUFFER (typically 8192 bytes).
> >> But if there are such drives, using FILE_FLAG_NO_BUFFERING is harmful!
> >
> > Yes. I have heard this can happen with certain SAN drives. I haven't
> > seen it myself, and I can't seem to find a reference right now :-) But I
> > do recall having read about the need to check the sector size and
> > specifically align it, because some do have that problem.
>
> I think many people can benefit from Itagaki-san's proposal, and
> NO_BUFFERING should be the default.  Aren't disks with sector sizes
> larger than 8KB very rare?

Definitely very rare.


> Providing a way (such as
> wal_sync_method) to avoid NO_BUFFERING is sufficient for people in
> rare environments.  Or, by determining the sector size with
> GetDiskFreeSpaceEx(), we could auto-switch to not using NO_BUFFERING
> when the sector size is larger than 8KB.

I think the second one is better.
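A minimal sketch of the auto-switch idea, under stated assumptions. One caveat on the API: GetDiskFreeSpaceEx() only reports byte counts (free and total space); the per-sector size comes from the older GetDiskFreeSpace(), through its lpBytesPerSector output, so that is the call used here. XLOG_ALIGN (8192, standing in for the typical ALIGNOF_XLOG_BUFFER) and both helper names are hypothetical, not existing PostgreSQL code. If the query fails, the sketch plays safe and falls back to buffered I/O.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical stand-in for the typical ALIGNOF_XLOG_BUFFER value. */
#define XLOG_ALIGN 8192UL

/*
 * True if buffers/offsets aligned to `align` bytes are also aligned to
 * `sector_size`, i.e. NO_BUFFERING writes would satisfy the device.
 */
static bool
alignment_covers_sector(unsigned long sector_size, unsigned long align)
{
    if (sector_size == 0)
        return false;
    return sector_size <= align && align % sector_size == 0;
}

#ifdef _WIN32
/* Hypothetical helper: decide per volume root such as "C:\\". */
static bool
volume_allows_no_buffering(const char *root)
{
    DWORD sectors_per_cluster, bytes_per_sector;
    DWORD free_clusters, total_clusters;

    if (!GetDiskFreeSpaceA(root, &sectors_per_cluster, &bytes_per_sector,
                           &free_clusters, &total_clusters))
        return false;   /* unknown geometry: stay with buffered I/O */

    return alignment_covers_sector(bytes_per_sector, XLOG_ALIGN);
}
#endif
```

With 512-byte or 4K sectors this keeps NO_BUFFERING on; a reported 16K sector would switch it off for that volume.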

> I wonder whether GetDiskFreeSpaceEx() tells us the right sector size
> configured by SAN tools.

It should. If it doesn't, then there are likely to be other issues.

> And I wonder whether Microsoft assumes NTFS works with sector sizes
> larger than 4KB.  The following paragraph appears in the CreateFile page:
>
> One way to align buffers on integer multiples of the volume sector
> size is to use VirtualAlloc to allocate the buffers. It allocates
> memory that is aligned on addresses that are integer multiples of the
> operating system's memory page size. Because both memory page and
> volume sector sizes are powers of 2, this memory is also aligned on
> addresses that are integer multiples of a volume sector size. Memory
> pages are 4-8 KB in size; sectors are 512 bytes (hard disks) or 2048
> bytes (CD), and therefore, volume sectors can never be larger than
> memory pages.

Good question. Again, I have no firsthand info about systems with >4K
sectors. Obviously you have 2K sectors on CDs, but that doesn't really
apply to us because we don't run with our files on CD at all...
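The argument in the CreateFile paragraph can be checked with plain bit arithmetic: if both sizes are powers of two and the sector is no larger than the page, any page-aligned address is also sector-aligned. The sketch below (helper names are hypothetical) also shows where the argument stops holding, which is exactly the >4K sector case under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
static bool
is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if addr is a multiple of align (align must be a power of two). */
static bool
is_aligned(uintptr_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/*
 * The CreateFile reasoning: page alignment implies sector alignment when
 * both sizes are powers of two and sector <= page.  A sector larger than
 * the page (e.g. 8K sectors with 4K pages) breaks it.
 */
static bool
page_alignment_implies_sector_alignment(uint64_t page, uint64_t sector)
{
    return is_pow2(page) && is_pow2(sector) && sector <= page;
}
```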

It *could* be someone who mixed up sector size and NTFS block (cluster)
size, which is definitely supported up to at least 64K per block.
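To make the sector/cluster distinction concrete: an NTFS cluster is a whole number of sectors, so a 64K cluster on a disk with 512-byte sectors is 128 sectors. GetDiskFreeSpace() reports the two factors separately (lpSectorsPerCluster and lpBytesPerSector); if a tool or article quotes only their product, "64K" is easy to misread as a sector size. The helper below is a hypothetical illustration of that arithmetic only.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cluster ("block") size in bytes is
 * sectors_per_cluster * bytes_per_sector, the two values
 * GetDiskFreeSpace() returns separately.
 */
static uint32_t
cluster_bytes(uint32_t sectors_per_cluster, uint32_t bytes_per_sector)
{
    return sectors_per_cluster * bytes_per_sector;
}
```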

A quick Google search shows some inconclusive results :-) But look, for
example, at:

http://groups.google.se/group/microsoft.public.sqlserver.server/tree/browse_frm/thread/d3288d3b43338b47/ff5e825dd02faff4?rnum=1&hl=en&q=ntfs+sector+size&_done=%2Fgroup%2Fmicrosoft.public.sqlserver.server%2Fbrowse_frm%2Fthread%2Fd3288d3b43338b47%2Fff5e825dd02faff4%3Ftvc%3D1%26q%3Dntfs+sector+size%26hl%3Den%26#doc_4556b64132b3baa7

This seems to indicate that *Windows* supports sector sizes >4K, but SQL
Server doesn't. But again, it could be a mixup between cluster and
sector size...

//Magnus
