Re: O_DIRECT on macOS - Mailing list pgsql-hackers

From Tom Lane
Subject Re: O_DIRECT on macOS
Date
Msg-id 337210.1626740787@sss.pgh.pa.us
In response to Re: O_DIRECT on macOS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: O_DIRECT on macOS
List pgsql-hackers
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> While I was here again, I couldn't resist trying to extend this to
>> Solaris, since it looked so easy.  I don't have access, but I tested
>> on Illumos by undefining O_DIRECT.  Thoughts?

> I can try that on the gcc farm in a bit.

Hmm, it compiles cleanly, but something seems drastically wrong,
because performance is just awful.  On the other hand, I don't
know what sort of storage is underlying this instance, so maybe
that's to be expected?  If I set fsync = off, the speed seems
comparable to what wrasse reports, but with fsync on it's like

test tablespace                   ... ok        87990 ms
parallel group (20 tests, in groups of 1):  boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
     boolean                      ... ok         3229 ms
     char                         ... ok         2758 ms
     name                         ... ok         2229 ms
     varchar                      ... ok         7373 ms
     text                         ... ok          722 ms
     int2                         ... ok          342 ms
     int4                         ... ok         1303 ms
     int8                         ... ok         1095 ms
     oid                          ... ok         1086 ms
     float4                       ... ok         6360 ms
     float8                       ... ok         5224 ms
     bit                          ... ok         6254 ms
     numeric                      ... ok        44304 ms
     txid                         ... ok          377 ms
     uuid                         ... ok         3946 ms
     enum                         ... ok        33189 ms
     money                        ... ok          622 ms
     rangetypes                   ... ok        17301 ms
     pg_lsn                       ... ok          798 ms
     regproc                      ... ok          145 ms

(I stopped running it at that point...)

Also, the results of pg_test_fsync seem wrong; it refuses to run
tests for the cases we're interested in:

$ pg_test_fsync
5 seconds per test
DIRECTIO_ON supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                                  n/a*
        fdatasync                             8.324 ops/sec  120139 usecs/op
        fsync                                 0.906 ops/sec  1103936 usecs/op
        fsync_writethrough                              n/a
        open_sync                                      n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                                  n/a*
        fdatasync                             7.329 ops/sec  136449 usecs/op
        fsync                                 0.788 ops/sec  1269258 usecs/op
        fsync_writethrough                              n/a
        open_sync                                      n/a*
* This file system and its mount options do not support direct
  I/O, e.g. ext4 in journaled mode.

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write                      n/a*
         2 *  8kB open_sync writes                     n/a*
         4 *  4kB open_sync writes                     n/a*
         8 *  2kB open_sync writes                     n/a*
        16 *  1kB open_sync writes                     n/a*

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                  16.388 ops/sec   61020 usecs/op
        write, close, fsync                   9.084 ops/sec  110082 usecs/op

Non-sync'ed 8kB writes:
        write                             39855.686 ops/sec      25 usecs/op
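
For anyone who wants to cross-check numbers like the fdatasync and fsync rows above without pg_test_fsync, here is a rough Python sketch of the same kind of measurement: overwrite one 8kB block, sync, repeat for a fixed time, and report ops/sec and usecs/op. It is not how pg_test_fsync is implemented; the function name `time_sync_method`, the file name, and the one-second default duration are all made up for illustration. The two reported columns are related by usecs/op = 1,000,000 / (ops/sec), so 8.324 ops/sec corresponds to roughly 120,000 usecs/op.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8192  # 8kB, the write size used in the output above


def time_sync_method(sync_fn, path, duration=1.0, writes_per_op=1):
    """Overwrite the start of `path` and sync, repeatedly, for `duration` seconds.

    Hypothetical helper for illustration only.
    Returns a tuple (ops_per_sec, usecs_per_op).
    """
    buf = b"\0" * BLOCK_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        ops = 0
        start = time.monotonic()
        deadline = start + duration
        while True:
            os.lseek(fd, 0, os.SEEK_SET)
            for _ in range(writes_per_op):
                os.write(fd, buf)
            sync_fn(fd)  # e.g. os.fsync or os.fdatasync
            ops += 1
            now = time.monotonic()
            if now >= deadline:
                break
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(now - start, 1e-9)
    finally:
        os.close(fd)
    ops_per_sec = ops / elapsed
    return ops_per_sec, 1_000_000 / ops_per_sec


if __name__ == "__main__":
    methods = {"fsync": os.fsync}
    # os.fdatasync is not available on every platform, so probe for it.
    if hasattr(os, "fdatasync"):
        methods["fdatasync"] = os.fdatasync
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sync_test.out")
        for name, fn in methods.items():
            ops, usecs = time_sync_method(fn, path)
            print(f"{name:<20}{ops:12.3f} ops/sec {usecs:10.0f} usecs/op")
```

Note that the temporary directory may live on a memory-backed file system, where syncing is nearly free; to get numbers comparable to the ones above, point `path` at the file system that actually holds the data directory.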



            regards, tom lane


