Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date
Msg-id 20140122215536.GF6346@momjian.us
In response to Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Jan Kara <jack@suse.cz>)
List pgsql-hackers
On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> > If we're forcing the WAL out to disk because of transaction commit or
> > because we need to write the buffer protected by a certain WAL record
> > only after the WAL hits the platter, then it's fine.  But sometimes
> > we're writing WAL just because we've run out of internal buffer space,
> > and we don't want to block waiting for the write to complete.  Opening
> > the file with O_SYNC deprives us of the ability to control the timing
> > of the sync relative to the timing of the write.
>   O_SYNC has a heavy performance penalty. For ext4 it means an extra fs
> transaction commit whenever there's any metadata changed on the filesystem.
> Since mtime & ctime of files will be changed often, that will be the case very
> often.

Also, there is the issue of writes that don't need syncing being synced
anyway because sync is set on the file descriptor.  Here is output from our
pg_test_fsync tool when run on an SSD with a BBU:

$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                  n/a
        fdatasync                          8424.785 ops/sec     119 usecs/op
        fsync                              7127.072 ops/sec     140 usecs/op
        fsync_writethrough                             n/a
        open_sync                         10548.469 ops/sec      95 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                  n/a
        fdatasync                          4367.375 ops/sec     229 usecs/op
        fsync                              4427.761 ops/sec     226 usecs/op
        fsync_writethrough                             n/a
        open_sync                          4303.564 ops/sec     232 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
-->             1 * 16kB open_sync write          4938.711 ops/sec     202 usecs/op
-->             2 *  8kB open_sync writes         4233.897 ops/sec     236 usecs/op
-->             4 *  4kB open_sync writes         2904.710 ops/sec     344 usecs/op
-->             8 *  2kB open_sync writes         1736.720 ops/sec     576 usecs/op
-->            16 *  1kB open_sync writes          935.917 ops/sec    1068 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close                7626.783 ops/sec     131 usecs/op
        write, close, fsync                6492.697 ops/sec     154 usecs/op

Non-Sync'ed 8kB writes:
        write                            351517.178 ops/sec       3 usecs/op
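
To make the distinction concrete, here is a minimal sketch of the two
patterns being compared: a descriptor opened with O_SYNC, where every write
is synced whether or not it needs to be, versus a plain descriptor where the
caller chooses when to sync.  This is illustrative Python, not PostgreSQL
code; the function names and the must_be_durable flag are made up for the
example, and it assumes a Unix-like platform where os.O_SYNC is defined.

```python
import os

BLOCK = b"\0" * 8192  # one 8kB block, the write size used in most tests above

# Fall back to fsync where fdatasync is unavailable (assumption: non-Linux Unix).
_data_sync = getattr(os, "fdatasync", os.fsync)


def write_with_o_sync(path, blocks):
    """Hypothetical helper: with O_SYNC, each write() is synced before it returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, BLOCK)  # pays the sync cost on every call
    finally:
        os.close(fd)


def write_then_sync(path, blocks, must_be_durable):
    """Hypothetical helper: without O_SYNC, the caller controls the sync timing."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, BLOCK)  # no sync requested by this call
        if must_be_durable:
            _data_sync(fd)  # e.g. at transaction commit
    finally:
        os.close(fd)
```

Calling write_then_sync(path, 16, False) corresponds to the case of writing
WAL only because buffer space ran out: nothing asks for a sync, which the
O_SYNC variant cannot avoid.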
 

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


