Thread: FYI: fdatasync vs sync_file_range

FYI: fdatasync vs sync_file_range

From
Fujii Masao
Date:
Hi,

Using sync_file_range(2) as the wal_sync_method might speed up
the XLOG flush. So I wrote a patch that adds a new valid value
(sync_file_range) for wal_sync_method, and used it to compare the
performance of fdatasync and sync_file_range. The patch is attached
to this mail. This is just for reference; I'm not planning to
submit the patch to a CommitFest for now.

Environment:
- PowerEdge1850 (Xeon 2.8GHz, Mem 512MB)
- Fedora11
- PostgreSQL v8.4 with the patch

Measurement:
- pgbench -i -s64
- pgbench -c16 -t1000 -Mprepared  * [20 times]
- postgresql.conf
  checkpoint_segments = 64
- The above measurement was repeated 3 times

Result:
- The following values indicate throughput of pgbench (tps)

The first set
----------------
run   fdatasync   sync_file_range
  1        60.6              58.9
  2        63.1              58.8
  3        61.3              62.3
  4        70.3              66.8
  5        67.4              66.2
  6        67.8              71.1
  7        74.3              67.5
  8        70.0              71.9
  9        71.7              72.8
 10        74.0              72.0
 11        72.3              72.1
 12        79.9              78.6
 13        73.3              73.3
 14        72.9              71.2
 15        78.6              78.6
 16        81.7              76.7
 17        75.5              75.9
 18        78.0              73.3
 19        75.3              78.9
 20        83.0              77.3
avg        72.5              71.2

The second set
---------------------
run   fdatasync   sync_file_range
  1        52.6              60.3
  2        57.4              65.9
  3        62.6              63.7
  4        59.0              68.9
  5        67.0              72.2
  6        61.5              72.2
  7        69.0              73.4
  8        64.3              75.6
  9        67.6              74.8
 10        69.1              75.7
 11        65.7              77.7
 12        72.6              76.6
 13        68.8              75.5
 14        69.4              79.4
 15        74.2              81.2
 16        71.4              77.5
 17        71.3              78.0
 18        73.1              80.4
 19        73.5              80.2
 20        73.7              80.7
avg        67.2              74.5

The third set
-----------------
run   fdatasync   sync_file_range
  1        60.9              59.5
  2        58.3              64.1
  3        64.7              62.9
  4        66.6              68.0
  5        67.9              70.9
  6        69.9              69.4
  7        70.0              72.6
  8        72.3              76.6
  9        70.7              74.7
 10        70.3              70.2
 11        77.2              78.2
 12        74.8              73.9
 13        69.6              79.0
 14        79.3              80.7
 15        78.0              74.6
 16        77.8              78.9
 17        73.6              81.0
 18        81.5              77.6
 19        76.1              78.5
 20        79.1              83.7
avg        71.9              73.8
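
One rough way to summarize the three sets is to average the per-set "avg" rows and look at the relative difference. The sketch below does only that; the helper names (mean, rel_diff_pct, print_summary) are made up for illustration, and it is not a statistical test.

```c
#include <stdio.h>

/* Per-set average throughput (tps), copied from the "avg" rows above. */
static const double fdatasync_avg[3]       = {72.5, 67.2, 71.9};
static const double sync_file_range_avg[3] = {71.2, 74.5, 73.8};

/* Arithmetic mean of n values. */
static double mean(const double *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Relative difference of b versus a, in percent. */
static double rel_diff_pct(double a, double b)
{
    return (b - a) / a * 100.0;
}

/* Print the overall means and their relative difference. */
static void print_summary(void)
{
    double f = mean(fdatasync_avg, 3);
    double s = mean(sync_file_range_avg, 3);
    printf("fdatasync: %.1f tps, sync_file_range: %.1f tps, diff: %.1f%%\n",
           f, s, rel_diff_pct(f, s));
}
```

Called on the arrays above, mean() gives about 70.5 tps for fdatasync and about 73.2 tps for sync_file_range, a difference of roughly 3.7%. That is smaller than the spread between sets for fdatasync alone (67.2 to 72.5 tps).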

According to the result, using sync_file_range instead of fdatasync
has little effect on the performance of postgres. This time I only used
sync_file_range with the following combination of flags:

   SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
      SYNC_FILE_RANGE_WAIT_AFTER

This might be a naive way to use it, so there may be room for improvement.
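
For readers who have not used the call, here is a minimal sketch of what that flag combination looks like next to a plain fdatasync(). It is not the attached patch; flush_range and flush_all are hypothetical helper names, and the code assumes Linux with glibc, where sync_file_range(2) is available. Note that, per its man page, sync_file_range does not flush file metadata, so by itself it is only comparable to fdatasync when the file's blocks are already allocated.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the byte range [offset, offset + nbytes) of fd.
 * WAIT_BEFORE waits for writeback already in progress on the range,
 * WRITE starts writeback of the dirty pages in the range, and
 * WAIT_AFTER waits for that writeback to finish.
 * nbytes == 0 means "from offset to the end of the file".
 * Returns 0 on success, -1 on error with errno set.
 */
static int flush_range(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* For comparison: flush all dirty data of fd, as fdatasync does. */
static int flush_all(int fd)
{
    return fdatasync(fd);
}
```

With all three flags set the call is synchronous for the given range, so when the range covers everything written since the last flush it ends up doing about the same work as fdatasync.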

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: FYI: fdatasync vs sync_file_range

From
Simon Riggs
Date:
On Mon, 2009-07-06 at 17:54 +0900, Fujii Masao wrote:

> According to the result, using sync_file_range instead of fdatasync
> has little effect on the performance of postgres.

["...when flushing XLOG"]

Why did you think it would?

AFAICS the range of dirty pages will be restricted to a fairly tight
range anyway. The only difference between the two would indicate an OS
inefficiency. I don't see an opportunity for XLOG to be more efficient
by using a finer-grained API.

I think there is still a valid use for sync_file_range at checkpoint,
since for some large tables this could reduce the number of pages
needing to be written at checkpoint time. That would help smooth out
larger writes.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: FYI: fdatasync vs sync_file_range

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> According to the result, using sync_file_range instead of fdatasync
> has little effect on the performance of postgres.

When we flush the WAL, we flush everything we've written that far. I'm
not surprised that sync_file_range makes no difference; it does the same
amount of I/O as fsync().

sync_file_range() might be a useful replacement for the data file
fsync()s at checkpoint, though. You could avoid the I/O storm that
fsync() causes by flushing the files in smaller chunks with
sync_file_range(), with a small delay in between. But since I don't
recall any complaints about I/O storms at checkpoints since the smoothed
checkpoints patch in 8.3, it might not be worth it.
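
To make the chunked idea concrete, here is a minimal sketch, assuming Linux with glibc. flush_in_chunks, chunk_bytes and delay_ms are hypothetical names, not anything in PostgreSQL. It starts writeback one chunk at a time with a pause in between, then finishes with fsync(), since sync_file_range() alone does not flush file metadata.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd in pieces of chunk_bytes, sleeping
 * delay_ms milliseconds between pieces, then fsync() the whole file.
 * Returns 0 on success, -1 on error with errno set.
 */
static int flush_in_chunks(int fd, off_t chunk_bytes, long delay_ms)
{
    struct stat st;
    struct timespec pause;

    assert(chunk_bytes > 0 && delay_ms >= 0);
    if (fstat(fd, &st) != 0)
        return -1;

    pause.tv_sec = delay_ms / 1000;
    pause.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (off_t off = 0; off < st.st_size; off += chunk_bytes) {
        /* Only start writeback of dirty pages in this chunk; do not wait. */
        if (sync_file_range(fd, off, chunk_bytes, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
        /* Small delay so the writes are spread out over time. */
        if (delay_ms > 0)
            nanosleep(&pause, NULL);
    }

    /* Final fsync: waits for the data and also flushes metadata. */
    return fsync(fd);
}
```

By the time the final fsync() runs, most dirty pages should already be on their way to disk, so it has less left to write in one burst. Whether that matters in practice would have to be measured.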

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com