Re: Load distributed checkpoint - Mailing list pgsql-hackers

From Takayuki Tsunakawa
Subject Re: Load distributed checkpoint
Date
Msg-id 039501c71d1e$75e4c5e0$19527c0a@OPERAO
Whole thread Raw
In response to Load distributed checkpoint  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: Load distributed checkpoint
List pgsql-hackers
Hello,

From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
"Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com> wrote:
>> I'm afraid it is difficult for system designers to expect steady
>> throughput/response time, as long as PostgreSQL depends on the
>> flushing of file system cache. How does Oracle provide stable
>> performance?
>> Though I'm not sure, isn't it the key to use O_SYNC so that
write()s
>> transfer data to disk?
>
> AFAIK, other databases use write() and fsync() in combination. They
call
> fsync() immediately after they write buffers in some small batches.
Otherwise,
> they uses asynchronous and direct I/O options. Therefore, dirty
pages in
> kernel buffers are keeped to be low at any time.


Oracle seems to use O_SYNC. I've found a related page in the Oracle
manuals.

--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/win.102/b15688/ap_unix.htm
Direct Writes to Disk
On both UNIX and Windows platforms, bypassing the file system buffer
cache ensures data is written to disk.
On UNIX, Oracle Database uses the O_SYNC flag to bypass the file
system buffer cache. The flag name depends on the UNIX port.
On Windows, Oracle Database bypasses the file system buffer cache
completely.
--------------------------------------------------

As Itagaki-san says, asynchronous+direct I/O provides best
performance, I believe. Oracle supplies the combination as follows:

--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b15658/appc_linux.htm#sthref870
Oracle Database supports kernel asynchronous I/O. This feature is
disabled by default.
By default, the DISK_ASYNCH_IO initialization parameter in the
parameter file is set to TRUE to enable asynchronous I/O on raw
devices. To enable asynchronous I/O on file system files:
Ensure that all Oracle Database files are located on file systems that
support asynchronous I/O.
Set the FILESYSTEMIO_OPTIONS initialization parameter in the parameter
file to one of the following values:

Linux Distribution Recommended Value
SUSE Linux Enterprise Server 9 SETALL
Other distributions ASYNCH
--------------------------------------------------

I believe SQL Server also uses direct+asynchronous I/O, because
Microsoft recommends in MSDN that the combination plus appropriate
multi-threading provides best performance.

I tested Oracle9i on RHEL 2.1. I straced DBWn (database writer, which
is like the bgwriter of PostgreSQL) while creating tables, indexes,
etc. and shutting down the database server. Oracle surely uses the
O_SYNC as follows, but it doesn't use fsync().

24462 open("/work4/ora/tuna/users01.dbf", O_RDWR|O_SYNC|O_LARGEFILE) =
16

I wonder how the other big DBMS, IBM DB2, handles this. Is Itagaki-san
referring to DB2?





pgsql-hackers by date:

Previous
From: Richard Huxton
Date:
Subject: Re: EXPLAIN ANALYZE
Next
From: "Takayuki Tsunakawa"
Date:
Subject: Re: Load distributed checkpoint