Re: Load distributed checkpoint - Mailing list pgsql-hackers
From | Takayuki Tsunakawa |
---|---|
Subject | Re: Load distributed checkpoint |
Date | |
Msg-id | 039501c71d1e$75e4c5e0$19527c0a@OPERAO Whole thread Raw |
In response to | Load distributed checkpoint (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>) |
Responses |
Re: Load distributed checkpoint
|
List | pgsql-hackers |
Hello, From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp> "Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com> wrote: >> I'm afraid it is difficult for system designers to expect steady >> throughput/response time, as long as PostgreSQL depends on the >> flushing of file system cache. How does Oracle provide stable >> performance? >> Though I'm not sure, isn't it the key to use O_SYNC so that write()s >> transfer data to disk? > > AFAIK, other databases use write() and fsync() in combination. They call > fsync() immediately after they write buffers in some small batches. Otherwise, > they uses asynchronous and direct I/O options. Therefore, dirty pages in > kernel buffers are keeped to be low at any time. Oracle seems to use O_SYNC. I've found a related page in the Oracle manuals. -------------------------------------------------- http://download-west.oracle.com/docs/cd/B19306_01/win.102/b15688/ap_unix.htm Direct Writes to Disk On both UNIX and Windows platforms, bypassing the file system buffer cache ensures data is written to disk. On UNIX, Oracle Database uses the O_SYNC flag to bypass the file system buffer cache. The flag name depends on the UNIX port. On Windows, Oracle Database bypasses the file system buffer cache completely. -------------------------------------------------- As Itagaki-san says, asynchronous+direct I/O provides best performance, I believe. Oracle supplies the combination as follows: -------------------------------------------------- http://download-west.oracle.com/docs/cd/B19306_01/server.102/b15658/appc_linux.htm#sthref870 Oracle Database supports kernel asynchronous I/O. This feature is disabled by default. By default, the DISK_ASYNCH_IO initialization parameter in the parameter file is set to TRUE to enable asynchronous I/O on raw devices. To enable asynchronous I/O on file system files: Ensure that all Oracle Database files are located on file systems that support asynchronous I/O. Set the FILESYSTEMIO_OPTIONS initialization parameter in the parameter file to one of the following values: Linux Distribution Recommended Value SUSE Linux Enterprise Server 9 SETALL Other distributions ASYNCH -------------------------------------------------- I believe SQL Server also uses direct+asynchronous I/O, because Microsoft recommends in MSDN that the combination plus appropriate multi-threading provides best performance. I tested Oracle9i on RHEL 2.1. I straced DBWn (database writer, which is like the bgwriter of PostgreSQL) while creating tables, indexes, etc. and shutting down the database server. Oracle surely uses the O_SYNC as follows, but it doesn't use fsync(). 24462 open("/work4/ora/tuna/users01.dbf", O_RDWR|O_SYNC|O_LARGEFILE) = 16 I wonder how the other big DBMS, IBM DB2, handles this. Is Itagaki-san referring to DB2?
pgsql-hackers by date: