Thread: Disable page writes when fsync off, add GUC
This patch disables page writes to WAL when fsync is off, because with
no fsync guarantee, the page write recovery isn't useful.

This also adds a full_page_writes GUC to turn off page writes to WAL.
Some people might not want full_page_writes, but still might want fsync.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.335
diff -c -c -r1.335 runtime.sgml
*** doc/src/sgml/runtime.sgml 2 Jul 2005 19:16:36 -0000 1.335
--- doc/src/sgml/runtime.sgml 4 Jul 2005 03:58:34 -0000
***************
*** 1687,1692 ****
--- 1687,1723 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-full-page-writes" xreflabel="full_page_writes">
+       <indexterm>
+        <primary><varname>full_page_writes</> configuration parameter</primary>
+       </indexterm>
+       <term><varname>full_page_writes</varname> (<type>boolean</type>)</term>
+       <listitem>
+        <para>
+         A page write in process during an operating system crash might
+         be only partially written to disk, leading to an on-disk page
+         that contains a mix of old and new data. During recovery, the
+         row changes stored in WAL are not enough to recover from this
+         situation.
+        </para>
+ 
+        <para>
+         When this option is on, the <productname>PostgreSQL</> server
+         writes full pages when first modified after a checkpoint to WAL
+         so full recovery is possible. Turning this option off might lead
+         to a corrupt system after an operating system crash because
+         uncorrected partial pages might contain inconsistent or corrupt
+         data. The risks are less but similar to <varname>fsync</>.
+        </para>
+ 
+        <para>
+         This option can only be set at server start or in the
+         <filename>postgresql.conf</filename> file. The default is
+         <literal>on</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
        <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
        <indexterm>
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.205
diff -c -c -r1.205 xlog.c
*** src/backend/access/transam/xlog.c 30 Jun 2005 00:00:50 -0000 1.205
--- src/backend/access/transam/xlog.c 4 Jul 2005 03:58:38 -0000
***************
*** 97,102 ****
--- 97,103 ----
  char *XLogArchiveCommand = NULL;
  char *XLOG_sync_method = NULL;
  const char XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
+ bool fullPageWrites = true;
  
  #ifdef WAL_DEBUG
  bool XLOG_DEBUG = false;
***************
*** 593,599 ****
                  {
                      /* OK, put it in this slot */
                      dtbuf[i] = rdt->buffer;
!                     if (XLogCheckBuffer(rdt, &(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
                      {
                          dtbuf_bkp[i] = true;
                          rdt->data = NULL;
--- 594,602 ----
                  {
                      /* OK, put it in this slot */
                      dtbuf[i] = rdt->buffer;
!                     /* If fsync is off, no need to backup pages. */
!                     if (enableFsync && fullPageWrites &&
!                         XLogCheckBuffer(rdt, &(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
                      {
                          dtbuf_bkp[i] = true;
                          rdt->data = NULL;
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.271
diff -c -c -r1.271 guc.c
*** src/backend/utils/misc/guc.c 28 Jun 2005 05:09:02 -0000 1.271
--- src/backend/utils/misc/guc.c 4 Jul 2005 03:58:46 -0000
***************
*** 82,87 ****
--- 82,88 ----
  extern int CommitDelay;
  extern int CommitSiblings;
  extern char *default_tablespace;
+ extern bool fullPageWrites;
  
  static const char *assign_log_destination(const char *value,
                         bool doit, GucSource source);
***************
*** 482,487 ****
--- 483,500 ----
          false, NULL, NULL
      },
      {
+         {"full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
+             gettext_noop("Fully writes pages when first modified after a checkpoint."),
+             gettext_noop("A page write in process during an operating system crash might be "
+                          "only partially written to disk. During recovery, the row changes"
+                          "stored in WAL are not enough to recover. This option writes "
+                          "pages when first modified after a checkpoint to WAL so full recovery "
+                          "is possible.")
+         },
+         &fullPageWrites,
+         true, NULL, NULL
+     },
+     {
          {"silent_mode", PGC_POSTMASTER, LOGGING_WHEN,
              gettext_noop("Runs the server silently."),
              gettext_noop("If this parameter is set, the server will automatically run in the "
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.151
diff -c -c -r1.151 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample 2 Jul 2005 18:46:45 -0000 1.151
--- src/backend/utils/misc/postgresql.conf.sample 4 Jul 2005 03:58:46 -0000
***************
*** 121,126 ****
--- 121,127 ----
  #wal_sync_method = fsync        # the default varies across platforms:
                                  # fsync, fdatasync, fsync_writethrough,
                                  # open_sync, open_datasync
+ #full_page_writes = on          # recover from partial page writes
  #wal_buffers = 8                # min 4, 8KB each
  #commit_delay = 0               # range 0-100000, in microseconds
  #commit_siblings = 5            # range 1-1000
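
Editorial note: the condition this first patch adds to xlog.c can be read as a three-way AND. The following is only a minimal sketch of that decision, not the server's code. The function names are hypothetical, and the LSN comparison is an assumed simplification standing in for XLogCheckBuffer(), which in the real source also fills in the saved LSN and backup-block header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: simplified stand-in for the XLogCheckBuffer() test.
 * A page whose LSN is not newer than the latest checkpoint's redo
 * pointer has not been modified since that checkpoint.
 */
bool first_change_since_checkpoint(uint64_t page_lsn, uint64_t redo_ptr)
{
    return page_lsn <= redo_ptr;
}

/*
 * Hypothetical helper mirroring the patched condition:
 *   enableFsync && fullPageWrites && XLogCheckBuffer(...)
 * A full-page copy goes to WAL only when all three hold.
 */
bool backup_page_coupled(bool fsync_on, bool full_page_writes_on,
                         uint64_t page_lsn, uint64_t redo_ptr)
{
    return fsync_on && full_page_writes_on &&
           first_change_since_checkpoint(page_lsn, redo_ptr);
}
```

Under this model, turning off either fsync or full_page_writes suppresses every full-page copy, which is the coupling the follow-up messages discuss.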
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> This patch disables page writes to WAL when fsync is off, because with
> no fsync guarantee, the page write recovery isn't useful.

This doesn't seem quite right to me. What happens with PITR? And
Postgres crashes? While many people seriously distrust running w/ fsync
off, I'm sure there's quite a few folks who do.

> This also adds a full_page_writes GUC to turn off page writes to WAL.
> Some people might not want full_page_writes, but still might want fsync.

Adding an option to not do page writes to WAL seems fine to me, but I
think WAL writes should be on by default, even in the fsync=off case.
If people want to turn it off, fine, for either case since we expect
they understand what it means to have it turned off, but I don't think
the two options should be coupled as is being proposed.

    Thanks,

        Stephen
Bruce Momjian wrote:
> This patch disables page writes to WAL when fsync is off, because
> with no fsync guarantee, the page write recovery isn't useful.
>
> This also adds a full_page_writes GUC to turn off page writes to WAL.
> Some people might not want full_page_writes, but still might want
> fsync.

Do you have some numbers to suggest that there is a performance benefit
to be had?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > This patch disables page writes to WAL when fsync is off, because
> > with no fsync guarantee, the page write recovery isn't useful.
> >
> > This also adds a full_page_writes GUC to turn off page writes to WAL.
> > Some people might not want full_page_writes, but still might want
> > fsync.
>
> Do you have some numbers to suggest that there is a performance benefit
> to be had?

Josh reported page writes to be a big hit (which we already knew), but I
don't have any with fsync off, though it seems like a no-brainer.

However, I am thinking decoupling them is best.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> This also adds a full_page_writes GUC to turn off page writes to WAL.
> Some people might not want full_page_writes.

Fsync linkage removed, patch attached and applied.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.335
diff -c -c -r1.335 runtime.sgml
*** doc/src/sgml/runtime.sgml 2 Jul 2005 19:16:36 -0000 1.335
--- doc/src/sgml/runtime.sgml 5 Jul 2005 23:15:33 -0000
***************
*** 1660,1666 ****
  
         <para>
          This option can only be set at server start or in the
!         <filename>postgresql.conf</filename> file.
         </para>
        </listitem>
       </varlistentry>
--- 1660,1668 ----
  
         <para>
          This option can only be set at server start or in the
!         <filename>postgresql.conf</filename> file. If this option
!         is <literal>off</>, consider also turning off
!         <varname>guc-full-page-writes</>.
         </para>
        </listitem>
       </varlistentry>
***************
*** 1687,1692 ****
--- 1689,1725 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-full-page-writes" xreflabel="full_page_writes">
+       <indexterm>
+        <primary><varname>full_page_writes</> configuration parameter</primary>
+       </indexterm>
+       <term><varname>full_page_writes</varname> (<type>boolean</type>)</term>
+       <listitem>
+        <para>
+         A page write in process during an operating system crash might
+         be only partially written to disk, leading to an on-disk page
+         that contains a mix of old and new data. During recovery, the
+         row changes stored in WAL are not enough to completely restore
+         the page.
+        </para>
+ 
+        <para>
+         When this option is on, the <productname>PostgreSQL</> server
+         writes full pages to WAL when they first modified after a checkpoint
+         so full recovery is possible. Turning this option off might lead
+         to a corrupt system after an operating system crash because
+         uncorrected partial pages might contain inconsistent or corrupt
+         data. The risks are less but similar to <varname>fsync</>.
+        </para>
+ 
+        <para>
+         This option can only be set at server start or in the
+         <filename>postgresql.conf</filename> file. The default is
+         <literal>on</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
        <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
        <indexterm>
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.206
diff -c -c -r1.206 xlog.c
*** src/backend/access/transam/xlog.c 4 Jul 2005 04:51:44 -0000 1.206
--- src/backend/access/transam/xlog.c 5 Jul 2005 23:15:36 -0000
***************
*** 103,108 ****
--- 103,109 ----
  char *XLogArchiveCommand = NULL;
  char *XLOG_sync_method = NULL;
  const char XLOG_sync_method_default[] = DEFAULT_SYNC_METHOD_STR;
+ bool fullPageWrites = true;
  
  #ifdef WAL_DEBUG
  bool XLOG_DEBUG = false;
***************
*** 594,600 ****
                  {
                      /* OK, put it in this slot */
                      dtbuf[i] = rdt->buffer;
!                     if (XLogCheckBuffer(rdt, &(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
                      {
                          dtbuf_bkp[i] = true;
                          rdt->data = NULL;
--- 595,603 ----
                  {
                      /* OK, put it in this slot */
                      dtbuf[i] = rdt->buffer;
!                     /* If fsync is off, no need to backup pages. */
!                     if (fullPageWrites &&
!                         XLogCheckBuffer(rdt, &(dtbuf_lsn[i]), &(dtbuf_xlg[i])))
                      {
                          dtbuf_bkp[i] = true;
                          rdt->data = NULL;
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.272
diff -c -c -r1.272 guc.c
*** src/backend/utils/misc/guc.c 4 Jul 2005 04:51:51 -0000 1.272
--- src/backend/utils/misc/guc.c 5 Jul 2005 23:15:39 -0000
***************
*** 83,88 ****
--- 83,89 ----
  extern int CommitDelay;
  extern int CommitSiblings;
  extern char *default_tablespace;
+ extern bool fullPageWrites;
  
  static const char *assign_log_destination(const char *value,
                         bool doit, GucSource source);
***************
*** 483,488 ****
--- 484,501 ----
          false, NULL, NULL
      },
      {
+         {"full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
+             gettext_noop("Writes full pages to WAL when first modified after a checkpoint."),
+             gettext_noop("A page write in process during an operating system crash might be "
+                          "only partially written to disk. During recovery, the row changes"
+                          "stored in WAL are not enough to recover. This option writes "
+                          "pages when first modified after a checkpoint to WAL so full recovery "
+                          "is possible.")
+         },
+         &fullPageWrites,
+         true, NULL, NULL
+     },
+     {
          {"silent_mode", PGC_POSTMASTER, LOGGING_WHEN,
              gettext_noop("Runs the server silently."),
              gettext_noop("If this parameter is set, the server will automatically run in the "
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.151
diff -c -c -r1.151 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample 2 Jul 2005 18:46:45 -0000 1.151
--- src/backend/utils/misc/postgresql.conf.sample 5 Jul 2005 23:15:39 -0000
***************
*** 121,126 ****
--- 121,127 ----
  #wal_sync_method = fsync        # the default varies across platforms:
                                  # fsync, fdatasync, fsync_writethrough,
                                  # open_sync, open_datasync
+ #full_page_writes = on          # recover from partial page writes
  #wal_buffers = 8                # min 4, 8KB each
  #commit_delay = 0               # range 0-100000, in microseconds
  #commit_siblings = 5            # range 1-1000
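
Editorial note: in this applied version the fsync test is gone, so full_page_writes alone gates the copy, although the carried-over comment in the xlog.c hunk still mentions fsync. The phrase "when first modified after a checkpoint" may be easier to follow with a toy model. The sketch below is an illustration under assumptions, not PostgreSQL code: the ToyWal type, its fields, and the toy_wal_* functions are all hypothetical, and LSNs are reduced to plain counters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_PAGES 4

/* Hypothetical toy model of WAL state; not the server's structures. */
typedef struct {
    uint64_t page_lsn[TOY_NUM_PAGES]; /* last WAL position that touched each page */
    uint64_t redo_ptr;                /* WAL position where the latest checkpoint began */
    uint64_t next_lsn;                /* next WAL position to hand out */
    bool     full_page_writes;        /* models the full_page_writes setting */
    unsigned full_page_images;        /* count of full pages copied into WAL */
} ToyWal;

void toy_wal_init(ToyWal *wal, bool full_page_writes)
{
    for (size_t i = 0; i < TOY_NUM_PAGES; i++)
        wal->page_lsn[i] = 0;
    wal->redo_ptr = 0;
    wal->next_lsn = 1;
    wal->full_page_writes = full_page_writes;
    wal->full_page_images = 0;
}

/* A checkpoint moves the redo pointer to the current end of WAL. */
void toy_wal_checkpoint(ToyWal *wal)
{
    wal->redo_ptr = wal->next_lsn;
    wal->next_lsn++; /* the checkpoint record itself consumes a position */
}

/*
 * Log one modification of a page. Returns true if a full-page image
 * was written, i.e. the setting is on and the page has not been
 * touched since the latest checkpoint.
 */
bool toy_wal_modify(ToyWal *wal, size_t page)
{
    bool backup = wal->full_page_writes &&
                  wal->page_lsn[page] <= wal->redo_ptr;

    if (backup)
        wal->full_page_images++;
    wal->page_lsn[page] = wal->next_lsn++;
    return backup;
}
```

In this model a page costs one full image per checkpoint cycle no matter how often it is modified, and with the setting off no images are written at all, with or without fsync.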
Bruce Momjian wrote:

> Bruce Momjian wrote:
>> This also adds a full_page_writes GUC to turn off page writes to WAL.
>> Some people might not want full_page_writes.
>
> Fsync linkage removed, patch attached and applied.

...
+         When this option is on, the <productname>PostgreSQL</> server
+         writes full pages to WAL when they first modified after a checkpoint
+         so full recovery is possible.

I believe this should be "when they _are_ first modified after".

Perhaps you should also mention power failure, not only an operating system
crash as disaster scenario, even if the latter includes the former.

Best Regards,
Michael Paesold
Michael Paesold wrote:
> Bruce Momjian wrote:
>
> > Bruce Momjian wrote:
> >> This also adds a full_page_writes GUC to turn off page writes to WAL.
> >> Some people might not want full_page_writes.
> >
> > Fsync linkage removed, patch attached and applied.
>
> ...
> +         When this option is on, the <productname>PostgreSQL</> server
> +         writes full pages to WAL when they first modified after a checkpoint
> +         so full recovery is possible.
>
> I believe this should be "when they _are_ first modified after".
>
> Perhaps you should also mention power failure, not only an operating system
> crash as disaster scenario, even if the latter includes the former.
>

Thanks. Done.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.336
diff -c -c -r1.336 runtime.sgml
*** doc/src/sgml/runtime.sgml 5 Jul 2005 23:18:09 -0000 1.336
--- doc/src/sgml/runtime.sgml 6 Jul 2005 14:40:15 -0000
***************
*** 1705,1715 ****
  
         <para>
          When this option is on, the <productname>PostgreSQL</> server
!         writes full pages to WAL when they first modified after a checkpoint
!         so full recovery is possible. Turning this option off might lead
!         to a corrupt system after an operating system crash because
!         uncorrected partial pages might contain inconsistent or corrupt
!         data. The risks are less but similar to <varname>fsync</>.
         </para>
  
         <para>
--- 1705,1716 ----
  
         <para>
          When this option is on, the <productname>PostgreSQL</> server
!         writes full pages to WAL when they are first modified after a
!         checkpoint so full recovery is possible. Turning this option off
!         might lead to a corrupt system after an operating system crash
!         or power failure because uncorrected partial pages might contain
!         inconsistent or corrupt data. The risks are less but similar to
!         <varname>fsync</>.
         </para>
  
         <para>
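
Editorial note: the failure the documentation text describes, a page left holding a mix of old and new data, can be imitated in a few lines. This is a rough sketch under stated assumptions, not how the server does I/O: the 8192-byte page and 512-byte sector sizes are illustrative constants, all function names are hypothetical, and the saved image is simply taken to be the complete new page contents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192 /* assumed page size for the illustration */
#define TOY_SECTOR_SIZE 512  /* assumed unit the disk writes atomically */

/*
 * Overwrite only the first `sectors_written` sectors of the on-disk
 * page, as if the machine crashed or lost power partway through.
 */
void torn_write(unsigned char *disk_page, const unsigned char *new_page,
                size_t sectors_written)
{
    size_t bytes = sectors_written * TOY_SECTOR_SIZE;

    if (bytes > TOY_PAGE_SIZE)
        bytes = TOY_PAGE_SIZE;
    memcpy(disk_page, new_page, bytes);
}

/*
 * With a full-page image saved in WAL, recovery can overwrite the
 * damaged page wholesale instead of patching individual rows.
 */
void restore_from_full_page_image(unsigned char *disk_page,
                                  const unsigned char *wal_image)
{
    memcpy(disk_page, wal_image, TOY_PAGE_SIZE);
}

bool pages_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, TOY_PAGE_SIZE) == 0;
}
```

After a torn write the page matches neither the old nor the new version, so row-level changes replayed from WAL would be applied to an inconsistent base. Copying the saved image back sidesteps that, which is the protection given up when full_page_writes is off.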