Thread: New XLOG record indicating WAL-skipping

New XLOG record indicating WAL-skipping

From
Fujii Masao
Date:
On Wed, Dec 9, 2009 at 6:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Here is the patch:
>
> - Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
>  the reason that WAL archiving is not enabled and such record has not been
>  written yet.
>
> - Cause archive recovery to end if an XLOG UNLOGGED record is found during
>  it.

Here's an updated version of my "New XLOG record indicating WAL-skipping" patch.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00788.php

This is rebased to CVS HEAD.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New XLOG record indicating WAL-skipping

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Wed, Dec 9, 2009 at 6:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Here is the patch:
>>
>> - Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
>>  the reason that WAL archiving is not enabled and such record has not been
>>  written yet.
>>
>> - Cause archive recovery to end if an XLOG UNLOGGED record is found during
>>  it.
>
> Here's an updated version of my "New XLOG record indicating WAL-skipping" patch.
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg00788.php

Thanks!

I don't like special-casing UNLOGGED records in XLogInsert and
ReadRecord(). Those functions are complicated enough already. The
special handling from XLogInsert() (and a few other places) is only
required because the UNLOGGED records carry no payload. That's easy to
avoid, just add some payload to them, doesn't matter what it is. And I
don't think ReadRecord() is the right place to emit the errors/warnings,
that belongs naturally in xlog_redo().

It might be useful to add some information in the records telling why
WAL-logging was skipped. It might turn out to be useful in debugging.
That also conveniently adds payload to the records, to avoid the
special-casing in XLogInsert() :-).

I think it's a premature optimization to skip writing the records if
we've written in the same session already. Especially with the 'reason'
information added to the records, it's nice to have a record of each
such operation. All operations that skip WAL-logging are heavy enough
that an additional WAL record will make no difference. I can see that it
was required to avoid the flooding from heap_insert(), but we can move
the XLogSkipLogging() call from heap_insert() to heap_sync().

Attached is an updated patch, doing the above. Am I missing anything?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
? GNUmakefile
? b
? config.log
? config.status
? config.status.lineno
? configure.lineno
? gin-splay-1.patch
? gin-splay-2.patch
? gin-splay-3.patch
? md-1.c
? md-1.patch
? temp-file-resowner-2.patch
? contrib/pgbench/fsynctest
? contrib/pgbench/fsynctest.c
? contrib/pgbench/fsynctestfile
? contrib/spi/.deps
? doc/src/sgml/HTML.index
? doc/src/sgml/bookindex.sgml
? doc/src/sgml/features-supported.sgml
? doc/src/sgml/features-unsupported.sgml
? doc/src/sgml/version.sgml
? src/Makefile.global
? src/backend/aaa.patch
? src/backend/postgres
? src/backend/access/common/.deps
? src/backend/access/gin/.deps
? src/backend/access/gist/.deps
? src/backend/access/hash/.deps
? src/backend/access/heap/.deps
? src/backend/access/index/.deps
? src/backend/access/nbtree/.deps
? src/backend/access/transam/.deps
? src/backend/bootstrap/.deps
? src/backend/catalog/.deps
? src/backend/commands/.deps
? src/backend/executor/.deps
? src/backend/foreign/.deps
? src/backend/foreign/dummy/.deps
? src/backend/foreign/postgresql/.deps
? src/backend/lib/.deps
? src/backend/libpq/.deps
? src/backend/main/.deps
? src/backend/nodes/.deps
? src/backend/optimizer/geqo/.deps
? src/backend/optimizer/path/.deps
? src/backend/optimizer/plan/.deps
? src/backend/optimizer/prep/.deps
? src/backend/optimizer/util/.deps
? src/backend/parser/.deps
? src/backend/po/af.mo
? src/backend/po/cs.mo
? src/backend/po/de.mo
? src/backend/po/es.mo
? src/backend/po/fr.mo
? src/backend/po/hr.mo
? src/backend/po/hu.mo
? src/backend/po/it.mo
? src/backend/po/ja.mo
? src/backend/po/ko.mo
? src/backend/po/nb.mo
? src/backend/po/nl.mo
? src/backend/po/pl.mo
? src/backend/po/pt_BR.mo
? src/backend/po/ro.mo
? src/backend/po/ru.mo
? src/backend/po/sk.mo
? src/backend/po/sl.mo
? src/backend/po/sv.mo
? src/backend/po/tr.mo
? src/backend/po/zh_CN.mo
? src/backend/po/zh_TW.mo
? src/backend/port/.deps
? src/backend/postmaster/.deps
? src/backend/regex/.deps
? src/backend/replication/.deps
? src/backend/replication/walreceiver/.deps
? src/backend/rewrite/.deps
? src/backend/snowball/.deps
? src/backend/snowball/snowball_create.sql
? src/backend/storage/buffer/.deps
? src/backend/storage/file/.deps
? src/backend/storage/freespace/.deps
? src/backend/storage/ipc/.deps
? src/backend/storage/large_object/.deps
? src/backend/storage/lmgr/.deps
? src/backend/storage/page/.deps
? src/backend/storage/smgr/.deps
? src/backend/tcop/.deps
? src/backend/tsearch/.deps
? src/backend/utils/.deps
? src/backend/utils/probes.h
? src/backend/utils/adt/.deps
? src/backend/utils/cache/.deps
? src/backend/utils/error/.deps
? src/backend/utils/fmgr/.deps
? src/backend/utils/hash/.deps
? src/backend/utils/init/.deps
? src/backend/utils/mb/.deps
? src/backend/utils/mb/Unicode/BIG5.TXT
? src/backend/utils/mb/Unicode/CP950.TXT
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/.deps
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc2004_sjis2004/.deps
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/.deps
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/.deps
? src/backend/utils/mb/conversion_procs/latin_and_mic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_big5/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_johab/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_win/.deps
? src/backend/utils/misc/.deps
? src/backend/utils/mmgr/.deps
? src/backend/utils/resowner/.deps
? src/backend/utils/sort/.deps
? src/backend/utils/time/.deps
? src/bin/initdb/.deps
? src/bin/initdb/initdb
? src/bin/initdb/po/cs.mo
? src/bin/initdb/po/de.mo
? src/bin/initdb/po/es.mo
? src/bin/initdb/po/fr.mo
? src/bin/initdb/po/it.mo
? src/bin/initdb/po/ja.mo
? src/bin/initdb/po/ko.mo
? src/bin/initdb/po/pl.mo
? src/bin/initdb/po/pt_BR.mo
? src/bin/initdb/po/ro.mo
? src/bin/initdb/po/ru.mo
? src/bin/initdb/po/sk.mo
? src/bin/initdb/po/sl.mo
? src/bin/initdb/po/sv.mo
? src/bin/initdb/po/ta.mo
? src/bin/initdb/po/tr.mo
? src/bin/initdb/po/zh_CN.mo
? src/bin/initdb/po/zh_TW.mo
? src/bin/pg_config/.deps
? src/bin/pg_config/pg_config
? src/bin/pg_config/po/cs.mo
? src/bin/pg_config/po/de.mo
? src/bin/pg_config/po/es.mo
? src/bin/pg_config/po/fr.mo
? src/bin/pg_config/po/it.mo
? src/bin/pg_config/po/ja.mo
? src/bin/pg_config/po/ko.mo
? src/bin/pg_config/po/nb.mo
? src/bin/pg_config/po/pl.mo
? src/bin/pg_config/po/pt_BR.mo
? src/bin/pg_config/po/ro.mo
? src/bin/pg_config/po/ru.mo
? src/bin/pg_config/po/sl.mo
? src/bin/pg_config/po/sv.mo
? src/bin/pg_config/po/ta.mo
? src/bin/pg_config/po/tr.mo
? src/bin/pg_config/po/zh_CN.mo
? src/bin/pg_config/po/zh_TW.mo
? src/bin/pg_controldata/.deps
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_controldata/po/cs.mo
? src/bin/pg_controldata/po/de.mo
? src/bin/pg_controldata/po/es.mo
? src/bin/pg_controldata/po/fa.mo
? src/bin/pg_controldata/po/fr.mo
? src/bin/pg_controldata/po/hu.mo
? src/bin/pg_controldata/po/it.mo
? src/bin/pg_controldata/po/ja.mo
? src/bin/pg_controldata/po/ko.mo
? src/bin/pg_controldata/po/nb.mo
? src/bin/pg_controldata/po/pl.mo
? src/bin/pg_controldata/po/pt_BR.mo
? src/bin/pg_controldata/po/ro.mo
? src/bin/pg_controldata/po/ru.mo
? src/bin/pg_controldata/po/sk.mo
? src/bin/pg_controldata/po/sl.mo
? src/bin/pg_controldata/po/sv.mo
? src/bin/pg_controldata/po/ta.mo
? src/bin/pg_controldata/po/tr.mo
? src/bin/pg_controldata/po/zh_CN.mo
? src/bin/pg_controldata/po/zh_TW.mo
? src/bin/pg_ctl/.deps
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_ctl/po/cs.mo
? src/bin/pg_ctl/po/de.mo
? src/bin/pg_ctl/po/es.mo
? src/bin/pg_ctl/po/fr.mo
? src/bin/pg_ctl/po/it.mo
? src/bin/pg_ctl/po/ja.mo
? src/bin/pg_ctl/po/ko.mo
? src/bin/pg_ctl/po/pt_BR.mo
? src/bin/pg_ctl/po/ro.mo
? src/bin/pg_ctl/po/ru.mo
? src/bin/pg_ctl/po/sk.mo
? src/bin/pg_ctl/po/sl.mo
? src/bin/pg_ctl/po/sv.mo
? src/bin/pg_ctl/po/ta.mo
? src/bin/pg_ctl/po/tr.mo
? src/bin/pg_ctl/po/zh_CN.mo
? src/bin/pg_ctl/po/zh_TW.mo
? src/bin/pg_dump/.deps
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_dump/po/cs.mo
? src/bin/pg_dump/po/de.mo
? src/bin/pg_dump/po/es.mo
? src/bin/pg_dump/po/fr.mo
? src/bin/pg_dump/po/it.mo
? src/bin/pg_dump/po/ja.mo
? src/bin/pg_dump/po/ko.mo
? src/bin/pg_dump/po/nb.mo
? src/bin/pg_dump/po/pt_BR.mo
? src/bin/pg_dump/po/ro.mo
? src/bin/pg_dump/po/ru.mo
? src/bin/pg_dump/po/sk.mo
? src/bin/pg_dump/po/sl.mo
? src/bin/pg_dump/po/sv.mo
? src/bin/pg_dump/po/tr.mo
? src/bin/pg_dump/po/zh_CN.mo
? src/bin/pg_dump/po/zh_TW.mo
? src/bin/pg_resetxlog/.deps
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/pg_resetxlog/po/cs.mo
? src/bin/pg_resetxlog/po/de.mo
? src/bin/pg_resetxlog/po/es.mo
? src/bin/pg_resetxlog/po/fr.mo
? src/bin/pg_resetxlog/po/hu.mo
? src/bin/pg_resetxlog/po/it.mo
? src/bin/pg_resetxlog/po/ja.mo
? src/bin/pg_resetxlog/po/ko.mo
? src/bin/pg_resetxlog/po/nb.mo
? src/bin/pg_resetxlog/po/pt_BR.mo
? src/bin/pg_resetxlog/po/ro.mo
? src/bin/pg_resetxlog/po/ru.mo
? src/bin/pg_resetxlog/po/sk.mo
? src/bin/pg_resetxlog/po/sl.mo
? src/bin/pg_resetxlog/po/sv.mo
? src/bin/pg_resetxlog/po/ta.mo
? src/bin/pg_resetxlog/po/tr.mo
? src/bin/pg_resetxlog/po/zh_CN.mo
? src/bin/pg_resetxlog/po/zh_TW.mo
? src/bin/psql/.deps
? src/bin/psql/psql
? src/bin/psql/po/cs.mo
? src/bin/psql/po/de.mo
? src/bin/psql/po/es.mo
? src/bin/psql/po/fa.mo
? src/bin/psql/po/fr.mo
? src/bin/psql/po/hu.mo
? src/bin/psql/po/it.mo
? src/bin/psql/po/ja.mo
? src/bin/psql/po/ko.mo
? src/bin/psql/po/nb.mo
? src/bin/psql/po/pt_BR.mo
? src/bin/psql/po/ro.mo
? src/bin/psql/po/ru.mo
? src/bin/psql/po/sk.mo
? src/bin/psql/po/sl.mo
? src/bin/psql/po/sv.mo
? src/bin/psql/po/tr.mo
? src/bin/psql/po/zh_CN.mo
? src/bin/psql/po/zh_TW.mo
? src/bin/scripts/.deps
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/createlang
? src/bin/scripts/createuser
? src/bin/scripts/dropdb
? src/bin/scripts/droplang
? src/bin/scripts/dropuser
? src/bin/scripts/reindexdb
? src/bin/scripts/vacuumdb
? src/bin/scripts/po/cs.mo
? src/bin/scripts/po/de.mo
? src/bin/scripts/po/es.mo
? src/bin/scripts/po/fr.mo
? src/bin/scripts/po/it.mo
? src/bin/scripts/po/ja.mo
? src/bin/scripts/po/ko.mo
? src/bin/scripts/po/pt_BR.mo
? src/bin/scripts/po/ro.mo
? src/bin/scripts/po/ru.mo
? src/bin/scripts/po/sk.mo
? src/bin/scripts/po/sl.mo
? src/bin/scripts/po/sv.mo
? src/bin/scripts/po/ta.mo
? src/bin/scripts/po/tr.mo
? src/bin/scripts/po/zh_CN.mo
? src/bin/scripts/po/zh_TW.mo
? src/include/pg_config.h
? src/include/stamp-h
? src/interfaces/ecpg/compatlib/.deps
? src/interfaces/ecpg/compatlib/exports.list
? src/interfaces/ecpg/compatlib/libecpg_compat.so.3.1
? src/interfaces/ecpg/compatlib/libecpg_compat.so.3.2
? src/interfaces/ecpg/ecpglib/.deps
? src/interfaces/ecpg/ecpglib/exports.list
? src/interfaces/ecpg/ecpglib/libecpg.so.6.1
? src/interfaces/ecpg/ecpglib/libecpg.so.6.2
? src/interfaces/ecpg/ecpglib/po/de.mo
? src/interfaces/ecpg/ecpglib/po/es.mo
? src/interfaces/ecpg/ecpglib/po/fr.mo
? src/interfaces/ecpg/ecpglib/po/it.mo
? src/interfaces/ecpg/ecpglib/po/ja.mo
? src/interfaces/ecpg/ecpglib/po/pt_BR.mo
? src/interfaces/ecpg/ecpglib/po/tr.mo
? src/interfaces/ecpg/include/ecpg_config.h
? src/interfaces/ecpg/include/stamp-h
? src/interfaces/ecpg/pgtypeslib/.deps
? src/interfaces/ecpg/pgtypeslib/exports.list
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.3.1
? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.3.2
? src/interfaces/ecpg/preproc/.deps
? src/interfaces/ecpg/preproc/ecpg
? src/interfaces/ecpg/preproc/po/de.mo
? src/interfaces/ecpg/preproc/po/es.mo
? src/interfaces/ecpg/preproc/po/fr.mo
? src/interfaces/ecpg/preproc/po/it.mo
? src/interfaces/ecpg/preproc/po/ja.mo
? src/interfaces/ecpg/preproc/po/pt_BR.mo
? src/interfaces/ecpg/preproc/po/tr.mo
? src/interfaces/libpq/.deps
? src/interfaces/libpq/exports.list
? src/interfaces/libpq/libpq.so.5.2
? src/interfaces/libpq/libpq.so.5.3
? src/interfaces/libpq/po/af.mo
? src/interfaces/libpq/po/cs.mo
? src/interfaces/libpq/po/de.mo
? src/interfaces/libpq/po/es.mo
? src/interfaces/libpq/po/fr.mo
? src/interfaces/libpq/po/hr.mo
? src/interfaces/libpq/po/it.mo
? src/interfaces/libpq/po/ja.mo
? src/interfaces/libpq/po/ko.mo
? src/interfaces/libpq/po/nb.mo
? src/interfaces/libpq/po/pl.mo
? src/interfaces/libpq/po/pt_BR.mo
? src/interfaces/libpq/po/ru.mo
? src/interfaces/libpq/po/sk.mo
? src/interfaces/libpq/po/sl.mo
? src/interfaces/libpq/po/sv.mo
? src/interfaces/libpq/po/ta.mo
? src/interfaces/libpq/po/tr.mo
? src/interfaces/libpq/po/zh_CN.mo
? src/interfaces/libpq/po/zh_TW.mo
? src/pl/plperl/.deps
? src/pl/plperl/SPI.c
? src/pl/plperl/perlchunks.h
? src/pl/plperl/po/de.mo
? src/pl/plperl/po/es.mo
? src/pl/plperl/po/fr.mo
? src/pl/plperl/po/it.mo
? src/pl/plperl/po/ja.mo
? src/pl/plperl/po/pt_BR.mo
? src/pl/plperl/po/tr.mo
? src/pl/plpgsql/src/.deps
? src/pl/plpgsql/src/pl_scan.c
? src/pl/plpgsql/src/po/de.mo
? src/pl/plpgsql/src/po/es.mo
? src/pl/plpgsql/src/po/fr.mo
? src/pl/plpgsql/src/po/it.mo
? src/pl/plpgsql/src/po/ja.mo
? src/pl/plpgsql/src/po/ro.mo
? src/pl/plpgsql/src/po/tr.mo
? src/port/.deps
? src/port/pg_config_paths.h
? src/test/regress/.deps
? src/test/regress/log
? src/test/regress/pg_regress
? src/test/regress/results
? src/test/regress/testtablespace
? src/test/regress/tmp_check
? src/test/regress/expected/constraints.out
? src/test/regress/expected/copy.out
? src/test/regress/expected/create_function_1.out
? src/test/regress/expected/create_function_2.out
? src/test/regress/expected/largeobject.out
? src/test/regress/expected/largeobject_1.out
? src/test/regress/expected/misc.out
? src/test/regress/expected/tablespace.out
? src/test/regress/sql/constraints.sql
? src/test/regress/sql/copy.sql
? src/test/regress/sql/create_function_1.sql
? src/test/regress/sql/create_function_2.sql
? src/test/regress/sql/largeobject.sql
? src/test/regress/sql/misc.sql
? src/test/regress/sql/tablespace.sql
? src/timezone/.deps
? src/timezone/zic
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.282
diff -u -r1.282 heapam.c
--- src/backend/access/heap/heapam.c    14 Jan 2010 11:08:00 -0000    1.282
+++ src/backend/access/heap/heapam.c    15 Jan 2010 11:27:44 -0000
@@ -5074,10 +5074,16 @@
 void
 heap_sync(Relation rel)
 {
+    char reason[NAMEDATALEN + 30];
+
     /* temp tables never need fsync */
     if (rel->rd_istemp)
         return;

+    snprintf(reason, sizeof(reason), "heap inserts on \"%s\"",
+             RelationGetRelationName(rel));
+    XLogSkipLogging(reason);
+
     /* main heap */
     FlushRelationBuffers(rel);
     /* FlushRelationBuffers will have opened rd_smgr */
Index: src/backend/access/nbtree/nbtsort.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.122
diff -u -r1.122 nbtsort.c
--- src/backend/access/nbtree/nbtsort.c    15 Jan 2010 09:19:00 -0000    1.122
+++ src/backend/access/nbtree/nbtsort.c    15 Jan 2010 11:27:44 -0000
@@ -215,6 +215,18 @@
      */
     wstate.btws_use_wal = XLogIsNeeded() && !wstate.index->rd_istemp;

+    /*
+     * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+     * WAL archiving is not enabled.
+     */
+    if (!wstate.btws_use_wal && !wstate.index->rd_istemp)
+    {
+        char reason[NAMEDATALEN + 20];
+        snprintf(reason, sizeof(reason), "b-tree build on \"%s\"",
+                 RelationGetRelationName(wstate.index));
+        XLogSkipLogging(reason);
+    }
+
     /* reserve the metapage */
     wstate.btws_pages_alloced = BTREE_METAPAGE + 1;
     wstate.btws_pages_written = 0;
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.358
diff -u -r1.358 xlog.c
--- src/backend/access/transam/xlog.c    15 Jan 2010 09:19:00 -0000    1.358
+++ src/backend/access/transam/xlog.c    15 Jan 2010 11:27:45 -0000
@@ -7562,6 +7562,31 @@
 }

 /*
+ * Write an XLOG UNLOGGED record, indicating that some operation was
+ * performed on data that we fsync()'d directly to disk, skipping
+ * WAL-logging.
+ *
+ * Such operations screw up archive recovery, so we complain if we see
+ * these records during archive recovery. That shouldn't happen in a
+ * correctly configured server, but you can induce it by temporarily
+ * disabling archiving and restarting, so it's good to at least get a
+ * warning of silent data loss in such cases. These records serve no
+ * other purpose and are simply ignored during crash recovery.
+ */
+void
+XLogSkipLogging(char *reason)
+{
+    XLogRecData rdata;
+
+    rdata.buffer = InvalidBuffer;
+    rdata.data = reason;
+    rdata.len = strlen(reason) + 1;
+    rdata.next = NULL;
+
+    XLogInsert(RM_XLOG_ID, XLOG_UNLOGGED, &rdata);
+}
+
+/*
  * XLOG resource manager's routines
  *
  * Definitions of info values are in include/catalog/pg_control.h, though
@@ -7703,6 +7728,19 @@
             LWLockRelease(ControlFileLock);
         }
     }
+    else if (info == XLOG_UNLOGGED)
+    {
+        if (InArchiveRecovery)
+        {
+            /*
+             * Note: We don't print the reason string from the record,
+             * because that gets added as a line using xlog_desc()
+             */
+            ereport(WARNING,
+                    (errmsg("unlogged operation performed, data may be missing"),
+                     errhint("This can happen if you temporarily disable archive_mode without taking a new base
backup.")));
+        }
+    }
 }

 void
@@ -7752,6 +7790,12 @@
         appendStringInfo(buf, "backup end: %X/%X",
                          startpoint.xlogid, startpoint.xrecoff);
     }
+    else if (info == XLOG_UNLOGGED)
+    {
+        char *reason = rec;
+
+        appendStringInfo(buf, "unlogged operation: %s", reason);
+    }
     else
         appendStringInfo(buf, "UNKNOWN");
 }
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.193
diff -u -r1.193 cluster.c
--- src/backend/commands/cluster.c    15 Jan 2010 09:19:01 -0000    1.193
+++ src/backend/commands/cluster.c    15 Jan 2010 11:27:45 -0000
@@ -821,6 +821,18 @@
      */
     use_wal = XLogIsNeeded() && !NewHeap->rd_istemp;

+    /*
+     * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+     * WAL archiving is not enabled.
+     */
+    if (!use_wal && !NewHeap->rd_istemp)
+    {
+        char reason[NAMEDATALEN + 20];
+        snprintf(reason, sizeof(reason), "CLUSTER on \"%s\"",
+                 RelationGetRelationName(NewHeap));
+        XLogSkipLogging(reason);
+    }
+
     /* use_wal off requires rd_targblock be initially invalid */
     Assert(NewHeap->rd_targblock == InvalidBlockNumber);

Index: src/backend/commands/tablecmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v
retrieving revision 1.315
diff -u -r1.315 tablecmds.c
--- src/backend/commands/tablecmds.c    15 Jan 2010 09:19:01 -0000    1.315
+++ src/backend/commands/tablecmds.c    15 Jan 2010 11:27:45 -0000
@@ -7018,6 +7018,19 @@

     heap_close(pg_class, RowExclusiveLock);

+    /*
+     * Write an XLOG UNLOGGED record if WAL-logging was skipped because
+     * WAL archiving is not enabled.
+     */
+    if (!XLogIsNeeded() && !rel->rd_istemp)
+    {
+        char reason[NAMEDATALEN + 40];
+        snprintf(reason, sizeof(reason), "ALTER TABLE SET TABLESPACE on \"%s\"",
+                 RelationGetRelationName(rel));
+
+        XLogSkipLogging(reason);
+    }
+
     relation_close(rel, NoLock);

     /* Make sure the reltablespace change is visible */
@@ -7046,6 +7059,10 @@
     /*
      * We need to log the copied data in WAL iff WAL archiving/streaming is
      * enabled AND it's not a temp rel.
+     *
+     * Note: If you change the conditions here, update the conditions in
+     * ATExecSetTableSpace() for when an XLOG UNLOGGED record is written
+     * to match.
      */
     use_wal = XLogIsNeeded() && !istemp;

Index: src/include/access/xlog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v
retrieving revision 1.96
diff -u -r1.96 xlog.h
--- src/include/access/xlog.h    15 Jan 2010 09:19:06 -0000    1.96
+++ src/include/access/xlog.h    15 Jan 2010 11:27:45 -0000
@@ -278,6 +278,7 @@
 extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
+extern void XLogSkipLogging(char *reason);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
 extern XLogRecPtr GetWriteRecPtr(void);
Index: src/include/catalog/pg_control.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_control.h,v
retrieving revision 1.48
diff -u -r1.48 pg_control.h
--- src/include/catalog/pg_control.h    4 Jan 2010 12:50:50 -0000    1.48
+++ src/include/catalog/pg_control.h    15 Jan 2010 11:27:45 -0000
@@ -63,6 +63,7 @@
 #define XLOG_NEXTOID                    0x30
 #define XLOG_SWITCH                        0x40
 #define XLOG_BACKUP_END                    0x50
+#define XLOG_UNLOGGED                    0x60


 /* System status indicator */

Re: New XLOG record indicating WAL-skipping

From
Greg Stark
Date:
On Fri, Jan 15, 2010 at 11:28 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I can see that it
> was required to avoid the flooding from heap_insert(), but we can move
> the XLogSkipLogging() call from heap_insert() to heap_sync().
>
> Attached is an updated patch, doing the above. Am I missing anything?

Hm, perhaps the timing is actually important? What if someone takes a
hot backup while an unlogged operation is in progress. The checkpoint
can occur and finish and the backup finish all while the unlogged
operation is happening. Then the replica can start restoring archived
logs from that point forward. In the original coding it sounds like
the replica would never notice the unlogged operation which might not
have been synced before the start of the initial hot backup.  If the
record occurs when the sync begins then the replica would be in
trouble if the checkpoint begins before the operation completed but
finished after the sync began and the record was emitted.

It seems like it's important that the record occur only after the sync
*completes* to be sure that if the replica doesn't see the record then
it knows the sync was done before its initial backup image was taken.

-- 
greg


Re: New XLOG record indicating WAL-skipping

From
Heikki Linnakangas
Date:
Greg Stark wrote:
> What if someone takes a hot backup while an unlogged operation is in progress.

Can't do that, pg_start_backup() throws an error if archive_mode=off.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: New XLOG record indicating WAL-skipping

From
Fujii Masao
Date:
On Fri, Jan 15, 2010 at 8:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I don't like special-casing UNLOGGED records in XLogInsert and
> ReadRecord(). Those functions are complicated enough already. The
> special handling from XLogInsert() (and a few other places) is only
> required because the UNLOGGED records carry no payload. That's easy to
> avoid, just add some payload to them, doesn't matter what it is. And I
> don't think ReadRecord() is the right place to emit the errors/warnings,
> that belongs naturally in xlog_redo().
>
> It might be useful to add some information in the records telling why
> WAL-logging was skipped. It might turn out to be useful in debugging.
> That also conveniently adds payload to the records, to avoid the
> special-casing in XLogInsert() :-).
>
> I think it's a premature optimization to skip writing the records if
> we've written in the same session already. Especially with the 'reason'
> information added to the records, it's nice to have a record of each
> such operation. All operations that skip WAL-logging are heavy enough
> that an additional WAL record will make no difference. I can see that it
> was required to avoid the flooding from heap_insert(), but we can move
> the XLogSkipLogging() call from heap_insert() to heap_sync().
>
> Attached is an updated patch, doing the above. Am I missing anything?

Thanks a lot! Your change seems to be OK.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New XLOG record indicating WAL-skipping

From
Fujii Masao
Date:
On Sat, Jan 16, 2010 at 3:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Attached is an updated patch, doing the above. Am I missing anything?
>
> Thanks a lot! Your change seems to be OK.

We'll need to do some more work after the following patch
has been committed.
http://archives.postgresql.org/pgsql-hackers/2010-01/msg01715.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New XLOG record indicating WAL-skipping

From
Simon Riggs
Date:
On Fri, 2010-01-15 at 13:28 +0200, Heikki Linnakangas wrote:
> I think it's a premature optimization to skip writing the records if
> we've written in the same session already. Especially with the
> 'reason'
> information added to the records, it's nice to have a record of each
> such operation. All operations that skip WAL-logging are heavy enough
> that an additional WAL record will make no difference. I can see that
> it
> was required to avoid the flooding from heap_insert(), but we can move
> the XLogSkipLogging() call from heap_insert() to heap_sync().

Can we call that XLogReportUnloggedStatement() or similar?
XlogSkipLogging() sounds like a request rather than a mark/report/record
type of action.

> Attached is an updated patch, doing the above. Am I missing anything?

Sounds OK and works with Hot Standby.

-- Simon Riggs           www.2ndQuadrant.com



Re: New XLOG record indicating WAL-skipping

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Can we call that XLogReportUnloggedStatement() or similar?
> XlogSkipLogging() sounds like a request rather than a mark/report/record
> type of action.

Agreed. I vote for XLogReportUnloggedOperation(). I'll change it to that
before committing, unless Fujii beats me to it.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: New XLOG record indicating WAL-skipping

From
Fujii Masao
Date:
On Mon, Jan 18, 2010 at 9:17 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Agreed. I vote for XLogReportUnloggedOperation().

OK.

> I'll change it to that
> before committing, unless Fujii beats me to it.

Yeah, please go ahead.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center