Thread: Should XLogInsert() be done only inside a critical section?

Should XLogInsert() be done only inside a critical section?

From
Tom Lane
Date:
Over in <17456.1460832307@sss.pgh.pa.us> I speculated about whether
we should be enforcing that WAL insertion happen only inside critical
sections.  We don't currently, and a survey of the backend says that
there are quite a few calls that aren't inside critical sections.
But there are at least two good reasons why we should, IMO:

1. It's not very clear that XLogInsert will recover cleanly if it's
invoked outside a critical section and hits a failure.  Certainly,
if we allow such usage, then every potential error inside that code
has to be analyzed under both critical-section and normal rules.

2. With no such check, it's quite easy for calling code to forget to
create a critical section around code stanzas where one is *necessary*
(because you're changing shared-buffer contents).

Both of these points represent pretty clear hazards for introduction
of future bugs, whether or not there are any such bugs today.

As against this, it could be argued that adding critical sections where
they're not absolutely necessary must make crashing failures more probable
than they need to be.  But first you'd have to prove that they're not
absolutely necessary, which I'm unsure about because of point #1.
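
To make that trade-off concrete, here is a minimal standalone sketch of how a critical section changes error handling. It is a toy model, not backend code: ToyLevel, toy_crit_count, and the toy_* functions are hypothetical stand-ins for the backend's elevel handling, CritSectionCount, and START/END_CRIT_SECTION.

```c
#include <assert.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef enum { TOY_ERROR, TOY_PANIC } ToyLevel;

/* Nesting depth of critical sections; 0 means "not inside one". */
static unsigned int toy_crit_count = 0;

static void
toy_start_crit_section(void)
{
    toy_crit_count++;
}

static void
toy_end_crit_section(void)
{
    /* Ending a section that was never started is a caller bug. */
    assert(toy_crit_count > 0);
    toy_crit_count--;
}

/*
 * Inside a critical section an ERROR cannot be cleaned up by ordinary
 * transaction abort, so it is escalated to PANIC.  Outside one, the
 * requested level is used unchanged.
 */
static ToyLevel
toy_effective_level(ToyLevel requested)
{
    if (requested == TOY_ERROR && toy_crit_count > 0)
        return TOY_PANIC;
    return requested;
}
```

In this model, wrapping a call in a critical section turns a failure that could otherwise have been rolled back into a crash, which is the cost being weighed here.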

Anyway, I went through our tree and added START/END_CRIT_SECTION calls
around all XLogInsert calls that could currently be reached without one;
see attached.  Since this potentially breaks third-party code I would
not propose back-patching it, but I think it's reasonable to propose
applying it to HEAD.

Thoughts?

            regards, tom lane

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 89bad05..8e41802 100644
*** a/src/backend/access/brin/brin.c
--- b/src/backend/access/brin/brin.c
*************** brinbuild(Relation heap, Relation index,
*** 610,624 ****
          elog(ERROR, "index \"%s\" already contains data",
               RelationGetRelationName(index));

-     /*
-      * Critical section not required, because on error the creation of the
-      * whole relation will be rolled back.
-      */
-
      meta = ReadBuffer(index, P_NEW);
      Assert(BufferGetBlockNumber(meta) == BRIN_METAPAGE_BLKNO);
      LockBuffer(meta, BUFFER_LOCK_EXCLUSIVE);

      brin_metapage_init(BufferGetPage(meta), BrinGetPagesPerRange(index),
                         BRIN_CURRENT_VERSION);
      MarkBufferDirty(meta);
--- 610,621 ----
          elog(ERROR, "index \"%s\" already contains data",
               RelationGetRelationName(index));

      meta = ReadBuffer(index, P_NEW);
      Assert(BufferGetBlockNumber(meta) == BRIN_METAPAGE_BLKNO);
      LockBuffer(meta, BUFFER_LOCK_EXCLUSIVE);

+     START_CRIT_SECTION();
+
      brin_metapage_init(BufferGetPage(meta), BrinGetPagesPerRange(index),
                         BRIN_CURRENT_VERSION);
      MarkBufferDirty(meta);
*************** brinbuild(Relation heap, Relation index,
*** 644,649 ****
--- 641,648 ----

      UnlockReleaseBuffer(meta);

+     END_CRIT_SECTION();
+
      /*
       * Initialize our state, including the deformed tuple state.
       */
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 2ddf568..7fc47ec 100644
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
*************** ginHeapTupleFastInsert(GinState *ginstat
*** 277,282 ****
--- 277,284 ----
          memset(&sublist, 0, sizeof(GinMetaPageData));
          makeSublist(index, collector->tuples, collector->ntuples, &sublist);

+         START_CRIT_SECTION();
+
          if (needWal)
              XLogBeginInsert();

*************** ginHeapTupleFastInsert(GinState *ginstat
*** 291,298 ****
              /*
               * Main list is empty, so just insert sublist as main list
               */
-             START_CRIT_SECTION();
-
              metadata->head = sublist.head;
              metadata->tail = sublist.tail;
              metadata->tailFreeSize = sublist.tailFreeSize;
--- 293,298 ----
*************** ginHeapTupleFastInsert(GinState *ginstat
*** 314,321 ****

              Assert(GinPageGetOpaque(page)->rightlink == InvalidBlockNumber);

-             START_CRIT_SECTION();
-
              GinPageGetOpaque(page)->rightlink = sublist.head;

              MarkBufferDirty(buffer);
--- 314,319 ----
*************** ginHeapTupleFastInsert(GinState *ginstat
*** 353,363 ****

          data.ntuples = collector->ntuples;

          if (needWal)
              XLogBeginInsert();

-         START_CRIT_SECTION();
-
          /*
           * Increase counter of heap tuples
           */
--- 351,361 ----

          data.ntuples = collector->ntuples;

+         START_CRIT_SECTION();
+
          if (needWal)
              XLogBeginInsert();

          /*
           * Increase counter of heap tuples
           */
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 950bfc8..6010164 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** log_heap_cleanup_info(RelFileNode rnode,
*** 7151,7161 ****
--- 7151,7165 ----
      xlrec.node = rnode;
      xlrec.latestRemovedXid = latestRemovedXid;

+     START_CRIT_SECTION();
+
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec, SizeOfHeapCleanupInfo);

      recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_CLEANUP_INFO);

+     END_CRIT_SECTION();
+
      return recptr;
  }

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f9ce986..f0f89cd 100644
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
*************** logical_heap_rewrite_flush_mappings(Rewr
*** 926,931 ****
--- 926,933 ----
                              written, len)));
          src->off += len;

+         START_CRIT_SECTION();
+
          XLogBeginInsert();
          XLogRegisterData((char *) (&xlrec), sizeof(xlrec));
          XLogRegisterData(waldata_start, len);
*************** logical_heap_rewrite_flush_mappings(Rewr
*** 933,938 ****
--- 935,942 ----
          /* write xlog record */
          XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_REWRITE);

+         END_CRIT_SECTION();
+
          pfree(waldata_start);
      }
      Assert(state->rs_num_rewrite_mappings == 0);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 390bd1a..d2f4fca 100644
*** a/src/backend/access/nbtree/nbtpage.c
--- b/src/backend/access/nbtree/nbtpage.c
*************** _bt_log_reuse_page(Relation rel, BlockNu
*** 548,557 ****
--- 548,561 ----
      xlrec_reuse.block = blkno;
      xlrec_reuse.latestRemovedXid = latestRemovedXid;

+     START_CRIT_SECTION();
+
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec_reuse, SizeOfBtreeReusePage);

      XLogInsert(RM_BTREE_ID, XLOG_BTREE_REUSE_PAGE);
+
+     END_CRIT_SECTION();
  }

  /*
diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 07efebc..d49e753 100644
*** a/src/backend/access/transam/README
--- b/src/backend/access/transam/README
*************** Details of the API functions:
*** 523,528 ****
--- 523,529 ----
  void XLogBeginInsert(void)

      Must be called before XLogRegisterBuffer and XLogRegisterData.
+     You must already be within a critical section when you call this.

  void XLogResetInsertion(void)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..a4a8dde 100644
*** a/src/backend/access/transam/clog.c
--- b/src/backend/access/transam/clog.c
*************** CLOGPagePrecedes(int page1, int page2)
*** 692,700 ****
--- 692,702 ----
  static void
  WriteZeroPageXlogRec(int pageno)
  {
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&pageno), sizeof(int));
      (void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE);
+     END_CRIT_SECTION();
  }

  /*
*************** WriteTruncateXlogRec(int pageno)
*** 708,716 ****
--- 710,720 ----
  {
      XLogRecPtr    recptr;

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&pageno), sizeof(int));
      recptr = XLogInsert(RM_CLOG_ID, CLOG_TRUNCATE);
+     END_CRIT_SECTION();
      XLogFlush(recptr);
  }

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 1713439..76b550b 100644
*** a/src/backend/access/transam/commit_ts.c
--- b/src/backend/access/transam/commit_ts.c
*************** CommitTsPagePrecedes(int page1, int page
*** 891,899 ****
--- 891,901 ----
  static void
  WriteZeroPageXlogRec(int pageno)
  {
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&pageno), sizeof(int));
      (void) XLogInsert(RM_COMMIT_TS_ID, COMMIT_TS_ZEROPAGE);
+     END_CRIT_SECTION();
  }

  /*
*************** WriteZeroPageXlogRec(int pageno)
*** 902,910 ****
--- 904,914 ----
  static void
  WriteTruncateXlogRec(int pageno)
  {
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&pageno), sizeof(int));
      (void) XLogInsert(RM_COMMIT_TS_ID, COMMIT_TS_TRUNCATE);
+     END_CRIT_SECTION();
  }

  /*
*************** WriteSetTimestampXlogRec(TransactionId m
*** 921,932 ****
--- 925,938 ----
      record.nodeid = nodeid;
      record.mainxid = mainxid;

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) &record,
                       offsetof(xl_commit_ts_set, mainxid) +
                       sizeof(TransactionId));
      XLogRegisterData((char *) subxids, nsubxids * sizeof(TransactionId));
      XLogInsert(RM_COMMIT_TS_ID, COMMIT_TS_SETTS);
+     END_CRIT_SECTION();
  }

  /*
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 6e213e2..875cac6 100644
*** a/src/backend/access/transam/generic_xlog.c
--- b/src/backend/access/transam/generic_xlog.c
*************** GenericXLogFinish(GenericXLogState *stat
*** 330,339 ****
      if (state->isLogged)
      {
          /* Logged relation: make xlog record in critical section. */
-         XLogBeginInsert();
-
          START_CRIT_SECTION();

          for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
          {
              PageData   *pageData = &state->pages[i];
--- 330,339 ----
      if (state->isLogged)
      {
          /* Logged relation: make xlog record in critical section. */
          START_CRIT_SECTION();

+         XLogBeginInsert();
+
          for (i = 0; i < MAX_GENERIC_XLOG_PAGES; i++)
          {
              PageData   *pageData = &state->pages[i];
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a677af0..3b3fae2 100644
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
*************** MultiXactOffsetPrecedes(MultiXactOffset
*** 3175,3183 ****
--- 3175,3185 ----
  static void
  WriteMZeroPageXlogRec(int pageno, uint8 info)
  {
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&pageno), sizeof(int));
      (void) XLogInsert(RM_MULTIXACT_ID, info);
+     END_CRIT_SECTION();
  }

  /*
*************** WriteMTruncateXlogRec(Oid oldestMultiDB,
*** 3202,3210 ****
--- 3204,3214 ----
      xlrec.startTruncMemb = startTruncMemb;
      xlrec.endTruncMemb = endTruncMemb;

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&xlrec), SizeOfMultiXactTruncate);
      recptr = XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_TRUNCATE_ID);
+     END_CRIT_SECTION();
      XLogFlush(recptr);
  }

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7e37331..cd4700d 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** AssignTransactionId(TransactionState s)
*** 625,636 ****
--- 625,638 ----
              Assert(TransactionIdIsValid(xlrec.xtop));
              xlrec.nsubxacts = nUnreportedXids;

+             START_CRIT_SECTION();
              XLogBeginInsert();
              XLogRegisterData((char *) &xlrec, MinSizeOfXactAssignment);
              XLogRegisterData((char *) unreportedXids,
                               nUnreportedXids * sizeof(TransactionId));

              (void) XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT);
+             END_CRIT_SECTION();

              nUnreportedXids = 0;
              /* mark top, not current xact as having been logged */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9644db..05e3248 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** KeepLogSeg(XLogRecPtr recptr, XLogSegNo
*** 8970,8978 ****
--- 8970,8980 ----
  void
  XLogPutNextOid(Oid nextOid)
  {
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&nextOid), sizeof(Oid));
      (void) XLogInsert(RM_XLOG_ID, XLOG_NEXTOID);
+     END_CRIT_SECTION();

      /*
       * We need not flush the NEXTOID record immediately, because any of the
*************** RequestXLogSwitch(void)
*** 9010,9017 ****
--- 9012,9021 ----
      XLogRecPtr    RecPtr;

      /* XLOG SWITCH has no data */
+     START_CRIT_SECTION();
      XLogBeginInsert();
      RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
+     END_CRIT_SECTION();

      return RecPtr;
  }
*************** XLogRestorePoint(const char *rpName)
*** 9028,9037 ****
--- 9032,9043 ----
      xlrec.rp_time = GetCurrentTimestamp();
      strlcpy(xlrec.rp_name, rpName, MAXFNAMELEN);

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec, sizeof(xl_restore_point));

      RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT);
+     END_CRIT_SECTION();

      ereport(LOG,
              (errmsg("restore point \"%s\" created at %X/%X",
*************** XLogReportParameters(void)
*** 9075,9084 ****
--- 9081,9093 ----
              xlrec.wal_log_hints = wal_log_hints;
              xlrec.track_commit_timestamp = track_commit_timestamp;

+             START_CRIT_SECTION();
              XLogBeginInsert();
              XLogRegisterData((char *) &xlrec, sizeof(xlrec));

              recptr = XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE);
+             END_CRIT_SECTION();
+
              XLogFlush(recptr);
          }

*************** do_pg_stop_backup(char *labelfile, bool
*** 10456,10464 ****
--- 10465,10476 ----
      /*
       * Write the backup-end xlog record
       */
+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) (&startpoint), sizeof(startpoint));
      stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);
+     END_CRIT_SECTION();
+
      stoptli = ThisTimeLineID;

      /*
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c37003a..1a992df 100644
*** a/src/backend/access/transam/xloginsert.c
--- b/src/backend/access/transam/xloginsert.c
*************** static bool XLogCompressBackupBlock(char
*** 119,124 ****
--- 119,127 ----
  void
  XLogBeginInsert(void)
  {
+     /* must already be in a critical section */
+     Assert(CritSectionCount > 0);
+
      Assert(max_registered_block_id == 0);
      Assert(mainrdata_last == (XLogRecData *) &mainrdata_head);
      Assert(mainrdata_len == 0);
*************** XLogSaveBufferForHint(Buffer buffer, boo
*** 906,911 ****
--- 909,916 ----
          else
              memcpy(copied_buffer, origdata, BLCKSZ);

+         START_CRIT_SECTION();
+
          XLogBeginInsert();

          flags = REGBUF_FORCE_IMAGE;
*************** XLogSaveBufferForHint(Buffer buffer, boo
*** 916,921 ****
--- 921,928 ----
          XLogRegisterBlock(0, &rnode, forkno, blkno, copied_buffer, flags);

          recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI_FOR_HINT);
+
+         END_CRIT_SECTION();
      }

      return recptr;
*************** log_newpage(RelFileNode *rnode, ForkNumb
*** 940,945 ****
--- 947,954 ----
      int            flags;
      XLogRecPtr    recptr;

+     START_CRIT_SECTION();
+
      flags = REGBUF_FORCE_IMAGE;
      if (page_std)
          flags |= REGBUF_STANDARD;
*************** log_newpage(RelFileNode *rnode, ForkNumb
*** 957,962 ****
--- 966,973 ----
          PageSetLSN(page, recptr);
      }

+     END_CRIT_SECTION();
+
      return recptr;
  }

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index fe68c99..e71dba0 100644
*** a/src/backend/catalog/storage.c
--- b/src/backend/catalog/storage.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/catalog.h"
  #include "catalog/storage.h"
  #include "catalog/storage_xlog.h"
+ #include "miscadmin.h"
  #include "storage/freespace.h"
  #include "storage/smgr.h"
  #include "utils/memutils.h"
*************** log_smgrcreate(RelFileNode *rnode, ForkN
*** 132,140 ****
--- 133,143 ----
      xlrec.rnode = *rnode;
      xlrec.forkNum = forkNum;

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec, sizeof(xlrec));
      XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE | XLR_SPECIAL_REL_UPDATE);
+     END_CRIT_SECTION();
  }

  /*
*************** RelationTruncate(Relation rel, BlockNumb
*** 269,279 ****
--- 272,284 ----
          xlrec.blkno = nblocks;
          xlrec.rnode = rel->rd_node;

+         START_CRIT_SECTION();
          XLogBeginInsert();
          XLogRegisterData((char *) &xlrec, sizeof(xlrec));

          lsn = XLogInsert(RM_SMGR_ID,
                           XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+         END_CRIT_SECTION();

          /*
           * Flush, because otherwise the truncation of the main relation might
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c1c0223..564c3cd 100644
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
*************** createdb(const CreatedbStmt *stmt)
*** 626,636 ****
--- 626,638 ----
                  xlrec.src_db_id = src_dboid;
                  xlrec.src_tablespace_id = srctablespace;

+                 START_CRIT_SECTION();
                  XLogBeginInsert();
                  XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_create_rec));

                  (void) XLogInsert(RM_DBASE_ID,
                                    XLOG_DBASE_CREATE | XLR_SPECIAL_REL_UPDATE);
+                 END_CRIT_SECTION();
              }
          }
          heap_endscan(scan);
*************** movedb(const char *dbname, const char *t
*** 1234,1244 ****
--- 1236,1248 ----
              xlrec.src_db_id = db_id;
              xlrec.src_tablespace_id = src_tblspcoid;

+             START_CRIT_SECTION();
              XLogBeginInsert();
              XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_create_rec));

              (void) XLogInsert(RM_DBASE_ID,
                                XLOG_DBASE_CREATE | XLR_SPECIAL_REL_UPDATE);
+             END_CRIT_SECTION();
          }

          /*
*************** movedb(const char *dbname, const char *t
*** 1334,1344 ****
--- 1338,1350 ----
          xlrec.db_id = db_id;
          xlrec.tablespace_id = src_tblspcoid;

+         START_CRIT_SECTION();
          XLogBeginInsert();
          XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));

          (void) XLogInsert(RM_DBASE_ID,
                            XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
+         END_CRIT_SECTION();
      }

      /* Now it's safe to release the database lock */
*************** remove_dbtablespaces(Oid db_id)
*** 1875,1885 ****
--- 1881,1893 ----
              xlrec.db_id = db_id;
              xlrec.tablespace_id = dsttablespace;

+             START_CRIT_SECTION();
              XLogBeginInsert();
              XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));

              (void) XLogInsert(RM_DBASE_ID,
                                XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
+             END_CRIT_SECTION();
          }

          pfree(dstpath);
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index fe7f253..c4baa1a 100644
*** a/src/backend/commands/tablespace.c
--- b/src/backend/commands/tablespace.c
*************** CreateTableSpace(CreateTableSpaceStmt *s
*** 367,378 ****
--- 367,380 ----

          xlrec.ts_id = tablespaceoid;

+         START_CRIT_SECTION();
          XLogBeginInsert();
          XLogRegisterData((char *) &xlrec,
                           offsetof(xl_tblspc_create_rec, ts_path));
          XLogRegisterData((char *) location, strlen(location) + 1);

          (void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_CREATE);
+         END_CRIT_SECTION();
      }

      /*
*************** DropTableSpace(DropTableSpaceStmt *stmt)
*** 524,533 ****
--- 526,537 ----

          xlrec.ts_id = tablespaceoid;

+         START_CRIT_SECTION();
          XLogBeginInsert();
          XLogRegisterData((char *) &xlrec, sizeof(xl_tblspc_drop_rec));

          (void) XLogInsert(RM_TBLSPC_ID, XLOG_TBLSPC_DROP);
+         END_CRIT_SECTION();
      }

      /*
diff --git a/src/backend/replication/logical/message.c b/src/backend/replication/logical/message.c
index efcc25a..c144993 100644
*** a/src/backend/replication/logical/message.c
--- b/src/backend/replication/logical/message.c
*************** XLogRecPtr
*** 51,56 ****
--- 51,57 ----
  LogLogicalMessage(const char *prefix, const char *message, size_t size,
                    bool transactional)
  {
+     XLogRecPtr    recptr;
      xl_logical_message    xlrec;

      /*
*************** LogLogicalMessage(const char *prefix, co
*** 67,72 ****
--- 68,75 ----
      xlrec.prefix_size = strlen(prefix) + 1;
      xlrec.message_size = size;

+     START_CRIT_SECTION();
+
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec, SizeOfLogicalMessage);
      XLogRegisterData((char *) prefix, xlrec.prefix_size);
*************** LogLogicalMessage(const char *prefix, co
*** 75,81 ****
      /* allow origin filtering */
      XLogIncludeOrigin();

!     return XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
  }

  /*
--- 78,88 ----
      /* allow origin filtering */
      XLogIncludeOrigin();

!     recptr = XLogInsert(RM_LOGICALMSG_ID, XLOG_LOGICAL_MESSAGE);
!
!     END_CRIT_SECTION();
!
!     return recptr;
  }

  /*
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 9aeb2d8..342d525 100644
*** a/src/backend/replication/logical/origin.c
--- b/src/backend/replication/logical/origin.c
*************** replorigin_drop(RepOriginId roident)
*** 360,368 ****
--- 360,371 ----
                  xl_replorigin_drop xlrec;

                  xlrec.node_id = roident;
+
+                 START_CRIT_SECTION();
                  XLogBeginInsert();
                  XLogRegisterData((char *) (&xlrec), sizeof(xlrec));
                  XLogInsert(RM_REPLORIGIN_ID, XLOG_REPLORIGIN_DROP);
+                 END_CRIT_SECTION();
              }

              /* then reset the in-memory entry */
*************** replorigin_advance(RepOriginId node,
*** 891,900 ****
--- 894,905 ----
          xlrec.node_id = node;
          xlrec.force = go_backward;

+         START_CRIT_SECTION();
          XLogBeginInsert();
          XLogRegisterData((char *) (&xlrec), sizeof(xlrec));

          XLogInsert(RM_REPLORIGIN_ID, XLOG_REPLORIGIN_SET);
+         END_CRIT_SECTION();
      }

      /*
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6a9bf84..8414a64 100644
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
*************** LogCurrentRunningXacts(RunningTransactio
*** 967,972 ****
--- 967,974 ----
      xlrec.oldestRunningXid = CurrRunningXacts->oldestRunningXid;
      xlrec.latestCompletedXid = CurrRunningXacts->latestCompletedXid;

+     START_CRIT_SECTION();
+
      /* Header */
      XLogBeginInsert();
      XLogRegisterData((char *) (&xlrec), MinSizeOfXactRunningXacts);
*************** LogCurrentRunningXacts(RunningTransactio
*** 978,983 ****
--- 980,987 ----

      recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);

+     END_CRIT_SECTION();
+
      if (CurrRunningXacts->subxid_overflow)
          elog(trace_recovery(DEBUG2),
               "snapshot of %u running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
*************** LogAccessExclusiveLocks(int nlocks, xl_s
*** 1020,1030 ****
--- 1024,1036 ----

      xlrec.nlocks = nlocks;

+     START_CRIT_SECTION();
      XLogBeginInsert();
      XLogRegisterData((char *) &xlrec, offsetof(xl_standby_locks, locks));
      XLogRegisterData((char *) locks, nlocks * sizeof(xl_standby_lock));

      (void) XLogInsert(RM_STANDBY_ID, XLOG_STANDBY_LOCK);
+     END_CRIT_SECTION();
  }

  /*

Re: Should XLogInsert() be done only inside a critical section?

From
Michael Paquier
Date:
On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, I went through our tree and added START/END_CRIT_SECTION calls
> around all XLogInsert calls that could currently be reached without one;
> see attached.  Since this potentially breaks third-party code I would
> not propose back-patching it, but I think it's reasonable to propose
> applying it to HEAD.

+1 for sanitizing those code paths this way. This patch looks sane to
me after having a look with some testing.

--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
IndexInfo *indexInfo)
        elog(ERROR, "index \"%s\" already contains data",
             RelationGetRelationName(index));

-   /*
-    * Critical section not required, because on error the creation of the
-    * whole relation will be rolled back.
-    */
Perhaps Alvaro has an opinion to offer regarding this bit removed in brin.c?
-- 
Michael



Re: Should XLogInsert() be done only inside a critical section?

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Anyway, I went through our tree and added START/END_CRIT_SECTION calls
> > around all XLogInsert calls that could currently be reached without one;
> > see attached.  Since this potentially breaks third-party code I would
> > not propose back-patching it, but I think it's reasonable to propose
> > applying it to HEAD.
> 
> +1 for sanitizing those code paths this way. This patch looks sane to
> me after having a look with some testing.
> 
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
> IndexInfo *indexInfo)
>         elog(ERROR, "index \"%s\" already contains data",
>              RelationGetRelationName(index));
> 
> -   /*
> -    * Critical section not required, because on error the creation of the
> -    * whole relation will be rolled back.
> -    */
> Perhaps Alvaro has an opinion to offer regarding this bit removed in brin.c?

I vaguely recall copying this comment from elsewhere, but I didn't see
any other such comment being removed by the patch; I probably copied
something else which got slowly mutated into what's there today during
development.

If we're adding the critical section then the comment should
certainly be removed too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should XLogInsert() be done only inside a critical section?

From
Michael Paquier
Date:
On Fri, Apr 22, 2016 at 5:18 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Anyway, I went through our tree and added START/END_CRIT_SECTION calls
>> > around all XLogInsert calls that could currently be reached without one;
>> > see attached.  Since this potentially breaks third-party code I would
>> > not propose back-patching it, but I think it's reasonable to propose
>> > applying it to HEAD.
>>
>> +1 for sanitizing those code paths this way. This patch looks sane to
>> me after having a look with some testing.
>>
>> --- a/src/backend/access/brin/brin.c
>> +++ b/src/backend/access/brin/brin.c
>> @@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
>> IndexInfo *indexInfo)
>>         elog(ERROR, "index \"%s\" already contains data",
>>              RelationGetRelationName(index));
>>
>> -   /*
>> -    * Critical section not required, because on error the creation of the
>> -    * whole relation will be rolled back.
>> -    */
>> Perhaps Alvaro has an opinion to offer regarding this bit removed in brin.c?
>
> I vaguely recall copying this comment from elsewhere, but I didn't see
> any other such comment being removed by the patch; I probably copied
> something else which got slowly mutated into what's there today during
> development.
>
> If we're adding the critical section then the comment should
> certainly be removed too.

A scan of the code shows 88 places containing a comment that refers to
a critical section, and actually slightly more, because the words
"critical" and "section" are sometimes split across two lines. With
Tom's patch applied, I have found two inconsistencies.

In RecordTransactionAbort@xact.c, there is the following comment that
would need a refresh because XactLogAbortRecord logs a record:
    /* XXX do we really need a critical section here? */
    START_CRIT_SECTION();

The comment of RelationTruncate@storage.c referring to the use of
critical sections should be updated.

Looking at the access code while going through the patch, perhaps it
would be good to add assertions checking for a critical section in
some code paths of gin. For example, dataExecPlaceToPage and
entryExecPlaceToPage should be invoked in a critical section per their
comments. heap_page_prune_execute, spgPageIndexMultiDelete and
spgFormDeadTuple could be candidates for such changes as well. Surely
that's a different, HEAD-only patch though.
-- 
Michael



Re: Should XLogInsert() be done only inside a critical section?

From
Heikki Linnakangas
Date:
On 20/04/16 23:44, Tom Lane wrote:
> Over in <17456.1460832307@sss.pgh.pa.us> I speculated about whether
> we should be enforcing that WAL insertion happen only inside critical
> sections.  We don't currently, and a survey of the backend says that
> there are quite a few calls that aren't inside critical sections.
> But there are at least two good reasons why we should, IMO:
>
> 1. It's not very clear that XLogInsert will recover cleanly if it's
> invoked outside a critical section and hits a failure.  Certainly,
> if we allow such usage, then every potential error inside that code
> has to be analyzed under both critical-section and normal rules.

It was certainly designed to recover from errors gracefully.
XLogInsertRecord(), which does the low-level work of inserting the
record into the WAL buffer, has a critical section of its own inside it.
The code in xloginsert.c, for constructing the record to insert,
operates on backend-private buffers only.
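
As a rough illustration of why an outer critical section and the one inside the low-level insert can coexist, the following standalone sketch models critical sections as a nesting counter. It is a toy under stated assumptions: toy_crit_count, the TOY_* macros, and both functions are hypothetical and only mimic the shape of CritSectionCount and START/END_CRIT_SECTION.

```c
#include <assert.h>

/* Hypothetical stand-in for the backend's CritSectionCount. */
static unsigned int toy_crit_count = 0;

#define TOY_START_CRIT_SECTION()  (toy_crit_count++)
#define TOY_END_CRIT_SECTION() \
    do { \
        assert(toy_crit_count > 0); \
        toy_crit_count--; \
    } while (0)

/*
 * Models the low-level insert step, which protects itself with its own
 * critical section.  Returns the nesting depth seen while "copying the
 * record", purely so that callers can observe it.
 */
static unsigned int
toy_insert_record(void)
{
    unsigned int depth_seen;

    TOY_START_CRIT_SECTION();
    depth_seen = toy_crit_count;
    TOY_END_CRIT_SECTION();

    return depth_seen;
}

/* Models a caller that wraps the whole operation in a critical section. */
static unsigned int
toy_insert_from_wrapped_caller(void)
{
    unsigned int depth_seen;

    TOY_START_CRIT_SECTION();
    depth_seen = toy_insert_record();
    TOY_END_CRIT_SECTION();

    return depth_seen;
}
```

Called directly, the toy insert runs at depth 1. Called from a wrapped caller it runs at depth 2, and the counter returns to 0 in both cases, so in this model the outer section only widens the window in which an error would be escalated.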

> 2. With no such check, it's quite easy for calling code to forget to
> create a critical section around code stanzas where one is *necessary*
> (because you're changing shared-buffer contents).

Yeah, that is very true.

> Both of these points represent pretty clear hazards for introduction
> of future bugs, whether or not there are any such bugs today.
>
> As against this, it could be argued that adding critical sections where
> they're not absolutely necessary must make crashing failures more probable
> than they need to be.  But first you'd have to prove that they're not
> absolutely necessary, which I'm unsure about because of point #1.

One option would be to put the must-be-in-critical-section check in 
XLogRegisterBlock(). A WAL record that is associated with a shared 
buffer almost certainly needs a critical section, but many of the others 
are safe without it.
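
A minimal sketch of that alternative, as a toy model rather than a patch: the check sits in the block-registration step, so records that carry only plain data can still be built outside a critical section. All toy_* names are hypothetical, and the functions return a bool where real code would presumably use an Assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for backend state. */
static unsigned int toy_crit_count = 0;      /* critical-section depth */
static bool         toy_insert_started = false;

/* Starting a record carries no critical-section requirement here. */
static void
toy_begin_insert(void)
{
    toy_insert_started = true;
}

/* Plain data lives in backend-private memory: always acceptable. */
static bool
toy_register_data(void)
{
    assert(toy_insert_started);
    return true;
}

/*
 * A block reference implies a page is being changed, so this is where
 * the model demands a critical section.  Returns false in the situation
 * where an assertion would fire.
 */
static bool
toy_register_block(void)
{
    assert(toy_insert_started);
    return toy_crit_count > 0;
}
```

Compared with checking at the start of every insertion, this placement would leave callers that log only plain data untouched, at the price of not catching a caller that modifies a page but forgets to register it.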

- Heikki