Re: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Disable WAL logging to speed up data loading
Date
Msg-id 20201002.100621.1668918756520136893.horikyota.ntt@gmail.com
Whole thread Raw
In response to RE: Disable WAL logging to speed up data loading  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Responses Re: Disable WAL logging to speed up data loading
List pgsql-hackers
At Thu, 1 Oct 2020 08:14:42 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> Hi, Horiguchi-San and Fujii-San.
> 
> 
> Thank you so much both of you.
> > > the table needs to be rewriitten. One idea for that is to improve that
> > > command so that it skips the table rewrite if wal_level=minimal.
> > > Of course, also you can change wal_level after marking the table as
> > > unlogged.
> > 
> > tablecmd.c:
> The idea is really interesting.
> I didn't come up with getting rid of the whole copy of
> the ALTER TABLE UNLOGGED/LOGGED commands
> only when wal_level='minimal'.
> 
> > > * There are two reasons for requiring a rewrite when changing
> > > * persistence: on one hand, we need to ensure that the buffers
> > > * belonging to each of the two relations are marked with or without
> > > * BM_PERMANENT properly.  On the other hand, since rewriting creates
> > > * and assigns a new relfilenode, we automatically create or drop an
> > > * init fork for the relation as appropriate.
> Thanks for sharing concrete comments in the source code.
> 
> > According to this comment, perhaps we can do that at least for
> > wal_level=minimal.
> When I compare the 2 ideas,
> one of the benefits of this ALTER TABLE 's improvement 
> is that we can't avoid the downtime
> while that of wal_level='none' provides an easy and faster
> major version up via output file of pg_dumpall.

The speedup has already been achieved with higher durability by
wal_level=minimal in that case.  Or maybe you should consider using
pg_upgrade instead.  Even inducing the time to take a backup copy of
the whole cluster, running pg_upgrade would be far faster than
pg_dumpall then loading.

> Both ideas have good points.
> However, actually to modify ALTER TABLE's copy
> looks far more difficult than wal_level='none' and
> beyond my current ability.
> So, I'd like to go forward with the direction of wal_level='none'.
> Did you have strong objections for this direction ?

For fuel(?) of the discussion, I tried a very-quick PoC for in-place
ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
trials of several ways, I drifted to the following way after poking
several ways.

1. Flip BM_PERMANENT of active buffers
2. adding/removing init fork
3. sync files,
4. Flip pg_class.relpersistence.

It always skips table copy in the SET UNLOGGED case, and only when
wal_level=minimal in the SET LOGGED case.  Crash recovery seems
working by some brief testing by hand.

Of course, I haven't performed intensive test on it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dcaea7135f..d8a1c2a7e7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -613,6 +613,26 @@ heapam_relation_set_new_filenode(Relation rel,
     smgrclose(srel);
 }
 
+static void
+heapam_relation_set_persistence(Relation rel, char persistence)
+{
+    Assert(rel->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT ||
+           rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
+
+    Assert (rel->rd_rel->relpersistence != persistence);
+
+    if (persistence == RELPERSISTENCE_UNLOGGED)
+    {
+        Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+               rel->rd_rel->relkind == RELKIND_MATVIEW ||
+               rel->rd_rel->relkind == RELKIND_TOASTVALUE);
+        RelationCreateInitFork(rel->rd_node);
+    }
+    else
+        RelationDropInitFork(rel->rd_node);
+}
+
+
 static void
 heapam_relation_nontransactional_truncate(Relation rel)
 {
@@ -2540,6 +2560,7 @@ static const TableAmRoutine heapam_methods = {
     .compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,
 
     .relation_set_new_filenode = heapam_relation_set_new_filenode,
+    .relation_set_persistence = heapam_relation_set_persistence,
     .relation_nontransactional_truncate = heapam_relation_nontransactional_truncate,
     .relation_copy_data = heapam_relation_copy_data,
     .relation_copy_for_cluster = heapam_relation_copy_for_cluster,
diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index a7c0cb1bc3..8397002613 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,14 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
                          xlrec->blkno, xlrec->flags);
         pfree(path);
     }
+    else if (info == XLOG_SMGR_UNLINK)
+    {
+        xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+        char       *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+        appendStringInfoString(buf, path);
+        pfree(path);
+    }
 }
 
 const char *
@@ -55,6 +63,9 @@ smgr_identify(uint8 info)
         case XLOG_SMGR_TRUNCATE:
             id = "TRUNCATE";
             break;
+        case XLOG_SMGR_UNLINK:
+            id = "UNLINK";
+            break;
     }
 
     return id;
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index dbbd3aa31f..7f695e6d8b 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -60,6 +60,8 @@ int            wal_skip_threshold = 2048;    /* in kilobytes */
 typedef struct PendingRelDelete
 {
     RelFileNode relnode;        /* relation that may need to be deleted */
+    bool        deleteinitfork;    /* delete only init fork if true */
+    bool        createinitfork;    /* create init fork if true */
     BackendId    backend;        /* InvalidBackendId if not a temp rel */
     bool        atCommit;        /* T=delete at commit; F=delete at abort */
     int            nestLevel;        /* xact nesting level of request */
@@ -153,6 +155,8 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
     pending = (PendingRelDelete *)
         MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
     pending->relnode = rnode;
+    pending->deleteinitfork = false;
+    pending->createinitfork = false;
     pending->backend = backend;
     pending->atCommit = false;    /* delete if abort */
     pending->nestLevel = GetCurrentTransactionNestLevel();
@@ -168,6 +172,53 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
     return srel;
 }
 
+void
+RelationCreateInitFork(RelFileNode rnode)
+{
+    PendingRelDelete *pending;
+    SMgrRelation srel;
+
+    srel = smgropen(rnode, InvalidBackendId);
+    smgrcreate(srel, INIT_FORKNUM, false);
+    log_smgrcreate(&rnode, INIT_FORKNUM);
+    smgrimmedsync(srel, INIT_FORKNUM);
+
+    /* Add the relation to the list of stuff to delete at abort */
+    pending = (PendingRelDelete *)
+        MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
+    pending->relnode = rnode;
+    pending->deleteinitfork = true;
+    pending->createinitfork = false;
+    pending->backend = InvalidBackendId;
+    pending->atCommit = false;    /* delete if abort */
+    pending->nestLevel = GetCurrentTransactionNestLevel();
+    pending->next = pendingDeletes;
+    pendingDeletes = pending;
+}
+
+void
+RelationDropInitFork(RelFileNode rnode)
+{
+    PendingRelDelete *pending;
+    SMgrRelation srel;
+
+    srel = smgropen(rnode, InvalidBackendId);
+    smgrunlink(srel, INIT_FORKNUM, false);
+    log_smgrunlink(&rnode, INIT_FORKNUM);
+
+    /* Add the relation to the list of stuff to delete at abort */
+    pending = (PendingRelDelete *)
+        MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
+    pending->relnode = rnode;
+    pending->deleteinitfork = false;
+    pending->createinitfork = true;
+    pending->backend = InvalidBackendId;
+    pending->atCommit = false;    /* create if abort */
+    pending->nestLevel = GetCurrentTransactionNestLevel();
+    pending->next = pendingDeletes;
+    pendingDeletes = pending;
+}
+
 /*
  * Perform XLogInsert of an XLOG_SMGR_CREATE record to WAL.
  */
@@ -187,6 +238,25 @@ log_smgrcreate(const RelFileNode *rnode, ForkNumber forkNum)
     XLogInsert(RM_SMGR_ID, XLOG_SMGR_CREATE | XLR_SPECIAL_REL_UPDATE);
 }
 
+/*
+ * Perform XLogInsert of an XLOG_SMGR_UNLINK record to WAL.
+ */
+void
+log_smgrunlink(const RelFileNode *rnode, ForkNumber forkNum)
+{
+    xl_smgr_unlink xlrec;
+
+    /*
+     * Make an XLOG entry reporting the file unlink.
+     */
+    xlrec.rnode = *rnode;
+    xlrec.forkNum = forkNum;
+
+    XLogBeginInsert();
+    XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+    XLogInsert(RM_SMGR_ID, XLOG_SMGR_UNLINK | XLR_SPECIAL_REL_UPDATE);
+}
+
 /*
  * RelationDropStorage
  *        Schedule unlinking of physical storage at transaction commit.
@@ -200,6 +270,8 @@ RelationDropStorage(Relation rel)
     pending = (PendingRelDelete *)
         MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
     pending->relnode = rel->rd_node;
+    pending->createinitfork = false;
+    pending->deleteinitfork = false;
     pending->backend = rel->rd_backend;
     pending->atCommit = true;    /* delete if commit */
     pending->nestLevel = GetCurrentTransactionNestLevel();
@@ -625,19 +697,34 @@ smgrDoPendingDeletes(bool isCommit)
 
                 srel = smgropen(pending->relnode, pending->backend);
 
-                /* allocate the initial array, or extend it, if needed */
-                if (maxrels == 0)
+                /* XXXX */
+                if (pending->createinitfork)
                 {
-                    maxrels = 8;
-                    srels = palloc(sizeof(SMgrRelation) * maxrels);
+                    smgrcreate(srel, INIT_FORKNUM, false);
+                    log_smgrcreate(&pending->relnode, INIT_FORKNUM);
+                    smgrimmedsync(srel, INIT_FORKNUM);
                 }
-                else if (maxrels <= nrels)
+                else if (pending->deleteinitfork)
                 {
-                    maxrels *= 2;
-                    srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+                    smgrunlink(srel, INIT_FORKNUM, false);
+                    log_smgrunlink(&pending->relnode, INIT_FORKNUM);
                 }
+                else
+                {
+                    /* allocate the initial array, or extend it, if needed */
+                    if (maxrels == 0)
+                    {
+                        maxrels = 8;
+                        srels = palloc(sizeof(SMgrRelation) * maxrels);
+                    }
+                    else if (maxrels <= nrels)
+                    {
+                        maxrels *= 2;
+                        srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+                    }
 
-                srels[nrels++] = srel;
+                    srels[nrels++] = srel;
+                }
             }
             /* must explicitly free the list entry */
             pfree(pending);
@@ -916,6 +1003,14 @@ smgr_redo(XLogReaderState *record)
         reln = smgropen(xlrec->rnode, InvalidBackendId);
         smgrcreate(reln, xlrec->forkNum, true);
     }
+    else if (info == XLOG_SMGR_UNLINK)
+    {
+        xl_smgr_unlink *xlrec = (xl_smgr_unlink *) XLogRecGetData(record);
+        SMgrRelation reln;
+
+        reln = smgropen(xlrec->rnode, InvalidBackendId);
+        smgrunlink(reln, xlrec->forkNum, true);
+    }
     else if (info == XLOG_SMGR_TRUNCATE)
     {
         xl_smgr_truncate *xlrec = (xl_smgr_truncate *) XLogRecGetData(record);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e0ac4e05e5..fdcdb1dbb1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4897,6 +4897,89 @@ ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
     return newcmd;
 }
 
+static bool
+try_inplace_persistence_change(AlteredTableInfo *tab, char persistence,
+                               LOCKMODE lockmode)
+{
+    Relation     rel;
+    Relation    classRel;
+    HeapTuple    tuple,
+                newtuple;
+    Datum        new_val[Natts_pg_class];
+    bool        new_null[Natts_pg_class],
+                new_repl[Natts_pg_class];
+    int            i;
+
+    Assert(tab->rewrite == AT_REWRITE_ALTER_PERSISTENCE);
+    Assert(lockmode == AccessExclusiveLock);
+
+    /*
+     * Under the following condition, we need to call ATRewriteTable, but
+     * cannot be false in the AT_REWRITE_ALTER_PERSISTENCE case.
+     */
+    Assert(tab->constraints == NULL && tab->partition_constraint == NULL &&
+           tab->newvals == NULL && !tab->verify_new_notnull);
+
+    /*
+     * In the case of SET ULOGGED, the resulting relation does not need WALs,
+     * so the storage can be used as-is. In the opposite case, the table is
+     * required to be restored from WAL, so rewrite the relation.  However,
+     * when wal_level = minimal, we can omit WALs by immediately syncing the
+     * storage.
+     */
+    if (tab->newrelpersistence == RELPERSISTENCE_PERMANENT || XLogIsNeeded())
+        return false;
+
+    rel = table_open(tab->relid, lockmode);
+
+    Assert(rel->rd_rel->relpersistence != persistence);
+
+    elog(DEBUG1, "perform im-place persistnce change");
+
+    RelationOpenSmgr(rel);
+    SetRelFileNodeBuffersPersistence(rel->rd_smgr->smgr_rnode,
+                                     persistence == RELPERSISTENCE_PERMANENT);
+    table_relation_set_persistence(rel, persistence);
+
+    /*
+     * This relation is now WAL-logged. Sync all files immediately to establish
+     * the initial state.
+     */
+    if (persistence == RELPERSISTENCE_PERMANENT)
+    {
+        for (i = 0 ; i < MAX_FORKNUM ; i++)
+        {
+            if (smgrexists(rel->rd_smgr, i))
+                smgrimmedsync(rel->rd_smgr, i);
+        }
+    }
+
+    table_close(rel, lockmode);
+    classRel = table_open(RelationRelationId, RowExclusiveLock);
+    tuple = SearchSysCacheCopy1(RELOID,
+                                ObjectIdGetDatum(RelationGetRelid(rel)));
+    if (!HeapTupleIsValid(tuple))
+        elog(ERROR, "cache lookup failed for relation %u",
+             RelationGetRelid(rel));
+
+    memset(new_val, 0, sizeof(new_val));
+    memset(new_null, false, sizeof(new_null));
+    memset(new_repl, false, sizeof(new_repl));
+
+    new_val[Anum_pg_class_relpersistence - 1] = CharGetDatum(persistence);
+    new_null[Anum_pg_class_relpersistence - 1] = false;
+    new_repl[Anum_pg_class_relpersistence - 1] = true;
+
+    newtuple = heap_modify_tuple(tuple, RelationGetDescr(classRel),
+                                 new_val, new_null, new_repl);
+
+    CatalogTupleUpdate(classRel, &newtuple->t_self, newtuple);
+    heap_freetuple(newtuple);
+    table_close(classRel, RowExclusiveLock);
+
+    return true;
+}
+
 /*
  * ATRewriteTables: ALTER TABLE phase 3
  */
@@ -5017,45 +5100,51 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
                                          tab->relid,
                                          tab->rewrite);
 
-            /*
-             * Create transient table that will receive the modified data.
-             *
-             * Ensure it is marked correctly as logged or unlogged.  We have
-             * to do this here so that buffers for the new relfilenode will
-             * have the right persistence set, and at the same time ensure
-             * that the original filenode's buffers will get read in with the
-             * correct setting (i.e. the original one).  Otherwise a rollback
-             * after the rewrite would possibly result with buffers for the
-             * original filenode having the wrong persistence setting.
-             *
-             * NB: This relies on swap_relation_files() also swapping the
-             * persistence. That wouldn't work for pg_class, but that can't be
-             * unlogged anyway.
-             */
-            OIDNewHeap = make_new_heap(tab->relid, NewTableSpace, persistence,
-                                       lockmode);
+            if (tab->rewrite != AT_REWRITE_ALTER_PERSISTENCE ||
+                !try_inplace_persistence_change(tab, persistence, lockmode))
+            {
+                /*
+                 * Create transient table that will receive the modified data.
+                 *
+                 * Ensure it is marked correctly as logged or unlogged.  We
+                 * have to do this here so that buffers for the new relfilenode
+                 * will have the right persistence set, and at the same time
+                 * ensure that the original filenode's buffers will get read in
+                 * with the correct setting (i.e. the original one).  Otherwise
+                 * a rollback after the rewrite would possibly result with
+                 * buffers for the original filenode having the wrong
+                 * persistence setting.
+                 *
+                 * NB: This relies on swap_relation_files() also swapping the
+                 * persistence. That wouldn't work for pg_class, but that can't
+                 * be unlogged anyway.
+                 */
+                OIDNewHeap = make_new_heap(tab->relid, NewTableSpace, persistence,
+                                           lockmode);
 
-            /*
-             * Copy the heap data into the new table with the desired
-             * modifications, and test the current data within the table
-             * against new constraints generated by ALTER TABLE commands.
-             */
-            ATRewriteTable(tab, OIDNewHeap, lockmode);
+                /*
+                 * Copy the heap data into the new table with the desired
+                 * modifications, and test the current data within the table
+                 * against new constraints generated by ALTER TABLE commands.
+                 */
+                ATRewriteTable(tab, OIDNewHeap, lockmode);
 
-            /*
-             * Swap the physical files of the old and new heaps, then rebuild
-             * indexes and discard the old heap.  We can use RecentXmin for
-             * the table's new relfrozenxid because we rewrote all the tuples
-             * in ATRewriteTable, so no older Xid remains in the table.  Also,
-             * we never try to swap toast tables by content, since we have no
-             * interest in letting this code work on system catalogs.
-             */
-            finish_heap_swap(tab->relid, OIDNewHeap,
-                             false, false, true,
-                             !OidIsValid(tab->newTableSpace),
-                             RecentXmin,
-                             ReadNextMultiXactId(),
-                             persistence);
+                /*
+                 * Swap the physical files of the old and new heaps, then
+                 * rebuild indexes and discard the old heap.  We can use
+                 * RecentXmin for the table's new relfrozenxid because we
+                 * rewrote all the tuples in ATRewriteTable, so no older Xid
+                 * remains in the table.  Also, we never try to swap toast
+                 * tables by content, since we have no interest in letting this
+                 * code work on system catalogs.
+                 */
+                finish_heap_swap(tab->relid, OIDNewHeap,
+                                 false, false, true,
+                                 !OidIsValid(tab->newTableSpace),
+                                 RecentXmin,
+                                 ReadNextMultiXactId(),
+                                 persistence);
+            }
         }
         else
         {
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e549fa1d30..7fd6cae12f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3031,6 +3031,40 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
     }
 }
 
+void
+SetRelFileNodeBuffersPersistence(RelFileNodeBackend rnode, bool permanent)
+{
+    int            i;
+
+    Assert (!RelFileNodeBackendIsTemp(rnode));
+
+    for (i = 0; i < NBuffers; i++)
+    {
+        BufferDesc *bufHdr = GetBufferDescriptor(i);
+        uint32        buf_state;
+
+        if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+            continue;
+
+        buf_state = LockBufHdr(bufHdr);
+
+        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+        {
+            if (permanent)
+            {
+                Assert ((buf_state &  BM_PERMANENT) == 0);
+                buf_state |= BM_PERMANENT;
+            }
+            else
+            {
+                Assert ((buf_state &  BM_PERMANENT) != 0);
+                buf_state &= ~BM_PERMANENT;
+            }
+        }
+        UnlockBufHdr(bufHdr, buf_state);
+    }
+}
+
 /* ---------------------------------------------------------------------
  *        DropRelFileNodesAllBuffers
  *
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index dcc09df0c7..5eb9e97b3d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -645,6 +645,12 @@ smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
     smgrsw[reln->smgr_which].smgr_immedsync(reln, forknum);
 }
 
+void
+smgrunlink(SMgrRelation reln, ForkNumber forknum, bool isRedo)
+{
+    smgrsw[reln->smgr_which].smgr_unlink(reln->smgr_rnode, forknum, isRedo);
+}
+
 /*
  * AtEOXact_SMgr
  *
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 387eb34a61..245a8f0fdd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -451,6 +451,8 @@ typedef struct TableAmRoutine
                                               TransactionId *freezeXid,
                                               MultiXactId *minmulti);
 
+    void        (*relation_set_persistence) (Relation rel, char persistence);
+
     /*
      * This callback needs to remove all contents from `rel`'s current
      * relfilenode. No provisions for transactional behaviour need to be made.
@@ -1404,6 +1406,12 @@ table_relation_set_new_filenode(Relation rel,
                                                freezeXid, minmulti);
 }
 
+static inline void
+table_relation_set_persistence(Relation rel, char persistence)
+{
+    rel->rd_tableam->relation_set_persistence(rel, persistence);
+}
+
 /*
  * Remove all table contents from `rel`, in a non-transactional manner.
  * Non-transactional meaning that there's no need to support rollbacks. This
diff --git a/src/include/catalog/storage.h b/src/include/catalog/storage.h
index 30c38e0ca6..f4fd61f639 100644
--- a/src/include/catalog/storage.h
+++ b/src/include/catalog/storage.h
@@ -23,6 +23,8 @@
 extern int    wal_skip_threshold;
 
 extern SMgrRelation RelationCreateStorage(RelFileNode rnode, char relpersistence);
+extern void RelationCreateInitFork(RelFileNode rel);
+extern void RelationDropInitFork(RelFileNode rel);
 extern void RelationDropStorage(Relation rel);
 extern void RelationPreserveStorage(RelFileNode rnode, bool atCommit);
 extern void RelationPreTruncate(Relation rel);
diff --git a/src/include/catalog/storage_xlog.h b/src/include/catalog/storage_xlog.h
index 7b21cab2e0..73ad2ae89e 100644
--- a/src/include/catalog/storage_xlog.h
+++ b/src/include/catalog/storage_xlog.h
@@ -29,6 +29,7 @@
 /* XLOG gives us high 4 bits */
 #define XLOG_SMGR_CREATE    0x10
 #define XLOG_SMGR_TRUNCATE    0x20
+#define XLOG_SMGR_UNLINK    0x30
 
 typedef struct xl_smgr_create
 {
@@ -36,6 +37,12 @@ typedef struct xl_smgr_create
     ForkNumber    forkNum;
 } xl_smgr_create;
 
+typedef struct xl_smgr_unlink
+{
+    RelFileNode rnode;
+    ForkNumber    forkNum;
+} xl_smgr_unlink;
+
 /* flags for xl_smgr_truncate */
 #define SMGR_TRUNCATE_HEAP        0x0001
 #define SMGR_TRUNCATE_VM        0x0002
@@ -51,6 +58,7 @@ typedef struct xl_smgr_truncate
 } xl_smgr_truncate;
 
 extern void log_smgrcreate(const RelFileNode *rnode, ForkNumber forkNum);
+extern void log_smgrunlink(const RelFileNode *rnode, ForkNumber forkNum);
 
 extern void smgr_redo(XLogReaderState *record);
 extern void smgr_desc(StringInfo buf, XLogReaderState *record);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index ee91b8fa26..98967eeff9 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -205,6 +205,8 @@ extern void FlushRelationsAllBuffers(struct SMgrRelationData **smgrs, int nrels)
 extern void FlushDatabaseBuffers(Oid dbid);
 extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber *forkNum,
                                    int nforks, BlockNumber *firstDelBlock);
+extern void SetRelFileNodeBuffersPersistence(RelFileNodeBackend rnode,
+                                             bool permanent);
 extern void DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index f28a842401..5d74631006 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -86,6 +86,7 @@ extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
 extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
+extern void smgrunlink(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: New statistics for tuning WAL buffer size
Next
From: James Coleman
Date:
Subject: Re: enable_incremental_sort changes query behavior