Thread: [PATCH] xlogreader-v4

[PATCH] xlogreader-v4

From

Andres Freund

Date:

08 January 2013, 19:10:39

From: Andres Freund <andres@2ndquadrant.com>
Subject: [PATCH] xlogreader-v4

Hi,

this is the latest and obviously best version of xlogreader & xlogdump with
changes both from Heikki and me.

Changes:
* windows build support for pg_xlogdump
* xlogdump moved to contrib
* xlogdump option parsing enhancements
* generic cleanups
* a few more comments
* removal of some ugliness in XLogFindNextRecord

I think it's mostly ready; for xlogdump, at minimum these two issues remain:

const char *
timestamptz_to_str(TimestampTz dt)
{
    return "unimplemented-timestamp";
}

const char *
relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
{
    return "unimplemented-relpathbackend";
}

aren't exactly the nicest wrapper functions. I think it's OK to simply copy
relpathbackend from the backend, but timestamptz_to_str? That's a heck of a lot
of code.
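
To illustrate the size of the gap, here is a minimal sketch of a frontend
stand-in for timestamptz_to_str(). It is not the backend implementation: the
name frontend_timestamptz_to_str is hypothetical, it assumes integer
timestamps (microseconds since 2000-01-01 00:00:00 UTC), it takes a plain
int64_t instead of TimestampTz so that it compiles on its own, and it always
prints UTC rather than honouring a timezone setting.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Seconds between the Unix epoch (1970-01-01) and the PostgreSQL epoch (2000-01-01). */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)
#define USECS_PER_SEC        INT64_C(1000000)

/*
 * Hypothetical frontend replacement for timestamptz_to_str().
 * Assumption: dt is microseconds since 2000-01-01 00:00:00 UTC.
 * Returns a pointer to a static buffer, overwritten by the next call.
 */
static const char *
frontend_timestamptz_to_str(int64_t dt)
{
    static char buf[64];
    char        datebuf[32];
    int64_t     secs = dt / USECS_PER_SEC;
    int64_t     usecs = dt % USECS_PER_SEC;
    time_t      t;
    struct tm  *tm;

    /* C division truncates toward zero; normalise so usecs is in [0, 1e6). */
    if (usecs < 0)
    {
        usecs += USECS_PER_SEC;
        secs -= 1;
    }

    t = (time_t) (secs + PG_EPOCH_OFFSET_SECS);
    tm = gmtime(&t);
    if (tm == NULL ||
        strftime(datebuf, sizeof(datebuf), "%Y-%m-%d %H:%M:%S", tm) == 0)
        return "(timestamp out of range)";

    snprintf(buf, sizeof(buf), "%s.%06d UTC", datebuf, (int) usecs);
    return buf;
}
```

Something this small sidesteps the backend's datetime code entirely, at the
cost of ignoring timezones and float timestamps.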

Patches 1, 2 and 5 are just preparatory and can probably be applied
beforehand.

Patches 3 and 4 are the real meat of this, and 3 in particular needs careful
review.

Input welcome!

Andres

[PATCH 2/5] Make relpathbackend return a statically allocated result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 19:10:46

From: Andres Freund <andres@anarazel.de>

relpathbackend() (via some of its wrappers) is used in *_desc routines which we
want to be usable without a backend environment around.

Change signature to return a 'const char *' to make misuse easier to
detect. That necessitates also changing the 'FileName' typedef to 'const char
*', which seems to be a good idea anyway.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  6 ++---
 src/backend/access/rmgrdesc/xactdesc.c |  6 ++---
 src/backend/access/transam/xlogutils.c |  9 +++----
 src/backend/catalog/catalog.c          | 49 +++++++++++-----------------------
 src/backend/storage/buffer/bufmgr.c    | 12 +++------
 src/backend/storage/file/fd.c          |  6 ++---
 src/backend/storage/smgr/md.c          | 23 +++++-----------
 src/backend/utils/adt/dbsize.c         |  4 +--
 src/include/catalog/catalog.h          |  2 +-
 src/include/storage/fd.h               |  9 +++----
 10 files changed, 42 insertions(+), 84 deletions(-)
 

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index bcabf89..490c8c7 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -26,19 +26,17 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
     if (info == XLOG_SMGR_CREATE)
     {
         xl_smgr_create *xlrec = (xl_smgr_create *) rec;
-        char       *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+        const char *path = relpathperm(xlrec->rnode, xlrec->forkNum);

         appendStringInfo(buf, "file create: %s", path);
-        pfree(path);
     }
     else if (info == XLOG_SMGR_TRUNCATE)
     {
         xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
-        char       *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+        const char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);

         appendStringInfo(buf, "file truncate: %s to %u blocks", path,
                          xlrec->blkno);
-        pfree(path);
     }
     else
         appendStringInfo(buf, "UNKNOWN");
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 2471279..b86a53e 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -35,10 +35,9 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
         appendStringInfo(buf, "; rels:");
         for (i = 0; i < xlrec->nrels; i++)
         {
-            char       *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
+            const char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);

             appendStringInfo(buf, " %s", path);
-            pfree(path);
         }
     }
     if (xlrec->nsubxacts > 0)
@@ -105,10 +104,9 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec)
         appendStringInfo(buf, "; rels:");
         for (i = 0; i < xlrec->nrels; i++)
         {
-            char       *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
+            const char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);

             appendStringInfo(buf, " %s", path);
-            pfree(path);
         }
     }
     if (xlrec->nsubxacts > 0)
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index f9a6e62..8266f3c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -57,7 +57,7 @@ static void
 report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,
                     BlockNumber blkno, bool present)
 {
-    char       *path = relpathperm(node, forkno);
+    const char *path = relpathperm(node, forkno);

     if (present)
         elog(elevel, "page %u of relation %s is uninitialized",
@@ -65,7 +65,6 @@ report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,
     else
         elog(elevel, "page %u of relation %s does not exist",
              blkno, path);
-    pfree(path);
 }

 /* Log a reference to an invalid page */
@@ -153,11 +152,10 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
         {
             if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
             {
-                char       *path = relpathperm(hentry->key.node, forkno);
+                const char *path = relpathperm(hentry->key.node, forkno);

                 elog(DEBUG2, "page %u of relation %s has been dropped",
                      hentry->key.blkno, path);
-                pfree(path);
             }

             if (hash_search(invalid_page_tab,
@@ -186,11 +184,10 @@ forget_invalid_pages_db(Oid dbid)
         {
             if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
             {
-                char       *path = relpathperm(hentry->key.node, hentry->key.forkno);
+                const char *path = relpathperm(hentry->key.node, hentry->key.forkno);

                 elog(DEBUG2, "page %u of relation %s has been dropped",
                      hentry->key.blkno, path);
-                pfree(path);
             }

             if (hash_search(invalid_page_tab,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 9686486..6455ef0 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -112,54 +112,45 @@ forkname_chars(const char *str, ForkNumber *fork)
 /*
  * relpathbackend - construct path to a relation's file
  *
- * Result is a palloc'd string.
+ * Result is a pointer to a statically allocated string.
  */
-char *
+const char *
 relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 {
-    int            pathlen;
-    char       *path;
+    static char path[MAXPGPATH];

     if (rnode.spcNode == GLOBALTABLESPACE_OID)
     {
         /* Shared system relations live in {datadir}/global */
         Assert(rnode.dbNode == 0);
         Assert(backend == InvalidBackendId);
-        pathlen = 7 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-        path = (char *) palloc(pathlen);
         if (forknum != MAIN_FORKNUM)
-            snprintf(path, pathlen, "global/%u_%s",
+            snprintf(path, MAXPGPATH, "global/%u_%s",
                      rnode.relNode, forkNames[forknum]);
         else
-            snprintf(path, pathlen, "global/%u", rnode.relNode);
+            snprintf(path, MAXPGPATH, "global/%u", rnode.relNode);
     }
     else if (rnode.spcNode == DEFAULTTABLESPACE_OID)
     {
         /* The default tablespace is {datadir}/base */
         if (backend == InvalidBackendId)
         {
-            pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-            path = (char *) palloc(pathlen);
             if (forknum != MAIN_FORKNUM)
-                snprintf(path, pathlen, "base/%u/%u_%s",
+                snprintf(path, MAXPGPATH, "base/%u/%u_%s",
                          rnode.dbNode, rnode.relNode,
                          forkNames[forknum]);
             else
-                snprintf(path, pathlen, "base/%u/%u",
+                snprintf(path, MAXPGPATH, "base/%u/%u",
                          rnode.dbNode, rnode.relNode);
         }
         else
         {
-            /* OIDCHARS will suffice for an integer, too */
-            pathlen = 5 + OIDCHARS + 2 + OIDCHARS + 1 + OIDCHARS + 1
-                + FORKNAMECHARS + 1;
-            path = (char *) palloc(pathlen);
             if (forknum != MAIN_FORKNUM)
-                snprintf(path, pathlen, "base/%u/t%d_%u_%s",
+                snprintf(path, MAXPGPATH, "base/%u/t%d_%u_%s",
                          rnode.dbNode, backend, rnode.relNode,
                          forkNames[forknum]);
             else
-                snprintf(path, pathlen, "base/%u/t%d_%u",
+                snprintf(path, MAXPGPATH, "base/%u/t%d_%u",
                          rnode.dbNode, backend, rnode.relNode);
         }
     }
@@ -168,38 +159,30 @@ relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
         /* All other tablespaces are accessed via symlinks */
         if (backend == InvalidBackendId)
         {
-            pathlen = 9 + 1 + OIDCHARS + 1
-                + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1
-                + OIDCHARS + 1 + FORKNAMECHARS + 1;
-            path = (char *) palloc(pathlen);
             if (forknum != MAIN_FORKNUM)
-                snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
+                snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/%u_%s",
                          rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
                          rnode.dbNode, rnode.relNode,
                          forkNames[forknum]);
             else
-                snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
+                snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/%u",
                          rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
                          rnode.dbNode, rnode.relNode);
         }
         else
         {
-            /* OIDCHARS will suffice for an integer, too */
-            pathlen = 9 + 1 + OIDCHARS + 1
-                + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 2
-                + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-            path = (char *) palloc(pathlen);
             if (forknum != MAIN_FORKNUM)
-                snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
+                snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
                          rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
                          rnode.dbNode, backend, rnode.relNode,
                          forkNames[forknum]);
             else
-                snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u",
+                snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/t%d_%u",
                          rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
                          rnode.dbNode, backend, rnode.relNode);
         }
     }
+
     return path;
 }
@@ -534,7 +517,7 @@ Oid
 GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
 {
     RelFileNodeBackend rnode;
-    char       *rpath;
+    const char *rpath;
     int            fd;
     bool        collides;
     BackendId    backend;
@@ -599,8 +582,6 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
              */
             collides = false;
         }
-
-        pfree(rpath);
     } while (collides);

     return rnode.node.relNode;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 03ed41d..6c2620d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1757,7 +1757,7 @@ PrintBufferLeakWarning(Buffer buffer)
 {
     volatile BufferDesc *buf;
     int32        loccount;
-    char       *path;
+    const char *path;
     BackendId    backend;

     Assert(BufferIsValid(buffer));
@@ -1782,7 +1782,6 @@ PrintBufferLeakWarning(Buffer buffer)
          buffer, path,
          buf->tag.blockNum, buf->flags,
          buf->refcount, loccount);
-    pfree(path);
 }

 /*
@@ -2901,7 +2900,7 @@ AbortBufferIO(void)
             if (sv_flags & BM_IO_ERROR)
             {
                 /* Buffer is pinned, so we can read tag without spinlock */
-                char       *path;
+                const char *path;

                 path = relpathperm(buf->tag.rnode, buf->tag.forkNum);
                 ereport(WARNING,
@@ -2909,7 +2908,6 @@ AbortBufferIO(void)
                          errmsg("could not write block %u of %s",
                                 buf->tag.blockNum, path),
                          errdetail("Multiple failures --- write error might be permanent.")));
-                pfree(path);
             }
         }
         TerminateBufferIO(buf, false, BM_IO_ERROR);
@@ -2927,11 +2925,10 @@ shared_buffer_write_error_callback(void *arg)
     /* Buffer is pinned, so we can read the tag without locking the spinlock */
     if (bufHdr != NULL)
     {
-        char       *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+        const char *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);

         errcontext("writing block %u of relation %s",
                    bufHdr->tag.blockNum, path);
-        pfree(path);
     }
 }
@@ -2945,11 +2942,10 @@ local_buffer_write_error_callback(void *arg)
     if (bufHdr != NULL)
     {
-        char       *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
+        const char *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
                                           bufHdr->tag.forkNum);

         errcontext("writing block %u of relation %s",
                    bufHdr->tag.blockNum, path);
-        pfree(path);
     }
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index c026731..64d2a1e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -556,7 +556,7 @@ set_max_safe_fds(void)
  * this module wouldn't have any open files to close at that point anyway.
  */
 int
-BasicOpenFile(FileName fileName, int fileFlags, int fileMode)
+BasicOpenFile(const char *fileName, int fileFlags, int fileMode)
 {
     int            fd;
@@ -878,7 +878,7 @@ FileInvalidate(File file)
  * (which should always be $PGDATA when this code is running).
  */
 File
-PathNameOpenFile(FileName fileName, int fileFlags, int fileMode)
+PathNameOpenFile(const char *fileName, int fileFlags, int fileMode)
 {
     char       *fnamecopy;
     File        file;
@@ -1548,7 +1548,7 @@ TryAgain:
  * Like AllocateFile, but returns an unbuffered fd like open(2)
  */
 int
-OpenTransientFile(FileName fileName, int fileFlags, int fileMode)
+OpenTransientFile(const char *fileName, int fileFlags, int fileMode)
 {
     int            fd;
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 20eb36a..4ef47b1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -264,7 +264,7 @@ mdexists(SMgrRelation reln, ForkNumber forkNum)
 void
 mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
 {
-    char       *path;
+    const char *path;
     File        fd;

     if (isRedo && reln->md_fd[forkNum] != NULL)
@@ -298,8 +298,6 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
         }
     }

-    pfree(path);
-
     reln->md_fd[forkNum] = _fdvec_alloc();

     reln->md_fd[forkNum]->mdfd_vfd = fd;
@@ -380,7 +378,7 @@ mdunlink(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
 static void
 mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
 {
-    char       *path;
+    const char *path;
     int            ret;

     path = relpath(rnode, forkNum);
@@ -449,8 +447,6 @@ mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
         }
         pfree(segpath);
     }
-
-    pfree(path);
 }

 /*
@@ -545,7 +541,7 @@ static MdfdVec *
 mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
 {
     MdfdVec    *mdfd;
-    char       *path;
+    const char *path;
     File        fd;

     /* No work if already open */
@@ -571,7 +567,6 @@ mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
             if (behavior == EXTENSION_RETURN_NULL &&
                 FILE_POSSIBLY_DELETED(errno))
             {
-                pfree(path);
                 return NULL;
             }
             ereport(ERROR,
@@ -580,8 +575,6 @@ mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
         }
     }

-    pfree(path);
-
     reln->md_fd[forknum] = mdfd = _fdvec_alloc();

     mdfd->mdfd_vfd = fd;
@@ -1279,7 +1272,7 @@ mdpostckpt(void)
     while (pendingUnlinks != NIL)
     {
         PendingUnlinkEntry *entry = (PendingUnlinkEntry *) linitial(pendingUnlinks);
-        char       *path;
+        const char *path;

         /*
          * New entries are appended to the end, so if the entry is new we've
@@ -1309,7 +1302,6 @@ mdpostckpt(void)
                         (errcode_for_file_access(),
                          errmsg("could not remove file \"%s\": %m", path)));
         }
-        pfree(path);

         /* And remove the list entry */
         pendingUnlinks = list_delete_first(pendingUnlinks);
@@ -1634,8 +1626,8 @@ _fdvec_alloc(void)
 static char *
 _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
 {
-    char       *path,
-               *fullpath;
+    const char *path;
+    char       *fullpath;

     path = relpath(reln->smgr_rnode, forknum);

@@ -1644,10 +1636,9 @@ _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
         /* be sure we have enough space for the '.segno' */
         fullpath = (char *) palloc(strlen(path) + 12);
         sprintf(fullpath, "%s.%u", path, segno);
-        pfree(path);
     }
     else
-        fullpath = path;
+        fullpath = pstrdup(path);

     return fullpath;
 }
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 89ad386..c285007 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -268,7 +268,7 @@ static int64
 calculate_relation_size(RelFileNode *rfn, BackendId backend, ForkNumber forknum)
 {
     int64        totalsize = 0;
-    char       *relationpath;
+    const char *relationpath;
     char        pathname[MAXPGPATH];
     unsigned int segcount = 0;

@@ -756,7 +756,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
     Form_pg_class relform;
     RelFileNode rnode;
     BackendId    backend;
-    char       *path;
+    const char *path;

     tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
     if (!HeapTupleIsValid(tuple))
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 5d506fe..299cb03 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -31,7 +31,7 @@ extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
 extern int    forkname_chars(const char *str, ForkNumber *);

-extern char *relpathbackend(RelFileNode rnode, BackendId backend,
+extern const char *relpathbackend(RelFileNode rnode, BackendId backend,
                ForkNumber forknum);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index bd36c9d..b2565c4 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -45,9 +45,6 @@
 /*
  * FileSeek uses the standard UNIX lseek(2) flags.
  */
-
-typedef char *FileName;
-
 typedef int File;

@@ -65,7 +62,7 @@ extern int    max_safe_fds;
  */

 /* Operations on virtual Files --- equivalent to Unix kernel file ops */
-extern File PathNameOpenFile(FileName fileName, int fileFlags, int fileMode);
+extern File PathNameOpenFile(const char *fileName, int fileFlags, int fileMode);
 extern File OpenTemporaryFile(bool interXact);
 extern void FileClose(File file);
 extern int    FilePrefetch(File file, off_t offset, int amount);
@@ -86,11 +83,11 @@ extern struct dirent *ReadDir(DIR *dir, const char *dirname);
 extern int    FreeDir(DIR *dir);

 /* Operations to allow use of a plain kernel FD, with automatic cleanup */
-extern int    OpenTransientFile(FileName fileName, int fileFlags, int fileMode);
+extern int    OpenTransientFile(const char *fileName, int fileFlags, int fileMode);
 extern int    CloseTransientFile(int fd);

 /* If you've really really gotta have a plain kernel FD, use this */
-extern int    BasicOpenFile(FileName fileName, int fileFlags, int fileMode);
+extern int    BasicOpenFile(const char *fileName, int fileFlags, int fileMode);

 /* Miscellaneous support routines */
 extern void InitFileAccess(void);
 
-- 
1.7.12.289.g0ce9864.dirty
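
A note on the static-buffer convention patch 2/5 introduces: every call to
relpathbackend() now overwrites the previous result, so a caller that needs
the path to survive a later call has to copy it, which is what the pstrdup()
in _mdfd_segpath does. The following is a minimal standalone sketch of that
behaviour. demo_relpath, demo_relpath_copy and DEMO_MAXPATH are hypothetical
stand-ins, not PostgreSQL code.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAXPATH 64         /* hypothetical stand-in for MAXPGPATH */

/*
 * Hypothetical miniature of relpathbackend(): formats into a single static
 * buffer and returns a pointer to it. The next call overwrites the contents.
 */
static const char *
demo_relpath(unsigned db, unsigned rel)
{
    static char path[DEMO_MAXPATH];

    snprintf(path, sizeof(path), "base/%u/%u", db, rel);
    return path;
}

/*
 * Copy the result into caller-owned storage when it must outlive the next
 * demo_relpath() call.
 */
static void
demo_relpath_copy(unsigned db, unsigned rel, char *dst, size_t dstlen)
{
    snprintf(dst, dstlen, "%s", demo_relpath(db, rel));
}
```

Two results fetched back to back alias the same buffer, so something like
elog("%s %s", relpathperm(a, ...), relpathperm(b, ...)) would print the same
path twice. The 'const char *' return type catches an accidental pfree() or
in-place edit at compile time, but not this kind of aliasing.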

[PATCH 1/5] Centralize Assert* macros into c.h so it's common between backend/frontend

From

Andres Freund

Date:

08 January 2013, 19:10:53

From: Andres Freund <andres@anarazel.de>

c.h already had parts of the assert support (StaticAssert*) and it's the shared
file between postgres.h and postgres_fe.h. This makes it easier to build
frontend programs which have to do the hack of defining FRONTEND and then
including postgres.h.
---
 src/include/c.h           | 65 +++++++++++++++++++++++++++++++++++++++++++++++
 src/include/postgres.h    | 54 ++-------------------------------------
 src/include/postgres_fe.h | 12 ---------
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index f7db157..c30df8b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -694,6 +694,71 @@ typedef NameData *Name;
 /*
+ * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
+ * - plai  9/5/90
+ *
+ * It should _NOT_ be defined in releases or in benchmark copies
+ */
+
+/*
+ * Assert() can be used in both frontend and backend code. In frontend code it
+ * just calls the standard assert, if it's available. If use of assertions is
+ * not configured, it does nothing.
+ */
+#ifndef USE_ASSERT_CHECKING
+
+#define Assert(condition)
+#define AssertMacro(condition)    ((void)true)
+#define AssertArg(condition)
+#define AssertState(condition)
+
+#elif defined FRONTEND
+
+#include <assert.h>
+#define Assert(p) assert(p)
+#define AssertMacro(p)    ((void) assert(p))
+
+#else /* USE_ASSERT_CHECKING && !FRONTEND */
+
+/*
+ * Trap
+ *        Generates an exception if the given condition is true.
+ */
+#define Trap(condition, errorType) \
+    do { \
+        if ((assert_enabled) && (condition)) \
+            ExceptionalCondition(CppAsString(condition), (errorType), \
+                                 __FILE__, __LINE__); \
+    } while (0)
+
+/*
+ *    TrapMacro is the same as Trap but it's intended for use in macros:
+ *
+ *        #define foo(x) (AssertMacro(x != 0), bar(x))
+ *
+ *    Isn't CPP fun?
+ */
+#define TrapMacro(condition, errorType) \
+    ((bool) ((! assert_enabled) || ! (condition) || \
+             (ExceptionalCondition(CppAsString(condition), (errorType), \
+                                   __FILE__, __LINE__), 0)))
+
+#define Assert(condition) \
+        Trap(!(condition), "FailedAssertion")
+
+#define AssertMacro(condition) \
+        ((void) TrapMacro(!(condition), "FailedAssertion"))
+
+#define AssertArg(condition) \
+        Trap(!(condition), "BadArgument")
+
+#define AssertState(condition) \
+        Trap(!(condition), "BadState")
+
+#endif /* USE_ASSERT_CHECKING && !FRONTEND */
+
+
+/*
  * Macros to support compile-time assertion checks.
  *
  * If the "condition" (a compile-time-constant expression) evaluates to false,
 
diff --git a/src/include/postgres.h b/src/include/postgres.h
index b6e922f..bbe125a 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -25,7 +25,7 @@
  *      -------    ------------------------------------------------
  *        1)        variable-length datatypes (TOAST support)
  *        2)        datum type + support macros
- *        3)        exception handling definitions
+ *        3)        exception handling
  *
  *     NOTES
  *
@@ -627,62 +627,12 @@ extern Datum Float8GetDatum(float8 X);

 /* ----------------------------------------------------------------
- *                Section 3:    exception handling definitions
- *                            Assert, Trap, etc macros
+ *                Section 3:    exception handling backend support
  * ----------------------------------------------------------------
  */

 extern PGDLLIMPORT bool assert_enabled;

-/*
- * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
- * - plai  9/5/90
- *
- * It should _NOT_ be defined in releases or in benchmark copies
- */
-
-/*
- * Trap
- *        Generates an exception if the given condition is true.
- */
-#define Trap(condition, errorType) \
-    do { \
-        if ((assert_enabled) && (condition)) \
-            ExceptionalCondition(CppAsString(condition), (errorType), \
-                                 __FILE__, __LINE__); \
-    } while (0)
-
-/*
- *    TrapMacro is the same as Trap but it's intended for use in macros:
- *
- *        #define foo(x) (AssertMacro(x != 0), bar(x))
- *
- *    Isn't CPP fun?
- */
-#define TrapMacro(condition, errorType) \
-    ((bool) ((! assert_enabled) || ! (condition) || \
-             (ExceptionalCondition(CppAsString(condition), (errorType), \
-                                   __FILE__, __LINE__), 0)))
-
-#ifndef USE_ASSERT_CHECKING
-#define Assert(condition)
-#define AssertMacro(condition)    ((void)true)
-#define AssertArg(condition)
-#define AssertState(condition)
-#else
-#define Assert(condition) \
-        Trap(!(condition), "FailedAssertion")
-
-#define AssertMacro(condition) \
-        ((void) TrapMacro(!(condition), "FailedAssertion"))
-
-#define AssertArg(condition) \
-        Trap(!(condition), "BadArgument")
-
-#define AssertState(condition) \
-        Trap(!(condition), "BadState")
-#endif   /* USE_ASSERT_CHECKING */
-
 extern void ExceptionalCondition(const char *conditionName,
                      const char *errorType,
              const char *fileName, int lineNumber) __attribute__((noreturn));
 
diff --git a/src/include/postgres_fe.h b/src/include/postgres_fe.h
index af31227..0f35ecc 100644
--- a/src/include/postgres_fe.h
+++ b/src/include/postgres_fe.h
@@ -24,16 +24,4 @@
 #include "c.h"
-/*
- * Assert() can be used in both frontend and backend code. In frontend code it
- * just calls the standard assert, if it's available. If use of assertions is
- * not configured, it does nothing.
- */
-#ifdef USE_ASSERT_CHECKING
-#include <assert.h>
-#define Assert(p) assert(p)
-#else
-#define Assert(p)
-#endif
-
 #endif   /* POSTGRES_FE_H */
-- 
1.7.12.289.g0ce9864.dirty
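
The three-way selection patch 1/5 moves into c.h (assertions compiled out,
frontend assert(), backend Trap) can be sketched standalone as below. This is
only an illustration of the preprocessor structure: the DEMO_* switches,
DemoAssert, demo_exceptional_condition and demo_checked_div are hypothetical
names, and the backend branch uses a simple abort() handler in place of
ExceptionalCondition() and assert_enabled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Pretend build configuration; hypothetical stand-ins for the real switches. */
#define DEMO_USE_ASSERT_CHECKING 1
#define DEMO_FRONTEND 1

#ifndef DEMO_USE_ASSERT_CHECKING

/* Assertions are compiled out entirely. */
#define DemoAssert(condition)   ((void) 0)

#elif defined(DEMO_FRONTEND)

/* Frontend build: defer to the C library's assert(). */
#include <assert.h>
#define DemoAssert(condition)   assert(condition)

#else                           /* DEMO_USE_ASSERT_CHECKING && !DEMO_FRONTEND */

/* Backend-style build: report through a project-specific handler. */
static void
demo_exceptional_condition(const char *cond, const char *file, int line)
{
    fprintf(stderr, "assertion failed: %s (%s:%d)\n", cond, file, line);
    abort();
}

#define DemoAssert(condition) \
    do { \
        if (!(condition)) \
            demo_exceptional_condition(#condition, __FILE__, __LINE__); \
    } while (0)

#endif

/* Shared code can use the macro without caring which branch was selected. */
static int
demo_checked_div(int a, int b)
{
    DemoAssert(b != 0);
    return a / b;
}
```

Code such as the rmgrdesc files, which pg_xlogdump compiles with -DFRONTEND,
relies on exactly this property: the same Assert() call has to build in both
environments.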

[PATCH 4/5] Add pg_xlogdump contrib module

From

Andres Freund

Date:

08 January 2013, 19:11:20

From: Andres Freund <andres@anarazel.de>

Authors: Andres Freund, Heikki Linnakangas
---
 contrib/Makefile                  |   1 +
 contrib/pg_xlogdump/Makefile      |  37 +++
 contrib/pg_xlogdump/compat.c      |  58 ++++
 contrib/pg_xlogdump/pg_xlogdump.c | 654 ++++++++++++++++++++++++++++++++++++++
 contrib/pg_xlogdump/tables.c      |  78 +++++
 doc/src/sgml/ref/allfiles.sgml    |   1 +
 doc/src/sgml/ref/pg_xlogdump.sgml |  76 +++++
 doc/src/sgml/reference.sgml       |   1 +
 src/backend/access/transam/rmgr.c |   1 +
 src/backend/catalog/catalog.c     |   2 +
 src/tools/msvc/Mkvcbuild.pm       |  16 +-
 11 files changed, 924 insertions(+), 1 deletion(-)
 create mode 100644 contrib/pg_xlogdump/Makefile
 create mode 100644 contrib/pg_xlogdump/compat.c
 create mode 100644 contrib/pg_xlogdump/pg_xlogdump.c
 create mode 100644 contrib/pg_xlogdump/tables.c
 create mode 100644 doc/src/sgml/ref/pg_xlogdump.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index fcd7c1e..5d290b8 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -39,6 +39,7 @@ SUBDIRS = \
         pg_trgm        \
         pg_upgrade    \
         pg_upgrade_support \
+        pg_xlogdump    \
         pgbench        \
         pgcrypto    \
         pgrowlocks    \
diff --git a/contrib/pg_xlogdump/Makefile b/contrib/pg_xlogdump/Makefile
new file mode 100644
index 0000000..1adef35
--- /dev/null
+++ b/contrib/pg_xlogdump/Makefile
@@ -0,0 +1,37 @@
+# contrib/pg_xlogdump/Makefile
+
+PGFILEDESC = "pg_xlogdump"
+PGAPPICON=win32
+
+PROGRAM = pg_xlogdump
+OBJS =  pg_xlogdump.o compat.o tables.o xlogreader.o $(RMGRDESCOBJS) \
+    $(WIN32RES)
+
+# XXX: Perhaps this should be done by a wildcard rule so that you don't need
+# to remember to add new rmgrdesc files to this list.
+RMGRDESCSOURCES = clogdesc.c dbasedesc.c gindesc.c gistdesc.c hashdesc.c \
+    heapdesc.c mxactdesc.c nbtdesc.c relmapdesc.c seqdesc.c smgrdesc.c \
+    spgdesc.c standbydesc.c tblspcdesc.c xactdesc.c xlogdesc.c
+
+RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
+
+EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_xlogdump
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+
+xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
+    rm -f $@ && $(LN_S) $< .
+
+$(RMGRDESCSOURCES): % : $(top_srcdir)/src/backend/access/rmgrdesc/%
+    rm -f $@ && $(LN_S) $< .
diff --git a/contrib/pg_xlogdump/compat.c b/contrib/pg_xlogdump/compat.c
new file mode 100644
index 0000000..e150afb
--- /dev/null
+++ b/contrib/pg_xlogdump/compat.c
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * compat.c
+ *        Reimplementations of various backend functions.
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        contrib/pg_xlogdump/compat.c
+ *
+ * This file contains client-side implementations for various backend
+ * functions that the rm_desc functions in *desc.c files rely on.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* ugly hack, same as in e.g pg_controldata */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include "catalog/catalog.h"
+#include "datatype/timestamp.h"
+#include "lib/stringinfo.h"
+#include "storage/relfilenode.h"
+#include "utils/timestamp.h"
+#include "utils/datetime.h"
+
+const char *
+timestamptz_to_str(TimestampTz dt)
+{
+    return "unimplemented-timestamp";
+}
+
+const char *
+relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
+{
+    return "unimplemented-relpathbackend";
+}
+
+/*
+ * Provide a hacked up compat layer for StringInfos so xlog desc functions can
+ * be linked/called.
+ */
+void
+appendStringInfo(StringInfo str, const char *fmt, ...)
+{
+    va_list        args;
+
+    va_start(args, fmt);
+    vprintf(fmt, args);
+    va_end(args);
+}
+
+void
+appendStringInfoString(StringInfo str, const char *string)
+{
+    appendStringInfo(str, "%s", string);
+}
diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
new file mode 100644
index 0000000..29ee73e
--- /dev/null
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -0,0 +1,654 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_xlogdump.c - decode and display WAL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *          contrib/pg_xlogdump/pg_xlogdump.c
+ *-------------------------------------------------------------------------
+ */
+
+/* ugly hack, same as in e.g pg_controldata */
+#define FRONTEND 1
+#include "postgres.h"
+
+#include <unistd.h>
+
+#include "access/xlog.h"
+#include "access/xlogreader.h"
+#include "access/rmgr.h"
+#include "access/transam.h"
+
+#include "catalog/catalog.h"
+
+#include "getopt_long.h"
+
+static const char *progname;
+
+typedef struct XLogDumpPrivateData
+{
+    TimeLineID    timeline;
+    char       *inpath;
+    XLogRecPtr    startptr;
+    XLogRecPtr    endptr;
+
+    /* display options */
+    bool        bkp_details;
+    int            stop_after_records;
+    int            already_displayed_records;
+
+    /* filter options */
+    int         filter_by_rmgr;
+    TransactionId filter_by_xid;
+} XLogDumpPrivateData;
+
+static void fatal_error(const char *fmt, ...)
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
+
+static void fatal_error(const char *fmt, ...)
+{
+    va_list        args;
+    fflush(stdout);
+
+    fprintf(stderr, "%s: fatal_error: ", progname);
+    va_start(args, fmt);
+    vfprintf(stderr, fmt, args);
+    va_end(args);
+    fputc('\n', stderr);
+    exit(EXIT_FAILURE);
+}
+
+/*
+ * Check whether directory exists and whether we can open it. Keep errno set
+ * for error reporting by the caller.
+ */
+static bool
+verify_directory(const char *directory)
+{
+    int fd = open(directory, O_DIRECTORY|O_RDONLY);
+    if (fd < 0)
+        return false;
+    close(fd);
+    return true;
+}
+
+static void
+split_path(const char *path, char **dir, char **fname)
+{
+    char *sep;
+
+    /* split filepath into directory & filename */
+    sep = strrchr(path, '/');
+
+    /* directory path */
+    if (sep != NULL)
+    {
+        /* windows doesn't have strndup */
+        *dir = strdup(path);
+        (*dir)[(sep - path) + 1] = '\0';
+        *fname = strdup(sep + 1);
+    }
+    /* local directory */
+    else
+    {
+        *dir = NULL;
+        *fname = strdup(path);
+    }
+}
+
+/*
+ * Try to find the file in several places:
+ * if directory == NULL:
+ *   fname
+ *   XLOGDIR / fname
+ *   $PGDATA / XLOGDIR / fname
+ * else
+ *   directory / fname
+ *   directory / XLOGDIR / fname
+ *
+ * return a read only fd
+ */
+static int
+fuzzy_open_file(const char *directory, const char *fname)
+{
+    int fd = -1;
+    char fpath[MAXPGPATH];
+
+    if (directory == NULL)
+    {
+        const char* datadir;
+
+        /* fname */
+        fd = open(fname, O_RDONLY | PG_BINARY, 0);
+        if (fd < 0 && errno != ENOENT)
+            return -1;
+        else if (fd >= 0)
+            return fd;
+
+        /* XLOGDIR / fname */
+        snprintf(fpath, MAXPGPATH, "%s/%s",
+                 XLOGDIR, fname);
+        fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+        if (fd < 0 && errno != ENOENT)
+            return -1;
+        else if (fd >= 0)
+            return fd;
+
+        datadir = getenv("PGDATA");
+        /* $PGDATA / XLOGDIR / fname */
+        if (datadir != NULL)
+        {
+            snprintf(fpath, MAXPGPATH, "%s/%s/%s",
+                     datadir, XLOGDIR, fname);
+            fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+            if (fd < 0 && errno != ENOENT)
+                return -1;
+            else if (fd >= 0)
+                return fd;
+        }
+    }
+    else
+    {
+        /* directory / fname */
+        snprintf(fpath, MAXPGPATH, "%s/%s",
+                 directory, fname);
+        fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+        if (fd < 0 && errno != ENOENT)
+            return -1;
+        else if (fd >= 0)
+            return fd;
+
+        /* directory / XLOGDIR / fname */
+        snprintf(fpath, MAXPGPATH, "%s/%s/%s",
+                 directory, XLOGDIR, fname);
+        fd = open(fpath, O_RDONLY | PG_BINARY, 0);
+        if (fd < 0 && errno != ENOENT)
+            return -1;
+        else if (fd >= 0)
+            return fd;
+    }
+    return -1;
+}
+
+/* this should probably be put in a general implementation */
+static void
+XLogDumpXLogRead(const char *directory, TimeLineID timeline_id,
+                 XLogRecPtr startptr, char *buf, Size count)
+{
+    char       *p;
+    XLogRecPtr    recptr;
+    Size        nbytes;
+
+    static int    sendFile = -1;
+    static XLogSegNo sendSegNo = 0;
+    static uint32 sendOff = 0;
+
+    p = buf;
+    recptr = startptr;
+    nbytes = count;
+
+    while (nbytes > 0)
+    {
+        uint32        startoff;
+        int            segbytes;
+        int            readbytes;
+
+        startoff = recptr % XLogSegSize;
+
+        if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+        {
+            char        fname[MAXFNAMELEN];
+
+            /* Switch to another logfile segment */
+            if (sendFile >= 0)
+                close(sendFile);
+
+            XLByteToSeg(recptr, sendSegNo);
+
+            XLogFileName(fname, timeline_id, sendSegNo);
+
+            sendFile = fuzzy_open_file(directory, fname);
+
+            if (sendFile < 0)
+                fatal_error("could not find file \"%s\": %s",
+                            fname, strerror(errno));
+            sendOff = 0;
+        }
+
+        /* Need to seek in the file? */
+        if (sendOff != startoff)
+        {
+            if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
+            {
+                int        err = errno;
+                char    fname[MAXPGPATH];
+                XLogFileName(fname, timeline_id, sendSegNo);
+
+                fatal_error("could not seek in log segment %s to offset %u: %s",
+                            fname, startoff, strerror(err));
+            }
+            sendOff = startoff;
+        }
+
+        /* How many bytes are within this segment? */
+        if (nbytes > (XLogSegSize - startoff))
+            segbytes = XLogSegSize - startoff;
+        else
+            segbytes = nbytes;
+
+        readbytes = read(sendFile, p, segbytes);
+        if (readbytes <= 0)
+        {
+            int        err = errno;
+            char    fname[MAXPGPATH];
+            XLogFileName(fname, timeline_id, sendSegNo);
+
+            fatal_error("could not read from log segment %s, offset %u, length %d: %s",
+                        fname, sendOff, segbytes, strerror(err));
+        }
+
+        /* Update state for read */
+        recptr += readbytes;
+
+        sendOff += readbytes;
+        nbytes -= readbytes;
+        p += readbytes;
+    }
+}
+
+static int
+XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+                 char *readBuff, TimeLineID *curFileTLI)
+{
+    XLogDumpPrivateData *private = state->private_data;
+    int            count = XLOG_BLCKSZ;
+
+    if (private->endptr != InvalidXLogRecPtr)
+    {
+        if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+            count = XLOG_BLCKSZ;
+        else if (targetPagePtr + reqLen <= private->endptr)
+            count = private->endptr - targetPagePtr;
+        else
+            return -1;
+    }
+
+    XLogDumpXLogRead(private->inpath, private->timeline, targetPagePtr,
+                     readBuff, count);
+
+    return count;
+}
+
+static void
+XLogDumpDisplayRecord(XLogReaderState *state, XLogRecord *record)
+{
+    XLogDumpPrivateData *config = (XLogDumpPrivateData *) state->private_data;
+    const RmgrData *rmgr = &RmgrTable[record->xl_rmid];
+
+    if (config->filter_by_rmgr != -1 &&
+        config->filter_by_rmgr != record->xl_rmid)
+        return;
+
+    if (TransactionIdIsValid(config->filter_by_xid) &&
+        config->filter_by_xid != record->xl_xid)
+        return;
+
+    config->already_displayed_records++;
+
+    printf("xlog record: rmgr: %-11s, record_len: %6u, tot_len: %6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, bkp: %u%u%u%u,desc:",
+            rmgr->rm_name,
+            record->xl_len, record->xl_tot_len,
+            record->xl_xid,
+            (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr,
+            (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev,
+            !!(XLR_BKP_BLOCK(0) & record->xl_info),
+            !!(XLR_BKP_BLOCK(1) & record->xl_info),
+            !!(XLR_BKP_BLOCK(2) & record->xl_info),
+            !!(XLR_BKP_BLOCK(3) & record->xl_info));
+
+    /* the desc routine will printf the description directly to stdout */
+    rmgr->rm_desc(NULL, record->xl_info, XLogRecGetData(record));
+
+    putchar('\n');
+
+    if (config->bkp_details)
+    {
+        int        off;
+        char   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+        for (off = 0; off < XLR_MAX_BKP_BLOCKS; off++)
+        {
+            BkpBlock    bkpb;
+
+            if (!(XLR_BKP_BLOCK(off) & record->xl_info))
+                continue;
+
+            memcpy(&bkpb, blk, sizeof(BkpBlock));
+            blk += sizeof(BkpBlock);
+            blk += BLCKSZ - bkpb.hole_length;
+
+            printf("\tbackup bkp #%u; rel %u/%u/%u; fork: %s; block: %u; hole: offset: %u, length: %u\n",
+                   off, bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+                   forkNames[bkpb.fork], bkpb.block, bkpb.hole_offset, bkpb.hole_length);
+        }
+    }
+}
+
+static void
+usage(void)
+{
+    printf("%s: reads postgres transaction logs for debugging.\n\n",
+           progname);
+    printf("Usage:\n");
+    printf("  %s [OPTION] [STARTSEG [ENDSEG]] \n", progname);
+    printf("\nOptions:\n");
+    printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+    printf("  -e, --end RECPTR       read wal up to RECPTR\n");
+    printf("  -?, --help             show this help, then exit\n");
+    printf("  -n, --limit RECORDS    only display n records, abort afterwards\n");
+    printf("  -p, --path PATH        from where do we want to read? cwd/pg_xlog is the default\n");
+    printf("  -r, --rmgr RMGR        only show records generated by the rmgr RMGR\n");
+    printf("  -s, --start RECPTR     read wal in directory indicated by -p starting at RECPTR\n");
+    printf("  -t, --timeline TLI     which timeline do we want to read, defaults to 1\n");
+    printf("  -V, --version          output version information, then exit\n");
+    printf("  -x, --xid XID          only show records with transactionid XID\n");
+}
+
+int
+main(int argc, char **argv)
+{
+    uint32        xlogid;
+    uint32        xrecoff;
+    XLogReaderState *xlogreader_state;
+    XLogDumpPrivateData private;
+    XLogRecord *record;
+    XLogRecPtr    first_record;
+    char       *errormsg;
+
+    static struct option long_options[] = {
+        {"bkp-details", no_argument, NULL, 'b'},
+        {"end", required_argument, NULL, 'e'},
+        {"help", no_argument, NULL, '?'},
+        {"limit", required_argument, NULL, 'n'},
+        {"path", required_argument, NULL, 'p'},
+        {"rmgr", required_argument, NULL, 'r'},
+        {"start", required_argument, NULL, 's'},
+        {"timeline", required_argument, NULL, 't'},
+        {"xid", required_argument, NULL, 'x'},
+        {"version", no_argument, NULL, 'V'},
+        {NULL, 0, NULL, 0}
+    };
+
+    int            option;
+    int            optindex = 0;
+
+    progname = get_progname(argv[0]);
+
+    memset(&private, 0, sizeof(XLogDumpPrivateData));
+
+    private.timeline = 1;
+    private.bkp_details = false;
+    private.startptr = InvalidXLogRecPtr;
+    private.endptr = InvalidXLogRecPtr;
+    private.stop_after_records = -1;
+    private.already_displayed_records = 0;
+    private.filter_by_rmgr = -1;
+    private.filter_by_xid = InvalidTransactionId;
+
+    if (argc <= 1)
+    {
+        fprintf(stderr, "%s: no arguments specified\n", progname);
+        goto bad_argument;
+    }
+
+    while ((option = getopt_long(argc, argv, "be:?n:p:r:s:t:Vx:",
+                                 long_options, &optindex)) != -1)
+    {
+        switch (option)
+        {
+            case 'b':
+                private.bkp_details = true;
+                break;
+            case 'e':
+                if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2)
+                {
+                    fprintf(stderr, "%s: could not parse --end %s\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+                private.endptr = (uint64)xlogid << 32 | xrecoff;
+                break;
+            case '?':
+                usage();
+                exit(EXIT_SUCCESS);
+                break;
+            case 'n':
+                if (sscanf(optarg, "%d", &private.stop_after_records) != 1)
+                {
+                    fprintf(stderr, "%s: could not parse --limit %s\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+                break;
+            case 'p':
+                private.inpath = strdup(optarg);
+                break;
+            case 'r':
+            {
+                int i;
+                for (i = 0; i <= RM_MAX_ID; i++)
+                {
+                    if (strcmp(optarg, RmgrTable[i].rm_name) == 0)
+                    {
+                        private.filter_by_rmgr = i;
+                        break;
+                    }
+                }
+
+                if (private.filter_by_rmgr == -1)
+                {
+                    fprintf(stderr, "%s: --rmgr %s does not exist\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+            }
+            break;
+            case 's':
+                if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2)
+                {
+                    fprintf(stderr, "%s: could not parse --start %s\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+                else
+                    private.startptr = (uint64)xlogid << 32 | xrecoff;
+                break;
+            case 't':
+                if (sscanf(optarg, "%u", &private.timeline) != 1)
+                {
+                    fprintf(stderr, "%s: could not parse --timeline %s\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+                break;
+            case 'V':
+                puts("pg_xlogdump (PostgreSQL) " PG_VERSION);
+                exit(EXIT_SUCCESS);
+                break;
+            case 'x':
+                if (sscanf(optarg, "%u", &private.filter_by_xid) != 1)
+                {
+                    fprintf(stderr, "%s: could not parse --xid %s as a valid xid\n",
+                            progname, optarg);
+                    goto bad_argument;
+                }
+                break;
+            default:
+                goto bad_argument;
+        }
+    }
+
+    if ((optind + 2) < argc)
+    {
+        fprintf(stderr,
+                "%s: too many command-line arguments (first is \"%s\")\n",
+                progname, argv[optind + 2]);
+        goto bad_argument;
+    }
+
+    if (private.inpath != NULL)
+    {
+        /* validate that path points to a directory */
+        if (!verify_directory(private.inpath))
+        {
+            fprintf(stderr,
+                    "%s: --path %s cannot be opened: %s\n",
+                    progname, private.inpath, strerror(errno));
+            goto bad_argument;
+        }
+    }
+
+    /* parse files as start/end boundaries, extract path if not specified */
+    if (optind < argc)
+    {
+        char *directory = NULL;
+        char *fname = NULL;
+        int fd;
+        XLogSegNo segno;
+
+        split_path(argv[optind], &directory, &fname);
+
+        if (private.inpath == NULL && directory != NULL)
+        {
+            private.inpath = directory;
+
+            if (!verify_directory(private.inpath))
+                fatal_error("cannot open directory %s: %s",
+                            private.inpath, strerror(errno));
+        }
+
+        fd = fuzzy_open_file(private.inpath, fname);
+        if (fd < 0)
+            fatal_error("could not open file %s", fname);
+        close(fd);
+
+        /* parse position from file */
+        XLogFromFileName(fname, &private.timeline, &segno);
+
+        if (XLogRecPtrIsInvalid(private.startptr))
+            XLogSegNoOffsetToRecPtr(segno, 0, private.startptr);
+        else if (!XLByteInSeg(private.startptr, segno))
+        {
+            fprintf(stderr,
+                    "%s: --start %X/%X is not inside file \"%s\"\n",
+                    progname,
+                    (uint32)(private.startptr >> 32),
+                    (uint32)private.startptr,
+                    fname);
+            goto bad_argument;
+        }
+
+        /* no second file specified, set end position */
+        if (!(optind + 1 < argc) && XLogRecPtrIsInvalid(private.endptr))
+            XLogSegNoOffsetToRecPtr(segno + 1, 0, private.endptr);
+
+        /* parse ENDSEG if passed */
+        if (optind + 1 < argc)
+        {
+            XLogSegNo endsegno;
+
+            /* ignore directory, already have that */
+            split_path(argv[optind + 1], &directory, &fname);
+
+            fd = fuzzy_open_file(private.inpath, fname);
+            if (fd < 0)
+                fatal_error("could not open file %s", fname);
+            close(fd);
+
+            /* parse position from file */
+            XLogFromFileName(fname, &private.timeline, &endsegno);
+
+            if (endsegno < segno)
+                fatal_error("ENDSEG %s is before STARTSEG %s",
+                            argv[optind + 1], argv[optind]);
+
+            if (XLogRecPtrIsInvalid(private.endptr))
+                XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.endptr);
+
+            /* set segno to endsegno for check of --end */
+            segno = endsegno;
+        }
+
+
+        if (!XLByteInSeg(private.endptr, segno) &&
+            private.endptr != (segno + 1) * XLogSegSize)
+        {
+            fprintf(stderr,
+                    "%s: --end %X/%X is not inside file \"%s\"\n",
+                    progname,
+                    (uint32)(private.endptr >> 32),
+                    (uint32)private.endptr,
+                    argv[argc -1]);
+            goto bad_argument;
+        }
+    }
+
+    /* we don't know what to print */
+    if (XLogRecPtrIsInvalid(private.startptr))
+    {
+        fprintf(stderr, "%s: no --start given in range mode.\n", progname);
+        goto bad_argument;
+    }
+
+    /* done with argument parsing, do the actual work */
+
+    /* we have everything we need, start reading */
+    xlogreader_state = XLogReaderAllocate(private.startptr,
+                                          XLogDumpReadPage,
+                                          &private);
+
+    /* first find a valid recptr to start from */
+    first_record = XLogFindNextRecord(xlogreader_state, private.startptr);
+
+    if (first_record == InvalidXLogRecPtr)
+        fatal_error("could not find a valid record after %X/%X",
+                    (uint32) (private.startptr >> 32),
+                    (uint32) private.startptr);
+
+    /*
+     * Display a message that we are skipping data if startptr wasn't a pointer
+     * to the start of a record and also wasn't a pointer to the beginning
+     * of a segment (e.g. we were used in file mode).
+     */
+    if (first_record != private.startptr && (private.startptr % XLogSegSize) != 0)
+        printf("first record is after %X/%X, at %X/%X, skipping over %u bytes\n",
+               (uint32) (private.startptr >> 32), (uint32) private.startptr,
+               (uint32) (first_record >> 32), (uint32) first_record,
+               (uint32) (first_record - private.startptr));
+
+    while ((record = XLogReadRecord(xlogreader_state, first_record, &errormsg)))
+    {
+        /* continue after the last record */
+        first_record = InvalidXLogRecPtr;
+        XLogDumpDisplayRecord(xlogreader_state, record);
+
+        /* check whether we printed enough */
+        if (private.stop_after_records > 0 &&
+            private.already_displayed_records >= private.stop_after_records)
+            break;
+    }
+
+    if (errormsg)
+        fatal_error("error in WAL record at %X/%X: %s",
+                    (uint32)(xlogreader_state->ReadRecPtr >> 32),
+                    (uint32)xlogreader_state->ReadRecPtr,
+                    errormsg);
+
+    XLogReaderFree(xlogreader_state);
+
+    return EXIT_SUCCESS;
+bad_argument:
+    fprintf(stderr, "Try \"%s --help\" for more information.\n", progname);
+    return EXIT_FAILURE;
+}
diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
new file mode 100644
index 0000000..e947e0d
--- /dev/null
+++ b/contrib/pg_xlogdump/tables.c
@@ -0,0 +1,78 @@
+/*-------------------------------------------------------------------------
+ *
+ * tables.c
+ *        Support data for pg_xlogdump.c
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        contrib/pg_xlogdump/tables.c
+ *
+ * NOTES
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/*
+ * rmgr.c
+ *
+ * Resource managers definition
+ *
+ * src/backend/access/transam/rmgr.c
+ */
+#include "postgres.h"
+
+#include "access/clog.h"
+#include "access/gin.h"
+#include "access/gist_private.h"
+#include "access/hash.h"
+#include "access/heapam_xlog.h"
+#include "access/multixact.h"
+#include "access/nbtree.h"
+#include "access/spgist.h"
+#include "access/xact.h"
+#include "access/xlog_internal.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+#include "commands/sequence.h"
+#include "commands/tablespace.h"
+#include "storage/standby.h"
+#include "utils/relmapper.h"
+#include "catalog/catalog.h"
+
+/*
+ * Table of fork names.
+ *
+ * needs to be synced with src/backend/catalog/catalog.c
+ */
+const char *forkNames[] = {
+    "main",                        /* MAIN_FORKNUM */
+    "fsm",                        /* FSM_FORKNUM */
+    "vm",                        /* VISIBILITYMAP_FORKNUM */
+    "init"                        /* INIT_FORKNUM */
+};
+
+/*
+ * RmgrTable linked only to functions available outside of the backend.
+ *
+ * needs to be synced with src/backend/access/transam/rmgr.c
+ */
+const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+    {"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
+    {"Transaction", NULL, xact_desc, NULL, NULL, NULL},
+    {"Storage", NULL, smgr_desc, NULL, NULL, NULL},
+    {"CLOG", NULL, clog_desc, NULL, NULL, NULL},
+    {"Database", NULL, dbase_desc, NULL, NULL, NULL},
+    {"Tablespace", NULL, tblspc_desc, NULL, NULL, NULL},
+    {"MultiXact", NULL, multixact_desc, NULL, NULL, NULL},
+    {"RelMap", NULL, relmap_desc, NULL, NULL, NULL},
+    {"Standby", NULL, standby_desc, NULL, NULL, NULL},
+    {"Heap2", NULL, heap2_desc, NULL, NULL, NULL},
+    {"Heap", NULL, heap_desc, NULL, NULL, NULL},
+    {"Btree", NULL, btree_desc, NULL, NULL, NULL},
+    {"Hash", NULL, hash_desc, NULL, NULL, NULL},
+    {"Gin", NULL, gin_desc, NULL, NULL, NULL},
+    {"Gist", NULL, gist_desc, NULL, NULL, NULL},
+    {"Sequence", NULL, seq_desc, NULL, NULL, NULL},
+    {"SPGist", NULL, spg_desc, NULL, NULL, NULL}
+};
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index df84054..49cb7ac 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgReceivexlog      SYSTEM "pg_receivexlog.sgml">
 <!ENTITY pgResetxlog        SYSTEM "pg_resetxlog.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
+<!ENTITY pgXlogdump         SYSTEM "pg_xlogdump.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">
 <!ENTITY postmaster         SYSTEM "postmaster.sgml">
 <!ENTITY psqlRef            SYSTEM "psql-ref.sgml">
 
diff --git a/doc/src/sgml/ref/pg_xlogdump.sgml b/doc/src/sgml/ref/pg_xlogdump.sgml
new file mode 100644
index 0000000..7a27c7b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_xlogdump.sgml
@@ -0,0 +1,76 @@
+<!--
+doc/src/sgml/ref/pg_xlogdump.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="APP-PGXLOGDUMP">
+ <refmeta>
+  <refentrytitle><application>pg_xlogdump</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_xlogdump</refname>
+  <refpurpose>Display the write-ahead log of a <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <indexterm zone="app-pgxlogdump">
+  <primary>pg_xlogdump</primary>
+ </indexterm>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_xlogdump</command>
+   <arg choice="opt"><option>-b</option></arg>
+   <arg choice="opt"><option>-e</option> <replaceable class="parameter">xlogrecptr</replaceable></arg>
+   <arg choice="opt"><option>-f</option> <replaceable class="parameter">filename</replaceable></arg>
+   <arg choice="opt"><option>-h</option></arg>
+   <arg choice="opt"><option>-p</option> <replaceable class="parameter">directory</replaceable></arg>
+   <arg choice="opt"><option>-s</option> <replaceable class="parameter">xlogrecptr</replaceable></arg>
+   <arg choice="opt"><option>-t</option> <replaceable class="parameter">timelineid</replaceable></arg>
+   <arg choice="opt"><option>-v</option></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="R1-APP-PGXLOGDUMP-1">
+  <title>Description</title>
+  <para>
+   <command>pg_xlogdump</command> displays the write-ahead log (WAL) and is only
+   useful for debugging or educational purposes.
+  </para>
+
+  <para>
+   This utility can only be run by the user who installed the server, because
+   it requires read access to the data directory. It does not perform any
+   modifications.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options control the location and format of the
+    output.
+
+    <variablelist>
+     <varlistentry>
+      <term><option>-p <replaceable class="parameter">directory</replaceable></option></term>
+      <listitem>
+       <para>
+        Directory to find xlog files in.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can give wrong results when the server is running.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 0872168..fed1fdd 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -225,6 +225,7 @@
    &pgDumpall;
    &pgReceivexlog;
    &pgRestore;
+   &pgXlogdump;
    &psqlRef;
    &reindexdb;
    &vacuumdb;
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index cc210a7..4e94af1 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,6 +24,7 @@
 #include "storage/standby.h"
 #include "utils/relmapper.h"
+/* Also update contrib/pg_xlogdump/tables.c if you add something here. */
 const RmgrData RmgrTable[RM_MAX_ID + 1] = {
     {"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
 
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 6455ef0..92ab3ca 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -52,6 +52,8 @@
  * If you add a new entry, remember to update the errhint below, and the
  * documentation for pg_relation_size(). Also keep FORKNAMECHARS above
  * up-to-date.
+ *
+ * Also update contrib/pg_xlogdump/tables.c if you add something here.
  */
 const char *forkNames[] = {
     "main",                        /* MAIN_FORKNUM */
 
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index d587365..7b6ed41 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -41,7 +41,7 @@ my $contrib_extraincludes =
 my $contrib_extrasource = {
     'cube' => [ 'cubescan.l', 'cubeparse.y' ],
     'seg'  => [ 'segscan.l',  'segparse.y' ] };
-my @contrib_excludes = ('pgcrypto', 'intagg', 'sepgsql');
+my @contrib_excludes = ('intagg', 'pgcrypto', 'pg_xlogdump', 'sepgsql');
 
 sub mkvcbuild
 {
@@ -409,6 +409,20 @@ sub mkvcbuild
         'localtime.c');
     $zic->AddReference($libpgport);
 
+    my $pgxlogdump = $solution->AddProject('pg_xlogdump', 'exe', 'contrib');
+    $pgxlogdump->{name} = 'pg_xlogdump';
+    $pgxlogdump->AddIncludeDir('src\backend');
+    $pgxlogdump->AddFiles('contrib\pg_xlogdump',
+        'compat.c', 'pg_xlogdump.c', 'tables.c');
+    $pgxlogdump->AddFile('src\backend\access\transam\xlogreader.c');
+    $pgxlogdump->AddFiles('src\backend\access\rmgrdesc',
+        'clogdesc.c', 'dbasedesc.c', 'gindesc.c', 'gistdesc.c', 'hashdesc.c',
+        'heapdesc.c', 'mxactdesc.c', 'nbtdesc.c', 'relmapdesc.c', 'seqdesc.c',
+        'smgrdesc.c', 'spgdesc.c', 'standbydesc.c', 'tblspcdesc.c',
+        'xactdesc.c', 'xlogdesc.c');
+    $pgxlogdump->AddReference($libpgport);
+    $pgxlogdump->AddDefine('FRONTEND');
+
     if ($solution->{options}->{xml})
     {
         $contrib_extraincludes->{'pgxml'} = [
-- 
1.7.12.289.g0ce9864.dirty
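
A note for reviewers on the position arithmetic used by --start/--end above: the following is a minimal standalone sketch, not code from the patch. It mirrors the "%X/%X" parsing in main() and the segment number/offset split that XLByteToSeg and "recptr % XLogSegSize" perform. The sketch_* names are made up for illustration, and the 16MB segment size is an assumption (the usual default); the real tool takes it from XLogSegSize.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 16MB WAL segments; the patch itself uses XLogSegSize. */
#define SKETCH_SEG_SIZE (UINT64_C(16) * 1024 * 1024)

/* Parse "hi/lo" (two hex words) into a 64-bit WAL position; 0 on success. */
int
sketch_parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Segment that contains the position. */
uint64_t
sketch_segno(uint64_t lsn)
{
    return lsn / SKETCH_SEG_SIZE;
}

/* Byte offset of the position inside its segment. */
uint32_t
sketch_segoff(uint64_t lsn)
{
    return (uint32_t) (lsn % SKETCH_SEG_SIZE);
}

/* Default end position when only a start segment is given: start of the next one. */
uint64_t
sketch_default_end(uint64_t segno)
{
    return (segno + 1) * SKETCH_SEG_SIZE;
}

#ifdef SKETCH_DEMO_MAIN
int
main(void)
{
    uint64_t lsn;

    if (sketch_parse_lsn("1/2000028", &lsn) != 0)
        return 1;
    printf("lsn=%X/%08X segno=%" PRIu64 " off=%u\n",
           (unsigned int) (lsn >> 32), (unsigned int) lsn,
           sketch_segno(lsn), (unsigned int) sketch_segoff(lsn));
    return 0;
}
#endif
```

Built with -DSKETCH_DEMO_MAIN, this prints "lsn=1/02000028 segno=258 off=40" under the 16MB assumption. Note that the default end position is exactly one past the last byte of the segment, which is why the --end check in main() accepts (segno + 1) * XLogSegSize in addition to positions inside the file.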

[PATCH 3/5] Split out xlog reading into its own module called xlogreader

From

Andres Freund

Date:

08 January 2013, 19:11:43

From: Andres Freund <andres@anarazel.de>

The way xlog reading was done up to now made it impossible to use that
nontrivial code outside of xlog.c although it is useful for different purposes
like debugging wal (xlogdump) and decoding wal back into logical changes.

Authors: Heikki Linnakangas, Andres Freund, Alvaro Herrera
Reviewed-By: Alvaro Herrera
---
 src/backend/access/transam/Makefile     |   2 +-
 src/backend/access/transam/xlog.c       | 830 +++++----------------------
 src/backend/access/transam/xlogreader.c | 987 ++++++++++++++++++++++++++++++++
 src/backend/nls.mk                      |   5 +-
 src/include/access/xlogreader.h         | 141 +++++
 5 files changed, 1261 insertions(+), 704 deletions(-)
 create mode 100644 src/backend/access/transam/xlogreader.c
 create mode 100644 src/include/access/xlogreader.h
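
To illustrate the shape of the split described in the commit message, here is a rough standalone sketch of a reader that gets its pages through a caller-supplied callback plus an opaque private pointer, the way pg_xlogdump plugs XLogDumpReadPage into the reader. This is not the xlogreader API: every Sketch*/sketch_* name is hypothetical, SKETCH_PAGE_SIZE merely stands in for XLOG_BLCKSZ, and the data source is an in-memory buffer rather than WAL files. The end-pointer handling in the callback follows the logic of XLogDumpReadPage.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* assumption: stands in for XLOG_BLCKSZ */

typedef struct SketchReader SketchReader;

/* Callback contract: fill buf, return the number of valid bytes or -1. */
typedef int (*SketchPageReadCB) (SketchReader *reader, uint64_t page_ptr,
                                 int req_len, char *buf);

struct SketchReader
{
    SketchPageReadCB read_page;     /* supplied by the caller */
    void       *private_data;       /* opaque to the reader */
    char        page[SKETCH_PAGE_SIZE];
};

/* One possible data source: a flat buffer with an end position. */
typedef struct SketchMemSource
{
    const char *data;
    uint64_t    end_ptr;
} SketchMemSource;

int
sketch_mem_read_page(SketchReader *reader, uint64_t page_ptr,
                     int req_len, char *buf)
{
    SketchMemSource *src = reader->private_data;
    int         count;

    if (page_ptr + SKETCH_PAGE_SIZE <= src->end_ptr)
        count = SKETCH_PAGE_SIZE;       /* whole page available */
    else if (page_ptr + (uint64_t) req_len <= src->end_ptr)
        count = (int) (src->end_ptr - page_ptr);    /* partial page */
    else
        return -1;                      /* request reaches past the end */

    memcpy(buf, src->data + page_ptr, (size_t) count);
    return count;
}

/* Reader side: fetch len bytes at ptr; must not cross a page boundary. */
int
sketch_read_bytes(SketchReader *reader, uint64_t ptr, int len, char *out)
{
    uint64_t    page_ptr = ptr - (ptr % SKETCH_PAGE_SIZE);
    int         off = (int) (ptr % SKETCH_PAGE_SIZE);
    int         got;

    if (off + len > SKETCH_PAGE_SIZE)
        return -1;
    got = reader->read_page(reader, page_ptr, off + len, reader->page);
    if (got < off + len)
        return -1;
    memcpy(out, reader->page + off, (size_t) len);
    return len;
}
```

The point of the pattern is that the reader side knows nothing about files, timelines or archives; the backend and a frontend tool can each supply their own callback and private struct while sharing the decoding code.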
 

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 700cfd8..eb6cfc5 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = clog.o transam.o varsup.o xact.o rmgr.o slru.o subtrans.o multixact.o \
     timeline.o twophase.o twophase_rmgr.o xlog.o xlogarchive.o xlogfuncs.o \
-    xlogutils.o
+    xlogreader.o xlogutils.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 51a515a..310a654 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -30,6 +30,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
@@ -548,7 +549,6 @@ static int    readFile = -1;
 static XLogSegNo readSegNo = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
-static bool    readFileHeaderValidated = false;
 static XLogSource readSource = 0;        /* XLOG_FROM_* code */
 
 /*
@@ -561,6 +561,13 @@ static XLogSource readSource = 0;        /* XLOG_FROM_* code */
 static XLogSource currentSource = 0;    /* XLOG_FROM_* code */
 static bool    lastSourceFailed = false;
 
+typedef struct XLogPageReadPrivate
+{
+    int            emode;
+    bool        fetching_ckpt;    /* are we fetching a checkpoint record? */
+    bool        randAccess;
+} XLogPageReadPrivate;
+
 /*
  * These variables track when we last obtained some WAL data to process,
  * and where we got it from.  (XLogReceiptSource is initially the same as
 
@@ -572,18 +579,9 @@ static bool    lastSourceFailed = false;
 static TimestampTz XLogReceiptTime = 0;
 static XLogSource XLogReceiptSource = 0;    /* XLOG_FROM_* code */
 
-/* Buffer for currently read page (XLOG_BLCKSZ bytes) */
-static char *readBuf = NULL;
-
-/* Buffer for current ReadRecord result (expandable) */
-static char *readRecordBuf = NULL;
-static uint32 readRecordBufSize = 0;
-
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;    /* start of last record read */
 static XLogRecPtr EndRecPtr;    /* end+1 of last record read */
 
-static TimeLineID lastPageTLI = 0;
-static TimeLineID lastSegmentTLI = 0;
 
 static XLogRecPtr minRecoveryPoint;        /* local copy of
                                          * ControlFile->minRecoveryPoint */
 
@@ -627,8 +625,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
              int source, bool notexistOk);
 static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
-static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
-             bool randAccess);
+static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
+                 int reqLen, char *readBuf, TimeLineID *readTLI);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                             bool fetching_ckpt);
 static int    emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 
@@ -639,12 +637,11 @@ static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 
-static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt);
+static XLogRecord *ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
+           int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool ValidXLogPageHeader(XLogPageHeader hdr, int emode, bool segmentonly);
-static bool ValidXLogRecordHeader(XLogRecPtr *RecPtr, XLogRecord *record,
-                      int emode, bool randAccess);
-static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt);
+static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
+                     XLogRecPtr RecPtr, int whichChkpt);
 static bool rescanLatestTimeLine(void);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
 
@@ -2652,9 +2649,6 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
         if (source != XLOG_FROM_STREAM)
             XLogReceiptTime = GetCurrentTimestamp();
 
-        /* The file header needs to be validated on first access */
-        readFileHeaderValidated = false;
-
         return fd;
     }
     if (errno != ENOENT || !notfoundOk) /* unexpected failure? */
@@ -2709,7 +2703,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
         if (source == XLOG_FROM_ANY || source == XLOG_FROM_ARCHIVE)
         {
 
-            fd = XLogFileRead(segno, emode, tli, XLOG_FROM_ARCHIVE, true);
+            fd = XLogFileRead(segno, emode, tli,
+                              XLOG_FROM_ARCHIVE, true);
             if (fd != -1)
             {
                 elog(DEBUG1, "got WAL segment from archive");
 
@@ -2721,7 +2716,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
         if (source == XLOG_FROM_ANY || source == XLOG_FROM_PG_XLOG)
         {
 
-            fd = XLogFileRead(segno, emode, tli, XLOG_FROM_PG_XLOG, true);
+            fd = XLogFileRead(segno, emode, tli,
+                              XLOG_FROM_PG_XLOG, true);
             if (fd != -1)
             {
                 if (!expectedTLEs)
@@ -3178,102 +3174,6 @@ RestoreBackupBlock(XLogRecPtr lsn, XLogRecord *record, int block_index,
 }
 
 /*
- * CRC-check an XLOG record.  We do not believe the contents of an XLOG
- * record (other than to the minimal extent of computing the amount of
- * data to read in) until we've checked the CRCs.
- *
- * We assume all of the record (that is, xl_tot_len bytes) has been read
- * into memory at *record.  Also, ValidXLogRecordHeader() has accepted the
- * record's header, which means in particular that xl_tot_len is at least
- * SizeOfXlogRecord, so it is safe to fetch xl_len.
- */
-static bool
-RecordIsValid(XLogRecord *record, XLogRecPtr recptr, int emode)
-{
-    pg_crc32    crc;
-    int            i;
-    uint32        len = record->xl_len;
-    BkpBlock    bkpb;
-    char       *blk;
-    size_t        remaining = record->xl_tot_len;
-
-    /* First the rmgr data */
-    if (remaining < SizeOfXLogRecord + len)
-    {
-        /* ValidXLogRecordHeader() should've caught this already... */
-        ereport(emode_for_corrupt_record(emode, recptr),
-                (errmsg("invalid record length at %X/%X",
-                        (uint32) (recptr >> 32), (uint32) recptr)));
-        return false;
-    }
-    remaining -= SizeOfXLogRecord + len;
-    INIT_CRC32(crc);
-    COMP_CRC32(crc, XLogRecGetData(record), len);
-
-    /* Add in the backup blocks, if any */
-    blk = (char *) XLogRecGetData(record) + len;
-    for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
-    {
-        uint32        blen;
-
-        if (!(record->xl_info & XLR_BKP_BLOCK(i)))
-            continue;
-
-        if (remaining < sizeof(BkpBlock))
-        {
-            ereport(emode_for_corrupt_record(emode, recptr),
-                    (errmsg("invalid backup block size in record at %X/%X",
-                            (uint32) (recptr >> 32), (uint32) recptr)));
-            return false;
-        }
-        memcpy(&bkpb, blk, sizeof(BkpBlock));
-
-        if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ)
-        {
-            ereport(emode_for_corrupt_record(emode, recptr),
-                    (errmsg("incorrect hole size in record at %X/%X",
-                            (uint32) (recptr >> 32), (uint32) recptr)));
-            return false;
-        }
-        blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length;
-
-        if (remaining < blen)
-        {
-            ereport(emode_for_corrupt_record(emode, recptr),
-                    (errmsg("invalid backup block size in record at %X/%X",
-                            (uint32) (recptr >> 32), (uint32) recptr)));
-            return false;
-        }
-        remaining -= blen;
-        COMP_CRC32(crc, blk, blen);
-        blk += blen;
-    }
-
-    /* Check that xl_tot_len agrees with our calculation */
-    if (remaining != 0)
-    {
-        ereport(emode_for_corrupt_record(emode, recptr),
-                (errmsg("incorrect total length in record at %X/%X",
-                        (uint32) (recptr >> 32), (uint32) recptr)));
-        return false;
-    }
-
-    /* Finally include the record header */
-    COMP_CRC32(crc, (char *) record, offsetof(XLogRecord, xl_crc));
-    FIN_CRC32(crc);
-
-    if (!EQ_CRC32(record->xl_crc, crc))
-    {
-        ereport(emode_for_corrupt_record(emode, recptr),
-        (errmsg("incorrect resource manager data checksum in record at %X/%X",
-                (uint32) (recptr >> 32), (uint32) recptr)));
-        return false;
-    }
-
-    return true;
-}
-
-/*
  * Attempt to read an XLOG record.
  *
  * If RecPtr is not NULL, try to read a record at that position.  Otherwise
@@ -3286,511 +3186,65 @@ RecordIsValid(XLogRecord *record, XLogRecPtr recptr, int emode)
  * the returned record pointer always points there.
  */
 static XLogRecord *
 
-ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
+ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
+           bool fetching_ckpt)
 {
     XLogRecord *record;
-    XLogRecPtr    tmpRecPtr = EndRecPtr;
-    bool        randAccess = false;
-    uint32        len,
-                total_len;
-    uint32        targetRecOff;
-    uint32        pageHeaderSize;
-    bool        gotheader;
-
-    if (readBuf == NULL)
-    {
-        /*
-         * First time through, permanently allocate readBuf.  We do it this
-         * way, rather than just making a static array, for two reasons: (1)
-         * no need to waste the storage in most instantiations of the backend;
-         * (2) a static char array isn't guaranteed to have any particular
-         * alignment, whereas malloc() will provide MAXALIGN'd storage.
-         */
-        readBuf = (char *) malloc(XLOG_BLCKSZ);
-        Assert(readBuf != NULL);
-    }
-
-    if (RecPtr == NULL)
-    {
-        RecPtr = &tmpRecPtr;
-
-        /*
-         * RecPtr is pointing to end+1 of the previous WAL record.  If
-         * we're at a page boundary, no more records can fit on the current
-         * page. We must skip over the page header, but we can't do that
-         * until we've read in the page, since the header size is variable.
-         */
-    }
-    else
-    {
-        /*
-         * In this case, the passed-in record pointer should already be
-         * pointing to a valid record starting position.
-         */
-        if (!XRecOffIsValid(*RecPtr))
-            ereport(PANIC,
-                    (errmsg("invalid record offset at %X/%X",
-                            (uint32) (*RecPtr >> 32), (uint32) *RecPtr)));
+    XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data;
-        /*
-         * Since we are going to a random position in WAL, forget any prior
-         * state about what timeline we were in, and allow it to be any
-         * timeline in expectedTLEs.  We also set a flag to allow curFileTLI
-         * to go backwards (but we can't reset that variable right here, since
-         * we might not change files at all).
-         */
-        /* see comment in ValidXLogPageHeader */
-        lastPageTLI = lastSegmentTLI = 0;
-        randAccess = true;        /* allow curFileTLI to go backwards too */
-    }
+    /* Pass through parameters to XLogPageRead */
+    private->fetching_ckpt = fetching_ckpt;
+    private->emode = emode;
+    private->randAccess = (RecPtr != InvalidXLogRecPtr);
     /* This is the first try to read this page. */
     lastSourceFailed = false;
 
-retry:
-    /* Read the page containing the record */
-    if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
-        return NULL;
-    pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
-    targetRecOff = (*RecPtr) % XLOG_BLCKSZ;
-    if (targetRecOff == 0)
-    {
-        /*
-         * At page start, so skip over page header.  The Assert checks that
-         * we're not scribbling on caller's record pointer; it's OK because we
-         * can only get here in the continuing-from-prev-record case, since
-         * XRecOffIsValid rejected the zero-page-offset case otherwise.
-         */
-        Assert(RecPtr == &tmpRecPtr);
-        (*RecPtr) += pageHeaderSize;
-        targetRecOff = pageHeaderSize;
-    }
-    else if (targetRecOff < pageHeaderSize)
+    do
     {
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
-                (errmsg("invalid record offset at %X/%X",
-                        (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-        goto next_record_is_invalid;
-    }
-    if ((((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD) &&
-        targetRecOff == pageHeaderSize)
-    {
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
-                (errmsg("contrecord is requested by %X/%X",
-                        (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-        goto next_record_is_invalid;
-    }
-
-    /*
-     * Read the record length.
-     *
-     * NB: Even though we use an XLogRecord pointer here, the whole record
-     * header might not fit on this page. xl_tot_len is the first field of
-     * the struct, so it must be on this page (the records are MAXALIGNed),
-     * but we cannot access any other fields until we've verified that we
-     * got the whole header.
-     */
-    record = (XLogRecord *) (readBuf + (*RecPtr) % XLOG_BLCKSZ);
-    total_len = record->xl_tot_len;
-
-    /*
-     * If the whole record header is on this page, validate it immediately.
-     * Otherwise do just a basic sanity check on xl_tot_len, and validate the
-     * rest of the header after reading it from the next page.  The xl_tot_len
-     * check is necessary here to ensure that we enter the "Need to reassemble
-     * record" code path below; otherwise we might fail to apply
-     * ValidXLogRecordHeader at all.
-     */
-    if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
-    {
-        if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess))
-            goto next_record_is_invalid;
-        gotheader = true;
-    }
-    else
-    {
-        if (total_len < SizeOfXLogRecord)
+        char   *errormsg;
+        record = XLogReadRecord(xlogreader, RecPtr, &errormsg);
+        ReadRecPtr = xlogreader->ReadRecPtr;
+        EndRecPtr = xlogreader->EndRecPtr;
+        if (record == NULL)
         {
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errmsg("invalid record length at %X/%X",
-                            (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-            goto next_record_is_invalid;
-        }
-        gotheader = false;
-    }
-
-    /*
-     * Allocate or enlarge readRecordBuf as needed.  To avoid useless small
-     * increases, round its size to a multiple of XLOG_BLCKSZ, and make sure
-     * it's at least 4*Max(BLCKSZ, XLOG_BLCKSZ) to start with.  (That is
-     * enough for all "normal" records, but very large commit or abort records
-     * might need more space.)
-     */
-    if (total_len > readRecordBufSize)
-    {
-        uint32        newSize = total_len;
-
-        newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
-        newSize = Max(newSize, 4 * Max(BLCKSZ, XLOG_BLCKSZ));
-        if (readRecordBuf)
-            free(readRecordBuf);
-        readRecordBuf = (char *) malloc(newSize);
-        if (!readRecordBuf)
-        {
-            readRecordBufSize = 0;
-            /* We treat this as a "bogus data" condition */
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errmsg("record length %u at %X/%X too long",
-                            total_len, (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-            goto next_record_is_invalid;
-        }
-        readRecordBufSize = newSize;
-    }
-
-    len = XLOG_BLCKSZ - (*RecPtr) % XLOG_BLCKSZ;
-    if (total_len > len)
-    {
-        /* Need to reassemble record */
-        char       *contrecord;
-        XLogPageHeader pageHeader;
-        XLogRecPtr    pagelsn;
-        char       *buffer;
-        uint32        gotlen;
-
-        /* Initialize pagelsn to the beginning of the page this record is on */
-        pagelsn = ((*RecPtr) / XLOG_BLCKSZ) * XLOG_BLCKSZ;
-
-        /* Copy the first fragment of the record from the first page. */
-        memcpy(readRecordBuf, readBuf + (*RecPtr) % XLOG_BLCKSZ, len);
-        buffer = readRecordBuf + len;
-        gotlen = len;
+            ereport(emode_for_corrupt_record(emode,
+                                             RecPtr ? RecPtr : EndRecPtr),
+                    (errmsg_internal("%s", errormsg) /* already translated */));
-        do
-        {
-            /* Calculate pointer to beginning of next page */
-            pagelsn += XLOG_BLCKSZ;
-            /* Wait for the next page to become available */
-            if (!XLogPageRead(&pagelsn, emode, false, false))
-                return NULL;
-
-            /* Check that the continuation on next page looks valid */
-            pageHeader = (XLogPageHeader) readBuf;
-            if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
-            {
-                ereport(emode_for_corrupt_record(emode, *RecPtr),
-                        (errmsg("there is no contrecord flag in log segment %s, offset %u",
-                                XLogFileNameP(curFileTLI, readSegNo),
-                                readOff)));
-                goto next_record_is_invalid;
-            }
-            /*
-             * Cross-check that xlp_rem_len agrees with how much of the record
-             * we expect there to be left.
-             */
-            if (pageHeader->xlp_rem_len == 0 ||
-                total_len != (pageHeader->xlp_rem_len + gotlen))
-            {
-                ereport(emode_for_corrupt_record(emode, *RecPtr),
-                        (errmsg("invalid contrecord length %u in log segment %s, offset %u",
-                                pageHeader->xlp_rem_len,
-                                XLogFileNameP(curFileTLI, readSegNo),
-                                readOff)));
-                goto next_record_is_invalid;
-            }
+            lastSourceFailed = true;
-            /* Append the continuation from this page to the buffer */
-            pageHeaderSize = XLogPageHeaderSize(pageHeader);
-            contrecord = (char *) readBuf + pageHeaderSize;
-            len = XLOG_BLCKSZ - pageHeaderSize;
-            if (pageHeader->xlp_rem_len < len)
-                len = pageHeader->xlp_rem_len;
-            memcpy(buffer, (char *) contrecord, len);
-            buffer += len;
-            gotlen += len;
-
-            /* If we just reassembled the record header, validate it. */
-            if (!gotheader)
+            if (readFile >= 0)
             {
-                record = (XLogRecord *) readRecordBuf;
-                if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess))
-                    goto next_record_is_invalid;
-                gotheader = true;
+                close(readFile);
+                readFile = -1;
             }
-        } while (pageHeader->xlp_rem_len > len);
-
-        record = (XLogRecord *) readRecordBuf;
-        if (!RecordIsValid(record, *RecPtr, emode))
-            goto next_record_is_invalid;
-        pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
-        XLogSegNoOffsetToRecPtr(
-            readSegNo,
-            readOff + pageHeaderSize + MAXALIGN(pageHeader->xlp_rem_len),
-            EndRecPtr);
-        ReadRecPtr = *RecPtr;
-    }
-    else
-    {
-        /* Record does not cross a page boundary */
-        if (!RecordIsValid(record, *RecPtr, emode))
-            goto next_record_is_invalid;
-        EndRecPtr = *RecPtr + MAXALIGN(total_len);
-
-        ReadRecPtr = *RecPtr;
-        memcpy(readRecordBuf, record, total_len);
-    }
-
-    /*
-     * Special processing if it's an XLOG SWITCH record
-     */
-    if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
-    {
-        /* Pretend it extends to end of segment */
-        EndRecPtr += XLogSegSize - 1;
-        EndRecPtr -= EndRecPtr % XLogSegSize;
-
-        /*
-         * Pretend that readBuf contains the last page of the segment. This is
-         * just to avoid Assert failure in StartupXLOG if XLOG ends with this
-         * segment.
-         */
-        readOff = XLogSegSize - XLOG_BLCKSZ;
-    }
-    return record;
-
-next_record_is_invalid:
-    lastSourceFailed = true;
-
-    if (readFile >= 0)
-    {
-        close(readFile);
-        readFile = -1;
-    }
-
-    /* In standby-mode, keep trying */
-    if (StandbyMode)
-        goto retry;
-    else
-        return NULL;
-}
-
-/*
- * Check whether the xlog header of a page just read in looks valid.
- *
- * This is just a convenience subroutine to avoid duplicated code in
- * ReadRecord.    It's not intended for use from anywhere else.
- */
-static bool
-ValidXLogPageHeader(XLogPageHeader hdr, int emode, bool segmentonly)
-{
-    XLogRecPtr    recaddr;
-
-    XLogSegNoOffsetToRecPtr(readSegNo, readOff, recaddr);
-
-    if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
-    {
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("invalid magic number %04X in log segment %s, offset %u",
-                        hdr->xlp_magic,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-    if ((hdr->xlp_info & ~XLP_ALL_FLAGS) != 0)
-    {
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("invalid info bits %04X in log segment %s, offset %u",
-                        hdr->xlp_info,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-    if (hdr->xlp_info & XLP_LONG_HEADER)
-    {
-        XLogLongPageHeader longhdr = (XLogLongPageHeader) hdr;
-
-        if (longhdr->xlp_sysid != ControlFile->system_identifier)
-        {
-            char        fhdrident_str[32];
-            char        sysident_str[32];
-
-            /*
-             * Format sysids separately to keep platform-dependent format code
-             * out of the translatable message string.
-             */
-            snprintf(fhdrident_str, sizeof(fhdrident_str), UINT64_FORMAT,
-                     longhdr->xlp_sysid);
-            snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
-                     ControlFile->system_identifier);
-            ereport(emode_for_corrupt_record(emode, recaddr),
-                    (errmsg("WAL file is from different database system"),
-                     errdetail("WAL file database system identifier is %s, pg_control database system identifier is %s.",
-                               fhdrident_str, sysident_str)));
-            return false;
-        }
-        if (longhdr->xlp_seg_size != XLogSegSize)
-        {
-            ereport(emode_for_corrupt_record(emode, recaddr),
-                    (errmsg("WAL file is from different database system"),
-                     errdetail("Incorrect XLOG_SEG_SIZE in page header.")));
-            return false;
-        }
-        if (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ)
-        {
-            ereport(emode_for_corrupt_record(emode, recaddr),
-                    (errmsg("WAL file is from different database system"),
-                     errdetail("Incorrect XLOG_BLCKSZ in page header.")));
-            return false;
+            break;
         }
-    }
-    else if (readOff == 0)
-    {
-        /* hmm, first page of file doesn't have a long header? */
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("invalid info bits %04X in log segment %s, offset %u",
-                        hdr->xlp_info,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-
-    if (hdr->xlp_pageaddr != recaddr)
-    {
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("unexpected pageaddr %X/%X in log segment %s, offset %u",
-                        (uint32) (hdr->xlp_pageaddr >> 32), (uint32) hdr->xlp_pageaddr,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-    /*
-     * Check page TLI is one of the expected values.
-     */
-    if (!tliInHistory(hdr->xlp_tli, expectedTLEs))
-    {
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("unexpected timeline ID %u in log segment %s, offset %u",
-                        hdr->xlp_tli,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-
-    /*
-     * Since child timelines are always assigned a TLI greater than their
-     * immediate parent's TLI, we should never see TLI go backwards across
-     * successive pages of a consistent WAL sequence.
-     *
-     * Of course this check should only be applied when advancing sequentially
-     * across pages; therefore ReadRecord resets lastPageTLI and
-     * lastSegmentTLI to zero when going to a random page.
-     *
-     * Sometimes we re-open a segment that's already been partially replayed.
-     * In that case we cannot perform the normal TLI check: if there is a
-     * timeline switch within the segment, the first page has a smaller TLI
-     * than later pages following the timeline switch, and we might've read
-     * them already. As a weaker test, we still check that it's not smaller
-     * than the TLI we last saw at the beginning of a segment. Pass
-     * segmentonly = true when re-validating the first page like that, and the
-     * page you're actually interested in comes later.
-     */
-    if (hdr->xlp_tli < (segmentonly ? lastSegmentTLI : lastPageTLI))
-    {
-        ereport(emode_for_corrupt_record(emode, recaddr),
-                (errmsg("out-of-sequence timeline ID %u (after %u) in log segment %s, offset %u",
-                        hdr->xlp_tli,
-                        segmentonly ? lastSegmentTLI : lastPageTLI,
-                        XLogFileNameP(curFileTLI, readSegNo),
-                        readOff)));
-        return false;
-    }
-    lastPageTLI = hdr->xlp_tli;
-    if (readOff == 0)
-        lastSegmentTLI = hdr->xlp_tli;
-
-    return true;
-}
-
-/*
- * Validate an XLOG record header.
- *
- * This is just a convenience subroutine to avoid duplicated code in
- * ReadRecord.    It's not intended for use from anywhere else.
- */
-static bool
-ValidXLogRecordHeader(XLogRecPtr *RecPtr, XLogRecord *record, int emode,
-                      bool randAccess)
-{
-    /*
-     * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is
-     * required.
-     */
-    if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
-    {
-        if (record->xl_len != 0)
-        {
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errmsg("invalid xlog switch record at %X/%X",
-                            (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-            return false;
-        }
-    }
-    else if (record->xl_len == 0)
-    {
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
-                (errmsg("record with zero length at %X/%X",
-                        (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-        return false;
-    }
-    if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
-        record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
-        XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))
-    {
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
-                (errmsg("invalid record length at %X/%X",
-                        (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-        return false;
-    }
-    if (record->xl_rmid > RM_MAX_ID)
-    {
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
-                (errmsg("invalid resource manager ID %u at %X/%X",
-                        record->xl_rmid, (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-        return false;
-    }
-    if (randAccess)
-    {
         /*
-         * We can't exactly verify the prev-link, but surely it should be less
-         * than the record's own address.
+         * Check page TLI is one of the expected values.
          */
-        if (!(record->xl_prev < *RecPtr))
+        if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs))
         {
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errmsg("record with incorrect prev-link %X/%X at %X/%X",
-                            (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev,
-                            (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
+            char        fname[MAXFNAMELEN];
+            XLogSegNo segno;
+            int32 offset;
+
+            XLByteToSeg(xlogreader->latestPagePtr, segno);
+            offset = xlogreader->latestPagePtr % XLogSegSize;
+            XLogFileName(fname, xlogreader->readPageTLI, segno);
+            ereport(emode_for_corrupt_record(emode,
+                                             RecPtr ? RecPtr : EndRecPtr),
+                    (errmsg("unexpected timeline ID %u in log segment %s, offset %u",
+                            xlogreader->latestPageTLI,
+                            fname,
+                            offset)));
             return false;
         }
-    }
-    else
-    {
-        /*
-         * Record's prev-link should exactly match our previous location. This
-         * check guards against torn WAL pages where a stale but valid-looking
-         * WAL record starts on a sector boundary.
-         */
-        if (record->xl_prev != ReadRecPtr)
-        {
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errmsg("record with incorrect prev-link %X/%X at %X/%X",
-                            (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev,
-                            (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr)));
-            return false;
-        }
-    }
+    } while (StandbyMode && record == NULL);
-    return true;
+    return record;
 }
 
 /*
@@ -5235,6 +4689,8 @@ StartupXLOG(void)
     bool        backupEndRequired = false;
     bool        backupFromStandby = false;
     DBState        dbstate_at_startup;
 
+    XLogReaderState *xlogreader;
+    XLogPageReadPrivate private;
 
     /*
      * Read control file and check XLOG status looks valid.
@@ -5351,6 +4807,16 @@ StartupXLOG(void)
     if (StandbyMode)
         OwnLatch(&XLogCtl->recoveryWakeupLatch);
 
+    /* Set up XLOG reader facility */
+    MemSet(&private, 0, sizeof(XLogPageReadPrivate));
+    xlogreader = XLogReaderAllocate(InvalidXLogRecPtr, &XLogPageRead, &private);
+    if (!xlogreader)
+        ereport(ERROR,
+                (errcode(ERRCODE_OUT_OF_MEMORY),
+                 errmsg("out of memory"),
+                 errdetail("Failed while allocating an XLog reading processor")));
+    xlogreader->system_identifier = ControlFile->system_identifier;
+
     if (read_backup_label(&checkPointLoc, &backupEndRequired,
                           &backupFromStandby))
     {
@@ -5358,7 +4824,7 @@ StartupXLOG(void)
          * When a backup_label file is present, we want to roll forward from
          * the checkpoint it identifies, rather than using pg_control.
          */
 
-        record = ReadCheckpointRecord(checkPointLoc, 0);
+        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0);
         if (record != NULL)
         {
             memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
 
@@ -5376,7 +4842,7 @@ StartupXLOG(void)
              */
             if (checkPoint.redo < checkPointLoc)
             {
-                if (!ReadRecord(&(checkPoint.redo), LOG, false))
+                if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false))
                     ereport(FATAL,
                             (errmsg("could not find redo location referenced by checkpoint record"),
                              errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir)));
 
@@ -5400,7 +4866,7 @@ StartupXLOG(void)
          */
         checkPointLoc = ControlFile->checkPoint;
         RedoStartLSN = ControlFile->checkPointCopy.redo;
 
-        record = ReadCheckpointRecord(checkPointLoc, 1);
+        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1);
         if (record != NULL)
         {
             ereport(DEBUG1,
@@ -5419,7 +4885,7 @@ StartupXLOG(void)
         else
         {
             checkPointLoc = ControlFile->prevCheckPoint;
-            record = ReadCheckpointRecord(checkPointLoc, 2);
+            record = ReadCheckpointRecord(xlogreader, checkPointLoc, 2);
             if (record != NULL)
             {
                 ereport(LOG,
 
@@ -5777,12 +5243,12 @@ StartupXLOG(void)
         if (checkPoint.redo < RecPtr)
         {
             /* back up to find the record */
 
-            record = ReadRecord(&(checkPoint.redo), PANIC, false);
+            record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);
         }
         else
         {
             /* just have to read next record after CheckPoint */
 
-            record = ReadRecord(NULL, LOG, false);
+            record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
         }
 
         if (record != NULL)
@@ -5963,7 +5429,7 @@ StartupXLOG(void)
                     break;
 
                 /* Else, try to fetch the next WAL record */
 
-                record = ReadRecord(NULL, LOG, false);
+                record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
             } while (record != NULL);
 
             /*
 
@@ -6013,7 +5479,7 @@ StartupXLOG(void)
      * Re-fetch the last valid or last applied record, so we can identify the
      * exact endpoint of what we consider the valid portion of WAL.
      */
 
-    record = ReadRecord(&LastRec, PANIC, false);
+    record = ReadRecord(xlogreader, LastRec, PANIC, false);
     EndOfLog = EndRecPtr;
     XLByteToPrevSeg(EndOfLog, endLogSegNo);
 
@@ -6117,7 +5583,7 @@ StartupXLOG(void)
      * we will use that below.)
      */
     if (InArchiveRecovery)
-        exitArchiveRecovery(curFileTLI, endLogSegNo);
+        exitArchiveRecovery(xlogreader->readPageTLI, endLogSegNo);
 
     /*
      * Prepare to write WAL starting at EndOfLog position, and init xlog
 
@@ -6136,8 +5602,15 @@ StartupXLOG(void)
      * record spans, not the one it starts in.    The last block is indeed the
      * one we want to use.
      */
 
-    Assert(readOff == (XLogCtl->xlblocks[0] - XLOG_BLCKSZ) % XLogSegSize);
-    memcpy((char *) Insert->currpage, readBuf, XLOG_BLCKSZ);
+    if (EndOfLog % XLOG_BLCKSZ == 0)
+    {
+        memset(Insert->currpage, 0, XLOG_BLCKSZ);
+    }
+    else
+    {
+        Assert(readOff == (XLogCtl->xlblocks[0] - XLOG_BLCKSZ) % XLogSegSize);
+        memcpy((char *) Insert->currpage, xlogreader->readBuf, XLOG_BLCKSZ);
+    }
     Insert->currpos = (char *) Insert->currpage +
         (EndOfLog + XLOG_BLCKSZ - XLogCtl->xlblocks[0]);
 
@@ -6288,23 +5761,13 @@ StartupXLOG(void)
     if (standbyState != STANDBY_DISABLED)
         ShutdownRecoveryTransactionEnvironment();
 
-    /* Shut down readFile facility, free space */
+    /* Shut down xlogreader */    if (readFile >= 0)    {        close(readFile);        readFile = -1;    }
-    if (readBuf)
-    {
-        free(readBuf);
-        readBuf = NULL;
-    }
-    if (readRecordBuf)
-    {
-        free(readRecordBuf);
-        readRecordBuf = NULL;
-        readRecordBufSize = 0;
-    }
+    XLogReaderFree(xlogreader);    /*     * If any of the critical GUCs have changed, log them before we allow
@@ -6554,7 +6017,7 @@ LocalSetXLogInsertAllowed(void) * 1 for "primary", 2 for "secondary", 0 for "other"
(backup_label)*/static XLogRecord *
 
-ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt)
+ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt){    XLogRecord *record;
@@ -6578,7 +6041,7 @@ ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt)        return NULL;    }
-    record = ReadRecord(&RecPtr, LOG, true);
+    record = ReadRecord(xlogreader, RecPtr, LOG, true);    if (record == NULL)    {
@@ -9332,28 +8795,24 @@ CancelBackup(void) * XLogPageRead() to try fetching the record from another source, or to *
sleep and retry. */
 
-static bool
-XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
-             bool randAccess)
+static int
+XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
+             char *readBuf, TimeLineID *readTLI){
+    XLogPageReadPrivate *private =
+        (XLogPageReadPrivate *) xlogreader->private_data;
+    int            emode = private->emode;    uint32        targetPageOff;
-    uint32        targetRecOff;
-    XLogSegNo    targetSegNo;
-
-    XLByteToSeg(*RecPtr, targetSegNo);
-    targetPageOff = (((*RecPtr) % XLogSegSize) / XLOG_BLCKSZ) * XLOG_BLCKSZ;
-    targetRecOff = (*RecPtr) % XLOG_BLCKSZ;
+    XLogSegNo    targetSegNo PG_USED_FOR_ASSERTS_ONLY;
-    /* Fast exit if we have read the record in the current buffer already */
-    if (!lastSourceFailed && targetSegNo == readSegNo &&
-        targetPageOff == readOff && targetRecOff < readLen)
-        return true;
+    XLByteToSeg(targetPagePtr, targetSegNo);
+    targetPageOff = targetPagePtr % XLogSegSize;    /*     * See if we need to switch to a new segment because the
requested record     * is not in the currently open one.     */
 
-    if (readFile >= 0 && !XLByteInSeg(*RecPtr, readSegNo))
+    if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))    {        /*         * Request a restartpoint if
we've replayed too much xlog since the
 
@@ -9374,39 +8833,34 @@ XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,        readSource = 0;    }
-    XLByteToSeg(*RecPtr, readSegNo);
+    XLByteToSeg(targetPagePtr, readSegNo);retry:    /* See if we need to retrieve more data */    if (readFile < 0 ||
-        (readSource == XLOG_FROM_STREAM && receivedUpto <= *RecPtr))
+        (readSource == XLOG_FROM_STREAM &&
+         receivedUpto <= targetPagePtr + reqLen))    {        if (StandbyMode)        {
-            if (!WaitForWALToBecomeAvailable(*RecPtr, randAccess,
-                                             fetching_ckpt))
+            if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
+                                             private->randAccess,
+                                             private->fetching_ckpt))                goto triggered;        }
-        else
+        /* In archive or crash recovery. */
+        else if (readFile < 0)        {
-            /* In archive or crash recovery. */
-            if (readFile < 0)
-            {
-                int            source;
+            int source;
-                /* Reset curFileTLI if random fetch. */
-                if (randAccess)
-                    curFileTLI = 0;
-
-                if (InArchiveRecovery)
-                    source = XLOG_FROM_ANY;
-                else
-                    source = XLOG_FROM_PG_XLOG;
+            if (InArchiveRecovery)
+                source = XLOG_FROM_ANY;
+            else
+                source = XLOG_FROM_PG_XLOG;
-                readFile = XLogFileReadAnyTLI(readSegNo, emode, source);
-                if (readFile < 0)
-                    return false;
-            }
+            readFile = XLogFileReadAnyTLI(readSegNo, emode, source);
+            if (readFile < 0)
+                return -1;        }    }
@@ -9424,72 +8878,46 @@ retry:     */    if (readSource == XLOG_FROM_STREAM)    {
-        if (((*RecPtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ))
-        {
+        if (((targetPagePtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ))            readLen = XLOG_BLCKSZ;
-        }        else            readLen = receivedUpto % XLogSegSize - targetPageOff;    }    else        readLen =
XLOG_BLCKSZ;
-    if (!readFileHeaderValidated && targetPageOff != 0)
-    {
-        /*
-         * Whenever switching to a new WAL segment, we read the first page of
-         * the file and validate its header, even if that's not where the
-         * target record is.  This is so that we can check the additional
-         * identification info that is present in the first page's "long"
-         * header.
-         */
-        readOff = 0;
-        if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
-        {
-            char fname[MAXFNAMELEN];
-            XLogFileName(fname, curFileTLI, readSegNo);
-            ereport(emode_for_corrupt_record(emode, *RecPtr),
-                    (errcode_for_file_access(),
-                     errmsg("could not read from log segment %s, offset %u: %m",
-                            fname, readOff)));
-            goto next_record_is_invalid;
-        }
-        if (!ValidXLogPageHeader((XLogPageHeader) readBuf, emode, true))
-            goto next_record_is_invalid;
-    }
-    /* Read the requested page */    readOff = targetPageOff;    if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
{        char fname[MAXFNAMELEN];
 
+        XLogFileName(fname, curFileTLI, readSegNo);
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
+        ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),                (errcode_for_file_access(),
    errmsg("could not seek in log segment %s to offset %u: %m",
 
-                fname, readOff)));
+                        fname, readOff)));        goto next_record_is_invalid;    }
+    if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)    {        char fname[MAXFNAMELEN];
+        XLogFileName(fname, curFileTLI, readSegNo);
-        ereport(emode_for_corrupt_record(emode, *RecPtr),
+        ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),                (errcode_for_file_access(),
    errmsg("could not read from log segment %s, offset %u: %m",
 
-                fname, readOff)));
+                        fname, readOff)));        goto next_record_is_invalid;    }
-    if (!ValidXLogPageHeader((XLogPageHeader) readBuf, emode, false))
-        goto next_record_is_invalid;
-
-    readFileHeaderValidated = true;    Assert(targetSegNo == readSegNo);    Assert(targetPageOff == readOff);
-    Assert(targetRecOff < readLen);
+    Assert(reqLen <= readLen);
-    return true;
+    *readTLI = curFileTLI;
+    return readLen;next_record_is_invalid:    lastSourceFailed = true;
@@ -9504,7 +8932,7 @@ next_record_is_invalid:    if (StandbyMode)        goto retry;    else
-        return false;
+        return -1;triggered:    if (readFile >= 0)
@@ -9513,7 +8941,7 @@ triggered:    readLen = 0;    readSource = 0;
-    return false;
+    return -1;}/*
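
A note on the callback contract that the XLogPageRead() hunks above switch to: instead of returning a bool, the function now returns the number of valid bytes it placed in readBuf, or -1 on failure, and it is asked for at least reqLen bytes of the page starting at targetPagePtr. The standalone sketch below imitates only that contract over an in-memory byte stream. Every name in it (SketchSource, sketch_page_read, sketch_demo, SKETCH_PAGE_SIZE) is hypothetical and not part of the patch; the real callback also receives the XLogReaderState and a TimeLineID pointer, and in standby mode it waits for more WAL rather than failing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for illustration; the patch uses XLOG_BLCKSZ. */
#define SKETCH_PAGE_SIZE 8192

/* Hypothetical stand-in for whatever the callback reads from. */
typedef struct SketchSource
{
    const char *data;       /* start of the byte stream */
    uint64_t    available;  /* number of bytes currently readable */
} SketchSource;

/*
 * Copy the page starting at targetPagePtr into readBuf.
 *
 * Returns the number of valid bytes copied (at most one page), or -1 if
 * fewer than reqLen bytes of that page are available.
 */
static int
sketch_page_read(const SketchSource *src, uint64_t targetPagePtr,
                 int reqLen, char *readBuf)
{
    uint64_t    avail;
    int         readLen;

    /* Callers always pass a page-aligned pointer and a sane length. */
    assert(targetPagePtr % SKETCH_PAGE_SIZE == 0);
    assert(reqLen >= 0 && reqLen <= SKETCH_PAGE_SIZE);

    /* Not enough data yet: this sketch fails instead of waiting. */
    if (targetPagePtr + (uint64_t) reqLen > src->available)
        return -1;

    /* Hand back as much of the page as is available, capped at one page. */
    avail = src->available - targetPagePtr;
    readLen = (avail >= SKETCH_PAGE_SIZE) ? SKETCH_PAGE_SIZE : (int) avail;

    memcpy(readBuf, src->data + targetPagePtr, (size_t) readLen);
    return readLen;
}

/* Demo helper: run the callback against a zero-filled three-page stream. */
static int
sketch_demo(uint64_t available, uint64_t targetPagePtr, int reqLen)
{
    static char wal[3 * SKETCH_PAGE_SIZE];
    char        buf[SKETCH_PAGE_SIZE];
    SketchSource src = { wal, available };

    assert(available <= sizeof(wal));
    return sketch_page_read(&src, targetPagePtr, reqLen, buf);
}
```

With 8292 bytes available, sketch_demo(8292, 0, 24) yields a full page, sketch_demo(8292, 8192, 24) yields the 100 bytes present on the second page, and a request for 200 bytes of that page fails with -1. Returning the length lets the caller decide whether a partially filled page is enough for what it needs.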
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
new file mode 100644
index 0000000..6a420e6
--- /dev/null
+++ b/src/backend/access/transam/xlogreader.c
@@ -0,0 +1,987 @@
+/*-------------------------------------------------------------------------
+ *
+ * xlogreader.c
+ *        Generic xlog reading facility
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/backend/access/transam/xlogreader.c
+ *
+ * NOTES
+ *        Documentation about how to use this interface can be found in
+ *        xlogreader.h, more specifically in the definition of the
+ *        XLogReaderState struct where all parameters are documented.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/transam.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "catalog/pg_control.h"
+
+static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
+
+static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
+                                XLogPageHeader hdr);
+static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
+        XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
+static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
+                            XLogRecPtr recptr);
+static int ReadPageInternal(struct XLogReaderState *state, XLogRecPtr pageptr,
+                 int reqLen);
+static void report_invalid_record(XLogReaderState *state, const char *fmt, ...)
+/* This extension allows gcc to check the format string for consistency with
+   the supplied arguments. */
+__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+
+/* size of the buffer allocated for error message. */
+#define MAX_ERRORMSG_LEN 1000
+
+/*
+ * Construct a string in state->errormsg_buf explaining what's wrong with
+ * the current record being read.
+ */
+static void
+report_invalid_record(XLogReaderState *state, const char *fmt, ...)
+{
+    va_list    args;
+
+    fmt = _(fmt);
+
+    va_start(args, fmt);
+    vsnprintf(state->errormsg_buf, MAX_ERRORMSG_LEN, fmt, args);
+    va_end(args);
+}
+
+/*
+ * Allocate and initialize a new xlog reader
+ *
+ * Returns NULL if the xlogreader couldn't be allocated.
+ */
+XLogReaderState *
+XLogReaderAllocate(XLogRecPtr startpoint, XLogPageReadCB pagereadfunc,
+                   void *private_data)
+{
+    XLogReaderState *state;
+
+    state = (XLogReaderState *) malloc(sizeof(XLogReaderState));
+    if (!state)
+        return NULL;
+    MemSet(state, 0, sizeof(XLogReaderState));
+
+    /*
+     * Permanently allocate readBuf.  We do it this way, rather than just
+     * making a static array, for two reasons: (1) no need to waste the
+     * storage in most instantiations of the backend; (2) a static char array
+     * isn't guaranteed to have any particular alignment, whereas malloc()
+     * will provide MAXALIGN'd storage.
+     */
+    state->readBuf = (char *) malloc(XLOG_BLCKSZ);
+    if (!state->readBuf)
+    {
+        free(state);
+        return NULL;
+    }
+
+    state->read_page = pagereadfunc;
+    state->private_data = private_data;
+    state->EndRecPtr = startpoint;
+    state->readPageTLI = 0;
+    state->system_identifier = 0;
+    state->errormsg_buf = malloc(MAX_ERRORMSG_LEN + 1);
+    if (!state->errormsg_buf)
+    {
+        free(state->readBuf);
+        free(state);
+        return NULL;
+    }
+    state->errormsg_buf[0] = '\0';
+
+    /*
+     * Allocate an initial readRecordBuf of minimal size, which can later be
+     * enlarged if necessary.
+     */
+    if (!allocate_recordbuf(state, 0))
+    {
+        free(state->errormsg_buf);
+        free(state->readBuf);
+        free(state);
+        return NULL;
+    }
+
+    return state;
+}
+
+void
+XLogReaderFree(XLogReaderState *state)
+{
+    free(state->errormsg_buf);
+    if (state->readRecordBuf)
+        free(state->readRecordBuf);
+    free(state->readBuf);
+    free(state);
+}
+
+/*
+ * Allocate readRecordBuf to fit a record of at least the given length.
+ * Returns true if successful, false if out of memory.
+ *
+ * readRecordBufSize is set to the new buffer size.
+ *
+ * To avoid useless small increases, round its size to a multiple of
+ * XLOG_BLCKSZ, and make sure it's at least 5*Max(BLCKSZ, XLOG_BLCKSZ) to start
+ * with.  (That is enough for all "normal" records, but very large commit or
+ * abort records might need more space.)
+ */
+static bool
+allocate_recordbuf(XLogReaderState *state, uint32 reclength)
+{
+    uint32        newSize = reclength;
+
+    newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
+    newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
+
+    if (state->readRecordBuf)
+        free(state->readRecordBuf);
+    state->readRecordBuf = (char *) malloc(newSize);
+    if (!state->readRecordBuf)
+    {
+        state->readRecordBufSize = 0;
+        return false;
+    }
+
+    state->readRecordBufSize = newSize;
+    return true;
+}
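
To make the sizing rule in allocate_recordbuf() concrete, here is a standalone sketch of just the arithmetic. The block sizes are assumed to be 8192 for illustration (the real values come from the build configuration), and the SKETCH_* names and sketch_recordbuf_size() are hypothetical. Note that the expression adds a full block when the length is already an exact multiple of XLOG_BLCKSZ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_BLCKSZ       8192u
#define SKETCH_XLOG_BLCKSZ  8192u

#define SKETCH_MAX(a, b)    ((a) > (b) ? (a) : (b))

/* Mirror of the size computation in allocate_recordbuf(), nothing else. */
static uint32_t
sketch_recordbuf_size(uint32_t reclength)
{
    uint32_t    newSize = reclength;

    /*
     * Advance to the next multiple of the WAL block size.  A length that is
     * already a multiple still grows by one whole block.
     */
    newSize += SKETCH_XLOG_BLCKSZ - (newSize % SKETCH_XLOG_BLCKSZ);

    /* Never go below five blocks of the larger block size. */
    newSize = SKETCH_MAX(newSize,
                         5 * SKETCH_MAX(SKETCH_BLCKSZ, SKETCH_XLOG_BLCKSZ));

    assert(newSize % SKETCH_XLOG_BLCKSZ == 0);
    return newSize;
}
```

Under these assumed sizes a request of 0 bytes gives the 40960-byte minimum, 50000 bytes rounds up to 57344, and 40960 bytes becomes 49152.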
+
+/*
+ * Attempt to read an XLOG record.
+ *
+ * If RecPtr is not InvalidXLogRecPtr, try to read a record at that position.  Otherwise
+ * try to read a record just after the last one previously read.
+ *
+ * If no valid record is available, returns NULL. On NULL return, *errormsg
+ * is usually set to a string with details of the failure. One typical error
+ * where *errormsg is not set is when the read_page callback returns an error.
+ *
+ * The returned pointer (or *errormsg) points to an internal buffer that's
+ * valid until the next call to XLogReadRecord.
+ */
+XLogRecord *
+XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
+{
+    XLogRecord *record;
+    XLogRecPtr    tmpRecPtr = state->EndRecPtr;
+    XLogRecPtr  targetPagePtr;
+    bool        randAccess = false;
+    uint32        len,
+                total_len;
+    uint32        targetRecOff;
+    uint32        pageHeaderSize;
+    bool        gotheader;
+    int         readOff;
+
+    *errormsg = NULL;
+    state->errormsg_buf[0] = '\0';
+
+    if (RecPtr == InvalidXLogRecPtr)
+    {
+        RecPtr = tmpRecPtr;
+
+        if (state->ReadRecPtr == InvalidXLogRecPtr)
+            randAccess = true;
+
+        /*
+         * RecPtr is pointing to end+1 of the previous WAL record.    If we're
+         * at a page boundary, no more records can fit on the current page. We
+         * must skip over the page header, but we can't do that until we've
+         * read in the page, since the header size is variable.
+         */
+    }
+    else
+    {
+        /*
+         * In this case, the passed-in record pointer should already be
+         * pointing to a valid record starting position.
+         */
+        Assert(XRecOffIsValid(RecPtr));
+        randAccess = true;        /* allow readPageTLI to go backwards too */
+    }
+
+    targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ);
+
+    /* Read the page containing the record into state->readBuf */
+    readOff = ReadPageInternal(state, targetPagePtr, SizeOfXLogRecord);
+
+    if (readOff < 0)
+    {
+        if (state->errormsg_buf[0] != '\0')
+            *errormsg = state->errormsg_buf;
+        return NULL;
+    }
+
+    /* ReadPageInternal always returns at least the page header */
+    pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) state->readBuf);
+    targetRecOff = RecPtr % XLOG_BLCKSZ;
+    if (targetRecOff == 0)
+    {
+        /*
+         * At page start, so skip over page header.
+         */
+        RecPtr += pageHeaderSize;
+        targetRecOff = pageHeaderSize;
+    }
+    else if (targetRecOff < pageHeaderSize)
+    {
+        report_invalid_record(state, "invalid record offset at %X/%X",
+                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
+        *errormsg = state->errormsg_buf;
+        return NULL;
+    }
+
+    if ((((XLogPageHeader) state->readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD) &&
+        targetRecOff == pageHeaderSize)
+    {
+        report_invalid_record(state, "contrecord is requested by %X/%X",
+                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
+        *errormsg = state->errormsg_buf;
+        return NULL;
+    }
+
+    /* ReadPageInternal has verified the page header */
+    Assert(pageHeaderSize <= readOff);
+
+    /*
+     * Ensure the whole record header or at least the part on this page is
+     * read.
+     */
+    readOff = ReadPageInternal(state,
+                               targetPagePtr,
+                               Min(targetRecOff + SizeOfXLogRecord, XLOG_BLCKSZ));
+    if (readOff < 0)
+    {
+        if (state->errormsg_buf[0] != '\0')
+            *errormsg = state->errormsg_buf;
+        return NULL;
+    }
+
+    /*
+     * Read the record length.
+     *
+     * NB: Even though we use an XLogRecord pointer here, the whole record
+     * header might not fit on this page. xl_tot_len is the first field of the
+     * struct, so it must be on this page (the records are MAXALIGNed), but we
+     * cannot access any other fields until we've verified that we got the
+     * whole header.
+     */
+    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
+    total_len = record->xl_tot_len;
+
+    /*
+     * If the whole record header is on this page, validate it immediately.
+     * Otherwise do just a basic sanity check on xl_tot_len, and validate the
+     * rest of the header after reading it from the next page.    The xl_tot_len
+     * check is necessary here to ensure that we enter the "Need to reassemble
+     * record" code path below; otherwise we might fail to apply
+     * ValidXLogRecordHeader at all.
+     */
+    if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
+    {
+        if (!ValidXLogRecordHeader(state, RecPtr, state->ReadRecPtr, record,
+                                   randAccess))
+        {
+            if (state->errormsg_buf[0] != '\0')
+                *errormsg = state->errormsg_buf;
+            return NULL;
+        }
+        gotheader = true;
+    }
+    else
+    {
+        /* XXX: more validation should be done here */
+        if (total_len < SizeOfXLogRecord)
+        {
+            report_invalid_record(state, "invalid record length at %X/%X",
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+            *errormsg = state->errormsg_buf;
+            return NULL;
+        }
+        gotheader = false;
+    }
+
+    /*
+     * Enlarge readRecordBuf as needed.
+     */
+    if (total_len > state->readRecordBufSize &&
+        !allocate_recordbuf(state, total_len))
+    {
+        /* We treat this as a "bogus data" condition */
+        report_invalid_record(state, "record length %u at %X/%X too long",
+                              total_len,
+                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
+        *errormsg = state->errormsg_buf;
+        return NULL;
+    }
+
+    len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ;
+    if (total_len > len)
+    {
+        /* Need to reassemble record */
+        char       *contdata;
+        XLogPageHeader pageHeader;
+        char       *buffer;
+        uint32        gotlen;
+
+        /* Copy the first fragment of the record from the first page. */
+        memcpy(state->readRecordBuf,
+               state->readBuf + RecPtr % XLOG_BLCKSZ, len);
+        buffer = state->readRecordBuf + len;
+        gotlen = len;
+
+        do
+        {
+            /* Calculate pointer to beginning of next page */
+            targetPagePtr += XLOG_BLCKSZ;
+
+            /* Wait for the next page to become available */
+            readOff = ReadPageInternal(state, targetPagePtr,
+                                       Min(len, XLOG_BLCKSZ));
+
+            if (readOff < 0)
+                goto err;
+
+            Assert(SizeOfXLogShortPHD <= readOff);
+
+            /* Check that the continuation on next page looks valid */
+            pageHeader = (XLogPageHeader) state->readBuf;
+            if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
+            {
+                report_invalid_record(state,
+                                      "there is no contrecord flag at %X/%X",
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+                goto err;
+            }
+
+            /*
+             * Cross-check that xlp_rem_len agrees with how much of the record
+             * we expect there to be left.
+             */
+            if (pageHeader->xlp_rem_len == 0 ||
+                total_len != (pageHeader->xlp_rem_len + gotlen))
+            {
+                report_invalid_record(state,
+                                      "invalid contrecord length %u at %X/%X",
+                                      pageHeader->xlp_rem_len,
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+                goto err;
+            }
+
+            /* Append the continuation from this page to the buffer */
+            pageHeaderSize = XLogPageHeaderSize(pageHeader);
+            Assert(pageHeaderSize <= readOff);
+
+            contdata = (char *) state->readBuf + pageHeaderSize;
+            len = XLOG_BLCKSZ - pageHeaderSize;
+            if (pageHeader->xlp_rem_len < len)
+                len = pageHeader->xlp_rem_len;
+
+            memcpy(buffer, (char *) contdata, len);
+            buffer += len;
+            gotlen += len;
+
+            /* If we just reassembled the record header, validate it. */
+            if (!gotheader)
+            {
+                record = (XLogRecord *) state->readRecordBuf;
+                if (!ValidXLogRecordHeader(state, RecPtr, state->ReadRecPtr,
+                                           record, randAccess))
+                    goto err;
+                gotheader = true;
+            }
+        } while (gotlen < total_len);
+
+        Assert(gotheader);
+
+        record = (XLogRecord *) state->readRecordBuf;
+        if (!ValidXLogRecord(state, record, RecPtr))
+            goto err;
+
+        pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) state->readBuf);
+        state->ReadRecPtr = RecPtr;
+        state->EndRecPtr = targetPagePtr + pageHeaderSize
+            + MAXALIGN(pageHeader->xlp_rem_len);
+    }
+    else
+    {
+        /* Wait for the record data to become available */
+        readOff = ReadPageInternal(state, targetPagePtr,
+                                   Min(targetRecOff + total_len, XLOG_BLCKSZ));
+        if (readOff < 0)
+            goto err;
+
+        /* Record does not cross a page boundary */
+        if (!ValidXLogRecord(state, record, RecPtr))
+            goto err;
+
+        state->EndRecPtr = RecPtr + MAXALIGN(total_len);
+
+        state->ReadRecPtr = RecPtr;
+        memcpy(state->readRecordBuf, record, total_len);
+    }
+
+    /*
+     * Special processing if it's an XLOG SWITCH record
+     */
+    if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
+    {
+        /* Pretend it extends to end of segment */
+        state->EndRecPtr += XLogSegSize - 1;
+        state->EndRecPtr -= state->EndRecPtr % XLogSegSize;
+    }
+
+    return record;
+
+err:
+    /*
+     * Invalidate the xlog page we've cached. We might read from a different
+     * source after failure.
+     */
+    state->readSegNo = 0;
+    state->readOff = 0;
+    state->readLen = 0;
+
+    if (state->errormsg_buf[0] != '\0')
+        *errormsg = state->errormsg_buf;
+
+    return NULL;
+}
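
The pointer arithmetic XLogReadRecord() relies on can be tried in isolation. The sketch below assumes an 8192-byte WAL block and a 16MB segment purely for illustration; SketchRecPtr and the sketch_* functions are hypothetical names, not part of the patch. It covers three computations from the function above: the start of the page containing a record, the bytes left on that page, and the end-of-segment rounding applied after an XLOG SWITCH record.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t SketchRecPtr;      /* stand-in for XLogRecPtr */

/* Assumed sizes for illustration only. */
#define SKETCH_XLOG_BLCKSZ  8192u
#define SKETCH_SEG_SIZE     (16u * 1024u * 1024u)

/* Start of the page containing ptr (targetPagePtr in the patch). */
static SketchRecPtr
sketch_page_start(SketchRecPtr ptr)
{
    return ptr - (ptr % SKETCH_XLOG_BLCKSZ);
}

/* Bytes from ptr to the end of its page (len in the patch). */
static uint32_t
sketch_bytes_left_on_page(SketchRecPtr ptr)
{
    return SKETCH_XLOG_BLCKSZ - (uint32_t) (ptr % SKETCH_XLOG_BLCKSZ);
}

/*
 * Round an end pointer up to the next segment boundary, as done for
 * XLOG SWITCH records.  A pointer already on a boundary is unchanged.
 */
static SketchRecPtr
sketch_switch_end(SketchRecPtr end)
{
    end += SKETCH_SEG_SIZE - 1;
    end -= end % SKETCH_SEG_SIZE;
    return end;
}
```

A record whose total length exceeds sketch_bytes_left_on_page() for its start pointer is the case that takes the "Need to reassemble record" path.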
+
+/*
+ * Read a single xlog page including at least [pageptr, pageptr + reqLen] of
+ * valid data via the read_page() callback.
+ *
+ * Returns -1 if the required page cannot be read for some reason.
+ *
+ * We fetch the page from a reader-local cache if we know we have the required
+ * data and if there hasn't been any error since caching the data.
+ */
+static int
+ReadPageInternal(struct XLogReaderState *state, XLogRecPtr pageptr,
+                 int reqLen)
+{
+    int            readLen;
+    uint32        targetPageOff;
+    XLogSegNo    targetSegNo;
+    XLogPageHeader hdr;
+
+    Assert((pageptr % XLOG_BLCKSZ) == 0);
+
+    XLByteToSeg(pageptr, targetSegNo);
+    targetPageOff = (pageptr % XLogSegSize);
+
+    /* check whether we have all the requested data already */
+    if (targetSegNo == state->readSegNo && targetPageOff == state->readOff &&
+        reqLen < state->readLen)
+        return state->readLen;
+
+    /*
+     * Data is not cached.
+     *
+     * Every time we actually read the page, even if we looked at parts of it
+     * before, we need to do verification as the read_page callback might now
+     * be rereading data from a different source.
+     *
+     * Whenever switching to a new WAL segment, we read the first page of the
+     * file and validate its header, even if that's not where the target record
+     * is.  This is so that we can check the additional identification info
+     * that is present in the first page's "long" header.
+     */
+    if (targetSegNo != state->readSegNo &&
+        targetPageOff != 0)
+    {
+        XLogPageHeader hdr;
+        XLogRecPtr targetSegmentPtr = pageptr - targetPageOff;
+
+        readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ,
+                                   state->readBuf, &state->readPageTLI);
+
+        if (readLen < 0)
+            goto err;
+
+        Assert(readLen <= XLOG_BLCKSZ);
+
+        /* we can be sure to have enough WAL available, we scrolled back */
+        Assert(readLen == XLOG_BLCKSZ);
+
+        hdr = (XLogPageHeader) state->readBuf;
+
+        if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+            goto err;
+    }
+
+    /* now read the target data */
+    readLen = state->read_page(state, pageptr, Max(reqLen, SizeOfXLogShortPHD),
+                               state->readBuf, &state->readPageTLI);
+    if (readLen < 0)
+        goto err;
+
+    Assert(readLen <= XLOG_BLCKSZ);
+
+    /* check we have enough data to check for the actual length of the page header */
+    if (readLen <= SizeOfXLogShortPHD)
+        goto err;
+
+    Assert(readLen >= reqLen);
+
+    hdr = (XLogPageHeader) state->readBuf;
+
+    /* still not enough */
+    if (readLen < XLogPageHeaderSize(hdr))
+    {
+        readLen = state->read_page(state, pageptr, XLogPageHeaderSize(hdr),
+                                   state->readBuf, &state->readPageTLI);
+        if (readLen < 0)
+            goto err;
+    }
+
+    if (!ValidXLogPageHeader(state, pageptr, hdr))
+        goto err;
+
+    /* update cache information */
+    state->readSegNo = targetSegNo;
+    state->readOff = targetPageOff;
+    state->readLen = readLen;
+
+    return readLen;
+err:
+    state->readSegNo = 0;
+    state->readOff = 0;
+    state->readLen = 0;
+    return -1;
+}
+
+/*
+ * Validate an XLOG record header.
+ *
+ * This is just a convenience subroutine to avoid duplicated code in
+ * XLogReadRecord.    It's not intended for use from anywhere else.
+ */
+static bool
+ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
+                      XLogRecPtr PrevRecPtr, XLogRecord *record,
+                      bool randAccess)
+{
+    /*
+     * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is
+     * required.
+     */
+    if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH)
+    {
+        if (record->xl_len != 0)
+        {
+            report_invalid_record(state,
+                                  "invalid xlog switch record at %X/%X",
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+            return false;
+        }
+    }
+    else if (record->xl_len == 0)
+    {
+        report_invalid_record(state,
+                              "record with zero length at %X/%X",
+                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
+        return false;
+    }
+    if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
+        record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
+        XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))
+    {
+        report_invalid_record(state,
+                              "invalid record length at %X/%X",
+                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
+        return false;
+    }
+    if (record->xl_rmid > RM_MAX_ID)
+    {
+        report_invalid_record(state,
+                              "invalid resource manager ID %u at %X/%X",
+                              record->xl_rmid, (uint32) (RecPtr >> 32),
+                              (uint32) RecPtr);
+        return false;
+    }
+    if (randAccess)
+    {
+        /*
+         * We can't exactly verify the prev-link, but surely it should be less
+         * than the record's own address.
+         */
+        if (!(record->xl_prev < RecPtr))
+        {
+            report_invalid_record(state,
+                                  "record with incorrect prev-link %X/%X at %X/%X",
+                                  (uint32) (record->xl_prev >> 32),
+                                  (uint32) record->xl_prev,
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+            return false;
+        }
+    }
+    else
+    {
+        /*
+         * Record's prev-link should exactly match our previous location. This
+         * check guards against torn WAL pages where a stale but valid-looking
+         * WAL record starts on a sector boundary.
+         */
+        if (record->xl_prev != PrevRecPtr)
+        {
+            report_invalid_record(state,
+                                  "record with incorrect prev-link %X/%X at %X/%X",
+                                  (uint32) (record->xl_prev >> 32),
+                                  (uint32) record->xl_prev,
+                                  (uint32) (RecPtr >> 32), (uint32) RecPtr);
+            return false;
+        }
+    }
+
+    return true;
+}
+
+
+/*
+ * CRC-check an XLOG record.  We do not believe the contents of an XLOG
+ * record (other than to the minimal extent of computing the amount of
+ * data to read in) until we've checked the CRCs.
+ *
+ * We assume all of the record (that is, xl_tot_len bytes) has been read
+ * into memory at *record.    Also, ValidXLogRecordHeader() has accepted the
+ * record's header, which means in particular that xl_tot_len is at least
+ * SizeOfXLogRecord, so it is safe to fetch xl_len.
+ */
+static bool
+ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
+{
+    pg_crc32    crc;
+    int            i;
+    uint32        len = record->xl_len;
+    BkpBlock    bkpb;
+    char       *blk;
+    size_t        remaining = record->xl_tot_len;
+
+    /* First the rmgr data */
+    if (remaining < SizeOfXLogRecord + len)
+    {
+        /* ValidXLogRecordHeader() should've caught this already... */
+        report_invalid_record(state, "invalid record length at %X/%X",
+                              (uint32) (recptr >> 32), (uint32) recptr);
+        return false;
+    }
+    remaining -= SizeOfXLogRecord + len;
+    INIT_CRC32(crc);
+    COMP_CRC32(crc, XLogRecGetData(record), len);
+
+    /* Add in the backup blocks, if any */
+    blk = (char *) XLogRecGetData(record) + len;
+    for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
+    {
+        uint32        blen;
+
+        if (!(record->xl_info & XLR_BKP_BLOCK(i)))
+            continue;
+
+        if (remaining < sizeof(BkpBlock))
+        {
+            report_invalid_record(state,
+                              "invalid backup block size in record at %X/%X",
+                                  (uint32) (recptr >> 32), (uint32) recptr);
+            return false;
+        }
+        memcpy(&bkpb, blk, sizeof(BkpBlock));
+
+        if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ)
+        {
+            report_invalid_record(state,
+                                  "incorrect hole size in record at %X/%X",
+                                  (uint32) (recptr >> 32), (uint32) recptr);
+            return false;
+        }
+        blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length;
+
+        if (remaining < blen)
+        {
+            report_invalid_record(state,
+                              "invalid backup block size in record at %X/%X",
+                                  (uint32) (recptr >> 32), (uint32) recptr);
+            return false;
+        }
+        remaining -= blen;
+        COMP_CRC32(crc, blk, blen);
+        blk += blen;
+    }
+
+    /* Check that xl_tot_len agrees with our calculation */
+    if (remaining != 0)
+    {
+        report_invalid_record(state,
+                              "incorrect total length in record at %X/%X",
+                              (uint32) (recptr >> 32), (uint32) recptr);
+        return false;
+    }
+
+    /* Finally include the record header */
+    COMP_CRC32(crc, (char *) record, offsetof(XLogRecord, xl_crc));
+    FIN_CRC32(crc);
+
+    if (!EQ_CRC32(record->xl_crc, crc))
+    {
+        report_invalid_record(state,
+                 "incorrect resource manager data checksum in record at %X/%X",
+                              (uint32) (recptr >> 32), (uint32) recptr);
+        return false;
+    }
+
+    return true;
+}
+
+static bool
+ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
+                    XLogPageHeader hdr)
+{
+    XLogRecPtr    recaddr;
+    XLogSegNo segno;
+    int32 offset;
+
+    Assert((recptr % XLOG_BLCKSZ) == 0);
+
+    XLByteToSeg(recptr, segno);
+    offset = recptr % XLogSegSize;
+
+    XLogSegNoOffsetToRecPtr(segno, offset, recaddr);
+
+    if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
+    {
+        char        fname[MAXFNAMELEN];
+
+        XLogFileName(fname, state->readPageTLI, segno);
+
+        report_invalid_record(state,
+                      "invalid magic number %04X in log segment %s, offset %u",
+                              hdr->xlp_magic,
+                              fname,
+                              offset);
+        return false;
+    }
+
+    if ((hdr->xlp_info & ~XLP_ALL_FLAGS) != 0)
+    {
+        char        fname[MAXFNAMELEN];
+
+        XLogFileName(fname, state->readPageTLI, segno);
+
+        report_invalid_record(state,
+                        "invalid info bits %04X in log segment %s, offset %u",
+                              hdr->xlp_info,
+                              fname,
+                              offset);
+        return false;
+    }
+
+    if (hdr->xlp_info & XLP_LONG_HEADER)
+    {
+        XLogLongPageHeader longhdr = (XLogLongPageHeader) hdr;
+
+        if (state->system_identifier &&
+            longhdr->xlp_sysid != state->system_identifier)
+        {
+            char        fhdrident_str[32];
+            char        sysident_str[32];
+
+            /*
+             * Format sysids separately to keep platform-dependent format code
+             * out of the translatable message string.
+             */
+            snprintf(fhdrident_str, sizeof(fhdrident_str), UINT64_FORMAT,
+                     longhdr->xlp_sysid);
+            snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
+                     state->system_identifier);
+            report_invalid_record(state,
+                      "WAL file is from different database system: WAL file database system identifier is %s, pg_control database system identifier is %s.",
+                                  fhdrident_str, sysident_str);
+            return false;
+        }
+        else if (longhdr->xlp_seg_size != XLogSegSize)
+        {
+            report_invalid_record(state,
+                      "WAL file is from different database system: Incorrect XLOG_SEG_SIZE in page header.");
+            return false;
+        }
+        else if (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ)
+        {
+            report_invalid_record(state,
+                     "WAL file is from different database system: Incorrect XLOG_BLCKSZ in page header.");
+            return false;
+        }
+    }
+    else if (offset == 0)
+    {
+        char        fname[MAXFNAMELEN];
+
+        XLogFileName(fname, state->readPageTLI, segno);
+
+        /* hmm, first page of file doesn't have a long header? */
+        report_invalid_record(state,
+                      "invalid info bits %04X in log segment %s, offset %u",
+                              hdr->xlp_info,
+                              fname,
+                              offset);
+        return false;
+    }
+
+    if (hdr->xlp_pageaddr != recaddr)
+    {
+        char        fname[MAXFNAMELEN];
+
+        XLogFileName(fname, state->readPageTLI, segno);
+
+        report_invalid_record(state,
+              "unexpected pageaddr %X/%X in log segment %s, offset %u",
+              (uint32) (hdr->xlp_pageaddr >> 32), (uint32) hdr->xlp_pageaddr,
+                              fname,
+                              offset);
+        return false;
+    }
+
+    /*
+     * Since child timelines are always assigned a TLI greater than their
+     * immediate parent's TLI, we should never see TLI go backwards across
+     * successive pages of a consistent WAL sequence.
+     *
+     * Of course this check should only be applied when advancing sequentially
+     * across pages; therefore ReadRecord resets lastPageTLI and lastSegmentTLI
+     * to zero when going to a random page. FIXME
+     *
+     * Sometimes we re-read a segment that's already been (partially) read. So
+     * we only verify TLIs for pages that are later than the last remembered
+     * LSN.
+     *
+     * XXX: This is slightly less precise than the check we did in earlier
+     * times. I don't see a problem with that though.
+     */
+    if (state->latestPagePtr < recptr)
+    {
+        if (hdr->xlp_tli < state->latestPageTLI)
+        {
+            char        fname[MAXFNAMELEN];
+
+            XLogFileName(fname, state->readPageTLI, segno);
+
+            report_invalid_record(state,
+                                  "out-of-sequence timeline ID %u (after %u) in log segment %s, offset %u",
+                                  hdr->xlp_tli,
+                                  state->latestPageTLI,
+                                  fname,
+                                  offset);
+            return false;
+        }
+    }
+    state->latestPagePtr = recptr;
+    state->latestPageTLI = hdr->xlp_tli;
+    return true;
+}
+
+/*
+ * Functions that are currently only needed in the frontend, but are better
+ * implemented inside xlogreader because of the internal functions available
+ * there.
+ */
+#ifdef FRONTEND
+
+/*
+ * Find the first record with an lsn >= RecPtr.
+ *
+ * Useful for checking whether RecPtr is a valid xlog address for reading and to
+ * find the first valid address after some address when dumping records for
+ * debugging purposes.
+ */
+XLogRecPtr
+XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr)
+{
+   XLogReaderState saved_state = *state;
+   XLogRecPtr  targetPagePtr;
+   XLogRecPtr  tmpRecPtr;
+   int targetRecOff;
+   XLogRecPtr found = InvalidXLogRecPtr;
+   uint32      pageHeaderSize;
+   XLogPageHeader header;
+   XLogRecord *record;
+   int readLen;
+   char       *errormsg;
+
+   if (RecPtr == InvalidXLogRecPtr)
+       RecPtr = state->EndRecPtr;
+
+   targetRecOff = RecPtr % XLOG_BLCKSZ;
+
+   /* scroll back to page boundary */
+   targetPagePtr = RecPtr - targetRecOff;
+
+   /* Read the page containing the record */
+   readLen = ReadPageInternal(state, targetPagePtr, targetRecOff);
+   if (readLen < 0)
+       goto err;
+
+   header = (XLogPageHeader) state->readBuf;
+
+   pageHeaderSize = XLogPageHeaderSize(header);
+
+   /* make sure we have enough data for the page header */
+   readLen = ReadPageInternal(state, targetPagePtr, pageHeaderSize);
+   if (readLen < 0)
+       goto err;
+
+   /* skip over potential continuation data */
+   if (header->xlp_info & XLP_FIRST_IS_CONTRECORD)
+   {
+       /* record headers are MAXALIGN'ed */
+       tmpRecPtr = targetPagePtr + pageHeaderSize
+           + MAXALIGN(header->xlp_rem_len);
+   }
+   else
+   {
+       tmpRecPtr = targetPagePtr + pageHeaderSize;
+   }
+
+   /*
+    * we know now that tmpRecPtr is an address pointing to a valid XLogRecord
+    * because either we're at the first record after the beginning of a page or
+    * we just jumped over the remaining data of a continuation.
+    */
+   while ((record = XLogReadRecord(state, tmpRecPtr, &errormsg)))
+   {
+       /* continue after the record */
+       tmpRecPtr = InvalidXLogRecPtr;
+
+       /* past the record we've found, break out */
+       if (RecPtr <= state->ReadRecPtr)
+       {
+           found = state->ReadRecPtr;
+           goto out;
+       }
+   }
+
+err:
+out:
+   /* Reset state to what we had before finding the record */
+   state->readSegNo = 0;
+   state->readOff = 0;
+   state->readLen = 0;
+   state->ReadRecPtr = saved_state.ReadRecPtr;
+   state->EndRecPtr = saved_state.EndRecPtr;
+   return found;
+}
+
+#endif /* FRONTEND */
diff --git a/src/backend/nls.mk b/src/backend/nls.mk
index 30f6a2b..c072de7 100644
--- a/src/backend/nls.mk
+++ b/src/backend/nls.mk
@@ -4,12 +4,13 @@ AVAIL_LANGUAGES  = de es fr ja pt_BR tr zh_CN zh_TW
 GETTEXT_FILES    = + gettext-files
 GETTEXT_TRIGGERS = $(BACKEND_COMMON_GETTEXT_TRIGGERS) \
     GUC_check_errmsg GUC_check_errdetail GUC_check_errhint \
-    write_stderr yyerror parser_yyerror
+    write_stderr yyerror parser_yyerror report_invalid_record
 GETTEXT_FLAGS    = $(BACKEND_COMMON_GETTEXT_FLAGS) \
     GUC_check_errmsg:1:c-format \
     GUC_check_errdetail:1:c-format \
     GUC_check_errhint:1:c-format \
-    write_stderr:1:c-format
+    write_stderr:1:c-format \
+    report_invalid_record:2:c-format
 
 gettext-files: distprep
     find $(srcdir)/ $(srcdir)/../port/ -name '*.c' -print | LC_ALL=C sort >$@
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
new file mode 100644
index 0000000..acc8309
--- /dev/null
+++ b/src/include/access/xlogreader.h
@@ -0,0 +1,141 @@
+/*-------------------------------------------------------------------------
+ *
+ * xlogreader.h
+ *
+ *        Generic xlog reading facility.
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/include/access/xlogreader.h
+ *
+ * NOTES
+ *        Check the definition of the XLogReaderState struct for instructions on
+ *        how to use the XLogReader infrastructure.
+ *
+ *        The basic idea is to allocate an XLogReaderState via
+ *        XLogReaderAllocate, and call XLogReadRecord() until it returns NULL.
+ *-------------------------------------------------------------------------
+ */
+#ifndef XLOGREADER_H
+#define XLOGREADER_H
+
+#include "access/xlog_internal.h"
+
+struct XLogReaderState;
+
+/*
+ * The callbacks are explained in more detail inside the XLogReaderState
+ * struct.
+ */
+
+typedef int (*XLogPageReadCB) (struct XLogReaderState *state,
+                               XLogRecPtr pageptr,
+                               int reqLen,
+                               char *readBuf,
+                               TimeLineID *pageTLI);
+
+typedef struct XLogReaderState
+{
+    /* ----------------------------------------
+     * Public parameters
+     * ----------------------------------------
+     */
+
+    /*
+     * Data input callback (mandatory).
+     *
+     * This callback shall read the xlog page (of size XLOG_BLCKSZ) in which
+     * RecPtr resides. All data <= RecPtr must be visible. The callback shall
+     * return the number of valid bytes read, or -1 upon failure.
+     *
+     * *pageTLI should be set to the TLI of the file the page was read from.
+     * It is currently used only for error reporting purposes, to
+     * reconstruct the name of the WAL file where an error occurred.
+     */
+    XLogPageReadCB read_page;
+
+    /*
+     * System identifier of the xlog files we're about to read.
+     *
+     * Set to zero (the default value) if unknown or unimportant.
+     */
+    uint64        system_identifier;
+
+    /*
+     * Opaque data for callbacks to use.  Not used by XLogReader.
+     */
+    void       *private_data;
+
+    /*
+     * From where to where are we reading
+     */
+    XLogRecPtr    ReadRecPtr;        /* start of last record read */
+    XLogRecPtr    EndRecPtr;        /* end+1 of last record read */
+
+    /*
+     * TLI of the current xlog page
+     */
+    TimeLineID    ReadTimeLineID;
+
+    /* ----------------------------------------
+     * private/internal state
+     * ----------------------------------------
+     */
+
+    /* Buffer for currently read page (XLOG_BLCKSZ bytes) */
+    char       *readBuf;
+
+    /* last read segment, segment offset, read length, TLI */
+    XLogSegNo   readSegNo;
+    uint32      readOff;
+    uint32      readLen;
+    TimeLineID  readPageTLI;
+
+    /* beginning of last page read, and its TLI  */
+    XLogRecPtr    latestPagePtr;
+    TimeLineID    latestPageTLI;
+
+    /* Buffer for current ReadRecord result (expandable) */
+    char       *readRecordBuf;
+    uint32        readRecordBufSize;
+
+    /* Buffer to hold error message */
+    char       *errormsg_buf;
+} XLogReaderState;
+
+/*
+ * Get a new XLogReader
+ *
+ * At least the read_page callback, startptr and endptr have to be set before
+ * the reader can be used.
+ */
+extern XLogReaderState *XLogReaderAllocate(XLogRecPtr startpoint,
+                   XLogPageReadCB pagereadfunc, void *private_data);
+
+/*
+ * Free an XLogReader
+ */
+extern void XLogReaderFree(XLogReaderState *state);
+
+/*
+ * Read the next record from xlog. Returns NULL on end-of-WAL or on failure.
+ */
+extern struct XLogRecord *XLogReadRecord(XLogReaderState *state, XLogRecPtr ptr,
+               char **errormsg);
+
+/*
+ * Functions that are currently only needed in the frontend, but are better
+ * implemented inside xlogreader because of the internal functions available
+ * there.
+ */
+#ifdef FRONTEND
+/*
+ * Find the address of the first record with an lsn >= RecPtr.
+ */
+extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
+
+#endif /* FRONTEND */
+
+#endif   /* XLOGREADER_H */
-- 
1.7.12.289.g0ce9864.dirty
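
For anyone reviewing the address arithmetic in XLogFindNextRecord, the following standalone sketch may help. It is not part of the patch. The block size of 8192, the 8-byte MAXALIGN and all demo_* / DEMO_* names are assumptions made purely for illustration; it only mirrors the "scroll back to page boundary" and "skip over potential continuation data" steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; a real build gets these from configure. */
#define DEMO_XLOG_BLCKSZ 8192u
#define DEMO_MAXALIGN(len) (((uint64_t) (len) + 7u) & ~(uint64_t) 7u)

/* Round an arbitrary WAL address down to the start of its page. */
static uint64_t
demo_page_start(uint64_t recptr)
{
    return recptr - (recptr % DEMO_XLOG_BLCKSZ);
}

/*
 * Address of the first record that starts on a page: directly after the
 * page header, or after the MAXALIGN'ed remainder of a record that was
 * continued from the previous page.
 */
static uint64_t
demo_first_record_on_page(uint64_t page_start, uint32_t page_header_size,
                          bool first_is_contrecord, uint32_t rem_len)
{
    uint64_t    ptr = page_start + page_header_size;

    if (first_is_contrecord)
        ptr += DEMO_MAXALIGN(rem_len);
    return ptr;
}
```

With a 24-byte page header and 13 bytes of continuation data, the first candidate record on the page at 8192 would start at 8192 + 24 + 16.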

[PATCH 5/5] remove spurious space in running_xact's _desc function

From

Andres Freund

Date:

08 January 2013, 19:12:51

From: Andres Freund <andres@anarazel.de>

---
 src/backend/access/rmgrdesc/standbydesc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index c38892b..5fb6f54 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -57,7 +57,7 @@ standby_desc(StringInfo buf, uint8 xl_info, char *rec)
     {
         xl_running_xacts *xlrec = (xl_running_xacts*) rec;
 
-        appendStringInfo(buf, " running xacts:");
+        appendStringInfo(buf, "running xacts:");
         standby_desc_running_xacts(buf, xlrec);
     }
     else
-- 
1.7.12.289.g0ce9864.dirty

Re: [PATCH] xlogreader-v4

From

Thom Brown

Date:

08 January 2013, 19:16:06

On 8 January 2013 19:09, Andres Freund <andres@2ndquadrant.com> wrote:

> From: Andres Freund <andres@2ndquadrant.com>
> Subject: [PATCH] xlogreader-v4
> In-Reply-To:
>
> Hi,
>
> this is the latest and obviously best version of xlogreader & xlogdump with
> changes both from Heikki and me.

Aren't you forgetting something?

--
Thom

Re: [PATCH] xlogreader-v4

From

Thom Brown

Date:

08 January 2013, 19:17:14

On 8 January 2013 19:15, Thom Brown <thom@linux.com> wrote:

> On 8 January 2013 19:09, Andres Freund <andres@2ndquadrant.com> wrote:
>> From: Andres Freund <andres@2ndquadrant.com>
>> Subject: [PATCH] xlogreader-v4
>> In-Reply-To:
>>
>> Hi,
>>
>> this is the latest and obviously best version of xlogreader & xlogdump with
>> changes both from Heikki and me.
>
> Aren't you forgetting something?

I see, you're posting them separately. Nevermind.

--
Thom

Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

08 January 2013, 19:25:35

Andres Freund <andres@2ndquadrant.com> writes:
> From: Andres Freund <andres@anarazel.de>
> c.h already had parts of the assert support (StaticAssert*) and its the shared
> file between postgres.h and postgres_fe.h. This makes it easier to build
> frontend programs which have to do the hack.

This patch seems unnecessary given that we already put a version of Assert()
into postgres_fe.h.  I don't think that moving the two different
definitions into an #if block in one file is an improvement.  If that
were an improvement, we might as well move everything in both postgres.h
and postgres_fe.h into c.h with a pile of #ifs.
        regards, tom lane

Re: [PATCH] xlogreader-v4

From

Andres Freund

Date:

08 January 2013, 19:26:34

On 2013-01-08 20:09:42 +0100, Andres Freund wrote:
> From: Andres Freund <andres@2ndquadrant.com>
> Subject: [PATCH] xlogreader-v4
> In-Reply-To: 
> 
> Hi,
> 
> this is the latest and obviously best version of xlogreader & xlogdump with
> changes both from Heikki and me.
> 
> Changes:
> * windows build support for pg_xlogdump

That was done blindly, btw, so I only know it compiles, not that it
runs...

It's in git://git.postgresql.org/git/users/andresfreund/postgres.git
branch xlogreader_v4 btw.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

08 January 2013, 19:28:38

Andres Freund <andres@2ndquadrant.com> writes:
> From: Andres Freund <andres@anarazel.de>
> relpathbackend() (via some of its wrappers) is used in *_desc routines which we
> want to be useable without a backend environment arround.

I'm 100% unimpressed with making relpathbackend return a pointer to a
static buffer.  Who's to say whether that won't create bugs due to
overlapping usages?

> Change signature to return a 'const char *' to make misuse easier to
> detect.

That seems to create way more churn than is necessary, and it's wrong
anyway if the result is palloc'd.
        regards, tom lane
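
The "overlapping usages" concern can be illustrated with a toy function. This is a sketch only: demo_relpath_static is a hypothetical stand-in and not the patched relpathbackend(), and the path format is made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical path builder that returns a pointer to a static buffer. */
static const char *
demo_relpath_static(unsigned db, unsigned rel)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "base/%u/%u", db, rel);
    return buf;
}

/* Returns 1 if two successive results alias the same storage. */
static int
demo_results_overlap(void)
{
    const char *a = demo_relpath_static(1, 100);
    const char *b = demo_relpath_static(1, 200);

    /* 'a' now reads "base/1/200": the second call overwrote the first. */
    return a == b && strcmp(a, "base/1/200") == 0;
}
```

Any caller that holds on to the first result across a second call, for example by passing both to one printf(), would silently see the second path twice.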

Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

08 January 2013, 19:31:41

On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > From: Andres Freund <andres@anarazel.de>
> > c.h already had parts of the assert support (StaticAssert*) and its the shared
> > file between postgres.h and postgres_fe.h. This makes it easier to build
> > frontend programs which have to do the hack.
>
> This patch seems unnecessary given that we already put a version of Assert()
> into postgres_fe.h.  I don't think that moving the two different
> definitions into an #if block in one file is an improvement.  If that
> were an improvement, we might as well move everything in both postgres.h
> and postgres_fe.h into c.h with a pile of #ifs.

The problem is that some (including existing) pieces of code need to
include postgres.h itself, those can't easily include postgres_fe.h as
well without getting into problems with redefinitions. It seems the most
consistent to move all of that into c.h, enough of the assertion stuff
is already there, I don't see an advantage of splitting it across 3
files as it currently is.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

08 January 2013, 19:35:17

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
>> This patch seems unnecessary given that we already put a version of Assert()
>> into postgres_fe.h.

> The problem is that some (including existing) pieces of code need to
> include postgres.h itself, those can't easily include postgres_fe.h as
> well without getting into problems with redefinitions.

There is no place, anywhere, that should be including both.  So I don't
see the problem.
        regards, tom lane

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 19:37:07

On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > From: Andres Freund <andres@anarazel.de>
> > relpathbackend() (via some of its wrappers) is used in *_desc routines which we
> > want to be useable without a backend environment arround.
>
> I'm 100% unimpressed with making relpathbackend return a pointer to a
> static buffer.  Who's to say whether that won't create bugs due to
> overlapping usages?

I say it ;). I've gone through all callers and checked. Not that that
guarantees anything, but ...

The reason a static buffer is better is that some of the *desc routines
use relpathbackend() and pfree() the result. That would require
providing a stub pfree() in xlogdump which seems to be exceedingly ugly.

> > Change signature to return a 'const char *' to make misuse easier to
> > detect.
>
> That seems to create way more churn than is necessary, and it's wrong
> anyway if the result is palloc'd.

It causes warnings in potential external users that pfree() the result
of relpathbackend which seems helpful. Obviously only makes sense if
relpathbackend() returns a pointer into a static buffer...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

08 January 2013, 19:53:35

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
>> I'm 100% unimpressed with making relpathbackend return a pointer to a
>> static buffer.  Who's to say whether that won't create bugs due to
>> overlapping usages?

> I say it ;). I've gone through all callers and checked. Not that that
> guarantees anything, but ...

Even if you've proven it safe today, it seems unnecessarily fragile.
Just about any other place we've used a static result buffer, we've
regretted it, unless the use cases were *very* narrow.

> The reason a static buffer is better is that some of the *desc routines
> use relpathbackend() and pfree() the result. That would require
> providing a stub pfree() in xlogdump which seems to be exceedingly ugly.

Why?  If we have palloc support how hard can it be to map pfree to free?
And why wouldn't we want to?  I can hardly imagine providing only palloc
and not pfree support.
        regards, tom lane

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 20:00:27

On 2013-01-08 14:53:29 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
> >> I'm 100% unimpressed with making relpathbackend return a pointer to a
> >> static buffer.  Who's to say whether that won't create bugs due to
> >> overlapping usages?
>
> > I say it ;). I've gone through all callers and checked. Not that that
> > guarantees anything, but ...
>
> Even if you've proven it safe today, it seems unnecessarily fragile.
> Just about any other place we've used a static result buffer, we've
> regretted it, unless the use cases were *very* narrow.

Hm, relpathbackend seems pretty narrow to me.

Funny, we both argued the other way round than we are now when talking
about the sprintf(..., "%X/%X", (uint32)(recptr >> 32), (uint32)recptr)
thingy ;)

> > The reason a static buffer is better is that some of the *desc routines
> > use relpathbackend() and pfree() the result. That would require
> > providing a stub pfree() in xlogdump which seems to be exceedingly ugly.
>
> Why?  If we have palloc support how hard can it be to map pfree to free?
> And why wouldn't we want to?  I can hardly imagine providing only palloc
> and not pfree support.

Uhm, we don't have & need palloc support and I don't think
relpathbackend() is a good justification for adding it.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
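
As a point of comparison for the "%X/%X" remark, here is a minimal sketch of the caller-supplied-buffer style, which needs neither a static result buffer nor an allocator. demo_format_lsn is a hypothetical helper invented for this note and does not exist in the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit WAL position as "hi/lo" in hex into a buffer owned by the
 * caller, so no static storage is involved. Returns buf for convenience.
 */
static char *
demo_format_lsn(uint64_t recptr, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%X",
             (unsigned int) (uint32_t) (recptr >> 32),
             (unsigned int) (uint32_t) recptr);
    return buf;
}
```

The price of this style is that every call site has to declare a buffer, which is the churn trade-off being argued about here.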

Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

08 January 2013, 20:18:19

On 2013-01-08 14:35:12 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> >> This patch seems unnecessary given that we already put a version of Assert()
> >> into postgres_fe.h.
> 
> > The problem is that some (including existing) pieces of code need to
> > include postgres.h itself, those can't easily include postgres_fe.h as
> > well without getting into problems with redefinitions.
> 
> There is no place, anywhere, that should be including both.  So I don't
> see the problem.

Sorry, misremembered the problem somewhat. The problem is that code that
includes postgres.h atm ends up with ExceptionalCondition() et
al. declared even if FRONTEND is defined. So if anything uses an assert
you need to provide wrappers for those which seems nasty. If they are
provided centrally and check for FRONTEND that problem doesn't exist.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
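
A rough sketch of what "provided centrally and check for FRONTEND" could look like. All names are hypothetical: DEMO_FRONTEND stands in for FRONTEND, DemoAssert for Assert, and demo_exceptional_condition for the backend's failure reporter. This is not the contents of c.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef DEMO_FRONTEND
/* Frontend build: fall back to the C library's assert(). */
#define DemoAssert(cond) assert(cond)
#else
/* Backend-style build: report through a dedicated function, then abort. */
static void
demo_exceptional_condition(const char *cond, const char *file, int line)
{
    fprintf(stderr, "failed assertion \"%s\" at %s:%d\n", cond, file, line);
    abort();
}
#define DemoAssert(cond) \
    ((cond) ? (void) 0 : demo_exceptional_condition(#cond, __FILE__, __LINE__))
#endif

/* Shared code uses one macro and compiles in either environment. */
static int
demo_checked_div(int num, int den)
{
    DemoAssert(den != 0);
    return num / den;
}
```

The point of the sketch is only that the shared code never references the backend's reporting function directly, so a frontend program does not have to supply a stub for it.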

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

08 January 2013, 20:27:29

Andres Freund <andres@2ndquadrant.com> writes:
> Uhm, we don't have & need palloc support and I don't think
> relpathbackend() is a good justification for adding it.

I've said from the very beginning of this effort that it would be
impossible to share any meaningful amount of code between frontend and
backend environments without adding some sort of emulation of
palloc/pfree/elog.  I think this patch is just making the code uglier
and more fragile to put off the inevitable, and that we'd be better
served to bite the bullet and do that.
        regards, tom lane

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Alvaro Herrera

Date:

08 January 2013, 20:36:27

Andres Freund wrote:
> On 2013-01-08 14:35:12 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> > >> This patch seems unnecessary given that we already put a version of Assert()
> > >> into postgres_fe.h.
> >
> > > The problem is that some (including existing) pieces of code need to
> > > include postgres.h itself, those can't easily include postgres_fe.h as
> > > well without getting into problems with redefinitions.
> >
> > There is no place, anywhere, that should be including both.  So I don't
> > see the problem.
>
> Sorry, misremembered the problem somewhat. The problem is that code that
> includes postgres.h atm ends up with ExceptionalCondition() et
> al. declared even if FRONTEND is defined. So if anything uses an assert
> you need to provide wrappers for those which seems nasty. If they are
> provided centrally and check for FRONTEND that problem doesn't exist.

I think the right fix here is to fix things so that postgres.h is not
necessary.  How hard is that?  Maybe it just requires some more
reshuffling of xlog headers.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Alvaro Herrera

Date:

08 January 2013, 20:38:26

Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Uhm, we don't have & need palloc support and I don't think
> > relpathbackend() is a good justification for adding it.
>
> I've said from the very beginning of this effort that it would be
> impossible to share any meaningful amount of code between frontend and
> backend environments without adding some sort of emulation of
> palloc/pfree/elog.  I think this patch is just making the code uglier
> and more fragile to put off the inevitable, and that we'd be better
> served to bite the bullet and do that.

As far as this patch is concerned, I think it's sufficient to do

#define palloc(x) malloc(x)
#define pfree(x) free(x)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 20:40:07

On 2013-01-08 15:27:23 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Uhm, we don't have & need palloc support and I don't think
> > relpathbackend() is a good justification for adding it.
> 
> I've said from the very beginning of this effort that it would be
> impossible to share any meaningful amount of code between frontend and
> backend environments without adding some sort of emulation of
> palloc/pfree/elog.  I think this patch is just making the code uglier
> and more fragile to put off the inevitable, and that we'd be better
> served to bite the bullet and do that.

If you think relpathbackend() alone warrants that, yes, I can provide a
wrapper. Everything else is imo already handled in a sensible and not
really ugly manner? Imo it's not worth the effort *for this alone*.

I already had some elog(), ereport(), whatever emulation but Heikki
preferred not to have it, so it's removed by now.

To what extent do you want palloc et al. emulation? Provide actual pools
or just redirect to malloc and provide the required symbols (at the
very least CurrentMemoryContext)?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

08 January 2013, 20:42:14

On 2013-01-08 17:36:19 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-01-08 14:35:12 -0500, Tom Lane wrote:
> > > Andres Freund <andres@2ndquadrant.com> writes:
> > > > On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> > > >> This patch seems unnecessary given that we already put a version of Assert()
> > > >> into postgres_fe.h.
> > >
> > > > The problem is that some (including existing) pieces of code need to
> > > > include postgres.h itself, those can't easily include postgres_fe.h as
> > > > well without getting into problems with redefinitions.
> > >
> > > There is no place, anywhere, that should be including both.  So I don't
> > > see the problem.
> >
> > Sorry, misremembered the problem somewhat. The problem is that code that
> > includes postgres.h atm ends up with ExceptionalCondition() et
> > al. declared even if FRONTEND is defined. So if anything uses an assert
> > you need to provide wrappers for those which seems nasty. If they are
> > provided centrally and check for FRONTEND that problem doesn't exist.
>
> I think the right fix here is to fix things so that postgres.h is not
> necessary.  How hard is that?  Maybe it just requires some more
> reshuffling of xlog headers.

I don't really think that's realistic. Think the rmgrdesc/* files and
xlogreader.c itself...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

08 January 2013, 20:45:13

Andres Freund <andres@2ndquadrant.com> writes:
> To what extent do you want palloc et al. emulation? Provide actual pools
> or just redirect to malloc and provide the required symbols (at the
> very least CurrentMemoryContext)?

I don't see any need for memory pools, at least not for frontend
applications of the currently envisioned levels of complexity.  I concur
with Alvaro's suggestion about just #define'ing them to malloc/free ---
or maybe better, pg_malloc/free so that we can have a failure-checking
wrapper.

Not sure how we ought to handle elog, but maybe we can put off that bit
of design until we have a more concrete use-case.
        regards, tom lane
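
A minimal sketch of the suggestion above, assuming a frontend-only build. The names `fe_malloc` and `fe_free` are hypothetical stand-ins for a pg_malloc-style failure-checking wrapper and are not taken from any posted patch:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure-checking allocator (stand-in for a pg_malloc-like wrapper). */
static void *
fe_malloc(size_t size)
{
    void       *ptr;

    /* malloc(0) may legally return NULL, so always request at least one byte */
    if (size == 0)
        size = 1;
    ptr = malloc(size);
    if (ptr == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ptr;
}

/* Hypothetical counterpart to fe_malloc(); tolerates NULL like free() does. */
static void
fe_free(void *ptr)
{
    if (ptr != NULL)
        free(ptr);
}

/* In a frontend-only build, backend-style calls map onto the wrappers. */
#define palloc(sz)  fe_malloc(sz)
#define pfree(p)    fe_free(p)
```

With this in place, shared code that calls palloc()/pfree() never sees a NULL result in the frontend; the process simply exits on allocation failure.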

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Heikki Linnakangas

Date:

08 January 2013, 20:47:43

On 08.01.2013 22:39, Andres Freund wrote:
> On 2013-01-08 15:27:23 -0500, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> Uhm, we don't have & need palloc support and I don't think
>>> relpathbackend() is a good justification for adding it.
>>
>> I've said from the very beginning of this effort that it would be
>> impossible to share any meaningful amount of code between frontend and
>> backend environments without adding some sort of emulation of
>> palloc/pfree/elog.  I think this patch is just making the code uglier
>> and more fragile to put off the inevitable, and that we'd be better
>> served to bite the bullet and do that.
>
> If you think relpathbackend() alone warrants that, yes, I can provide a
> wrapper. Everything else is imo already handled in a sensible and not
> really ugly manner? Imo its not worth the effort *for this alone*.
>
> I already had some elog(), ereport(), whatever emulation but Heikki
> preferred not to have it, so its removed by now.
>
> To what extent do you want palloc et al. emulation? Provide actual pools
> or just make redirect to malloc and provide the required symbols (at the
> very least CurrentMemoryContext)?

Note that the xlogreader facility doesn't need any more emulation. I'm 
quite satisfied with that part of the patch now. However, the rmgr desc 
routines do, and I'm not very happy with those. Not sure what to do 
about it. As you said, we could add enough infrastructure to make them 
compile in frontend context, or we could try to make them rely less on 
backend functions.

One consideration is that once we have a separate pg_xlogdump utility, I 
expect that to be the most visible consumer of the rmgr desc functions. 
The backend will of course still use those functions in e.g error 
messages, but those don't happen very often. Not sure how that should be 
taken into account in this patch, but I thought I'd mention it.

An rmgr desc routine probably shouldn't be calling elog() even in the 
backend, because you don't want to throw errors in the contexts where 
those routines are called.

- Heikki

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

08 January 2013, 20:55:28

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Andres Freund wrote:
>> Sorry, misremembered the problem somewhat. The problem is that code that
>> includes postgres.h atm ends up with ExceptionalCondition() et
>> al. declared even if FRONTEND is defined. So if anything uses an assert
>> you need to provide wrappers for those which seems nasty. If they are
>> provided centrally and check for FRONTEND that problem doesn't exist.

> I think the right fix here is to fix things so that postgres.h is not
> necessary.  How hard is that?  Maybe it just requires some more
> reshuffling of xlog headers.

That would definitely be the ideal fix.  In general, #include'ing
postgres.h into code that's not backend code opens all kinds of hazards
that are likely to bite us sooner or later; the incompatibility of the
Assert definitions is just the tip of that iceberg.  (In the past we've
had issues around <stdio.h> providing different definitions in the two
environments, for example.)

But having said that, I'm also now remembering that we have files in
src/port/ and probably elsewhere that try to work in both environments
by just #include'ing c.h directly (relying on the Makefile to supply
-DFRONTEND or not).  If we wanted to support Assert use in such files,
we would have to move at least the Assert() macro definitions into c.h
as Andres is proposing.  Now, I've always thought that #include'ing c.h
directly was kind of an ugly hack, but I don't have a better design than
that to offer ATM.
        regards, tom lane
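
A rough sketch of what a centrally provided, FRONTEND-aware assertion macro could look like. `SketchAssert` and `sketch_backend_assert_failed` are hypothetical names used only for illustration (the latter stands in for the backend's ExceptionalCondition()); the two `#define`s at the top simulate what the Makefile would normally pass via -D flags:

```c
#include <stdio.h>
#include <stdlib.h>

/* Normally supplied by the build system; defined here only for the demo. */
#ifndef FRONTEND
#define FRONTEND 1
#endif
#ifndef USE_ASSERT_CHECKING
#define USE_ASSERT_CHECKING 1
#endif

#ifndef USE_ASSERT_CHECKING
/* Assertions disabled: compile to nothing in both environments. */
#define SketchAssert(cond)  ((void) 0)
#elif defined(FRONTEND)
/* Frontend: no backend error machinery, so report and abort directly. */
#define SketchAssert(cond) \
    do { \
        if (!(cond)) \
        { \
            fprintf(stderr, "assertion \"%s\" failed: file \"%s\", line %d\n", \
                    #cond, __FILE__, __LINE__); \
            abort(); \
        } \
    } while (0)
#else
/* Backend: hand off to a (hypothetical) backend-side failure handler. */
extern void sketch_backend_assert_failed(const char *cond,
                                         const char *file, int line);
#define SketchAssert(cond) \
    do { \
        if (!(cond)) \
            sketch_backend_assert_failed(#cond, __FILE__, __LINE__); \
    } while (0)
#endif

/* Code shared between both environments can now assert without wrappers. */
static int
sketch_checked_add(int a, int b)
{
    SketchAssert(a >= 0 && b >= 0);
    return a + b;
}
```

The point of the sketch is only that a file including the common header gets a working assertion in either environment, without each frontend program having to supply its own failure handler.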

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 21:00:18

On 2013-01-08 22:47:43 +0200, Heikki Linnakangas wrote:
> On 08.01.2013 22:39, Andres Freund wrote:
> >On 2013-01-08 15:27:23 -0500, Tom Lane wrote:
> >>Andres Freund <andres@2ndquadrant.com> writes:
> >>>Uhm, we don't have & need palloc support and I don't think
> >>>relpathbackend() is a good justification for adding it.
> >>
> >>I've said from the very beginning of this effort that it would be
> >>impossible to share any meaningful amount of code between frontend and
> >>backend environments without adding some sort of emulation of
> >>palloc/pfree/elog.  I think this patch is just making the code uglier
> >>and more fragile to put off the inevitable, and that we'd be better
> >>served to bite the bullet and do that.
> >
> >If you think relpathbackend() alone warrants that, yes, I can provide a
> >wrapper. Everything else is imo already handled in a sensible and not
> >really ugly manner? Imo its not worth the effort *for this alone*.
> >
> >I already had some elog(), ereport(), whatever emulation but Heikki
> >preferred not to have it, so its removed by now.
> >
> >To what extent do you want palloc et al. emulation? Provide actual pools
> >or just make redirect to malloc and provide the required symbols (at the
> >very least CurrentMemoryContext)?
> 
> Note that the xlogreader facility doesn't need any more emulation. I'm quite
> satisfied with that part of the patch now. However, the rmgr desc routines
> do, and I'm not very happy with those. Not sure what to do about it. As you
> said, we could add enough infrastructure to make them compile in frontend
> context, or we could try to make them rely less on backend functions.

Which emulation needs are you missing in the desc routines besides
relpathbackend() and timestamptz_to_str()? pfree() is "just" needed
because its used to free the result of relpathbackend(). No own pallocs,
no ereport ...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Heikki Linnakangas

Date:

08 January 2013, 21:02:16

On 08.01.2013 23:00, Andres Freund wrote:
>> Note that the xlogreader facility doesn't need any more emulation. I'm quite
>> satisfied with that part of the patch now. However, the rmgr desc routines
>> do, and I'm not very happy with those. Not sure what to do about it. As you
>> said, we could add enough infrastructure to make them compile in frontend
>> context, or we could try to make them rely less on backend functions.
>
> Which emulation needs are you missing in the desc routines besides
> relpathbackend() and timestamptz_to_str()? pfree() is "just" needed
> because its used to free the result of relpathbackend(). No own pallocs,
> no ereport ...

Nothing besides those, those are the stuff I was referring to.

- Heikki

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 21:09:12

On 2013-01-08 15:45:07 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > To what extent do you want palloc et al. emulation? Provide actual pools
> > or just make redirect to malloc and provide the required symbols (at the
> > very least CurrentMemoryContext)?
> 
> I don't see any need for memory pools, at least not for frontend
> applications of the currently envisioned levels of complexity.  I concur
> with Alvaro's suggestion about just #define'ing them to malloc/free ---
> or maybe better, pg_malloc/free so that we can have a failure-checking
> wrapper.

Unless we want to introduce those into common headers, its more complex
than #define's, you actually need to provide at least
palloc/pfree/CurrentMemoryContext symbols.

Still seems like a shame to do that for one lonely pfree() (+ something for
an eventual own implementation of relpathbackend()).

> Not sure how we ought to handle elog, but maybe we can put off that bit
> of design until we have a more concrete use-case.

Agreed.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 21:13:21

On 2013-01-08 23:02:15 +0200, Heikki Linnakangas wrote:
> On 08.01.2013 23:00, Andres Freund wrote:
> >>Note that the xlogreader facility doesn't need any more emulation. I'm quite
> >>satisfied with that part of the patch now. However, the rmgr desc routines
> >>do, and I'm not very happy with those. Not sure what to do about it. As you
> >>said, we could add enough infrastructure to make them compile in frontend
> >>context, or we could try to make them rely less on backend functions.
> >
> >Which emulation needs are you missing in the desc routines besides
> >relpathbackend() and timestamptz_to_str()? pfree() is "just" needed
> >because its used to free the result of relpathbackend(). No own pallocs,
> >no ereport ...
>
> Nothing besides those, those are the stuff I was referring to.

Yea :(. As I said, I think it's reasonable to emulate the former but
timestamptz_to_str() seems to be too complex for that.

Extracting the whole datetime/timestamp handling into a backend
independent "module" seems like complex overkill to me although it might
be useful to reduce the duplication of all that code in ecpg. (which
deviated quite a bit by now). No idea how to solve that other than
returning placeholder data atm.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Robert Haas

Date:

08 January 2013, 22:28:38

On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Uhm, we don't have & need palloc support and I don't think
> relpathbackend() is a good justification for adding it.

FWIW, I'm with Tom on this one.  Any meaningful code sharing is going
to need that, so we might as well bite the bullet.

And functions that return static buffers are evil incarnate.  I've
spent way too much of my life dealing with the supreme idiocy that is
fmtId().  If someone ever finds a way to make that go away, I will buy
them a beverage of their choice at the next conference we're both at.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
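
To illustrate the hazard being described, here is a small sketch. `quote_static` is a hypothetical stand-in for an fmtId()-like helper; it always wraps the name in double quotes and does not implement real identifier quoting rules:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fmtId()-like helper: formats into one shared static buffer. */
static const char *
quote_static(const char *name)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "\"%s\"", name);
    return buf;                 /* every call returns the same storage */
}

/*
 * Builds "schema"."table". The first result must be copied away before
 * the second call, because that call overwrites the shared buffer.
 */
static void
qualified_name(char *dst, size_t dstlen, const char *schema, const char *table)
{
    char        first[64];

    snprintf(first, sizeof(first), "%s", quote_static(schema));
    snprintf(dst, dstlen, "%s.%s", first, quote_static(table));
}
```

Passing two quote_static() results directly to a single snprintf() call would print the same string twice, since both arguments point at the same buffer. Every caller has to remember the copy step, which is the maintenance burden complained about above.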

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Alvaro Herrera

Date:

08 January 2013, 22:37:10

Robert Haas escribió:

> And functions that return static buffers are evil incarnate.  I've
> spent way too much of my life dealing with the supreme idiocy that is
> fmtId().

+1

> If someone ever finds a way to make that go away, I will buy
> them a beverage of their choice at the next conference we're both at.

+1

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Andres Freund

Date:

08 January 2013, 22:40:40

On 2013-01-08 17:28:33 -0500, Robert Haas wrote:
> On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Uhm, we don't have & need palloc support and I don't think
> > relpathbackend() is a good justification for adding it.
>
> FWIW, I'm with Tom on this one.  Any meaningful code sharing is going
> to need that, so we might as well bite the bullet.

Yes, if we set the scope bigger than xlogreader I agree. If it's
xlogreader itself I don't. But as I happen to think we should share more
code...
Will prepare a patch.

I wonder whether it would be acceptable to make palloc() an actual
function instead of

#define palloc(sz)      MemoryContextAlloc(CurrentMemoryContext, (sz))

so we don't have to expose CurrentMemoryContext?

Alternatively we can "just" move the whole of utils/mmgr/* to port, but
that would imply an elog/ereport wrapper...

> And functions that return static buffers are evil incarnate.  I've
> spent way too much of my life dealing with the supreme idiocy that is
> fmtId().  If someone ever finds a way to make that go away, I will buy
> them a beverage of their choice at the next conference we're both at.

Imo it depends on the circumstances and number of possible callers, but
anyway, it seems to be already decided that my suggestion isn't the way
to go.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
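
A simplified sketch of the "palloc() as an actual function" idea raised above. All `sketch_` names and the `SketchContext` struct are hypothetical and far simpler than the real memory context machinery; the only point is that callers need the function prototype, not the context variable:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a memory context. */
typedef struct SketchContext
{
    size_t      total_allocated;    /* bytes requested so far */
} SketchContext;

static SketchContext sketch_top_context = {0};

/* Private to this file: callers of sketch_palloc() never reference it. */
static SketchContext *sketch_current_context = &sketch_top_context;

/* The only symbol that shared code needs to see. */
void       *sketch_palloc(size_t size);

static void *
sketch_context_alloc(SketchContext *context, size_t size)
{
    void       *ptr = malloc(size > 0 ? size : 1);

    if (ptr == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    context->total_allocated += size;
    return ptr;
}

/* A real function, instead of a macro that expands to the context variable. */
void *
sketch_palloc(size_t size)
{
    return sketch_context_alloc(sketch_current_context, size);
}
```

Under this arrangement a frontend library could, in principle, export the same function backed by plain malloc(), and code compiled once would link against either implementation.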

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

08 January 2013, 23:23:37

Robert Haas <robertmhaas@gmail.com> writes:
> And functions that return static buffers are evil incarnate.  I've
> spent way too much of my life dealing with the supreme idiocy that is
> fmtId().  If someone ever finds a way to make that go away, I will buy
> them a beverage of their choice at the next conference we're both at.

Yeah, that was exactly the case that was top-of-mind when I was
complaining about static return buffers upthread.

It's not hard to make the ugliness go away: just let it strdup its
return value.  The problem is that in the vast majority of usages it
wouldn't be convenient to free the result, so we'd have a good deal
of memory leakage.  What might be interesting is to instrument it to
see how much (adding a counter to the function ought to be easy enough)
and then find out whether it's an amount we still care about in 2013.
Frankly, pg_dump is a memory hog already - a few more identifier-sized
strings laying about might not matter anymore.

(Wanders away wondering how many relpathbackend callers bother to free
its result, and whether that matters either ...)
        regards, tom lane

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Robert Haas

Date:

09 January 2013, 00:52:45

On Tue, Jan 8, 2013 at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> And functions that return static buffers are evil incarnate.  I've
>> spent way too much of my life dealing with the supreme idiocy that is
>> fmtId().  If someone ever finds a way to make that go away, I will buy
>> them a beverage of their choice at the next conference we're both at.
>
> Yeah, that was exactly the case that was top-of-mind when I was
> complaining about static return buffers upthread.
>
> It's not hard to make the ugliness go away: just let it strdup its
> return value.  The problem is that in the vast majority of usages it
> wouldn't be convenient to free the result, so we'd have a good deal
> of memory leakage.  What might be interesting is to instrument it to
> see how much (adding a counter to the function ought to be easy enough)
> and then find out whether it's an amount we still care about in 2013.
> Frankly, pg_dump is a memory hog already - a few more identifier-sized
> strings laying about might not matter anymore.
>
> (Wanders away wondering how many relpathbackend callers bother to free
> its result, and whether that matters either ...)

I was thinking more about a sprintf()-type function that only
understands a handful of escapes, but adds the additional and novel
escapes %I (quote as identifier) and %L (quote as literal).  I think
that would allow a great deal of code simplification, and it'd be more
efficient, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
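
A rough sketch of such a formatter, under several simplifying assumptions: `sketch_format` and its helpers are hypothetical names, output goes to a caller-supplied buffer, only %s, %I, %L and %% are understood, %I always quotes (rather than only when needed), and %L merely doubles single quotes without considering backslashes or encodings:

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if room remains (keeping space for the NUL); always count it. */
static void
put_char(char *dst, size_t dstlen, size_t *pos, char c)
{
    if (*pos + 1 < dstlen)
        dst[*pos] = c;
    (*pos)++;
}

/* Emit s wrapped in the quote character q, doubling any embedded q. */
static void
put_quoted(char *dst, size_t dstlen, size_t *pos, const char *s, char q)
{
    put_char(dst, dstlen, pos, q);
    for (; *s; s++)
    {
        if (*s == q)
            put_char(dst, dstlen, pos, q);
        put_char(dst, dstlen, pos, *s);
    }
    put_char(dst, dstlen, pos, q);
}

/* Returns the length the full result needs, like snprintf(). */
static size_t
sketch_format(char *dst, size_t dstlen, const char *fmt, ...)
{
    va_list     args;
    size_t      pos = 0;
    const char *s;

    va_start(args, fmt);
    for (; *fmt; fmt++)
    {
        if (*fmt != '%')
        {
            put_char(dst, dstlen, &pos, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;              /* stray '%' at end of format: stop */
        switch (*fmt)
        {
            case 's':           /* plain string, copied verbatim */
                for (s = va_arg(args, const char *); *s; s++)
                    put_char(dst, dstlen, &pos, *s);
                break;
            case 'I':           /* quote as identifier */
                put_quoted(dst, dstlen, &pos, va_arg(args, const char *), '"');
                break;
            case 'L':           /* quote as literal */
                put_quoted(dst, dstlen, &pos, va_arg(args, const char *), '\'');
                break;
            default:            /* "%%" and unknown escapes: emit the character */
                put_char(dst, dstlen, &pos, *fmt);
                break;
        }
    }
    va_end(args);

    if (dstlen > 0)
        dst[pos < dstlen ? pos : dstlen - 1] = '\0';
    return pos;
}
```

Because the quoting happens while the output is assembled, no intermediate quoted string has to be returned to the caller, so neither a static buffer nor a leaked allocation is involved for these call sites.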

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Tom Lane

Date:

09 January 2013, 00:57:58

Robert Haas <robertmhaas@gmail.com> writes:
> I was thinking more about a sprintf()-type function that only
> understands a handful of escapes, but adds the additional and novel
> escapes %I (quote as identifier) and %L (quote as literal).  I think
> that would allow a great deal of code simplification, and it'd be more
> efficient, too.

Seems like a great idea.  Are you offering to code it?

Note that this wouldn't entirely fix the fmtId problem, as not all the
uses of fmtId are directly in sprintf calls.  Still, it might get rid of
most of the places where it'd be painful to avoid a memory leak with
a strdup'ing version of fmtId.
        regards, tom lane

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Robert Haas

Date:

09 January 2013, 01:02:19

On Tue, Jan 8, 2013 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I was thinking more about a sprintf()-type function that only
>> understands a handful of escapes, but adds the additional and novel
>> escapes %I (quote as identifier) and %L (quote as literal).  I think
>> that would allow a great deal of code simplification, and it'd be more
>> efficient, too.
>
> Seems like a great idea.  Are you offering to code it?

Not imminently.

> Note that this wouldn't entirely fix the fmtId problem, as not all the
> uses of fmtId are directly in sprintf calls.  Still, it might get rid of
> most of the places where it'd be painful to avoid a memory leak with
> a strdup'ing version of fmtId.

Yeah, I didn't think about that.  Might be worth a look to see how
comprehensively it would solve the problem.  But I'll have to leave
that for another day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[PATCH 2/2] use pg_malloc instead of an unchecked malloc in pg_resetxlog

From

Andres Freund

Date:

09 January 2013, 11:27:26

---
 src/bin/pg_resetxlog/pg_resetxlog.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Attachment

0002-use-pg_malloc-instead-of-an-unchecked-malloc-in-pg_r.patch

[PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 11:27:32

Hi,

As promised here's a patch to provide palloc emulation for frontend-ish
environments.

The patch:
- makes palloc() into a real function so CurrentMemoryContext doesn't need to be provided
- provides common pg_(malloc, malloc0, realloc, strdup, free) wrappers and removes various versions of those across
  different utilities
- removes ugly palloc redefinery for frontend use of backend code (dirmod.c)

Controversial/Unclear things:
- palloc[0] are currently copies of the MemoryContextAlloc[Zero] functions to preclude performance regressions, imo the
  level of duplication is ok though
- the common memory management is implemented in [pg]port/palloc.[ch], I am not too happy with the name and location
- pgport/palloc.c is only built in the backend, not sure if there is a nicer way to do this from a make POV
- the different versions of pg_malloc et al used different error signaling methods, I've settled on
      fprintf(stderr, _("out of memory\n"));
      exit(EXIT_FAILURE);

Results in a nice net removal of code:
 37 files changed, 218 insertions(+), 621 deletions(-)

[PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 11:27:56

---
 contrib/oid2name/oid2name.c            |  52 +------------
 contrib/pg_upgrade/pg_upgrade.h        |   5 +-
 contrib/pg_upgrade/util.c              |  49 -------------
 contrib/pgbench/pgbench.c              |  54 +-------------
 src/backend/utils/mmgr/mcxt.c          |  78 +++++++++++---------
 src/bin/initdb/initdb.c                |  40 +---------
 src/bin/pg_basebackup/pg_basebackup.c  |   2 +-
 src/bin/pg_basebackup/pg_receivexlog.c |   1 +
 src/bin/pg_basebackup/receivelog.c     |   1 +
 src/bin/pg_basebackup/streamutil.c     |  38 +---------
 src/bin/pg_basebackup/streamutil.h     |   4 -
 src/bin/pg_ctl/pg_ctl.c                |  39 +---------
 src/bin/pg_dump/Makefile               |   6 +-
 src/bin/pg_dump/common.c               |   1 -
 src/bin/pg_dump/compress_io.c          |   1 -
 src/bin/pg_dump/dumpmem.c              |  76 -------------------
 src/bin/pg_dump/dumpmem.h              |  22 ------
 src/bin/pg_dump/dumputils.h            |   1 +
 src/bin/pg_dump/pg_backup_archiver.c   |   1 -
 src/bin/pg_dump/pg_backup_custom.c     |   2 +-
 src/bin/pg_dump/pg_backup_db.c         |   1 -
 src/bin/pg_dump/pg_backup_directory.c  |   1 -
 src/bin/pg_dump/pg_backup_null.c       |   1 -
 src/bin/pg_dump/pg_backup_tar.c        |   1 -
 src/bin/pg_dump/pg_dump.c              |   1 -
 src/bin/pg_dump/pg_dump_sort.c         |   1 -
 src/bin/pg_dump/pg_dumpall.c           |   1 -
 src/bin/pg_dump/pg_restore.c           |   1 -
 src/bin/psql/common.c                  |  50 -------------
 src/bin/psql/common.h                  |  10 +--
 src/bin/scripts/common.c               |  49 -------------
 src/bin/scripts/common.h               |   5 +-
 src/include/port/palloc.h              |  19 +++++
 src/include/utils/palloc.h             |  12 +--
 src/port/Makefile                      |   8 +-
 src/port/dirmod.c                      |  75 +------------------
 src/port/palloc.c                      | 130 +++++++++++++++++++++++++++++++++
 37 files changed, 218 insertions(+), 621 deletions(-)
 delete mode 100644 src/bin/pg_dump/dumpmem.c
 delete mode 100644 src/bin/pg_dump/dumpmem.h
 create mode 100644 src/include/port/palloc.h
 create mode 100644 src/port/palloc.c

Attachment

0001-Provide-a-common-malloc-wrappers-and-palloc-et-al.-e.patch

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Heikki Linnakangas

Date:

09 January 2013, 11:46:48

On 09.01.2013 13:27, Andres Freund wrote:
> - makes palloc() into a real function so CurrentMemoryContext doesn't
>    need to be provided

I don't understand the need for this change. Can't you just:

#define palloc(s) pg_malloc(s)

in the frontend context?

- Heikki

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 11:56:51

On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote:
> On 09.01.2013 13:27, Andres Freund wrote:
> >- makes palloc() into a real function so CurrentMemoryContext doesn't
> >   need to be provided
>
> I don't understand the need for this change. Can't you just:
>
> #define palloc(s) pg_malloc(s)
>
> in the frontend context?

Yes, that would be possible, but imo its the inferior solution:
* it precludes ever sharing code without compiling twice
* removing allows us to get rid of the following ugliness in dirmod.c:
-#ifndef FRONTEND
-
-/*
- * On Windows, call non-macro versions of palloc; we can't reference
- * CurrentMemoryContext in this file because of PGDLLIMPORT conflict.
- */
-#if defined(WIN32) || defined(__CYGWIN__)
-#undef palloc
-#undef pstrdup
-#define palloc(sz)     pgport_palloc(sz)
-#define pstrdup(str)   pgport_pstrdup(str)
-#endif
-#else                          /* FRONTEND */
-
* it opens the window for moving more stuff from utils/palloc.h to memutils.h

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Magnus Hagander

Date:

09 January 2013, 12:34:16

Am I the only one who finds this way of posting patches really annoying?

Here is a patch with no description other than a list of changed
files. And discussion happens in a completely different email.

What's wrong with just posting the patch as a regular attachment(s) to
a regular thread, like other people do?

Yes, I'm well aware that some mailers thread them in right. Our
archives code does. But many MUAs don't. But even when using a MUA
that threads it correctly, I find it quite annoying that you have to
open one email to read the comments and a different email to view the
patch.

It may be just me. But it may be others as well, so I figured I should
raise the issue :)

//Magnus


On Wed, Jan 9, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> ---
>  contrib/oid2name/oid2name.c            |  52 +------------
>  contrib/pg_upgrade/pg_upgrade.h        |   5 +-
>  contrib/pg_upgrade/util.c              |  49 -------------
>  contrib/pgbench/pgbench.c              |  54 +-------------
>  src/backend/utils/mmgr/mcxt.c          |  78 +++++++++++---------
>  src/bin/initdb/initdb.c                |  40 +---------
>  src/bin/pg_basebackup/pg_basebackup.c  |   2 +-
>  src/bin/pg_basebackup/pg_receivexlog.c |   1 +
>  src/bin/pg_basebackup/receivelog.c     |   1 +
>  src/bin/pg_basebackup/streamutil.c     |  38 +---------
>  src/bin/pg_basebackup/streamutil.h     |   4 -
>  src/bin/pg_ctl/pg_ctl.c                |  39 +---------
>  src/bin/pg_dump/Makefile               |   6 +-
>  src/bin/pg_dump/common.c               |   1 -
>  src/bin/pg_dump/compress_io.c          |   1 -
>  src/bin/pg_dump/dumpmem.c              |  76 -------------------
>  src/bin/pg_dump/dumpmem.h              |  22 ------
>  src/bin/pg_dump/dumputils.h            |   1 +
>  src/bin/pg_dump/pg_backup_archiver.c   |   1 -
>  src/bin/pg_dump/pg_backup_custom.c     |   2 +-
>  src/bin/pg_dump/pg_backup_db.c         |   1 -
>  src/bin/pg_dump/pg_backup_directory.c  |   1 -
>  src/bin/pg_dump/pg_backup_null.c       |   1 -
>  src/bin/pg_dump/pg_backup_tar.c        |   1 -
>  src/bin/pg_dump/pg_dump.c              |   1 -
>  src/bin/pg_dump/pg_dump_sort.c         |   1 -
>  src/bin/pg_dump/pg_dumpall.c           |   1 -
>  src/bin/pg_dump/pg_restore.c           |   1 -
>  src/bin/psql/common.c                  |  50 -------------
>  src/bin/psql/common.h                  |  10 +--
>  src/bin/scripts/common.c               |  49 -------------
>  src/bin/scripts/common.h               |   5 +-
>  src/include/port/palloc.h              |  19 +++++
>  src/include/utils/palloc.h             |  12 +--
>  src/port/Makefile                      |   8 +-
>  src/port/dirmod.c                      |  75 +------------------
>  src/port/palloc.c                      | 130 +++++++++++++++++++++++++++++++++
>  37 files changed, 218 insertions(+), 621 deletions(-)
>  delete mode 100644 src/bin/pg_dump/dumpmem.c
>  delete mode 100644 src/bin/pg_dump/dumpmem.h
>  create mode 100644 src/include/port/palloc.h
>  create mode 100644 src/port/palloc.c
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 12:47:40

On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
> Am I the only one who finds this way of posting patches really annoying?

Well, I unsurprisingly don't ;)

> Here is a patch with no description other than a list of changed
> files. And discussion happens in a completely different email.

They contain the commit message - which in most of the cases is more
informative than the one just posted, which was definitely rather
short. It should look like in e.g.
http://archives.postgresql.org/message-id/1352942234-3953-11-git-send-email-andres%402ndquadrant.com

> What's wrong with just posting the patch as a regular attachment(s) to
> a regular thread, like other people do?

Two issues:
- If you have a bigger series of patches (like the whole logical
  decoding thing) posting all patches in a single mail makes the
  following thread even harder to follow than it's currently the
  case. Note how even in this, far smaller, case the discussion actually
  happened in the appropriate subthreads. I find it much easier to
  reread through an old thread that way to reassure myself what was
  discussed.
- mhonarc does really strange things if you attach two git created patches (splits them into multiple mails)

> It may be just me. But it may be others as well, so I figured I should
> raise the issue :)

I am happy to comply with whatever others prefer.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Magnus Hagander

Date:

09 January 2013, 12:54:08

On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
>> Am I the only one who finds this way of posting patches really annoying?
>
> Well, I unsurprisingly don't ;)

Yeah, that's not surprising :)

>> Here is a patch with no description other than a list of changed
>> files. And discussion happens in a completely different email.
>
> They contain the commit message - which in most of the cases is more
> informative than the one just posted, which was definitely rather
> short. It should look like in e.g.
>  http://archives.postgresql.org/message-id/1352942234-3953-11-git-send-email-andres%402ndquadrant.com

They are really two different issues - the posting a patch without a
description, and the separation of threads. It's when they are
combined together that it becomes *really* annoying :) When it's posted
as a separate email *with* a better commit message it's at least
easier to start a discussion off it. But I still find it much more
annoying than just posting the patch in-thread.

>> What's wrong with just posting the patch as a regular attachment(s) to
>> a regular thread, like other people do?
>
> Two issues:
> - If you have a bigger series of patches (like the whole logical
>   decoding thing) posting all patches in a single mail makes the
>   following thread even harder to follow than its currently the
>   case. Note how even in this, far smaller, case the discussion actually
>   happened in the appropriate subthreads. I find it way much easier to
>   reread through an old thread that way to reassure myself what was
>   discussed.

Yes. So one thread per patch. That's what you already have. That's not
a factor of how the patches are posted, that's just a factor of how
many threads you break it up in. I can agree that posting 20 different
patches in the same thread is even worse :)

> - mhonarc does really strange things if you attach two git created
>   patches (splits them into multiple mails)

mhonarc does a lot of strange things. But this part is actually not
mhonarc's fault - it's majordomo that writes them into an mbox file in
a format that you can't see the difference between the patch and the
different message. Heck, it quite often gets it wrong even if you just
post *one* patch when it's generated by git.

This is handled better by the new archives code.

>> It may be just me. But it may be others as well, so I figured I should
>> raise the issue :)
>
> I am happy to comply with whatever others prefer.

Yeah, so far it's also just my opinion in the other direction :)
Hopefully, some others will have thoughts about it too.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Michael Paquier

Date:

09 January 2013, 13:23:30

On Wed, Jan 9, 2013 at 9:54 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> Yeah, so far it's also just my opinion in the other direction :)
> Hopefully, some others will have thoughts about it too.

Just giving my 2c here...

Instead of posting 5~7 patches at the same time, why not limit the number of patches published at once to a lower number (max 2~3)? The logical replication implementation can surely be broken down into many more pieces that could be reviewed carefully one by one, and in a way that would make the implementation steps clearer than they are now for everyone on this ML.

OK, this would make the review process longer, but the upside is that hackers who specialize in only some areas of the PG code would be able to give valuable feedback.
--
Michael Paquier
http://michael.otacoo.com

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 13:58:17

On 2013-01-09 22:23:25 +0900, Michael Paquier wrote:
> On Wed, Jan 9, 2013 at 9:54 PM, Magnus Hagander <magnus@hagander.net> wrote:
> 
> > On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >
> Yeah, so far it's also just my opinion in the other direction :)
> > Hopefully, some others will have thoughts about it too.
> >
> Just giving my 2c here...
> 
> Instead of posting multiple 5~7 patches at the same time, why not limiting
> the number of patches published at the same time to a lower number (max
> 2~3)?

I tried to do this. ilist, binaryheap and this thread... ;)

> The logical replication implementation can be surely broken down into
> many more pieces that could be reviewed carefully one by one, and in a way
> that would make the implementation steps clearer than it is now for all the
> people of this ML.

I don't really see any additional useful splits from the ones made last
round. The relfilenode stuff could be submitted separately but that's
about it and it's seemingly not all that useful itself (I personally want
the (tablespace, relfilenode) => reloid mapping function independently,
but I seem to be alone). It's hard to get useful review for patches which
don't have a patch using the facility nearby in my experience.

Where do you see a useful split?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Alvaro Herrera

Date:

09 January 2013, 14:45:19

How hard is the backend hit by palloc being now an additional function
call?  Would it be a good idea to make it (and friends) STATIC_IF_INLINE?

> diff --git a/src/include/port/palloc.h b/src/include/port/palloc.h
> new file mode 100644
> index 0000000..a7900bf
> --- /dev/null
> +++ b/src/include/port/palloc.h
> @@ -0,0 +1,19 @@
> +/*
> + *    common.h
> + *        Common support routines for bin/scripts/
> + *
> + *    Copyright (c) 2003-2013, PostgreSQL Global Development Group
> + *
> + *    src/bin/scripts/common.h
> + */

You forgot to update the above comment.


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 14:58:36

On 2013-01-09 11:45:12 -0300, Alvaro Herrera wrote:
>
> How hard is the backend hit by palloc being now an additional function
> call?  Would it be a good idea to make it (and friends) STATIC_IF_INLINE?
>
> > diff --git a/src/include/port/palloc.h b/src/include/port/palloc.h
> > new file mode 100644
> > index 0000000..a7900bf
> > --- /dev/null
> > +++ b/src/include/port/palloc.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + *    common.h
> > + *        Common support routines for bin/scripts/
> > + *
> > + *    Copyright (c) 2003-2013, PostgreSQL Global Development Group
> > + *
> > + *    src/bin/scripts/common.h
> > + */
>
> You forgot to update the above comment.

*headbang*. Gah. I hate these comments... Will update, but won't send
anything again until more changes have accumulated.

Thanks!

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 15:18:56

On 2013-01-09 11:45:12 -0300, Alvaro Herrera wrote:
> 
> How hard is the backend hit by palloc being now an additional function
> call?  Would it be a good idea to make it (and friends) STATIC_IF_INLINE?

Missed this at first...

I don't think there's any measurable hit now, as there is no additional
function call: I chose to do the work directly in
palloc[0]. That causes a minor amount of code duplication, but imo that's
ok. I didn't do that for pstrdup(); it would be easy enough to do it
there as well, but I don't think it matters there.

In a quick test I couldn't find any performance difference.

I don't think making them STATIC_IF_INLINE functions would help as I
think that would still require the CurrentMemoryContext symbol to be
visible in the callers context.

FWIW I previously measured whether the function call overhead via mcxt.c
is measurable and it wasn't...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Tom Lane

Date:

09 January 2013, 16:27:53

Magnus Hagander <magnus@hagander.net> writes:
> On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
>>> Am I the only one who finds this way of posting patches really annoying?

>> Well, I unsurprisingly don't ;)

> Yeah, that's not surprising :)

I'm with Magnus.  This is seriously annoying, especially when the
"discussion" thread has a title not closely related to the "patch"
emails.  (It doesn't help any that the list server doesn't manage to
deliver the emails in order, at least not to me --- more often than
not, they're spread out over a few minutes and interleaved with other
messages.)

I also don't find the argument that the commit messages are a substitute
for patch descriptions to be worth anything.  Commit messages are, or
should be, for a different audience: they are for whoever writes the
release notes, or for historical purposes when someone is looking for
"which patches touched a particular area?".  That's not the same as
explaining/justifying the patch for review purposes.  Reviewers want
a lot more depth than is appropriate in a commit message.  (TBH, I rarely
use any submitter's proposed commit message anyway, but that's just me.)

I'd prefer posting a single message with the discussion and the
patch(es).  If you think it's helpful to split a patch into separate
parts for reviewing, add multiple attachments.  But my experience is
that such separation isn't nearly as useful as you seem to think.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 16:37:52

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote:
>> I don't understand the need for this change. Can't you just:
>> #define palloc(s) pg_malloc(s)
>> in the frontend context?

> Yes, that would be possible, but imo its the inferior solution:

I'm with Heikki: in fact, I will go farther and say that this approach
is DOA.  The only way you get to change the backend's definition of
palloc is if you can show that doing so yields a performance boost
in the backend.  Otherwise, we use a macro or whatever is necessary
to allow frontend code to have a different underlying implementation
of palloc(x).
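
For illustration, the macro approach Heikki suggests might look like the sketch below. It is only a sketch under stated assumptions: the pg_malloc defined here is a stand-in written for this example rather than the real frontend helper, copy_string is a hypothetical caller, and mapping pfree onto free() with a macro is likewise an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for a frontend pg_malloc() (assumption: the real helper may
 * differ). It never returns NULL; it exits on allocation failure.
 */
static void *
pg_malloc(size_t size)
{
    /* malloc(0) is implementation-defined, so always request >= 1 byte */
    void   *p = malloc(size ? size : 1);

    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Frontend-only mapping of the backend names (hypothetical). */
#define palloc(s)   pg_malloc(s)
#define pfree(p)    free(p)

/* Hypothetical shared routine written against palloc() only. */
static char *
copy_string(const char *in)
{
    size_t  len = strlen(in) + 1;
    char   *out = (char *) palloc(len);

    memcpy(out, in, len);
    return out;
}
```

Code written against palloc()/pfree() then compiles unchanged in a frontend program, at the price of compiling any shared file once per environment.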

> * it precludes ever sharing code without compiling twice

That's never going to happen anyway, or at least there are plenty more
problems in the way of it.

> * removing allows us to get rid of the following ugliness in dirmod.c:

I agree that what dirmod is doing is pretty ugly, but it's localized
enough that I don't care particularly.  (Really, the only reason it's
a hack and not The Solution is that at the moment there's only one
file doing it that way.  A trampoline function for use only by code
that needs to work in both frontend and backend isn't an unreasonable
idea.)
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 16:42:39

On 2013-01-09 11:37:46 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote:
> >> I don't understand the need for this change. Can't you just:
> >> #define palloc(s) pg_malloc(s)
> >> in the frontend context?
>
> > Yes, that would be possible, but imo its the inferior solution:
>
> I'm with Heikki: in fact, I will go farther and say that this approach
> is DOA.  The only way you get to change the backend's definition of
> palloc is if you can show that doing so yields a performance boost
> in the backend.  Otherwise, we use a macro or whatever is necessary
> to allow frontend code to have a different underlying implementation
> of palloc(x).

What's the use case for being able to redefine it after the patch? It's not
like you can do anything interesting given that pfree() et al. are not
macros.

> > * removing allows us to get rid of the following ugliness in dirmod.c:
>
> I agree that what dirmod is doing is pretty ugly, but it's localized
> enough that I don't care particularly.  (Really, the only reason it's
> a hack and not The Solution is that at the moment there's only one
> file doing it that way.  A trampoline function for use only by code
> that needs to work in both frontend and backend isn't an unreasonable
> idea.)

Well, after the patch the trampoline function would be palloc() itself.

Greetings,

Andres Freund

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Peter Geoghegan

Date:

09 January 2013, 16:47:44

On 9 January 2013 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm with Magnus.  This is seriously annoying, especially when the
> "discussion" thread has a title not closely related to the "patch"
> emails.  (It doesn't help any that the list server doesn't manage to
> deliver the emails in order, at least not to me --- more often than
> not, they're spread out over a few minutes and interleaved with other
> messages.)

I agree. I actually go to some lengths to coordinate all of these
threads during review, and I'd prefer not to have to. I would have
said something, but it seemed unimportant (relative to the patchset
itself).

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Andres Freund

Date:

09 January 2013, 16:56:55

On 2013-01-09 11:27:46 -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
> >>> Am I the only one who finds this way of posting patches really annoying?
> 
> >> Well, I unsurprisingly don't ;)
> 
> > Yeah, that's not surprising :)
> 
> I'm with Magnus.  This is seriously annoying, especially when the
> "discussion" thread has a title not closely related to the "patch"
> emails.  (It doesn't help any that the list server doesn't manage to
> deliver the emails in order, at least not to me --- more often than
> not, they're spread out over a few minutes and interleaved with other
> messages.)

Ok, most seem to have a clear preference, so I won't do so anymore.

> I also don't find the argument that the commit messages are a substitute
> for patch descriptions to be worth anything.  Commit messages are, or
> should be, for a different audience: they are for whoever writes the
> release notes, or for historical purposes when someone is looking for
> "which patches touched a particular area?".  That's not the same as
> explaining/justifying the patch for review purposes.  Reviewers want
> a lot more depth than is appropriate in a commit message.

Agreed that they have different audiences.

> (TBH, I rarely use any submitter's proposed commit message anyway, but
> that's just me.)

I noticed ;)

> 
> I'd prefer posting a single message with the discussion and the
> patch(es).  If you think it's helpful to split a patch into separate
> parts for reviewing, add multiple attachments.  But my experience is
> that such separation isn't nearly as useful as you seem to think.

Well, would it have been better if xlog reading, ilist, binaryheap, this
cleanup, etc. had been in the same patch? They originated out of
the same work...
Even the split-up in this thread seems to have helped, as you've jumped on
the patches where you could give rather quick input (static
relpathbackend(), central Assert definitions), probably without having
read the xlogreader patch itself...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 16:57:41

Andres Freund <andres@anarazel.de> writes:
> On 2013-01-09 11:37:46 -0500, Tom Lane wrote:
>> I agree that what dirmod is doing is pretty ugly, but it's localized
>> enough that I don't care particularly.  (Really, the only reason it's
>> a hack and not The Solution is that at the moment there's only one
>> file doing it that way.  A trampoline function for use only by code
>> that needs to work in both frontend and backend isn't an unreasonable
>> idea.)

> Well, after the patch the trampoline function would be palloc() itself.

My objection is that you're forcing *all* backend code to go through
the trampoline.  That probably has a negative impact on performance, and
if you think it'll get committed without a thorough proof that there's
no such impact, you're mistaken.  Perhaps you're not aware of exactly
how hot a hot-spot palloc is?  It's not unlikely that this patch could
hurt backend performance by several percent all by itself.

Now on the other side of the coin, it's also possible that it's a wash,
particularly on machines where fetching a global variable is a
relatively expensive thing to do.  But the driving force behind any
proposed change to the backend's palloc mechanism has to be performance,
and nothing but.  Convenience for a patch such as this not only doesn't
trump performance, I'm unwilling to grant it any standing at all.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 17:02:55

On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2013-01-09 11:37:46 -0500, Tom Lane wrote:
> >> I agree that what dirmod is doing is pretty ugly, but it's localized
> >> enough that I don't care particularly.  (Really, the only reason it's
> >> a hack and not The Solution is that at the moment there's only one
> >> file doing it that way.  A trampoline function for use only by code
> >> that needs to work in both frontend and backend isn't an unreasonable
> >> idea.)
> 
> > Well, after the patch the trampoline function would be palloc() itself.
> 
> My objection is that you're forcing *all* backend code to go through
> the trampoline.  That probably has a negative impact on performance, and
> if you think it'll get committed without a thorough proof that there's
> no such impact, you're mistaken.  Perhaps you're not aware of exactly
> how hot a hot-spot palloc is?  It's not unlikely that this patch could
> hurt backend performance by several percent all by itself.

There is no additional overhead except some minor increase in code size.
If you look at the patch palloc() now simply directly calls
CurrentMemoryContext->methods->alloc. So there is no additional function
call relative to the previous state.
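
To make the distinction concrete, here is a hedged sketch of the two possible shapes of a palloc() function. The context struct is a cut-down mock written for this example (the real MemoryContextData has more fields); demo_alloc, demo_context, palloc_trampoline and palloc_direct are hypothetical names; and the size check and elog() call are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down mock of a memory context (assumption: not the real layout). */
typedef struct MemoryContextData MemoryContextData;

typedef struct MemoryContextMethods
{
    void   *(*alloc) (MemoryContextData *context, size_t size);
} MemoryContextMethods;

struct MemoryContextData
{
    const MemoryContextMethods *methods;
    int     isReset;
    size_t  nalloc;             /* demo-only counter of alloc calls */
};

/* Demo allocator: counts calls and defers to malloc(). */
static void *
demo_alloc(MemoryContextData *context, size_t size)
{
    context->nalloc++;
    return malloc(size);
}

static const MemoryContextMethods demo_methods = {demo_alloc};
static MemoryContextData demo_context = {&demo_methods, 1, 0};
static MemoryContextData *CurrentMemoryContext = &demo_context;

/* Allocator taking an explicit context. */
static void *
MemoryContextAlloc(MemoryContextData *context, size_t size)
{
    assert(context != NULL);
    context->isReset = 0;
    return context->methods->alloc(context, size);
}

/* Shape 1: a wrapper that forwards, adding one level of function call. */
static void *
palloc_trampoline(size_t size)
{
    return MemoryContextAlloc(CurrentMemoryContext, size);
}

/* Shape 2: the body is duplicated, so the call depth stays the same. */
static void *
palloc_direct(size_t size)
{
    assert(CurrentMemoryContext != NULL);
    CurrentMemoryContext->isReset = 0;
    return CurrentMemoryContext->methods->alloc(CurrentMemoryContext, size);
}
```

Both shapes end in the same indirect call through methods->alloc; the second trades a small amount of duplicated code for the missing call level, and reads the global CurrentMemoryContext several times instead of receiving it once as a parameter.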

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 17:32:23

Andres Freund <andres@anarazel.de> writes:
> On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
>> My objection is that you're forcing *all* backend code to go through
>> the trampoline.  That probably has a negative impact on performance, and
>> if you think it'll get committed without a thorough proof that there's
>> no such impact, you're mistaken.  Perhaps you're not aware of exactly
>> how hot a hot-spot palloc is?  It's not unlikely that this patch could
>> hurt backend performance by several percent all by itself.

> There is no additional overhead except some minor increase in code size.
> If you look at the patch palloc() now simply directly calls
> CurrentMemoryContext->methods->alloc. So there is no additional function
> call relative to the previous state.

Apparently we're failing to communicate, so let me put this as clearly
as possible: an unsupported assertion that this patch has zero cost
isn't worth the electrons it's written on.  We're talking about a place
where single instructions can make a large difference.  I see that
you've avoided an extra level of function call by duplicating code,
but there are (at least) two reasons why that could be a loser:

* now there are two hotspots not one, ie both MemoryContextAlloc and
palloc will be competing for L1 cache, likewise for
MemoryContextAllocZero and palloc0;

* depending on machine architecture and compiler optimizations, multiple
fetches of the global variable CurrentMemoryContext are quite likely to
cost more than fetching it once into a parameter register.

As I said, it's possible that this is a wash.  But it's very possible
that it isn't.

In any case it's not clear to me why duplicating code like that is a
less ugly approach than having different macro definitions for frontend
and backend.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 17:59:41

On 2013-01-09 12:32:20 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
> >> My objection is that you're forcing *all* backend code to go through
> >> the trampoline.  That probably has a negative impact on performance, and
> >> if you think it'll get committed without a thorough proof that there's
> >> no such impact, you're mistaken.  Perhaps you're not aware of exactly
> >> how hot a hot-spot palloc is?  It's not unlikely that this patch could
> >> hurt backend performance by several percent all by itself.
>
> > There is no additional overhead except some minor increase in code size.
> > If you look at the patch palloc() now simply directly calls
> > CurrentMemoryContext->methods->alloc. So there is no additional function
> > call relative to the previous state.
>
> Apparently we're failing to communicate, so let me put this as clearly
> as possible: an unsupported assertion that this patch has zero cost
> isn't worth the electrons it's written on.

Well, I *did* benchmark it as noted elsewhere in the thread, but that's
obviously just one machine (E5520 x 2) with one rather restricted workload
(pgbench -S -jc 40 -T60). At least it's rather palloc heavy.

Here are the numbers:

before:
#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
after:
#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

So on my system if there is a difference, its positive (0.12%).
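
The 0.12% figure can be reproduced from the six runs on each side, as in the following sketch. It assumes the numbers above are per-run pgbench throughput values; mean, percent_change, runs_before and runs_after are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be > 0). */
static double
mean(const double *v, size_t n)
{
    double  sum = 0.0;
    size_t  i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/* Relative change from 'before' to 'after', in percent. */
static double
percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* The six runs quoted above, copied verbatim. */
static const double runs_before[] = {
    101646.763208, 101350.361595, 101421.425668,
    101571.211688, 101862.172051, 101449.857665
};

static const double runs_after[] = {
    101553.596257, 102132.277795, 101528.816229,
    101733.541792, 101438.531618, 101673.400992
};
```

percent_change(mean(runs_before, 6), mean(runs_after, 6)) comes out at roughly 0.12. The run-to-run spread on either side is larger than that difference, so the numbers are consistent with there being no difference at all.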

> In any case it's not clear to me why duplicating code like that is a
> less ugly approach than having different macro definitions for frontend
> and backend.

Imo it's better abstracted and more consistent (pfree() is not a macro,
palloc() is) and actually makes it easier to redirect code somewhere
else. It also opens the door for exposing less knowledge in
utils/palloc.h.

Anyway, we can use a macro instead, I don't think it's that important.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

Tom Lane

Date:

09 January 2013, 19:08:06

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-09 11:27:46 -0500, Tom Lane wrote:
>> I'd prefer posting a single message with the discussion and the
>> patch(es).  If you think it's helpful to split a patch into separate
>> parts for reviewing, add multiple attachments.  But my experience is
>> that such separation isn't nearly as useful as you seem to think.

> Well, would it have been better if xlog reading, ilist, binaryheap, this
> cleanup, etc. have been in the same patch? They have originated out of
> the same work...
> Even the splitup in this thread seems to have helped as youve jumped on
> the patches where you could give rather quick input (static
> relpathbackend(), central Assert definitions), probably without having
> read the xlogreader patch itself...

No, I agree that global-impact things like this palloc rearrangement are
much better proposed and debated separately than as part of something
like xlogreader.  What I was reacting to was the specific patch set
associated with this thread.  I don't see the point of breaking out a
two-line sub-patch such as you did in
http://archives.postgresql.org/message-id/1357730830-25999-3-git-send-email-andres@2ndquadrant.com
        regards, tom lane

Re: Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs

From

"anarazel@anarazel.de"

Date:

09 January 2013, 19:20:44


Tom Lane <tgl@sss.pgh.pa.us> wrote:

>Andres Freund <andres@2ndquadrant.com> writes:
>> On 2013-01-09 11:27:46 -0500, Tom Lane wrote:
>>> I'd prefer posting a single message with the discussion and the
>>> patch(es).  If you think it's helpful to split a patch into separate
>>> parts for reviewing, add multiple attachments.  But my experience is
>>> that such separation isn't nearly as useful as you seem to think.
>
>> Well, would it have been better if xlog reading, ilist, binaryheap,
>this
>> cleanup, etc. have been in the same patch? They have originated out
>of
>> the same work...
>> Even the splitup in this thread seems to have helped as youve jumped
>on
>> the patches where you could give rather quick input (static
>> relpathbackend(), central Assert definitions), probably without
>having
>> read the xlogreader patch itself...
>
>No, I agree that global-impact things like this palloc rearrangement
>are
>much better proposed and debated separately than as part of something
>like xlogreader.  What I was reacting to was the specific patch set
>associated with this thread.  I don't see the point of breaking out a
>two-line sub-patch such as you did in
>http://archives.postgresql.org/message-id/1357730830-25999-3-git-send-email-andres@2ndquadrant.com

Ah, yes. I see your point. The not all that good reasoning I had in mind was
that that one should be uncontroversial, as it seemed to be the only unchecked
malloc call in src/bin. So it could be committed independently of the more
controversial stuff... Same with the single whitespace removal patch upthread...

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 20:07:16

Andres Freund <andres@2ndquadrant.com> writes:
> Well, I *did* benchmark it as noted elsewhere in the thread, but thats
> obviously just machine (E5520 x 2) with one rather restricted workload
> (pgbench -S -jc 40 -T60). At least its rather palloc heavy.

> Here are the numbers:

> before:
> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
> after:
> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

> So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this
--- its numbers contain a lot of communication overhead, not to mention
that pgbench itself can be a bottleneck.  It struck me that we have a
recent test case that's known to be really palloc-intensive, namely
Pavel's example here:
http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com

I set up a non-cassert build of commit
78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
reduced the data-copying overhead for that).  On my Fedora 16 machine
(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
I get a runtime for Pavel's example of 17023 msec (average over five
runs).  I then applied oprofile and got a breakdown like this:
 samples|      %|
------------------
  108409 84.5083 /home/tgl/testversion/bin/postgres
   13723 10.6975 /lib64/libc-2.14.90.so
    3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10960    10.1495  AllocSetAlloc
6325      5.8572  MemoryContextAllocZeroAligned
6225      5.7646  base_yyparse
3765      3.4866  copyObject
2511      2.3253  MemoryContextAlloc
2292      2.1225  grouping_planner
2044      1.8928  SearchCatCache
1956      1.8113  core_yylex
1763      1.6326  expression_tree_walker
1347      1.2474  MemoryContextCreate
1340      1.2409  check_stack_depth
1276      1.1816  GetCachedPlan
1175      1.0881  AllocSetFree
1106      1.0242  GetSnapshotData
1106      1.0242  _SPI_execute_plan
1101      1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
Now I get an average runtime of 16666 ms, a full 2% faster, which is a
bit astonishing, particularly because the oprofile results haven't moved
much:
  107642 83.7427 /home/tgl/testversion/bin/postgres
   14677 11.4183 /lib64/libc-2.14.90.so
    3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10038     9.3537  AllocSetAlloc
6392      5.9562  MemoryContextAllocZeroAligned
5763      5.3701  base_yyparse
4810      4.4821  copyObject
2268      2.1134  grouping_planner
2178      2.0295  core_yylex
1963      1.8292  palloc
1867      1.7397  SearchCatCache
1835      1.7099  expression_tree_walker
1551      1.4453  check_stack_depth
1374      1.2803  _SPI_execute_plan
1282      1.1946  MemoryContextCreate
1187      1.1061  AllocSetFree
...
653       0.6085  palloc0
...
552       0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so
how did that get faster?

I notice that the postgres executable is about 0.2% smaller, presumably
because a whole lot of inlined fetches of CurrentMemoryContext are gone.
This makes me wonder if my result is due to chance improvements of cache
line alignment for inner loops.

I would like to know if other people get comparable results on other
hardware (non-Intel hardware would be especially interesting).  If this
result holds up across a range of platforms, I'll withdraw my objection
to making palloc a plain function.
        regards, tom lane

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 20:43:23

I wrote:
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:

I studied the assembly code being generated for palloc(), and I believe
I see the reason why it's a bit faster: when there's only a single local
variable that has to survive over the elog call, gcc generates a shorter
function entry/exit sequence.  I had thought of proposing that we code
palloc() like this:

void *
palloc(Size size)
{
    MemoryContext context = CurrentMemoryContext;

    AssertArg(MemoryContextIsValid(context));

    if (!AllocSizeIsValid(size))
        elog(ERROR, "invalid memory alloc request size %lu",
             (unsigned long) size);

    context->isReset = false;

    return (*context->methods->alloc) (context, size);
}

but at least on this specific hardware and compiler that would evidently
be a net loss compared to direct use of CurrentMemoryContext.  I would
not put a lot of faith in that result holding up on other machines
though.

In any case this doesn't explain the whole 2% speedup, but it probably
accounts for palloc() showing as slightly cheaper than
MemoryContextAlloc had been in the oprofile listing.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 22:15:03

On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
> I wrote:
> > I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> > Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> > bit astonishing, particularly because the oprofile results haven't moved
> > much:
>
> I studied the assembly code being generated for palloc(), and I believe
> I see the reason why it's a bit faster: when there's only a single local
> variable that has to survive over the elog call, gcc generates a shorter
> function entry/exit sequence.

Makes sense.

>  I had thought of proposing that we code
> palloc() like this:
>
> void *
> palloc(Size size)
> {
>     MemoryContext context = CurrentMemoryContext;
>
>     AssertArg(MemoryContextIsValid(context));
>
>     if (!AllocSizeIsValid(size))
>         elog(ERROR, "invalid memory alloc request size %lu",
>              (unsigned long) size);
>
>     context->isReset = false;
>
>     return (*context->methods->alloc) (context, size);
> }
>
> but at least on this specific hardware and compiler that would evidently
> be a net loss compared to direct use of CurrentMemoryContext.  I would
> not put a lot of faith in that result holding up on other machines
> though.

Isn't that optimized to the same thing? ISTM the compiler should produce
exactly the same code for both.

> In any case this doesn't explain the whole 2% speedup, but it probably
> accounts for palloc() showing as slightly cheaper than
> MemoryContextAlloc had been in the oprofile listing.

I'd guess that a good part of the cost is just smeared across all
callers and not individually accountable to any function visible in the
profile. Additionally, with functions as short as MemoryContextAllocZero,
profilers like oprofile (and perf) also often leak quite a bit of the
actual cost to the callsites in my experience.

I wonder whether it makes sense to "inline" the contents of pstrdup()
as well? My gut feeling is not, but...

I would like to move CurrentMemoryContext to memutils.h, but that seems
to require too many changes.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 22:22:36

On 2013-01-09 15:07:10 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Well, I *did* benchmark it as noted elsewhere in the thread, but that's
> > obviously just one machine (E5520 x 2) with one rather restricted workload
> > (pgbench -S -jc 40 -T60). At least it's rather palloc heavy.
> 
> > Here are the numbers:
> 
> > before:
> > #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
> > after:
> > #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
> 
> > So on my system if there is a difference, it's positive (0.12%).
> 
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck. 

I chose it because I looked at profiles of it often enough to know
memory allocation shows up high in the profile (especially without -M
prepared).
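
The 0.12% figure in the quoted numbers can be reproduced from the means of
the two runs. A minimal sketch of that arithmetic follows; the function and
array names are made up for illustration, only the twelve tps values come
from the benchmark output above.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be non-zero). */
double
mean_tps(const double *samples, size_t n)
{
    double      sum = 0.0;
    size_t      i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double) n;
}

/* Change of the "after" mean relative to the "before" mean, in percent. */
double
relative_change_pct(const double *before, const double *after, size_t n)
{
    double      b = mean_tps(before, n);
    double      a = mean_tps(after, n);

    return (a - b) / b * 100.0;
}

/* The six pgbench results per run, as quoted above. */
const double tps_before[] = {
    101646.763208, 101350.361595, 101421.425668,
    101571.211688, 101862.172051, 101449.857665
};
const double tps_after[] = {
    101553.596257, 102132.277795, 101528.816229,
    101733.541792, 101438.531618, 101673.400992
};
```

Calling relative_change_pct(tps_before, tps_after, 6) gives a value a little
above 0.12, which is well within the run-to-run spread of the individual
samples.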

> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting).  If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.

I only have access to core2 and Nehalem atm, but FWIW I just tested it
on another workload (decoding WAL changes of a large transaction into
text) and I see improvements around 1.2% on both.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 22:28:42

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
>> I had thought of proposing that we code
>> palloc() like this:
>> 
>> void *
>> palloc(Size size)
>> {
>>     MemoryContext context = CurrentMemoryContext;
>> 
>>     AssertArg(MemoryContextIsValid(context));
>> 
>>     if (!AllocSizeIsValid(size))
>>         elog(ERROR, "invalid memory alloc request size %lu",
>>              (unsigned long) size);
>> 
>>     context->isReset = false;
>> 
>>     return (*context->methods->alloc) (context, size);
>> }
>> 
>> but at least on this specific hardware and compiler that would evidently
>> be a net loss compared to direct use of CurrentMemoryContext.  I would
>> not put a lot of faith in that result holding up on other machines
>> though.

> Thats not optimized to the same? ISTM the compiler should produce
> exactly the same code for both.

No, that's exactly the point here, you can't just assume that you get
the same code; this is tense enough that a few instructions matter.
Remember we're considering non-assert builds, so the AssertArg vanishes.
So in the form where CurrentMemoryContext is only read after the elog
call, the compiler can see that it requires but one fetch - it can't
change within the two lines where it's used.  In the form given above,
the compiler is required to fetch CurrentMemoryContext into a local
variable and keep it across the elog call.  (We know this doesn't
matter, but gcc doesn't; this version of gcc apparently isn't doing much
with the knowledge that elog won't return.)  Since the extra local
variable adds several instructions to the overall function's entry/exit
sequences, you come out behind.  At least on this platform.  On other
machines with different code generation behavior, the story could easily
be very different.
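
For readers following along, the two shapes being compared can be sketched
side by side. This is an illustration only: DemoContext, demo_error() and
the demo_palloc_* functions are hypothetical stand-ins, not the backend
code, and whether the generated code actually differs depends on compiler
and platform, as said above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory context; not the backend struct. */
typedef struct DemoContext
{
    int         isReset;
    int         nalloc;
    void       *(*alloc) (struct DemoContext *ctx, size_t size);
} DemoContext;

/* Stand-in allocator method: counts calls and defers to malloc(). */
void *
demo_alloc(DemoContext *ctx, size_t size)
{
    assert(ctx != NULL);
    ctx->nalloc++;
    return malloc(size);
}

static DemoContext demo_top_context = {1, 0, demo_alloc};
DemoContext *DemoCurrentContext = &demo_top_context;

#define DEMO_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Stand-in for elog(ERROR, ...).  It is not declared noreturn; in a real
 * build the definition lives in another file, so the caller's compiler has
 * to assume that it may return.
 */
void
demo_error(const char *msg, unsigned long size)
{
    fprintf(stderr, "%s %lu\n", msg, size);
    exit(1);
}

/* Form 1: the global is read only after the size check. */
void *
demo_palloc_direct(size_t size)
{
    if (size > DEMO_MAX_ALLOC)
        demo_error("invalid memory alloc request size", (unsigned long) size);

    DemoCurrentContext->isReset = 0;
    return DemoCurrentContext->alloc(DemoCurrentContext, size);
}

/* Form 2: the global is copied into a local that lives across the call. */
void *
demo_palloc_local(size_t size)
{
    DemoContext *context = DemoCurrentContext;

    if (size > DEMO_MAX_ALLOC)
        demo_error("invalid memory alloc request size", (unsigned long) size);

    context->isReset = 0;
    return context->alloc(context, size);
}
```

Both functions behave identically; comparing the output of "gcc -O2 -S" for
the two is one way to see whether a given compiler emits a longer
entry/exit sequence for the second form.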

>> In any case this doesn't explain the whole 2% speedup, but it probably
>> accounts for palloc() showing as slightly cheaper than
>> MemoryContextAlloc had been in the oprofile listing.

> I'd guess that a good part of the cost is just smeared across all
> callers and not individually accountable to any function visible in the
> profile.

Yeah, I think this is most likely showing that there is a real,
measurable cost of code bloat.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

09 January 2013, 22:35:20

Andres Freund <andres@2ndquadrant.com> writes:
> I wonder whether it makes sense to "inline" the contents pstrdup()
> additionally? My gut feeling is not, but...

I don't recall ever having seen MemoryContextStrdup amount to much,
so probably not worth the extra code space.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

09 January 2013, 23:05:23

On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
> >> I had thought of proposing that we code
> >> palloc() like this:
> >>
> >> void *
> >> palloc(Size size)
> >> {
> >>     MemoryContext context = CurrentMemoryContext;
> >>
> >>     AssertArg(MemoryContextIsValid(context));
> >>
> >>     if (!AllocSizeIsValid(size))
> >>         elog(ERROR, "invalid memory alloc request size %lu",
> >>              (unsigned long) size);
> >>
> >>     context->isReset = false;
> >>
> >>     return (*context->methods->alloc) (context, size);
> >> }
> >>
> >> but at least on this specific hardware and compiler that would evidently
> >> be a net loss compared to direct use of CurrentMemoryContext.  I would
> >> not put a lot of faith in that result holding up on other machines
> >> though.
>
> > Thats not optimized to the same? ISTM the compiler should produce
> > exactly the same code for both.
>
> No, that's exactly the point here, you can't just assume that you get
> the same code; this is tense enough that a few instructions matter.
> Remember we're considering non-assert builds, so the AssertArg vanishes.
> So in the form where CurrentMemoryContext is only read after the elog
> call, the compiler can see that it requires but one fetch - it can't
> change within the two lines where it's used.  In the form given above,
> the compiler is required to fetch CurrentMemoryContext into a local
> variable and keep it across the elog call.

I had thought that it could simply store the address in a callee
saved register inside palloc, totally forgetting that palloc would need
to save that register itself in that case.

> (We know this doesn't
> matter, but gcc doesn't; this version of gcc apparently isn't doing much
> with the knowledge that elog won't return.)

Afaics one reason for that is that we don't have any such annotation for
elog(), just for ereport. And I don't immediately see how it could be
added to elog without relying on variadic macros. Bit of a shame, there
are probably a good number of callsites that could actually benefit from
that knowledge.
Is it worth making that annotation conditional on variadic macro
support?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

10 January 2013, 09:31:16

On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
> On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> > (We know this doesn't
> > matter, but gcc doesn't; this version of gcc apparently isn't doing much
> > with the knowledge that elog won't return.)
>
> Afaics one reason for that is that we don't have any such annotation for
> elog(), just for ereport. And I don't immediately see how it could be
> added to elog without relying on variadic macros. Bit of a shame, there
> probably a good number of callsites that could actually benefit from
> that knowledge.
> Is it worth making that annotation conditional on variadic macro
> support?

A quick test confirms that. With:

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index cbbda04..39cd809 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -212,7 +212,13 @@ extern int    getinternalerrposition(void);
  *        elog(ERROR, "portal \"%s\" not found", stmt->portalname);
  *----------
  */
+#ifdef __GNUC__
+#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
+        elog_finish(elevel, __VA_ARGS__),                                \
+        ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0)
+#else
 #define elog    elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
+#endif

 extern void elog_start(const char *filename, int lineno, const char *funcname);
 extern void

nearly the same code is generated for both variants (that is, no
additionally saved registers and such).

Two unfinished things:
* the above only checks for gcc although all C99 compilers should
  support varargs macros
* It uses __builtin_unreachable() instead of abort() as that creates
  a smaller executable. That's gcc 4.5 only. Saves about 30kb in
  both a stripped and an unstripped binary.

* elog(ERROR) i.e. no args would require more macro ugliness, but that
  seems unnecessary

Doing the __builtin_unreachable() for ereport as well saves about 50kb.

Given that some of those elog/ereports are in performance critical code
paths it seems like a good idea to remove code from the
callsites. Adding configure check for __VA_ARGS__ and
__builtin_unreachable() should be simple enough?
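
A self-contained sketch of the same idea, for anyone who wants to play with
it outside the tree. All demo_* names and the DEMO_* levels are invented
for this example; only the shape of the macro follows the diff above, and
it assumes a compiler with variadic macros (and gcc or clang for the
annotated branch).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical severity levels, ordered like the real ones. */
#define DEMO_LOG    15
#define DEMO_ERROR  20

int         demo_messages_logged = 0;

/* Stand-in for elog_finish(): returns below DEMO_ERROR, exits otherwise. */
void
demo_elog_finish(int elevel, const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    demo_messages_logged++;

    if (elevel >= DEMO_ERROR)
        exit(1);
}

#if defined(__GNUC__)
/* Tell the compiler that control does not continue for error levels. */
#define demo_elog(elevel, ...) \
    (demo_elog_finish(elevel, __VA_ARGS__), \
     ((elevel) >= DEMO_ERROR ? __builtin_unreachable() : (void) 0))
#else
/* Without the builtin, fall back to a plain call with no annotation. */
#define demo_elog demo_elog_finish
#endif

int
demo_lookup(int key)
{
    if (key == 1)
        return 10;

    demo_elog(DEMO_ERROR, "unknown key %d", key);
    return -1;                  /* not reached; quiets other compilers */
}
```

With the annotated branch, the compiler is free to drop the trailing return
and anything else placed after the error call.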

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

10 January 2013, 09:55:27

On 2013-01-10 10:31:04 +0100, Andres Freund wrote:
> On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
> > On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> > > (We know this doesn't
> > > matter, but gcc doesn't; this version of gcc apparently isn't doing much
> > > with the knowledge that elog won't return.)
> >
> > Afaics one reason for that is that we don't have any such annotation for
> > elog(), just for ereport. And I don't immediately see how it could be
> > added to elog without relying on variadic macros. Bit of a shame, there
> > probably a good number of callsites that could actually benefit from
> > that knowledge.
> > Is it worth making that annotation conditional on variadic macro
> > support?
>
> A quick test confirms that. With:
>
> diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
> index cbbda04..39cd809 100644
> --- a/src/include/utils/elog.h
> +++ b/src/include/utils/elog.h
> @@ -212,7 +212,13 @@ extern int    getinternalerrposition(void);
>   *        elog(ERROR, "portal \"%s\" not found", stmt->portalname);
>   *----------
>   */
> +#ifdef __GNUC__
> +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
> +        elog_finish(elevel, __VA_ARGS__),                                \
> +        ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0)
> +#else
>  #define elog    elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
> +#endif
>
>  extern void elog_start(const char *filename, int lineno, const char *funcname);
>  extern void
>
> nearly the same code is generated for both variants (this is no
> additionally saved registers and such).
>
> Two unfinished things:
> * the above only checks for gcc although all C99 compilers should
>   support varargs macros
> * It uses __builtin_unreachable() instead of abort() as that creates
>   a smaller executable. That's gcc 4.5 only. Saves about 30kb in a
>   both a stripped and unstripped binary.
> * elog(ERROR) i.e. no args would require more macro ugliness, but that
>   seems unneccesary
>
> Doing the __builtin_unreachable() for ereport as well saves about 50kb.
>
> Given that some of those elog/ereports are in performance critical code
> paths it seems like a good idea to remove code from the
> callsites. Adding configure check for __VA_ARGS__ and
> __builtin_unreachable() should be simple enough?

For whatever it's worth - I am not sure it's much - after relying on
variadic macros we can make elog into one function call instead of
two. That saves another 100kb.

[tinker]

The __builtin_unreachable() and combined elog changes together bring a
measurable performance benefit of a rather surprising 7% when running
Pavel's testcase on top of yesterday's HEAD. So it seems worth investigating.
About 3% of that is the __builtin_unreachable() and about 4% the
combining of elog_start/finish.

Now that testcase sure isn't a very representative one for this, but I
don't really see why it should benefit much more than other cases. Seems
to be quite the benefit for relatively minor cost.
Although I am pretty sure just copying the whole of
elog_start/elog_finish into one elog_all() isn't the best way to go at
this...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 16:05:09

On 2013-01-10 10:55:20 +0100, Andres Freund wrote:
> On 2013-01-10 10:31:04 +0100, Andres Freund wrote:
> > On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
> > > On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> > > > (We know this doesn't
> > > > matter, but gcc doesn't; this version of gcc apparently isn't doing much
> > > > with the knowledge that elog won't return.)
> > >
> > > Afaics one reason for that is that we don't have any such annotation for
> > > elog(), just for ereport. And I don't immediately see how it could be
> > > added to elog without relying on variadic macros. Bit of a shame, there
> > > probably a good number of callsites that could actually benefit from
> > > that knowledge.
> > > Is it worth making that annotation conditional on variadic macro
> > > support?
> >
> > A quick test confirms that. With:
> >
> > diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
> > index cbbda04..39cd809 100644
> > --- a/src/include/utils/elog.h
> > +++ b/src/include/utils/elog.h
> > @@ -212,7 +212,13 @@ extern int    getinternalerrposition(void);
> >   *        elog(ERROR, "portal \"%s\" not found", stmt->portalname);
> >   *----------
> >   */
> > +#ifdef __GNUC__
> > +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
> > +        elog_finish(elevel, __VA_ARGS__),                                \
> > +        ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0)
> > +#else
> >  #define elog    elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
> > +#endif
> >
> >  extern void elog_start(const char *filename, int lineno, const char *funcname);
> >  extern void
> >
> > nearly the same code is generated for both variants (this is no
> > additionally saved registers and such).
> >
> > Two unfinished things:
> > * the above only checks for gcc although all C99 compilers should
> >   support varargs macros
> > * It uses __builtin_unreachable() instead of abort() as that creates
> >   a smaller executable. That's gcc 4.5 only. Saves about 30kb in a
> >   both a stripped and unstripped binary.
> > * elog(ERROR) i.e. no args would require more macro ugliness, but that
> >   seems unneccesary
> >
> > Doing the __builtin_unreachable() for ereport as well saves about 50kb.
> >
> > Given that some of those elog/ereports are in performance critical code
> > paths it seems like a good idea to remove code from the
> > callsites. Adding configure check for __VA_ARGS__ and
> > __builtin_unreachable() should be simple enough?
>
> For whatever its worth - I am not sure its much - after relying on
> variadic macros we can make elog into one function call instead of
> two. That saves another 100kb.
>
> [tinker]
>
> The builtin_unreachable and combined elog thing bring a measurable
> performance benefit of a rather surprising 7% when running Pavel's
> testcase ontop of yesterdays HEAD. So it seems worth investigating.
> About 3% of that is the __builtin_unreachable() and about 4% the
> combining of elog_start/finish.
>
> Now that testcase sure isn't a very representative one for this, but I
> don't really see why it should benefit much more than other cases. Seems
> to be quite the benefit for relatively minor cost.
> Although I am pretty sure just copying the whole of
> elog_start/elog_finish into one elog_all() isn't the best way to go at
> this...

The attached patch:

* adds configure checks for __VA_ARGS__ and __builtin_unreachable
  support
* provides a pg_unreachable() macro that expands to
  __builtin_unreachable() or abort()
* changes #define elog ... into #define elog(elevel, ...) if varargs are
  available

It does *not* combine elog_start and elog_finish into one function if
varargs are available although that brings a rather measurable
size/performance benefit.
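
To make the second bullet concrete, here is a rough sketch of such a
fallback macro. It is not the text of the attached patch: the
DEMO_HAVE_BUILTIN_UNREACHABLE symbol and the compiler-version test are
assumptions standing in for the configure check, and demo_fail() and
demo_checked_div() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical feature test; the patch uses a configure check instead of
 * looking at compiler version macros.
 */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)))
#define DEMO_HAVE_BUILTIN_UNREACHABLE 1
#endif

#ifdef DEMO_HAVE_BUILTIN_UNREACHABLE
#define demo_unreachable() __builtin_unreachable()
#else
#define demo_unreachable() abort()
#endif

/* Reports a failure and exits; deliberately not declared noreturn. */
void
demo_fail(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

int
demo_checked_div(int a, int b)
{
    if (b == 0)
    {
        demo_fail("division by zero");
        demo_unreachable();     /* control never continues past demo_fail */
    }
    return a / b;
}
```

The abort() branch costs a call at every site, while the builtin emits no
code at all, which matches the size difference reported earlier in the
thread.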

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Mark-elog-as-not-returning-if-the-capabilities-of-th.patch

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 19:01:51

Andres Freund <andres@2ndquadrant.com> writes:
> The attached patch:

> * adds configure checks for __VA_ARGS__ and __builtin_unreachable
>   support
> * provides a pg_unreachable() macro that expands to
>   __builtin_unreachable() or abort()
> * changes #define elog ... into #define elog(elevel, ...) if varargs are
>   available

Seems like a good thing to do --- will review and commit.

> It does *not* combine elog_start and elog_finish into one function if
> varargs are available although that brings a rather measurable
> size/performance benefit.

Since you've apparently already done the measurement: how much?
It would be a bit tedious to support two different infrastructures for
elog(), but if it's a big enough win maybe we should.
        regards, tom lane

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 19:33:43

On 2013-01-11 14:01:40 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > The attached patch:
>
> > * adds configure checks for __VA_ARGS__ and __builtin_unreachable
> >   support
> > * provides a pg_unreachable() macro that expands to
> >   __builtin_unreachable() or abort()
> > * changes #define elog ... into #define elog(elevel, ...) if varargs are
> >   available
>
> Seems like a good thing to do --- will review and commit.

Thanks.

I guess you will catch that anyway, but afaik msvc supports __VA_ARGS__
these days (since v2005 seemingly) and I didn't add HAVE__VA_ARGS to the
respective pg_config.h.win32

> > It does *not* combine elog_start and elog_finish into one function if
> > varargs are available although that brings a rather measurable
> > size/performance benefit.
>
> Since you've apparently already done the measurement: how much?
> It would be a bit tedious to support two different infrastructures for
> elog(), but if it's a big enough win maybe we should.

Imo it's pretty definitely a big enough win. So big that I have a hard time
believing it can be true without negative effects somewhere else.

On top of the patch you've replied to, it saves somewhere around 80kb and
between 0.8% (-S -M prepared), 2% (-S -M simple) and 4% (own stuff I had
lying around), and it consistently gives 6%-7% in Pavel's testcase. That is
with today's HEAD, not with the one before your fix.

I attached the absolutely ugly and unready patch I used for testing.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

ugly-elog-fold.patch

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 20:06:03

Andres Freund <andres@2ndquadrant.com> writes:
> [ patch to mark elog() as non-returning if elevel >= ERROR ]

It strikes me that both in this patch, and in Peter's previous patch to
do this for ereport(), there is an opportunity for improvement.  Namely,
that the added code only has any use if elevel is a constant, which in
some places it isn't.  We don't really need a call to abort() to be
executed there, but the compiler knows no better and will have to
generate code to test the value of elevel and call abort().  Which
rather defeats the purpose of saving code; plus the compiler will still
not have any knowledge that code after the ereport isn't reached.
And another thing: what if the elevel argument isn't safe for multiple
evaluation?  No such hazard ever existed before these patches, so I'm
not very comfortable with adding one.  (Even if all our own code is
safe, there's third-party code to worry about.)

When we're using recent gcc, there is an easy fix, which is that the
macros could do this:

__builtin_constant_p(elevel) && (elevel) >= ERROR ? pg_unreachable() : (void) 0

This gets rid of both useless code and the risk of undesirable multiple
evaluation, while not giving up any optimization possibility that
existed before.  So I think we should add a configure test for
__builtin_constant_p() and do it like that when possible.

That still leaves the question of what to do when the compiler doesn't
have __builtin_constant_p().  Should we use the coding that we have,
or give up and not try to mark the macros as non-returning?  I'm rather
inclined to do the latter, because of the multiple-evaluation-risk
argument.
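
A small sketch of the hazard and of the guarded form, assuming gcc or clang
(both provide __builtin_constant_p and do not evaluate its argument). The
demo_* names and DEMO_* levels are invented for the example, and abort()
is used in place of pg_unreachable() so that nothing undefined can happen
if the sketch is modified.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_LOG    15
#define DEMO_ERROR  20

int         demo_level_calls = 0;

/* An elevel expression with a side effect: it counts its evaluations. */
int
demo_pick_level(void)
{
    demo_level_calls++;
    return DEMO_LOG;
}

/* Stand-in for elog_finish(): exits for error levels, else returns. */
void
demo_finish(int elevel, const char *msg)
{
    (void) msg;
    if (elevel >= DEMO_ERROR)
        exit(1);
}

/* Unguarded: elevel appears twice and is evaluated twice. */
#define demo_elog_unguarded(elevel, msg) \
    (demo_finish(elevel, msg), \
     ((elevel) >= DEMO_ERROR ? abort() : (void) 0))

/* Guarded: the comparison is only evaluated for compile-time constants. */
#define demo_elog_guarded(elevel, msg) \
    (demo_finish(elevel, msg), \
     (__builtin_constant_p(elevel) && (elevel) >= DEMO_ERROR ? \
      abort() : (void) 0))

/* Number of times the elevel expression runs with the unguarded macro. */
int
demo_count_unguarded(void)
{
    demo_level_calls = 0;
    demo_elog_unguarded(demo_pick_level(), "unguarded");
    return demo_level_calls;
}

/* Number of times the elevel expression runs with the guarded macro. */
int
demo_count_guarded(void)
{
    demo_level_calls = 0;
    demo_elog_guarded(demo_pick_level(), "guarded");
    return demo_level_calls;
}
```

The unguarded macro calls demo_pick_level() twice, the guarded one once,
because the && short-circuits whenever the level is not a constant.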

Comments?
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 20:33:51

On 2013-01-11 15:05:54 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > [ patch to mark elog() as non-returning if elevel >= ERROR ]
>
> It strikes me that both in this patch, and in Peter's previous patch to
> do this for ereport(), there is an opportunity for improvement.  Namely,
> that the added code only has any use if elevel is a constant, which in
> some places it isn't.  We don't really need a call to abort() to be
> executed there, but the compiler knows no better and will have to
> generate code to test the value of elevel and call abort().
> Which rather defeats the purpose of saving code; plus the compiler will still
> not have any knowledge that code after the ereport isn't reached.

Yea, I think that's why __builtin_unreachable() is a performance
benefit in comparison with abort().
Both are a benefit over not doing either btw in my measurements.

> And another thing: what if the elevel argument isn't safe for multiple
> evaluation?  No such hazard ever existed before these patches, so I'm
> not very comfortable with adding one.  (Even if all our own code is
> safe, there's third-party code to worry about.)

Hm. I am not really too scared about those dangers I have to admit. If
at all I am caring more for ereport than for elog.
Placing code with side effects as elevel arguments just seems a bit too
absurd.

> When we're using recent gcc, there is an easy fix, which is that the
> macros could do this:

Recent as in 3.1 or so ;)

It's really too bad that there's no __builtin_pure() or
__builtin_side_effect_free() or somesuch :(.

> __builtin_constant_p(elevel) && (elevel) >= ERROR ? pg_unreachable() : (void) 0
>
> This gets rid of both useless code and the risk of undesirable multiple
> evaluation, while not giving up any optimization possibility that
> existed before.  So I think we should add a configure test for
> __builtin_constant_p() and do it like that when possible.

I think there might be code somewhere that decides between FATAL and
PANIC which would not hit that path then. Imo not a good enough reason
not to do it, but I wanted to mention it anyway.

> That still leaves the question of what to do when the compiler doesn't
> have __builtin_constant_p().  Should we use the coding that we have,
> or give up and not try to mark the macros as non-returning?  I'm rather
> inclined to do the latter, because of the multiple-evaluation-risk
> argument.

I dislike the latter somewhat as it would mean not to give that
information at all to msvc and others which seems a bit sad. But I don't
feel particularly strongly.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 20:52:28

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-11 15:05:54 -0500, Tom Lane wrote:
>> And another thing: what if the elevel argument isn't safe for multiple
>> evaluation?  No such hazard ever existed before these patches, so I'm
>> not very comfortable with adding one.  (Even if all our own code is
>> safe, there's third-party code to worry about.)

> Hm. I am not really too scared about those dangers I have to admit.

I agree the scenario doesn't seem all that probable, but what scares me
here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR"
in some builds, and just "(elevel) >= ERROR" in others, then if there is
any code with a multiple-evaluation hazard, it is only buggy in the
latter builds.  That's sufficiently nasty that I'm willing to give up
an optimization that we never had before 9.3 anyway.

> I dislike the latter somewhat as it would mean not to give that
> information at all to msvc and others which seems a bit sad. But I don't
> feel particularly strongly.

It's not like our code isn't rather gcc-biased anyway.  I'd keep the
optimization for other compilers if possible, but not at the cost of
introducing bugs that occur only on those compilers.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 21:02:11

On 2013-01-11 15:52:19 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-11 15:05:54 -0500, Tom Lane wrote:
> >> And another thing: what if the elevel argument isn't safe for multiple
> >> evaluation?  No such hazard ever existed before these patches, so I'm
> >> not very comfortable with adding one.  (Even if all our own code is
> >> safe, there's third-party code to worry about.)
> 
> > Hm. I am not really too scared about those dangers I have to admit.
> 
> I agree the scenario doesn't seem all that probable, but what scares me
> here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR"
> in some builds, and just "(elevel) >= ERROR" in others, then if there is
> any code with a multiple-evaluation hazard, it is only buggy in the
> latter builds.  That's sufficiently nasty that I'm willing to give up
> an optimization that we never had before 9.3 anyway.

Well, why use it at all then and not just rely on
__builtin_unreachable() in any recent gcc (and llvm fwiw) and abort()
otherwise? Then the code is small for anything recent (gcc 4.4 afair)
and always consistently buggy.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 21:18:21

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-11 15:52:19 -0500, Tom Lane wrote:
>> I agree the scenario doesn't seem all that probable, but what scares me
>> here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR"
>> in some builds, and just "(elevel) >= ERROR" in others, then if there is
>> any code with a multiple-evaluation hazard, it is only buggy in the
>> latter builds.  That's sufficiently nasty that I'm willing to give up
>> an optimization that we never had before 9.3 anyway.

> Well, why use it at all then and not just rely on
> __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort()
> otherwise? Then the code is small for anything recent (gcc 4.4 afair)
> and always consistently buggy.

Uh ... because it's *not* unreachable if elevel < ERROR.  Otherwise we'd
just mark errfinish as __attribute((noreturn)) and be done.  Of course,
that's a gcc-ism too.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 21:18:48

On 2013-01-11 16:16:58 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-11 15:52:19 -0500, Tom Lane wrote:
> >> I agree the scenario doesn't seem all that probable, but what scares me
> >> here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR"
> >> in some builds, and just "(elevel) >= ERROR" in others, then if there is
> >> any code with a multiple-evaluation hazard, it is only buggy in the
> >> latter builds.  That's sufficiently nasty that I'm willing to give up
> >> an optimization that we never had before 9.3 anyway.
>
> > Well, why use it at all then and not just rely on
> > __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort()
> > otherwise? Then the code is small for anything recent (gcc 4.4 afair)
> > and always consistently buggy.
>
> Uh ... because it's *not* unreachable if elevel < ERROR.  Otherwise we'd
> just mark errfinish as __attribute((noreturn)) and be done.  Of course,
> that's a gcc-ism too.

Well, I mean with the double evaluation risk.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 21:28:23

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-11 16:16:58 -0500, Tom Lane wrote:
>> Uh ... because it's *not* unreachable if elevel < ERROR.  Otherwise we'd
>> just mark errfinish as __attribute((noreturn)) and be done.  Of course,
>> that's a gcc-ism too.

> Well, I mean with the double evaluation risk.

Oh, are you saying you don't want to make the __builtin_constant_p
addition?  I'm not very satisfied with that answer.  Right now, Peter's
patch has added a class of bugs that never existed before 9.3, and yours
would add more.  It might well be that those classes are empty ... but
*we can't know that*.  I don't think that keeping a new optimization for
non-gcc compilers is worth that risk.  Postgres is already full of
gcc-only optimizations, anyway.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

11 January 2013, 21:46:24

On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-11 16:16:58 -0500, Tom Lane wrote:
> >> Uh ... because it's *not* unreachable if elevel < ERROR.  Otherwise we'd
> >> just mark errfinish as __attribute((noreturn)) and be done.  Of course,
> >> that's a gcc-ism too.
> 
> > Well, I mean with the double evaluation risk.
> 
> Oh, are you saying you don't want to make the __builtin_constant_p
> addition?

Yes, that was what I wanted to say.

> I'm not very satisfied with that answer.  Right now, Peter's
> patch has added a class of bugs that never existed before 9.3, and yours
> would add more.  It might well be that those classes are empty ... but
> *we can't know that*.  I don't think that keeping a new optimization for
> non-gcc compilers is worth that risk.  Postgres is already full of
> gcc-only optimizations, anyway.

ISTM that ereport() already has such strange behaviour wrt evaluation of
arguments (i.e. doing it only when the message is really logged) that it
doesn't seem to add a real additional risk.
It also means that we potentially need to add more quieting returns et
al to keep the warning level on non-gcc compatible compilers low
(probably only msvc matters here) which we all don't see on our primary
compilers.

Anyway, you've got a strong opinion and I don't, so ...

Andres


Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

11 January 2013, 21:49:41

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
>> I'm not very satisfied with that answer.  Right now, Peter's
>> patch has added a class of bugs that never existed before 9.3, and yours
>> would add more.  It might well be that those classes are empty ... but
>> *we can't know that*.  I don't think that keeping a new optimization for
>> non-gcc compilers is worth that risk.  Postgres is already full of
>> gcc-only optimizations, anyway.

> ISTM that ereport() already has such strange behaviour wrt evaluation of
> arguments (i.e. doing it only when the message is really logged) that it
> doesn't seem to add a real additional risk.

Hm ... well, that's a point.  I may be putting too much weight on the
fact that any such bug for elevel would be a new one that never existed
before.  What do others think?
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Heikki Linnakangas

Date:

12 January 2013, 10:26:56

On 11.01.2013 23:49, Tom Lane wrote:
> Andres Freund<andres@2ndquadrant.com>  writes:
>> On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
>>> I'm not very satisfied with that answer.  Right now, Peter's
>>> patch has added a class of bugs that never existed before 9.3, and yours
>>> would add more.  It might well be that those classes are empty ... but
>>> *we can't know that*.  I don't think that keeping a new optimization for
>>> non-gcc compilers is worth that risk.  Postgres is already full of
>>> gcc-only optimizations, anyway.
>
>> ISTM that ereport() already has such strange behaviour wrt evaluation of
>> arguments (i.e. doing it only when the message is really logged) that it
>> doesn't seem to add a real additional risk.
>
> Hm ... well, that's a point.  I may be putting too much weight on the
> fact that any such bug for elevel would be a new one that never existed
> before.  What do others think?

It sure would be nice to avoid multiple evaluation. At least in xlog.c
we use emode_for_corrupt_record() to pass the elevel, and it does have a
side-effect. It's quite harmless in practice, you'll get some extra
DEBUG1 messages in the log, but still.
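
The double-evaluation concern can be illustrated with a small standalone
sketch. Nothing here is PostgreSQL code: report(), the two macros, the
level function and the numeric level values are hypothetical stand-ins,
assumed only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative level values, not taken from elog.h. */
#define DEBUG1 14
#define ERROR  20

static int level_calls = 0;

/* Hypothetical level function with a side effect (it counts its calls). */
static int
level_with_side_effect(void)
{
    level_calls++;
    return DEBUG1;
}

/* Hypothetical reporting back end; it just swallows the level. */
static void
report(int elevel)
{
    (void) elevel;
}

/* Expression form: the elevel argument appears, and is evaluated, twice. */
#define REPORT_EXPR(elevel) \
    (report(elevel), ((elevel) >= ERROR) ? abort() : (void) 0)

/* Statement form: the argument is evaluated once into a local variable. */
#define REPORT_STMT(elevel) \
    do { \
        const int elevel_ = (elevel); \
        report(elevel_); \
        if (elevel_ >= ERROR) \
            abort(); \
    } while (0)

/* Number of times the level function runs under the expression form. */
int
count_evals_expr(void)
{
    level_calls = 0;
    REPORT_EXPR(level_with_side_effect());
    return level_calls;
}

/* Number of times the level function runs under the statement form. */
int
count_evals_stmt(void)
{
    level_calls = 0;
    REPORT_STMT(level_with_side_effect());
    return level_calls;
}
```

In this sketch the expression form runs the level function twice and the
statement form runs it once, which is the difference the local variable is
meant to buy.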

Here's one more construct to consider:

#define ereport_domain(elevel, domain, rest) \
    do { \
        const int elevel_ = elevel; \
        if (errstart(elevel_, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || \
            elevel_ >= ERROR) \
        { \
            (void) rest; \
            if (elevel_ >= ERROR) \
                errfinish_noreturn(1); \
            else \
                errfinish(1); \
        } \
    } while(0)

extern void errfinish(int dummy,...);
extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn));

With do-while, ereport() would no longer be an expression. The only
place where that's a problem in our codebase is in scan.l, bootscanner.l
and repl_scanner.l, which do this:

#undef fprintf
#define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal("%s",
msg)))

and flex does this:
        (void) fprintf( stderr, "%s\n", msg );

but that's easy to work around by creating a wrapper fprintf function in
those .l files.
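
A rough sketch of that workaround, under stated assumptions: report_error()
stands in for a statement-style ereport(ERROR, ...), and scanner_fprintf()
is a hypothetical wrapper name, not something from the patch. The point is
only that a function call has a value, so flex's "(void) fprintf(...)" keeps
compiling even when the reporting macro is a do-while statement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_message[128];

/* Hypothetical stand-in for a statement-style ereport(ERROR, ...). */
#define report_error(msg) \
    do { \
        strncpy(last_message, (msg), sizeof(last_message) - 1); \
        last_message[sizeof(last_message) - 1] = '\0'; \
    } while (0)

/*
 * Wrapper function: unlike the do-while macro it is an expression, so it
 * can appear after a (void) cast.
 */
static int
scanner_fprintf(FILE *file, const char *fmt, const char *msg)
{
    (void) file;
    (void) fmt;
    report_error(msg);
    return 0;
}

/* Redirect generated fprintf calls to the wrapper. */
#undef fprintf
#define fprintf(file, fmt, msg) scanner_fprintf(file, fmt, msg)

/* Mimics the call that flex emits and returns what was reported. */
const char *
emit_like_flex(const char *msg)
{
    (void) fprintf(stderr, "%s\n", msg);
    return last_message;
}

/* Stop redirecting for any code that follows. */
#undef fprintf
```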

When I compile with gcc -O0, I get one warning with this:

datetime.c: In function ‘DateTimeParseError’:
datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by
default]

That suggests that the compiler didn't correctly deduce that
ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.

- Heikki

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

12 January 2013, 17:20:40

On 2013-01-12 12:26:37 +0200, Heikki Linnakangas wrote:
> On 11.01.2013 23:49, Tom Lane wrote:
> >Andres Freund<andres@2ndquadrant.com>  writes:
> >>On 2013-01-11 16:28:13 -0500, Tom Lane wrote:
> >>>I'm not very satisfied with that answer.  Right now, Peter's
> >>>patch has added a class of bugs that never existed before 9.3, and yours
> >>>would add more.  It might well be that those classes are empty ... but
> >>>*we can't know that*.  I don't think that keeping a new optimization for
> >>>non-gcc compilers is worth that risk.  Postgres is already full of
> >>>gcc-only optimizations, anyway.
> >
> >>ISTM that ereport() already has such strange behaviour wrt evaluation of
> >>arguments (i.e. doing it only when the message is really logged) that it
> >>doesn't seem to add a real additional risk.
> >
> >Hm ... well, that's a point.  I may be putting too much weight on the
> >fact that any such bug for elevel would be a new one that never existed
> >before.  What do others think?
> 
> It sure would be nice to avoid multiple evaluation. At least in xlog.c we
> use emode_for_corrupt_record() to pass the elevel, and it does have a
> side-effect. It's quite harmless in practice, you'll get some extra DEBUG1
> messages in the log, but still.
> 
> Here's one more construct to consider:
> 
> #define ereport_domain(elevel, domain, rest) \
>     do { \
>         const int elevel_ = elevel;    \
>         if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ||
> elevel_>= ERROR) \
>         { \
>             (void) rest; \
>             if (elevel_>= ERROR) \
>                 errfinish_noreturn(1); \
>             else \
>                 errfinish(1); \
>         } \
>     } while(0)
> 
> extern void errfinish(int dummy,...);
> extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn));

I am afraid that that would bloat the code again as it would have to put
that if() into each callsite whereas a variant that ends up using
__builtin_unreachable() doesn't. So I think we should use
__builtin_unreachable() while falling back to abort() as before. But
that's really more a detail than anything fundamental about the approach.

I'll play a bit.

> With do-while, ereport() would no longer be an expression. The only place
> where that's a problem in our codebase is in scan.l, bootscanner.l and
> repl_scanner.l, which do this:

I agree that would be easy to fix, so no problem there.

Greetings,

Andres Freund


Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

12 January 2013, 18:17:12

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 11.01.2013 23:49, Tom Lane wrote:
>> Hm ... well, that's a point.  I may be putting too much weight on the
>> fact that any such bug for elevel would be a new one that never existed
>> before.  What do others think?

> It sure would be nice to avoid multiple evaluation. At least in xlog.c 
> we use emode_for_corrupt_record() to pass the elevel, and it does have a 
> side-effect.

Um.  I had forgotten about that example, but its existence is sufficient
to reinforce my opinion that we must not risk double evaluation of the
elevel argument.

> Here's one more construct to consider:

> #define ereport_domain(elevel, domain, rest) \
>     do { \
>         const int elevel_ = elevel;    \
>         if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || 
> elevel_>= ERROR) \
>         { \
>             (void) rest; \
>             if (elevel_>= ERROR) \
>                 errfinish_noreturn(1); \
>             else \
>                 errfinish(1); \
>         } \
>     } while(0)

I don't care for that too much in detail -- if errstart were to return
false (it shouldn't, but if it did) this would be utterly broken,
because we'd be making error reporting calls that elog.c wouldn't be
prepared to accept.  We should stick to the technique of doing the
ereport as today and then adding something that tells the compiler we
don't expect to get here; preferably something that emits no added code.

However, using a do-block with a local variable is definitely something
worth considering.  I'm getting less enamored of the __builtin_constant_p
idea after finding out that the gcc boys seem to have curious ideas
about what its semantics ought to be:
https://bugzilla.redhat.com/show_bug.cgi?id=894515
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

12 January 2013, 18:42:32

On 2013-01-12 13:16:56 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > On 11.01.2013 23:49, Tom Lane wrote:
> >> Hm ... well, that's a point.  I may be putting too much weight on the
> >> fact that any such bug for elevel would be a new one that never existed
> >> before.  What do others think?
>
> > It sure would be nice to avoid multiple evaluation. At least in xlog.c
> > we use emode_for_corrupt_record() to pass the elevel, and it does have a
> > side-effect.
>
> Um.  I had forgotten about that example, but its existence is sufficient
> to reinforce my opinion that we must not risk double evaluation of the
> elevel argument.

Unfortunately I agree :(

> > Here's one more construct to consider:
>
> > #define ereport_domain(elevel, domain, rest) \
> >     do { \
> >         const int elevel_ = elevel;    \
> >         if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ||
> > elevel_>= ERROR) \
> >         { \
> >             (void) rest; \
> >             if (elevel_>= ERROR) \
> >                 errfinish_noreturn(1); \
> >             else \
> >                 errfinish(1); \
> >         } \
> >     } while(0)
>
> I don't care for that too much in detail -- if errstart were to return
> false (it shouldn't, but if it did) this would be utterly broken,
> because we'd be making error reporting calls that elog.c wouldn't be
> prepared to accept.  We should stick to the technique of doing the
> ereport as today and then adding something that tells the compiler we
> don't expect to get here; preferably something that emits no added code.

Yea, I didn't really like it all that much either - but anyway, I have
yet to find *any* version with a local variable that doesn't lead to
200kb size increase :(.

> However, using a do-block with a local variable is definitely something
> worth considering.  I'm getting less enamored of the __builtin_constant_p
> idea after finding out that the gcc boys seem to have curious ideas
> about what its semantics ought to be:
> https://bugzilla.redhat.com/show_bug.cgi?id=894515

I wonder whether __builtin_choose_expr is any better?

Greetings,

Andres Freund


Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Heikki Linnakangas

Date:

12 January 2013, 20:14:20

On 12.01.2013 20:42, Andres Freund wrote:
> On 2013-01-12 13:16:56 -0500, Tom Lane wrote:
>> Heikki Linnakangas<hlinnakangas@vmware.com>  writes:
>>> Here's one more construct to consider:
>>
>>> #define ereport_domain(elevel, domain, rest) \
>>>     do { \
>>>         const int elevel_ = elevel;    \
>>>         if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ||
>>> elevel_>= ERROR) \
>>>         { \
>>>             (void) rest; \
>>>             if (elevel_>= ERROR) \
>>>                 errfinish_noreturn(1); \
>>>             else \
>>>                 errfinish(1); \
>>>         } \
>>>     } while(0)
>>
>> I don't care for that too much in detail -- if errstart were to return
>> false (it shouldn't, but if it did) this would be utterly broken,
>> because we'd be making error reporting calls that elog.c wouldn't be
>> prepared to accept.  We should stick to the technique of doing the
>> ereport as today and then adding something that tells the compiler we
>> don't expect to get here; preferably something that emits no added code.

With the current ereport(), we'll call abort() if errstart returns false
and elevel >= ERROR. That's even worse than making error reporting
calls that elog.c isn't prepared for. If we take that risk seriously,
with bogus error reporting calls we at least have a chance of catching it
and reporting an error.

> Yea, I didn't really like it all that much either - but anyway, I have
> yet to find *any* version with a local variable that doesn't lead to
> 200kb size increase :(.

Hmm, that's strange. Assuming the optimizer optimizes away the paths in
the macro that are never taken when elevel is a constant, I would expect
the resulting binary to be smaller than what we have now. All those
useless abort() calls must take up some space.

Here's what I got with and without my patch on my laptop:

-rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch
-rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched

But when I build without --enable-debug, the situation reverses:

-rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch
-rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched

I suspect you're also just seeing bloat caused by more debugging
symbols, which won't have any effect at runtime. It would be nice to
have smaller debug-enabled builds, too, of course, but it's not that
important.

>> However, using a do-block with a local variable is definitely something
>> worth considering.  I'm getting less enamored of the __builtin_constant_p
>> idea after finding out that the gcc boys seem to have curious ideas
>> about what its semantics ought to be:
>> https://bugzilla.redhat.com/show_bug.cgi?id=894515
>
> I wonder whether __builtin_choose_expr is any better?

I forgot to mention (and it wasn't visible in the snippet I posted) the
reason I used __attribute__ ((noreturn)). We already use that attribute
elsewhere, so this optimization wouldn't require anything more from the
compiler than what we already take advantage of. I think that noreturn is
more widely supported than these other constructs. Also, if I'm reading
the MSDN docs correctly, while MSVC doesn't understand __attribute__
((noreturn)) directly, it has an equivalent syntax __declspec(noreturn).
So we could easily support MSVC with this approach.
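
A minimal sketch of how the two spellings could be hidden behind one macro.
MY_NORETURN and finish_noreturn() are hypothetical names chosen for this
example, and longjmp stands in for whatever a real errfinish_noreturn()
would do to leave the function.

```c
#include <assert.h>
#include <setjmp.h>

/* Pick the compiler's spelling of "this function does not return". */
#if defined(__GNUC__)
#define MY_NORETURN __attribute__((noreturn))
#elif defined(_MSC_VER)
#define MY_NORETURN __declspec(noreturn)
#else
#define MY_NORETURN             /* unknown compiler: no hint available */
#endif

static jmp_buf recovery_point;

/* Stand-in for errfinish_noreturn(): always leaves via longjmp. */
MY_NORETURN void
finish_noreturn(void)
{
    longjmp(recovery_point, 1);
}

/* Returns 1 once control has come back through longjmp. */
int
call_and_recover(void)
{
    if (setjmp(recovery_point) != 0)
        return 1;
    finish_noreturn();
    /* no return needed here when the compiler honours the attribute */
}
```

On a compiler that takes the empty fallback, the missing return at the end
of call_and_recover() would likely draw a warning, which is the kind of
noise discussed earlier in the thread.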

Attached is the complete patch I used for the above tests.

- Heikki

Attachment

ereport-with-do-while-1.patch

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

12 January 2013, 20:39:25

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 12.01.2013 20:42, Andres Freund wrote:
>>> I don't care for that too much in detail -- if errstart were to return
>>> false (it shouldn't, but if it did) this would be utterly broken,

> With the current ereport(), we'll call abort() if errstart returns false 
> and elevel >= ERROR. That's even worse than making error reporting 
> calls that elog.c isn't prepared for.

No it isn't: you'd get a clean and easily interpretable abort.  I am not
sure exactly what would happen if we plow forward with calling elog.c
functions after errstart returns false, but it wouldn't be good, and
debugging a field report of such behavior wouldn't be easy.

This is actually a disadvantage of the proposal to replace the abort()
calls with __builtin_unreachable(), too.  The gcc boys interpret the
semantics of that as "if control reaches here, the behavior is
undefined" --- and their notion of undefined generally seems to amount
to "we will screw you as hard as we can".  For example, they have no
problem with using the assumption that control won't reach there to make
deductions while optimizing the rest of the function.  If it ever did
happen, I would not want to have to debug it.  The same goes for
__attribute__((noreturn)), BTW.

>> Yea, I didn't really like it all that much either - but anyway, I have
>> yet to find *any* version with a local variable that doesn't lead to
>> 200kb size increase :(.

> Hmm, that's strange. Assuming the optimizer optimizes away the paths in 
> the macro that are never taken when elevel is a constant, I would expect 
> the resulting binary to be smaller than what we have now.

I was wondering about that too, but haven't tried it locally yet.  It
could be that Andres was looking at an optimization level in which a
register still got allocated for the local variable.

> Here's what I got with and without my patch on my laptop:

> -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch
> -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched

> But when I build without --enable-debug, the situation reverses:

> -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch
> -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched

Yes, I noticed that too: these patches make the debug overhead grow
substantially for some reason.  It's better to look at the output of
"size" rather than the executable's total file size.  That way you don't
need to rebuild without debug to see reality.  (I guess you could also
"strip" the executables for comparison purposes.)
        regards, tom lane

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

12 January 2013, 21:36:43

Andres Freund <andres@2ndquadrant.com> writes:
>>> It does *not* combine elog_start and elog_finish into one function if
>>> varargs are available although that brings a rather measurable
>>> size/performance benefit.

>> Since you've apparently already done the measurement: how much?
>> It would be a bit tedious to support two different infrastructures for
>> elog(), but if it's a big enough win maybe we should.

> Imo it's pretty definitely a big enough win. So big I have a hard time
> believing it can be true without negative effects somewhere else.

Well, actually there's a pretty serious negative effect here, which is
that when it's implemented this way it's impossible to save errno for %m
before the elog argument list is evaluated.  So I think this is a no-go.
We've always had the contract that functions in the argument list could
stomp on errno without care.

If we switch to a do-while macro expansion it'd be possible to do
something like
do {
    int save_errno = errno;
    int elevel = whatever;
    elog_internal( save_errno, elevel, fmt, __VA_ARGS__ );
} while (0);

but this would almost certainly result in more code bloat not less,
since call sites would now be responsible for fetching errno.
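
The ordering issue can be shown with a standalone sketch. All names below
(describe_object, log_direct, log_internal, LOG_WITH_ERRNO) are hypothetical
and only model the two shapes: a single function that reads errno after its
arguments were evaluated, and a do-while macro that captures errno first.

```c
#include <assert.h>
#include <errno.h>

static int last_saved_errno = -1;

/* Hypothetical argument function that clobbers errno as a side effect. */
const char *
describe_object(void)
{
    errno = 0;
    return "some object";
}

/* Reads errno itself, i.e. only after its argument was evaluated. */
void
log_direct(const char *what)
{
    (void) what;
    last_saved_errno = errno;
}

/* Takes errno as captured by the call site. */
void
log_internal(int saved_errno, const char *what)
{
    (void) what;
    last_saved_errno = saved_errno;
}

/* Capture errno before the argument expression is evaluated. */
#define LOG_WITH_ERRNO(what) \
    do { \
        const int save_errno_ = errno; \
        log_internal(save_errno_, (what)); \
    } while (0)

/* errno value that the single-function shape ends up seeing. */
int
errno_seen_by_direct_call(void)
{
    errno = ENOENT;
    log_direct(describe_object());
    return last_saved_errno;
}

/* errno value that the do-while shape ends up seeing. */
int
errno_seen_by_macro(void)
{
    errno = ENOENT;
    LOG_WITH_ERRNO(describe_object());
    return last_saved_errno;
}
```

The macro form preserves the original errno, at the price of each call site
fetching errno itself, which is the code-size cost mentioned above.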
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

12 January 2013, 23:15:30

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-12 13:16:56 -0500, Tom Lane wrote:
>> However, using a do-block with a local variable is definitely something
>> worth considering.  I'm getting less enamored of the __builtin_constant_p
>> idea after finding out that the gcc boys seem to have curious ideas
>> about what its semantics ought to be:
>> https://bugzilla.redhat.com/show_bug.cgi?id=894515

> I wonder whether __builtin_choose_expr is any better?

Right offhand I don't see how that helps us.  AFAICS,
__builtin_choose_expr is the same as x?y:z except that y and z are not
required to have the same datatype, which is okay because they require
the value of x to be determinable at compile time.  So we couldn't just
write __builtin_choose_expr((elevel) >= ERROR, ...).  That would fail
if elevel wasn't compile-time-constant.  We could conceivably do

__builtin_choose_expr(__builtin_constant_p(elevel) && (elevel) >= ERROR, ...)

with the idea of making real sure that that expression reduces to a
compile time constant ... but stacking two nonstandard constructs on
each other seems to me to probably increase our exposure to gcc's
definitional randomness rather than reduce it.  I mean, if
__builtin_constant_p can't already be trusted to act like a constant,
why should we trust that __builtin_choose_expr doesn't also have a
curious definition of constant-ness?
        regards, tom lane

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

13 January 2013, 00:47:26

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> When I compile with gcc -O0, I get one warning with this:

> datetime.c: In function ‘DateTimeParseError’:
> datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default]

> That suggests that the compiler didn't correctly deduce that 
> ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.

Yeah, I am seeing this too.  It appears to be endemic to the
local-variable approach, ie if we have
    const int elevel_ = (elevel);
    ...
    (elevel_ >= ERROR) ? pg_unreachable() : (void) 0

then we do not get the desired results at -O0, which is not terribly
surprising --- I'd not really expect the compiler to propagate the
value of elevel_ when not optimizing.

If we don't use a local variable, we don't get the warning, which
I take to mean that gcc will fold "ERROR >= ERROR" to true even at -O0,
and that it does this early enough to conclude that unreachability
holds.

I experimented with some variant ways of phrasing the macro, but the
only thing that worked at -O0 required __builtin_constant_p, which
rather defeats the purpose of making this accessible to non-gcc
compilers.

If we go with the local-variable approach, we could probably suppress
this warning by putting an abort() call at the bottom of
DateTimeParseError.  It seems a tad ugly though, and what's a bigger
deal is that if the compiler is unable to deduce unreachability at -O0
then we are probably going to be fighting such bogus warnings for all
time to come.  Note also that an abort() (much less a pg_unreachable())
would not do anything positive like give us a compile warning if we
mistakenly added a case that could fall through.

On the other hand, if there's only one such case in our tree today,
maybe I'm worrying too much.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 17:51:34

On 2013-01-12 15:39:16 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > On 12.01.2013 20:42, Andres Freund wrote:
> >>> I don't care for that too much in detail -- if errstart were to return
> >>> false (it shouldn't, but if it did) this would be utterly broken,
> 
> > With the current ereport(), we'll call abort() if errstart returns false 
> > and elevel >= ERROR. That's even worse than making error reporting 
> > calls that elog.c isn't prepared for.
> 
> No it isn't: you'd get a clean and easily interpretable abort.  I am not
> sure exactly what would happen if we plow forward with calling elog.c
> functions after errstart returns false, but it wouldn't be good, and
> debugging a field report of such behavior wouldn't be easy.
> 
> This is actually a disadvantage of the proposal to replace the abort()
> calls with __builtin_unreachable(), too.  The gcc boys interpret the
> semantics of that as "if control reaches here, the behavior is
> undefined" --- and their notion of undefined generally seems to amount
> to "we will screw you as hard as we can".  For example, they have no
> problem with using the assumption that control won't reach there to make
> deductions while optimizing the rest of the function.  If it ever did
> happen, I would not want to have to debug it.  The same goes for
> __attribute__((noreturn)), BTW.


We could add an Assert() additionally?

> >> Yea, I didn't really like it all that much either - but anyway, I have
> >> yet to find *any* version with a local variable that doesn't lead to
> >> 200kb size increase :(.
> 
> > Hmm, that's strange. Assuming the optimizer optimizes away the paths in 
> > the macro that are never taken when elevel is a constant, I would expect 
> > the resulting binary to be smaller than what we have now.
> 
> I was wondering about that too, but haven't tried it locally yet.  It
> could be that Andres was looking at an optimization level in which a
> register still got allocated for the local variable.

Nope, it was O2, albeit with -fno-omit-frame-pointer (for usable
hierarchical profiles).

> > Here's what I got with and without my patch on my laptop:
> 
> > -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch
> > -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched
> 
> > But when I build without --enable-debug, the situation reverses:
> 
> > -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch
> > -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched
> 
> Yes, I noticed that too: these patches make the debug overhead grow
> substantially for some reason.  It's better to look at the output of
> "size" rather than the executable's total file size.  That way you don't
> need to rebuild without debug to see reality.  (I guess you could also

But the above is spot on. While I used strip earlier I somehow forgot it
here and so the debug overhead is part of what I saw. But, in comparison
to my baseline (which is replacing the abort() with
pg_unreachable()/__builtin_unreachable()) it's still a loss
performance-wise, which is why I didn't doubt my results...

Pavel's testcase with changing ereport implementations (5 runs, minimal
time):
EREPORT_EXPR_OLD_NO_UNREACHABLE: 9964.015ms,    5530296 bytes (stripped)
EREPORT_EXPR_OLD_ABORT:          9765.916ms,    5466936 bytes (stripped)
EREPORT_EXPR_OLD_UNREACHABLE:    9646.502ms,    5462456 bytes (stripped)
EREPORT_STMT_HEIKKI:             10133.968ms,   5435704 bytes (stripped)
EREPORT_STMT_ANDRES:             9548.436ms,    5462232 bytes (stripped)

Where my variant is:
#define ereport_domain(elevel, domain, rest) \
    do { \
        const int elevel_ = elevel; \
        if (errstart(elevel_, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain)) \
        { \
            errfinish rest; \
        } \
        if (elevel_ >= ERROR) \
            pg_unreachable(); \
    } while(0)
 

So I suggest using that with an additional Assert(0) in the if(elevel_)
branch.

The assembler generated for my version is exactly the same as for Heikki's
when elevel is constant, but mine doesn't generate any additional code in
the case where it's not constant, and I guess that's where it gets
faster?
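
The control flow of that variant can be tried outside the tree with mocks.
This is only a sketch: mock_errstart(), mock_errfinish() and
mock_unreachable() are hypothetical stand-ins, the level values are
illustrative, and longjmp models the non-local exit that an ERROR report
takes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Illustrative level values, not taken from elog.h. */
#define DEBUG1 14
#define ERROR  20

static jmp_buf error_context;

/* Mock: decides whether to report; here every level is reported. */
static int
mock_errstart(int elevel)
{
    (void) elevel;
    return 1;
}

/* Mock: ends the report; for ERROR and above it does not return. */
static void
mock_errfinish(int elevel)
{
    if (elevel >= ERROR)
        longjmp(error_context, 1);
}

#if defined(__GNUC__)
#define mock_unreachable() __builtin_unreachable()
#else
#define mock_unreachable() abort()
#endif

/* Same shape as the variant above: report first, then mark unreachable. */
#define mock_ereport(elevel) \
    do { \
        const int elevel_ = (elevel); \
        if (mock_errstart(elevel_)) \
            mock_errfinish(elevel_); \
        if (elevel_ >= ERROR) \
            mock_unreachable(); \
    } while (0)

/* Returns 0 if control continued past the report, 1 if it was thrown. */
int
report_at(int elevel)
{
    if (setjmp(error_context) != 0)
        return 1;
    mock_ereport(elevel);
    return 0;
}
```

The unreachable marker is never executed here because the mock always
leaves via longjmp for ERROR; if a real errstart ever returned false at
that level, the marker would be reached with undefined results, which is
the hazard raised earlier in the thread.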

Greetings,

Andres Freund


Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 18:01:14

On 2013-01-12 19:47:18 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > When I compile with gcc -O0, I get one warning with this:
>
> > datetime.c: In function ‘DateTimeParseError’:
> > datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default]
>
> > That suggests that the compiler didn't correctly deduce that
> > ereport(ERROR, ...) doesn't return. With -O1, the warning goes away.
>
> Yeah, I am seeing this too.  It appears to be endemic to the
> local-variable approach, ie if we have
>
>     const int elevel_ = (elevel);
>     ...
>     (elevel_ >= ERROR) ? pg_unreachable() : (void) 0
>
> then we do not get the desired results at -O0, which is not terribly
> surprising --- I'd not really expect the compiler to propagate the
> value of elevel_ when not optimizing.
>
> If we don't use a local variable, we don't get the warning, which
> I take to mean that gcc will fold "ERROR >= ERROR" to true even at -O0,
> and that it does this early enough to conclude that unreachability
> holds.
>
> I experimented with some variant ways of phrasing the macro, but the
> only thing that worked at -O0 required __builtin_constant_p, which
> rather defeats the purpose of making this accessible to non-gcc
> compilers.

Yea, it's rather sad. The only thing I can see is actually doing both
variants like:

#define ereport_domain(elevel, domain, rest) \
    do { \
        const int elevel_ = elevel; \
        if (errstart(elevel_, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain)) \
        { \
            errfinish rest; \
        } \
        if (pg_is_known_constant(elevel) && (elevel) >= ERROR) \
        { \
            pg_unreachable(); \
            Assert(0); \
        } \
        else if (elevel_ >= ERROR) \
        { \
            pg_unreachable(); \
            Assert(0); \
        } \
    } while(0)

but that obviously still relies on __builtin_constant_p giving
reasonable answers.

We should probably put that Assert(0) into pg_unreachable()...
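
One possible shape for that, as a sketch only: my_unreachable() is a
hypothetical name, and the standard assert() plays the role that Assert()
has in the tree. In assert-enabled builds the marker traps loudly;
otherwise it hands the hint to the compiler, with abort() as the fallback.

```c
#include <assert.h>
#include <stdlib.h>

#if !defined(NDEBUG)
/* Assert-enabled build: fail visibly if control ever gets here. */
#define my_unreachable() (assert(!"unreachable code reached"), abort())
#elif defined(__GNUC__)
/* Optimized gcc-compatible build: reaching this is undefined behaviour. */
#define my_unreachable() __builtin_unreachable()
#else
/* Other compilers: keep the old abort() behaviour. */
#define my_unreachable() abort()
#endif

typedef enum
{
    LEVEL_LOG,
    LEVEL_ERROR
} Level;

/* Every enum value returns inside the switch; the tail is never reached. */
const char *
level_name(Level level)
{
    switch (level)
    {
        case LEVEL_LOG:
            return "LOG";
        case LEVEL_ERROR:
            return "ERROR";
    }
    my_unreachable();
}
```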

> If we go with the local-variable approach, we could probably suppress
> this warning by putting an abort() call at the bottom of
> DateTimeParseError.  It seems a tad ugly though, and what's a bigger
> deal is that if the compiler is unable to deduce unreachability at -O0
> then we are probably going to be fighting such bogus warnings for all
> time to come.  Note also that an abort() (much less a pg_unreachable())
> would not do anything positive like give us a compile warning if we
> mistakenly added a case that could fall through.
>
> On the other hand, if there's only one such case in our tree today,
> maybe I'm worrying too much.

I think the only reason there's only one such case (two here actually,
second in renametrig) is that all others already have been silenced
because the abort() didn't use to be there. And I think the danger you're
alluding to above is a real one.

Greetings,

Andres Freund


Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 18:07:42

On 2013-01-12 18:15:17 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-12 13:16:56 -0500, Tom Lane wrote:
> >> However, using a do-block with a local variable is definitely something
> >> worth considering.  I'm getting less enamored of the __builtin_constant_p
> >> idea after finding out that the gcc boys seem to have curious ideas
> >> about what its semantics ought to be:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=894515
> 
> > I wonder whether __builtin_choose_expr is any better?
> 
> Right offhand I don't see how that helps us.  AFAICS,
> __builtin_choose_expr is the same as x?y:z except that y and z are not
> required to have the same datatype, which is okay because they require
> the value of x to be determinable at compile time.  So we couldn't just
> write __builtin_choose_expr((elevel) >= ERROR, ...).  That would fail
> if elevel wasn't compile-time-constant.  We could conceivably do
> 
> __builtin_choose_expr(__builtin_constant_p(elevel) && (elevel) >= ERROR, ...)
> 
> with the idea of making real sure that that expression reduces to a
> compile time constant ... but stacking two nonstandard constructs on
> each other seems to me to probably increase our exposure to gcc's
> definitional randomness rather than reduce it.  I mean, if
> __builtin_constant_p can't already be trusted to act like a constant,
> why should we trust that __builtin_choose_expr doesn't also have a
> curious definition of constant-ness?

I was thinking of something like the above, yes. It seems to me that, to
generate code, the compiler would need to make really sure the condition
in __builtin_choose_expr() is constant, so I hoped the checks inside are
somewhat strict...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

13 January 2013, 18:41:45

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-12 15:39:16 -0500, Tom Lane wrote:
>> This is actually a disadvantage of the proposal to replace the abort()
>> calls with __builtin_unreachable(), too.  The gcc boys interpret the
>> semantics of that as "if control reaches here, the behavior is
>> undefined"

> We could make add an Assert() additionally?

I see little value in that.  In a non-assert build it will do nothing at
all, and in an assert build it'd just add code bloat.

I think there might be a good argument for defining pg_unreachable() as
abort() in assert-enabled builds, even if the compiler has
__builtin_unreachable(): that way, if the impossible happens, we'll find
out about it in a predictable and debuggable way.  And by definition,
we're not so concerned about either performance or code bloat in assert
builds.
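
A minimal sketch of that idea, under stated assumptions: demo_unreachable()
and the DemoColor enum are hypothetical names used only for illustration, and
NDEBUG (the <assert.h> convention) stands in for whatever symbol marks an
assert-enabled build. It is not the definition from any patch in this thread.

```c
#include <assert.h>   /* NDEBUG decides what counts as an "assert build" here */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_unreachable().  In assert-enabled builds
 * (NDEBUG not defined) it aborts, so reaching it fails predictably; in
 * other builds with a gcc-compatible compiler it is only an optimizer hint.
 */
#ifndef NDEBUG
#define demo_unreachable() abort()
#elif defined(__GNUC__)
#define demo_unreachable() __builtin_unreachable()
#else
#define demo_unreachable() abort()
#endif

typedef enum { DEMO_RED, DEMO_GREEN, DEMO_BLUE } DemoColor;

/* Every enum value returns from the switch; the tail must never be reached. */
static const char *
demo_color_name(DemoColor c)
{
    switch (c)
    {
        case DEMO_RED:
            return "red";
        case DEMO_GREEN:
            return "green";
        case DEMO_BLUE:
            return "blue";
    }
    demo_unreachable();
}
```

With this shape a debugger or core dump points at the abort() in assert
builds, while other builds keep the smaller code that the unreachable hint
allows.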

> Where my variant is:
> #define ereport_domain(elevel, domain, rest)                                \
>     do {                                                                    \
>         const int elevel_ = elevel;                                         \
>         if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\
>         {                                                                   \
>             errfinish rest;                                                 \
>         }                                                                   \
>         if (elevel_>= ERROR)                                                \
>             pg_unreachable();                                               \
>     } while(0)

This is the same implementation I'd arrived at after looking at
Heikki's.  I think we should probably use this for non-gcc builds.
However, for recent gcc (those with __builtin_constant_p) I think that
my original suggestion is better, ie don't use a local variable but
instead use __builtin_constant_p(elevel) && (elevel) >= ERROR in the
second test.  That way we don't have the problem with unreachability
tests failing at -O0.  We may still have some issues with bogus warnings
in other compilers, but I think most PG developers are using recent gcc.
        regards, tom lane
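
The two variants discussed above can be put side by side in a small,
self-contained sketch. Everything named demo_ or DEMO_ is a hypothetical
stand-in (the real macro is ereport_domain and calls errstart/errfinish);
longjmp plays the role of the non-local exit, and abort() is used as the
"cannot get here" marker. Treat it as an illustration of the structure, not
as the committed code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* All demo_/DEMO_ names are hypothetical, not PostgreSQL symbols. */
#define DEMO_WARNING 19
#define DEMO_ERROR   20

static jmp_buf demo_recovery_point;
static int demo_last_level = 0;

/* Logs the message; for error levels it jumps away instead of returning. */
static void
demo_emit(int elevel, const char *msg)
{
    assert(msg != NULL);
    demo_last_level = elevel;
    fprintf(stderr, "level %d: %s\n", elevel, msg);
    if (elevel >= DEMO_ERROR)
        longjmp(demo_recovery_point, 1);
}

#if defined(__GNUC__)
/* gcc variant: no local variable, the test folds away for constant levels */
#define demo_report(elevel, msg) \
    do { \
        demo_emit(elevel, msg); \
        if (__builtin_constant_p(elevel) && (elevel) >= DEMO_ERROR) \
            abort(); \
    } while (0)
#else
/* portable variant: copy the level once, then test the copy */
#define demo_report(elevel, msg) \
    do { \
        const int elevel_ = (elevel); \
        demo_emit(elevel_, msg); \
        if (elevel_ >= DEMO_ERROR) \
            abort(); \
    } while (0)
#endif

/* The division is only reached when the error branch was not taken. */
static int
demo_checked_div(int a, int b)
{
    if (b == 0)
        demo_report(DEMO_ERROR, "division by zero");
    if (a % b != 0)
        demo_report(DEMO_WARNING, "inexact division");
    return a / b;
}

/* Returns 0 and stores the quotient on success, 1 if an error was raised. */
static int
demo_try_div(int a, int b, int *out)
{
    if (setjmp(demo_recovery_point) != 0)
        return 1;
    *out = demo_checked_div(a, b);
    return 0;
}
```

In the gcc variant the level expression is still evaluated only once at run
time: __builtin_constant_p does not evaluate its argument, and the second
mention is only reached when the argument is a constant.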

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 19:06:44

On 2013-01-12 16:36:39 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >>> It does *not* combine elog_start and elog_finish into one function if
> >>> varargs are available although that brings a rather measurable
> >>> size/performance benefit.
> 
> >> Since you've apparently already done the measurement: how much?
> >> It would be a bit tedious to support two different infrastructures for
> >> elog(), but if it's a big enough win maybe we should.
> 
> > Imo its pretty definitely a big enough win. So big I have a hard time
> > believing it can be true without negative effects somewhere else.
> 
> Well, actually there's a pretty serious negative effect here, which is
> that when it's implemented this way it's impossible to save errno for %m
> before the elog argument list is evaluated.  So I think this is a no-go.
> We've always had the contract that functions in the argument list could
> stomp on errno without care.
> 
> If we switch to a do-while macro expansion it'd be possible to do
> something like
> 
>     do {
>         int save_errno = errno;
>         int elevel = whatever;
> 
>         elog_internal( save_errno, elevel, fmt, __VA_ARGS__ );
>     } while (0);
> 
> but this would almost certainly result in more code bloat not less,
> since call sites would now be responsible for fetching errno.

the numbers are:
old definition:                                 10393.658ms, 5497912 bytes
old definition + unreachable:                   10011.102ms, 5469144 bytes
stmt, two calls, unreachable:                   10036.132ms, 5468792 bytes
stmt, one call, unreachable:                    9443.612ms,  5462232 bytes
stmt, one call, unreachable, save errno:        9615.863ms,  5489688 bytes

So while not saving errno is unsurprisingly better, it's still a win.
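
To make the errno point concrete, here is a rough sketch of the do-while
shape quoted above. The names demo_log, demo_log_internal and
demo_clobbering_arg are invented for this example; the only thing it is meant
to show is that errno is captured before the argument list is evaluated, so
an argument that stomps on errno does no harm.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* All demo_ names are hypothetical; this is not the elog implementation. */
static int  demo_seen_errno = 0;
static char demo_message[128];

/* Single entry point: receives the errno value captured by the caller. */
static void
demo_log_internal(int saved_errno, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    demo_seen_errno = saved_errno;
    va_start(ap, fmt);
    vsnprintf(demo_message, sizeof(demo_message), fmt, ap);
    va_end(ap);
}

/* errno is read first; only afterwards are the arguments evaluated. */
#define demo_log(...) \
    do { \
        int save_errno_ = errno; \
        demo_log_internal(save_errno_, __VA_ARGS__); \
    } while (0)

/* An argument expression that overwrites errno as a side effect. */
static const char *
demo_clobbering_arg(void)
{
    errno = EINVAL;
    return "some relation";
}
```

The cost mentioned in the quoted text is visible here: every call site now
contains the errno fetch itself, which is where the extra code size in the
"save errno" row would come from.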

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

13 January 2013, 19:18:00

Andres Freund <andres@2ndquadrant.com> writes:
> the numbers are:
> old definition:                                 10393.658ms, 5497912 bytes
> old definition + unreachable:                   10011.102ms, 5469144 bytes
> stmt, two calls, unreachable:                   10036.132ms, 5468792 bytes
> stmt, one call, unreachable:                    9443.612ms,  5462232 bytes
> stmt, one call, unreachable, save errno:        9615.863ms,  5489688 bytes

I find these numbers pretty hard to credit.  Why should replacing two
calls by one, in code paths that are not being taken, move the runtime
so much?  The argument that a net reduction of code size is a win
doesn't work, because the last case is more code than any except the
first.

I think you're measuring some coincidental effect or other, not a
reproducible performance improvement.  Or there's a bug in the code
you're using.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 19:44:04

On 2013-01-13 14:17:52 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > the numbers are:
> > old definition:                                 10393.658ms, 5497912 bytes
> > old definition + unreachable:                   10011.102ms, 5469144 bytes
> > stmt, two calls, unreachable:                   10036.132ms, 5468792 bytes
> > stmt, one call, unreachable:                    9443.612ms,  5462232 bytes
> > stmt, one call, unreachable, save errno:        9615.863ms,  5489688 bytes
> 
> I find these numbers pretty hard to credit.  Why should replacing two
> calls by one, in code paths that are not being taken, move the runtime
> so much?  The argument that a net reduction of code size is a win
> doesn't work, because the last case is more code than any except the
> first.

There are quite a few elog(DEBUG*)s in the backend, and those are taken,
so I don't find it unreasonable that doing less work in those cases is
measurable.

And if you look at the disassembly of ERROR codepaths:

saving errno, one function:
Dump of assembler code for function palloc:
639     {
   0x0000000000736397 <+7>:     mov    %rdi,%rsi
   0x00000000007363b0 <+32>:    push   %rbp
   0x00000000007363b1 <+33>:    mov    %rsp,%rbp
   0x00000000007363b4 <+36>:    sub    $0x20,%rsp

640             /* duplicates MemoryContextAlloc to avoid increased overhead */
641             AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643             if (!AllocSizeIsValid(size))
   0x0000000000736390 <+0>:     cmp    $0x3fffffff,%rdi
   0x000000000073639a <+10>:    ja     0x7363b0 <palloc+32>

644                     elog(ERROR, "invalid memory alloc request size %lu",
   0x00000000007363b8 <+40>:    mov    %rdi,-0x8(%rbp)
   0x00000000007363bc <+44>:    callq  0x462010 <__errno_location@plt>
   0x00000000007363c1 <+49>:    mov    -0x8(%rbp),%rsi
   0x00000000007363c5 <+53>:    mov    $0x8ace30,%r9d
   0x00000000007363cb <+59>:    mov    $0x14,%ecx
   0x00000000007363d0 <+64>:    mov    $0x8acefe,%edx
   0x00000000007363d5 <+69>:    mov    $0x8ace58,%edi
   0x00000000007363da <+74>:    mov    %rsi,(%rsp)
   0x00000000007363de <+78>:    mov    (%rax),%r8d
   0x00000000007363e1 <+81>:    mov    $0x285,%esi
   0x00000000007363e6 <+86>:    xor    %eax,%eax
   0x00000000007363e8 <+88>:    callq  0x71a0b0 <elog_all>
   0x00000000007363ed:          nopl   (%rax)

645                              (unsigned long) size);
646
647             CurrentMemoryContext->isReset = false;
   0x000000000073639c <+12>:    mov    0x25c6e5(%rip),%rdi        # 0x992a88 <CurrentMemoryContext>
   0x00000000007363a7 <+23>:    movb   $0x0,0x30(%rdi)

648
649             return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
   0x00000000007363a3 <+19>:    mov    0x8(%rdi),%rax
   0x00000000007363ab <+27>:    mov    (%rax),%rax
   0x00000000007363ae <+30>:    jmpq   *%rax

End of assembler dump.

vs

Dump of assembler code for function palloc:
639     {
   0x00000000007382b0 <+0>:     push   %rbp
   0x00000000007382b1 <+1>:     mov    %rsp,%rbp
   0x00000000007382b4 <+4>:     push   %rbx
   0x00000000007382b5 <+5>:     mov    %rdi,%rbx
   0x00000000007382b8 <+8>:     sub    $0x8,%rsp

640             /* duplicates MemoryContextAlloc to avoid increased overhead */
641             AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643             if (!AllocSizeIsValid(size))
   0x00000000007382bc <+12>:    cmp    $0x3fffffff,%rdi
   0x00000000007382c3 <+19>:    jbe    0x7382ed <palloc+61>

644                     elog(ERROR, "invalid memory alloc request size %lu",
   0x00000000007382c5 <+21>:    mov    $0x8aec9e,%edx
   0x00000000007382ca <+26>:    mov    $0x284,%esi
   0x00000000007382cf <+31>:    mov    $0x8aebd0,%edi
   0x00000000007382d4 <+36>:    callq  0x71bcb0 <elog_start>
   0x00000000007382d9 <+41>:    mov    %rbx,%rdx
   0x00000000007382dc <+44>:    mov    $0x8aec10,%esi
   0x00000000007382e1 <+49>:    mov    $0x14,%edi
   0x00000000007382e6 <+54>:    xor    %eax,%eax
   0x00000000007382e8 <+56>:    callq  0x71bac0 <elog_finish>

645                              (unsigned long) size);
646
647             CurrentMemoryContext->isReset = false;
   0x00000000007382ed <+61>:    mov    0x25bc54(%rip),%rdi        # 0x993f48 <CurrentMemoryContext>
   0x00000000007382fb <+75>:    movb   $0x0,0x30(%rdi)

648
649             return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
   0x00000000007382f4 <+68>:    mov    %rbx,%rsi
   0x00000000007382f7 <+71>:    mov    0x8(%rdi),%rax
   0x00000000007382ff <+79>:    mov    (%rax),%rax
   0x0000000000738308 <+88>:    jmpq   *%rax
   0x000000000073830a:          nopw   0x0(%rax,%rax,1)

650     }
   0x0000000000738302 <+82>:    add    $0x8,%rsp
   0x0000000000738306 <+86>:    pop    %rbx
   0x0000000000738307 <+87>:    pop    %rbp

End of assembler dump.

If other critical paths look like this, it's not that surprising.

> I think you're measuring some coincidental effect or other, not a
> reproducible performance improvement.  Or there's a bug in the code
> you're using.

Might be, I repeated all of it though.


Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

13 January 2013, 20:45:08

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-13 14:17:52 -0500, Tom Lane wrote:
>> I find these numbers pretty hard to credit.

> There are quite some elog(DEBUG*)s in the backend, and those are taken,
> so I don't find it unreasonable that doing less work in those cases is
> measurable.

Meh.  If there are any elog(DEBUG)s in time-critical places, they should
be changed to ereport's anyway, so as to eliminate most of the overhead
when they're not firing.

> And if you look at the disassembly of ERROR codepaths:

I think your numbers are being twisted by -fno-omit-frame-pointer.
What I get, with the __builtin_unreachable version of the macro,
looks more like

MemoryContextAlloc:
    cmpq    $1073741823, %rsi
    pushq    %rbx
    movq    %rsi, %rbx
    ja    .L53
    movq    8(%rdi), %rax
    movb    $0, 48(%rdi)
    popq    %rbx
    movq    (%rax), %rax
    jmp    *%rax
.L53:
    movl    $__func__.5262, %edx
    movl    $576, %esi
    movl    $.LC2, %edi
    call    elog_start
    movq    %rbx, %rdx
    movl    $.LC3, %esi
    movl    $20, %edi
    xorl    %eax, %eax
    call    elog_finish

With -fno-omit-frame-pointer it's a little worse, but still not what you
show --- in particular, for me gcc still pushes the elog calls out of
the main code path.  I don't think that the main path will get any
better with one elog function instead of two.  It could easily get worse.
On many machines, the single-function version would be worse because of
needing to use more parameter registers, which would translate into more
save/restore work in the calling function, which is overhead that would
likely be paid whether elog actually gets called or not.  (As an
example, in the above code gcc evidently isn't noticing that it doesn't
need to save/restore rbx so far as the main path is concerned.)

In any case, results from a single micro-benchmark on a single platform
with a single compiler version (and single set of compiler options)
don't convince me a whole lot here.

Basically, the aspects of this that I think are likely to be
reproducible wins across different platforms are (a) teaching the
compiler that elog(ERROR) doesn't return, and (b) reducing code size as
much as possible.  The single-function change isn't going to help on
either ground --- maybe it would have helped on (b) without the errno
problem, but that's going to destroy any possible code size savings.
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

13 January 2013, 21:06:46

On 2013-01-13 15:44:58 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:

> > And if you look at the disassembly of ERROR codepaths:
> 
> I think your numbers are being twisted by -fno-omit-frame-pointer.
> What I get, with the __builtin_unreachable version of the macro,
> looks more like

The only difference is that the following two instructions aren't done:
   0x00000000007363b0 <+32>:    push   %rbp
   0x00000000007363b1 <+33>:    mov    %rsp,%rbp

Given that that function barely uses registers (from an amd64 pov at
least), that's not really surprising.

> MemoryContextAlloc:
>     cmpq    $1073741823, %rsi
>     pushq    %rbx
>     movq    %rsi, %rbx
>     ja    .L53
>     movq    8(%rdi), %rax
>     movb    $0, 48(%rdi)
>     popq    %rbx
>     movq    (%rax), %rax
>     jmp    *%rax
> .L53:
>     movl    $__func__.5262, %edx
>     movl    $576, %esi
>     movl    $.LC2, %edi
>     call    elog_start
>     movq    %rbx, %rdx
>     movl    $.LC3, %esi
>     movl    $20, %edi
>     xorl    %eax, %eax
>     call    elog_finish
> 
> With -fno-omit-frame-pointer it's a little worse, but still not what you
> show --- in particular, for me gcc still pushes the elog calls out of
> the main code path.  I don't think that the main path will get any
> better with one elog function instead of two.  It could easily get worse.
> On many machines, the single-function version would be worse because of
> needing to use more parameter registers, which would translate into more
> save/restore work in the calling function, which is overhead that would
> likely be paid whether elog actually gets called or not.  (As an
> example, in the above code gcc evidently isn't noticing that it doesn't
> need to save/restore rbx so far as the main path is concerned.)
> 
> In any case, results from a single micro-benchmark on a single platform
> with a single compiler version (and single set of compiler options)
> don't convince me a whole lot here.

I am not convinced either, I just got curious whether it would be a win
after the __VA_ARGS__ thing made it possible. If the errno problem
didn't exist I would be pretty damn sure it's just about always a win,
but the way it is now...

We could make elog behave the same as ereport WRT argument evaluation,
but that seems a bit dangerous given it would still be different on some
platforms.

> Basically, the aspects of this that I think are likely to be
> reproducible wins across different platforms are (a) teaching the
> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
> much as possible.  The single-function change isn't going to help on
> either ground --- maybe it would have helped on (b) without the errno
> problem, but that's going to destroy any possible code size savings.

Agreed. I am happy to produce an updated patch unless you're already on
it?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

13 January 2013, 21:16:15

Andres Freund <andres@2ndquadrant.com> writes:
>> Basically, the aspects of this that I think are likely to be
>> reproducible wins across different platforms are (a) teaching the
>> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
>> much as possible.  The single-function change isn't going to help on
>> either ground --- maybe it would have helped on (b) without the errno
>> problem, but that's going to destroy any possible code size savings.

> Agreed. I am happy to produce an updated patch unless youre already on
> it?

On it now (busy testing on some old slow boxes, else I'd be done already).
        regards, tom lane

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Robert Haas

Date:

15 January 2013, 19:40:42

On Sun, Jan 13, 2013 at 4:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> Basically, the aspects of this that I think are likely to be
>>> reproducible wins across different platforms are (a) teaching the
>>> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
>>> much as possible.  The single-function change isn't going to help on
>>> either ground --- maybe it would have helped on (b) without the errno
>>> problem, but that's going to destroy any possible code size savings.
>
>> Agreed. I am happy to produce an updated patch unless youre already on
>> it?
>
> On it now (busy testing on some old slow boxes, else I'd be done already).

Just a random thought here...

There are an awful lot of places in our source tree where the error
level is fixed.  We could invent a new construct, say ereport_error or
so, that is just like ereport except that it takes no error-level
parameter because it's hard-coded to ERROR.

It would be a bit of a pain to change all of the existing call sites,
but presumably it would dodge a lot of these issues about the way
compilers optimize things, because we could simply say categorically
that ereport_error NEVER returns.
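
Roughly, such a construct could look like the sketch below. It is only an
illustration under assumptions: demo_report_error, DEMO_NORETURN and the
weekday lookup are invented names, the attribute is spelled the gcc way, and
longjmp stands in for the real error recovery path.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* All demo_/DEMO_ names are hypothetical, not PostgreSQL symbols. */
#if defined(__GNUC__)
#define DEMO_NORETURN __attribute__((noreturn))
#else
#define DEMO_NORETURN
#endif

static jmp_buf demo_jmp;
static const char *demo_last_msg = NULL;

/* Hard-coded to the error level, so it can be declared as never returning. */
static void demo_report_error(const char *msg) DEMO_NORETURN;

static void
demo_report_error(const char *msg)
{
    assert(msg != NULL);
    demo_last_msg = msg;
    longjmp(demo_jmp, 1);
}

/* No dummy "return 0" is needed after the error call. */
static int
demo_weekday_number(const char *name)
{
    if (strcmp(name, "mon") == 0)
        return 1;
    if (strcmp(name, "tue") == 0)
        return 2;
    demo_report_error("unknown weekday");
}

/* Returns 1 and stores the number on success, 0 if an error was raised. */
static int
demo_try_weekday(const char *name, int *out)
{
    if (setjmp(demo_jmp) != 0)
        return 0;
    *out = demo_weekday_number(name);
    return 1;
}
```

The catch is that this depends on the compiler honoring the attribute;
where DEMO_NORETURN expands to nothing, the missing return in
demo_weekday_number would draw a warning again.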

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Tom Lane

Date:

15 January 2013, 20:03:08

Robert Haas <robertmhaas@gmail.com> writes:
> There are an awful lot of places in our source tree where the error
> level is fixed.  We could invent a new construct, say ereport_error or
> so, that is just like ereport except that it takes no error-level
> parameter because it's hard-coded to ERROR.

> It would be a bit of a pain to change all of the existing call sites,
> but presumably it would dodge a lot of these issues about the way
> compilers optimize things, because we could simply say categorically
> that ereport_error NEVER returns.

Meh.  We've already got it working, and in a way that doesn't require
the compiler to understand __attribute__((noreturn)) --- it only has
to be aware that abort() doesn't return, in one fashion or another.
So I'm disinclined to run around and change a few thousand call sites,
much less expect extension authors to do so too.

(By my count there are about six thousand places we'd have to change.)

Note that whatever's going on on dugong is not a counterexample to
"got it working", because presumably dugong would also be misbehaving
if we'd used a different method of flagging all the ereports/elogs
as nonreturning.
        regards, tom lane

Re: [PATCH 3/5] Split out xlog reading into its own module called xlogreader

From

Alvaro Herrera

Date:

16 January 2013, 19:21:02

Andres Freund wrote:

> The way xlog reading was done up to now made it impossible to use that
> nontrivial code outside of xlog.c although it is useful for different purposes
> like debugging wal (xlogdump) and decoding wal back into logical changes.

I have pushed this part after some more editorialization.

Most notably, (I think,) I removed the #ifdef FRONTEND piece.  I think
this should be part of the pg_xlogdump patch.  Sorry about this.

Also I changed XLogReaderAllocate() to not receive an initial read
starting point, because that seemed rather pointless.  Both xlog.c and
the submitted pg_xlogdump use a non-invalid RecPtr in their first call
AFAICS.

There was a bug in xlog.c's usage of XLogReadRecord: it was calling
ereport() on the errorbuf when the return value was NULL, which is not
always sane because some errors do not set the errorbuf.  I fixed that,
and I documented it more clearly in xlogreader.h (I hope).
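
The calling pattern being described can be sketched as follows. This is a
simplified model, not the xlogreader API: DemoRecord, demo_read_record and
its mode argument are made up for the example. The point is only that a NULL
result does not imply that an error message was set, so the caller has to
check the message pointer before reporting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All Demo/demo_ names are hypothetical, not the xlogreader interface. */
typedef struct DemoRecord
{
    int length;
} DemoRecord;

static DemoRecord demo_record = {42};
static int demo_reported = 0;

/*
 * mode 0: success; mode 1: failure with a message;
 * mode 2: failure without a message (for example, end of input).
 */
static DemoRecord *
demo_read_record(int mode, const char **errormsg)
{
    assert(errormsg != NULL);
    *errormsg = NULL;
    if (mode == 0)
        return &demo_record;
    if (mode == 1)
        *errormsg = "invalid record length";
    return NULL;
}

/* Returns the record length, or -1 if no record could be read. */
static int
demo_consume(int mode)
{
    const char *errormsg;
    DemoRecord *rec = demo_read_record(mode, &errormsg);

    if (rec == NULL)
    {
        /* Only report when the reader actually provided a message. */
        if (errormsg != NULL)
        {
            fprintf(stderr, "%s\n", errormsg);
            demo_reported++;
        }
        return -1;
    }
    return rec->length;
}
```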

I made some other minor stylistic changes, reworded a few comments, etc.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

18 January 2013, 14:48:05

On 2013-01-08 15:55:24 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Andres Freund wrote:
> >> Sorry, misremembered the problem somewhat. The problem is that code that
> >> includes postgres.h atm ends up with ExceptionalCondition() et
> >> al. declared even if FRONTEND is defined. So if anything uses an assert
> >> you need to provide wrappers for those which seems nasty. If they are
> >> provided centrally and check for FRONTEND that problem doesn't exist.
> 
> > I think the right fix here is to fix things so that postgres.h is not
> > necessary.  How hard is that?  Maybe it just requires some more
> > reshuffling of xlog headers.
> 
> That would definitely be the ideal fix.  In general, #include'ing
> postgres.h into code that's not backend code opens all kinds of hazards
> that are likely to bite us sooner or later; the incompatibility of the
> Assert definitions is just the tip of that iceberg.  (In the past we've
> had issues around <stdio.h> providing different definitions in the two
> environments, for example.)

I agree that going in that direction is a good thing, but imo that doesn't
really have any bearing on whether this patch is a good/bad idea...

> But having said that, I'm also now remembering that we have files in
> src/port/ and probably elsewhere that try to work in both environments
> by just #include'ing c.h directly (relying on the Makefile to supply
> -DFRONTEND or not).  If we wanted to support Assert use in such files,
> we would have to move at least the Assert() macro definitions into c.h
> as Andres is proposing.  Now, I've always thought that #include'ing c.h
> directly was kind of an ugly hack, but I don't have a better design than
> that to offer ATM.

Imo having some common dumping ground for things that concern both environments
is a good idea. I don't think c.h would be the best place for it when
redesigning from the ground up, but we're not doing that, so imo it's acceptable
as long as we take care not to let it grow too much.

Here's a trivially updated patch which also defines AssertArg() for
FRONTEND-ish environments since Alvaro added one in xlogreader.c.

Do we agree this is the way forward, at least for now?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

18 January 2013, 15:06:39

On 2013-01-09 15:07:10 -0500, Tom Lane wrote:
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting).  If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.

Unfortunately nobody spoke up and I don't have any non-Intel hardware
anymore. Well, aside from my phone that is...

Updated patch attached, the previous version missed some contrib modules.

I like the diffstat:
 41 files changed, 224 insertions(+), 636 deletions(-)

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Provide-a-common-malloc-wrappers-and-palloc-et-al.-e.patch

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

18 January 2013, 15:09:22

On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> Here's a trivially updated patch which also defines AssertArg() for
> FRONTEND-ish environments since Alvaro added one in xlogreader.c.

This time for real. Please.

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Centralize-Assert-macros-into-c.h-so-its-common-betw.patch

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Alvaro Herrera

Date:

18 January 2013, 15:23:14

Andres Freund wrote:
> On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> > Here's a trivially updated patch which also defines AssertArg() for
> > FRONTEND-ish environments since Alvaro added one in xlogreader.c.
>
> This time for real. Please.

Here's a different idea: move all the Assert() and StaticAssert() etc
definitions from c.h and postgres.h into a new header, say pgassert.h.
That header is included directly by postgres.h (just like palloc.h and
elog.h already are) so we don't have to touch the backend code at all.
Frontend programs that want that functionality can just #include
"pgassert.h" by themselves.  The definitions are (obviously) protected
by #ifdef FRONTEND so that it all works in both environments cleanly.
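
A rough idea of what such a header could contain, as a sketch only:
DemoAssert, DEMO_ASSERT_CHECKING and demo_exceptional_condition are
placeholder names chosen for this example, and the backend branch simply
prints and aborts rather than calling the real backend routine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical pgassert.h-style block.  DEMO_ASSERT_CHECKING stands in for
 * the build option that enables assertions; FRONTEND selects the flavor.
 */
#ifndef DEMO_ASSERT_CHECKING
/* assertions disabled: compile to nothing in both environments */
#define DemoAssert(condition) ((void) 0)
#elif defined(FRONTEND)
/* frontend programs: fall back to the C library's assert() */
#define DemoAssert(condition) assert(condition)
#else
/* backend: route failures through a central reporting function */
static void
demo_exceptional_condition(const char *cond, const char *file, int line)
{
    fprintf(stderr, "TRAP: failed assertion \"%s\" at %s:%d\n",
            cond, file, line);
    abort();
}

#define DemoAssert(condition) \
    do { \
        if (!(condition)) \
            demo_exceptional_condition(#condition, __FILE__, __LINE__); \
    } while (0)
#endif

/* Code shared by both environments just uses the macro. */
static int
demo_half(int n)
{
    DemoAssert(n % 2 == 0);
    return n / 2;
}
```

Shared code such as demo_half compiles unchanged whether or not FRONTEND is
defined, which is the property the proposal is after.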

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

18 January 2013, 15:33:24

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Here's a different idea: move all the Assert() and StaticAssert() etc
> definitions from c.h and postgres.h into a new header, say pgassert.h.
> That header is included directly by postgres.h (just like palloc.h and
> elog.h already are) so we don't have to touch the backend code at all.
> Frontend programs that want that functionality can just #include
> "pgassert.h" by themselves.  The definitions are (obviously) protected
> by #ifdef FRONTEND so that it all works in both environments cleanly.

That might work.  Files using c.h would have to include this too, but
as described it should work in either environment for them.  The other
corner case is pg_controldata.c and other frontend programs that include
postgres.h --- but it looks like they #define FRONTEND first, so they'd
get the correct set of Assert definitions.

Whether it's really any better than just sticking them in c.h isn't
clear.

Really I'd prefer not to move the backend definitions out of postgres.h
at all, just because doing so will lose fifteen years of git history
about those particular lines (or at least make it a lot harder to
locate with git blame).
        regards, tom lane

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Alvaro Herrera

Date:

18 January 2013, 15:34:42

Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> > > Here's a trivially updated patch which also defines AssertArg() for
> > > FRONTEND-ish environments since Alvaro added one in xlogreader.c.
> >
> > This time for real. Please.
>
> Here's a different idea: move all the Assert() and StaticAssert() etc
> definitions from c.h and postgres.h into a new header, say pgassert.h.
> That header is included directly by postgres.h (just like palloc.h and
> elog.h already are) so we don't have to touch the backend code at all.
> Frontend programs that want that functionality can just #include
> "pgassert.h" by themselves.  The definitions are (obviously) protected
> by #ifdef FRONTEND so that it all works in both environments cleanly.

One slight problem with this is the common port/*.c idiom would require
an extra include:

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#include "pgassert.h"        /* <--- new line required here */
#endif /* FRONTEND */

If this is seen as a serious issue (I don't think so) then we would have
to include pgassert.h into postgres_fe.h which would make the whole
thing pointless (i.e. the same as just having the definitions in c.h).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

18 January 2013, 15:43:13

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> One slight problem with this is the common port/*.c idiom would require
> an extra include:

> #ifndef FRONTEND
> #include "postgres.h"
> #else
> #include "postgres_fe.h"
> #include "pgassert.h"        /* <--- new line required here */
> #endif /* FRONTEND */

> If this is seen as a serious issue (I don't think so) then we would have
> to include pgassert.h into postgres_fe.h which would make the whole
> thing pointless (i.e. the same as just having the definitions in c.h).

We'd only need to add that when/if the file started to use asserts,
so I don't think it's a problem.  But still not sure it's better than
just putting the stuff in c.h.
        regards, tom lane

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

18 January 2013, 15:54:56

On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Here's a different idea: move all the Assert() and StaticAssert() etc
> > definitions from c.h and postgres.h into a new header, say pgassert.h.
> > That header is included directly by postgres.h (just like palloc.h and
> > elog.h already are) so we don't have to touch the backend code at all.
> > Frontend programs that want that functionality can just #include
> > "pgassert.h" by themselves.  The definitions are (obviously) protected
> > by #ifdef FRONTEND so that it all works in both environments cleanly.
> 
> That might work.  Files using c.h would have to include this too, but
> as described it should work in either environment for them.  The other
> corner case is pg_controldata.c and other frontend programs that include
> postgres.h --- but it looks like they #define FRONTEND first, so they'd
> get the correct set of Assert definitions.
> 
> Whether it's really any better than just sticking them in c.h isn't
> clear.

I don't really see an advantage. To me providing sensible asserts sounds
like a good fit for c.h... And we already have StaticAssert*,
AssertVariableIsOfType* in c.h...

> Really I'd prefer not to move the backend definitions out of postgres.h
> at all, just because doing so will lose fifteen years of git history
> about those particular lines (or at least make it a lot harder to
> locate with git blame).

I can imagine some ugly macro solutions to that problem (pgassert.h
includes postgres.h with special defines), but that doesn't seem to be
warranted to me.

The alternative seems to be sprinkling a few more #ifdef FRONTENDs in
postgres.h. If that's preferred I can prepare a patch for it, although I
prefer my proposal.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Tom Lane

Date:

18 January 2013, 16:11:56

Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
>> Really I'd prefer not to move the backend definitions out of postgres.h
>> at all, just because doing so will lose fifteen years of git history
>> about those particular lines (or at least make it a lot harder to
>> locate with git blame).

> The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> postgres.h, if thats preferred I can prepare a patch for that although I
> prefer my proposal.

Yeah, surrounding the existing definitions with #ifndef FRONTEND was
what I was imagining.  But on reflection that seems pretty darn ugly,
especially if the corresponding FRONTEND definitions are far away.
Maybe we should just live with the git history disconnect.

Right at the moment the c.h location seems like the best bet.  I don't
see much advantage to Alvaro's proposal of a new header file.
        regards, tom lane

Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Andres Freund

Date:

18 January 2013, 16:17:22

On 2013-01-18 11:11:50 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> >> Really I'd prefer not to move the backend definitions out of postgres.h
> >> at all, just because doing so will lose fifteen years of git history
> >> about those particular lines (or at least make it a lot harder to
> >> locate with git blame).
> 
> > The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> > postgres.h, if thats preferred I can prepare a patch for that although I
> > prefer my proposal.
> 
> Yeah, surrounding the existing definitions with #ifndef FRONTEND was
> what I was imagining.  But on reflection that seems pretty darn ugly,
> especially if the corresponding FRONTEND definitions are far away.
> Maybe we should just live with the git history disconnect.

FWIW, git blame -M -C, while noticeably more expensive, seems to be able to
connect the history:

d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 746)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 747) #define Assert(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 748)         Trap(!(condition),"FailedAssertion")
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 749)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 750) #define AssertMacro(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 751)         ((void) TrapMacro(!(condition),"FailedAssertion"))
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 752)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 753) #define AssertArg(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 754)         Trap(!(condition),"BadArgument")
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 755)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 756) #define AssertState(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 757)         Trap(!(condition),"BadState")
8118de2b src/include/c.h        (Andres Freund 2012-12-18 00:58:36 +0100 758)


Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Steve Singer

Date:

19 January 2013, 22:33:21

On 13-01-09 03:07 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Well, I *did* benchmark it as noted elsewhere in the thread, but thats
>> obviously just machine (E5520 x 2) with one rather restricted workload
>> (pgbench -S -jc 40 -T60). At least its rather palloc heavy.
>> Here are the numbers:
>> before:
>> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
>> after:
>> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
>> So on my system if there is a difference, its positive (0.12%).
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck.  It struck me that we have a
> recent test case that's known to be really palloc-intensive, namely
> Pavel's example here:
> http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
>
> I set up a non-cassert build of commit
> 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> reduced the data-copying overhead for that).  On my Fedora 16 machine
> (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> I get a runtime for Pavel's example of 17023 msec (average over five
> runs).  I then applied oprofile and got a breakdown like this:
>
>    samples|      %|
> ------------------
>     108409 84.5083 /home/tgl/testversion/bin/postgres
>      13723 10.6975 /lib64/libc-2.14.90.so
>       3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10960    10.1495  AllocSetAlloc
> 6325      5.8572  MemoryContextAllocZeroAligned
> 6225      5.7646  base_yyparse
> 3765      3.4866  copyObject
> 2511      2.3253  MemoryContextAlloc
> 2292      2.1225  grouping_planner
> 2044      1.8928  SearchCatCache
> 1956      1.8113  core_yylex
> 1763      1.6326  expression_tree_walker
> 1347      1.2474  MemoryContextCreate
> 1340      1.2409  check_stack_depth
> 1276      1.1816  GetCachedPlan
> 1175      1.0881  AllocSetFree
> 1106      1.0242  GetSnapshotData
> 1106      1.0242  _SPI_execute_plan
> 1101      1.0196  extract_query_dependencies_walker
>
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:
>
>     107642 83.7427 /home/tgl/testversion/bin/postgres
>      14677 11.4183 /lib64/libc-2.14.90.so
>       3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10038     9.3537  AllocSetAlloc
> 6392      5.9562  MemoryContextAllocZeroAligned
> 5763      5.3701  base_yyparse
> 4810      4.4821  copyObject
> 2268      2.1134  grouping_planner
> 2178      2.0295  core_yylex
> 1963      1.8292  palloc
> 1867      1.7397  SearchCatCache
> 1835      1.7099  expression_tree_walker
> 1551      1.4453  check_stack_depth
> 1374      1.2803  _SPI_execute_plan
> 1282      1.1946  MemoryContextCreate
> 1187      1.1061  AllocSetFree
> ...
> 653       0.6085  palloc0
> ...
> 552       0.5144  MemoryContextAlloc
>
> The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> how did that get faster?
>
> I notice that the postgres executable is about 0.2% smaller, presumably
> because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> This makes me wonder if my result is due to chance improvements of cache
> line alignment for inner loops.
>
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting).  If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.
>
>             regards, tom lane
>

Sorry for the delay; I only read this thread today.


I just tried Pavel's test on a POWER5 machine with an older version of
gcc (see the grebe buildfarm animal for details)

78a5e738e:                   37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slow things down slightly on POWER. I haven't
bothered to run oprofile or tprof to get a breakdown of the functions,
since Andres has already removed this from his patch.
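
To put the two sets of runtimes on the same footing, the relative change
can be computed directly. The following is a small sketch; the function
names pct_change() and report_runtime_changes() are made up for
illustration, and the inputs are simply the averages quoted in this thread:

```c
#include <stdio.h>

/* Relative change from "before" to "after", in percent (positive = slower). */
static double
pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print the change for the two runtime pairs quoted in this message. */
static void
report_runtime_changes(void)
{
    /* Tom's Xeon E5503 averages: 17023 ms before, 16666 ms after */
    printf("Xeon E5503: %+.2f%%\n", pct_change(17023.0, 16666.0));

    /* The POWER5 averages above: 37874.855 ms before, 38076.8035 ms after */
    printf("POWER5:     %+.2f%%\n", pct_change(37874.855, 38076.8035));
}
```

That works out to roughly a 0.5% slowdown on the POWER5 machine, against
roughly a 2% speedup on the Xeon.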
 

Steve

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

21 January 2013, 07:29:09

On 2013-01-19 17:33:05 -0500, Steve Singer wrote:
> On 13-01-09 03:07 PM, Tom Lane wrote:
> >Andres Freund <andres@2ndquadrant.com> writes:
> >>Well, I *did* benchmark it as noted elsewhere in the thread, but thats
> >>obviously just machine (E5520 x 2) with one rather restricted workload
> >>(pgbench -S -jc 40 -T60). At least its rather palloc heavy.
> >>Here are the numbers:
> >>before:
> >>#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
> >>after:
> >>#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
> >>So on my system if there is a difference, its positive (0.12%).
> >pgbench-based testing doesn't fill me with a lot of confidence for this
> >--- its numbers contain a lot of communication overhead, not to mention
> >that pgbench itself can be a bottleneck.  It struck me that we have a
> >recent test case that's known to be really palloc-intensive, namely
> >Pavel's example here:
> >http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
> >
> >I set up a non-cassert build of commit
> >78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> >reduced the data-copying overhead for that).  On my Fedora 16 machine
> >(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> >I get a runtime for Pavel's example of 17023 msec (average over five
> >runs).  I then applied oprofile and got a breakdown like this:
...
> >
> >The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> >how did that get faster?
> >
> >I notice that the postgres executable is about 0.2% smaller, presumably
> >because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> >This makes me wonder if my result is due to chance improvements of cache
> >line alignment for inner loops.
> >
> >I would like to know if other people get comparable results on other
> >hardware (non-Intel hardware would be especially interesting).  If this
> >result holds up across a range of platforms, I'll withdraw my objection
> >to making palloc a plain function.
> >
> >            regards, tom lane
> >
>
> Sorry for the delay I only read this thread today.
>
>
> I just tried Pawel's test on a POWER5 machine with an older version of gcc
> (see the grebe buildfarm animal for details)
>
> 78a5e738e:                   37874.855 (average of 6 runs)
> 78a5e738 + palloc.h + mcxt.c: 38076.8035
>
> The functions do seem to slightly slow things down on POWER. I haven't
> bothered to run oprofile or tprof to get a breakdown of the functions
> since Andres has already removed this from his patch.

I haven't removed it from the patch afaik, so it would be great to get a
profile here! It's "only" for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Steve Singer

Date:

21 January 2013, 16:59:31

On 13-01-21 02:28 AM, Andres Freund wrote:
>
> I haven't removed it from the patch afaik, so it would be great to get a
> profile here! Its "only" for xlogdump, but that tool helped me immensely
> and I don't want to maintain it independently...

Here is the output from tprof

Here is the baseline:

Average time 37862

Total Ticks For All Processes (./postgres_perftest) = 19089



Subroutine                 Ticks    %   Source Address  Bytes
==========                 ===== ====== ====== =======  =====
.AllocSetAlloc              2270   0.99 aset.c 3bfb8    4b0
.base_yyparse               1217   0.53 gram.c fe644  168ec
.text                       1015   0.44 copyfuncs.c 11bfb4   4430
.MemoryContextAllocZero      535   0.23 mcxt.c 3b990     f0
.MemoryContextAlloc          533   0.23 mcxt.c 3b780     e0
.check_stack_depth           392   0.17 postgres.c 568ac    100
.core_yylex                  385   0.17 gram.c fb5b4   1c90
.expression_tree_walker      347   0.15 nodeFuncs.c 50da4    750
.AllocSetFree                308   0.13 aset.c 3c998    1b0
.grouping_planner            242   0.11 planner.c 2399f0   1d10
.SearchCatCache              198   0.09 catcache.c 7ec20    550
._SPI_execute_plan           195   0.09 spi.c 2f0c18    7c0
.pfree                       195   0.09 mcxt.c 3b3b0     70
query_dependencies_walker    185   0.08 setrefs.c 255944    1b0
.GetSnapshotData             183   0.08 procarray.c 69efc    460
.query_tree_walker           176   0.08 nodeFuncs.c 50ae4    210
.strncpy                     168   0.07 strncpy.s ba080    130
.fmgr_info_cxt_security      166   0.07 fmgr.c 3f7b0    850
.transformStmt               159   0.07 analyze.c 29091c   12d0
.text                        141   0.06 parse_collate.c 28ddf8      1
.ExecInitExpr                137   0.06 execQual.c 17f18c   15f0
.fix_expr_common             132   0.06 setrefs.c 2557e4    160
.standard_ExecutorStart      127   0.06 execMain.c 1d9a00    940
.GetCachedPlan               125   0.05 plancache.c ce664    310
.strcpy                      121   0.05 noname 3bd40    1a8


With your changes (same test as I described before)

Average: 37938


Total Ticks For All Processes (./postgres_perftest) = 19311

Subroutine                 Ticks    %   Source Address  Bytes
==========                 ===== ====== ====== =======  =====
.AllocSetAlloc              2162   2.17 aset.c 3bfb8    4b0
.base_yyparse               1242   1.25 gram.c fdc7c  167f0
.text                       1028   1.03 copyfuncs.c 11b4d0   4210
.palloc                      553   0.56 mcxt.c 3b4c8     d0
.MemoryContextAllocZero      509   0.51 mcxt.c 3b9e8     f0
.core_yylex                  413   0.41 gram.c fac2c   1c60
.check_stack_depth           404   0.41 postgres.c 56730    100
.expression_tree_walker      320   0.32 nodeFuncs.c 50d28    750
.AllocSetFree                261   0.26 aset.c 3c998    1b0
._SPI_execute_plan           232   0.23 spi.c 2ee624    7c0
.GetSnapshotData             221   0.22 procarray.c 69d54    460
.grouping_planner            211   0.21 planner.c 237b60   1cf0
.MemoryContextAlloc          190   0.19 mcxt.c 3b738     e0
.query_tree_walker           184   0.18 nodeFuncs.c 50a68    210
.SearchCatCache              182   0.18 catcache.c 7ea08    550
.transformStmt               181   0.18 analyze.c 28e774   12d0
query_dependencies_walker    180   0.18 setrefs.c 25397c    1b0
.strncpy                     175   0.18 strncpy.s b9a60    130
.MemoryContextCreate         167   0.17 mcxt.c 3bad8    160
.pfree                       151   0.15 mcxt.c 3b208     70
.strcpy                      150   0.15 noname 3bd40    1a8
.fmgr_info_cxt_security      146   0.15 fmgr.c 3f790    850
.text                        132   0.13 parse_collate.c 28bc50      1
.ExecInitExpr                125   0.13 execQual.c 17de28   15e0
.expression_tree_mutator     124   0.12 nodeFuncs.c 53268   1080
.strcmp                      122   0.12 noname 1e44    158
.fix_expr_common             117   0.12 setrefs.c 25381c    160




> Greetings,
>
> Andres Freund
>

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Andres Freund

Date:

21 January 2013, 17:15:39

On 2013-01-21 11:59:18 -0500, Steve Singer wrote:
> On 13-01-21 02:28 AM, Andres Freund wrote:
> >
> >I haven't removed it from the patch afaik, so it would be great to get a
> >profile here! Its "only" for xlogdump, but that tool helped me immensely
> >and I don't want to maintain it independently...
>
> Here is the output from tprof
>
> Here is the baseline:

Any chance you could run the test on top of HEAD? The elog stuff only went
in afterwards and might actually affect this enough to be relevant.

Otherwise I have to say I am a bit confused by the mighty difference in
costs attributed to AllocSetAlloc, given that the number of calls should be
exactly the same and the number of "trampoline" function calls should
also stay the same.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Steve Singer

Date:

21 January 2013, 21:59:12

On 13-01-21 12:15 PM, Andres Freund wrote:
> On 2013-01-21 11:59:18 -0500, Steve Singer wrote:
>> On 13-01-21 02:28 AM, Andres Freund wrote:
>>> I haven't removed it from the patch afaik, so it would be great to get a
>>> profile here! Its "only" for xlogdump, but that tool helped me immensely
>>> and I don't want to maintain it independently...
>> Here is the output from tprof
>>
>> Here is the baseline:
> Any chance to do the test ontop of HEAD? The elog stuff has gone in only
> afterwards and might actually effect this enough to be relevant.

HEAD(:fb197290)
Average: 28877

.AllocSetAlloc              1442   1.90 aset.c 3a938    4b0
.base_yyparse               1220   1.61 gram.c fdbc0  168ec
.MemoryContextAlloc          485   0.64 mcxt.c 3a154     e0
.core_yylex                  407   0.54 gram.c fab30   1c90
.AllocSetFree                320   0.42 aset.c 3b318    1b0
.MemoryContextAllocZero      316   0.42 mcxt.c 3a364     f0
.grouping_planner            271   0.36 planner.c 2b0ce8   1d10
.strncpy                     256   0.34 strncpy.s b8ca0    130
.expression_tree_walker      222   0.29 nodeFuncs.c 4f734    750
._SPI_execute_plan           221   0.29 spi.c 2fb230    840
.SearchCatCache              182   0.24 catcache.c 7d870    550
.GetSnapshotData             161   0.21 procarray.c 68acc    460
.fmgr_info_cxt_security      155   0.20 fmgr.c 3e130    850
.pfree                       152   0.20 mcxt.c 39d84     70
.expression_tree_mutator     151   0.20 nodeFuncs.c 51c84   1170
.check_stack_depth           142   0.19 postgres.c 5523c    100
.text                        126   0.17 parse_collate.c 233d90      1
.transformStmt               125   0.16 analyze.c 289e88   12c0
.ScanKeywordLookup           123   0.16 kwlookup.c f7ab4    154
.new_list                    120   0.16 list.c 40f74     b0
.subquery_planner            115   0.15 planner.c 2b29f8    dc0
.GetCachedPlan               115   0.15 plancache.c cdab0    310
.ExecInitExpr                114   0.15 execQual.c 17e690   15f0
.set_plan_refs               113   0.15 setrefs.c 252630    cb0
.standard_ExecutorStart      110   0.14 execMain.c 1d9264    940
.heap_form_tuple             107   0.14 heaptuple.c 177c84    2f0
.query_tree_walker           105   0.14 nodeFuncs.c 4f474    210



HEAD + patch:
Average: 29035

Total Ticks For All Processes (./postgres_perftest) = 15044

Subroutine                 Ticks    %   Source Address  Bytes
==========                 ===== ====== ====== =======  =====
.AllocSetAlloc              1422   1.64 aset.c 3a938    4b0
.base_yyparse               1201   1.38 gram.c fd1e8  167f0
.palloc                      470   0.54 mcxt.c 39e04     d0
.core_yylex                  364   0.42 gram.c fa198   1c60
.MemoryContextAllocZero      282   0.33 mcxt.c 3a324     f0
._SPI_execute_plan           250   0.29 spi.c 2f8c18    840
.AllocSetFree                244   0.28 aset.c 3b318    1b0
.strncpy                     234   0.27 strncpy.s b86a0    130
.expression_tree_walker      229   0.26 nodeFuncs.c 4f698    750
.grouping_planner            176   0.20 planner.c 2aea84   1d30
.SearchCatCache              170   0.20 catcache.c 7d638    550
.GetSnapshotData             168   0.19 procarray.c 68904    460
.assign_collations_walker    155   0.18 parse_collate.c 231f4c    7e0
.subquery_planner            141   0.16 planner.c 2b07b4    dc0
.fmgr_info_cxt_security      141   0.16 fmgr.c 3e110    850
.check_stack_depth           140   0.16 postgres.c 550a0    100
.ExecInitExpr                138   0.16 execQual.c 17d2f8   15e0
.pfree                       137   0.16 mcxt.c 39b44     70
.transformStmt               132   0.15 analyze.c 287dec   12c0
.new_list                    128   0.15 list.c 40f44     90
.expression_tree_mutator     125   0.14 nodeFuncs.c 51bd8   1080
.preprocess_expression       118   0.14 planner.c 2adf54    1a0
.strcmp                      118   0.14 noname 1e44    158
.standard_ExecutorStart      115   0.13 execMain.c 1d77c0    940
.MemoryContextAlloc          111   0.13 mcxt.c 3a074     e0
.set_plan_refs               109   0.13 setrefs.c 2506c4    ca0




> Otherwise I have to say I am a bit confused by the mighty difference in
> costs attributed to AllocSetAlloc given the amount of calls should be
> exactly the same and the number of "trampoline" function calls should
> also stay the same.
> Greetings,
>
> Andres Freund
>
> --
>   Andres Freund                       http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
>
>

Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From

Heikki Linnakangas

Date:

23 January 2013, 11:03:44

For the record, on MSVC we can use __assume(0) to mark unreachable code.
It does the same as gcc's __builtin_unreachable(). I tested it with the
same palloc-heavy test case of Pavel's that you used earlier, with the
one-shot plan commit temporarily reverted, and saw a speedup similar to
the one you reported with gcc's __builtin_unreachable(). So, committed that.
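
As an illustration of the technique, a portable wrapper might look like the
sketch below. The macro name sketch_unreachable() and the example function
shape_sides() are hypothetical and are not taken from the committed change;
the GCC branch also assumes a compiler new enough to provide
__builtin_unreachable() (GCC 4.5 or later, or Clang):

```c
#include <stdlib.h>

/* Pick the compiler-specific way to say "control never gets here". */
#if defined(_MSC_VER)
#define sketch_unreachable() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#define sketch_unreachable() __builtin_unreachable()
#else
/* Unknown compiler: fall back to a hard stop. */
#define sketch_unreachable() abort()
#endif

typedef enum
{
    SHAPE_SQUARE,
    SHAPE_TRIANGLE
} Shape;

/* Every enum value returns inside the switch, so the end is never reached. */
static int
shape_sides(Shape s)
{
    switch (s)
    {
        case SHAPE_SQUARE:
            return 4;
        case SHAPE_TRIANGLE:
            return 3;
    }
    sketch_unreachable();
}
```

The hint lets the compiler drop the code it would otherwise emit for the
fall-through path. Reaching the marked point at run time is undefined
behaviour on the first two branches, so it should only mark paths that
truly cannot be taken.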

- Heikki

Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend

From

Alvaro Herrera

Date:

01 February 2013, 21:34:15

Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> >> Really I'd prefer not to move the backend definitions out of postgres.h
> >> at all, just because doing so will lose fifteen years of git history
> >> about those particular lines (or at least make it a lot harder to
> >> locate with git blame).
>
> > The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> > postgres.h, if thats preferred I can prepare a patch for that although I
> > prefer my proposal.
>
> Yeah, surrounding the existing definitions with #ifndef FRONTEND was
> what I was imagining.  But on reflection that seems pretty darn ugly,
> especially if the corresponding FRONTEND definitions are far away.
> Maybe we should just live with the git history disconnect.
>
> Right at the moment the c.h location seems like the best bet.

Since there seems to be consensus that this is the best way to go, I
have committed it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 4/5] Add pg_xlogdump contrib module

From

Alvaro Herrera

Date:

04 February 2013, 16:22:34

I didn't like this bit too much:

> diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
> new file mode 100644
> index 0000000..e947e0d
> --- /dev/null
> +++ b/contrib/pg_xlogdump/tables.c
> @@ -0,0 +1,78 @@

> +/*
> + * RmgrTable linked only to functions available outside of the backend.
> + *
> + * needs to be synced with src/backend/access/transam/rmgr.c
> + */
> +const RmgrData RmgrTable[RM_MAX_ID + 1] = {
> +    {"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
> +    {"Transaction", NULL, xact_desc, NULL, NULL, NULL},

So I propose the following patch instead.  This lets pg_xlogdump compile
only the file it needs, and not concern itself with the rest of rmgr.c.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

split-rm_desc.patch

Re: [PATCH 4/5] Add pg_xlogdump contrib module

From

Andres Freund

Date:

04 February 2013, 16:29:36

On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
> 
> I didn't like this bit too much:
> 
> > diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
> > new file mode 100644
> > index 0000000..e947e0d
> > --- /dev/null
> > +++ b/contrib/pg_xlogdump/tables.c
> > @@ -0,0 +1,78 @@
> 
> > +/*
> > + * RmgrTable linked only to functions available outside of the backend.
> > + *
> > + * needs to be synced with src/backend/access/transam/rmgr.c
> > + */
> > +const RmgrData RmgrTable[RM_MAX_ID + 1] = {
> > +    {"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
> > +    {"Transaction", NULL, xact_desc, NULL, NULL, NULL},
> 
> So I propose the following patch instead.  This lets pg_xlogdump compile
> only the file it needs, and not concern itself with the rest of rmgr.c.

Hm. Not sure what the advantage of that is; it duplicates about as much and
makes it harder to copy & paste between both files.
I am fine with it, I just don't see a clear advantage either way.

> diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
> index ce9957e..b751882 100644
> --- a/src/include/access/xlog_internal.h
> +++ b/src/include/access/xlog_internal.h
> @@ -231,15 +231,17 @@ typedef struct xl_end_of_recovery
>  struct XLogRecord;
>  
>  /*
> - * Method table for resource managers.
> + * Method tables for resource managers.
>   *
> - * RmgrTable[] is indexed by RmgrId values (see rmgr.h).
> + * RmgrDescData (for textual descriptor functions) is split so that the file it
> + * lives in can be used by frontend programs.
> + *
> + * RmgrTable[] and RmgrDescTable[] are indexed by RmgrId values (see rmgr.h).
>   */
>  typedef struct RmgrData
>  {
>      const char *rm_name;
>      void        (*rm_redo) (XLogRecPtr lsn, struct XLogRecord *rptr);
> -    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
>      void        (*rm_startup) (void);
>      void        (*rm_cleanup) (void);
>      bool        (*rm_safe_restartpoint) (void);
> @@ -247,6 +249,14 @@ typedef struct RmgrData
>  
>  extern const RmgrData RmgrTable[];
>  
> +typedef struct RmgrDescData
> +{
> +    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> +} RmgrDescData;
> +
> +extern const RmgrDescData RmgrDescTable[];
> +

I think we would at least need to include rm_name here as well.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 4/5] Add pg_xlogdump contrib module

From

Alvaro Herrera

Date:

04 February 2013, 17:56:02

Andres Freund wrote:
> On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:

> > +typedef struct RmgrDescData
> > +{
> > +    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > +} RmgrDescData;
> > +
> > +extern const RmgrDescData RmgrDescTable[];
> > +
>
> I think we would at least need to include rm_name here as well.

Not rm_name (that would be duplicative), but a comment with each rmgr's
name on its line should be sufficient.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 4/5] Add pg_xlogdump contrib module

From

Andres Freund

Date:

04 February 2013, 17:57:53

On 2013-02-04 14:55:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
> 
> > > +typedef struct RmgrDescData
> > > +{
> > > +    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > > +} RmgrDescData;
> > > +
> > > +extern const RmgrDescData RmgrDescTable[];
> > > +
> > 
> > I think we would at least need to include rm_name here as well.
> 
> Not rm_name (that would be duplicative), but a comment with the name of
> each rmgr's name in its line should be sufficient.

Maybe I am misunderstanding what you want to do, but xlogdump prints the
name of the rmgr and allows filtering by it. How would we do that
without rm_name?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 4/5] Add pg_xlogdump contrib module

From

Alvaro Herrera

Date:

04 February 2013, 18:03:35

Andres Freund wrote:
> On 2013-02-04 14:55:56 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
> >
> > > > +typedef struct RmgrDescData
> > > > +{
> > > > +    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > > > +} RmgrDescData;
> > > > +
> > > > +extern const RmgrDescData RmgrDescTable[];
> > > > +
> > >
> > > I think we would at least need to include rm_name here as well.
> >
> > Not rm_name (that would be duplicative), but a comment with the name of
> > each rmgr's name in its line should be sufficient.
>
> Maybe I am misunderstanding what you want to do, but xlogdump prints the
> name of the rmgr and allows to filter by it, how would we do that
> without rm_name?

Oh, I thought you wanted it for code documentation purposes only.  Okay,
that makes sense.  (Of course, we'd remove it from RmgrTable).
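
A rough sketch of the resulting arrangement, with rm_name living next to
rm_desc so a frontend tool can both print and filter by resource manager
name. DescBuf, the *_stub functions and rmgr_lookup() are simplified,
hypothetical stand-ins (the real table uses StringInfo and uint8, and the
real tool may compare names differently):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for StringInfo. */
typedef struct DescBuf
{
    char data[128];
} DescBuf;

typedef struct RmgrDescData
{
    const char *rm_name;
    void        (*rm_desc) (DescBuf *buf, unsigned char xl_info, char *rec);
} RmgrDescData;

/* Stub descriptor routines; the real ones decode the record contents. */
static void
xlog_desc_stub(DescBuf *buf, unsigned char xl_info, char *rec)
{
    (void) xl_info;
    (void) rec;
    strcpy(buf->data, "xlog record");
}

static void
xact_desc_stub(DescBuf *buf, unsigned char xl_info, char *rec)
{
    (void) xl_info;
    (void) rec;
    strcpy(buf->data, "xact record");
}

/* Indexed by resource manager id, as RmgrTable[] is. */
static const RmgrDescData RmgrDescTable[] = {
    {"XLOG", xlog_desc_stub},
    {"Transaction", xact_desc_stub},
};

#define RMGR_DESC_COUNT (sizeof(RmgrDescTable) / sizeof(RmgrDescTable[0]))

/* Return the table index for a name given on the command line, or -1. */
static int
rmgr_lookup(const char *name)
{
    for (size_t i = 0; i < RMGR_DESC_COUNT; i++)
    {
        if (strcmp(RmgrDescTable[i].rm_name, name) == 0)
            return (int) i;
    }
    return -1;
}
```

With the name in the same table, filtering needs nothing from the backend's
RmgrTable, which is what makes the split usable from a frontend program.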

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it

From

Noah Misch

Date:

20 February 2013, 03:28:14

On Tue, Jan 08, 2013 at 08:02:16PM -0500, Robert Haas wrote:
> On Tue, Jan 8, 2013 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> I was thinking more about a sprintf()-type function that only
> >> understands a handful of escapes, but adds the additional and novel
> >> escapes %I (quote as identifier) and %L (quote as literal).  I think
> >> that would allow a great deal of code simplification, and it'd be more
> >> efficient, too.
> >
> > Seems like a great idea.  Are you offering to code it?
> 
> Not imminently.
> 
> > Note that this wouldn't entirely fix the fmtId problem, as not all the
> > uses of fmtId are directly in sprintf calls.  Still, it might get rid of
> > most of the places where it'd be painful to avoid a memory leak with
> > a strdup'ing version of fmtId.
> 
> Yeah, I didn't think about that.  Might be worth a look to see how
> comprehensively it would solve the problem.  But I'll have to leave
> that for another day.

As a cheap solution addressing 95+% of the trouble, how about giving fmtId() a
ring of, say, eight static buffers to cycle through?  That satisfies code
merely wishing to pass multiple fmtId()'d values to one appendPQExpBuffer().
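
A minimal sketch of the ring-buffer idea follows. The function name
quote_ident_ring() is hypothetical, and it simplifies heavily compared to
the real fmtId(): it always quotes, and it uses fixed-size buffers that
silently truncate long identifiers, where the real function builds its
result dynamically.

```c
#include <stdio.h>
#include <stddef.h>

#define RING_SIZE   8       /* number of results that stay valid at once */
#define RING_BUFLEN 128     /* fixed capacity per result (an assumption) */

/*
 * Return ident wrapped in double quotes, with embedded quotes doubled.
 * The result lives in one of RING_SIZE static buffers used in rotation,
 * so it stays valid until RING_SIZE further calls have been made.
 */
static const char *
quote_ident_ring(const char *ident)
{
    static char ring[RING_SIZE][RING_BUFLEN];
    static int  next = 0;
    char       *buf = ring[next];
    size_t      pos = 0;

    next = (next + 1) % RING_SIZE;

    buf[pos++] = '"';
    /* Leave room for a doubled quote, the closing quote and the NUL. */
    for (const char *p = ident; *p != '\0' && pos < RING_BUFLEN - 3; p++)
    {
        if (*p == '"')
            buf[pos++] = '"';
        buf[pos++] = *p;
    }
    buf[pos++] = '"';
    buf[pos] = '\0';
    return buf;
}
```

A call such as printf("%s.%s", quote_ident_ring(schema), quote_ident_ring(table))
is then safe because the two results occupy different slots; only a ninth
call would overwrite the first result.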

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com