Thread: [PATCH] xlogreader-v4
From: Andres Freund <andres@2ndquadrant.com>
Subject: [PATCH] xlogreader-v4

Hi,

this is the latest and obviously best version of xlogreader & xlogdump, with changes from both Heikki and me.

Changes:
* Windows build support for pg_xlogdump
* xlogdump moved to contrib
* xlogdump option parsing enhancements
* generic cleanups
* a few more comments
* removal of some ugliness in XLogFindNextRecord

I think it's mostly ready. For xlogdump, at a minimum these two issues remain:

    const char *
    timestamptz_to_str(TimestampTz dt)
    {
        return "unimplemented-timestamp";
    }

    const char *
    relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
    {
        return "unimplemented-relpathbackend";
    }

These aren't exactly the nicest wrapper functions. I think it's OK to simply copy relpathbackend from the backend, but timestamptz_to_str? That's a heck of a lot of code.

Patches 1, 2 and 5 are just preparatory and can probably be applied beforehand. Patches 3 and 4 are the real meat of this, and 3 especially needs some careful review.

Input welcome!

Andres
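To make the size of the timestamptz_to_str problem concrete, here is a minimal sketch of what a much smaller frontend-only replacement could look like. It is not part of the patch set and not what the backend does. It assumes integer timestamps (a TimestampTz that is an int64 count of microseconds since 2000-01-01 00:00:00 UTC), always prints UTC, and ignores the time zone and date style handling that makes the backend version large. The name demo_timestamptz_to_str and the DEMO_ constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Seconds between the Unix epoch (1970-01-01) and 2000-01-01, both UTC. */
#define DEMO_PG_EPOCH_OFFSET_SECS INT64_C(946684800)
#define DEMO_USECS_PER_SEC        INT64_C(1000000)

/*
 * Hypothetical frontend-only stand-in for timestamptz_to_str().
 *
 * Assumption: dt is an int64 number of microseconds since
 * 2000-01-01 00:00:00 UTC (integer timestamps). Output is always UTC.
 *
 * Returns a pointer to a static buffer, so the result is only valid
 * until the next call.
 */
static const char *
demo_timestamptz_to_str(int64_t dt)
{
	static char buf[64];
	char		datetime[32];
	int64_t		secs = dt / DEMO_USECS_PER_SEC;
	int64_t		usecs = dt % DEMO_USECS_PER_SEC;
	time_t		t;
	struct tm  *tm;

	/* C division truncates toward zero; normalize negative remainders. */
	if (usecs < 0)
	{
		usecs += DEMO_USECS_PER_SEC;
		secs -= 1;
	}

	/* Shift to the Unix epoch; assumes time_t can hold the value. */
	t = (time_t) (secs + DEMO_PG_EPOCH_OFFSET_SECS);
	tm = gmtime(&t);
	if (tm == NULL ||
		strftime(datetime, sizeof(datetime), "%Y-%m-%d %H:%M:%S", tm) == 0)
		return "invalid-timestamp";

	snprintf(buf, sizeof(buf), "%s.%06d UTC", datetime, (int) usecs);
	return buf;
}
```

Whether UTC-only output is acceptable for a debugging tool like pg_xlogdump is a judgment call. The sketch only shows that the formatting itself is small once time zone handling is dropped.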
[PATCH 2/5] Make relpathbackend return a statically allocated result instead of palloc()'ing it
From: Andres Freund <andres@anarazel.de>

relpathbackend() (via some of its wrappers) is used in *_desc routines which
we want to be usable without a backend environment around.

Change the signature to return a 'const char *' to make misuse easier to
detect. That also necessitates changing the 'FileName' typedef to
'const char *', which seems to be a good idea anyway.
---
 src/backend/access/rmgrdesc/smgrdesc.c |  6 ++---
 src/backend/access/rmgrdesc/xactdesc.c |  6 ++---
 src/backend/access/transam/xlogutils.c |  9 +++----
 src/backend/catalog/catalog.c          | 49 +++++++++++-----------------------
 src/backend/storage/buffer/bufmgr.c    | 12 +++------
 src/backend/storage/file/fd.c          |  6 ++---
 src/backend/storage/smgr/md.c          | 23 +++++-----------
 src/backend/utils/adt/dbsize.c         |  4 +--
 src/include/catalog/catalog.h          |  2 +-
 src/include/storage/fd.h               |  9 +++----
 10 files changed, 42 insertions(+), 84 deletions(-)

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index bcabf89..490c8c7 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -26,19 +26,17 @@ smgr_desc(StringInfo buf, uint8 xl_info, char *rec)
 	if (info == XLOG_SMGR_CREATE)
 	{
 		xl_smgr_create *xlrec = (xl_smgr_create *) rec;
-		char *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+		const char *path = relpathperm(xlrec->rnode, xlrec->forkNum);
 
 		appendStringInfo(buf, "file create: %s", path);
-		pfree(path);
 	}
 	else if (info == XLOG_SMGR_TRUNCATE)
 	{
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
-		char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+		const char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
 		appendStringInfo(buf, "file truncate: %s to %u blocks", path,
 						 xlrec->blkno);
-		pfree(path);
 	}
 	else
 		appendStringInfo(buf, "UNKNOWN");
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 2471279..b86a53e 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -35,10 +35,9 @@ xact_desc_commit(StringInfo buf, xl_xact_commit *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
+			const char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
-			pfree(path);
 		}
 	}
 	if (xlrec->nsubxacts > 0)
@@ -105,10 +104,9 @@ xact_desc_abort(StringInfo buf, xl_xact_abort *xlrec)
 		appendStringInfo(buf, "; rels:");
 		for (i = 0; i < xlrec->nrels; i++)
 		{
-			char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
+			const char *path = relpathperm(xlrec->xnodes[i], MAIN_FORKNUM);
 
 			appendStringInfo(buf, " %s", path);
-			pfree(path);
 		}
 	}
 	if (xlrec->nsubxacts > 0)
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index f9a6e62..8266f3c 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -57,7 +57,7 @@ static void
 report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,
 					BlockNumber blkno, bool present)
 {
-	char *path = relpathperm(node, forkno);
+	const char *path = relpathperm(node, forkno);
 
 	if (present)
 		elog(elevel, "page %u of relation %s is uninitialized",
@@ -65,7 +65,6 @@ report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,
 	else
 		elog(elevel, "page %u of relation %s does not exist",
 			 blkno, path);
-	pfree(path);
 }
 
 /* Log a reference to an invalid page */
@@ -153,11 +152,10 @@ forget_invalid_pages(RelFileNode node, ForkNumber forkno, BlockNumber minblkno)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char *path = relpathperm(hentry->key.node, forkno);
+				const char *path = relpathperm(hentry->key.node, forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
-				pfree(path);
 			}
 
 			if (hash_search(invalid_page_tab,
@@ -186,11 +184,10 @@ forget_invalid_pages_db(Oid dbid)
 		{
 			if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
 			{
-				char *path = relpathperm(hentry->key.node, hentry->key.forkno);
+				const char *path = relpathperm(hentry->key.node, hentry->key.forkno);
 
 				elog(DEBUG2, "page %u of relation %s has been dropped",
 					 hentry->key.blkno, path);
-				pfree(path);
 			}
 
 			if (hash_search(invalid_page_tab,
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index 9686486..6455ef0 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -112,54 +112,45 @@ forkname_chars(const char *str, ForkNumber *fork)
 /*
  * relpathbackend - construct path to a relation's file
  *
- * Result is a palloc'd string.
+ * Result is a pointer to a statically allocated string.
  */
-char *
+const char *
 relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 {
-	int pathlen;
-	char *path;
+	static char path[MAXPGPATH];
 
 	if (rnode.spcNode == GLOBALTABLESPACE_OID)
 	{
 		/* Shared system relations live in {datadir}/global */
 		Assert(rnode.dbNode == 0);
 		Assert(backend == InvalidBackendId);
-		pathlen = 7 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-		path = (char *) palloc(pathlen);
 		if (forknum != MAIN_FORKNUM)
-			snprintf(path, pathlen, "global/%u_%s",
+			snprintf(path, MAXPGPATH, "global/%u_%s",
 					 rnode.relNode, forkNames[forknum]);
 		else
-			snprintf(path, pathlen, "global/%u", rnode.relNode);
+			snprintf(path, MAXPGPATH, "global/%u", rnode.relNode);
 	}
 	else if (rnode.spcNode == DEFAULTTABLESPACE_OID)
 	{
 		/* The default tablespace is {datadir}/base */
 		if (backend == InvalidBackendId)
 		{
-			pathlen = 5 + OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-			path = (char *) palloc(pathlen);
 			if (forknum != MAIN_FORKNUM)
-				snprintf(path, pathlen, "base/%u/%u_%s",
+				snprintf(path, MAXPGPATH, "base/%u/%u_%s",
 						 rnode.dbNode, rnode.relNode,
 						 forkNames[forknum]);
 			else
-				snprintf(path, pathlen, "base/%u/%u",
+				snprintf(path, MAXPGPATH, "base/%u/%u",
 						 rnode.dbNode, rnode.relNode);
 		}
 		else
 		{
-			/* OIDCHARS will suffice for an integer, too */
-			pathlen = 5 + OIDCHARS + 2 + OIDCHARS + 1 + OIDCHARS + 1
-				+ FORKNAMECHARS + 1;
-			path = (char *) palloc(pathlen);
 			if (forknum != MAIN_FORKNUM)
-				snprintf(path, pathlen, "base/%u/t%d_%u_%s",
+				snprintf(path, MAXPGPATH, "base/%u/t%d_%u_%s",
 						 rnode.dbNode, backend, rnode.relNode,
 						 forkNames[forknum]);
 			else
-				snprintf(path, pathlen, "base/%u/t%d_%u",
+				snprintf(path, MAXPGPATH, "base/%u/t%d_%u",
 						 rnode.dbNode, backend, rnode.relNode);
 		}
 	}
@@ -168,38 +159,30 @@ relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
 		/* All other tablespaces are accessed via symlinks */
 		if (backend == InvalidBackendId)
 		{
-			pathlen = 9 + 1 + OIDCHARS + 1
-				+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1
-				+ OIDCHARS + 1 + FORKNAMECHARS + 1;
-			path = (char *) palloc(pathlen);
 			if (forknum != MAIN_FORKNUM)
-				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u_%s",
+				snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/%u_%s",
 						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
 						 rnode.dbNode, rnode.relNode,
 						 forkNames[forknum]);
 			else
-				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/%u",
+				snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/%u",
 						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
 						 rnode.dbNode, rnode.relNode);
 		}
 		else
 		{
-			/* OIDCHARS will suffice for an integer, too */
-			pathlen = 9 + 1 + OIDCHARS + 1
-				+ strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 2
-				+ OIDCHARS + 1 + OIDCHARS + 1 + FORKNAMECHARS + 1;
-			path = (char *) palloc(pathlen);
 			if (forknum != MAIN_FORKNUM)
-				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
+				snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/t%d_%u_%s",
 						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
 						 rnode.dbNode, backend, rnode.relNode,
 						 forkNames[forknum]);
 			else
-				snprintf(path, pathlen, "pg_tblspc/%u/%s/%u/t%d_%u",
+				snprintf(path, MAXPGPATH, "pg_tblspc/%u/%s/%u/t%d_%u",
 						 rnode.spcNode, TABLESPACE_VERSION_DIRECTORY,
 						 rnode.dbNode, backend, rnode.relNode);
 		}
 	}
+	return path;
 }
 
@@ -534,7 +517,7 @@ Oid
 GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
 {
 	RelFileNodeBackend rnode;
-	char *rpath;
+	const char *rpath;
 	int fd;
 	bool collides;
 	BackendId backend;
@@ -599,8 +582,6 @@ GetNewRelFileNode(Oid reltablespace, Relation pg_class, char relpersistence)
 			 */
 			collides = false;
 		}
-
-		pfree(rpath);
 	} while (collides);
 
 	return rnode.node.relNode;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 03ed41d..6c2620d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1757,7 +1757,7 @@ PrintBufferLeakWarning(Buffer buffer)
 {
 	volatile BufferDesc *buf;
 	int32 loccount;
-	char *path;
+	const char *path;
 	BackendId backend;
 
 	Assert(BufferIsValid(buffer));
@@ -1782,7 +1782,6 @@ PrintBufferLeakWarning(Buffer buffer)
 		 buffer, path,
 		 buf->tag.blockNum, buf->flags,
 		 buf->refcount, loccount);
-	pfree(path);
 }
 
 /*
@@ -2901,7 +2900,7 @@ AbortBufferIO(void)
 			if (sv_flags & BM_IO_ERROR)
 			{
 				/* Buffer is pinned, so we can read tag without spinlock */
-				char *path;
+				const char *path;
 
 				path = relpathperm(buf->tag.rnode, buf->tag.forkNum);
 				ereport(WARNING,
@@ -2909,7 +2908,6 @@ AbortBufferIO(void)
 						 errmsg("could not write block %u of %s",
 								buf->tag.blockNum, path),
 						 errdetail("Multiple failures --- write error might be permanent.")));
-				pfree(path);
 			}
 		}
 		TerminateBufferIO(buf, false, BM_IO_ERROR);
@@ -2927,11 +2925,10 @@ shared_buffer_write_error_callback(void *arg)
 	/* Buffer is pinned, so we can read the tag without locking the spinlock */
 	if (bufHdr != NULL)
 	{
-		char *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
+		const char *path = relpathperm(bufHdr->tag.rnode, bufHdr->tag.forkNum);
 
 		errcontext("writing block %u of relation %s",
 				   bufHdr->tag.blockNum, path);
-		pfree(path);
 	}
 }
 
@@ -2945,11 +2942,10 @@ local_buffer_write_error_callback(void *arg)
 
 	if (bufHdr != NULL)
 	{
-		char *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
+		const char *path = relpathbackend(bufHdr->tag.rnode, MyBackendId,
 									bufHdr->tag.forkNum);
 
 		errcontext("writing block %u of relation %s",
 				   bufHdr->tag.blockNum, path);
-		pfree(path);
 	}
 }
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index c026731..64d2a1e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -556,7 +556,7 @@ set_max_safe_fds(void)
  * this module wouldn't have any open files to close at that point anyway.
  */
 int
-BasicOpenFile(FileName fileName, int fileFlags, int fileMode)
+BasicOpenFile(const char *fileName, int fileFlags, int fileMode)
 {
 	int fd;
 
@@ -878,7 +878,7 @@ FileInvalidate(File file)
  * (which should always be $PGDATA when this code is running).
  */
 File
-PathNameOpenFile(FileName fileName, int fileFlags, int fileMode)
+PathNameOpenFile(const char *fileName, int fileFlags, int fileMode)
 {
 	char *fnamecopy;
 	File file;
@@ -1548,7 +1548,7 @@ TryAgain:
  * Like AllocateFile, but returns an unbuffered fd like open(2)
  */
 int
-OpenTransientFile(FileName fileName, int fileFlags, int fileMode)
+OpenTransientFile(const char *fileName, int fileFlags, int fileMode)
 {
 	int fd;
 
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 20eb36a..4ef47b1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -264,7 +264,7 @@ mdexists(SMgrRelation reln, ForkNumber forkNum)
 void
 mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
 {
-	char *path;
+	const char *path;
 	File fd;
 
 	if (isRedo && reln->md_fd[forkNum] != NULL)
@@ -298,8 +298,6 @@ mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
 		}
 	}
 
-	pfree(path);
-
 	reln->md_fd[forkNum] = _fdvec_alloc();
 
 	reln->md_fd[forkNum]->mdfd_vfd = fd;
@@ -380,7 +378,7 @@ mdunlink(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
 static void
 mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
 {
-	char *path;
+	const char *path;
 	int ret;
 
 	path = relpath(rnode, forkNum);
@@ -449,8 +447,6 @@ mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
 		}
 		pfree(segpath);
 	}
-
-	pfree(path);
 }
 
 /*
@@ -545,7 +541,7 @@ static MdfdVec *
 mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
 {
 	MdfdVec *mdfd;
-	char *path;
+	const char *path;
 	File fd;
 
 	/* No work if already open */
@@ -571,7 +567,6 @@ mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
 			if (behavior == EXTENSION_RETURN_NULL &&
 				FILE_POSSIBLY_DELETED(errno))
 			{
-				pfree(path);
 				return NULL;
 			}
 			ereport(ERROR,
@@ -580,8 +575,6 @@ mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
 		}
 	}
 
-	pfree(path);
-
 	reln->md_fd[forknum] = mdfd = _fdvec_alloc();
 
 	mdfd->mdfd_vfd = fd;
@@ -1279,7 +1272,7 @@ mdpostckpt(void)
 	while (pendingUnlinks != NIL)
 	{
 		PendingUnlinkEntry *entry = (PendingUnlinkEntry *) linitial(pendingUnlinks);
-		char *path;
+		const char *path;
 
 		/*
 		 * New entries are appended to the end, so if the entry is new we've
@@ -1309,7 +1302,6 @@ mdpostckpt(void)
 						(errcode_for_file_access(),
 						 errmsg("could not remove file \"%s\": %m", path)));
 		}
-		pfree(path);
 
 		/* And remove the list entry */
 		pendingUnlinks = list_delete_first(pendingUnlinks);
@@ -1634,8 +1626,8 @@ _fdvec_alloc(void)
 static char *
 _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
 {
-	char *path,
-		 *fullpath;
+	const char *path;
+	char *fullpath;
 
 	path = relpath(reln->smgr_rnode, forknum);
 
@@ -1644,10 +1636,9 @@ _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
 		/* be sure we have enough space for the '.segno' */
 		fullpath = (char *) palloc(strlen(path) + 12);
 		sprintf(fullpath, "%s.%u", path, segno);
-		pfree(path);
 	}
 	else
-		fullpath = path;
+		fullpath = pstrdup(path);
 
 	return fullpath;
 }
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 89ad386..c285007 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -268,7 +268,7 @@ static int64
 calculate_relation_size(RelFileNode *rfn, BackendId backend, ForkNumber forknum)
 {
 	int64 totalsize = 0;
-	char *relationpath;
+	const char *relationpath;
 	char pathname[MAXPGPATH];
 	unsigned int segcount = 0;
 
@@ -756,7 +756,7 @@ pg_relation_filepath(PG_FUNCTION_ARGS)
 	Form_pg_class relform;
 	RelFileNode rnode;
 	BackendId backend;
-	char *path;
+	const char *path;
 
 	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
 	if (!HeapTupleIsValid(tuple))
diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h
index 5d506fe..299cb03 100644
--- a/src/include/catalog/catalog.h
+++ b/src/include/catalog/catalog.h
@@ -31,7 +31,7 @@ extern const char *forkNames[];
 extern ForkNumber forkname_to_number(char *forkName);
 extern int forkname_chars(const char *str, ForkNumber *);
 
-extern char *relpathbackend(RelFileNode rnode, BackendId backend,
+extern const char *relpathbackend(RelFileNode rnode, BackendId backend,
 			   ForkNumber forknum);
 extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
 
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index bd36c9d..b2565c4 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -45,9 +45,6 @@
 /*
  * FileSeek uses the standard UNIX lseek(2) flags.
  */
-
-typedef char *FileName;
-
 typedef int File;
 
 
@@ -65,7 +62,7 @@ extern int max_safe_fds;
  */
 
 /* Operations on virtual Files --- equivalent to Unix kernel file ops */
-extern File PathNameOpenFile(FileName fileName, int fileFlags, int fileMode);
+extern File PathNameOpenFile(const char *fileName, int fileFlags, int fileMode);
 extern File OpenTemporaryFile(bool interXact);
 extern void FileClose(File file);
 extern int FilePrefetch(File file, off_t offset, int amount);
@@ -86,11 +83,11 @@ extern struct dirent *ReadDir(DIR *dir, const char *dirname);
 extern int FreeDir(DIR *dir);
 
 /* Operations to allow use of a plain kernel FD, with automatic cleanup */
-extern int OpenTransientFile(FileName fileName, int fileFlags, int fileMode);
+extern int OpenTransientFile(const char *fileName, int fileFlags, int fileMode);
 extern int CloseTransientFile(int fd);
 
 /* If you've really really gotta have a plain kernel FD, use this */
-extern int BasicOpenFile(FileName fileName, int fileFlags, int fileMode);
+extern int BasicOpenFile(const char *fileName, int fileFlags, int fileMode);
 
 /* Miscellaneous support routines */
 extern void InitFileAccess(void);
-- 
1.7.12.289.g0ce9864.dirty
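The change from a palloc'd result to a static buffer alters the lifetime rules for callers: the returned string is overwritten by the next call, so a caller that needs to keep it must copy it, which is what the pstrdup() added to _mdfd_segpath does. The following standalone sketch illustrates that contract outside PostgreSQL. demo_relpath, demo_relpath_copy and DEMO_MAXPATH are hypothetical stand-ins, and malloc() is used in place of palloc()/pstrdup().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAXPATH 1024		/* hypothetical stand-in for MAXPGPATH */

/*
 * Simplified, hypothetical analogue of relpathbackend() after the patch:
 * the result lives in a static buffer and must not be freed or modified.
 */
static const char *
demo_relpath(unsigned dbNode, unsigned relNode)
{
	static char path[DEMO_MAXPATH];

	snprintf(path, sizeof(path), "base/%u/%u", dbNode, relNode);
	return path;
}

/*
 * Callers that need the path to survive a later demo_relpath() call have
 * to copy it. The caller owns the copy and must free() it; NULL is
 * returned if the allocation fails.
 */
static char *
demo_relpath_copy(unsigned dbNode, unsigned relNode)
{
	const char *p = demo_relpath(dbNode, relNode);
	char	   *copy = malloc(strlen(p) + 1);

	if (copy != NULL)
		strcpy(copy, p);
	return copy;
}
```

Two consecutive demo_relpath() calls return the same pointer, and the first result silently changes once the second call runs. Printing two paths in a single elog()/printf() call is the classic way to trip over this, and a static buffer is also not safe to share between threads.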
[PATCH 1/5] Centralize Assert* macros into c.h so it's common between backend/frontend
From: Andres Freund <andres@anarazel.de>

c.h already had parts of the assert support (StaticAssert*), and it's the
shared file between postgres.h and postgres_fe.h. This makes it easier to
build frontend programs that have to do the "#define FRONTEND 1" hack before
including postgres.h.
---
 src/include/c.h           | 65 +++++++++++++++++++++++++++++++++++++++++++++++
 src/include/postgres.h    | 54 ++-------------------------------------
 src/include/postgres_fe.h | 12 ---------
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index f7db157..c30df8b 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -694,6 +694,71 @@ typedef NameData *Name;
 /*
+ * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
+ * - plai  9/5/90
+ *
+ * It should _NOT_ be defined in releases or in benchmark copies
+ */
+
+/*
+ * Assert() can be used in both frontend and backend code. In frontend code it
+ * just calls the standard assert, if it's available. If use of assertions is
+ * not configured, it does nothing.
+ */
+#ifndef USE_ASSERT_CHECKING
+
+#define Assert(condition)
+#define AssertMacro(condition)	((void)true)
+#define AssertArg(condition)
+#define AssertState(condition)
+
+#elif defined FRONTEND
+
+#include <assert.h>
+#define Assert(p) assert(p)
+#define AssertMacro(p)	((void) assert(p))
+
+#else /* USE_ASSERT_CHECKING && FRONTEND */
+
+/*
+ * Trap
+ *		Generates an exception if the given condition is true.
+ */
+#define Trap(condition, errorType) \
+	do { \
+		if ((assert_enabled) && (condition)) \
+			ExceptionalCondition(CppAsString(condition), (errorType), \
+								 __FILE__, __LINE__); \
+	} while (0)
+
+/*
+ *	TrapMacro is the same as Trap but it's intended for use in macros:
+ *
+ *		#define foo(x) (AssertMacro(x != 0), bar(x))
+ *
+ *	Isn't CPP fun?
+ */
+#define TrapMacro(condition, errorType) \
+	((bool) ((! assert_enabled) || ! (condition) || \
+			 (ExceptionalCondition(CppAsString(condition), (errorType), \
+								   __FILE__, __LINE__), 0)))
+
+#define Assert(condition) \
+		Trap(!(condition), "FailedAssertion")
+
+#define AssertMacro(condition) \
+		((void) TrapMacro(!(condition), "FailedAssertion"))
+
+#define AssertArg(condition) \
+		Trap(!(condition), "BadArgument")
+
+#define AssertState(condition) \
+		Trap(!(condition), "BadState")
+
+#endif /* USE_ASSERT_CHECKING && !FRONTEND */
+
+
+/*
  * Macros to support compile-time assertion checks.
  *
  * If the "condition" (a compile-time-constant expression) evaluates to false,
diff --git a/src/include/postgres.h b/src/include/postgres.h
index b6e922f..bbe125a 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -25,7 +25,7 @@
  *	  -------	------------------------------------------------
  *		1)		variable-length datatypes (TOAST support)
  *		2)		datum type + support macros
- *		3)		exception handling definitions
+ *		3)		exception handling
  *
  *	 NOTES
  *
@@ -627,62 +627,12 @@ extern Datum Float8GetDatum(float8 X);
 /* ----------------------------------------------------------------
- *				Section 3:	exception handling definitions
- *							Assert, Trap, etc macros
+ *				Section 3:	exception handling backend support
  * ----------------------------------------------------------------
  */
 
 extern PGDLLIMPORT bool assert_enabled;
 
-/*
- * USE_ASSERT_CHECKING, if defined, turns on all the assertions.
- * - plai  9/5/90
- *
- * It should _NOT_ be defined in releases or in benchmark copies
- */
-
-/*
- * Trap
- *		Generates an exception if the given condition is true.
- */
-#define Trap(condition, errorType) \
-	do { \
-		if ((assert_enabled) && (condition)) \
-			ExceptionalCondition(CppAsString(condition), (errorType), \
-								 __FILE__, __LINE__); \
-	} while (0)
-
-/*
- *	TrapMacro is the same as Trap but it's intended for use in macros:
- *
- *		#define foo(x) (AssertMacro(x != 0), bar(x))
- *
- *	Isn't CPP fun?
- */
-#define TrapMacro(condition, errorType) \
-	((bool) ((! assert_enabled) || ! (condition) || \
-			 (ExceptionalCondition(CppAsString(condition), (errorType), \
-								   __FILE__, __LINE__), 0)))
-
-#ifndef USE_ASSERT_CHECKING
-#define Assert(condition)
-#define AssertMacro(condition)	((void)true)
-#define AssertArg(condition)
-#define AssertState(condition)
-#else
-#define Assert(condition) \
-		Trap(!(condition), "FailedAssertion")
-
-#define AssertMacro(condition) \
-		((void) TrapMacro(!(condition), "FailedAssertion"))
-
-#define AssertArg(condition) \
-		Trap(!(condition), "BadArgument")
-
-#define AssertState(condition) \
-		Trap(!(condition), "BadState")
-#endif   /* USE_ASSERT_CHECKING */
-
 extern void ExceptionalCondition(const char *conditionName,
 					 const char *errorType,
 			 const char *fileName, int lineNumber) __attribute__((noreturn));
diff --git a/src/include/postgres_fe.h b/src/include/postgres_fe.h
index af31227..0f35ecc 100644
--- a/src/include/postgres_fe.h
+++ b/src/include/postgres_fe.h
@@ -24,16 +24,4 @@
 #include "c.h"
 
-/*
- * Assert() can be used in both frontend and backend code. In frontend code it
- * just calls the standard assert, if it's available. If use of assertions is
- * not configured, it does nothing.
- */
-#ifdef USE_ASSERT_CHECKING
-#include <assert.h>
-#define Assert(p) assert(p)
-#else
-#define Assert(p)
-#endif
-
 #endif   /* POSTGRES_FE_H */
-- 
1.7.12.289.g0ce9864.dirty
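The conditional structure this patch moves into c.h can be tried out in isolation. The sketch below mirrors only two of its three branches (assertions compiled out, and the frontend branch that defers to the standard assert()); the backend Trap/ExceptionalCondition branch is left out because it needs the backend. All Demo/DEMO_ names are hypothetical so that nothing clashes with the real macros, and the disabled variants expand to ((void) 0) here rather than to nothing.

```c
#include <assert.h>

#ifndef DEMO_USE_ASSERT_CHECKING
/* Assertions compiled out: the condition is never evaluated. */
#define DemoAssert(condition)		((void) 0)
#define DemoAssertMacro(condition)	((void) 0)
#else
/* Frontend-style branch: defer to the standard assert(). */
#define DemoAssert(condition)		assert(condition)
#define DemoAssertMacro(condition)	((void) assert(condition))
#endif

/*
 * The *Macro variant is an expression, so it can sit in front of a comma
 * operator inside another macro, as in the foo(x) example in c.h.
 */
#define demo_div(a, b)	(DemoAssertMacro((b) != 0), (a) / (b))

/*
 * Shows why assertion conditions must not have side effects: with
 * DEMO_USE_ASSERT_CHECKING undefined the increment never runs.
 */
static int
demo_count_evaluations(void)
{
	int			evaluated = 0;

	DemoAssert(++evaluated > 0);
	return evaluated;
}
```

Compiled as is, demo_count_evaluations() returns 0. Compiled with -DDEMO_USE_ASSERT_CHECKING it returns 1, since the condition is then evaluated by assert().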
From: Andres Freund <andres@anarazel.de> Authors: Andres Freund, Heikki Linnakangas ---contrib/Makefile | 1 +contrib/pg_xlogdump/Makefile | 37 +++contrib/pg_xlogdump/compat.c | 58 ++++contrib/pg_xlogdump/pg_xlogdump.c | 654 ++++++++++++++++++++++++++++++++++++++contrib/pg_xlogdump/tables.c | 78 +++++doc/src/sgml/ref/allfiles.sgml | 1 +doc/src/sgml/ref/pg_xlogdump.sgml | 76 +++++doc/src/sgml/reference.sgml | 1 +src/backend/access/transam/rmgr.c | 1 +src/backend/catalog/catalog.c | 2 +src/tools/msvc/Mkvcbuild.pm | 16 +-11 files changed, 924 insertions(+), 1 deletion(-)create mode 100644 contrib/pg_xlogdump/Makefilecreatemode 100644 contrib/pg_xlogdump/compat.ccreate mode 100644 contrib/pg_xlogdump/pg_xlogdump.ccreatemode 100644 contrib/pg_xlogdump/tables.ccreate mode 100644 doc/src/sgml/ref/pg_xlogdump.sgml diff --git a/contrib/Makefile b/contrib/Makefile index fcd7c1e..5d290b8 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -39,6 +39,7 @@ SUBDIRS = \ pg_trgm \ pg_upgrade \ pg_upgrade_support \ + pg_xlogdump \ pgbench \ pgcrypto \ pgrowlocks \ diff --git a/contrib/pg_xlogdump/Makefile b/contrib/pg_xlogdump/Makefile new file mode 100644 index 0000000..1adef35 --- /dev/null +++ b/contrib/pg_xlogdump/Makefile @@ -0,0 +1,37 @@ +# contrib/pg_xlogdump/Makefile + +PGFILEDESC = "pg_xlogdump" +PGAPPICON=win32 + +PROGRAM = pg_xlogdump +OBJS = pg_xlogdump.o compat.o tables.o xlogreader.o $(RMGRDESCOBJS) \ + $(WIN32RES) + +# XXX: Perhaps this should be done by a wildcard rule so that you don't need +# to remember to add new rmgrdesc files to this list. 
+RMGRDESCSOURCES = clogdesc.c dbasedesc.c gindesc.c gistdesc.c hashdesc.c \ + heapdesc.c mxactdesc.c nbtdesc.c relmapdesc.c seqdesc.c smgrdesc.c \ + spgdesc.c standbydesc.c tblspcdesc.c xactdesc.c xlogdesc.c + +RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES)) + +EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_xlogdump +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif + +override CPPFLAGS := -DFRONTEND $(CPPFLAGS) + +xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/% + rm -f $@ && $(LN_S) $< . + +$(RMGRDESCSOURCES): % : $(top_srcdir)/src/backend/access/rmgrdesc/% + rm -f $@ && $(LN_S) $< . diff --git a/contrib/pg_xlogdump/compat.c b/contrib/pg_xlogdump/compat.c new file mode 100644 index 0000000..e150afb --- /dev/null +++ b/contrib/pg_xlogdump/compat.c @@ -0,0 +1,58 @@ +/*------------------------------------------------------------------------- + * + * compat.c + * Reimplementations of various backend functions. + * + * Portions Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pg_xlogdump/compat.c + * + * This file contains client-side implementations for various backend + * functions that the rm_desc functions in *desc.c files rely on. 
+ * + *------------------------------------------------------------------------- + */ + +/* ugly hack, same as in e.g pg_controldata */ +#define FRONTEND 1 +#include "postgres.h" + +#include "catalog/catalog.h" +#include "datatype/timestamp.h" +#include "lib/stringinfo.h" +#include "storage/relfilenode.h" +#include "utils/timestamp.h" +#include "utils/datetime.h" + +const char * +timestamptz_to_str(TimestampTz dt) +{ + return "unimplemented-timestamp"; +} + +const char * +relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum) +{ + return "unimplemented-relpathbackend"; +} + +/* + * Provide a hacked up compat layer for StringInfos so xlog desc functions can + * be linked/called. + */ +void +appendStringInfo(StringInfo str, const char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + vprintf(fmt, args); + va_end(args); +} + +void +appendStringInfoString(StringInfo str, const char *string) +{ + appendStringInfo(str, "%s", string); +} diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c new file mode 100644 index 0000000..29ee73e --- /dev/null +++ b/contrib/pg_xlogdump/pg_xlogdump.c @@ -0,0 +1,654 @@ +/*------------------------------------------------------------------------- + * + * pg_xlogdump.c - decode and display WAL + * + * Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pg_xlogdump/pg_xlogdump.c + *------------------------------------------------------------------------- + */ + +/* ugly hack, same as in e.g pg_controldata */ +#define FRONTEND 1 +#include "postgres.h" + +#include <unistd.h> + +#include "access/xlog.h" +#include "access/xlogreader.h" +#include "access/rmgr.h" +#include "access/transam.h" + +#include "catalog/catalog.h" + +#include "getopt_long.h" + +static const char *progname; + +typedef struct XLogDumpPrivateData +{ + TimeLineID timeline; + char *inpath; + XLogRecPtr startptr; + XLogRecPtr endptr; + + /* display options */ + bool bkp_details; + int 
stop_after_records; + int already_displayed_records; + + /* filter options */ + int filter_by_rmgr; + TransactionId filter_by_xid; +} XLogDumpPrivateData; + +static void fatal_error(const char *fmt, ...) +__attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2))); + +static void fatal_error(const char *fmt, ...) +{ + va_list args; + fflush(stdout); + + fprintf(stderr, "%s: fatal_error: ", progname); + va_start(args, fmt); + vfprintf(stderr, fmt, args); + va_end(args); + fputc('\n', stderr); + exit(EXIT_FAILURE); +} + +/* + * Check whether directory exists and whether we can open it. Keep errno set + * error reporting by the caller. + */ +static bool +verify_directory(const char *directory) +{ + int fd = open(directory, O_DIRECTORY|O_RDONLY); + if (fd < 0) + return false; + close(fd); + return true; +} + +static void +split_path(const char *path, char **dir, char **fname) +{ + char *sep; + + /* split filepath into directory & filename */ + sep = strrchr(path, '/'); + + /* directory path */ + if (sep != NULL) + { + /* windows doesn't have strndup */ + *dir = strdup(path); + (*dir)[(sep - path) + 1] = '\0'; + *fname = strdup(sep + 1); + } + /* local directory */ + else + { + *dir = NULL; + *fname = strdup(path); + } +} + +/* + * Try to find the file in several places: + * if directory == NULL: + * fname + * XLOGDIR / fname + * $PGDATA / XLOGDIR / fname + * else + * directory / fname + * directory / XLOGDIR / fname + * + * return a read only fd + */ +static int +fuzzy_open_file(const char *directory, const char *fname) +{ + int fd = -1; + char fpath[MAXPGPATH]; + + if (directory == NULL) + { + const char* datadir; + + /* fname */ + fd = open(fname, O_RDONLY | PG_BINARY, 0); + if (fd < 0 && errno != ENOENT) + return -1; + else if (fd > 0) + return fd; + + /* XLOGDIR / fname */ + snprintf(fpath, MAXPGPATH, "%s/%s", + XLOGDIR, fname); + fd = open(fpath, O_RDONLY | PG_BINARY, 0); + if (fd < 0 && errno != ENOENT) + return -1; + else if (fd > 0) + return fd; + + datadir = 
getenv("PGDATA"); + /* $PGDATA / XLOGDIR / fname */ + if (datadir != NULL) + { + snprintf(fpath, MAXPGPATH, "%s/%s/%s", + datadir, XLOGDIR, fname); + fd = open(fpath, O_RDONLY | PG_BINARY, 0); + if (fd < 0 && errno != ENOENT) + return -1; + else if (fd > 0) + return fd; + } + } + else + { + /* directory / fname */ + snprintf(fpath, MAXPGPATH, "%s/%s", + directory, fname); + fd = open(fpath, O_RDONLY | PG_BINARY, 0); + if (fd < 0 && errno != ENOENT) + return -1; + else if (fd > 0) + return fd; + + /* directory / XLOGDIR / fname */ + snprintf(fpath, MAXPGPATH, "%s/%s/%s", + directory, XLOGDIR, fname); + fd = open(fpath, O_RDONLY | PG_BINARY, 0); + if (fd < 0 && errno != ENOENT) + return -1; + else if (fd > 0) + return fd; + } + return -1; +} + +/* this should probably be put in a general implementation */ +static void +XLogDumpXLogRead(const char *directory, TimeLineID timeline_id, + XLogRecPtr startptr, char *buf, Size count) +{ + char *p; + XLogRecPtr recptr; + Size nbytes; + + static int sendFile = -1; + static XLogSegNo sendSegNo = 0; + static uint32 sendOff = 0; + + p = buf; + recptr = startptr; + nbytes = count; + + while (nbytes > 0) + { + uint32 startoff; + int segbytes; + int readbytes; + + startoff = recptr % XLogSegSize; + + if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo)) + { + char fname[MAXFNAMELEN]; + + /* Switch to another logfile segment */ + if (sendFile >= 0) + close(sendFile); + + XLByteToSeg(recptr, sendSegNo); + + XLogFileName(fname, timeline_id, sendSegNo); + + sendFile = fuzzy_open_file(directory, fname); + + if (sendFile < 0) + fatal_error("could not find file \"%s\": %s", + fname, strerror(errno)); + sendOff = 0; + } + + /* Need to seek in the file? 
*/ + if (sendOff != startoff) + { + if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0) + { + int err = errno; + char fname[MAXPGPATH]; + XLogFileName(fname, timeline_id, sendSegNo); + + fatal_error("could not seek in log segment %s to offset %u: %s", + fname, startoff, strerror(err)); + } + sendOff = startoff; + } + + /* How many bytes are within this segment? */ + if (nbytes > (XLogSegSize - startoff)) + segbytes = XLogSegSize - startoff; + else + segbytes = nbytes; + + readbytes = read(sendFile, p, segbytes); + if (readbytes <= 0) + { + int err = errno; + char fname[MAXPGPATH]; + XLogFileName(fname, timeline_id, sendSegNo); + + fatal_error("could not read from log segment %s, offset %d, length %d: %s", + fname, sendOff, segbytes, strerror(err)); + } + + /* Update state for read */ + recptr += readbytes; + + sendOff += readbytes; + nbytes -= readbytes; + p += readbytes; + } +} + +static int +XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, + char *readBuff, TimeLineID *curFileTLI) +{ + XLogDumpPrivateData *private = state->private_data; + int count = XLOG_BLCKSZ; + + if (private->endptr != InvalidXLogRecPtr) + { + if (targetPagePtr + XLOG_BLCKSZ <= private->endptr) + count = XLOG_BLCKSZ; + else if (targetPagePtr + reqLen <= private->endptr) + count = private->endptr - targetPagePtr; + else + return -1; + } + + XLogDumpXLogRead(private->inpath, private->timeline, targetPagePtr, + readBuff, count); + + return count; +} + +static void +XLogDumpDisplayRecord(XLogReaderState *state, XLogRecord *record) +{ + XLogDumpPrivateData *config = (XLogDumpPrivateData *) state->private_data; + const RmgrData *rmgr = &RmgrTable[record->xl_rmid]; + + if (config->filter_by_rmgr != -1 && + config->filter_by_rmgr != record->xl_rmid) + return; + + if (TransactionIdIsValid(config->filter_by_xid) && + config->filter_by_xid != record->xl_xid) + return; + + config->already_displayed_records++; + + printf("xlog record: rmgr: %-11s, record_len: %6u, tot_len: 
%6u, tx: %10u, lsn: %X/%08X, prev %X/%08X, bkp: %u%u%u%u,desc:", + rmgr->rm_name, + record->xl_len, record->xl_tot_len, + record->xl_xid, + (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr, + (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev, + !!(XLR_BKP_BLOCK(0) & record->xl_info), + !!(XLR_BKP_BLOCK(1) & record->xl_info), + !!(XLR_BKP_BLOCK(2) & record->xl_info), + !!(XLR_BKP_BLOCK(3) & record->xl_info)); + + /* the desc routine will printf the description directly to stdout */ + rmgr->rm_desc(NULL, record->xl_info, XLogRecGetData(record)); + + putchar('\n'); + + if (config->bkp_details) + { + int off; + char *blk = (char *) XLogRecGetData(record) + record->xl_len; + + for (off = 0; off < XLR_MAX_BKP_BLOCKS; off++) + { + BkpBlock bkpb; + + if (!(XLR_BKP_BLOCK(off) & record->xl_info)) + continue; + + memcpy(&bkpb, blk, sizeof(BkpBlock)); + blk += sizeof(BkpBlock); + blk += BLCKSZ - bkpb.hole_length; + + printf("\tbackup bkp #%u; rel %u/%u/%u; fork: %s; block: %u; hole: offset: %u, length: %u\n", + off, bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode, + forkNames[bkpb.fork], bkpb.block, bkpb.hole_offset, bkpb.hole_length); + } + } +} + +static void +usage(void) +{ + printf("%s: reads/writes postgres transaction logs for debugging.\n\n", + progname); + printf("Usage:\n"); + printf(" %s [OPTION] [STARTSEG [ENDSEG]] \n", progname); + printf("\nOptions:\n"); + printf(" -b, --bkp-details output detailed information about backup blocks\n"); + printf(" -e, --end RECPTR read wal up to RECPTR\n"); + printf(" -h, --help show this help, then exit\n"); + printf(" -n, --limit RECORDS only display n records, abort afterwards\n"); + printf(" -p, --path PATH from where do we want to read? 
cwd/pg_xlog is the default\n"); + printf(" -r, --rmgr RMGR only show records generated by the rmgr RMGR\n"); + printf(" -s, --start RECPTR read wal in directory indicated by -p starting at RECPTR\n"); + printf(" -t, --timeline TLI which timeline do we want to read, defaults to 1\n"); + printf(" -V, --version output version information, then exit\n"); + printf(" -x, --xid XID only show records with transaction id XID\n"); +} + +int +main(int argc, char **argv) +{ + uint32 xlogid; + uint32 xrecoff; + XLogReaderState *xlogreader_state; + XLogDumpPrivateData private; + XLogRecord *record; + XLogRecPtr first_record; + char *errormsg; + + static struct option long_options[] = { + {"bkp-details", no_argument, NULL, 'b'}, + {"end", required_argument, NULL, 'e'}, + {"help", no_argument, NULL, '?'}, + {"limit", required_argument, NULL, 'n'}, + {"path", required_argument, NULL, 'p'}, + {"rmgr", required_argument, NULL, 'r'}, + {"start", required_argument, NULL, 's'}, + {"timeline", required_argument, NULL, 't'}, + {"xid", required_argument, NULL, 'x'}, + {"version", no_argument, NULL, 'V'}, + {NULL, 0, NULL, 0} + }; + + int option; + int optindex = 0; + + progname = get_progname(argv[0]); + + memset(&private, 0, sizeof(XLogDumpPrivateData)); + + private.timeline = 1; + private.bkp_details = false; + private.startptr = InvalidXLogRecPtr; + private.endptr = InvalidXLogRecPtr; + private.stop_after_records = -1; + private.already_displayed_records = 0; + private.filter_by_rmgr = -1; + private.filter_by_xid = InvalidTransactionId; + + if (argc <= 1) + { + fprintf(stderr, "%s: no arguments specified\n", progname); + goto bad_argument; + } + + while ((option = getopt_long(argc, argv, "be:?n:p:r:s:t:Vx:", + long_options, &optindex)) != -1) + { + switch (option) + { + case 'b': + private.bkp_details = true; + break; + case 'e': + if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2) + { + fprintf(stderr, "%s: could not parse --end %s\n", + progname, optarg); + goto bad_argument; + 
} + private.endptr = (uint64)xlogid << 32 | xrecoff; + break; + case '?': + usage(); + exit(EXIT_SUCCESS); + break; + case 'n': + if (sscanf(optarg, "%d", &private.stop_after_records) != 1) + { + fprintf(stderr, "%s: could not parse --limit %s\n", + progname, optarg); + goto bad_argument; + } + break; + case 'p': + private.inpath = strdup(optarg); + break; + case 'r': + { + int i; + /* RmgrTable has RM_MAX_ID + 1 entries, so include the last one */ + for (i = 0; i <= RM_MAX_ID; i++) + { + if (strcmp(optarg, RmgrTable[i].rm_name) == 0) + { + private.filter_by_rmgr = i; + break; + } + } + + if (private.filter_by_rmgr == -1) + { + fprintf(stderr, "%s: --rmgr %s does not exist\n", + progname, optarg); + goto bad_argument; + } + } + break; + case 's': + if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2) + { + fprintf(stderr, "%s: could not parse --start %s\n", + progname, optarg); + goto bad_argument; + } + else + private.startptr = (uint64)xlogid << 32 | xrecoff; + break; + case 't': + if (sscanf(optarg, "%d", &private.timeline) != 1) + { + fprintf(stderr, "%s: could not parse --timeline %s\n", + progname, optarg); + goto bad_argument; + } + break; + case 'V': + puts("pg_xlogdump (PostgreSQL) " PG_VERSION); + exit(EXIT_SUCCESS); + break; + case 'x': + if (sscanf(optarg, "%u", &private.filter_by_xid) != 1) + { + fprintf(stderr, "%s: could not parse --xid %s as a valid xid\n", + progname, optarg); + goto bad_argument; + } + break; + default: + goto bad_argument; + } + } + + if ((optind + 2) < argc) + { + fprintf(stderr, + "%s: too many command-line arguments (first is \"%s\")\n", + progname, argv[optind + 2]); + goto bad_argument; + } + + if (private.inpath != NULL) + { + /* validate that path points to a directory */ + if (!verify_directory(private.inpath)) + { + fprintf(stderr, + "%s: --path %s cannot be opened: %s\n", + progname, private.inpath, strerror(errno)); + goto bad_argument; + } + } + + /* parse files as start/end boundaries, extract path if not specified */ + if (optind < argc) + { + char *directory = NULL; + 
char *fname = NULL; + int fd; + XLogSegNo segno; + + split_path(argv[optind], &directory, &fname); + + if (private.inpath == NULL && directory != NULL) + { + private.inpath = directory; + + if (!verify_directory(private.inpath)) + fatal_error("cannot open directory %s: %s", + private.inpath, strerror(errno)); + } + + fd = fuzzy_open_file(private.inpath, fname); + if (fd < 0) + fatal_error("could not open file %s", fname); + close(fd); + + /* parse position from file */ + XLogFromFileName(fname, &private.timeline, &segno); + + if (XLogRecPtrIsInvalid(private.startptr)) + XLogSegNoOffsetToRecPtr(segno, 0, private.startptr); + else if (!XLByteInSeg(private.startptr, segno)) + { + fprintf(stderr, + "%s: --start %X/%X is not inside file \"%s\"\n", + progname, + (uint32)(private.startptr >> 32), + (uint32)private.startptr, + fname); + goto bad_argument; + } + + /* no second file specified, set end position */ + if (!(optind + 1 < argc) && XLogRecPtrIsInvalid(private.endptr)) + XLogSegNoOffsetToRecPtr(segno + 1, 0, private.endptr); + + /* parse ENDSEG if passed */ + if (optind + 1 < argc) + { + XLogSegNo endsegno; + + /* ignore directory, already have that */ + split_path(argv[optind + 1], &directory, &fname); + + fd = fuzzy_open_file(private.inpath, fname); + if (fd < 0) + fatal_error("could not open file %s", fname); + close(fd); + + /* parse position from file */ + XLogFromFileName(fname, &private.timeline, &endsegno); + + if (endsegno < segno) + fatal_error("ENDSEG %s is before STARTSEG %s", + argv[optind + 1], argv[optind]); + + if (XLogRecPtrIsInvalid(private.endptr)) + XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.endptr); + + /* set segno to endsegno for check of --end */ + segno = endsegno; + } + + + if (!XLByteInSeg(private.endptr, segno) && + private.endptr != (segno + 1) * XLogSegSize) + { + fprintf(stderr, + "%s: --end %X/%X is not inside file \"%s\"\n", + progname, + (uint32)(private.endptr >> 32), + (uint32)private.endptr, + argv[argc -1]); + goto 
bad_argument; + } + } + + /* we don't know what to print */ + if (XLogRecPtrIsInvalid(private.startptr)) + { + fprintf(stderr, "%s: no --start given in range mode.\n", progname); + goto bad_argument; + } + + /* done with argument parsing, do the actual work */ + + /* we have everything we need, start reading */ + xlogreader_state = XLogReaderAllocate(private.startptr, + XLogDumpReadPage, + &private); + + /* first find a valid recptr to start from */ + first_record = XLogFindNextRecord(xlogreader_state, private.startptr); + + if (first_record == InvalidXLogRecPtr) + fatal_error("could not find a valid record after %X/%X", + (uint32) (private.startptr >> 32), + (uint32) private.startptr); + + /* + * Display a message that we're skipping data if the start position wasn't + * a pointer to the start of a record and also wasn't a pointer to the + * beginning of a segment (e.g. we were used in file mode). + */ + if (first_record != private.startptr && (private.startptr % XLogSegSize) != 0) + printf("first record is after %X/%X, at %X/%X, skipping over %u bytes\n", + (uint32) (private.startptr >> 32), (uint32) private.startptr, + (uint32) (first_record >> 32), (uint32) first_record, + (uint32) (first_record - private.startptr)); + + while ((record = XLogReadRecord(xlogreader_state, first_record, &errormsg))) + { + /* continue after the last record */ + first_record = InvalidXLogRecPtr; + XLogDumpDisplayRecord(xlogreader_state, record); + + /* check whether we printed enough */ + if (private.stop_after_records > 0 && + private.already_displayed_records >= private.stop_after_records) + break; + } + + if (errormsg) + fatal_error("error in WAL record at %X/%X: %s\n", + (uint32)(xlogreader_state->ReadRecPtr >> 32), + (uint32)xlogreader_state->ReadRecPtr, + errormsg); + + XLogReaderFree(xlogreader_state); + + return EXIT_SUCCESS; +bad_argument: + fprintf(stderr, "Try \"%s --help\" for more information.\n", progname); + return EXIT_FAILURE; +} diff --git a/contrib/pg_xlogdump/tables.c 
b/contrib/pg_xlogdump/tables.c new file mode 100644 index 0000000..e947e0d --- /dev/null +++ b/contrib/pg_xlogdump/tables.c @@ -0,0 +1,78 @@ +/*------------------------------------------------------------------------- + * + * tables.c + * Support data for xlogdump.c + * + * Portions Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pg_xlogdump/tables.c + * + * NOTES + * + *------------------------------------------------------------------------- + */ + +/* + * rmgr.c + * + * Resource managers definition + * + * src/backend/access/transam/rmgr.c + */ +#include "postgres.h" + +#include "access/clog.h" +#include "access/gin.h" +#include "access/gist_private.h" +#include "access/hash.h" +#include "access/heapam_xlog.h" +#include "access/multixact.h" +#include "access/nbtree.h" +#include "access/spgist.h" +#include "access/xact.h" +#include "access/xlog_internal.h" +#include "catalog/storage_xlog.h" +#include "commands/dbcommands.h" +#include "commands/sequence.h" +#include "commands/tablespace.h" +#include "storage/standby.h" +#include "utils/relmapper.h" +#include "catalog/catalog.h" + +/* + * Table of fork names. + * + * needs to be synced with src/backend/catalog/catalog.c + */ +const char *forkNames[] = { + "main", /* MAIN_FORKNUM */ + "fsm", /* FSM_FORKNUM */ + "vm", /* VISIBILITYMAP_FORKNUM */ + "init" /* INIT_FORKNUM */ +}; + +/* + * RmgrTable linked only to functions available outside of the backend. 
+ * + * needs to be synced with src/backend/access/transam/rmgr.c + */ +const RmgrData RmgrTable[RM_MAX_ID + 1] = { + {"XLOG", NULL, xlog_desc, NULL, NULL, NULL}, + {"Transaction", NULL, xact_desc, NULL, NULL, NULL}, + {"Storage", NULL, smgr_desc, NULL, NULL, NULL}, + {"CLOG", NULL, clog_desc, NULL, NULL, NULL}, + {"Database", NULL, dbase_desc, NULL, NULL, NULL}, + {"Tablespace", NULL, tblspc_desc, NULL, NULL, NULL}, + {"MultiXact", NULL, multixact_desc, NULL, NULL, NULL}, + {"RelMap", NULL, relmap_desc, NULL, NULL, NULL}, + {"Standby", NULL, standby_desc, NULL, NULL, NULL}, + {"Heap2", NULL, heap2_desc, NULL, NULL, NULL}, + {"Heap", NULL, heap_desc, NULL, NULL, NULL}, + {"Btree", NULL, btree_desc, NULL, NULL, NULL}, + {"Hash", NULL, hash_desc, NULL, NULL, NULL}, + {"Gin", NULL, gin_desc, NULL, NULL, NULL}, + {"Gist", NULL, gist_desc, NULL, NULL, NULL}, + {"Sequence", NULL, seq_desc, NULL, NULL, NULL}, + {"SPGist", NULL, spg_desc, NULL, NULL, NULL} +}; diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml index df84054..49cb7ac 100644 --- a/doc/src/sgml/ref/allfiles.sgml +++ b/doc/src/sgml/ref/allfiles.sgml @@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.<!ENTITY pgReceivexlog SYSTEM "pg_receivexlog.sgml"><!ENTITYpgResetxlog SYSTEM "pg_resetxlog.sgml"><!ENTITY pgRestore SYSTEM "pg_restore.sgml"> +<!ENTITY pgXlogdump SYSTEM "pg_xlogdump.sgml"><!ENTITY postgres SYSTEM "postgres-ref.sgml"><!ENTITY postmaster SYSTEM "postmaster.sgml"><!ENTITY psqlRef SYSTEM "psql-ref.sgml"> diff --git a/doc/src/sgml/ref/pg_xlogdump.sgml b/doc/src/sgml/ref/pg_xlogdump.sgml new file mode 100644 index 0000000..7a27c7b --- /dev/null +++ b/doc/src/sgml/ref/pg_xlogdump.sgml @@ -0,0 +1,76 @@ +<!-- +doc/src/sgml/ref/pg_xlogdump.sgml +PostgreSQL documentation +--> + +<refentry id="APP-PGXLOGDUMP"> + <refmeta> + <refentrytitle><application>pg_xlogdump</application></refentrytitle> + <manvolnum>1</manvolnum> + 
<refmiscinfo>Application</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>pg_xlogdump</refname> + <refpurpose>Display the write-ahead log of a <productname>PostgreSQL</productname> database cluster</refpurpose> + </refnamediv> + + <indexterm zone="app-pgxlogdump"> + <primary>pg_xlogdump</primary> + </indexterm> + + <refsynopsisdiv> + <cmdsynopsis> + <command>pg_xlogdump</command> + <arg choice="opt"><option>-b</option></arg> + <arg choice="opt"><option>-e</option> <replaceable class="parameter">xlogrecptr</replaceable></arg> + <arg choice="opt"><option>-f</option> <replaceable class="parameter">filename</replaceable></arg> + <arg choice="opt"><option>-h</option></arg> + <arg choice="opt"><option>-p</option> <replaceable class="parameter">directory</replaceable></arg> + <arg choice="opt"><option>-s</option> <replaceable class="parameter">xlogrecptr</replaceable></arg> + <arg choice="opt"><option>-t</option> <replaceable class="parameter">timelineid</replaceable></arg> + <arg choice="opt"><option>-v</option></arg> + </cmdsynopsis> + </refsynopsisdiv> + + <refsect1 id="R1-APP-PGXLOGDUMP-1"> + <title>Description</title> + <para> + <command>pg_xlogdump</command> displays the write-ahead log (WAL) and is only + useful for debugging or educational purposes. + </para> + + <para> + This utility can only be run by the user who installed the server, because + it requires read access to the data directory. It does not perform any + modifications. + </para> + </refsect1> + + <refsect1> + <title>Options</title> + + <para> + The following command-line options control the location and format of the + output. + + <variablelist> + <varlistentry> + <term><option>-p <replaceable class="parameter">directory</replaceable></option></term> + <listitem> + <para> + Directory to find xlog files in. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + </refsect1> + + <refsect1> + <title>Notes</title> + <para> + Can give wrong results when the server is running. 
+ </para> + </refsect1> +</refentry> diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index 0872168..fed1fdd 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -225,6 +225,7 @@ &pgDumpall; &pgReceivexlog; &pgRestore; + &pgXlogdump; &psqlRef; &reindexdb; &vacuumdb; diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c index cc210a7..4e94af1 100644 --- a/src/backend/access/transam/rmgr.c +++ b/src/backend/access/transam/rmgr.c @@ -24,6 +24,7 @@#include "storage/standby.h"#include "utils/relmapper.h" +/* Also update contrib/pg_xlogdump/tables.c if you add something here. */const RmgrData RmgrTable[RM_MAX_ID + 1] = { {"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL}, diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c index 6455ef0..92ab3ca 100644 --- a/src/backend/catalog/catalog.c +++ b/src/backend/catalog/catalog.c @@ -52,6 +52,8 @@ * If you add a new entry, remember to update the errhint below, and the * documentation for pg_relation_size().Also keep FORKNAMECHARS above * up-to-date. + * + * Also update contrib/pg_xlogdump/tables.c if you add something here. 
*/const char *forkNames[] = { "main", /* MAIN_FORKNUM */ diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index d587365..7b6ed41 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -41,7 +41,7 @@ my $contrib_extraincludes =my $contrib_extrasource = { 'cube' => [ 'cubescan.l', 'cubeparse.y' ], 'seg' => [ 'segscan.l', 'segparse.y' ] }; -my @contrib_excludes = ('pgcrypto', 'intagg', 'sepgsql'); +my @contrib_excludes = ('intagg', 'pgcrypto', 'pg_xlogdump', 'sepgsql');sub mkvcbuild{ @@ -409,6 +409,20 @@ sub mkvcbuild 'localtime.c'); $zic->AddReference($libpgport); + my $pgxlogdump = $solution->AddProject('pg_xlogdump', 'exe', 'contrib'); + $pgxlogdump->{name} = 'pg_xlogdump'; + $pgxlogdump->AddIncludeDir('src\backend'); + $pgxlogdump->AddFiles('contrib\pg_xlogdump', + 'compat.c', 'pg_xlogdump.c', 'tables.c'); + $pgxlogdump->AddFile('src\backend\access\transam\xlogreader.c'); + $pgxlogdump->AddFiles('src\backend\access\rmgrdesc', + 'clogdesc.c', 'dbasedesc.c', 'gindesc.c', 'gistdesc.c', 'hashdesc.c', + 'heapdesc.c', 'mxactdesc.c', 'nbtdesc.c', 'relmapdesc.c', 'seqdesc.c', + 'smgrdesc.c', 'spgdesc.c', 'standbydesc.c', 'tblspcdesc.c', + 'xactdesc.c', 'xlogdesc.c'); + $pgxlogdump->AddReference($libpgport); + $pgxlogdump->AddDefine('FRONTEND'); + if ($solution->{options}->{xml}) { $contrib_extraincludes->{'pgxml'} = [ -- 1.7.12.289.g0ce9864.dirty
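For anyone following the --start/--end handling in pg_xlogdump above: a position is given as two hex halves ("%X/%X"), the halves are combined into one 64-bit value, and the segment number and in-segment offset then fall out of plain arithmetic. The following is a minimal standalone sketch of that idea, not code from the patch. The sketch_* names and the fixed 16 MB segment size are assumptions made for illustration; the patch itself relies on XLogRecPtr, XLogSegSize and XLByteToSeg. The extra %c conversion, which rejects trailing garbage, is also an addition that the patch does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the default 16 MB WAL segment size; real builds may differ. */
#define SKETCH_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: parse "hi/lo" (two hex halves) into a 64-bit
 * position. Returns 1 on success, 0 if the string is malformed.
 */
static int
sketch_parse_lsn(const char *str, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;
	char		trailing;

	assert(str != NULL && lsn != NULL);

	/* exactly two conversions must succeed; a third means trailing junk */
	if (sscanf(str, "%X/%X%c", &hi, &lo, &trailing) != 2)
		return 0;

	*lsn = ((uint64_t) hi << 32) | lo;
	return 1;
}

/* Hypothetical helper: which segment does this position fall into? */
static uint64_t
sketch_segno(uint64_t lsn)
{
	return lsn / SKETCH_SEG_SIZE;
}

/* Hypothetical helper: byte offset of this position within its segment. */
static uint64_t
sketch_segoff(uint64_t lsn)
{
	return lsn % SKETCH_SEG_SIZE;
}
```

Under these assumptions, "16/B374D848" lands in segment 0x16B3 at offset 0x74D848, which is the same split the read loop in XLogDumpXLogRead performs with recptr % XLogSegSize before seeking.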
From: Andres Freund <andres@anarazel.de> The way xlog reading was done up to now made it impossible to use that nontrivial code outside of xlog.c although it is useful for different purposes like debugging wal (xlogdump) and decoding wal back into logical changes. Authors: Heikki Linnakangas, Andres Freund, Alvaro Herrera Reviewed-By: Alvaro Herrera ---src/backend/access/transam/Makefile | 2 +-src/backend/access/transam/xlog.c | 830 +++++----------------------src/backend/access/transam/xlogreader.c| 987 ++++++++++++++++++++++++++++++++src/backend/nls.mk | 5 +-src/include/access/xlogreader.h |141 +++++5 files changed, 1261 insertions(+), 704 deletions(-)create mode 100644 src/backend/access/transam/xlogreader.ccreatemode 100644 src/include/access/xlogreader.h diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile index 700cfd8..eb6cfc5 100644 --- a/src/backend/access/transam/Makefile +++ b/src/backend/access/transam/Makefile @@ -14,7 +14,7 @@ include $(top_builddir)/src/Makefile.globalOBJS = clog.o transam.o varsup.o xact.o rmgr.o slru.o subtrans.omultixact.o \ timeline.o twophase.o twophase_rmgr.o xlog.o xlogarchive.o xlogfuncs.o \ - xlogutils.o + xlogreader.o xlogutils.oinclude $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 51a515a..310a654 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -30,6 +30,7 @@#include "access/twophase.h"#include "access/xact.h"#include "access/xlog_internal.h" +#include "access/xlogreader.h"#include "access/xlogutils.h"#include "catalog/catversion.h"#include "catalog/pg_control.h" @@ -548,7 +549,6 @@ static int readFile = -1;static XLogSegNo readSegNo = 0;static uint32 readOff = 0;static uint32 readLen= 0; -static bool readFileHeaderValidated = false;static XLogSource readSource = 0; /* XLOG_FROM_* code *//* @@ -561,6 +561,13 @@ static XLogSource readSource = 0; /* XLOG_FROM_* code 
*/static XLogSource currentSource = 0; /* XLOG_FROM_* code */static bool lastSourceFailed = false; +typedef struct XLogPageReadPrivate +{ + int emode; + bool fetching_ckpt; /* are we fetching a checkpoint record? */ + bool randAccess; +} XLogPageReadPrivate; +/* * These variables track when we last obtained some WAL data to process, * and where we got it from. (XLogReceiptSourceis initially the same as @@ -572,18 +579,9 @@ static bool lastSourceFailed = false;static TimestampTz XLogReceiptTime = 0;static XLogSource XLogReceiptSource= 0; /* XLOG_FROM_* code */ -/* Buffer for currently read page (XLOG_BLCKSZ bytes) */ -static char *readBuf = NULL; - -/* Buffer for current ReadRecord result (expandable) */ -static char *readRecordBuf = NULL; -static uint32 readRecordBufSize = 0; -/* State information for XLOG reading */static XLogRecPtr ReadRecPtr; /* start of last record read */static XLogRecPtrEndRecPtr; /* end+1 of last record read */ -static TimeLineID lastPageTLI = 0; -static TimeLineID lastSegmentTLI = 0;static XLogRecPtr minRecoveryPoint; /* local copy of * ControlFile->minRecoveryPoint */ @@ -627,8 +625,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,static int XLogFileRead(XLogSegNosegno, int emode, TimeLineID tli, int source, bool notexistOk);static int XLogFileReadAnyTLI(XLogSegNosegno, int emode, int source); -static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, - bool randAccess); +static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, + int reqLen, char *readBuf, TimeLineID *readTLI);static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,bool randAccess, bool fetching_ckpt);static int emode_for_corrupt_record(int emode,XLogRecPtr RecPtr); @@ -639,12 +637,11 @@ static void UpdateLastRemovedPtr(char *filename);static void ValidateXLOGDirectoryStructure(void);staticvoid CleanupBackupHistory(void);static void UpdateMinRecoveryPoint(XLogRecPtrlsn, bool force); -static XLogRecord 
*ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt); +static XLogRecord *ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, + int emode, bool fetching_ckpt);static void CheckRecoveryConsistency(void); -static bool ValidXLogPageHeader(XLogPageHeader hdr, int emode, bool segmentonly); -static bool ValidXLogRecordHeader(XLogRecPtr *RecPtr, XLogRecord *record, - int emode, bool randAccess); -static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt); +static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, + XLogRecPtr RecPtr, int whichChkpt);static bool rescanLatestTimeLine(void);static void WriteControlFile(void);staticvoid ReadControlFile(void); @@ -2652,9 +2649,6 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli, if (source != XLOG_FROM_STREAM) XLogReceiptTime = GetCurrentTimestamp(); - /* The file header needs to be validated on first access */ - readFileHeaderValidated = false; - return fd; } if (errno != ENOENT || !notfoundOk) /* unexpected failure? */ @@ -2709,7 +2703,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source) if (source == XLOG_FROM_ANY || source== XLOG_FROM_ARCHIVE) { - fd = XLogFileRead(segno, emode, tli, XLOG_FROM_ARCHIVE, true); + fd = XLogFileRead(segno, emode, tli, + XLOG_FROM_ARCHIVE, true); if (fd != -1) { elog(DEBUG1,"got WAL segment from archive"); @@ -2721,7 +2716,8 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source) if (source == XLOG_FROM_ANY || source== XLOG_FROM_PG_XLOG) { - fd = XLogFileRead(segno, emode, tli, XLOG_FROM_PG_XLOG, true); + fd = XLogFileRead(segno, emode, tli, + XLOG_FROM_PG_XLOG, true); if (fd != -1) { if (!expectedTLEs) @@ -3178,102 +3174,6 @@ RestoreBackupBlock(XLogRecPtr lsn, XLogRecord *record, int block_index,}/* - * CRC-check an XLOG record. We do not believe the contents of an XLOG - * record (other than to the minimal extent of computing the amount of - * data to read in) until we've checked the CRCs. 
- * - * We assume all of the record (that is, xl_tot_len bytes) has been read - * into memory at *record. Also, ValidXLogRecordHeader() has accepted the - * record's header, which means in particular that xl_tot_len is at least - * SizeOfXlogRecord, so it is safe to fetch xl_len. - */ -static bool -RecordIsValid(XLogRecord *record, XLogRecPtr recptr, int emode) -{ - pg_crc32 crc; - int i; - uint32 len = record->xl_len; - BkpBlock bkpb; - char *blk; - size_t remaining = record->xl_tot_len; - - /* First the rmgr data */ - if (remaining < SizeOfXLogRecord + len) - { - /* ValidXLogRecordHeader() should've caught this already... */ - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("invalid record length at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - remaining -= SizeOfXLogRecord + len; - INIT_CRC32(crc); - COMP_CRC32(crc, XLogRecGetData(record), len); - - /* Add in the backup blocks, if any */ - blk = (char *) XLogRecGetData(record) + len; - for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) - { - uint32 blen; - - if (!(record->xl_info & XLR_BKP_BLOCK(i))) - continue; - - if (remaining < sizeof(BkpBlock)) - { - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("invalid backup block size in record at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - memcpy(&bkpb, blk, sizeof(BkpBlock)); - - if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ) - { - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("incorrect hole size in record at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length; - - if (remaining < blen) - { - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("invalid backup block size in record at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - remaining -= blen; - COMP_CRC32(crc, blk, blen); - blk += blen; - } - - /* Check that xl_tot_len agrees with our calculation */ - if 
(remaining != 0) - { - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("incorrect total length in record at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - - /* Finally include the record header */ - COMP_CRC32(crc, (char *) record, offsetof(XLogRecord, xl_crc)); - FIN_CRC32(crc); - - if (!EQ_CRC32(record->xl_crc, crc)) - { - ereport(emode_for_corrupt_record(emode, recptr), - (errmsg("incorrect resource manager data checksum in record at %X/%X", - (uint32) (recptr >> 32), (uint32) recptr))); - return false; - } - - return true; -} - -/* * Attempt to read an XLOG record. * * If RecPtr is not NULL, try to read a record at that position. Otherwise @@ -3286,511 +3186,65 @@ RecordIsValid(XLogRecord *record, XLogRecPtr recptr, int emode) * the returned record pointer alwayspoints there. */static XLogRecord * -ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt) +ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, + bool fetching_ckpt){ XLogRecord *record; - XLogRecPtr tmpRecPtr = EndRecPtr; - bool randAccess = false; - uint32 len, - total_len; - uint32 targetRecOff; - uint32 pageHeaderSize; - bool gotheader; - - if (readBuf == NULL) - { - /* - * First time through, permanently allocate readBuf. We do it this - * way, rather than just making a static array, for two reasons: (1) - * no need to waste the storage in most instantiations of the backend; - * (2) a static char array isn't guaranteed to have any particular - * alignment, whereas malloc() will provide MAXALIGN'd storage. - */ - readBuf = (char *) malloc(XLOG_BLCKSZ); - Assert(readBuf != NULL); - } - - if (RecPtr == NULL) - { - RecPtr = &tmpRecPtr; - - /* - * RecPtr is pointing to end+1 of the previous WAL record. If - * we're at a page boundary, no more records can fit on the current - * page. We must skip over the page header, but we can't do that - * until we've read in the page, since the header size is variable. 
- */ - } - else - { - /* - * In this case, the passed-in record pointer should already be - * pointing to a valid record starting position. - */ - if (!XRecOffIsValid(*RecPtr)) - ereport(PANIC, - (errmsg("invalid record offset at %X/%X", - (uint32) (*RecPtr >> 32), (uint32) *RecPtr))); + XLogPageReadPrivate *private = (XLogPageReadPrivate *) xlogreader->private_data; - /* - * Since we are going to a random position in WAL, forget any prior - * state about what timeline we were in, and allow it to be any - * timeline in expectedTLEs. We also set a flag to allow curFileTLI - * to go backwards (but we can't reset that variable right here, since - * we might not change files at all). - */ - /* see comment in ValidXLogPageHeader */ - lastPageTLI = lastSegmentTLI = 0; - randAccess = true; /* allow curFileTLI to go backwards too */ - } + /* Pass through parameters to XLogPageRead */ + private->fetching_ckpt = fetching_ckpt; + private->emode = emode; + private->randAccess = (RecPtr != InvalidXLogRecPtr); /* This is the first try to read this page. */ lastSourceFailed= false; -retry: - /* Read the page containing the record */ - if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess)) - return NULL; - pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf); - targetRecOff = (*RecPtr) % XLOG_BLCKSZ; - if (targetRecOff == 0) - { - /* - * At page start, so skip over page header. The Assert checks that - * we're not scribbling on caller's record pointer; it's OK because we - * can only get here in the continuing-from-prev-record case, since - * XRecOffIsValid rejected the zero-page-offset case otherwise. 
- */ - Assert(RecPtr == &tmpRecPtr); - (*RecPtr) += pageHeaderSize; - targetRecOff = pageHeaderSize; - } - else if (targetRecOff < pageHeaderSize) + do { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid record offset at %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - goto next_record_is_invalid; - } - if ((((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD) && - targetRecOff == pageHeaderSize) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("contrecord is requested by %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - goto next_record_is_invalid; - } - - /* - * Read the record length. - * - * NB: Even though we use an XLogRecord pointer here, the whole record - * header might not fit on this page. xl_tot_len is the first field of - * the struct, so it must be on this page (the records are MAXALIGNed), - * but we cannot access any other fields until we've verified that we - * got the whole header. - */ - record = (XLogRecord *) (readBuf + (*RecPtr) % XLOG_BLCKSZ); - total_len = record->xl_tot_len; - - /* - * If the whole record header is on this page, validate it immediately. - * Otherwise do just a basic sanity check on xl_tot_len, and validate the - * rest of the header after reading it from the next page. The xl_tot_len - * check is necessary here to ensure that we enter the "Need to reassemble - * record" code path below; otherwise we might fail to apply - * ValidXLogRecordHeader at all. 
- */ - if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord) - { - if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess)) - goto next_record_is_invalid; - gotheader = true; - } - else - { - if (total_len < SizeOfXLogRecord) + char *errormsg; + record = XLogReadRecord(xlogreader, RecPtr, &errormsg); + ReadRecPtr = xlogreader->ReadRecPtr; + EndRecPtr = xlogreader->EndRecPtr; + if (record == NULL) { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid record length at %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - goto next_record_is_invalid; - } - gotheader = false; - } - - /* - * Allocate or enlarge readRecordBuf as needed. To avoid useless small - * increases, round its size to a multiple of XLOG_BLCKSZ, and make sure - * it's at least 4*Max(BLCKSZ, XLOG_BLCKSZ) to start with. (That is - * enough for all "normal" records, but very large commit or abort records - * might need more space.) - */ - if (total_len > readRecordBufSize) - { - uint32 newSize = total_len; - - newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ); - newSize = Max(newSize, 4 * Max(BLCKSZ, XLOG_BLCKSZ)); - if (readRecordBuf) - free(readRecordBuf); - readRecordBuf = (char *) malloc(newSize); - if (!readRecordBuf) - { - readRecordBufSize = 0; - /* We treat this as a "bogus data" condition */ - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("record length %u at %X/%X too long", - total_len, (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - goto next_record_is_invalid; - } - readRecordBufSize = newSize; - } - - len = XLOG_BLCKSZ - (*RecPtr) % XLOG_BLCKSZ; - if (total_len > len) - { - /* Need to reassemble record */ - char *contrecord; - XLogPageHeader pageHeader; - XLogRecPtr pagelsn; - char *buffer; - uint32 gotlen; - - /* Initialize pagelsn to the beginning of the page this record is on */ - pagelsn = ((*RecPtr) / XLOG_BLCKSZ) * XLOG_BLCKSZ; - - /* Copy the first fragment of the record from the first page. 
*/ - memcpy(readRecordBuf, readBuf + (*RecPtr) % XLOG_BLCKSZ, len); - buffer = readRecordBuf + len; - gotlen = len; + ereport(emode_for_corrupt_record(emode, + RecPtr ? RecPtr : EndRecPtr), + (errmsg_internal("%s", errormsg) /* already translated */)); - do - { - /* Calculate pointer to beginning of next page */ - pagelsn += XLOG_BLCKSZ; - /* Wait for the next page to become available */ - if (!XLogPageRead(&pagelsn, emode, false, false)) - return NULL; - - /* Check that the continuation on next page looks valid */ - pageHeader = (XLogPageHeader) readBuf; - if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("there is no contrecord flag in log segment %s, offset %u", - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - goto next_record_is_invalid; - } - /* - * Cross-check that xlp_rem_len agrees with how much of the record - * we expect there to be left. - */ - if (pageHeader->xlp_rem_len == 0 || - total_len != (pageHeader->xlp_rem_len + gotlen)) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid contrecord length %u in log segment %s, offset %u", - pageHeader->xlp_rem_len, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - goto next_record_is_invalid; - } + lastSourceFailed = true; - /* Append the continuation from this page to the buffer */ - pageHeaderSize = XLogPageHeaderSize(pageHeader); - contrecord = (char *) readBuf + pageHeaderSize; - len = XLOG_BLCKSZ - pageHeaderSize; - if (pageHeader->xlp_rem_len < len) - len = pageHeader->xlp_rem_len; - memcpy(buffer, (char *) contrecord, len); - buffer += len; - gotlen += len; - - /* If we just reassembled the record header, validate it. 
*/ - if (!gotheader) + if (readFile >= 0) { - record = (XLogRecord *) readRecordBuf; - if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess)) - goto next_record_is_invalid; - gotheader = true; + close(readFile); + readFile = -1; } - } while (pageHeader->xlp_rem_len > len); - - record = (XLogRecord *) readRecordBuf; - if (!RecordIsValid(record, *RecPtr, emode)) - goto next_record_is_invalid; - pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf); - XLogSegNoOffsetToRecPtr( - readSegNo, - readOff + pageHeaderSize + MAXALIGN(pageHeader->xlp_rem_len), - EndRecPtr); - ReadRecPtr = *RecPtr; - } - else - { - /* Record does not cross a page boundary */ - if (!RecordIsValid(record, *RecPtr, emode)) - goto next_record_is_invalid; - EndRecPtr = *RecPtr + MAXALIGN(total_len); - - ReadRecPtr = *RecPtr; - memcpy(readRecordBuf, record, total_len); - } - - /* - * Special processing if it's an XLOG SWITCH record - */ - if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) - { - /* Pretend it extends to end of segment */ - EndRecPtr += XLogSegSize - 1; - EndRecPtr -= EndRecPtr % XLogSegSize; - - /* - * Pretend that readBuf contains the last page of the segment. This is - * just to avoid Assert failure in StartupXLOG if XLOG ends with this - * segment. - */ - readOff = XLogSegSize - XLOG_BLCKSZ; - } - return record; - -next_record_is_invalid: - lastSourceFailed = true; - - if (readFile >= 0) - { - close(readFile); - readFile = -1; - } - - /* In standby-mode, keep trying */ - if (StandbyMode) - goto retry; - else - return NULL; -} - -/* - * Check whether the xlog header of a page just read in looks valid. - * - * This is just a convenience subroutine to avoid duplicated code in - * ReadRecord. It's not intended for use from anywhere else. 
- */ -static bool -ValidXLogPageHeader(XLogPageHeader hdr, int emode, bool segmentonly) -{ - XLogRecPtr recaddr; - - XLogSegNoOffsetToRecPtr(readSegNo, readOff, recaddr); - - if (hdr->xlp_magic != XLOG_PAGE_MAGIC) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("invalid magic number %04X in log segment %s, offset %u", - hdr->xlp_magic, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - if ((hdr->xlp_info & ~XLP_ALL_FLAGS) != 0) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("invalid info bits %04X in log segment %s, offset %u", - hdr->xlp_info, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - if (hdr->xlp_info & XLP_LONG_HEADER) - { - XLogLongPageHeader longhdr = (XLogLongPageHeader) hdr; - - if (longhdr->xlp_sysid != ControlFile->system_identifier) - { - char fhdrident_str[32]; - char sysident_str[32]; - - /* - * Format sysids separately to keep platform-dependent format code - * out of the translatable message string. 
- */ - snprintf(fhdrident_str, sizeof(fhdrident_str), UINT64_FORMAT, - longhdr->xlp_sysid); - snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT, - ControlFile->system_identifier); - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("WAL file is from different database system"), - errdetail("WAL file database system identifier is %s, pg_control database system identifier is %s.", - fhdrident_str, sysident_str))); - return false; - } - if (longhdr->xlp_seg_size != XLogSegSize) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("WAL file is from different database system"), - errdetail("Incorrect XLOG_SEG_SIZE in page header."))); - return false; - } - if (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("WAL file is from different database system"), - errdetail("Incorrect XLOG_BLCKSZ in page header."))); - return false; + break; } - } - else if (readOff == 0) - { - /* hmm, first page of file doesn't have a long header? */ - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("invalid info bits %04X in log segment %s, offset %u", - hdr->xlp_info, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - - if (hdr->xlp_pageaddr != recaddr) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("unexpected pageaddr %X/%X in log segment %s, offset %u", - (uint32) (hdr->xlp_pageaddr >> 32), (uint32) hdr->xlp_pageaddr, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - /* - * Check page TLI is one of the expected values. 
- */ - if (!tliInHistory(hdr->xlp_tli, expectedTLEs)) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("unexpected timeline ID %u in log segment %s, offset %u", - hdr->xlp_tli, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - - /* - * Since child timelines are always assigned a TLI greater than their - * immediate parent's TLI, we should never see TLI go backwards across - * successive pages of a consistent WAL sequence. - * - * Of course this check should only be applied when advancing sequentially - * across pages; therefore ReadRecord resets lastPageTLI and - * lastSegmentTLI to zero when going to a random page. - * - * Sometimes we re-open a segment that's already been partially replayed. - * In that case we cannot perform the normal TLI check: if there is a - * timeline switch within the segment, the first page has a smaller TLI - * than later pages following the timeline switch, and we might've read - * them already. As a weaker test, we still check that it's not smaller - * than the TLI we last saw at the beginning of a segment. Pass - * segmentonly = true when re-validating the first page like that, and the - * page you're actually interested in comes later. - */ - if (hdr->xlp_tli < (segmentonly ? lastSegmentTLI : lastPageTLI)) - { - ereport(emode_for_corrupt_record(emode, recaddr), - (errmsg("out-of-sequence timeline ID %u (after %u) in log segment %s, offset %u", - hdr->xlp_tli, - segmentonly ? lastSegmentTLI : lastPageTLI, - XLogFileNameP(curFileTLI, readSegNo), - readOff))); - return false; - } - lastPageTLI = hdr->xlp_tli; - if (readOff == 0) - lastSegmentTLI = hdr->xlp_tli; - - return true; -} - -/* - * Validate an XLOG record header. - * - * This is just a convenience subroutine to avoid duplicated code in - * ReadRecord. It's not intended for use from anywhere else. 
- */ -static bool -ValidXLogRecordHeader(XLogRecPtr *RecPtr, XLogRecord *record, int emode, - bool randAccess) -{ - /* - * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is - * required. - */ - if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) - { - if (record->xl_len != 0) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid xlog switch record at %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - return false; - } - } - else if (record->xl_len == 0) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("record with zero length at %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - return false; - } - if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len || - record->xl_tot_len > SizeOfXLogRecord + record->xl_len + - XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ)) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid record length at %X/%X", - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - return false; - } - if (record->xl_rmid > RM_MAX_ID) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("invalid resource manager ID %u at %X/%X", - record->xl_rmid, (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - return false; - } - if (randAccess) - { /* - * We can't exactly verify the prev-link, but surely it should be less - * than the record's own address. + * Check page TLI is one of the expected values. 
*/ - if (!(record->xl_prev < *RecPtr)) + if (!tliInHistory(xlogreader->latestPageTLI, expectedTLEs)) { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("record with incorrect prev-link %X/%X at %X/%X", - (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev, - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); + char fname[MAXFNAMELEN]; + XLogSegNo segno; + int32 offset; + + XLByteToSeg(xlogreader->latestPagePtr, segno); + offset = xlogreader->latestPagePtr % XLogSegSize; + XLogFileName(fname, xlogreader->readPageTLI, segno); + ereport(emode_for_corrupt_record(emode, + RecPtr ? RecPtr : EndRecPtr), + (errmsg("unexpected timeline ID %u in log segment %s, offset %u", + xlogreader->latestPageTLI, + fname, + offset))); return false; } - } - else - { - /* - * Record's prev-link should exactly match our previous location. This - * check guards against torn WAL pages where a stale but valid-looking - * WAL record starts on a sector boundary. - */ - if (record->xl_prev != ReadRecPtr) - { - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errmsg("record with incorrect prev-link %X/%X at %X/%X", - (uint32) (record->xl_prev >> 32), (uint32) record->xl_prev, - (uint32) ((*RecPtr) >> 32), (uint32) *RecPtr))); - return false; - } - } + } while (StandbyMode && record == NULL); - return true; + return record;}/* @@ -5235,6 +4689,8 @@ StartupXLOG(void) bool backupEndRequired = false; bool backupFromStandby = false; DBState dbstate_at_startup; + XLogReaderState *xlogreader; + XLogPageReadPrivate private; /* * Read control file and check XLOG status looks valid. 
@@ -5351,6 +4807,16 @@ StartupXLOG(void) if (StandbyMode) OwnLatch(&XLogCtl->recoveryWakeupLatch); + /* Set up XLOG reader facility */ + MemSet(&private, 0, sizeof(XLogPageReadPrivate)); + xlogreader = XLogReaderAllocate(InvalidXLogRecPtr, &XLogPageRead, &private); + if (!xlogreader) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"), + errdetail("Failed while allocating an XLog reading processor"))); + xlogreader->system_identifier = ControlFile->system_identifier; + if (read_backup_label(&checkPointLoc, &backupEndRequired, &backupFromStandby)) { @@ -5358,7 +4824,7 @@ StartupXLOG(void) * When a backup_label file is present, we want to roll forward from * the checkpoint it identifies, rather than using pg_control. */ - record = ReadCheckpointRecord(checkPointLoc, 0); + record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0); if (record != NULL) { memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint)); @@ -5376,7 +4842,7 @@ StartupXLOG(void) */ if (checkPoint.redo < checkPointLoc) { - if (!ReadRecord(&(checkPoint.redo), LOG, false)) + if (!ReadRecord(xlogreader, checkPoint.redo, LOG, false)) ereport(FATAL, (errmsg("could not find redo location referenced by checkpoint record"), errhint("If you are not restoring from a backup, try removing the file \"%s/backup_label\".", DataDir))); @@ -5400,7 +4866,7 @@ StartupXLOG(void) */ checkPointLoc = ControlFile->checkPoint; RedoStartLSN = ControlFile->checkPointCopy.redo; - record = ReadCheckpointRecord(checkPointLoc, 1); + record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1); if (record != NULL) { ereport(DEBUG1, @@ -5419,7 +4885,7 @@ StartupXLOG(void) else { checkPointLoc = ControlFile->prevCheckPoint; - record = ReadCheckpointRecord(checkPointLoc, 2); + record = ReadCheckpointRecord(xlogreader, checkPointLoc, 2); if (record != NULL) { ereport(LOG, @@ -5777,12 +5243,12 @@ StartupXLOG(void) if (checkPoint.redo < RecPtr) { /* back up to find the record */ - record = 
ReadRecord(&(checkPoint.redo), PANIC, false); + record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false); } else { /* just have to read next record after CheckPoint */ - record = ReadRecord(NULL, LOG, false); + record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false); } if (record != NULL) @@ -5963,7 +5429,7 @@ StartupXLOG(void) break; /* Else, try to fetch the next WAL record */ - record = ReadRecord(NULL, LOG, false); + record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false); } while (record != NULL); /* @@ -6013,7 +5479,7 @@ StartupXLOG(void) * Re-fetch the last valid or last applied record, so we can identify the * exact endpoint of what we consider the valid portion of WAL. */ - record = ReadRecord(&LastRec, PANIC, false); + record = ReadRecord(xlogreader, LastRec, PANIC, false); EndOfLog = EndRecPtr; XLByteToPrevSeg(EndOfLog, endLogSegNo); @@ -6117,7 +5583,7 @@ StartupXLOG(void) * we will use that below.) */ if (InArchiveRecovery) - exitArchiveRecovery(curFileTLI, endLogSegNo); + exitArchiveRecovery(xlogreader->readPageTLI, endLogSegNo); /* * Prepare to write WAL starting at EndOfLog position, and init xlog @@ -6136,8 +5602,15 @@ StartupXLOG(void) * record spans, not the one it starts in. The last block is indeed the * one we want to use. 
*/ - Assert(readOff == (XLogCtl->xlblocks[0] - XLOG_BLCKSZ) % XLogSegSize); - memcpy((char *) Insert->currpage, readBuf, XLOG_BLCKSZ); + if (EndOfLog % XLOG_BLCKSZ == 0) + { + memset(Insert->currpage, 0, XLOG_BLCKSZ); + } + else + { + Assert(readOff == (XLogCtl->xlblocks[0] - XLOG_BLCKSZ) % XLogSegSize); + memcpy((char *) Insert->currpage, xlogreader->readBuf, XLOG_BLCKSZ); + } Insert->currpos = (char *) Insert->currpage + (EndOfLog + XLOG_BLCKSZ - XLogCtl->xlblocks[0]); @@ -6288,23 +5761,13 @@ StartupXLOG(void) if (standbyState != STANDBY_DISABLED) ShutdownRecoveryTransactionEnvironment(); - /* Shut down readFile facility, free space */ + /* Shut down xlogreader */ if (readFile >= 0) { close(readFile); readFile = -1; } - if (readBuf) - { - free(readBuf); - readBuf = NULL; - } - if (readRecordBuf) - { - free(readRecordBuf); - readRecordBuf = NULL; - readRecordBufSize = 0; - } + XLogReaderFree(xlogreader); /* * If any of the critical GUCs have changed, log them before we allow @@ -6554,7 +6017,7 @@ LocalSetXLogInsertAllowed(void) * 1 for "primary", 2 for "secondary", 0 for "other" (backup_label) */ static XLogRecord * -ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt) +ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt) { XLogRecord *record; @@ -6578,7 +6041,7 @@ ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt) return NULL; } - record = ReadRecord(&RecPtr, LOG, true); + record = ReadRecord(xlogreader, RecPtr, LOG, true); if (record == NULL) { @@ -9332,28 +8795,24 @@ CancelBackup(void) * XLogPageRead() to try fetching the record from another source, or to * sleep and retry. 
*/ -static bool -XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, - bool randAccess) +static int +XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, + char *readBuf, TimeLineID *readTLI) { + XLogPageReadPrivate *private = + (XLogPageReadPrivate *) xlogreader->private_data; + int emode = private->emode; uint32 targetPageOff; - uint32 targetRecOff; - XLogSegNo targetSegNo; - - XLByteToSeg(*RecPtr, targetSegNo); - targetPageOff = (((*RecPtr) % XLogSegSize) / XLOG_BLCKSZ) * XLOG_BLCKSZ; - targetRecOff = (*RecPtr) % XLOG_BLCKSZ; + XLogSegNo targetSegNo PG_USED_FOR_ASSERTS_ONLY; - /* Fast exit if we have read the record in the current buffer already */ - if (!lastSourceFailed && targetSegNo == readSegNo && - targetPageOff == readOff && targetRecOff < readLen) - return true; + XLByteToSeg(targetPagePtr, targetSegNo); + targetPageOff = targetPagePtr % XLogSegSize; /* * See if we need to switch to a new segment because the requested record * is not in the currently open one. */ - if (readFile >= 0 && !XLByteInSeg(*RecPtr, readSegNo)) + if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo)) { /* * Request a restartpoint if we've replayed too much xlog since the @@ -9374,39 +8833,34 @@ XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, readSource = 0; } - XLByteToSeg(*RecPtr, readSegNo); + XLByteToSeg(targetPagePtr, readSegNo); retry: /* See if we need to retrieve more data */ if (readFile < 0 || - (readSource == XLOG_FROM_STREAM && receivedUpto <= *RecPtr)) + (readSource == XLOG_FROM_STREAM && + receivedUpto <= targetPagePtr + reqLen)) { if (StandbyMode) { - if (!WaitForWALToBecomeAvailable(*RecPtr, randAccess, - fetching_ckpt)) + if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen, + private->randAccess, + private->fetching_ckpt)) goto triggered; } - else + /* In archive or crash recovery. */ + else if (readFile < 0) { - /* In archive or crash recovery. 
*/ - if (readFile < 0) - { - int source; + int source; - /* Reset curFileTLI if random fetch. */ - if (randAccess) - curFileTLI = 0; - - if (InArchiveRecovery) - source = XLOG_FROM_ANY; - else - source = XLOG_FROM_PG_XLOG; + if (InArchiveRecovery) + source = XLOG_FROM_ANY; + else + source = XLOG_FROM_PG_XLOG; - readFile = XLogFileReadAnyTLI(readSegNo, emode, source); - if (readFile < 0) - return false; - } + readFile = XLogFileReadAnyTLI(readSegNo, emode, source); + if (readFile < 0) + return -1; } } @@ -9424,72 +8878,46 @@ retry: */ if (readSource == XLOG_FROM_STREAM) { - if (((*RecPtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ)) - { + if (((targetPagePtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ)) readLen = XLOG_BLCKSZ; - } else readLen = receivedUpto % XLogSegSize - targetPageOff; } else readLen = XLOG_BLCKSZ; - if (!readFileHeaderValidated && targetPageOff != 0) - { - /* - * Whenever switching to a new WAL segment, we read the first page of - * the file and validate its header, even if that's not where the - * target record is. This is so that we can check the additional - * identification info that is present in the first page's "long" - * header. 
- */ - readOff = 0; - if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ) - { - char fname[MAXFNAMELEN]; - XLogFileName(fname, curFileTLI, readSegNo); - ereport(emode_for_corrupt_record(emode, *RecPtr), - (errcode_for_file_access(), - errmsg("could not read from log segment %s, offset %u: %m", - fname, readOff))); - goto next_record_is_invalid; - } - if (!ValidXLogPageHeader((XLogPageHeader) readBuf, emode, true)) - goto next_record_is_invalid; - } - /* Read the requested page */ readOff = targetPageOff; if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0) { char fname[MAXFNAMELEN]; + XLogFileName(fname, curFileTLI, readSegNo); - ereport(emode_for_corrupt_record(emode, *RecPtr), + ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), (errcode_for_file_access(), errmsg("could not seek in log segment %s to offset %u: %m", - fname, readOff))); + fname, readOff))); goto next_record_is_invalid; } + if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ) { char fname[MAXFNAMELEN]; + XLogFileName(fname, curFileTLI, readSegNo); - ereport(emode_for_corrupt_record(emode, *RecPtr), + ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), (errcode_for_file_access(), errmsg("could not read from log segment %s, offset %u: %m", - fname, readOff))); + fname, readOff))); goto next_record_is_invalid; } - if (!ValidXLogPageHeader((XLogPageHeader) readBuf, emode, false)) - goto next_record_is_invalid; - - readFileHeaderValidated = true; Assert(targetSegNo == readSegNo); Assert(targetPageOff == readOff); - Assert(targetRecOff < readLen); + Assert(reqLen <= readLen); - return true; + *readTLI = curFileTLI; + return readLen;next_record_is_invalid: lastSourceFailed = true; @@ -9504,7 +8932,7 @@ next_record_is_invalid: if (StandbyMode) goto retry; else - return false; + return -1;triggered: if (readFile >= 0) @@ -9513,7 +8941,7 @@ triggered: readLen = 0; readSource = 0; - return false; + return -1;}/* diff --git a/src/backend/access/transam/xlogreader.c 
b/src/backend/access/transam/xlogreader.c new file mode 100644 index 0000000..6a420e6 --- /dev/null +++ b/src/backend/access/transam/xlogreader.c @@ -0,0 +1,987 @@ +/*------------------------------------------------------------------------- + * + * xlogreader.c + * Generic xlog reading facility + * + * Portions Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/access/transam/xlogreader.c + * + * NOTES + * Documentation about how to use this interface can be found in + * xlogreader.h, more specifically in the definition of the + * XLogReaderState struct where all parameters are documented. + * + *------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "access/transam.h" +#include "access/xlog.h" +#include "access/xlog_internal.h" +#include "access/xlogreader.h" +#include "catalog/pg_control.h" + +static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength); + +static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, + XLogPageHeader hdr); +static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, + XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess); +static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, + XLogRecPtr recptr); +static int ReadPageInternal(struct XLogReaderState *state, XLogRecPtr pageptr, + int reqLen); +static void report_invalid_record(XLogReaderState *state, const char *fmt, ...) +/* This extension allows gcc to check the format string for consistency with + the supplied arguments. */ +__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3))); + +/* size of the buffer allocated for error message. */ +#define MAX_ERRORMSG_LEN 1000 + +/* + * Construct a string in state->errormsg_buf explaining what's wrong with + * the current record being read. + */ +static void +report_invalid_record(XLogReaderState *state, const char *fmt, ...) 
+{ + va_list args; + + fmt = _(fmt); + + va_start(args, fmt); + vsnprintf(state->errormsg_buf, MAX_ERRORMSG_LEN, fmt, args); + va_end(args); +} + +/* + * Allocate and initialize a new xlog reader + * + * Returns NULL if the xlogreader couldn't be allocated. + */ +XLogReaderState * +XLogReaderAllocate(XLogRecPtr startpoint, XLogPageReadCB pagereadfunc, + void *private_data) +{ + XLogReaderState *state; + + state = (XLogReaderState *) malloc(sizeof(XLogReaderState)); + if (!state) + return NULL; + MemSet(state, 0, sizeof(XLogReaderState)); + + /* + * Permanently allocate readBuf. We do it this way, rather than just + * making a static array, for two reasons: (1) no need to waste the + * storage in most instantiations of the backend; (2) a static char array + * isn't guaranteed to have any particular alignment, whereas malloc() + * will provide MAXALIGN'd storage. + */ + state->readBuf = (char *) malloc(XLOG_BLCKSZ); + if (!state->readBuf) + { + free(state); + return NULL; + } + + state->read_page = pagereadfunc; + state->private_data = private_data; + state->EndRecPtr = startpoint; + state->readPageTLI = 0; + state->system_identifier = 0; + state->errormsg_buf = malloc(MAX_ERRORMSG_LEN + 1); + if (!state->errormsg_buf) + { + free(state->readBuf); + free(state); + return NULL; + } + state->errormsg_buf[0] = '\0'; + + /* + * Allocate an initial readRecordBuf of minimal size, which can later be + * enlarged if necessary. + */ + if (!allocate_recordbuf(state, 0)) + { + free(state->errormsg_buf); + free(state->readBuf); + free(state); + return NULL; + } + + return state; +} + +void +XLogReaderFree(XLogReaderState *state) +{ + free(state->errormsg_buf); + if (state->readRecordBuf) + free(state->readRecordBuf); + free(state->readBuf); + free(state); +} + +/* + * Allocate readRecordBuf to fit a record of at least the given length. + * Returns true if successful, false if out of memory. + * + * readRecordBufSize is set to the new buffer size. 
+ * + * To avoid useless small increases, round its size to a multiple of + * XLOG_BLCKSZ, and make sure it's at least 5*Max(BLCKSZ, XLOG_BLCKSZ) to start + * with. (That is enough for all "normal" records, but very large commit or + * abort records might need more space.) + */ +static bool +allocate_recordbuf(XLogReaderState *state, uint32 reclength) +{ + uint32 newSize = reclength; + + newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ); + newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ)); + + if (state->readRecordBuf) + free(state->readRecordBuf); + state->readRecordBuf = (char *) malloc(newSize); + if (!state->readRecordBuf) + { + state->readRecordBufSize = 0; + return false; + } + + state->readRecordBufSize = newSize; + return true; +} + +/* + * Attempt to read an XLOG record. + * + * If RecPtr is not NULL, try to read a record at that position. Otherwise + * try to read a record just after the last one previously read. + * + * If no valid record is available, returns NULL. On NULL return, *errormsg + * is usually set to a string with details of the failure. One typical error + * where *errormsg is not set is when the read_page callback returns an error. + * + * The returned pointer (or *errormsg) points to an internal buffer that's + * valid until the next call to XLogReadRecord. + */ +XLogRecord * +XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg) +{ + XLogRecord *record; + XLogRecPtr tmpRecPtr = state->EndRecPtr; + XLogRecPtr targetPagePtr; + bool randAccess = false; + uint32 len, + total_len; + uint32 targetRecOff; + uint32 pageHeaderSize; + bool gotheader; + int readOff; + + *errormsg = NULL; + state->errormsg_buf[0] = '\0'; + + if (RecPtr == InvalidXLogRecPtr) + { + RecPtr = tmpRecPtr; + + if (state->ReadRecPtr == InvalidXLogRecPtr) + randAccess = true; + + /* + * RecPtr is pointing to end+1 of the previous WAL record. If we're + * at a page boundary, no more records can fit on the current page. 
We + * must skip over the page header, but we can't do that until we've + * read in the page, since the header size is variable. + */ + } + else + { + /* + * In this case, the passed-in record pointer should already be + * pointing to a valid record starting position. + */ + Assert(XRecOffIsValid(RecPtr)); + randAccess = true; /* allow readPageTLI to go backwards too */ + } + + targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ); + + /* Read the page containing the record into state->readBuf */ + readOff = ReadPageInternal(state, targetPagePtr, SizeOfXLogRecord); + + if (readOff < 0) + { + if (state->errormsg_buf[0] != '\0') + *errormsg = state->errormsg_buf; + return NULL; + } + + /* ReadPageInternal always returns at least the page header */ + pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) state->readBuf); + targetRecOff = RecPtr % XLOG_BLCKSZ; + if (targetRecOff == 0) + { + /* + * At page start, so skip over page header. + */ + RecPtr += pageHeaderSize; + targetRecOff = pageHeaderSize; + } + else if (targetRecOff < pageHeaderSize) + { + report_invalid_record(state, "invalid record offset at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + *errormsg = state->errormsg_buf; + return NULL; + } + + if ((((XLogPageHeader) state->readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD) && + targetRecOff == pageHeaderSize) + { + report_invalid_record(state, "contrecord is requested by %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + *errormsg = state->errormsg_buf; + return NULL; + } + + /* ReadPageInternal has verified the page header */ + Assert(pageHeaderSize <= readOff); + + /* + * Ensure the whole record header or at least the part on this page is + * read. + */ + readOff = ReadPageInternal(state, + targetPagePtr, + Min(targetRecOff + SizeOfXLogRecord, XLOG_BLCKSZ)); + if (readOff < 0) + { + if (state->errormsg_buf[0] != '\0') + *errormsg = state->errormsg_buf; + return NULL; + } + + /* + * Read the record length. 
+ * + * NB: Even though we use an XLogRecord pointer here, the whole record + * header might not fit on this page. xl_tot_len is the first field of the + * struct, so it must be on this page (the records are MAXALIGNed), but we + * cannot access any other fields until we've verified that we got the + * whole header. + */ + record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ); + total_len = record->xl_tot_len; + + /* + * If the whole record header is on this page, validate it immediately. + * Otherwise do just a basic sanity check on xl_tot_len, and validate the + * rest of the header after reading it from the next page. The xl_tot_len + * check is necessary here to ensure that we enter the "Need to reassemble + * record" code path below; otherwise we might fail to apply + * ValidXLogRecordHeader at all. + */ + if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord) + { + if (!ValidXLogRecordHeader(state, RecPtr, state->ReadRecPtr, record, + randAccess)) + { + if (state->errormsg_buf[0] != '\0') + *errormsg = state->errormsg_buf; + return NULL; + } + gotheader = true; + } + else + { + /* XXX: more validation should be done here */ + if (total_len < SizeOfXLogRecord) + { + report_invalid_record(state, "invalid record length at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + *errormsg = state->errormsg_buf; + return NULL; + } + gotheader = false; + } + + /* + * Enlarge readRecordBuf as needed. 
+ */ + if (total_len > state->readRecordBufSize && + !allocate_recordbuf(state, total_len)) + { + /* We treat this as a "bogus data" condition */ + report_invalid_record(state, "record length %u at %X/%X too long", + total_len, + (uint32) (RecPtr >> 32), (uint32) RecPtr); + *errormsg = state->errormsg_buf; + return NULL; + } + + len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ; + if (total_len > len) + { + /* Need to reassemble record */ + char *contdata; + XLogPageHeader pageHeader; + char *buffer; + uint32 gotlen; + + /* Copy the first fragment of the record from the first page. */ + memcpy(state->readRecordBuf, + state->readBuf + RecPtr % XLOG_BLCKSZ, len); + buffer = state->readRecordBuf + len; + gotlen = len; + + do + { + /* Calculate pointer to beginning of next page */ + targetPagePtr += XLOG_BLCKSZ; + + /* Wait for the next page to become available */ + readOff = ReadPageInternal(state, targetPagePtr, + Min(len, XLOG_BLCKSZ)); + + if (readOff < 0) + goto err; + + Assert(SizeOfXLogShortPHD <= readOff); + + /* Check that the continuation on next page looks valid */ + pageHeader = (XLogPageHeader) state->readBuf; + if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD)) + { + report_invalid_record(state, + "there is no contrecord flag at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + goto err; + } + + /* + * Cross-check that xlp_rem_len agrees with how much of the record + * we expect there to be left. 
+ */ + if (pageHeader->xlp_rem_len == 0 || + total_len != (pageHeader->xlp_rem_len + gotlen)) + { + report_invalid_record(state, + "invalid contrecord length %u at %X/%X", + pageHeader->xlp_rem_len, + (uint32) (RecPtr >> 32), (uint32) RecPtr); + goto err; + } + + /* Append the continuation from this page to the buffer */ + pageHeaderSize = XLogPageHeaderSize(pageHeader); + Assert(pageHeaderSize <= readOff); + + contdata = (char *) state->readBuf + pageHeaderSize; + len = XLOG_BLCKSZ - pageHeaderSize; + if (pageHeader->xlp_rem_len < len) + len = pageHeader->xlp_rem_len; + + memcpy(buffer, (char *) contdata, len); + buffer += len; + gotlen += len; + + /* If we just reassembled the record header, validate it. */ + if (!gotheader) + { + record = (XLogRecord *) state->readRecordBuf; + if (!ValidXLogRecordHeader(state, RecPtr, state->ReadRecPtr, + record, randAccess)) + goto err; + gotheader = true; + } + } while (gotlen < total_len); + + Assert(gotheader); + + record = (XLogRecord *) state->readRecordBuf; + if (!ValidXLogRecord(state, record, RecPtr)) + goto err; + + pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) state->readBuf); + state->ReadRecPtr = RecPtr; + state->EndRecPtr = targetPagePtr + pageHeaderSize + + MAXALIGN(pageHeader->xlp_rem_len); + } + else + { + /* Wait for the record data to become available */ + readOff = ReadPageInternal(state, targetPagePtr, + Min(targetRecOff + total_len, XLOG_BLCKSZ)); + if (readOff < 0) + goto err; + + /* Record does not cross a page boundary */ + if (!ValidXLogRecord(state, record, RecPtr)) + goto err; + + state->EndRecPtr = RecPtr + MAXALIGN(total_len); + + state->ReadRecPtr = RecPtr; + memcpy(state->readRecordBuf, record, total_len); + } + + /* + * Special processing if it's an XLOG SWITCH record + */ + if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) + { + /* Pretend it extends to end of segment */ + state->EndRecPtr += XLogSegSize - 1; + state->EndRecPtr -= state->EndRecPtr % XLogSegSize; + } + + 
return record; + +err: + /* + * Invalidate the xlog page we've cached. We might read from a different + * source after failure. + */ + state->readSegNo = 0; + state->readOff = 0; + state->readLen = 0; + + if (state->errormsg_buf[0] != '\0') + *errormsg = state->errormsg_buf; + + return NULL; +} + +/* + * Read a single xlog page including at least [pagestart, RecPtr] of valid data + * via the read_page() callback. + * + * Returns -1 if the required page cannot be read for some reason. + * + * We fetch the page from a reader-local cache if we know we have the required + * data and if there hasn't been any error since caching the data. + */ +static int +ReadPageInternal(struct XLogReaderState *state, XLogRecPtr pageptr, + int reqLen) +{ + int readLen; + uint32 targetPageOff; + XLogSegNo targetSegNo; + XLogPageHeader hdr; + + Assert((pageptr % XLOG_BLCKSZ) == 0); + + XLByteToSeg(pageptr, targetSegNo); + targetPageOff = (pageptr % XLogSegSize); + + /* check whether we have all the requested data already */ + if (targetSegNo == state->readSegNo && targetPageOff == state->readOff && + reqLen < state->readLen) + return state->readLen; + + /* + * Data is not cached. + * + * Everytime we actually read the page, even if we looked at parts of it + * before, we need to do verification as the read_page callback might now + * be rereading data from a different source. + * + * Whenever switching to a new WAL segment, we read the first page of the + * file and validate its header, even if that's not where the target record + * is. This is so that we can check the additional identification info + * that is present in the first page's "long" header. 
+ */ + if (targetSegNo != state->readSegNo && + targetPageOff != 0) + { + XLogPageHeader hdr; + XLogRecPtr targetSegmentPtr = pageptr - targetPageOff; + + readLen = state->read_page(state, targetSegmentPtr, XLOG_BLCKSZ, + state->readBuf, &state->readPageTLI); + + if (readLen < 0) + goto err; + + Assert(readLen <= XLOG_BLCKSZ); + + /* we can be sure to have enough WAL available, we scrolled back */ + Assert(readLen == XLOG_BLCKSZ); + + hdr = (XLogPageHeader) state->readBuf; + + if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr)) + goto err; + } + + /* now read the target data */ + readLen = state->read_page(state, pageptr, Max(reqLen, SizeOfXLogShortPHD), + state->readBuf, &state->readPageTLI); + if (readLen < 0) + goto err; + + Assert(readLen <= XLOG_BLCKSZ); + + /* check we have enough data to check for the actual length of a the page header */ + if (readLen <= SizeOfXLogShortPHD) + goto err; + + Assert(readLen >= reqLen); + + hdr = (XLogPageHeader) state->readBuf; + + /* still not enough */ + if (readLen < XLogPageHeaderSize(hdr)) + { + readLen = state->read_page(state, pageptr, XLogPageHeaderSize(hdr), + state->readBuf, &state->readPageTLI); + if (readLen < 0) + goto err; + } + + if (!ValidXLogPageHeader(state, pageptr, hdr)) + goto err; + + /* update cache information */ + state->readSegNo = targetSegNo; + state->readOff = targetPageOff; + state->readLen = readLen; + + return readLen; +err: + state->readSegNo = 0; + state->readOff = 0; + state->readLen = 0; + return -1; +} + +/* + * Validate an XLOG record header. + * + * This is just a convenience subroutine to avoid duplicated code in + * XLogReadRecord. It's not intended for use from anywhere else. + */ +static bool +ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, + XLogRecPtr PrevRecPtr, XLogRecord *record, + bool randAccess) +{ + /* + * xl_len == 0 is bad data for everything except XLOG SWITCH, where it is + * required. 
+ */ + if (record->xl_rmid == RM_XLOG_ID && record->xl_info == XLOG_SWITCH) + { + if (record->xl_len != 0) + { + report_invalid_record(state, + "invalid xlog switch record at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + return false; + } + } + else if (record->xl_len == 0) + { + report_invalid_record(state, + "record with zero length at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + return false; + } + if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len || + record->xl_tot_len > SizeOfXLogRecord + record->xl_len + + XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ)) + { + report_invalid_record(state, + "invalid record length at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + return false; + } + if (record->xl_rmid > RM_MAX_ID) + { + report_invalid_record(state, + "invalid resource manager ID %u at %X/%X", + record->xl_rmid, (uint32) (RecPtr >> 32), + (uint32) RecPtr); + return false; + } + if (randAccess) + { + /* + * We can't exactly verify the prev-link, but surely it should be less + * than the record's own address. + */ + if (!(record->xl_prev < RecPtr)) + { + report_invalid_record(state, + "record with incorrect prev-link %X/%X at %X/%X", + (uint32) (record->xl_prev >> 32), + (uint32) record->xl_prev, + (uint32) (RecPtr >> 32), (uint32) RecPtr); + return false; + } + } + else + { + /* + * Record's prev-link should exactly match our previous location. This + * check guards against torn WAL pages where a stale but valid-looking + * WAL record starts on a sector boundary. + */ + if (record->xl_prev != PrevRecPtr) + { + report_invalid_record(state, + "record with incorrect prev-link %X/%X at %X/%X", + (uint32) (record->xl_prev >> 32), + (uint32) record->xl_prev, + (uint32) (RecPtr >> 32), (uint32) RecPtr); + return false; + } + } + + return true; +} + + +/* + * CRC-check an XLOG record. 
We do not believe the contents of an XLOG + * record (other than to the minimal extent of computing the amount of + * data to read in) until we've checked the CRCs. + * + * We assume all of the record (that is, xl_tot_len bytes) has been read + * into memory at *record. Also, ValidXLogRecordHeader() has accepted the + * record's header, which means in particular that xl_tot_len is at least + * SizeOfXlogRecord, so it is safe to fetch xl_len. + */ +static bool +ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr) +{ + pg_crc32 crc; + int i; + uint32 len = record->xl_len; + BkpBlock bkpb; + char *blk; + size_t remaining = record->xl_tot_len; + + /* First the rmgr data */ + if (remaining < SizeOfXLogRecord + len) + { + /* ValidXLogRecordHeader() should've caught this already... */ + report_invalid_record(state, "invalid record length at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + remaining -= SizeOfXLogRecord + len; + INIT_CRC32(crc); + COMP_CRC32(crc, XLogRecGetData(record), len); + + /* Add in the backup blocks, if any */ + blk = (char *) XLogRecGetData(record) + len; + for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) + { + uint32 blen; + + if (!(record->xl_info & XLR_BKP_BLOCK(i))) + continue; + + if (remaining < sizeof(BkpBlock)) + { + report_invalid_record(state, + "invalid backup block size in record at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + memcpy(&bkpb, blk, sizeof(BkpBlock)); + + if (bkpb.hole_offset + bkpb.hole_length > BLCKSZ) + { + report_invalid_record(state, + "incorrect hole size in record at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + blen = sizeof(BkpBlock) + BLCKSZ - bkpb.hole_length; + + if (remaining < blen) + { + report_invalid_record(state, + "invalid backup block size in record at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + remaining -= blen; + COMP_CRC32(crc, blk, blen); + blk += blen; + } + + /* 
Check that xl_tot_len agrees with our calculation */ + if (remaining != 0) + { + report_invalid_record(state, + "incorrect total length in record at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + + /* Finally include the record header */ + COMP_CRC32(crc, (char *) record, offsetof(XLogRecord, xl_crc)); + FIN_CRC32(crc); + + if (!EQ_CRC32(record->xl_crc, crc)) + { + report_invalid_record(state, + "incorrect resource manager data checksum in record at %X/%X", + (uint32) (recptr >> 32), (uint32) recptr); + return false; + } + + return true; +} + +static bool +ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, + XLogPageHeader hdr) +{ + XLogRecPtr recaddr; + XLogSegNo segno; + int32 offset; + + Assert((recptr % XLOG_BLCKSZ) == 0); + + XLByteToSeg(recptr, segno); + offset = recptr % XLogSegSize; + + XLogSegNoOffsetToRecPtr(segno, offset, recaddr); + + if (hdr->xlp_magic != XLOG_PAGE_MAGIC) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->readPageTLI, segno); + + report_invalid_record(state, + "invalid magic number %04X in log segment %s, offset %u", + hdr->xlp_magic, + fname, + offset); + return false; + } + + if ((hdr->xlp_info & ~XLP_ALL_FLAGS) != 0) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->readPageTLI, segno); + + report_invalid_record(state, + "invalid info bits %04X in log segment %s, offset %u", + hdr->xlp_info, + fname, + offset); + return false; + } + + if (hdr->xlp_info & XLP_LONG_HEADER) + { + XLogLongPageHeader longhdr = (XLogLongPageHeader) hdr; + + if (state->system_identifier && + longhdr->xlp_sysid != state->system_identifier) + { + char fhdrident_str[32]; + char sysident_str[32]; + + /* + * Format sysids separately to keep platform-dependent format code + * out of the translatable message string. 
+ */ + snprintf(fhdrident_str, sizeof(fhdrident_str), UINT64_FORMAT, + longhdr->xlp_sysid); + snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT, + state->system_identifier); + report_invalid_record(state, + "WAL file is from different database system: WAL file database system identifier is %s, pg_controldatabase system identifier is %s.", + fhdrident_str, sysident_str); + return false; + } + else if (longhdr->xlp_seg_size != XLogSegSize) + { + report_invalid_record(state, + "WAL file is from different database system: Incorrect XLOG_SEG_SIZE in page header."); + return false; + } + else if (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ) + { + report_invalid_record(state, + "WAL file is from different database system: Incorrect XLOG_BLCKSZ in page header."); + return false; + } + } + else if (offset == 0) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->readPageTLI, segno); + + /* hmm, first page of file doesn't have a long header? */ + report_invalid_record(state, + "invalid info bits %04X in log segment %s, offset %u", + hdr->xlp_info, + fname, + offset); + return false; + } + + if (hdr->xlp_pageaddr != recaddr) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->readPageTLI, segno); + + report_invalid_record(state, + "unexpected pageaddr %X/%X in log segment %s, offset %u", + (uint32) (hdr->xlp_pageaddr >> 32), (uint32) hdr->xlp_pageaddr, + fname, + offset); + return false; + } + + /* + * Since child timelines are always assigned a TLI greater than their + * immediate parent's TLI, we should never see TLI go backwards across + * successive pages of a consistent WAL sequence. + * + * Of course this check should only be applied when advancing sequentially + * across pages; therefore ReadRecord resets lastPageTLI and lastSegmentTLI + * to zero when going to a random page. FIXME + * + * Sometimes we re-read a segment that's already been (partially) read. So + * we only verify TLIs for pages that are later than the last remembered + * LSN. 
+ * + * XXX: This is slightly less precise than the check we did in earlier + * times. I don't see a problem with that though. + */ + if (state->latestPagePtr < recptr) + { + if (hdr->xlp_tli < state->latestPageTLI) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->readPageTLI, segno); + + report_invalid_record(state, + "out-of-sequence timeline ID %u (after %u) in log segment %s, offset %u", + hdr->xlp_tli, + state->latestPageTLI, + fname, + offset); + return false; + } + } + state->latestPagePtr = recptr; + state->latestPageTLI = hdr->xlp_tli; + return true; +} + +/* + * Functions that are currently only needed in the backend, but are better + * implemented inside xlogreader because the internal functions available + * there. + */ +#ifdef FRONTEND + +/* + * Find the first record with at an lsn >= RecPtr. + * + * Useful for checking wether RecPtr is a valid xlog address for reading and to + * find the first valid address after some address when dumping records for + * debugging purposes. 
+ */ +XLogRecPtr +XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr) +{ + XLogReaderState saved_state = *state; + XLogRecPtr targetPagePtr; + XLogRecPtr tmpRecPtr; + int targetRecOff; + XLogRecPtr found = InvalidXLogRecPtr; + uint32 pageHeaderSize; + XLogPageHeader header; + XLogRecord *record; + uint32 readLen; + char *errormsg; + + if (RecPtr == InvalidXLogRecPtr) + RecPtr = state->EndRecPtr; + + targetRecOff = RecPtr % XLOG_BLCKSZ; + + /* scroll back to page boundary */ + targetPagePtr = RecPtr - targetRecOff; + + /* Read the page containing the record */ + readLen = ReadPageInternal(state, targetPagePtr, targetRecOff); + if (readLen < 0) + goto err; + + header = (XLogPageHeader) state->readBuf; + + pageHeaderSize = XLogPageHeaderSize(header); + + /* make sure we have enough data for the page header */ + readLen = ReadPageInternal(state, targetPagePtr, pageHeaderSize); + if (readLen < 0) + goto err; + + /* skip over potential continuation data */ + if (header->xlp_info & XLP_FIRST_IS_CONTRECORD) + { + /* record headers are MAXALIGN'ed */ + tmpRecPtr = targetPagePtr + pageHeaderSize + + MAXALIGN(header->xlp_rem_len); + } + else + { + tmpRecPtr = targetPagePtr + pageHeaderSize; + } + + /* + * we know now that tmpRecPtr is an address pointing to a valid XLogRecord + * because either were at the first record after the beginning of a page or + * we just jumped over the remaining data of a continuation. 
+ */ + while ((record = XLogReadRecord(state, tmpRecPtr, &errormsg))) + { + /* continue after the record */ + tmpRecPtr = InvalidXLogRecPtr; + + /* past the record we've found, break out */ + if (RecPtr <= state->ReadRecPtr) + { + found = state->ReadRecPtr; + goto out; + } + } + +err: +out: + /* Reset state to what we had before finding the record */ + state->readSegNo = 0; + state->readOff = 0; + state->readLen = 0; + state->ReadRecPtr = saved_state.ReadRecPtr; + state->EndRecPtr = saved_state.EndRecPtr; + return found; +} + +#endif /* FRONTEND */ diff --git a/src/backend/nls.mk b/src/backend/nls.mk index 30f6a2b..c072de7 100644 --- a/src/backend/nls.mk +++ b/src/backend/nls.mk @@ -4,12 +4,13 @@ AVAIL_LANGUAGES = de es fr ja pt_BR tr zh_CN zh_TWGETTEXT_FILES = + gettext-filesGETTEXT_TRIGGERS =$(BACKEND_COMMON_GETTEXT_TRIGGERS) \ GUC_check_errmsg GUC_check_errdetail GUC_check_errhint \ - write_stderr yyerror parser_yyerror + write_stderr yyerror parser_yyerror report_invalid_recordGETTEXT_FLAGS = $(BACKEND_COMMON_GETTEXT_FLAGS) \ GUC_check_errmsg:1:c-format\ GUC_check_errdetail:1:c-format \ GUC_check_errhint:1:c-format \ - write_stderr:1:c-format + write_stderr:1:c-format \ + report_invalid_record:2:c-formatgettext-files: distprep find $(srcdir)/ $(srcdir)/../port/ -name '*.c' -print | LC_ALL=Csort >$@ diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h new file mode 100644 index 0000000..acc8309 --- /dev/null +++ b/src/include/access/xlogreader.h @@ -0,0 +1,141 @@ +/*------------------------------------------------------------------------- + * + * readxlog.h + * + * Generic xlog reading facility. + * + * Portions Copyright (c) 2012, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/include/access/xlogreader.h + * + * NOTES + * Check the definition of the XLogReaderState struct for instructions on + * how to use the XLogReader infrastructure. 
+ * + * The basic idea is to allocate an XLogReaderState via + * XLogReaderAllocate, and call XLogReadRecord() until it returns NULL. + *------------------------------------------------------------------------- + */ +#ifndef XLOGREADER_H +#define XLOGREADER_H + +#include "access/xlog_internal.h" + +struct XLogReaderState; + +/* + * The callbacks are explained in more detail inside the XLogReaderState + * struct. + */ + +typedef int (*XLogPageReadCB) (struct XLogReaderState *state, + XLogRecPtr pageptr, + int reqLen, + char *readBuf, + TimeLineID *pageTLI); + +typedef struct XLogReaderState +{ + /* ---------------------------------------- + * Public parameters + * ---------------------------------------- + */ + + /* + * Data input callback (mandatory). + * + * This callback shall read the the xlog page (of size XLOG_BLKSZ) in which + * RecPtr resides. All data <= RecPtr must be visible. The callback shall + * return the range of actually valid bytes returned or -1 upon + * failure. + * + * *pageTLI should be set to the TLI of the file the page was read from + * to be in. It is currently used only for error reporting purposes, to + * reconstruct the name of the WAL file where an error occurred. + */ + XLogPageReadCB read_page; + + /* + * System identifier of the xlog files were about to read. + * + * Set to zero (the default value) if unknown or unimportant. + */ + uint64 system_identifier; + + /* + * Opaque data for callbacks to use. Not used by XLogReader. 
+ */ + void *private_data; + + /* + * From where to where are we reading + */ + XLogRecPtr ReadRecPtr; /* start of last record read */ + XLogRecPtr EndRecPtr; /* end+1 of last record read */ + + /* + * TLI of the current xlog page + */ + TimeLineID ReadTimeLineID; + + /* ---------------------------------------- + * private/internal state + * ---------------------------------------- + */ + + /* Buffer for currently read page (XLOG_BLCKSZ bytes) */ + char *readBuf; + + /* last read segment, segment offset, read length, TLI */ + XLogSegNo readSegNo; + uint32 readOff; + uint32 readLen; + TimeLineID readPageTLI; + + /* beginning of last page read, and its TLI */ + XLogRecPtr latestPagePtr; + TimeLineID latestPageTLI; + + /* Buffer for current ReadRecord result (expandable) */ + char *readRecordBuf; + uint32 readRecordBufSize; + + /* Buffer to hold error message */ + char *errormsg_buf; +} XLogReaderState; + +/* + * Get a new XLogReader + * + * At least the read_page callback, startptr and endptr have to be set before + * the reader can be used. + */ +extern XLogReaderState *XLogReaderAllocate(XLogRecPtr startpoint, + XLogPageReadCB pagereadfunc, void *private_data); + +/* + * Free an XLogReader + */ +extern void XLogReaderFree(XLogReaderState *state); + +/* + * Read the next record from xlog. Returns NULL on end-of-WAL or on failure. + */ +extern struct XLogRecord *XLogReadRecord(XLogReaderState *state, XLogRecPtr ptr, + char **errormsg); + +/* + * Functions that are currently only needed in the backend, but are better + * implemented inside xlogreader because the internal functions available + * there. + */ +#ifdef FRONTEND +/* + * Find the address of the first record with a lsn >= RecPtr. + */ +extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr); + +#endif /* FRONTEND */ + +#endif /* XLOGREADER_H */ -- 1.7.12.289.g0ce9864.dirty
From: Andres Freund <andres@anarazel.de>

---
 src/backend/access/rmgrdesc/standbydesc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index c38892b..5fb6f54 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -57,7 +57,7 @@ standby_desc(StringInfo buf, uint8 xl_info, char *rec)
 	{
 		xl_running_xacts *xlrec = (xl_running_xacts*) rec;
 
-		appendStringInfo(buf, " running xacts:");
+		appendStringInfo(buf, "running xacts:");
 		standby_desc_running_xacts(buf, xlrec);
 	}
 	else
--
1.7.12.289.g0ce9864.dirty
On 8 January 2013 19:09, Andres Freund <andres@2ndquadrant.com> wrote:
From: Andres Freund <andres@2ndquadrant.com>
Subject: [PATCH] xlogreader-v4
In-Reply-To:
Hi,
this is the latest and obviously best version of xlogreader & xlogdump with
changes both from Heikki and me.
Thom
On 8 January 2013 19:15, Thom Brown <thom@linux.com> wrote:
Aren't you forgetting something?
On 8 January 2013 19:09, Andres Freund <andres@2ndquadrant.com> wrote:
From: Andres Freund <andres@2ndquadrant.com>
Subject: [PATCH] xlogreader-v4
In-Reply-To:
Hi,
this is the latest and obviously best version of xlogreader & xlogdump with
changes both from Heikki and me.
I see, you're posting them separately. Never mind.
Thom
Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> From: Andres Freund <andres@anarazel.de>
> c.h already had parts of the assert support (StaticAssert*) and it's the shared
> file between postgres.h and postgres_fe.h. This makes it easier to build
> frontend programs which have to do the hack.

This patch seems unnecessary given that we already put a version of Assert()
into postgres_fe.h. I don't think that moving the two different
definitions into an #if block in one file is an improvement. If that
were an improvement, we might as well move everything in both postgres.h
and postgres_fe.h into c.h with a pile of #ifs.

regards, tom lane
On 2013-01-08 20:09:42 +0100, Andres Freund wrote:
> From: Andres Freund <andres@2ndquadrant.com>
> Subject: [PATCH] xlogreader-v4
> In-Reply-To:
>
> Hi,
>
> this is the latest and obviously best version of xlogreader & xlogdump with
> changes both from Heikki and me.
>
> Changes:
> * windows build support for pg_xlogdump

That was done blindly, btw, so I only know it compiles, not that it runs...

It's in git://git.postgresql.org/git/users/andresfreund/postgres.git branch
xlogreader_v4 btw.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> From: Andres Freund <andres@anarazel.de>
> relpathbackend() (via some of its wrappers) is used in *_desc routines which we
> want to be usable without a backend environment around.

I'm 100% unimpressed with making relpathbackend return a pointer to a
static buffer. Who's to say whether that won't create bugs due to
overlapping usages?

> Change signature to return a 'const char *' to make misuse easier to
> detect.

That seems to create way more churn than is necessary, and it's wrong
anyway if the result is palloc'd.

regards, tom lane
Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > From: Andres Freund <andres@anarazel.de>
> > c.h already had parts of the assert support (StaticAssert*) and it's the shared
> > file between postgres.h and postgres_fe.h. This makes it easier to build
> > frontend programs which have to do the hack.
>
> This patch seems unnecessary given that we already put a version of Assert()
> into postgres_fe.h. I don't think that moving the two different
> definitions into an #if block in one file is an improvement. If that
> were an improvement, we might as well move everything in both postgres.h
> and postgres_fe.h into c.h with a pile of #ifs.

The problem is that some (including existing) pieces of code need to
include postgres.h itself; those can't easily include postgres_fe.h as
well without getting into problems with redefinitions.

It seems most consistent to move all of that into c.h. Enough of the
assertion stuff is already there, and I don't see an advantage in splitting
it across 3 files as it currently is.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
>> This patch seems unnecessary given that we already put a version of Assert()
>> into postgres_fe.h.

> The problem is that some (including existing) pieces of code need to
> include postgres.h itself, those can't easily include postgres_fe.h as
> well without getting into problems with redefinitions.

There is no place, anywhere, that should be including both. So I don't
see the problem.

regards, tom lane
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > From: Andres Freund <andres@anarazel.de>
> > relpathbackend() (via some of its wrappers) is used in *_desc routines which we
> > want to be usable without a backend environment around.
>
> I'm 100% unimpressed with making relpathbackend return a pointer to a
> static buffer. Who's to say whether that won't create bugs due to
> overlapping usages?

I say it ;). I've gone through all callers and checked. Not that that
guarantees anything, but ...

The reason a static buffer is better is that some of the *desc routines
use relpathbackend() and pfree() the result. That would require
providing a stub pfree() in xlogdump, which seems to be exceedingly ugly.

> > Change signature to return a 'const char *' to make misuse easier to
> > detect.
>
> That seems to create way more churn than is necessary, and it's wrong
> anyway if the result is palloc'd.

It causes warnings in potential external users that pfree() the result
of relpathbackend, which seems helpful. Obviously that only makes sense if
relpathbackend() returns a pointer into a static buffer...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
>> I'm 100% unimpressed with making relpathbackend return a pointer to a
>> static buffer. Who's to say whether that won't create bugs due to
>> overlapping usages?

> I say it ;). I've gone through all callers and checked. Not that that
> guarantees anything, but ...

Even if you've proven it safe today, it seems unnecessarily fragile.
Just about any other place we've used a static result buffer, we've
regretted it, unless the use cases were *very* narrow.

> The reason a static buffer is better is that some of the *desc routines
> use relpathbackend() and pfree() the result. That would require
> providing a stub pfree() in xlogdump which seems to be exceedingly ugly.

Why? If we have palloc support how hard can it be to map pfree to free?
And why wouldn't we want to? I can hardly imagine providing only palloc
and not pfree support.

regards, tom lane
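The "overlapping usages" concern can be illustrated with a minimal sketch. The helpers `path_static`, `path_alloc` and `static_results_alias` below are hypothetical stand-ins invented for this example; they are not relpathbackend() or any other PostgreSQL function, and the "base/%u/%u" layout is only illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_LEN 64

/* Hypothetical: returns a pointer into one shared static buffer. */
static const char *
path_static(unsigned db, unsigned rel)
{
	static char buf[PATH_LEN];
	int			n = snprintf(buf, sizeof(buf), "base/%u/%u", db, rel);

	assert(n > 0 && (size_t) n < sizeof(buf));	/* no truncation */
	return buf;
}

/* Hypothetical: returns a fresh heap copy that the caller must free(). */
static char *
path_alloc(unsigned db, unsigned rel)
{
	char	   *buf = malloc(PATH_LEN);

	if (buf == NULL)
		abort();
	snprintf(buf, PATH_LEN, "base/%u/%u", db, rel);
	return buf;
}

/* Returns 1 when two back-to-back static results share storage. */
static int
static_results_alias(void)
{
	const char *a = path_static(1, 100);
	const char *b = path_static(2, 200);	/* overwrites what a points to */

	return a == b && strcmp(a, "base/2/200") == 0;
}
```

A caller that holds two results at once (for example, passing both to a single printf-style call) silently gets the second path twice with the static variant, whereas the allocating variant keeps them distinct at the cost of an explicit free().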
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 14:53:29 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-08 14:28:14 -0500, Tom Lane wrote:
> >> I'm 100% unimpressed with making relpathbackend return a pointer to a
> >> static buffer. Who's to say whether that won't create bugs due to
> >> overlapping usages?
>
> > I say it ;). I've gone through all callers and checked. Not that that
> > guarantees anything, but ...
>
> Even if you've proven it safe today, it seems unnecessarily fragile.
> Just about any other place we've used a static result buffer, we've
> regretted it, unless the use cases were *very* narrow.

Hm, relpathbackend seems pretty narrow to me.

Funny, we both argued the other way round than we are now when talking
about the sprintf(..., "%X/%X", (uint32)(recptr >> 32), (uint32)recptr)
thingy ;)

> > The reason a static buffer is better is that some of the *desc routines
> > use relpathbackend() and pfree() the result. That would require
> > providing a stub pfree() in xlogdump which seems to be exceedingly ugly.
>
> Why? If we have palloc support how hard can it be to map pfree to free?
> And why wouldn't we want to? I can hardly imagine providing only palloc
> and not pfree support.

Uhm, we don't have & need palloc support and I don't think
relpathbackend() is a good justification for adding it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
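For readers unfamiliar with the "%X/%X" idiom mentioned above: it prints a 64-bit WAL position as its high and low 32-bit halves in hexadecimal. A minimal sketch follows, assuming a hypothetical helper `format_lsn` that writes into a caller-supplied buffer, which sidesteps both the static-buffer and the allocation question. The helper name and signature are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a 64-bit record pointer as "HIGH/LOW" in hex
 * into dst. The caller owns dst, so results from several calls can coexist.
 */
static const char *
format_lsn(char *dst, size_t dstlen, uint64_t recptr)
{
	uint32_t	hi = (uint32_t) (recptr >> 32);
	uint32_t	lo = (uint32_t) recptr;
	int			n = snprintf(dst, dstlen, "%" PRIX32 "/%" PRIX32, hi, lo);

	assert(n > 0 && (size_t) n < dstlen);	/* buffer was large enough */
	return dst;
}
```

With a 32-byte buffer, `format_lsn(buf, sizeof(buf), 0x0000000116B3A2F0)` yields the string `1/16B3A2F0`.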
Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-08 14:35:12 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> >> This patch seems unnecessary given that we already put a version of Assert()
> >> into postgres_fe.h.
>
> > The problem is that some (including existing) pieces of code need to
> > include postgres.h itself, those can't easily include postgres_fe.h as
> > well without getting into problems with redefinitions.
>
> There is no place, anywhere, that should be including both. So I don't
> see the problem.

Sorry, I misremembered the problem somewhat. The problem is that code that
includes postgres.h atm ends up with ExceptionalCondition() et
al. declared even if FRONTEND is defined. So if anything uses an assert,
you need to provide wrappers for those, which seems nasty. If they are
provided centrally and check for FRONTEND, that problem doesn't exist.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
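The idea of providing the assertion macros centrally and switching on FRONTEND might look roughly like the sketch below. This is not the code from patch 1/5: `sketch_exceptional_condition` and `page_offset` are hypothetical names, and FRONTEND and USE_ASSERT_CHECKING are defined in the file only so that the example is self-contained (a real build would pass them on the compiler command line).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption for this sketch: normally set via -DFRONTEND and configure. */
#ifndef FRONTEND
#define FRONTEND 1
#endif
#ifndef USE_ASSERT_CHECKING
#define USE_ASSERT_CHECKING 1
#endif

#if !defined(USE_ASSERT_CHECKING)
/* Assertions compiled out entirely. */
#define Assert(condition) ((void) 0)
#elif defined(FRONTEND)
/* Frontend programs: fall back to the C library's assert(). */
#define Assert(condition) assert(condition)
#else
/* Backend: report through a server-side handler (hypothetical stub name). */
extern void sketch_exceptional_condition(const char *cond,
										 const char *file, int line);
#define Assert(condition) \
	do { \
		if (!(condition)) \
			sketch_exceptional_condition(#condition, __FILE__, __LINE__); \
	} while (0)
#endif

/* Shared code can now use Assert() without caring where it is built. */
static int
page_offset(unsigned long recptr, unsigned blcksz)
{
	Assert(blcksz != 0);
	return (int) (recptr % blcksz);
}
```

With this layout, code shared between backend and frontend needs no per-program wrappers for the assertion failure handler; only the one header decides which failure path is used.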
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> Uhm, we don't have & need palloc support and I don't think
> relpathbackend() is a good justification for adding it.

I've said from the very beginning of this effort that it would be
impossible to share any meaningful amount of code between frontend and
backend environments without adding some sort of emulation of
palloc/pfree/elog. I think this patch is just making the code uglier
and more fragile to put off the inevitable, and that we'd be better
served to bite the bullet and do that.

regards, tom lane
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-01-08 14:35:12 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2013-01-08 14:25:06 -0500, Tom Lane wrote:
> > >> This patch seems unnecessary given that we already put a version of Assert()
> > >> into postgres_fe.h.
> >
> > > The problem is that some (including existing) pieces of code need to
> > > include postgres.h itself, those can't easily include postgres_fe.h as
> > > well without getting into problems with redefinitions.
> >
> > There is no place, anywhere, that should be including both. So I don't
> > see the problem.
>
> Sorry, misremembered the problem somewhat. The problem is that code that
> includes postgres.h atm ends up with ExceptionalCondition() et
> al. declared even if FRONTEND is defined. So if anything uses an assert
> you need to provide wrappers for those which seems nasty. If they are
> provided centrally and check for FRONTEND that problem doesn't exist.

I think the right fix here is to fix things so that postgres.h is not
necessary. How hard is that? Maybe it just requires some more
reshuffling of xlog headers.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Uhm, we don't have & need palloc support and I don't think
> > relpathbackend() is a good justification for adding it.
>
> I've said from the very beginning of this effort that it would be
> impossible to share any meaningful amount of code between frontend and
> backend environments without adding some sort of emulation of
> palloc/pfree/elog. I think this patch is just making the code uglier
> and more fragile to put off the inevitable, and that we'd be better
> served to bite the bullet and do that.

As far as this patch is concerned, I think it's sufficient to do

    #define palloc(x) malloc(x)
    #define pfree(x) free(x)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
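A slightly more defensive variant of the two macros suggested above could look like the following sketch. It is an assumption about what a frontend shim might do, not code from the patch set: `fe_palloc`, `fe_pfree` and `make_path` are hypothetical names, and exiting on out-of-memory is just one possible policy chosen to approximate the backend behaviour where palloc() does not return NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical frontend replacement: never returns NULL, exits instead. */
static void *
fe_palloc(size_t size)
{
	void	   *p = malloc(size ? size : 1);

	if (p == NULL)
	{
		fprintf(stderr, "out of memory\n");
		exit(EXIT_FAILURE);
	}
	return p;
}

/* Hypothetical frontend replacement for pfree(). */
static void
fe_pfree(void *p)
{
	assert(p != NULL);			/* treat freeing NULL as a caller bug here */
	free(p);
}

/* Map the backend names onto the shim, as in the suggestion above. */
#define palloc(sz) fe_palloc(sz)
#define pfree(p)   fe_pfree(p)

/* Illustrative caller written against the backend-style API. */
static char *
make_path(unsigned db, unsigned rel)
{
	char	   *path = palloc(64);

	snprintf(path, 64, "base/%u/%u", db, rel);
	return path;
}
```

Code such as the *_desc routines that palloc a string and later pfree it would then compile unchanged in a frontend program. Whether real memory contexts are needed on top of this is exactly the open question in the thread.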
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 15:27:23 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > Uhm, we don't have & need palloc support and I don't think > > relpathbackend() is a good justification for adding it. > > I've said from the very beginning of this effort that it would be > impossible to share any meaningful amount of code between frontend and > backend environments without adding some sort of emulation of > palloc/pfree/elog. I think this patch is just making the code uglier > and more fragile to put off the inevitable, and that we'd be better > served to bite the bullet and do that. If you think relpathbackend() alone warrants that, yes, I can provide a wrapper. Everything else is imo already handled in a sensible and not really ugly manner? Imo its not worth the effort *for this alone*. I already had some elog(), ereport(), whatever emulation but Heikki preferred not to have it, so its removed by now. To what extent do you want palloc et al. emulation? Provide actual pools or just make redirect to malloc and provide the required symbols (at the very least CurrentMemoryContext)? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-08 17:36:19 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2013-01-08 14:35:12 -0500, Tom Lane wrote: > > > Andres Freund <andres@2ndquadrant.com> writes: > > > > On 2013-01-08 14:25:06 -0500, Tom Lane wrote: > > > >> This patch seems unnecessary given that we already put a version of Assert() > > > >> into postgres_fe.h. > > > > > > > The problem is that some (including existing) pieces of code need to > > > > include postgres.h itself, those can't easily include postgres_fe.h as > > > > well without getting into problems with redefinitions. > > > > > > There is no place, anywhere, that should be including both. So I don't > > > see the problem. > > > > Sorry, misremembered the problem somewhat. The problem is that code that > > includes postgres.h atm ends up with ExceptionalCondition() et > > al. declared even if FRONTEND is defined. So if anything uses an assert > > you need to provide wrappers for those which seems nasty. If they are > > provided centrally and check for FRONTEND that problem doesn't exist. > > I think the right fix here is to fix things so that postgres.h is not > necessary. How hard is that? Maybe it just requires some more > reshuffling of xlog headers. I don't really think thats realistic. Think the rmgrdesc/* files and xlogreader.c itself... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > To what extent do you want palloc et al. emulation? Provide actual pools > or just make redirect to malloc and provide the required symbols (at the > very least CurrentMemoryContext)? I don't see any need for memory pools, at least not for frontend applications of the currently envisioned levels of complexity. I concur with Alvaro's suggestion about just #define'ing them to malloc/free --- or maybe better, pg_malloc/free so that we can have a failure-checking wrapper. Not sure how we ought to handle elog, but maybe we can put off that bit of design until we have a more concrete use-case. regards, tom lane
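For illustration, the #define approach with a failure-checking wrapper might look roughly like the sketch below. This is not code from any patch in this thread: the body of pg_malloc() and the copy_string() helper are assumptions made up for the example, and a backend build would keep the real memory-context based palloc/pfree.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Failure-checking malloc: never returns NULL, exits on out-of-memory. */
void *
pg_malloc(size_t size)
{
	void	   *ptr;

	/* malloc(0) may legally return NULL, so always ask for at least one byte */
	if (size == 0)
		size = 1;
	ptr = malloc(size);
	if (ptr == NULL)
	{
		fprintf(stderr, "out of memory\n");
		exit(EXIT_FAILURE);
	}
	return ptr;
}

/* Frontend-only mapping (assumed); the backend keeps its own definitions. */
#define palloc(sz)	pg_malloc(sz)
#define pfree(ptr)	free(ptr)

/* Hypothetical piece of "shared" code that only relies on palloc(). */
char *
copy_string(const char *src)
{
	size_t		len = strlen(src) + 1;
	char	   *dst = palloc(len);

	memcpy(dst, src, len);
	return dst;
}
```

Code written against palloc()/pfree() then compiles unchanged in a frontend program, at the price of being compiled separately for each environment.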
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Heikki Linnakangas
Date:
On 08.01.2013 22:39, Andres Freund wrote: > On 2013-01-08 15:27:23 -0500, Tom Lane wrote: >> Andres Freund<andres@2ndquadrant.com> writes: >>> Uhm, we don't have& need palloc support and I don't think >>> relpathbackend() is a good justification for adding it. >> >> I've said from the very beginning of this effort that it would be >> impossible to share any meaningful amount of code between frontend and >> backend environments without adding some sort of emulation of >> palloc/pfree/elog. I think this patch is just making the code uglier >> and more fragile to put off the inevitable, and that we'd be better >> served to bite the bullet and do that. > > If you think relpathbackend() alone warrants that, yes, I can provide a > wrapper. Everything else is imo already handled in a sensible and not > really ugly manner? Imo its not worth the effort *for this alone*. > > I already had some elog(), ereport(), whatever emulation but Heikki > preferred not to have it, so its removed by now. > > To what extent do you want palloc et al. emulation? Provide actual pools > or just make redirect to malloc and provide the required symbols (at the > very least CurrentMemoryContext)? Note that the xlogreader facility doesn't need any more emulation. I'm quite satisfied with that part of the patch now. However, the rmgr desc routines do, and I'm not very happy with those. Not sure what to do about it. As you said, we could add enough infrastructure to make them compile in frontend context, or we could try to make them rely less on backend functions. One consideration is that once we have a separate pg_xlogdump utility, I expect that to be the most visible consumer of the rmgr desc functions. The backend will of course still use those functions in e.g error messages, but those don't happen very often. Not sure how that should be taken into account in this patch, but I thought I'd mention it. 
An rmgr desc routine probably shouldn't be calling elog() even in the backend, because you don't want to throw errors in the contexts where those routines are called. - Heikki
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Andres Freund wrote: >> Sorry, misremembered the problem somewhat. The problem is that code that >> includes postgres.h atm ends up with ExceptionalCondition() et >> al. declared even if FRONTEND is defined. So if anything uses an assert >> you need to provide wrappers for those which seems nasty. If they are >> provided centrally and check for FRONTEND that problem doesn't exist. > I think the right fix here is to fix things so that postgres.h is not > necessary. How hard is that? Maybe it just requires some more > reshuffling of xlog headers. That would definitely be the ideal fix. In general, #include'ing postgres.h into code that's not backend code opens all kinds of hazards that are likely to bite us sooner or later; the incompatibility of the Assert definitions is just the tip of that iceberg. (In the past we've had issues around <stdio.h> providing different definitions in the two environments, for example.) But having said that, I'm also now remembering that we have files in src/port/ and probably elsewhere that try to work in both environments by just #include'ing c.h directly (relying on the Makefile to supply -DFRONTEND or not). If we wanted to support Assert use in such files, we would have to move at least the Assert() macro definitions into c.h as Andres is proposing. Now, I've always thought that #include'ing c.h directly was kind of an ugly hack, but I don't have a better design than that to offer ATM. regards, tom lane
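A minimal sketch of what a centrally provided Assert() that checks FRONTEND could look like. This is an illustration, not the patch itself: the two #define lines at the top stand in for what the Makefile and configure would normally supply, the ExceptionalCondition() prototype is written from memory and should be treated as an assumption, and checked_div() is a made-up caller.

```c
#include <assert.h>

/* Normally supplied by the build (-DFRONTEND, configure); set here for the sketch. */
#ifndef FRONTEND
#define FRONTEND 1
#endif
#ifndef USE_ASSERT_CHECKING
#define USE_ASSERT_CHECKING 1
#endif

#ifndef USE_ASSERT_CHECKING
/* Assertions disabled: compile to nothing in both environments. */
#define Assert(condition)	((void) 0)
#elif defined(FRONTEND)
/* Frontend: fall back to the C library's assert(). */
#define Assert(condition)	assert(condition)
#else
/* Backend: report through ExceptionalCondition() (prototype assumed). */
extern void ExceptionalCondition(const char *conditionName,
								 const char *errorType,
								 const char *fileName, int lineNumber);
#define Assert(condition) \
	do { \
		if (!(condition)) \
			ExceptionalCondition(#condition, "FailedAssertion", \
								 __FILE__, __LINE__); \
	} while (0)
#endif

/* Hypothetical shared code: it uses Assert() without knowing where it runs. */
int
checked_div(int a, int b)
{
	Assert(b != 0);
	return a / b;
}
```

With a definition along these lines, a file that includes only c.h and relies on -DFRONTEND would not need its own wrapper for ExceptionalCondition().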
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 22:47:43 +0200, Heikki Linnakangas wrote: > On 08.01.2013 22:39, Andres Freund wrote: > >On 2013-01-08 15:27:23 -0500, Tom Lane wrote: > >>Andres Freund<andres@2ndquadrant.com> writes: > >>>Uhm, we don't have& need palloc support and I don't think > >>>relpathbackend() is a good justification for adding it. > >> > >>I've said from the very beginning of this effort that it would be > >>impossible to share any meaningful amount of code between frontend and > >>backend environments without adding some sort of emulation of > >>palloc/pfree/elog. I think this patch is just making the code uglier > >>and more fragile to put off the inevitable, and that we'd be better > >>served to bite the bullet and do that. > > > >If you think relpathbackend() alone warrants that, yes, I can provide a > >wrapper. Everything else is imo already handled in a sensible and not > >really ugly manner? Imo its not worth the effort *for this alone*. > > > >I already had some elog(), ereport(), whatever emulation but Heikki > >preferred not to have it, so its removed by now. > > > >To what extent do you want palloc et al. emulation? Provide actual pools > >or just make redirect to malloc and provide the required symbols (at the > >very least CurrentMemoryContext)? > > Note that the xlogreader facility doesn't need any more emulation. I'm quite > satisfied with that part of the patch now. However, the rmgr desc routines > do, and I'm not very happy with those. Not sure what to do about it. As you > said, we could add enough infrastructure to make them compile in frontend > context, or we could try to make them rely less on backend functions. Which emulation needs are you missing in the desc routines besides relpathbackend() and timestamptz_to_str()? pfree() is "just" needed because its used to free the result of relpathbackend(). No own pallocs, no ereport ... 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Heikki Linnakangas
Date:
On 08.01.2013 23:00, Andres Freund wrote: >> Note that the xlogreader facility doesn't need any more emulation. I'm quite >> satisfied with that part of the patch now. However, the rmgr desc routines >> do, and I'm not very happy with those. Not sure what to do about it. As you >> said, we could add enough infrastructure to make them compile in frontend >> context, or we could try to make them rely less on backend functions. > > Which emulation needs are you missing in the desc routines besides > relpathbackend() and timestamptz_to_str()? pfree() is "just" needed > because its used to free the result of relpathbackend(). No own pallocs, > no ereport ... Nothing besides those, those are the stuff I was referring to. - Heikki
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 15:45:07 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > To what extent do you want palloc et al. emulation? Provide actual pools > > or just make redirect to malloc and provide the required symbols (at the > > very least CurrentMemoryContext)? > > I don't see any need for memory pools, at least not for frontend > applications of the currently envisioned levels of complexity. I concur > with Alvaro's suggestion about just #define'ing them to malloc/free --- > or maybe better, pg_malloc/free so that we can have a failure-checking > wrapper. Unless we want to introduce those into common headers, it's more complex than #define's: you actually need to provide at least the palloc/pfree/CurrentMemoryContext symbols. Still seems like a shame to do that for one lonely pfree() (plus whatever an eventual own implementation of relpathbackend() needs). > Not sure how we ought to handle elog, but maybe we can put off that bit > of design until we have a more concrete use-case. Agreed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 23:02:15 +0200, Heikki Linnakangas wrote: > On 08.01.2013 23:00, Andres Freund wrote: > >>Note that the xlogreader facility doesn't need any more emulation. I'm quite > >>satisfied with that part of the patch now. However, the rmgr desc routines > >>do, and I'm not very happy with those. Not sure what to do about it. As you > >>said, we could add enough infrastructure to make them compile in frontend > >>context, or we could try to make them rely less on backend functions. > > > >Which emulation needs are you missing in the desc routines besides > >relpathbackend() and timestamptz_to_str()? pfree() is "just" needed > >because its used to free the result of relpathbackend(). No own pallocs, > >no ereport ... > > Nothing besides those, those are the stuff I was referring to. Yea :(. As I said, I think it's reasonable to emulate the former, but timestamptz_to_str() seems to be too complex for that. Extracting the whole datetime/timestamp handling into a backend-independent "module" seems like complex overkill to me, although it might be useful to reduce the duplication of all that code in ecpg (which has deviated quite a bit by now). No idea how to solve that other than returning placeholder data atm. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Robert Haas
Date:
On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Uhm, we don't have & need palloc support and I don't think > relpathbackend() is a good justification for adding it. FWIW, I'm with Tom on this one. Any meaningful code sharing is going to need that, so we might as well bite the bullet. And functions that return static buffers are evil incarnate. I've spent way too much of my life dealing with the supreme idiocy that is fmtId(). If someone ever finds a way to make that go away, I will buy them a beverage of their choice at the next conference we're both at. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Alvaro Herrera
Date:
Robert Haas escribió: > And functions that return static buffers are evil incarnate. I've > spent way too much of my life dealing with the supreme idiocy that is > fmtId(). +1 > If someone ever finds a way to make that go away, I will buy > them a beverage of their choice at the next conference we're both at. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Andres Freund
Date:
On 2013-01-08 17:28:33 -0500, Robert Haas wrote: > On Tue, Jan 8, 2013 at 3:00 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > Uhm, we don't have & need palloc support and I don't think > > relpathbackend() is a good justification for adding it. > > FWIW, I'm with Tom on this one. Any meaningful code sharing is going > to need that, so we might as well bite the bullet. Yes, if we set the scope bigger than xlogreader I agree. If it's xlogreader itself I don't. But as I happen to think we should share more code... Will prepare a patch. I wonder whether it would be acceptable to make palloc() an actual function instead of #define palloc(sz) MemoryContextAlloc(CurrentMemoryContext, (sz)) so we don't have to expose CurrentMemoryContext? Alternatively we can "just" move the whole of utils/mmgr/* to port, but that would imply an elog/ereport wrapper... > And functions that return static buffers are evil incarnate. I've > spent way too much of my life dealing with the supreme idiocy that is > fmtId(). If someone ever finds a way to make that go away, I will buy > them a beverage of their choice at the next conference we're both at. Imo it depends on the circumstances and number of possible callers, but anyway, it seems to be already decided that my suggestion isn't the way to go. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > And functions that return static buffers are evil incarnate. I've > spent way too much of my life dealing with the supreme idiocy that is > fmtId(). If someone ever finds a way to make that go away, I will buy > them a beverage of their choice at the next conference we're both at. Yeah, that was exactly the case that was top-of-mind when I was complaining about static return buffers upthread. It's not hard to make the ugliness go away: just let it strdup its return value. The problem is that in the vast majority of usages it wouldn't be convenient to free the result, so we'd have a good deal of memory leakage. What might be interesting is to instrument it to see how much (adding a counter to the function ought to be easy enough) and then find out whether it's an amount we still care about in 2013. Frankly, pg_dump is a memory hog already - a few more identifier-sized strings laying about might not matter anymore. (Wanders away wondering how many relpathbackend callers bother to free its result, and whether that matters either ...) regards, tom lane
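To make the instrumentation idea concrete, here is a rough sketch of a strdup-style variant with a counter. It is not pg_dump's actual fmtId(): fmt_id_dup() and fmtid_bytes_allocated are hypothetical names, and the quoting rule is simplified (the real function also consults the keyword list).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Running total of string bytes handed out and presumably never freed. */
size_t		fmtid_bytes_allocated = 0;

/* Simplified rule: plain lower-case identifiers need no quotes. */
static int
needs_quotes(const char *id)
{
	if (!(islower((unsigned char) *id) || *id == '_'))
		return 1;				/* also catches the empty string */
	for (id++; *id; id++)
	{
		if (!(islower((unsigned char) *id) ||
			  isdigit((unsigned char) *id) || *id == '_'))
			return 1;
	}
	return 0;
}

/* Return a freshly malloc'ed, quoted-if-needed copy of id. */
char *
fmt_id_dup(const char *id)
{
	size_t		len = strlen(id);
	/* worst case: every char doubled, plus two quotes and the NUL */
	char	   *result = malloc(2 * len + 3);
	char	   *out = result;

	if (result == NULL)
	{
		fprintf(stderr, "out of memory\n");
		exit(EXIT_FAILURE);
	}
	if (!needs_quotes(id))
	{
		memcpy(result, id, len + 1);
		out = result + len;
	}
	else
	{
		*out++ = '"';
		for (; *id; id++)
		{
			if (*id == '"')
				*out++ = '"';	/* double embedded quotes */
			*out++ = *id;
		}
		*out++ = '"';
		*out = '\0';
	}
	/* count the string bytes including the terminator */
	fmtid_bytes_allocated += (size_t) (out - result) + 1;
	return result;
}
```

Printing the counter at the end of a dump run would give the number asked for above, under the pessimistic assumption that no caller ever frees a result.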
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Robert Haas
Date:
On Tue, Jan 8, 2013 at 6:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> And functions that return static buffers are evil incarnate. I've >> spent way too much of my life dealing with the supreme idiocy that is >> fmtId(). If someone ever finds a way to make that go away, I will buy >> them a beverage of their choice at the next conference we're both at. > > Yeah, that was exactly the case that was top-of-mind when I was > complaining about static return buffers upthread. > > It's not hard to make the ugliness go away: just let it strdup its > return value. The problem is that in the vast majority of usages it > wouldn't be convenient to free the result, so we'd have a good deal > of memory leakage. What might be interesting is to instrument it to > see how much (adding a counter to the function ought to be easy enough) > and then find out whether it's an amount we still care about in 2013. > Frankly, pg_dump is a memory hog already - a few more identifier-sized > strings laying about might not matter anymore. > > (Wanders away wondering how many relpathbackend callers bother to free > its result, and whether that matters either ...) I was thinking more about a sprintf()-type function that only understands a handful of escapes, but adds the additional and novel escapes %I (quote as identifier) and %L (quote as literal). I think that would allow a great deal of code simplification, and it'd be more efficient, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
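As a sketch of how such a formatter might look (nothing like this exists in the thread; sql_snprintf() and its helpers are invented names): it understands only %s, %I, %L and %%, writes into a caller-supplied buffer, and returns the needed length the way snprintf() does. For simplicity it always quotes identifiers and only doubles embedded quote characters, which is an assumption and cruder than what fmtId() does.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store c if room remains (keeping one byte for the NUL); always count it. */
static void
put_char(char *buf, size_t size, size_t *len, char c)
{
	if (*len + 1 < size)
		buf[*len] = c;
	(*len)++;
}

/* Append s wrapped in quote character q, doubling any embedded q. */
static void
put_quoted(char *buf, size_t size, size_t *len, const char *s, char q)
{
	put_char(buf, size, len, q);
	for (; *s; s++)
	{
		if (*s == q)
			put_char(buf, size, len, q);
		put_char(buf, size, len, *s);
	}
	put_char(buf, size, len, q);
}

/* Hypothetical formatter; returns the length the full result requires. */
size_t
sql_snprintf(char *buf, size_t size, const char *fmt,...)
{
	va_list		ap;
	size_t		len = 0;
	const char *s;

	va_start(ap, fmt);
	for (; *fmt; fmt++)
	{
		if (*fmt != '%')
		{
			put_char(buf, size, &len, *fmt);
			continue;
		}
		fmt++;
		if (*fmt == '\0')
			break;				/* lone '%' at end of format: ignore it */
		switch (*fmt)
		{
			case 's':			/* plain string */
				for (s = va_arg(ap, const char *); *s; s++)
					put_char(buf, size, &len, *s);
				break;
			case 'I':			/* quote as identifier */
				put_quoted(buf, size, &len, va_arg(ap, const char *), '"');
				break;
			case 'L':			/* quote as literal */
				put_quoted(buf, size, &len, va_arg(ap, const char *), '\'');
				break;
			default:			/* "%%" and unknown escapes: emit the char */
				put_char(buf, size, &len, *fmt);
				break;
		}
	}
	va_end(ap);
	if (size > 0)
		buf[len < size ? len : size - 1] = '\0';
	return len;
}
```

Because the quoted text goes straight into the output buffer, no intermediate static or strdup'ed result is needed, which is where the simplification would come from.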
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > I was thinking more about a sprintf()-type function that only > understands a handful of escapes, but adds the additional and novel > escapes %I (quote as identifier) and %L (quote as literal). I think > that would allow a great deal of code simplification, and it'd be more > efficient, too. Seems like a great idea. Are you offering to code it? Note that this wouldn't entirely fix the fmtId problem, as not all the uses of fmtId are directly in sprintf calls. Still, it might get rid of most of the places where it'd be painful to avoid a memory leak with a strdup'ing version of fmtId. regards, tom lane
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Robert Haas
Date:
On Tue, Jan 8, 2013 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I was thinking more about a sprintf()-type function that only >> understands a handful of escapes, but adds the additional and novel >> escapes %I (quote as identifier) and %L (quote as literal). I think >> that would allow a great deal of code simplification, and it'd be more >> efficient, too. > > Seems like a great idea. Are you offering to code it? Not imminently. > Note that this wouldn't entirely fix the fmtId problem, as not all the > uses of fmtId are directly in sprintf calls. Still, it might get rid of > most of the places where it'd be painful to avoid a memory leak with > a strdup'ing version of fmtId. Yeah, I didn't think about that. Might be worth a look to see how comprehensively it would solve the problem. But I'll have to leave that for another day. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
--- src/bin/pg_resetxlog/pg_resetxlog.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
[PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
Hi,

As promised here's a patch to provide palloc emulation for frontend-ish environments.

The patch:
- makes palloc() into a real function so CurrentMemoryContext doesn't need to be provided
- provides common pg_(malloc, malloc0, realloc, strdup, free) wrappers and removes various versions of those across different utilities
- removes ugly palloc redefinery for frontend use of backend code (dirmod.c)

Controversial/Unclear things:
- palloc[0] are currently copies of the MemoryContextAlloc[Zero] functions to preclude performance regressions, imo the level of duplication is ok though
- the common memory management is implemented in [pg]port/palloc.[ch], I am not too happy with the name and location
- pgport/palloc.c is only built in the backend, not sure if there is a nicer way to do this from a make POV
- the different versions of pg_malloc et al used different error signaling methods, I've settled on fprintf(stderr, _("out of memory\n")); exit(EXIT_FAILURE);

Results in a nice net removal of code: 37 files changed, 218 insertions(+), 621 deletions(-)
[PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
---
 contrib/oid2name/oid2name.c | 52 +------------
 contrib/pg_upgrade/pg_upgrade.h | 5 +-
 contrib/pg_upgrade/util.c | 49 -------------
 contrib/pgbench/pgbench.c | 54 +-------------
 src/backend/utils/mmgr/mcxt.c | 78 +++++++++++---------
 src/bin/initdb/initdb.c | 40 +---------
 src/bin/pg_basebackup/pg_basebackup.c | 2 +-
 src/bin/pg_basebackup/pg_receivexlog.c | 1 +
 src/bin/pg_basebackup/receivelog.c | 1 +
 src/bin/pg_basebackup/streamutil.c | 38 +---------
 src/bin/pg_basebackup/streamutil.h | 4 -
 src/bin/pg_ctl/pg_ctl.c | 39 +---------
 src/bin/pg_dump/Makefile | 6 +-
 src/bin/pg_dump/common.c | 1 -
 src/bin/pg_dump/compress_io.c | 1 -
 src/bin/pg_dump/dumpmem.c | 76 -------------------
 src/bin/pg_dump/dumpmem.h | 22 ------
 src/bin/pg_dump/dumputils.h | 1 +
 src/bin/pg_dump/pg_backup_archiver.c | 1 -
 src/bin/pg_dump/pg_backup_custom.c | 2 +-
 src/bin/pg_dump/pg_backup_db.c | 1 -
 src/bin/pg_dump/pg_backup_directory.c | 1 -
 src/bin/pg_dump/pg_backup_null.c | 1 -
 src/bin/pg_dump/pg_backup_tar.c | 1 -
 src/bin/pg_dump/pg_dump.c | 1 -
 src/bin/pg_dump/pg_dump_sort.c | 1 -
 src/bin/pg_dump/pg_dumpall.c | 1 -
 src/bin/pg_dump/pg_restore.c | 1 -
 src/bin/psql/common.c | 50 -------------
 src/bin/psql/common.h | 10 +--
 src/bin/scripts/common.c | 49 -------------
 src/bin/scripts/common.h | 5 +-
 src/include/port/palloc.h | 19 +++++
 src/include/utils/palloc.h | 12 +--
 src/port/Makefile | 8 +-
 src/port/dirmod.c | 75 +------------------
 src/port/palloc.c | 130 +++++++++++++++++++++++++++++++++
 37 files changed, 218 insertions(+), 621 deletions(-)
 delete mode 100644 src/bin/pg_dump/dumpmem.c
 delete mode 100644 src/bin/pg_dump/dumpmem.h
 create mode 100644 src/include/port/palloc.h
 create mode 100644 src/port/palloc.c
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Heikki Linnakangas
Date:
On 09.01.2013 13:27, Andres Freund wrote: > - makes palloc() into a real function so CurrentMemoryContext doesn't > need to be provided I don't understand the need for this change. Can't you just: #define palloc(s) pg_malloc(s) in the frontend context? - Heikki
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote:
> On 09.01.2013 13:27, Andres Freund wrote:
> >- makes palloc() into a real function so CurrentMemoryContext doesn't
> >  need to be provided
>
> I don't understand the need for this change. Can't you just:
>
> #define palloc(s) pg_malloc(s)
>
> in the frontend context?

Yes, that would be possible, but imo its the inferior solution:
* it precludes ever sharing code without compiling twice
* removing allows us to get rid of the following ugliness in dirmod.c:

-#ifndef FRONTEND
-
-/*
- * On Windows, call non-macro versions of palloc; we can't reference
- * CurrentMemoryContext in this file because of PGDLLIMPORT conflict.
- */
-#if defined(WIN32) || defined(__CYGWIN__)
-#undef palloc
-#undef pstrdup
-#define palloc(sz) pgport_palloc(sz)
-#define pstrdup(str) pgport_pstrdup(str)
-#endif
-#else /* FRONTEND */
-

* it opens the window for moving more stuff from utils/palloc.h to memutils.h

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
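To illustrate the "real function" variant argued for above, a frontend implementation could be as small as the following sketch. It is not the contents of the proposed src/port/palloc.c: the size_t signatures and the oom_exit() helper are assumptions for the example, with the error handling modelled on the fprintf/exit pair described in the patch announcement.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: uniform out-of-memory handling for frontend code. */
static void
oom_exit(void)
{
	fprintf(stderr, "out of memory\n");
	exit(EXIT_FAILURE);
}

/* palloc() as a plain function: callers never see CurrentMemoryContext. */
void *
palloc(size_t size)
{
	void	   *ptr = malloc(size ? size : 1);

	if (ptr == NULL)
		oom_exit();
	return ptr;
}

/* Zero-filled variant. */
void *
palloc0(size_t size)
{
	void	   *ptr = palloc(size);

	memset(ptr, 0, size);
	return ptr;
}

/* Duplicate a string into palloc'ed memory. */
char *
pstrdup(const char *in)
{
	size_t		len = strlen(in) + 1;
	char	   *out = palloc(len);

	memcpy(out, in, len);
	return out;
}

/* Release memory obtained from the functions above. */
void
pfree(void *ptr)
{
	free(ptr);
}
```

Since callers reference only function symbols, the same object file can in principle be linked against either this implementation or the backend's memory-context one, which is the "without compiling twice" point.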
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Magnus Hagander
Date:
Am I the only one who finds this way of posting patches really annoying?

Here is a patch with no description other than a list of changed files. And discussion happens in a completely different email.

What's wrong with just posting the patch as a regular attachment(s) to a regular thread, like other people do?

Yes, I'm well aware that some mailers thread them in right. Our archives code does. But many MUAs don't. But even when using a MUA that threads it correctly, I find it quite annoying that you have to open one email to read the comments and a different email to view the patch.

It may be just me. But it may be others as well, so I figured I should raise the issue :)

//Magnus

On Wed, Jan 9, 2013 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> ---
>  contrib/oid2name/oid2name.c | 52 +------------
>  contrib/pg_upgrade/pg_upgrade.h | 5 +-
>  contrib/pg_upgrade/util.c | 49 -------------
>  contrib/pgbench/pgbench.c | 54 +-------------
>  src/backend/utils/mmgr/mcxt.c | 78 +++++++++++---------
>  src/bin/initdb/initdb.c | 40 +---------
>  src/bin/pg_basebackup/pg_basebackup.c | 2 +-
>  src/bin/pg_basebackup/pg_receivexlog.c | 1 +
>  src/bin/pg_basebackup/receivelog.c | 1 +
>  src/bin/pg_basebackup/streamutil.c | 38 +---------
>  src/bin/pg_basebackup/streamutil.h | 4 -
>  src/bin/pg_ctl/pg_ctl.c | 39 +---------
>  src/bin/pg_dump/Makefile | 6 +-
>  src/bin/pg_dump/common.c | 1 -
>  src/bin/pg_dump/compress_io.c | 1 -
>  src/bin/pg_dump/dumpmem.c | 76 -------------------
>  src/bin/pg_dump/dumpmem.h | 22 ------
>  src/bin/pg_dump/dumputils.h | 1 +
>  src/bin/pg_dump/pg_backup_archiver.c | 1 -
>  src/bin/pg_dump/pg_backup_custom.c | 2 +-
>  src/bin/pg_dump/pg_backup_db.c | 1 -
>  src/bin/pg_dump/pg_backup_directory.c | 1 -
>  src/bin/pg_dump/pg_backup_null.c | 1 -
>  src/bin/pg_dump/pg_backup_tar.c | 1 -
>  src/bin/pg_dump/pg_dump.c | 1 -
>  src/bin/pg_dump/pg_dump_sort.c | 1 -
>  src/bin/pg_dump/pg_dumpall.c | 1 -
>  src/bin/pg_dump/pg_restore.c | 1 -
>  src/bin/psql/common.c | 50 -------------
>  src/bin/psql/common.h | 10 +--
>  src/bin/scripts/common.c | 49 -------------
>  src/bin/scripts/common.h | 5 +-
>  src/include/port/palloc.h | 19 +++++
>  src/include/utils/palloc.h | 12 +--
>  src/port/Makefile | 8 +-
>  src/port/dirmod.c | 75 +------------------
>  src/port/palloc.c | 130 +++++++++++++++++++++++++++++++++
>  37 files changed, 218 insertions(+), 621 deletions(-)
>  delete mode 100644 src/bin/pg_dump/dumpmem.c
>  delete mode 100644 src/bin/pg_dump/dumpmem.h
>  create mode 100644 src/include/port/palloc.h
>  create mode 100644 src/port/palloc.c
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
> Am I the only one who finds this way of posting patches really annoying?

Well, I unsurprisingly don't ;)

> Here is a patch with no description other than a list of changed
> files. And discussion happens in a completely different email.

They contain the commit message, which in most cases is more informative than the one just posted, which was definitely rather short. It should look like e.g. http://archives.postgresql.org/message-id/1352942234-3953-11-git-send-email-andres%402ndquadrant.com

> What's wrong with just posting the patch as a regular attachment(s) to
> a regular thread, like other people do?

Two issues:
- If you have a bigger series of patches (like the whole logical decoding thing), posting all patches in a single mail makes the following thread even harder to follow than is currently the case. Note how even in this, far smaller, case the discussion actually happened in the appropriate subthreads. I find it much easier to reread an old thread that way to reassure myself what was discussed.
- mhonarc does really strange things if you attach two git created patches (splits them into multiple mails)

> It may be just me. But it may be others as well, so I figured I should
> raise the issue :)

I am happy to comply with whatever others prefer.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Magnus Hagander
Date:
On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote: >> Am I the only one who finds this way of posting patches really annoying? > > Well, I unsurprisingly don't ;) Yeah, that's not surprising :) >> Here is a patch with no description other than a list of changed >> files. And discussion happens in a completely different email. > > They contain the commit message - which in most of the cases is more > informative than the one just posted, which was definitely rather > short. It should like in e.g. > http://archives.postgresql.org/message-id/1352942234-3953-11-git-send-email-andres%402ndquadrant.com They are really two different issues - the posting a patch without a description, and the separation of threads. It's when they are combined together that it becomes *really* annoying :) When it'sposted as a separate email *with* a better commit message it's at least easier to start a discussion off it. But I still find it much omre annoying than just posting the patch in-thread. >> What's wrong with just posting the patch as a regular attachment(s) to >> a regular thread, like other people do? > > Two issues: > - If you have a bigger series of patches (like the whole logical > decoding thing) posting all patches in a single mail makes the > following thread even harder to follow than its currently the > case. Note how even in this, far smaller, case the discussion actually > happened in the appropriate subthreads. I find it way much easier to > reread through an old thread that way to reassure myself what was > discussed. Yes. So one thread per patch. That's what you already have. That's not a factor of how the patches are posted, that's just a factor of how many threads you break it up in. 
I can agree that posting 20 different patches in the same thread is even worse :)

> - mhonarc does really strange things if you attach two git created
> patches (splits them into multiple mails)

mhonarc does a lot of strange things. But this part is actually not mhonarc's fault - it's majordomo that writes them into an mbox file in a format where you can't see the difference between the patch and a different message. Heck, it quite often gets it wrong even if you just post *one* patch when it's generated by git. This is handled better by the new archives code.

>> It may be just me. But it may be others as well, so I figured I should
>> raise the issue :)
>
> I am happy to comply with whatever others prefer.

Yeah, so far it's also just my opinion in the other direction :) Hopefully, some others will have thoughts about it too.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Michael Paquier
Date:
On Wed, Jan 9, 2013 at 9:54 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Yeah, so far it's also just my opinion in the other direction :)
> Hopefully, some others will have thoughts about it too.

Just giving my 2c here...
Instead of posting 5~7 patches at the same time, why not limit the number of patches published at the same time to a lower number (max 2~3)? The logical replication implementation can surely be broken down into many more pieces that could be reviewed carefully one by one, and in a way that would make the implementation steps clearer than they are now for all the people on this ML.
OK, this would make the review process longer, but the good point is that hackers who specialize in only some areas of the PG code would be able to give precious feedback.
--
Michael Paquier
http://michael.otacoo.com
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
On 2013-01-09 22:23:25 +0900, Michael Paquier wrote:
> On Wed, Jan 9, 2013 at 9:54 PM, Magnus Hagander <magnus@hagander.net> wrote:
>
> > On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> >
> > Yeah, so far it's also just my opinion in the other direction :)
> > Hopefully, some others will have thoughts about it too.
> >
> Just giving my 2c here...
>
> Instead of posting multiple 5~7 patches at the same time, why not limiting
> the number of patches published at the same time to a lower number (max
> 2~3)?

I tried to do this. ilist, binaryheap and this thread... ;)

> The logical replication implementation can be surely broken down into
> many more pieces that could be reviewed carefully one by one, and in a way
> that would make the implementation steps clearer than it is now for all the
> people of this ML.

I don't really see any additional useful splits beyond the ones made last round. The relfilenode stuff could be submitted separately, but that's about it, and it's seemingly not all that useful by itself (I personally want the (tablespace, relfilenode) => reloid mapping function independently, but I seem to be alone).

In my experience it's hard to get useful review for patches which don't have a patch using the facility nearby.

Where do you see a useful split?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Alvaro Herrera
Date:
How hard is the backend hit by palloc being now an additional function call? Would it be a good idea to make it (and friends) STATIC_IF_INLINE? > diff --git a/src/include/port/palloc.h b/src/include/port/palloc.h > new file mode 100644 > index 0000000..a7900bf > --- /dev/null > +++ b/src/include/port/palloc.h > @@ -0,0 +1,19 @@ > +/* > + * common.h > + * Common support routines for bin/scripts/ > + * > + * Copyright (c) 2003-2013, PostgreSQL Global Development Group > + * > + * src/bin/scripts/common.h > + */ You forgot to update the above comment. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
On 2013-01-09 11:45:12 -0300, Alvaro Herrera wrote: > > How hard is the backend hit by palloc being now an additional function > call? Would it be a good idea to make it (and friends) STATIC_IF_INLINE? > > > diff --git a/src/include/port/palloc.h b/src/include/port/palloc.h > > new file mode 100644 > > index 0000000..a7900bf > > --- /dev/null > > +++ b/src/include/port/palloc.h > > @@ -0,0 +1,19 @@ > > +/* > > + * common.h > > + * Common support routines for bin/scripts/ > > + * > > + * Copyright (c) 2003-2013, PostgreSQL Global Development Group > > + * > > + * src/bin/scripts/common.h > > + */ > > You forgot to update the above comment. *headbang*. Gah. I hate these comments... Will update, but won't send anything again until more changes have accumulated. Thanks! Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
On 2013-01-09 11:45:12 -0300, Alvaro Herrera wrote:
>
> How hard is the backend hit by palloc being now an additional function
> call? Would it be a good idea to make it (and friends) STATIC_IF_INLINE?

Missed this at first...

I don't think there's any measurable hit now, as there is no additional function call: I chose to do the work directly in palloc[0]. That causes a minor amount of code duplication, but imo that's ok. I didn't do that for pstrdup(); it would be easy enough to do it there as well, but I don't think it matters there. In a quick test I couldn't find any performance difference.

I don't think making them STATIC_IF_INLINE functions would help, as I think that would still require the CurrentMemoryContext symbol to be visible in the caller's context.

FWIW I previously measured whether the function call overhead via mcxt.c is measurable, and it wasn't...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote: >>> Am I the only one who finds this way of posting patches really annoying? >> Well, I unsurprisingly don't ;) > Yeah, that's not surprising :) I'm with Magnus. This is seriously annoying, especially when the "discussion" thread has a title not closely related to the "patch" emails. (It doesn't help any that the list server doesn't manage to deliver the emails in order, at least not to me --- more often than not, they're spread out over a few minutes and interleaved with other messages.) I also don't find the argument that the commit messages are a substitute for patch descriptions to be worth anything. Commit messages are, or should be, for a different audience: they are for whoever writes the release notes, or for historical purposes when someone is looking for "which patches touched a particular area?". That's not the same as explaining/justifying the patch for review purposes. Reviewers want a lot more depth than is appropriate in a commit message. (TBH, I rarely use any submitter's proposed commit message anyway, but that's just me.) I'd prefer posting a single message with the discussion and the patch(es). If you think it's helpful to split a patch into separate parts for reviewing, add multiple attachments. But my experience is that such separation isn't nearly as useful as you seem to think. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote: >> I don't understand the need for this change. Can't you just: >> #define palloc(s) pg_malloc(s) >> in the frontend context? > Yes, that would be possible, but imo its the inferior solution: I'm with Heikki: in fact, I will go farther and say that this approach is DOA. The only way you get to change the backend's definition of palloc is if you can show that doing so yields a performance boost in the backend. Otherwise, we use a macro or whatever is necessary to allow frontend code to have a different underlying implementation of palloc(x). > * it precludes ever sharing code without compiling twice That's never going to happen anyway, or at least there are plenty more problems in the way of it. > * removing allows us to get rid of the following ugliness in dirmod.c: I agree that what dirmod is doing is pretty ugly, but it's localized enough that I don't care particularly. (Really, the only reason it's a hack and not The Solution is that at the moment there's only one file doing it that way. A trampoline function for use only by code that needs to work in both frontend and backend isn't an unreasonable idea.) regards, tom lane
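The macro approach that Heikki and Tom favour above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the `pg_malloc()` here is a stand-in that exits on allocation failure, `copy_label()` is a hypothetical piece of shared code, and in a real tree the two `#define`s would presumably sit behind a frontend-only guard such as `#ifdef FRONTEND`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for a frontend pg_malloc(): never returns NULL, exits on
 * out-of-memory.  The real function lives in the PostgreSQL tree; this
 * body is an assumption made for the sketch.
 */
void *
pg_malloc(size_t size)
{
    void *p;

    /* malloc(0) may legitimately return NULL; sidestep that ambiguity */
    if (size == 0)
        size = 1;
    p = malloc(size);
    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/*
 * Frontend-only redirection: shared code keeps calling palloc()/pfree(),
 * but in a frontend build those names expand to plain malloc-based calls.
 * The backend definitions are left untouched.
 */
#define palloc(s)  pg_malloc(s)
#define pfree(p)   free(p)

/* Hypothetical code shared between frontend and backend. */
char *
copy_label(const char *src)
{
    size_t len = strlen(src) + 1;
    char  *dst = palloc(len);

    memcpy(dst, src, len);
    return dst;
}
```

The trade-off debated in the thread is visible here: the shared code has to be compiled once per environment, because `palloc` resolves to a different thing in each.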
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 11:37:46 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-09 13:46:53 +0200, Heikki Linnakangas wrote:
> >> I don't understand the need for this change. Can't you just:
> >> #define palloc(s) pg_malloc(s)
> >> in the frontend context?
>
> > Yes, that would be possible, but imo its the inferior solution:
>
> I'm with Heikki: in fact, I will go farther and say that this approach
> is DOA. The only way you get to change the backend's definition of
> palloc is if you can show that doing so yields a performance boost
> in the backend. Otherwise, we use a macro or whatever is necessary
> to allow frontend code to have a different underlying implementation
> of palloc(x).

What's the use case for being able to redefine it after the patch? It's not like you can do anything interesting, given that pfree() et al. are not macros.

> > * removing allows us to get rid of the following ugliness in dirmod.c:
>
> I agree that what dirmod is doing is pretty ugly, but it's localized
> enough that I don't care particularly. (Really, the only reason it's
> a hack and not The Solution is that at the moment there's only one
> file doing it that way. A trampoline function for use only by code
> that needs to work in both frontend and backend isn't an unreasonable
> idea.)

Well, after the patch the trampoline function would be palloc() itself.

Greetings,

Andres Freund
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Peter Geoghegan
Date:
On 9 January 2013 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm with Magnus. This is seriously annoying, especially when the > "discussion" thread has a title not closely related to the "patch" > emails. (It doesn't help any that the list server doesn't manage to > deliver the emails in order, at least not to me --- more often than > not, they're spread out over a few minutes and interleaved with other > messages.) I agree. I actually go to some lengths to coordinate all of these threads during review, and I'd prefer not to have to. I would have said something, but it seemed unimportant (relative to the patchset itself). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Andres Freund
Date:
On 2013-01-09 11:27:46 -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Wed, Jan 9, 2013 at 1:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> On 2013-01-09 13:34:12 +0100, Magnus Hagander wrote:
> >>> Am I the only one who finds this way of posting patches really annoying?
>
> >> Well, I unsurprisingly don't ;)
>
> > Yeah, that's not surprising :)
>
> I'm with Magnus. This is seriously annoying, especially when the
> "discussion" thread has a title not closely related to the "patch"
> emails. (It doesn't help any that the list server doesn't manage to
> deliver the emails in order, at least not to me --- more often than
> not, they're spread out over a few minutes and interleaved with other
> messages.)

Ok, most seem to have a clear preference, so I won't do so anymore.

> I also don't find the argument that the commit messages are a substitute
> for patch descriptions to be worth anything. Commit messages are, or
> should be, for a different audience: they are for whoever writes the
> release notes, or for historical purposes when someone is looking for
> "which patches touched a particular area?". That's not the same as
> explaining/justifying the patch for review purposes. Reviewers want
> a lot more depth than is appropriate in a commit message.

Agreed that they have different audiences.

> (TBH, I rarely use any submitter's proposed commit message anyway, but
> that's just me.)

I noticed ;)

>
> I'd prefer posting a single message with the discussion and the
> patch(es). If you think it's helpful to split a patch into separate
> parts for reviewing, add multiple attachments. But my experience is
> that such separation isn't nearly as useful as you seem to think.

Well, would it have been better if xlog reading, ilist, binaryheap, this cleanup, etc. had been in the same patch? They all originated from the same work...
Even the split in this thread seems to have helped, as you've jumped on the patches where you could give rather quick input (static relpathbackend(), central Assert definitions), probably without having read the xlogreader patch itself...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On 2013-01-09 11:37:46 -0500, Tom Lane wrote: >> I agree that what dirmod is doing is pretty ugly, but it's localized >> enough that I don't care particularly. (Really, the only reason it's >> a hack and not The Solution is that at the moment there's only one >> file doing it that way. A trampoline function for use only by code >> that needs to work in both frontend and backend isn't an unreasonable >> idea.) > Well, after the patch the trampoline function would be palloc() itself. My objection is that you're forcing *all* backend code to go through the trampoline. That probably has a negative impact on performance, and if you think it'll get committed without a thorough proof that there's no such impact, you're mistaken. Perhaps you're not aware of exactly how hot a hot-spot palloc is? It's not unlikely that this patch could hurt backend performance by several percent all by itself. Now on the other side of the coin, it's also possible that it's a wash, particularly on machines where fetching a global variable is a relatively expensive thing to do. But the driving force behind any proposed change to the backend's palloc mechanism has to be performance, and nothing but. Convenience for a patch such as this not only doesn't trump performance, I'm unwilling to grant it any standing at all. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 11:57:35 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2013-01-09 11:37:46 -0500, Tom Lane wrote: > >> I agree that what dirmod is doing is pretty ugly, but it's localized > >> enough that I don't care particularly. (Really, the only reason it's > >> a hack and not The Solution is that at the moment there's only one > >> file doing it that way. A trampoline function for use only by code > >> that needs to work in both frontend and backend isn't an unreasonable > >> idea.) > > > Well, after the patch the trampoline function would be palloc() itself. > > My objection is that you're forcing *all* backend code to go through > the trampoline. That probably has a negative impact on performance, and > if you think it'll get committed without a thorough proof that there's > no such impact, you're mistaken. Perhaps you're not aware of exactly > how hot a hot-spot palloc is? It's not unlikely that this patch could > hurt backend performance by several percent all by itself. There is no additional overhead except some minor increase in code size. If you look at the patch palloc() now simply directly calls CurrentMemoryContext->methods->alloc. So there is no additional function call relative to the previous state. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes: > On 2013-01-09 11:57:35 -0500, Tom Lane wrote: >> My objection is that you're forcing *all* backend code to go through >> the trampoline. That probably has a negative impact on performance, and >> if you think it'll get committed without a thorough proof that there's >> no such impact, you're mistaken. Perhaps you're not aware of exactly >> how hot a hot-spot palloc is? It's not unlikely that this patch could >> hurt backend performance by several percent all by itself. > There is no additional overhead except some minor increase in code size. > If you look at the patch palloc() now simply directly calls > CurrentMemoryContext->methods->alloc. So there is no additional function > call relative to the previous state. Apparently we're failing to communicate, so let me put this as clearly as possible: an unsupported assertion that this patch has zero cost isn't worth the electrons it's written on. We're talking about a place where single instructions can make a large difference. I see that you've avoided an extra level of function call by duplicating code, but there are (at least) two reasons why that could be a loser: * now there are two hotspots not one, ie both MemoryContextAlloc and palloc will be competing for L1 cache, likewise for MemoryContextAllocZero and palloc0; * depending on machine architecture and compiler optimizations, multiple fetches of the global variable CurrentMemoryContext are quite likely to cost more than fetching it once into a parameter register. As I said, it's possible that this is a wash. But it's very possible that it isn't. In any case it's not clear to me why duplicating code like that is a less ugly approach than having different macro definitions for frontend and backend. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 12:32:20 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2013-01-09 11:57:35 -0500, Tom Lane wrote:
> >> My objection is that you're forcing *all* backend code to go through
> >> the trampoline. That probably has a negative impact on performance, and
> >> if you think it'll get committed without a thorough proof that there's
> >> no such impact, you're mistaken. Perhaps you're not aware of exactly
> >> how hot a hot-spot palloc is? It's not unlikely that this patch could
> >> hurt backend performance by several percent all by itself.
>
> > There is no additional overhead except some minor increase in code size.
> > If you look at the patch palloc() now simply directly calls
> > CurrentMemoryContext->methods->alloc. So there is no additional function
> > call relative to the previous state.
>
> Apparently we're failing to communicate, so let me put this as clearly
> as possible: an unsupported assertion that this patch has zero cost
> isn't worth the electrons it's written on.

Well, I *did* benchmark it as noted elsewhere in the thread, but that's obviously just one machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least it's rather palloc heavy.

Here are the numbers:

before:
#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
after:
#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

So on my system, if there is a difference, it's positive (0.12%).

> In any case it's not clear to me why duplicating code like that is a
> less ugly approach than having different macro definitions for frontend
> and backend.

Imo it's better abstracted and more consistent (pfree() is not a macro, palloc() is) and actually makes it easier to redirect code somewhere else. It also opens the door for exposing less knowledge in utils/palloc.h.

Anyway, we can use a macro instead, I don't think it's that important.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
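For readers who want to check the 0.12% figure, the arithmetic can be reproduced with a small sketch like the one below. The twelve values are copied from the mail above; the helper names (`mean`, `range`, `percent_change`) are made up for this illustration, and the unit of the values is not stated in the mail (pgbench normally reports transactions per second, which is an assumption here).

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the mail above; unit assumed to be pgbench tps. */
const double runs_before[] = {
    101646.763208, 101350.361595, 101421.425668,
    101571.211688, 101862.172051, 101449.857665
};
const double runs_after[] = {
    101553.596257, 102132.277795, 101528.816229,
    101733.541792, 101438.531618, 101673.400992
};

/* Arithmetic mean of n values. */
double
mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/* Spread between the largest and smallest of n values. */
double
range(const double *v, size_t n)
{
    double lo = v[0];
    double hi = v[0];
    size_t i;

    for (i = 1; i < n; i++)
    {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    return hi - lo;
}

/* Relative change from "before" to "after", in percent. */
double
percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With these inputs the means come out at roughly 101550.3 and 101676.7, a change of about 0.12%. The run-to-run spread of the "before" series alone (about 512) is larger than the difference between the two means (about 126), which is consistent with the cautious "if there is a difference" wording.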
Re: Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-09 11:27:46 -0500, Tom Lane wrote: >> I'd prefer posting a single message with the discussion and the >> patch(es). If you think it's helpful to split a patch into separate >> parts for reviewing, add multiple attachments. But my experience is >> that such separation isn't nearly as useful as you seem to think. > Well, would it have been better if xlog reading, ilist, binaryheap, this > cleanup, etc. have been in the same patch? They have originated out of > the same work... > Even the splitup in this thread seems to have helped as youve jumped on > the patches where you could give rather quick input (static > relpathbackend(), central Assert definitions), probably without having > read the xlogreader patch itself... No, I agree that global-impact things like this palloc rearrangement are much better proposed and debated separately than as part of something like xlogreader. What I was reacting to was the specific patch set associated with this thread. I don't see the point of breaking out a two-line sub-patch such as you did in http://archives.postgresql.org/message-id/1357730830-25999-3-git-send-email-andres@2ndquadrant.com regards, tom lane
Re: Re: [PATCH 1/2] Provide a common malloc wrappers and palloc et al. emulation for frontend'ish environs
From
"anarazel@anarazel.de"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@2ndquadrant.com> writes:
>> On 2013-01-09 11:27:46 -0500, Tom Lane wrote:
>>> I'd prefer posting a single message with the discussion and the
>>> patch(es). If you think it's helpful to split a patch into separate
>>> parts for reviewing, add multiple attachments. But my experience is
>>> that such separation isn't nearly as useful as you seem to think.
>
>> Well, would it have been better if xlog reading, ilist, binaryheap, this
>> cleanup, etc. have been in the same patch? They have originated out of
>> the same work...
>> Even the splitup in this thread seems to have helped as youve jumped on
>> the patches where you could give rather quick input (static
>> relpathbackend(), central Assert definitions), probably without having
>> read the xlogreader patch itself...
>
>No, I agree that global-impact things like this palloc rearrangement are
>much better proposed and debated separately than as part of something
>like xlogreader. What I was reacting to was the specific patch set
>associated with this thread. I don't see the point of breaking out a
>two-line sub-patch such as you did in
>http://archives.postgresql.org/message-id/1357730830-25999-3-git-send-email-andres@2ndquadrant.com

Ah, yes. I see your point. The not-all-that-good reasoning I had in mind was that that one should be uncontroversial, as it seemed to be the only unchecked malloc call in src/bin. So it could be committed independently of the more controversial stuff... Same with the single whitespace removal patch upthread...

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > Well, I *did* benchmark it as noted elsewhere in the thread, but thats > obviously just machine (E5520 x 2) with one rather restricted workload > (pgbench -S -jc 40 -T60). At least its rather palloc heavy. > Here are the numbers: > before: > #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 > after: > #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 > So on my system if there is a difference, its positive (0.12%). pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). 
I then applied oprofile and got a breakdown like this:

  samples|      %|
------------------
   108409 84.5083 /home/tgl/testversion/bin/postgres
    13723 10.6975 /lib64/libc-2.14.90.so
     3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10960    10.1495  AllocSetAlloc
6325      5.8572  MemoryContextAllocZeroAligned
6225      5.7646  base_yyparse
3765      3.4866  copyObject
2511      2.3253  MemoryContextAlloc
2292      2.1225  grouping_planner
2044      1.8928  SearchCatCache
1956      1.8113  core_yylex
1763      1.6326  expression_tree_walker
1347      1.2474  MemoryContextCreate
1340      1.2409  check_stack_depth
1276      1.1816  GetCachedPlan
1175      1.0881  AllocSetFree
1106      1.0242  GetSnapshotData
1106      1.0242  _SPI_execute_plan
1101      1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 16666 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much:

   107642 83.7427 /home/tgl/testversion/bin/postgres
    14677 11.4183 /lib64/libc-2.14.90.so
     3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10038     9.3537  AllocSetAlloc
6392      5.9562  MemoryContextAllocZeroAligned
5763      5.3701  base_yyparse
4810      4.4821  copyObject
2268      2.1134  grouping_planner
2178      2.0295  core_yylex
1963      1.8292  palloc
1867      1.7397  SearchCatCache
1835      1.7099  expression_tree_walker
1551      1.4453  check_stack_depth
1374      1.2803  _SPI_execute_plan
1282      1.1946  MemoryContextCreate
1187      1.1061  AllocSetFree
...
653       0.6085  palloc0
...
552       0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster?

I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops.
I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function. regards, tom lane
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
I wrote:
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:

I studied the assembly code being generated for palloc(), and I believe I see the reason why it's a bit faster: when there's only a single local variable that has to survive over the elog call, gcc generates a shorter function entry/exit sequence. I had thought of proposing that we code palloc() like this:

    void *
    palloc(Size size)
    {
        MemoryContext context = CurrentMemoryContext;

        AssertArg(MemoryContextIsValid(context));

        if (!AllocSizeIsValid(size))
            elog(ERROR, "invalid memory alloc request size %lu",
                 (unsigned long) size);

        context->isReset = false;

        return (*context->methods->alloc) (context, size);
    }

but at least on this specific hardware and compiler that would evidently be a net loss compared to direct use of CurrentMemoryContext. I would not put a lot of faith in that result holding up on other machines though.

In any case this doesn't explain the whole 2% speedup, but it probably accounts for palloc() showing as slightly cheaper than MemoryContextAlloc had been in the oprofile listing.

regards, tom lane
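To make the two shapes of palloc() under discussion concrete, here is a self-contained sketch with both variants side by side. Everything outside the two function bodies is a mock invented for this illustration (the struct layout, `mock_alloc`, `report_invalid_size` standing in for `elog(ERROR, ...)`, and the names `palloc_direct` / `palloc_local`); it is not the PostgreSQL source, and the assertion macros are left out because the thread is about non-assert builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Mock memory-context types, loosely modelled on the names in the thread. */
typedef struct MemoryContextData *MemoryContext;

typedef struct MemoryContextMethods
{
    void *(*alloc) (MemoryContext context, size_t size);
} MemoryContextMethods;

struct MemoryContextData
{
    const MemoryContextMethods *methods;
    bool isReset;
};

#define MaxAllocSize ((size_t) 0x3fffffff)
#define AllocSizeIsValid(size) ((size_t) (size) <= MaxAllocSize)

/* Mock allocator method: ignores the context and just calls malloc(). */
static void *
mock_alloc(MemoryContext context, size_t size)
{
    (void) context;
    return malloc(size);
}

static const MemoryContextMethods mock_methods = {mock_alloc};
static struct MemoryContextData mock_context = {&mock_methods, true};

/* Global "current context" pointer, as in the backend. */
MemoryContext CurrentMemoryContext = &mock_context;

/* Stand-in for elog(ERROR, ...): reports and does not return. */
static void
report_invalid_size(size_t size)
{
    fprintf(stderr, "invalid memory alloc request size %zu\n", size);
    abort();
}

/*
 * Variant A: use the global directly.  Only "size" has to stay live
 * across the error-reporting call.
 */
void *
palloc_direct(size_t size)
{
    if (!AllocSizeIsValid(size))
        report_invalid_size(size);

    CurrentMemoryContext->isReset = false;
    return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
}

/*
 * Variant B: copy the global into a local first.  Now both "size" and
 * "context" have to stay live across the error-reporting call.
 */
void *
palloc_local(size_t size)
{
    MemoryContext context = CurrentMemoryContext;

    if (!AllocSizeIsValid(size))
        report_invalid_size(size);

    context->isReset = false;
    return (*context->methods->alloc) (context, size);
}
```

Both variants behave identically; the point made in the mail is only about generated code. Which one compiles to the shorter entry/exit sequence depends on the compiler and target, so comparing the assembly output (for example with `gcc -O2 -S`) on the machine of interest is the only reliable way to tell.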
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
> I wrote:
> > I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> > Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> > bit astonishing, particularly because the oprofile results haven't moved
> > much:
>
> I studied the assembly code being generated for palloc(), and I believe
> I see the reason why it's a bit faster: when there's only a single local
> variable that has to survive over the elog call, gcc generates a shorter
> function entry/exit sequence.

Makes sense.

> I had thought of proposing that we code
> palloc() like this:
>
> void *
> palloc(Size size)
> {
>     MemoryContext context = CurrentMemoryContext;
>
>     AssertArg(MemoryContextIsValid(context));
>
>     if (!AllocSizeIsValid(size))
>         elog(ERROR, "invalid memory alloc request size %lu",
>              (unsigned long) size);
>
>     context->isReset = false;
>
>     return (*context->methods->alloc) (context, size);
> }
>
> but at least on this specific hardware and compiler that would evidently
> be a net loss compared to direct use of CurrentMemoryContext. I would
> not put a lot of faith in that result holding up on other machines
> though.

Isn't that optimized to the same thing? ISTM the compiler should produce exactly the same code for both.

> In any case this doesn't explain the whole 2% speedup, but it probably
> accounts for palloc() showing as slightly cheaper than
> MemoryContextAlloc had been in the oprofile listing.

I'd guess that a good part of the cost is just smeared across all callers and not individually attributable to any function visible in the profile.

Additionally, with functions as short as MemoryContextAllocZero, profilers like oprofile (and perf) also often leak quite a bit of the actual cost to the call sites, in my experience.

I wonder whether it makes sense to "inline" the contents of pstrdup() as well? My gut feeling is no, but...
I would like to move CurrentMemoryContext to memutils.h, but that seems to require too many changes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 15:07:10 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > Well, I *did* benchmark it as noted elsewhere in the thread, but thats > > obviously just machine (E5520 x 2) with one rather restricted workload > > (pgbench -S -jc 40 -T60). At least its rather palloc heavy. > > > Here are the numbers: > > > before: > > #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665 > > after: > > #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992 > > > So on my system if there is a difference, its positive (0.12%). > > pgbench-based testing doesn't fill me with a lot of confidence for this > --- its numbers contain a lot of communication overhead, not to mention > that pgbench itself can be a bottleneck. I chose it because I looked at profiles of it often enough to know memory allocation shows up high in the profile (especially without -M prepared). > I would like to know if other people get comparable results on other > hardware (non-Intel hardware would be especially interesting). If this > result holds up across a range of platforms, I'll withdraw my objection > to making palloc a plain function. I only have access to core2 and Nehalem atm, but FWIW I just tested it on another workload (decoding WAL changes of a large transaction into text) and I see improvements around 1.2% on both. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
>> I had thought of proposing that we code
>> palloc() like this:
>>
>> void *
>> palloc(Size size)
>> {
>>     MemoryContext context = CurrentMemoryContext;
>>
>>     AssertArg(MemoryContextIsValid(context));
>>
>>     if (!AllocSizeIsValid(size))
>>         elog(ERROR, "invalid memory alloc request size %lu",
>>              (unsigned long) size);
>>
>>     context->isReset = false;
>>
>>     return (*context->methods->alloc) (context, size);
>> }
>>
>> but at least on this specific hardware and compiler that would evidently
>> be a net loss compared to direct use of CurrentMemoryContext. I would
>> not put a lot of faith in that result holding up on other machines
>> though.

> Thats not optimized to the same? ISTM the compiler should produce
> exactly the same code for both.

No, that's exactly the point here, you can't just assume that you get
the same code; this is tense enough that a few instructions matter.
Remember we're considering non-assert builds, so the AssertArg vanishes.
So in the form where CurrentMemoryContext is only read after the elog
call, the compiler can see that it requires but one fetch - it can't
change within the two lines where it's used. In the form given above,
the compiler is required to fetch CurrentMemoryContext into a local
variable and keep it across the elog call. (We know this doesn't
matter, but gcc doesn't; this version of gcc apparently isn't doing much
with the knowledge that elog won't return.) Since the extra local
variable adds several instructions to the overall function's entry/exit
sequences, you come out behind. At least on this platform. On other
machines with different code generation behavior, the story could
easily be very different.

>> In any case this doesn't explain the whole 2% speedup, but it probably
>> accounts for palloc() showing as slightly cheaper than
>> MemoryContextAlloc had been in the oprofile listing.

> I'd guess that a good part of the cost is just smeared across all
> callers and not individually accountable to any function visible in the
> profile.

Yeah, I think this is most likely showing that there is a real,
measurable cost of code bloat.

			regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > I wonder whether it makes sense to "inline" the contents pstrdup() > additionally? My gut feeling is not, but... I don't recall ever having seen MemoryContextStrdup amount to much, so probably not worth the extra code space. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-09 15:43:19 -0500, Tom Lane wrote:
> >> I had thought of proposing that we code
> >> palloc() like this:
> >>
> >> void *
> >> palloc(Size size)
> >> {
> >>     MemoryContext context = CurrentMemoryContext;
> >>
> >>     AssertArg(MemoryContextIsValid(context));
> >>
> >>     if (!AllocSizeIsValid(size))
> >>         elog(ERROR, "invalid memory alloc request size %lu",
> >>              (unsigned long) size);
> >>
> >>     context->isReset = false;
> >>
> >>     return (*context->methods->alloc) (context, size);
> >> }
> >>
> >> but at least on this specific hardware and compiler that would evidently
> >> be a net loss compared to direct use of CurrentMemoryContext. I would
> >> not put a lot of faith in that result holding up on other machines
> >> though.
>
> > Thats not optimized to the same? ISTM the compiler should produce
> > exactly the same code for both.
>
> No, that's exactly the point here, you can't just assume that you get
> the same code; this is tense enough that a few instructions matter.
> Remember we're considering non-assert builds, so the AssertArg vanishes.
> So in the form where CurrentMemoryContext is only read after the elog
> call, the compiler can see that it requires but one fetch - it can't
> change within the two lines where it's used. In the form given above,
> the compiler is required to fetch CurrentMemoryContext into a local
> variable and keep it across the elog call.

I had thought that it could simply store the address in a callee-saved
register inside palloc, totally forgetting that palloc would need to
save that register itself in that case.

> (We know this doesn't
> matter, but gcc doesn't; this version of gcc apparently isn't doing much
> with the knowledge that elog won't return.)

Afaics one reason for that is that we don't have any such annotation for
elog(), just for ereport. And I don't immediately see how it could be
added to elog without relying on variadic macros. Bit of a shame, there
are probably a good number of callsites that could actually benefit from
that knowledge.
Is it worth making that annotation conditional on variadic macro
support?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
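The two palloc() shapes compared in this exchange can be sketched outside the backend. This is a minimal stand-in, not the real mcxt.c code: MockContext, current_context, report_invalid_size() and both alloc_* functions are hypothetical names, and malloc() replaces the context's alloc method.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory context; only the flag matters here. */
typedef struct MockContext
{
    int is_reset;
} MockContext;

static MockContext top_context = {1};

/* Plays the role of the global CurrentMemoryContext. */
MockContext *current_context = &top_context;

#define MAX_REQUEST ((size_t) 0x3fffffff)

/*
 * Stand-in for elog(ERROR, ...).  It is deliberately not declared noreturn.
 * For a faithful experiment it should live in a separate translation unit,
 * since a compiler that can see the body may work out that it never returns.
 */
void
report_invalid_size(size_t size)
{
    (void) size;
    abort();
}

/* Variant 1: fetch the global first; the local must survive the call. */
void *
alloc_with_local(size_t size)
{
    MockContext *context = current_context;

    if (size > MAX_REQUEST)
        report_invalid_size(size);

    context->is_reset = 0;
    return malloc(size);
}

/* Variant 2: read the global only after the size check. */
void *
alloc_with_global(size_t size)
{
    if (size > MAX_REQUEST)
        report_invalid_size(size);

    current_context->is_reset = 0;
    return malloc(size);
}
```

Compiling this with gcc -O2 -S and comparing the two function bodies is one way to see whether the entry/exit sequences differ on a given compiler and platform; as noted in the thread, the result should not be expected to carry over to other machines.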
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-10 00:05:07 +0100, Andres Freund wrote:
> On 2013-01-09 17:28:35 -0500, Tom Lane wrote:
> > (We know this doesn't
> > matter, but gcc doesn't; this version of gcc apparently isn't doing much
> > with the knowledge that elog won't return.)
>
> Afaics one reason for that is that we don't have any such annotation for
> elog(), just for ereport. And I don't immediately see how it could be
> added to elog without relying on variadic macros. Bit of a shame, there
> probably a good number of callsites that could actually benefit from
> that knowledge.
> Is it worth making that annotation conditional on variadic macro
> support?

A quick test confirms that. With:

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index cbbda04..39cd809 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -212,7 +212,13 @@ extern int getinternalerrposition(void);
  *     elog(ERROR, "portal \"%s\" not found", stmt->portalname);
  *----------
  */
+#ifdef __GNUC__
+#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \
+    elog_finish(elevel, __VA_ARGS__), \
+    ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0)
+#else
 #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish
+#endif
 
 extern void elog_start(const char *filename, int lineno, const char *funcname);
 extern void

nearly the same code is generated for both variants (that is, no
additionally saved registers and such).

Three unfinished things:
* the above only checks for gcc although all C99 compilers should
  support varargs macros
* It uses __builtin_unreachable() instead of abort() as that creates
  a smaller executable. That's gcc 4.5 only. Saves about 30kb in
  both a stripped and an unstripped binary.
* elog(ERROR) i.e. no args would require more macro ugliness, but that
  seems unnecessary

Doing the __builtin_unreachable() for ereport as well saves about 50kb.

Given that some of those elog/ereports are in performance critical code
paths it seems like a good idea to remove code from the
callsites. Adding configure checks for __VA_ARGS__ and
__builtin_unreachable() should be simple enough?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
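The macro shape from the diff above can be tried in isolation. The following is a sketch under assumptions, not the elog.h code: report(), report_finish(), the LEVEL_* constants and checked_div() are hypothetical, exit() stands in for the backend's error recovery, and the annotated branch needs gcc or a compatible compiler.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define LEVEL_LOG   15
#define LEVEL_ERROR 20

static int messages_logged = 0;

/* Hypothetical stand-in for elog_finish(): print, and leave on errors. */
static void
report_finish(int elevel, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    messages_logged++;

    if (elevel >= LEVEL_ERROR)
        exit(1);            /* the backend would longjmp instead */
}

#ifdef __GNUC__
/* Variadic form: tells the compiler that control stops at error levels. */
#define report(elevel, ...) \
    (report_finish(elevel, __VA_ARGS__), \
     ((elevel) >= LEVEL_ERROR ? __builtin_unreachable() : (void) 0))
#else
/* Fallback without variadic macros: no annotation at all. */
#define report report_finish
#endif

/*
 * With the annotated macro the compiler can see that nothing follows
 * report(LEVEL_ERROR, ...), so no dummy return statement is needed.
 */
static int
checked_div(int a, int b)
{
    if (b != 0)
        return a / b;
    report(LEVEL_ERROR, "division by zero");
}
```

Note that elevel appears twice in the annotated macro, which is the multiple-evaluation concern raised later in the thread.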
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-10 10:31:04 +0100, Andres Freund wrote: > On 2013-01-10 00:05:07 +0100, Andres Freund wrote: > > On 2013-01-09 17:28:35 -0500, Tom Lane wrote: > > > (We know this doesn't > > > matter, but gcc doesn't; this version of gcc apparently isn't doing much > > > with the knowledge that elog won't return.) > > > > Afaics one reason for that is that we don't have any such annotation for > > elog(), just for ereport. And I don't immediately see how it could be > > added to elog without relying on variadic macros. Bit of a shame, there > > probably a good number of callsites that could actually benefit from > > that knowledge. > > Is it worth making that annotation conditional on variadic macro > > support? > > A quick test confirms that. With: > > diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h > index cbbda04..39cd809 100644 > --- a/src/include/utils/elog.h > +++ b/src/include/utils/elog.h > @@ -212,7 +212,13 @@ extern int getinternalerrposition(void); > * elog(ERROR, "portal \"%s\" not found", stmt->portalname); > *---------- > */ > +#ifdef __GNUC__ > +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ > + elog_finish(elevel, __VA_ARGS__), \ > + ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0) > +#else > #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish > +#endif > > extern void elog_start(const char *filename, int lineno, const char *funcname); > extern void > > nearly the same code is generated for both variants (this is no > additionally saved registers and such). > > Two unfinished things: > * the above only checks for gcc although all C99 compilers should > support varargs macros > * It uses __builtin_unreachable() instead of abort() as that creates > a smaller executable. That's gcc 4.5 only. Saves about 30kb in a > both a stripped and unstripped binary. > * elog(ERROR) i.e. 
no args would require more macro ugliness, but that > seems unneccesary > > Doing the __builtin_unreachable() for ereport as well saves about 50kb. > > Given that some of those elog/ereports are in performance critical code > paths it seems like a good idea to remove code from the > callsites. Adding configure check for __VA_ARGS__ and > __builtin_unreachable() should be simple enough? For whatever its worth - I am not sure its much - after relying on variadic macros we can make elog into one function call instead of two. That saves another 100kb. [tinker] The builtin_unreachable and combined elog thing bring a measurable performance benefit of a rather surprising 7% when running Pavel's testcase ontop of yesterdays HEAD. So it seems worth investigating. About 3% of that is the __builtin_unreachable() and about 4% the combining of elog_start/finish. Now that testcase sure isn't a very representative one for this, but I don't really see why it should benefit much more than other cases. Seems to be quite the benefit for relatively minor cost. Although I am pretty sure just copying the whole of elog_start/elog_finish into one elog_all() isn't the best way to go at this... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-10 10:55:20 +0100, Andres Freund wrote: > On 2013-01-10 10:31:04 +0100, Andres Freund wrote: > > On 2013-01-10 00:05:07 +0100, Andres Freund wrote: > > > On 2013-01-09 17:28:35 -0500, Tom Lane wrote: > > > > (We know this doesn't > > > > matter, but gcc doesn't; this version of gcc apparently isn't doing much > > > > with the knowledge that elog won't return.) > > > > > > Afaics one reason for that is that we don't have any such annotation for > > > elog(), just for ereport. And I don't immediately see how it could be > > > added to elog without relying on variadic macros. Bit of a shame, there > > > probably a good number of callsites that could actually benefit from > > > that knowledge. > > > Is it worth making that annotation conditional on variadic macro > > > support? > > > > A quick test confirms that. With: > > > > diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h > > index cbbda04..39cd809 100644 > > --- a/src/include/utils/elog.h > > +++ b/src/include/utils/elog.h > > @@ -212,7 +212,13 @@ extern int getinternalerrposition(void); > > * elog(ERROR, "portal \"%s\" not found", stmt->portalname); > > *---------- > > */ > > +#ifdef __GNUC__ > > +#define elog(elevel, ...) elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), \ > > + elog_finish(elevel, __VA_ARGS__), \ > > + ((elevel) >= ERROR ? __builtin_unreachable() : (void) 0) > > +#else > > #define elog elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO), elog_finish > > +#endif > > > > extern void elog_start(const char *filename, int lineno, const char *funcname); > > extern void > > > > nearly the same code is generated for both variants (this is no > > additionally saved registers and such). > > > > Two unfinished things: > > * the above only checks for gcc although all C99 compilers should > > support varargs macros > > * It uses __builtin_unreachable() instead of abort() as that creates > > a smaller executable. That's gcc 4.5 only. 
Saves about 30kb in a > > both a stripped and unstripped binary. > > * elog(ERROR) i.e. no args would require more macro ugliness, but that > > seems unneccesary > > > > Doing the __builtin_unreachable() for ereport as well saves about 50kb. > > > > Given that some of those elog/ereports are in performance critical code > > paths it seems like a good idea to remove code from the > > callsites. Adding configure check for __VA_ARGS__ and > > __builtin_unreachable() should be simple enough? > > For whatever its worth - I am not sure its much - after relying on > variadic macros we can make elog into one function call instead of > two. That saves another 100kb. > > [tinker] > > The builtin_unreachable and combined elog thing bring a measurable > performance benefit of a rather surprising 7% when running Pavel's > testcase ontop of yesterdays HEAD. So it seems worth investigating. > About 3% of that is the __builtin_unreachable() and about 4% the > combining of elog_start/finish. > > Now that testcase sure isn't a very representative one for this, but I > don't really see why it should benefit much more than other cases. Seems > to be quite the benefit for relatively minor cost. > Although I am pretty sure just copying the whole of > elog_start/elog_finish into one elog_all() isn't the best way to go at > this... The attached patch: * adds configure checks for __VA_ARGS__ and __builtin_unreachable support * provides a pg_unreachable() macro that expands to __builtin_unreachable() or abort() * changes #define elog ... into #define elog(elevel, ...) if varargs are available It does *not* combine elog_start and elog_finish into one function if varargs are available although that brings a rather measurable size/performance benefit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
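The pg_unreachable() idea from the patch summary above can be sketched as follows. The patch itself relies on configure checks; this sketch substitutes preprocessor tests, which is an assumption made for illustration, and my_unreachable(), Fork and fork_name() are hypothetical names.

```c
#include <stdlib.h>

/* Compilers without __has_builtin get a definition that always says no. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Prefer the builtin where available (gcc 4.5 or later); else abort(). */
#if __has_builtin(__builtin_unreachable) || \
    (defined(__GNUC__) && \
     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)))
#define my_unreachable() __builtin_unreachable()
#else
#define my_unreachable() abort()
#endif

typedef enum
{
    FORK_MAIN,
    FORK_FSM
} Fork;

/* Every enum value returns from the switch; the tail cannot be reached. */
static const char *
fork_name(Fork f)
{
    switch (f)
    {
        case FORK_MAIN:
            return "main";
        case FORK_FSM:
            return "fsm";
    }
    my_unreachable();
}
```

With the builtin, no code is emitted for the tail of the function; with the abort() fallback a call remains, which is the size difference mentioned in the thread.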
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > The attached patch: > * adds configure checks for __VA_ARGS__ and __builtin_unreachable > support > * provides a pg_unreachable() macro that expands to > __builtin_unreachable() or abort() > * changes #define elog ... into #define elog(elevel, ...) if varargs are > available Seems like a good thing to do --- will review and commit. > It does *not* combine elog_start and elog_finish into one function if > varargs are available although that brings a rather measurable > size/performance benefit. Since you've apparently already done the measurement: how much? It would be a bit tedious to support two different infrastructures for elog(), but if it's a big enough win maybe we should. regards, tom lane
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-11 14:01:40 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > The attached patch:
>
> > * adds configure checks for __VA_ARGS__ and __builtin_unreachable
> >   support
> > * provides a pg_unreachable() macro that expands to
> >   __builtin_unreachable() or abort()
> > * changes #define elog ... into #define elog(elevel, ...) if varargs are
> >   available
>
> Seems like a good thing to do --- will review and commit.

Thanks.

I guess you will catch that anyway, but afaik msvc supports __VA_ARGS__
these days (since v2005 seemingly) and I didn't add HAVE__VA_ARGS to the
respective pg_config.h.win32

> > It does *not* combine elog_start and elog_finish into one function if
> > varargs are available although that brings a rather measurable
> > size/performance benefit.
>
> Since you've apparently already done the measurement: how much?
> It would be a bit tedious to support two different infrastructures for
> elog(), but if it's a big enough win maybe we should.

Imo it's pretty definitely a big enough win. So big I have a hard time
believing it can be true without negative effects somewhere else.

On top of the patch you've replied to, it saves somewhere around 80kb and
gains between 0.8% (-S -M prepared), 2% (-S -M simple) and 4% (own stuff
I had lying around), and it consistently gives 6%-7% in Pavel's
testcase. That is with today's HEAD, not with the one before your fix.

I attached the absolutely ugly and unready patch I used for testing.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> [ patch to mark elog() as non-returning if elevel >= ERROR ]

It strikes me that both in this patch, and in Peter's previous patch to
do this for ereport(), there is an opportunity for improvement. Namely,
that the added code only has any use if elevel is a constant, which in
some places it isn't. We don't really need a call to abort() to be
executed there, but the compiler knows no better and will have to
generate code to test the value of elevel and call abort().
Which rather defeats the purpose of saving code; plus the compiler will
still not have any knowledge that code after the ereport isn't reached.

And another thing: what if the elevel argument isn't safe for multiple
evaluation? No such hazard ever existed before these patches, so I'm
not very comfortable with adding one. (Even if all our own code is
safe, there's third-party code to worry about.)

When we're using recent gcc, there is an easy fix, which is that the
macros could do this:

	__builtin_constant_p(elevel) && (elevel) >= ERROR ? pg_unreachable() : (void) 0

This gets rid of both useless code and the risk of undesirable multiple
evaluation, while not giving up any optimization possibility that
existed before. So I think we should add a configure test for
__builtin_constant_p() and do it like that when possible.

That still leaves the question of what to do when the compiler doesn't
have __builtin_constant_p(). Should we use the coding that we have,
or give up and not try to mark the macros as non-returning? I'm rather
inclined to do the latter, because of the multiple-evaluation-risk
argument.

Comments?

			regards, tom lane
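The difference between the plain test and the __builtin_constant_p() guard can be observed with an elevel expression that counts its own evaluations. This is a sketch that assumes gcc or clang; pick_level(), emit(), both report_* macros and the count_* helpers are hypothetical names.

```c
#include <stdio.h>

#define LEVEL_LOG   15
#define LEVEL_ERROR 20

static int evaluations = 0;

/* A hypothetical elevel expression with a visible side effect. */
static int
pick_level(void)
{
    evaluations++;
    return LEVEL_LOG;
}

/* Hypothetical stand-in for the function that does the actual logging. */
static void
emit(int elevel, const char *msg)
{
    fprintf(stderr, "[%d] %s\n", elevel, msg);
}

/* Plain test: elevel is evaluated once for emit() and once for the check. */
#define report_naive(elevel, msg) \
    (emit(elevel, msg), \
     ((elevel) >= LEVEL_ERROR ? __builtin_unreachable() : (void) 0))

/*
 * Guarded test: __builtin_constant_p() does not evaluate its argument, and
 * for a non-constant elevel the && short-circuits before the comparison.
 */
#define report_guarded(elevel, msg) \
    (emit(elevel, msg), \
     ((__builtin_constant_p(elevel) && (elevel) >= LEVEL_ERROR) \
      ? __builtin_unreachable() : (void) 0))

/* Number of times pick_level() runs under the plain macro. */
static int
count_naive(void)
{
    evaluations = 0;
    report_naive(pick_level(), "naive");
    return evaluations;
}

/* Number of times pick_level() runs under the guarded macro. */
static int
count_guarded(void)
{
    evaluations = 0;
    report_guarded(pick_level(), "guarded");
    return evaluations;
}
```

Here count_naive() should report two evaluations and count_guarded() one. For a constant such as LEVEL_ERROR the guard folds away at compile time, so the unreachable hint is kept where it is useful.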
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-11 15:05:54 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > [ patch to mark elog() as non-returning if elevel >= ERROR ] > > It strikes me that both in this patch, and in Peter's previous patch to > do this for ereport(), there is an opportunity for improvement. Namely, > that the added code only has any use if elevel is a constant, which in > some places it isn't. We don't really need a call to abort() to be > executed there, but the compiler knows no better and will have to > generate code to test the value of elevel and call abort(). > Which rather defeats the purpose of saving code; plus the compiler will still > not have any knowledge that code after the ereport isn't reached. Yea, I think that's why __builtin_unreachable() is a performance benefit in comparison with abort(). Both are a benefit over not doing either btw in my measurements. > And another thing: what if the elevel argument isn't safe for multiple > evaluation? No such hazard ever existed before these patches, so I'm > not very comfortable with adding one. (Even if all our own code is > safe, there's third-party code to worry about.) Hm. I am not really too scared about those dangers I have to admit. If at all I am caring more for ereport than for elog. Placing code with side effects as elevel arguments just seems a bit too absurd. > When we're using recent gcc, there is an easy fix, which is that the > macros could do this: Recent as in 3.1 or so ;) Its really too bad that there's no __builtin_pure() or __builtin_side_effect_free() or somesuch :(. > __builtin_constant_p(elevel) && (elevel) >= ERROR : pg_unreachable() : (void) 0 > > This gets rid of both useless code and the risk of undesirable multiple > evaluation, while not giving up any optimization possibility that > existed before. So I think we should add a configure test for > __builtin_constant_p() and do it like that when possible. 
I think there might be code somewhere that decides between FATAL and PANIC which would not hit that path then. Imo not a good enough reason not to do it, but I wanted to mention it anyway. > That still leaves the question of what to do when the compiler doesn't > have __builtin_constant_p(). Should we use the coding that we have, > or give up and not try to mark the macros as non-returning? I'm rather > inclined to do the latter, because of the multiple-evaluation-risk > argument. I dislike the latter somewhat as it would mean not to give that information at all to msvc and others which seems a bit sad. But I don't feel particularly strongly. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-11 15:05:54 -0500, Tom Lane wrote: >> And another thing: what if the elevel argument isn't safe for multiple >> evaluation? No such hazard ever existed before these patches, so I'm >> not very comfortable with adding one. (Even if all our own code is >> safe, there's third-party code to worry about.) > Hm. I am not really too scared about those dangers I have to admit. I agree the scenario doesn't seem all that probable, but what scares me here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR" in some builds, and just "(elevel) >= ERROR" in others, then if there is any code with a multiple-evaluation hazard, it is only buggy in the latter builds. That's sufficiently nasty that I'm willing to give up an optimization that we never had before 9.3 anyway. > I dislike the latter somewhat as it would mean not to give that > information at all to msvc and others which seems a bit sad. But I don't > feel particularly strongly. It's not like our code isn't rather gcc-biased anyway. I'd keep the optimization for other compilers if possible, but not at the cost of introducing bugs that occur only on those compilers. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-11 15:52:19 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-11 15:05:54 -0500, Tom Lane wrote: > >> And another thing: what if the elevel argument isn't safe for multiple > >> evaluation? No such hazard ever existed before these patches, so I'm > >> not very comfortable with adding one. (Even if all our own code is > >> safe, there's third-party code to worry about.) > > > Hm. I am not really too scared about those dangers I have to admit. > > I agree the scenario doesn't seem all that probable, but what scares me > here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR" > in some builds, and just "(elevel) >= ERROR" in others, then if there is > any code with a multiple-evaluation hazard, it is only buggy in the > latter builds. That's sufficiently nasty that I'm willing to give up > an optimization that we never had before 9.3 anyway. Well, why use it at all then and not just rely on __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort() otherwise? Then the code is small for anything recent (gcc 4.4 afair) and always consistently buggy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-11 15:52:19 -0500, Tom Lane wrote: >> I agree the scenario doesn't seem all that probable, but what scares me >> here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR" >> in some builds, and just "(elevel) >= ERROR" in others, then if there is >> any code with a multiple-evaluation hazard, it is only buggy in the >> latter builds. That's sufficiently nasty that I'm willing to give up >> an optimization that we never had before 9.3 anyway. > Well, why use it at all then and not just rely on > __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort() > otherwise? Then the code is small for anything recent (gcc 4.4 afair) > and always consistently buggy. Uh ... because it's *not* unreachable if elevel < ERROR. Otherwise we'd just mark errfinish as __attribute((noreturn)) and be done. Of course, that's a gcc-ism too. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-11 16:16:58 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-11 15:52:19 -0500, Tom Lane wrote: > >> I agree the scenario doesn't seem all that probable, but what scares me > >> here is that if we use "__builtin_constant_p(elevel) && (elevel) >= ERROR" > >> in some builds, and just "(elevel) >= ERROR" in others, then if there is > >> any code with a multiple-evaluation hazard, it is only buggy in the > >> latter builds. That's sufficiently nasty that I'm willing to give up > >> an optimization that we never had before 9.3 anyway. > > > Well, why use it at all then and not just rely on > > __builtin_unreachable() in any recent gcc (and llvm fwiw) and abort() > > otherwise? Then the code is small for anything recent (gcc 4.4 afair) > > and always consistently buggy. > > Uh ... because it's *not* unreachable if elevel < ERROR. Otherwise we'd > just mark errfinish as __attribute((noreturn)) and be done. Of course, > that's a gcc-ism too. Well, I mean with the double evaluation risk. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-11 16:16:58 -0500, Tom Lane wrote: >> Uh ... because it's *not* unreachable if elevel < ERROR. Otherwise we'd >> just mark errfinish as __attribute((noreturn)) and be done. Of course, >> that's a gcc-ism too. > Well, I mean with the double evaluation risk. Oh, are you saying you don't want to make the __builtin_constant_p addition? I'm not very satisfied with that answer. Right now, Peter's patch has added a class of bugs that never existed before 9.3, and yours would add more. It might well be that those classes are empty ... but *we can't know that*. I don't think that keeping a new optimization for non-gcc compilers is worth that risk. Postgres is already full of gcc-only optimizations, anyway. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-11 16:28:13 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-11 16:16:58 -0500, Tom Lane wrote: > >> Uh ... because it's *not* unreachable if elevel < ERROR. Otherwise we'd > >> just mark errfinish as __attribute((noreturn)) and be done. Of course, > >> that's a gcc-ism too. > > > Well, I mean with the double evaluation risk. > > Oh, are you saying you don't want to make the __builtin_constant_p > addition? Yes, that was what I wanted to say. > I'm not very satisfied with that answer. Right now, Peter's > patch has added a class of bugs that never existed before 9.3, and yours > would add more. It might well be that those classes are empty ... but > *we can't know that*. I don't think that keeping a new optimization for > non-gcc compilers is worth that risk. Postgres is already full of > gcc-only optimizations, anyway. ISTM that ereport() already has so strange behaviour wrt evaluation of arguments (i.e doing it only when the message is really logged) that its doesn't seem to add a real additional risk. It also means that we potentially need to add more quieting returns et al to keep the warning level on non-gcc compatible compilers low (probably only msvc matters here) which we all don't see on our primary compilers. Anyway, youve got a strong opinion and I don't, so ... Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-11 16:28:13 -0500, Tom Lane wrote: >> I'm not very satisfied with that answer. Right now, Peter's >> patch has added a class of bugs that never existed before 9.3, and yours >> would add more. It might well be that those classes are empty ... but >> *we can't know that*. I don't think that keeping a new optimization for >> non-gcc compilers is worth that risk. Postgres is already full of >> gcc-only optimizations, anyway. > ISTM that ereport() already has so strange behaviour wrt evaluation of > arguments (i.e doing it only when the message is really logged) that its > doesn't seem to add a real additional risk. Hm ... well, that's a point. I may be putting too much weight on the fact that any such bug for elevel would be a new one that never existed before. What do others think? regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Heikki Linnakangas
Date:
On 11.01.2013 23:49, Tom Lane wrote: > Andres Freund<andres@2ndquadrant.com> writes: >> On 2013-01-11 16:28:13 -0500, Tom Lane wrote: >>> I'm not very satisfied with that answer. Right now, Peter's >>> patch has added a class of bugs that never existed before 9.3, and yours >>> would add more. It might well be that those classes are empty ... but >>> *we can't know that*. I don't think that keeping a new optimization for >>> non-gcc compilers is worth that risk. Postgres is already full of >>> gcc-only optimizations, anyway. > >> ISTM that ereport() already has so strange behaviour wrt evaluation of >> arguments (i.e doing it only when the message is really logged) that its >> doesn't seem to add a real additional risk. > > Hm ... well, that's a point. I may be putting too much weight on the > fact that any such bug for elevel would be a new one that never existed > before. What do others think? It sure would be nice to avoid multiple evaluation. At least in xlog.c we use emode_for_corrupt_record() to pass the elevel, and it does have a side-effect. It's quite harmless in practice, you'll get some extra DEBUG1 messages in the log, but still. Here's one more construct to consider: #define ereport_domain(elevel, domain, rest) \do { \ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__,__LINE__, PG_FUNCNAME_MACRO, domain) || elevel_>= ERROR) \ { \ (void) rest; \ if (elevel_>= ERROR) \ errfinish_noreturn(1); \ else \ errfinish(1); \ } \} while(0) extern void errfinish(int dummy,...); extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn)); With do-while, ereport() would no longer be an expression. 
The only place where that's a problem in our codebase is in scan.l, bootscanner.l and repl_scanner.l, which do this: #undef fprintf #define fprintf(file, fmt, msg) ereport(ERROR, (errmsg_internal("%s", msg))) and flex does this: (void) fprintf( stderr, "%s\n", msg ); but that's easy to work around by creating a wrapper fprintf function in those .l files. When I compile with gcc -O0, I get one warning with this: datetime.c: In function ‘DateTimeParseError’: datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default] That suggests that the compiler didn't correctly deduce that ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. - Heikki
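Editorial sketch of the multiple-evaluation concern discussed above. This is a toy model, not PostgreSQL code: DEMO_ERROR, demo_level(), demo_report() and both macros are hypothetical names invented for illustration. It only shows why an expression-style macro that mentions its level argument twice runs a side-effecting level function twice, while a do-while form with a local variable runs it once.

```c
#include <assert.h>

#define DEMO_ERROR 20           /* hypothetical level value */

static int level_calls;         /* how often the level expression ran */
static int reports;             /* how many reports were "emitted" */

/* Hypothetical level function with a side effect. */
static int
demo_level(void)
{
    level_calls++;
    return DEMO_ERROR;
}

static void
demo_report(int elevel)
{
    (void) elevel;
    reports++;
}

static void
demo_after_error(void)
{
    /* placeholder for abort() or an unreachable marker */
}

/* Expression form: 'elevel' appears twice, so it is evaluated twice. */
#define REPORT_EXPR(elevel) \
    (demo_report(elevel), \
     ((elevel) >= DEMO_ERROR) ? demo_after_error() : (void) 0)

/* Statement form: the level is captured once in a local variable. */
#define REPORT_STMT(elevel) \
    do { \
        const int elevel_ = (elevel); \
        demo_report(elevel_); \
        if (elevel_ >= DEMO_ERROR) \
            demo_after_error(); \
    } while (0)

static int
evaluations_with_expr(void)
{
    level_calls = 0;
    REPORT_EXPR(demo_level());
    return level_calls;
}

static int
evaluations_with_stmt(void)
{
    level_calls = 0;
    REPORT_STMT(demo_level());
    return level_calls;
}

/* Usage: the expression form calls demo_level() twice, the other once. */
static void
demo_self_test(void)
{
    assert(evaluations_with_expr() == 2);
    assert(evaluations_with_stmt() == 1);
}
```

The price of the statement form, as noted in the message, is that the macro can no longer be used where an expression is required.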
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 12:26:37 +0200, Heikki Linnakangas wrote: > On 11.01.2013 23:49, Tom Lane wrote: > >Andres Freund<andres@2ndquadrant.com> writes: > >>On 2013-01-11 16:28:13 -0500, Tom Lane wrote: > >>>I'm not very satisfied with that answer. Right now, Peter's > >>>patch has added a class of bugs that never existed before 9.3, and yours > >>>would add more. It might well be that those classes are empty ... but > >>>*we can't know that*. I don't think that keeping a new optimization for > >>>non-gcc compilers is worth that risk. Postgres is already full of > >>>gcc-only optimizations, anyway. > > > >>ISTM that ereport() already has so strange behaviour wrt evaluation of > >>arguments (i.e doing it only when the message is really logged) that its > >>doesn't seem to add a real additional risk. > > > >Hm ... well, that's a point. I may be putting too much weight on the > >fact that any such bug for elevel would be a new one that never existed > >before. What do others think? > > It sure would be nice to avoid multiple evaluation. At least in xlog.c we > use emode_for_corrupt_record() to pass the elevel, and it does have a > side-effect. It's quite harmless in practice, you'll get some extra DEBUG1 > messages in the log, but still. > > Here's one more construct to consider: > > #define ereport_domain(elevel, domain, rest) \ > do { \ > const int elevel_ = elevel; \ > if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || > elevel_>= ERROR) \ > { \ > (void) rest; \ > if (elevel_>= ERROR) \ > errfinish_noreturn(1); \ > else \ > errfinish(1); \ > } \ > } while(0) > > extern void errfinish(int dummy,...); > extern void errfinish_noreturn(int dummy,...) __attribute__((noreturn)); I am afraid that that would bloat the code again as it would have to put that if() into each callsite whereas a variant that ends up using __builtin_unreachable() doesn't. So I think we should use __builtin_unreachable() while falling back to abort() as before. 
But that's really more a detail than anything fundamental about the approach. I'll play a bit. > With do-while, ereport() would no longer be an expression. The only place > where that's a problem in our codebase is in scan.l, bootscanner.l and > repl_scanner.l, which do this: I agree that would be easy to fix, so no problem there. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 11.01.2013 23:49, Tom Lane wrote: >> Hm ... well, that's a point. I may be putting too much weight on the >> fact that any such bug for elevel would be a new one that never existed >> before. What do others think? > It sure would be nice to avoid multiple evaluation. At least in xlog.c > we use emode_for_corrupt_record() to pass the elevel, and it does have a > side-effect. Um. I had forgotten about that example, but its existence is sufficient to reinforce my opinion that we must not risk double evaluation of the elevel argument. > Here's one more construct to consider: > #define ereport_domain(elevel, domain, rest) \ > do { \ > const int elevel_ = elevel; \ > if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || > elevel_>= ERROR) \ > { \ > (void) rest; \ > if (elevel_>= ERROR) \ > errfinish_noreturn(1); \ > else \ > errfinish(1); \ > } \ > } while(0) I don't care for that too much in detail -- if errstart were to return false (it shouldn't, but if it did) this would be utterly broken, because we'd be making error reporting calls that elog.c wouldn't be prepared to accept. We should stick to the technique of doing the ereport as today and then adding something that tells the compiler we don't expect to get here; preferably something that emits no added code. However, using a do-block with a local variable is definitely something worth considering. I'm getting less enamored of the __builtin_constant_p idea after finding out that the gcc boys seem to have curious ideas about what its semantics ought to be: https://bugzilla.redhat.com/show_bug.cgi?id=894515 regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 13:16:56 -0500, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: > > On 11.01.2013 23:49, Tom Lane wrote: > >> Hm ... well, that's a point. I may be putting too much weight on the > >> fact that any such bug for elevel would be a new one that never existed > >> before. What do others think? > > > It sure would be nice to avoid multiple evaluation. At least in xlog.c > > we use emode_for_corrupt_record() to pass the elevel, and it does have a > > side-effect. > > Um. I had forgotten about that example, but its existence is sufficient > to reinforce my opinion that we must not risk double evaluation of the > elevel argument. Unfortunately I agree :( > > Here's one more construct to consider: > > > #define ereport_domain(elevel, domain, rest) \ > > do { \ > > const int elevel_ = elevel; \ > > if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || > > elevel_>= ERROR) \ > > { \ > > (void) rest; \ > > if (elevel_>= ERROR) \ > > errfinish_noreturn(1); \ > > else \ > > errfinish(1); \ > > } \ > > } while(0) > > I don't care for that too much in detail -- if errstart were to return > false (it shouldn't, but if it did) this would be utterly broken, > because we'd be making error reporting calls that elog.c wouldn't be > prepared to accept. We should stick to the technique of doing the > ereport as today and then adding something that tells the compiler we > don't expect to get here; preferably something that emits no added code. Yea, I didn't really like it all that much either - but anyway, I have yet to find *any* version with a local variable that doesn't lead to a 200kb size increase :(. > However, using a do-block with a local variable is definitely something > worth considering. 
I'm getting less enamored of the __builtin_constant_p > idea after finding out that the gcc boys seem to have curious ideas > about what its semantics ought to be: > https://bugzilla.redhat.com/show_bug.cgi?id=894515 I wonder whether __builtin_choose_expr is any better? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Heikki Linnakangas
Date:
On 12.01.2013 20:42, Andres Freund wrote: > On 2013-01-12 13:16:56 -0500, Tom Lane wrote: >> Heikki Linnakangas<hlinnakangas@vmware.com> writes: >>> Here's one more construct to consider: >> >>> #define ereport_domain(elevel, domain, rest) \ >>> do { \ >>> const int elevel_ = elevel; \ >>> if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) || >>> elevel_>= ERROR) \ >>> { \ >>> (void) rest; \ >>> if (elevel_>= ERROR) \ >>> errfinish_noreturn(1); \ >>> else \ >>> errfinish(1); \ >>> } \ >>> } while(0) >> >> I don't care for that too much in detail -- if errstart were to return >> false (it shouldn't, but if it did) this would be utterly broken, >> because we'd be making error reporting calls that elog.c wouldn't be >> prepared to accept. We should stick to the technique of doing the >> ereport as today and then adding something that tells the compiler we >> don't expect to get here; preferably something that emits no added code. With the current ereport(), we'll call abort() if errstart returns false and elevel >= ERROR. That's even worse than making an error reporting calls that elog.c isn't prepared for. If we take that risk seriously, with bogus error reporting calls we at least have chance of catching it and reporting an error. > Yea, I didn't really like it all that much either - but anyway, I have > yet to find *any* version with a local variable that doesn't lead to > 200kb size increase :(. Hmm, that's strange. Assuming the optimizer optimizes away the paths in the macro that are never taken when elevel is a constant, I would expect the resulting binary to be smaller than what we have now. All those useless abort() calls must take up some space. 
Here's what I got with and without my patch on my laptop: -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched But when I build without --enable-debug, the situation reverses: -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched I suspect you're also just seeing bloat caused by more debugging symbols, which won't have any effect at runtime. It would be nice to have smaller debug-enabled builds, too, of course, but it's not that important. >> However, using a do-block with a local variable is definitely something >> worth considering. I'm getting less enamored of the __builtin_constant_p >> idea after finding out that the gcc boys seem to have curious ideas >> about what its semantics ought to be: >> https://bugzilla.redhat.com/show_bug.cgi?id=894515 > > I wonder whether __builtin_choose_expr is any better? I forgot to mention (and it wasn't visible in the snippet I posted) the reason I used __attribute__ ((noreturn)). We already use that attribute elsewhere, so this optimization wouldn't require anything more from the compiler than what we already take advantage of. I think that noreturn is more widely supported than these other constructs. Also, if I'm reading the MSDN docs correctly, while MSVC doesn't understand __attribute__ ((noreturn)) directly, it has an equivalent syntax _declspec(noreturn). So we could easily support MSVC with this approach. Attached is the complete patch I used for the above tests. - Heikki
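Editorial sketch of how a single "does not return" marker could cover both gcc and MSVC, as the message above suggests. DEMO_NORETURN, demo_fatal() and demo_checked_divide() are hypothetical names made up for this example; they are not taken from the attached patch. On compilers that match neither branch the marker expands to nothing, so the optimization is simply lost there.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portability wrapper around the two noreturn spellings. */
#if defined(__GNUC__)
#define DEMO_NORETURN __attribute__((noreturn))
#elif defined(_MSC_VER)
#define DEMO_NORETURN __declspec(noreturn)
#else
#define DEMO_NORETURN
#endif

DEMO_NORETURN static void demo_fatal(const char *msg);

/* Reports the message and terminates; never returns to the caller. */
static void
demo_fatal(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/*
 * Because demo_fatal() is marked noreturn, a compiler that honors the
 * marker knows the division below is only reached with b != 0.
 */
static int
demo_checked_divide(int a, int b)
{
    if (b == 0)
        demo_fatal("division by zero");
    return a / b;
}
```

A call such as demo_checked_divide(6, 3) takes the normal path; only a zero divisor reaches demo_fatal().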
Attachment
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 12.01.2013 20:42, Andres Freund wrote: >>> I don't care for that too much in detail -- if errstart were to return >>> false (it shouldn't, but if it did) this would be utterly broken, > With the current ereport(), we'll call abort() if errstart returns false > and elevel >= ERROR. That's even worse than making an error reporting > calls that elog.c isn't prepared for. No it isn't: you'd get a clean and easily interpretable abort. I am not sure exactly what would happen if we plow forward with calling elog.c functions after errstart returns false, but it wouldn't be good, and debugging a field report of such behavior wouldn't be easy. This is actually a disadvantage of the proposal to replace the abort() calls with __builtin_unreachable(), too. The gcc boys interpret the semantics of that as "if control reaches here, the behavior is undefined" --- and their notion of undefined generally seems to amount to "we will screw you as hard as we can". For example, they have no problem with using the assumption that control won't reach there to make deductions while optimizing the rest of the function. If it ever did happen, I would not want to have to debug it. The same goes for __attribute__((noreturn)), BTW. >> Yea, I didn't really like it all that much either - but anyway, I have >> yet to find *any* version with a local variable that doesn't lead to >> 200kb size increase :(. > Hmm, that's strange. Assuming the optimizer optimizes away the paths in > the macro that are never taken when elevel is a constant, I would expect > the resulting binary to be smaller than what we have now. I was wondering about that too, but haven't tried it locally yet. It could be that Andres was looking at an optimization level in which a register still got allocated for the local variable. 
> Here's what I got with and without my patch on my laptop: > -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch > -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched > But when I build without --enable-debug, the situation reverses: > -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch > -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched Yes, I noticed that too: these patches make the debug overhead grow substantially for some reason. It's better to look at the output of "size" rather than the executable's total file size. That way you don't need to rebuild without debug to see reality. (I guess you could also "strip" the executables for comparison purposes.) regards, tom lane
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: >>> It does *not* combine elog_start and elog_finish into one function if >>> varargs are available although that brings a rather measurable >>> size/performance benefit. >> Since you've apparently already done the measurement: how much? >> It would be a bit tedious to support two different infrastructures for >> elog(), but if it's a big enough win maybe we should. > Imo its pretty definitely a big enough win. So big I have a hard time > believing it can be true without negative effects somewhere else. Well, actually there's a pretty serious negative effect here, which is that when it's implemented this way it's impossible to save errno for %m before the elog argument list is evaluated. So I think this is a no-go. We've always had the contract that functions in the argument list could stomp on errno without care. If we switch to a do-while macro expansion it'd be possible to do something like do { int save_errno = errno; int elevel = whatever; elog_internal( save_errno, elevel, fmt, __VA_ARGS__ ); } while (0); but this would almost certainly result in more code bloat not less, since call sites would now be responsible for fetching errno. regards, tom lane
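Editorial sketch of the errno-saving idea from the message above, assuming C99 variadic macros. demo_elog(), demo_elog_internal() and demo_clobbering_arg() are hypothetical stand-ins, not the real elog machinery. The point is only that the macro reads errno at the call site before any argument expression runs, so an argument that overwrites errno cannot change what the reporter sees.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

static int  demo_seen_errno;    /* errno value handed to the reporter */
static char demo_message[128];  /* formatted message text */

/* Hypothetical reporter: receives the errno captured at the call site. */
static void
demo_elog_internal(int save_errno, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    demo_seen_errno = save_errno;
    va_start(ap, fmt);
    vsnprintf(demo_message, sizeof(demo_message), fmt, ap);
    va_end(ap);
}

/* errno is read before any argument expression is evaluated. */
#define demo_elog(...) \
    do { \
        int save_errno_ = errno; \
        demo_elog_internal(save_errno_, __VA_ARGS__); \
    } while (0)

/* An argument function that overwrites errno. */
static int
demo_clobbering_arg(void)
{
    errno = EINVAL;
    return 42;
}

/* Returns the errno value the reporter saw after the argument clobbered it. */
static int
demo_errno_seen_after_clobber(void)
{
    errno = ENOENT;
    demo_elog("value is %d", demo_clobbering_arg());
    return demo_seen_errno;
}
```

The cost mentioned in the message is visible here as well: every call site now contains its own read of errno.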
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-12 13:16:56 -0500, Tom Lane wrote: >> However, using a do-block with a local variable is definitely something >> worth considering. I'm getting less enamored of the __builtin_constant_p >> idea after finding out that the gcc boys seem to have curious ideas >> about what its semantics ought to be: >> https://bugzilla.redhat.com/show_bug.cgi?id=894515 > I wonder whether __builtin_choose_expr is any better? Right offhand I don't see how that helps us. AFAICS, __builtin_choose_expr is the same as x?y:z except that y and z are not required to have the same datatype, which is okay because they require the value of x to be determinable at compile time. So we couldn't just write __builtin_choose_expr((elevel) >= ERROR, ...). That would fail if elevel wasn't compile-time-constant. We could conceivably do __builtin_choose_expr(__builtin_constant_p(elevel) && (elevel) >= ERROR, ...) with the idea of making real sure that that expression reduces to a compile time constant ... but stacking two nonstandard constructs on each other seems to me to probably increase our exposure to gcc's definitional randomness rather than reduce it. I mean, if __builtin_constant_p can't already be trusted to act like a constant, why should we trust that __builtin_choose_expr doesn't also have a curious definition of constant-ness? regards, tom lane
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > When I compile with gcc -O0, I get one warning with this: > datetime.c: In function ‘DateTimeParseError’: > datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default] > That suggests that the compiler didn't correctly deduce that > ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. Yeah, I am seeing this too. It appears to be endemic to the local-variable approach, ie if we have const int elevel_ = (elevel); ... (elevel_ >= ERROR) ? pg_unreachable() : (void) 0 then we do not get the desired results at -O0, which is not terribly surprising --- I'd not really expect the compiler to propagate the value of elevel_ when not optimizing. If we don't use a local variable, we don't get the warning, which I take to mean that gcc will fold "ERROR >= ERROR" to true even at -O0, and that it does this early enough to conclude that unreachability holds. I experimented with some variant ways of phrasing the macro, but the only thing that worked at -O0 required __builtin_constant_p, which rather defeats the purpose of making this accessible to non-gcc compilers. If we go with the local-variable approach, we could probably suppress this warning by putting an abort() call at the bottom of DateTimeParseError. It seems a tad ugly though, and what's a bigger deal is that if the compiler is unable to deduce unreachability at -O0 then we are probably going to be fighting such bogus warnings for all time to come. Note also that an abort() (much less a pg_unreachable()) would not do anything positive like give us a compile warning if we mistakenly added a case that could fall through. On the other hand, if there's only one such case in our tree today, maybe I'm worrying too much. regards, tom lane
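Editorial sketch of the __builtin_constant_p test mentioned above, for gcc-compatible compilers only. DEMO_ERROR, DEMO_KNOWN_ERROR() and demo_level() are hypothetical names for this illustration. The check is written so that a non-constant level expression is not evaluated a second time: the builtin does not evaluate its argument, and when it yields 0 the && short-circuits. The behavior shown for a call with side effects is what current gcc and clang are understood to do; treat it as an assumption rather than a guarantee.

```c
#include <assert.h>

#define DEMO_ERROR 20   /* hypothetical level value */

#if defined(__GNUC__)
/* True only when the level is a compile-time constant >= DEMO_ERROR. */
#define DEMO_KNOWN_ERROR(elevel) \
    (__builtin_constant_p(elevel) && (elevel) >= DEMO_ERROR)
#else
/* Fallback: never claims knowledge, and never evaluates 'elevel'. */
#define DEMO_KNOWN_ERROR(elevel) (0 && (elevel) >= DEMO_ERROR)
#endif

static int demo_calls;  /* evaluations of the side-effecting level */

static int
demo_level(void)
{
    demo_calls++;
    return DEMO_ERROR;
}

/* Number of times the check itself evaluated a side-effecting level. */
static int
demo_side_effects_in_check(void)
{
    demo_calls = 0;
    (void) DEMO_KNOWN_ERROR(demo_level());
    return demo_calls;
}

/* Usage, assuming a gcc-compatible compiler. */
static void
demo_self_test(void)
{
    assert(DEMO_KNOWN_ERROR(DEMO_ERROR));   /* literal at ERROR level */
    assert(!DEMO_KNOWN_ERROR(10));          /* literal below ERROR */
    assert(demo_side_effects_in_check() == 0);
}
```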
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 15:39:16 -0500, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: > > On 12.01.2013 20:42, Andres Freund wrote: > >>> I don't care for that too much in detail -- if errstart were to return > >>> false (it shouldn't, but if it did) this would be utterly broken, > > > With the current ereport(), we'll call abort() if errstart returns false > > and elevel >= ERROR. That's even worse than making an error reporting > > calls that elog.c isn't prepared for. > > No it isn't: you'd get a clean and easily interpretable abort. I am not > sure exactly what would happen if we plow forward with calling elog.c > functions after errstart returns false, but it wouldn't be good, and > debugging a field report of such behavior wouldn't be easy. > > This is actually a disadvantage of the proposal to replace the abort() > calls with __builtin_unreachable(), too. The gcc boys interpret the > semantics of that as "if control reaches here, the behavior is > undefined" --- and their notion of undefined generally seems to amount > to "we will screw you as hard as we can". For example, they have no > problem with using the assumption that control won't reach there to make > deductions while optimizing the rest of the function. If it ever did > happen, I would not want to have to debug it. The same goes for > __attribute__((noreturn)), BTW. We could make add an Assert() additionally? > >> Yea, I didn't really like it all that much either - but anyway, I have > >> yet to find *any* version with a local variable that doesn't lead to > >> 200kb size increase :(. > > > Hmm, that's strange. Assuming the optimizer optimizes away the paths in > > the macro that are never taken when elevel is a constant, I would expect > > the resulting binary to be smaller than what we have now. > > I was wondering about that too, but haven't tried it locally yet. 
It > could be that Andres was looking at an optimization level in which a > register still got allocated for the local variable. Nope, it was O2, albeit with -fno-omit-frame-pointer (for usable hierarchical profiles). > > Here's what I got with and without my patch on my laptop: > > > -rwxr-xr-x 1 heikki heikki 24767237 tammi 12 21:27 postgres-do-while-mypatch > > -rwxr-xr-x 1 heikki heikki 24623081 tammi 12 21:28 postgres-unpatched > > > But when I build without --enable-debug, the situation reverses: > > > -rwxr-xr-x 1 heikki heikki 5961309 tammi 12 21:48 postgres-do-while-mypatch > > -rwxr-xr-x 1 heikki heikki 5986961 tammi 12 21:49 postgres-unpatched > > Yes, I noticed that too: these patches make the debug overhead grow > substantially for some reason. It's better to look at the output of > "size" rather than the executable's total file size. That way you don't > need to rebuild without debug to see reality. (I guess you could also But the above is spot on. While I used strip earlier I somehow forgot it here and so the debug overhead is part of what I saw. But, in comparison to my baseline (which is replacing the abort() with pg_unreachable()/__builtin_unreachable()) its still a loss performancewise which is why I didn't doubt my results... Pavel's testcase with changing ereport implementations (5 runs, minimal time): EREPORT_EXPR_OLD_NO_UNREACHABLE: 9964.015ms, 5530296 bytes (stripped) EREPORT_EXPR_OLD_ABORT: 9765.916ms, 5466936 bytes (stripped) EREPORT_EXPR_OLD_UNREACHABLE: 9646.502ms, 5462456 bytes (stripped) EREPORT_STMT_HEIKKI: 10133.968ms, 5435704 bytes (stripped) EREPORT_STMT_ANDRES: 9548.436ms, 5462232 bytes (stripped) Where my variant is: #define ereport_domain(elevel, domain, rest) \ do { \ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__,__LINE__, PG_FUNCNAME_MACRO, domain))\ { \ errfinish rest; \ } \ if (elevel_>= ERROR) \ pg_unreachable(); \ } while(0) So I suggest using that with an additional Assert(0) in the if(elevel_) branch. 
The assembler generated for my version is exactly the same as for Heikki's when elevel is constant, but it doesn't generate any additional code when elevel is not constant, and I guess that's where it gets faster? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
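Editorial sketch of the control flow in the macro variant quoted above, as a self-contained toy. All demo_* names and level values are hypothetical, and abort() stands in for pg_unreachable(). In this toy demo_errfinish() returns normally, so the sketch must not be called with a level at or above DEMO_ERROR; it is only meant to show that the level is captured once and that the message arguments are evaluated only when demo_errstart() accepts the message.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_LOG   15   /* hypothetical level values */
#define DEMO_ERROR 20

static int demo_min_level = DEMO_LOG;  /* messages below this are dropped */
static int demo_arg_calls;             /* evaluations of the message args */
static int demo_finished;              /* completed reports */

static int
demo_errstart(int elevel)
{
    return elevel >= demo_min_level;
}

static int
demo_errmsg(const char *text)
{
    assert(text != NULL);
    demo_arg_calls++;
    return 0;
}

static void
demo_errfinish(int dummy)
{
    (void) dummy;
    demo_finished++;
}

/* Same shape as the quoted variant: one finish call, then the level check. */
#define demo_ereport(elevel, rest) \
    do { \
        const int elevel_ = (elevel); \
        if (demo_errstart(elevel_)) \
        { \
            demo_errfinish rest; \
        } \
        if (elevel_ >= DEMO_ERROR) \
            abort(); \
    } while (0)

/* Returns how many times the message arguments ran for this level. */
static int
demo_args_evaluated_at(int elevel)
{
    demo_arg_calls = 0;
    demo_ereport(elevel, (demo_errmsg("hello")));
    return demo_arg_calls;
}
```

With the threshold at DEMO_LOG, a call at level 10 evaluates no arguments, while a call at DEMO_LOG evaluates them once.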
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 19:47:18 -0500, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: > > When I compile with gcc -O0, I get one warning with this: > > > datetime.c: In function ‘DateTimeParseError’: > > datetime.c:3575:1: warning: ‘noreturn’ function does return [enabled by default] > > > That suggests that the compiler didn't correctly deduce that > > ereport(ERROR, ...) doesn't return. With -O1, the warning goes away. > > Yeah, I am seeing this too. It appears to be endemic to the > local-variable approach, ie if we have > > const int elevel_ = (elevel); > ... > (elevel_ >= ERROR) ? pg_unreachable() : (void) 0 > > then we do not get the desired results at -O0, which is not terribly > surprising --- I'd not really expect the compiler to propagate the > value of elevel_ when not optimizing. > > If we don't use a local variable, we don't get the warning, which > I take to mean that gcc will fold "ERROR >= ERROR" to true even at -O0, > and that it does this early enough to conclude that unreachability > holds. > > I experimented with some variant ways of phrasing the macro, but the > only thing that worked at -O0 required __builtin_constant_p, which > rather defeats the purpose of making this accessible to non-gcc > compilers. Yea, it's rather sad. The only thing I can see is actually doing both variants like: #define ereport_domain(elevel, domain, rest) \ do { \ const int elevel_ = elevel; \ if (errstart(elevel_,__FILE__,__LINE__, PG_FUNCNAME_MACRO, domain))\ { \ errfinish rest; \ } \ if (pg_is_known_constant(elevel) && (elevel) >= ERROR) \ { \ pg_unreachable(); \ Assert(0); \ } \ else if (elevel_>= ERROR) \ { \ pg_unreachable(); \ Assert(0); \ } \ } while(0) but that obviously still relies on __builtin_constant_p giving reasonable answers. We should probably put that Assert(0) into pg_unreachable()... 
> If we go with the local-variable approach, we could probably suppress > this warning by putting an abort() call at the bottom of > DateTimeParseError. It seems a tad ugly though, and what's a bigger > deal is that if the compiler is unable to deduce unreachability at -O0 > then we are probably going to be fighting such bogus warnings for all > time to come. Note also that an abort() (much less a pg_unreachable()) > would not do anything positive like give us a compile warning if we > mistakenly added a case that could fall through. > > On the other hand, if there's only one such case in our tree today, > maybe I'm worrying too much. I think the only reason there's only one such case (two here actually, the second in renametrig) is that all the others have already been silenced, because the abort() didn't use to be there. And I think the danger you're alluding to above is a real one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 18:15:17 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-01-12 13:16:56 -0500, Tom Lane wrote: > >> However, using a do-block with a local variable is definitely something > >> worth considering. I'm getting less enamored of the __builtin_constant_p > >> idea after finding out that the gcc boys seem to have curious ideas > >> about what its semantics ought to be: > >> https://bugzilla.redhat.com/show_bug.cgi?id=894515 > > > I wonder whether __builtin_choose_expr is any better? > > Right offhand I don't see how that helps us. AFAICS, > __builtin_choose_expr is the same as x?y:z except that y and z are not > required to have the same datatype, which is okay because they require > the value of x to be determinable at compile time. So we couldn't just > write __builtin_choose_expr((elevel) >= ERROR, ...). That would fail > if elevel wasn't compile-time-constant. We could conceivably do > > __builtin_choose_expr(__builtin_constant_p(elevel) && (elevel) >= ERROR, ...) > > with the idea of making real sure that that expression reduces to a > compile time constant ... but stacking two nonstandard constructs on > each other seems to me to probably increase our exposure to gcc's > definitional randomness rather than reduce it. I mean, if > __builtin_constant_p can't already be trusted to act like a constant, > why should we trust that __builtin_choose_expr doesn't also have a > curious definition of constant-ness? I was thinking of something like the above, yes. It seems to me to generate code the compiler would need to really make sure the condition in __builtin_choose_expr() is really constant, so I hoped the checks inside are somewhat strict... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-01-12 15:39:16 -0500, Tom Lane wrote: >> This is actually a disadvantage of the proposal to replace the abort() >> calls with __builtin_unreachable(), too. The gcc boys interpret the >> semantics of that as "if control reaches here, the behavior is >> undefined" > We could make add an Assert() additionally? I see little value in that. In a non-assert build it will do nothing at all, and in an assert build it'd just add code bloat. I think there might be a good argument for defining pg_unreachable() as abort() in assert-enabled builds, even if the compiler has __builtin_unreachable(): that way, if the impossible happens, we'll find out about it in a predictable and debuggable way. And by definition, we're not so concerned about either performance or code bloat in assert builds. > Where my variant is: > #define ereport_domain(elevel, domain, rest) \ > do { \ > const int elevel_ = elevel; \ > if (errstart(elevel_,__FILE__, __LINE__, PG_FUNCNAME_MACRO, domain))\ > { \ > errfinish rest; \ > } \ > if (elevel_>= ERROR) \ > pg_unreachable(); \ > } while(0) This is the same implementation I'd arrived at after looking at Heikki's. I think we should probably use this for non-gcc builds. However, for recent gcc (those with __builtin_constant_p) I think that my original suggestion is better, ie don't use a local variable but instead use __builtin_constant_p(elevel) && (elevel) >= ERROR in the second test. That way we don't have the problem with unreachability tests failing at -O0. We may still have some issues with bogus warnings in other compilers, but I think most PG developers are using recent gcc. regards, tom lane
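Editorial sketch of the suggestion above to make the unreachable marker a plain abort() in assert-enabled builds. DEMO_ASSERT_CHECKING, demo_unreachable() and demo_is_fatal() are hypothetical names; the real configuration symbol and macro may differ. The gcc version test reflects that __builtin_unreachable() appeared in gcc 4.5; everything else falls back to abort().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical marker: a predictable crash in assert-enabled builds,
 * an optimizer hint where the builtin exists, abort() otherwise.
 */
#if defined(DEMO_ASSERT_CHECKING)
#define demo_unreachable() abort()
#elif defined(__GNUC__) && \
    (defined(__clang__) || __GNUC__ > 4 || \
     (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#define demo_unreachable() __builtin_unreachable()
#else
#define demo_unreachable() abort()
#endif

enum demo_level
{
    DEMO_LOG,
    DEMO_WARNING,
    DEMO_ERROR
};

/* Every enum value returns inside the switch; the tail is never reached. */
static int
demo_is_fatal(enum demo_level level)
{
    switch (level)
    {
        case DEMO_LOG:
        case DEMO_WARNING:
            return 0;
        case DEMO_ERROR:
            return 1;
    }
    demo_unreachable();
}

/* Usage: only valid enum values are passed, so the marker is never hit. */
static void
demo_self_test(void)
{
    assert(demo_is_fatal(DEMO_LOG) == 0);
    assert(demo_is_fatal(DEMO_ERROR) == 1);
}
```

Building with -DDEMO_ASSERT_CHECKING turns a violated assumption into an immediate abort() instead of undefined behavior, which is the trade-off argued for in the message.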
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-12 16:36:39 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > >>> It does *not* combine elog_start and elog_finish into one function if > >>> varargs are available although that brings a rather measurable > >>> size/performance benefit. > > >> Since you've apparently already done the measurement: how much? > >> It would be a bit tedious to support two different infrastructures for > >> elog(), but if it's a big enough win maybe we should. > > > Imo its pretty definitely a big enough win. So big I have a hard time > > believing it can be true without negative effects somewhere else. > > Well, actually there's a pretty serious negative effect here, which is > that when it's implemented this way it's impossible to save errno for %m > before the elog argument list is evaluated. So I think this is a no-go. > We've always had the contract that functions in the argument list could > stomp on errno without care. > > If we switch to a do-while macro expansion it'd be possible to do > something like > > do { > int save_errno = errno; > int elevel = whatever; > > elog_internal( save_errno, elevel, fmt, __VA__ARGS__ ); > } while (0); > > but this would almost certainly result in more code bloat not less, > since call sites would now be responsible for fetching errno. the numbers are: old definition: 10393.658ms, 5497912 bytes old definition + unreachable: 10011.102ms, 5469144 bytes stmt, two calls, unreachable: 10036.132ms, 5468792 bytes stmt, one call, unreachable: 9443.612ms, 5462232 bytes stmt, one call, unreachable, save errno: 9615.863ms, 5489688 bytes So while not saving errno is unsurprisingly better its still a win. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes: > the numbers are: > old definition: 10393.658ms, 5497912 bytes > old definition + unreachable: 10011.102ms, 5469144 bytes > stmt, two calls, unreachable: 10036.132ms, 5468792 bytes > stmt, one call, unreachable: 9443.612ms, 5462232 bytes > stmt, one call, unreachable, save errno: 9615.863ms, 5489688 bytes I find these numbers pretty hard to credit. Why should replacing two calls by one, in code paths that are not being taken, move the runtime so much? The argument that a net reduction of code size is a win doesn't work, because the last case is more code than any except the first. I think you're measuring some coincidental effect or other, not a reproducible performance improvement. Or there's a bug in the code you're using. regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-13 14:17:52 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > the numbers are:
> > old definition:                           10393.658ms, 5497912 bytes
> > old definition + unreachable:             10011.102ms, 5469144 bytes
> > stmt, two calls, unreachable:             10036.132ms, 5468792 bytes
> > stmt, one call, unreachable:               9443.612ms, 5462232 bytes
> > stmt, one call, unreachable, save errno:   9615.863ms, 5489688 bytes
>
> I find these numbers pretty hard to credit. Why should replacing two
> calls by one, in code paths that are not being taken, move the runtime
> so much? The argument that a net reduction of code size is a win
> doesn't work, because the last case is more code than any except the
> first.

There are quite a few elog(DEBUG*)s in the backend, and those are taken,
so I don't find it unreasonable that doing less work in those cases is
measurable.

And if you look at the disassembly of ERROR codepaths:

saving errno, one function:

Dump of assembler code for function palloc:
639     {
   0x0000000000736397 <+7>:     mov    %rdi,%rsi
   0x00000000007363b0 <+32>:    push   %rbp
   0x00000000007363b1 <+33>:    mov    %rsp,%rbp
   0x00000000007363b4 <+36>:    sub    $0x20,%rsp

640             /* duplicates MemoryContextAlloc to avoid increased overhead */
641             AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643             if (!AllocSizeIsValid(size))
   0x0000000000736390 <+0>:     cmp    $0x3fffffff,%rdi
   0x000000000073639a <+10>:    ja     0x7363b0 <palloc+32>

644                     elog(ERROR, "invalid memory alloc request size %lu",
   0x00000000007363b8 <+40>:    mov    %rdi,-0x8(%rbp)
   0x00000000007363bc <+44>:    callq  0x462010 <__errno_location@plt>
   0x00000000007363c1 <+49>:    mov    -0x8(%rbp),%rsi
   0x00000000007363c5 <+53>:    mov    $0x8ace30,%r9d
   0x00000000007363cb <+59>:    mov    $0x14,%ecx
   0x00000000007363d0 <+64>:    mov    $0x8acefe,%edx
   0x00000000007363d5 <+69>:    mov    $0x8ace58,%edi
   0x00000000007363da <+74>:    mov    %rsi,(%rsp)
   0x00000000007363de <+78>:    mov    (%rax),%r8d
   0x00000000007363e1 <+81>:    mov    $0x285,%esi
   0x00000000007363e6 <+86>:    xor    %eax,%eax
   0x00000000007363e8 <+88>:    callq  0x71a0b0 <elog_all>
   0x00000000007363ed:          nopl   (%rax)

645                              (unsigned long) size);
646
647             CurrentMemoryContext->isReset = false;
   0x000000000073639c <+12>:    mov    0x25c6e5(%rip),%rdi        # 0x992a88 <CurrentMemoryContext>
   0x00000000007363a7 <+23>:    movb   $0x0,0x30(%rdi)

648
649             return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
   0x00000000007363a3 <+19>:    mov    0x8(%rdi),%rax
   0x00000000007363ab <+27>:    mov    (%rax),%rax
   0x00000000007363ae <+30>:    jmpq   *%rax

End of assembler dump.

vs

Dump of assembler code for function palloc:
639     {
   0x00000000007382b0 <+0>:     push   %rbp
   0x00000000007382b1 <+1>:     mov    %rsp,%rbp
   0x00000000007382b4 <+4>:     push   %rbx
   0x00000000007382b5 <+5>:     mov    %rdi,%rbx
   0x00000000007382b8 <+8>:     sub    $0x8,%rsp

640             /* duplicates MemoryContextAlloc to avoid increased overhead */
641             AssertArg(MemoryContextIsValid(CurrentMemoryContext));
642
643             if (!AllocSizeIsValid(size))
   0x00000000007382bc <+12>:    cmp    $0x3fffffff,%rdi
   0x00000000007382c3 <+19>:    jbe    0x7382ed <palloc+61>

644                     elog(ERROR, "invalid memory alloc request size %lu",
   0x00000000007382c5 <+21>:    mov    $0x8aec9e,%edx
   0x00000000007382ca <+26>:    mov    $0x284,%esi
   0x00000000007382cf <+31>:    mov    $0x8aebd0,%edi
   0x00000000007382d4 <+36>:    callq  0x71bcb0 <elog_start>
   0x00000000007382d9 <+41>:    mov    %rbx,%rdx
   0x00000000007382dc <+44>:    mov    $0x8aec10,%esi
   0x00000000007382e1 <+49>:    mov    $0x14,%edi
   0x00000000007382e6 <+54>:    xor    %eax,%eax
   0x00000000007382e8 <+56>:    callq  0x71bac0 <elog_finish>

645                              (unsigned long) size);
646
647             CurrentMemoryContext->isReset = false;
   0x00000000007382ed <+61>:    mov    0x25bc54(%rip),%rdi        # 0x993f48 <CurrentMemoryContext>
   0x00000000007382fb <+75>:    movb   $0x0,0x30(%rdi)

648
649             return (*CurrentMemoryContext->methods->alloc) (CurrentMemoryContext, size);
   0x00000000007382f4 <+68>:    mov    %rbx,%rsi
   0x00000000007382f7 <+71>:    mov    0x8(%rdi),%rax
   0x00000000007382ff <+79>:    mov    (%rax),%rax
   0x0000000000738308 <+88>:    jmpq   *%rax
   0x000000000073830a:          nopw   0x0(%rax,%rax,1)

650     }
   0x0000000000738302 <+82>:    add    $0x8,%rsp
   0x0000000000738306 <+86>:    pop    %rbx
   0x0000000000738307 <+87>:    pop    %rbp

End of assembler dump.

If other critical paths look like this, it's not that surprising.

> I think you're measuring some coincidental effect or other, not a
> reproducible performance improvement. Or there's a bug in the code
> you're using.

Might be, I repeated all of it though.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-13 14:17:52 -0500, Tom Lane wrote:
>> I find these numbers pretty hard to credit.

> There are quite some elog(DEBUG*)s in the backend, and those are taken,
> so I don't find it unreasonable that doing less work in those cases is
> measurable.

Meh. If there are any elog(DEBUG)s in time-critical places, they should
be changed to ereport's anyway, so as to eliminate most of the overhead
when they're not firing.

> And if you look at the disassembly of ERROR codepaths:

I think your numbers are being twisted by -fno-omit-frame-pointer.
What I get, with the __builtin_unreachable version of the macro,
looks more like

MemoryContextAlloc:
        cmpq    $1073741823, %rsi
        pushq   %rbx
        movq    %rsi, %rbx
        ja      .L53
        movq    8(%rdi), %rax
        movb    $0, 48(%rdi)
        popq    %rbx
        movq    (%rax), %rax
        jmp     *%rax
.L53:
        movl    $__func__.5262, %edx
        movl    $576, %esi
        movl    $.LC2, %edi
        call    elog_start
        movq    %rbx, %rdx
        movl    $.LC3, %esi
        movl    $20, %edi
        xorl    %eax, %eax
        call    elog_finish

With -fno-omit-frame-pointer it's a little worse, but still not what you
show --- in particular, for me gcc still pushes the elog calls out of
the main code path. I don't think that the main path will get any
better with one elog function instead of two. It could easily get worse.
On many machines, the single-function version would be worse because of
needing to use more parameter registers, which would translate into more
save/restore work in the calling function, which is overhead that would
likely be paid whether elog actually gets called or not. (As an
example, in the above code gcc evidently isn't noticing that it doesn't
need to save/restore rbx so far as the main path is concerned.)

In any case, results from a single micro-benchmark on a single platform
with a single compiler version (and single set of compiler options)
don't convince me a whole lot here.

Basically, the aspects of this that I think are likely to be
reproducible wins across different platforms are (a) teaching the
compiler that elog(ERROR) doesn't return, and (b) reducing code size as
much as possible. The single-function change isn't going to help on
either ground --- maybe it would have helped on (b) without the errno
problem, but that's going to destroy any possible code size savings.

regards, tom lane
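The two-call shape visible in the assembly (a call to elog_start followed by a call to elog_finish) can be produced without variadic macros by a comma-operator macro. The sketch below shows that shape under stated assumptions: start_sketch, finish_sketch and elog_two_call are hypothetical names, and the functions only record their inputs rather than doing any real error reporting.

```c
#include <stdarg.h>
#include <stdio.h>

/* State handed from the first call to the second (hypothetical names). */
static const char *pending_file = NULL;
static int  pending_line = 0;
static char two_call_message[128];

/* First call: takes only the fixed call-site information. */
static void
start_sketch(const char *file, int line)
{
    pending_file = file;
    pending_line = line;
}

/* Second call: takes the level and the printf-style arguments. */
static void
finish_sketch(int elevel, const char *fmt, ...)
{
    va_list ap;

    (void) elevel;
    va_start(ap, fmt);
    vsnprintf(two_call_message, sizeof(two_call_message), fmt, ap);
    va_end(ap);
}

/*
 * Comma-operator trick: elog_two_call(20, "x") expands to
 * start_sketch(__FILE__, __LINE__), finish_sketch(20, "x"),
 * so the macro itself needs no variadic support.
 */
#define elog_two_call  start_sketch(__FILE__, __LINE__), finish_sketch

/* Returns the line number recorded by the first call. */
static int
demo_two_call(void)
{
    elog_two_call(20, "size %d", 7);
    return pending_line;
}
```

Each of the two calls passes only a few arguments, which is the register-pressure point made above; a single combined call has to pass all of them at once.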
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-13 15:44:58 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > And if you look at the disassembly of ERROR codepaths:
>
> I think your numbers are being twisted by -fno-omit-frame-pointer.
> What I get, with the __builtin_unreachable version of the macro,
> looks more like

The only difference is that the following two instructions aren't done:

   0x00000000007363b0 <+32>:    push   %rbp
   0x00000000007363b1 <+33>:    mov    %rsp,%rbp

Given that that function barely uses registers (from an amd64 pov at
least), that's not really surprising.

> MemoryContextAlloc:
>         cmpq    $1073741823, %rsi
>         pushq   %rbx
>         movq    %rsi, %rbx
>         ja      .L53
>         movq    8(%rdi), %rax
>         movb    $0, 48(%rdi)
>         popq    %rbx
>         movq    (%rax), %rax
>         jmp     *%rax
> .L53:
>         movl    $__func__.5262, %edx
>         movl    $576, %esi
>         movl    $.LC2, %edi
>         call    elog_start
>         movq    %rbx, %rdx
>         movl    $.LC3, %esi
>         movl    $20, %edi
>         xorl    %eax, %eax
>         call    elog_finish
>
> With -fno-omit-frame-pointer it's a little worse, but still not what you
> show --- in particular, for me gcc still pushes the elog calls out of
> the main code path. I don't think that the main path will get any
> better with one elog function instead of two. It could easily get worse.
> On many machines, the single-function version would be worse because of
> needing to use more parameter registers, which would translate into more
> save/restore work in the calling function, which is overhead that would
> likely be paid whether elog actually gets called or not. (As an
> example, in the above code gcc evidently isn't noticing that it doesn't
> need to save/restore rbx so far as the main path is concerned.)
>
> In any case, results from a single micro-benchmark on a single platform
> with a single compiler version (and single set of compiler options)
> don't convince me a whole lot here.

I am not convinced either, I just got curious whether it would be a win
after the __VA_ARGS__ thing made it possible. If the errno problem
didn't exist I would be pretty damn sure it's just about always a win,
but the way it is now...

We could make elog behave the same as ereport WRT argument evaluation,
but that seems a bit dangerous given it would still be different on some
platforms.

> Basically, the aspects of this that I think are likely to be
> reproducible wins across different platforms are (a) teaching the
> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
> much as possible. The single-function change isn't going to help on
> either ground --- maybe it would have helped on (b) without the errno
> problem, but that's going to destroy any possible code size savings.

Agreed. I am happy to produce an updated patch unless you're already on
it?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
>> Basically, the aspects of this that I think are likely to be
>> reproducible wins across different platforms are (a) teaching the
>> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
>> much as possible. The single-function change isn't going to help on
>> either ground --- maybe it would have helped on (b) without the errno
>> problem, but that's going to destroy any possible code size savings.

> Agreed. I am happy to produce an updated patch unless youre already on
> it?

On it now (busy testing on some old slow boxes, else I'd be done already).

regards, tom lane
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Robert Haas
Date:
On Sun, Jan 13, 2013 at 4:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> Basically, the aspects of this that I think are likely to be
>>> reproducible wins across different platforms are (a) teaching the
>>> compiler that elog(ERROR) doesn't return, and (b) reducing code size as
>>> much as possible. The single-function change isn't going to help on
>>> either ground --- maybe it would have helped on (b) without the errno
>>> problem, but that's going to destroy any possible code size savings.
>
>> Agreed. I am happy to produce an updated patch unless youre already on
>> it?
>
> On it now (busy testing on some old slow boxes, else I'd be done already).

Just a random thought here...

There are an awful lot of places in our source tree where the error
level is fixed. We could invent a new construct, say ereport_error or
so, that is just like ereport except that it takes no error-level
parameter because it's hard-coded to ERROR.

It would be a bit of a pain to change all of the existing call sites,
but presumably it would dodge a lot of these issues about the way
compilers optimize things, because we could simply say categorically
that ereport_error NEVER returns.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> There are an awful lot of places in our source tree where the error
> level is fixed. We could invent a new construct, say ereport_error or
> so, that is just like ereport except that it takes no error-level
> parameter because it's hard-coded to ERROR.

> It would be a bit of a pain to change all of the existing call sites,
> but presumably it would dodge a lot of these issues about the way
> compilers optimize things, because we could simply say categorically
> that ereport_error NEVER returns.

Meh. We've already got it working, and in a way that doesn't require
the compiler to understand __attribute__((noreturn)) --- it only has to
be aware that abort() doesn't return, in one fashion or another. So I'm
disinclined to run around and change a few thousand call sites, much
less expect extension authors to do so too. (By my count there are
about six thousand places we'd have to change.)

Note that whatever's going on on dugong is not a counterexample to
"got it working", because presumably dugong would also be misbehaving if
we'd used a different method of flagging all the ereports/elogs as
nonreturning.

regards, tom lane
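As a rough illustration of the abort()-based approach Tom mentions, the sketch below follows a reporting call with abort() whenever the level is at or above the error threshold. All names and level values here are hypothetical and this is not the actual macro in elog.h; in particular, the stand-in reporting function returns normally, whereas a real one would not return for errors, so the abort() would be there only to inform the compiler.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical level values, chosen only for this sketch. */
#define WARNING_SKETCH 19
#define ERROR_SKETCH   20

static int reports_logged = 0;

/* Hypothetical reporting function; it just counts and prints. */
static void
report_finish_sketch(int elevel, const char *msg)
{
    reports_logged++;
    fprintf(stderr, "level %d: %s\n", elevel, msg);
}

/*
 * When elevel is a compile-time constant >= ERROR_SKETCH, the compiler can
 * see that abort() always follows the call, so whatever comes after the
 * macro is known to be unreachable on that path.  No noreturn attribute
 * on the reporting function is required.
 */
#define report_sketch(elevel, msg) \
    do { \
        report_finish_sketch((elevel), (msg)); \
        if ((elevel) >= ERROR_SKETCH) \
            abort(); \
    } while (0)

static int
checked_divide(int num, int den)
{
    if (den == 0)
        report_sketch(ERROR_SKETCH, "division by zero");
    return num / den;           /* never reached with den == 0 */
}

static int
warn_and_continue(int value)
{
    if (value < 0)
        report_sketch(WARNING_SKETCH, "negative value");
    return value;               /* warnings fall through */
}
```

Because the level is an ordinary macro argument, existing call sites keep their current form, which is the advantage over introducing a separate ereport_error construct.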
Re: [PATCH 3/5] Split out xlog reading into its own module called xlogreader
From
Alvaro Herrera
Date:
Andres Freund wrote:
> The way xlog reading was done up to now made it impossible to use that
> nontrivial code outside of xlog.c although it is useful for different purposes
> like debugging wal (xlogdump) and decoding wal back into logical changes.

I have pushed this part after some more editorialization. Most notably,
(I think,) I removed the #ifdef FRONTEND piece. I think this should be
part of the pg_xlogdump patch. Sorry about this.

Also I changed XLogReaderAllocate() to not receive an initial read
starting point, because that seemed rather pointless. Both xlog.c and
the submitted pg_xlogdump use a non-invalid RecPtr in their first call
AFAICS.

There was a bug in xlog.c's usage of XLogReadRecord: it was calling
ereport() on the errorbuf when the return value was NULL, which is not
always sane because some errors do not set the errorbuf. I fixed that,
and I documented it more clearly in xlogreader.h (I hope).

I made some other minor stylistic changes, reworded a few comments, etc.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
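The errorbuf bug described above comes down to a calling convention: a NULL record does not guarantee that an error message was set. A minimal sketch of a caller that respects this follows. It uses a hypothetical reader, read_record_sketch, and does not reproduce the real XLogReadRecord signature or the xlogreader.h documentation.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct RecordSketch
{
    int length;
} RecordSketch;

static RecordSketch sample_record = { 32 };

/*
 * Hypothetical reader: returns NULL on failure.  *errormsg is set only for
 * some failures; at a plain end of data it is left NULL.
 */
static RecordSketch *
read_record_sketch(int position, const char **errormsg)
{
    *errormsg = NULL;
    if (position == 0)
        return &sample_record;
    if (position == 1)
    {
        *errormsg = "invalid record length";
        return NULL;
    }
    return NULL;                /* end of data, no message */
}

enum { READ_OK, READ_FAILED_WITH_MESSAGE, READ_FAILED_SILENTLY };

/* Reports a message only when the reader actually supplied one. */
static int
read_and_classify(int position)
{
    const char *errormsg;
    RecordSketch *record = read_record_sketch(position, &errormsg);

    if (record != NULL)
        return READ_OK;
    if (errormsg != NULL)
    {
        fprintf(stderr, "read failed: %s\n", errormsg);
        return READ_FAILED_WITH_MESSAGE;
    }
    return READ_FAILED_SILENTLY;
}
```

The buggy pattern would be to print the message unconditionally whenever the record is NULL.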
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-08 15:55:24 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Andres Freund wrote:
> >> Sorry, misremembered the problem somewhat. The problem is that code that
> >> includes postgres.h atm ends up with ExceptionalCondition() et
> >> al. declared even if FRONTEND is defined. So if anything uses an assert
> >> you need to provide wrappers for those which seems nasty. If they are
> >> provided centrally and check for FRONTEND that problem doesn't exist.
>
> > I think the right fix here is to fix things so that postgres.h is not
> > necessary. How hard is that? Maybe it just requires some more
> > reshuffling of xlog headers.
>
> That would definitely be the ideal fix. In general, #include'ing
> postgres.h into code that's not backend code opens all kinds of hazards
> that are likely to bite us sooner or later; the incompatibility of the
> Assert definitions is just the tip of that iceberg. (In the past we've
> had issues around <stdio.h> providing different definitions in the two
> environments, for example.)

I agree that going in that direction is a good thing, but imo that
doesn't really have any bearing on whether this patch is a good/bad
idea...

> But having said that, I'm also now remembering that we have files in
> src/port/ and probably elsewhere that try to work in both environments
> by just #include'ing c.h directly (relying on the Makefile to supply
> -DFRONTEND or not). If we wanted to support Assert use in such files,
> we would have to move at least the Assert() macro definitions into c.h
> as Andres is proposing. Now, I've always thought that #include'ing c.h
> directly was kind of an ugly hack, but I don't have a better design than
> that to offer ATM.

Imo having some common dumping ground for things that concern both
environments is a good idea. I don't think c.h would be the best place
for it when redesigning from the ground up, but we're not doing that, so
imo it's acceptable as long as we take care not to let it grow too much.

Here's a trivially updated patch which also defines AssertArg() for
FRONTEND-ish environments since Alvaro added one in xlogreader.c.

Do we agree this is the way forward, at least for now?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-09 15:07:10 -0500, Tom Lane wrote:
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting). If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.

Unfortunately nobody spoke up and I don't have any non-Intel hardware
anymore. Well, aside from my phone that is...

Updated patch attached, the previous version missed some contrib
modules.

I like the diffstat:

 41 files changed, 224 insertions(+), 636 deletions(-)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> Here's a trivially updated patch which also defines AssertArg() for
> FRONTEND-ish environments since Alvaro added one in xlogreader.c.

This time for real. Please.

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> > Here's a trivially updated patch which also defines AssertArg() for
> > FRONTEND-ish environments since Alvaro added one in xlogreader.c.
>
> This time for real. Please.

Here's a different idea: move all the Assert() and StaticAssert() etc
definitions from c.h and postgres.h into a new header, say pgassert.h.
That header is included directly by postgres.h (just like palloc.h and
elog.h already are) so we don't have to touch the backend code at all.
Frontend programs that want that functionality can just #include
"pgassert.h" by themselves. The definitions are (obviously) protected
by #ifdef FRONTEND so that it all works in both environments cleanly.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Here's a different idea: move all the Assert() and StaticAssert() etc
> definitions from c.h and postgres.h into a new header, say pgassert.h.
> That header is included directly by postgres.h (just like palloc.h and
> elog.h already are) so we don't have to touch the backend code at all.
> Frontend programs that want that functionality can just #include
> "pgassert.h" by themselves. The definitions are (obviously) protected
> by #ifdef FRONTEND so that it all works in both environments cleanly.

That might work. Files using c.h would have to include this too, but
as described it should work in either environment for them. The other
corner case is pg_controldata.c and other frontend programs that include
postgres.h --- but it looks like they #define FRONTEND first, so they'd
get the correct set of Assert definitions.

Whether it's really any better than just sticking them in c.h isn't
clear.

Really I'd prefer not to move the backend definitions out of postgres.h
at all, just because doing so will lose fifteen years of git history
about those particular lines (or at least make it a lot harder to
locate with git blame).

regards, tom lane
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-01-18 15:48:01 +0100, Andres Freund wrote:
> > > Here's a trivially updated patch which also defines AssertArg() for
> > > FRONTEND-ish environments since Alvaro added one in xlogreader.c.
> >
> > This time for real. Please.
>
> Here's a different idea: move all the Assert() and StaticAssert() etc
> definitions from c.h and postgres.h into a new header, say pgassert.h.
> That header is included directly by postgres.h (just like palloc.h and
> elog.h already are) so we don't have to touch the backend code at all.
> Frontend programs that want that functionality can just #include
> "pgassert.h" by themselves. The definitions are (obviously) protected
> by #ifdef FRONTEND so that it all works in both environments cleanly.

One slight problem with this is the common port/*.c idiom would require
an extra include:

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#include "pgassert.h"    /* <--- new line required here */
#endif   /* FRONTEND */

If this is seen as a serious issue (I don't think so) then we would have
to include pgassert.h into postgres_fe.h which would make the whole
thing pointless (i.e. the same as just having the definitions in c.h).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> One slight problem with this is the common port/*.c idiom would require
> an extra include:

> #ifndef FRONTEND
> #include "postgres.h"
> #else
> #include "postgres_fe.h"
> #include "pgassert.h"    /* <--- new line required here */
> #endif   /* FRONTEND */

> If this is seen as a serious issue (I don't think so) then we would have
> to include pgassert.h into postgres_fe.h which would make the whole
> thing pointless (i.e. the same as just having the definitions in c.h).

We'd only need to add that when/if the file started to use asserts,
so I don't think it's a problem. But still not sure it's better than
just putting the stuff in c.h.

regards, tom lane
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Here's a different idea: move all the Assert() and StaticAssert() etc
> > definitions from c.h and postgres.h into a new header, say pgassert.h.
> > That header is included directly by postgres.h (just like palloc.h and
> > elog.h already are) so we don't have to touch the backend code at all.
> > Frontend programs that want that functionality can just #include
> > "pgassert.h" by themselves. The definitions are (obviously) protected
> > by #ifdef FRONTEND so that it all works in both environments cleanly.
>
> That might work. Files using c.h would have to include this too, but
> as described it should work in either environment for them. The other
> corner case is pg_controldata.c and other frontend programs that include
> postgres.h --- but it looks like they #define FRONTEND first, so they'd
> get the correct set of Assert definitions.
>
> Whether it's really any better than just sticking them in c.h isn't
> clear.

I don't really see an advantage. To me providing sensible asserts sounds
like a good fit for c.h... And we already have StaticAssert*,
AssertVariableIsOfType* in c.h...

> Really I'd prefer not to move the backend definitions out of postgres.h
> at all, just because doing so will lose fifteen years of git history
> about those particular lines (or at least make it a lot harder to
> locate with git blame).

I can imagine some ugly macro solutions to that problem (pgassert.h
includes postgres.h with special defines), but that doesn't seem to be
warranted to me.

The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
postgres.h. If that's preferred I can prepare a patch for that, although
I prefer my proposal.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
>> Really I'd prefer not to move the backend definitions out of postgres.h
>> at all, just because doing so will lose fifteen years of git history
>> about those particular lines (or at least make it a lot harder to
>> locate with git blame).

> The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> postgres.h, if thats preferred I can prepare a patch for that although I
> prefer my proposal.

Yeah, surrounding the existing definitions with #ifndef FRONTEND was
what I was imagining. But on reflection that seems pretty darn ugly,
especially if the corresponding FRONTEND definitions are far away.
Maybe we should just live with the git history disconnect.

Right at the moment the c.h location seems like the best bet. I don't
see much advantage to Alvaro's proposal of a new header file.

regards, tom lane
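To make the c.h option concrete, here is a sketch of a single assertion macro that works in both environments. It illustrates the idea under discussion and is not the contents of the patch: AssertSketch and exceptional_condition_sketch are hypothetical names, the latter standing in for the backend's ExceptionalCondition(), and USE_ASSERT_CHECKING is assumed to be the symbol that enables assertions.

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef USE_ASSERT_CHECKING
/* Assertions compiled out entirely. */
#define AssertSketch(condition)  ((void) 0)

#elif defined(FRONTEND)
/* Frontend programs: fall back on the C library's assert(). */
#include <assert.h>
#define AssertSketch(condition)  assert(condition)

#else
/* Backend-style: report through a central failure routine (stand-in here). */
static void
exceptional_condition_sketch(const char *cond, const char *file, int line)
{
    fprintf(stderr, "TRAP: %s, File: \"%s\", Line: %d\n", cond, file, line);
    abort();
}

#define AssertSketch(condition) \
    do { \
        if (!(condition)) \
            exceptional_condition_sketch(#condition, __FILE__, __LINE__); \
    } while (0)
#endif

/* Code shared between frontend and backend can use the macro unchanged. */
static int
last_element(const int *values, int count)
{
    AssertSketch(count > 0);
    return values[count - 1];
}
```

Building the same file with -DFRONTEND, or with neither symbol defined, selects a different branch without any change to last_element(), which is what files that include c.h directly would rely on.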
Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Andres Freund
Date:
On 2013-01-18 11:11:50 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> >> Really I'd prefer not to move the backend definitions out of postgres.h
> >> at all, just because doing so will lose fifteen years of git history
> >> about those particular lines (or at least make it a lot harder to
> >> locate with git blame).
>
> > The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> > postgres.h, if thats preferred I can prepare a patch for that although I
> > prefer my proposal.
>
> Yeah, surrounding the existing definitions with #ifndef FRONTEND was
> what I was imagining. But on reflection that seems pretty darn ugly,
> especially if the corresponding FRONTEND definitions are far away.
> Maybe we should just live with the git history disconnect.

FWIW, git blame -M -C, while noticeably more expensive, seems to be able
to connect the history:

d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 746)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 747) #define Assert(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 748)         Trap(!(condition),"FailedAssertion")
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 749)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 750) #define AssertMacro(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 751)         ((void) TrapMacro(!(condition),"FailedAssertion"))
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 752)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 753) #define AssertArg(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 754)         Trap(!(condition),"BadArgument")
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 755)
d08741ea src/include/postgres.h (Tom Lane      2001-02-10 02:31:31 +0000 756) #define AssertState(condition) \
c5354dff src/include/postgres.h (Bruce Momjian 2002-08-10 20:29:18 +0000 757)         Trap(!(condition),"BadState")
8118de2b src/include/c.h        (Andres Freund 2012-12-18 00:58:36 +0100 758)

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Steve Singer
Date:
On 13-01-09 03:07 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Well, I *did* benchmark it as noted elsewhere in the thread, but thats
>> obviously just machine (E5520 x 2) with one rather restricted workload
>> (pgbench -S -jc 40 -T60). At least its rather palloc heavy.
>> Here are the numbers:
>> before:
>> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
>> after:
>> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
>> So on my system if there is a difference, its positive (0.12%).
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck. It struck me that we have a
> recent test case that's known to be really palloc-intensive, namely
> Pavel's example here:
> http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
>
> I set up a non-cassert build of commit
> 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> reduced the data-copying overhead for that). On my Fedora 16 machine
> (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> I get a runtime for Pavel's example of 17023 msec (average over five
> runs). I then applied oprofile and got a breakdown like this:
>
>   samples|      %|
> ------------------
>    108409 84.5083 /home/tgl/testversion/bin/postgres
>     13723 10.6975 /lib64/libc-2.14.90.so
>      3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10960    10.1495  AllocSetAlloc
> 6325      5.8572  MemoryContextAllocZeroAligned
> 6225      5.7646  base_yyparse
> 3765      3.4866  copyObject
> 2511      2.3253  MemoryContextAlloc
> 2292      2.1225  grouping_planner
> 2044      1.8928  SearchCatCache
> 1956      1.8113  core_yylex
> 1763      1.6326  expression_tree_walker
> 1347      1.2474  MemoryContextCreate
> 1340      1.2409  check_stack_depth
> 1276      1.1816  GetCachedPlan
> 1175      1.0881  AllocSetFree
> 1106      1.0242  GetSnapshotData
> 1106      1.0242  _SPI_execute_plan
> 1101      1.0196  extract_query_dependencies_walker
>
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:
>
>    107642 83.7427 /home/tgl/testversion/bin/postgres
>     14677 11.4183 /lib64/libc-2.14.90.so
>      3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10038     9.3537  AllocSetAlloc
> 6392      5.9562  MemoryContextAllocZeroAligned
> 5763      5.3701  base_yyparse
> 4810      4.4821  copyObject
> 2268      2.1134  grouping_planner
> 2178      2.0295  core_yylex
> 1963      1.8292  palloc
> 1867      1.7397  SearchCatCache
> 1835      1.7099  expression_tree_walker
> 1551      1.4453  check_stack_depth
> 1374      1.2803  _SPI_execute_plan
> 1282      1.1946  MemoryContextCreate
> 1187      1.1061  AllocSetFree
> ...
> 653       0.6085  palloc0
> ...
> 552       0.5144  MemoryContextAlloc
>
> The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> how did that get faster?
>
> I notice that the postgres executable is about 0.2% smaller, presumably
> because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> This makes me wonder if my result is due to chance improvements of cache
> line alignment for inner loops.
>
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting). If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.
>
> regards, tom lane
>

Sorry for the delay, I only read this thread today.

I just tried Pavel's test on a POWER5 machine with an older version of
gcc (see the grebe buildfarm animal for details):

78a5e738e:                      37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c:   38076.8035

The functions do seem to slightly slow things down on POWER. I haven't
bothered to run oprofile or tprof to get a breakdown of the functions
since Andres has already removed this from his patch.

Steve
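The percentages quoted in this message can be rechecked with a few lines of arithmetic. The sketch below hard-codes the figures from the thread; the helper names are made up for illustration. Note that the pgbench figures are throughput (higher is better), while the Xeon and POWER5 figures are runtimes (lower is better), so the sign of the change reads differently in each case.

```c
#include <stddef.h>

/* Arithmetic mean of a set of runs. */
static double
mean(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / (double) count;
}

/* Relative change from "before" to "after", in percent. */
static double
percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* pgbench results quoted from Andres (six runs each). */
static const double pgbench_before[] = {
    101646.763208, 101350.361595, 101421.425668,
    101571.211688, 101862.172051, 101449.857665
};
static const double pgbench_after[] = {
    101553.596257, 102132.277795, 101528.816229,
    101733.541792, 101438.531618, 101673.400992
};

/* Change in mean pgbench throughput, in percent. */
static double
pgbench_tps_change(void)
{
    return percent_change(mean(pgbench_before, 6), mean(pgbench_after, 6));
}
```

With these inputs, pgbench_tps_change() comes out at roughly +0.12%, percent_change(17023.0, 16666.0) at roughly -2.1% (Tom's Xeon runtime), and percent_change(37874.855, 38076.8035) at roughly +0.5% (Steve's POWER5 runtime, i.e. slightly slower).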
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-19 17:33:05 -0500, Steve Singer wrote:
> On 13-01-09 03:07 PM, Tom Lane wrote:
> >Andres Freund <andres@2ndquadrant.com> writes:
> >>Well, I *did* benchmark it as noted elsewhere in the thread, but thats
> >>obviously just machine (E5520 x 2) with one rather restricted workload
> >>(pgbench -S -jc 40 -T60). At least its rather palloc heavy.
> >>Here are the numbers:
> >>before:
> >>#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
> >>after:
> >>#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
> >>So on my system if there is a difference, its positive (0.12%).
> >pgbench-based testing doesn't fill me with a lot of confidence for this
> >--- its numbers contain a lot of communication overhead, not to mention
> >that pgbench itself can be a bottleneck. It struck me that we have a
> >recent test case that's known to be really palloc-intensive, namely
> >Pavel's example here:
> >http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
> >
> >I set up a non-cassert build of commit
> >78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> >reduced the data-copying overhead for that). On my Fedora 16 machine
> >(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> >I get a runtime for Pavel's example of 17023 msec (average over five
> >runs). I then applied oprofile and got a breakdown like this:
...
> >
> >The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> >how did that get faster?
> >
> >I notice that the postgres executable is about 0.2% smaller, presumably
> >because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> >This makes me wonder if my result is due to chance improvements of cache
> >line alignment for inner loops.
> >
> >I would like to know if other people get comparable results on other
> >hardware (non-Intel hardware would be especially interesting). If this
> >result holds up across a range of platforms, I'll withdraw my objection
> >to making palloc a plain function.
> >
> > regards, tom lane
> >
>
> Sorry for the delay; I only read this thread today.
>
>
> I just tried Pavel's test on a POWER5 machine with an older version of gcc
> (see the grebe buildfarm animal for details)
>
> 78a5e738e: 37874.855 (average of 6 runs)
> 78a5e738 + palloc.h + mcxt.c: 38076.8035
>
> The functions do seem to slightly slow things down on POWER. I haven't
> bothered to run oprofile or tprof to get a breakdown of the functions
> since Andres has already removed this from his patch.

I haven't removed it from the patch afaik, so it would be great to get a
profile here! It's "only" for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Steve Singer
Date:
On 13-01-21 02:28 AM, Andres Freund wrote:
>
> I haven't removed it from the patch afaik, so it would be great to get a
> profile here! It's "only" for xlogdump, but that tool helped me immensely
> and I don't want to maintain it independently...

Here is the output from tprof

Here is the baseline:

Average time 37862

Total Ticks For All Processes (./postgres_perftest) = 19089

Subroutine                 Ticks  %      Source           Address  Bytes
==========                 =====  ====== ======           =======  =====
.AllocSetAlloc             2270   0.99   aset.c           3bfb8    4b0
.base_yyparse              1217   0.53   gram.c           fe644    168ec
.text                      1015   0.44   copyfuncs.c      11bfb4   4430
.MemoryContextAllocZero    535    0.23   mcxt.c           3b990    f0
.MemoryContextAlloc        533    0.23   mcxt.c           3b780    e0
.check_stack_depth         392    0.17   postgres.c       568ac    100
.core_yylex                385    0.17   gram.c           fb5b4    1c90
.expression_tree_walker    347    0.15   nodeFuncs.c      50da4    750
.AllocSetFree              308    0.13   aset.c           3c998    1b0
.grouping_planner          242    0.11   planner.c        2399f0   1d10
.SearchCatCache            198    0.09   catcache.c       7ec20    550
._SPI_execute_plan         195    0.09   spi.c            2f0c18   7c0
.pfree                     195    0.09   mcxt.c           3b3b0    70
query_dependencies_walker  185    0.08   setrefs.c        255944   1b0
.GetSnapshotData           183    0.08   procarray.c      69efc    460
.query_tree_walker         176    0.08   nodeFuncs.c      50ae4    210
.strncpy                   168    0.07   strncpy.s        ba080    130
.fmgr_info_cxt_security    166    0.07   fmgr.c           3f7b0    850
.transformStmt             159    0.07   analyze.c        29091c   12d0
.text                      141    0.06   parse_collate.c  28ddf8   1
.ExecInitExpr              137    0.06   execQual.c       17f18c   15f0
.fix_expr_common           132    0.06   setrefs.c        2557e4   160
.standard_ExecutorStart    127    0.06   execMain.c       1d9a00   940
.GetCachedPlan             125    0.05   plancache.c      ce664    310
.strcpy                    121    0.05   noname           3bd40    1a8

With your changes (same test as I described before)

Average: 37938

Total Ticks For All Processes (./postgres_perftest) = 19311

Subroutine                 Ticks  %      Source           Address  Bytes
==========                 =====  ====== ======           =======  =====
.AllocSetAlloc             2162   2.17   aset.c           3bfb8    4b0
.base_yyparse              1242   1.25   gram.c           fdc7c    167f0
.text                      1028   1.03   copyfuncs.c      11b4d0   4210
.palloc                    553    0.56   mcxt.c           3b4c8    d0
.MemoryContextAllocZero    509    0.51   mcxt.c           3b9e8    f0
.core_yylex                413    0.41   gram.c           fac2c    1c60
.check_stack_depth         404    0.41   postgres.c       56730    100
.expression_tree_walker    320    0.32   nodeFuncs.c      50d28    750
.AllocSetFree              261    0.26   aset.c           3c998    1b0
._SPI_execute_plan         232    0.23   spi.c            2ee624   7c0
.GetSnapshotData           221    0.22   procarray.c      69d54    460
.grouping_planner          211    0.21   planner.c        237b60   1cf0
.MemoryContextAlloc        190    0.19   mcxt.c           3b738    e0
.query_tree_walker         184    0.18   nodeFuncs.c      50a68    210
.SearchCatCache            182    0.18   catcache.c       7ea08    550
.transformStmt             181    0.18   analyze.c        28e774   12d0
query_dependencies_walker  180    0.18   setrefs.c        25397c   1b0
.strncpy                   175    0.18   strncpy.s        b9a60    130
.MemoryContextCreate       167    0.17   mcxt.c           3bad8    160
.pfree                     151    0.15   mcxt.c           3b208    70
.strcpy                    150    0.15   noname           3bd40    1a8
.fmgr_info_cxt_security    146    0.15   fmgr.c           3f790    850
.text                      132    0.13   parse_collate.c  28bc50   1
.ExecInitExpr              125    0.13   execQual.c       17de28   15e0
.expression_tree_mutator   124    0.12   nodeFuncs.c      53268    1080
.strcmp                    122    0.12   noname           1e44     158
.fix_expr_common           117    0.12   setrefs.c        25381c   160

> Greetings,
>
> Andres Freund
>
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Andres Freund
Date:
On 2013-01-21 11:59:18 -0500, Steve Singer wrote:
> On 13-01-21 02:28 AM, Andres Freund wrote:
> >
> >I haven't removed it from the patch afaik, so it would be great to get a
> >profile here! It's "only" for xlogdump, but that tool helped me immensely
> >and I don't want to maintain it independently...
>
> Here is the output from tprof
>
> Here is the baseline:

Any chance to do the test on top of HEAD? The elog stuff has gone in only
afterwards and might actually affect this enough to be relevant.

Otherwise I have to say I am a bit confused by the mighty difference in
costs attributed to AllocSetAlloc given the number of calls should be
exactly the same and the number of "trampoline" function calls should
also stay the same.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Steve Singer
Date:
On 13-01-21 12:15 PM, Andres Freund wrote:
> On 2013-01-21 11:59:18 -0500, Steve Singer wrote:
>> On 13-01-21 02:28 AM, Andres Freund wrote:
>>> I haven't removed it from the patch afaik, so it would be great to get a
>>> profile here! It's "only" for xlogdump, but that tool helped me immensely
>>> and I don't want to maintain it independently...
>> Here is the output from tprof
>>
>> Here is the baseline:
> Any chance to do the test on top of HEAD? The elog stuff has gone in only
> afterwards and might actually affect this enough to be relevant.

HEAD(:fb197290)
Average: 28877

.AllocSetAlloc             1442   1.90   aset.c           3a938    4b0
.base_yyparse              1220   1.61   gram.c           fdbc0    168ec
.MemoryContextAlloc        485    0.64   mcxt.c           3a154    e0
.core_yylex                407    0.54   gram.c           fab30    1c90
.AllocSetFree              320    0.42   aset.c           3b318    1b0
.MemoryContextAllocZero    316    0.42   mcxt.c           3a364    f0
.grouping_planner          271    0.36   planner.c        2b0ce8   1d10
.strncpy                   256    0.34   strncpy.s        b8ca0    130
.expression_tree_walker    222    0.29   nodeFuncs.c      4f734    750
._SPI_execute_plan         221    0.29   spi.c            2fb230   840
.SearchCatCache            182    0.24   catcache.c       7d870    550
.GetSnapshotData           161    0.21   procarray.c      68acc    460
.fmgr_info_cxt_security    155    0.20   fmgr.c           3e130    850
.pfree                     152    0.20   mcxt.c           39d84    70
.expression_tree_mutator   151    0.20   nodeFuncs.c      51c84    1170
.check_stack_depth         142    0.19   postgres.c       5523c    100
.text                      126    0.17   parse_collate.c  233d90   1
.transformStmt             125    0.16   analyze.c        289e88   12c0
.ScanKeywordLookup         123    0.16   kwlookup.c       f7ab4    154
.new_list                  120    0.16   list.c           40f74    b0
.subquery_planner          115    0.15   planner.c        2b29f8   dc0
.GetCachedPlan             115    0.15   plancache.c      cdab0    310
.ExecInitExpr              114    0.15   execQual.c       17e690   15f0
.set_plan_refs             113    0.15   setrefs.c        252630   cb0
.standard_ExecutorStart    110    0.14   execMain.c       1d9264   940
.heap_form_tuple           107    0.14   heaptuple.c      177c84   2f0
.query_tree_walker         105    0.14   nodeFuncs.c      4f474    210

HEAD + patch:
Average: 29035

Total Ticks For All Processes (./postgres_perftest) = 15044

Subroutine                 Ticks  %      Source           Address  Bytes
==========                 =====  ====== ======           =======  =====
.AllocSetAlloc             1422   1.64   aset.c           3a938    4b0
.base_yyparse              1201   1.38   gram.c           fd1e8    167f0
.palloc                    470    0.54   mcxt.c           39e04    d0
.core_yylex                364    0.42   gram.c           fa198    1c60
.MemoryContextAllocZero    282    0.33   mcxt.c           3a324    f0
._SPI_execute_plan         250    0.29   spi.c            2f8c18   840
.AllocSetFree              244    0.28   aset.c           3b318    1b0
.strncpy                   234    0.27   strncpy.s        b86a0    130
.expression_tree_walker    229    0.26   nodeFuncs.c      4f698    750
.grouping_planner          176    0.20   planner.c        2aea84   1d30
.SearchCatCache            170    0.20   catcache.c       7d638    550
.GetSnapshotData           168    0.19   procarray.c      68904    460
.assign_collations_walker  155    0.18   parse_collate.c  231f4c   7e0
.subquery_planner          141    0.16   planner.c        2b07b4   dc0
.fmgr_info_cxt_security    141    0.16   fmgr.c           3e110    850
.check_stack_depth         140    0.16   postgres.c       550a0    100
.ExecInitExpr              138    0.16   execQual.c       17d2f8   15e0
.pfree                     137    0.16   mcxt.c           39b44    70
.transformStmt             132    0.15   analyze.c        287dec   12c0
.new_list                  128    0.15   list.c           40f44    90
.expression_tree_mutator   125    0.14   nodeFuncs.c      51bd8    1080
.preprocess_expression     118    0.14   planner.c        2adf54   1a0
.strcmp                    118    0.14   noname           1e44     158
.standard_ExecutorStart    115    0.13   execMain.c       1d77c0   940
.MemoryContextAlloc        111    0.13   mcxt.c           3a074    e0
.set_plan_refs             109    0.13   setrefs.c        2506c4   ca0

> Otherwise I have to say I am a bit confused by the mighty difference in
> costs attributed to AllocSetAlloc given the number of calls should be
> exactly the same and the number of "trampoline" function calls should
> also stay the same.
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
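To put the averages quoted in this thread on one scale, the relative change can be computed as below. This is only an arithmetic sketch; relative_change_pct is a hypothetical helper, not something from the patch or from tprof.

```c
/*
 * Relative change of `after` versus `before`, in percent.
 * A positive result means the patched build took longer.
 */
static double
relative_change_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the figures reported above: Tom's Xeon run (17023 to 16666 msec) comes out at roughly -2.1%, the POWER5 run on 78a5e738e (37874.855 to 38076.8035) at roughly +0.5%, and the POWER5 run on HEAD (28877 to 29035) also at roughly +0.5%.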
Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From
Heikki Linnakangas
Date:
For the record, on MSVC we can use __assume(0) to mark unreachable code. It does the same as gcc's __builtin_unreachable(). I tested it with the same palloc-heavy test case of Pavel's that you used earlier, with the one-shot plan commit temporarily reverted, and saw a speedup similar to the one you reported with gcc's __builtin_unreachable(). So, committed that.

- Heikki
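A minimal sketch of how the two compiler hints mentioned above could be wrapped behind one macro. The names toy_unreachable, ToyLevel and toy_level_name are hypothetical and not taken from the committed change; the abort() fallback for other compilers is likewise an assumption made for this example.

```c
#include <stdlib.h>

/* Pick the compiler-specific "this point is never reached" hint. */
#if defined(_MSC_VER)
#define toy_unreachable()	__assume(0)
#elif defined(__GNUC__)			/* gcc, and compilers that emulate it */
#define toy_unreachable()	__builtin_unreachable()
#else
#define toy_unreachable()	abort()	/* assumed fallback: fail loudly */
#endif

typedef enum ToyLevel
{
	TOY_LOG,
	TOY_WARNING,
	TOY_ERROR
} ToyLevel;

/* Every enum value returns inside the switch, so the tail is unreachable. */
static const char *
toy_level_name(ToyLevel level)
{
	switch (level)
	{
		case TOY_LOG:
			return "LOG";
		case TOY_WARNING:
			return "WARNING";
		case TOY_ERROR:
			return "ERROR";
	}
	toy_unreachable();
}
```

The hint lets the compiler drop code for paths that cannot happen. If control does reach such a point, behaviour is undefined under both __assume(0) and __builtin_unreachable(), so the marker belongs only where the claim is certain.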
Re: Re: [PATCH 1/5] Centralize Assert* macros into c.h so its common between backend/frontend
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-01-18 10:33:16 -0500, Tom Lane wrote:
> >> Really I'd prefer not to move the backend definitions out of postgres.h
> >> at all, just because doing so will lose fifteen years of git history
> >> about those particular lines (or at least make it a lot harder to
> >> locate with git blame).
>
> > The alternative seems to be sprinkling a few more #ifdef FRONTEND's in
> > postgres.h, if thats preferred I can prepare a patch for that although I
> > prefer my proposal.
>
> Yeah, surrounding the existing definitions with #ifndef FRONTEND was
> what I was imagining. But on reflection that seems pretty darn ugly,
> especially if the corresponding FRONTEND definitions are far away.
> Maybe we should just live with the git history disconnect.
>
> Right at the moment the c.h location seems like the best bet.

Since there seems to be consensus that this is the best way to go, I
have committed it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I didn't like this bit too much:

> diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
> new file mode 100644
> index 0000000..e947e0d
> --- /dev/null
> +++ b/contrib/pg_xlogdump/tables.c
> @@ -0,0 +1,78 @@
> +/*
> + * RmgrTable linked only to functions available outside of the backend.
> + *
> + * needs to be synced with src/backend/access/transam/rmgr.c
> + */
> +const RmgrData RmgrTable[RM_MAX_ID + 1] = {
> +    {"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
> +    {"Transaction", NULL, xact_desc, NULL, NULL, NULL},

So I propose the following patch instead. This lets pg_xlogdump compile
only the file it needs, and not concern itself with the rest of rmgr.c.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
>
> I didn't like this bit too much:
>
> > diff --git a/contrib/pg_xlogdump/tables.c b/contrib/pg_xlogdump/tables.c
> > new file mode 100644
> > index 0000000..e947e0d
> > --- /dev/null
> > +++ b/contrib/pg_xlogdump/tables.c
> > @@ -0,0 +1,78 @@
>
> > +/*
> > + * RmgrTable linked only to functions available outside of the backend.
> > + *
> > + * needs to be synced with src/backend/access/transam/rmgr.c
> > + */
> > +const RmgrData RmgrTable[RM_MAX_ID + 1] = {
> > +    {"XLOG", NULL, xlog_desc, NULL, NULL, NULL},
> > +    {"Transaction", NULL, xact_desc, NULL, NULL, NULL},
>
> So I propose the following patch instead. This lets pg_xlogdump compile
> only the file it needs, and not concern itself with the rest of rmgr.c.

Hm. I'm not sure what the advantage of that is: it duplicates about as
much and makes it harder to copy & paste between the two files.
I am fine with it, I just don't see a clear advantage either way.

> diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
> index ce9957e..b751882 100644
> --- a/src/include/access/xlog_internal.h
> +++ b/src/include/access/xlog_internal.h
> @@ -231,15 +231,17 @@ typedef struct xl_end_of_recovery
>  struct XLogRecord;
>
>  /*
> - * Method table for resource managers.
> + * Method tables for resource managers.
>   *
> - * RmgrTable[] is indexed by RmgrId values (see rmgr.h).
> + * RmgrDescData (for textual descriptor functions) is split so that the file it
> + * lives in can be used by frontend programs.
> + *
> + * RmgrTable[] and RmgrDescTable[] are indexed by RmgrId values (see rmgr.h).
>   */
>  typedef struct RmgrData
>  {
>      const char *rm_name;
>      void (*rm_redo) (XLogRecPtr lsn, struct XLogRecord *rptr);
> -    void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
>      void (*rm_startup) (void);
>      void (*rm_cleanup) (void);
>      bool (*rm_safe_restartpoint) (void);
> @@ -247,6 +249,14 @@ typedef struct RmgrData
>
>  extern const RmgrData RmgrTable[];
>
> +typedef struct RmgrDescData
> +{
> +    void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> +} RmgrDescData;
> +
> +extern const RmgrDescData RmgrDescTable[];
> +

I think we would at least need to include rm_name here as well.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
> On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:

> > +typedef struct RmgrDescData
> > +{
> > +    void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > +} RmgrDescData;
> > +
> > +extern const RmgrDescData RmgrDescTable[];
> > +
>
> I think we would at least need to include rm_name here as well.

Not rm_name (that would be duplicative), but a comment with each
rmgr's name on its line should be sufficient.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-02-04 14:55:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
>
> > > +typedef struct RmgrDescData
> > > +{
> > > +    void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > > +} RmgrDescData;
> > > +
> > > +extern const RmgrDescData RmgrDescTable[];
> > > +
> >
> > I think we would at least need to include rm_name here as well.
>
> Not rm_name (that would be duplicative), but a comment with each
> rmgr's name on its line should be sufficient.

Maybe I am misunderstanding what you want to do, but xlogdump prints the
name of the rmgr and allows filtering by it; how would we do that
without rm_name?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
> On 2013-02-04 14:55:56 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2013-02-04 13:22:26 -0300, Alvaro Herrera wrote:
> >
> > > > +typedef struct RmgrDescData
> > > > +{
> > > > +    void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
> > > > +} RmgrDescData;
> > > > +
> > > > +extern const RmgrDescData RmgrDescTable[];
> > > > +
> > >
> > > I think we would at least need to include rm_name here as well.
> >
> > Not rm_name (that would be duplicative), but a comment with each
> > rmgr's name on its line should be sufficient.
>
> Maybe I am misunderstanding what you want to do, but xlogdump prints the
> name of the rmgr and allows filtering by it; how would we do that
> without rm_name?

Oh, I thought you wanted it for code documentation purposes only. Okay,
that makes sense. (Of course, we'd remove it from RmgrTable).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
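The outcome of this exchange (a descriptor table that carries rm_name as well as rm_desc, so a frontend tool can both print and filter by resource manager name) could look roughly like the sketch below. It is a toy, not the patch: ToyRmgrDesc, toy_desc_table, toy_rmgr_by_name and the simplified rm_desc signature (a plain char buffer instead of StringInfo) are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Toy descriptor entry: name plus textual-description callback only. */
typedef struct ToyRmgrDesc
{
	const char *rm_name;
	void		(*rm_desc) (char *buf, size_t buflen, unsigned char xl_info);
} ToyRmgrDesc;

static void
toy_xlog_desc(char *buf, size_t buflen, unsigned char xl_info)
{
	snprintf(buf, buflen, "xlog info 0x%02X", (unsigned int) xl_info);
}

static void
toy_xact_desc(char *buf, size_t buflen, unsigned char xl_info)
{
	snprintf(buf, buflen, "xact info 0x%02X", (unsigned int) xl_info);
}

/* Indexed by a (toy) resource manager id: 0 = XLOG, 1 = Transaction. */
static const ToyRmgrDesc toy_desc_table[] = {
	{"XLOG", toy_xlog_desc},
	{"Transaction", toy_xact_desc},
};

#define TOY_RM_COUNT \
	((int) (sizeof(toy_desc_table) / sizeof(toy_desc_table[0])))

/* Return the table index for a name, or -1 if no entry matches. */
static int
toy_rmgr_by_name(const char *name)
{
	int			i;

	for (i = 0; i < TOY_RM_COUNT; i++)
	{
		if (strcmp(toy_desc_table[i].rm_name, name) == 0)
			return i;
	}
	return -1;
}
```

A filter option would resolve the user-supplied name once with toy_rmgr_by_name() and then compare each record's resource manager id against the result; without a name in the table there is nothing to match against.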
Re: [PATCH 2/5] Make relpathbackend return a statically result instead of palloc()'ing it
From
Noah Misch
Date:
On Tue, Jan 08, 2013 at 08:02:16PM -0500, Robert Haas wrote:
> On Tue, Jan 8, 2013 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> I was thinking more about a sprintf()-type function that only
> >> understands a handful of escapes, but adds the additional and novel
> >> escapes %I (quote as identifier) and %L (quote as literal). I think
> >> that would allow a great deal of code simplification, and it'd be more
> >> efficient, too.
> >
> > Seems like a great idea. Are you offering to code it?
>
> Not imminently.
>
> > Note that this wouldn't entirely fix the fmtId problem, as not all the
> > uses of fmtId are directly in sprintf calls. Still, it might get rid of
> > most of the places where it'd be painful to avoid a memory leak with
> > a strdup'ing version of fmtId.
>
> Yeah, I didn't think about that. Might be worth a look to see how
> comprehensively it would solve the problem. But I'll have to leave
> that for another day.

As a cheap solution addressing 95+% of the trouble, how about giving fmtId() a
ring of, say, eight static buffers to cycle through? That satisfies code
merely wishing to pass multiple fmtId()'d values to one appendPQExpBuffer().

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
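The ring-of-static-buffers idea can be sketched as follows. This is a toy, not fmtId(): toy_quote_ident, TOY_RING_SIZE and TOY_BUF_LEN are hypothetical, the function always quotes (the real one quotes only when needed), and over-long input is silently truncated to keep the example short.

```c
#include <stdio.h>
#include <string.h>

#define TOY_RING_SIZE	8		/* results stay valid for this many calls */
#define TOY_BUF_LEN		128		/* fixed per-slot capacity (toy limit) */

/*
 * Return ident wrapped in double quotes, with embedded quotes doubled.
 * The result lives in one of TOY_RING_SIZE static buffers used in rotation,
 * so up to TOY_RING_SIZE results can be alive at once without any free().
 */
static const char *
toy_quote_ident(const char *ident)
{
	static char ring[TOY_RING_SIZE][TOY_BUF_LEN];
	static int	next = 0;
	char	   *buf = ring[next];
	size_t		pos = 0;

	next = (next + 1) % TOY_RING_SIZE;

	buf[pos++] = '"';
	/* Leave room for a doubled quote, the closing quote and the NUL. */
	for (; *ident != '\0' && pos < TOY_BUF_LEN - 3; ident++)
	{
		if (*ident == '"')
			buf[pos++] = '"';
		buf[pos++] = *ident;
	}
	buf[pos++] = '"';
	buf[pos] = '\0';
	return buf;
}
```

With this shape, a call such as printf("%s.%s", toy_quote_ident(a), toy_quote_ident(b)) is safe because the two results occupy different slots. The trade-off is that a pointer kept across more than TOY_RING_SIZE - 1 later calls is silently overwritten, which is why the proposal is framed as covering most uses rather than all of them.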