From 0022ac81f9eb4f15d7ac43e52767f446e58df1fc Mon Sep 17 00:00:00 2001 From: Greg Lamberson Date: Fri, 10 Apr 2026 07:27:14 -0500 Subject: [PATCH v1 1/2] Make sync.c syncsw[] extensible via register_sync_handler() sync.c's syncsw[] dispatch table is currently a static const array indexed by the SyncRequestHandler enum. Extension-provided storage managers cannot add their own sync handlers: the only options are to fake being SYNC_HANDLER_MD (which only works if the extension's data lives as md segment files) or to bypass sync.c entirely (losing checkpoint batching, cycle_ctr semantics, SYNC_FORGET_REQUEST and SYNC_FILTER_REQUEST cancellation, and the existing ring-buffer protocol). This commit adds a public register_sync_handler() API parallel to smgr_register(), converts syncsw[] to a heap-allocated dynamic array, and pre-registers the five built-in handlers (MD, CLOG, commit_ts, multixact_offset, multixact_member) so their canonical enum values 0..4 are preserved exactly. Extension IDs start at SYNC_HANDLER_FIRST_DYNAMIC. Extensions call register_sync_handler() from _PG_init() during shared_preload_libraries load; late calls raise FATAL. Built-in registration uses a private sync_handler_register_internal() helper that bypasses the preload-phase guard, because InitSync() runs in auxiliary processes after preload is done (for idempotency: the NSyncHandlers == 0 guard makes repeated calls safe). Built-in registration happens in the postmaster before process_shared_preload_libraries() runs, so the five built-ins always occupy IDs 0..4 before any extension gets a chance to claim an ID. This is the key ordering constraint: the enum values SYNC_HANDLER_MD=0 et al. are compile-time constants that the rest of the tree still uses, so the runtime table must be populated in that exact order. Design notes: - SyncOps is moved to a public typedef in sync.h so extensions can declare const SyncOps at file scope. It is placed after FileTag because its function-pointer fields take const FileTag *. - SYNC_HANDLER_NONE becomes -1 (sentinel) instead of 5. SYNC_HANDLER_FIRST_DYNAMIC takes the post-MULTIXACT_MEMBER slot that SYNC_HANDLER_NONE used to occupy. - Assert(NSyncHandlers == SYNC_HANDLER_FIRST_DYNAMIC) enforces the invariant: if a new built-in is added to the enum, the build fail-fasts at first boot until a matching sync_handler_register_ internal() call is added to InitSyncHandlers(). - ProcessSyncRequests() and related dispatch sites gain defensive bounds-check Asserts. - No WAL format changes. No FileTag layout changes. No shared memory changes. Dispatch hot path: because syncsw is now a mutable static pointer instead of a const array, GCC cannot hoist the base-address load out of the dispatch loop on its own (it must conservatively assume the pointer could change between iterations). To recover the stock register allocation, each caller that dispatches through the table caches the base pointer in a local at function entry: SyncOps *ops = syncsw; ... if (ops[entry->tag.handler].sync_syncfiletag(&entry->tag, path) == 0) This applies to ProcessSyncRequests() and SyncPostCheckpoint(). The SYNC_FILTER_REQUEST branch of RememberSyncRequest() caches the per- handler filetagmatches function pointer directly since both match loops call the same one. With this hoisting, the per-dispatch instruction sequence compiles to byte-identical assembly as the pre-patch static const array (verified with objdump on GCC 14.2 at -O2). The only remaining delta is one additional memory load at function entry to fetch the syncsw pointer: a single L1-cache hit, paid once per call to ProcessSyncRequests, not per dispatch. Motivating use case: lamstore, a new extent-based storage backend whose data lives at byte offsets inside a volume rather than as md segment files. Additional beneficiaries include the fsync validation tooling Heikki Linnakangas and Andres Freund have asked for in passing (quoted in Tristan Partin's 2024-01-12 message in the CF 5616 thread): with this API, a validation extension can register its own handler and observe every sync request regardless of the underlying storage manager. Relationship to commitfest 5616 'Extensible storage manager API': this is a narrow, focused companion, not a dependency or replacement. 5616 makes smgrsw[] dynamic; this patch does the same for syncsw[]. The two are orthogonal: this patch applies cleanly against master today without 5616, and does not depend on any of 5616's unresolved design questions (catalog recovery, GUC chaining, per-tablespace vs per-relation configuration, performance benchmarks). If 5616 lands later, the two patches compose naturally: extensions will call smgr_register() and register_sync_handler() back to back in their _PG_init. Tested: make check-world passes. See the next commit which adds src/test/modules/test_sync_handler, a minimal extension and TAP test exercising registration, dispatch, HASH_BLOBS coalescing, and cycle_ctr skip behavior. Signed-off-by: Greg Lamberson --- src/backend/postmaster/postmaster.c | 11 ++ src/backend/storage/sync/sync.c | 259 ++++++++++++++++++++++++---- src/include/storage/sync.h | 64 ++++++- 3 files changed, 293 insertions(+), 41 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 6e0f41d2661..8d2ab37ce26 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -116,6 +116,7 @@ #include "storage/pmsignal.h" #include "storage/proc.h" #include "storage/shmem_internal.h" +#include "storage/sync.h" #include "tcop/backend_startup.h" #include "tcop/tcopprot.h" #include "utils/datetime.h" @@ -929,6 +930,16 @@ PostmasterMain(int argc, char *argv[]) */ RegisterBuiltinShmemCallbacks(); + /* + * Register the built-in sync handlers (md, CLOG, commit_ts, + * multixact_offset, multixact_member). This must happen before + * process_shared_preload_libraries() so that extensions which + * call register_sync_handler() from their _PG_init() receive IDs + * starting at SYNC_HANDLER_FIRST_DYNAMIC instead of colliding + * with the built-in slots. + */ + InitSyncHandlers(); + /* * process any libraries that should be preloaded at postmaster start */ diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c index 2c964b6f3d9..222829310cf 100644 --- a/src/backend/storage/sync/sync.c +++ b/src/backend/storage/sync/sync.c @@ -80,50 +80,200 @@ static CycleCtr checkpoint_cycle_ctr = 0; #define UNLINKS_PER_ABSORB 10 /* - * Function pointers for handling sync and unlink requests. + * Sync handler dispatch table. + * + * Populated by register_sync_handler(), which is called from InitSync() + * for the built-in handlers and from extension _PG_init() functions + * for extension handlers. After shared_preload_libraries finishes + * loading, syncsw[] is effectively immutable: every backend and the + * checkpointer inherit the same fully-populated array via fork() from + * the postmaster. + * + * SyncOps itself is defined in sync.h so that extensions can declare + * const SyncOps instances at file scope. */ -typedef struct SyncOps -{ - int (*sync_syncfiletag) (const FileTag *ftag, char *path); - int (*sync_unlinkfiletag) (const FileTag *ftag, char *path); - bool (*sync_filetagmatches) (const FileTag *ftag, - const FileTag *candidate); -} SyncOps; +static SyncOps *syncsw = NULL; +static const char **sync_handler_names = NULL; +static int NSyncHandlers = 0; +static int sync_handlers_capacity = 0; /* - * These indexes must correspond to the values of the SyncRequestHandler enum. + * Built-in SyncOps, registered in enum order during InitSync() so that + * SYNC_HANDLER_MD == 0, SYNC_HANDLER_CLOG == 1, etc. */ -static const SyncOps syncsw[] = { - /* magnetic disk */ - [SYNC_HANDLER_MD] = { - .sync_syncfiletag = mdsyncfiletag, - .sync_unlinkfiletag = mdunlinkfiletag, - .sync_filetagmatches = mdfiletagmatches - }, - /* pg_xact */ - [SYNC_HANDLER_CLOG] = { - .sync_syncfiletag = clogsyncfiletag - }, - /* pg_commit_ts */ - [SYNC_HANDLER_COMMIT_TS] = { - .sync_syncfiletag = committssyncfiletag - }, - /* pg_multixact/offsets */ - [SYNC_HANDLER_MULTIXACT_OFFSET] = { - .sync_syncfiletag = multixactoffsetssyncfiletag - }, - /* pg_multixact/members */ - [SYNC_HANDLER_MULTIXACT_MEMBER] = { - .sync_syncfiletag = multixactmemberssyncfiletag - } +static const SyncOps builtin_md_ops = { + .sync_syncfiletag = mdsyncfiletag, + .sync_unlinkfiletag = mdunlinkfiletag, + .sync_filetagmatches = mdfiletagmatches, +}; +static const SyncOps builtin_clog_ops = { + .sync_syncfiletag = clogsyncfiletag, +}; +static const SyncOps builtin_committs_ops = { + .sync_syncfiletag = committssyncfiletag, +}; +static const SyncOps builtin_multixact_offset_ops = { + .sync_syncfiletag = multixactoffsetssyncfiletag, +}; +static const SyncOps builtin_multixact_member_ops = { + .sync_syncfiletag = multixactmemberssyncfiletag, }; +/* + * Internal helper that adds an entry to syncsw[] without performing the + * preload-phase check. Used by InitSync() to install the built-in + * handlers, which must be present in every process that calls into + * sync.c (including the checkpointer, which runs after + * shared_preload_libraries has finished loading). + */ +static int16 +sync_handler_register_internal(const SyncOps *ops, const char *name) +{ + int16 my_id; + MemoryContext old; + + if (ops == NULL || ops->sync_syncfiletag == NULL) + ereport(FATAL, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("register_sync_handler: sync_syncfiletag is required"))); + + if (name == NULL || *name == '\0') + ereport(FATAL, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("register_sync_handler: name must be non-empty"))); + + if (NSyncHandlers >= SYNC_HANDLER_MAX) + ereport(FATAL, + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), + errmsg("too many sync handlers registered (limit %d)", + SYNC_HANDLER_MAX))); + + old = MemoryContextSwitchTo(TopMemoryContext); + + if (NSyncHandlers >= sync_handlers_capacity) + { + int new_cap = (sync_handlers_capacity == 0) + ? 8 + : sync_handlers_capacity * 2; + + if (new_cap > SYNC_HANDLER_MAX) + new_cap = SYNC_HANDLER_MAX; + + if (syncsw == NULL) + { + syncsw = palloc(sizeof(SyncOps) * new_cap); + sync_handler_names = palloc(sizeof(char *) * new_cap); + } + else + { + syncsw = repalloc(syncsw, sizeof(SyncOps) * new_cap); + sync_handler_names = repalloc(sync_handler_names, + sizeof(char *) * new_cap); + } + sync_handlers_capacity = new_cap; + } + + my_id = (int16) NSyncHandlers++; + memcpy(&syncsw[my_id], ops, sizeof(SyncOps)); + sync_handler_names[my_id] = pstrdup(name); + + MemoryContextSwitchTo(old); + + /* + * No barrier needed: registration only happens during + * shared_preload_libraries load, which is single-threaded in the + * postmaster. All backends and the checkpointer inherit the + * fully-populated array via fork() after preload returns. + */ + return my_id; +} + +/* + * Public registration entry point for extensions. See sync.h for the + * contract. + * + * Extensions must call this from their _PG_init() while the postmaster + * is still loading shared_preload_libraries; late calls raise FATAL. + * Built-in handlers bypass this guard via sync_handler_register_internal() + * because the checkpointer auxiliary process calls InitSync() after + * preload has finished, and the built-in dispatch table must still be + * populated in that process. + */ +int16 +register_sync_handler(const SyncOps *ops, const char *name) +{ + if (process_shared_preload_libraries_done) + ereport(FATAL, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("sync handlers must be registered in " + "shared_preload_libraries phase"))); + + return sync_handler_register_internal(ops, name); +} + +/* + * Register the built-in sync handlers. + * + * This MUST run before any call to register_sync_handler() from + * extension _PG_init() code, so that the built-in handlers occupy + * their canonical IDs (SYNC_HANDLER_MD = 0, SYNC_HANDLER_CLOG = 1, + * etc.) and extension handlers are assigned IDs >= + * SYNC_HANDLER_FIRST_DYNAMIC. + * + * Called from: + * - PostmasterMain(), just before process_shared_preload_libraries() + * - AuxiliaryProcessMain() (not currently needed because aux procs + * fork from the postmaster with syncsw[] already populated, but + * see the idempotent NSyncHandlers==0 guard below) + * - Standalone backend init (via InitSync -> InitSyncHandlers) + * + * Idempotent: the NSyncHandlers == 0 guard ensures built-ins are + * registered exactly once per process. Safe to call from multiple + * init paths. + */ +void +InitSyncHandlers(void) +{ + if (NSyncHandlers != 0) + return; + + (void) sync_handler_register_internal(&builtin_md_ops, "md"); + (void) sync_handler_register_internal(&builtin_clog_ops, "clog"); + (void) sync_handler_register_internal(&builtin_committs_ops, "commit_ts"); + (void) sync_handler_register_internal(&builtin_multixact_offset_ops, + "multixact_offset"); + (void) sync_handler_register_internal(&builtin_multixact_member_ops, + "multixact_member"); + + /* + * Enforce the enum-to-count invariant: if a new built-in is added + * to the SyncRequestHandler enum, the build will fail-fast at + * first boot until a matching sync_handler_register_internal() + * call is added here. + */ + Assert(NSyncHandlers == SYNC_HANDLER_FIRST_DYNAMIC); +} + /* * Initialize data structures for the file sync tracking. + * + * This runs in processes that actually need the pendingOps hash table + * (standalone backends and the checkpointer). It also calls + * InitSyncHandlers() defensively in case this process reached here + * without the postmaster having done so, e.g., standalone mode. */ void InitSync(void) { + /* + * Make sure built-in handlers are registered. In the postmaster, + * this was already called from PostmasterMain() before + * process_shared_preload_libraries(); in standalone mode it is + * called here for the first (and only) time. The NSyncHandlers + * guard inside InitSyncHandlers() makes it idempotent. + */ + InitSyncHandlers(); + /* * Create pending-operations hashtable if we need it. Currently, we need * it if we are standalone (not under a postmaster) or if we are a @@ -205,6 +355,19 @@ SyncPostCheckpoint(void) int absorb_counter; ListCell *lc; + /* + * Cache the syncsw base pointer in a local for the duration of this + * function. Without this, the compiler cannot hoist the load of the + * mutable static pointer out of the dispatch loop, and each dispatch + * costs an extra memory load plus an address-materialization LEA + * (verified with objdump on GCC 14.2 -O2). With the local cached, the + * per-entry dispatch compiles down to identical assembly as the + * pre-patch static-const array. Safe because register_sync_handler() + * is forbidden after process_shared_preload_libraries_done and syncsw + * is never mutated outside registration. + */ + SyncOps *ops = syncsw; + absorb_counter = UNLINKS_PER_ABSORB; foreach(lc, pendingUnlinks) { @@ -227,9 +390,12 @@ SyncPostCheckpoint(void) if (entry->cycle_ctr == checkpoint_cycle_ctr) break; + Assert(entry->tag.handler >= 0 && + entry->tag.handler < NSyncHandlers); + /* Unlink the file */ - if (syncsw[entry->tag.handler].sync_unlinkfiletag(&entry->tag, - path) < 0) + if (ops[entry->tag.handler].sync_unlinkfiletag(&entry->tag, + path) < 0) { /* * There's a race condition, when the database is dropped at the @@ -301,6 +467,9 @@ ProcessSyncRequests(void) uint64 longest = 0; uint64 total_elapsed = 0; + /* See comment in SyncPostCheckpoint() above. */ + SyncOps *ops = syncsw; + /* * This is only called during checkpoints, and checkpoints should only * occur in processes that have created a pendingOps. @@ -412,9 +581,12 @@ ProcessSyncRequests(void) { char path[MAXPGPATH]; + Assert(entry->tag.handler >= 0 && + entry->tag.handler < NSyncHandlers); + INSTR_TIME_SET_CURRENT(sync_start); - if (syncsw[entry->tag.handler].sync_syncfiletag(&entry->tag, - path) == 0) + if (ops[entry->tag.handler].sync_syncfiletag(&entry->tag, + path) == 0) { /* Success; update statistics about sync timing */ INSTR_TIME_SET_CURRENT(sync_end); @@ -506,13 +678,24 @@ RememberSyncRequest(const FileTag *ftag, SyncRequestType type) HASH_SEQ_STATUS hstat; PendingFsyncEntry *pfe; ListCell *cell; + bool (*filetagmatches) (const FileTag *ftag, + const FileTag *candidate); + + Assert(ftag->handler >= 0 && ftag->handler < NSyncHandlers); + + /* + * Cache the per-handler filetagmatches function pointer once so + * both match loops keep it in a register. See comment in + * SyncPostCheckpoint(). + */ + filetagmatches = syncsw[ftag->handler].sync_filetagmatches; /* Cancel matching fsync requests */ hash_seq_init(&hstat, pendingOps); while ((pfe = (PendingFsyncEntry *) hash_seq_search(&hstat)) != NULL) { if (pfe->tag.handler == ftag->handler && - syncsw[ftag->handler].sync_filetagmatches(ftag, &pfe->tag)) + filetagmatches(ftag, &pfe->tag)) pfe->canceled = true; } @@ -522,7 +705,7 @@ RememberSyncRequest(const FileTag *ftag, SyncRequestType type) PendingUnlinkEntry *pue = (PendingUnlinkEntry *) lfirst(cell); if (pue->tag.handler == ftag->handler && - syncsw[ftag->handler].sync_filetagmatches(ftag, &pue->tag)) + filetagmatches(ftag, &pue->tag)) pue->canceled = true; } } diff --git a/src/include/storage/sync.h b/src/include/storage/sync.h index 88290500bc9..5452eb714da 100644 --- a/src/include/storage/sync.h +++ b/src/include/storage/sync.h @@ -29,8 +29,13 @@ typedef enum SyncRequestType } SyncRequestType; /* - * Which set of functions to use to handle a given request. The values of - * the enumerators must match the indexes of the function table in sync.c. + * Which set of functions to use to handle a given request. Built-in + * handlers occupy the fixed enum values below; extensions register + * additional handlers via register_sync_handler() during + * shared_preload_libraries initialization and receive IDs starting + * at SYNC_HANDLER_FIRST_DYNAMIC. The values of the built-in + * enumerators must match the order in which InitSync() pre-registers + * the corresponding SyncOps structs in sync.c. */ typedef enum SyncRequestHandler { @@ -39,9 +44,19 @@ typedef enum SyncRequestHandler SYNC_HANDLER_COMMIT_TS, SYNC_HANDLER_MULTIXACT_OFFSET, SYNC_HANDLER_MULTIXACT_MEMBER, - SYNC_HANDLER_NONE, + + /* Extensions' dynamic handler IDs start here. */ + SYNC_HANDLER_FIRST_DYNAMIC, + + /* + * Sentinel for "no handler": fits in int16, outside the valid ID + * range so it cannot be confused with any registered handler. + */ + SYNC_HANDLER_NONE = -1, } SyncRequestHandler; +#define SYNC_HANDLER_MAX INT16_MAX + /* * A tag identifying a file. Currently it has the members required for md.c's * usage, but sync.c has no knowledge of the internal structure, and it is @@ -55,6 +70,25 @@ typedef struct FileTag uint64 segno; } FileTag; +/* + * Dispatch table entry for a sync handler. Public so extensions can + * define their own SyncOps and pass them to register_sync_handler(). + * + * sync_syncfiletag is required. sync_unlinkfiletag and + * sync_filetagmatches may be NULL if the handler does not support + * SYNC_UNLINK_REQUEST or SYNC_FILTER_REQUEST respectively, matching + * the pattern of the built-in CLOG/commit_ts/multixact handlers which + * only define sync_syncfiletag. + */ +typedef struct SyncOps +{ + int (*sync_syncfiletag) (const FileTag *ftag, char *path); + int (*sync_unlinkfiletag) (const FileTag *ftag, char *path); + bool (*sync_filetagmatches) (const FileTag *ftag, + const FileTag *candidate); +} SyncOps; + +extern void InitSyncHandlers(void); extern void InitSync(void); extern void SyncPreCheckpoint(void); extern void SyncPostCheckpoint(void); @@ -63,4 +97,28 @@ extern void RememberSyncRequest(const FileTag *ftag, SyncRequestType type); extern bool RegisterSyncRequest(const FileTag *ftag, SyncRequestType type, bool retryOnError); +/* + * Register a custom sync handler. Returns the assigned handler ID + * which the extension stores in FileTag.handler when queueing sync + * requests via RegisterSyncRequest(). + * + * MUST be called during shared_preload_libraries initialization + * (before process_shared_preload_libraries_done is set); later calls + * raise FATAL. `name` is used for error messages and is pstrdup'd + * into TopMemoryContext by the caller; callers do not need to keep + * the buffer alive. + * + * `ops->sync_syncfiletag` is required; the other two pointers may + * be NULL if the handler does not participate in SYNC_UNLINK_REQUEST + * or SYNC_FILTER_REQUEST flows. + * + * The returned ID is stable for the lifetime of the postmaster. + * Sync requests live only in the checkpointer's in-memory pendingOps + * hash table (they are not persisted across restarts), so there is + * no cross-restart stability requirement beyond the same + * shared_preload_libraries order that smgr_register() already relies + * on. + */ +extern int16 register_sync_handler(const SyncOps *ops, const char *name); + #endif /* SYNC_H */ -- 2.47.3