Re: TODO item for pg_ctl and server detection - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: TODO item for pg_ctl and server detection |
Date | |
Msg-id | 201012281828.oBSIS1q23355@momjian.us Whole thread Raw |
In response to | Re: TODO item for pg_ctl and server detection (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: TODO item for pg_ctl and server detection
|
List | pgsql-hackers |
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > While I am working on pg_ctl, I saw this TODO item: > > Have the postmaster write a random number to a file on startup that > > pg_ctl checks against the contents of a pg_ping response on its initial > > connection (without login) > > > This will protect against connecting to an old instance of the > > postmaster in a different or deleted subdirectory. > > > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00110.php > > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00156.php > > > Based on our new PQPing(), do we ever want to implement this or should I > > remove the TODO item? It seems this would require a server connection, > > which is something we didn't want to force pg_ctl -w to do in case > > authentication is broken. > > Well, rereading that old thread makes me realize that what you just > implemented is still pretty far short of what was discussed. In > particular, this implementation entirely fails to cope with the > possibility that a Windows postmaster is using a specialized > listen_addresses setting that has to be taken into account in order to > get a TCP connection. I wonder whether we should revert this patch and > have another go at the idea of a separate postmaster.ports status file > with a line for each active port. I had forgotten about having to use TCP and needing to honor listen_address restrictions. We only need one valid listen_address so I went ahead and added a line to the postmaster.pid file. I am not sure what a separate file will buy us except additional files to open/manage. > The business with a magic number can't be implemented unless we actually > add a new separate pg_ping protocol. PQping() has removed a lot of the > pressure to have that, namely all the authentication-failure problem > cases. I'm not sure that the case where you're looking at an inactive > data directory but there's a live postmaster someplace else with the > same port number is important enough to justify new protocol all by > itself. Yes, that was my calculus too. I realized that we create session ids by merging the process id and backend start time, so I went ahead and added the postmaster start time epoch to the postmaster.pid file. While there is no way to pass back the postmaster start time from PQping, I added code to pg_ctl to make sure the time in the postmaster.pid file is not _before_ pg_ctl started running. We only check PQping() after we have started the postmaster ourselves, so it fits our needs. Patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index cda7f64..86bc5a6 100644 *** /tmp/pgdiff.11857/wrhSFb_storage.sgml Tue Dec 28 12:51:36 2010 --- doc/src/sgml/storage.sgml Tue Dec 28 11:57:56 2010 *************** last started with</entry> *** 117,124 **** <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! cluster data directory, port number, Unix domain socket directory, ! and shared memory segment ID</entry> </row> </tbody> --- 117,125 ---- <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! postmaster start time, cluster data directory, port number, user-specified ! Unix domain socket directory, first valid listen_address host, and ! shared memory segment ID</entry> </row> </tbody> diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c index a003dc9..461a7a6 100644 *** /tmp/pgdiff.11857/QXAQWd_ipc_test.c Tue Dec 28 12:51:36 2010 --- src/backend/port/ipc_test.c Tue Dec 28 09:38:50 2010 *************** on_exit_reset(void) *** 104,110 **** } void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { } --- 104,110 ---- } void ! AddToLockFile(int target_line, const char *str) { } diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index d970eb2..ff77099 100644 *** /tmp/pgdiff.11857/KqzRVc_sysv_shmem.c Tue Dec 28 12:51:36 2010 --- src/backend/port/sysv_shmem.c Tue Dec 28 09:54:14 2010 *************** InternalIpcMemoryCreate(IpcMemoryKey mem *** 198,206 **** /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* Record key and ID in lockfile for data directory. */ ! RecordSharedMemoryInLockFile((unsigned long) memKey, ! (unsigned long) shmid); return memAddress; } --- 198,214 ---- /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* ! * Append record key and ID in lockfile for data directory. Format ! * to try to keep it the same length. ! */ ! { ! char line[32]; ! ! sprintf(line, "%9lu %9lu\n", (unsigned long) memKey, ! (unsigned long) shmid); ! AddToLockFile(LOCK_FILE_LINES, line); ! } return memAddress; } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a46a323..c1e553a 100644 *** /tmp/pgdiff.11857/IjURra_postmaster.c Tue Dec 28 12:51:36 2010 --- src/backend/postmaster/postmaster.c Tue Dec 28 12:22:12 2010 *************** PostmasterMain(int argc, char *argv[]) *** 483,489 **** int status; char *userDoption = NULL; int i; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); --- 483,490 ---- int status; char *userDoption = NULL; int i; ! bool connection_line_output = false; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); *************** PostmasterMain(int argc, char *argv[]) *** 860,869 **** --- 861,882 ---- UnixSocketDir, ListenSocket, MAXLISTEN); else + { status = StreamServerPort(AF_UNSPEC, curhost, (unsigned short) PostPortNumber, UnixSocketDir, ListenSocket, MAXLISTEN); + /* must supply a valid listen_address for PQping() */ + if (!connection_line_output) + { + char line[MAXPGPATH + 2]; + + sprintf(line, "%s\n", curhost); + AddToLockFile(LOCK_FILE_LINES - 1, line); + connection_line_output = true; + } + } + if (status == STATUS_OK) success++; else *************** PostmasterMain(int argc, char *argv[]) *** 880,885 **** --- 893,902 ---- pfree(rawstring); } + /* Supply an empty listen_address line for PQping() */ + if (!connection_line_output) + AddToLockFile(LOCK_FILE_LINES - 1, "\n"); + #ifdef USE_BONJOUR /* Register for Bonjour only if we opened TCP socket(s) */ if (enable_bonjour && ListenSocket[0] != PGINVALID_SOCKET) diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index d74b5cc..6935786 100644 *** /tmp/pgdiff.11857/sjyEPe_miscinit.c Tue Dec 28 12:51:36 2010 --- src/backend/utils/init/miscinit.c Tue Dec 28 12:38:33 2010 *************** *** 46,52 **** #define DIRECTORY_LOCK_FILE "postmaster.pid" ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ --- 46,65 ---- #define DIRECTORY_LOCK_FILE "postmaster.pid" ! /* ! * The lock file contents are: ! * ! * line # ! * 1 pid ! * 2 postmaster start time ! * 3 data directory ! * 4 port # ! * 5 user-specified socket directory ! * (the lines below are added later) ! * 6 first valid listen_address ! * 7 shared memory key ! */ ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ *************** CreateLockFile(const char *filename, boo *** 659,665 **** bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 2 + 256]; int ntries; int len; int encoded_pid; --- 672,678 ---- bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 3 + 256]; int ntries; int len; int encoded_pid; *************** CreateLockFile(const char *filename, boo *** 826,836 **** if (isDDLock) { char *ptr = buffer; ! unsigned long id1, ! id2; int lineno; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 839,848 ---- if (isDDLock) { char *ptr = buffer; ! unsigned long id1, id2; int lineno; ! for (lineno = 1; lineno <= LOCK_FILE_LINES - 1; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** CreateLockFile(const char *filename, boo *** 874,882 **** /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! DataDir, PostPortNumber, UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { --- 886,895 ---- /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%ld\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! (long) MyStartTime, DataDir, PostPortNumber, ! UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { *************** TouchSocketLockFile(void) *** 985,1008 **** } } /* ! * Append information about a shared memory segment to the data directory ! * lock file. ! * ! * This may be called multiple times in the life of a postmaster, if we ! * delete and recreate shmem due to backend crash. Therefore, be prepared ! * to overwrite existing information. (As of 7.1, a postmaster only creates ! * one shm seg at a time; but for the purposes here, if we did have more than ! * one then any one of them would do anyway.) */ void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 2 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) --- 998,1016 ---- } } + /* ! * Add lines to the data directory lock file. This erases all following ! * lines, but that is OK because lines are added in order. */ void ! AddToLockFile(int target_line, const char *str) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 3 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) *************** RecordSharedMemoryInLockFile(unsigned lo *** 1029,1035 **** * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 1037,1043 ---- * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno < target_line; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** RecordSharedMemoryInLockFile(unsigned lo *** 1040,1050 **** ptr++; } ! /* ! * Append key information. Format to try to keep it the same length ! * always (trailing junk won't hurt, but might confuse humans). ! */ ! sprintf(ptr, "%9lu %9lu\n", id1, id2); /* * And rewrite the data. Since we write in a single kernel call, this --- 1048,1054 ---- ptr++; } ! strlcat(buffer, str, sizeof(buffer)); /* * And rewrite the data. Since we write in a single kernel call, this diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 5e1b772..395a863 100644 *** /tmp/pgdiff.11857/8MG68b_pg_ctl.c Tue Dec 28 12:51:36 2010 --- src/bin/pg_ctl/pg_ctl.c Tue Dec 28 12:44:45 2010 *************** *** 22,27 **** --- 22,28 ---- #include <locale.h> #include <signal.h> + #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> *************** static void read_post_opts(void); *** 138,143 **** --- 139,145 ---- static PGPing test_postmaster_connection(bool); static bool postmaster_is_alive(pid_t pid); + static time_t start_time; static char postopts_file[MAXPGPATH]; static char pid_file[MAXPGPATH]; *************** static PGPing *** 404,416 **** test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char socket_dir[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! socket_dir[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) --- 406,418 ---- test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char host_str[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! host_str[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) *************** test_postmaster_connection(bool do_check *** 425,437 **** * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 4 9.1+ server, shared memory not created ! * 5 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes both the port ! * number and socket address in the file for us to use. * (PG_VERSION could also have told us the major version.) */ --- 427,440 ---- * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 5 9.1+ server, listen host not created ! * 6 9.1+ server, shared memory not created ! * 7 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes connection ! * information in the file for us to use. * (PG_VERSION could also have told us the major version.) */ *************** test_postmaster_connection(bool do_check *** 439,445 **** if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL) { /* A 3-line file? */ if (optlines[3] == NULL) --- 442,451 ---- if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL && ! /* pre-9.1 server or listen_address line is present? */ ! (optlines[3] == NULL || ! optlines[5] != NULL)) { /* A 3-line file? */ if (optlines[3] == NULL) *************** test_postmaster_connection(bool do_check *** 459,489 **** return PQPING_NO_ATTEMPT; } } ! else /* 9.1+ server */ { ! portnum = atoi(optlines[2]); ! ! /* Get socket directory, if specified. */ ! if (optlines[3][0] != '\n') { ! /* ! * While unix_socket_directory can accept relative ! * directories, libpq's host must have a leading slash ! * to indicate a socket directory. ! */ ! if (optlines[3][0] != '/') ! { ! write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), ! progname); ! return PQPING_NO_ATTEMPT; ! } ! strlcpy(socket_dir, optlines[3], MAXPGPATH); ! /* remove newline */ ! if (strchr(socket_dir, '\n') != NULL) ! *strchr(socket_dir, '\n') = '\0'; } ! } /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. --- 465,515 ---- return PQPING_NO_ATTEMPT; } } ! else { ! /* ! * Easy check to see if we are looking at the right ! * data directory: Is the postmaster older than this ! * execution of pg_ctl? ! */ ! if (atol(optlines[1]) < start_time) { ! write_stderr(_("%s: this data directory is running an older postmaster\n"), ! progname); ! return PQPING_NO_ATTEMPT; } ! ! portnum = atoi(optlines[3]); + /* + * Determine the proper host string to use. + */ + #ifdef HAVE_UNIX_SOCKETS + /* + * Use socket directory, if specified. We assume if we + * have unix sockets, the server does too because we + * just started the postmaster. + */ + /* + * While unix_socket_directory can accept relative + * directories, libpq's host must have a leading slash + * to indicate a socket directory. + */ + if (optlines[4][0] != '\n' && optlines[4][0] != '/') + { + write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), + progname); + return PQPING_NO_ATTEMPT; + } + strlcpy(host_str, optlines[4], sizeof(host_str)); + #else + strlcpy(host_str, optlines[5], sizeof(host_str)); + #endif + /* remove newline */ + if (strchr(host_str, '\n') != NULL) + *strchr(host_str, '\n') = '\0'; + } + /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. *************** test_postmaster_connection(bool do_check *** 491,499 **** snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (socket_dir[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", socket_dir); } } --- 517,525 ---- snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (host_str[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", host_str); } } *************** main(int argc, char **argv) *** 1756,1761 **** --- 1782,1788 ---- progname = get_progname(argv[0]); set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_ctl")); + start_time = time(NULL); /* * save argv[0] so do_start() can look for the postmaster if necessary. we diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 032875e..72e1dc8 100644 *** /tmp/pgdiff.11857/4qcsPd_miscadmin.h Tue Dec 28 12:51:36 2010 --- src/include/miscadmin.h Tue Dec 28 11:13:38 2010 *************** extern PGDLLIMPORT bool process_shared_p *** 348,358 **** extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void RecordSharedMemoryInLockFile(unsigned long id1, ! unsigned long id2); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void); --- 348,358 ---- extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; + #define LOCK_FILE_LINES 7 extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void AddToLockFile(int target_line, const char *str); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void);
pgsql-hackers by date: