Thread: TODO item for pg_ctl and server detection
While I am working on pg_ctl, I saw this TODO item: Have the postmaster write a random number to a file on startup thatpg_ctl checks against the contents of a pg_ping responseon its initialconnection (without login) This will protect against connecting to an old instance of thepostmasterin a different or deleted subdirectory. http://archives.postgresql.org/pgsql-bugs/2009-10/msg00110.phphttp://archives.postgresql.org/pgsql-bugs/2009-10/msg00156.php Based on our new PQPing(), do we ever want to implement this or should I remove the TODO item? It seems this would require a server connection, which is something we didn't want to force pg_ctl -w to do in case authentication is broken. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > While I am working on pg_ctl, I saw this TODO item: > Have the postmaster write a random number to a file on startup that > pg_ctl checks against the contents of a pg_ping response on its initial > connection (without login) > This will protect against connecting to an old instance of the > postmaster in a different or deleted subdirectory. > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00110.php > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00156.php > Based on our new PQPing(), do we ever want to implement this or should I > remove the TODO item? It seems this would require a server connection, > which is something we didn't want to force pg_ctl -w to do in case > authentication is broken. Well, rereading that old thread makes me realize that what you just implemented is still pretty far short of what was discussed. In particular, this implementation entirely fails to cope with the possibility that a Windows postmaster is using a specialized listen_addresses setting that has to be taken into account in order to get a TCP connection. I wonder whether we should revert this patch and have another go at the idea of a separate postmaster.ports status file with a line for each active port. The business with a magic number can't be implemented unless we actually add a new separate pg_ping protocol. PQping() has removed a lot of the pressure to have that, namely all the authentication-failure problem cases. I'm not sure that the case where you're looking at an inactive data directory but there's a live postmaster someplace else with the same port number is important enough to justify new protocol all by itself. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > While I am working on pg_ctl, I saw this TODO item: > > Have the postmaster write a random number to a file on startup that > > pg_ctl checks against the contents of a pg_ping response on its initial > > connection (without login) > > > This will protect against connecting to an old instance of the > > postmaster in a different or deleted subdirectory. > > > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00110.php > > http://archives.postgresql.org/pgsql-bugs/2009-10/msg00156.php > > > Based on our new PQPing(), do we ever want to implement this or should I > > remove the TODO item? It seems this would require a server connection, > > which is something we didn't want to force pg_ctl -w to do in case > > authentication is broken. > > Well, rereading that old thread makes me realize that what you just > implemented is still pretty far short of what was discussed. In > particular, this implementation entirely fails to cope with the > possibility that a Windows postmaster is using a specialized > listen_addresses setting that has to be taken into account in order to > get a TCP connection. I wonder whether we should revert this patch and > have another go at the idea of a separate postmaster.ports status file > with a line for each active port. I had forgotten about having to use TCP and needing to honor listen_address restrictions. We only need one valid listen_address so I went ahead and added a line to the postmaster.pid file. I am not sure what a separate file will buy us except additional files to open/manage. > The business with a magic number can't be implemented unless we actually > add a new separate pg_ping protocol. PQping() has removed a lot of the > pressure to have that, namely all the authentication-failure problem > cases. I'm not sure that the case where you're looking at an inactive > data directory but there's a live postmaster someplace else with the > same port number is important enough to justify new protocol all by > itself. Yes, that was my calculus too. I realized that we create session ids by merging the process id and backend start time, so I went ahead and added the postmaster start time epoch to the postmaster.pid file. While there is no way to pass back the postmaster start time from PQping, I added code to pg_ctl to make sure the time in the postmaster.pid file is not _before_ pg_ctl started running. We only check PQping() after we have started the postmaster ourselves, so it fits our needs. Patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index cda7f64..86bc5a6 100644 *** /tmp/pgdiff.11857/wrhSFb_storage.sgml Tue Dec 28 12:51:36 2010 --- doc/src/sgml/storage.sgml Tue Dec 28 11:57:56 2010 *************** last started with</entry> *** 117,124 **** <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! cluster data directory, port number, Unix domain socket directory, ! and shared memory segment ID</entry> </row> </tbody> --- 117,125 ---- <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! postmaster start time, cluster data directory, port number, user-specified ! Unix domain socket directory, first valid listen_address host, and ! shared memory segment ID</entry> </row> </tbody> diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c index a003dc9..461a7a6 100644 *** /tmp/pgdiff.11857/QXAQWd_ipc_test.c Tue Dec 28 12:51:36 2010 --- src/backend/port/ipc_test.c Tue Dec 28 09:38:50 2010 *************** on_exit_reset(void) *** 104,110 **** } void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { } --- 104,110 ---- } void ! AddToLockFile(int target_line, const char *str) { } diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index d970eb2..ff77099 100644 *** /tmp/pgdiff.11857/KqzRVc_sysv_shmem.c Tue Dec 28 12:51:36 2010 --- src/backend/port/sysv_shmem.c Tue Dec 28 09:54:14 2010 *************** InternalIpcMemoryCreate(IpcMemoryKey mem *** 198,206 **** /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* Record key and ID in lockfile for data directory. */ ! RecordSharedMemoryInLockFile((unsigned long) memKey, ! (unsigned long) shmid); return memAddress; } --- 198,214 ---- /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* ! * Append record key and ID in lockfile for data directory. Format ! * to try to keep it the same length. ! */ ! { ! char line[32]; ! ! sprintf(line, "%9lu %9lu\n", (unsigned long) memKey, ! (unsigned long) shmid); ! AddToLockFile(LOCK_FILE_LINES, line); ! } return memAddress; } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a46a323..c1e553a 100644 *** /tmp/pgdiff.11857/IjURra_postmaster.c Tue Dec 28 12:51:36 2010 --- src/backend/postmaster/postmaster.c Tue Dec 28 12:22:12 2010 *************** PostmasterMain(int argc, char *argv[]) *** 483,489 **** int status; char *userDoption = NULL; int i; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); --- 483,490 ---- int status; char *userDoption = NULL; int i; ! bool connection_line_output = false; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); *************** PostmasterMain(int argc, char *argv[]) *** 860,869 **** --- 861,882 ---- UnixSocketDir, ListenSocket, MAXLISTEN); else + { status = StreamServerPort(AF_UNSPEC, curhost, (unsigned short) PostPortNumber, UnixSocketDir, ListenSocket, MAXLISTEN); + /* must supply a valid listen_address for PQping() */ + if (!connection_line_output) + { + char line[MAXPGPATH + 2]; + + sprintf(line, "%s\n", curhost); + AddToLockFile(LOCK_FILE_LINES - 1, line); + connection_line_output = true; + } + } + if (status == STATUS_OK) success++; else *************** PostmasterMain(int argc, char *argv[]) *** 880,885 **** --- 893,902 ---- pfree(rawstring); } + /* Supply an empty listen_address line for PQping() */ + if (!connection_line_output) + AddToLockFile(LOCK_FILE_LINES - 1, "\n"); + #ifdef USE_BONJOUR /* Register for Bonjour only if we opened TCP socket(s) */ if (enable_bonjour && ListenSocket[0] != PGINVALID_SOCKET) diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index d74b5cc..6935786 100644 *** /tmp/pgdiff.11857/sjyEPe_miscinit.c Tue Dec 28 12:51:36 2010 --- src/backend/utils/init/miscinit.c Tue Dec 28 12:38:33 2010 *************** *** 46,52 **** #define DIRECTORY_LOCK_FILE "postmaster.pid" ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ --- 46,65 ---- #define DIRECTORY_LOCK_FILE "postmaster.pid" ! /* ! * The lock file contents are: ! * ! * line # ! * 1 pid ! * 2 postmaster start time ! * 3 data directory ! * 4 port # ! * 5 user-specified socket directory ! * (the lines below are added later) ! * 6 first valid listen_address ! * 7 shared memory key ! */ ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ *************** CreateLockFile(const char *filename, boo *** 659,665 **** bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 2 + 256]; int ntries; int len; int encoded_pid; --- 672,678 ---- bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 3 + 256]; int ntries; int len; int encoded_pid; *************** CreateLockFile(const char *filename, boo *** 826,836 **** if (isDDLock) { char *ptr = buffer; ! unsigned long id1, ! id2; int lineno; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 839,848 ---- if (isDDLock) { char *ptr = buffer; ! unsigned long id1, id2; int lineno; ! for (lineno = 1; lineno <= LOCK_FILE_LINES - 1; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** CreateLockFile(const char *filename, boo *** 874,882 **** /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! DataDir, PostPortNumber, UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { --- 886,895 ---- /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%ld\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! (long) MyStartTime, DataDir, PostPortNumber, ! UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { *************** TouchSocketLockFile(void) *** 985,1008 **** } } /* ! * Append information about a shared memory segment to the data directory ! * lock file. ! * ! * This may be called multiple times in the life of a postmaster, if we ! * delete and recreate shmem due to backend crash. Therefore, be prepared ! * to overwrite existing information. (As of 7.1, a postmaster only creates ! * one shm seg at a time; but for the purposes here, if we did have more than ! * one then any one of them would do anyway.) */ void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 2 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) --- 998,1016 ---- } } + /* ! * Add lines to the data directory lock file. This erases all following ! * lines, but that is OK because lines are added in order. */ void ! AddToLockFile(int target_line, const char *str) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 3 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) *************** RecordSharedMemoryInLockFile(unsigned lo *** 1029,1035 **** * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 1037,1043 ---- * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno < target_line; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** RecordSharedMemoryInLockFile(unsigned lo *** 1040,1050 **** ptr++; } ! /* ! * Append key information. Format to try to keep it the same length ! * always (trailing junk won't hurt, but might confuse humans). ! */ ! sprintf(ptr, "%9lu %9lu\n", id1, id2); /* * And rewrite the data. Since we write in a single kernel call, this --- 1048,1054 ---- ptr++; } ! strlcat(buffer, str, sizeof(buffer)); /* * And rewrite the data. Since we write in a single kernel call, this diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 5e1b772..395a863 100644 *** /tmp/pgdiff.11857/8MG68b_pg_ctl.c Tue Dec 28 12:51:36 2010 --- src/bin/pg_ctl/pg_ctl.c Tue Dec 28 12:44:45 2010 *************** *** 22,27 **** --- 22,28 ---- #include <locale.h> #include <signal.h> + #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> *************** static void read_post_opts(void); *** 138,143 **** --- 139,145 ---- static PGPing test_postmaster_connection(bool); static bool postmaster_is_alive(pid_t pid); + static time_t start_time; static char postopts_file[MAXPGPATH]; static char pid_file[MAXPGPATH]; *************** static PGPing *** 404,416 **** test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char socket_dir[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! socket_dir[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) --- 406,418 ---- test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char host_str[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! host_str[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) *************** test_postmaster_connection(bool do_check *** 425,437 **** * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 4 9.1+ server, shared memory not created ! * 5 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes both the port ! * number and socket address in the file for us to use. * (PG_VERSION could also have told us the major version.) */ --- 427,440 ---- * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 5 9.1+ server, listen host not created ! * 6 9.1+ server, shared memory not created ! * 7 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes connection ! * information in the file for us to use. * (PG_VERSION could also have told us the major version.) */ *************** test_postmaster_connection(bool do_check *** 439,445 **** if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL) { /* A 3-line file? */ if (optlines[3] == NULL) --- 442,451 ---- if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL && ! /* pre-9.1 server or listen_address line is present? */ ! (optlines[3] == NULL || ! optlines[5] != NULL)) { /* A 3-line file? */ if (optlines[3] == NULL) *************** test_postmaster_connection(bool do_check *** 459,489 **** return PQPING_NO_ATTEMPT; } } ! else /* 9.1+ server */ { ! portnum = atoi(optlines[2]); ! ! /* Get socket directory, if specified. */ ! if (optlines[3][0] != '\n') { ! /* ! * While unix_socket_directory can accept relative ! * directories, libpq's host must have a leading slash ! * to indicate a socket directory. ! */ ! if (optlines[3][0] != '/') ! { ! write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), ! progname); ! return PQPING_NO_ATTEMPT; ! } ! strlcpy(socket_dir, optlines[3], MAXPGPATH); ! /* remove newline */ ! if (strchr(socket_dir, '\n') != NULL) ! *strchr(socket_dir, '\n') = '\0'; } ! } /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. --- 465,515 ---- return PQPING_NO_ATTEMPT; } } ! else { ! /* ! * Easy check to see if we are looking at the right ! * data directory: Is the postmaster older than this ! * execution of pg_ctl? ! */ ! if (atol(optlines[1]) < start_time) { ! write_stderr(_("%s: this data directory is running an older postmaster\n"), ! progname); ! return PQPING_NO_ATTEMPT; } ! ! portnum = atoi(optlines[3]); + /* + * Determine the proper host string to use. + */ + #ifdef HAVE_UNIX_SOCKETS + /* + * Use socket directory, if specified. We assume if we + * have unix sockets, the server does too because we + * just started the postmaster. + */ + /* + * While unix_socket_directory can accept relative + * directories, libpq's host must have a leading slash + * to indicate a socket directory. + */ + if (optlines[4][0] != '\n' && optlines[4][0] != '/') + { + write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), + progname); + return PQPING_NO_ATTEMPT; + } + strlcpy(host_str, optlines[4], sizeof(host_str)); + #else + strlcpy(host_str, optlines[5], sizeof(host_str)); + #endif + /* remove newline */ + if (strchr(host_str, '\n') != NULL) + *strchr(host_str, '\n') = '\0'; + } + /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. *************** test_postmaster_connection(bool do_check *** 491,499 **** snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (socket_dir[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", socket_dir); } } --- 517,525 ---- snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (host_str[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", host_str); } } *************** main(int argc, char **argv) *** 1756,1761 **** --- 1782,1788 ---- progname = get_progname(argv[0]); set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_ctl")); + start_time = time(NULL); /* * save argv[0] so do_start() can look for the postmaster if necessary. we diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 032875e..72e1dc8 100644 *** /tmp/pgdiff.11857/4qcsPd_miscadmin.h Tue Dec 28 12:51:36 2010 --- src/include/miscadmin.h Tue Dec 28 11:13:38 2010 *************** extern PGDLLIMPORT bool process_shared_p *** 348,358 **** extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void RecordSharedMemoryInLockFile(unsigned long id1, ! unsigned long id2); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void); --- 348,358 ---- extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; + #define LOCK_FILE_LINES 7 extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void AddToLockFile(int target_line, const char *str); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void);
Bruce Momjian wrote: > Yes, that was my calculus too. I realized that we create session ids by > merging the process id and backend start time, so I went ahead and added > the postmaster start time epoch to the postmaster.pid file. While there > is no way to pass back the postmaster start time from PQping, I added > code to pg_ctl to make sure the time in the postmaster.pid file is not > _before_ pg_ctl started running. We only check PQping() after we have > started the postmaster ourselves, so it fits our needs. Tom suggested that there might be clock skew between pg_ctl and the postmaster, so I added a 2-second slop in checking the postmaster start time. Tom also wanted the connection information to be output all at once, but that causes a problem with detecting pre-9.1 servers so I avoided it. Updated patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index cda7f64..86bc5a6 100644 *** /tmp/pgdiff.14558/cXBdee_storage.sgml Wed Dec 29 13:42:38 2010 --- doc/src/sgml/storage.sgml Wed Dec 29 12:11:03 2010 *************** last started with</entry> *** 117,124 **** <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! cluster data directory, port number, Unix domain socket directory, ! and shared memory segment ID</entry> </row> </tbody> --- 117,125 ---- <row> <entry><filename>postmaster.pid</></entry> <entry>A lock file recording the current postmaster process id (PID), ! postmaster start time, cluster data directory, port number, user-specified ! Unix domain socket directory, first valid listen_address host, and ! shared memory segment ID</entry> </row> </tbody> diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c index a003dc9..461a7a6 100644 *** /tmp/pgdiff.14558/o3NlPc_ipc_test.c Wed Dec 29 13:42:38 2010 --- src/backend/port/ipc_test.c Wed Dec 29 12:11:03 2010 *************** on_exit_reset(void) *** 104,110 **** } void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { } --- 104,110 ---- } void ! AddToLockFile(int target_line, const char *str) { } diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index d970eb2..ff77099 100644 *** /tmp/pgdiff.14558/mac69b_sysv_shmem.c Wed Dec 29 13:42:38 2010 --- src/backend/port/sysv_shmem.c Wed Dec 29 12:11:03 2010 *************** InternalIpcMemoryCreate(IpcMemoryKey mem *** 198,206 **** /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* Record key and ID in lockfile for data directory. */ ! RecordSharedMemoryInLockFile((unsigned long) memKey, ! (unsigned long) shmid); return memAddress; } --- 198,214 ---- /* Register on-exit routine to detach new segment before deleting */ on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress)); ! /* ! * Append record key and ID in lockfile for data directory. Format ! * to try to keep it the same length. ! */ ! { ! char line[32]; ! ! sprintf(line, "%9lu %9lu\n", (unsigned long) memKey, ! (unsigned long) shmid); ! AddToLockFile(LOCK_FILE_LINES, line); ! } return memAddress; } diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a46a323..c1e553a 100644 *** /tmp/pgdiff.14558/GKd28a_postmaster.c Wed Dec 29 13:42:38 2010 --- src/backend/postmaster/postmaster.c Wed Dec 29 12:11:03 2010 *************** PostmasterMain(int argc, char *argv[]) *** 483,489 **** int status; char *userDoption = NULL; int i; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); --- 483,490 ---- int status; char *userDoption = NULL; int i; ! bool connection_line_output = false; ! MyProcPid = PostmasterPid = getpid(); MyStartTime = time(NULL); *************** PostmasterMain(int argc, char *argv[]) *** 860,869 **** --- 861,882 ---- UnixSocketDir, ListenSocket, MAXLISTEN); else + { status = StreamServerPort(AF_UNSPEC, curhost, (unsigned short) PostPortNumber, UnixSocketDir, ListenSocket, MAXLISTEN); + /* must supply a valid listen_address for PQping() */ + if (!connection_line_output) + { + char line[MAXPGPATH + 2]; + + sprintf(line, "%s\n", curhost); + AddToLockFile(LOCK_FILE_LINES - 1, line); + connection_line_output = true; + } + } + if (status == STATUS_OK) success++; else *************** PostmasterMain(int argc, char *argv[]) *** 880,885 **** --- 893,902 ---- pfree(rawstring); } + /* Supply an empty listen_address line for PQping() */ + if (!connection_line_output) + AddToLockFile(LOCK_FILE_LINES - 1, "\n"); + #ifdef USE_BONJOUR /* Register for Bonjour only if we opened TCP socket(s) */ if (enable_bonjour && ListenSocket[0] != PGINVALID_SOCKET) diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index d74b5cc..6935786 100644 *** /tmp/pgdiff.14558/IzFLnd_miscinit.c Wed Dec 29 13:42:38 2010 --- src/backend/utils/init/miscinit.c Wed Dec 29 12:11:03 2010 *************** *** 46,52 **** #define DIRECTORY_LOCK_FILE "postmaster.pid" ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ --- 46,65 ---- #define DIRECTORY_LOCK_FILE "postmaster.pid" ! /* ! * The lock file contents are: ! * ! * line # ! * 1 pid ! * 2 postmaster start time ! * 3 data directory ! * 4 port # ! * 5 user-specified socket directory ! * (the lines below are added later) ! * 6 first valid listen_address ! * 7 shared memory key ! */ ! ProcessingMode Mode = InitProcessing; /* Note: we rely on this to initialize as zeroes */ *************** CreateLockFile(const char *filename, boo *** 659,665 **** bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 2 + 256]; int ntries; int len; int encoded_pid; --- 672,678 ---- bool isDDLock, const char *refName) { int fd; ! char buffer[MAXPGPATH * 3 + 256]; int ntries; int len; int encoded_pid; *************** CreateLockFile(const char *filename, boo *** 826,836 **** if (isDDLock) { char *ptr = buffer; ! unsigned long id1, ! id2; int lineno; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 839,848 ---- if (isDDLock) { char *ptr = buffer; ! unsigned long id1, id2; int lineno; ! for (lineno = 1; lineno <= LOCK_FILE_LINES - 1; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** CreateLockFile(const char *filename, boo *** 874,882 **** /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! DataDir, PostPortNumber, UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { --- 886,895 ---- /* * Successfully created the file, now fill it. */ ! snprintf(buffer, sizeof(buffer), "%d\n%ld\n%s\n%d\n%s\n", amPostmaster ? (int) my_pid : -((int) my_pid), ! (long) MyStartTime, DataDir, PostPortNumber, ! UnixSocketDir); errno = 0; if (write(fd, buffer, strlen(buffer)) != strlen(buffer)) { *************** TouchSocketLockFile(void) *** 985,1008 **** } } /* ! * Append information about a shared memory segment to the data directory ! * lock file. ! * ! * This may be called multiple times in the life of a postmaster, if we ! * delete and recreate shmem due to backend crash. Therefore, be prepared ! * to overwrite existing information. (As of 7.1, a postmaster only creates ! * one shm seg at a time; but for the purposes here, if we did have more than ! * one then any one of them would do anyway.) */ void ! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 2 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) --- 998,1016 ---- } } + /* ! * Add lines to the data directory lock file. This erases all following ! * lines, but that is OK because lines are added in order. */ void ! AddToLockFile(int target_line, const char *str) { int fd; int len; int lineno; char *ptr; ! char buffer[MAXPGPATH * 3 + 256]; fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0); if (fd < 0) *************** RecordSharedMemoryInLockFile(unsigned lo *** 1029,1035 **** * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno <= 4; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { --- 1037,1043 ---- * Skip over first four lines (PID, pgdata, portnum, socketdir). */ ptr = buffer; ! for (lineno = 1; lineno < target_line; lineno++) { if ((ptr = strchr(ptr, '\n')) == NULL) { *************** RecordSharedMemoryInLockFile(unsigned lo *** 1040,1050 **** ptr++; } ! /* ! * Append key information. Format to try to keep it the same length ! * always (trailing junk won't hurt, but might confuse humans). ! */ ! sprintf(ptr, "%9lu %9lu\n", id1, id2); /* * And rewrite the data. Since we write in a single kernel call, this --- 1048,1054 ---- ptr++; } ! strlcat(buffer, str, sizeof(buffer)); /* * And rewrite the data. Since we write in a single kernel call, this diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 5e1b772..92ea514 100644 *** /tmp/pgdiff.14558/gFBQjc_pg_ctl.c Wed Dec 29 13:42:38 2010 --- src/bin/pg_ctl/pg_ctl.c Wed Dec 29 12:32:48 2010 *************** *** 22,27 **** --- 22,28 ---- #include <locale.h> #include <signal.h> + #include <stdlib.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> *************** static void read_post_opts(void); *** 138,143 **** --- 139,145 ---- static PGPing test_postmaster_connection(bool); static bool postmaster_is_alive(pid_t pid); + static time_t start_time; static char postopts_file[MAXPGPATH]; static char pid_file[MAXPGPATH]; *************** static PGPing *** 404,416 **** test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char socket_dir[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! socket_dir[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) --- 406,418 ---- test_postmaster_connection(bool do_checkpoint) { int portnum = 0; ! char host_str[MAXPGPATH]; char connstr[MAXPGPATH + 256]; PGPing ret = PQPING_OK; /* assume success for wait == zero */ char **optlines; int i; ! host_str[0] = '\0'; connstr[0] = '\0'; for (i = 0; i < wait_seconds; i++) *************** test_postmaster_connection(bool do_check *** 425,437 **** * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 4 9.1+ server, shared memory not created ! * 5 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes both the port ! * number and socket address in the file for us to use. * (PG_VERSION could also have told us the major version.) */ --- 427,440 ---- * 0 lock file created but status not written * 2 pre-9.1 server, shared memory not created * 3 pre-9.1 server, shared memory created ! * 5 9.1+ server, listen host not created ! * 6 9.1+ server, shared memory not created ! * 7 9.1+ server, shared memory created * * For pre-9.1 Unix servers, we grab the port number from the * shmem key (first value on line 3). Pre-9.1 Win32 has no ! * written shmem key, so we fail. 9.1+ writes connection ! * information in the file for us to use. * (PG_VERSION could also have told us the major version.) */ *************** test_postmaster_connection(bool do_check *** 439,445 **** if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL) { /* A 3-line file? */ if (optlines[3] == NULL) --- 442,451 ---- if ((optlines = readfile(pid_file)) != NULL && optlines[0] != NULL && optlines[1] != NULL && ! optlines[2] != NULL && ! /* pre-9.1 server or listen_address line is present? */ ! (optlines[3] == NULL || ! optlines[5] != NULL)) { /* A 3-line file? */ if (optlines[3] == NULL) *************** test_postmaster_connection(bool do_check *** 459,489 **** return PQPING_NO_ATTEMPT; } } ! else /* 9.1+ server */ { ! portnum = atoi(optlines[2]); ! ! /* Get socket directory, if specified. */ ! if (optlines[3][0] != '\n') { ! /* ! * While unix_socket_directory can accept relative ! * directories, libpq's host must have a leading slash ! * to indicate a socket directory. ! */ ! if (optlines[3][0] != '/') ! { ! write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), ! progname); ! return PQPING_NO_ATTEMPT; ! } ! strlcpy(socket_dir, optlines[3], MAXPGPATH); ! /* remove newline */ ! if (strchr(socket_dir, '\n') != NULL) ! *strchr(socket_dir, '\n') = '\0'; } ! } /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. --- 465,517 ---- return PQPING_NO_ATTEMPT; } } ! else { ! /* ! * Easy check to see if we are looking at the right ! * data directory: Is the postmaster older than this ! * execution of pg_ctl? Subtract 2 seconds to account ! * for possible clock skew between pg_ctl and the ! * postmaster. ! */ ! if (atol(optlines[1]) < start_time - 2) { ! write_stderr(_("%s: this data directory is running an older postmaster\n"), ! progname); ! return PQPING_NO_ATTEMPT; } ! ! portnum = atoi(optlines[3]); + /* + * Determine the proper host string to use. + */ + #ifdef HAVE_UNIX_SOCKETS + /* + * Use socket directory, if specified. We assume if we + * have unix sockets, the server does too because we + * just started the postmaster. + */ + /* + * While unix_socket_directory can accept relative + * directories, libpq's host must have a leading slash + * to indicate a socket directory. + */ + if (optlines[4][0] != '\n' && optlines[4][0] != '/') + { + write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"), + progname); + return PQPING_NO_ATTEMPT; + } + strlcpy(host_str, optlines[4], sizeof(host_str)); + #else + strlcpy(host_str, optlines[5], sizeof(host_str)); + #endif + /* remove newline */ + if (strchr(host_str, '\n') != NULL) + *strchr(host_str, '\n') = '\0'; + } + /* * We need to set connect_timeout otherwise on Windows the * Service Control Manager (SCM) will probably timeout first. *************** test_postmaster_connection(bool do_check *** 491,499 **** snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (socket_dir[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", socket_dir); } } --- 519,527 ---- snprintf(connstr, sizeof(connstr), "dbname=postgres port=%d connect_timeout=5", portnum); ! if (host_str[0] != '\0') snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr), ! " host='%s'", host_str); } } *************** main(int argc, char **argv) *** 1756,1761 **** --- 1784,1790 ---- progname = get_progname(argv[0]); set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_ctl")); + start_time = time(NULL); /* * save argv[0] so do_start() can look for the postmaster if necessary. we diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 032875e..72e1dc8 100644 *** /tmp/pgdiff.14558/onEvLd_miscadmin.h Wed Dec 29 13:42:38 2010 --- src/include/miscadmin.h Wed Dec 29 12:11:03 2010 *************** extern PGDLLIMPORT bool process_shared_p *** 348,358 **** extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void RecordSharedMemoryInLockFile(unsigned long id1, ! unsigned long id2); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void); --- 348,358 ---- extern char *shared_preload_libraries_string; extern char *local_preload_libraries_string; + #define LOCK_FILE_LINES 7 extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster); extern void TouchSocketLockFile(void); ! extern void AddToLockFile(int target_line, const char *str); extern void ValidatePgVersion(const char *path); extern void process_shared_preload_libraries(void); extern void process_local_preload_libraries(void);
Bruce Momjian wrote: > Bruce Momjian wrote: > > Yes, that was my calculus too. I realized that we create session ids by > > merging the process id and backend start time, so I went ahead and added > > the postmaster start time epoch to the postmaster.pid file. While there > > is no way to pass back the postmaster start time from PQping, I added > > code to pg_ctl to make sure the time in the postmaster.pid file is not > > _before_ pg_ctl started running. We only check PQping() after we have > > started the postmaster ourselves, so it fits our needs. > > Tom suggested that there might be clock skew between pg_ctl and the > postmaster, so I added a 2-second slop in checking the postmaster start > time. Tom also wanted the connection information to be output all at > once, but that causes a problem with detecting pre-9.1 servers so I > avoided it. Patch applied, and TODO item removed because patch mostly detects if a stale postmaster created the postmaster.pid file. The TODO was: Allow pg_ctl to work properly with configuration files located outsidethe PGDATA directory) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On fre, 2010-12-31 at 17:26 -0500, Bruce Momjian wrote: > Patch applied, and TODO item removed because patch mostly detects if a > stale postmaster created the postmaster.pid file. The TODO was: Please fix this new compiler warning: pg_ctl.c:1787: warning: implicit declaration of function ‘time’
Peter Eisentraut wrote: > On fre, 2010-12-31 at 17:26 -0500, Bruce Momjian wrote: > > Patch applied, and TODO item removed because patch mostly detects if a > > stale postmaster created the postmaster.pid file. The TODO was: > > Please fix this new compiler warning: > > pg_ctl.c:1787: warning: implicit declaration of function ?time? Thanks, done. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +