Thread: [HACKERS] Reducing pg_ctl's reaction time
I still have a bee in my bonnet about how slow the recovery TAP tests are, and especially about how low the CPU usage is while they run, suggesting that a lot of the wall clock time is being expended on useless sleeps. Some analysis I did today found some low-hanging fruit there: a significant part of the time is being spent in pg_ctl waiting for postmaster start/stop operations. It waits in quanta of 1 second, but the postmaster usually starts or stops in much less than that. (In these tests, most of the shutdown checkpoints have little to do, so that in many cases the postmaster stops in just a couple of msec. Startup isn't very many msec either.) The attached proposed patch adjusts pg_ctl to check every 100msec, instead of every second, for the postmaster to be done starting or stopping. This cuts the runtime of the recovery TAP tests from around 4m30s to around 3m10s on my machine, a good 30% savings. I experimented with different check frequencies but there doesn't seem to be anything to be gained by cutting the delay further (and presumably, it becomes counterproductive at some point due to pg_ctl chewing too many cycles). This change probably doesn't offer much real-world benefit, since few people start/stop their postmasters often, and shutdown checkpoints are seldom so cheap on production installations. Still, it doesn't seem like it could hurt. The original choice to use once-per-second checks was made for hardware a lot slower than what most of us use now; I do not think it's been revisited since the first implementation of pg_ctl in 1999 (cf 5b912b089). Barring objections I'd like to push this soon. regards, tom lane diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 82ac62d..1e6cb69 100644 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *************** typedef enum *** 68,73 **** --- 68,75 ---- #define DEFAULT_WAIT 60 + #define WAITS_PER_SEC 10 /* should divide 1000000 evenly */ + static bool do_wait = true; static int wait_seconds = DEFAULT_WAIT; static bool wait_seconds_arg = false; *************** test_postmaster_connection(pgpid_t pm_pi *** 531,537 **** connstr[0] = '\0'; ! for (i = 0; i < wait_seconds; i++) { /* Do we need a connection string? */ if (connstr[0] == '\0') --- 533,539 ---- connstr[0] = '\0'; ! for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++) { /* Do we need a connection string? */ if (connstr[0] == '\0') *************** test_postmaster_connection(pgpid_t pm_pi *** 701,724 **** #endif /* No response, or startup still in process; wait */ ! #ifdef WIN32 ! if (do_checkpoint) { ! /* ! * Increment the wait hint by 6 secs (connection timeout + sleep) ! * We must do this to indicate to the SCM that our startup time is ! * changing, otherwise it'll usually send a stop signal after 20 ! * seconds, despite incrementing the checkpoint counter. ! */ ! status.dwWaitHint += 6000; ! status.dwCheckPoint++; ! SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status); ! } ! else #endif ! print_msg("."); ! pg_usleep(1000000); /* 1 sec */ } /* return result of last call to PQping */ --- 703,730 ---- #endif /* No response, or startup still in process; wait */ ! if (i % WAITS_PER_SEC == 0) { ! #ifdef WIN32 ! if (do_checkpoint) ! { ! /* ! * Increment the wait hint by 6 secs (connection timeout + ! * sleep). We must do this to indicate to the SCM that our ! * startup time is changing, otherwise it'll usually send a ! * stop signal after 20 seconds, despite incrementing the ! * checkpoint counter. ! */ ! status.dwWaitHint += 6000; ! status.dwCheckPoint++; ! SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status); ! } ! else #endif ! print_msg("."); ! } ! pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */ } /* return result of last call to PQping */ *************** do_stop(void) *** 998,1009 **** print_msg(_("waiting for server to shut down...")); ! for (cnt = 0; cnt < wait_seconds; cnt++) { if ((pid = get_pgpid(false)) != 0) { ! print_msg("."); ! pg_usleep(1000000); /* 1 sec */ } else break; --- 1004,1016 ---- print_msg(_("waiting for server to shut down...")); ! for (cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++) { if ((pid = get_pgpid(false)) != 0) { ! if (cnt % WAITS_PER_SEC == 0) ! print_msg("."); ! pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */ } else break; *************** do_restart(void) *** 1088,1099 **** /* always wait for restart */ ! for (cnt = 0; cnt < wait_seconds; cnt++) { if ((pid = get_pgpid(false)) != 0) { ! print_msg("."); ! pg_usleep(1000000); /* 1 sec */ } else break; --- 1095,1107 ---- /* always wait for restart */ ! for (cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++) { if ((pid = get_pgpid(false)) != 0) { ! if (cnt % WAITS_PER_SEC == 0) ! print_msg("."); ! pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */ } else break; *************** do_promote(void) *** 1225,1241 **** if (do_wait) { DBState state = DB_STARTUP; print_msg(_("waiting for server to promote...")); ! while (wait_seconds > 0) { state = get_control_dbstate(); if (state == DB_IN_PRODUCTION) break; ! print_msg("."); ! pg_usleep(1000000); /* 1 sec */ ! wait_seconds--; } if (state == DB_IN_PRODUCTION) { --- 1233,1250 ---- if (do_wait) { DBState state = DB_STARTUP; + int cnt; print_msg(_("waiting for server to promote...")); ! for (cnt = 0; cnt < wait_seconds * WAITS_PER_SEC; cnt++) { state = get_control_dbstate(); if (state == DB_IN_PRODUCTION) break; ! if (cnt % WAITS_PER_SEC == 0) ! print_msg("."); ! pg_usleep(1000000 / WAITS_PER_SEC); /* 1/WAITS_PER_SEC sec */ } if (state == DB_IN_PRODUCTION) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The attached proposed patch adjusts pg_ctl to check every 100msec, > instead of every second, for the postmaster to be done starting or > stopping. This cuts the runtime of the recovery TAP tests from around > 4m30s to around 3m10s on my machine, a good 30% savings. I experimented > with different check frequencies but there doesn't seem to be anything > to be gained by cutting the delay further (and presumably, it becomes > counterproductive at some point due to pg_ctl chewing too many cycles). I see with a 2~3% noise similar improvements on my laptop with non-parallel tests. That's nice. +#define WAITS_PER_SEC 10 /* should divide 1000000 evenly */ As a matter of style, you could define 1000000 as well in a variable and refer to the variable for the division. > Barring objections I'd like to push this soon. This also pops up more easily failures with 001_stream_rep.pl without a patch applied from the other recent thread, so this patch had better not get in before anything from https://www.postgresql.org/message-id/8962.1498425057@sss.pgh.pa.us. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The attached proposed patch adjusts pg_ctl to check every 100msec, >> instead of every second, for the postmaster to be done starting or >> stopping. >> +#define WAITS_PER_SEC 10 /* should divide 1000000 evenly */ > As a matter of style, you could define 1000000 as well in a variable > and refer to the variable for the division. Good idea, done that way. (My initial thought was to use USECS_PER_SEC from timestamp.h, but that's declared as int64 which would have complicated matters, so I just made a new symbol.) > This also pops up more easily failures with 001_stream_rep.pl without > a patch applied from the other recent thread, so this patch had better > not get in before anything from > https://www.postgresql.org/message-id/8962.1498425057@sss.pgh.pa.us. Check. I pushed your fix for that first. Thanks for the review! regards, tom lane
On Mon, Jun 26, 2017 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The attached proposed patch adjusts pg_ctl to check every 100msec,
>> instead of every second, for the postmaster to be done starting or
>> stopping.
>> +#define WAITS_PER_SEC 10 /* should divide 1000000 evenly */
> As a matter of style, you could define 1000000 as well in a variable
> and refer to the variable for the division.
Good idea, done that way. (My initial thought was to use USECS_PER_SEC
from timestamp.h, but that's declared as int64 which would have
complicated matters, so I just made a new symbol.)
> This also pops up more easily failures with 001_stream_rep.pl without
> a patch applied from the other recent thread, so this patch had better
> not get in before anything from
> https://www.postgresql.org/message-id/8962.1498425057@ sss.pgh.pa.us.
Check. I pushed your fix for that first.
Thanks for the review!
regards, tom lane
The 10 fold increase in log spam during long PITR recoveries is a bit unfortunate.
9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up
9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up
9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up
9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up
...
I can live with it, but could we use an escalating wait time so it slows back down to once a second after a while?
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > The 10 fold increase in log spam during long PITR recoveries is a bit > unfortunate. > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up > 9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up > 9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up > ... > I can live with it, but could we use an escalating wait time so it slows > back down to once a second after a while? Sure, what do you think an appropriate behavior would be? regards, tom lane
On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > Jeff Janes <jeff.janes@gmail.com> writes: > > The 10 fold increase in log spam during long PITR recoveries is a bit > > unfortunate. > > > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > > 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up > > 9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up > > 9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up > > ... > > > I can live with it, but could we use an escalating wait time so it slows > > back down to once a second after a while? > > Sure, what do you think an appropriate behavior would be? It'd not be unreasonble to check pg_control first, and only after that indicates readyness check via the protocol. Doesn't quite seem like something backpatchable tho. - Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: >> Sure, what do you think an appropriate behavior would be? > It'd not be unreasonble to check pg_control first, and only after that > indicates readyness check via the protocol. Hm, that's a thought. The problem here isn't the frequency of checks, but the log spam. > Doesn't quite seem like something backpatchable tho. I didn't back-patch the pg_ctl change anyway, so that's no issue. regards, tom lane
On 2017-06-26 16:26:00 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > >> Sure, what do you think an appropriate behavior would be? > > > It'd not be unreasonble to check pg_control first, and only after that > > indicates readyness check via the protocol. > > Hm, that's a thought. The problem here isn't the frequency of checks, > but the log spam. Right. I think to deal with hot-standby we'd probably have to add new state to the control file however. We don't just want to treat the server as ready once DB_IN_PRODUCTION is reached. Arguably we could and should improve the logic when the server has started, right now it's pretty messy because we never treat a standby as up if hot_standby is disabled... - Andres
I wrote: > Andres Freund <andres@anarazel.de> writes: >> It'd not be unreasonble to check pg_control first, and only after that >> indicates readyness check via the protocol. > Hm, that's a thought. The problem here isn't the frequency of checks, > but the log spam. Actually, that wouldn't help much as things stand, because you can't tell from pg_control whether hot standby is active. Assuming that we want "pg_ctl start" to come back as soon as connections are allowed, it'd have to start probing when it sees DB_IN_ARCHIVE_RECOVERY, which means Jeff still has a problem with long recovery sessions. We could maybe address that by changing the set of states in pg_control (or perhaps simpler, adding a "hot standby active" flag there). That might have wider consequences than we really want to deal with post-beta1 though. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > Arguably we could and should improve the logic when the server has > started, right now it's pretty messy because we never treat a standby as > up if hot_standby is disabled... True. If you could tell the difference between "HS disabled" and "HS not enabled yet" from pg_control, that would make pg_ctl's behavior with cold-standby servers much cleaner. Maybe it *is* worth messing with the contents of pg_control at this late hour. My inclination for the least invasive fix is to leave the DBState enum alone and add a separate hot-standby state field with three values (disabled/not-yet-enabled/enabled). Then pg_ctl would start probing the postmaster when it saw either DB_IN_PRODUCTION DBstate or hot-standby-enabled. (It'd almost not have to probe the postmaster at all, except there's a race condition that the startup process will probably change the field a little before the postmaster gets the word to open the gates.) On the other hand, if it saw DB_IN_ARCHIVE_RECOVERY with hot standby disabled, it'd stop waiting. Any objections to that design sketch? Do we need to distinguish between master and slave servers in the when-to-stop-waiting logic? regards, tom lane
Hi, On 2017-06-26 16:49:07 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Arguably we could and should improve the logic when the server has > > started, right now it's pretty messy because we never treat a standby as > > up if hot_standby is disabled... > > True. If you could tell the difference between "HS disabled" and "HS not > enabled yet" from pg_control, that would make pg_ctl's behavior with > cold-standby servers much cleaner. Maybe it *is* worth messing with the > contents of pg_control at this late hour. I'm +0.5. > My inclination for the least invasive fix is to leave the DBState > enum alone and add a separate hot-standby state field with three > values (disabled/not-yet-enabled/enabled). Yea, that seems sane. > Then pg_ctl would start > probing the postmaster when it saw either DB_IN_PRODUCTION DBstate > or hot-standby-enabled. (It'd almost not have to probe the postmaster > at all, except there's a race condition that the startup process > will probably change the field a little before the postmaster gets > the word to open the gates.) On the other hand, if it saw > DB_IN_ARCHIVE_RECOVERY with hot standby disabled, it'd stop waiting. It'd be quite possible to address the race-condition by moving the updating of the control file to postmaster, to the CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require updating the control file from postmaster, which'd be somewhat ugly. Perhaps that indicates that field shouldn't be in pg_control, but in the pid file? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > It'd be quite possible to address the race-condition by moving the > updating of the control file to postmaster, to the > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > updating the control file from postmaster, which'd be somewhat ugly. No, I don't like that at all. Has race conditions against updates coming from the startup process. > Perhaps that indicates that field shouldn't be in pg_control, but in the > pid file? Yeah, that would be a different way to go at it. The postmaster would probably just write the state of the hot_standby GUC to the file, and pg_ctl would have to infer things from there. regards, tom lane
On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > It'd be quite possible to address the race-condition by moving the > > updating of the control file to postmaster, to the > > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > > updating the control file from postmaster, which'd be somewhat ugly. > > No, I don't like that at all. Has race conditions against updates > coming from the startup process. You'd obviously have to take the appropriate locks. I think the issue here is less race conditions, and more that architecturally we'd interact with shmem too much. > > Perhaps that indicates that field shouldn't be in pg_control, but in the > > pid file? > > Yeah, that would be a different way to go at it. The postmaster would > probably just write the state of the hot_standby GUC to the file, and > pg_ctl would have to infer things from there. I'd actually say we should just mirror the existing #ifdef USE_SYSTEMD if (!EnableHotStandby) sd_notify(0, "READY=1"); #endif with corresponding pidfile updates - doesn't really seem necessary for pg_ctl to do more? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: >> No, I don't like that at all. Has race conditions against updates >> coming from the startup process. > You'd obviously have to take the appropriate locks. I think the issue > here is less race conditions, and more that architecturally we'd > interact with shmem too much. Uh, we are *not* taking any locks in the postmaster. >> Yeah, that would be a different way to go at it. The postmaster would >> probably just write the state of the hot_standby GUC to the file, and >> pg_ctl would have to infer things from there. > I'd actually say we should just mirror the existing > #ifdef USE_SYSTEMD > if (!EnableHotStandby) > sd_notify(0, "READY=1"); > #endif > with corresponding pidfile updates - doesn't really seem necessary for > pg_ctl to do more? Hm. Take that a bit further, and we could drop the connection probes altogether --- just put the whole responsibility on the postmaster to show in the pidfile whether it's ready for connections or not. regards, tom lane
On 2017-06-26 17:38:03 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > >> No, I don't like that at all. Has race conditions against updates > >> coming from the startup process. > > > You'd obviously have to take the appropriate locks. I think the issue > > here is less race conditions, and more that architecturally we'd > > interact with shmem too much. > > Uh, we are *not* taking any locks in the postmaster. I'm not sure why you're 'Uh'ing, when my my point pretty much is that we do not want to do so? > Hm. Take that a bit further, and we could drop the connection probes > altogether --- just put the whole responsibility on the postmaster to > show in the pidfile whether it's ready for connections or not. Yea, that seems quite appealing, both from an architectural, simplicity, and log noise perspective. I wonder if there's some added reliability by the connection probe though? Essentially wondering if it'd be worthwhile to keep a single connection test at the end. I'm somewhat disinclined though. - Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >> Hm. Take that a bit further, and we could drop the connection probes >> altogether --- just put the whole responsibility on the postmaster to >> show in the pidfile whether it's ready for connections or not. > Yea, that seems quite appealing, both from an architectural, simplicity, > and log noise perspective. I wonder if there's some added reliability by > the connection probe though? Essentially wondering if it'd be worthwhile > to keep a single connection test at the end. I'm somewhat disinclined > though. I agree --- part of the appeal of this idea is that there could be a net subtraction of code from pg_ctl. (I think it wouldn't have to link libpq anymore at all, though maybe I forgot something.) And you get rid of a bunch of can't-connect failure modes, eg kernel packet filter in the way, which probably outweighs any hypothetical reliability gain from confirming the postmaster state the old way. This would mean that v10 pg_ctl could not be used to start/stop older postmasters, which is flexibility we tried to preserve in the past. But I see that we already gave that up in quite a few code paths because of their attempts to read pg_control, so I think it's a concern we can ignore. I'll draft something up later. regards, tom lane
I wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >>> Hm. Take that a bit further, and we could drop the connection probes >>> altogether --- just put the whole responsibility on the postmaster to >>> show in the pidfile whether it's ready for connections or not. >> Yea, that seems quite appealing, both from an architectural, simplicity, >> and log noise perspective. I wonder if there's some added reliability by >> the connection probe though? Essentially wondering if it'd be worthwhile >> to keep a single connection test at the end. I'm somewhat disinclined >> though. > I agree --- part of the appeal of this idea is that there could be a net > subtraction of code from pg_ctl. (I think it wouldn't have to link libpq > anymore at all, though maybe I forgot something.) And you get rid of a > bunch of can't-connect failure modes, eg kernel packet filter in the way, > which probably outweighs any hypothetical reliability gain from confirming > the postmaster state the old way. Here's a draft patch for that. I quite like the results --- this seems way simpler and more reliable than what pg_ctl has done up to now. However, it's certainly arguable that this is too much change for an optional post-beta patch. If we decide that it has to wait for v11, I'd address Jeff's complaint by hacking the loop behavior in test_postmaster_connection, which'd be ugly but not many lines of code. Note that I followed the USE_SYSTEMD patch's lead as to where to report postmaster state changes. Arguably, in the standby-server case, we should not report the postmaster is ready until we've reached consistency. But that would require additional signaling from the startup process to the postmaster, so it seems like a separate change if we want it. Thoughts? regards, tom lane diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 2874f63..38f534f 100644 *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *************** PostmasterMain(int argc, char *argv[]) *** 1341,1346 **** --- 1341,1354 ---- #endif /* + * Report postmaster status in the postmaster.pid file, to allow pg_ctl to + * see what's happening. Note that all strings written to the status line + * must be the same length, per comments for AddToDataDirLockFile(). We + * currently make them all 8 bytes, padding with spaces. + */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "starting"); + + /* * We're ready to rock and roll... */ StartupPID = StartupDataBase(); *************** pmdie(SIGNAL_ARGS) *** 2608,2613 **** --- 2616,2624 ---- Shutdown = SmartShutdown; ereport(LOG, (errmsg("received smart shutdown request"))); + + /* Report status */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "stopping"); #ifdef USE_SYSTEMD sd_notify(0, "STOPPING=1"); #endif *************** pmdie(SIGNAL_ARGS) *** 2663,2668 **** --- 2674,2682 ---- Shutdown = FastShutdown; ereport(LOG, (errmsg("received fast shutdown request"))); + + /* Report status */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "stopping"); #ifdef USE_SYSTEMD sd_notify(0, "STOPPING=1"); #endif *************** pmdie(SIGNAL_ARGS) *** 2727,2732 **** --- 2741,2749 ---- Shutdown = ImmediateShutdown; ereport(LOG, (errmsg("received immediate shutdown request"))); + + /* Report status */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "stopping"); #ifdef USE_SYSTEMD sd_notify(0, "STOPPING=1"); #endif *************** reaper(SIGNAL_ARGS) *** 2872,2877 **** --- 2889,2896 ---- ereport(LOG, (errmsg("database system is ready to accept connections"))); + /* Report status */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "ready "); #ifdef USE_SYSTEMD sd_notify(0, "READY=1"); #endif *************** sigusr1_handler(SIGNAL_ARGS) *** 5005,5014 **** if (XLogArchivingAlways()) PgArchPID = pgarch_start(); ! #ifdef USE_SYSTEMD if (!EnableHotStandby) sd_notify(0, "READY=1"); #endif pmState = PM_RECOVERY; } --- 5024,5041 ---- if (XLogArchivingAlways()) PgArchPID = pgarch_start(); ! /* ! * If we aren't planning to enter hot standby mode later, treat ! * RECOVERY_STARTED as meaning we're out of startup, and report status ! * accordingly. ! */ if (!EnableHotStandby) + { + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "standby "); + #ifdef USE_SYSTEMD sd_notify(0, "READY=1"); #endif + } pmState = PM_RECOVERY; } *************** sigusr1_handler(SIGNAL_ARGS) *** 5024,5029 **** --- 5051,5058 ---- ereport(LOG, (errmsg("database system is ready to accept read only connections"))); + /* Report status */ + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "ready "); #ifdef USE_SYSTEMD sd_notify(0, "READY=1"); #endif diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index 49a6afa..216bcc7 100644 *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *************** TouchSocketLockFiles(void) *** 1149,1156 **** * * Note: because we don't truncate the file, if we were to rewrite a line * with less data than it had before, there would be garbage after the last ! * line. We don't ever actually do that, so not worth adding another kernel ! * call to cover the possibility. */ void AddToDataDirLockFile(int target_line, const char *str) --- 1149,1157 ---- * * Note: because we don't truncate the file, if we were to rewrite a line * with less data than it had before, there would be garbage after the last ! * line. While we could fix that by adding a truncate call, that would make ! * the file update non-atomic, which we'd rather avoid. Therefore, callers ! * should endeavor never to shorten a line once it's been written. */ void AddToDataDirLockFile(int target_line, const char *str) *************** AddToDataDirLockFile(int target_line, co *** 1193,1211 **** srcptr = srcbuffer; for (lineno = 1; lineno < target_line; lineno++) { ! if ((srcptr = strchr(srcptr, '\n')) == NULL) ! { ! elog(LOG, "incomplete data in \"%s\": found only %d newlines while trying to add line %d", ! DIRECTORY_LOCK_FILE, lineno - 1, target_line); ! close(fd); ! return; ! } ! srcptr++; } memcpy(destbuffer, srcbuffer, srcptr - srcbuffer); destptr = destbuffer + (srcptr - srcbuffer); /* * Write or rewrite the target line. */ snprintf(destptr, destbuffer + sizeof(destbuffer) - destptr, "%s\n", str); --- 1194,1219 ---- srcptr = srcbuffer; for (lineno = 1; lineno < target_line; lineno++) { ! char *eol = strchr(srcptr, '\n'); ! ! if (eol == NULL) ! break; /* not enough lines in file yet */ ! srcptr = eol + 1; } memcpy(destbuffer, srcbuffer, srcptr - srcbuffer); destptr = destbuffer + (srcptr - srcbuffer); /* + * Fill in any missing lines before the target line, in case lines are + * added to the file out of order. + */ + for (; lineno < target_line; lineno++) + { + if (destptr < destbuffer + sizeof(destbuffer)) + *destptr++ = '\n'; + } + + /* * Write or rewrite the target line. */ snprintf(destptr, destbuffer + sizeof(destbuffer) - destptr, "%s\n", str); diff --git a/src/bin/pg_ctl/Makefile b/src/bin/pg_ctl/Makefile index f5ec088..46f30bd 100644 *** a/src/bin/pg_ctl/Makefile --- b/src/bin/pg_ctl/Makefile *************** subdir = src/bin/pg_ctl *** 16,29 **** top_builddir = ../../.. include $(top_builddir)/src/Makefile.global - override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) - OBJS= pg_ctl.o $(WIN32RES) all: pg_ctl ! pg_ctl: $(OBJS) | submake-libpq submake-libpgport ! $(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) install: all installdirs $(INSTALL_PROGRAM) pg_ctl$(X) '$(DESTDIR)$(bindir)/pg_ctl$(X)' --- 16,27 ---- top_builddir = ../../.. include $(top_builddir)/src/Makefile.global OBJS= pg_ctl.o $(WIN32RES) all: pg_ctl ! pg_ctl: $(OBJS) | submake-libpgport ! $(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) install: all installdirs $(INSTALL_PROGRAM) pg_ctl$(X) '$(DESTDIR)$(bindir)/pg_ctl$(X)' diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index ad2a16f..81b276c 100644 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *************** *** 34,42 **** #include "catalog/pg_control.h" #include "common/controldata_utils.h" #include "getopt_long.h" - #include "libpq-fe.h" #include "miscadmin.h" - #include "pqexpbuffer.h" /* PID can be negative for standalone backend */ typedef long pgpid_t; --- 34,40 ---- *************** typedef enum *** 49,54 **** --- 47,58 ---- IMMEDIATE_MODE } ShutdownMode; + typedef enum + { + POSTMASTER_READY, + POSTMASTER_STILL_STARTING, + POSTMASTER_FAILED + } WaitPMResult; typedef enum { *************** static int CreateRestrictedProcess(char *** 147,158 **** #endif static pgpid_t get_pgpid(bool is_status_request); ! static char **readfile(const char *path); static void free_readfile(char **optlines); static pgpid_t start_postmaster(void); static void read_post_opts(void); ! static PGPing test_postmaster_connection(pgpid_t pm_pid, bool do_checkpoint); static bool postmaster_is_alive(pid_t pid); #if defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE) --- 151,162 ---- #endif static pgpid_t get_pgpid(bool is_status_request); ! static char **readfile(const char *path, int *numlines); static void free_readfile(char **optlines); static pgpid_t start_postmaster(void); static void read_post_opts(void); ! static WaitPMResult wait_for_postmaster(pgpid_t pm_pid, bool do_checkpoint); static bool postmaster_is_alive(pid_t pid); #if defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE) *************** get_pgpid(bool is_status_request) *** 304,312 **** /* * get the lines from a text file - return NULL if file can't be opened */ static char ** ! readfile(const char *path) { int fd; int nlines; --- 308,319 ---- /* * get the lines from a text file - return NULL if file can't be opened + * + * *numlines is set to the number of line pointers returned; there is + * also an additional NULL pointer after the last real line. */ static char ** ! readfile(const char *path, int *numlines) { int fd; int nlines; *************** readfile(const char *path) *** 318,323 **** --- 325,332 ---- int len; struct stat statbuf; + *numlines = 0; /* in case of failure or empty file */ + /* * Slurp the file into memory. * *************** readfile(const char *path) *** 367,372 **** --- 376,382 ---- /* set up the result buffer */ result = (char **) pg_malloc((nlines + 1) * sizeof(char *)); + *numlines = nlines; /* now split the buffer into lines */ linebegin = buffer; *************** start_postmaster(void) *** 509,515 **** /* ! * Find the pgport and try a connection * * On Unix, pm_pid is the PID of the just-launched postmaster. On Windows, * it may be the PID of an ancestor shell process, so we can't check the --- 519,525 ---- /* ! * Wait for the postmaster to become ready. * * On Unix, pm_pid is the PID of the just-launched postmaster. On Windows, * it may be the PID of an ancestor shell process, so we can't check the *************** start_postmaster(void) *** 522,689 **** * Note that the checkpoint parameter enables a Windows service control * manager checkpoint, it's got nothing to do with database checkpoints!! */ ! static PGPing ! test_postmaster_connection(pgpid_t pm_pid, bool do_checkpoint) { - PGPing ret = PQPING_NO_RESPONSE; - char connstr[MAXPGPATH * 2 + 256]; int i; - /* if requested wait time is zero, return "still starting up" code */ - if (wait_seconds <= 0) - return PQPING_REJECT; - - connstr[0] = '\0'; - for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++) { ! /* Do we need a connection string? */ ! if (connstr[0] == '\0') ! { ! /*---------- ! * The number of lines in postmaster.pid tells us several things: ! * ! * # of lines ! * 0 lock file created but status not written ! * 2 pre-9.1 server, shared memory not created ! * 3 pre-9.1 server, shared memory created ! * 5 9.1+ server, ports not opened ! * 6 9.1+ server, shared memory not created ! * 7 9.1+ server, shared memory created ! * ! * This code does not support pre-9.1 servers. On Unix machines ! * we could consider extracting the port number from the shmem ! * key, but that (a) is not robust, and (b) doesn't help with ! * finding out the socket directory. And it wouldn't work anyway ! * on Windows. ! * ! * If we see less than 6 lines in postmaster.pid, just keep ! * waiting. ! *---------- ! */ ! char **optlines; ! /* Try to read the postmaster.pid file */ ! if ((optlines = readfile(pid_file)) != NULL && ! optlines[0] != NULL && ! optlines[1] != NULL && ! optlines[2] != NULL) ! { ! if (optlines[3] == NULL) ! { ! /* File is exactly three lines, must be pre-9.1 */ ! write_stderr(_("\n%s: -w option is not supported when starting a pre-9.1 server\n"), ! progname); ! return PQPING_NO_ATTEMPT; ! } ! else if (optlines[4] != NULL && ! optlines[5] != NULL) ! { ! /* File is complete enough for us, parse it */ ! pgpid_t pmpid; ! time_t pmstart; ! /* ! * Make sanity checks. If it's for the wrong PID, or the ! * recorded start time is before pg_ctl started, then ! * either we are looking at the wrong data directory, or ! * this is a pre-existing pidfile that hasn't (yet?) been ! * overwritten by our child postmaster. Allow 2 seconds ! * slop for possible cross-process clock skew. ! */ ! pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]); ! pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]); ! if (pmstart >= start_time - 2 && #ifndef WIN32 ! pmpid == pm_pid #else ! /* Windows can only reject standalone-backend PIDs */ ! pmpid > 0 #endif ! ) ! { ! /* ! * OK, seems to be a valid pidfile from our child. ! */ ! int portnum; ! char *sockdir; ! char *hostaddr; ! char host_str[MAXPGPATH]; ! ! /* ! * Extract port number and host string to use. Prefer ! * using Unix socket if available. ! */ ! portnum = atoi(optlines[LOCK_FILE_LINE_PORT - 1]); ! sockdir = optlines[LOCK_FILE_LINE_SOCKET_DIR - 1]; ! hostaddr = optlines[LOCK_FILE_LINE_LISTEN_ADDR - 1]; ! ! /* ! * While unix_socket_directories can accept relative ! * directories, libpq's host parameter must have a ! * leading slash to indicate a socket directory. So, ! * ignore sockdir if it's relative, and try to use TCP ! * instead. ! */ ! if (sockdir[0] == '/') ! strlcpy(host_str, sockdir, sizeof(host_str)); ! else ! strlcpy(host_str, hostaddr, sizeof(host_str)); ! ! /* remove trailing newline */ ! if (strchr(host_str, '\n') != NULL) ! *strchr(host_str, '\n') = '\0'; ! ! /* Fail if couldn't get either sockdir or host addr */ ! if (host_str[0] == '\0') ! { ! write_stderr(_("\n%s: -w option cannot use a relative socket directory specification\n"), ! progname); ! return PQPING_NO_ATTEMPT; ! } ! ! /* ! * Map listen-only addresses to counterparts usable ! * for establishing a connection. connect() to "::" ! * or "0.0.0.0" is not portable to OpenBSD 5.0 or to ! * Windows Server 2008, and connect() to "::" is ! * additionally not portable to NetBSD 6.0. (Cygwin ! * does handle both addresses, though.) ! */ ! if (strcmp(host_str, "*") == 0) ! strcpy(host_str, "localhost"); ! else if (strcmp(host_str, "0.0.0.0") == 0) ! strcpy(host_str, "127.0.0.1"); ! else if (strcmp(host_str, "::") == 0) ! strcpy(host_str, "::1"); ! /* ! * We need to set connect_timeout otherwise on Windows ! * the Service Control Manager (SCM) will probably ! * timeout first. ! */ ! snprintf(connstr, sizeof(connstr), ! "dbname=postgres port=%d host='%s' connect_timeout=5", ! portnum, host_str); ! } } } - - /* - * Free the results of readfile. - * - * This is safe to call even if optlines is NULL. - */ - free_readfile(optlines); } ! /* If we have a connection string, ping the server */ ! if (connstr[0] != '\0') ! { ! ret = PQping(connstr); ! if (ret == PQPING_OK || ret == PQPING_NO_ATTEMPT) ! break; ! } /* * Check whether the child postmaster process is still alive. This --- 532,599 ---- * Note that the checkpoint parameter enables a Windows service control * manager checkpoint, it's got nothing to do with database checkpoints!! */ ! static WaitPMResult ! wait_for_postmaster(pgpid_t pm_pid, bool do_checkpoint) { int i; for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++) { ! char **optlines; ! int numlines; ! /* ! * Try to read the postmaster.pid file. If it's not valid, or if the ! * status line isn't there yet, just keep waiting. ! */ ! if ((optlines = readfile(pid_file, &numlines)) != NULL && ! numlines >= LOCK_FILE_LINE_PM_STATUS) ! { ! /* File is complete enough for us, parse it */ ! pgpid_t pmpid; ! time_t pmstart; ! /* ! * Make sanity checks. If it's for the wrong PID, or the recorded ! * start time is before pg_ctl started, then either we are looking ! * at the wrong data directory, or this is a pre-existing pidfile ! * that hasn't (yet?) been overwritten by our child postmaster. ! * Allow 2 seconds slop for possible cross-process clock skew. ! */ ! pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]); ! pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]); ! if (pmstart >= start_time - 2 && #ifndef WIN32 ! pmpid == pm_pid #else ! /* Windows can only reject standalone-backend PIDs */ ! pmpid > 0 #endif ! ) ! { ! /* ! * OK, seems to be a valid pidfile from our child. Check the ! * status line (this assumes a v10 or later server). ! */ ! char *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1]; ! /* status line may be blank-padded */ ! if (strncmp(pmstatus, "ready", 5) == 0 || ! strncmp(pmstatus, "standby", 7) == 0) ! { ! /* postmaster is done starting up */ ! free_readfile(optlines); ! return POSTMASTER_READY; } } } ! /* ! * Free the results of readfile. ! * ! * This is safe to call even if optlines is NULL. ! */ ! free_readfile(optlines); /* * Check whether the child postmaster process is still alive. This *************** test_postmaster_connection(pgpid_t pm_pi *** 697,710 **** int exitstatus; if (waitpid((pid_t) pm_pid, &exitstatus, WNOHANG) == (pid_t) pm_pid) ! return PQPING_NO_RESPONSE; } #else if (WaitForSingleObject(postmasterProcess, 0) == WAIT_OBJECT_0) ! return PQPING_NO_RESPONSE; #endif ! /* No response, or startup still in process; wait */ if (i % WAITS_PER_SEC == 0) { #ifdef WIN32 --- 607,620 ---- int exitstatus; if (waitpid((pid_t) pm_pid, &exitstatus, WNOHANG) == (pid_t) pm_pid) ! return POSTMASTER_FAILED; } #else if (WaitForSingleObject(postmasterProcess, 0) == WAIT_OBJECT_0) ! return POSTMASTER_FAILED; #endif ! /* Startup still in process; wait, printing a dot once per second */ if (i % WAITS_PER_SEC == 0) { #ifdef WIN32 *************** test_postmaster_connection(pgpid_t pm_pi *** 729,736 **** pg_usleep(USEC_PER_SEC / WAITS_PER_SEC); } ! /* return result of last call to PQping */ ! return ret; } --- 639,646 ---- pg_usleep(USEC_PER_SEC / WAITS_PER_SEC); } ! /* out of patience; report that postmaster is still starting up */ ! return POSTMASTER_STILL_STARTING; } *************** read_post_opts(void) *** 764,777 **** if (ctl_command == RESTART_COMMAND) { char **optlines; ! optlines = readfile(postopts_file); if (optlines == NULL) { write_stderr(_("%s: could not read file \"%s\"\n"), progname, postopts_file); exit(1); } ! else if (optlines[0] == NULL || optlines[1] != NULL) { write_stderr(_("%s: option file \"%s\" must have exactly one line\n"), progname, postopts_file); --- 674,688 ---- if (ctl_command == RESTART_COMMAND) { char **optlines; + int numlines; ! optlines = readfile(postopts_file, &numlines); if (optlines == NULL) { write_stderr(_("%s: could not read file \"%s\"\n"), progname, postopts_file); exit(1); } ! else if (numlines != 1) { write_stderr(_("%s: option file \"%s\" must have exactly one line\n"), progname, postopts_file); *************** do_start(void) *** 917,944 **** { print_msg(_("waiting for server to start...")); ! switch (test_postmaster_connection(pm_pid, false)) { ! case PQPING_OK: print_msg(_(" done\n")); print_msg(_("server started\n")); break; ! case PQPING_REJECT: print_msg(_(" stopped waiting\n")); print_msg(_("server is still starting up\n")); break; ! case PQPING_NO_RESPONSE: print_msg(_(" stopped waiting\n")); write_stderr(_("%s: could not start server\n" "Examine the log output.\n"), progname); exit(1); break; - case PQPING_NO_ATTEMPT: - print_msg(_(" failed\n")); - write_stderr(_("%s: could not wait for server because of misconfiguration\n"), - progname); - exit(1); } } else --- 828,850 ---- { print_msg(_("waiting for server to start...")); ! switch (wait_for_postmaster(pm_pid, false)) { ! case POSTMASTER_READY: print_msg(_(" done\n")); print_msg(_("server started\n")); break; ! case POSTMASTER_STILL_STARTING: print_msg(_(" stopped waiting\n")); print_msg(_("server is still starting up\n")); break; ! case POSTMASTER_FAILED: print_msg(_(" stopped waiting\n")); write_stderr(_("%s: could not start server\n" "Examine the log output.\n"), progname); exit(1); break; } } else *************** do_status(void) *** 1319,1329 **** { char **optlines; char **curr_line; printf(_("%s: server is running (PID: %ld)\n"), progname, pid); ! optlines = readfile(postopts_file); if (optlines != NULL) { for (curr_line = optlines; *curr_line != NULL; curr_line++) --- 1225,1236 ---- { char **optlines; char **curr_line; + int numlines; printf(_("%s: server is running (PID: %ld)\n"), progname, pid); ! optlines = readfile(postopts_file, &numlines); if (optlines != NULL) { for (curr_line = optlines; *curr_line != NULL; curr_line++) *************** pgwin32_ServiceMain(DWORD argc, LPTSTR * *** 1634,1640 **** if (do_wait) { write_eventlog(EVENTLOG_INFORMATION_TYPE, _("Waiting for server startup...\n")); ! if (test_postmaster_connection(postmasterPID, true) != PQPING_OK) { write_eventlog(EVENTLOG_ERROR_TYPE, _("Timed out waiting for server startup\n")); pgwin32_SetServiceStatus(SERVICE_STOPPED); --- 1541,1547 ---- if (do_wait) { write_eventlog(EVENTLOG_INFORMATION_TYPE, _("Waiting for server startup...\n")); ! if (wait_for_postmaster(postmasterPID, true) != POSTMASTER_READY) { write_eventlog(EVENTLOG_ERROR_TYPE, _("Timed out waiting for server startup\n")); pgwin32_SetServiceStatus(SERVICE_STOPPED); *************** pgwin32_ServiceMain(DWORD argc, LPTSTR * *** 1655,1661 **** { /* * status.dwCheckPoint can be incremented by ! * test_postmaster_connection(), so it might not start from 0. */ int maxShutdownCheckPoint = status.dwCheckPoint + 12; --- 1562,1568 ---- { /* * status.dwCheckPoint can be incremented by ! * wait_for_postmaster(), so it might not start from 0. */ int maxShutdownCheckPoint = status.dwCheckPoint + 12; diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 21a7728..0a07a02 100644 *** a/src/include/miscadmin.h --- b/src/include/miscadmin.h *************** extern char *shared_preload_libraries_st *** 432,438 **** extern char *local_preload_libraries_string; /* ! * As of 9.1, the contents of the data-directory lock file are: * * line # * 1 postmaster PID (or negative of a standalone backend's PID) --- 432,438 ---- extern char *local_preload_libraries_string; /* ! * As of Postgres 10, the contents of the data-directory lock file are: * * line # * 1 postmaster PID (or negative of a standalone backend's PID) *************** extern char *local_preload_libraries_str *** 441,452 **** * 4 port number * 5 first Unix socket directory path (empty if none) * 6 first listen_address (IP address or "*"; empty if no TCP port) ! * 7 shared memory key (not present on Windows) * * Lines 6 and up are added via AddToDataDirLockFile() after initial file ! * creation. * ! * The socket lock file, if used, has the same contents as lines 1-5. */ #define LOCK_FILE_LINE_PID 1 #define LOCK_FILE_LINE_DATA_DIR 2 --- 441,455 ---- * 4 port number * 5 first Unix socket directory path (empty if none) * 6 first listen_address (IP address or "*"; empty if no TCP port) ! * 7 shared memory key (empty on Windows) ! * 8 postmaster status ("starting", "stopping", "ready ", "standby ") * * Lines 6 and up are added via AddToDataDirLockFile() after initial file ! * creation; also, line 5 is initially empty and is changed after the first ! * Unix socket is opened. * ! * Socket lock file(s), if used, have the same contents as lines 1-5, with ! * line 5 being their own directory. */ #define LOCK_FILE_LINE_PID 1 #define LOCK_FILE_LINE_DATA_DIR 2 *************** extern char *local_preload_libraries_str *** 455,460 **** --- 458,464 ---- #define LOCK_FILE_LINE_SOCKET_DIR 5 #define LOCK_FILE_LINE_LISTEN_ADDR 6 #define LOCK_FILE_LINE_SHMEM_KEY 7 + #define LOCK_FILE_LINE_PM_STATUS 8 extern void CreateDataDirLockFile(bool amPostmaster); extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi, On 2017-06-27 14:59:18 -0400, Tom Lane wrote: > Here's a draft patch for that. I quite like the results --- this seems > way simpler and more reliable than what pg_ctl has done up to now. Yea, I like that too. > However, it's certainly arguable that this is too much change for an > optional post-beta patch. Yea, I think there's a valid case to be made for that. I'm still inclined to go along with this, it seems we're otherwise just adding complexity we're going to remove in a bit again. > If we decide that it has to wait for v11, > I'd address Jeff's complaint by hacking the loop behavior in > test_postmaster_connection, which'd be ugly but not many lines of code. Basically increasing the wait time over time? > Note that I followed the USE_SYSTEMD patch's lead as to where to report > postmaster state changes. Arguably, in the standby-server case, we > should not report the postmaster is ready until we've reached consistency. > But that would require additional signaling from the startup process > to the postmaster, so it seems like a separate change if we want it. I suspect we're going to want to add more states to this over time, but as you say, there's no need to hurry. > /* > + * Report postmaster status in the postmaster.pid file, to allow pg_ctl to > + * see what's happening. Note that all strings written to the status line > + * must be the same length, per comments for AddToDataDirLockFile(). We > + * currently make them all 8 bytes, padding with spaces. > + */ > + AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, "starting"); The 8-space thing in multiple places is a bit ugly. How about having a a bunch of constants declared in one place? Alternatively make it something like: status: $c, where $c is a one character code for the various states? > @@ -1149,8 +1149,9 @@ TouchSocketLockFiles(void) > * > * Note: because we don't truncate the file, if we were to rewrite a line > * with less data than it had before, there would be garbage after the last > - * line. We don't ever actually do that, so not worth adding another kernel > - * call to cover the possibility. > + * line. While we could fix that by adding a truncate call, that would make > + * the file update non-atomic, which we'd rather avoid. Therefore, callers > + * should endeavor never to shorten a line once it's been written. > */ > void > AddToDataDirLockFile(int target_line, const char *str) > @@ -1193,19 +1194,26 @@ AddToDataDirLockFile(int target_line, co > srcptr = srcbuffer; > for (lineno = 1; lineno < target_line; lineno++) > { > - if ((srcptr = strchr(srcptr, '\n')) == NULL) > - { > - elog(LOG, "incomplete data in \"%s\": found only %d newlines while trying to add line %d", > - DIRECTORY_LOCK_FILE, lineno - 1, target_line); > - close(fd); > - return; > - } > - srcptr++; > + char *eol = strchr(srcptr, '\n'); > + > + if (eol == NULL) > + break; /* not enough lines in file yet */ > + srcptr = eol + 1; > } > memcpy(destbuffer, srcbuffer, srcptr - srcbuffer); > destptr = destbuffer + (srcptr - srcbuffer); > > /* > + * Fill in any missing lines before the target line, in case lines are > + * added to the file out of order. > + */ > + for (; lineno < target_line; lineno++) > + { > + if (destptr < destbuffer + sizeof(destbuffer)) > + *destptr++ = '\n'; > + } > + > + /* > * Write or rewrite the target line. > */ > snprintf(destptr, destbuffer + sizeof(destbuffer) - destptr, > "%s\n", str); Not this patches fault, and shouldn't be changed now, but this is a fairly weird way to manage and update this file. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >> If we decide that it has to wait for v11, >> I'd address Jeff's complaint by hacking the loop behavior in >> test_postmaster_connection, which'd be ugly but not many lines of code. > Basically increasing the wait time over time? I was thinking of just dropping back to once-per-second after a couple of seconds, with something along the lines of this in place of the existing sleep at the bottom of the loop: if (i >= 2 * WAITS_PER_SEC){ pg_usleep(USEC_PER_SEC); i += WAITS_PER_SEC - 1;}else pg_usleep(USEC_PER_SEC / WAITS_PER_SEC); > The 8-space thing in multiple places is a bit ugly. How about having a > a bunch of constants declared in one place? Alternatively make it > something like: status: $c, where $c is a one character code for the > various states? Yeah, we could add the string values as macros in miscadmin.h, perhaps. I don't like the single-character idea --- if we do expand the number of things reported this way, that could get restrictive. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >> However, it's certainly arguable that this is too much change for an >> optional post-beta patch. > Yea, I think there's a valid case to be made for that. I'm still > inclined to go along with this, it seems we're otherwise just adding > complexity we're going to remove in a bit again. I'm not hearing anyone speaking against doing this now, so I'm going to go ahead with it. > The 8-space thing in multiple places is a bit ugly. How about having a > a bunch of constants declared in one place? While looking this over again, I got worried about the fact that pg_ctl is #including "miscadmin.h". That's a pretty low-level backend header and it wouldn't be surprising at all if somebody tried to put stuff in it that wouldn't compile frontend-side. I think we should take the opportunity, as long as we're touching this stuff, to split the #defines that describe the contents of postmaster.pid into a separate header file. Maybe "utils/pidfile.h" ? regards, tom lane
On 2017-06-28 13:31:27 -0400, Tom Lane wrote: > I'm not hearing anyone speaking against doing this now, so I'm going > to go ahead with it. Cool. > While looking this over again, I got worried about the fact that pg_ctl > is #including "miscadmin.h". That's a pretty low-level backend header > and it wouldn't be surprising at all if somebody tried to put stuff in > it that wouldn't compile frontend-side. I think we should take the > opportunity, as long as we're touching this stuff, to split the #defines > that describe the contents of postmaster.pid into a separate header file. > Maybe "utils/pidfile.h" ? Yes, that sounds like a valid concern, and solution. - Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-06-28 13:31:27 -0400, Tom Lane wrote: >> While looking this over again, I got worried about the fact that pg_ctl >> is #including "miscadmin.h". That's a pretty low-level backend header >> and it wouldn't be surprising at all if somebody tried to put stuff in >> it that wouldn't compile frontend-side. I think we should take the >> opportunity, as long as we're touching this stuff, to split the #defines >> that describe the contents of postmaster.pid into a separate header file. >> Maybe "utils/pidfile.h" ? > Yes, that sounds like a valid concern, and solution. So when I removed the miscadmin.h include, I found out that pg_ctl is also relying on PG_BACKEND_VERSIONSTR from that file. There are at least three things we could do here: 1. Give this up as not worth this much trouble. 2. Move PG_BACKEND_VERSIONSTR into pg_config.h to go along with the other version-related macros. 3. Give PG_BACKEND_VERSIONSTR its very own new header file. Any preferences? regards, tom lane
Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > So when I removed the miscadmin.h include, I found out that pg_ctl is > also relying on PG_BACKEND_VERSIONSTR from that file. > > There are at least three things we could do here: > > 1. Give this up as not worth this much trouble. > > 2. Move PG_BACKEND_VERSIONSTR into pg_config.h to go along with the > other version-related macros. pg_config.h sounds like a decent enough solution. It's a bit strange this hasn't come up before, given that that symbol is used more in frontend environ than backend. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> So when I removed the miscadmin.h include, I found out that pg_ctl is >> also relying on PG_BACKEND_VERSIONSTR from that file. >> >> There are at least three things we could do here: >> >> 1. Give this up as not worth this much trouble. >> >> 2. Move PG_BACKEND_VERSIONSTR into pg_config.h to go along with the >> other version-related macros. > pg_config.h sounds like a decent enough solution. It's a bit strange > this hasn't come up before, given that that symbol is used more in > frontend environ than backend. Right at the moment, what I've done is to stick it into port.h beside the declaration of find_other_exec, since the existing uses are all as parameters of find_other_exec[_or_die]. But maybe that's a bit too expedient. regards, tom lane
On Wed, Jun 28, 2017 at 8:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> On 2017-06-27 14:59:18 -0400, Tom Lane wrote: >>> However, it's certainly arguable that this is too much change for an >>> optional post-beta patch. > >> Yea, I think there's a valid case to be made for that. I'm still >> inclined to go along with this, it seems we're otherwise just adding >> complexity we're going to remove in a bit again. > > I'm not hearing anyone speaking against doing this now, so I'm going > to go ahead with it. +1 for going for it. Besides pg_ctl it would also help cluster management software. In Patroni we basically reimplement pg_ctl to get at the started postmaster pid to detect a crash during postmaster startup earlier and to be able to reliably cancel start. Getting the current state from the pid file sounds better than having a loop poke the server with pg_isready. To introduce feature creep, I would like to see more detailed states than proposed here. Specifically I'm interested in knowing when PM_WAIT_BACKENDS ends. When we lose quorum we restart PostgreSQL as a standby. We use a regular fast shutdown, but that can take a long time due to the shutdown checkpoint. The leader lease may run out during this time so we would have to escalate to immediate shutdown or have a watchdog fence the node. If we knew that no user backends are left we can let the shutdown checkpoint complete at leisure without risk for split brain. Regards, Ants Aasma
On Tue, Jun 27, 2017 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2017-06-26 17:38:03 -0400, Tom Lane wrote:
>>> Hm. Take that a bit further, and we could drop the connection probes
>>> altogether --- just put the whole responsibility on the postmaster to
>>> show in the pidfile whether it's ready for connections or not.
>> Yea, that seems quite appealing, both from an architectural, simplicity,
>> and log noise perspective. I wonder if there's some added reliability by
>> the connection probe though? Essentially wondering if it'd be worthwhile
>> to keep a single connection test at the end. I'm somewhat disinclined
>> though.
> I agree --- part of the appeal of this idea is that there could be a net
> subtraction of code from pg_ctl. (I think it wouldn't have to link libpq
> anymore at all, though maybe I forgot something.) And you get rid of a
> bunch of can't-connect failure modes, eg kernel packet filter in the way,
> which probably outweighs any hypothetical reliability gain from confirming
> the postmaster state the old way.
Here's a draft patch for that. I quite like the results --- this seems
way simpler and more reliable than what pg_ctl has done up to now.
However, it's certainly arguable that this is too much change for an
optional post-beta patch.
In the now-committed version of this, the 'pg_ctl start' returns successfully as soon as the server reaches a consistent state. Which is OK, except that it does the same thing when hot_standby=off. When hot_standby=off, I would expect it to wait for the end of recovery before exiting with a success code.
Cheers,
Jeff
On June 29, 2017 10:19:46 AM PDT, Jeff Janes <jeff.janes@gmail.com> wrote: >On Tue, Jun 27, 2017 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I wrote: >> > Andres Freund <andres@anarazel.de> writes: >> >> On 2017-06-26 17:38:03 -0400, Tom Lane wrote: >> >>> Hm. Take that a bit further, and we could drop the connection >probes >> >>> altogether --- just put the whole responsibility on the >postmaster to >> >>> show in the pidfile whether it's ready for connections or not. >> >> >> Yea, that seems quite appealing, both from an architectural, >simplicity, >> >> and log noise perspective. I wonder if there's some added >reliability by >> >> the connection probe though? Essentially wondering if it'd be >worthwhile >> >> to keep a single connection test at the end. I'm somewhat >disinclined >> >> though. >> >> > I agree --- part of the appeal of this idea is that there could be >a net >> > subtraction of code from pg_ctl. (I think it wouldn't have to link >libpq >> > anymore at all, though maybe I forgot something.) And you get rid >of a >> > bunch of can't-connect failure modes, eg kernel packet filter in >the way, >> > which probably outweighs any hypothetical reliability gain from >> confirming >> > the postmaster state the old way. >> >> Here's a draft patch for that. I quite like the results --- this >seems >> way simpler and more reliable than what pg_ctl has done up to now. >> However, it's certainly arguable that this is too much change for an >> optional post-beta patch. > > >In the now-committed version of this, the 'pg_ctl start' returns >successfully as soon as the server reaches a consistent state. Which is >OK, >except that it does the same thing when hot_standby=off. When >hot_standby=off, I would expect it to wait for the end of recovery >before >exiting with a success code. I've not looked at the committed version, but earlier versions had code dealing with that difference; essentially doing whatyou suggest. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Jeff Janes <jeff.janes@gmail.com> writes: > In the now-committed version of this, the 'pg_ctl start' returns > successfully as soon as the server reaches a consistent state. Which is OK, > except that it does the same thing when hot_standby=off. When > hot_standby=off, I would expect it to wait for the end of recovery before > exiting with a success code. Um, won't it be waiting forever with that definition? regards, tom lane
On Thu, Jun 29, 2017 at 11:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Cheers,Jeff Janes <jeff.janes@gmail.com> writes:
> In the now-committed version of this, the 'pg_ctl start' returns
> successfully as soon as the server reaches a consistent state. Which is OK,
> except that it does the same thing when hot_standby=off. When
> hot_standby=off, I would expect it to wait for the end of recovery before
> exiting with a success code.
Um, won't it be waiting forever with that definition?
regards, tom lane
No, this isn't streaming. It hits the PITR limit (recovery_target_*), or runs out of archived wal, and then it opens for business.
Jeff