Re: when the startup process doesn't (logging startup delays) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: when the startup process doesn't (logging startup delays) |
Date | |
Msg-id | CAMm1aWZqQNdjawULasOPr0Kt0JaweRoDF2W1353O7VUraxisVA@mail.gmail.com Whole thread Raw |
In response to | Re: when the startup process doesn't (logging startup delays) (Nitin Jadhav <nitinjadhavpostgres@gmail.com>) |
Responses |
Re: when the startup process doesn't (logging startup delays)
|
List | pgsql-hackers |
> Michael is right. You updated some of the units based on Robert's suggestion > to use MS, but didn't update all of the corresponding parts of the patch. > guc.c says that the units are in MS, which means that unqualified values are > interpretted as such. But postgresql.conf.sample still says "seconds", and > guc.c says the default value is "10", and you still do: > > + interval_in_ms = log_startup_progress_interval * 1000; > > I checked that this currently does not interpret the value as ms: > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > another int could overflow.) > > I think the convention is to for GUC global vars to be initialized with the > same default as in guc.c, so both should be 10000, like: > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ Thanks Justin for the detailed explanation. Done the necessary changes. Please find the updated patch attached. On Mon, Sep 13, 2021 at 8:32 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > I think you're confusing discussions. > > > > Robert was talking about initdb/bootstrap/single, and I separately and > > independently asked about hot standbys. It seems like Robert and I agreed > > about the desired behavior and there was no further discussion. > > Sorry for including 2 separate points into one. > > > Looking over this version, I realized something I (or you) should have > > noticed sooner: you've added the RegisterTimeout call to > > InitPostgres(), but that's for things that are used by all backends, > > and this is only used by the startup process. So it seems to me that > > the right place is StartupProcessMain. That would have the further > > advantage of allowing startup_progress_timeout_handler to be made > > static. begin_startup_progress_phase() and > > startup_progress_timeout_has_expired() are the actual API functions > > though so they will need to remain extern. > > The earlier discussion wrt this point is as follows. > > > > I also agree that this is the better place to do it. Hence modified > > > the patch accordingly. The condition "!AmStartupProcess()" is added to > > > differentiate whether the call is done from a startup process or some > > > other process. Actually StartupXLOG() gets called in 2 places. one in > > > StartupProcessMain() and the other in InitPostgres(). As the logging > > > of the startup progress is required only during the startup process > > > and not in the other cases, > > > > The InitPostgres() case occurs when the server is started in bootstrap > > mode (during initdb) or in single-user mode (postgres --single). I do > > not see any reason why we shouldn't produce progress messages in at > > least the latter case. I suspect that someone who is in the rather > > desperate scenario of having to use single-user mode would really like > > to know how long the server is going to take to start up. > > > > Perhaps during initdb we don't want messages, but I'm not sure that we > > need to do anything about that here. None of the messages that the > > server normally produces show up when you run initdb, so I guess they > > are getting redirected to /dev/null or something. > > > > So I don't think that using AmStartupProcess() for this purpose is right. > > This point is really confusing. As per the earlier discussion we > concluded to include RegisterTimeout() call even in case of > InitPostgres() to support logging in case of single-user mode. Now if > we remove the RegisterTimeout() call from InitPostgres(), we are not > going to support that anymore. Is this what you're trying to convey? > or we should add some checks and disable the code to RegisterTimeout() > if it is other than single-user mode. I have added a check if > (!IsPostmasterEnvironment) in the attached patch for this scenario. > Kindly confirm my understanding. > > > > Should this feature distinguish between crash recovery and archive recovery on > > > a hot standby ? Otherwise the standby will display this all the time. > > > > > >2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > > > >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > > > cleanup of xlog.c variables, I can't say that with much confidence). > > > > Hmm. My inclination is to think that on an actual standby, you > > wouldn't want to get messages like this, but if you were doing a > > point-in-time-recovery, then you would. So I think maybe > > InArchiveRecovery is not the right thing. Perhaps StandbyMode? > > I also feel that the log messages should be recorded in case of > point-in-time-recovery. So I have added a check if (!StandbyMode) and > verified the replication and point-in-time-recovery scenario. > > > > Some doc comments: > > > > Thanks for the suggestions. I will take care in the next patch. > > Fixed. > > > Michael is right. You updated some of the units based on Robert's suggestion > > to use MS, but didn't update all of the corresponding parts of the patch. > > guc.c says that the units are in MS, which means that unqualified values are > > interpretted as such. But postgresql.conf.sample still says "seconds", and > > guc.c says the default value is "10", and you still do: > > > > + interval_in_ms = log_startup_progress_interval * 1000; > > > > I checked that this currently does not interpret the value as ms: > > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > > another int could overflow.) > > > > I think the convention is to for GUC global vars to be initialized with the > > same default as in guc.c, so both should be 10000, like: > > > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ > > Following is the discussion done wrt this point. Kindly refer and > share your thoughts. > > > > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > > > > expressing the default in postgresl.conf.sample as 10s rather than > > > > 10000ms. I tried values measured in milliseconds just for testing > > > > purposes and didn't initially understand why it wasn't working. I > > > > don't think there's any reason it can't work. > > > > > > As suggested, I have changed it to GUC_UNIT_MS and kept the default > > > value to 10s. I would like to know the reason why it can't be > > > GUC_UNIT_S as we are expressing the values in terms of seconds. > > > > I mean, it *could* be. There's just no advantage. Values in seconds > > will work correctly either way. But values in milliseconds will only > > work if it's GUC_UNIT_MS. It seems to me that it's better to make more > > things work rather than fewer. > > Thanks & Regards, > Nitin Jadhav > On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote: > > > > Looking over this version, I realized something I (or you) should have > > > > noticed sooner: you've added the RegisterTimeout call to > > > > InitPostgres(), but that's for things that are used by all backends, > > > > and this is only used by the startup process. So it seems to me that > > > > the right place is StartupProcessMain. That would have the further > > > > advantage of allowing startup_progress_timeout_handler to be made > > > > static. begin_startup_progress_phase() and > > > > startup_progress_timeout_has_expired() are the actual API functions > > > > though so they will need to remain extern. > > > > > > > > I agree with Robert that a standby server should not continuously show timing > > > > messages for WAL replay. > > > > > > The earlier discussion wrt this point is as follows. > > > > I think you're confusing discussions. > > > > Robert was talking about initdb/bootstrap/single, and I separately and > > independently asked about hot standbys. It seems like Robert and I agreed > > about the desired behavior and there was no further discussion. > > > > > > Honestly, I don't see why we should have > > > > a GUC for what's proposed here. A value too low would bloat the logs > > > > with entries that are not that meaningful. A value too large would > > > > just prevent access to some information wanted. Wouldn't it be better > > > > to just pick up a value like 10s or 20s? > > > > I don't think bloating logs is a issue for values > 10sec. > > > > You agreed that it's important to choose the "right" value, but I think that > > will vary between users. Some installations with large checkpoint_timeout > > might anticipate taking 15+min to perform recovery, but others might even have > > a strict requirement that recovery must not take more than (say) 10sec; someone > > might want to use this to verify that, or to optimize the slow parts of > > recovery, with an interval that someone else might not care about. > > > > > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > > > > + gettext_noop("Sets the time interval between each progress update " > > > > + "of the startup process."), > > > > + gettext_noop("0 logs all messages. -1 turns this feature off."), > > > > + GUC_UNIT_MS, > > |+ 10, -1, INT_MAX, > > > > The unit is incorrect here, as that would default to 10ms, contrary to > > > > what the documentation says about 10s. > > > > > > Kindly refer the previous few discussions wrt this point. > > > > You copied and pasted unrelated emails, which isn't helpful. > > > > Michael is right. You updated some of the units based on Robert's suggestion > > to use MS, but didn't update all of the corresponding parts of the patch. > > guc.c says that the units are in MS, which means that unqualified values are > > interpretted as such. But postgresql.conf.sample still says "seconds", and > > guc.c says the default value is "10", and you still do: > > > > + interval_in_ms = log_startup_progress_interval * 1000; > > > > I checked that this currently does not interpret the value as ms: > > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > > another int could overflow.) > > > > I think the convention is to for GUC global vars to be initialized with the > > same default as in guc.c, so both should be 10000, like: > > > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ > > > > -- > > Justin
Attachment
pgsql-hackers by date: