Re: when the startup process doesn't (logging startup delays) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: when the startup process doesn't (logging startup delays)
Date
Msg-id CAMm1aWZqQNdjawULasOPr0Kt0JaweRoDF2W1353O7VUraxisVA@mail.gmail.com
Whole thread Raw
In response to Re: when the startup process doesn't (logging startup delays)  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
Responses Re: when the startup process doesn't (logging startup delays)
List pgsql-hackers
> Michael is right.  You updated some of the units based on Robert's suggestion
> to use MS, but didn't update all of the corresponding parts of the patch.
> guc.c says that the units are in MS, which means that unqualified values are
> interpretted as such.  But postgresql.conf.sample still says "seconds", and
> guc.c says the default value is "10", and you still do:
>
> +       interval_in_ms = log_startup_progress_interval * 1000;
>
> I checked that this currently does not interpret the value as ms:
> |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
>
> (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> another int could overflow.)
>
> I think the convention is to for GUC global vars to be initialized with the
> same default as in guc.c, so both should be 10000, like:
>
> +int log_startup_progress_interval = 10 * 1000 /* 10sec */

Thanks Justin for the detailed explanation.  Done the necessary changes.

Please find the updated patch attached.


On Mon, Sep 13, 2021 at 8:32 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > I think you're confusing discussions.
> >
> > Robert was talking about initdb/bootstrap/single, and I separately and
> > independently asked about hot standbys.  It seems like Robert and I agreed
> > about the desired behavior and there was no further discussion.
>
> Sorry for including 2 separate points into one.
>
> > Looking over this version, I realized something I (or you) should have
> > noticed sooner: you've added the RegisterTimeout call to
> > InitPostgres(), but that's for things that are used by all backends,
> > and this is only used by the startup process. So it seems to me that
> > the right place is StartupProcessMain. That would have the further
> > advantage of allowing startup_progress_timeout_handler to be made
> > static. begin_startup_progress_phase() and
> > startup_progress_timeout_has_expired() are the actual API functions
> > though so they will need to remain extern.
>
> The earlier discussion wrt this point is as follows.
>
> > > I also agree that this is the better place to do it. Hence modified
> > > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > > differentiate whether the call is done from a startup process or some
> > > other process. Actually StartupXLOG() gets called in 2 places. one in
> > > StartupProcessMain() and the other in InitPostgres(). As the logging
> > > of the startup progress is required only during the startup process
> > > and not in the other cases,
> >
> > The InitPostgres() case occurs when the server is started in bootstrap
> > mode (during initdb) or in single-user mode (postgres --single). I do
> > not see any reason why we shouldn't produce progress messages in at
> > least the latter case. I suspect that someone who is in the rather
> > desperate scenario of having to use single-user mode would really like
> > to know how long the server is going to take to start up.
> >
> > Perhaps during initdb we don't want messages, but I'm not sure that we
> > need to do anything about that here. None of the messages that the
> > server normally produces show up when you run initdb, so I guess they
> > are getting redirected to /dev/null or something.
> >
> > So I don't think that using AmStartupProcess() for this purpose is right.
>
> This point is really confusing. As per the earlier discussion we
> concluded to include RegisterTimeout() call even in case of
> InitPostgres() to support logging in case of single-user mode. Now if
> we remove the RegisterTimeout() call from InitPostgres(), we are not
> going to support that anymore. Is this what you're trying to convey?
> or we should add some checks and disable the code to RegisterTimeout()
> if it is other than single-user mode. I have added a check if
> (!IsPostmasterEnvironment) in the attached patch for this scenario.
> Kindly confirm my understanding.
>
> > > Should this feature distinguish between crash recovery and archive recovery on
> > > a hot standby ?  Otherwise the standby will display this all the time.
> > >
> > >2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100
> > >
> > >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> > > cleanup of xlog.c variables, I can't say that with much confidence).
> >
> > Hmm. My inclination is to think that on an actual standby, you
> > wouldn't want to get messages like this, but if you were doing a
> > point-in-time-recovery, then you would. So I think maybe
> > InArchiveRecovery is not the right thing. Perhaps StandbyMode?
>
> I also feel that the log messages should be recorded in case of
> point-in-time-recovery. So I have added a check if (!StandbyMode) and
> verified the replication and point-in-time-recovery scenario.
>
> > > Some doc comments:
> >
> > Thanks for the suggestions. I will take care in the next patch.
>
> Fixed.
>
> > Michael is right.  You updated some of the units based on Robert's suggestion
> > to use MS, but didn't update all of the corresponding parts of the patch.
> > guc.c says that the units are in MS, which means that unqualified values are
> > interpretted as such.  But postgresql.conf.sample still says "seconds", and
> > guc.c says the default value is "10", and you still do:
> >
> > +       interval_in_ms = log_startup_progress_interval * 1000;
> >
> > I checked that this currently does not interpret the value as ms:
> > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> > |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> > |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> > |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
> >
> > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> > another int could overflow.)
> >
> > I think the convention is to for GUC global vars to be initialized with the
> > same default as in guc.c, so both should be 10000, like:
> >
> > +int log_startup_progress_interval = 10 * 1000 /* 10sec */
>
> Following is the discussion done wrt this point. Kindly refer and
> share your thoughts.
>
> > > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> > > > expressing the default in postgresl.conf.sample as 10s rather than
> > > > 10000ms. I tried values measured in milliseconds just for testing
> > > > purposes and didn't initially understand why it wasn't working. I
> > > > don't think there's any reason it can't work.
> > >
> > > As suggested, I have changed it to GUC_UNIT_MS and kept the default
> > > value to 10s. I would like to know the reason why it can't be
> > > GUC_UNIT_S as we are expressing the values in terms of seconds.
> >
> > I mean, it *could* be. There's just no advantage. Values in seconds
> > will work correctly either way. But values in milliseconds will only
> > work if it's GUC_UNIT_MS. It seems to me that it's better to make more
> > things work rather than fewer.
>
> Thanks & Regards,
> Nitin Jadhav
> On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote:
> > > > Looking over this version, I realized something I (or you) should have
> > > > noticed sooner: you've added the RegisterTimeout call to
> > > > InitPostgres(), but that's for things that are used by all backends,
> > > > and this is only used by the startup process. So it seems to me that
> > > > the right place is StartupProcessMain. That would have the further
> > > > advantage of allowing startup_progress_timeout_handler to be made
> > > > static. begin_startup_progress_phase() and
> > > > startup_progress_timeout_has_expired() are the actual API functions
> > > > though so they will need to remain extern.
> > > >
> > > > I agree with Robert that a standby server should not continuously show timing
> > > > messages for WAL replay.
> > >
> > > The earlier discussion wrt this point is as follows.
> >
> > I think you're confusing discussions.
> >
> > Robert was talking about initdb/bootstrap/single, and I separately and
> > independently asked about hot standbys.  It seems like Robert and I agreed
> > about the desired behavior and there was no further discussion.
> >
> > > > Honestly, I don't see why we should have
> > > > a GUC for what's proposed here.  A value too low would bloat the logs
> > > > with entries that are not that meaningful.  A value too large would
> > > > just prevent access to some information wanted.  Wouldn't it be better
> > > > to just pick up a value like 10s or 20s?
> >
> > I don't think bloating logs is a issue for values > 10sec.
> >
> > You agreed that it's important to choose the "right" value, but I think that
> > will vary between users.  Some installations with large checkpoint_timeout
> > might anticipate taking 15+min to perform recovery, but others might even have
> > a strict requirement that recovery must not take more than (say) 10sec; someone
> > might want to use this to verify that, or to optimize the slow parts of
> > recovery, with an interval that someone else might not care about.
> >
> > > > +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> > > > +           gettext_noop("Sets the time interval between each progress update "
> > > > +                        "of the startup process."),
> > > > +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> > > > +           GUC_UNIT_MS,
> > |+               10, -1, INT_MAX,
> > > > The unit is incorrect here, as that would default to 10ms, contrary to
> > > > what the documentation says about 10s.
> > >
> > > Kindly refer the previous few discussions wrt this point.
> >
> > You copied and pasted unrelated emails, which isn't helpful.
> >
> > Michael is right.  You updated some of the units based on Robert's suggestion
> > to use MS, but didn't update all of the corresponding parts of the patch.
> > guc.c says that the units are in MS, which means that unqualified values are
> > interpretted as such.  But postgresql.conf.sample still says "seconds", and
> > guc.c says the default value is "10", and you still do:
> >
> > +       interval_in_ms = log_startup_progress_interval * 1000;
> >
> > I checked that this currently does not interpret the value as ms:
> > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> > |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> > |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> > |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
> >
> > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> > another int could overflow.)
> >
> > I think the convention is to for GUC global vars to be initialized with the
> > same default as in guc.c, so both should be 10000, like:
> >
> > +int log_startup_progress_interval = 10 * 1000 /* 10sec */
> >
> > --
> > Justin

Attachment

pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: PostgreSQL 14 press release draft
Next
From: Fujii Masao
Date:
Subject: Re: Refactoring postgres_fdw code to rollback remote transaction