Thread: when the startup process doesn't

when the startup process doesn't

From
Robert Haas
Date:
Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG:  still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG:  data directory sync complete after %ld.%03d ms
LOG:  still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG:  unlogged relations reset after %ld.%03d ms
LOG:  still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't

From
Alvaro Herrera
Date:
On 2021-Apr-19, Robert Haas wrote:

> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant).

Hmm.  We already have pg_isready, which is pretty simplistic -- it tries
to connect to the server and derive a status in a very simplistic way.
Can we perhaps improve on that?  I think your idea of using the
non-database-connected replication mode would let the server return a
tuple with some status information with a new command.  And then
pg_isready could interpret that, or just print it.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
Subversion to GIT: the shortest path to happiness I've ever heard of
                                                (Alexey Klyukin)



Re: when the startup process doesn't

From
Bruce Momjian
Date:
On Mon, Apr 19, 2021 at 01:55:13PM -0400, Robert Haas wrote:
> I've noticed that customers not infrequently complain that they start
> postgres and then the system doesn't come up for a while and they have
> no idea what's going on and are (understandably) worried. There are
> probably a number of reasons why this can happen, but the ones that
> seem to come up most often in my experience are (1) SyncDataDirectory
> takes a long time, (b) ResetUnloggedRelations takes a long time, and
> (c) there's a lot of WAL to apply so that takes a long time. It's
> possible to distinguish this last case from the other two by looking
> at the output of 'ps', but that's not super-convenient if your normal
> method of access to the server is via libpq, and it only works if you
> are monitoring it as it's happening rather than looking at the logs
> after-the-fact. I am not sure there's any real way to distinguish the
> other two cases without using strace or gdb or similar.
> 
> It seems to me that we could do better. One approach would be to try
> to issue a log message periodically - maybe once per minute, or some
> configurable interval, e.g. perhaps add messages something like this:

Yes, this certainly needs improvement.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: when the startup process doesn't

From
Thomas Munro
Date:
On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.

+1 for both ideas.  I've heard multiple requests for something like
that.  A couple of users with update_process_title=off told me they
regretted that choice when they found themselves running a long crash
recovery with the only indicator of progress disabled.



Re: when the startup process doesn't

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> If I had to pick one of these two ideas, I'd pick the one the
>> log-based solution, since it seems easier to access and simplifies
>> retrospective analysis, but I suspect SQL access would be quite useful
>> for some users too, especially in cloud environments where "just log
>> into the machine and have a look" is not an option.

> +1 for both ideas.  I've heard multiple requests for something like
> that.  A couple of users with update_process_title=off told me they
> regretted that choice when they found themselves running a long crash
> recovery with the only indicator of progress disabled.

Hmm ... +1 for progress messages in the log, but I'm suspicious about
the complexity-and-fragility-versus-value tradeoff for the other thing.

            regards, tom lane



Re: when the startup process doesn't

From
Magnus Hagander
Date:
On Mon, Apr 19, 2021 at 7:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Hi,
>
> I've noticed that customers not infrequently complain that they start
> postgres and then the system doesn't come up for a while and they have
> no idea what's going on and are (understandably) worried. There are
> probably a number of reasons why this can happen, but the ones that
> seem to come up most often in my experience are (1) SyncDataDirectory
> takes a long time, (b) ResetUnloggedRelations takes a long time, and
> (c) there's a lot of WAL to apply so that takes a long time. It's
> possible to distinguish this last case from the other two by looking
> at the output of 'ps', but that's not super-convenient if your normal
> method of access to the server is via libpq, and it only works if you
> are monitoring it as it's happening rather than looking at the logs
> after-the-fact. I am not sure there's any real way to distinguish the
> other two cases without using strace or gdb or similar.
>
> It seems to me that we could do better. One approach would be to try
> to issue a log message periodically - maybe once per minute, or some
> configurable interval, e.g. perhaps add messages something like this:
>
> LOG:  still syncing data directory, elapsed time %ld.%03d ms, current path %s
> LOG:  data directory sync complete after %ld.%03d ms
> LOG:  still resetting unlogged relations, elapsed time %ld.%03d ms,
> current path %s
> LOG:  unlogged relations reset after %ld.%03d ms
> LOG:  still performing crash recovery, elapsed time %ld.%03d ms,
> current LSN %08X/%08X
>
> We already have a message when redo is complete, so there's no need
> for another one. The implementation here doesn't seem too hard either:
> the startup process would set a timer, when the timer expires the
> signal handler sets a flag, at a convenient point we notice the flag
> is set and responding by printing a message and clearing the flag.
>
> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant). Maybe it would be precisely a
> replication connection and we'd just refuse all but a subset of
> commands, or maybe it would be some other kinds of thing. But either
> way you'd be able to issue a command in some mini-language saying "so,
> tell me how startup is going" and it would reply with a result set of
> some kind.
>
> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.
>
> Thoughts?

(Ugh. Did reply instead of reply-all. Surely I should know that by
now... Here's a re-send!)

+1 for the log based one.

In general I'm usually against the log based one, but something over
the replication protocol is really not going to help a lot of people
who are in this situation. They may not even have permissions to log
in, and any kind of monitoring system would fail to work as well. And
can we even log users in at this point? We can't get the list of
roles... If we could, I would say it's probably better to allow the
login in a regular connection, but then immediately throw an error and
give this error a more detailed message if the user has monitoring
permissions.

But against either of those, the log based method is certainly a lot
easier to build :)

And FWIW, I believe most -- probably all -- cloud environments do give
an interface to view the log at least, so the log based solution would
work there as well. Maybe not as convenient, but it would work.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: when the startup process doesn't

From
Robert Haas
Date:
On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... +1 for progress messages in the log, but I'm suspicious about
> the complexity-and-fragility-versus-value tradeoff for the other thing.

All right, so it's nice to hear that nobody so far is opposed to the
log-based solution, and I think it's sensible to think about building
that one first and doing anything else later.

But, if we did want to invent something to allow monitoring via libpq
even at this early stage, how would we make it work? Magnus pointed
out that we can hardly read pg_authid during crash recovery, which
means that accepting logins in the usual sense at that stage is not
feasible. But, what if we picked a fixed, hard-coded role name for
this? I would suggest pg_monitor, but that's already taken for
something else, so maybe pg_console or some better thing someone else
can suggest. Without a pg_authid row, you couldn't use password, md5,
or scram authentication, unless we provided some other place to store
the verifier, like a flatfile. I'm not sure we want to go there, but
that still leaves a lot of workable authentication methods.

I think Álvaro is right to see this kind of work as an extension of
pg_isready, but the problem with pg_isready is that we don't want to
expose a lot of information to the whole Internet, or however much of
it can reach the postgres port. But with this approach, you can lock
down access via pg_hba.conf, which means that it's OK to expose
information that we don't want to make available to everyone. I think
we're still limited to exposing what can be observed from shared
memory here, because the whole idea is to have something that can be
used even before consistency is reached, so we shouldn't really be
doing anything that would look at the contents of data files. But that
still leaves a bunch of things that we could show here, the progress
of the startup process being one of them.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't

From
Magnus Hagander
Date:
On Tue, Apr 20, 2021 at 2:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Hmm ... +1 for progress messages in the log, but I'm suspicious about
> > the complexity-and-fragility-versus-value tradeoff for the other thing.
>
> All right, so it's nice to hear that nobody so far is opposed to the
> log-based solution, and I think it's sensible to think about building
> that one first and doing anything else later.
>
> But, if we did want to invent something to allow monitoring via libpq
> even at this early stage, how would we make it work? Magnus pointed
> out that we can hardly read pg_authid during crash recovery, which
> means that accepting logins in the usual sense at that stage is not
> feasible. But, what if we picked a fixed, hard-coded role name for
> this? I would suggest pg_monitor, but that's already taken for
> something else, so maybe pg_console or some better thing someone else
> can suggest. Without a pg_authid row, you couldn't use password, md5,
> or scram authentication, unless we provided some other place to store
> the verifier, like a flatfile. I'm not sure we want to go there, but
> that still leaves a lot of workable authentication methods.

Another option would be to keep this check entirely outside the scope
of normal roles, and just listen on a port (or unix socket) during
startup which basically just replies with the current status if you
connect to it. On Unix this could also make use of peer authentication
requiring you to be the same user as postgres for example.


> I think Álvaro is right to see this kind of work as an extension of
> pg_isready, but the problem with pg_isready is that we don't want to
> expose a lot of information to the whole Internet, or however much of
> it can reach the postgres port. But with this approach, you can lock
> down access via pg_hba.conf, which means that it's OK to expose
> information that we don't want to make available to everyone. I think
> we're still limited to exposing what can be observed from shared
> memory here, because the whole idea is to have something that can be
> used even before consistency is reached, so we shouldn't really be
> doing anything that would look at the contents of data files. But that
> still leaves a bunch of things that we could show here, the progress
> of the startup process being one of them.

Yeah, I think we should definitely limit this to local access, one way
or another. Realistically using pg_hba is going to require catalog
access, isn't it? And we can't just go ignore those rows in pg_hba
that for example references role membership (as well as all those auth
methods you can't use).

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: when the startup process doesn't

From
Jehan-Guillaume de Rorthais
Date:
On Tue, 20 Apr 2021 15:04:28 +0200
Magnus Hagander <magnus@hagander.net> wrote:
[...]
> Yeah, I think we should definitely limit this to local access, one way
> or another. Realistically using pg_hba is going to require catalog
> access, isn't it? And we can't just go ignore those rows in pg_hba
> that for example references role membership (as well as all those auth
> methods you can't use).

Two another options:

1. if this is limited to local access only, outside of the log entries, the
status of the startup could be updated in the controldata file as well. This
would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

2. maybe the startup process could ignore update_process_title? As far
as I understand the doc correctly, this GUC is mostly useful for backends on
Windows.

Regards,



Re: when the startup process doesn't

From
Magnus Hagander
Date:
On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
>
> On Tue, 20 Apr 2021 15:04:28 +0200
> Magnus Hagander <magnus@hagander.net> wrote:
> [...]
> > Yeah, I think we should definitely limit this to local access, one way
> > or another. Realistically using pg_hba is going to require catalog
> > access, isn't it? And we can't just go ignore those rows in pg_hba
> > that for example references role membership (as well as all those auth
> > methods you can't use).
>
> Two another options:
>
> 1. if this is limited to local access only, outside of the log entries, the
> status of the startup could be updated in the controldata file as well. This
> would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

I think doing so in controldata would definitely make things
complicated for no real reason. Plus controldata has a fixed size (and
has to have), whereas something like this would probably want more
variation than that makes easy.

There could be a "startup.status" file I guess which would basically
contain the last line of what would otherwise be in the log. But if it
remains a textfile, I'm not sure what the gain is -- you'll just have
to have the dba look in more places than one to find it? It's not like
there's likely to be much other data written to the log during these
times?


> 2. maybe the startup process could ignore update_process_title? As far
> as I understand the doc correctly, this GUC is mostly useful for backends on
> Windows.

You mention Windows -- that would be one excellent reason not to go
for this particular method. Viewing the process title is much harder
on Windows, as there is actually no such thing and we fake it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: when the startup process doesn't

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
> <jgdr@dalibo.com> wrote:
>> Two another options:
>> 1. if this is limited to local access only, outside of the log entries, the
>> status of the startup could be updated in the controldata file as well. This
>> would allows to watch it without tail-grep'ing logs using eg. pg_controldata.

> I think doing so in controldata would definitely make things
> complicated for no real reason. Plus controldata has a fixed size (and
> has to have), whereas something like this would probably want more
> variation than that makes easy.

Also, given that pg_control is as critical a bit of data as we have,
we really don't want to be writing it more often than we absolutely
have to.

> There could be a "startup.status" file I guess which would basically
> contain the last line of what would otherwise be in the log. But if it
> remains a textfile, I'm not sure what the gain is -- you'll just have
> to have the dba look in more places than one to find it? It's not like
> there's likely to be much other data written to the log during these
> times?

Yeah, once you are talking about dumping stuff in a file, it's not
clear how that's better than progress-messages-in-the-log.  People
already have a lot of tooling for looking at the postmaster log.

I think the point of Robert's other proposal is to allow remote
checks of the restart's progress, so local files aren't much of
a substitute anyway.

            regards, tom lane



Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais
> > <jgdr@dalibo.com> wrote:
> >> Two another options:
> >> 1. if this is limited to local access only, outside of the log entries, the
> >> status of the startup could be updated in the controldata file as well. This
> >> would allows to watch it without tail-grep'ing logs using eg. pg_controldata.
>
> > I think doing so in controldata would definitely make things
> > complicated for no real reason. Plus controldata has a fixed size (and
> > has to have), whereas something like this would probably want more
> > variation than that makes easy.
>
> Also, given that pg_control is as critical a bit of data as we have,
> we really don't want to be writing it more often than we absolutely
> have to.

Yeah, don't think pg_control fiddling is what we want.  I do agree with
improving the logging situation around here, certainly.

> > There could be a "startup.status" file I guess which would basically
> > contain the last line of what would otherwise be in the log. But if it
> > remains a textfile, I'm not sure what the gain is -- you'll just have
> > to have the dba look in more places than one to find it? It's not like
> > there's likely to be much other data written to the log during these
> > times?
>
> Yeah, once you are talking about dumping stuff in a file, it's not
> clear how that's better than progress-messages-in-the-log.  People
> already have a lot of tooling for looking at the postmaster log.

Agreed.

> I think the point of Robert's other proposal is to allow remote
> checks of the restart's progress, so local files aren't much of
> a substitute anyway.

Yeah, being able to pick up on this remotely seems like it'd be quite
nice.  I'm not really thrilled with the idea, but the best I've got
offhand for this would be a new role that's "pg_recovery_login" where an
admin can GRANT that role to the roles they'd like to be able to use to
login during the recovery process and then, for those roles, we write
out flat files to allow authentication without access to pg_authid,
whenever their password or such changes.  It's certainly a bit grotty
but I do think it'd work.  I definitely don't want to go back to having
all of pg_authid written as a flat file and I'd rather that existing
tools and libraries work with this (meaning using the same port and
speaking the PG protocol and such) rather than inventing some new thing
that listens on some other port, etc.

On the fence about tying this to 'pg_monitor' instead of using a new
predefined role.  Either way, I would definitely prefer to see the admin
have to create a role and then GRANT the predefined role to that role.
I really dislike the idea of having predefined roles that can be used to
directly log into the database.

Thanks,

Stephen

Attachment

Re: when the startup process doesn't

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Yeah, being able to pick up on this remotely seems like it'd be quite
> nice.  I'm not really thrilled with the idea, but the best I've got
> offhand for this would be a new role that's "pg_recovery_login" where an
> admin can GRANT that role to the roles they'd like to be able to use to
> login during the recovery process and then, for those roles, we write
> out flat files to allow authentication without access to pg_authid,

We got rid of those flat files for good and sufficient reasons.  I really
really don't want to go back to having such.

I wonder though whether we really need authentication here.  pg_ping
already exposes whether the database is up, to anyone who can reach the
postmaster port at all.  Would it be so horrible if the "can't accept
connections" error message included a detail about "recovery is X%
done"?

            regards, tom lane



Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Yeah, being able to pick up on this remotely seems like it'd be quite
> > nice.  I'm not really thrilled with the idea, but the best I've got
> > offhand for this would be a new role that's "pg_recovery_login" where an
> > admin can GRANT that role to the roles they'd like to be able to use to
> > login during the recovery process and then, for those roles, we write
> > out flat files to allow authentication without access to pg_authid,
>
> We got rid of those flat files for good and sufficient reasons.  I really
> really don't want to go back to having such.

Yeah, certainly is part of the reason that I didn't really like that
idea either.

> I wonder though whether we really need authentication here.  pg_ping
> already exposes whether the database is up, to anyone who can reach the
> postmaster port at all.  Would it be so horrible if the "can't accept
> connections" error message included a detail about "recovery is X%
> done"?

Ultimately it seems like it would depend on exactly what we are thinking
of returning there.  A simple percentage of recovery which has been
completed doesn't seem like it'd really be revealing too much
information though.

Thanks,

Stephen

Attachment

Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
> Another possible approach would be to accept connections for
> monitoring purposes even during crash recovery. We can't allow access
> to any database at that point, since the system might not be
> consistent, but we could allow something like a replication connection
> (the non-database-associated variant). Maybe it would be precisely a
> replication connection and we'd just refuse all but a subset of
> commands, or maybe it would be some other kinds of thing. But either
> way you'd be able to issue a command in some mini-language saying "so,
> tell me how startup is going" and it would reply with a result set of
> some kind.

The hard part about this seems to be how to perform authentication -
obviously we can't do catalog lookups for users at that time.

If that weren't the issue, we could easily do much better than now, by
just providing an errdetail() with recovery progress information. But we
presumably don't want to spray such information to unauthenticated
connection attempts.


I've vaguely wondered before whether it'd be worth having something like
an "admin" socket somewhere in the data directory. Which explicitly
wouldn't require authentication, have the cluster owner as the user,
etc.  That'd not just be useful for monitoring during recovery, but also
make some interactions with the server easier for admin tools I think.



> If I had to pick one of these two ideas, I'd pick the one the
> log-based solution, since it seems easier to access and simplifies
> retrospective analysis, but I suspect SQL access would be quite useful
> for some users too, especially in cloud environments where "just log
> into the machine and have a look" is not an option.

However, leaving aside the implementation effort, the crazy idea
above would not easily address the issue of only being accessible with
local access...

Greetings,

Andres Freund



Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
> I wonder though whether we really need authentication here.  pg_ping
> already exposes whether the database is up, to anyone who can reach the
> postmaster port at all.  Would it be so horrible if the "can't accept
> connections" error message included a detail about "recovery is X%
> done"?

Unfortunately I think something like a percentage is hard to calculate
right now.  Even just looking at crash recovery (vs replication or
PITR), we don't currently know where the WAL ends without reading all
the WAL. The easiest thing to return would be something in LSNs or
bytes and I suspect that we don't want to expose either unauthenticated?

I wonder if we ought to occasionally update something like
ControlFileData->minRecoveryPoint on primaries, similar to what we do on
standbys? Then we could actually calculate a percentage, and it'd have
the added advantage of allowing to detect more cases where the end of
the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
adding a lot of fsyncs, but that seems doable?

Greetings,

Andres Freund



Re: when the startup process doesn't

From
Alvaro Herrera
Date:
On 2021-Apr-20, Andres Freund wrote:

> On 2021-04-19 13:55:13 -0400, Robert Haas wrote:
> > Another possible approach would be to accept connections for
> > monitoring purposes even during crash recovery. We can't allow access
> > to any database at that point, since the system might not be
> > consistent, but we could allow something like a replication connection
> > (the non-database-associated variant). Maybe it would be precisely a
> > replication connection and we'd just refuse all but a subset of
> > commands, or maybe it would be some other kinds of thing. But either
> > way you'd be able to issue a command in some mini-language saying "so,
> > tell me how startup is going" and it would reply with a result set of
> > some kind.
> 
> The hard part about this seems to be how to perform authentication -
> obviously we can't do catalog lookups for users at that time.

Maybe a way to do this would involve some sort of monitoring cookie
that's obtained ahead of time (maybe at initdb time?) and is supplied to
the frontend by some OOB means.  Then frontend can present that during
startup to the server, which ascertains its legitimacy without having to
access catalogs.  Perhaps it even requires a specific pg_hba.conf rule.

-- 
Álvaro Herrera       Valdivia, Chile
"La verdad no siempre es bonita, pero el hambre de ella sí"



Re: when the startup process doesn't

From
SATYANARAYANA NARLAPURAM
Date:
+1 for both log messages and allowing connections. I believe these two complement each other. 

In the cloud world, we oftentimes want to monitor the progress of the recovery without connecting to the server as the operators don't necessarily have the required permissions to connect and query. Secondly, having this information in the log helps going back in time and understand where Postgres spent time during recovery.

The ability to query the server provides real time information  and come handy.

Thanks,
Satya



On Mon, Apr 19, 2021 at 10:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
Hi,

I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.

It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:

LOG:  still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG:  data directory sync complete after %ld.%03d ms
LOG:  still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG:  unlogged relations reset after %ld.%03d ms
LOG:  still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X

We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.

Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.

If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com


Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
> > I wonder though whether we really need authentication here.  pg_ping
> > already exposes whether the database is up, to anyone who can reach the
> > postmaster port at all.  Would it be so horrible if the "can't accept
> > connections" error message included a detail about "recovery is X%
> > done"?
>
> Unfortunately I think something like a percentage is hard to calculate
> right now.  Even just looking at crash recovery (vs replication or
> PITR), we don't currently know where the WAL ends without reading all
> the WAL. The easiest thing to return would be something in LSNs or
> bytes and I suspect that we don't want to expose either unauthenticated?

While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to reply and then guess that we'll go
through about half of them before we reach the end..?  I mean, wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)

> I wonder if we ought to occasionally update something like
> ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> standbys? Then we could actually calculate a percentage, and it'd have
> the added advantage of allowing to detect more cases where the end of
> the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> adding a lot of fsyncs, but that seems doable?

This seems to go against Tom's concerns wrt rewriting pg_control.
Perhaps we could work through a solution to that, which would be nice,
but I'm not sure that we need the percentage to be super accurate
anyway, though, ideally, we'd work it out so that it's always increasing
and doesn't look "stuck" as long as we're actually moving forward
through the WAL.

Maybe a heuristic of 'look at the end of the WAL files, assume we'll go
through 50% of it, but only consider that to be 90%, with the last 10%
going from half-way through the WAL to the actual end of the WAL
available."

Yes, such heuristics are terrible, but they're also relatively simple
and wouldn't require tracking anything additional and would, maybe,
avoid the concern about needing to authenticate the user..

Thanks,

Stephen

Attachment

Re: when the startup process doesn't

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Andres Freund (andres@anarazel.de) wrote:
>> On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
>>> I wonder though whether we really need authentication here.  pg_ping
>>> already exposes whether the database is up, to anyone who can reach the
>>> postmaster port at all.  Would it be so horrible if the "can't accept
>>> connections" error message included a detail about "recovery is X%
>>> done"?

>> Unfortunately I think something like a percentage is hard to calculate
>> right now.

> While it obviously wouldn't be exactly accurate, I wonder if we couldn't
> just look at the WAL files we have to reply and then guess that we'll go
> through about half of them before we reach the end..?  I mean, wouldn't
> exactly be the first time that a percentage progress report wasn't
> completely accurate. :)

Or we could skip all the guessing and just print something like what
the startup process exposes in ps status, ie "currently processing
WAL file so-and-so".

            regards, tom lane



Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-21 14:36:24 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > Unfortunately I think something like a percentage is hard to calculate
> > right now.  Even just looking at crash recovery (vs replication or
> > PITR), we don't currently know where the WAL ends without reading all
> > the WAL. The easiest thing to return would be something in LSNs or
> > bytes and I suspect that we don't want to expose either unauthenticated?
> 
> While it obviously wouldn't be exactly accurate, I wonder if we couldn't
> just look at the WAL files we have to reply and then guess that we'll go
> through about half of them before we reach the end..?  I mean, wouldn't
> exactly be the first time that a percentage progress report wasn't
> completely accurate. :)

I don't think that'd work well, due to WAL segment recycling. We rename
WAL files into place when removing them, and sometimes that can be a
*lot* of files. It's one thing for there to be a ~20% inaccuracy in
estimated amount of work, another to have misestimates on the order of
magnitudes.



> > I wonder if we ought to occasionally update something like
> > ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> > standbys? Then we could actually calculate a percentage, and it'd have
> > the added advantage of allowing to detect more cases where the end of
> > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> > adding a lot of fsyncs, but that seems doable?
> 
> This seems to go against Tom's concerns wrt rewriting pg_control.

I don't think that concern equally applies for what I am proposing
here. For one, we already have minRecoveryPoint in ControlData, and we
already use it for the purpose of determining where we need to recover
to, albeit only during crash recovery. Imo that's substantially
different from adding actual recovery progress status information to the
control file.

I also think that it'd actually be a significant reliability improvement
if we maintained an approximate minRecoveryPoint during normal running:
I've seen way too many cases where WAL files were lost / removed and
crash recovery just started up happily. Only hitting problems months
down the line. Yes, it'd obviously not bullet proof, since we'd not want
to add a significant stream of new fsyncs, but IME such WAL files
lost/removed issues tend not to be about a few hundred bytes of WAL but
many segments missing.


Greetings,

Andres Freund



Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2021-04-21 14:36:24 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > Unfortunately I think something like a percentage is hard to calculate
> > > right now.  Even just looking at crash recovery (vs replication or
> > > PITR), we don't currently know where the WAL ends without reading all
> > > the WAL. The easiest thing to return would be something in LSNs or
> > > bytes and I suspect that we don't want to expose either unauthenticated?
> >
> > While it obviously wouldn't be exactly accurate, I wonder if we couldn't
> > just look at the WAL files we have to reply and then guess that we'll go
> > through about half of them before we reach the end..?  I mean, wouldn't
> > exactly be the first time that a percentage progress report wasn't
> > completely accurate. :)
>
> I don't think that'd work well, due to WAL segment recycling. We rename
> WAL files into place when removing them, and sometimes that can be a
> *lot* of files. It's one thing for there to be a ~20% inaccuracy in
> estimated amount of work, another to have misestimates on the order of
> magnitudes.

I mean- we actively try to guess at how many WAL files we'll need during
each checkpoint and if we're doing that decently then it'd hopefully be
on about the order of half the files, as I suggested, that we'll end up
going through at any point in time.  Naturally, it'll be different if
there's a forced checkpoint or a sudden spike of activity, but I'm not
sure that it's an entirely unreasonable place to start if we're going to
be guessing at it.

> > > I wonder if we ought to occasionally update something like
> > > ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> > > standbys? Then we could actually calculate a percentage, and it'd have
> > > the added advantage of allowing to detect more cases where the end of
> > > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> > > adding a lot of fsyncs, but that seems doable?
> >
> > This seems to go against Tom's concerns wrt rewriting pg_control.
>
> I don't think that concern equally applies for what I am proposing
> here. For one, we already have minRecoveryPoint in ControlData, and we
> already use it for the purpose of determining where we need to recover
> to, albeit only during crash recovery. Imo that's substantially
> different from adding actual recovery progress status information to the
> control file.

I agree that it's not the same as adding actual recovery progress status
information.

> I also think that it'd actually be a significant reliability improvement
> if we maintained an approximate minRecoveryPoint during normal running:
> I've seen way too many cases where WAL files were lost / removed and
> crash recovery just started up happily. Only hitting problems months
> down the line. Yes, it'd obviously not bullet proof, since we'd not want
> to add a significant stream of new fsyncs, but IME such WAL files
> lost/removed issues tend not to be about a few hundred bytes of WAL but
> many segments missing.

I do agree that it's definitely a problem and one that I've seen as well
where we think we reach the end of recovery even though we didn't
actually.  Having a way to avoid that happening would be quite nice.  It
does seem like we have some trade-offs here to weigh, but pg_control is
indeed quite small..

Thanks,

Stephen

Attachment

Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-21 15:51:38 -0400, Stephen Frost wrote:
> It does seem like we have some trade-offs here to weigh, but
> pg_control is indeed quite small..

What do you mean by that? That the overhead of writing it out more
frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add
more fields to it?

Greetings,

Andres Freund



Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2021-04-21 15:51:38 -0400, Stephen Frost wrote:
> > It does seem like we have some trade-offs here to weigh, but
> > pg_control is indeed quite small..
>
> What do you mean by that? That the overhead of writing it out more
> frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add
> more fields to it?

Mostly just that the added overhead in writing it out more frequently
wouldn't be too bad.  Adding fields runs the risk of crossing the
threshold where we feel that we can safely assume all of it will make it
to disk in one shot and therefore there's more reason to not add extra
fields to it, if possible.

Seems the missing bit here is "how often, and how do we make that
happen?" and then we can discuss if there's reason to be concerned that
it would be 'too frequent' or cause too much additional overhead in
terms of IO/fsyncs.

Thanks,

Stephen

Attachment

Re: when the startup process doesn't

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Andres Freund (andres@anarazel.de) wrote:
>> What do you mean by that? That the overhead of writing it out more
>> frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add
>> more fields to it?

> Mostly just that the added overhead in writing it out more frequently
> wouldn't be too bad.

My concern about it was not at all about performance, but that every time
you write it is a new opportunity for the filesystem to lose or corrupt
the data.

            regards, tom lane



Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-21 16:28:26 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > On 2021-04-21 15:51:38 -0400, Stephen Frost wrote:
> > > It does seem like we have some trade-offs here to weigh, but
> > > pg_control is indeed quite small..
> > 
> > What do you mean by that? That the overhead of writing it out more
> > frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add
> > more fields to it?
> 
> Mostly just that the added overhead in writing it out more frequently
> wouldn't be too bad.
>
> Seems the missing bit here is "how often, and how do we make that
> happen?" and then we can discuss if there's reason to be concerned that
> it would be 'too frequent' or cause too much additional overhead in
> terms of IO/fsyncs.

The number of writes and the number of fsyncs of the control file
wouldn't necessarily have to be the same. We could e.g. update the file
once per segment, but only fsync it at a lower cadence. We already rely
on handling writes-without-fsync of the control file (which is trivial
due to the <= 512 byte limit).

Another interesting question is where we'd do the update from. It seems
like it ought to be some background process:

I can see doing it in the checkpointer - but there's a few phases that
can take a while (e.g. sync) where currently don't call something like
CheckpointWriteDelay() on a regular basis.

I also can see doing it in bgwriter - none of the work it does should
take all that long, and minor increases in latency ought not to have
much of an impact.

Wal writer seems less suitable, some workloads are sensitive to it not
getting around doing what it ought to do.


> Adding fields runs the risk of crossing the
> threshold where we feel that we can safely assume all of it will make it
> to disk in one shot and therefore there's more reason to not add extra
> fields to it, if possible.

Yea, we really should stay below 512 bytes (sector size). We're at 296
right now, with 20 bytes lost to padding. If we got close to the limit
we could easily move some of the contents out of pg_control - we
e.g. don't need to write out all the compile time values all the time,
they could live in a file similar to PG_VERSION instead.  So I'm not too
concerned right now.  But we also don't need to add anything, given that
we already have minRecoveryPoint.

Greetings,

Andres Freund



Re: when the startup process doesn't

From
Andres Freund
Date:
Hi,

On 2021-04-21 16:55:28 -0400, Tom Lane wrote:
> My concern about it was not at all about performance, but that every time
> you write it is a new opportunity for the filesystem to lose or corrupt
> the data.

We already do, sometimes very frequent, control file updates on standbys
to update minRecoveryLSN. I don't recall reports of that causing
corruption issues. So I'd not be too concerned about that aspect?

Greetings,

Andres Freund



Re: when the startup process doesn't

From
Stephen Frost
Date:
Greetings,

On Wed, Apr 21, 2021 at 17:01 Andres Freund <andres@anarazel.de> wrote:
On 2021-04-21 16:55:28 -0400, Tom Lane wrote:
> My concern about it was not at all about performance, but that every time
> you write it is a new opportunity for the filesystem to lose or corrupt
> the data.

We already do, sometimes very frequent, control file updates on standbys
to update minRecoveryLSN. I don't recall reports of that causing
corruption issues. So I'd not be too concerned about that aspect?

Or perhaps we should consider having multiple copies..?  Though I definitely have seen missing WAL causing difficult to realize / detect corruption more than corrupt pg_control files...

Thanks,

Stephen

Re: when the startup process doesn't

From
Jehan-Guillaume de Rorthais
Date:
On Wed, 21 Apr 2021 12:36:05 -0700
Andres Freund <andres@anarazel.de> wrote:

>  [...]  
> 
> I don't think that concern equally applies for what I am proposing
> here. For one, we already have minRecoveryPoint in ControlData, and we
> already use it for the purpose of determining where we need to recover
> to, albeit only during crash recovery. Imo that's substantially
> different from adding actual recovery progress status information to the
> control file.

Just for the record, when I was talking about updating status of the startup
in the controldata, I was thinking about setting the last known LSN replayed.
Not some kind of variable string.

> 
> I also think that it'd actually be a significant reliability improvement
> if we maintained an approximate minRecoveryPoint during normal running:
> I've seen way too many cases where WAL files were lost / removed and
> crash recovery just started up happily. Only hitting problems months
> down the line. Yes, it'd obviously not bullet proof, since we'd not want
> to add a significant stream of new fsyncs, but IME such WAL files
> lost/removed issues tend not to be about a few hundred bytes of WAL but
> many segments missing.

Maybe setting this minRecoveryPoint once per segment would be enough, near
from the beginning of the WAL. At least, the recovery process would be
forced to actually replay until the very last known segment.

Regards,



Re: when the startup process doesn't

From
Nitin Jadhav
Date:
Hi,

As nobody opposed the log based approach, I thought of creating a
patch using this approach. Please find the patch attached.

Introduced the new GUC variable 'log_min_duration_startup_process'
which indicates the time period after every time period it logs the
progress of the process startup when it is set to a value (In
millisecond) greater than zero. if it is set to zero, then it logs all
available messages. If it is set to -1, then disables the feature.

> There are probably a number of reasons why this can happen, but the
> ones that seem to come up most often in my experience are
> (1) SyncDataDirectory takes a long time, (b) ResetUnloggedRelations
> takes a long time, and (c) there's a lot of WAL to apply so that takes a
> long time.

I have added the proper logs in all of the above scenarios.

Following is the sample log displayed during server startup when the
time period is set to 10ms.

2021-06-04 19:40:06.390 IST [51116] LOG:  Syncing data directory,
elapsed time: 14.165 ms,  current path: ./base/13892/16384_fsm
2021-06-04 19:40:06.399 IST [51116] LOG:  Syncing data directory
completed after 22.661 ms
2021-06-04 19:40:06.399 IST [51116] LOG:  database system was not
properly shut down; automatic recovery in progress
2021-06-04 19:40:06.401 IST [51116] LOG:  Resetting unlogged relations
completed after 0.219 ms
2021-06-04 19:40:06.401 IST [51116] LOG:  redo starts at 0/4728B88
2021-06-04 19:40:06.411 IST [51116] LOG:  Performing crash recovery,
elapsed time: 10.002 ms,  current LSN: 0/47AA998
2021-06-04 19:40:06.421 IST [51116] LOG:  Performing crash recovery,
elapsed time: 20.002 ms,  current LSN: 0/4838D80
2021-06-04 19:40:06.431 IST [51116] LOG:  Performing crash recovery,
elapsed time: 30.002 ms,  current LSN: 0/48DA718
2021-06-04 19:40:06.441 IST [51116] LOG:  Performing crash recovery,
elapsed time: 40.002 ms,  current LSN: 0/49791C0
.
.
.
2021-06-04 19:40:07.222 IST [51116] LOG:  Performing crash recovery,
elapsed time: 820.805 ms,  current LSN: 0/76F6F10
2021-06-04 19:40:07.227 IST [51116] LOG:  invalid record length at
0/774E758: wanted 24, got 0
2021-06-04 19:40:07.227 IST [51116] LOG:  redo done at 0/774E730
system usage: CPU: user: 0.77 s, system: 0.03 s, elapsed: 0.82 s

Kindly let me know if any changes are required.

Thanks & Regards,
Nitin Jadhav

On Thu, Apr 22, 2021 at 4:39 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
>
> On Wed, 21 Apr 2021 12:36:05 -0700
> Andres Freund <andres@anarazel.de> wrote:
>
> >  [...]
> >
> > I don't think that concern equally applies for what I am proposing
> > here. For one, we already have minRecoveryPoint in ControlData, and we
> > already use it for the purpose of determining where we need to recover
> > to, albeit only during crash recovery. Imo that's substantially
> > different from adding actual recovery progress status information to the
> > control file.
>
> Just for the record, when I was talking about updating status of the startup
> in the controldata, I was thinking about setting the last known LSN replayed.
> Not some kind of variable string.
>
> >
> > I also think that it'd actually be a significant reliability improvement
> > if we maintained an approximate minRecoveryPoint during normal running:
> > I've seen way too many cases where WAL files were lost / removed and
> > crash recovery just started up happily. Only hitting problems months
> > down the line. Yes, it'd obviously not bullet proof, since we'd not want
> > to add a significant stream of new fsyncs, but IME such WAL files
> > lost/removed issues tend not to be about a few hundred bytes of WAL but
> > many segments missing.
>
> Maybe setting this minRecoveryPoint once per segment would be enough, near
> from the beginning of the WAL. At least, the recovery process would be
> forced to actually replay until the very last known segment.
>
> Regards,
>
>

Attachment

Re: when the startup process doesn't

From
Justin Pryzby
Date:
On Fri, Jun 04, 2021 at 07:49:21PM +0530, Nitin Jadhav wrote:
> I have added the proper logs in all of the above scenarios.
> 
> Following is the sample log displayed during server startup when the
> time period is set to 10ms.
> 
> 2021-06-04 19:40:06.390 IST [51116] LOG:  Syncing data directory, elapsed time: 14.165 ms,  current path:
./base/13892/16384_fsm
> 2021-06-04 19:40:06.399 IST [51116] LOG:  Syncing data directory completed after 22.661 ms

|2021-06-04 19:40:07.222 IST [51116] LOG:  Performing crash recovery, elapsed time: 820.805 ms,  current LSN:
0/76F6F10
|2021-06-04 19:40:07.227 IST [51116] LOG:  invalid record length at 0/774E758: wanted 24, got 0
|2021-06-04 19:40:07.227 IST [51116] LOG:  redo done at 0/774E730 system usage: CPU: user: 0.77 s, system: 0.03 s,
elapsed:0.82 s
 

Should it show the rusage ?  It's shown at startup completion since 10a5b35a0,
so it seems strange not to show it here.

+                       log_startup_process_progress("Syncing data directory", path, false);

I think the fsync vs syncfs paths should be distinguished: "Syncing data
directory (fsync)" vs "Syncing data directory (syncfs)".

+               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,

I think it should be PGC_SIGHUP, to allow changing it during runtime.
Obviously it has no effect except during startup, but the change will be
effective if the current process crashes.
See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com

+extern void log_startup_process_progress(char *operation, void *data,
+                                         bool is_complete);

I think this should take an enum operation, rather than using strcmp() on it
later.  The enum values might be RECOVERY_START, RECOVERY_END, rather than
having a bool is_complete.

-- 
Justin



Re: when the startup process doesn't

From
Robert Haas
Date:
On Sun, Jun 6, 2021 at 6:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Should it show the rusage ?  It's shown at startup completion since 10a5b35a0,
> so it seems strange not to show it here.

I don't know, that seems like it's going to make the messages awfully
long, and I'm not sure of what use it is to see that for every report.

I don't like the name very much. log_min_duration_startup_process
seems to have been chosen to correspond to log_min_duration_statement,
but the semantics are different. That one is a threshold, whereas this
one is an interval. Maybe something like
log_startup_progress_interval?

As far as the patch itself goes, I think that the overhead of this
approach is going to be unacceptably high. I was imagining having a
timer running in the background that fires periodically, with the
interval handler just setting a flag. Then in the foreground we just
need to check whether the flag is set. I doubt that we can get away
with a GetCurrentTimestamp() after applying every WAL record ... that
seems like it will be slow.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ... I doubt that we can get away
> with a GetCurrentTimestamp() after applying every WAL record ... that
> seems like it will be slow.

Yeah, that's going to be pretty awful even on machines with fast
gettimeofday, never mind ones where it isn't.

It should be possible to use utils/misc/timeout.c to manage the
interrupt, I'd think.

            regards, tom lane



Re: when the startup process doesn't

From
Nitin Jadhav
Date:
> Should it show the rusage ?  It's shown at startup completion since 10a5b35a0,
> so it seems strange not to show it here.

> I don't know, that seems like it's going to make the messages awfully
> long, and I'm not sure of what use it is to see that for every report.

I have not changed anything wrt this. If it is really required to
change, then I will change.

> +                       log_startup_process_progress("Syncing data directory", path, false);

> I think the fsync vs syncfs paths should be distinguished: "Syncing data
> directory (fsync)" vs "Syncing data directory (syncfs)".

Fixed.

> +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
>
> I think it should be PGC_SIGHUP, to allow changing it during runtime.
> Obviously it has no effect except during startup, but the change will be
> effective if the current process crashes.
> See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com

I did not get exactly how it will change behaviour. In my
understanding, when the server restarts after a crash, it fetches the
value from the config file. So if there is any change that gets
affected. Kindly correct me if I am wrong.

> +extern void log_startup_process_progress(char *operation, void *data,
> +                                         bool is_complete);
>
> I think this should take an enum operation, rather than using strcmp() on it
> later.  The enum values might be RECOVERY_START, RECOVERY_END, rather than
> having a bool is_complete.

Fixed.

> I don't like the name very much. log_min_duration_startup_process
> seems to have been chosen to correspond to log_min_duration_statement,
> but the semantics are different. That one is a threshold, whereas this
> one is an interval. Maybe something like
> log_startup_progress_interval?

Yes. This looks more appropriate. Fixed in the attached patch.

> As far as the patch itself goes, I think that the overhead of this
> approach is going to be unacceptably high. I was imagining having a
> timer running in the background that fires periodically, with the
> interval handler just setting a flag. Then in the foreground we just
> need to check whether the flag is set. I doubt that we can get away
> with a GetCurrentTimestamp() after applying every WAL record ... that
> seems like it will be slow.

Thanks for correcting me. This approach is far better than what I had
used earlier. I have done the code changes as per your approach in the
attached patch.

Kindly let me know if any changes are required.

Thanks & Regards,
Nitin Jadhav

On Mon, Jun 7, 2021 at 7:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... I doubt that we can get away
> > with a GetCurrentTimestamp() after applying every WAL record ... that
> > seems like it will be slow.
>
> Yeah, that's going to be pretty awful even on machines with fast
> gettimeofday, never mind ones where it isn't.
>
> It should be possible to use utils/misc/timeout.c to manage the
> interrupt, I'd think.
>
>                         regards, tom lane

Attachment

Re: when the startup process doesn't

From
Justin Pryzby
Date:
On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> >
> > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > Obviously it has no effect except during startup, but the change will be
> > effective if the current process crashes.
> > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> 
> I did not get exactly how it will change behaviour. In my
> understanding, when the server restarts after a crash, it fetches the
> value from the config file. So if there is any change that gets
> affected. Kindly correct me if I am wrong.

I don't think so.  I checked and SelectConfigFiles is called only once to read
config files and cmdline args.  And not called on restart_after_crash.

The GUC definitely isn't SUSET, since it's not useful to write in a (super)
user session SET log_min_duration_startup_process=123.

I've triple checked the behavior using a patch I submitted for Thomas' syncfs
feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
a future crash uses syncfs.
https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com

-- 
Justin



Re: when the startup process doesn't

From
Nitin Jadhav
Date:
> > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > >
> > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > Obviously it has no effect except during startup, but the change will be
> > > effective if the current process crashes.
> > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> >
> > I did not get exactly how it will change behaviour. In my
> > understanding, when the server restarts after a crash, it fetches the
> > value from the config file. So if there is any change that gets
> > affected. Kindly correct me if I am wrong.

Sorry my understanding was wrong. But I'm not completely convinced
with the above description saying that the change will be effective if
the current process crashes.
AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is
either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config
file will not get affected during restart after crash. If the
GucContext is greater than or equal to PGC_SIGHUP, then any change in
the config file will be changed once it receives the SIGHUP signal. So
it gets affected by a restart after a crash. So since the GucContext
set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no
change in the behaviour wrt this point.

> I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> a future crash uses syncfs.
> https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com

The difference is since the behaviour is compared between
PGC_POSTMASTER and PGC_SIGHUP.

> The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> user session SET log_min_duration_startup_process=123.
I agree with this. I may have to change this value as setting in a
user session is not at all useful. But I am confused between
PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would
like to allow the change during restart after a crash. Otherwise
PGC_POSTMASTER would be sufficient. Kindly share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > >
> > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > Obviously it has no effect except during startup, but the change will be
> > > effective if the current process crashes.
> > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> >
> > I did not get exactly how it will change behaviour. In my
> > understanding, when the server restarts after a crash, it fetches the
> > value from the config file. So if there is any change that gets
> > affected. Kindly correct me if I am wrong.
>
> I don't think so.  I checked and SelectConfigFiles is called only once to read
> config files and cmdline args.  And not called on restart_after_crash.
>
> The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> user session SET log_min_duration_startup_process=123.
>
> I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> a future crash uses syncfs.
> https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
>
> --
> Justin



Re: when the startup process doesn't

From
Justin Pryzby
Date:
On Thu, Jun 10, 2021 at 03:19:20PM +0530, Nitin Jadhav wrote:
> > > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > > >
> > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > Obviously it has no effect except during startup, but the change will be
> > > > effective if the current process crashes.
> > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> > >
> > > I did not get exactly how it will change behaviour. In my
> > > understanding, when the server restarts after a crash, it fetches the
> > > value from the config file. So if there is any change that gets
> > > affected. Kindly correct me if I am wrong.
> 
> Sorry my understanding was wrong. But I'm not completely convinced
> with the above description saying that the change will be effective if
> the current process crashes.
> AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is
> either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config
> file will not get affected during restart after crash. If the
> GucContext is greater than or equal to PGC_SIGHUP, then any change in
> the config file will be changed once it receives the SIGHUP signal. So
> it gets affected by a restart after a crash. So since the GucContext
> set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no
> change in the behaviour wrt this point.

Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow
changing the value without restart, doesn't it follow that SIGHUP is what's
wanted ?

> > I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> > feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> > I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> > a future crash uses syncfs.
> > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> 
> The difference is since the behaviour is compared between
> PGC_POSTMASTER and PGC_SIGHUP.
> 
> > The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> > user session SET log_min_duration_startup_process=123.
> I agree with this. I may have to change this value as setting in a
> user session is not at all useful. But I am confused between
> PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would
> like to allow the change during restart after a crash. Otherwise
> PGC_POSTMASTER would be sufficient. Kindly share your thoughts.
> 
> Thanks & Regards,
> Nitin Jadhav
> 
> On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > > >
> > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > Obviously it has no effect except during startup, but the change will be
> > > > effective if the current process crashes.
> > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> > >
> > > I did not get exactly how it will change behaviour. In my
> > > understanding, when the server restarts after a crash, it fetches the
> > > value from the config file. So if there is any change that gets
> > > affected. Kindly correct me if I am wrong.
> >
> > I don't think so.  I checked and SelectConfigFiles is called only once to read
> > config files and cmdline args.  And not called on restart_after_crash.
> >
> > The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> > user session SET log_min_duration_startup_process=123.
> >
> > I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> > feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> > I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> > a future crash uses syncfs.
> > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com



Re: when the startup process doesn't

From
Nitin Jadhav
Date:
> Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow
> changing the value without restart, doesn't it follow that SIGHUP is what's
> wanted ?

Yes. I have done the changes in the attached patch.

Apart from this, I have done a few other changes to the patch. The
changes include

1. Renamed 'InitCurrentOperation' to 'InitStartupProgress()'.
2. Divided the functionality of 'LogStartupProgress()' into 2 parts.
One for logging the progress and the other to log the completion
information. The first part's function name remains as is and a new
function 'CloseStartupProgress()' added for the second part.
3. In case of any invalid operations found during logging of the
startup progress, throwing an error. This is not a concern unless the
developer makes a mistake.
4. Modified the 'StartupProcessOp' enums like 'FSYNC_START' to
'FSYNC_IN_PROGRESS' for better readability.
5. Updated the comments and some cosmetic changes.

Kindly share your comments.

Thanks & Regards,
Nitin Jadhav

On Thu, Jun 10, 2021 at 3:31 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Jun 10, 2021 at 03:19:20PM +0530, Nitin Jadhav wrote:
> > > > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > > > >
> > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > > Obviously it has no effect except during startup, but the change will be
> > > > > effective if the current process crashes.
> > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> > > >
> > > > I did not get exactly how it will change behaviour. In my
> > > > understanding, when the server restarts after a crash, it fetches the
> > > > value from the config file. So if there is any change that gets
> > > > affected. Kindly correct me if I am wrong.
> >
> > Sorry my understanding was wrong. But I'm not completely convinced
> > with the above description saying that the change will be effective if
> > the current process crashes.
> > AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is
> > either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config
> > file will not get affected during restart after crash. If the
> > GucContext is greater than or equal to PGC_SIGHUP, then any change in
> > the config file will be changed once it receives the SIGHUP signal. So
> > it gets affected by a restart after a crash. So since the GucContext
> > set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no
> > change in the behaviour wrt this point.
>
> Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow
> changing the value without restart, doesn't it follow that SIGHUP is what's
> wanted ?
>
> > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> > > feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> > > I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> > > a future crash uses syncfs.
> > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> >
> > The difference is since the behaviour is compared between
> > PGC_POSTMASTER and PGC_SIGHUP.
> >
> > > The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> > > user session SET log_min_duration_startup_process=123.
> > I agree with this. I may have to change this value as setting in a
> > user session is not at all useful. But I am confused between
> > PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would
> > like to allow the change during restart after a crash. Otherwise
> > PGC_POSTMASTER would be sufficient. Kindly share your thoughts.
> >
> > Thanks & Regards,
> > Nitin Jadhav
> >
> > On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > > > > +               {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
> > > > >
> > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > > Obviously it has no effect except during startup, but the change will be
> > > > > effective if the current process crashes.
> > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> > > >
> > > > I did not get exactly how it will change behaviour. In my
> > > > understanding, when the server restarts after a crash, it fetches the
> > > > value from the config file. So if there is any change that gets
> > > > affected. Kindly correct me if I am wrong.
> > >
> > > I don't think so.  I checked and SelectConfigFiles is called only once to read
> > > config files and cmdline args.  And not called on restart_after_crash.
> > >
> > > The GUC definitely isn't SUSET, since it's not useful to write in a (super)
> > > user session SET log_min_duration_startup_process=123.
> > >
> > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs
> > > feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
> > > I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
> > > a future crash uses syncfs.
> > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
+ * Codes of the operations performed during startup process
+ */
+typedef enum StartupProcessOp
+{
+       SYNCFS_IN_PROGRESS,
+       FSYNC_IN_PROGRESS,
+       RECOVERY_IN_PROGRESS,
+       RESET_UNLOGGED_REL_IN_PROGRESS,
+       DUMMY,
+       SYNCFS_END,
+       FSYNC_END,
+       RECOVERY_END,
+       RESET_UNLOGGED_REL_END
+} StartupProcessOp;

What is DUMMY about ?  If you just want to separate the "start" from "end",
you could write:

/* codes for start of operations */
FSYNC_IN_PROGRESS
SYNCFS_IN_PROGRESS
...
/* codes for end of operations */
FSYNC_END
SYNCFS_END
...

Or group them together like:

FSYNC_IN_PROGRESS,
FSYNC_END,
SYNCFS_IN_PROGRESS, 
SYNCFS_END,
RECOVERY_IN_PROGRESS,
RECOVERY_END,
RESET_UNLOGGED_REL_IN_PROGRESS,
RESET_UNLOGGED_REL_END,

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> What is DUMMY about ?  If you just want to separate the "start" from "end",
> you could write:
>
> /* codes for start of operations */
> FSYNC_IN_PROGRESS
> SYNCFS_IN_PROGRESS
> ...
> /* codes for end of operations */
> FSYNC_END
> SYNCFS_END
> ...

That was by mistake and I have corrected it in the attached patch.

Thanks & Regards,
Nitin Jadhav

On Thu, Jun 17, 2021 at 6:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> + * Codes of the operations performed during startup process
> + */
> +typedef enum StartupProcessOp
> +{
> +       SYNCFS_IN_PROGRESS,
> +       FSYNC_IN_PROGRESS,
> +       RECOVERY_IN_PROGRESS,
> +       RESET_UNLOGGED_REL_IN_PROGRESS,
> +       DUMMY,
> +       SYNCFS_END,
> +       FSYNC_END,
> +       RECOVERY_END,
> +       RESET_UNLOGGED_REL_END
> +} StartupProcessOp;
>
> What is DUMMY about ?  If you just want to separate the "start" from "end",
> you could write:
>
> /* codes for start of operations */
> FSYNC_IN_PROGRESS
> SYNCFS_IN_PROGRESS
> ...
> /* codes for end of operations */
> FSYNC_END
> SYNCFS_END
> ...
>
> Or group them together like:
>
> FSYNC_IN_PROGRESS,
> FSYNC_END,
> SYNCFS_IN_PROGRESS,
> SYNCFS_END,
> RECOVERY_IN_PROGRESS,
> RECOVERY_END,
> RESET_UNLOGGED_REL_IN_PROGRESS,
> RESET_UNLOGGED_REL_END,
>
> --
> Justin

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Amul Sul
Date:
Few comments for v4 patch:

@@ -7351,6 +7363,8 @@ StartupXLOG(void)
                    (errmsg("redo starts at %X/%X",
                            LSN_FORMAT_ARGS(ReadRecPtr))));

+           InitStartupProgress();
+
            /*
             * main redo apply loop
             */
@@ -7358,6 +7372,8 @@ StartupXLOG(void)
            {
                bool        switchedTLI = false;

+               LogStartupProgress(RECOVERY_IN_PROGRESS, NULL);
+
 #ifdef WAL_DEBUG
                if (XLOG_DEBUG ||
                    (rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
@@ -7569,6 +7585,8 @@ StartupXLOG(void)
             * end of main redo apply loop
             */

+           CloseStartupProgress(RECOVERY_END);

I am not sure I am getting the code flow correctly. From CloseStartupProgress()
naming it seems, it corresponds to InitStartupProgress() but what it is doing
is similar to LogStartupProgress(). I think it should be renamed to be inlined
with LogStartupProgress(), IMO.
---

+
+   /* Return if any invalid operation */
+   if (operation >= SYNCFS_END)
+       return;
....
+   /* Return if any invalid operation */
+   if (operation < SYNCFS_END)
+       return;
+

This part should be an assertion, it's the developer's responsibility to call
correctly.
---

+/*
+ * Codes of the operations performed during startup process
+ */
+typedef enum StartupProcessOp
+{
+   /* Codes for in-progress operations */
+   SYNCFS_IN_PROGRESS,
+   FSYNC_IN_PROGRESS,
+   RECOVERY_IN_PROGRESS,
+   RESET_UNLOGGED_REL_IN_PROGRESS,
+   /* Codes for end of operations */
+   SYNCFS_END,
+   FSYNC_END,
+   RECOVERY_END,
+   RESET_UNLOGGED_REL_END
+} StartupProcessOp;
+

Since we do have a separate call for the in-progress operation and the
end-operation, only a single enum would have been enough. If we do this, then I
think we should remove get_startup_process_operation_string() move messages to
the respective function.
---

Also, with your patch "make check-world" has few failures, kindly check that.

Regards,
Amul


On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > What is DUMMY about ?  If you just want to separate the "start" from "end",
> > you could write:
> >
> > /* codes for start of operations */
> > FSYNC_IN_PROGRESS
> > SYNCFS_IN_PROGRESS
> > ...
> > /* codes for end of operations */
> > FSYNC_END
> > SYNCFS_END
> > ...
>
> That was by mistake and I have corrected it in the attached patch.
>
> Thanks & Regards,
> Nitin Jadhav
>
> On Thu, Jun 17, 2021 at 6:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > + * Codes of the operations performed during startup process
> > + */
> > +typedef enum StartupProcessOp
> > +{
> > +       SYNCFS_IN_PROGRESS,
> > +       FSYNC_IN_PROGRESS,
> > +       RECOVERY_IN_PROGRESS,
> > +       RESET_UNLOGGED_REL_IN_PROGRESS,
> > +       DUMMY,
> > +       SYNCFS_END,
> > +       FSYNC_END,
> > +       RECOVERY_END,
> > +       RESET_UNLOGGED_REL_END
> > +} StartupProcessOp;
> >
> > What is DUMMY about ?  If you just want to separate the "start" from "end",
> > you could write:
> >
> > /* codes for start of operations */
> > FSYNC_IN_PROGRESS
> > SYNCFS_IN_PROGRESS
> > ...
> > /* codes for end of operations */
> > FSYNC_END
> > SYNCFS_END
> > ...
> >
> > Or group them together like:
> >
> > FSYNC_IN_PROGRESS,
> > FSYNC_END,
> > SYNCFS_IN_PROGRESS,
> > SYNCFS_END,
> > RECOVERY_IN_PROGRESS,
> > RECOVERY_END,
> > RESET_UNLOGGED_REL_IN_PROGRESS,
> > RESET_UNLOGGED_REL_END,
> >
> > --
> > Justin



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
Hi,

I'd really like to see this enabled by default, say with a default
interval of 10 seconds. If it has to be enabled explicitly, most
people won't, but I think a lot of people would benefit from knowing
why their system is slow to start up when that sort of thing happens.
I don't see much downside to having it on by default either, since it
shouldn't be expensive. I think the GUC's units should be seconds, not
milliseconds, though.

I tried starting the server with log_startup_progress_interval=1000
and then crashing it to see what the output looked like. I got this:

2021-07-09 15:49:55.956 EDT [99033] LOG:  all server processes
terminated; reinitializing
2021-07-09 15:49:55.970 EDT [99106] LOG:  database system was
interrupted; last known up at 2021-07-09 15:48:39 EDT
2021-07-09 15:49:56.499 EDT [99106] LOG:  Data directory sync (fsync)
complete after 529.673 ms
2021-07-09 15:49:56.501 EDT [99106] LOG:  database system was not
properly shut down; automatic recovery in progress
2021-07-09 15:49:56.503 EDT [99106] LOG:  redo starts at 0/223494A8
2021-07-09 15:49:57.504 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 1000.373 ms, current LSN: 0/40A3F888
2021-07-09 15:49:58.505 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 2001.449 ms, current LSN: 0/41F89388
2021-07-09 15:49:59.505 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 3001.602 ms, current LSN: 0/55745760
2021-07-09 15:50:00.506 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 4002.677 ms, current LSN: 0/60CB9FE0
2021-07-09 15:50:01.507 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 5003.808 ms, current LSN: 0/6A2BBE10
2021-07-09 15:50:02.508 EDT [99106] LOG:  Performing crash recovery,
elapsed time: 6004.916 ms, current LSN: 0/71BA3F90
2021-07-09 15:50:03.385 EDT [99106] LOG:  invalid record length at
0/76BD80F0: wanted 24, got 0
2021-07-09 15:50:03.385 EDT [99106] LOG:  Crash recovery complete
after 6881.834 ms
2021-07-09 15:50:03.385 EDT [99106] LOG:  redo done at 0/76BD80C8
system usage: CPU: user: 2.77 s, system: 3.80 s, elapsed: 6.88 s
2021-07-09 15:50:04.778 EDT [99033] LOG:  database system is ready to
accept connections

Few observations on this:

- The messages you've added are capitalized, but the ones PostgreSQL
has currently are not. You should conform to the existing style.

- The "crash recovery complete" message looks redundant with the "redo
done" message. Also, in my mind, "redo" is one particular phase of
crash recovery, so it looks really odd that "crash recovery" finishes
first and then "redo" finishes. I think some thought is needed about
the terminology here.

- I'm not clear why I get a message about the data directory fsync but
not about resetting unlogged relations. I wasn't really expecting to
get a message about things that completed in less than the configured
interval, although I find that I don't mind having it there either.
But then it seems like it should be consistent across the various
things we're timing, and resetting unlogged relations should certainly
be one of those.

- The way you've coded this has some drift. In a perfect world, I'd
get a progress report at 1000.00 ms, 2000.00 ms, 3000.00 ms, etc.
That's never going to be the case, because there's always going to be
a slightly delay in responding to the timer interrupt. However, as
you've coded it, the delay increases over time. The first "Performing
crash recovery" message is only 373 ms late, but the last one is 4916
ms late. To avoid this, you want to reschedule the timer interrupt
based on the time the last one was supposed to fire, not the time it
actually did fire. (Exception: If the time it actually did fire is
before the time it was supposed to fire, then use the time it actually
did fire instead. This protects against the system clock being set
backwards.)

...Robert



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> That was by mistake and I have corrected it in the attached patch.

Thanks for the patch. Here are some comments. Please ignore if I
repeat any of the comments given previously, as I didn't look at the
entire thread.

1) A new line between function return value and the function name:
+void CloseStartupProgress(StartupProcessOp operation)
+{
Like below:
+void
+CloseStartupProgress(StartupProcessOp operation)
+{

2) Add an entry in the commit fest, if it's not done yet. That way,
the patch gets tested on many platforms.

3) Replace "zero" with the number "0".
+ # -1 disables the feature, zero logs all

4) I think we can just rename InitStartupProgress to
EnableStartupProgress or EnableStartupOpProgress to be more in sync
with what it does.
+/*
+ * Sets the start timestamp of the current operation and also enables the
+ * timeout for logging the progress of startup process.
+ */
+void
+InitStartupProgress(void)
+{

5) Do we need the GetCurrentOperationStartTimestamp function at all?
It does nothing great actually, you might have referred to
GetCurrentTimestamp which does a good amount of work that qualifies to
be inside a function. Can't we just use the operationStartTimestamp
variable? Can we rename operationStartTimestamp (I don't think we need
to specify Timestamp in a variable name) to startup_op_start_time or
some other better name?
+/*
+ * Fetches the start timestamp of the current operation.
+ */
+TimestampTz
+GetCurrentOperationStartTimestamp(void)
+{

6) I think you can transform below
+ /* If the feature is disabled, then no need to proceed further. */
+ if (log_startup_progress_interval < 0)
+ return;
to
+ /* If the feature is disabled, then no need to proceed further. */
+ if (log_startup_progress_interval == -1)
+ return;
as -1 means feature disabled and values < -1 are not allowed to be set at all.

7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because,
"recovery in progress" generally applies to the entire startup process
right? Put it another way, the startup process as a whole does the
entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo
phase of the entire startup process.

8) Why do we need to call get_startup_process_operation_string here?
Can't you directly use the error message text?
if (operation == RECOVERY_IN_PROGRESS)
ereport(LOG,
(errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X",
get_startup_process_operation_string(operation),

9) Do you need error handling in the default case of
get_startup_process_operation_string? Instead, how about an assertion,
Assert(operation >= SYNCFS_IN_PROGRESS && operation <=
RESET_UNLOGGED_REL_END);?
        default:
            ereport(ERROR,
                    (errmsg("unrecognized operation (%d) in startup
progress update",
                            operation)));
10) I personally didn't like the name
get_startup_process_operation_string. How about get_startup_op_string?

11) As pointed out by Robert, the error log message should start with
small letters.
"syncing data directory (syncfs)");
"syncing data directory (fsync)");
"performing crash recovery");
"resetting unlogged relations");
 In general, the error log message should start with small letters and
not end with ".". The detail/hit messages should start with capital
letters and end with "."
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("only B-Tree indexes are supported as targets
for verification"),
                 errdetail("Relation \"%s\" is not a B-Tree index.",
                           RelationGetRelationName(rel))));
         ereport(ERROR,
                        (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                         errmsg("sslcert and sslkey are superuser-only"),
                         errhint("User mappings with the sslcert or
sslkey options set may only be created or modified by the
superuser")));

12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some
unknown value?
typedef enum StartupProcessOp
{
    /* Codes for in-progress operations */
    SYNCFS_IN_PROGRESS = 1,

13) Can we combine LogStartupProgress and CloseStartupProgress?  Let's
have a single function LogStartupProgress(StartupProcessOp operation,
const char *path, bool start);, and differentiate the functionality
with the start flag.

14) Can we move log_startup_progress_interval declaration from guc.h
and guc.c to xlog.h and xlog.c? Because it makes more sense to be
there, similar to the other GUCs under /* these variables are GUC
parameters related to XLOG */ in xlog.h.

Regards,
Bharath Rupireddy.



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I am not sure I am getting the code flow correctly. From CloseStartupProgress()
> naming it seems, it corresponds to InitStartupProgress() but what it is doing
> is similar to LogStartupProgress(). I think it should be renamed to be inlined
> with LogStartupProgress(), IMO.

Renamed CloseStartupProgress() to LogStartupProgressComplete().

> This part should be an assertion, it's the developer's responsibility to call
> correctly.

This code is not required at all due to the fix of the next comment.

> Since we do have a separate call for the in-progress operation and the
> end-operation, only a single enum would have been enough. If we do this, then I
> think we should remove get_startup_process_operation_string() move messages to
> the respective function.

Fixed.

> I'd really like to see this enabled by default, say with a default
> interval of 10 seconds. If it has to be enabled explicitly, most
> people won't, but I think a lot of people would benefit from knowing
> why their system is slow to start up when that sort of thing happens.
> I don't see much downside to having it on by default either, since it
> shouldn't be expensive.  I think the GUC's units should be seconds, not
> milliseconds, though.

I agree that it is better to enable it by default. Changed the code
accordingly and changed the GUC's units to seconds.

> The messages you've added are capitalized, but the ones PostgreSQL
> has currently are not. You should conform to the existing style.

Fixed.

> The "crash recovery complete" message looks redundant with the "redo
> done" message. Also, in my mind, "redo" is one particular phase of
> crash recovery, so it looks really odd that "crash recovery" finishes
> first and then "redo" finishes. I think some thought is needed about
> the terminology here.

Yes. "redo" is one phase of the crash recovery. Even "resetting
unlogged relations" is also a part of the crash recovery. These 2 are
the major time consuming operations of the crash recovery task. There
is a separate log message to indicate the progress of "resetting the
unlogged relations". So instead of saying 'performing crash recovery",
it is better to say "redo in progress" and not add any additional
message at the end of redo, instead retain the existing message to
avoid redundancy.

> I'm not clear why I get a message about the data directory fsync but
> not about resetting unlogged relations. I wasn't really expecting to
> get a message about things that completed in less than the configured
> interval, although I find that I don't mind having it there either.
> But then it seems like it should be consistent across the various
> things we're timing, and resetting unlogged relations should certainly
> be one of those.

It is the same across all the things we'are timing. I was able to see
those messages on my machine. I feel there is not much overhead of
logging one message at the end of the operation even though it
completes within the configured interval. Following are the log
messages shown on my machine.

2021-07-20 18:47:32.046 IST [102230] LOG:  listening on IPv4 address
"127.0.0.1", port 5445
2021-07-20 18:47:32.048 IST [102230] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5445"
2021-07-20 18:47:32.051 IST [102234] LOG:  database system was
interrupted; last known up at 2021-07-20 18:46:27 IST
2021-07-20 18:47:32.060 IST [102234] LOG:  data directory sync (fsync)
complete after 0.00 s
2021-07-20 18:47:32.060 IST [102234] LOG:  database system was not
properly shut down; automatic recovery in progress
2021-07-20 18:47:32.063 IST [102234] LOG:  unlogged relations reset after 0.00 s
2021-07-20 18:47:32.063 IST [102234] LOG:  redo starts at 0/14EF418
.2021-07-20 18:47:33.063 IST [102234] LOG:  redo in progress, elapsed
time: 1.00 s, current LSN: 0/5C13D28
.2021-07-20 18:47:34.063 IST [102234] LOG:  redo in progress, elapsed
time: 2.00 s, current LSN: 0/A289160
.2021-07-20 18:47:35.063 IST [102234] LOG:  redo in progress, elapsed
time: 3.00 s, current LSN: 0/EE2DE10
2021-07-20 18:47:35.563 IST [102234] LOG:  invalid record length at
0/115C63E0: wanted 24, got 0
2021-07-20 18:47:35.563 IST [102234] LOG:  redo done at 0/115C63B8
system usage: CPU: user: 3.58 s, system: 0.14 s, elapsed: 3.50 s
2021-07-20 18:47:35.564 IST [102234] LOG:  unlogged relations reset after 0.00 s
2021-07-20 18:47:35.706 IST [102230] LOG:  database system is ready to
accept connections


> The way you've coded this has some drift. In a perfect world, I'd
> get a progress report at 1000.00 ms, 2000.00 ms, 3000.00 ms, etc.
> That's never going to be the case, because there's always going to be
> a slightly delay in responding to the timer interrupt. However, as
> you've coded it, the delay increases over time. The first "Performing
> crash recovery" message is only 373 ms late, but the last one is 4916
> ms late. To avoid this, you want to reschedule the timer interrupt
> based on the time the last one was supposed to fire, not the time it
> actually did fire. (Exception: If the time it actually did fire is
> before the time it was supposed to fire, then use the time it actually
> did fire instead. This protects against the system clock being set
> backwards.)

I have rescheduled the timer interrupt based on the time the last one
was supposed to fire, not the time it actually did fire. Now I am able
to see the messages when the timer is timed out and it is very close
to the configured interval. But I did not find a scenario when the
above mentioned exception can occur. Kindly let me know if I am wrong
in the approach.

> 1) A new line between function return value and the function name:
> +void CloseStartupProgress(StartupProcessOp operation)
> +{
> Like below:
> +void
> +CloseStartupProgress(StartupProcessOp operation)
> +{

Fixed.

> 2) Add an entry in the commit fest, if it's not done yet. That way,
> the patch gets tested on many platforms.

I have an entry in the sept commitfest
https://commitfest.postgresql.org/34/3261/

> 3) Replace "zero" with the number "0".
> + # -1 disables the feature, zero logs all

Fixed.

> 4) I think we can just rename InitStartupProgress to
> EnableStartupProgress or EnableStartupOpProgress to be more in sync
> with what it does.

I feel 'Init' is more appropriate than 'enable' here. As it not only
enables the timer but also initializes some variables. Timer enabling
can also be interpreted as initialization. So in common 'init' is
better than 'enable'.

> 5) Do we need the GetCurrentOperationStartTimestamp function at all?
> It does nothing great actually, you might have referred to
> GetCurrentTimestamp which does a good amount of work that qualifies to
> be inside a function. Can't we just use the operationStartTimestamp
> variable? Can we rename operationStartTimestamp (I don't think we need
> to specify Timestamp in a variable name) to startup_op_start_time or
> some other better name?

Changed it to 'startupProcessOpStartTime'.

> 6) I think you can transform below
> + /* If the feature is disabled, then no need to proceed further. */
> + if (log_startup_progress_interval < 0)
> + return;
> to
> + /* If the feature is disabled, then no need to proceed further. */
> + if (log_startup_progress_interval == -1)
> + return;
> as -1 means feature disabled and values < -1 are not allowed to be set at all.

I feel that should be ok. As '<0' includes '-1'. So it does our job. I
can change it if it is really required to do so.

> 7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because,
> "recovery in progress" generally applies to the entire startup process
> right? Put it another way, the startup process as a whole does the
> entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo
> phase of the entire startup process.

Changed it as part of the earlier comment's fix. Modified the message
also to 'redo in progress' rather than 'recovery in progress'.

> 8) Why do we need to call get_startup_process_operation_string here?
> Can't you directly use the error message text?
> if (operation == RECOVERY_IN_PROGRESS)
> ereport(LOG,
> (errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X",
> get_startup_process_operation_string(operation),

Fixed it as part of an earlier comment.

> 9) Do you need error handling in the default case of
> get_startup_process_operation_string? Instead, how about an assertion,
> Assert(operation >= SYNCFS_IN_PROGRESS && operation <=
> RESET_UNLOGGED_REL_END);?
>        default:
>            ereport(ERROR,
>                    (errmsg("unrecognized operation (%d) in startup
>progress update",
>                            operation)));

It is better to have a default case. Assert is difficult to maintain
if there are any modifications to the operations.

> 10) I personally didn't like the name
> get_startup_process_operation_string. How about get_startup_op_string?

I have removed it as part of fixing the earlier comment.

> 11) As pointed out by Robert, the error log message should start with
> small letters.
> "syncing data directory (syncfs)");
> "syncing data directory (fsync)");
> "performing crash recovery");
> "resetting unlogged relations");
>  In general, the error log message should start with small letters and
> not end with ".". The detail/hit messages should start with capital
> letters and end with "."

Thanks for the information.

> 12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some
> unknown value?
> typedef enum StartupProcessOp
> {
>   /* Codes for in-progress operations */
>    SYNCFS_IN_PROGRESS = 1,

I don't find any reason to do so. So not changed. Kindly let me know
if there is any specific reason which would help changing it.

> 13) Can we combine LogStartupProgress and CloseStartupProgress?  Let's
> have a single function LogStartupProgress(StartupProcessOp operation,
> const char *path, bool start);, and differentiate the functionality
> with the start flag.

The function becomes complex and it will affect the readability.

> 14) Can we move log_startup_progress_interval declaration from guc.h
> and guc.c to xlog.h and xlog.c? Because it makes more sense to be
> there, similar to the other GUCs under /* these variables are GUC
> parameters related to XLOG */ in xlog.h.

Fixed.

Please find the v5 patch attached. Kindly let me know your comments.






On Sat, Jul 10, 2021 at 1:51 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > That was by mistake and I have corrected it in the attached patch.
>
> Thanks for the patch. Here are some comments. Please ignore if I
> repeat any of the comments given previously, as I didn't look at the
> entire thread.
>
> 1) A new line between function return value and the function name:
> +void CloseStartupProgress(StartupProcessOp operation)
> +{
> Like below:
> +void
> +CloseStartupProgress(StartupProcessOp operation)
> +{
>
> 2) Add an entry in the commit fest, if it's not done yet. That way,
> the patch gets tested on many platforms.
>
> 3) Replace "zero" with the number "0".
> + # -1 disables the feature, zero logs all
>
> 4) I think we can just rename InitStartupProgress to
> EnableStartupProgress or EnableStartupOpProgress to be more in sync
> with what it does.
> +/*
> + * Sets the start timestamp of the current operation and also enables the
> + * timeout for logging the progress of startup process.
> + */
> +void
> +InitStartupProgress(void)
> +{
>
> 5) Do we need the GetCurrentOperationStartTimestamp function at all?
> It does nothing great actually, you might have referred to
> GetCurrentTimestamp which does a good amount of work that qualifies to
> be inside a function. Can't we just use the operationStartTimestamp
> variable? Can we rename operationStartTimestamp (I don't think we need
> to specify Timestamp in a variable name) to startup_op_start_time or
> some other better name?
> +/*
> + * Fetches the start timestamp of the current operation.
> + */
> +TimestampTz
> +GetCurrentOperationStartTimestamp(void)
> +{
>
> 6) I think you can transform below
> + /* If the feature is disabled, then no need to proceed further. */
> + if (log_startup_progress_interval < 0)
> + return;
> to
> + /* If the feature is disabled, then no need to proceed further. */
> + if (log_startup_progress_interval == -1)
> + return;
> as -1 means feature disabled and values < -1 are not allowed to be set at all.
>
> 7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because,
> "recovery in progress" generally applies to the entire startup process
> right? Put it another way, the startup process as a whole does the
> entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo
> phase of the entire startup process.
>
> 8) Why do we need to call get_startup_process_operation_string here?
> Can't you directly use the error message text?
> if (operation == RECOVERY_IN_PROGRESS)
> ereport(LOG,
> (errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X",
> get_startup_process_operation_string(operation),
>
> 9) Do you need error handling in the default case of
> get_startup_process_operation_string? Instead, how about an assertion,
> Assert(operation >= SYNCFS_IN_PROGRESS && operation <=
> RESET_UNLOGGED_REL_END);?
>         default:
>             ereport(ERROR,
>                     (errmsg("unrecognized operation (%d) in startup
> progress update",
>                             operation)));
> 10) I personally didn't like the name
> get_startup_process_operation_string. How about get_startup_op_string?
>
> 11) As pointed out by Robert, the error log message should start with
> small letters.
> "syncing data directory (syncfs)");
> "syncing data directory (fsync)");
> "performing crash recovery");
> "resetting unlogged relations");
>  In general, the error log message should start with small letters and
> not end with ".". The detail/hit messages should start with capital
> letters and end with "."
>         ereport(ERROR,
>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                  errmsg("only B-Tree indexes are supported as targets
> for verification"),
>                  errdetail("Relation \"%s\" is not a B-Tree index.",
>                            RelationGetRelationName(rel))));
>          ereport(ERROR,
>                         (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
>                          errmsg("sslcert and sslkey are superuser-only"),
>                          errhint("User mappings with the sslcert or
> sslkey options set may only be created or modified by the
> superuser")));
>
> 12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some
> unknown value?
> typedef enum StartupProcessOp
> {
>     /* Codes for in-progress operations */
>     SYNCFS_IN_PROGRESS = 1,
>
> 13) Can we combine LogStartupProgress and CloseStartupProgress?  Let's
> have a single function LogStartupProgress(StartupProcessOp operation,
> const char *path, bool start);, and differentiate the functionality
> with the start flag.
>
> 14) Can we move log_startup_progress_interval declaration from guc.h
> and guc.c to xlog.h and xlog.c? Because it makes more sense to be
> there, similar to the other GUCs under /* these variables are GUC
> parameters related to XLOG */ in xlog.h.
>
> Regards,
> Bharath Rupireddy.

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Wed, Jul 21, 2021 at 12:52 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Please find the v5 patch attached. Kindly let me know your comments.

Thanks for the patch. Here are some comments on v5:
1) I still don't see the need for two functions LogStartupProgress and
LogStartupProgressComplete. Most of the code is duplicate. I think we
can just do it with a single function something like [1]:

2) Why isn't there a
LogStartupProgressComplete(STARTUP_PROCESS_OP_REDO)? Is it because of
the below existing log message?
ereport(LOG,
(errmsg("redo done at %X/%X system usage: %s",
LSN_FORMAT_ARGS(ReadRecPtr),
pg_rusage_show(&ru0))));

3) I think it should be, "," after occurred instead of "."
+ * elapsed or not. TRUE if timeout occurred, FALSE otherwise.
instead of
+ * elapsed or not. TRUE if timeout occurred. FALSE otherwise.

[1]
+/*
+ * Logs the progress of the operations performed during the startup process.
+ * is_complete true/false indicates that the operation is finished/
+ * in-progress respectively.
+ */
+void
+LogStartupProgress(StartupProcessOp op, const char *path,
+                                  bool is_complete)
+{
+       long                  secs;
+       int                     usecs;
+       int                     elapsed_ms;
+       int                     interval_in_ms;
+
+       /* If not called from the startup process then this feature is
of no use. */
+       if (!AmStartupProcess())
+               return;
+
+       /* If the feature is disabled, then no need to proceed further. */
+       if (log_startup_progress_interval < 0)
+               return;
+
+       /*
+        * If the operation is in-progress and the timeout hasn't occurred, then
+        * no need to log the details.
+        */
+       if (!is_complete && !logStartupProgressTimeout)
+               return;
+
+       /* Timeout has occurred. */
+       TimestampDifference(startupProcessOpStartTime,
+                                               GetCurrentTimestamp(),
+                                               &secs, &usecs);
+
+       /*
+        * If the operation is in-progress, enable the timer for the next log
+        * message based on the time that current log message timer
was supposed to
+        * fire.
+        */
+       if (!is_complete)
+       {
+               elapsed_ms = (secs * 1000) + (usecs / 1000);
+               interval_in_ms = log_startup_progress_interval * 1000;
+               enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
+
(interval_in_ms - (elapsed_ms % interval_in_ms)));
+       }
+
+       switch(op)
+       {
+               case STARTUP_PROCESS_OP_SYNCFS:
+                       {
+                               if (is_complete)
+                                       ereport(LOG,
+                                                       (errmsg("data
directory sync (syncfs) complete after %ld.%02d s",
+
 secs, (usecs / 10000))));
+                               else
+                                       ereport(LOG,
+
(errmsg("syncing data directory (syncfs), elapsed time: %ld.%02d s,
current path: %s",
+
 secs, (usecs / 10000), path)));
+                       }
+                       break;
+               case STARTUP_PROCESS_OP_FSYNC:
+                       {
+                               if (is_complete)
+                                       ereport(LOG,
+                                                       (errmsg("data
directory sync (fsync) complete after %ld.%02d s",
+
 secs, (usecs / 10000))));
+                               else
+                                       ereport(LOG,
+
(errmsg("syncing data directory (fsync), elapsed time: %ld.%02d s,
current path: %s",
+
 secs, (usecs / 10000), path)));
+                       }
+                       break;
+               case STARTUP_PROCESS_OP_REDO:
+                       {
+                               /*
+                                * No need to log redo completion
status here, as it will be
+                                * done in the caller.
+                                */
+                               if (!is_complete)
+                                       ereport(LOG,
+                                                       (errmsg("redo
in progress, elapsed time: %ld.%02d s, current LSN: %X/%X",
+
 secs, (usecs / 10000), LSN_FORMAT_ARGS(ReadRecPtr))));
+                       }
+                       break;
+               case STARTUP_PROCESS_OP_RESET_UNLOGGED_REL:
+                       {
+                               if (is_complete)
+                                       ereport(LOG,
+
(errmsg("unlogged relations reset after %ld.%02d s",
+
 secs, (usecs / 10000))));
+                               else
+                                       ereport(LOG,
+
(errmsg("resetting unlogged relations, elapsed time: %ld.%02d s,
current path: %s",
+
 secs, (usecs / 10000), path)));
+                       }
+                       break;
+               default:
+                       ereport(ERROR,
+                                       (errmsg("unrecognized
operation (%d) in startup progress update",
+                                                       op)));
+                       break;
+       }
+
+       if (is_complete)
+               disable_timeout(LOG_STARTUP_PROGRESS_TIMEOUT, false);
+       else
+               logStartupProgressTimeout = false;
+}

Regards,
Bharath Rupireddy.



Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
when action == datadir_fsync_fname.

ResetUnloggedRelations() is calling
ResetUnloggedRelationsInTablespaceDir("base", op);
before calling InitStartupProgress().

This shows StartupXLOG() calling ResetUnloggedRelations() twice.
Should they both be shown ?  If so, maybe they should be distinguished as here:

        elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
                 (op & UNLOGGED_RELATION_CLEANUP) != 0,
                 (op & UNLOGGED_RELATION_INIT) != 0);

On Wed, Jul 21, 2021 at 12:52:24PM +0530, Nitin Jadhav wrote:
> 2021-07-20 18:47:32.046 IST [102230] LOG:  listening on IPv4 address "127.0.0.1", port 5445
> 2021-07-20 18:47:32.048 IST [102230] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5445"
> 2021-07-20 18:47:32.051 IST [102234] LOG:  database system was interrupted; last known up at 2021-07-20 18:46:27 IST
> 2021-07-20 18:47:32.060 IST [102234] LOG:  data directory sync (fsync) complete after 0.00 s
> 2021-07-20 18:47:32.060 IST [102234] LOG:  database system was not properly shut down; automatic recovery in
progress
> 2021-07-20 18:47:32.063 IST [102234] LOG:  unlogged relations reset after 0.00 s
> 2021-07-20 18:47:32.063 IST [102234] LOG:  redo starts at 0/14EF418
> 2021-07-20 18:47:33.063 IST [102234] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28
> 2021-07-20 18:47:34.063 IST [102234] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160
> 2021-07-20 18:47:35.063 IST [102234] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10
> 2021-07-20 18:47:35.563 IST [102234] LOG:  invalid record length at 0/115C63E0: wanted 24, got 0
> 2021-07-20 18:47:35.563 IST [102234] LOG:  redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s,
elapsed:3.50 s
 
> 2021-07-20 18:47:35.564 IST [102234] LOG:  unlogged relations reset after 0.00 s
> 2021-07-20 18:47:35.706 IST [102230] LOG:  database system is ready to accept connections

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I still don't see the need for two functions LogStartupProgress and
> LogStartupProgressComplete. Most of the code is duplicate. I think we
> can just do it with a single function something like [1]:

Initially I had written a common function for these 2. You can see
that in the earlier version of the patch. Later separated it since it
was too much for one function. If others also agree to make it common,
I can make that change.

> Why isn't there a
> LogStartupProgressComplete(STARTUP_PROCESS_OP_REDO)? Is it because of
> the below existing log message?
> ereport(LOG,
> (errmsg("redo done at %X/%X system usage: %s",
> LSN_FORMAT_ARGS(ReadRecPtr),
> pg_rusage_show(&ru0))));

Yes. Adding another log message makes it redundant.

>  I think it should be, "," after occurred instead of "."
> + * elapsed or not. TRUE if timeout occurred, FALSE otherwise.
> instead of
> + * elapsed or not. TRUE if timeout occurred. FALSE otherwise.

Fixed.

> I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> when action == datadir_fsync_fname.

I agree and fixed it.

> ResetUnloggedRelations() is calling
> ResetUnloggedRelationsInTablespaceDir("base", op);
> before calling InitStartupProgress().

Fixed.

> This shows StartupXLOG() calling ResetUnloggedRelations() twice.
> Should they both be shown ?  If so, maybe they should be distinguished as here:
>
>        elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
>                 (op & UNLOGGED_RELATION_CLEANUP) != 0,
>                 (op & UNLOGGED_RELATION_INIT) != 0);

Fixed. Added separate codes to distinguish.

Please find the patch attached.




On Wed, Jul 21, 2021 at 6:43 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> when action == datadir_fsync_fname.
>
> ResetUnloggedRelations() is calling
> ResetUnloggedRelationsInTablespaceDir("base", op);
> before calling InitStartupProgress().
>
> This shows StartupXLOG() calling ResetUnloggedRelations() twice.
> Should they both be shown ?  If so, maybe they should be distinguished as here:
>
>         elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
>                  (op & UNLOGGED_RELATION_CLEANUP) != 0,
>                  (op & UNLOGGED_RELATION_INIT) != 0);
>
> On Wed, Jul 21, 2021 at 12:52:24PM +0530, Nitin Jadhav wrote:
> > 2021-07-20 18:47:32.046 IST [102230] LOG:  listening on IPv4 address "127.0.0.1", port 5445
> > 2021-07-20 18:47:32.048 IST [102230] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5445"
> > 2021-07-20 18:47:32.051 IST [102234] LOG:  database system was interrupted; last known up at 2021-07-20 18:46:27
IST
> > 2021-07-20 18:47:32.060 IST [102234] LOG:  data directory sync (fsync) complete after 0.00 s
> > 2021-07-20 18:47:32.060 IST [102234] LOG:  database system was not properly shut down; automatic recovery in
progress
> > 2021-07-20 18:47:32.063 IST [102234] LOG:  unlogged relations reset after 0.00 s
> > 2021-07-20 18:47:32.063 IST [102234] LOG:  redo starts at 0/14EF418
> > 2021-07-20 18:47:33.063 IST [102234] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28
> > 2021-07-20 18:47:34.063 IST [102234] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160
> > 2021-07-20 18:47:35.063 IST [102234] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10
> > 2021-07-20 18:47:35.563 IST [102234] LOG:  invalid record length at 0/115C63E0: wanted 24, got 0
> > 2021-07-20 18:47:35.563 IST [102234] LOG:  redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s,
elapsed:3.50 s
 
> > 2021-07-20 18:47:35.564 IST [102234] LOG:  unlogged relations reset after 0.00 s
> > 2021-07-20 18:47:35.706 IST [102230] LOG:  database system is ready to accept connections
>
> --
> Justin

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Fri, Jul 23, 2021 at 04:09:47PM +0530, Nitin Jadhav wrote:
> > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> > when action == datadir_fsync_fname.
> 
> I agree and fixed it.

I saw that you fixed it by calling InitStartupProgress() after the walkdir()
calls which do pre_sync_fname.  So then walkdir is calling
LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync,
and then LogStartupProgress() is returning because !AmStartupProcess().

That seems indirect, fragile, and confusing.  I suggest that walkdir() should
take an argument for which operation to pass to LogStartupProgress().  You can
pass a special enum for cases where nothing should be logged, like
STARTUP_PROCESS_OP_NONE.

On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote:
> 1) I still don't see the need for two functions LogStartupProgress and
> LogStartupProgressComplete. Most of the code is duplicate. I think we
> can just do it with a single function something like [1]:

I agree that one function can do this more succinctly.  I think it's best to
use a separate enum value for START operations and END operations.

   switch(operation)
   {
       case STARTUP_PROCESS_OP_SYNCFS_START:
           ereport(...);
           break;

       case STARTUP_PROCESS_OP_SYNCFS_END:
           ereport(...);
           break;

       case STARTUP_PROCESS_OP_FSYNC_START:
           ereport(...);
           break;

       case STARTUP_PROCESS_OP_FSYNC_END:
           ereport(...);
           break;

       ...

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Sun, Jul 25, 2021 at 1:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Jul 23, 2021 at 04:09:47PM +0530, Nitin Jadhav wrote:
> > > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> > > when action == datadir_fsync_fname.
> >
> > I agree and fixed it.
>
> I saw that you fixed it by calling InitStartupProgress() after the walkdir()
> calls which do pre_sync_fname.  So then walkdir is calling
> LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync,
> and then LogStartupProgress() is returning because !AmStartupProcess().
>
> That seems indirect, fragile, and confusing.  I suggest that walkdir() should
> take an argument for which operation to pass to LogStartupProgress().  You can
> pass a special enum for cases where nothing should be logged, like
> STARTUP_PROCESS_OP_NONE.

I don't think walkdir() has any business calling LogStartupProgress()
at all. It's supposed to be a generic function, not one that is only
available to be called from the startup process, or has different
behavior there. From my point of view, the right thing is to put the
logging calls into the particular callbacks that SyncDataDirectory()
uses.

> On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote:
> > 1) I still don't see the need for two functions LogStartupProgress and
> > LogStartupProgressComplete. Most of the code is duplicate. I think we
> > can just do it with a single function something like [1]:
>
> I agree that one function can do this more succinctly.  I think it's best to
> use a separate enum value for START operations and END operations.

Maybe I'm missing something here, but I don't understand the purpose
of this. You can always combine two functions into one, but it's only
worth doing if you end up with less code, which doesn't seem to be the
case here. The strings are all different and that's most of the
function, and the other stuff that gets done isn't the same either, so
you'd just end up with a bunch of if-statements. That doesn't seem
like an improvement.

Thinking further, I guess I understand it from the caller's
perspective. It's not necessarily clear why in some places we call
LogStartupProgress() and others LogStartupProgressComplete(). Someone
might expect a function with "complete" in the name like that to only
be called once at the very end, rather than once at the end of a
phase, and it does sort of make sense that you'd want to call one
function everywhere rather than sometimes one and sometimes the other
... but I'm not exactly sure how to get there from here. Having only
LogStartupProgress() but having it do a giant if-statement to figure
out whether we're mid-phase or end-of-phase does not seem like the
right approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Mon, Jul 26, 2021 at 10:13:09AM -0400, Robert Haas wrote:
> I don't think walkdir() has any business calling LogStartupProgress()
> at all. It's supposed to be a generic function, not one that is only
> available to be called from the startup process, or has different
> behavior there. From my point of view, the right thing is to put the
> logging calls into the particular callbacks that SyncDataDirectory()
> uses.

You're right - this is better.

On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote:
> > > 1) I still don't see the need for two functions LogStartupProgress and
> > > LogStartupProgressComplete. Most of the code is duplicate. I think we
> > > can just do it with a single function something like [1]:
> >
> > I agree that one function can do this more succinctly.  I think it's best to
> > use a separate enum value for START operations and END operations.
> 
> Maybe I'm missing something here, but I don't understand the purpose
> of this. You can always combine two functions into one, but it's only
> worth doing if you end up with less code, which doesn't seem to be the
> case here.

 4 files changed, 39 insertions(+), 71 deletions(-)

> ... but I'm not exactly sure how to get there from here. Having only
> LogStartupProgress() but having it do a giant if-statement to figure
> out whether we're mid-phase or end-of-phase does not seem like the
> right approach.

I used a bool arg and negation to handle within a single switch.  Maybe it's
cleaner to use a separate enum value for each DONE, and set a local done flag.

 startup[29675] LOG:  database system was interrupted; last known up at 2021-07-26 10:23:04 CDT
 startup[29675] LOG:  syncing data directory (fsync), elapsed time: 1.38 s, current path: ./pg_ident.conf
 startup[29675] LOG:  data directory sync (fsync) complete after 1.72 s
 startup[29675] LOG:  database system was not properly shut down; automatic recovery in progress
 startup[29675] LOG:  resetting unlogged relations (cleanup) complete after 0.00 s
 startup[29675] LOG:  redo starts at 0/17BE500
 startup[29675] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/35D7CB8
 startup[29675] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/54A6918
 startup[29675] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/7370570
 startup[29675] LOG:  redo in progress, elapsed time: 4.00 s, current LSN: 0/924D8A0
 startup[29675] LOG:  redo done at 0/9FFFFB8 system usage: CPU: user: 4.28 s, system: 0.15 s, elapsed: 4.44 s
 startup[29675] LOG:  resetting unlogged relations (init) complete after 0.03 s
 startup[29675] LOG:  checkpoint starting: end-of-recovery immediate
 startup[29675] LOG:  checkpoint complete: wrote 9872 buffers (60.3%); 0 WAL file(s) added, 0 removed, 8 recycled;
write=0.136s, sync=0.801 s, total=1.260 s; sync files=21, longest=0.774 s, average=B
 


Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Jul 26, 2021 at 11:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Maybe I'm missing something here, but I don't understand the purpose
> > of this. You can always combine two functions into one, but it's only
> > worth doing if you end up with less code, which doesn't seem to be the
> > case here.
>
>  4 files changed, 39 insertions(+), 71 deletions(-)

Hmm. I don't completely hate that, but I don't love it either. I don't
think the result is any easier to understand than the original, and in
some ways it's worse. In particular, you've flattened the separate
comments for each of the individual functions into a single-line
comment that is more generic than the comments it replaced. You could
fix that and you'd still be slightly ahead on LOC, but I'm not
convinced that it's actually a real win.

> > ... but I'm not exactly sure how to get there from here. Having only
> > LogStartupProgress() but having it do a giant if-statement to figure
> > out whether we're mid-phase or end-of-phase does not seem like the
> > right approach.
>
> I used a bool arg and negation to handle within a single switch.  Maybe it's
> cleaner to use a separate enum value for each DONE, and set a local done flag.

If we're going to go the route of combining the functions, I agree
that a Boolean is the way to go; I think we have some existing
precedent for 'bool finished' rather than 'bool done'.

But I kind of wonder if we should have an enum in the first place. It
feels like we've got code in a bunch of places that just exists to
decide which enum value to use, and then code someplace else that
turns around and decides which message to produce. That would be
sensible if we were using the same enum values in lots of places, but
that's not the case here. So suppose we just moved the messages to the
places where we're now calling LogStartupProgress() or
LogStartupProgressComplete()? So something like this:

if (should_report_startup_progress())
    ereport(LOG,
                 (errmsg("syncing data directory (syncfs), elapsed
time: %ld.%02d s, current path: %s",
                               secs, (usecs / 10000), path)));

Well, that doesn't quite work, because "secs" and "usecs" aren't going
to exist in the caller. We can fix that easily enough: let's just make
should_report_startup_progress take arguments long *secs, int *usecs,
and bool done. Then the above becomes:

if (should_report_startup_progress(&secs, &usecs, false))
    ereport(LOG,
                 (errmsg("syncing data directory (syncfs), elapsed
time: %ld.%02d s, current path: %s",
                               secs, (usecs / 10000), path)));

And if this were the call site that corresponds to
LogStartupProgressComplete(), we'd replace false with true. Now, the
only real disadvantage of this that I can see is that it requires the
caller to declare 'secs' and 'usecs', which is not a big deal, but
mildly annoying perhaps. I think we can do better still with a little
macro magic. Suppose we define a macro report_startup_progress(force,
msg, ...) that expands to:

{
long secs;
int usecs;
if (startup_progress_timer_expired(&secs, &usecs, force))
ereport(LOG, errmsg(msg, secs, usecs, ...));
}

Then the above just becomes this:

    report_startup_progress(false, "syncing data directory (syncfs),
elapsed time: %ld.%02d s, current path: %s", path);

This would have the advantage that the strings we're using would be
present in the code that arranges to emit them, instead of being
removed to some other module, so I think it would be clearer. It would
also have the advantage of making it much easier to add further calls,
if someone feels they want to do that. You don't have to run around
and update enums and all the various things that use them, just copy
and paste the line above and adjust as required.

With this design, we avoid a lot of "action at a distance". We don't
define the message strings in a place far-removed from the code that
wants to emit them any more. When someone wants a new progress
message, they can just add another call to report_statup_progress()
wherever it needs to go and they're done. They don't have to go run
and update the enum and various switch statements. They're just done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> > > I saw that you fixed it by calling InitStartupProgress() after the walkdir()
> > > calls which do pre_sync_fname.  So then walkdir is calling
> > > LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync,
> > > and then LogStartupProgress() is returning because !AmStartupProcess().
> >
> > I don't think walkdir() has any business calling LogStartupProgress()
> > at all. It's supposed to be a generic function, not one that is only
> > available to be called from the startup process, or has different
> > behavior there. From my point of view, the right thing is to put the
> > logging calls into the particular callbacks that SyncDataDirectory()
> > uses.
>
> You're right - this is better.

I also agree that this is the better place to do it. Hence modified
the patch accordingly. The condition "!AmStartupProcess()" is added to
differentiate whether the call is done from a startup process or some
other process. Actually StartupXLOG() gets called in 2 places. one in
StartupProcessMain() and the other in InitPostgres(). As the logging
of the startup progress is required only during the startup process
and not in the other cases, so added the condition to confirm the call
is from the startup process. I did not find any other way to
differentiate. Kindly let me know if there is any other better
approach to do this.

> > > Maybe I'm missing something here, but I don't understand the purpose
> > > of this. You can always combine two functions into one, but it's only
> > > worth doing if you end up with less code, which doesn't seem to be the
> > > case here.
> >
> >  4 files changed, 39 insertions(+), 71 deletions(-)
>
> Hmm. I don't completely hate that, but I don't love it either. I don't
> think the result is any easier to understand than the original, and in
> some ways it's worse. In particular, you've flattened the separate
> comments for each of the individual functions into a single-line
> comment that is more generic than the comments it replaced. You could
> fix that and you'd still be slightly ahead on LOC, but I'm not
> convinced that it's actually a real win.

As per my understanding there are no changes required wrt this. Hence
not done any changes.

Please find the updated patch attached. Kindly share your comments if any.

On Mon, Jul 26, 2021 at 10:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jul 26, 2021 at 11:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > Maybe I'm missing something here, but I don't understand the purpose
> > > of this. You can always combine two functions into one, but it's only
> > > worth doing if you end up with less code, which doesn't seem to be the
> > > case here.
> >
> >  4 files changed, 39 insertions(+), 71 deletions(-)
>
> Hmm. I don't completely hate that, but I don't love it either. I don't
> think the result is any easier to understand than the original, and in
> some ways it's worse. In particular, you've flattened the separate
> comments for each of the individual functions into a single-line
> comment that is more generic than the comments it replaced. You could
> fix that and you'd still be slightly ahead on LOC, but I'm not
> convinced that it's actually a real win.
>
> > > ... but I'm not exactly sure how to get there from here. Having only
> > > LogStartupProgress() but having it do a giant if-statement to figure
> > > out whether we're mid-phase or end-of-phase does not seem like the
> > > right approach.
> >
> > I used a bool arg and negation to handle within a single switch.  Maybe it's
> > cleaner to use a separate enum value for each DONE, and set a local done flag.
>
> If we're going to go the route of combining the functions, I agree
> that a Boolean is the way to go; I think we have some existing
> precedent for 'bool finished' rather than 'bool done'.
>
> But I kind of wonder if we should have an enum in the first place. It
> feels like we've got code in a bunch of places that just exists to
> decide which enum value to use, and then code someplace else that
> turns around and decides which message to produce. That would be
> sensible if we were using the same enum values in lots of places, but
> that's not the case here. So suppose we just moved the messages to the
> places where we're now calling LogStartupProgress() or
> LogStartupProgressComplete()? So something like this:
>
> if (should_report_startup_progress())
>     ereport(LOG,
>                  (errmsg("syncing data directory (syncfs), elapsed
> time: %ld.%02d s, current path: %s",
>                                secs, (usecs / 10000), path)));
>
> Well, that doesn't quite work, because "secs" and "usecs" aren't going
> to exist in the caller. We can fix that easily enough: let's just make
> should_report_startup_progress take arguments long *secs, int *usecs,
> and bool done. Then the above becomes:
>
> if (should_report_startup_progress(&secs, &usecs, false))
>     ereport(LOG,
>                  (errmsg("syncing data directory (syncfs), elapsed
> time: %ld.%02d s, current path: %s",
>                                secs, (usecs / 10000), path)));
>
> And if this were the call site that corresponds to
> LogStartupProgressComplete(), we'd replace false with true. Now, the
> only real disadvantage of this that I can see is that it requires the
> caller to declare 'secs' and 'usecs', which is not a big deal, but
> mildly annoying perhaps. I think we can do better still with a little
> macro magic. Suppose we define a macro report_startup_progress(force,
> msg, ...) that expands to:
>
> {
> long secs;
> int usecs;
> if (startup_progress_timer_expired(&secs, &usecs, force))
> ereport(LOG, errmsg(msg, secs, usecs, ...));
> }
>
> Then the above just becomes this:
>
>     report_startup_progress(false, "syncing data directory (syncfs),
> elapsed time: %ld.%02d s, current path: %s", path);
>
> This would have the advantage that the strings we're using would be
> present in the code that arranges to emit them, instead of being
> removed to some other module, so I think it would be clearer. It would
> also have the advantage of making it much easier to add further calls,
> if someone feels they want to do that. You don't have to run around
> and update enums and all the various things that use them, just copy
> and paste the line above and adjust as required.
>
> With this design, we avoid a lot of "action at a distance". We don't
> define the message strings in a place far-removed from the code that
> wants to emit them any more. When someone wants a new progress
> message, they can just add another call to report_statup_progress()
> wherever it needs to go and they're done. They don't have to go run
> and update the enum and various switch statements. They're just done.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Jul 28, 2021 at 5:24 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> I also agree that this is the better place to do it. Hence modified
> the patch accordingly. The condition "!AmStartupProcess()" is added to
> differentiate whether the call is done from a startup process or some
> other process. Actually StartupXLOG() gets called in 2 places. one in
> StartupProcessMain() and the other in InitPostgres(). As the logging
> of the startup progress is required only during the startup process
> and not in the other cases,

The InitPostgres() case occurs when the server is started in bootstrap
mode (during initdb) or in single-user mode (postgres --single). I do
not see any reason why we shouldn't produce progress messages in at
least the latter case. I suspect that someone who is in the rather
desperate scenario of having to use single-user mode would really like
to know how long the server is going to take to start up.

Perhaps during initdb we don't want messages, but I'm not sure that we
need to do anything about that here. None of the messages that the
server normally produces show up when you run initdb, so I guess they
are getting redirected to /dev/null or something.

So I don't think that using AmStartupProcess() for this purpose is right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Wed, Jul 28, 2021 at 7:02 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 28, 2021 at 5:24 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > I also agree that this is the better place to do it. Hence modified
> > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > differentiate whether the call is done from a startup process or some
> > other process. Actually StartupXLOG() gets called in 2 places. one in
> > StartupProcessMain() and the other in InitPostgres(). As the logging
> > of the startup progress is required only during the startup process
> > and not in the other cases,
>
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.

+1 to emit the same log messages in single-user mode and basically
whoever calls StartupXLOG. Do we need to adjust the GUC parameter
log_startup_progress_interval(a reasonable value) so that the logs are
generated by default?

> Perhaps during initdb we don't want messages, but I'm not sure that we
> need to do anything about that here. None of the messages that the
> server normally produces show up when you run initdb, so I guess they
> are getting redirected to /dev/null or something.

I enabled the below log message in XLogFlush and ran initdb, it is
printing these logs onto the stdout, looks like the logs have not been
redirected to /dev/null in initdb/bootstrap mode.

#ifdef WAL_DEBUG
if (XLOG_DEBUG)
elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
LSN_FORMAT_ARGS(record),
LSN_FORMAT_ARGS(LogwrtResult.Write),
LSN_FORMAT_ARGS(LogwrtResult.Flush));
#endif

So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and
XLOG_DEBUG to print those logs? It looks like the problem with these
macros is that they are not settable by normal users in the production
environment, but only by the developers. I'm not sure how much it is
helpful to print the startup process progress logs in bootstrap mode.
Maybe we can use the IsBootstrapProcessingMode macro to disable these
logs in the bootstrap mode.

> So I don't think that using AmStartupProcess() for this purpose is right.

Agree.

Regards,
Bharath Rupireddy.



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Jul 28, 2021 at 11:25 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > Perhaps during initdb we don't want messages, but I'm not sure that we
> > need to do anything about that here. None of the messages that the
> > server normally produces show up when you run initdb, so I guess they
> > are getting redirected to /dev/null or something.
>
> I enabled the below log message in XLogFlush and ran initdb, it is
> printing these logs onto the stdout, looks like the logs have not been
> redirected to /dev/null in initdb/bootstrap mode.
>
> #ifdef WAL_DEBUG
> if (XLOG_DEBUG)
> elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
> LSN_FORMAT_ARGS(record),
> LSN_FORMAT_ARGS(LogwrtResult.Write),
> LSN_FORMAT_ARGS(LogwrtResult.Flush));
> #endif
>
> So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and
> XLOG_DEBUG to print those logs? It looks like the problem with these
> macros is that they are not settable by normal users in the production
> environment, but only by the developers. I'm not sure how much it is
> helpful to print the startup process progress logs in bootstrap mode.
> Maybe we can use the IsBootstrapProcessingMode macro to disable these
> logs in the bootstrap mode.

I don't think we should drag XLOG_DEBUG into this. That's a debugging
facility that isn't really relevant to the topic at hand. The point is
the server normally prints a bunch of messages that we don't see in
bootstrap mode. For example:

[rhaas pgsql]$ postgres
2021-07-28 11:32:33.824 EDT [36801] LOG:  starting PostgreSQL 15devel
on x86_64-apple-darwin19.6.0, compiled by clang version 5.0.2
(tags/RELEASE_502/final), 64-bit
2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv6 address
"::1", port 5432
2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv4 address
"127.0.0.1", port 5432
2021-07-28 11:32:33.826 EDT [36801] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5432"
2021-07-28 11:32:33.846 EDT [36802] LOG:  database system was shut
down at 2021-07-28 11:32:32 EDT
2021-07-28 11:32:33.857 EDT [36801] LOG:  database system is ready to
accept connections

None of that shows up when you run initdb. Taking a fast look at the
code, I don't think the reasons are the same for all of those
messages. Some of the code isn't reached, whereas e.g. "database
system was shut down at 2021-07-28 11:32:32 EDT" is special-cased. I'm
not sure right off the top of my head what this code should do, but
ideally it looks something like one of the cases we've already got.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.

Thanks for sharing the information. I have done the necessary changes
to show the logs during the latter case (postgres --single) and
verified the log messages.

> +1 to emit the same log messages in single-user mode and basically
> whoever calls StartupXLOG. Do we need to adjust the GUC parameter
> log_startup_progress_interval(a reasonable value) so that the logs are
> generated by default?

At present, this feature is enabled by default and the initial value
set for log_startup_progress_interval is 10 seconds.


On Wed, Jul 28, 2021 at 9:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 28, 2021 at 11:25 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Perhaps during initdb we don't want messages, but I'm not sure that we
> > > need to do anything about that here. None of the messages that the
> > > server normally produces show up when you run initdb, so I guess they
> > > are getting redirected to /dev/null or something.
> >
> > I enabled the below log message in XLogFlush and ran initdb, it is
> > printing these logs onto the stdout, looks like the logs have not been
> > redirected to /dev/null in initdb/bootstrap mode.
> >
> > #ifdef WAL_DEBUG
> > if (XLOG_DEBUG)
> > elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
> > LSN_FORMAT_ARGS(record),
> > LSN_FORMAT_ARGS(LogwrtResult.Write),
> > LSN_FORMAT_ARGS(LogwrtResult.Flush));
> > #endif
> >
> > So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and
> > XLOG_DEBUG to print those logs? It looks like the problem with these
> > macros is that they are not settable by normal users in the production
> > environment, but only by the developers. I'm not sure how much it is
> > helpful to print the startup process progress logs in bootstrap mode.
> > Maybe we can use the IsBootstrapProcessingMode macro to disable these
> > logs in the bootstrap mode.
>
> I don't think we should drag XLOG_DEBUG into this. That's a debugging
> facility that isn't really relevant to the topic at hand. The point is
> the server normally prints a bunch of messages that we don't see in
> bootstrap mode. For example:
>
> [rhaas pgsql]$ postgres
> 2021-07-28 11:32:33.824 EDT [36801] LOG:  starting PostgreSQL 15devel
> on x86_64-apple-darwin19.6.0, compiled by clang version 5.0.2
> (tags/RELEASE_502/final), 64-bit
> 2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv6 address
> "::1", port 5432
> 2021-07-28 11:32:33.825 EDT [36801] LOG:  listening on IPv4 address
> "127.0.0.1", port 5432
> 2021-07-28 11:32:33.826 EDT [36801] LOG:  listening on Unix socket
> "/tmp/.s.PGSQL.5432"
> 2021-07-28 11:32:33.846 EDT [36802] LOG:  database system was shut
> down at 2021-07-28 11:32:32 EDT
> 2021-07-28 11:32:33.857 EDT [36801] LOG:  database system is ready to
> accept connections
>
> None of that shows up when you run initdb. Taking a fast look at the
> code, I don't think the reasons are the same for all of those
> messages. Some of the code isn't reached, whereas e.g. "database
> system was shut down at 2021-07-28 11:32:32 EDT" is special-cased. I'm
> not sure right off the top of my head what this code should do, but
> ideally it looks something like one of the cases we've already got.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Thanks for sharing the information. I have done the necessary changes
> to show the logs during the latter case (postgres --single) and
> verified the log messages.

Thanks. Can you please have a look at what I suggested down toward the
bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Thanks. Can you please have a look at what I suggested down toward the
> bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
> ?
>
> If we're going to go the route of combining the functions, I agree
> that a Boolean is the way to go; I think we have some existing
> precedent for 'bool finished' rather than 'bool done'.
>
> But I kind of wonder if we should have an enum in the first place. It
> feels like we've got code in a bunch of places that just exists to
> decide which enum value to use, and then code someplace else that
> turns around and decides which message to produce. That would be
> sensible if we were using the same enum values in lots of places, but
> that's not the case here. So suppose we just moved the messages to the
> places where we're now calling LogStartupProgress() or
> LogStartupProgressComplete()? So something like this:

Sorry. I thought it is related to the discussion of deciding whether
LogStartupProgress() and LogStartupProgressComplete() should be
combined or not. I feel it's a really nice design. With this we avoid
a "action at a distance" issue and its easy to use. If we are
reporting the same kind of msgs at multiple places then the current
approach of using enum will be more suitable since we don't have to
worry about matching the log msg string. But in the current scenario,
we are not using the same kind of msgs at multiple places (I feel such
scenario will not occur in future also. Even if there is similar
operation, it can be distinguished like resetting unlogged relations
is distinguished by init and clean. Kindly mention if you can oversee
any such scenario), hence the approach you are suggesting will be a
best suit.

Thanks & Regards,
Nitin Jadhav

On Thu, Jul 29, 2021 at 9:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Thanks for sharing the information. I have done the necessary changes
> > to show the logs during the latter case (postgres --single) and
> > verified the log messages.
>
> Thanks. Can you please have a look at what I suggested down toward the
> bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
> ?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Thanks. Can you please have a look at what I suggested down toward the
> bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
?

Implemented the above approach and verified the patch. Kindly have a
look and share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Fri, Jul 30, 2021 at 10:40 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > Thanks. Can you please have a look at what I suggested down toward the
> > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
> > ?
> >
> > If we're going to go the route of combining the functions, I agree
> > that a Boolean is the way to go; I think we have some existing
> > precedent for 'bool finished' rather than 'bool done'.
> >
> > But I kind of wonder if we should have an enum in the first place. It
> > feels like we've got code in a bunch of places that just exists to
> > decide which enum value to use, and then code someplace else that
> > turns around and decides which message to produce. That would be
> > sensible if we were using the same enum values in lots of places, but
> > that's not the case here. So suppose we just moved the messages to the
> > places where we're now calling LogStartupProgress() or
> > LogStartupProgressComplete()? So something like this:
>
> Sorry. I thought it is related to the discussion of deciding whether
> LogStartupProgress() and LogStartupProgressComplete() should be
> combined or not. I feel it's a really nice design. With this we avoid
> a "action at a distance" issue and its easy to use. If we are
> reporting the same kind of msgs at multiple places then the current
> approach of using enum will be more suitable since we don't have to
> worry about matching the log msg string. But in the current scenario,
> we are not using the same kind of msgs at multiple places (I feel such
> scenario will not occur in future also. Even if there is similar
> operation, it can be distinguished like resetting unlogged relations
> is distinguished by init and clean. Kindly mention if you can oversee
> any such scenario), hence the approach you are suggesting will be a
> best suit.
>
> Thanks & Regards,
> Nitin Jadhav
>
> On Thu, Jul 29, 2021 at 9:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav
> > <nitinjadhavpostgres@gmail.com> wrote:
> > > Thanks for sharing the information. I have done the necessary changes
> > > to show the logs during the latter case (postgres --single) and
> > > verified the log messages.
> >
> > Thanks. Can you please have a look at what I suggested down toward the
> > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com
> > ?
> >
> > --
> > Robert Haas
> > EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
Two issues that I saw:

The first syncfs message is not output, because it's before
InitStartupProgress() is called:

2021-08-03 07:53:02.176 CDT startup[9717] LOG:  database system was interrupted; last known up at 2021-08-03 07:52:15
CDT
2021-08-03 07:53:02.733 CDT startup[9717] LOG:  data directory sync (syncfs) complete after 0.55 s
2021-08-03 07:53:02.734 CDT startup[9717] LOG:  database system was not properly shut down; automatic recovery in
progress

FP exception when the GUC is set to 0:

2021-08-03 07:53:02.877 CDT postmaster[9715] LOG:  startup process (PID 9717) was terminated by signal 8: Floating
pointexception
 

Probably due to mod zero operation.
This prevents the process from starting.

+       enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
+                                                (interval_in_ms - (elapsed_ms % interval_in_ms)));

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Aug 3, 2021 at 2:48 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Implemented the above approach and verified the patch. Kindly have a
> look and share your thoughts.

+LogStartupProgressTimerExpired(bool force, long *secs, int *usecs)

The name of this function begins with "log" but it does not log
anything, so that's probably a sign that you should rethink the name
of the function. I suggested startup_progress_timer_expired()
upthread, but now I think maybe we should call it
startup_progress_timer_has_expired() and then renaming the Boolean
you've called logStartupProgressTimeout to
startup_progress_timer_expired. Also, the argument "bool force"
doesn't really make sense for this function, which is why I suggested
above calling it "bool done" instead.

+    elapsed_ms = (seconds * 1000) + (useconds / 1000);
+    interval_in_ms = log_startup_progress_interval * 1000;
+    enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
+                         (interval_in_ms - (elapsed_ms % interval_in_ms)));

This will work correctly only if elapsed_ms is equal to interval_in_ms
or slightly greater than interval_ms. But if elapsed_ms is greater
than two times interval_ms, then it will produce pretty much random
results. If elapsed_ms is negative because the system clock gets set
backward, a possibility I've already mentioned to you in a previous
review, then it will also misbehave. I actually don't think
enable_timeout_after() is very easy to use for this kind of thing. At
least for me, it's way easier to think about calculating the timestamp
at which I want the timer to expire. Maybe something along these
lines:

next_timeout = last_startup_progress_timeout + interval;
if (next_timeout < now)
{
    // Either the timeout was processed so late that we missed an entire cycle,
    // or the system clock was set backwards.
    next_timeout = now + interval;
}
enable_timeout_at(next_timeout);

Also, I said before that I thought it was OK that you were logging a
line at the end of every operation as well as after every N
milliseconds. But, the more I think about it, the less I like it. We
already have a 'redo done' line that shows up at the end of redo,
which the patch wisely does not duplicate. But it's not quite clear
that any of these other things are important enough to bother
mentioning in the log unless they take a long time. After an immediate
shutdown of an empty cluster, with this patch applied, I get 3 extra
log messages:

2021-08-03 10:17:49.630 EDT [17567] LOG:  data directory sync (fsync)
complete after 0.13 s
2021-08-03 10:17:49.633 EDT [17567] LOG:  resetting unlogged relations
(cleanup) complete after 0.00 s
2021-08-03 10:17:49.635 EDT [17567] LOG:  resetting unlogged relations
(init) complete after 0.00 s

That doesn't seem like information anyone really needs. If it had
taken a long time, it would have been worth logging, but in the normal
case where it doesn't, it's just clutter. Another funny thing is that,
as you've coded it, those additional log lines only appear when
log_startup_progress_interval != 0. That's strange. It seems
particularly strange because of the existing precedent where 'redo
done' appears regardless of any setting, but also because when I set,
say, a 10s interval, I guess I expect something to happen every 10s.
Making something happen once at the end is different.

So I think we should take this out, which would permit simplifying a
bunch of code.The four places where you call
ereport_startup_progress(true, ...) would go away.
ereport_startup_progress() would no longer need a Boolean argument,
and neither would LogStartupProgressTimerExpired() /
startup_progress_timer_has_expired(). Note that there's no real need
to disable the timeout when we're done with it. It's fine if we do,
but if we don't, it's also not a big deal; all that happens if we
leave the timer scheduled and let it expire is that it will set a
Boolean flag that nobody will care about. So what I'm thinking about
is that we could just have, say, reset_startup_progress_timeout() and
startup_progress_timeout_has_expired().
reset_startup_progress_timeout() would just do exactly what I showed
above to reset the timeout, and you'd call it at the beginning of each
phase. And startup_progress_timeout_has_expired() would look roughly
like this:

if (!startup_progress_timer_expired)
    return;
now = GetCurrentTimestamp();
// compute timestamp difference
last_startup_progress_timeout = now;
reset_startup_progress_timeout();

With these changes you'd have only 1 place in the code that needs to
care about log_startup_progress_interval, as opposed to 3 as you have
it currently, and only one place that enables the timeout, as opposed
to 2 as you have it currently. I think that would be tidier.

I think we also should consider where to put the new functions
introduced by this patch, and the GUC. You put them in xlog.c which is
reasonable since that is where StartupXLOG() lives. However, xlog.c is
also a gigantic file, and xlog.h is included in a lot of places, most
of which aren't going to care about the new things you're adding to
that file at all. So I'm thinking it might make more sense to put the
new code in src/backend/postmaster/startup.c. That is actually a
better thematic fit given that this is really about the startup
process specifically, not WAL-logging in general. Then reinit.c would
include startup.h instead of xlog.h, which seems better, because I
don't think we want any actual xlog operations to happen from within
that file, so better not to be including xlog.h.

The patch currently lacks documentation. It needs to update config.sgml.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
Thanks for the detailed explanation.

> +    elapsed_ms = (seconds * 1000) + (useconds / 1000);
> +    interval_in_ms = log_startup_progress_interval * 1000;
> +    enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
> +                         (interval_in_ms - (elapsed_ms % interval_in_ms)));
>
> This will work correctly only if elapsed_ms is equal to interval_in_ms
> or slightly greater than interval_ms. But if elapsed_ms is greater
> than two times interval_ms, then it will produce pretty much random
> results. If elapsed_ms is negative because the system clock gets set
> backward, a possibility I've already mentioned to you in a previous
> review, then it will also misbehave. I actually don't think
> enable_timeout_after() is very easy to use for this kind of thing. At
> least for me, it's way easier to think about calculating the timestamp
> at which I want the timer to expire. Maybe something along these
> lines:
>
> next_timeout = last_startup_progress_timeout + interval;
> if (next_timeout < now)
> {
>    // Either the timeout was processed so late that we missed an entire cycle,
>    // or the system clock was set backwards.
>    next_timeout = now + interval;
> }
> enable_timeout_at(next_timeout);
>
> So I think we should take this out, which would permit simplifying a
> bunch of code.The four places where you call
> ereport_startup_progress(true, ...) would go away.
> ereport_startup_progress() would no longer need a Boolean argument,
> and neither would LogStartupProgressTimerExpired() /
> startup_progress_timer_has_expired(). Note that there's no real need
> to disable the timeout when we're done with it. It's fine if we do,
> but if we don't, it's also not a big deal; all that happens if we
> leave the timer scheduled and let it expire is that it will set a
> Boolean flag that nobody will care about. So what I'm thinking about
> is that we could just have, say, reset_startup_progress_timeout() and
> startup_progress_timeout_has_expired().
> reset_startup_progress_timeout() would just do exactly what I showed
> above to reset the timeout, and you'd call it at the beginning of each
> phase. And startup_progress_timeout_has_expired() would look roughly
> like this:
>
> if (!startup_progress_timer_expired)
>    return;
> now = GetCurrentTimestamp();
> // compute timestamp difference
> last_startup_progress_timeout = now;
> reset_startup_progress_timeout();

This seems a little confusing. As we are setting
last_startup_progress_timeout = now and then calling
reset_startup_progress_timeout() which will calculate the next_time
based on the value of last_startup_progress_timeout initially and
checks whether next_timeout is less than now. It doesn't make sense to
me. I feel we should calculate the next_timeout based on the time that
it is supposed to fire next time. So we should set
last_startup_progress_timeout = next_timeout after enabling the timer.
Also I feel with the 2 functions mentioned above, we also need
InitStartupProgress() which sets the initial value to
startupProcessOpStartTime which is used to calculate the time
difference and display in the logs. I could see those functions like
below.

InitStartupProgress(void)
{
    startupProcessOpStartTime = GetCurrentTimestamp();
    ResetStartupProgressTimeout(startupProcessOpStartTime);
}

reset_startup_progress_timeout(TimeStampTz now)
{
  next_timeout = last_startup_progress_timeout + interval;
  if (next_timeout < now)
  {
     // Either the timeout was processed so late that we missed an entire cycle,
     // or the system clock was set backwards.
     next_timeout = now + interval;
  }
  enable_timeout_at(next_timeout);
  last_startup_progress_timeout = next_timeout;
}

startup_progress_timeout_has_expired()
{
   if (!startup_progress_timer_expired)
     return;
  now = GetCurrentTimestamp();
  // compute timestamp difference based on startupProcessOpStartTime
  reset_startup_progress_timeout(now);
}

Kindly share your thoughts and correct me if I am wrong.

> I think we also should consider where to put the new functions
> introduced by this patch, and the GUC. You put them in xlog.c which is
> reasonable since that is where StartupXLOG() lives. However, xlog.c is
> also a gigantic file, and xlog.h is included in a lot of places, most
> of which aren't going to care about the new things you're adding to
> that file at all. So I'm thinking it might make more sense to put the
> new code in src/backend/postmaster/startup.c. That is actually a
> better thematic fit given that this is really about the startup
> process specifically, not WAL-logging in general. Then reinit.c would
> include startup.h instead of xlog.h, which seems better, because I
> don't think we want any actual xlog operations to happen from within
> that file, so better not to be including xlog.h.
>
> The patch currently lacks documentation. It needs to update config.sgml.

I agree and I will take care in the next patch.

Thanks & Regards,
Nitin Jadhav



On Tue, Aug 3, 2021 at 8:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 3, 2021 at 2:48 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Implemented the above approach and verified the patch. Kindly have a
> > look and share your thoughts.
>
> +LogStartupProgressTimerExpired(bool force, long *secs, int *usecs)
>
> The name of this function begins with "log" but it does not log
> anything, so that's probably a sign that you should rethink the name
> of the function. I suggested startup_progress_timer_expired()
> upthread, but now I think maybe we should call it
> startup_progress_timer_has_expired() and then renaming the Boolean
> you've called logStartupProgressTimeout to
> startup_progress_timer_expired. Also, the argument "bool force"
> doesn't really make sense for this function, which is why I suggested
> above calling it "bool done" instead.
>
> +    elapsed_ms = (seconds * 1000) + (useconds / 1000);
> +    interval_in_ms = log_startup_progress_interval * 1000;
> +    enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT,
> +                         (interval_in_ms - (elapsed_ms % interval_in_ms)));
>
> This will work correctly only if elapsed_ms is equal to interval_in_ms
> or slightly greater than interval_ms. But if elapsed_ms is greater
> than two times interval_ms, then it will produce pretty much random
> results. If elapsed_ms is negative because the system clock gets set
> backward, a possibility I've already mentioned to you in a previous
> review, then it will also misbehave. I actually don't think
> enable_timeout_after() is very easy to use for this kind of thing. At
> least for me, it's way easier to think about calculating the timestamp
> at which I want the timer to expire. Maybe something along these
> lines:
>
> next_timeout = last_startup_progress_timeout + interval;
> if (next_timeout < now)
> {
>     // Either the timeout was processed so late that we missed an entire cycle,
>     // or the system clock was set backwards.
>     next_timeout = now + interval;
> }
> enable_timeout_at(next_timeout);
>
> Also, I said before that I thought it was OK that you were logging a
> line at the end of every operation as well as after every N
> milliseconds. But, the more I think about it, the less I like it. We
> already have a 'redo done' line that shows up at the end of redo,
> which the patch wisely does not duplicate. But it's not quite clear
> that any of these other things are important enough to bother
> mentioning in the log unless they take a long time. After an immediate
> shutdown of an empty cluster, with this patch applied, I get 3 extra
> log messages:
>
> 2021-08-03 10:17:49.630 EDT [17567] LOG:  data directory sync (fsync)
> complete after 0.13 s
> 2021-08-03 10:17:49.633 EDT [17567] LOG:  resetting unlogged relations
> (cleanup) complete after 0.00 s
> 2021-08-03 10:17:49.635 EDT [17567] LOG:  resetting unlogged relations
> (init) complete after 0.00 s
>
> That doesn't seem like information anyone really needs. If it had
> taken a long time, it would have been worth logging, but in the normal
> case where it doesn't, it's just clutter. Another funny thing is that,
> as you've coded it, those additional log lines only appear when
> log_startup_progress_interval != 0. That's strange. It seems
> particularly strange because of the existing precedent where 'redo
> done' appears regardless of any setting, but also because when I set,
> say, a 10s interval, I guess I expect something to happen every 10s.
> Making something happen once at the end is different.
>
> So I think we should take this out, which would permit simplifying a
> bunch of code.The four places where you call
> ereport_startup_progress(true, ...) would go away.
> ereport_startup_progress() would no longer need a Boolean argument,
> and neither would LogStartupProgressTimerExpired() /
> startup_progress_timer_has_expired(). Note that there's no real need
> to disable the timeout when we're done with it. It's fine if we do,
> but if we don't, it's also not a big deal; all that happens if we
> leave the timer scheduled and let it expire is that it will set a
> Boolean flag that nobody will care about. So what I'm thinking about
> is that we could just have, say, reset_startup_progress_timeout() and
> startup_progress_timeout_has_expired().
> reset_startup_progress_timeout() would just do exactly what I showed
> above to reset the timeout, and you'd call it at the beginning of each
> phase. And startup_progress_timeout_has_expired() would look roughly
> like this:
>
> if (!startup_progress_timer_expired)
>     return;
> now = GetCurrentTimestamp();
> // compute timestamp difference
> last_startup_progress_timeout = now;
> reset_startup_progress_timeout();
>
> With these changes you'd have only 1 place in the code that needs to
> care about log_startup_progress_interval, as opposed to 3 as you have
> it currently, and only one place that enables the timeout, as opposed
> to 2 as you have it currently. I think that would be tidier.
>
> I think we also should consider where to put the new functions
> introduced by this patch, and the GUC. You put them in xlog.c which is
> reasonable since that is where StartupXLOG() lives. However, xlog.c is
> also a gigantic file, and xlog.h is included in a lot of places, most
> of which aren't going to care about the new things you're adding to
> that file at all. So I'm thinking it might make more sense to put the
> new code in src/backend/postmaster/startup.c. That is actually a
> better thematic fit given that this is really about the startup
> process specifically, not WAL-logging in general. Then reinit.c would
> include startup.h instead of xlog.h, which seems better, because I
> don't think we want any actual xlog operations to happen from within
> that file, so better not to be including xlog.h.
>
> The patch currently lacks documentation. It needs to update config.sgml.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> This seems a little confusing. As we are setting
> last_startup_progress_timeout = now and then calling
> reset_startup_progress_timeout() which will calculate the next_time
> based on the value of last_startup_progress_timeout initially and
> checks whether next_timeout is less than now. It doesn't make sense to
> me. I feel we should calculate the next_timeout based on the time that
> it is supposed to fire next time. So we should set
> last_startup_progress_timeout = next_timeout after enabling the timer.
> Also I feel with the 2 functions mentioned above, we also need
> InitStartupProgress() which sets the initial value to
> startupProcessOpStartTime which is used to calculate the time
> difference and display in the logs. I could see those functions like
> below.
>
> InitStartupProgress(void)
> {
>     startupProcessOpStartTime = GetCurrentTimestamp();
>     ResetStartupProgressTimeout(startupProcessOpStartTime);
> }

This makes sense, but I think I'd like to have all the functions in
this patch use names_like_this() rather than NamesLikeThis().

> reset_startup_progress_timeout(TimeStampTz now)
> {
>   next_timeout = last_startup_progress_timeout + interval;
>   if (next_timeout < now)
>   {
>      // Either the timeout was processed so late that we missed an entire cycle,
>      // or the system clock was set backwards.
>      next_timeout = now + interval;
>   }
>   enable_timeout_at(next_timeout);
>   last_startup_progress_timeout = next_timeout;
> }

Hmm, yeah, that seems good, but given this change, maybe the variables
need a little renaming. Like change last_startup_progress_timeout to
scheduled_startup_progress_timeout, perhaps.

> startup_progress_timeout_has_expired()
> {
>    if (!startup_progress_timer_expired)
>      return;
>   now = GetCurrentTimestamp();
>   // compute timestamp difference based on startupProcessOpStartTime
>   reset_startup_progress_timeout(now);
> }

I guess this one needs to return a Boolean, actually.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
Modified the reset_startup_progress_timeout() to take the second
parameter which indicates whether it is called for initialization or
for resetting. Without this parameter there is a problem if we call
init_startup_progress() more than one time if there is no call to
ereport_startup_progress() in between as the code related to disabling
the timer has been removed.

reset_startup_progress_timeout(TimeStampTz now, bool is_init)
{
   if (is_init)
      next_timeout = now + interval;
   else
   {
     next_timeout = scheduled_startup_progress_timeout + interval;
     if (next_timeout < now)
     {
        // Either the timeout was processed so late that we missed an
entire cycle,
        // or the system clock was set backwards.
        next_timeout = now + interval;
     }
     enable_timeout_at(next_timeout);
     scheduled_startup_progress_timeout = next_timeout;
   }
}

I have incorporated this in the attached patch. Please correct me if I am wrong.

> This makes sense, but I think I'd like to have all the functions in
> this patch use names_like_this() rather than NamesLikeThis().

I have changed all the function names accordingly. But I would like to
know why it should be names_like_this() as there are many functions
with the format NamesLikeThis(). I would like to understand when to
use what, just to incorporate in the future patches.

> Hmm, yeah, that seems good, but given this change, maybe the variables
> need a little renaming. Like change last_startup_progress_timeout to
> scheduled_startup_progress_timeout, perhaps.

Right. Changed the variable name.

> I guess this one needs to return a Boolean, actually.

Yes.

> reset_startup_progress_timeout(TimeStampTz now)
> {
>   next_timeout = last_startup_progress_timeout + interval;
>   if (next_timeout < now)
>   {
>      // Either the timeout was processed so late that we missed an entire cycle,
>      // or the system clock was set backwards.
>      next_timeout = now + interval;
>   }
>   enable_timeout_at(next_timeout);
>   last_startup_progress_timeout = next_timeout;
> }

As per the above logic, I would like to discuss 2 cases.

Case-1:
First scheduled timeout is after 1 sec. But the time out occurred
after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is
scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
The condition (next_timeout < now) gets evaluated to false and
everything goes smooth.

Case-2:
First scheduled timeout is after 1 sec. But the timeout occurred after
2.5 sec. So the log msg prints after 2.5 sec. Now next time is
scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
So the condition (next_timeout < now) will fail and the next_timeout
will get reset to 3.5 sec (2.5 + 1) and it continues. Is this
behaviour ok or should we set the next_timeout to 3 sec. Please share
your thoughts on this.

Thanks & Regards,
Nitin Jadhav


On Thu, Aug 5, 2021 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > This seems a little confusing. As we are setting
> > last_startup_progress_timeout = now and then calling
> > reset_startup_progress_timeout() which will calculate the next_time
> > based on the value of last_startup_progress_timeout initially and
> > checks whether next_timeout is less than now. It doesn't make sense to
> > me. I feel we should calculate the next_timeout based on the time that
> > it is supposed to fire next time. So we should set
> > last_startup_progress_timeout = next_timeout after enabling the timer.
> > Also I feel with the 2 functions mentioned above, we also need
> > InitStartupProgress() which sets the initial value to
> > startupProcessOpStartTime which is used to calculate the time
> > difference and display in the logs. I could see those functions like
> > below.
> >
> > InitStartupProgress(void)
> > {
> >     startupProcessOpStartTime = GetCurrentTimestamp();
> >     ResetStartupProgressTimeout(startupProcessOpStartTime);
> > }
>
> This makes sense, but I think I'd like to have all the functions in
> this patch use names_like_this() rather than NamesLikeThis().
>
> > reset_startup_progress_timeout(TimeStampTz now)
> > {
> >   next_timeout = last_startup_progress_timeout + interval;
> >   if (next_timeout < now)
> >   {
> >      // Either the timeout was processed so late that we missed an entire cycle,
> >      // or the system clock was set backwards.
> >      next_timeout = now + interval;
> >   }
> >   enable_timeout_at(next_timeout);
> >   last_startup_progress_timeout = next_timeout;
> > }
>
> Hmm, yeah, that seems good, but given this change, maybe the variables
> need a little renaming. Like change last_startup_progress_timeout to
> scheduled_startup_progress_timeout, perhaps.
>
> > startup_progress_timeout_has_expired()
> > {
> >    if (!startup_progress_timer_expired)
> >      return;
> >   now = GetCurrentTimestamp();
> >   // compute timestamp difference based on startupProcessOpStartTime
> >   reset_startup_progress_timeout(now);
> > }
>
> I guess this one needs to return a Boolean, actually.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Modified the reset_startup_progress_timeout() to take the second
> parameter which indicates whether it is called for initialization or
> for resetting. Without this parameter there is a problem if we call
> init_startup_progress() more than one time if there is no call to
> ereport_startup_progress() in between as the code related to disabling
> the timer has been removed.

I'd really like to avoid this. I don't see why it's necessary. You say
it causes a problem, but you don't explain what problem it causes.
enable_timeout_at() will disable the timer if not already done. I
think all we need to do is set scheduled_startup_progress_timeout = 0
before calling reset_startup_progress_timeout() in the "init" case and
don't do that for the non-init case. If that's not quite right, maybe
you can work out something that does work. But adding an is_init flag
to a function and having no common code between the is_init = true
case and the is_init = false case is exactly the kind of thing that I
don't want here. I want as much common code as possible.

> > This makes sense, but I think I'd like to have all the functions in
> > this patch use names_like_this() rather than NamesLikeThis().
>
> I have changed all the function names accordingly. But I would like to
> know why it should be names_like_this() as there are many functions
> with the format NamesLikeThis(). I would like to understand when to
> use what, just to incorporate in the future patches.

There is, unfortunately, no hard-and-fast rule, but we want to
maintain as much consistency with the existing style as we can. I
wasn't initially sure what would work best for this particular patch,
but after we committed to a name like ereport_startup_progress() that
to me was a strong hint in favor of using names_like_this()
throughout. It seems impossible to imagine punctuating it as
EreportStartupProgress() or something since that would be wildly
inconsistent with the existing function name, and there seems to be no
good reason why this patch can't be internally consistent.

To some degree, we tend to use names_like_this() for low-level
functions and NamesLikeThis() for higher-level functions, but that is
not a very consistent practice.

> > reset_startup_progress_timeout(TimeStampTz now)
> > {
> >   next_timeout = last_startup_progress_timeout + interval;
> >   if (next_timeout < now)
> >   {
> >      // Either the timeout was processed so late that we missed an entire cycle,
> >      // or the system clock was set backwards.
> >      next_timeout = now + interval;
> >   }
> >   enable_timeout_at(next_timeout);
> >   last_startup_progress_timeout = next_timeout;
> > }
>
> As per the above logic, I would like to discuss 2 cases.
>
> Case-1:
> First scheduled timeout is after 1 sec. But the time out occurred
> after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is
> scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> The condition (next_timeout < now) gets evaluated to false and
> everything goes smooth.
>
> Case-2:
> First scheduled timeout is after 1 sec. But the timeout occurred after
> 2.5 sec. So the log msg prints after 2.5 sec. Now next time is
> scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> So the condition (next_timeout < now) will fail and the next_timeout
> will get reset to 3.5 sec (2.5 + 1) and it continues. Is this
> behaviour ok or should we set the next_timeout to 3 sec. Please share
> your thoughts on this.

I can't quite follow this, because it seems like you are sometimes
viewing the interval as 1 second and sometimes as 2 seconds. Maybe you
could clarify that, and perhaps show example output?

My feeling is that the timer will almost always be slightly late, but
it will very rarely be extremely late, and it will also very rarely be
early (only if someone resets the system clock). So let's consider
those two cases separately. If the timer is a little bit late each
time, we want to avoid drift, so we want to shorten the next sleep
time by the amount that the previous interrupt was late. If the
interval is 1000ms and the interrupt fires 1ms late then we should
sleep 999ms the next time; if 2ms late, 998ms. That way, although
there will be some variation in which the messages are logged, the
drift won't accumulate over time and even after many minutes of
recovery the messages will be printed at ABOUT the same number of
milliseconds after the second every time, instead of drifting further
and further off course.

But this strategy cannot be used if the drift is larger than the
interval. If we are trying to log a message every 1000ms and the timer
doesn't fire for 14ms, we can wait only 986ms the next time. If it
doesn't fire for 140ms, we can wait only 860ms the next time. But if
the timer doesn't fire for 1400ms, we cannot wait for -400ms the next
time. So what should we do? My proposal is to just wait for the
configured interval, 1000ms, essentially giving up on drift
correction. Now you could argue that we ought to just wait for 600ms
in the hopes of making it 2 * 1000ms after the previous status
message, but I'm not sure that really has any value, and it doesn't
seem especially likely to work. The only way timer interrupts are
likely to be that badly delayed is if the system is horrifically
overloaded, and if that's the case the next timer interrupt isn't
likely to fire on schedule anyway. Trying to correct for drift in such
a situation seems more likely to be confusing than to produce any
helpful result.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I'd really like to avoid this. I don't see why it's necessary. You say
> it causes a problem, but you don't explain what problem it causes.
> enable_timeout_at() will disable the timer if not already done. I
> think all we need to do is set scheduled_startup_progress_timeout = 0
> before calling reset_startup_progress_timeout() in the "init" case and
> don't do that for the non-init case. If that's not quite right, maybe
> you can work out something that does work. But adding an is_init flag
> to a function and having no common code between the is_init = true
> case and the is_init = false case is exactly the kind of thing that I
> don't want here. I want as much common code as possible.

Setting set scheduled_startup_progress_timeout = 0 in the "init" case
solves the problem. The problem was if we call init_start_progress()
continuously then the first call to reset_startup_progress_timeout()
sets the value of scheduled_startup_progress_timeout to "now +
interval". Later call to reset_startup_progress_timeout() uses the
previously set value of scheduled_startup_progress_timeout which was
not correct and it was not behaving as expected. I could see that the
first log gets printed far later than the expected interval.

> To some degree, we tend to use names_like_this() for low-level
> functions and NamesLikeThis() for higher-level functions, but that is
> not a very consistent practice.

Ok. Thanks for the information.

> But this strategy cannot be used if the drift is larger than the
> interval. If we are trying to log a message every 1000ms and the timer
> doesn't fire for 14ms, we can wait only 986ms the next time. If it
> doesn't fire for 140ms, we can wait only 860ms the next time. But if
> the timer doesn't fire for 1400ms, we cannot wait for -400ms the next
> time. So what should we do? My proposal is to just wait for the
> configured interval, 1000ms, essentially giving up on drift
> correction. Now you could argue that we ought to just wait for 600ms
> in the hopes of making it 2 * 1000ms after the previous status
> message, but I'm not sure that really has any value, and it doesn't
> seem especially likely to work. The only way timer interrupts are
> likely to be that badly delayed is if the system is horrifically
> overloaded, and if that's the case the next timer interrupt isn't
> likely to fire on schedule anyway. Trying to correct for drift in such
> a situation seems more likely to be confusing than to produce any
> helpful result.

This is what I was trying to convey in case-2. I agree that it is
better to consider "now + interval" in such a case instead of trying
to adjust the drift.

Please find the updated patch attached.

On Tue, Aug 10, 2021 at 1:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Modified the reset_startup_progress_timeout() to take the second
> > parameter which indicates whether it is called for initialization or
> > for resetting. Without this parameter there is a problem if we call
> > init_startup_progress() more than one time if there is no call to
> > ereport_startup_progress() in between as the code related to disabling
> > the timer has been removed.
>
> I'd really like to avoid this. I don't see why it's necessary. You say
> it causes a problem, but you don't explain what problem it causes.
> enable_timeout_at() will disable the timer if not already done. I
> think all we need to do is set scheduled_startup_progress_timeout = 0
> before calling reset_startup_progress_timeout() in the "init" case and
> don't do that for the non-init case. If that's not quite right, maybe
> you can work out something that does work. But adding an is_init flag
> to a function and having no common code between the is_init = true
> case and the is_init = false case is exactly the kind of thing that I
> don't want here. I want as much common code as possible.
>
> > > This makes sense, but I think I'd like to have all the functions in
> > > this patch use names_like_this() rather than NamesLikeThis().
> >
> > I have changed all the function names accordingly. But I would like to
> > know why it should be names_like_this() as there are many functions
> > with the format NamesLikeThis(). I would like to understand when to
> > use what, just to incorporate in the future patches.
>
> There is, unfortunately, no hard-and-fast rule, but we want to
> maintain as much consistency with the existing style as we can. I
> wasn't initially sure what would work best for this particular patch,
> but after we committed to a name like ereport_startup_progress() that
> to me was a strong hint in favor of using names_like_this()
> throughout. It seems impossible to imagine punctuating it as
> EreportStartupProgress() or something since that would be wildly
> inconsistent with the existing function name, and there seems to be no
> good reason why this patch can't be internally consistent.
>
> To some degree, we tend to use names_like_this() for low-level
> functions and NamesLikeThis() for higher-level functions, but that is
> not a very consistent practice.
>
> > > reset_startup_progress_timeout(TimeStampTz now)
> > > {
> > >   next_timeout = last_startup_progress_timeout + interval;
> > >   if (next_timeout < now)
> > >   {
> > >      // Either the timeout was processed so late that we missed an entire cycle,
> > >      // or the system clock was set backwards.
> > >      next_timeout = now + interval;
> > >   }
> > >   enable_timeout_at(next_timeout);
> > >   last_startup_progress_timeout = next_timeout;
> > > }
> >
> > As per the above logic, I would like to discuss 2 cases.
> >
> > Case-1:
> > First scheduled timeout is after 1 sec. But the time out occurred
> > after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is
> > scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> > The condition (next_timeout < now) gets evaluated to false and
> > everything goes smooth.
> >
> > Case-2:
> > First scheduled timeout is after 1 sec. But the timeout occurred after
> > 2.5 sec. So the log msg prints after 2.5 sec. Now next time is
> > scheduled after 2 sec (scheduled_startup_progress_timeout + interval).
> > So the condition (next_timeout < now) will fail and the next_timeout
> > will get reset to 3.5 sec (2.5 + 1) and it continues. Is this
> > behaviour ok or should we set the next_timeout to 3 sec. Please share
> > your thoughts on this.
>
> I can't quite follow this, because it seems like you are sometimes
> viewing the interval as 1 second and sometimes as 2 seconds. Maybe you
> could clarify that, and perhaps show example output?
>
> My feeling is that the timer will almost always be slightly late, but
> it will very rarely be extremely late, and it will also very rarely be
> early (only if someone resets the system clock). So let's consider
> those two cases separately. If the timer is a little bit late each
> time, we want to avoid drift, so we want to shorten the next sleep
> time by the amount that the previous interrupt was late. If the
> interval is 1000ms and the interrupt fires 1ms late then we should
> sleep 999ms the next time; if 2ms late, 998ms. That way, although
> there will be some variation in which the messages are logged, the
> drift won't accumulate over time and even after many minutes of
> recovery the messages will be printed at ABOUT the same number of
> milliseconds after the second every time, instead of drifting further
> and further off course.
>
> But this strategy cannot be used if the drift is larger than the
> interval. If we are trying to log a message every 1000ms and the timer
> doesn't fire for 14ms, we can wait only 986ms the next time. If it
> doesn't fire for 140ms, we can wait only 860ms the next time. But if
> the timer doesn't fire for 1400ms, we cannot wait for -400ms the next
> time. So what should we do? My proposal is to just wait for the
> configured interval, 1000ms, essentially giving up on drift
> correction. Now you could argue that we ought to just wait for 600ms
> in the hopes of making it 2 * 1000ms after the previous status
> message, but I'm not sure that really has any value, and it doesn't
> seem especially likely to work. The only way timer interrupts are
> likely to be that badly delayed is if the system is horrifically
> overloaded, and if that's the case the next timer interrupt isn't
> likely to fire on schedule anyway. Trying to correct for drift in such
> a situation seems more likely to be confusing than to produce any
> helpful result.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Aug 10, 2021 at 9:28 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Please find the updated patch attached.

I think this is getting close. The output looks nice. However, I still
see a bunch of issues.

You mentioned previously that you would add documentation, but I do
not see it here.

startup_progress_timer_expired should be declared as sig_atomic_t like
we do in other cases (see interrupt.c).

The default value of the new GUC is 10s in postgresql.conf.sample, but
-1 in guc.c. They should both be 10s, and the one in
postgresql.conf.sample should be commented out.

I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
expressing the default in postgresl.conf.sample as 10s rather than
10000ms. I tried values measured in milliseconds just for testing
purposes and didn't initially understand why it wasn't working. I
don't think there's any reason it can't work.

I would prefer to see log_startup_progress_interval declared and
defined in startup.c/startup.h rather than guc.c/guc.h.

There's no precedent in the tree for the use of ##__VA_ARGS__. On my
system it seems to work fine if I just leave out the ##. Any reason
not to do that?

Two of the declarations in startup.h forgot the leading "extern",
while the other two that are right next to them have it, matching
project style.

I'm reasonably happy with most of the identifier names now, but I
think init_startup_progress() is confusing. The reason I think that is
that we call it more than once, which is not really what people think
about when they think of an "init" function, I think. It's not
initializing the startup progress facility in general; it's preparing
for the next phase of startup progress reporting. How about renaming
it to begin_startup_progress_phase()? And then
startup_process_op_start_time could be
startup_process_phase_start_time to match.

SyncDataDirectory() potentially walks over the data directory three
times: once to call do_syncfs(), once to call pre_sync_fname(), and
once to call datadir_fsync_fname(). With this patch, the first and
third will emit progress messages if the operation runs long, but the
second will not. I think they should all be treated the same. Also,
the locations where you've inserted calls to init_startup_progress()
don't really make it clear with what code that's associated. I'd put
them *immediately* before the call to do_syncfs() or walkdir().

Remember that PostgreSQL comments are typically written "telegraph
style," so function comments should say "Does whatever" not "It does
whatever". Most of them are correct, but there's one sentence you need
to fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> startup_progress_timer_expired should be declared as sig_atomic_t like
> we do in other cases (see interrupt.c).

Fixed.

> The default value of the new GUC is 10s in postgresql.conf.sample, but
> -1 in guc.c. They should both be 10s, and the one in
> postgresql.conf.sample should be commented out.

Fixed.

> I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> expressing the default in postgresl.conf.sample as 10s rather than
> 10000ms. I tried values measured in milliseconds just for testing
> purposes and didn't initially understand why it wasn't working. I
> don't think there's any reason it can't work.

As suggested, I have changed it to GUC_UNIT_MS and kept the default
value to 10s. I would like to know the reason why it can't be
GUC_UNIT_S as we are expressing the values in terms of seconds.

> I would prefer to see log_startup_progress_interval declared and
> defined in startup.c/startup.h rather than guc.c/guc.h.

Fixed.

> There's no precedent in the tree for the use of ##__VA_ARGS__. On my
> system it seems to work fine if I just leave out the ##. Any reason
> not to do that?

I had added this to support if no variable argument are passed to the
macro. For example, in the previous patches we used to log the
progress at the end of the operation like
"ereport_startup_progress(true, "data directory sync (syncfs) complete
after %ld.%02d s");". Since these calls are removed now, ## is not
required. Hence removed in the attached patch.

> Two of the declarations in startup.h forgot the leading "extern",
> while the other two that are right next to them have it, matching
> project style.

I had not added extern since those function were not used in the other
files. Now added to match the project style.

> I'm reasonably happy with most of the identifier names now, but I
> think init_startup_progress() is confusing. The reason I think that is
> that we call it more than once, which is not really what people think
> about when they think of an "init" function, I think. It's not
> initializing the startup progress facility in general; it's preparing
> for the next phase of startup progress reporting. How about renaming
> it to begin_startup_progress_phase()? And then
> startup_process_op_start_time could be
> startup_process_phase_start_time to match.

Yes begin_startup_progress_phase() looks more appropriate. Instead of
startup_process_phase_start_time, startup_progress_phase_start_time
looks more appropriate. Changed these in the attached patch.

> SyncDataDirectory() potentially walks over the data directory three
> times: once to call do_syncfs(), once to call pre_sync_fname(), and
> once to call datadir_fsync_fname(). With this patch, the first and
> third will emit progress messages if the operation runs long, but the
> second will not. I think they should all be treated the same. Also,
> the locations where you've inserted calls to init_startup_progress()
> don't really make it clear with what code that's associated. I'd put
> them *immediately* before the call to do_syncfs() or walkdir().

Fixed.

> Remember that PostgreSQL comments are typically written "telegraph
> style," so function comments should say "Does whatever" not "It does
> whatever". Most of them are correct, but there's one sentence you need
> to fix.

Fixed in the function comments of
startup_progress_timeout_has_expired(). Please let me now if this is
not the one you wanted me to correct.

> You mentioned previously that you would add documentation, but I do
> not see it here.

Sorry. I missed this. I have added the documentation in the attached patch.
On Tue, Aug 10, 2021 at 8:56 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 10, 2021 at 9:28 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Please find the updated patch attached.
>
> I think this is getting close. The output looks nice. However, I still
> see a bunch of issues.
>
> You mentioned previously that you would add documentation, but I do
> not see it here.
>
> startup_progress_timer_expired should be declared as sig_atomic_t like
> we do in other cases (see interrupt.c).
>
> The default value of the new GUC is 10s in postgresql.conf.sample, but
> -1 in guc.c. They should both be 10s, and the one in
> postgresql.conf.sample should be commented out.
>
> I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> expressing the default in postgresl.conf.sample as 10s rather than
> 10000ms. I tried values measured in milliseconds just for testing
> purposes and didn't initially understand why it wasn't working. I
> don't think there's any reason it can't work.
>
> I would prefer to see log_startup_progress_interval declared and
> defined in startup.c/startup.h rather than guc.c/guc.h.
>
> There's no precedent in the tree for the use of ##__VA_ARGS__. On my
> system it seems to work fine if I just leave out the ##. Any reason
> not to do that?
>
> Two of the declarations in startup.h forgot the leading "extern",
> while the other two that are right next to them have it, matching
> project style.
>
> I'm reasonably happy with most of the identifier names now, but I
> think init_startup_progress() is confusing. The reason I think that is
> that we call it more than once, which is not really what people think
> about when they think of an "init" function, I think. It's not
> initializing the startup progress facility in general; it's preparing
> for the next phase of startup progress reporting. How about renaming
> it to begin_startup_progress_phase()? And then
> startup_process_op_start_time could be
> startup_process_phase_start_time to match.
>
> SyncDataDirectory() potentially walks over the data directory three
> times: once to call do_syncfs(), once to call pre_sync_fname(), and
> once to call datadir_fsync_fname(). With this patch, the first and
> third will emit progress messages if the operation runs long, but the
> second will not. I think they should all be treated the same. Also,
> the locations where you've inserted calls to init_startup_progress()
> don't really make it clear with what code that's associated. I'd put
> them *immediately* before the call to do_syncfs() or walkdir().
>
> Remember that PostgreSQL comments are typically written "telegraph
> style," so function comments should say "Does whatever" not "It does
> whatever". Most of them are correct, but there's one sentence you need
> to fix.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Aug 12, 2021 at 7:40 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> > expressing the default in postgresl.conf.sample as 10s rather than
> > 10000ms. I tried values measured in milliseconds just for testing
> > purposes and didn't initially understand why it wasn't working. I
> > don't think there's any reason it can't work.
>
> As suggested, I have changed it to GUC_UNIT_MS and kept the default
> value to 10s. I would like to know the reason why it can't be
> GUC_UNIT_S as we are expressing the values in terms of seconds.

I mean, it *could* be. There's just no advantage. Values in seconds
will work correctly either way. But values in milliseconds will only
work if it's GUC_UNIT_MS. It seems to me that it's better to make more
things work rather than fewer.

> > There's no precedent in the tree for the use of ##__VA_ARGS__. On my
> > system it seems to work fine if I just leave out the ##. Any reason
> > not to do that?
>
> I had added this to support if no variable argument are passed to the
> macro. For example, in the previous patches we used to log the
> progress at the end of the operation like
> "ereport_startup_progress(true, "data directory sync (syncfs) complete
> after %ld.%02d s");". Since these calls are removed now, ## is not
> required. Hence removed in the attached patch.

Hmm, OK. That's actually a pretty good reason for using ## there. It
just made me nervous because we have no similar uses of ## in the
backend code. We rely on it elsewhere for concatenation, but not for
comma removal. Let's try leaving it out for now unless somebody else
shows up with a different opinion.

> I had not added extern since those function were not used in the other
> files. Now added to match the project style.

Anything that's not used in other files should be declared static in
the file itself, and not present in the header. Once you fix this for
reset_startup_progress_timeout, the header won't need to include
datatype/timestamp.h any more, which is good, because we don't want
header files to depend on more other header files than necessary.

Looking over this version, I realized something I (or you) should have
noticed sooner: you've added the RegisterTimeout call to
InitPostgres(), but that's for things that are used by all backends,
and this is only used by the startup process. So it seems to me that
the right place is StartupProcessMain. That would have the further
advantage of allowing startup_progress_timeout_handler to be made
static. begin_startup_progress_phase() and
startup_progress_timeout_has_expired() are the actual API functions
though so they will need to remain extern.

@@ -679,7 +680,6 @@ static char *recovery_target_lsn_string;
 /* should be static, but commands/variable.c needs to get at this */
 char      *role_string;

-
 /*
  * Displayable names for context types (enum GucContext)
  *

This hunk should be removed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
Should this feature distinguish between crash recovery and archive recovery on
a hot standby ?  Otherwise the standby will display this all the time.

2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100

If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
cleanup of xlog.c variables, I can't say that with much confidence).



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Should this feature distinguish between crash recovery and archive recovery on
> a hot standby ?  Otherwise the standby will display this all the time.
>
> 2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100
>
> If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> cleanup of xlog.c variables, I can't say that with much confidence).

Hmm. My inclination is to think that on an actual standby, you
wouldn't want to get messages like this, but if you were doing a
point-in-time-recovery, then you would. So I think maybe
InArchiveRecovery is not the right thing. Perhaps StandbyMode?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Anything that's not used in other files should be declared static in
> the file itself, and not present in the header. Once you fix this for
> reset_startup_progress_timeout, the header won't need to include
> datatype/timestamp.h any more, which is good, because we don't want
> header files to depend on more other header files than necessary.

Thanks for identifying this. I will take care in the next patch.

> Looking over this version, I realized something I (or you) should have
> noticed sooner: you've added the RegisterTimeout call to
> InitPostgres(), but that's for things that are used by all backends,
> and this is only used by the startup process. So it seems to me that
> the right place is StartupProcessMain. That would have the further
> advantage of allowing startup_progress_timeout_handler to be made
> static. begin_startup_progress_phase() and
> startup_progress_timeout_has_expired() are the actual API functions
> though so they will need to remain extern.

Yes. I had noticed this earlier and the RegisterTimeout() call was
only present in StartupProcessMain() and not in InitPostgres() in the
earlier versions (v7) of the patch. Since StartupXLOG() gets called in
the 2 places, I had restricted the InitPostgres() flow by checking for
the !AmStartupProcess() in the newly added functions. But later we had
discussion and concluded to add the RegisterTimeout() call even in
case of InitPostgres(). Kindly refer to the discussion just after the
v7 patch in this thread and let me know your thoughts.

> This hunk should be removed.

I will remove it in the next patch.

On Tue, Aug 17, 2021 at 1:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Should this feature distinguish between crash recovery and archive recovery on
> > a hot standby ?  Otherwise the standby will display this all the time.
> >
> > 2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100
> >
> > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> > cleanup of xlog.c variables, I can't say that with much confidence).
>
> Hmm. My inclination is to think that on an actual standby, you
> wouldn't want to get messages like this, but if you were doing a
> point-in-time-recovery, then you would. So I think maybe
> InArchiveRecovery is not the right thing. Perhaps StandbyMode?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> > Anything that's not used in other files should be declared static in
> > the file itself, and not present in the header. Once you fix this for
> > reset_startup_progress_timeout, the header won't need to include
> > datatype/timestamp.h any more, which is good, because we don't want
> > header files to depend on more other header files than necessary.
>
> Thanks for identifying this. I will take care in the next patch.

Fixed.

> > This hunk should be removed.
>
> I will remove it in the next patch.

Removed.

Please find the updated patch attached.

On Wed, Aug 18, 2021 at 12:23 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > Anything that's not used in other files should be declared static in
> > the file itself, and not present in the header. Once you fix this for
> > reset_startup_progress_timeout, the header won't need to include
> > datatype/timestamp.h any more, which is good, because we don't want
> > header files to depend on more other header files than necessary.
>
> Thanks for identifying this. I will take care in the next patch.
>
> > Looking over this version, I realized something I (or you) should have
> > noticed sooner: you've added the RegisterTimeout call to
> > InitPostgres(), but that's for things that are used by all backends,
> > and this is only used by the startup process. So it seems to me that
> > the right place is StartupProcessMain. That would have the further
> > advantage of allowing startup_progress_timeout_handler to be made
> > static. begin_startup_progress_phase() and
> > startup_progress_timeout_has_expired() are the actual API functions
> > though so they will need to remain extern.
>
> Yes. I had noticed this earlier and the RegisterTimeout() call was
> only present in StartupProcessMain() and not in InitPostgres() in the
> earlier versions (v7) of the patch. Since StartupXLOG() gets called in
> the 2 places, I had restricted the InitPostgres() flow by checking for
> the !AmStartupProcess() in the newly added functions. But later we had
> discussion and concluded to add the RegisterTimeout() call even in
> case of InitPostgres(). Kindly refer to the discussion just after the
> v7 patch in this thread and let me know your thoughts.
>
> > This hunk should be removed.
>
> I will remove it in the next patch.
>
> On Tue, Aug 17, 2021 at 1:08 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > Should this feature distinguish between crash recovery and archive recovery on
> > > a hot standby ?  Otherwise the standby will display this all the time.
> > >
> > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN:
0/EEE2100
> > >
> > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> > > cleanup of xlog.c variables, I can't say that with much confidence).
> >
> > Hmm. My inclination is to think that on an actual standby, you
> > wouldn't want to get messages like this, but if you were doing a
> > point-in-time-recovery, then you would. So I think maybe
> > InArchiveRecovery is not the right thing. Perhaps StandbyMode?
> >
> > --
> > Robert Haas
> > EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> Please find the updated patch attached.

Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com

I agree with Robert that a standby server should not continuously show timing
messages for WAL replay.

Some doc comments:

+         Sets the time interval between each progress update of the operations
+         performed during startup process. This produces the log message after

Either say "performed by the startup process" or "performed during startup".

s/the/a/

+         every interval of time for the operations that take longer time. The unit

..for those operations which take longer than the specified duration.

+         used to specify the value is seconds. For example, if you set it to
+         <literal> 10s </literal>, then after every <literal> 10s </literal> there

remove "after"

+         is a log message indicating which operation is going on and what is the

say "..every 10s, a log is emitted indicating which operation is ongoing, and
the elapsed time from the beginning of the operation.."

+         elapsed time from beginning. If the value is set to <literal> 0 </literal>,
+         then it logs all the available messages for such operations. <literal> -1

"..then all messages for such operations are logged."

+         </literal> disables the feature. The default value is set to <literal> 10s
+         </literal>

"The default value is >10s<."



Re: when the startup process doesn't (logging startup delays)

From
Michael Paquier
Date:
On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote:
> On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> > Please find the updated patch attached.
>
> Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com
>
> I agree with Robert that a standby server should not continuously show timing
> messages for WAL replay.

Clearly.  This should be limited to crash recovery, and maybe there
could be some checks to make sure that nothing is logged once a
consistent point is reached.  Honestly, I don't see why we should have
a GUC for what's proposed here.  A value too low would bloat the logs
with entries that are not that meaningful.  A value too large would
just prevent access to some information wanted.  Wouldn't it be better
to just pick up a value like 10s or 20s?

Looking at v13..

+       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
+           gettext_noop("Sets the time interval between each progress update "
+                        "of the startup process."),
+           gettext_noop("0 logs all messages. -1 turns this feature off."),
+           GUC_UNIT_MS,
The unit is incorrect here, as that would default to 10ms, contrary to
what the documentation says about 10s.
--
Michael

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I also agree that this is the better place to do it. Hence modified
> the patch accordingly. The condition "!AmStartupProcess()" is added to
> differentiate whether the call is done from a startup process or some
> other process. Actually StartupXLOG() gets called in 2 places. one in
> StartupProcessMain() and the other in InitPostgres(). As the logging
> of the startup progress is required only during the startup process
> and not in the other cases,

The InitPostgres() case occurs when the server is started in bootstrap
mode (during initdb) or in single-user mode (postgres --single). I do
not see any reason why we shouldn't produce progress messages in at
least the latter case. I suspect that someone who is in the rather
desperate scenario of having to use single-user mode would really like
to know how long the server is going to take to start up.

Perhaps during initdb we don't want messages, but I'm not sure that we
need to do anything about that here. None of the messages that the
server normally produces show up when you run initdb, so I guess they
are getting redirected to /dev/null or something.

So I don't think that using AmStartupProcess() for this purpose is right.

On Tue, Sep 7, 2021 at 10:58 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote:
> > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> > > Please find the updated patch attached.
> >
> > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com
> >
> > I agree with Robert that a standby server should not continuously show timing
> > messages for WAL replay.
>
> Clearly.  This should be limited to crash recovery, and maybe there
> could be some checks to make sure that nothing is logged once a
> consistent point is reached.  Honestly, I don't see why we should have
> a GUC for what's proposed here.  A value too low would bloat the logs
> with entries that are not that meaningful.  A value too large would
> just prevent access to some information wanted.  Wouldn't it be better
> to just pick up a value like 10s or 20s?
>
> Looking at v13..
>
> +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> +           gettext_noop("Sets the time interval between each progress update "
> +                        "of the startup process."),
> +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> +           GUC_UNIT_MS,
> The unit is incorrect here, as that would default to 10ms, contrary to
> what the documentation says about 10s.
> --
> Michael



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Looking over this version, I realized something I (or you) should have
> noticed sooner: you've added the RegisterTimeout call to
> InitPostgres(), but that's for things that are used by all backends,
> and this is only used by the startup process. So it seems to me that
> the right place is StartupProcessMain. That would have the further
> advantage of allowing startup_progress_timeout_handler to be made
> static. begin_startup_progress_phase() and
> startup_progress_timeout_has_expired() are the actual API functions
> though so they will need to remain extern.
>
> I agree with Robert that a standby server should not continuously show timing
> messages for WAL replay.

The earlier discussion wrt this point is as follows.

> > I also agree that this is the better place to do it. Hence modified
> > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > differentiate whether the call is done from a startup process or some
> > other process. Actually StartupXLOG() gets called in 2 places. one in
> > StartupProcessMain() and the other in InitPostgres(). As the logging
> > of the startup progress is required only during the startup process
> > and not in the other cases,
>
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.
>
> Perhaps during initdb we don't want messages, but I'm not sure that we
> need to do anything about that here. None of the messages that the
> server normally produces show up when you run initdb, so I guess they
> are getting redirected to /dev/null or something.
>
> So I don't think that using AmStartupProcess() for this purpose is right.

So as per the recent discussion, RegisterTimeout call should be
removed from InitPostgres() and the condition "!AmStartupProcess()" is
to be added in begin_startup_progress_phase() and
ereport_startup_progress() to differentiate whether the call is from a
startup process or some other process. Kindly correct me if I am
wrong.

> Some doc comments:

Thanks for the suggestions. I will take care in the next patch.

> Clearly.  This should be limited to crash recovery, and maybe there
> could be some checks to make sure that nothing is logged once a
> consistent point is reached.

The purpose here is to show the progress of the operation if it is
taking longer than the interval set by the user until it completes the
operation. Users should know what operation is happening in the
background and to show the progress, displaying the elapsed time. So
according to me the consistent point is nothing but the end of the
operation. Kindly let me know if you have something in mind and that
could be the better consistent point.

> Honestly, I don't see why we should have
> a GUC for what's proposed here.  A value too low would bloat the logs
> with entries that are not that meaningful.  A value too large would
> just prevent access to some information wanted.  Wouldn't it be better
> to just pick up a value like 10s or 20s?

It is difficult to finalise the value and use that value without
providing an option to change. If we choose one value (say 10s), it
may be too less for some users or too large for some other users. So I
feel it is better to provide an option to users so that they can
choose the value according to their need. Anyway the default value set
for this setting is 10s.

> +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> +           gettext_noop("Sets the time interval between each progress update "
> +                        "of the startup process."),
> +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> +           GUC_UNIT_MS,
> The unit is incorrect here, as that would default to 10ms, contrary to
> what the documentation says about 10s.

Kindly refer the previous few discussions wrt this point.

On Tue, Sep 7, 2021 at 10:58 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote:
> > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote:
> > > Please find the updated patch attached.
> >
> > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com
> >
> > I agree with Robert that a standby server should not continuously show timing
> > messages for WAL replay.
>
> Clearly.  This should be limited to crash recovery, and maybe there
> could be some checks to make sure that nothing is logged once a
> consistent point is reached.  Honestly, I don't see why we should have
> a GUC for what's proposed here.  A value too low would bloat the logs
> with entries that are not that meaningful.  A value too large would
> just prevent access to some information wanted.  Wouldn't it be better
> to just pick up a value like 10s or 20s?
>
> Looking at v13..
>
> +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> +           gettext_noop("Sets the time interval between each progress update "
> +                        "of the startup process."),
> +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> +           GUC_UNIT_MS,
> The unit is incorrect here, as that would default to 10ms, contrary to
> what the documentation says about 10s.
> --
> Michael



Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote:
> > Looking over this version, I realized something I (or you) should have
> > noticed sooner: you've added the RegisterTimeout call to
> > InitPostgres(), but that's for things that are used by all backends,
> > and this is only used by the startup process. So it seems to me that
> > the right place is StartupProcessMain. That would have the further
> > advantage of allowing startup_progress_timeout_handler to be made
> > static. begin_startup_progress_phase() and
> > startup_progress_timeout_has_expired() are the actual API functions
> > though so they will need to remain extern.
> >
> > I agree with Robert that a standby server should not continuously show timing
> > messages for WAL replay.
> 
> The earlier discussion wrt this point is as follows.

I think you're confusing discussions.

Robert was talking about initdb/bootstrap/single, and I separately and
independently asked about hot standbys.  It seems like Robert and I agreed
about the desired behavior and there was no further discussion.

> > Honestly, I don't see why we should have
> > a GUC for what's proposed here.  A value too low would bloat the logs
> > with entries that are not that meaningful.  A value too large would
> > just prevent access to some information wanted.  Wouldn't it be better
> > to just pick up a value like 10s or 20s?

I don't think bloating logs is a issue for values > 10sec.

You agreed that it's important to choose the "right" value, but I think that
will vary between users.  Some installations with large checkpoint_timeout
might anticipate taking 15+min to perform recovery, but others might even have
a strict requirement that recovery must not take more than (say) 10sec; someone
might want to use this to verify that, or to optimize the slow parts of
recovery, with an interval that someone else might not care about.

> > +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> > +           gettext_noop("Sets the time interval between each progress update "
> > +                        "of the startup process."),
> > +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> > +           GUC_UNIT_MS,
|+               10, -1, INT_MAX,
> > The unit is incorrect here, as that would default to 10ms, contrary to
> > what the documentation says about 10s.
> 
> Kindly refer the previous few discussions wrt this point.

You copied and pasted unrelated emails, which isn't helpful.

Michael is right.  You updated some of the units based on Robert's suggestion
to use MS, but didn't update all of the corresponding parts of the patch.
guc.c says that the units are in MS, which means that unqualified values are
interpretted as such.  But postgresql.conf.sample still says "seconds", and
guc.c says the default value is "10", and you still do:

+       interval_in_ms = log_startup_progress_interval * 1000;

I checked that this currently does not interpret the value as ms:
|./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
|2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
|2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
|2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80

(Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
another int could overflow.)

I think the convention is to for GUC global vars to be initialized with the
same default as in guc.c, so both should be 10000, like:

+int log_startup_progress_interval = 10 * 1000 /* 10sec */

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I think you're confusing discussions.
>
> Robert was talking about initdb/bootstrap/single, and I separately and
> independently asked about hot standbys.  It seems like Robert and I agreed
> about the desired behavior and there was no further discussion.

Sorry for including 2 separate points into one.

> Looking over this version, I realized something I (or you) should have
> noticed sooner: you've added the RegisterTimeout call to
> InitPostgres(), but that's for things that are used by all backends,
> and this is only used by the startup process. So it seems to me that
> the right place is StartupProcessMain. That would have the further
> advantage of allowing startup_progress_timeout_handler to be made
> static. begin_startup_progress_phase() and
> startup_progress_timeout_has_expired() are the actual API functions
> though so they will need to remain extern.

The earlier discussion wrt this point is as follows.

> > I also agree that this is the better place to do it. Hence modified
> > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > differentiate whether the call is done from a startup process or some
> > other process. Actually StartupXLOG() gets called in 2 places. one in
> > StartupProcessMain() and the other in InitPostgres(). As the logging
> > of the startup progress is required only during the startup process
> > and not in the other cases,
>
> The InitPostgres() case occurs when the server is started in bootstrap
> mode (during initdb) or in single-user mode (postgres --single). I do
> not see any reason why we shouldn't produce progress messages in at
> least the latter case. I suspect that someone who is in the rather
> desperate scenario of having to use single-user mode would really like
> to know how long the server is going to take to start up.
>
> Perhaps during initdb we don't want messages, but I'm not sure that we
> need to do anything about that here. None of the messages that the
> server normally produces show up when you run initdb, so I guess they
> are getting redirected to /dev/null or something.
>
> So I don't think that using AmStartupProcess() for this purpose is right.

This point is really confusing. As per the earlier discussion we
concluded to include RegisterTimeout() call even in case of
InitPostgres() to support logging in case of single-user mode. Now if
we remove the RegisterTimeout() call from InitPostgres(), we are not
going to support that anymore. Is this what you're trying to convey?
or we should add some checks and disable the code to RegisterTimeout()
if it is other than single-user mode. I have added a check if
(!IsPostmasterEnvironment) in the attached patch for this scenario.
Kindly confirm my understanding.

> > Should this feature distinguish between crash recovery and archive recovery on
> > a hot standby ?  Otherwise the standby will display this all the time.
> >
> >2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100
> >
> >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> > cleanup of xlog.c variables, I can't say that with much confidence).
>
> Hmm. My inclination is to think that on an actual standby, you
> wouldn't want to get messages like this, but if you were doing a
> point-in-time-recovery, then you would. So I think maybe
> InArchiveRecovery is not the right thing. Perhaps StandbyMode?

I also feel that the log messages should be recorded in case of
point-in-time-recovery. So I have added a check if (!StandbyMode) and
verified the replication and point-in-time-recovery scenario.

> > Some doc comments:
>
> Thanks for the suggestions. I will take care in the next patch.

Fixed.

> Michael is right.  You updated some of the units based on Robert's suggestion
> to use MS, but didn't update all of the corresponding parts of the patch.
> guc.c says that the units are in MS, which means that unqualified values are
> interpretted as such.  But postgresql.conf.sample still says "seconds", and
> guc.c says the default value is "10", and you still do:
>
> +       interval_in_ms = log_startup_progress_interval * 1000;
>
> I checked that this currently does not interpret the value as ms:
> |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
>
> (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> another int could overflow.)
>
> I think the convention is to for GUC global vars to be initialized with the
> same default as in guc.c, so both should be 10000, like:
>
> +int log_startup_progress_interval = 10 * 1000 /* 10sec */

Following is the discussion done wrt this point. Kindly refer and
share your thoughts.

> > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> > > expressing the default in postgresl.conf.sample as 10s rather than
> > > 10000ms. I tried values measured in milliseconds just for testing
> > > purposes and didn't initially understand why it wasn't working. I
> > > don't think there's any reason it can't work.
> >
> > As suggested, I have changed it to GUC_UNIT_MS and kept the default
> > value to 10s. I would like to know the reason why it can't be
> > GUC_UNIT_S as we are expressing the values in terms of seconds.
>
> I mean, it *could* be. There's just no advantage. Values in seconds
> will work correctly either way. But values in milliseconds will only
> work if it's GUC_UNIT_MS. It seems to me that it's better to make more
> things work rather than fewer.

Thanks & Regards,
Nitin Jadhav
On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote:
> > > Looking over this version, I realized something I (or you) should have
> > > noticed sooner: you've added the RegisterTimeout call to
> > > InitPostgres(), but that's for things that are used by all backends,
> > > and this is only used by the startup process. So it seems to me that
> > > the right place is StartupProcessMain. That would have the further
> > > advantage of allowing startup_progress_timeout_handler to be made
> > > static. begin_startup_progress_phase() and
> > > startup_progress_timeout_has_expired() are the actual API functions
> > > though so they will need to remain extern.
> > >
> > > I agree with Robert that a standby server should not continuously show timing
> > > messages for WAL replay.
> >
> > The earlier discussion wrt this point is as follows.
>
> I think you're confusing discussions.
>
> Robert was talking about initdb/bootstrap/single, and I separately and
> independently asked about hot standbys.  It seems like Robert and I agreed
> about the desired behavior and there was no further discussion.
>
> > > Honestly, I don't see why we should have
> > > a GUC for what's proposed here.  A value too low would bloat the logs
> > > with entries that are not that meaningful.  A value too large would
> > > just prevent access to some information wanted.  Wouldn't it be better
> > > to just pick up a value like 10s or 20s?
>
> I don't think bloating logs is a issue for values > 10sec.
>
> You agreed that it's important to choose the "right" value, but I think that
> will vary between users.  Some installations with large checkpoint_timeout
> might anticipate taking 15+min to perform recovery, but others might even have
> a strict requirement that recovery must not take more than (say) 10sec; someone
> might want to use this to verify that, or to optimize the slow parts of
> recovery, with an interval that someone else might not care about.
>
> > > +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> > > +           gettext_noop("Sets the time interval between each progress update "
> > > +                        "of the startup process."),
> > > +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> > > +           GUC_UNIT_MS,
> |+               10, -1, INT_MAX,
> > > The unit is incorrect here, as that would default to 10ms, contrary to
> > > what the documentation says about 10s.
> >
> > Kindly refer the previous few discussions wrt this point.
>
> You copied and pasted unrelated emails, which isn't helpful.
>
> Michael is right.  You updated some of the units based on Robert's suggestion
> to use MS, but didn't update all of the corresponding parts of the patch.
> guc.c says that the units are in MS, which means that unqualified values are
> interpretted as such.  But postgresql.conf.sample still says "seconds", and
> guc.c says the default value is "10", and you still do:
>
> +       interval_in_ms = log_startup_progress_interval * 1000;
>
> I checked that this currently does not interpret the value as ms:
> |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
>
> (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> another int could overflow.)
>
> I think the convention is to for GUC global vars to be initialized with the
> same default as in guc.c, so both should be 10000, like:
>
> +int log_startup_progress_interval = 10 * 1000 /* 10sec */
>
> --
> Justin

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Michael is right.  You updated some of the units based on Robert's suggestion
> to use MS, but didn't update all of the corresponding parts of the patch.
> guc.c says that the units are in MS, which means that unqualified values are
> interpretted as such.  But postgresql.conf.sample still says "seconds", and
> guc.c says the default value is "10", and you still do:
>
> +       interval_in_ms = log_startup_progress_interval * 1000;
>
> I checked that this currently does not interpret the value as ms:
> |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
>
> (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> another int could overflow.)
>
> I think the convention is to for GUC global vars to be initialized with the
> same default as in guc.c, so both should be 10000, like:
>
> +int log_startup_progress_interval = 10 * 1000 /* 10sec */

Thanks Justin for the detailed explanation.  Done the necessary changes.

Please find the updated patch attached.


On Mon, Sep 13, 2021 at 8:32 PM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
>
> > I think you're confusing discussions.
> >
> > Robert was talking about initdb/bootstrap/single, and I separately and
> > independently asked about hot standbys.  It seems like Robert and I agreed
> > about the desired behavior and there was no further discussion.
>
> Sorry for including 2 separate points into one.
>
> > Looking over this version, I realized something I (or you) should have
> > noticed sooner: you've added the RegisterTimeout call to
> > InitPostgres(), but that's for things that are used by all backends,
> > and this is only used by the startup process. So it seems to me that
> > the right place is StartupProcessMain. That would have the further
> > advantage of allowing startup_progress_timeout_handler to be made
> > static. begin_startup_progress_phase() and
> > startup_progress_timeout_has_expired() are the actual API functions
> > though so they will need to remain extern.
>
> The earlier discussion wrt this point is as follows.
>
> > > I also agree that this is the better place to do it. Hence modified
> > > the patch accordingly. The condition "!AmStartupProcess()" is added to
> > > differentiate whether the call is done from a startup process or some
> > > other process. Actually StartupXLOG() gets called in 2 places. one in
> > > StartupProcessMain() and the other in InitPostgres(). As the logging
> > > of the startup progress is required only during the startup process
> > > and not in the other cases,
> >
> > The InitPostgres() case occurs when the server is started in bootstrap
> > mode (during initdb) or in single-user mode (postgres --single). I do
> > not see any reason why we shouldn't produce progress messages in at
> > least the latter case. I suspect that someone who is in the rather
> > desperate scenario of having to use single-user mode would really like
> > to know how long the server is going to take to start up.
> >
> > Perhaps during initdb we don't want messages, but I'm not sure that we
> > need to do anything about that here. None of the messages that the
> > server normally produces show up when you run initdb, so I guess they
> > are getting redirected to /dev/null or something.
> >
> > So I don't think that using AmStartupProcess() for this purpose is right.
>
> This point is really confusing. As per the earlier discussion we
> concluded to include RegisterTimeout() call even in case of
> InitPostgres() to support logging in case of single-user mode. Now if
> we remove the RegisterTimeout() call from InitPostgres(), we are not
> going to support that anymore. Is this what you're trying to convey?
> or we should add some checks and disable the code to RegisterTimeout()
> if it is other than single-user mode. I have added a check if
> (!IsPostmasterEnvironment) in the attached patch for this scenario.
> Kindly confirm my understanding.
>
> > > Should this feature distinguish between crash recovery and archive recovery on
> > > a hot standby ?  Otherwise the standby will display this all the time.
> > >
> > >2021-08-14 16:13:33.139 CDT startup[11741] LOG:  redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100
> > >
> > >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes
> > > cleanup of xlog.c variables, I can't say that with much confidence).
> >
> > Hmm. My inclination is to think that on an actual standby, you
> > wouldn't want to get messages like this, but if you were doing a
> > point-in-time-recovery, then you would. So I think maybe
> > InArchiveRecovery is not the right thing. Perhaps StandbyMode?
>
> I also feel that the log messages should be recorded in case of
> point-in-time-recovery. So I have added a check if (!StandbyMode) and
> verified the replication and point-in-time-recovery scenario.
>
> > > Some doc comments:
> >
> > Thanks for the suggestions. I will take care in the next patch.
>
> Fixed.
>
> > Michael is right.  You updated some of the units based on Robert's suggestion
> > to use MS, but didn't update all of the corresponding parts of the patch.
> > guc.c says that the units are in MS, which means that unqualified values are
> > interpretted as such.  But postgresql.conf.sample still says "seconds", and
> > guc.c says the default value is "10", and you still do:
> >
> > +       interval_in_ms = log_startup_progress_interval * 1000;
> >
> > I checked that this currently does not interpret the value as ms:
> > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> > |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> > |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> > |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
> >
> > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> > another int could overflow.)
> >
> > I think the convention is to for GUC global vars to be initialized with the
> > same default as in guc.c, so both should be 10000, like:
> >
> > +int log_startup_progress_interval = 10 * 1000 /* 10sec */
>
> Following is the discussion done wrt this point. Kindly refer and
> share your thoughts.
>
> > > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but
> > > > expressing the default in postgresl.conf.sample as 10s rather than
> > > > 10000ms. I tried values measured in milliseconds just for testing
> > > > purposes and didn't initially understand why it wasn't working. I
> > > > don't think there's any reason it can't work.
> > >
> > > As suggested, I have changed it to GUC_UNIT_MS and kept the default
> > > value to 10s. I would like to know the reason why it can't be
> > > GUC_UNIT_S as we are expressing the values in terms of seconds.
> >
> > I mean, it *could* be. There's just no advantage. Values in seconds
> > will work correctly either way. But values in milliseconds will only
> > work if it's GUC_UNIT_MS. It seems to me that it's better to make more
> > things work rather than fewer.
>
> Thanks & Regards,
> Nitin Jadhav
> On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote:
> > > > Looking over this version, I realized something I (or you) should have
> > > > noticed sooner: you've added the RegisterTimeout call to
> > > > InitPostgres(), but that's for things that are used by all backends,
> > > > and this is only used by the startup process. So it seems to me that
> > > > the right place is StartupProcessMain. That would have the further
> > > > advantage of allowing startup_progress_timeout_handler to be made
> > > > static. begin_startup_progress_phase() and
> > > > startup_progress_timeout_has_expired() are the actual API functions
> > > > though so they will need to remain extern.
> > > >
> > > > I agree with Robert that a standby server should not continuously show timing
> > > > messages for WAL replay.
> > >
> > > The earlier discussion wrt this point is as follows.
> >
> > I think you're confusing discussions.
> >
> > Robert was talking about initdb/bootstrap/single, and I separately and
> > independently asked about hot standbys.  It seems like Robert and I agreed
> > about the desired behavior and there was no further discussion.
> >
> > > > Honestly, I don't see why we should have
> > > > a GUC for what's proposed here.  A value too low would bloat the logs
> > > > with entries that are not that meaningful.  A value too large would
> > > > just prevent access to some information wanted.  Wouldn't it be better
> > > > to just pick up a value like 10s or 20s?
> >
> > I don't think bloating logs is a issue for values > 10sec.
> >
> > You agreed that it's important to choose the "right" value, but I think that
> > will vary between users.  Some installations with large checkpoint_timeout
> > might anticipate taking 15+min to perform recovery, but others might even have
> > a strict requirement that recovery must not take more than (say) 10sec; someone
> > might want to use this to verify that, or to optimize the slow parts of
> > recovery, with an interval that someone else might not care about.
> >
> > > > +       {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN,
> > > > +           gettext_noop("Sets the time interval between each progress update "
> > > > +                        "of the startup process."),
> > > > +           gettext_noop("0 logs all messages. -1 turns this feature off."),
> > > > +           GUC_UNIT_MS,
> > |+               10, -1, INT_MAX,
> > > > The unit is incorrect here, as that would default to 10ms, contrary to
> > > > what the documentation says about 10s.
> > >
> > > Kindly refer the previous few discussions wrt this point.
> >
> > You copied and pasted unrelated emails, which isn't helpful.
> >
> > Michael is right.  You updated some of the units based on Robert's suggestion
> > to use MS, but didn't update all of the corresponding parts of the patch.
> > guc.c says that the units are in MS, which means that unqualified values are
> > interpretted as such.  But postgresql.conf.sample still says "seconds", and
> > guc.c says the default value is "10", and you still do:
> >
> > +       interval_in_ms = log_startup_progress_interval * 1000;
> >
> > I checked that this currently does not interpret the value as ms:
> > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1
> > |2021-09-07 06:28:58.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88
> > |2021-09-07 06:28:59.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0
> > |2021-09-07 06:29:00.694 CDT startup[18865] LOG:  redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80
> >
> > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to
> > another int could overflow.)
> >
> > I think the convention is to for GUC global vars to be initialized with the
> > same default as in guc.c, so both should be 10000, like:
> >
> > +int log_startup_progress_interval = 10 * 1000 /* 10sec */
> >
> > --
> > Justin

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Thanks Justin for the detailed explanation.  Done the necessary changes.

Not really. The documentation here does not make a ton of sense:

+         Sets the time interval between each progress update of the operations
+         performed by the startup process. This produces the log messages for
+         those operations which take longer than the specified
duration. The unit
+         used to specify the value is milliseconds. For example, if
you set it to
+         <literal> 10s </literal>, then every <literal> 10s
</literal>, a log is
+         emitted indicating which operation is ongoing, and the
elapsed time from
+         the beginning of the operation. If the value is set to
<literal> 0 </literal>,
+         then all messages for such operations are logged. <literal>
-1 </literal>
+         disables the feature. The default value is <literal> 10s </literal>

I really don't know what to say about this. You say that the time is
measured in milliseconds, and then immediately turn around and say
"For example, if you set it to 10s". Now we do expect that most people
will set it to intervals that are measured in seconds rather than
milliseconds, but saying that setting it to a value measured in
seconds is an example of setting it in milliseconds is not logical. It
also looks pretty silly to say that if you set the value to 10s,
something will happen every 10s. What else would anyone expect to
happen? You really need to give some thought to how to explain the
behavior in a way that is clear and logical but not overly wordy.
Also, please note that you've got spaces around the literals, which
does not match the surrounding style and does not render properly in
HTML.

It is also not logical to define 0 as meaning that "all messages for
such operations are logged". What does that even mean? It makes sense
for something like log_autovacuum_min_duration, because there we are
talking about logging one message per operation, and we could log
messages for all operations or just some of them. Here we are talking
about the time between one message and the next, so talking about "all
messages" does not really seem to make a lot of sense. Experimentally,
what 0 actually does is cause the system to spam log lines in a tight
loop, which cannot be what anyone wants. I think you should make 0
mean disabled, and a positive value mean log at that interval, and
disallow -1 altogether.

And on that note, I tested your patch with
log_startup_progress_interval=-1 and found that -1 behaves just like
0. In other words, contrary to what the documentation says, -1 does
not disable the feature. It instead behaves just like 0. It's really
confusing to me how you write documentation that says -1 has a certain
behavior without thinking about the fact that you haven't written any
code that would make -1 behave that way. And apparently you didn't
test it, either. It took me approximately 1 minute of testing to find
that this is broken, which really is not very much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical.

Based on the statement "I suggest making the GUC GUC_UNIT_MS rather
than GUC_UNIT_S, but expressing the default in postgresl.conf.sample
as 10s rather than 10000ms", I have used the default value in the
postgresl.conf.sample as 10s rather than 10000ms. So I just used the
same value in the example too in config.sgml. If it is really getting
confusing, I will change it to 100ms in config.sgml.

> It also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.

Added a few lines about that. "For example, if you set it to 1000ms,
then it tries to emit a log every 1000ms. If the log message is not
available at every 100th millisecond, then there is a possibility of
delay in emitting the log. If the delay is more than a cycle or if the
system clock gets set backwards then the next attempt is done based on
the last logging time, otherwise the delay gets adjusted in the next
attempt."

Please correct the explanation if it does not meet your expectations.

> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.

Fixed.

> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what anyone wants. I think you should make 0
> mean disabled, and a positive value mean log at that interval, and
> disallow -1 altogether.

Made changes which indicate 0 mean disabled, > 0 mean interval in
millisecond and removed -1.

Please find the patch attached.



On Thu, Sep 23, 2021 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Thanks Justin for the detailed explanation.  Done the necessary changes.
>
> Not really. The documentation here does not make a ton of sense:
>
> +         Sets the time interval between each progress update of the operations
> +         performed by the startup process. This produces the log messages for
> +         those operations which take longer than the specified
> duration. The unit
> +         used to specify the value is milliseconds. For example, if
> you set it to
> +         <literal> 10s </literal>, then every <literal> 10s
> </literal>, a log is
> +         emitted indicating which operation is ongoing, and the
> elapsed time from
> +         the beginning of the operation. If the value is set to
> <literal> 0 </literal>,
> +         then all messages for such operations are logged. <literal>
> -1 </literal>
> +         disables the feature. The default value is <literal> 10s </literal>
>
> I really don't know what to say about this. You say that the time is
> measured in milliseconds, and then immediately turn around and say
> "For example, if you set it to 10s". Now we do expect that most people
> will set it to intervals that are measured in seconds rather than
> milliseconds, but saying that setting it to a value measured in
> seconds is an example of setting it in milliseconds is not logical. It
> also looks pretty silly to say that if you set the value to 10s,
> something will happen every 10s. What else would anyone expect to
> happen? You really need to give some thought to how to explain the
> behavior in a way that is clear and logical but not overly wordy.
> Also, please note that you've got spaces around the literals, which
> does not match the surrounding style and does not render properly in
> HTML.
>
> It is also not logical to define 0 as meaning that "all messages for
> such operations are logged". What does that even mean? It makes sense
> for something like log_autovacuum_min_duration, because there we are
> talking about logging one message per operation, and we could log
> messages for all operations or just some of them. Here we are talking
> about the time between one message and the next, so talking about "all
> messages" does not really seem to make a lot of sense. Experimentally,
> what 0 actually does is cause the system to spam log lines in a tight
> loop, which cannot be what anyone wants. I think you should make 0
> mean disabled, and a positive value mean log at that interval, and
> disallow -1 altogether.
>
> And on that note, I tested your patch with
> log_startup_progress_interval=-1 and found that -1 behaves just like
> 0. In other words, contrary to what the documentation says, -1 does
> not disable the feature. It instead behaves just like 0. It's really
> confusing to me how you write documentation that says -1 has a certain
> behavior without thinking about the fact that you haven't written any
> code that would make -1 behave that way. And apparently you didn't
> test it, either. It took me approximately 1 minute of testing to find
> that this is broken, which really is not very much.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Mon, Sep 27, 2021 at 04:57:20PM +0530, Nitin Jadhav wrote:
> > It is also not logical to define 0 as meaning that "all messages for
> > such operations are logged". What does that even mean? It makes sense
> > for something like log_autovacuum_min_duration, because there we are
> > talking about logging one message per operation, and we could log
> > messages for all operations or just some of them. Here we are talking
> > about the time between one message and the next, so talking about "all
> > messages" does not really seem to make a lot of sense. Experimentally,
> > what 0 actually does is cause the system to spam log lines in a tight
> > loop, which cannot be what anyone wants. I think you should make 0
> > mean disabled, and a positive value mean log at that interval, and
> > disallow -1 altogether.
> 
> Made changes which indicate 0 mean disabled, > 0 mean interval in
> millisecond and removed -1.
> 
> Please find the patch attached.

I think you misunderstood - Robert was saying that interval=0 doesn't work, not
suggesting that you write more documentation about it.

Also, I agree with Robert that the documentation is too verbose.  I don't think
you need to talk about what happens if the clock goes backwards (It just needs
to behave conveniently).

Look at the other _duration statements for what they say about units.
"If this value is specified without units, it is taken as milliseconds."
https://www.postgresql.org/docs/14/runtime-config-logging.html
 log_autovacuum_min_duration
 log_min_duration_statement

>>It also looks pretty silly to say that if you set the value to 10s, something
>>will happen every 10s. What else would anyone expect to happen?

@Robert: that's consistent with existing documentation, even though it might
seem obvious and silly to us.

| For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be
logged
| For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Sep 27, 2021 at 7:26 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > I really don't know what to say about this. You say that the time is
> > measured in milliseconds, and then immediately turn around and say
> > "For example, if you set it to 10s". Now we do expect that most people
> > will set it to intervals that are measured in seconds rather than
> > milliseconds, but saying that setting it to a value measured in
> > seconds is an example of setting it in milliseconds is not logical.
>
> Based on the statement "I suggest making the GUC GUC_UNIT_MS rather
> than GUC_UNIT_S, but expressing the default in postgresl.conf.sample
> as 10s rather than 10000ms", I have used the default value in the
> postgresl.conf.sample as 10s rather than 10000ms. So I just used the
> same value in the example too in config.sgml. If it is really getting
> confusing, I will change it to 100ms in config.sgml.

That's really not what I'm complaining about. I think if we're going
to give an example at all, 10ms is a better example than 100ms,
because 10s is a value that people are more likely to find useful. But
I'm not sure that it's necessary to mention a specific value, and if
it is, I think it needs to be phrased in a less confusing way.

> Made changes which indicate 0 mean disabled, > 0 mean interval in
> millisecond and removed -1.

Well, I see that -1 is now disallowed, and that's good as far as it
goes, but 0 still does not actually disable the feature. I don't
understand why you posted the previous version of the patch without
testing that it works, and I even less understand why you are posting
another version without fixing the bug that I pointed out to you in
the last version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Sep 27, 2021 at 9:32 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >>It also looks pretty silly to say that if you set the value to 10s, something
> >>will happen every 10s. What else would anyone expect to happen?
>
> @Robert: that's consistent with existing documentation, even though it might
> seem obvious and silly to us.
>
> | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be
logged
> | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged

Fair enough, but I still don't like it much. I tried my hand at
rewriting this and came up with the attached:

+         Sets the amount of time after which the startup process will log
+         a message about a long-running operation that is still in progress,
+         as well as the interval between further progress messages for that
+         operation. This setting is applied separately to each operation.
+         For example, if syncing the data directory takes 25 seconds and
+         thereafter resetting unlogged relations takes 8 seconds, and if this
+         setting has the default value of 10 seconds, then a messages will be
+         logged for syncing the data directory after it has been in progress
+         for 10 seconds and again after it has been in progress for 20 seconds,
+         but nothing will be logged for resetting unlogged operations.
+         A setting of <literal>0</literal> disables the feature.

I prefer this to Nitin's version because I think it could be unclear
to someone that the value applies separately to each operation,
whereas I don't think we need to document that we can't guarantee that
the messages will be perfectly on time - that's true of every kind of
scheduled event in pretty much every computer system - or what happens
if the system clock goes backwards. Those are things we should try to
get right, as well as we can anyway, but we don't need to tell the
user that we got them right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Alvaro Herrera
Date:
> +/*
> + * Decides whether to log the startup progress or not based on whether the
> + * timer is expired or not. Returns FALSE if the timer is not expired, otherwise
> + * calculates the elapsed time and sets the respective out parameters secs and
> + * usecs. Enables the timer for the next log message and returns TRUE.
> + */
> +bool
> +startup_progress_timeout_has_expired(long *secs, int *usecs)

I think this comment can be worded better.  It says it "decides", but it
doesn't actually decide on any action to take -- it just reports whether
the timer expired or not, to allow its caller to make the decision.  In
such situations we just say something like "Report whether startup
progress has caused a timeout, return true and rearm the timer if it
did, or just return false otherwise"; and we don't indicate what the
value is going to be used *for*.  Then the caller can use the boolean
return value to make a decision, such as whether something is going to
be logged.  This function can be oblivious to details such as this:

> +    /* If the timeout has not occurred, then no need to log the details. */
> +    if (!startup_progress_timer_expired)
> +        return false;

here we can just say "No timeout has occurred" and make no inference
about what's going to happen or not happen.

Also, for functions that do things like this we typically use English
sentence structure with the function name starting with the verb --
perhaps has_startup_progress_timeout_expired().  Sometimes we are lax
about this if we have some sort of poor-man's modularisation by using a
common prefix for several functions doing related things, which perhaps
could be "startup_progress_*" in your case, but your other functions are
already not doing that (such as begin_startup_progress_phase) so it's
not clear why you would not use the most natural name for this one.

> +    ereport_startup_progress("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s",
> +                             path);

Please make sure to add ereport_startup_progress() as a translation
trigger in src/backend/nls.mk.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> That's really not what I'm complaining about. I think if we're going
> to give an example at all, 10ms is a better example than 100ms,
> because 10s is a value that people are more likely to find useful. But
> I'm not sure that it's necessary to mention a specific value, and if
> it is, I think it needs to be phrased in a less confusing way.
>
> > >>It also looks pretty silly to say that if you set the value to 10s, something
> > >>will happen every 10s. What else would anyone expect to happen?
> >
> > @Robert: that's consistent with existing documentation, even though it might
> > seem obvious and silly to us.
> >
> > | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be
logged
> > | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged
>
> Fair enough, but I still don't like it much. I tried my hand at
> rewriting this and came up with the attached:
>
> +         Sets the amount of time after which the startup process will log
> +         a message about a long-running operation that is still in progress,
> +         as well as the interval between further progress messages for that
> +         operation. This setting is applied separately to each operation.
> +         For example, if syncing the data directory takes 25 seconds and
> +         thereafter resetting unlogged relations takes 8 seconds, and if this
> +         setting has the default value of 10 seconds, then a messages will be
> +         logged for syncing the data directory after it has been in progress
> +         for 10 seconds and again after it has been in progress for 20 seconds,
> +         but nothing will be logged for resetting unlogged operations.
> +         A setting of <literal>0</literal> disables the feature.
>
> I prefer this to Nitin's version because I think it could be unclear
> to someone that the value applies separately to each operation,
> whereas I don't think we need to document that we can't guarantee that
> the messages will be perfectly on time - that's true of every kind of
> scheduled event in pretty much every computer system - or what happens
> if the system clock goes backwards. Those are things we should try to
> get right, as well as we can anyway, but we don't need to tell the
> user that we got them right.

I thought mentioning the unit in milliseconds and the example in
seconds would confuse the user, so I changed the example to
milliseconds.The message behind the above description looks good to me
however I feel some sentences can be explained in less words. The
information related to the units is missing and I feel it should be
mentioned in the document. The example says, if the setting has the
default value of 10 seconds, then it explains the behaviour. I feel it
may not be the default value, it can be any value set by the user. So
mentioning 'default' in the example does not look good to me. I feel
we just have to mention "if this setting is set to the value of 10
seconds". Below is the modified version of the above information.

+         Sets the amount of time after every such interval the startup process
+         will log a message about a long-running operation that is still in
+         progress. This setting is applied separately to each operation.
+         For example, if syncing the data directory takes 25 seconds and
+         thereafter resetting unlogged relations takes 8 seconds, and if this
+         setting is set to the value of 10 seconds, then a messages will be
+         logged for syncing the data directory after it has been in progress
+         for 10 seconds and again after it has been in progress for 20 seconds,
+         but nothing will be logged for resetting unlogged operations.
+         A setting of <literal>0</literal> disables the feature. If this value
+         is specified without units, it is taken as milliseconds.

> Well, I see that -1 is now disallowed, and that's good as far as it
> goes, but 0 still does not actually disable the feature. I don't
> understand why you posted the previous version of the patch without
> testing that it works, and I even less understand why you are posting
> another version without fixing the bug that I pointed out to you in
> the last version.

I had added additional code to check the value of the
'log_startup_progress_interval' variable and  disable the feature in
case of -1 in the earlier versions of the patch (Specifically
v9.patch). There was a review comment for v9 patch and it resulted in
major refactoring of the patch. The comment was

> With these changes you'd have only 1 place in the code that needs to
> care about log_startup_progress_interval, as opposed to 3 as you have
> it currently, and only one place that enables the timeout, as opposed
> to 2 as you have it currently. I think that would be tidier.

Based on the above comment and the idea behind enabling the timer, it
does not log anything if the value is set to -1. So I thought there is
no extra code necessary to disable the feature even though it executes
through the code flow. So I did not worry about adding logic to
disable the feature. I will take care of this in the next patch.

The answer for the question of "I don't understand why you posted the
previous version of the patch without testing that it works" is, for
the value of -1, the above description was my understanding and for
the value of 0, the older versions of the patch was behaving as
documented. But with the later versions the behaviour got changed and
I missed to modify the documentation. So I modified it in the last
version.

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav
On Mon, Sep 27, 2021 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Sep 27, 2021 at 9:32 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >>It also looks pretty silly to say that if you set the value to 10s, something
> > >>will happen every 10s. What else would anyone expect to happen?
> >
> > @Robert: that's consistent with existing documentation, even though it might
> > seem obvious and silly to us.
> >
> > | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be
logged
> > | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged
>
> Fair enough, but I still don't like it much. I tried my hand at
> rewriting this and came up with the attached:
>
> +         Sets the amount of time after which the startup process will log
> +         a message about a long-running operation that is still in progress,
> +         as well as the interval between further progress messages for that
> +         operation. This setting is applied separately to each operation.
> +         For example, if syncing the data directory takes 25 seconds and
> +         thereafter resetting unlogged relations takes 8 seconds, and if this
> +         setting has the default value of 10 seconds, then a messages will be
> +         logged for syncing the data directory after it has been in progress
> +         for 10 seconds and again after it has been in progress for 20 seconds,
> +         but nothing will be logged for resetting unlogged operations.
> +         A setting of <literal>0</literal> disables the feature.
>
> I prefer this to Nitin's version because I think it could be unclear
> to someone that the value applies separately to each operation,
> whereas I don't think we need to document that we can't guarantee that
> the messages will be perfectly on time - that's true of every kind of
> scheduled event in pretty much every computer system - or what happens
> if the system clock goes backwards. Those are things we should try to
> get right, as well as we can anyway, but we don't need to tell the
> user that we got them right.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Sep 28, 2021 at 8:06 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> I thought mentioning the unit in milliseconds and the example in
> seconds would confuse the user, so I changed the example to
> milliseconds.The message behind the above description looks good to me
> however I feel some sentences can be explained in less words. The
> information related to the units is missing and I feel it should be
> mentioned in the document. The example says, if the setting has the
> default value of 10 seconds, then it explains the behaviour. I feel it
> may not be the default value, it can be any value set by the user. So
> mentioning 'default' in the example does not look good to me. I feel
> we just have to mention "if this setting is set to the value of 10
> seconds". Below is the modified version of the above information.

It is common to mention what the default is as part of the
documentation of a GUC. I don't see why this one should be an
exception, especially since not mentioning it reduces the length of
the documentation by exactly one word.

I don't mind the sentence you added at the end to clarify the default
units, but the way you've rewritten the first sentence makes it, in my
opinion, much less clear.

> I had added additional code to check the value of the
> 'log_startup_progress_interval' variable and  disable the feature in
> case of -1 in the earlier versions of the patch (Specifically
> v9.patch). There was a review comment for v9 patch and it resulted in
> major refactoring of the patch.
...
> The answer for the question of "I don't understand why you posted the
> previous version of the patch without testing that it works" is, for
> the value of -1, the above description was my understanding and for
> the value of 0, the older versions of the patch was behaving as
> documented. But with the later versions the behaviour got changed and
> I missed to modify the documentation. So I modified it in the last
> version.

v9 was posted on August 3rd. I told you that it wasn't working on
September 23rd. You posted a new version that still does not work on
September 27th. I think you should have tested each version of your
patch before posting it, and especially after any major refactorings.
And if for whatever reason you didn't, then certainly after I told you
on September 23rd that it didn't work, I think you should have fixed
it before posting a new version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> It is common to mention what the default is as part of the
> documentation of a GUC. I don't see why this one should be an
> exception, especially since not mentioning it reduces the length of
> the documentation by exactly one word.
>
> I don't mind the sentence you added at the end to clarify the default
> units, but the way you've rewritten the first sentence makes it, in my
> opinion, much less clear.

Ok. I have kept your documentation as it is and added the sentence at
the end to clarify the default units.

> v9 was posted on August 3rd. I told you that it wasn't working on
> September 23rd. You posted a new version that still does not work on
> September 27th. I think you should have tested each version of your
> patch before posting it, and especially after any major refactorings.
> And if for whatever reason you didn't, then certainly after I told you
> on September 23rd that it didn't work, I think you should have fixed
> it before posting a new version.

Sorry. There was a misunderstanding about this and for the patch
shared on September 27th, I had tested for the value '0' and observed
that no progress messages were getting logged, probably the time at
which 'enable_timeout_at' is getting called is past the 'next_timeout'
value. This behaviour is completely dependent on the system. Now added
an extra condition to disable the feature in case of '0' setting.

> I think this comment can be worded better.  It says it "decides", but it
> doesn't actually decide on any action to take -- it just reports whether
> the timer expired or not, to allow its caller to make the decision.  In
> such situations we just say something like "Report whether startup
> progress has caused a timeout, return true and rearm the timer if it
> did, or just return false otherwise"; and we don't indicate what the
> value is going to be used *for*.  Then the caller can use the boolean
> return value to make a decision, such as whether something is going to
> be logged.  This function can be oblivious to details such as this:
>
> here we can just say "No timeout has occurred" and make no inference
> about what's going to happen or not happen.

Modified the comment.

> Also, for functions that do things like this we typically use English
> sentence structure with the function name starting with the verb --
> perhaps has_startup_progress_timeout_expired().  Sometimes we are lax
> about this if we have some sort of poor-man's modularisation by using a
> common prefix for several functions doing related things, which perhaps
> could be "startup_progress_*" in your case, but your other functions are
> already not doing that (such as begin_startup_progress_phase) so it's
> not clear why you would not use the most natural name for this one.

I agree that has_startup_progress_timeout_expired() is better than the
existing function name. So I changed the function name accordingly.

> Please make sure to add ereport_startup_progress() as a translation
> trigger in src/backend/nls.mk.

I have added ereport_startup_progress() under the section
GETTEXT_TRIGGERS and GETTEXT_FLAGS in src/backend/nls.mk. Also
verified the messages in src/backend/po/postgres.pot file.

Kindly let me know if I have missed anything.

Thanks & Regards,
Nitin Jadhav

On Tue, Sep 28, 2021 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 28, 2021 at 8:06 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > I thought mentioning the unit in milliseconds and the example in
> > seconds would confuse the user, so I changed the example to
> > milliseconds.The message behind the above description looks good to me
> > however I feel some sentences can be explained in less words. The
> > information related to the units is missing and I feel it should be
> > mentioned in the document. The example says, if the setting has the
> > default value of 10 seconds, then it explains the behaviour. I feel it
> > may not be the default value, it can be any value set by the user. So
> > mentioning 'default' in the example does not look good to me. I feel
> > we just have to mention "if this setting is set to the value of 10
> > seconds". Below is the modified version of the above information.
>
> It is common to mention what the default is as part of the
> documentation of a GUC. I don't see why this one should be an
> exception, especially since not mentioning it reduces the length of
> the documentation by exactly one word.
>
> I don't mind the sentence you added at the end to clarify the default
> units, but the way you've rewritten the first sentence makes it, in my
> opinion, much less clear.
>
> > I had added additional code to check the value of the
> > 'log_startup_progress_interval' variable and  disable the feature in
> > case of -1 in the earlier versions of the patch (Specifically
> > v9.patch). There was a review comment for v9 patch and it resulted in
> > major refactoring of the patch.
> ...
> > The answer for the question of "I don't understand why you posted the
> > previous version of the patch without testing that it works" is, for
> > the value of -1, the above description was my understanding and for
> > the value of 0, the older versions of the patch was behaving as
> > documented. But with the later versions the behaviour got changed and
> > I missed to modify the documentation. So I modified it in the last
> > version.
>
> v9 was posted on August 3rd. I told you that it wasn't working on
> September 23rd. You posted a new version that still does not work on
> September 27th. I think you should have tested each version of your
> patch before posting it, and especially after any major refactorings.
> And if for whatever reason you didn't, then certainly after I told you
> on September 23rd that it didn't work, I think you should have fixed
> it before posting a new version.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Alvaro Herrera
Date:
So, I've wondered about this part all along:

> +/*
> + * Calculates the timestamp at which the next timer should expire and enables
> + * the timer accordingly.
> + */
> +static void
> +reset_startup_progress_timeout(TimestampTz now)
> +{
> +    TimestampTz next_timeout;
> +
> +    next_timeout = TimestampTzPlusMilliseconds(scheduled_startup_progress_timeout,
> +                                               log_startup_progress_interval);
> +    if (next_timeout < now)
> +    {
> +        /*
> +         * Either the timeout was processed so late that we missed an
> +         * entire cycle or system clock was set backwards.
> +         */
> +        next_timeout = TimestampTzPlusMilliseconds(now, log_startup_progress_interval);
> +    }
> +
> +    enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout);

Why is it that we set the next timeout to fire not at "now + interval"
but at "when-it-should-have-fired-but-didn't + interval"?  As a user, if
I request a message to be logged every N milliseconds, and one
of them is a little bit delayed, then (assuming I set it to 10s) I still
expect the next one to occur at now+10s.  I don't expect the next at
"now+5s" if one is delayed 5s.

In other words, I think this function should just be
  enable_timeout_after(STARTUP_PROGRESS_TIMEOUT, log_startup_progress_interval);

This means you can remove the scheduled_startup_progress_timeout
variable.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 29, 2021 at 10:08 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Sorry. There was a misunderstanding about this and for the patch
> shared on September 27th, I had tested for the value '0' and observed
> that no progress messages were getting logged, probably the time at
> which 'enable_timeout_at' is getting called is past the 'next_timeout'
> value. This behaviour is completely dependent on the system. Now added
> an extra condition to disable the feature in case of '0' setting.

Oh, interesting. I failed to consider that the behavior might vary
from one system to another.

I just noticed something else which I realize is the indirect result
of my own suggestion but which doesn't actually look all that nice.
You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then
another one in StartupProcessMain(). I think that's so that the
feature works in single-user mode, but that means that technically,
we're not reporting on the progress of the startup process. We're
reporting progress on the startup operations that are normally
performed by the startup process. But that means that the
documentation isn't quite accurate (because it mentions the startup
process specifically) and that the placement of the code in startup.c
is suspect (because that's specifically for the startup process) and
that basically every instance of the word "process" in the patch is
technically a little bit wrong. I'm not sure if that's a big enough
problem to be worth worrying about or exactly what we ought to do
about it, but it doesn't seem fantastic. At a minimum, I think we
should probably try to eliminate as many references to the startup
process as we can, and talk about startup progress or startup
operations or something like that.

+ * Start timestamp of the operation that occur during startup process.

This is not correct grammar - it would need to be "operations that
occur" or "operation that occurs". But neither seems particularly
clear about what the variable actually does. How about "Time at which
the most recent startup operation started"?

+ * Indicates the timestamp at which the timer was supposed to expire.

"Indicates" can be deleted, but also I think it would be better to
state the purpose of the timer i.e. "Timestamp at which the next
startup progress message should be logged."

+ enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout);
+ scheduled_startup_progress_timeout = next_timeout;
+ startup_progress_timer_expired = false;

I think you should set startup_progress_timer_expired to false before
calling enable_timeout_at. Otherwise there's a race condition. It's
unlikely that the timer could expire and the interrupt handler fire
before we reach startup_progress_timer_expired = false, but it seems
like there's no reason to take a chance.

+ * Calculates the timestamp at which the next timer should expire and enables

So in some places you have verbs with an "s" on the end, like here,
and in other places without, like in the next example. In "telegraph
style" comments like this, this implicit subject is "it" or "this,"
but you don't write that. However you write the rest of the sentence
as if it were there. So this should say "calculate" and "enable"
rather than "calculates" and "enables".

+ * Schedule a wakeup call for logging the progress of startup process.

This isn't really an accurate description, I think. It's not
scheduling anything, and I don't know what a "wakeup call" is anyway.
"Set a flag indicating that it's time to log a progress report" might
be better.

+ * Sets the start timestamp of the current operation and also enables the

Set. enable.

+ * timeout for logging the progress of startup process.

I think you could delete "for logging the progress of startup process"
here; that seems clear enough, and this reads a bit awkwardly. If you
want to keep something like this I wrote write "...enable the timeout
so that the progress of the startup progress will be logged."

+ * the timer if it did, otheriwse return false.

otherwise

+ * Begin the startup progress phase to report the progress of syncing
+ * data directory (syncfs).

All of the comments that start with "Begin the startup progress phase"
should instead start with "Begin startup progress phase".

I think this could be condensed down to "Prepare to report progress
syncing the data directory via syncfs", and likewise "Prepare to
report progress of the pre-fsync phase", "Prepare to report progress
resetting unlogged relations," etc.

+ gettext_noop("Sets the time interval between each progress update "
+ "of the startup process."),

This is actually inaccurate. It's sort of the same problem I was
worried about with respect to the documentation: it's NOT the interval
between progress updates, because it applies separately to each
operation. We need to say something that makes that clear, or at least
that doesn't get overtly the opposite impression. It's hard to do that
briefly, but maybe something like "Time between progress updates for
long-running startup operations"?

Whatever we use here could also be the comment for
log_startup_progress_interval.

+ * Logs the startup progress message if the timer has expired.

the -> a

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 29, 2021 at 1:36 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Why is it that we set the next timeout to fire not at "now + interval"
> but at "when-it-should-have-fired-but-didn't + interval"?  As a user, if
> I request a message to be logged every N milliseconds, and one
> of them is a little bit delayed, then (assuming I set it to 10s) I still
> expect the next one to occur at now+10s.  I don't expect the next at
> "now+5s" if one is delayed 5s.

Well, this was my suggestion, because if you don't do this, you get
drift, which I think looks weird. Like the timestamps will be:

13:41:05.012456
13:41:15.072484
13:41:25.149632

...and it gets further and further off as it goes on.'

I guess my expectation is different from yours: I expect that if I ask
for a message every 10 seconds, the time between messages is going to
be 10s, at least on average, not 10s + however much latency the system
has.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Justin Pryzby
Date:
On Wed, Sep 29, 2021 at 02:36:14PM -0300, Alvaro Herrera wrote:
> Why is it that we set the next timeout to fire not at "now + interval"
> but at "when-it-should-have-fired-but-didn't + interval"?  As a user, if
> I request a message to be logged every N milliseconds, and one
> of them is a little bit delayed, then (assuming I set it to 10s) I still
> expect the next one to occur at now+10s.  I don't expect the next at
> "now+5s" if one is delayed 5s.
> 
> In other words, I think this function should just be
>   enable_timeout_after(STARTUP_PROGRESS_TIMEOUT, log_startup_progress_interval);
> 
> This means you can remove the scheduled_startup_progress_timeout
> variable.

Robert requested the current behavior here.
https://www.postgresql.org/message-id/CA%2BTgmoYkS1ZeWdSMFMBecMNxWonHk6J5eoX4FEQrpKtvEbXqGQ%40mail.gmail.com

It's confusing (at least) to get these kind of requests to change the behavior
back and forth.

-- 
Justin



Re: when the startup process doesn't (logging startup delays)

From
Alvaro Herrera
Date:
On 2021-Sep-29, Robert Haas wrote:

> Well, this was my suggestion, because if you don't do this, you get
> drift, which I think looks weird. Like the timestamps will be:
> 
> 13:41:05.012456
> 13:41:15.072484
> 13:41:25.149632
> 
> ...and it gets further and further off as it goes on.'

Right ... I actually *expect* this drift to occur.  Maybe people
generally don't like this, it just seems natural to me.  Are there other
opinions on this aspect?

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)



Re: when the startup process doesn't (logging startup delays)

From
Alvaro Herrera
Date:
On 2021-Sep-29, Justin Pryzby wrote:

> Robert requested the current behavior here.
> https://www.postgresql.org/message-id/CA%2BTgmoYkS1ZeWdSMFMBecMNxWonHk6J5eoX4FEQrpKtvEbXqGQ%40mail.gmail.com
> 
> It's confusing (at least) to get these kind of requests to change the behavior
> back and forth.

Well, I did scan the thread to see if this had been discussed, and I
overlooked that message.  But there was no reply to that message, so
it's not clear whether this was just Robert's opinion or consensus; in
fact we now have exactly two votes on it (mine and Robert's).

I think one person casting an opinion on one aspect does not set that
aspect in stone.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there.  Did
somebody make percentages logarithmic while I wasn't looking?"
                http://smylers.hates-software.com/2005/09/08/1995c749.html



Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Sep-29, Robert Haas wrote:
>> Well, this was my suggestion, because if you don't do this, you get
>> drift, which I think looks weird. Like the timestamps will be:
>> 
>> 13:41:05.012456
>> 13:41:15.072484
>> 13:41:25.149632
>> 
>> ...and it gets further and further off as it goes on.'

> Right ... I actually *expect* this drift to occur.  Maybe people
> generally don't like this, it just seems natural to me.  Are there other
> opinions on this aspect?

FWIW, I agree with Robert that it's nicer if the timeout doesn't drift.
There's a limit to how much complexity I'm willing to tolerate for that,
but it doesn't seem like this exceeds it.

The real comment I'd have here, though, is that writing one-off
code for this purpose is bad.  If we have a need for a repetitive
timeout, it'd be better to add the feature to timeout.c explicitly.
That would probably also remove the need for extra copies of the
timeout time.

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 29, 2021 at 1:52 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think one person casting an opinion on one aspect does not set that
> aspect in stone.

Of course not. I was just explaining that how the patch ended up like it did.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 29, 2021 at 2:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real comment I'd have here, though, is that writing one-off
> code for this purpose is bad.  If we have a need for a repetitive
> timeout, it'd be better to add the feature to timeout.c explicitly.
> That would probably also remove the need for extra copies of the
> timeout time.

I'm not sure that really helps very much, honestly.  I mean it would
be useful in this particular case, but there are other cases where we
have logic like this already, and this wouldn't do anything about
those. For example, consider autoprewarm_main(). Like this code, that
code thinks (perhaps just because I'm the one who reviewed it) that
the next time should be measured from the last time ... but an
enhancement to the timeout machinery wouldn't help it at all. I
suspect there are other cases like this elsewhere, because this is
what I personally tend to think is the right behavior and I feel like
it comes up in patch reviews from time to time, but I'm not finding
any at the moment. Even if I'm right that they exist, I'm not sure
they look much like each other or can easily reuse any code.

And then again on the other hand, BackgroundWriterMain() thinks that
the next time should be measured from the time we got around to doing
it, not the scheduled time. I guess we don't really have any
consistent practice here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 29, 2021 at 2:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The real comment I'd have here, though, is that writing one-off
>> code for this purpose is bad.  If we have a need for a repetitive
>> timeout, it'd be better to add the feature to timeout.c explicitly.
>> That would probably also remove the need for extra copies of the
>> timeout time.

> I'm not sure that really helps very much, honestly.

I didn't claim there are any other places that could use the feature
*today*.  But once we've got one, it seems like there could be more
tomorrow.  In any case, I dislike keeping timeout state data outside
timeout.c, because it's so likely to get out-of-sync that way.

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Sep 29, 2021 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I didn't claim there are any other places that could use the feature
> *today*.  But once we've got one, it seems like there could be more
> tomorrow.  In any case, I dislike keeping timeout state data outside
> timeout.c, because it's so likely to get out-of-sync that way.

Well, I had a quick go at implementing this (attached).

It seems to do a satisfactory job preventing drift over time, but it
doesn't behave nicely if you set the system clock backward. With a bit
of extra debugging output:

2021-09-30 14:23:50.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998727 usecs
2021-09-30 14:23:53.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998730 usecs
2021-09-30 14:23:56.291 EDT [2279] LOG:  scheduling wakeup in 2 secs,
998728 usecs
2021-09-30 14:20:01.154 EDT [2279] LOG:  scheduling wakeup in 238
secs, 135532 usecs
2021-09-30 14:23:59.294 EDT [2279] LOG:  scheduling wakeup in 2 secs, 995922 use

The issue here is that fin_time is really the driving force behind
everything timeout.c does. In particular, it determines the order of
active_timeouts[]. And that's not really correct either for
enable_timeout_after(), or for the new function I added in this draft
patch, enable_timeout_every(). When I say I want my handler to be
fired in 3 s, I don't mean that I want it to be fired when the system
time is 3 seconds greater than it is right now. I mean I want it to be
fired in 3 actual seconds, regardless of what dumb thing the system
clock may choose to do. I don't really think that precise behavior can
be implemented, but ideally if a timeout that was supposed to happen
after 3 s is now scheduled for a time that is more than 3 seconds
beyond the current value of the system clock, we'd move the firing
time backwards to 3 seconds beyond the current system clock. That way,
if you set the system time backward by 4 minutes, you might see a few
seconds of delay before the next firing, but you wouldn't go into the
tank for 4 minutes.

I don't see an obvious way of making timeout.c behave like that
without major surgery, though. If nobody else does either, then we
could (1) stick with something closer to Nitin's current patch, which
tries to handle this concern outside of timeout.c, (2) adopt something
like the attached 0001 and leave the question of improved behavior in
case of backwards system clock adjustments for another day, or (3)
undertake to rewrite timeout.c as a precondition of being able to log
messages about why startup is slow. I'm not a huge fan of (3) but I'm
open to suggestions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ...  When I say I want my handler to be
> fired in 3 s, I don't mean that I want it to be fired when the system
> time is 3 seconds greater than it is right now. I mean I want it to be
> fired in 3 actual seconds, regardless of what dumb thing the system
> clock may choose to do.

That would be lovely, certainly.  But aren't you moving the goalposts
rather far?  I don't think we make any promises about such things
today, so why has the issue suddenly gotten more pressing?  In particular,
why do you think Nitin's patch is proof against this?  Seems to me it's
probably got *more* failure cases, not fewer, if the system clock is
acting funny.

BTW, one could imagine addressing this concern by having timeout.c work
with CLOCK_MONOTONIC instead of the regular wall clock.  But I fear
we'd have to drop enable_timeout_at(), for lack of ability to translate
between CLOCK_MONOTONIC timestamps and those used by anybody else.
Also get_timeout_start_time/get_timeout_finish_time would become
problematic.  Maybe we only really care about deltas, so the more
restrictive API would be workable, but it seems like a nontrivial
amount of work.

On the whole, in these days of NTP, I'm not sure I care to spend
large amounts of effort on dealing with a bogus system clock.

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Sep 30, 2021 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That would be lovely, certainly.  But aren't you moving the goalposts
> rather far?  I don't think we make any promises about such things
> today, so why has the issue suddenly gotten more pressing?

Yeah, perhaps it's best to not to worry about it. I dislike failure to
worry about that case on general principle, but I agree with you that
it seems to be moving the goalposts a disproportionate distance.

> In particular,
> why do you think Nitin's patch is proof against this?  Seems to me it's
> probably got *more* failure cases, not fewer, if the system clock is
> acting funny.

You might be right. I sort of assumed that timeout.c had some defense
against this, but since that seems not to be the case, I suppose no
facility that depends on it can hope to stay out of trouble either.

> On the whole, in these days of NTP, I'm not sure I care to spend
> large amounts of effort on dealing with a bogus system clock.

It's certainly less of an issue than it used to be back in my day.

Any thoughts on the patch I attached?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Michael Paquier
Date:
On Thu, Sep 30, 2021 at 05:08:17PM -0400, Robert Haas wrote:
> It's certainly less of an issue than it used to be back in my day.
>
> Any thoughts on the patch I attached?

I don't know.  Anyway, this is actively worked on, so I have taken the
liberty to move that to the next CF.
--
Michael

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Amul Sul
Date:
On Wed, Sep 29, 2021 at 11:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Sep 29, 2021 at 10:08 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Sorry. There was a misunderstanding about this and for the patch
> > shared on September 27th, I had tested for the value '0' and observed
> > that no progress messages were getting logged, probably the time at
> > which 'enable_timeout_at' is getting called is past the 'next_timeout'
> > value. This behaviour is completely dependent on the system. Now added
> > an extra condition to disable the feature in case of '0' setting.
>
> Oh, interesting. I failed to consider that the behavior might vary
> from one system to another.
>
> I just noticed something else which I realize is the indirect result
> of my own suggestion but which doesn't actually look all that nice.
> You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
> ...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then
> another one in StartupProcessMain(). I think that's so that the
> feature works in single-user mode, but that means that technically,
> we're not reporting on the progress of the startup process. We're
> reporting progress on the startup operations that are normally
> performed by the startup process. But that means that the
> documentation isn't quite accurate (because it mentions the startup
> process specifically) and that the placement of the code in startup.c
> is suspect (because that's specifically for the startup process) and
> that basically every instance of the word "process" in the patch is
> technically a little bit wrong. I'm not sure if that's a big enough
> problem to be worth worrying about or exactly what we ought to do
> about it, but it doesn't seem fantastic. At a minimum, I think we
> should probably try to eliminate as many references to the startup
> process as we can, and talk about startup progress or startup
> operations or something like that.
>
> + * Start timestamp of the operation that occur during startup process.
>
> This is not correct grammar - it would need to be "operations that
> occur" or "operation that occurs". But neither seems particularly
> clear about what the variable actually does. How about "Time at which
> the most recent startup operation started"?
>
> + * Indicates the timestamp at which the timer was supposed to expire.
>
> "Indicates" can be deleted, but also I think it would be better to
> state the purpose of the timer i.e. "Timestamp at which the next
> startup progress message should be logged."
>
> + enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout);
> + scheduled_startup_progress_timeout = next_timeout;
> + startup_progress_timer_expired = false;
>
> I think you should set startup_progress_timer_expired to false before
> calling enable_timeout_at. Otherwise there's a race condition. It's
> unlikely that the timer could expire and the interrupt handler fire
> before we reach startup_progress_timer_expired = false, but it seems
> like there's no reason to take a chance.
>
> + * Calculates the timestamp at which the next timer should expire and enables
>
> So in some places you have verbs with an "s" on the end, like here,
> and in other places without, like in the next example. In "telegraph
> style" comments like this, this implicit subject is "it" or "this,"
> but you don't write that. However you write the rest of the sentence
> as if it were there. So this should say "calculate" and "enable"
> rather than "calculates" and "enables".
>
> + * Schedule a wakeup call for logging the progress of startup process.
>
> This isn't really an accurate description, I think. It's not
> scheduling anything, and I don't know what a "wakeup call" is anyway.
> "Set a flag indicating that it's time to log a progress report" might
> be better.
>
> + * Sets the start timestamp of the current operation and also enables the
>
> Set. enable.
>
> + * timeout for logging the progress of startup process.
>
> I think you could delete "for logging the progress of startup process"
> here; that seems clear enough, and this reads a bit awkwardly. If you
> want to keep something like this I wrote write "...enable the timeout
> so that the progress of the startup progress will be logged."
>
> + * the timer if it did, otheriwse return false.
>
> otherwise
>
> + * Begin the startup progress phase to report the progress of syncing
> + * data directory (syncfs).
>
> All of the comments that start with "Begin the startup progress phase"
> should instead start with "Begin startup progress phase".
>
> I think this could be condensed down to "Prepare to report progress
> syncing the data directory via syncfs", and likewise "Prepare to
> report progress of the pre-fsync phase", "Prepare to report progress
> resetting unlogged relations," etc.
>
> + gettext_noop("Sets the time interval between each progress update "
> + "of the startup process."),
>
> This is actually inaccurate. It's sort of the same problem I was
> worried about with respect to the documentation: it's NOT the interval
> between progress updates, because it applies separately to each
> operation. We need to say something that makes that clear, or at least
> that doesn't get overtly the opposite impression. It's hard to do that
> briefly, but maybe something like "Time between progress updates for
> long-running startup operations"?
>
> Whatever we use here could also be the comment for
> log_startup_progress_interval.
>
> + * Logs the startup progress message if the timer has expired.
>
> the -> a
>

In addition I have little concern about ereport_startup_progress() use:

+               if (!StandbyMode)
+                   ereport_startup_progress("redo in progress,
elapsed time: %ld.%02d s, current LSN: %X/%X",
+                                            LSN_FORMAT_ARGS(ReadRecPtr));

The format string, its input, input count, and input position
is not properly aligned. The input for "elapsed time: %ld.%02d s" is
getting value implicitly inside ereport_startup_progress() which
could be confusing, usually, the input count should be the same as
expected in the format string in the same sequence that is expected to
appear in the log message.

I think the "elapsed time" part can be implicitly added to the error
message inside ereport_startup_progress() which is common to all
calls.

Regards,
Amul



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Oct 13, 2021 at 9:06 AM Amul Sul <sulamul@gmail.com> wrote:
> I think the "elapsed time" part can be implicitly added to the error
> message inside ereport_startup_progress() which is common to all
> calls.

Not if it means having to call psprintf there!

If there's some way we could do it with macro tricks, it might be
worth considering, but I'm not sure there is, or that it would be less
confusing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Sep 30, 2021 at 5:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Any thoughts on the patch I attached?

Apparently not, but here's a v2 anyway. In this version I made
enable_timeout_every() a three-argument function, so that the caller
can specify both the first time at which the timeout routine should be
called and the interval between them, instead of only the latter. That
seems to be more convenient for this use case, and is more powerful in
general.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> Apparently not, but here's a v2 anyway. In this version I made
> enable_timeout_every() a three-argument function, so that the caller
> can specify both the first time at which the timeout routine should be
> called and the interval between them, instead of only the latter. That
> seems to be more convenient for this use case, and is more powerful in
> general.

Thanks for sharing the patch. Overall approach looks good to me. But
just one concern about using enable_timeout_every() functionality. As
per my understanding the caller should calculate the first scheduled
timeout (now + interval) and pass it as the second argument but this
is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have
done the changes as I have mentioned (like now + interval). Kindly
correct me if I am wrong. I am attaching 2 patches here.
'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is
the same as Robert's v2 patch. I have rebased my patch on top of this
and it is 'v19-0002-startup-progress.patch'.

> I just noticed something else which I realize is the indirect result
> of my own suggestion but which doesn't actually look all that nice.
> You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
> ...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then
> another one in StartupProcessMain(). I think that's so that the
> feature works in single-user mode, but that means that technically,
> we're not reporting on the progress of the startup process. We're
> reporting progress on the startup operations that are normally
> performed by the startup process. But that means that the
> documentation isn't quite accurate (because it mentions the startup
> process specifically) and that the placement of the code in startup.c
> is suspect (because that's specifically for the startup process) and
> that basically every instance of the word "process" in the patch is
> technically a little bit wrong. I'm not sure if that's a big enough
> problem to be worth worrying about or exactly what we ought to do
> about it, but it doesn't seem fantastic. At a minimum, I think we
> should probably try to eliminate as many references to the startup
> process as we can, and talk about startup progress or startup
> operations or something like that.

Yeah right. I have modified the comments accordingly and also fixed
the other review comments related to the code comments.

I have modified the code to include a call to RegisterTimeout() in
only one place than the two calls previously. Initially I thought to
call this in begin_startup_progress_phase(). I feel this is not a
better choice since begin_startup_progress_phase function is getting
called in many places. So it calls RegisterTimeout() many times which
is not required. I feel StartupXLOG() is a better place for this since
StartupXLOG() gets called during the startup process, bootstrap
process or standalone backend. As per earlier discussion we need
support for this in the case of startup process and standalone
backend. Hence guarded this with '!IsBootstrapProcessingMode()'. Also
verified the behaviour in both of the cases. Please correct me if I am
wrong.

Thanks & Regards,
Nitin Jadhav

On Mon, Oct 18, 2021 at 9:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 30, 2021 at 5:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Any thoughts on the patch I attached?
>
> Apparently not, but here's a v2 anyway. In this version I made
> enable_timeout_every() a three-argument function, so that the caller
> can specify both the first time at which the timeout routine should be
> called and the interval between them, instead of only the latter. That
> seems to be more convenient for this use case, and is more powerful in
> general.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Oct 19, 2021 at 9:06 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> Thanks for sharing the patch. Overall approach looks good to me. But
> just one concern about using enable_timeout_every() functionality. As
> per my understanding the caller should calculate the first scheduled
> timeout (now + interval) and pass it as the second argument but this
> is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have
> done the changes as I have mentioned (like now + interval). Kindly
> correct me if I am wrong. I am attaching 2 patches here.
> 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is
> the same as Robert's v2 patch. I have rebased my patch on top of this
> and it is 'v19-0002-startup-progress.patch'.

This version looks fine, so I have committed it (and my
enable_timeout_every patch also, as a necessary prerequisite).

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Mon, Oct 25, 2021 at 9:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Oct 19, 2021 at 9:06 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > Thanks for sharing the patch. Overall approach looks good to me. But
> > just one concern about using enable_timeout_every() functionality. As
> > per my understanding the caller should calculate the first scheduled
> > timeout (now + interval) and pass it as the second argument but this
> > is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have
> > done the changes as I have mentioned (like now + interval). Kindly
> > correct me if I am wrong. I am attaching 2 patches here.
> > 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is
> > the same as Robert's v2 patch. I have rebased my patch on top of this
> > and it is 'v19-0002-startup-progress.patch'.
>
> This version looks fine, so I have committed it (and my
> enable_timeout_every patch also, as a necessary prerequisite).

Thanks for getting this in.

I have few more thoughts:

Can we also log the total time the startup process took to recover,
and also the total time each stage of the recovery/redo processing
took: 1) into a file or 2) emitting that info via a new hook 3) into a
system catalog table (assuming at the end of the recovery the database
is in a consistent state, but I'm not sure if we ever update any
catalog tables in/after the startup/recovery phase).

This will help the users/admins/developers for summarizing, analytical
and debugging purposes. This information can easily help us to
understand the recovery patterns.

Thoughts?

If okay, I can spend some more time and start a separate thread to discuss.

Regards,
Bharath Rupireddy.



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Oct 26, 2021 at 4:19 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Can we also log the total time the startup process took to recover,
> and also the total time each stage of the recovery/redo processing
> took: 1) into a file or 2) emitting that info via a new hook 3) into a
> system catalog table (assuming at the end of the recovery the database
> is in a consistent state, but I'm not sure if we ever update any
> catalog tables in/after the startup/recovery phase).

#3 would be hard to do because there could be any number of databases
and it is unclear which one we ought to update. Also, we'd have to
launch a background worker to connect to that database in order to do
it, and be prepared for what happens if that worker fails to get the
work done for whatever reason. Also, it is unclear why we should only
log this specific thing to a system catalog and not anything else.

#1 and #2 could certainly be done, but I'm not sure what the use case
is, and I think it's a separate proposal from what we did here. So I
think a new thread would be appropriate if you want to make a new
proposal.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
> This version looks fine, so I have committed it (and my
> enable_timeout_every patch also, as a necessary prerequisite).

I was fooling around with a test setup today, working on an unrelated
problem, and this happened:

2021-10-28 14:21:23.145 EDT [92010] LOG:  resetting unlogged relations
(init), elapsed time: 0.00 s, current path: base/13020

That's not supposed to happen. I assume the problem is that the
timeout for the previous phase fired just as we were beginning a new
one, and the code got confused. I think we probably need to do
something like this to make sure that the timeout from one operation
can't trigger a log message for the next:

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 28e68dd871..47ec737888 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
     if (log_startup_progress_interval == 0)
         return;

+    disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
+    startup_progress_timer_expired = false;
     startup_progress_phase_start_time = GetCurrentTimestamp();
     fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
                                            log_startup_progress_interval);

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG:  resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020

Nice catch and interesting case.

> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
>     if (log_startup_progress_interval == 0)
>         return;
>
> +    disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> +    startup_progress_timer_expired = false;
>     startup_progress_phase_start_time = GetCurrentTimestamp();
>     fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
>                                            log_startup_progress_interval);
>
> Thoughts?

Yes. I agree that this is not an expected behaviour and I do agree
that, probably the timeout for the previous phase fired just as we
were beginning a new one. For each operation, we call
begin_startup_progress_phase() before starting the operation and
one/multiple calls to ereport_startup_progress() to report the
progress and intentionally we have not added any functionality to
disable the timer at the end of the operation. The timer remains
active and there may be multiple calls to
startup_progress_timeout_handler() which sets a flag to true. So
whenever we call a begin_startup_progress_phase() for the next
operation, we do enable the timer (In my understanding it will
internally disable the old timer and schedule a new one) but the flag
is already set to true by the previous timer. Hence the next call to
ereport_startup_progress() logs a message. So I feel just setting
'startup_progress_timer_expired' to false in
begin_startup_progress_phase() would work. Please correct me if I am
wrong.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 28, 2021 at 11:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > This version looks fine, so I have committed it (and my
> > enable_timeout_every patch also, as a necessary prerequisite).
>
> I was fooling around with a test setup today, working on an unrelated
> problem, and this happened:
>
> 2021-10-28 14:21:23.145 EDT [92010] LOG:  resetting unlogged relations
> (init), elapsed time: 0.00 s, current path: base/13020
>
> That's not supposed to happen. I assume the problem is that the
> timeout for the previous phase fired just as we were beginning a new
> one, and the code got confused. I think we probably need to do
> something like this to make sure that the timeout from one operation
> can't trigger a log message for the next:
>
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index 28e68dd871..47ec737888 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -320,6 +320,8 @@ begin_startup_progress_phase(void)
>      if (log_startup_progress_interval == 0)
>          return;
>
> +    disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);
> +    startup_progress_timer_expired = false;
>      startup_progress_phase_start_time = GetCurrentTimestamp();
>      fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time,
>                                             log_startup_progress_interval);
>
> Thoughts?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Fri, Oct 29, 2021 at 7:37 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> ereport_startup_progress() logs a message. So I feel just setting
> 'startup_progress_timer_expired' to false in
> begin_startup_progress_phase() would work. Please correct me if I am
> wrong.

I think you're wrong. If we did that, the previous timer could fire
right after we set startup_progress_timer_expired = false, and before
we reschedule the timeout. It seems annoying to have to disable the
timeout and immediately turn around and re-enable it, but I don't see
how to avoid the race condition otherwise.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Nitin Jadhav
Date:
> I think you're wrong. If we did that, the previous timer could fire
> right after we set startup_progress_timer_expired = false, and before
> we reschedule the timeout. It seems annoying to have to disable the
> timeout and immediately turn around and re-enable it, but I don't see
> how to avoid the race condition otherwise.

Right. There is a possibility of race conditions. In that case the
above changes look good to me.

Thanks & Regards,
Nitin Jadhav

On Fri, Oct 29, 2021 at 6:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Oct 29, 2021 at 7:37 AM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > ereport_startup_progress() logs a message. So I feel just setting
> > 'startup_progress_timer_expired' to false in
> > begin_startup_progress_phase() would work. Please correct me if I am
> > wrong.
>
> I think you're wrong. If we did that, the previous timer could fire
> right after we set startup_progress_timer_expired = false, and before
> we reschedule the timeout. It seems annoying to have to disable the
> timeout and immediately turn around and re-enable it, but I don't see
> how to avoid the race condition otherwise.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Fri, Oct 29, 2021 at 9:10 AM Nitin Jadhav
<nitinjadhavpostgres@gmail.com> wrote:
> > I think you're wrong. If we did that, the previous timer could fire
> > right after we set startup_progress_timer_expired = false, and before
> > we reschedule the timeout. It seems annoying to have to disable the
> > timeout and immediately turn around and re-enable it, but I don't see
> > how to avoid the race condition otherwise.
>
> Right. There is a possibility of race conditions. In that case the
> above changes look good to me.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Thomas Munro
Date:
On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Committed.

Is it expected that an otherwise idle standby's recovery process
receives SIGALRM every N seconds, or should the timer be canceled at
that point, as there is no further progress to report?



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Committed.
>
> Is it expected that an otherwise idle standby's recovery process
> receives SIGALRM every N seconds, or should the timer be canceled at
> that point, as there is no further progress to report?

Nice catch. Yeah, that seems unnecessary, see the below standby logs.
I think we need to disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);,
something like the attached? I think there'll be no issue with the
patch since the StandbyMode gets reset only at the end of recovery (in
FinishWalRecovery()) but it can very well be set during recovery (in
ReadRecord()). Note that I also added an assertion in
has_startup_progress_timeout_expired(), just in case.

2022-11-08 11:28:23.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
2022-11-08 11:28:23.563 UTC [980909] LOG:
startup_progress_timeout_handler called
2022-11-08 11:28:33.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
2022-11-08 11:28:33.563 UTC [980909] LOG:
startup_progress_timeout_handler called
2022-11-08 11:28:43.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
2022-11-08 11:28:43.563 UTC [980909] LOG:
startup_progress_timeout_handler called
2022-11-08 11:28:53.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
2022-11-08 11:28:53.563 UTC [980909] LOG:
startup_progress_timeout_handler called

Whilte at it, I noticed that we report redo progress for PITR, but we
don't report when standby enters archive recovery mode, say due to a
failure in the connection to primary or after the promote signal is
found. Isn't it useful to report in this case as well to know the
recovery progress?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Simon Riggs
Date:
On Tue, 8 Nov 2022 at 12:33, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >
> > On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > Committed.
> >
> > Is it expected that an otherwise idle standby's recovery process
> > receives SIGALRM every N seconds, or should the timer be canceled at
> > that point, as there is no further progress to report?
>
> Nice catch. Yeah, that seems unnecessary, see the below standby logs.
> I think we need to disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);,
> something like the attached? I think there'll be no issue with the
> patch since the StandbyMode gets reset only at the end of recovery (in
> FinishWalRecovery()) but it can very well be set during recovery (in
> ReadRecord()). Note that I also added an assertion in
> has_startup_progress_timeout_expired(), just in case.
>
> 2022-11-08 11:28:23.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:23.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:33.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:33.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:43.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:43.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:53.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:53.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
>
> Whilte at it, I noticed that we report redo progress for PITR, but we
> don't report when standby enters archive recovery mode, say due to a
> failure in the connection to primary or after the promote signal is
> found. Isn't it useful to report in this case as well to know the
> recovery progress?

I think your patch disables progress too early, effectively turning
off the standby progress feature. The purpose was to report on things
that take long periods during recovery, not just prior to recovery.

I would advocate that we disable progress only while waiting, as I've done here:
https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com

--
Simon Riggs                http://www.EnterpriseDB.com/



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> > Whilte at it, I noticed that we report redo progress for PITR, but we
> > don't report when standby enters archive recovery mode, say due to a
> > failure in the connection to primary or after the promote signal is
> > found. Isn't it useful to report in this case as well to know the
> > recovery progress?
>
> I think your patch disables progress too early, effectively turning
> off the standby progress feature. The purpose was to report on things
> that take long periods during recovery, not just prior to recovery.
>
> I would advocate that we disable progress only while waiting, as I've done here:
> https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com

Maybe I'm confused here, but I think that, on a standby, startup
progress messages are only printed until the main redo loop is
reached. Otherwise, we would print a message on a standby every 10s
forever, which seems like a thing that most users would not like. So I
think that Bharath has the right idea here.

I don't think that his patch is right in detail, though. I don't think
the call to disable_timeout() needs to be conditional, and I don't
think the Assert is correct. Also, I think that your patch has the
right idea in encapsulating the disable_timeout() call inside a new
function disable_startup_progress_timeout(), rather than having the
details known directly by xlogrecovery.c.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Mon, Nov 14, 2022 at 9:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > > Whilte at it, I noticed that we report redo progress for PITR, but we
> > > don't report when standby enters archive recovery mode, say due to a
> > > failure in the connection to primary or after the promote signal is
> > > found. Isn't it useful to report in this case as well to know the
> > > recovery progress?
> >
> > I think your patch disables progress too early, effectively turning
> > off the standby progress feature. The purpose was to report on things
> > that take long periods during recovery, not just prior to recovery.
> >
> > I would advocate that we disable progress only while waiting, as I've done here:
> > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com
>
> Maybe I'm confused here, but I think that, on a standby, startup
> progress messages are only printed until the main redo loop is
> reached. Otherwise, we would print a message on a standby every 10s
> forever, which seems like a thing that most users would not like. So I
> think that Bharath has the right idea here.

Yes, the idea is to disable the timeout on standby completely since we
actually don't report any recovery progress. Keeping it enabled,
unnecessarily calls startup_progress_timeout_handler() every
log_startup_progress_interval seconds i.e. 10 seconds. That's the
intention of the patch.

> I don't think that his patch is right in detail, though. I don't think
> the call to disable_timeout() needs to be conditional,

Yes, disable_timeout() returns if the timeout was previously disabled
i.e. all_timeouts[STARTUP_PROGRESS_TIMEOUT].active is false. I changed
it in the v2 patch.

> and I don't
> think the Assert is correct.

You're right. My intention there was to check if the timeout is
enabled while ereport_startup_progress() is called. In the v2 patch,
when we actually disable the timeout startup_progress_timer_expired
gets set to false and has_startup_progress_timeout_expired() just
returns in such a case.

> Also, I think that your patch has the
> right idea in encapsulating the disable_timeout() call inside a new
> function disable_startup_progress_timeout(), rather than having the
> details known directly by xlogrecovery.c.

Yes, I too like Simon's idea of {enable,
disable}_startup_progress_timeout functions, I utilized them in the v2
patch here.

I actually want to get rid of begin_startup_progress_phase() which now
becomes a thin wrapper calling disable and enable functions and ensure
the callers do follow enable()-report_progress()-disable() way to use
the feature, however I didn't code for that as it needs changes across
many files. If okay, I can code for that too. Thoughts?

Please review the v2 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Simon Riggs
Date:
On Tue, 15 Nov 2022 at 13:33, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Nov 14, 2022 at 9:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs
> > <simon.riggs@enterprisedb.com> wrote:
> > > > Whilte at it, I noticed that we report redo progress for PITR, but we
> > > > don't report when standby enters archive recovery mode, say due to a
> > > > failure in the connection to primary or after the promote signal is
> > > > found. Isn't it useful to report in this case as well to know the
> > > > recovery progress?
> > >
> > > I think your patch disables progress too early, effectively turning
> > > off the standby progress feature. The purpose was to report on things
> > > that take long periods during recovery, not just prior to recovery.
> > >
> > > I would advocate that we disable progress only while waiting, as I've done here:
> > > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com
> >
> > Maybe I'm confused here, but I think that, on a standby, startup
> > progress messages are only printed until the main redo loop is
> > reached. Otherwise, we would print a message on a standby every 10s
> > forever, which seems like a thing that most users would not like. So I
> > think that Bharath has the right idea here.
>
> Yes, the idea is to disable the timeout on standby completely since we
> actually don't report any recovery progress. Keeping it enabled,
> unnecessarily calls startup_progress_timeout_handler() every
> log_startup_progress_interval seconds i.e. 10 seconds. That's the
> intention of the patch.

As long as we don't get the SIGALRMs that Thomas identified, then I'm happy.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Please review the v2 patch.

It seems to me that this will call disable_startup_progress_timeout
once per WAL record, which seems like an unnecessary expense. How
about leaving the code inside the loop just as we have it, and putting
if (StandbyMode) disable_startup_progress_timeout() before entering
the loop?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Please review the v2 patch.
>
> It seems to me that this will call disable_startup_progress_timeout
> once per WAL record, which seems like an unnecessary expense. How
> about leaving the code inside the loop just as we have it, and putting
> if (StandbyMode) disable_startup_progress_timeout() before entering
> the loop?

That can be done, only if we can disable the timeout in another place
when the StandbyMode is set to true in ReadRecord(), that is, after
the standby server finishes crash recovery and enters standby mode.

I'm attaching the v3 patch for further review. Please find the CF
entry here - https://commitfest.postgresql.org/41/4012/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Simon Riggs
Date:
On Wed, 16 Nov 2022 at 06:47, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Please review the v2 patch.
> >
> > It seems to me that this will call disable_startup_progress_timeout
> > once per WAL record, which seems like an unnecessary expense. How
> > about leaving the code inside the loop just as we have it, and putting
> > if (StandbyMode) disable_startup_progress_timeout() before entering
> > the loop?
>
> That can be done, only if we can disable the timeout in another place
> when the StandbyMode is set to true in ReadRecord(), that is, after
> the standby server finishes crash recovery and enters standby mode.
>
> I'm attaching the v3 patch for further review. Please find the CF
> entry here - https://commitfest.postgresql.org/41/4012/.

begin_startup_progress_phase() checks to see if feature is disabled
twice, so I think you can skip the check and just rely on the check in
enable().

Otherwise, all good.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Wed, Nov 16, 2022 at 2:28 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Wed, 16 Nov 2022 at 06:47, Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > Please review the v2 patch.
> > >
> > > It seems to me that this will call disable_startup_progress_timeout
> > > once per WAL record, which seems like an unnecessary expense. How
> > > about leaving the code inside the loop just as we have it, and putting
> > > if (StandbyMode) disable_startup_progress_timeout() before entering
> > > the loop?
> >
> > That can be done, only if we can disable the timeout in another place
> > when the StandbyMode is set to true in ReadRecord(), that is, after
> > the standby server finishes crash recovery and enters standby mode.
> >
> > I'm attaching the v3 patch for further review. Please find the CF
> > entry here - https://commitfest.postgresql.org/41/4012/.
>
> begin_startup_progress_phase() checks to see if feature is disabled
> twice, so I think you can skip the check and just rely on the check in
> enable().

Yes, I did that intentionally to avoid begin_startup_progress_phase()
calling disable and enable functions when the feature is disabled.
I'll leave it to the committer whether to retain it or delete it.

> Otherwise, all good.

Thanks.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Wed, Nov 16, 2022 at 1:47 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> That can be done, only if we can disable the timeout in another place
> when the StandbyMode is set to true in ReadRecord(), that is, after
> the standby server finishes crash recovery and enters standby mode.

Oh, interesting. I didn't realize that we would need to worry about that case.

> I'm attaching the v3 patch for further review. Please find the CF
> entry here - https://commitfest.postgresql.org/41/4012/.

I kind of dislike having to have logic for this in two places. Seems
like it could create future bugs.

How about the attached approach, instead? This way, the first time the
timer expires after we reach standby mode, we reactively disable it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Thomas Munro
Date:
On Thu, Nov 17, 2022 at 7:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
+     * up, since standby mode is a state that is intendeded to persist

typo

Otherwise LGTM.



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Thu, Nov 17, 2022 at 12:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Nov 16, 2022 at 1:47 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > That can be done, only if we can disable the timeout in another place
> > when the StandbyMode is set to true in ReadRecord(), that is, after
> > the standby server finishes crash recovery and enters standby mode.
>
> Oh, interesting. I didn't realize that we would need to worry about that case.
>
> > I'm attaching the v3 patch for further review. Please find the CF
> > entry here - https://commitfest.postgresql.org/41/4012/.
>
> I kind of dislike having to have logic for this in two places. Seems
> like it could create future bugs.

Duplication is a problem that I agree with and I have an idea here -
how about introducing a new function, say EnableStandbyMode() that
sets StandbyMode to true and disables the startup progress timeout,
something like the attached?

> How about the attached approach, instead? This way, the first time the
> timer expires after we reach standby mode, we reactively disable it.

Hm. I'm not really sure if it's a good idea. While it simplifies the
code, the has_startup_progress_timeout_expired() gets called for every
WAL record in standby mode. Isn't this an unnecessary thing?
Currently, the if (!StandbyMode) condition blocks the function calls.
And I'm also a little concerned that we move the StandbyMode variable
to startup.c which so far tiled to xlogrecovery.c. Maybe these are not
really concerns at all. Maybe others are okay with this approach.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Duplication is a problem that I agree with and I have an idea here -
> how about introducing a new function, say EnableStandbyMode() that
> sets StandbyMode to true and disables the startup progress timeout,
> something like the attached?

That works for me, more or less. But I think that
enable_startup_progress_timeout should be amended to either say if
(log_startup_progress_interval == 0 || StandbyMode) return; or else it
should at least Assert(!StandbyMode), so that we can't accidentally
re-enable the timer after we shut it off.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Fri, Nov 18, 2022 at 12:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Duplication is a problem that I agree with and I have an idea here -
> > how about introducing a new function, say EnableStandbyMode() that
> > sets StandbyMode to true and disables the startup progress timeout,
> > something like the attached?
>
> That works for me, more or less. But I think that
> enable_startup_progress_timeout should be amended to either say if
> (log_startup_progress_interval == 0 || StandbyMode) return; or else it
> should at least Assert(!StandbyMode), so that we can't accidentally
> re-enable the timer after we shut it off.

Hm, an assertion may not help in typical production servers running on
non-assert builds. I've modified the if condition, please see the
attached v5 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Kyotaro Horiguchi
Date:
At Fri, 18 Nov 2022 15:55:00 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Fri, Nov 18, 2022 at 12:42 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Duplication is a problem that I agree with and I have an idea here -
> > > how about introducing a new function, say EnableStandbyMode() that
> > > sets StandbyMode to true and disables the startup progress timeout,
> > > something like the attached?
> >
> > That works for me, more or less. But I think that
> > enable_startup_progress_timeout should be amended to either say if
> > (log_startup_progress_interval == 0 || StandbyMode) return; or else it
> > should at least Assert(!StandbyMode), so that we can't accidentally
> > re-enable the timer after we shut it off.
> 
> Hm, an assertion may not help in typical production servers running on
> non-assert builds. I've modified the if condition, please see the
> attached v5 patch.

I prefer Robert's approach as it is more robust for future changes and
simple. I prefer to avoid this kind of piggy-backing and it doesn't
seem to be needed in this case. XLogShutdownWalRcv() looks like a
similar case to me and honestly I don't like it in the sense of
robustness but it is simpler than checking walreceiver status at every
site that refers to the flag.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Sun, Nov 20, 2022 at 9:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I prefer Robert's approach as it is more robust for future changes and
> simple. I prefer to avoid this kind of piggy-backing and it doesn't
> seem to be needed in this case. XLogShutdownWalRcv() looks like a
> similar case to me and honestly I don't like it in the sense of
> robustness but it is simpler than checking walreceiver status at every
> site that refers to the flag.

I don't understand what you want changed. Can you be more specific
about what you mean by "Robert's approach"?

I don't agree with Bharath's logic for preferring an if-test to an
Assert. There are some cases where we think we've written the code
correctly but also recognize that the logic is complicated enough that
something might slip through the cracks. Then, a runtime check makes
sense, because otherwise something real bad might happen on a
production instance. But here, I don't think that's the main risk. I
think the main risk is that some future patch tries to add code that
should print startup log messages later on. That would probably be a
coding mistake, and Assert would alert the patch author about that,
whereas amending the if-test would just make the code do something
differently then the author intended.

But I don't feel super-strongly about this, which is why I mentioned
both options in my previous email. I'm not clear on whether you are
expressing an opinion on this point in particular or something more
general.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Mon, Nov 21, 2022 at 10:37 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Nov 20, 2022 at 9:20 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > I prefer Robert's approach as it is more robust for future changes and
> > simple. I prefer to avoid this kind of piggy-backing and it doesn't
> > seem to be needed in this case. XLogShutdownWalRcv() looks like a
> > similar case to me and honestly I don't like it in the sense of
> > robustness but it is simpler than checking walreceiver status at every
> > site that refers to the flag.
>
> I don't understand what you want changed. Can you be more specific
> about what you mean by "Robert's approach"?
>
> I don't agree with Bharath's logic for preferring an if-test to an
> Assert. There are some cases where we think we've written the code
> correctly but also recognize that the logic is complicated enough that
> something might slip through the cracks. Then, a runtime check makes
> sense, because otherwise something real bad might happen on a
> production instance. But here, I don't think that's the main risk. I
> think the main risk is that some future patch tries to add code that
> should print startup log messages later on. That would probably be a
> coding mistake, and Assert would alert the patch author about that,
> whereas amending the if-test would just make the code do something
> differently then the author intended.
>
> But I don't feel super-strongly about this, which is why I mentioned
> both options in my previous email. I'm not clear on whether you are
> expressing an opinion on this point in particular or something more
> general.

If we place just the Assert(!StandbyMode); in
enable_startup_progress_timeout(), it fails for
begin_startup_progress_phase() in ResetUnloggedRelations() because the
InitWalRecovery(), that sets StandbyMode to true, is called before
ResetUnloggedRelations() . However, with the if (StandbyMode) {
return; }, we fail to report progress of ResetUnloggedRelations() in a
standby, which isn't a good idea at all because we only want to
disable the timeout during the recovery's main loop.

I introduced an assert-only function returning true when we're in
standby's main redo apply loop and modified the assertion to be
Assert(!InStandbyMainRedoApplyLoop()); works better but it complicates
the code a bit. FWIW, I'm attaching the v6 patch with this change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Tue, Nov 22, 2022 at 6:05 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If we place just the Assert(!StandbyMode); in
> enable_startup_progress_timeout(), it fails for
> begin_startup_progress_phase() in ResetUnloggedRelations() because the
> InitWalRecovery(), that sets StandbyMode to true, is called before
> ResetUnloggedRelations() . However, with the if (StandbyMode) {
> return; }, we fail to report progress of ResetUnloggedRelations() in a
> standby, which isn't a good idea at all because we only want to
> disable the timeout during the recovery's main loop.

Ugh. Well, in that case, I guess my vote is to forget about this whole
Assert business and just commit what you had in v4. Does that work for
you?

Protecting against specifically the situation where we're in the
standby's main redo apply loop is not really what I had in mind here,
but this is already sort of weirdly complicated-looking, and making it
more weirdly complicated-looking to achieve the kind of protection
that I had in mind doesn't really seem worth it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Fri, Feb 3, 2023 at 2:29 AM Robert Haas <robertmhaas@gmail.com> wrote:
>

Thanks for looking at this.

> On Tue, Nov 22, 2022 at 6:05 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > If we place just the Assert(!StandbyMode); in
> > enable_startup_progress_timeout(), it fails for
> > begin_startup_progress_phase() in ResetUnloggedRelations() because the
> > InitWalRecovery(), that sets StandbyMode to true, is called before
> > ResetUnloggedRelations() . However, with the if (StandbyMode) {
> > return; }, we fail to report progress of ResetUnloggedRelations() in a
> > standby, which isn't a good idea at all because we only want to
> > disable the timeout during the recovery's main loop.
>
> Ugh. Well, in that case, I guess my vote is to forget about this whole
> Assert business and just commit what you had in v4. Does that work for
> you?

Yes, it seems reasonable to me.

> Protecting against specifically the situation where we're in the
> standby's main redo apply loop is not really what I had in mind here,
> but this is already sort of weirdly complicated-looking, and making it
> more weirdly complicated-looking to achieve the kind of protection
> that I had in mind doesn't really seem worth it.

IMHO, the responsibility of whether or not to report progress of any
operation can lie with the developers writing features using the
progress reporting mechanism. The progress reporting mechanism can
just be independent of all that.

I took the v4 patch, added some comments and attached it as the v7
patch here. Please find it.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I took the v4 patch, added some comments and attached it as the v7
> patch here. Please find it.

Committed and back-patched to v15.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> I took the v4 patch, added some comments and attached it as the v7
>> patch here. Please find it.

> Committed and back-patched to v15.

Umm ... is this really the sort of patch to be committing on a
release wrap day?

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Umm ... is this really the sort of patch to be committing on a
> release wrap day?

Oh, shoot, I wasn't thinking about that. Would you like me to revert
it in v15 for now?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Umm ... is this really the sort of patch to be committing on a
>> release wrap day?

> Oh, shoot, I wasn't thinking about that. Would you like me to revert
> it in v15 for now?

Yeah, seems like the safest course.  I wouldn't object to it going in
after the release is over, but right now there's zero margin for error.

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Robert Haas
Date:
On Mon, Feb 6, 2023 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Umm ... is this really the sort of patch to be committing on a
> >> release wrap day?
>
> > Oh, shoot, I wasn't thinking about that. Would you like me to revert
> > it in v15 for now?
>
> Yeah, seems like the safest course.  I wouldn't object to it going in
> after the release is over, but right now there's zero margin for error.

Done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Mon, Feb 6, 2023 at 9:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Umm ... is this really the sort of patch to be committing on a
> > release wrap day?
>
> Oh, shoot, I wasn't thinking about that. Would you like me to revert
> it in v15 for now?

Thanks a lot Robert for taking care of this. The patch is committed on
HEAD and reverted on v15. Now that the minor version branches are
stamped, is it time for us to get this to v15? I can then close the CF
entry - https://commitfest.postgresql.org/42/4012/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: when the startup process doesn't (logging startup delays)

From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> Thanks a lot Robert for taking care of this. The patch is committed on
> HEAD and reverted on v15. Now that the minor version branches are
> stamped, is it time for us to get this to v15? I can then close the CF
> entry - https://commitfest.postgresql.org/42/4012/.

No objection to un-reverting from here.

            regards, tom lane



Re: when the startup process doesn't (logging startup delays)

From
Bharath Rupireddy
Date:
On Wed, Feb 8, 2023 at 11:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > Thanks a lot Robert for taking care of this. The patch is committed on
> > HEAD and reverted on v15. Now that the minor version branches are
> > stamped, is it time for us to get this to v15? I can then close the CF
> > entry - https://commitfest.postgresql.org/42/4012/.
>
> No objection to un-reverting from here.

Thanks Robert, Tom. It is now un-reverted on v15. I've closed the CF entry.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com