Thread: when the startup process doesn't
Hi, I've noticed that customers not infrequently complain that they start postgres and then the system doesn't come up for a while and they have no idea what's going on and are (understandably) worried. There are probably a number of reasons why this can happen, but the ones that seem to come up most often in my experience are (1) SyncDataDirectory takes a long time, (b) ResetUnloggedRelations takes a long time, and (c) there's a lot of WAL to apply so that takes a long time. It's possible to distinguish this last case from the other two by looking at the output of 'ps', but that's not super-convenient if your normal method of access to the server is via libpq, and it only works if you are monitoring it as it's happening rather than looking at the logs after-the-fact. I am not sure there's any real way to distinguish the other two cases without using strace or gdb or similar. It seems to me that we could do better. One approach would be to try to issue a log message periodically - maybe once per minute, or some configurable interval, e.g. perhaps add messages something like this: LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s LOG: data directory sync complete after %ld.%03d ms LOG: still resetting unlogged relations, elapsed time %ld.%03d ms, current path %s LOG: unlogged relations reset after %ld.%03d ms LOG: still performing crash recovery, elapsed time %ld.%03d ms, current LSN %08X/%08X We already have a message when redo is complete, so there's no need for another one. The implementation here doesn't seem too hard either: the startup process would set a timer, when the timer expires the signal handler sets a flag, at a convenient point we notice the flag is set and responding by printing a message and clearing the flag. Another possible approach would be to accept connections for monitoring purposes even during crash recovery. We can't allow access to any database at that point, since the system might not be consistent, but we could allow something like a replication connection (the non-database-associated variant). Maybe it would be precisely a replication connection and we'd just refuse all but a subset of commands, or maybe it would be some other kinds of thing. But either way you'd be able to issue a command in some mini-language saying "so, tell me how startup is going" and it would reply with a result set of some kind. If I had to pick one of these two ideas, I'd pick the one the log-based solution, since it seems easier to access and simplifies retrospective analysis, but I suspect SQL access would be quite useful for some users too, especially in cloud environments where "just log into the machine and have a look" is not an option. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
On 2021-Apr-19, Robert Haas wrote: > Another possible approach would be to accept connections for > monitoring purposes even during crash recovery. We can't allow access > to any database at that point, since the system might not be > consistent, but we could allow something like a replication connection > (the non-database-associated variant). Hmm. We already have pg_isready, which is pretty simplistic -- it tries to connect to the server and derive a status in a very simplistic way. Can we perhaps improve on that? I think your idea of using the non-database-connected replication mode would let the server return a tuple with some status information with a new command. And then pg_isready could interpret that, or just print it. -- Álvaro Herrera 39°49'30"S 73°17'W Subversion to GIT: the shortest path to happiness I've ever heard of (Alexey Klyukin)
On Mon, Apr 19, 2021 at 01:55:13PM -0400, Robert Haas wrote: > I've noticed that customers not infrequently complain that they start > postgres and then the system doesn't come up for a while and they have > no idea what's going on and are (understandably) worried. There are > probably a number of reasons why this can happen, but the ones that > seem to come up most often in my experience are (1) SyncDataDirectory > takes a long time, (b) ResetUnloggedRelations takes a long time, and > (c) there's a lot of WAL to apply so that takes a long time. It's > possible to distinguish this last case from the other two by looking > at the output of 'ps', but that's not super-convenient if your normal > method of access to the server is via libpq, and it only works if you > are monitoring it as it's happening rather than looking at the logs > after-the-fact. I am not sure there's any real way to distinguish the > other two cases without using strace or gdb or similar. > > It seems to me that we could do better. One approach would be to try > to issue a log message periodically - maybe once per minute, or some > configurable interval, e.g. perhaps add messages something like this: Yes, this certainly needs improvement. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote: > If I had to pick one of these two ideas, I'd pick the one the > log-based solution, since it seems easier to access and simplifies > retrospective analysis, but I suspect SQL access would be quite useful > for some users too, especially in cloud environments where "just log > into the machine and have a look" is not an option. +1 for both ideas. I've heard multiple requests for something like that. A couple of users with update_process_title=off told me they regretted that choice when they found themselves running a long crash recovery with the only indicator of progress disabled.
Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Apr 20, 2021 at 5:55 AM Robert Haas <robertmhaas@gmail.com> wrote: >> If I had to pick one of these two ideas, I'd pick the one the >> log-based solution, since it seems easier to access and simplifies >> retrospective analysis, but I suspect SQL access would be quite useful >> for some users too, especially in cloud environments where "just log >> into the machine and have a look" is not an option. > +1 for both ideas. I've heard multiple requests for something like > that. A couple of users with update_process_title=off told me they > regretted that choice when they found themselves running a long crash > recovery with the only indicator of progress disabled. Hmm ... +1 for progress messages in the log, but I'm suspicious about the complexity-and-fragility-versus-value tradeoff for the other thing. regards, tom lane
On Mon, Apr 19, 2021 at 7:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Hi, > > I've noticed that customers not infrequently complain that they start > postgres and then the system doesn't come up for a while and they have > no idea what's going on and are (understandably) worried. There are > probably a number of reasons why this can happen, but the ones that > seem to come up most often in my experience are (1) SyncDataDirectory > takes a long time, (b) ResetUnloggedRelations takes a long time, and > (c) there's a lot of WAL to apply so that takes a long time. It's > possible to distinguish this last case from the other two by looking > at the output of 'ps', but that's not super-convenient if your normal > method of access to the server is via libpq, and it only works if you > are monitoring it as it's happening rather than looking at the logs > after-the-fact. I am not sure there's any real way to distinguish the > other two cases without using strace or gdb or similar. > > It seems to me that we could do better. One approach would be to try > to issue a log message periodically - maybe once per minute, or some > configurable interval, e.g. perhaps add messages something like this: > > LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s > LOG: data directory sync complete after %ld.%03d ms > LOG: still resetting unlogged relations, elapsed time %ld.%03d ms, > current path %s > LOG: unlogged relations reset after %ld.%03d ms > LOG: still performing crash recovery, elapsed time %ld.%03d ms, > current LSN %08X/%08X > > We already have a message when redo is complete, so there's no need > for another one. The implementation here doesn't seem too hard either: > the startup process would set a timer, when the timer expires the > signal handler sets a flag, at a convenient point we notice the flag > is set and responding by printing a message and clearing the flag. > > Another possible approach would be to accept connections for > monitoring purposes even during crash recovery. We can't allow access > to any database at that point, since the system might not be > consistent, but we could allow something like a replication connection > (the non-database-associated variant). Maybe it would be precisely a > replication connection and we'd just refuse all but a subset of > commands, or maybe it would be some other kinds of thing. But either > way you'd be able to issue a command in some mini-language saying "so, > tell me how startup is going" and it would reply with a result set of > some kind. > > If I had to pick one of these two ideas, I'd pick the one the > log-based solution, since it seems easier to access and simplifies > retrospective analysis, but I suspect SQL access would be quite useful > for some users too, especially in cloud environments where "just log > into the machine and have a look" is not an option. > > Thoughts? (Ugh. Did reply instead of reply-all. Surely I should know that by now... Here's a re-send!) +1 for the log based one. In general I'm usually against the log based one, but something over the replication protocol is really not going to help a lot of people who are in this situation. They may not even have permissions to log in, and any kind of monitoring system would fail to work as well. And can we even log users in at this point? We can't get the list of roles... If we could, I would say it's probably better to allow the login in a regular connection, but then immediately throw an error and give this error a more detailed message if the user has monitoring permissions. But against either of those, the log based method is certainly a lot easier to build :) And FWIW, I believe most -- probably all -- cloud environments do give an interface to view the log at least, so the log based solution would work there as well. Maybe not as convenient, but it would work. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm ... +1 for progress messages in the log, but I'm suspicious about > the complexity-and-fragility-versus-value tradeoff for the other thing. All right, so it's nice to hear that nobody so far is opposed to the log-based solution, and I think it's sensible to think about building that one first and doing anything else later. But, if we did want to invent something to allow monitoring via libpq even at this early stage, how would we make it work? Magnus pointed out that we can hardly read pg_authid during crash recovery, which means that accepting logins in the usual sense at that stage is not feasible. But, what if we picked a fixed, hard-coded role name for this? I would suggest pg_monitor, but that's already taken for something else, so maybe pg_console or some better thing someone else can suggest. Without a pg_authid row, you couldn't use password, md5, or scram authentication, unless we provided some other place to store the verifier, like a flatfile. I'm not sure we want to go there, but that still leaves a lot of workable authentication methods. I think Álvaro is right to see this kind of work as an extension of pg_isready, but the problem with pg_isready is that we don't want to expose a lot of information to the whole Internet, or however much of it can reach the postgres port. But with this approach, you can lock down access via pg_hba.conf, which means that it's OK to expose information that we don't want to make available to everyone. I think we're still limited to exposing what can be observed from shared memory here, because the whole idea is to have something that can be used even before consistency is reached, so we shouldn't really be doing anything that would look at the contents of data files. But that still leaves a bunch of things that we could show here, the progress of the startup process being one of them. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Apr 20, 2021 at 2:43 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Apr 19, 2021 at 8:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Hmm ... +1 for progress messages in the log, but I'm suspicious about > > the complexity-and-fragility-versus-value tradeoff for the other thing. > > All right, so it's nice to hear that nobody so far is opposed to the > log-based solution, and I think it's sensible to think about building > that one first and doing anything else later. > > But, if we did want to invent something to allow monitoring via libpq > even at this early stage, how would we make it work? Magnus pointed > out that we can hardly read pg_authid during crash recovery, which > means that accepting logins in the usual sense at that stage is not > feasible. But, what if we picked a fixed, hard-coded role name for > this? I would suggest pg_monitor, but that's already taken for > something else, so maybe pg_console or some better thing someone else > can suggest. Without a pg_authid row, you couldn't use password, md5, > or scram authentication, unless we provided some other place to store > the verifier, like a flatfile. I'm not sure we want to go there, but > that still leaves a lot of workable authentication methods. Another option would be to keep this check entirely outside the scope of normal roles, and just listen on a port (or unix socket) during startup which basically just replies with the current status if you connect to it. On Unix this could also make use of peer authentication requiring you to be the same user as postgres for example. > I think Álvaro is right to see this kind of work as an extension of > pg_isready, but the problem with pg_isready is that we don't want to > expose a lot of information to the whole Internet, or however much of > it can reach the postgres port. But with this approach, you can lock > down access via pg_hba.conf, which means that it's OK to expose > information that we don't want to make available to everyone. I think > we're still limited to exposing what can be observed from shared > memory here, because the whole idea is to have something that can be > used even before consistency is reached, so we shouldn't really be > doing anything that would look at the contents of data files. But that > still leaves a bunch of things that we could show here, the progress > of the startup process being one of them. Yeah, I think we should definitely limit this to local access, one way or another. Realistically using pg_hba is going to require catalog access, isn't it? And we can't just go ignore those rows in pg_hba that for example references role membership (as well as all those auth methods you can't use). -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, 20 Apr 2021 15:04:28 +0200 Magnus Hagander <magnus@hagander.net> wrote: [...] > Yeah, I think we should definitely limit this to local access, one way > or another. Realistically using pg_hba is going to require catalog > access, isn't it? And we can't just go ignore those rows in pg_hba > that for example references role membership (as well as all those auth > methods you can't use). Two another options: 1. if this is limited to local access only, outside of the log entries, the status of the startup could be updated in the controldata file as well. This would allows to watch it without tail-grep'ing logs using eg. pg_controldata. 2. maybe the startup process could ignore update_process_title? As far as I understand the doc correctly, this GUC is mostly useful for backends on Windows. Regards,
On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > > On Tue, 20 Apr 2021 15:04:28 +0200 > Magnus Hagander <magnus@hagander.net> wrote: > [...] > > Yeah, I think we should definitely limit this to local access, one way > > or another. Realistically using pg_hba is going to require catalog > > access, isn't it? And we can't just go ignore those rows in pg_hba > > that for example references role membership (as well as all those auth > > methods you can't use). > > Two another options: > > 1. if this is limited to local access only, outside of the log entries, the > status of the startup could be updated in the controldata file as well. This > would allows to watch it without tail-grep'ing logs using eg. pg_controldata. I think doing so in controldata would definitely make things complicated for no real reason. Plus controldata has a fixed size (and has to have), whereas something like this would probably want more variation than that makes easy. There could be a "startup.status" file I guess which would basically contain the last line of what would otherwise be in the log. But if it remains a textfile, I'm not sure what the gain is -- you'll just have to have the dba look in more places than one to find it? It's not like there's likely to be much other data written to the log during these times? > 2. maybe the startup process could ignore update_process_title? As far > as I understand the doc correctly, this GUC is mostly useful for backends on > Windows. You mention Windows -- that would be one excellent reason not to go for this particular method. Viewing the process title is much harder on Windows, as there is actually no such thing and we fake it. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais > <jgdr@dalibo.com> wrote: >> Two another options: >> 1. if this is limited to local access only, outside of the log entries, the >> status of the startup could be updated in the controldata file as well. This >> would allows to watch it without tail-grep'ing logs using eg. pg_controldata. > I think doing so in controldata would definitely make things > complicated for no real reason. Plus controldata has a fixed size (and > has to have), whereas something like this would probably want more > variation than that makes easy. Also, given that pg_control is as critical a bit of data as we have, we really don't want to be writing it more often than we absolutely have to. > There could be a "startup.status" file I guess which would basically > contain the last line of what would otherwise be in the log. But if it > remains a textfile, I'm not sure what the gain is -- you'll just have > to have the dba look in more places than one to find it? It's not like > there's likely to be much other data written to the log during these > times? Yeah, once you are talking about dumping stuff in a file, it's not clear how that's better than progress-messages-in-the-log. People already have a lot of tooling for looking at the postmaster log. I think the point of Robert's other proposal is to allow remote checks of the restart's progress, so local files aren't much of a substitute anyway. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Magnus Hagander <magnus@hagander.net> writes: > > On Tue, Apr 20, 2021 at 5:17 PM Jehan-Guillaume de Rorthais > > <jgdr@dalibo.com> wrote: > >> Two another options: > >> 1. if this is limited to local access only, outside of the log entries, the > >> status of the startup could be updated in the controldata file as well. This > >> would allows to watch it without tail-grep'ing logs using eg. pg_controldata. > > > I think doing so in controldata would definitely make things > > complicated for no real reason. Plus controldata has a fixed size (and > > has to have), whereas something like this would probably want more > > variation than that makes easy. > > Also, given that pg_control is as critical a bit of data as we have, > we really don't want to be writing it more often than we absolutely > have to. Yeah, don't think pg_control fiddling is what we want. I do agree with improving the logging situation around here, certainly. > > There could be a "startup.status" file I guess which would basically > > contain the last line of what would otherwise be in the log. But if it > > remains a textfile, I'm not sure what the gain is -- you'll just have > > to have the dba look in more places than one to find it? It's not like > > there's likely to be much other data written to the log during these > > times? > > Yeah, once you are talking about dumping stuff in a file, it's not > clear how that's better than progress-messages-in-the-log. People > already have a lot of tooling for looking at the postmaster log. Agreed. > I think the point of Robert's other proposal is to allow remote > checks of the restart's progress, so local files aren't much of > a substitute anyway. Yeah, being able to pick up on this remotely seems like it'd be quite nice. I'm not really thrilled with the idea, but the best I've got offhand for this would be a new role that's "pg_recovery_login" where an admin can GRANT that role to the roles they'd like to be able to use to login during the recovery process and then, for those roles, we write out flat files to allow authentication without access to pg_authid, whenever their password or such changes. It's certainly a bit grotty but I do think it'd work. I definitely don't want to go back to having all of pg_authid written as a flat file and I'd rather that existing tools and libraries work with this (meaning using the same port and speaking the PG protocol and such) rather than inventing some new thing that listens on some other port, etc. On the fence about tying this to 'pg_monitor' instead of using a new predefined role. Either way, I would definitely prefer to see the admin have to create a role and then GRANT the predefined role to that role. I really dislike the idea of having predefined roles that can be used to directly log into the database. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > Yeah, being able to pick up on this remotely seems like it'd be quite > nice. I'm not really thrilled with the idea, but the best I've got > offhand for this would be a new role that's "pg_recovery_login" where an > admin can GRANT that role to the roles they'd like to be able to use to > login during the recovery process and then, for those roles, we write > out flat files to allow authentication without access to pg_authid, We got rid of those flat files for good and sufficient reasons. I really really don't want to go back to having such. I wonder though whether we really need authentication here. pg_ping already exposes whether the database is up, to anyone who can reach the postmaster port at all. Would it be so horrible if the "can't accept connections" error message included a detail about "recovery is X% done"? regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Yeah, being able to pick up on this remotely seems like it'd be quite > > nice. I'm not really thrilled with the idea, but the best I've got > > offhand for this would be a new role that's "pg_recovery_login" where an > > admin can GRANT that role to the roles they'd like to be able to use to > > login during the recovery process and then, for those roles, we write > > out flat files to allow authentication without access to pg_authid, > > We got rid of those flat files for good and sufficient reasons. I really > really don't want to go back to having such. Yeah, certainly is part of the reason that I didn't really like that idea either. > I wonder though whether we really need authentication here. pg_ping > already exposes whether the database is up, to anyone who can reach the > postmaster port at all. Would it be so horrible if the "can't accept > connections" error message included a detail about "recovery is X% > done"? Ultimately it seems like it would depend on exactly what we are thinking of returning there. A simple percentage of recovery which has been completed doesn't seem like it'd really be revealing too much information though. Thanks, Stephen
Attachment
Hi, On 2021-04-19 13:55:13 -0400, Robert Haas wrote: > Another possible approach would be to accept connections for > monitoring purposes even during crash recovery. We can't allow access > to any database at that point, since the system might not be > consistent, but we could allow something like a replication connection > (the non-database-associated variant). Maybe it would be precisely a > replication connection and we'd just refuse all but a subset of > commands, or maybe it would be some other kinds of thing. But either > way you'd be able to issue a command in some mini-language saying "so, > tell me how startup is going" and it would reply with a result set of > some kind. The hard part about this seems to be how to perform authentication - obviously we can't do catalog lookups for users at that time. If that weren't the issue, we could easily do much better than now, by just providing an errdetail() with recovery progress information. But we presumably don't want to spray such information to unauthenticated connection attempts. I've vaguely wondered before whether it'd be worth having something like an "admin" socket somewhere in the data directory. Which explicitly wouldn't require authentication, have the cluster owner as the user, etc. That'd not just be useful for monitoring during recovery, but also make some interactions with the server easier for admin tools I think. > If I had to pick one of these two ideas, I'd pick the one the > log-based solution, since it seems easier to access and simplifies > retrospective analysis, but I suspect SQL access would be quite useful > for some users too, especially in cloud environments where "just log > into the machine and have a look" is not an option. However, leaving aside the implementation effort, the crazy idea above would not easily address the issue of only being accessible with local access... Greetings, Andres Freund
Hi, On 2021-04-20 14:56:58 -0400, Tom Lane wrote: > I wonder though whether we really need authentication here. pg_ping > already exposes whether the database is up, to anyone who can reach the > postmaster port at all. Would it be so horrible if the "can't accept > connections" error message included a detail about "recovery is X% > done"? Unfortunately I think something like a percentage is hard to calculate right now. Even just looking at crash recovery (vs replication or PITR), we don't currently know where the WAL ends without reading all the WAL. The easiest thing to return would be something in LSNs or bytes and I suspect that we don't want to expose either unauthenticated? I wonder if we ought to occasionally update something like ControlFileData->minRecoveryPoint on primaries, similar to what we do on standbys? Then we could actually calculate a percentage, and it'd have the added advantage of allowing to detect more cases where the end of the WAL was lost. Obviously we'd have to throttle it somehow, to avoid adding a lot of fsyncs, but that seems doable? Greetings, Andres Freund
On 2021-Apr-20, Andres Freund wrote: > On 2021-04-19 13:55:13 -0400, Robert Haas wrote: > > Another possible approach would be to accept connections for > > monitoring purposes even during crash recovery. We can't allow access > > to any database at that point, since the system might not be > > consistent, but we could allow something like a replication connection > > (the non-database-associated variant). Maybe it would be precisely a > > replication connection and we'd just refuse all but a subset of > > commands, or maybe it would be some other kinds of thing. But either > > way you'd be able to issue a command in some mini-language saying "so, > > tell me how startup is going" and it would reply with a result set of > > some kind. > > The hard part about this seems to be how to perform authentication - > obviously we can't do catalog lookups for users at that time. Maybe a way to do this would involve some sort of monitoring cookie that's obtained ahead of time (maybe at initdb time?) and is supplied to the frontend by some OOB means. Then frontend can present that during startup to the server, which ascertains its legitimacy without having to access catalogs. Perhaps it even requires a specific pg_hba.conf rule. -- Álvaro Herrera Valdivia, Chile "La verdad no siempre es bonita, pero el hambre de ella sí"
+1 for both log messages and allowing connections. I believe these two complement each other.
In the cloud world, we oftentimes want to monitor the progress of the recovery without connecting to the server as the operators don't necessarily have the required permissions to connect and query. Secondly, having this information in the log helps going back in time and understand where Postgres spent time during recovery.
The ability to query the server provides real time information and come handy.
Thanks,
Satya
On Mon, Apr 19, 2021 at 10:55 AM Robert Haas <robertmhaas@gmail.com> wrote:
Hi,
I've noticed that customers not infrequently complain that they start
postgres and then the system doesn't come up for a while and they have
no idea what's going on and are (understandably) worried. There are
probably a number of reasons why this can happen, but the ones that
seem to come up most often in my experience are (1) SyncDataDirectory
takes a long time, (b) ResetUnloggedRelations takes a long time, and
(c) there's a lot of WAL to apply so that takes a long time. It's
possible to distinguish this last case from the other two by looking
at the output of 'ps', but that's not super-convenient if your normal
method of access to the server is via libpq, and it only works if you
are monitoring it as it's happening rather than looking at the logs
after-the-fact. I am not sure there's any real way to distinguish the
other two cases without using strace or gdb or similar.
It seems to me that we could do better. One approach would be to try
to issue a log message periodically - maybe once per minute, or some
configurable interval, e.g. perhaps add messages something like this:
LOG: still syncing data directory, elapsed time %ld.%03d ms, current path %s
LOG: data directory sync complete after %ld.%03d ms
LOG: still resetting unlogged relations, elapsed time %ld.%03d ms,
current path %s
LOG: unlogged relations reset after %ld.%03d ms
LOG: still performing crash recovery, elapsed time %ld.%03d ms,
current LSN %08X/%08X
We already have a message when redo is complete, so there's no need
for another one. The implementation here doesn't seem too hard either:
the startup process would set a timer, when the timer expires the
signal handler sets a flag, at a convenient point we notice the flag
is set and responding by printing a message and clearing the flag.
Another possible approach would be to accept connections for
monitoring purposes even during crash recovery. We can't allow access
to any database at that point, since the system might not be
consistent, but we could allow something like a replication connection
(the non-database-associated variant). Maybe it would be precisely a
replication connection and we'd just refuse all but a subset of
commands, or maybe it would be some other kinds of thing. But either
way you'd be able to issue a command in some mini-language saying "so,
tell me how startup is going" and it would reply with a result set of
some kind.
If I had to pick one of these two ideas, I'd pick the one the
log-based solution, since it seems easier to access and simplifies
retrospective analysis, but I suspect SQL access would be quite useful
for some users too, especially in cloud environments where "just log
into the machine and have a look" is not an option.
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-04-20 14:56:58 -0400, Tom Lane wrote: > > I wonder though whether we really need authentication here. pg_ping > > already exposes whether the database is up, to anyone who can reach the > > postmaster port at all. Would it be so horrible if the "can't accept > > connections" error message included a detail about "recovery is X% > > done"? > > Unfortunately I think something like a percentage is hard to calculate > right now. Even just looking at crash recovery (vs replication or > PITR), we don't currently know where the WAL ends without reading all > the WAL. The easiest thing to return would be something in LSNs or > bytes and I suspect that we don't want to expose either unauthenticated? While it obviously wouldn't be exactly accurate, I wonder if we couldn't just look at the WAL files we have to reply and then guess that we'll go through about half of them before we reach the end..? I mean, wouldn't exactly be the first time that a percentage progress report wasn't completely accurate. :) > I wonder if we ought to occasionally update something like > ControlFileData->minRecoveryPoint on primaries, similar to what we do on > standbys? Then we could actually calculate a percentage, and it'd have > the added advantage of allowing to detect more cases where the end of > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid > adding a lot of fsyncs, but that seems doable? This seems to go against Tom's concerns wrt rewriting pg_control. Perhaps we could work through a solution to that, which would be nice, but I'm not sure that we need the percentage to be super accurate anyway, though, ideally, we'd work it out so that it's always increasing and doesn't look "stuck" as long as we're actually moving forward through the WAL. Maybe a heuristic of 'look at the end of the WAL files, assume we'll go through 50% of it, but only consider that to be 90%, with the last 10% going from half-way through the WAL to the actual end of the WAL available." Yes, such heuristics are terrible, but they're also relatively simple and wouldn't require tracking anything additional and would, maybe, avoid the concern about needing to authenticate the user.. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Andres Freund (andres@anarazel.de) wrote: >> On 2021-04-20 14:56:58 -0400, Tom Lane wrote: >>> I wonder though whether we really need authentication here. pg_ping >>> already exposes whether the database is up, to anyone who can reach the >>> postmaster port at all. Would it be so horrible if the "can't accept >>> connections" error message included a detail about "recovery is X% >>> done"? >> Unfortunately I think something like a percentage is hard to calculate >> right now. > While it obviously wouldn't be exactly accurate, I wonder if we couldn't > just look at the WAL files we have to reply and then guess that we'll go > through about half of them before we reach the end..? I mean, wouldn't > exactly be the first time that a percentage progress report wasn't > completely accurate. :) Or we could skip all the guessing and just print something like what the startup process exposes in ps status, ie "currently processing WAL file so-and-so". regards, tom lane
Hi, On 2021-04-21 14:36:24 -0400, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > Unfortunately I think something like a percentage is hard to calculate > > right now. Even just looking at crash recovery (vs replication or > > PITR), we don't currently know where the WAL ends without reading all > > the WAL. The easiest thing to return would be something in LSNs or > > bytes and I suspect that we don't want to expose either unauthenticated? > > While it obviously wouldn't be exactly accurate, I wonder if we couldn't > just look at the WAL files we have to reply and then guess that we'll go > through about half of them before we reach the end..? I mean, wouldn't > exactly be the first time that a percentage progress report wasn't > completely accurate. :) I don't think that'd work well, due to WAL segment recycling. We rename WAL files into place when removing them, and sometimes that can be a *lot* of files. It's one thing for there to be a ~20% inaccuracy in estimated amount of work, another to have misestimates on the order of magnitudes. > > I wonder if we ought to occasionally update something like > > ControlFileData->minRecoveryPoint on primaries, similar to what we do on > > standbys? Then we could actually calculate a percentage, and it'd have > > the added advantage of allowing to detect more cases where the end of > > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid > > adding a lot of fsyncs, but that seems doable? > > This seems to go against Tom's concerns wrt rewriting pg_control. I don't think that concern equally applies for what I am proposing here. For one, we already have minRecoveryPoint in ControlData, and we already use it for the purpose of determining where we need to recover to, albeit only during crash recovery. Imo that's substantially different from adding actual recovery progress status information to the control file. I also think that it'd actually be a significant reliability improvement if we maintained an approximate minRecoveryPoint during normal running: I've seen way too many cases where WAL files were lost / removed and crash recovery just started up happily. Only hitting problems months down the line. Yes, it'd obviously not bullet proof, since we'd not want to add a significant stream of new fsyncs, but IME such WAL files lost/removed issues tend not to be about a few hundred bytes of WAL but many segments missing. Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-04-21 14:36:24 -0400, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > Unfortunately I think something like a percentage is hard to calculate > > > right now. Even just looking at crash recovery (vs replication or > > > PITR), we don't currently know where the WAL ends without reading all > > > the WAL. The easiest thing to return would be something in LSNs or > > > bytes and I suspect that we don't want to expose either unauthenticated? > > > > While it obviously wouldn't be exactly accurate, I wonder if we couldn't > > just look at the WAL files we have to reply and then guess that we'll go > > through about half of them before we reach the end..? I mean, wouldn't > > exactly be the first time that a percentage progress report wasn't > > completely accurate. :) > > I don't think that'd work well, due to WAL segment recycling. We rename > WAL files into place when removing them, and sometimes that can be a > *lot* of files. It's one thing for there to be a ~20% inaccuracy in > estimated amount of work, another to have misestimates on the order of > magnitudes. I mean- we actively try to guess at how many WAL files we'll need during each checkpoint and if we're doing that decently then it'd hopefully be on about the order of half the files, as I suggested, that we'll end up going through at any point in time. Naturally, it'll be different if there's a forced checkpoint or a sudden spike of activity, but I'm not sure that it's an entirely unreasonable place to start if we're going to be guessing at it. > > > I wonder if we ought to occasionally update something like > > > ControlFileData->minRecoveryPoint on primaries, similar to what we do on > > > standbys? Then we could actually calculate a percentage, and it'd have > > > the added advantage of allowing to detect more cases where the end of > > > the WAL was lost. Obviously we'd have to throttle it somehow, to avoid > > > adding a lot of fsyncs, but that seems doable? > > > > This seems to go against Tom's concerns wrt rewriting pg_control. > > I don't think that concern equally applies for what I am proposing > here. For one, we already have minRecoveryPoint in ControlData, and we > already use it for the purpose of determining where we need to recover > to, albeit only during crash recovery. Imo that's substantially > different from adding actual recovery progress status information to the > control file. I agree that it's not the same as adding actual recovery progress status information. > I also think that it'd actually be a significant reliability improvement > if we maintained an approximate minRecoveryPoint during normal running: > I've seen way too many cases where WAL files were lost / removed and > crash recovery just started up happily. Only hitting problems months > down the line. Yes, it'd obviously not bullet proof, since we'd not want > to add a significant stream of new fsyncs, but IME such WAL files > lost/removed issues tend not to be about a few hundred bytes of WAL but > many segments missing. I do agree that it's definitely a problem and one that I've seen as well where we think we reach the end of recovery even though we didn't actually. Having a way to avoid that happening would be quite nice. It does seem like we have some trade-offs here to weigh, but pg_control is indeed quite small.. Thanks, Stephen
Attachment
Hi, On 2021-04-21 15:51:38 -0400, Stephen Frost wrote: > It does seem like we have some trade-offs here to weigh, but > pg_control is indeed quite small.. What do you mean by that? That the overhead of writing it out more frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add more fields to it? Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2021-04-21 15:51:38 -0400, Stephen Frost wrote: > > It does seem like we have some trade-offs here to weigh, but > > pg_control is indeed quite small.. > > What do you mean by that? That the overhead of writing it out more > frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add > more fields to it? Mostly just that the added overhead in writing it out more frequently wouldn't be too bad. Adding fields runs the risk of crossing the threshold where we feel that we can safely assume all of it will make it to disk in one shot and therefore there's more reason to not add extra fields to it, if possible. Seems the missing bit here is "how often, and how do we make that happen?" and then we can discuss if there's reason to be concerned that it would be 'too frequent' or cause too much additional overhead in terms of IO/fsyncs. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Andres Freund (andres@anarazel.de) wrote: >> What do you mean by that? That the overhead of writing it out more >> frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add >> more fields to it? > Mostly just that the added overhead in writing it out more frequently > wouldn't be too bad. My concern about it was not at all about performance, but that every time you write it is a new opportunity for the filesystem to lose or corrupt the data. regards, tom lane
Hi, On 2021-04-21 16:28:26 -0400, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2021-04-21 15:51:38 -0400, Stephen Frost wrote: > > > It does seem like we have some trade-offs here to weigh, but > > > pg_control is indeed quite small.. > > > > What do you mean by that? That the overhead of writing it out more > > frequently wouldn't be too bad? Or that we shouldn't "unnecessarily" add > > more fields to it? > > Mostly just that the added overhead in writing it out more frequently > wouldn't be too bad. > > Seems the missing bit here is "how often, and how do we make that > happen?" and then we can discuss if there's reason to be concerned that > it would be 'too frequent' or cause too much additional overhead in > terms of IO/fsyncs. The number of writes and the number of fsyncs of the control file wouldn't necessarily have to be the same. We could e.g. update the file once per segment, but only fsync it at a lower cadence. We already rely on handling writes-without-fsync of the control file (which is trivial due to the <= 512 byte limit). Another interesting question is where we'd do the update from. It seems like it ought to be some background process: I can see doing it in the checkpointer - but there's a few phases that can take a while (e.g. sync) where currently don't call something like CheckpointWriteDelay() on a regular basis. I also can see doing it in bgwriter - none of the work it does should take all that long, and minor increases in latency ought not to have much of an impact. Wal writer seems less suitable, some workloads are sensitive to it not getting around doing what it ought to do. > Adding fields runs the risk of crossing the > threshold where we feel that we can safely assume all of it will make it > to disk in one shot and therefore there's more reason to not add extra > fields to it, if possible. Yea, we really should stay below 512 bytes (sector size). We're at 296 right now, with 20 bytes lost to padding. If we got close to the limit we could easily move some of the contents out of pg_control - we e.g. don't need to write out all the compile time values all the time, they could live in a file similar to PG_VERSION instead. So I'm not too concerned right now. But we also don't need to add anything, given that we already have minRecoveryPoint. Greetings, Andres Freund
Hi, On 2021-04-21 16:55:28 -0400, Tom Lane wrote: > My concern about it was not at all about performance, but that every time > you write it is a new opportunity for the filesystem to lose or corrupt > the data. We already do, sometimes very frequent, control file updates on standbys to update minRecoveryLSN. I don't recall reports of that causing corruption issues. So I'd not be too concerned about that aspect? Greetings, Andres Freund
Greetings,
On Wed, Apr 21, 2021 at 17:01 Andres Freund <andres@anarazel.de> wrote:
On 2021-04-21 16:55:28 -0400, Tom Lane wrote:
> My concern about it was not at all about performance, but that every time
> you write it is a new opportunity for the filesystem to lose or corrupt
> the data.
We already do, sometimes very frequent, control file updates on standbys
to update minRecoveryLSN. I don't recall reports of that causing
corruption issues. So I'd not be too concerned about that aspect?
Or perhaps we should consider having multiple copies..? Though I definitely have seen missing WAL causing difficult to realize / detect corruption more than corrupt pg_control files...
Thanks,
Stephen
On Wed, 21 Apr 2021 12:36:05 -0700 Andres Freund <andres@anarazel.de> wrote: > [...] > > I don't think that concern equally applies for what I am proposing > here. For one, we already have minRecoveryPoint in ControlData, and we > already use it for the purpose of determining where we need to recover > to, albeit only during crash recovery. Imo that's substantially > different from adding actual recovery progress status information to the > control file. Just for the record, when I was talking about updating status of the startup in the controldata, I was thinking about setting the last known LSN replayed. Not some kind of variable string. > > I also think that it'd actually be a significant reliability improvement > if we maintained an approximate minRecoveryPoint during normal running: > I've seen way too many cases where WAL files were lost / removed and > crash recovery just started up happily. Only hitting problems months > down the line. Yes, it'd obviously not bullet proof, since we'd not want > to add a significant stream of new fsyncs, but IME such WAL files > lost/removed issues tend not to be about a few hundred bytes of WAL but > many segments missing. Maybe setting this minRecoveryPoint once per segment would be enough, near from the beginning of the WAL. At least, the recovery process would be forced to actually replay until the very last known segment. Regards,
Hi, As nobody opposed the log based approach, I thought of creating a patch using this approach. Please find the patch attached. Introduced the new GUC variable 'log_min_duration_startup_process' which indicates the time period after every time period it logs the progress of the process startup when it is set to a value (In millisecond) greater than zero. if it is set to zero, then it logs all available messages. If it is set to -1, then disables the feature. > There are probably a number of reasons why this can happen, but the > ones that seem to come up most often in my experience are > (1) SyncDataDirectory takes a long time, (b) ResetUnloggedRelations > takes a long time, and (c) there's a lot of WAL to apply so that takes a > long time. I have added the proper logs in all of the above scenarios. Following is the sample log displayed during server startup when the time period is set to 10ms. 2021-06-04 19:40:06.390 IST [51116] LOG: Syncing data directory, elapsed time: 14.165 ms, current path: ./base/13892/16384_fsm 2021-06-04 19:40:06.399 IST [51116] LOG: Syncing data directory completed after 22.661 ms 2021-06-04 19:40:06.399 IST [51116] LOG: database system was not properly shut down; automatic recovery in progress 2021-06-04 19:40:06.401 IST [51116] LOG: Resetting unlogged relations completed after 0.219 ms 2021-06-04 19:40:06.401 IST [51116] LOG: redo starts at 0/4728B88 2021-06-04 19:40:06.411 IST [51116] LOG: Performing crash recovery, elapsed time: 10.002 ms, current LSN: 0/47AA998 2021-06-04 19:40:06.421 IST [51116] LOG: Performing crash recovery, elapsed time: 20.002 ms, current LSN: 0/4838D80 2021-06-04 19:40:06.431 IST [51116] LOG: Performing crash recovery, elapsed time: 30.002 ms, current LSN: 0/48DA718 2021-06-04 19:40:06.441 IST [51116] LOG: Performing crash recovery, elapsed time: 40.002 ms, current LSN: 0/49791C0 . . . 2021-06-04 19:40:07.222 IST [51116] LOG: Performing crash recovery, elapsed time: 820.805 ms, current LSN: 0/76F6F10 2021-06-04 19:40:07.227 IST [51116] LOG: invalid record length at 0/774E758: wanted 24, got 0 2021-06-04 19:40:07.227 IST [51116] LOG: redo done at 0/774E730 system usage: CPU: user: 0.77 s, system: 0.03 s, elapsed: 0.82 s Kindly let me know if any changes are required. Thanks & Regards, Nitin Jadhav On Thu, Apr 22, 2021 at 4:39 AM Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > > On Wed, 21 Apr 2021 12:36:05 -0700 > Andres Freund <andres@anarazel.de> wrote: > > > [...] > > > > I don't think that concern equally applies for what I am proposing > > here. For one, we already have minRecoveryPoint in ControlData, and we > > already use it for the purpose of determining where we need to recover > > to, albeit only during crash recovery. Imo that's substantially > > different from adding actual recovery progress status information to the > > control file. > > Just for the record, when I was talking about updating status of the startup > in the controldata, I was thinking about setting the last known LSN replayed. > Not some kind of variable string. > > > > > I also think that it'd actually be a significant reliability improvement > > if we maintained an approximate minRecoveryPoint during normal running: > > I've seen way too many cases where WAL files were lost / removed and > > crash recovery just started up happily. Only hitting problems months > > down the line. Yes, it'd obviously not bullet proof, since we'd not want > > to add a significant stream of new fsyncs, but IME such WAL files > > lost/removed issues tend not to be about a few hundred bytes of WAL but > > many segments missing. > > Maybe setting this minRecoveryPoint once per segment would be enough, near > from the beginning of the WAL. At least, the recovery process would be > forced to actually replay until the very last known segment. > > Regards, > >
Attachment
On Fri, Jun 04, 2021 at 07:49:21PM +0530, Nitin Jadhav wrote: > I have added the proper logs in all of the above scenarios. > > Following is the sample log displayed during server startup when the > time period is set to 10ms. > > 2021-06-04 19:40:06.390 IST [51116] LOG: Syncing data directory, elapsed time: 14.165 ms, current path: ./base/13892/16384_fsm > 2021-06-04 19:40:06.399 IST [51116] LOG: Syncing data directory completed after 22.661 ms |2021-06-04 19:40:07.222 IST [51116] LOG: Performing crash recovery, elapsed time: 820.805 ms, current LSN: 0/76F6F10 |2021-06-04 19:40:07.227 IST [51116] LOG: invalid record length at 0/774E758: wanted 24, got 0 |2021-06-04 19:40:07.227 IST [51116] LOG: redo done at 0/774E730 system usage: CPU: user: 0.77 s, system: 0.03 s, elapsed:0.82 s Should it show the rusage ? It's shown at startup completion since 10a5b35a0, so it seems strange not to show it here. + log_startup_process_progress("Syncing data directory", path, false); I think the fsync vs syncfs paths should be distinguished: "Syncing data directory (fsync)" vs "Syncing data directory (syncfs)". + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, I think it should be PGC_SIGHUP, to allow changing it during runtime. Obviously it has no effect except during startup, but the change will be effective if the current process crashes. See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com +extern void log_startup_process_progress(char *operation, void *data, + bool is_complete); I think this should take an enum operation, rather than using strcmp() on it later. The enum values might be RECOVERY_START, RECOVERY_END, rather than having a bool is_complete. -- Justin
On Sun, Jun 6, 2021 at 6:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Should it show the rusage ? It's shown at startup completion since 10a5b35a0, > so it seems strange not to show it here. I don't know, that seems like it's going to make the messages awfully long, and I'm not sure of what use it is to see that for every report. I don't like the name very much. log_min_duration_startup_process seems to have been chosen to correspond to log_min_duration_statement, but the semantics are different. That one is a threshold, whereas this one is an interval. Maybe something like log_startup_progress_interval? As far as the patch itself goes, I think that the overhead of this approach is going to be unacceptably high. I was imagining having a timer running in the background that fires periodically, with the interval handler just setting a flag. Then in the foreground we just need to check whether the flag is set. I doubt that we can get away with a GetCurrentTimestamp() after applying every WAL record ... that seems like it will be slow. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > ... I doubt that we can get away > with a GetCurrentTimestamp() after applying every WAL record ... that > seems like it will be slow. Yeah, that's going to be pretty awful even on machines with fast gettimeofday, never mind ones where it isn't. It should be possible to use utils/misc/timeout.c to manage the interrupt, I'd think. regards, tom lane
> Should it show the rusage ? It's shown at startup completion since 10a5b35a0, > so it seems strange not to show it here. > I don't know, that seems like it's going to make the messages awfully > long, and I'm not sure of what use it is to see that for every report. I have not changed anything wrt this. If it is really required to change, then I will change. > + log_startup_process_progress("Syncing data directory", path, false); > I think the fsync vs syncfs paths should be distinguished: "Syncing data > directory (fsync)" vs "Syncing data directory (syncfs)". Fixed. > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > Obviously it has no effect except during startup, but the change will be > effective if the current process crashes. > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com I did not get exactly how it will change behaviour. In my understanding, when the server restarts after a crash, it fetches the value from the config file. So if there is any change that gets affected. Kindly correct me if I am wrong. > +extern void log_startup_process_progress(char *operation, void *data, > + bool is_complete); > > I think this should take an enum operation, rather than using strcmp() on it > later. The enum values might be RECOVERY_START, RECOVERY_END, rather than > having a bool is_complete. Fixed. > I don't like the name very much. log_min_duration_startup_process > seems to have been chosen to correspond to log_min_duration_statement, > but the semantics are different. That one is a threshold, whereas this > one is an interval. Maybe something like > log_startup_progress_interval? Yes. This looks more appropriate. Fixed in the attached patch. > As far as the patch itself goes, I think that the overhead of this > approach is going to be unacceptably high. I was imagining having a > timer running in the background that fires periodically, with the > interval handler just setting a flag. Then in the foreground we just > need to check whether the flag is set. I doubt that we can get away > with a GetCurrentTimestamp() after applying every WAL record ... that > seems like it will be slow. Thanks for correcting me. This approach is far better than what I had used earlier. I have done the code changes as per your approach in the attached patch. Kindly let me know if any changes are required. Thanks & Regards, Nitin Jadhav On Mon, Jun 7, 2021 at 7:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > ... I doubt that we can get away > > with a GetCurrentTimestamp() after applying every WAL record ... that > > seems like it will be slow. > > Yeah, that's going to be pretty awful even on machines with fast > gettimeofday, never mind ones where it isn't. > > It should be possible to use utils/misc/timeout.c to manage the > interrupt, I'd think. > > regards, tom lane
Attachment
On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote: > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > Obviously it has no effect except during startup, but the change will be > > effective if the current process crashes. > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > I did not get exactly how it will change behaviour. In my > understanding, when the server restarts after a crash, it fetches the > value from the config file. So if there is any change that gets > affected. Kindly correct me if I am wrong. I don't think so. I checked and SelectConfigFiles is called only once to read config files and cmdline args. And not called on restart_after_crash. The GUC definitely isn't SUSET, since it's not useful to write in a (super) user session SET log_min_duration_startup_process=123. I've triple checked the behavior using a patch I submitted for Thomas' syncfs feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then a future crash uses syncfs. https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com -- Justin
> > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > Obviously it has no effect except during startup, but the change will be > > > effective if the current process crashes. > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > I did not get exactly how it will change behaviour. In my > > understanding, when the server restarts after a crash, it fetches the > > value from the config file. So if there is any change that gets > > affected. Kindly correct me if I am wrong. Sorry my understanding was wrong. But I'm not completely convinced with the above description saying that the change will be effective if the current process crashes. AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config file will not get affected during restart after crash. If the GucContext is greater than or equal to PGC_SIGHUP, then any change in the config file will be changed once it receives the SIGHUP signal. So it gets affected by a restart after a crash. So since the GucContext set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no change in the behaviour wrt this point. > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > a future crash uses syncfs. > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com The difference is since the behaviour is compared between PGC_POSTMASTER and PGC_SIGHUP. > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > user session SET log_min_duration_startup_process=123. I agree with this. I may have to change this value as setting in a user session is not at all useful. But I am confused between PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would like to allow the change during restart after a crash. Otherwise PGC_POSTMASTER would be sufficient. Kindly share your thoughts. Thanks & Regards, Nitin Jadhav On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote: > > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > Obviously it has no effect except during startup, but the change will be > > > effective if the current process crashes. > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > I did not get exactly how it will change behaviour. In my > > understanding, when the server restarts after a crash, it fetches the > > value from the config file. So if there is any change that gets > > affected. Kindly correct me if I am wrong. > > I don't think so. I checked and SelectConfigFiles is called only once to read > config files and cmdline args. And not called on restart_after_crash. > > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > user session SET log_min_duration_startup_process=123. > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > a future crash uses syncfs. > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > -- > Justin
On Thu, Jun 10, 2021 at 03:19:20PM +0530, Nitin Jadhav wrote: > > > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > > Obviously it has no effect except during startup, but the change will be > > > > effective if the current process crashes. > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > > > I did not get exactly how it will change behaviour. In my > > > understanding, when the server restarts after a crash, it fetches the > > > value from the config file. So if there is any change that gets > > > affected. Kindly correct me if I am wrong. > > Sorry my understanding was wrong. But I'm not completely convinced > with the above description saying that the change will be effective if > the current process crashes. > AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is > either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config > file will not get affected during restart after crash. If the > GucContext is greater than or equal to PGC_SIGHUP, then any change in > the config file will be changed once it receives the SIGHUP signal. So > it gets affected by a restart after a crash. So since the GucContext > set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no > change in the behaviour wrt this point. Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow changing the value without restart, doesn't it follow that SIGHUP is what's wanted ? > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > > a future crash uses syncfs. > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > The difference is since the behaviour is compared between > PGC_POSTMASTER and PGC_SIGHUP. > > > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > > user session SET log_min_duration_startup_process=123. > I agree with this. I may have to change this value as setting in a > user session is not at all useful. But I am confused between > PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would > like to allow the change during restart after a crash. Otherwise > PGC_POSTMASTER would be sufficient. Kindly share your thoughts. > > Thanks & Regards, > Nitin Jadhav > > On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote: > > > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > > Obviously it has no effect except during startup, but the change will be > > > > effective if the current process crashes. > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > > > I did not get exactly how it will change behaviour. In my > > > understanding, when the server restarts after a crash, it fetches the > > > value from the config file. So if there is any change that gets > > > affected. Kindly correct me if I am wrong. > > > > I don't think so. I checked and SelectConfigFiles is called only once to read > > config files and cmdline args. And not called on restart_after_crash. > > > > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > > user session SET log_min_duration_startup_process=123. > > > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > > a future crash uses syncfs. > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
> Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow > changing the value without restart, doesn't it follow that SIGHUP is what's > wanted ? Yes. I have done the changes in the attached patch. Apart from this, I have done a few other changes to the patch. The changes include 1. Renamed 'InitCurrentOperation' to 'InitStartupProgress()'. 2. Divided the functionality of 'LogStartupProgress()' into 2 parts. One for logging the progress and the other to log the completion information. The first part's function name remains as is and a new function 'CloseStartupProgress()' added for the second part. 3. In case of any invalid operations found during logging of the startup progress, throwing an error. This is not a concern unless the developer makes a mistake. 4. Modified the 'StartupProcessOp' enums like 'FSYNC_START' to 'FSYNC_IN_PROGRESS' for better readability. 5. Updated the comments and some cosmetic changes. Kindly share your comments. Thanks & Regards, Nitin Jadhav On Thu, Jun 10, 2021 at 3:31 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Jun 10, 2021 at 03:19:20PM +0530, Nitin Jadhav wrote: > > > > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > > > Obviously it has no effect except during startup, but the change will be > > > > > effective if the current process crashes. > > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > > > > > I did not get exactly how it will change behaviour. In my > > > > understanding, when the server restarts after a crash, it fetches the > > > > value from the config file. So if there is any change that gets > > > > affected. Kindly correct me if I am wrong. > > > > Sorry my understanding was wrong. But I'm not completely convinced > > with the above description saying that the change will be effective if > > the current process crashes. > > AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is > > either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config > > file will not get affected during restart after crash. If the > > GucContext is greater than or equal to PGC_SIGHUP, then any change in > > the config file will be changed once it receives the SIGHUP signal. So > > it gets affected by a restart after a crash. So since the GucContext > > set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no > > change in the behaviour wrt this point. > > Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow > changing the value without restart, doesn't it follow that SIGHUP is what's > wanted ? > > > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > > > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > > > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > > > a future crash uses syncfs. > > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > The difference is since the behaviour is compared between > > PGC_POSTMASTER and PGC_SIGHUP. > > > > > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > > > user session SET log_min_duration_startup_process=123. > > I agree with this. I may have to change this value as setting in a > > user session is not at all useful. But I am confused between > > PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would > > like to allow the change during restart after a crash. Otherwise > > PGC_POSTMASTER would be sufficient. Kindly share your thoughts. > > > > Thanks & Regards, > > Nitin Jadhav > > > > On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote: > > > > > + {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN, > > > > > > > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime. > > > > > Obviously it has no effect except during startup, but the change will be > > > > > effective if the current process crashes. > > > > > See also: https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com > > > > > > > > I did not get exactly how it will change behaviour. In my > > > > understanding, when the server restarts after a crash, it fetches the > > > > value from the config file. So if there is any change that gets > > > > affected. Kindly correct me if I am wrong. > > > > > > I don't think so. I checked and SelectConfigFiles is called only once to read > > > config files and cmdline args. And not called on restart_after_crash. > > > > > > The GUC definitely isn't SUSET, since it's not useful to write in a (super) > > > user session SET log_min_duration_startup_process=123. > > > > > > I've triple checked the behavior using a patch I submitted for Thomas' syncfs > > > feature. ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when > > > I sent SIGABRT. But with my patch, if I also do SELECT pg_reload_conf(), then > > > a future crash uses syncfs. > > > https://www.postgresql.org/message-id/20210526001359.GE3676@telsasoft.com
Attachment
+ * Codes of the operations performed during startup process + */ +typedef enum StartupProcessOp +{ + SYNCFS_IN_PROGRESS, + FSYNC_IN_PROGRESS, + RECOVERY_IN_PROGRESS, + RESET_UNLOGGED_REL_IN_PROGRESS, + DUMMY, + SYNCFS_END, + FSYNC_END, + RECOVERY_END, + RESET_UNLOGGED_REL_END +} StartupProcessOp; What is DUMMY about ? If you just want to separate the "start" from "end", you could write: /* codes for start of operations */ FSYNC_IN_PROGRESS SYNCFS_IN_PROGRESS ... /* codes for end of operations */ FSYNC_END SYNCFS_END ... Or group them together like: FSYNC_IN_PROGRESS, FSYNC_END, SYNCFS_IN_PROGRESS, SYNCFS_END, RECOVERY_IN_PROGRESS, RECOVERY_END, RESET_UNLOGGED_REL_IN_PROGRESS, RESET_UNLOGGED_REL_END, -- Justin
> What is DUMMY about ? If you just want to separate the "start" from "end", > you could write: > > /* codes for start of operations */ > FSYNC_IN_PROGRESS > SYNCFS_IN_PROGRESS > ... > /* codes for end of operations */ > FSYNC_END > SYNCFS_END > ... That was by mistake and I have corrected it in the attached patch. Thanks & Regards, Nitin Jadhav On Thu, Jun 17, 2021 at 6:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > + * Codes of the operations performed during startup process > + */ > +typedef enum StartupProcessOp > +{ > + SYNCFS_IN_PROGRESS, > + FSYNC_IN_PROGRESS, > + RECOVERY_IN_PROGRESS, > + RESET_UNLOGGED_REL_IN_PROGRESS, > + DUMMY, > + SYNCFS_END, > + FSYNC_END, > + RECOVERY_END, > + RESET_UNLOGGED_REL_END > +} StartupProcessOp; > > What is DUMMY about ? If you just want to separate the "start" from "end", > you could write: > > /* codes for start of operations */ > FSYNC_IN_PROGRESS > SYNCFS_IN_PROGRESS > ... > /* codes for end of operations */ > FSYNC_END > SYNCFS_END > ... > > Or group them together like: > > FSYNC_IN_PROGRESS, > FSYNC_END, > SYNCFS_IN_PROGRESS, > SYNCFS_END, > RECOVERY_IN_PROGRESS, > RECOVERY_END, > RESET_UNLOGGED_REL_IN_PROGRESS, > RESET_UNLOGGED_REL_END, > > -- > Justin
Attachment
Few comments for v4 patch: @@ -7351,6 +7363,8 @@ StartupXLOG(void) (errmsg("redo starts at %X/%X", LSN_FORMAT_ARGS(ReadRecPtr)))); + InitStartupProgress(); + /* * main redo apply loop */ @@ -7358,6 +7372,8 @@ StartupXLOG(void) { bool switchedTLI = false; + LogStartupProgress(RECOVERY_IN_PROGRESS, NULL); + #ifdef WAL_DEBUG if (XLOG_DEBUG || (rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) || @@ -7569,6 +7585,8 @@ StartupXLOG(void) * end of main redo apply loop */ + CloseStartupProgress(RECOVERY_END); I am not sure I am getting the code flow correctly. From CloseStartupProgress() naming it seems, it corresponds to InitStartupProgress() but what it is doing is similar to LogStartupProgress(). I think it should be renamed to be inlined with LogStartupProgress(), IMO. --- + + /* Return if any invalid operation */ + if (operation >= SYNCFS_END) + return; .... + /* Return if any invalid operation */ + if (operation < SYNCFS_END) + return; + This part should be an assertion, it's the developer's responsibility to call correctly. --- +/* + * Codes of the operations performed during startup process + */ +typedef enum StartupProcessOp +{ + /* Codes for in-progress operations */ + SYNCFS_IN_PROGRESS, + FSYNC_IN_PROGRESS, + RECOVERY_IN_PROGRESS, + RESET_UNLOGGED_REL_IN_PROGRESS, + /* Codes for end of operations */ + SYNCFS_END, + FSYNC_END, + RECOVERY_END, + RESET_UNLOGGED_REL_END +} StartupProcessOp; + Since we do have a separate call for the in-progress operation and the end-operation, only a single enum would have been enough. If we do this, then I think we should remove get_startup_process_operation_string() move messages to the respective function. --- Also, with your patch "make check-world" has few failures, kindly check that. Regards, Amul On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > What is DUMMY about ? If you just want to separate the "start" from "end", > > you could write: > > > > /* codes for start of operations */ > > FSYNC_IN_PROGRESS > > SYNCFS_IN_PROGRESS > > ... > > /* codes for end of operations */ > > FSYNC_END > > SYNCFS_END > > ... > > That was by mistake and I have corrected it in the attached patch. > > Thanks & Regards, > Nitin Jadhav > > On Thu, Jun 17, 2021 at 6:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > + * Codes of the operations performed during startup process > > + */ > > +typedef enum StartupProcessOp > > +{ > > + SYNCFS_IN_PROGRESS, > > + FSYNC_IN_PROGRESS, > > + RECOVERY_IN_PROGRESS, > > + RESET_UNLOGGED_REL_IN_PROGRESS, > > + DUMMY, > > + SYNCFS_END, > > + FSYNC_END, > > + RECOVERY_END, > > + RESET_UNLOGGED_REL_END > > +} StartupProcessOp; > > > > What is DUMMY about ? If you just want to separate the "start" from "end", > > you could write: > > > > /* codes for start of operations */ > > FSYNC_IN_PROGRESS > > SYNCFS_IN_PROGRESS > > ... > > /* codes for end of operations */ > > FSYNC_END > > SYNCFS_END > > ... > > > > Or group them together like: > > > > FSYNC_IN_PROGRESS, > > FSYNC_END, > > SYNCFS_IN_PROGRESS, > > SYNCFS_END, > > RECOVERY_IN_PROGRESS, > > RECOVERY_END, > > RESET_UNLOGGED_REL_IN_PROGRESS, > > RESET_UNLOGGED_REL_END, > > > > -- > > Justin
Hi, I'd really like to see this enabled by default, say with a default interval of 10 seconds. If it has to be enabled explicitly, most people won't, but I think a lot of people would benefit from knowing why their system is slow to start up when that sort of thing happens. I don't see much downside to having it on by default either, since it shouldn't be expensive. I think the GUC's units should be seconds, not milliseconds, though. I tried starting the server with log_startup_progress_interval=1000 and then crashing it to see what the output looked like. I got this: 2021-07-09 15:49:55.956 EDT [99033] LOG: all server processes terminated; reinitializing 2021-07-09 15:49:55.970 EDT [99106] LOG: database system was interrupted; last known up at 2021-07-09 15:48:39 EDT 2021-07-09 15:49:56.499 EDT [99106] LOG: Data directory sync (fsync) complete after 529.673 ms 2021-07-09 15:49:56.501 EDT [99106] LOG: database system was not properly shut down; automatic recovery in progress 2021-07-09 15:49:56.503 EDT [99106] LOG: redo starts at 0/223494A8 2021-07-09 15:49:57.504 EDT [99106] LOG: Performing crash recovery, elapsed time: 1000.373 ms, current LSN: 0/40A3F888 2021-07-09 15:49:58.505 EDT [99106] LOG: Performing crash recovery, elapsed time: 2001.449 ms, current LSN: 0/41F89388 2021-07-09 15:49:59.505 EDT [99106] LOG: Performing crash recovery, elapsed time: 3001.602 ms, current LSN: 0/55745760 2021-07-09 15:50:00.506 EDT [99106] LOG: Performing crash recovery, elapsed time: 4002.677 ms, current LSN: 0/60CB9FE0 2021-07-09 15:50:01.507 EDT [99106] LOG: Performing crash recovery, elapsed time: 5003.808 ms, current LSN: 0/6A2BBE10 2021-07-09 15:50:02.508 EDT [99106] LOG: Performing crash recovery, elapsed time: 6004.916 ms, current LSN: 0/71BA3F90 2021-07-09 15:50:03.385 EDT [99106] LOG: invalid record length at 0/76BD80F0: wanted 24, got 0 2021-07-09 15:50:03.385 EDT [99106] LOG: Crash recovery complete after 6881.834 ms 2021-07-09 15:50:03.385 EDT [99106] LOG: redo done at 0/76BD80C8 system usage: CPU: user: 2.77 s, system: 3.80 s, elapsed: 6.88 s 2021-07-09 15:50:04.778 EDT [99033] LOG: database system is ready to accept connections Few observations on this: - The messages you've added are capitalized, but the ones PostgreSQL has currently are not. You should conform to the existing style. - The "crash recovery complete" message looks redundant with the "redo done" message. Also, in my mind, "redo" is one particular phase of crash recovery, so it looks really odd that "crash recovery" finishes first and then "redo" finishes. I think some thought is needed about the terminology here. - I'm not clear why I get a message about the data directory fsync but not about resetting unlogged relations. I wasn't really expecting to get a message about things that completed in less than the configured interval, although I find that I don't mind having it there either. But then it seems like it should be consistent across the various things we're timing, and resetting unlogged relations should certainly be one of those. - The way you've coded this has some drift. In a perfect world, I'd get a progress report at 1000.00 ms, 2000.00 ms, 3000.00 ms, etc. That's never going to be the case, because there's always going to be a slightly delay in responding to the timer interrupt. However, as you've coded it, the delay increases over time. The first "Performing crash recovery" message is only 373 ms late, but the last one is 4916 ms late. To avoid this, you want to reschedule the timer interrupt based on the time the last one was supposed to fire, not the time it actually did fire. (Exception: If the time it actually did fire is before the time it was supposed to fire, then use the time it actually did fire instead. This protects against the system clock being set backwards.) ...Robert
On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > That was by mistake and I have corrected it in the attached patch. Thanks for the patch. Here are some comments. Please ignore if I repeat any of the comments given previously, as I didn't look at the entire thread. 1) A new line between function return value and the function name: +void CloseStartupProgress(StartupProcessOp operation) +{ Like below: +void +CloseStartupProgress(StartupProcessOp operation) +{ 2) Add an entry in the commit fest, if it's not done yet. That way, the patch gets tested on many platforms. 3) Replace "zero" with the number "0". + # -1 disables the feature, zero logs all 4) I think we can just rename InitStartupProgress to EnableStartupProgress or EnableStartupOpProgress to be more in sync with what it does. +/* + * Sets the start timestamp of the current operation and also enables the + * timeout for logging the progress of startup process. + */ +void +InitStartupProgress(void) +{ 5) Do we need the GetCurrentOperationStartTimestamp function at all? It does nothing great actually, you might have referred to GetCurrentTimestamp which does a good amount of work that qualifies to be inside a function. Can't we just use the operationStartTimestamp variable? Can we rename operationStartTimestamp (I don't think we need to specify Timestamp in a variable name) to startup_op_start_time or some other better name? +/* + * Fetches the start timestamp of the current operation. + */ +TimestampTz +GetCurrentOperationStartTimestamp(void) +{ 6) I think you can transform below + /* If the feature is disabled, then no need to proceed further. */ + if (log_startup_progress_interval < 0) + return; to + /* If the feature is disabled, then no need to proceed further. */ + if (log_startup_progress_interval == -1) + return; as -1 means feature disabled and values < -1 are not allowed to be set at all. 7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because, "recovery in progress" generally applies to the entire startup process right? Put it another way, the startup process as a whole does the entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo phase of the entire startup process. 8) Why do we need to call get_startup_process_operation_string here? Can't you directly use the error message text? if (operation == RECOVERY_IN_PROGRESS) ereport(LOG, (errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X", get_startup_process_operation_string(operation), 9) Do you need error handling in the default case of get_startup_process_operation_string? Instead, how about an assertion, Assert(operation >= SYNCFS_IN_PROGRESS && operation <= RESET_UNLOGGED_REL_END);? default: ereport(ERROR, (errmsg("unrecognized operation (%d) in startup progress update", operation))); 10) I personally didn't like the name get_startup_process_operation_string. How about get_startup_op_string? 11) As pointed out by Robert, the error log message should start with small letters. "syncing data directory (syncfs)"); "syncing data directory (fsync)"); "performing crash recovery"); "resetting unlogged relations"); In general, the error log message should start with small letters and not end with ".". The detail/hit messages should start with capital letters and end with "." ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("only B-Tree indexes are supported as targets for verification"), errdetail("Relation \"%s\" is not a B-Tree index.", RelationGetRelationName(rel)))); ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("sslcert and sslkey are superuser-only"), errhint("User mappings with the sslcert or sslkey options set may only be created or modified by the superuser"))); 12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some unknown value? typedef enum StartupProcessOp { /* Codes for in-progress operations */ SYNCFS_IN_PROGRESS = 1, 13) Can we combine LogStartupProgress and CloseStartupProgress? Let's have a single function LogStartupProgress(StartupProcessOp operation, const char *path, bool start);, and differentiate the functionality with the start flag. 14) Can we move log_startup_progress_interval declaration from guc.h and guc.c to xlog.h and xlog.c? Because it makes more sense to be there, similar to the other GUCs under /* these variables are GUC parameters related to XLOG */ in xlog.h. Regards, Bharath Rupireddy.
> I am not sure I am getting the code flow correctly. From CloseStartupProgress() > naming it seems, it corresponds to InitStartupProgress() but what it is doing > is similar to LogStartupProgress(). I think it should be renamed to be inlined > with LogStartupProgress(), IMO. Renamed CloseStartupProgress() to LogStartupProgressComplete(). > This part should be an assertion, it's the developer's responsibility to call > correctly. This code is not required at all due to the fix of the next comment. > Since we do have a separate call for the in-progress operation and the > end-operation, only a single enum would have been enough. If we do this, then I > think we should remove get_startup_process_operation_string() move messages to > the respective function. Fixed. > I'd really like to see this enabled by default, say with a default > interval of 10 seconds. If it has to be enabled explicitly, most > people won't, but I think a lot of people would benefit from knowing > why their system is slow to start up when that sort of thing happens. > I don't see much downside to having it on by default either, since it > shouldn't be expensive. I think the GUC's units should be seconds, not > milliseconds, though. I agree that it is better to enable it by default. Changed the code accordingly and changed the GUC's units to seconds. > The messages you've added are capitalized, but the ones PostgreSQL > has currently are not. You should conform to the existing style. Fixed. > The "crash recovery complete" message looks redundant with the "redo > done" message. Also, in my mind, "redo" is one particular phase of > crash recovery, so it looks really odd that "crash recovery" finishes > first and then "redo" finishes. I think some thought is needed about > the terminology here. Yes. "redo" is one phase of the crash recovery. Even "resetting unlogged relations" is also a part of the crash recovery. These 2 are the major time consuming operations of the crash recovery task. There is a separate log message to indicate the progress of "resetting the unlogged relations". So instead of saying 'performing crash recovery", it is better to say "redo in progress" and not add any additional message at the end of redo, instead retain the existing message to avoid redundancy. > I'm not clear why I get a message about the data directory fsync but > not about resetting unlogged relations. I wasn't really expecting to > get a message about things that completed in less than the configured > interval, although I find that I don't mind having it there either. > But then it seems like it should be consistent across the various > things we're timing, and resetting unlogged relations should certainly > be one of those. It is the same across all the things we'are timing. I was able to see those messages on my machine. I feel there is not much overhead of logging one message at the end of the operation even though it completes within the configured interval. Following are the log messages shown on my machine. 2021-07-20 18:47:32.046 IST [102230] LOG: listening on IPv4 address "127.0.0.1", port 5445 2021-07-20 18:47:32.048 IST [102230] LOG: listening on Unix socket "/tmp/.s.PGSQL.5445" 2021-07-20 18:47:32.051 IST [102234] LOG: database system was interrupted; last known up at 2021-07-20 18:46:27 IST 2021-07-20 18:47:32.060 IST [102234] LOG: data directory sync (fsync) complete after 0.00 s 2021-07-20 18:47:32.060 IST [102234] LOG: database system was not properly shut down; automatic recovery in progress 2021-07-20 18:47:32.063 IST [102234] LOG: unlogged relations reset after 0.00 s 2021-07-20 18:47:32.063 IST [102234] LOG: redo starts at 0/14EF418 .2021-07-20 18:47:33.063 IST [102234] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28 .2021-07-20 18:47:34.063 IST [102234] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160 .2021-07-20 18:47:35.063 IST [102234] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10 2021-07-20 18:47:35.563 IST [102234] LOG: invalid record length at 0/115C63E0: wanted 24, got 0 2021-07-20 18:47:35.563 IST [102234] LOG: redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s, elapsed: 3.50 s 2021-07-20 18:47:35.564 IST [102234] LOG: unlogged relations reset after 0.00 s 2021-07-20 18:47:35.706 IST [102230] LOG: database system is ready to accept connections > The way you've coded this has some drift. In a perfect world, I'd > get a progress report at 1000.00 ms, 2000.00 ms, 3000.00 ms, etc. > That's never going to be the case, because there's always going to be > a slightly delay in responding to the timer interrupt. However, as > you've coded it, the delay increases over time. The first "Performing > crash recovery" message is only 373 ms late, but the last one is 4916 > ms late. To avoid this, you want to reschedule the timer interrupt > based on the time the last one was supposed to fire, not the time it > actually did fire. (Exception: If the time it actually did fire is > before the time it was supposed to fire, then use the time it actually > did fire instead. This protects against the system clock being set > backwards.) I have rescheduled the timer interrupt based on the time the last one was supposed to fire, not the time it actually did fire. Now I am able to see the messages when the timer is timed out and it is very close to the configured interval. But I did not find a scenario when the above mentioned exception can occur. Kindly let me know if I am wrong in the approach. > 1) A new line between function return value and the function name: > +void CloseStartupProgress(StartupProcessOp operation) > +{ > Like below: > +void > +CloseStartupProgress(StartupProcessOp operation) > +{ Fixed. > 2) Add an entry in the commit fest, if it's not done yet. That way, > the patch gets tested on many platforms. I have an entry in the sept commitfest https://commitfest.postgresql.org/34/3261/ > 3) Replace "zero" with the number "0". > + # -1 disables the feature, zero logs all Fixed. > 4) I think we can just rename InitStartupProgress to > EnableStartupProgress or EnableStartupOpProgress to be more in sync > with what it does. I feel 'Init' is more appropriate than 'enable' here. As it not only enables the timer but also initializes some variables. Timer enabling can also be interpreted as initialization. So in common 'init' is better than 'enable'. > 5) Do we need the GetCurrentOperationStartTimestamp function at all? > It does nothing great actually, you might have referred to > GetCurrentTimestamp which does a good amount of work that qualifies to > be inside a function. Can't we just use the operationStartTimestamp > variable? Can we rename operationStartTimestamp (I don't think we need > to specify Timestamp in a variable name) to startup_op_start_time or > some other better name? Changed it to 'startupProcessOpStartTime'. > 6) I think you can transform below > + /* If the feature is disabled, then no need to proceed further. */ > + if (log_startup_progress_interval < 0) > + return; > to > + /* If the feature is disabled, then no need to proceed further. */ > + if (log_startup_progress_interval == -1) > + return; > as -1 means feature disabled and values < -1 are not allowed to be set at all. I feel that should be ok. As '<0' includes '-1'. So it does our job. I can change it if it is really required to do so. > 7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because, > "recovery in progress" generally applies to the entire startup process > right? Put it another way, the startup process as a whole does the > entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo > phase of the entire startup process. Changed it as part of the earlier comment's fix. Modified the message also to 'redo in progress' rather than 'recovery in progress'. > 8) Why do we need to call get_startup_process_operation_string here? > Can't you directly use the error message text? > if (operation == RECOVERY_IN_PROGRESS) > ereport(LOG, > (errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X", > get_startup_process_operation_string(operation), Fixed it as part of an earlier comment. > 9) Do you need error handling in the default case of > get_startup_process_operation_string? Instead, how about an assertion, > Assert(operation >= SYNCFS_IN_PROGRESS && operation <= > RESET_UNLOGGED_REL_END);? > default: > ereport(ERROR, > (errmsg("unrecognized operation (%d) in startup >progress update", > operation))); It is better to have a default case. Assert is difficult to maintain if there are any modifications to the operations. > 10) I personally didn't like the name > get_startup_process_operation_string. How about get_startup_op_string? I have removed it as part of fixing the earlier comment. > 11) As pointed out by Robert, the error log message should start with > small letters. > "syncing data directory (syncfs)"); > "syncing data directory (fsync)"); > "performing crash recovery"); > "resetting unlogged relations"); > In general, the error log message should start with small letters and > not end with ".". The detail/hit messages should start with capital > letters and end with "." Thanks for the information. > 12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some > unknown value? > typedef enum StartupProcessOp > { > /* Codes for in-progress operations */ > SYNCFS_IN_PROGRESS = 1, I don't find any reason to do so. So not changed. Kindly let me know if there is any specific reason which would help changing it. > 13) Can we combine LogStartupProgress and CloseStartupProgress? Let's > have a single function LogStartupProgress(StartupProcessOp operation, > const char *path, bool start);, and differentiate the functionality > with the start flag. The function becomes complex and it will affect the readability. > 14) Can we move log_startup_progress_interval declaration from guc.h > and guc.c to xlog.h and xlog.c? Because it makes more sense to be > there, similar to the other GUCs under /* these variables are GUC > parameters related to XLOG */ in xlog.h. Fixed. Please find the v5 patch attached. Kindly let me know your comments. On Sat, Jul 10, 2021 at 1:51 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Jun 21, 2021 at 12:06 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > That was by mistake and I have corrected it in the attached patch. > > Thanks for the patch. Here are some comments. Please ignore if I > repeat any of the comments given previously, as I didn't look at the > entire thread. > > 1) A new line between function return value and the function name: > +void CloseStartupProgress(StartupProcessOp operation) > +{ > Like below: > +void > +CloseStartupProgress(StartupProcessOp operation) > +{ > > 2) Add an entry in the commit fest, if it's not done yet. That way, > the patch gets tested on many platforms. > > 3) Replace "zero" with the number "0". > + # -1 disables the feature, zero logs all > > 4) I think we can just rename InitStartupProgress to > EnableStartupProgress or EnableStartupOpProgress to be more in sync > with what it does. > +/* > + * Sets the start timestamp of the current operation and also enables the > + * timeout for logging the progress of startup process. > + */ > +void > +InitStartupProgress(void) > +{ > > 5) Do we need the GetCurrentOperationStartTimestamp function at all? > It does nothing great actually, you might have referred to > GetCurrentTimestamp which does a good amount of work that qualifies to > be inside a function. Can't we just use the operationStartTimestamp > variable? Can we rename operationStartTimestamp (I don't think we need > to specify Timestamp in a variable name) to startup_op_start_time or > some other better name? > +/* > + * Fetches the start timestamp of the current operation. > + */ > +TimestampTz > +GetCurrentOperationStartTimestamp(void) > +{ > > 6) I think you can transform below > + /* If the feature is disabled, then no need to proceed further. */ > + if (log_startup_progress_interval < 0) > + return; > to > + /* If the feature is disabled, then no need to proceed further. */ > + if (log_startup_progress_interval == -1) > + return; > as -1 means feature disabled and values < -1 are not allowed to be set at all. > > 7) Isn't RECOVERY_IN_PROGRESS supposed to be REDO_IN_PRGRESS? Because, > "recovery in progress" generally applies to the entire startup process > right? Put it another way, the startup process as a whole does the > entire recovery, and you have the RECOVERY_IN_PROGRESS in the redo > phase of the entire startup process. > > 8) Why do we need to call get_startup_process_operation_string here? > Can't you directly use the error message text? > if (operation == RECOVERY_IN_PROGRESS) > ereport(LOG, > (errmsg("%s, elapsed time: %ld.%03d ms, current LSN: %X/%X", > get_startup_process_operation_string(operation), > > 9) Do you need error handling in the default case of > get_startup_process_operation_string? Instead, how about an assertion, > Assert(operation >= SYNCFS_IN_PROGRESS && operation <= > RESET_UNLOGGED_REL_END);? > default: > ereport(ERROR, > (errmsg("unrecognized operation (%d) in startup > progress update", > operation))); > 10) I personally didn't like the name > get_startup_process_operation_string. How about get_startup_op_string? > > 11) As pointed out by Robert, the error log message should start with > small letters. > "syncing data directory (syncfs)"); > "syncing data directory (fsync)"); > "performing crash recovery"); > "resetting unlogged relations"); > In general, the error log message should start with small letters and > not end with ".". The detail/hit messages should start with capital > letters and end with "." > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("only B-Tree indexes are supported as targets > for verification"), > errdetail("Relation \"%s\" is not a B-Tree index.", > RelationGetRelationName(rel)))); > ereport(ERROR, > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > errmsg("sslcert and sslkey are superuser-only"), > errhint("User mappings with the sslcert or > sslkey options set may only be created or modified by the > superuser"))); > > 12) How about starting SYNCFS_IN_PROGRESS = 1, and leaving 0 for some > unknown value? > typedef enum StartupProcessOp > { > /* Codes for in-progress operations */ > SYNCFS_IN_PROGRESS = 1, > > 13) Can we combine LogStartupProgress and CloseStartupProgress? Let's > have a single function LogStartupProgress(StartupProcessOp operation, > const char *path, bool start);, and differentiate the functionality > with the start flag. > > 14) Can we move log_startup_progress_interval declaration from guc.h > and guc.c to xlog.h and xlog.c? Because it makes more sense to be > there, similar to the other GUCs under /* these variables are GUC > parameters related to XLOG */ in xlog.h. > > Regards, > Bharath Rupireddy.
Attachment
On Wed, Jul 21, 2021 at 12:52 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Please find the v5 patch attached. Kindly let me know your comments. Thanks for the patch. Here are some comments on v5: 1) I still don't see the need for two functions LogStartupProgress and LogStartupProgressComplete. Most of the code is duplicate. I think we can just do it with a single function something like [1]: 2) Why isn't there a LogStartupProgressComplete(STARTUP_PROCESS_OP_REDO)? Is it because of the below existing log message? ereport(LOG, (errmsg("redo done at %X/%X system usage: %s", LSN_FORMAT_ARGS(ReadRecPtr), pg_rusage_show(&ru0)))); 3) I think it should be, "," after occurred instead of "." + * elapsed or not. TRUE if timeout occurred, FALSE otherwise. instead of + * elapsed or not. TRUE if timeout occurred. FALSE otherwise. [1] +/* + * Logs the progress of the operations performed during the startup process. + * is_complete true/false indicates that the operation is finished/ + * in-progress respectively. + */ +void +LogStartupProgress(StartupProcessOp op, const char *path, + bool is_complete) +{ + long secs; + int usecs; + int elapsed_ms; + int interval_in_ms; + + /* If not called from the startup process then this feature is of no use. */ + if (!AmStartupProcess()) + return; + + /* If the feature is disabled, then no need to proceed further. */ + if (log_startup_progress_interval < 0) + return; + + /* + * If the operation is in-progress and the timeout hasn't occurred, then + * no need to log the details. + */ + if (!is_complete && !logStartupProgressTimeout) + return; + + /* Timeout has occurred. */ + TimestampDifference(startupProcessOpStartTime, + GetCurrentTimestamp(), + &secs, &usecs); + + /* + * If the operation is in-progress, enable the timer for the next log + * message based on the time that current log message timer was supposed to + * fire. + */ + if (!is_complete) + { + elapsed_ms = (secs * 1000) + (usecs / 1000); + interval_in_ms = log_startup_progress_interval * 1000; + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, + (interval_in_ms - (elapsed_ms % interval_in_ms))); + } + + switch(op) + { + case STARTUP_PROCESS_OP_SYNCFS: + { + if (is_complete) + ereport(LOG, + (errmsg("data directory sync (syncfs) complete after %ld.%02d s", + secs, (usecs / 10000)))); + else + ereport(LOG, + (errmsg("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s", + secs, (usecs / 10000), path))); + } + break; + case STARTUP_PROCESS_OP_FSYNC: + { + if (is_complete) + ereport(LOG, + (errmsg("data directory sync (fsync) complete after %ld.%02d s", + secs, (usecs / 10000)))); + else + ereport(LOG, + (errmsg("syncing data directory (fsync), elapsed time: %ld.%02d s, current path: %s", + secs, (usecs / 10000), path))); + } + break; + case STARTUP_PROCESS_OP_REDO: + { + /* + * No need to log redo completion status here, as it will be + * done in the caller. + */ + if (!is_complete) + ereport(LOG, + (errmsg("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X", + secs, (usecs / 10000), LSN_FORMAT_ARGS(ReadRecPtr)))); + } + break; + case STARTUP_PROCESS_OP_RESET_UNLOGGED_REL: + { + if (is_complete) + ereport(LOG, + (errmsg("unlogged relations reset after %ld.%02d s", + secs, (usecs / 10000)))); + else + ereport(LOG, + (errmsg("resetting unlogged relations, elapsed time: %ld.%02d s, current path: %s", + secs, (usecs / 10000), path))); + } + break; + default: + ereport(ERROR, + (errmsg("unrecognized operation (%d) in startup progress update", + op))); + break; + } + + if (is_complete) + disable_timeout(LOG_STARTUP_PROGRESS_TIMEOUT, false); + else + logStartupProgressTimeout = false; +} Regards, Bharath Rupireddy.
I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path); when action == datadir_fsync_fname. ResetUnloggedRelations() is calling ResetUnloggedRelationsInTablespaceDir("base", op); before calling InitStartupProgress(). This shows StartupXLOG() calling ResetUnloggedRelations() twice. Should they both be shown ? If so, maybe they should be distinguished as here: elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d", (op & UNLOGGED_RELATION_CLEANUP) != 0, (op & UNLOGGED_RELATION_INIT) != 0); On Wed, Jul 21, 2021 at 12:52:24PM +0530, Nitin Jadhav wrote: > 2021-07-20 18:47:32.046 IST [102230] LOG: listening on IPv4 address "127.0.0.1", port 5445 > 2021-07-20 18:47:32.048 IST [102230] LOG: listening on Unix socket "/tmp/.s.PGSQL.5445" > 2021-07-20 18:47:32.051 IST [102234] LOG: database system was interrupted; last known up at 2021-07-20 18:46:27 IST > 2021-07-20 18:47:32.060 IST [102234] LOG: data directory sync (fsync) complete after 0.00 s > 2021-07-20 18:47:32.060 IST [102234] LOG: database system was not properly shut down; automatic recovery in progress > 2021-07-20 18:47:32.063 IST [102234] LOG: unlogged relations reset after 0.00 s > 2021-07-20 18:47:32.063 IST [102234] LOG: redo starts at 0/14EF418 > 2021-07-20 18:47:33.063 IST [102234] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28 > 2021-07-20 18:47:34.063 IST [102234] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160 > 2021-07-20 18:47:35.063 IST [102234] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10 > 2021-07-20 18:47:35.563 IST [102234] LOG: invalid record length at 0/115C63E0: wanted 24, got 0 > 2021-07-20 18:47:35.563 IST [102234] LOG: redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s, elapsed:3.50 s > 2021-07-20 18:47:35.564 IST [102234] LOG: unlogged relations reset after 0.00 s > 2021-07-20 18:47:35.706 IST [102230] LOG: database system is ready to accept connections -- Justin
> I still don't see the need for two functions LogStartupProgress and > LogStartupProgressComplete. Most of the code is duplicate. I think we > can just do it with a single function something like [1]: Initially I had written a common function for these 2. You can see that in the earlier version of the patch. Later separated it since it was too much for one function. If others also agree to make it common, I can make that change. > Why isn't there a > LogStartupProgressComplete(STARTUP_PROCESS_OP_REDO)? Is it because of > the below existing log message? > ereport(LOG, > (errmsg("redo done at %X/%X system usage: %s", > LSN_FORMAT_ARGS(ReadRecPtr), > pg_rusage_show(&ru0)))); Yes. Adding another log message makes it redundant. > I think it should be, "," after occurred instead of "." > + * elapsed or not. TRUE if timeout occurred, FALSE otherwise. > instead of > + * elapsed or not. TRUE if timeout occurred. FALSE otherwise. Fixed. > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path); > when action == datadir_fsync_fname. I agree and fixed it. > ResetUnloggedRelations() is calling > ResetUnloggedRelationsInTablespaceDir("base", op); > before calling InitStartupProgress(). Fixed. > This shows StartupXLOG() calling ResetUnloggedRelations() twice. > Should they both be shown ? If so, maybe they should be distinguished as here: > > elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d", > (op & UNLOGGED_RELATION_CLEANUP) != 0, > (op & UNLOGGED_RELATION_INIT) != 0); Fixed. Added separate codes to distinguish. Please find the patch attached. On Wed, Jul 21, 2021 at 6:43 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path); > when action == datadir_fsync_fname. > > ResetUnloggedRelations() is calling > ResetUnloggedRelationsInTablespaceDir("base", op); > before calling InitStartupProgress(). > > This shows StartupXLOG() calling ResetUnloggedRelations() twice. > Should they both be shown ? If so, maybe they should be distinguished as here: > > elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d", > (op & UNLOGGED_RELATION_CLEANUP) != 0, > (op & UNLOGGED_RELATION_INIT) != 0); > > On Wed, Jul 21, 2021 at 12:52:24PM +0530, Nitin Jadhav wrote: > > 2021-07-20 18:47:32.046 IST [102230] LOG: listening on IPv4 address "127.0.0.1", port 5445 > > 2021-07-20 18:47:32.048 IST [102230] LOG: listening on Unix socket "/tmp/.s.PGSQL.5445" > > 2021-07-20 18:47:32.051 IST [102234] LOG: database system was interrupted; last known up at 2021-07-20 18:46:27 IST > > 2021-07-20 18:47:32.060 IST [102234] LOG: data directory sync (fsync) complete after 0.00 s > > 2021-07-20 18:47:32.060 IST [102234] LOG: database system was not properly shut down; automatic recovery in progress > > 2021-07-20 18:47:32.063 IST [102234] LOG: unlogged relations reset after 0.00 s > > 2021-07-20 18:47:32.063 IST [102234] LOG: redo starts at 0/14EF418 > > 2021-07-20 18:47:33.063 IST [102234] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28 > > 2021-07-20 18:47:34.063 IST [102234] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160 > > 2021-07-20 18:47:35.063 IST [102234] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10 > > 2021-07-20 18:47:35.563 IST [102234] LOG: invalid record length at 0/115C63E0: wanted 24, got 0 > > 2021-07-20 18:47:35.563 IST [102234] LOG: redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s, elapsed:3.50 s > > 2021-07-20 18:47:35.564 IST [102234] LOG: unlogged relations reset after 0.00 s > > 2021-07-20 18:47:35.706 IST [102230] LOG: database system is ready to accept connections > > -- > Justin
Attachment
On Fri, Jul 23, 2021 at 04:09:47PM +0530, Nitin Jadhav wrote: > > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path); > > when action == datadir_fsync_fname. > > I agree and fixed it. I saw that you fixed it by calling InitStartupProgress() after the walkdir() calls which do pre_sync_fname. So then walkdir is calling LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync, and then LogStartupProgress() is returning because !AmStartupProcess(). That seems indirect, fragile, and confusing. I suggest that walkdir() should take an argument for which operation to pass to LogStartupProgress(). You can pass a special enum for cases where nothing should be logged, like STARTUP_PROCESS_OP_NONE. On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote: > 1) I still don't see the need for two functions LogStartupProgress and > LogStartupProgressComplete. Most of the code is duplicate. I think we > can just do it with a single function something like [1]: I agree that one function can do this more succinctly. I think it's best to use a separate enum value for START operations and END operations. switch(operation) { case STARTUP_PROCESS_OP_SYNCFS_START: ereport(...); break; case STARTUP_PROCESS_OP_SYNCFS_END: ereport(...); break; case STARTUP_PROCESS_OP_FSYNC_START: ereport(...); break; case STARTUP_PROCESS_OP_FSYNC_END: ereport(...); break; ... -- Justin
On Sun, Jul 25, 2021 at 1:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Fri, Jul 23, 2021 at 04:09:47PM +0530, Nitin Jadhav wrote: > > > I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path); > > > when action == datadir_fsync_fname. > > > > I agree and fixed it. > > I saw that you fixed it by calling InitStartupProgress() after the walkdir() > calls which do pre_sync_fname. So then walkdir is calling > LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync, > and then LogStartupProgress() is returning because !AmStartupProcess(). > > That seems indirect, fragile, and confusing. I suggest that walkdir() should > take an argument for which operation to pass to LogStartupProgress(). You can > pass a special enum for cases where nothing should be logged, like > STARTUP_PROCESS_OP_NONE. I don't think walkdir() has any business calling LogStartupProgress() at all. It's supposed to be a generic function, not one that is only available to be called from the startup process, or has different behavior there. From my point of view, the right thing is to put the logging calls into the particular callbacks that SyncDataDirectory() uses. > On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote: > > 1) I still don't see the need for two functions LogStartupProgress and > > LogStartupProgressComplete. Most of the code is duplicate. I think we > > can just do it with a single function something like [1]: > > I agree that one function can do this more succinctly. I think it's best to > use a separate enum value for START operations and END operations. Maybe I'm missing something here, but I don't understand the purpose of this. You can always combine two functions into one, but it's only worth doing if you end up with less code, which doesn't seem to be the case here. The strings are all different and that's most of the function, and the other stuff that gets done isn't the same either, so you'd just end up with a bunch of if-statements. That doesn't seem like an improvement. Thinking further, I guess I understand it from the caller's perspective. It's not necessarily clear why in some places we call LogStartupProgress() and others LogStartupProgressComplete(). Someone might expect a function with "complete" in the name like that to only be called once at the very end, rather than once at the end of a phase, and it does sort of make sense that you'd want to call one function everywhere rather than sometimes one and sometimes the other ... but I'm not exactly sure how to get there from here. Having only LogStartupProgress() but having it do a giant if-statement to figure out whether we're mid-phase or end-of-phase does not seem like the right approach. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jul 26, 2021 at 10:13:09AM -0400, Robert Haas wrote: > I don't think walkdir() has any business calling LogStartupProgress() > at all. It's supposed to be a generic function, not one that is only > available to be called from the startup process, or has different > behavior there. From my point of view, the right thing is to put the > logging calls into the particular callbacks that SyncDataDirectory() > uses. You're right - this is better. On Wed, Jul 21, 2021 at 04:47:32PM +0530, Bharath Rupireddy wrote: > > > 1) I still don't see the need for two functions LogStartupProgress and > > > LogStartupProgressComplete. Most of the code is duplicate. I think we > > > can just do it with a single function something like [1]: > > > > I agree that one function can do this more succinctly. I think it's best to > > use a separate enum value for START operations and END operations. > > Maybe I'm missing something here, but I don't understand the purpose > of this. You can always combine two functions into one, but it's only > worth doing if you end up with less code, which doesn't seem to be the > case here. 4 files changed, 39 insertions(+), 71 deletions(-) > ... but I'm not exactly sure how to get there from here. Having only > LogStartupProgress() but having it do a giant if-statement to figure > out whether we're mid-phase or end-of-phase does not seem like the > right approach. I used a bool arg and negation to handle within a single switch. Maybe it's cleaner to use a separate enum value for each DONE, and set a local done flag. startup[29675] LOG: database system was interrupted; last known up at 2021-07-26 10:23:04 CDT startup[29675] LOG: syncing data directory (fsync), elapsed time: 1.38 s, current path: ./pg_ident.conf startup[29675] LOG: data directory sync (fsync) complete after 1.72 s startup[29675] LOG: database system was not properly shut down; automatic recovery in progress startup[29675] LOG: resetting unlogged relations (cleanup) complete after 0.00 s startup[29675] LOG: redo starts at 0/17BE500 startup[29675] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/35D7CB8 startup[29675] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/54A6918 startup[29675] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/7370570 startup[29675] LOG: redo in progress, elapsed time: 4.00 s, current LSN: 0/924D8A0 startup[29675] LOG: redo done at 0/9FFFFB8 system usage: CPU: user: 4.28 s, system: 0.15 s, elapsed: 4.44 s startup[29675] LOG: resetting unlogged relations (init) complete after 0.03 s startup[29675] LOG: checkpoint starting: end-of-recovery immediate startup[29675] LOG: checkpoint complete: wrote 9872 buffers (60.3%); 0 WAL file(s) added, 0 removed, 8 recycled; write=0.136s, sync=0.801 s, total=1.260 s; sync files=21, longest=0.774 s, average=B
Attachment
On Mon, Jul 26, 2021 at 11:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Maybe I'm missing something here, but I don't understand the purpose > > of this. You can always combine two functions into one, but it's only > > worth doing if you end up with less code, which doesn't seem to be the > > case here. > > 4 files changed, 39 insertions(+), 71 deletions(-) Hmm. I don't completely hate that, but I don't love it either. I don't think the result is any easier to understand than the original, and in some ways it's worse. In particular, you've flattened the separate comments for each of the individual functions into a single-line comment that is more generic than the comments it replaced. You could fix that and you'd still be slightly ahead on LOC, but I'm not convinced that it's actually a real win. > > ... but I'm not exactly sure how to get there from here. Having only > > LogStartupProgress() but having it do a giant if-statement to figure > > out whether we're mid-phase or end-of-phase does not seem like the > > right approach. > > I used a bool arg and negation to handle within a single switch. Maybe it's > cleaner to use a separate enum value for each DONE, and set a local done flag. If we're going to go the route of combining the functions, I agree that a Boolean is the way to go; I think we have some existing precedent for 'bool finished' rather than 'bool done'. But I kind of wonder if we should have an enum in the first place. It feels like we've got code in a bunch of places that just exists to decide which enum value to use, and then code someplace else that turns around and decides which message to produce. That would be sensible if we were using the same enum values in lots of places, but that's not the case here. So suppose we just moved the messages to the places where we're now calling LogStartupProgress() or LogStartupProgressComplete()? So something like this: if (should_report_startup_progress()) ereport(LOG, (errmsg("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s", secs, (usecs / 10000), path))); Well, that doesn't quite work, because "secs" and "usecs" aren't going to exist in the caller. We can fix that easily enough: let's just make should_report_startup_progress take arguments long *secs, int *usecs, and bool done. Then the above becomes: if (should_report_startup_progress(&secs, &usecs, false)) ereport(LOG, (errmsg("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s", secs, (usecs / 10000), path))); And if this were the call site that corresponds to LogStartupProgressComplete(), we'd replace false with true. Now, the only real disadvantage of this that I can see is that it requires the caller to declare 'secs' and 'usecs', which is not a big deal, but mildly annoying perhaps. I think we can do better still with a little macro magic. Suppose we define a macro report_startup_progress(force, msg, ...) that expands to: { long secs; int usecs; if (startup_progress_timer_expired(&secs, &usecs, force)) ereport(LOG, errmsg(msg, secs, usecs, ...)); } Then the above just becomes this: report_startup_progress(false, "syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s", path); This would have the advantage that the strings we're using would be present in the code that arranges to emit them, instead of being removed to some other module, so I think it would be clearer. It would also have the advantage of making it much easier to add further calls, if someone feels they want to do that. You don't have to run around and update enums and all the various things that use them, just copy and paste the line above and adjust as required. With this design, we avoid a lot of "action at a distance". We don't define the message strings in a place far-removed from the code that wants to emit them any more. When someone wants a new progress message, they can just add another call to report_statup_progress() wherever it needs to go and they're done. They don't have to go run and update the enum and various switch statements. They're just done. -- Robert Haas EDB: http://www.enterprisedb.com
> > > I saw that you fixed it by calling InitStartupProgress() after the walkdir() > > > calls which do pre_sync_fname. So then walkdir is calling > > > LogStartupProgress(STARTUP_PROCESS_OP_FSYNC) even when it's not doing fsync, > > > and then LogStartupProgress() is returning because !AmStartupProcess(). > > > > I don't think walkdir() has any business calling LogStartupProgress() > > at all. It's supposed to be a generic function, not one that is only > > available to be called from the startup process, or has different > > behavior there. From my point of view, the right thing is to put the > > logging calls into the particular callbacks that SyncDataDirectory() > > uses. > > You're right - this is better. I also agree that this is the better place to do it. Hence modified the patch accordingly. The condition "!AmStartupProcess()" is added to differentiate whether the call is done from a startup process or some other process. Actually StartupXLOG() gets called in 2 places. one in StartupProcessMain() and the other in InitPostgres(). As the logging of the startup progress is required only during the startup process and not in the other cases, so added the condition to confirm the call is from the startup process. I did not find any other way to differentiate. Kindly let me know if there is any other better approach to do this. > > > Maybe I'm missing something here, but I don't understand the purpose > > > of this. You can always combine two functions into one, but it's only > > > worth doing if you end up with less code, which doesn't seem to be the > > > case here. > > > > 4 files changed, 39 insertions(+), 71 deletions(-) > > Hmm. I don't completely hate that, but I don't love it either. I don't > think the result is any easier to understand than the original, and in > some ways it's worse. In particular, you've flattened the separate > comments for each of the individual functions into a single-line > comment that is more generic than the comments it replaced. You could > fix that and you'd still be slightly ahead on LOC, but I'm not > convinced that it's actually a real win. As per my understanding there are no changes required wrt this. Hence not done any changes. Please find the updated patch attached. Kindly share your comments if any. On Mon, Jul 26, 2021 at 10:41 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jul 26, 2021 at 11:30 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Maybe I'm missing something here, but I don't understand the purpose > > > of this. You can always combine two functions into one, but it's only > > > worth doing if you end up with less code, which doesn't seem to be the > > > case here. > > > > 4 files changed, 39 insertions(+), 71 deletions(-) > > Hmm. I don't completely hate that, but I don't love it either. I don't > think the result is any easier to understand than the original, and in > some ways it's worse. In particular, you've flattened the separate > comments for each of the individual functions into a single-line > comment that is more generic than the comments it replaced. You could > fix that and you'd still be slightly ahead on LOC, but I'm not > convinced that it's actually a real win. > > > > ... but I'm not exactly sure how to get there from here. Having only > > > LogStartupProgress() but having it do a giant if-statement to figure > > > out whether we're mid-phase or end-of-phase does not seem like the > > > right approach. > > > > I used a bool arg and negation to handle within a single switch. Maybe it's > > cleaner to use a separate enum value for each DONE, and set a local done flag. > > If we're going to go the route of combining the functions, I agree > that a Boolean is the way to go; I think we have some existing > precedent for 'bool finished' rather than 'bool done'. > > But I kind of wonder if we should have an enum in the first place. It > feels like we've got code in a bunch of places that just exists to > decide which enum value to use, and then code someplace else that > turns around and decides which message to produce. That would be > sensible if we were using the same enum values in lots of places, but > that's not the case here. So suppose we just moved the messages to the > places where we're now calling LogStartupProgress() or > LogStartupProgressComplete()? So something like this: > > if (should_report_startup_progress()) > ereport(LOG, > (errmsg("syncing data directory (syncfs), elapsed > time: %ld.%02d s, current path: %s", > secs, (usecs / 10000), path))); > > Well, that doesn't quite work, because "secs" and "usecs" aren't going > to exist in the caller. We can fix that easily enough: let's just make > should_report_startup_progress take arguments long *secs, int *usecs, > and bool done. Then the above becomes: > > if (should_report_startup_progress(&secs, &usecs, false)) > ereport(LOG, > (errmsg("syncing data directory (syncfs), elapsed > time: %ld.%02d s, current path: %s", > secs, (usecs / 10000), path))); > > And if this were the call site that corresponds to > LogStartupProgressComplete(), we'd replace false with true. Now, the > only real disadvantage of this that I can see is that it requires the > caller to declare 'secs' and 'usecs', which is not a big deal, but > mildly annoying perhaps. I think we can do better still with a little > macro magic. Suppose we define a macro report_startup_progress(force, > msg, ...) that expands to: > > { > long secs; > int usecs; > if (startup_progress_timer_expired(&secs, &usecs, force)) > ereport(LOG, errmsg(msg, secs, usecs, ...)); > } > > Then the above just becomes this: > > report_startup_progress(false, "syncing data directory (syncfs), > elapsed time: %ld.%02d s, current path: %s", path); > > This would have the advantage that the strings we're using would be > present in the code that arranges to emit them, instead of being > removed to some other module, so I think it would be clearer. It would > also have the advantage of making it much easier to add further calls, > if someone feels they want to do that. You don't have to run around > and update enums and all the various things that use them, just copy > and paste the line above and adjust as required. > > With this design, we avoid a lot of "action at a distance". We don't > define the message strings in a place far-removed from the code that > wants to emit them any more. When someone wants a new progress > message, they can just add another call to report_statup_progress() > wherever it needs to go and they're done. They don't have to go run > and update the enum and various switch statements. They're just done. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Wed, Jul 28, 2021 at 5:24 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > I also agree that this is the better place to do it. Hence modified > the patch accordingly. The condition "!AmStartupProcess()" is added to > differentiate whether the call is done from a startup process or some > other process. Actually StartupXLOG() gets called in 2 places. one in > StartupProcessMain() and the other in InitPostgres(). As the logging > of the startup progress is required only during the startup process > and not in the other cases, The InitPostgres() case occurs when the server is started in bootstrap mode (during initdb) or in single-user mode (postgres --single). I do not see any reason why we shouldn't produce progress messages in at least the latter case. I suspect that someone who is in the rather desperate scenario of having to use single-user mode would really like to know how long the server is going to take to start up. Perhaps during initdb we don't want messages, but I'm not sure that we need to do anything about that here. None of the messages that the server normally produces show up when you run initdb, so I guess they are getting redirected to /dev/null or something. So I don't think that using AmStartupProcess() for this purpose is right. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jul 28, 2021 at 7:02 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jul 28, 2021 at 5:24 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > I also agree that this is the better place to do it. Hence modified > > the patch accordingly. The condition "!AmStartupProcess()" is added to > > differentiate whether the call is done from a startup process or some > > other process. Actually StartupXLOG() gets called in 2 places. one in > > StartupProcessMain() and the other in InitPostgres(). As the logging > > of the startup progress is required only during the startup process > > and not in the other cases, > > The InitPostgres() case occurs when the server is started in bootstrap > mode (during initdb) or in single-user mode (postgres --single). I do > not see any reason why we shouldn't produce progress messages in at > least the latter case. I suspect that someone who is in the rather > desperate scenario of having to use single-user mode would really like > to know how long the server is going to take to start up. +1 to emit the same log messages in single-user mode and basically whoever calls StartupXLOG. Do we need to adjust the GUC parameter log_startup_progress_interval(a reasonable value) so that the logs are generated by default? > Perhaps during initdb we don't want messages, but I'm not sure that we > need to do anything about that here. None of the messages that the > server normally produces show up when you run initdb, so I guess they > are getting redirected to /dev/null or something. I enabled the below log message in XLogFlush and ran initdb, it is printing these logs onto the stdout, looks like the logs have not been redirected to /dev/null in initdb/bootstrap mode. #ifdef WAL_DEBUG if (XLOG_DEBUG) elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X", LSN_FORMAT_ARGS(record), LSN_FORMAT_ARGS(LogwrtResult.Write), LSN_FORMAT_ARGS(LogwrtResult.Flush)); #endif So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and XLOG_DEBUG to print those logs? It looks like the problem with these macros is that they are not settable by normal users in the production environment, but only by the developers. I'm not sure how much it is helpful to print the startup process progress logs in bootstrap mode. Maybe we can use the IsBootstrapProcessingMode macro to disable these logs in the bootstrap mode. > So I don't think that using AmStartupProcess() for this purpose is right. Agree. Regards, Bharath Rupireddy.
On Wed, Jul 28, 2021 at 11:25 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Perhaps during initdb we don't want messages, but I'm not sure that we > > need to do anything about that here. None of the messages that the > > server normally produces show up when you run initdb, so I guess they > > are getting redirected to /dev/null or something. > > I enabled the below log message in XLogFlush and ran initdb, it is > printing these logs onto the stdout, looks like the logs have not been > redirected to /dev/null in initdb/bootstrap mode. > > #ifdef WAL_DEBUG > if (XLOG_DEBUG) > elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X", > LSN_FORMAT_ARGS(record), > LSN_FORMAT_ARGS(LogwrtResult.Write), > LSN_FORMAT_ARGS(LogwrtResult.Flush)); > #endif > > So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and > XLOG_DEBUG to print those logs? It looks like the problem with these > macros is that they are not settable by normal users in the production > environment, but only by the developers. I'm not sure how much it is > helpful to print the startup process progress logs in bootstrap mode. > Maybe we can use the IsBootstrapProcessingMode macro to disable these > logs in the bootstrap mode. I don't think we should drag XLOG_DEBUG into this. That's a debugging facility that isn't really relevant to the topic at hand. The point is the server normally prints a bunch of messages that we don't see in bootstrap mode. For example: [rhaas pgsql]$ postgres 2021-07-28 11:32:33.824 EDT [36801] LOG: starting PostgreSQL 15devel on x86_64-apple-darwin19.6.0, compiled by clang version 5.0.2 (tags/RELEASE_502/final), 64-bit 2021-07-28 11:32:33.825 EDT [36801] LOG: listening on IPv6 address "::1", port 5432 2021-07-28 11:32:33.825 EDT [36801] LOG: listening on IPv4 address "127.0.0.1", port 5432 2021-07-28 11:32:33.826 EDT [36801] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2021-07-28 11:32:33.846 EDT [36802] LOG: database system was shut down at 2021-07-28 11:32:32 EDT 2021-07-28 11:32:33.857 EDT [36801] LOG: database system is ready to accept connections None of that shows up when you run initdb. Taking a fast look at the code, I don't think the reasons are the same for all of those messages. Some of the code isn't reached, whereas e.g. "database system was shut down at 2021-07-28 11:32:32 EDT" is special-cased. I'm not sure right off the top of my head what this code should do, but ideally it looks something like one of the cases we've already got. -- Robert Haas EDB: http://www.enterprisedb.com
> The InitPostgres() case occurs when the server is started in bootstrap > mode (during initdb) or in single-user mode (postgres --single). I do > not see any reason why we shouldn't produce progress messages in at > least the latter case. I suspect that someone who is in the rather > desperate scenario of having to use single-user mode would really like > to know how long the server is going to take to start up. Thanks for sharing the information. I have done the necessary changes to show the logs during the latter case (postgres --single) and verified the log messages. > +1 to emit the same log messages in single-user mode and basically > whoever calls StartupXLOG. Do we need to adjust the GUC parameter > log_startup_progress_interval(a reasonable value) so that the logs are > generated by default? At present, this feature is enabled by default and the initial value set for log_startup_progress_interval is 10 seconds. On Wed, Jul 28, 2021 at 9:07 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jul 28, 2021 at 11:25 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Perhaps during initdb we don't want messages, but I'm not sure that we > > > need to do anything about that here. None of the messages that the > > > server normally produces show up when you run initdb, so I guess they > > > are getting redirected to /dev/null or something. > > > > I enabled the below log message in XLogFlush and ran initdb, it is > > printing these logs onto the stdout, looks like the logs have not been > > redirected to /dev/null in initdb/bootstrap mode. > > > > #ifdef WAL_DEBUG > > if (XLOG_DEBUG) > > elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X", > > LSN_FORMAT_ARGS(record), > > LSN_FORMAT_ARGS(LogwrtResult.Write), > > LSN_FORMAT_ARGS(LogwrtResult.Flush)); > > #endif > > > > So, even in bootstrap mode, can we use the above #ifdef WAL_DEBUG and > > XLOG_DEBUG to print those logs? It looks like the problem with these > > macros is that they are not settable by normal users in the production > > environment, but only by the developers. I'm not sure how much it is > > helpful to print the startup process progress logs in bootstrap mode. > > Maybe we can use the IsBootstrapProcessingMode macro to disable these > > logs in the bootstrap mode. > > I don't think we should drag XLOG_DEBUG into this. That's a debugging > facility that isn't really relevant to the topic at hand. The point is > the server normally prints a bunch of messages that we don't see in > bootstrap mode. For example: > > [rhaas pgsql]$ postgres > 2021-07-28 11:32:33.824 EDT [36801] LOG: starting PostgreSQL 15devel > on x86_64-apple-darwin19.6.0, compiled by clang version 5.0.2 > (tags/RELEASE_502/final), 64-bit > 2021-07-28 11:32:33.825 EDT [36801] LOG: listening on IPv6 address > "::1", port 5432 > 2021-07-28 11:32:33.825 EDT [36801] LOG: listening on IPv4 address > "127.0.0.1", port 5432 > 2021-07-28 11:32:33.826 EDT [36801] LOG: listening on Unix socket > "/tmp/.s.PGSQL.5432" > 2021-07-28 11:32:33.846 EDT [36802] LOG: database system was shut > down at 2021-07-28 11:32:32 EDT > 2021-07-28 11:32:33.857 EDT [36801] LOG: database system is ready to > accept connections > > None of that shows up when you run initdb. Taking a fast look at the > code, I don't think the reasons are the same for all of those > messages. Some of the code isn't reached, whereas e.g. "database > system was shut down at 2021-07-28 11:32:32 EDT" is special-cased. I'm > not sure right off the top of my head what this code should do, but > ideally it looks something like one of the cases we've already got. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Thanks for sharing the information. I have done the necessary changes > to show the logs during the latter case (postgres --single) and > verified the log messages. Thanks. Can you please have a look at what I suggested down toward the bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com ? -- Robert Haas EDB: http://www.enterprisedb.com
> Thanks. Can you please have a look at what I suggested down toward the > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com > ? > > If we're going to go the route of combining the functions, I agree > that a Boolean is the way to go; I think we have some existing > precedent for 'bool finished' rather than 'bool done'. > > But I kind of wonder if we should have an enum in the first place. It > feels like we've got code in a bunch of places that just exists to > decide which enum value to use, and then code someplace else that > turns around and decides which message to produce. That would be > sensible if we were using the same enum values in lots of places, but > that's not the case here. So suppose we just moved the messages to the > places where we're now calling LogStartupProgress() or > LogStartupProgressComplete()? So something like this: Sorry. I thought it is related to the discussion of deciding whether LogStartupProgress() and LogStartupProgressComplete() should be combined or not. I feel it's a really nice design. With this we avoid a "action at a distance" issue and its easy to use. If we are reporting the same kind of msgs at multiple places then the current approach of using enum will be more suitable since we don't have to worry about matching the log msg string. But in the current scenario, we are not using the same kind of msgs at multiple places (I feel such scenario will not occur in future also. Even if there is similar operation, it can be distinguished like resetting unlogged relations is distinguished by init and clean. Kindly mention if you can oversee any such scenario), hence the approach you are suggesting will be a best suit. Thanks & Regards, Nitin Jadhav On Thu, Jul 29, 2021 at 9:48 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Thanks for sharing the information. I have done the necessary changes > > to show the logs during the latter case (postgres --single) and > > verified the log messages. > > Thanks. Can you please have a look at what I suggested down toward the > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com > ? > > -- > Robert Haas > EDB: http://www.enterprisedb.com
> Thanks. Can you please have a look at what I suggested down toward the > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com ? Implemented the above approach and verified the patch. Kindly have a look and share your thoughts. Thanks & Regards, Nitin Jadhav On Fri, Jul 30, 2021 at 10:40 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > Thanks. Can you please have a look at what I suggested down toward the > > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com > > ? > > > > If we're going to go the route of combining the functions, I agree > > that a Boolean is the way to go; I think we have some existing > > precedent for 'bool finished' rather than 'bool done'. > > > > But I kind of wonder if we should have an enum in the first place. It > > feels like we've got code in a bunch of places that just exists to > > decide which enum value to use, and then code someplace else that > > turns around and decides which message to produce. That would be > > sensible if we were using the same enum values in lots of places, but > > that's not the case here. So suppose we just moved the messages to the > > places where we're now calling LogStartupProgress() or > > LogStartupProgressComplete()? So something like this: > > Sorry. I thought it is related to the discussion of deciding whether > LogStartupProgress() and LogStartupProgressComplete() should be > combined or not. I feel it's a really nice design. With this we avoid > a "action at a distance" issue and its easy to use. If we are > reporting the same kind of msgs at multiple places then the current > approach of using enum will be more suitable since we don't have to > worry about matching the log msg string. But in the current scenario, > we are not using the same kind of msgs at multiple places (I feel such > scenario will not occur in future also. Even if there is similar > operation, it can be distinguished like resetting unlogged relations > is distinguished by init and clean. Kindly mention if you can oversee > any such scenario), hence the approach you are suggesting will be a > best suit. > > Thanks & Regards, > Nitin Jadhav > > On Thu, Jul 29, 2021 at 9:48 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Jul 29, 2021 at 4:56 AM Nitin Jadhav > > <nitinjadhavpostgres@gmail.com> wrote: > > > Thanks for sharing the information. I have done the necessary changes > > > to show the logs during the latter case (postgres --single) and > > > verified the log messages. > > > > Thanks. Can you please have a look at what I suggested down toward the > > bottom of http://postgr.es/m/CA+TgmoaP2wEFSktmCgwT9LXuz7Y99HNdUYshpk7qNFuQB98g6g@mail.gmail.com > > ? > > > > -- > > Robert Haas > > EDB: http://www.enterprisedb.com
Attachment
Two issues that I saw: The first syncfs message is not output, because it's before InitStartupProgress() is called: 2021-08-03 07:53:02.176 CDT startup[9717] LOG: database system was interrupted; last known up at 2021-08-03 07:52:15 CDT 2021-08-03 07:53:02.733 CDT startup[9717] LOG: data directory sync (syncfs) complete after 0.55 s 2021-08-03 07:53:02.734 CDT startup[9717] LOG: database system was not properly shut down; automatic recovery in progress FP exception when the GUC is set to 0: 2021-08-03 07:53:02.877 CDT postmaster[9715] LOG: startup process (PID 9717) was terminated by signal 8: Floating pointexception Probably due to mod zero operation. This prevents the process from starting. + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, + (interval_in_ms - (elapsed_ms % interval_in_ms))); -- Justin
On Tue, Aug 3, 2021 at 2:48 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Implemented the above approach and verified the patch. Kindly have a > look and share your thoughts. +LogStartupProgressTimerExpired(bool force, long *secs, int *usecs) The name of this function begins with "log" but it does not log anything, so that's probably a sign that you should rethink the name of the function. I suggested startup_progress_timer_expired() upthread, but now I think maybe we should call it startup_progress_timer_has_expired() and then renaming the Boolean you've called logStartupProgressTimeout to startup_progress_timer_expired. Also, the argument "bool force" doesn't really make sense for this function, which is why I suggested above calling it "bool done" instead. + elapsed_ms = (seconds * 1000) + (useconds / 1000); + interval_in_ms = log_startup_progress_interval * 1000; + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, + (interval_in_ms - (elapsed_ms % interval_in_ms))); This will work correctly only if elapsed_ms is equal to interval_in_ms or slightly greater than interval_ms. But if elapsed_ms is greater than two times interval_ms, then it will produce pretty much random results. If elapsed_ms is negative because the system clock gets set backward, a possibility I've already mentioned to you in a previous review, then it will also misbehave. I actually don't think enable_timeout_after() is very easy to use for this kind of thing. At least for me, it's way easier to think about calculating the timestamp at which I want the timer to expire. Maybe something along these lines: next_timeout = last_startup_progress_timeout + interval; if (next_timeout < now) { // Either the timeout was processed so late that we missed an entire cycle, // or the system clock was set backwards. next_timeout = now + interval; } enable_timeout_at(next_timeout); Also, I said before that I thought it was OK that you were logging a line at the end of every operation as well as after every N milliseconds. But, the more I think about it, the less I like it. We already have a 'redo done' line that shows up at the end of redo, which the patch wisely does not duplicate. But it's not quite clear that any of these other things are important enough to bother mentioning in the log unless they take a long time. After an immediate shutdown of an empty cluster, with this patch applied, I get 3 extra log messages: 2021-08-03 10:17:49.630 EDT [17567] LOG: data directory sync (fsync) complete after 0.13 s 2021-08-03 10:17:49.633 EDT [17567] LOG: resetting unlogged relations (cleanup) complete after 0.00 s 2021-08-03 10:17:49.635 EDT [17567] LOG: resetting unlogged relations (init) complete after 0.00 s That doesn't seem like information anyone really needs. If it had taken a long time, it would have been worth logging, but in the normal case where it doesn't, it's just clutter. Another funny thing is that, as you've coded it, those additional log lines only appear when log_startup_progress_interval != 0. That's strange. It seems particularly strange because of the existing precedent where 'redo done' appears regardless of any setting, but also because when I set, say, a 10s interval, I guess I expect something to happen every 10s. Making something happen once at the end is different. So I think we should take this out, which would permit simplifying a bunch of code.The four places where you call ereport_startup_progress(true, ...) would go away. ereport_startup_progress() would no longer need a Boolean argument, and neither would LogStartupProgressTimerExpired() / startup_progress_timer_has_expired(). Note that there's no real need to disable the timeout when we're done with it. It's fine if we do, but if we don't, it's also not a big deal; all that happens if we leave the timer scheduled and let it expire is that it will set a Boolean flag that nobody will care about. So what I'm thinking about is that we could just have, say, reset_startup_progress_timeout() and startup_progress_timeout_has_expired(). reset_startup_progress_timeout() would just do exactly what I showed above to reset the timeout, and you'd call it at the beginning of each phase. And startup_progress_timeout_has_expired() would look roughly like this: if (!startup_progress_timer_expired) return; now = GetCurrentTimestamp(); // compute timestamp difference last_startup_progress_timeout = now; reset_startup_progress_timeout(); With these changes you'd have only 1 place in the code that needs to care about log_startup_progress_interval, as opposed to 3 as you have it currently, and only one place that enables the timeout, as opposed to 2 as you have it currently. I think that would be tidier. I think we also should consider where to put the new functions introduced by this patch, and the GUC. You put them in xlog.c which is reasonable since that is where StartupXLOG() lives. However, xlog.c is also a gigantic file, and xlog.h is included in a lot of places, most of which aren't going to care about the new things you're adding to that file at all. So I'm thinking it might make more sense to put the new code in src/backend/postmaster/startup.c. That is actually a better thematic fit given that this is really about the startup process specifically, not WAL-logging in general. Then reinit.c would include startup.h instead of xlog.h, which seems better, because I don't think we want any actual xlog operations to happen from within that file, so better not to be including xlog.h. The patch currently lacks documentation. It needs to update config.sgml. -- Robert Haas EDB: http://www.enterprisedb.com
Thanks for the detailed explanation. > + elapsed_ms = (seconds * 1000) + (useconds / 1000); > + interval_in_ms = log_startup_progress_interval * 1000; > + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, > + (interval_in_ms - (elapsed_ms % interval_in_ms))); > > This will work correctly only if elapsed_ms is equal to interval_in_ms > or slightly greater than interval_ms. But if elapsed_ms is greater > than two times interval_ms, then it will produce pretty much random > results. If elapsed_ms is negative because the system clock gets set > backward, a possibility I've already mentioned to you in a previous > review, then it will also misbehave. I actually don't think > enable_timeout_after() is very easy to use for this kind of thing. At > least for me, it's way easier to think about calculating the timestamp > at which I want the timer to expire. Maybe something along these > lines: > > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > > So I think we should take this out, which would permit simplifying a > bunch of code.The four places where you call > ereport_startup_progress(true, ...) would go away. > ereport_startup_progress() would no longer need a Boolean argument, > and neither would LogStartupProgressTimerExpired() / > startup_progress_timer_has_expired(). Note that there's no real need > to disable the timeout when we're done with it. It's fine if we do, > but if we don't, it's also not a big deal; all that happens if we > leave the timer scheduled and let it expire is that it will set a > Boolean flag that nobody will care about. So what I'm thinking about > is that we could just have, say, reset_startup_progress_timeout() and > startup_progress_timeout_has_expired(). > reset_startup_progress_timeout() would just do exactly what I showed > above to reset the timeout, and you'd call it at the beginning of each > phase. And startup_progress_timeout_has_expired() would look roughly > like this: > > if (!startup_progress_timer_expired) > return; > now = GetCurrentTimestamp(); > // compute timestamp difference > last_startup_progress_timeout = now; > reset_startup_progress_timeout(); This seems a little confusing. As we are setting last_startup_progress_timeout = now and then calling reset_startup_progress_timeout() which will calculate the next_time based on the value of last_startup_progress_timeout initially and checks whether next_timeout is less than now. It doesn't make sense to me. I feel we should calculate the next_timeout based on the time that it is supposed to fire next time. So we should set last_startup_progress_timeout = next_timeout after enabling the timer. Also I feel with the 2 functions mentioned above, we also need InitStartupProgress() which sets the initial value to startupProcessOpStartTime which is used to calculate the time difference and display in the logs. I could see those functions like below. InitStartupProgress(void) { startupProcessOpStartTime = GetCurrentTimestamp(); ResetStartupProgressTimeout(startupProcessOpStartTime); } reset_startup_progress_timeout(TimeStampTz now) { next_timeout = last_startup_progress_timeout + interval; if (next_timeout < now) { // Either the timeout was processed so late that we missed an entire cycle, // or the system clock was set backwards. next_timeout = now + interval; } enable_timeout_at(next_timeout); last_startup_progress_timeout = next_timeout; } startup_progress_timeout_has_expired() { if (!startup_progress_timer_expired) return; now = GetCurrentTimestamp(); // compute timestamp difference based on startupProcessOpStartTime reset_startup_progress_timeout(now); } Kindly share your thoughts and correct me if I am wrong. > I think we also should consider where to put the new functions > introduced by this patch, and the GUC. You put them in xlog.c which is > reasonable since that is where StartupXLOG() lives. However, xlog.c is > also a gigantic file, and xlog.h is included in a lot of places, most > of which aren't going to care about the new things you're adding to > that file at all. So I'm thinking it might make more sense to put the > new code in src/backend/postmaster/startup.c. That is actually a > better thematic fit given that this is really about the startup > process specifically, not WAL-logging in general. Then reinit.c would > include startup.h instead of xlog.h, which seems better, because I > don't think we want any actual xlog operations to happen from within > that file, so better not to be including xlog.h. > > The patch currently lacks documentation. It needs to update config.sgml. I agree and I will take care in the next patch. Thanks & Regards, Nitin Jadhav On Tue, Aug 3, 2021 at 8:21 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Aug 3, 2021 at 2:48 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Implemented the above approach and verified the patch. Kindly have a > > look and share your thoughts. > > +LogStartupProgressTimerExpired(bool force, long *secs, int *usecs) > > The name of this function begins with "log" but it does not log > anything, so that's probably a sign that you should rethink the name > of the function. I suggested startup_progress_timer_expired() > upthread, but now I think maybe we should call it > startup_progress_timer_has_expired() and then renaming the Boolean > you've called logStartupProgressTimeout to > startup_progress_timer_expired. Also, the argument "bool force" > doesn't really make sense for this function, which is why I suggested > above calling it "bool done" instead. > > + elapsed_ms = (seconds * 1000) + (useconds / 1000); > + interval_in_ms = log_startup_progress_interval * 1000; > + enable_timeout_after(LOG_STARTUP_PROGRESS_TIMEOUT, > + (interval_in_ms - (elapsed_ms % interval_in_ms))); > > This will work correctly only if elapsed_ms is equal to interval_in_ms > or slightly greater than interval_ms. But if elapsed_ms is greater > than two times interval_ms, then it will produce pretty much random > results. If elapsed_ms is negative because the system clock gets set > backward, a possibility I've already mentioned to you in a previous > review, then it will also misbehave. I actually don't think > enable_timeout_after() is very easy to use for this kind of thing. At > least for me, it's way easier to think about calculating the timestamp > at which I want the timer to expire. Maybe something along these > lines: > > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > > Also, I said before that I thought it was OK that you were logging a > line at the end of every operation as well as after every N > milliseconds. But, the more I think about it, the less I like it. We > already have a 'redo done' line that shows up at the end of redo, > which the patch wisely does not duplicate. But it's not quite clear > that any of these other things are important enough to bother > mentioning in the log unless they take a long time. After an immediate > shutdown of an empty cluster, with this patch applied, I get 3 extra > log messages: > > 2021-08-03 10:17:49.630 EDT [17567] LOG: data directory sync (fsync) > complete after 0.13 s > 2021-08-03 10:17:49.633 EDT [17567] LOG: resetting unlogged relations > (cleanup) complete after 0.00 s > 2021-08-03 10:17:49.635 EDT [17567] LOG: resetting unlogged relations > (init) complete after 0.00 s > > That doesn't seem like information anyone really needs. If it had > taken a long time, it would have been worth logging, but in the normal > case where it doesn't, it's just clutter. Another funny thing is that, > as you've coded it, those additional log lines only appear when > log_startup_progress_interval != 0. That's strange. It seems > particularly strange because of the existing precedent where 'redo > done' appears regardless of any setting, but also because when I set, > say, a 10s interval, I guess I expect something to happen every 10s. > Making something happen once at the end is different. > > So I think we should take this out, which would permit simplifying a > bunch of code.The four places where you call > ereport_startup_progress(true, ...) would go away. > ereport_startup_progress() would no longer need a Boolean argument, > and neither would LogStartupProgressTimerExpired() / > startup_progress_timer_has_expired(). Note that there's no real need > to disable the timeout when we're done with it. It's fine if we do, > but if we don't, it's also not a big deal; all that happens if we > leave the timer scheduled and let it expire is that it will set a > Boolean flag that nobody will care about. So what I'm thinking about > is that we could just have, say, reset_startup_progress_timeout() and > startup_progress_timeout_has_expired(). > reset_startup_progress_timeout() would just do exactly what I showed > above to reset the timeout, and you'd call it at the beginning of each > phase. And startup_progress_timeout_has_expired() would look roughly > like this: > > if (!startup_progress_timer_expired) > return; > now = GetCurrentTimestamp(); > // compute timestamp difference > last_startup_progress_timeout = now; > reset_startup_progress_timeout(); > > With these changes you'd have only 1 place in the code that needs to > care about log_startup_progress_interval, as opposed to 3 as you have > it currently, and only one place that enables the timeout, as opposed > to 2 as you have it currently. I think that would be tidier. > > I think we also should consider where to put the new functions > introduced by this patch, and the GUC. You put them in xlog.c which is > reasonable since that is where StartupXLOG() lives. However, xlog.c is > also a gigantic file, and xlog.h is included in a lot of places, most > of which aren't going to care about the new things you're adding to > that file at all. So I'm thinking it might make more sense to put the > new code in src/backend/postmaster/startup.c. That is actually a > better thematic fit given that this is really about the startup > process specifically, not WAL-logging in general. Then reinit.c would > include startup.h instead of xlog.h, which seems better, because I > don't think we want any actual xlog operations to happen from within > that file, so better not to be including xlog.h. > > The patch currently lacks documentation. It needs to update config.sgml. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > This seems a little confusing. As we are setting > last_startup_progress_timeout = now and then calling > reset_startup_progress_timeout() which will calculate the next_time > based on the value of last_startup_progress_timeout initially and > checks whether next_timeout is less than now. It doesn't make sense to > me. I feel we should calculate the next_timeout based on the time that > it is supposed to fire next time. So we should set > last_startup_progress_timeout = next_timeout after enabling the timer. > Also I feel with the 2 functions mentioned above, we also need > InitStartupProgress() which sets the initial value to > startupProcessOpStartTime which is used to calculate the time > difference and display in the logs. I could see those functions like > below. > > InitStartupProgress(void) > { > startupProcessOpStartTime = GetCurrentTimestamp(); > ResetStartupProgressTimeout(startupProcessOpStartTime); > } This makes sense, but I think I'd like to have all the functions in this patch use names_like_this() rather than NamesLikeThis(). > reset_startup_progress_timeout(TimeStampTz now) > { > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > last_startup_progress_timeout = next_timeout; > } Hmm, yeah, that seems good, but given this change, maybe the variables need a little renaming. Like change last_startup_progress_timeout to scheduled_startup_progress_timeout, perhaps. > startup_progress_timeout_has_expired() > { > if (!startup_progress_timer_expired) > return; > now = GetCurrentTimestamp(); > // compute timestamp difference based on startupProcessOpStartTime > reset_startup_progress_timeout(now); > } I guess this one needs to return a Boolean, actually. -- Robert Haas EDB: http://www.enterprisedb.com
Modified the reset_startup_progress_timeout() to take the second parameter which indicates whether it is called for initialization or for resetting. Without this parameter there is a problem if we call init_startup_progress() more than one time if there is no call to ereport_startup_progress() in between as the code related to disabling the timer has been removed. reset_startup_progress_timeout(TimeStampTz now, bool is_init) { if (is_init) next_timeout = now + interval; else { next_timeout = scheduled_startup_progress_timeout + interval; if (next_timeout < now) { // Either the timeout was processed so late that we missed an entire cycle, // or the system clock was set backwards. next_timeout = now + interval; } enable_timeout_at(next_timeout); scheduled_startup_progress_timeout = next_timeout; } } I have incorporated this in the attached patch. Please correct me if I am wrong. > This makes sense, but I think I'd like to have all the functions in > this patch use names_like_this() rather than NamesLikeThis(). I have changed all the function names accordingly. But I would like to know why it should be names_like_this() as there are many functions with the format NamesLikeThis(). I would like to understand when to use what, just to incorporate in the future patches. > Hmm, yeah, that seems good, but given this change, maybe the variables > need a little renaming. Like change last_startup_progress_timeout to > scheduled_startup_progress_timeout, perhaps. Right. Changed the variable name. > I guess this one needs to return a Boolean, actually. Yes. > reset_startup_progress_timeout(TimeStampTz now) > { > next_timeout = last_startup_progress_timeout + interval; > if (next_timeout < now) > { > // Either the timeout was processed so late that we missed an entire cycle, > // or the system clock was set backwards. > next_timeout = now + interval; > } > enable_timeout_at(next_timeout); > last_startup_progress_timeout = next_timeout; > } As per the above logic, I would like to discuss 2 cases. Case-1: First scheduled timeout is after 1 sec. But the time out occurred after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is scheduled after 2 sec (scheduled_startup_progress_timeout + interval). The condition (next_timeout < now) gets evaluated to false and everything goes smooth. Case-2: First scheduled timeout is after 1 sec. But the timeout occurred after 2.5 sec. So the log msg prints after 2.5 sec. Now next time is scheduled after 2 sec (scheduled_startup_progress_timeout + interval). So the condition (next_timeout < now) will fail and the next_timeout will get reset to 3.5 sec (2.5 + 1) and it continues. Is this behaviour ok or should we set the next_timeout to 3 sec. Please share your thoughts on this. Thanks & Regards, Nitin Jadhav On Thu, Aug 5, 2021 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Aug 5, 2021 at 7:41 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > This seems a little confusing. As we are setting > > last_startup_progress_timeout = now and then calling > > reset_startup_progress_timeout() which will calculate the next_time > > based on the value of last_startup_progress_timeout initially and > > checks whether next_timeout is less than now. It doesn't make sense to > > me. I feel we should calculate the next_timeout based on the time that > > it is supposed to fire next time. So we should set > > last_startup_progress_timeout = next_timeout after enabling the timer. > > Also I feel with the 2 functions mentioned above, we also need > > InitStartupProgress() which sets the initial value to > > startupProcessOpStartTime which is used to calculate the time > > difference and display in the logs. I could see those functions like > > below. > > > > InitStartupProgress(void) > > { > > startupProcessOpStartTime = GetCurrentTimestamp(); > > ResetStartupProgressTimeout(startupProcessOpStartTime); > > } > > This makes sense, but I think I'd like to have all the functions in > this patch use names_like_this() rather than NamesLikeThis(). > > > reset_startup_progress_timeout(TimeStampTz now) > > { > > next_timeout = last_startup_progress_timeout + interval; > > if (next_timeout < now) > > { > > // Either the timeout was processed so late that we missed an entire cycle, > > // or the system clock was set backwards. > > next_timeout = now + interval; > > } > > enable_timeout_at(next_timeout); > > last_startup_progress_timeout = next_timeout; > > } > > Hmm, yeah, that seems good, but given this change, maybe the variables > need a little renaming. Like change last_startup_progress_timeout to > scheduled_startup_progress_timeout, perhaps. > > > startup_progress_timeout_has_expired() > > { > > if (!startup_progress_timer_expired) > > return; > > now = GetCurrentTimestamp(); > > // compute timestamp difference based on startupProcessOpStartTime > > reset_startup_progress_timeout(now); > > } > > I guess this one needs to return a Boolean, actually. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Modified the reset_startup_progress_timeout() to take the second > parameter which indicates whether it is called for initialization or > for resetting. Without this parameter there is a problem if we call > init_startup_progress() more than one time if there is no call to > ereport_startup_progress() in between as the code related to disabling > the timer has been removed. I'd really like to avoid this. I don't see why it's necessary. You say it causes a problem, but you don't explain what problem it causes. enable_timeout_at() will disable the timer if not already done. I think all we need to do is set scheduled_startup_progress_timeout = 0 before calling reset_startup_progress_timeout() in the "init" case and don't do that for the non-init case. If that's not quite right, maybe you can work out something that does work. But adding an is_init flag to a function and having no common code between the is_init = true case and the is_init = false case is exactly the kind of thing that I don't want here. I want as much common code as possible. > > This makes sense, but I think I'd like to have all the functions in > > this patch use names_like_this() rather than NamesLikeThis(). > > I have changed all the function names accordingly. But I would like to > know why it should be names_like_this() as there are many functions > with the format NamesLikeThis(). I would like to understand when to > use what, just to incorporate in the future patches. There is, unfortunately, no hard-and-fast rule, but we want to maintain as much consistency with the existing style as we can. I wasn't initially sure what would work best for this particular patch, but after we committed to a name like ereport_startup_progress() that to me was a strong hint in favor of using names_like_this() throughout. It seems impossible to imagine punctuating it as EreportStartupProgress() or something since that would be wildly inconsistent with the existing function name, and there seems to be no good reason why this patch can't be internally consistent. To some degree, we tend to use names_like_this() for low-level functions and NamesLikeThis() for higher-level functions, but that is not a very consistent practice. > > reset_startup_progress_timeout(TimeStampTz now) > > { > > next_timeout = last_startup_progress_timeout + interval; > > if (next_timeout < now) > > { > > // Either the timeout was processed so late that we missed an entire cycle, > > // or the system clock was set backwards. > > next_timeout = now + interval; > > } > > enable_timeout_at(next_timeout); > > last_startup_progress_timeout = next_timeout; > > } > > As per the above logic, I would like to discuss 2 cases. > > Case-1: > First scheduled timeout is after 1 sec. But the time out occurred > after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > The condition (next_timeout < now) gets evaluated to false and > everything goes smooth. > > Case-2: > First scheduled timeout is after 1 sec. But the timeout occurred after > 2.5 sec. So the log msg prints after 2.5 sec. Now next time is > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > So the condition (next_timeout < now) will fail and the next_timeout > will get reset to 3.5 sec (2.5 + 1) and it continues. Is this > behaviour ok or should we set the next_timeout to 3 sec. Please share > your thoughts on this. I can't quite follow this, because it seems like you are sometimes viewing the interval as 1 second and sometimes as 2 seconds. Maybe you could clarify that, and perhaps show example output? My feeling is that the timer will almost always be slightly late, but it will very rarely be extremely late, and it will also very rarely be early (only if someone resets the system clock). So let's consider those two cases separately. If the timer is a little bit late each time, we want to avoid drift, so we want to shorten the next sleep time by the amount that the previous interrupt was late. If the interval is 1000ms and the interrupt fires 1ms late then we should sleep 999ms the next time; if 2ms late, 998ms. That way, although there will be some variation in which the messages are logged, the drift won't accumulate over time and even after many minutes of recovery the messages will be printed at ABOUT the same number of milliseconds after the second every time, instead of drifting further and further off course. But this strategy cannot be used if the drift is larger than the interval. If we are trying to log a message every 1000ms and the timer doesn't fire for 14ms, we can wait only 986ms the next time. If it doesn't fire for 140ms, we can wait only 860ms the next time. But if the timer doesn't fire for 1400ms, we cannot wait for -400ms the next time. So what should we do? My proposal is to just wait for the configured interval, 1000ms, essentially giving up on drift correction. Now you could argue that we ought to just wait for 600ms in the hopes of making it 2 * 1000ms after the previous status message, but I'm not sure that really has any value, and it doesn't seem especially likely to work. The only way timer interrupts are likely to be that badly delayed is if the system is horrifically overloaded, and if that's the case the next timer interrupt isn't likely to fire on schedule anyway. Trying to correct for drift in such a situation seems more likely to be confusing than to produce any helpful result. -- Robert Haas EDB: http://www.enterprisedb.com
> I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible. Setting set scheduled_startup_progress_timeout = 0 in the "init" case solves the problem. The problem was if we call init_start_progress() continuously then the first call to reset_startup_progress_timeout() sets the value of scheduled_startup_progress_timeout to "now + interval". Later call to reset_startup_progress_timeout() uses the previously set value of scheduled_startup_progress_timeout which was not correct and it was not behaving as expected. I could see that the first log gets printed far later than the expected interval. > To some degree, we tend to use names_like_this() for low-level > functions and NamesLikeThis() for higher-level functions, but that is > not a very consistent practice. Ok. Thanks for the information. > But this strategy cannot be used if the drift is larger than the > interval. If we are trying to log a message every 1000ms and the timer > doesn't fire for 14ms, we can wait only 986ms the next time. If it > doesn't fire for 140ms, we can wait only 860ms the next time. But if > the timer doesn't fire for 1400ms, we cannot wait for -400ms the next > time. So what should we do? My proposal is to just wait for the > configured interval, 1000ms, essentially giving up on drift > correction. Now you could argue that we ought to just wait for 600ms > in the hopes of making it 2 * 1000ms after the previous status > message, but I'm not sure that really has any value, and it doesn't > seem especially likely to work. The only way timer interrupts are > likely to be that badly delayed is if the system is horrifically > overloaded, and if that's the case the next timer interrupt isn't > likely to fire on schedule anyway. Trying to correct for drift in such > a situation seems more likely to be confusing than to produce any > helpful result. This is what I was trying to convey in case-2. I agree that it is better to consider "now + interval" in such a case instead of trying to adjust the drift. Please find the updated patch attached. On Tue, Aug 10, 2021 at 1:06 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Aug 9, 2021 at 11:20 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Modified the reset_startup_progress_timeout() to take the second > > parameter which indicates whether it is called for initialization or > > for resetting. Without this parameter there is a problem if we call > > init_startup_progress() more than one time if there is no call to > > ereport_startup_progress() in between as the code related to disabling > > the timer has been removed. > > I'd really like to avoid this. I don't see why it's necessary. You say > it causes a problem, but you don't explain what problem it causes. > enable_timeout_at() will disable the timer if not already done. I > think all we need to do is set scheduled_startup_progress_timeout = 0 > before calling reset_startup_progress_timeout() in the "init" case and > don't do that for the non-init case. If that's not quite right, maybe > you can work out something that does work. But adding an is_init flag > to a function and having no common code between the is_init = true > case and the is_init = false case is exactly the kind of thing that I > don't want here. I want as much common code as possible. > > > > This makes sense, but I think I'd like to have all the functions in > > > this patch use names_like_this() rather than NamesLikeThis(). > > > > I have changed all the function names accordingly. But I would like to > > know why it should be names_like_this() as there are many functions > > with the format NamesLikeThis(). I would like to understand when to > > use what, just to incorporate in the future patches. > > There is, unfortunately, no hard-and-fast rule, but we want to > maintain as much consistency with the existing style as we can. I > wasn't initially sure what would work best for this particular patch, > but after we committed to a name like ereport_startup_progress() that > to me was a strong hint in favor of using names_like_this() > throughout. It seems impossible to imagine punctuating it as > EreportStartupProgress() or something since that would be wildly > inconsistent with the existing function name, and there seems to be no > good reason why this patch can't be internally consistent. > > To some degree, we tend to use names_like_this() for low-level > functions and NamesLikeThis() for higher-level functions, but that is > not a very consistent practice. > > > > reset_startup_progress_timeout(TimeStampTz now) > > > { > > > next_timeout = last_startup_progress_timeout + interval; > > > if (next_timeout < now) > > > { > > > // Either the timeout was processed so late that we missed an entire cycle, > > > // or the system clock was set backwards. > > > next_timeout = now + interval; > > > } > > > enable_timeout_at(next_timeout); > > > last_startup_progress_timeout = next_timeout; > > > } > > > > As per the above logic, I would like to discuss 2 cases. > > > > Case-1: > > First scheduled timeout is after 1 sec. But the time out occurred > > after 1.5 sec. So the log msg prints after 1.5 sec. Next timer is > > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > > The condition (next_timeout < now) gets evaluated to false and > > everything goes smooth. > > > > Case-2: > > First scheduled timeout is after 1 sec. But the timeout occurred after > > 2.5 sec. So the log msg prints after 2.5 sec. Now next time is > > scheduled after 2 sec (scheduled_startup_progress_timeout + interval). > > So the condition (next_timeout < now) will fail and the next_timeout > > will get reset to 3.5 sec (2.5 + 1) and it continues. Is this > > behaviour ok or should we set the next_timeout to 3 sec. Please share > > your thoughts on this. > > I can't quite follow this, because it seems like you are sometimes > viewing the interval as 1 second and sometimes as 2 seconds. Maybe you > could clarify that, and perhaps show example output? > > My feeling is that the timer will almost always be slightly late, but > it will very rarely be extremely late, and it will also very rarely be > early (only if someone resets the system clock). So let's consider > those two cases separately. If the timer is a little bit late each > time, we want to avoid drift, so we want to shorten the next sleep > time by the amount that the previous interrupt was late. If the > interval is 1000ms and the interrupt fires 1ms late then we should > sleep 999ms the next time; if 2ms late, 998ms. That way, although > there will be some variation in which the messages are logged, the > drift won't accumulate over time and even after many minutes of > recovery the messages will be printed at ABOUT the same number of > milliseconds after the second every time, instead of drifting further > and further off course. > > But this strategy cannot be used if the drift is larger than the > interval. If we are trying to log a message every 1000ms and the timer > doesn't fire for 14ms, we can wait only 986ms the next time. If it > doesn't fire for 140ms, we can wait only 860ms the next time. But if > the timer doesn't fire for 1400ms, we cannot wait for -400ms the next > time. So what should we do? My proposal is to just wait for the > configured interval, 1000ms, essentially giving up on drift > correction. Now you could argue that we ought to just wait for 600ms > in the hopes of making it 2 * 1000ms after the previous status > message, but I'm not sure that really has any value, and it doesn't > seem especially likely to work. The only way timer interrupts are > likely to be that badly delayed is if the system is horrifically > overloaded, and if that's the case the next timer interrupt isn't > likely to fire on schedule anyway. Trying to correct for drift in such > a situation seems more likely to be confusing than to produce any > helpful result. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Tue, Aug 10, 2021 at 9:28 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Please find the updated patch attached. I think this is getting close. The output looks nice. However, I still see a bunch of issues. You mentioned previously that you would add documentation, but I do not see it here. startup_progress_timer_expired should be declared as sig_atomic_t like we do in other cases (see interrupt.c). The default value of the new GUC is 10s in postgresql.conf.sample, but -1 in guc.c. They should both be 10s, and the one in postgresql.conf.sample should be commented out. I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but expressing the default in postgresl.conf.sample as 10s rather than 10000ms. I tried values measured in milliseconds just for testing purposes and didn't initially understand why it wasn't working. I don't think there's any reason it can't work. I would prefer to see log_startup_progress_interval declared and defined in startup.c/startup.h rather than guc.c/guc.h. There's no precedent in the tree for the use of ##__VA_ARGS__. On my system it seems to work fine if I just leave out the ##. Any reason not to do that? Two of the declarations in startup.h forgot the leading "extern", while the other two that are right next to them have it, matching project style. I'm reasonably happy with most of the identifier names now, but I think init_startup_progress() is confusing. The reason I think that is that we call it more than once, which is not really what people think about when they think of an "init" function, I think. It's not initializing the startup progress facility in general; it's preparing for the next phase of startup progress reporting. How about renaming it to begin_startup_progress_phase()? And then startup_process_op_start_time could be startup_process_phase_start_time to match. SyncDataDirectory() potentially walks over the data directory three times: once to call do_syncfs(), once to call pre_sync_fname(), and once to call datadir_fsync_fname(). With this patch, the first and third will emit progress messages if the operation runs long, but the second will not. I think they should all be treated the same. Also, the locations where you've inserted calls to init_startup_progress() don't really make it clear with what code that's associated. I'd put them *immediately* before the call to do_syncfs() or walkdir(). Remember that PostgreSQL comments are typically written "telegraph style," so function comments should say "Does whatever" not "It does whatever". Most of them are correct, but there's one sentence you need to fix. -- Robert Haas EDB: http://www.enterprisedb.com
> startup_progress_timer_expired should be declared as sig_atomic_t like > we do in other cases (see interrupt.c). Fixed. > The default value of the new GUC is 10s in postgresql.conf.sample, but > -1 in guc.c. They should both be 10s, and the one in > postgresql.conf.sample should be commented out. Fixed. > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > expressing the default in postgresl.conf.sample as 10s rather than > 10000ms. I tried values measured in milliseconds just for testing > purposes and didn't initially understand why it wasn't working. I > don't think there's any reason it can't work. As suggested, I have changed it to GUC_UNIT_MS and kept the default value to 10s. I would like to know the reason why it can't be GUC_UNIT_S as we are expressing the values in terms of seconds. > I would prefer to see log_startup_progress_interval declared and > defined in startup.c/startup.h rather than guc.c/guc.h. Fixed. > There's no precedent in the tree for the use of ##__VA_ARGS__. On my > system it seems to work fine if I just leave out the ##. Any reason > not to do that? I had added this to support if no variable argument are passed to the macro. For example, in the previous patches we used to log the progress at the end of the operation like "ereport_startup_progress(true, "data directory sync (syncfs) complete after %ld.%02d s");". Since these calls are removed now, ## is not required. Hence removed in the attached patch. > Two of the declarations in startup.h forgot the leading "extern", > while the other two that are right next to them have it, matching > project style. I had not added extern since those function were not used in the other files. Now added to match the project style. > I'm reasonably happy with most of the identifier names now, but I > think init_startup_progress() is confusing. The reason I think that is > that we call it more than once, which is not really what people think > about when they think of an "init" function, I think. It's not > initializing the startup progress facility in general; it's preparing > for the next phase of startup progress reporting. How about renaming > it to begin_startup_progress_phase()? And then > startup_process_op_start_time could be > startup_process_phase_start_time to match. Yes begin_startup_progress_phase() looks more appropriate. Instead of startup_process_phase_start_time, startup_progress_phase_start_time looks more appropriate. Changed these in the attached patch. > SyncDataDirectory() potentially walks over the data directory three > times: once to call do_syncfs(), once to call pre_sync_fname(), and > once to call datadir_fsync_fname(). With this patch, the first and > third will emit progress messages if the operation runs long, but the > second will not. I think they should all be treated the same. Also, > the locations where you've inserted calls to init_startup_progress() > don't really make it clear with what code that's associated. I'd put > them *immediately* before the call to do_syncfs() or walkdir(). Fixed. > Remember that PostgreSQL comments are typically written "telegraph > style," so function comments should say "Does whatever" not "It does > whatever". Most of them are correct, but there's one sentence you need > to fix. Fixed in the function comments of startup_progress_timeout_has_expired(). Please let me now if this is not the one you wanted me to correct. > You mentioned previously that you would add documentation, but I do > not see it here. Sorry. I missed this. I have added the documentation in the attached patch. On Tue, Aug 10, 2021 at 8:56 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Aug 10, 2021 at 9:28 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Please find the updated patch attached. > > I think this is getting close. The output looks nice. However, I still > see a bunch of issues. > > You mentioned previously that you would add documentation, but I do > not see it here. > > startup_progress_timer_expired should be declared as sig_atomic_t like > we do in other cases (see interrupt.c). > > The default value of the new GUC is 10s in postgresql.conf.sample, but > -1 in guc.c. They should both be 10s, and the one in > postgresql.conf.sample should be commented out. > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > expressing the default in postgresl.conf.sample as 10s rather than > 10000ms. I tried values measured in milliseconds just for testing > purposes and didn't initially understand why it wasn't working. I > don't think there's any reason it can't work. > > I would prefer to see log_startup_progress_interval declared and > defined in startup.c/startup.h rather than guc.c/guc.h. > > There's no precedent in the tree for the use of ##__VA_ARGS__. On my > system it seems to work fine if I just leave out the ##. Any reason > not to do that? > > Two of the declarations in startup.h forgot the leading "extern", > while the other two that are right next to them have it, matching > project style. > > I'm reasonably happy with most of the identifier names now, but I > think init_startup_progress() is confusing. The reason I think that is > that we call it more than once, which is not really what people think > about when they think of an "init" function, I think. It's not > initializing the startup progress facility in general; it's preparing > for the next phase of startup progress reporting. How about renaming > it to begin_startup_progress_phase()? And then > startup_process_op_start_time could be > startup_process_phase_start_time to match. > > SyncDataDirectory() potentially walks over the data directory three > times: once to call do_syncfs(), once to call pre_sync_fname(), and > once to call datadir_fsync_fname(). With this patch, the first and > third will emit progress messages if the operation runs long, but the > second will not. I think they should all be treated the same. Also, > the locations where you've inserted calls to init_startup_progress() > don't really make it clear with what code that's associated. I'd put > them *immediately* before the call to do_syncfs() or walkdir(). > > Remember that PostgreSQL comments are typically written "telegraph > style," so function comments should say "Does whatever" not "It does > whatever". Most of them are correct, but there's one sentence you need > to fix. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Thu, Aug 12, 2021 at 7:40 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > > expressing the default in postgresl.conf.sample as 10s rather than > > 10000ms. I tried values measured in milliseconds just for testing > > purposes and didn't initially understand why it wasn't working. I > > don't think there's any reason it can't work. > > As suggested, I have changed it to GUC_UNIT_MS and kept the default > value to 10s. I would like to know the reason why it can't be > GUC_UNIT_S as we are expressing the values in terms of seconds. I mean, it *could* be. There's just no advantage. Values in seconds will work correctly either way. But values in milliseconds will only work if it's GUC_UNIT_MS. It seems to me that it's better to make more things work rather than fewer. > > There's no precedent in the tree for the use of ##__VA_ARGS__. On my > > system it seems to work fine if I just leave out the ##. Any reason > > not to do that? > > I had added this to support if no variable argument are passed to the > macro. For example, in the previous patches we used to log the > progress at the end of the operation like > "ereport_startup_progress(true, "data directory sync (syncfs) complete > after %ld.%02d s");". Since these calls are removed now, ## is not > required. Hence removed in the attached patch. Hmm, OK. That's actually a pretty good reason for using ## there. It just made me nervous because we have no similar uses of ## in the backend code. We rely on it elsewhere for concatenation, but not for comma removal. Let's try leaving it out for now unless somebody else shows up with a different opinion. > I had not added extern since those function were not used in the other > files. Now added to match the project style. Anything that's not used in other files should be declared static in the file itself, and not present in the header. Once you fix this for reset_startup_progress_timeout, the header won't need to include datatype/timestamp.h any more, which is good, because we don't want header files to depend on more other header files than necessary. Looking over this version, I realized something I (or you) should have noticed sooner: you've added the RegisterTimeout call to InitPostgres(), but that's for things that are used by all backends, and this is only used by the startup process. So it seems to me that the right place is StartupProcessMain. That would have the further advantage of allowing startup_progress_timeout_handler to be made static. begin_startup_progress_phase() and startup_progress_timeout_has_expired() are the actual API functions though so they will need to remain extern. @@ -679,7 +680,6 @@ static char *recovery_target_lsn_string; /* should be static, but commands/variable.c needs to get at this */ char *role_string; - /* * Displayable names for context types (enum GucContext) * This hunk should be removed. -- Robert Haas EDB: http://www.enterprisedb.com
Should this feature distinguish between crash recovery and archive recovery on a hot standby ? Otherwise the standby will display this all the time. 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes cleanup of xlog.c variables, I can't say that with much confidence).
On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Should this feature distinguish between crash recovery and archive recovery on > a hot standby ? Otherwise the standby will display this all the time. > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > cleanup of xlog.c variables, I can't say that with much confidence). Hmm. My inclination is to think that on an actual standby, you wouldn't want to get messages like this, but if you were doing a point-in-time-recovery, then you would. So I think maybe InArchiveRecovery is not the right thing. Perhaps StandbyMode? -- Robert Haas EDB: http://www.enterprisedb.com
> Anything that's not used in other files should be declared static in > the file itself, and not present in the header. Once you fix this for > reset_startup_progress_timeout, the header won't need to include > datatype/timestamp.h any more, which is good, because we don't want > header files to depend on more other header files than necessary. Thanks for identifying this. I will take care in the next patch. > Looking over this version, I realized something I (or you) should have > noticed sooner: you've added the RegisterTimeout call to > InitPostgres(), but that's for things that are used by all backends, > and this is only used by the startup process. So it seems to me that > the right place is StartupProcessMain. That would have the further > advantage of allowing startup_progress_timeout_handler to be made > static. begin_startup_progress_phase() and > startup_progress_timeout_has_expired() are the actual API functions > though so they will need to remain extern. Yes. I had noticed this earlier and the RegisterTimeout() call was only present in StartupProcessMain() and not in InitPostgres() in the earlier versions (v7) of the patch. Since StartupXLOG() gets called in the 2 places, I had restricted the InitPostgres() flow by checking for the !AmStartupProcess() in the newly added functions. But later we had discussion and concluded to add the RegisterTimeout() call even in case of InitPostgres(). Kindly refer to the discussion just after the v7 patch in this thread and let me know your thoughts. > This hunk should be removed. I will remove it in the next patch. On Tue, Aug 17, 2021 at 1:08 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Should this feature distinguish between crash recovery and archive recovery on > > a hot standby ? Otherwise the standby will display this all the time. > > > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > > cleanup of xlog.c variables, I can't say that with much confidence). > > Hmm. My inclination is to think that on an actual standby, you > wouldn't want to get messages like this, but if you were doing a > point-in-time-recovery, then you would. So I think maybe > InArchiveRecovery is not the right thing. Perhaps StandbyMode? > > -- > Robert Haas > EDB: http://www.enterprisedb.com
> > Anything that's not used in other files should be declared static in > > the file itself, and not present in the header. Once you fix this for > > reset_startup_progress_timeout, the header won't need to include > > datatype/timestamp.h any more, which is good, because we don't want > > header files to depend on more other header files than necessary. > > Thanks for identifying this. I will take care in the next patch. Fixed. > > This hunk should be removed. > > I will remove it in the next patch. Removed. Please find the updated patch attached. On Wed, Aug 18, 2021 at 12:23 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > Anything that's not used in other files should be declared static in > > the file itself, and not present in the header. Once you fix this for > > reset_startup_progress_timeout, the header won't need to include > > datatype/timestamp.h any more, which is good, because we don't want > > header files to depend on more other header files than necessary. > > Thanks for identifying this. I will take care in the next patch. > > > Looking over this version, I realized something I (or you) should have > > noticed sooner: you've added the RegisterTimeout call to > > InitPostgres(), but that's for things that are used by all backends, > > and this is only used by the startup process. So it seems to me that > > the right place is StartupProcessMain. That would have the further > > advantage of allowing startup_progress_timeout_handler to be made > > static. begin_startup_progress_phase() and > > startup_progress_timeout_has_expired() are the actual API functions > > though so they will need to remain extern. > > Yes. I had noticed this earlier and the RegisterTimeout() call was > only present in StartupProcessMain() and not in InitPostgres() in the > earlier versions (v7) of the patch. Since StartupXLOG() gets called in > the 2 places, I had restricted the InitPostgres() flow by checking for > the !AmStartupProcess() in the newly added functions. But later we had > discussion and concluded to add the RegisterTimeout() call even in > case of InitPostgres(). Kindly refer to the discussion just after the > v7 patch in this thread and let me know your thoughts. > > > This hunk should be removed. > > I will remove it in the next patch. > > On Tue, Aug 17, 2021 at 1:08 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Should this feature distinguish between crash recovery and archive recovery on > > > a hot standby ? Otherwise the standby will display this all the time. > > > > > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > > > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > > > cleanup of xlog.c variables, I can't say that with much confidence). > > > > Hmm. My inclination is to think that on an actual standby, you > > wouldn't want to get messages like this, but if you were doing a > > point-in-time-recovery, then you would. So I think maybe > > InArchiveRecovery is not the right thing. Perhaps StandbyMode? > > > > -- > > Robert Haas > > EDB: http://www.enterprisedb.com
Attachment
On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote: > Please find the updated patch attached. Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com I agree with Robert that a standby server should not continuously show timing messages for WAL replay. Some doc comments: + Sets the time interval between each progress update of the operations + performed during startup process. This produces the log message after Either say "performed by the startup process" or "performed during startup". s/the/a/ + every interval of time for the operations that take longer time. The unit ..for those operations which take longer than the specified duration. + used to specify the value is seconds. For example, if you set it to + <literal> 10s </literal>, then after every <literal> 10s </literal> there remove "after" + is a log message indicating which operation is going on and what is the say "..every 10s, a log is emitted indicating which operation is ongoing, and the elapsed time from the beginning of the operation.." + elapsed time from beginning. If the value is set to <literal> 0 </literal>, + then it logs all the available messages for such operations. <literal> -1 "..then all messages for such operations are logged." + </literal> disables the feature. The default value is set to <literal> 10s + </literal> "The default value is >10s<."
On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote: > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote: > > Please find the updated patch attached. > > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com > > I agree with Robert that a standby server should not continuously show timing > messages for WAL replay. Clearly. This should be limited to crash recovery, and maybe there could be some checks to make sure that nothing is logged once a consistent point is reached. Honestly, I don't see why we should have a GUC for what's proposed here. A value too low would bloat the logs with entries that are not that meaningful. A value too large would just prevent access to some information wanted. Wouldn't it be better to just pick up a value like 10s or 20s? Looking at v13.. + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, + gettext_noop("Sets the time interval between each progress update " + "of the startup process."), + gettext_noop("0 logs all messages. -1 turns this feature off."), + GUC_UNIT_MS, The unit is incorrect here, as that would default to 10ms, contrary to what the documentation says about 10s. -- Michael
Attachment
> I also agree that this is the better place to do it. Hence modified > the patch accordingly. The condition "!AmStartupProcess()" is added to > differentiate whether the call is done from a startup process or some > other process. Actually StartupXLOG() gets called in 2 places. one in > StartupProcessMain() and the other in InitPostgres(). As the logging > of the startup progress is required only during the startup process > and not in the other cases, The InitPostgres() case occurs when the server is started in bootstrap mode (during initdb) or in single-user mode (postgres --single). I do not see any reason why we shouldn't produce progress messages in at least the latter case. I suspect that someone who is in the rather desperate scenario of having to use single-user mode would really like to know how long the server is going to take to start up. Perhaps during initdb we don't want messages, but I'm not sure that we need to do anything about that here. None of the messages that the server normally produces show up when you run initdb, so I guess they are getting redirected to /dev/null or something. So I don't think that using AmStartupProcess() for this purpose is right. On Tue, Sep 7, 2021 at 10:58 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote: > > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote: > > > Please find the updated patch attached. > > > > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com > > > > I agree with Robert that a standby server should not continuously show timing > > messages for WAL replay. > > Clearly. This should be limited to crash recovery, and maybe there > could be some checks to make sure that nothing is logged once a > consistent point is reached. Honestly, I don't see why we should have > a GUC for what's proposed here. A value too low would bloat the logs > with entries that are not that meaningful. A value too large would > just prevent access to some information wanted. Wouldn't it be better > to just pick up a value like 10s or 20s? > > Looking at v13.. > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > + gettext_noop("Sets the time interval between each progress update " > + "of the startup process."), > + gettext_noop("0 logs all messages. -1 turns this feature off."), > + GUC_UNIT_MS, > The unit is incorrect here, as that would default to 10ms, contrary to > what the documentation says about 10s. > -- > Michael
> Looking over this version, I realized something I (or you) should have > noticed sooner: you've added the RegisterTimeout call to > InitPostgres(), but that's for things that are used by all backends, > and this is only used by the startup process. So it seems to me that > the right place is StartupProcessMain. That would have the further > advantage of allowing startup_progress_timeout_handler to be made > static. begin_startup_progress_phase() and > startup_progress_timeout_has_expired() are the actual API functions > though so they will need to remain extern. > > I agree with Robert that a standby server should not continuously show timing > messages for WAL replay. The earlier discussion wrt this point is as follows. > > I also agree that this is the better place to do it. Hence modified > > the patch accordingly. The condition "!AmStartupProcess()" is added to > > differentiate whether the call is done from a startup process or some > > other process. Actually StartupXLOG() gets called in 2 places. one in > > StartupProcessMain() and the other in InitPostgres(). As the logging > > of the startup progress is required only during the startup process > > and not in the other cases, > > The InitPostgres() case occurs when the server is started in bootstrap > mode (during initdb) or in single-user mode (postgres --single). I do > not see any reason why we shouldn't produce progress messages in at > least the latter case. I suspect that someone who is in the rather > desperate scenario of having to use single-user mode would really like > to know how long the server is going to take to start up. > > Perhaps during initdb we don't want messages, but I'm not sure that we > need to do anything about that here. None of the messages that the > server normally produces show up when you run initdb, so I guess they > are getting redirected to /dev/null or something. > > So I don't think that using AmStartupProcess() for this purpose is right. So as per the recent discussion, RegisterTimeout call should be removed from InitPostgres() and the condition "!AmStartupProcess()" is to be added in begin_startup_progress_phase() and ereport_startup_progress() to differentiate whether the call is from a startup process or some other process. Kindly correct me if I am wrong. > Some doc comments: Thanks for the suggestions. I will take care in the next patch. > Clearly. This should be limited to crash recovery, and maybe there > could be some checks to make sure that nothing is logged once a > consistent point is reached. The purpose here is to show the progress of the operation if it is taking longer than the interval set by the user until it completes the operation. Users should know what operation is happening in the background and to show the progress, displaying the elapsed time. So according to me the consistent point is nothing but the end of the operation. Kindly let me know if you have something in mind and that could be the better consistent point. > Honestly, I don't see why we should have > a GUC for what's proposed here. A value too low would bloat the logs > with entries that are not that meaningful. A value too large would > just prevent access to some information wanted. Wouldn't it be better > to just pick up a value like 10s or 20s? It is difficult to finalise the value and use that value without providing an option to change. If we choose one value (say 10s), it may be too less for some users or too large for some other users. So I feel it is better to provide an option to users so that they can choose the value according to their need. Anyway the default value set for this setting is 10s. > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > + gettext_noop("Sets the time interval between each progress update " > + "of the startup process."), > + gettext_noop("0 logs all messages. -1 turns this feature off."), > + GUC_UNIT_MS, > The unit is incorrect here, as that would default to 10ms, contrary to > what the documentation says about 10s. Kindly refer the previous few discussions wrt this point. On Tue, Sep 7, 2021 at 10:58 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 03, 2021 at 09:23:27PM -0500, Justin Pryzby wrote: > > On Fri, Sep 03, 2021 at 01:23:56PM +0530, Nitin Jadhav wrote: > > > Please find the updated patch attached. > > > > Please check CA+TgmoZtbqxaOLdpNkBcDbz=41tWALA8kpH4M=RWtPYHC7-KNg@mail.gmail.com > > > > I agree with Robert that a standby server should not continuously show timing > > messages for WAL replay. > > Clearly. This should be limited to crash recovery, and maybe there > could be some checks to make sure that nothing is logged once a > consistent point is reached. Honestly, I don't see why we should have > a GUC for what's proposed here. A value too low would bloat the logs > with entries that are not that meaningful. A value too large would > just prevent access to some information wanted. Wouldn't it be better > to just pick up a value like 10s or 20s? > > Looking at v13.. > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > + gettext_noop("Sets the time interval between each progress update " > + "of the startup process."), > + gettext_noop("0 logs all messages. -1 turns this feature off."), > + GUC_UNIT_MS, > The unit is incorrect here, as that would default to 10ms, contrary to > what the documentation says about 10s. > -- > Michael
On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote: > > Looking over this version, I realized something I (or you) should have > > noticed sooner: you've added the RegisterTimeout call to > > InitPostgres(), but that's for things that are used by all backends, > > and this is only used by the startup process. So it seems to me that > > the right place is StartupProcessMain. That would have the further > > advantage of allowing startup_progress_timeout_handler to be made > > static. begin_startup_progress_phase() and > > startup_progress_timeout_has_expired() are the actual API functions > > though so they will need to remain extern. > > > > I agree with Robert that a standby server should not continuously show timing > > messages for WAL replay. > > The earlier discussion wrt this point is as follows. I think you're confusing discussions. Robert was talking about initdb/bootstrap/single, and I separately and independently asked about hot standbys. It seems like Robert and I agreed about the desired behavior and there was no further discussion. > > Honestly, I don't see why we should have > > a GUC for what's proposed here. A value too low would bloat the logs > > with entries that are not that meaningful. A value too large would > > just prevent access to some information wanted. Wouldn't it be better > > to just pick up a value like 10s or 20s? I don't think bloating logs is a issue for values > 10sec. You agreed that it's important to choose the "right" value, but I think that will vary between users. Some installations with large checkpoint_timeout might anticipate taking 15+min to perform recovery, but others might even have a strict requirement that recovery must not take more than (say) 10sec; someone might want to use this to verify that, or to optimize the slow parts of recovery, with an interval that someone else might not care about. > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > > + gettext_noop("Sets the time interval between each progress update " > > + "of the startup process."), > > + gettext_noop("0 logs all messages. -1 turns this feature off."), > > + GUC_UNIT_MS, |+ 10, -1, INT_MAX, > > The unit is incorrect here, as that would default to 10ms, contrary to > > what the documentation says about 10s. > > Kindly refer the previous few discussions wrt this point. You copied and pasted unrelated emails, which isn't helpful. Michael is right. You updated some of the units based on Robert's suggestion to use MS, but didn't update all of the corresponding parts of the patch. guc.c says that the units are in MS, which means that unqualified values are interpretted as such. But postgresql.conf.sample still says "seconds", and guc.c says the default value is "10", and you still do: + interval_in_ms = log_startup_progress_interval * 1000; I checked that this currently does not interpret the value as ms: |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to another int could overflow.) I think the convention is to for GUC global vars to be initialized with the same default as in guc.c, so both should be 10000, like: +int log_startup_progress_interval = 10 * 1000 /* 10sec */ -- Justin
> I think you're confusing discussions. > > Robert was talking about initdb/bootstrap/single, and I separately and > independently asked about hot standbys. It seems like Robert and I agreed > about the desired behavior and there was no further discussion. Sorry for including 2 separate points into one. > Looking over this version, I realized something I (or you) should have > noticed sooner: you've added the RegisterTimeout call to > InitPostgres(), but that's for things that are used by all backends, > and this is only used by the startup process. So it seems to me that > the right place is StartupProcessMain. That would have the further > advantage of allowing startup_progress_timeout_handler to be made > static. begin_startup_progress_phase() and > startup_progress_timeout_has_expired() are the actual API functions > though so they will need to remain extern. The earlier discussion wrt this point is as follows. > > I also agree that this is the better place to do it. Hence modified > > the patch accordingly. The condition "!AmStartupProcess()" is added to > > differentiate whether the call is done from a startup process or some > > other process. Actually StartupXLOG() gets called in 2 places. one in > > StartupProcessMain() and the other in InitPostgres(). As the logging > > of the startup progress is required only during the startup process > > and not in the other cases, > > The InitPostgres() case occurs when the server is started in bootstrap > mode (during initdb) or in single-user mode (postgres --single). I do > not see any reason why we shouldn't produce progress messages in at > least the latter case. I suspect that someone who is in the rather > desperate scenario of having to use single-user mode would really like > to know how long the server is going to take to start up. > > Perhaps during initdb we don't want messages, but I'm not sure that we > need to do anything about that here. None of the messages that the > server normally produces show up when you run initdb, so I guess they > are getting redirected to /dev/null or something. > > So I don't think that using AmStartupProcess() for this purpose is right. This point is really confusing. As per the earlier discussion we concluded to include RegisterTimeout() call even in case of InitPostgres() to support logging in case of single-user mode. Now if we remove the RegisterTimeout() call from InitPostgres(), we are not going to support that anymore. Is this what you're trying to convey? or we should add some checks and disable the code to RegisterTimeout() if it is other than single-user mode. I have added a check if (!IsPostmasterEnvironment) in the attached patch for this scenario. Kindly confirm my understanding. > > Should this feature distinguish between crash recovery and archive recovery on > > a hot standby ? Otherwise the standby will display this all the time. > > > >2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > > cleanup of xlog.c variables, I can't say that with much confidence). > > Hmm. My inclination is to think that on an actual standby, you > wouldn't want to get messages like this, but if you were doing a > point-in-time-recovery, then you would. So I think maybe > InArchiveRecovery is not the right thing. Perhaps StandbyMode? I also feel that the log messages should be recorded in case of point-in-time-recovery. So I have added a check if (!StandbyMode) and verified the replication and point-in-time-recovery scenario. > > Some doc comments: > > Thanks for the suggestions. I will take care in the next patch. Fixed. > Michael is right. You updated some of the units based on Robert's suggestion > to use MS, but didn't update all of the corresponding parts of the patch. > guc.c says that the units are in MS, which means that unqualified values are > interpretted as such. But postgresql.conf.sample still says "seconds", and > guc.c says the default value is "10", and you still do: > > + interval_in_ms = log_startup_progress_interval * 1000; > > I checked that this currently does not interpret the value as ms: > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > another int could overflow.) > > I think the convention is to for GUC global vars to be initialized with the > same default as in guc.c, so both should be 10000, like: > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ Following is the discussion done wrt this point. Kindly refer and share your thoughts. > > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > > > expressing the default in postgresl.conf.sample as 10s rather than > > > 10000ms. I tried values measured in milliseconds just for testing > > > purposes and didn't initially understand why it wasn't working. I > > > don't think there's any reason it can't work. > > > > As suggested, I have changed it to GUC_UNIT_MS and kept the default > > value to 10s. I would like to know the reason why it can't be > > GUC_UNIT_S as we are expressing the values in terms of seconds. > > I mean, it *could* be. There's just no advantage. Values in seconds > will work correctly either way. But values in milliseconds will only > work if it's GUC_UNIT_MS. It seems to me that it's better to make more > things work rather than fewer. Thanks & Regards, Nitin Jadhav On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote: > > > Looking over this version, I realized something I (or you) should have > > > noticed sooner: you've added the RegisterTimeout call to > > > InitPostgres(), but that's for things that are used by all backends, > > > and this is only used by the startup process. So it seems to me that > > > the right place is StartupProcessMain. That would have the further > > > advantage of allowing startup_progress_timeout_handler to be made > > > static. begin_startup_progress_phase() and > > > startup_progress_timeout_has_expired() are the actual API functions > > > though so they will need to remain extern. > > > > > > I agree with Robert that a standby server should not continuously show timing > > > messages for WAL replay. > > > > The earlier discussion wrt this point is as follows. > > I think you're confusing discussions. > > Robert was talking about initdb/bootstrap/single, and I separately and > independently asked about hot standbys. It seems like Robert and I agreed > about the desired behavior and there was no further discussion. > > > > Honestly, I don't see why we should have > > > a GUC for what's proposed here. A value too low would bloat the logs > > > with entries that are not that meaningful. A value too large would > > > just prevent access to some information wanted. Wouldn't it be better > > > to just pick up a value like 10s or 20s? > > I don't think bloating logs is a issue for values > 10sec. > > You agreed that it's important to choose the "right" value, but I think that > will vary between users. Some installations with large checkpoint_timeout > might anticipate taking 15+min to perform recovery, but others might even have > a strict requirement that recovery must not take more than (say) 10sec; someone > might want to use this to verify that, or to optimize the slow parts of > recovery, with an interval that someone else might not care about. > > > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > > > + gettext_noop("Sets the time interval between each progress update " > > > + "of the startup process."), > > > + gettext_noop("0 logs all messages. -1 turns this feature off."), > > > + GUC_UNIT_MS, > |+ 10, -1, INT_MAX, > > > The unit is incorrect here, as that would default to 10ms, contrary to > > > what the documentation says about 10s. > > > > Kindly refer the previous few discussions wrt this point. > > You copied and pasted unrelated emails, which isn't helpful. > > Michael is right. You updated some of the units based on Robert's suggestion > to use MS, but didn't update all of the corresponding parts of the patch. > guc.c says that the units are in MS, which means that unqualified values are > interpretted as such. But postgresql.conf.sample still says "seconds", and > guc.c says the default value is "10", and you still do: > > + interval_in_ms = log_startup_progress_interval * 1000; > > I checked that this currently does not interpret the value as ms: > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > another int could overflow.) > > I think the convention is to for GUC global vars to be initialized with the > same default as in guc.c, so both should be 10000, like: > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ > > -- > Justin
Attachment
> Michael is right. You updated some of the units based on Robert's suggestion > to use MS, but didn't update all of the corresponding parts of the patch. > guc.c says that the units are in MS, which means that unqualified values are > interpretted as such. But postgresql.conf.sample still says "seconds", and > guc.c says the default value is "10", and you still do: > > + interval_in_ms = log_startup_progress_interval * 1000; > > I checked that this currently does not interpret the value as ms: > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > another int could overflow.) > > I think the convention is to for GUC global vars to be initialized with the > same default as in guc.c, so both should be 10000, like: > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ Thanks Justin for the detailed explanation. Done the necessary changes. Please find the updated patch attached. On Mon, Sep 13, 2021 at 8:32 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > I think you're confusing discussions. > > > > Robert was talking about initdb/bootstrap/single, and I separately and > > independently asked about hot standbys. It seems like Robert and I agreed > > about the desired behavior and there was no further discussion. > > Sorry for including 2 separate points into one. > > > Looking over this version, I realized something I (or you) should have > > noticed sooner: you've added the RegisterTimeout call to > > InitPostgres(), but that's for things that are used by all backends, > > and this is only used by the startup process. So it seems to me that > > the right place is StartupProcessMain. That would have the further > > advantage of allowing startup_progress_timeout_handler to be made > > static. begin_startup_progress_phase() and > > startup_progress_timeout_has_expired() are the actual API functions > > though so they will need to remain extern. > > The earlier discussion wrt this point is as follows. > > > > I also agree that this is the better place to do it. Hence modified > > > the patch accordingly. The condition "!AmStartupProcess()" is added to > > > differentiate whether the call is done from a startup process or some > > > other process. Actually StartupXLOG() gets called in 2 places. one in > > > StartupProcessMain() and the other in InitPostgres(). As the logging > > > of the startup progress is required only during the startup process > > > and not in the other cases, > > > > The InitPostgres() case occurs when the server is started in bootstrap > > mode (during initdb) or in single-user mode (postgres --single). I do > > not see any reason why we shouldn't produce progress messages in at > > least the latter case. I suspect that someone who is in the rather > > desperate scenario of having to use single-user mode would really like > > to know how long the server is going to take to start up. > > > > Perhaps during initdb we don't want messages, but I'm not sure that we > > need to do anything about that here. None of the messages that the > > server normally produces show up when you run initdb, so I guess they > > are getting redirected to /dev/null or something. > > > > So I don't think that using AmStartupProcess() for this purpose is right. > > This point is really confusing. As per the earlier discussion we > concluded to include RegisterTimeout() call even in case of > InitPostgres() to support logging in case of single-user mode. Now if > we remove the RegisterTimeout() call from InitPostgres(), we are not > going to support that anymore. Is this what you're trying to convey? > or we should add some checks and disable the code to RegisterTimeout() > if it is other than single-user mode. I have added a check if > (!IsPostmasterEnvironment) in the attached patch for this scenario. > Kindly confirm my understanding. > > > > Should this feature distinguish between crash recovery and archive recovery on > > > a hot standby ? Otherwise the standby will display this all the time. > > > > > >2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed time: 124.42 s, current LSN: 0/EEE2100 > > > > > >If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > > > cleanup of xlog.c variables, I can't say that with much confidence). > > > > Hmm. My inclination is to think that on an actual standby, you > > wouldn't want to get messages like this, but if you were doing a > > point-in-time-recovery, then you would. So I think maybe > > InArchiveRecovery is not the right thing. Perhaps StandbyMode? > > I also feel that the log messages should be recorded in case of > point-in-time-recovery. So I have added a check if (!StandbyMode) and > verified the replication and point-in-time-recovery scenario. > > > > Some doc comments: > > > > Thanks for the suggestions. I will take care in the next patch. > > Fixed. > > > Michael is right. You updated some of the units based on Robert's suggestion > > to use MS, but didn't update all of the corresponding parts of the patch. > > guc.c says that the units are in MS, which means that unqualified values are > > interpretted as such. But postgresql.conf.sample still says "seconds", and > > guc.c says the default value is "10", and you still do: > > > > + interval_in_ms = log_startup_progress_interval * 1000; > > > > I checked that this currently does not interpret the value as ms: > > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > > another int could overflow.) > > > > I think the convention is to for GUC global vars to be initialized with the > > same default as in guc.c, so both should be 10000, like: > > > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ > > Following is the discussion done wrt this point. Kindly refer and > share your thoughts. > > > > > I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but > > > > expressing the default in postgresl.conf.sample as 10s rather than > > > > 10000ms. I tried values measured in milliseconds just for testing > > > > purposes and didn't initially understand why it wasn't working. I > > > > don't think there's any reason it can't work. > > > > > > As suggested, I have changed it to GUC_UNIT_MS and kept the default > > > value to 10s. I would like to know the reason why it can't be > > > GUC_UNIT_S as we are expressing the values in terms of seconds. > > > > I mean, it *could* be. There's just no advantage. Values in seconds > > will work correctly either way. But values in milliseconds will only > > work if it's GUC_UNIT_MS. It seems to me that it's better to make more > > things work rather than fewer. > > Thanks & Regards, > Nitin Jadhav > On Tue, Sep 7, 2021 at 5:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Tue, Sep 07, 2021 at 03:07:15PM +0530, Nitin Jadhav wrote: > > > > Looking over this version, I realized something I (or you) should have > > > > noticed sooner: you've added the RegisterTimeout call to > > > > InitPostgres(), but that's for things that are used by all backends, > > > > and this is only used by the startup process. So it seems to me that > > > > the right place is StartupProcessMain. That would have the further > > > > advantage of allowing startup_progress_timeout_handler to be made > > > > static. begin_startup_progress_phase() and > > > > startup_progress_timeout_has_expired() are the actual API functions > > > > though so they will need to remain extern. > > > > > > > > I agree with Robert that a standby server should not continuously show timing > > > > messages for WAL replay. > > > > > > The earlier discussion wrt this point is as follows. > > > > I think you're confusing discussions. > > > > Robert was talking about initdb/bootstrap/single, and I separately and > > independently asked about hot standbys. It seems like Robert and I agreed > > about the desired behavior and there was no further discussion. > > > > > > Honestly, I don't see why we should have > > > > a GUC for what's proposed here. A value too low would bloat the logs > > > > with entries that are not that meaningful. A value too large would > > > > just prevent access to some information wanted. Wouldn't it be better > > > > to just pick up a value like 10s or 20s? > > > > I don't think bloating logs is a issue for values > 10sec. > > > > You agreed that it's important to choose the "right" value, but I think that > > will vary between users. Some installations with large checkpoint_timeout > > might anticipate taking 15+min to perform recovery, but others might even have > > a strict requirement that recovery must not take more than (say) 10sec; someone > > might want to use this to verify that, or to optimize the slow parts of > > recovery, with an interval that someone else might not care about. > > > > > > + {"log_startup_progress_interval", PGC_SIGHUP, LOGGING_WHEN, > > > > + gettext_noop("Sets the time interval between each progress update " > > > > + "of the startup process."), > > > > + gettext_noop("0 logs all messages. -1 turns this feature off."), > > > > + GUC_UNIT_MS, > > |+ 10, -1, INT_MAX, > > > > The unit is incorrect here, as that would default to 10ms, contrary to > > > > what the documentation says about 10s. > > > > > > Kindly refer the previous few discussions wrt this point. > > > > You copied and pasted unrelated emails, which isn't helpful. > > > > Michael is right. You updated some of the units based on Robert's suggestion > > to use MS, but didn't update all of the corresponding parts of the patch. > > guc.c says that the units are in MS, which means that unqualified values are > > interpretted as such. But postgresql.conf.sample still says "seconds", and > > guc.c says the default value is "10", and you still do: > > > > + interval_in_ms = log_startup_progress_interval * 1000; > > > > I checked that this currently does not interpret the value as ms: > > |./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data/ -c log_startup_progress_interval=1 > > |2021-09-07 06:28:58.694 CDT startup[18865] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/E94ED88 > > |2021-09-07 06:28:59.694 CDT startup[18865] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/10808EE0 > > |2021-09-07 06:29:00.694 CDT startup[18865] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/126B8C80 > > > > (Also, the GUC value is in the range 0..INT_MAX, so multiplying and storing to > > another int could overflow.) > > > > I think the convention is to for GUC global vars to be initialized with the > > same default as in guc.c, so both should be 10000, like: > > > > +int log_startup_progress_interval = 10 * 1000 /* 10sec */ > > > > -- > > Justin
Attachment
On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Thanks Justin for the detailed explanation. Done the necessary changes. Not really. The documentation here does not make a ton of sense: + Sets the time interval between each progress update of the operations + performed by the startup process. This produces the log messages for + those operations which take longer than the specified duration. The unit + used to specify the value is milliseconds. For example, if you set it to + <literal> 10s </literal>, then every <literal> 10s </literal>, a log is + emitted indicating which operation is ongoing, and the elapsed time from + the beginning of the operation. If the value is set to <literal> 0 </literal>, + then all messages for such operations are logged. <literal> -1 </literal> + disables the feature. The default value is <literal> 10s </literal> I really don't know what to say about this. You say that the time is measured in milliseconds, and then immediately turn around and say "For example, if you set it to 10s". Now we do expect that most people will set it to intervals that are measured in seconds rather than milliseconds, but saying that setting it to a value measured in seconds is an example of setting it in milliseconds is not logical. It also looks pretty silly to say that if you set the value to 10s, something will happen every 10s. What else would anyone expect to happen? You really need to give some thought to how to explain the behavior in a way that is clear and logical but not overly wordy. Also, please note that you've got spaces around the literals, which does not match the surrounding style and does not render properly in HTML. It is also not logical to define 0 as meaning that "all messages for such operations are logged". What does that even mean? It makes sense for something like log_autovacuum_min_duration, because there we are talking about logging one message per operation, and we could log messages for all operations or just some of them. Here we are talking about the time between one message and the next, so talking about "all messages" does not really seem to make a lot of sense. Experimentally, what 0 actually does is cause the system to spam log lines in a tight loop, which cannot be what anyone wants. I think you should make 0 mean disabled, and a positive value mean log at that interval, and disallow -1 altogether. And on that note, I tested your patch with log_startup_progress_interval=-1 and found that -1 behaves just like 0. In other words, contrary to what the documentation says, -1 does not disable the feature. It instead behaves just like 0. It's really confusing to me how you write documentation that says -1 has a certain behavior without thinking about the fact that you haven't written any code that would make -1 behave that way. And apparently you didn't test it, either. It took me approximately 1 minute of testing to find that this is broken, which really is not very much. -- Robert Haas EDB: http://www.enterprisedb.com
> I really don't know what to say about this. You say that the time is > measured in milliseconds, and then immediately turn around and say > "For example, if you set it to 10s". Now we do expect that most people > will set it to intervals that are measured in seconds rather than > milliseconds, but saying that setting it to a value measured in > seconds is an example of setting it in milliseconds is not logical. Based on the statement "I suggest making the GUC GUC_UNIT_MS rather than GUC_UNIT_S, but expressing the default in postgresl.conf.sample as 10s rather than 10000ms", I have used the default value in the postgresl.conf.sample as 10s rather than 10000ms. So I just used the same value in the example too in config.sgml. If it is really getting confusing, I will change it to 100ms in config.sgml. > It also looks pretty silly to say that if you set the value to 10s, > something will happen every 10s. What else would anyone expect to > happen? You really need to give some thought to how to explain the > behavior in a way that is clear and logical but not overly wordy. Added a few lines about that. "For example, if you set it to 1000ms, then it tries to emit a log every 1000ms. If the log message is not available at every 100th millisecond, then there is a possibility of delay in emitting the log. If the delay is more than a cycle or if the system clock gets set backwards then the next attempt is done based on the last logging time, otherwise the delay gets adjusted in the next attempt." Please correct the explanation if it does not meet your expectations. > Also, please note that you've got spaces around the literals, which > does not match the surrounding style and does not render properly in > HTML. Fixed. > It is also not logical to define 0 as meaning that "all messages for > such operations are logged". What does that even mean? It makes sense > for something like log_autovacuum_min_duration, because there we are > talking about logging one message per operation, and we could log > messages for all operations or just some of them. Here we are talking > about the time between one message and the next, so talking about "all > messages" does not really seem to make a lot of sense. Experimentally, > what 0 actually does is cause the system to spam log lines in a tight > loop, which cannot be what anyone wants. I think you should make 0 > mean disabled, and a positive value mean log at that interval, and > disallow -1 altogether. Made changes which indicate 0 mean disabled, > 0 mean interval in millisecond and removed -1. Please find the patch attached. On Thu, Sep 23, 2021 at 9:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 22, 2021 at 10:28 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Thanks Justin for the detailed explanation. Done the necessary changes. > > Not really. The documentation here does not make a ton of sense: > > + Sets the time interval between each progress update of the operations > + performed by the startup process. This produces the log messages for > + those operations which take longer than the specified > duration. The unit > + used to specify the value is milliseconds. For example, if > you set it to > + <literal> 10s </literal>, then every <literal> 10s > </literal>, a log is > + emitted indicating which operation is ongoing, and the > elapsed time from > + the beginning of the operation. If the value is set to > <literal> 0 </literal>, > + then all messages for such operations are logged. <literal> > -1 </literal> > + disables the feature. The default value is <literal> 10s </literal> > > I really don't know what to say about this. You say that the time is > measured in milliseconds, and then immediately turn around and say > "For example, if you set it to 10s". Now we do expect that most people > will set it to intervals that are measured in seconds rather than > milliseconds, but saying that setting it to a value measured in > seconds is an example of setting it in milliseconds is not logical. It > also looks pretty silly to say that if you set the value to 10s, > something will happen every 10s. What else would anyone expect to > happen? You really need to give some thought to how to explain the > behavior in a way that is clear and logical but not overly wordy. > Also, please note that you've got spaces around the literals, which > does not match the surrounding style and does not render properly in > HTML. > > It is also not logical to define 0 as meaning that "all messages for > such operations are logged". What does that even mean? It makes sense > for something like log_autovacuum_min_duration, because there we are > talking about logging one message per operation, and we could log > messages for all operations or just some of them. Here we are talking > about the time between one message and the next, so talking about "all > messages" does not really seem to make a lot of sense. Experimentally, > what 0 actually does is cause the system to spam log lines in a tight > loop, which cannot be what anyone wants. I think you should make 0 > mean disabled, and a positive value mean log at that interval, and > disallow -1 altogether. > > And on that note, I tested your patch with > log_startup_progress_interval=-1 and found that -1 behaves just like > 0. In other words, contrary to what the documentation says, -1 does > not disable the feature. It instead behaves just like 0. It's really > confusing to me how you write documentation that says -1 has a certain > behavior without thinking about the fact that you haven't written any > code that would make -1 behave that way. And apparently you didn't > test it, either. It took me approximately 1 minute of testing to find > that this is broken, which really is not very much. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Mon, Sep 27, 2021 at 04:57:20PM +0530, Nitin Jadhav wrote: > > It is also not logical to define 0 as meaning that "all messages for > > such operations are logged". What does that even mean? It makes sense > > for something like log_autovacuum_min_duration, because there we are > > talking about logging one message per operation, and we could log > > messages for all operations or just some of them. Here we are talking > > about the time between one message and the next, so talking about "all > > messages" does not really seem to make a lot of sense. Experimentally, > > what 0 actually does is cause the system to spam log lines in a tight > > loop, which cannot be what anyone wants. I think you should make 0 > > mean disabled, and a positive value mean log at that interval, and > > disallow -1 altogether. > > Made changes which indicate 0 mean disabled, > 0 mean interval in > millisecond and removed -1. > > Please find the patch attached. I think you misunderstood - Robert was saying that interval=0 doesn't work, not suggesting that you write more documentation about it. Also, I agree with Robert that the documentation is too verbose. I don't think you need to talk about what happens if the clock goes backwards (It just needs to behave conveniently). Look at the other _duration statements for what they say about units. "If this value is specified without units, it is taken as milliseconds." https://www.postgresql.org/docs/14/runtime-config-logging.html log_autovacuum_min_duration log_min_duration_statement >>It also looks pretty silly to say that if you set the value to 10s, something >>will happen every 10s. What else would anyone expect to happen? @Robert: that's consistent with existing documentation, even though it might seem obvious and silly to us. | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged -- Justin
On Mon, Sep 27, 2021 at 7:26 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > I really don't know what to say about this. You say that the time is > > measured in milliseconds, and then immediately turn around and say > > "For example, if you set it to 10s". Now we do expect that most people > > will set it to intervals that are measured in seconds rather than > > milliseconds, but saying that setting it to a value measured in > > seconds is an example of setting it in milliseconds is not logical. > > Based on the statement "I suggest making the GUC GUC_UNIT_MS rather > than GUC_UNIT_S, but expressing the default in postgresl.conf.sample > as 10s rather than 10000ms", I have used the default value in the > postgresl.conf.sample as 10s rather than 10000ms. So I just used the > same value in the example too in config.sgml. If it is really getting > confusing, I will change it to 100ms in config.sgml. That's really not what I'm complaining about. I think if we're going to give an example at all, 10ms is a better example than 100ms, because 10s is a value that people are more likely to find useful. But I'm not sure that it's necessary to mention a specific value, and if it is, I think it needs to be phrased in a less confusing way. > Made changes which indicate 0 mean disabled, > 0 mean interval in > millisecond and removed -1. Well, I see that -1 is now disallowed, and that's good as far as it goes, but 0 still does not actually disable the feature. I don't understand why you posted the previous version of the patch without testing that it works, and I even less understand why you are posting another version without fixing the bug that I pointed out to you in the last version. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Sep 27, 2021 at 9:32 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > >>It also looks pretty silly to say that if you set the value to 10s, something > >>will happen every 10s. What else would anyone expect to happen? > > @Robert: that's consistent with existing documentation, even though it might > seem obvious and silly to us. > > | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged > | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged Fair enough, but I still don't like it much. I tried my hand at rewriting this and came up with the attached: + Sets the amount of time after which the startup process will log + a message about a long-running operation that is still in progress, + as well as the interval between further progress messages for that + operation. This setting is applied separately to each operation. + For example, if syncing the data directory takes 25 seconds and + thereafter resetting unlogged relations takes 8 seconds, and if this + setting has the default value of 10 seconds, then a messages will be + logged for syncing the data directory after it has been in progress + for 10 seconds and again after it has been in progress for 20 seconds, + but nothing will be logged for resetting unlogged operations. + A setting of <literal>0</literal> disables the feature. I prefer this to Nitin's version because I think it could be unclear to someone that the value applies separately to each operation, whereas I don't think we need to document that we can't guarantee that the messages will be perfectly on time - that's true of every kind of scheduled event in pretty much every computer system - or what happens if the system clock goes backwards. Those are things we should try to get right, as well as we can anyway, but we don't need to tell the user that we got them right. -- Robert Haas EDB: http://www.enterprisedb.com
> +/* > + * Decides whether to log the startup progress or not based on whether the > + * timer is expired or not. Returns FALSE if the timer is not expired, otherwise > + * calculates the elapsed time and sets the respective out parameters secs and > + * usecs. Enables the timer for the next log message and returns TRUE. > + */ > +bool > +startup_progress_timeout_has_expired(long *secs, int *usecs) I think this comment can be worded better. It says it "decides", but it doesn't actually decide on any action to take -- it just reports whether the timer expired or not, to allow its caller to make the decision. In such situations we just say something like "Report whether startup progress has caused a timeout, return true and rearm the timer if it did, or just return false otherwise"; and we don't indicate what the value is going to be used *for*. Then the caller can use the boolean return value to make a decision, such as whether something is going to be logged. This function can be oblivious to details such as this: > + /* If the timeout has not occurred, then no need to log the details. */ > + if (!startup_progress_timer_expired) > + return false; here we can just say "No timeout has occurred" and make no inference about what's going to happen or not happen. Also, for functions that do things like this we typically use English sentence structure with the function name starting with the verb -- perhaps has_startup_progress_timeout_expired(). Sometimes we are lax about this if we have some sort of poor-man's modularisation by using a common prefix for several functions doing related things, which perhaps could be "startup_progress_*" in your case, but your other functions are already not doing that (such as begin_startup_progress_phase) so it's not clear why you would not use the most natural name for this one. > + ereport_startup_progress("syncing data directory (syncfs), elapsed time: %ld.%02d s, current path: %s", > + path); Please make sure to add ereport_startup_progress() as a translation trigger in src/backend/nls.mk. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
> That's really not what I'm complaining about. I think if we're going > to give an example at all, 10ms is a better example than 100ms, > because 10s is a value that people are more likely to find useful. But > I'm not sure that it's necessary to mention a specific value, and if > it is, I think it needs to be phrased in a less confusing way. > > > >>It also looks pretty silly to say that if you set the value to 10s, something > > >>will happen every 10s. What else would anyone expect to happen? > > > > @Robert: that's consistent with existing documentation, even though it might > > seem obvious and silly to us. > > > > | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged > > | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged > > Fair enough, but I still don't like it much. I tried my hand at > rewriting this and came up with the attached: > > + Sets the amount of time after which the startup process will log > + a message about a long-running operation that is still in progress, > + as well as the interval between further progress messages for that > + operation. This setting is applied separately to each operation. > + For example, if syncing the data directory takes 25 seconds and > + thereafter resetting unlogged relations takes 8 seconds, and if this > + setting has the default value of 10 seconds, then a messages will be > + logged for syncing the data directory after it has been in progress > + for 10 seconds and again after it has been in progress for 20 seconds, > + but nothing will be logged for resetting unlogged operations. > + A setting of <literal>0</literal> disables the feature. > > I prefer this to Nitin's version because I think it could be unclear > to someone that the value applies separately to each operation, > whereas I don't think we need to document that we can't guarantee that > the messages will be perfectly on time - that's true of every kind of > scheduled event in pretty much every computer system - or what happens > if the system clock goes backwards. Those are things we should try to > get right, as well as we can anyway, but we don't need to tell the > user that we got them right. I thought mentioning the unit in milliseconds and the example in seconds would confuse the user, so I changed the example to milliseconds.The message behind the above description looks good to me however I feel some sentences can be explained in less words. The information related to the units is missing and I feel it should be mentioned in the document. The example says, if the setting has the default value of 10 seconds, then it explains the behaviour. I feel it may not be the default value, it can be any value set by the user. So mentioning 'default' in the example does not look good to me. I feel we just have to mention "if this setting is set to the value of 10 seconds". Below is the modified version of the above information. + Sets the amount of time after every such interval the startup process + will log a message about a long-running operation that is still in + progress. This setting is applied separately to each operation. + For example, if syncing the data directory takes 25 seconds and + thereafter resetting unlogged relations takes 8 seconds, and if this + setting is set to the value of 10 seconds, then a messages will be + logged for syncing the data directory after it has been in progress + for 10 seconds and again after it has been in progress for 20 seconds, + but nothing will be logged for resetting unlogged operations. + A setting of <literal>0</literal> disables the feature. If this value + is specified without units, it is taken as milliseconds. > Well, I see that -1 is now disallowed, and that's good as far as it > goes, but 0 still does not actually disable the feature. I don't > understand why you posted the previous version of the patch without > testing that it works, and I even less understand why you are posting > another version without fixing the bug that I pointed out to you in > the last version. I had added additional code to check the value of the 'log_startup_progress_interval' variable and disable the feature in case of -1 in the earlier versions of the patch (Specifically v9.patch). There was a review comment for v9 patch and it resulted in major refactoring of the patch. The comment was > With these changes you'd have only 1 place in the code that needs to > care about log_startup_progress_interval, as opposed to 3 as you have > it currently, and only one place that enables the timeout, as opposed > to 2 as you have it currently. I think that would be tidier. Based on the above comment and the idea behind enabling the timer, it does not log anything if the value is set to -1. So I thought there is no extra code necessary to disable the feature even though it executes through the code flow. So I did not worry about adding logic to disable the feature. I will take care of this in the next patch. The answer for the question of "I don't understand why you posted the previous version of the patch without testing that it works" is, for the value of -1, the above description was my understanding and for the value of 0, the older versions of the patch was behaving as documented. But with the later versions the behaviour got changed and I missed to modify the documentation. So I modified it in the last version. Please share your thoughts. Thanks & Regards, Nitin Jadhav On Mon, Sep 27, 2021 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Sep 27, 2021 at 9:32 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > >>It also looks pretty silly to say that if you set the value to 10s, something > > >>will happen every 10s. What else would anyone expect to happen? > > > > @Robert: that's consistent with existing documentation, even though it might > > seem obvious and silly to us. > > > > | For example, if you set this to 250ms then all automatic vacuums and analyzes that run 250ms or longer will be logged > > | For example, if you set it to 250ms then all SQL statements that run 250ms or longer will be logged > > Fair enough, but I still don't like it much. I tried my hand at > rewriting this and came up with the attached: > > + Sets the amount of time after which the startup process will log > + a message about a long-running operation that is still in progress, > + as well as the interval between further progress messages for that > + operation. This setting is applied separately to each operation. > + For example, if syncing the data directory takes 25 seconds and > + thereafter resetting unlogged relations takes 8 seconds, and if this > + setting has the default value of 10 seconds, then a messages will be > + logged for syncing the data directory after it has been in progress > + for 10 seconds and again after it has been in progress for 20 seconds, > + but nothing will be logged for resetting unlogged operations. > + A setting of <literal>0</literal> disables the feature. > > I prefer this to Nitin's version because I think it could be unclear > to someone that the value applies separately to each operation, > whereas I don't think we need to document that we can't guarantee that > the messages will be perfectly on time - that's true of every kind of > scheduled event in pretty much every computer system - or what happens > if the system clock goes backwards. Those are things we should try to > get right, as well as we can anyway, but we don't need to tell the > user that we got them right. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Tue, Sep 28, 2021 at 8:06 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > I thought mentioning the unit in milliseconds and the example in > seconds would confuse the user, so I changed the example to > milliseconds.The message behind the above description looks good to me > however I feel some sentences can be explained in less words. The > information related to the units is missing and I feel it should be > mentioned in the document. The example says, if the setting has the > default value of 10 seconds, then it explains the behaviour. I feel it > may not be the default value, it can be any value set by the user. So > mentioning 'default' in the example does not look good to me. I feel > we just have to mention "if this setting is set to the value of 10 > seconds". Below is the modified version of the above information. It is common to mention what the default is as part of the documentation of a GUC. I don't see why this one should be an exception, especially since not mentioning it reduces the length of the documentation by exactly one word. I don't mind the sentence you added at the end to clarify the default units, but the way you've rewritten the first sentence makes it, in my opinion, much less clear. > I had added additional code to check the value of the > 'log_startup_progress_interval' variable and disable the feature in > case of -1 in the earlier versions of the patch (Specifically > v9.patch). There was a review comment for v9 patch and it resulted in > major refactoring of the patch. ... > The answer for the question of "I don't understand why you posted the > previous version of the patch without testing that it works" is, for > the value of -1, the above description was my understanding and for > the value of 0, the older versions of the patch was behaving as > documented. But with the later versions the behaviour got changed and > I missed to modify the documentation. So I modified it in the last > version. v9 was posted on August 3rd. I told you that it wasn't working on September 23rd. You posted a new version that still does not work on September 27th. I think you should have tested each version of your patch before posting it, and especially after any major refactorings. And if for whatever reason you didn't, then certainly after I told you on September 23rd that it didn't work, I think you should have fixed it before posting a new version. -- Robert Haas EDB: http://www.enterprisedb.com
> It is common to mention what the default is as part of the > documentation of a GUC. I don't see why this one should be an > exception, especially since not mentioning it reduces the length of > the documentation by exactly one word. > > I don't mind the sentence you added at the end to clarify the default > units, but the way you've rewritten the first sentence makes it, in my > opinion, much less clear. Ok. I have kept your documentation as it is and added the sentence at the end to clarify the default units. > v9 was posted on August 3rd. I told you that it wasn't working on > September 23rd. You posted a new version that still does not work on > September 27th. I think you should have tested each version of your > patch before posting it, and especially after any major refactorings. > And if for whatever reason you didn't, then certainly after I told you > on September 23rd that it didn't work, I think you should have fixed > it before posting a new version. Sorry. There was a misunderstanding about this and for the patch shared on September 27th, I had tested for the value '0' and observed that no progress messages were getting logged, probably the time at which 'enable_timeout_at' is getting called is past the 'next_timeout' value. This behaviour is completely dependent on the system. Now added an extra condition to disable the feature in case of '0' setting. > I think this comment can be worded better. It says it "decides", but it > doesn't actually decide on any action to take -- it just reports whether > the timer expired or not, to allow its caller to make the decision. In > such situations we just say something like "Report whether startup > progress has caused a timeout, return true and rearm the timer if it > did, or just return false otherwise"; and we don't indicate what the > value is going to be used *for*. Then the caller can use the boolean > return value to make a decision, such as whether something is going to > be logged. This function can be oblivious to details such as this: > > here we can just say "No timeout has occurred" and make no inference > about what's going to happen or not happen. Modified the comment. > Also, for functions that do things like this we typically use English > sentence structure with the function name starting with the verb -- > perhaps has_startup_progress_timeout_expired(). Sometimes we are lax > about this if we have some sort of poor-man's modularisation by using a > common prefix for several functions doing related things, which perhaps > could be "startup_progress_*" in your case, but your other functions are > already not doing that (such as begin_startup_progress_phase) so it's > not clear why you would not use the most natural name for this one. I agree that has_startup_progress_timeout_expired() is better than the existing function name. So I changed the function name accordingly. > Please make sure to add ereport_startup_progress() as a translation > trigger in src/backend/nls.mk. I have added ereport_startup_progress() under the section GETTEXT_TRIGGERS and GETTEXT_FLAGS in src/backend/nls.mk. Also verified the messages in src/backend/po/postgres.pot file. Kindly let me know if I have missed anything. Thanks & Regards, Nitin Jadhav On Tue, Sep 28, 2021 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Sep 28, 2021 at 8:06 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > I thought mentioning the unit in milliseconds and the example in > > seconds would confuse the user, so I changed the example to > > milliseconds.The message behind the above description looks good to me > > however I feel some sentences can be explained in less words. The > > information related to the units is missing and I feel it should be > > mentioned in the document. The example says, if the setting has the > > default value of 10 seconds, then it explains the behaviour. I feel it > > may not be the default value, it can be any value set by the user. So > > mentioning 'default' in the example does not look good to me. I feel > > we just have to mention "if this setting is set to the value of 10 > > seconds". Below is the modified version of the above information. > > It is common to mention what the default is as part of the > documentation of a GUC. I don't see why this one should be an > exception, especially since not mentioning it reduces the length of > the documentation by exactly one word. > > I don't mind the sentence you added at the end to clarify the default > units, but the way you've rewritten the first sentence makes it, in my > opinion, much less clear. > > > I had added additional code to check the value of the > > 'log_startup_progress_interval' variable and disable the feature in > > case of -1 in the earlier versions of the patch (Specifically > > v9.patch). There was a review comment for v9 patch and it resulted in > > major refactoring of the patch. > ... > > The answer for the question of "I don't understand why you posted the > > previous version of the patch without testing that it works" is, for > > the value of -1, the above description was my understanding and for > > the value of 0, the older versions of the patch was behaving as > > documented. But with the later versions the behaviour got changed and > > I missed to modify the documentation. So I modified it in the last > > version. > > v9 was posted on August 3rd. I told you that it wasn't working on > September 23rd. You posted a new version that still does not work on > September 27th. I think you should have tested each version of your > patch before posting it, and especially after any major refactorings. > And if for whatever reason you didn't, then certainly after I told you > on September 23rd that it didn't work, I think you should have fixed > it before posting a new version. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
So, I've wondered about this part all along: > +/* > + * Calculates the timestamp at which the next timer should expire and enables > + * the timer accordingly. > + */ > +static void > +reset_startup_progress_timeout(TimestampTz now) > +{ > + TimestampTz next_timeout; > + > + next_timeout = TimestampTzPlusMilliseconds(scheduled_startup_progress_timeout, > + log_startup_progress_interval); > + if (next_timeout < now) > + { > + /* > + * Either the timeout was processed so late that we missed an > + * entire cycle or system clock was set backwards. > + */ > + next_timeout = TimestampTzPlusMilliseconds(now, log_startup_progress_interval); > + } > + > + enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout); Why is it that we set the next timeout to fire not at "now + interval" but at "when-it-should-have-fired-but-didn't + interval"? As a user, if I request a message to be logged every N milliseconds, and one of them is a little bit delayed, then (assuming I set it to 10s) I still expect the next one to occur at now+10s. I don't expect the next at "now+5s" if one is delayed 5s. In other words, I think this function should just be enable_timeout_after(STARTUP_PROGRESS_TIMEOUT, log_startup_progress_interval); This means you can remove the scheduled_startup_progress_timeout variable. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
On Wed, Sep 29, 2021 at 10:08 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Sorry. There was a misunderstanding about this and for the patch > shared on September 27th, I had tested for the value '0' and observed > that no progress messages were getting logged, probably the time at > which 'enable_timeout_at' is getting called is past the 'next_timeout' > value. This behaviour is completely dependent on the system. Now added > an extra condition to disable the feature in case of '0' setting. Oh, interesting. I failed to consider that the behavior might vary from one system to another. I just noticed something else which I realize is the indirect result of my own suggestion but which doesn't actually look all that nice. You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT, ...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then another one in StartupProcessMain(). I think that's so that the feature works in single-user mode, but that means that technically, we're not reporting on the progress of the startup process. We're reporting progress on the startup operations that are normally performed by the startup process. But that means that the documentation isn't quite accurate (because it mentions the startup process specifically) and that the placement of the code in startup.c is suspect (because that's specifically for the startup process) and that basically every instance of the word "process" in the patch is technically a little bit wrong. I'm not sure if that's a big enough problem to be worth worrying about or exactly what we ought to do about it, but it doesn't seem fantastic. At a minimum, I think we should probably try to eliminate as many references to the startup process as we can, and talk about startup progress or startup operations or something like that. + * Start timestamp of the operation that occur during startup process. This is not correct grammar - it would need to be "operations that occur" or "operation that occurs". But neither seems particularly clear about what the variable actually does. How about "Time at which the most recent startup operation started"? + * Indicates the timestamp at which the timer was supposed to expire. "Indicates" can be deleted, but also I think it would be better to state the purpose of the timer i.e. "Timestamp at which the next startup progress message should be logged." + enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout); + scheduled_startup_progress_timeout = next_timeout; + startup_progress_timer_expired = false; I think you should set startup_progress_timer_expired to false before calling enable_timeout_at. Otherwise there's a race condition. It's unlikely that the timer could expire and the interrupt handler fire before we reach startup_progress_timer_expired = false, but it seems like there's no reason to take a chance. + * Calculates the timestamp at which the next timer should expire and enables So in some places you have verbs with an "s" on the end, like here, and in other places without, like in the next example. In "telegraph style" comments like this, this implicit subject is "it" or "this," but you don't write that. However you write the rest of the sentence as if it were there. So this should say "calculate" and "enable" rather than "calculates" and "enables". + * Schedule a wakeup call for logging the progress of startup process. This isn't really an accurate description, I think. It's not scheduling anything, and I don't know what a "wakeup call" is anyway. "Set a flag indicating that it's time to log a progress report" might be better. + * Sets the start timestamp of the current operation and also enables the Set. enable. + * timeout for logging the progress of startup process. I think you could delete "for logging the progress of startup process" here; that seems clear enough, and this reads a bit awkwardly. If you want to keep something like this I wrote write "...enable the timeout so that the progress of the startup progress will be logged." + * the timer if it did, otheriwse return false. otherwise + * Begin the startup progress phase to report the progress of syncing + * data directory (syncfs). All of the comments that start with "Begin the startup progress phase" should instead start with "Begin startup progress phase". I think this could be condensed down to "Prepare to report progress syncing the data directory via syncfs", and likewise "Prepare to report progress of the pre-fsync phase", "Prepare to report progress resetting unlogged relations," etc. + gettext_noop("Sets the time interval between each progress update " + "of the startup process."), This is actually inaccurate. It's sort of the same problem I was worried about with respect to the documentation: it's NOT the interval between progress updates, because it applies separately to each operation. We need to say something that makes that clear, or at least that doesn't get overtly the opposite impression. It's hard to do that briefly, but maybe something like "Time between progress updates for long-running startup operations"? Whatever we use here could also be the comment for log_startup_progress_interval. + * Logs the startup progress message if the timer has expired. the -> a -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Sep 29, 2021 at 1:36 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Why is it that we set the next timeout to fire not at "now + interval" > but at "when-it-should-have-fired-but-didn't + interval"? As a user, if > I request a message to be logged every N milliseconds, and one > of them is a little bit delayed, then (assuming I set it to 10s) I still > expect the next one to occur at now+10s. I don't expect the next at > "now+5s" if one is delayed 5s. Well, this was my suggestion, because if you don't do this, you get drift, which I think looks weird. Like the timestamps will be: 13:41:05.012456 13:41:15.072484 13:41:25.149632 ...and it gets further and further off as it goes on.' I guess my expectation is different from yours: I expect that if I ask for a message every 10 seconds, the time between messages is going to be 10s, at least on average, not 10s + however much latency the system has. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Sep 29, 2021 at 02:36:14PM -0300, Alvaro Herrera wrote: > Why is it that we set the next timeout to fire not at "now + interval" > but at "when-it-should-have-fired-but-didn't + interval"? As a user, if > I request a message to be logged every N milliseconds, and one > of them is a little bit delayed, then (assuming I set it to 10s) I still > expect the next one to occur at now+10s. I don't expect the next at > "now+5s" if one is delayed 5s. > > In other words, I think this function should just be > enable_timeout_after(STARTUP_PROGRESS_TIMEOUT, log_startup_progress_interval); > > This means you can remove the scheduled_startup_progress_timeout > variable. Robert requested the current behavior here. https://www.postgresql.org/message-id/CA%2BTgmoYkS1ZeWdSMFMBecMNxWonHk6J5eoX4FEQrpKtvEbXqGQ%40mail.gmail.com It's confusing (at least) to get these kind of requests to change the behavior back and forth. -- Justin
On 2021-Sep-29, Robert Haas wrote: > Well, this was my suggestion, because if you don't do this, you get > drift, which I think looks weird. Like the timestamps will be: > > 13:41:05.012456 > 13:41:15.072484 > 13:41:25.149632 > > ...and it gets further and further off as it goes on.' Right ... I actually *expect* this drift to occur. Maybe people generally don't like this, it just seems natural to me. Are there other opinions on this aspect? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)
On 2021-Sep-29, Justin Pryzby wrote: > Robert requested the current behavior here. > https://www.postgresql.org/message-id/CA%2BTgmoYkS1ZeWdSMFMBecMNxWonHk6J5eoX4FEQrpKtvEbXqGQ%40mail.gmail.com > > It's confusing (at least) to get these kind of requests to change the behavior > back and forth. Well, I did scan the thread to see if this had been discussed, and I overlooked that message. But there was no reply to that message, so it's not clear whether this was just Robert's opinion or consensus; in fact we now have exactly two votes on it (mine and Robert's). I think one person casting an opinion on one aspect does not set that aspect in stone. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "It takes less than 2 seconds to get to 78% complete; that's a good sign. A few seconds later it's at 90%, but it seems to have stuck there. Did somebody make percentages logarithmic while I wasn't looking?" http://smylers.hates-software.com/2005/09/08/1995c749.html
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Sep-29, Robert Haas wrote: >> Well, this was my suggestion, because if you don't do this, you get >> drift, which I think looks weird. Like the timestamps will be: >> >> 13:41:05.012456 >> 13:41:15.072484 >> 13:41:25.149632 >> >> ...and it gets further and further off as it goes on.' > Right ... I actually *expect* this drift to occur. Maybe people > generally don't like this, it just seems natural to me. Are there other > opinions on this aspect? FWIW, I agree with Robert that it's nicer if the timeout doesn't drift. There's a limit to how much complexity I'm willing to tolerate for that, but it doesn't seem like this exceeds it. The real comment I'd have here, though, is that writing one-off code for this purpose is bad. If we have a need for a repetitive timeout, it'd be better to add the feature to timeout.c explicitly. That would probably also remove the need for extra copies of the timeout time. regards, tom lane
On Wed, Sep 29, 2021 at 1:52 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I think one person casting an opinion on one aspect does not set that > aspect in stone. Of course not. I was just explaining that how the patch ended up like it did. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Sep 29, 2021 at 2:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The real comment I'd have here, though, is that writing one-off > code for this purpose is bad. If we have a need for a repetitive > timeout, it'd be better to add the feature to timeout.c explicitly. > That would probably also remove the need for extra copies of the > timeout time. I'm not sure that really helps very much, honestly. I mean it would be useful in this particular case, but there are other cases where we have logic like this already, and this wouldn't do anything about those. For example, consider autoprewarm_main(). Like this code, that code thinks (perhaps just because I'm the one who reviewed it) that the next time should be measured from the last time ... but an enhancement to the timeout machinery wouldn't help it at all. I suspect there are other cases like this elsewhere, because this is what I personally tend to think is the right behavior and I feel like it comes up in patch reviews from time to time, but I'm not finding any at the moment. Even if I'm right that they exist, I'm not sure they look much like each other or can easily reuse any code. And then again on the other hand, BackgroundWriterMain() thinks that the next time should be measured from the time we got around to doing it, not the scheduled time. I guess we don't really have any consistent practice here. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Sep 29, 2021 at 2:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The real comment I'd have here, though, is that writing one-off >> code for this purpose is bad. If we have a need for a repetitive >> timeout, it'd be better to add the feature to timeout.c explicitly. >> That would probably also remove the need for extra copies of the >> timeout time. > I'm not sure that really helps very much, honestly. I didn't claim there are any other places that could use the feature *today*. But once we've got one, it seems like there could be more tomorrow. In any case, I dislike keeping timeout state data outside timeout.c, because it's so likely to get out-of-sync that way. regards, tom lane
On Wed, Sep 29, 2021 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I didn't claim there are any other places that could use the feature > *today*. But once we've got one, it seems like there could be more > tomorrow. In any case, I dislike keeping timeout state data outside > timeout.c, because it's so likely to get out-of-sync that way. Well, I had a quick go at implementing this (attached). It seems to do a satisfactory job preventing drift over time, but it doesn't behave nicely if you set the system clock backward. With a bit of extra debugging output: 2021-09-30 14:23:50.291 EDT [2279] LOG: scheduling wakeup in 2 secs, 998727 usecs 2021-09-30 14:23:53.291 EDT [2279] LOG: scheduling wakeup in 2 secs, 998730 usecs 2021-09-30 14:23:56.291 EDT [2279] LOG: scheduling wakeup in 2 secs, 998728 usecs 2021-09-30 14:20:01.154 EDT [2279] LOG: scheduling wakeup in 238 secs, 135532 usecs 2021-09-30 14:23:59.294 EDT [2279] LOG: scheduling wakeup in 2 secs, 995922 use The issue here is that fin_time is really the driving force behind everything timeout.c does. In particular, it determines the order of active_timeouts[]. And that's not really correct either for enable_timeout_after(), or for the new function I added in this draft patch, enable_timeout_every(). When I say I want my handler to be fired in 3 s, I don't mean that I want it to be fired when the system time is 3 seconds greater than it is right now. I mean I want it to be fired in 3 actual seconds, regardless of what dumb thing the system clock may choose to do. I don't really think that precise behavior can be implemented, but ideally if a timeout that was supposed to happen after 3 s is now scheduled for a time that is more than 3 seconds beyond the current value of the system clock, we'd move the firing time backwards to 3 seconds beyond the current system clock. That way, if you set the system time backward by 4 minutes, you might see a few seconds of delay before the next firing, but you wouldn't go into the tank for 4 minutes. I don't see an obvious way of making timeout.c behave like that without major surgery, though. If nobody else does either, then we could (1) stick with something closer to Nitin's current patch, which tries to handle this concern outside of timeout.c, (2) adopt something like the attached 0001 and leave the question of improved behavior in case of backwards system clock adjustments for another day, or (3) undertake to rewrite timeout.c as a precondition of being able to log messages about why startup is slow. I'm not a huge fan of (3) but I'm open to suggestions. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > ... When I say I want my handler to be > fired in 3 s, I don't mean that I want it to be fired when the system > time is 3 seconds greater than it is right now. I mean I want it to be > fired in 3 actual seconds, regardless of what dumb thing the system > clock may choose to do. That would be lovely, certainly. But aren't you moving the goalposts rather far? I don't think we make any promises about such things today, so why has the issue suddenly gotten more pressing? In particular, why do you think Nitin's patch is proof against this? Seems to me it's probably got *more* failure cases, not fewer, if the system clock is acting funny. BTW, one could imagine addressing this concern by having timeout.c work with CLOCK_MONOTONIC instead of the regular wall clock. But I fear we'd have to drop enable_timeout_at(), for lack of ability to translate between CLOCK_MONOTONIC timestamps and those used by anybody else. Also get_timeout_start_time/get_timeout_finish_time would become problematic. Maybe we only really care about deltas, so the more restrictive API would be workable, but it seems like a nontrivial amount of work. On the whole, in these days of NTP, I'm not sure I care to spend large amounts of effort on dealing with a bogus system clock. regards, tom lane
On Thu, Sep 30, 2021 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > That would be lovely, certainly. But aren't you moving the goalposts > rather far? I don't think we make any promises about such things > today, so why has the issue suddenly gotten more pressing? Yeah, perhaps it's best to not to worry about it. I dislike failure to worry about that case on general principle, but I agree with you that it seems to be moving the goalposts a disproportionate distance. > In particular, > why do you think Nitin's patch is proof against this? Seems to me it's > probably got *more* failure cases, not fewer, if the system clock is > acting funny. You might be right. I sort of assumed that timeout.c had some defense against this, but since that seems not to be the case, I suppose no facility that depends on it can hope to stay out of trouble either. > On the whole, in these days of NTP, I'm not sure I care to spend > large amounts of effort on dealing with a bogus system clock. It's certainly less of an issue than it used to be back in my day. Any thoughts on the patch I attached? -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 30, 2021 at 05:08:17PM -0400, Robert Haas wrote: > It's certainly less of an issue than it used to be back in my day. > > Any thoughts on the patch I attached? I don't know. Anyway, this is actively worked on, so I have taken the liberty to move that to the next CF. -- Michael
Attachment
On Wed, Sep 29, 2021 at 11:10 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Sep 29, 2021 at 10:08 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Sorry. There was a misunderstanding about this and for the patch > > shared on September 27th, I had tested for the value '0' and observed > > that no progress messages were getting logged, probably the time at > > which 'enable_timeout_at' is getting called is past the 'next_timeout' > > value. This behaviour is completely dependent on the system. Now added > > an extra condition to disable the feature in case of '0' setting. > > Oh, interesting. I failed to consider that the behavior might vary > from one system to another. > > I just noticed something else which I realize is the indirect result > of my own suggestion but which doesn't actually look all that nice. > You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT, > ...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then > another one in StartupProcessMain(). I think that's so that the > feature works in single-user mode, but that means that technically, > we're not reporting on the progress of the startup process. We're > reporting progress on the startup operations that are normally > performed by the startup process. But that means that the > documentation isn't quite accurate (because it mentions the startup > process specifically) and that the placement of the code in startup.c > is suspect (because that's specifically for the startup process) and > that basically every instance of the word "process" in the patch is > technically a little bit wrong. I'm not sure if that's a big enough > problem to be worth worrying about or exactly what we ought to do > about it, but it doesn't seem fantastic. At a minimum, I think we > should probably try to eliminate as many references to the startup > process as we can, and talk about startup progress or startup > operations or something like that. > > + * Start timestamp of the operation that occur during startup process. > > This is not correct grammar - it would need to be "operations that > occur" or "operation that occurs". But neither seems particularly > clear about what the variable actually does. How about "Time at which > the most recent startup operation started"? > > + * Indicates the timestamp at which the timer was supposed to expire. > > "Indicates" can be deleted, but also I think it would be better to > state the purpose of the timer i.e. "Timestamp at which the next > startup progress message should be logged." > > + enable_timeout_at(STARTUP_PROGRESS_TIMEOUT, next_timeout); > + scheduled_startup_progress_timeout = next_timeout; > + startup_progress_timer_expired = false; > > I think you should set startup_progress_timer_expired to false before > calling enable_timeout_at. Otherwise there's a race condition. It's > unlikely that the timer could expire and the interrupt handler fire > before we reach startup_progress_timer_expired = false, but it seems > like there's no reason to take a chance. > > + * Calculates the timestamp at which the next timer should expire and enables > > So in some places you have verbs with an "s" on the end, like here, > and in other places without, like in the next example. In "telegraph > style" comments like this, this implicit subject is "it" or "this," > but you don't write that. However you write the rest of the sentence > as if it were there. So this should say "calculate" and "enable" > rather than "calculates" and "enables". > > + * Schedule a wakeup call for logging the progress of startup process. > > This isn't really an accurate description, I think. It's not > scheduling anything, and I don't know what a "wakeup call" is anyway. > "Set a flag indicating that it's time to log a progress report" might > be better. > > + * Sets the start timestamp of the current operation and also enables the > > Set. enable. > > + * timeout for logging the progress of startup process. > > I think you could delete "for logging the progress of startup process" > here; that seems clear enough, and this reads a bit awkwardly. If you > want to keep something like this I wrote write "...enable the timeout > so that the progress of the startup progress will be logged." > > + * the timer if it did, otheriwse return false. > > otherwise > > + * Begin the startup progress phase to report the progress of syncing > + * data directory (syncfs). > > All of the comments that start with "Begin the startup progress phase" > should instead start with "Begin startup progress phase". > > I think this could be condensed down to "Prepare to report progress > syncing the data directory via syncfs", and likewise "Prepare to > report progress of the pre-fsync phase", "Prepare to report progress > resetting unlogged relations," etc. > > + gettext_noop("Sets the time interval between each progress update " > + "of the startup process."), > > This is actually inaccurate. It's sort of the same problem I was > worried about with respect to the documentation: it's NOT the interval > between progress updates, because it applies separately to each > operation. We need to say something that makes that clear, or at least > that doesn't get overtly the opposite impression. It's hard to do that > briefly, but maybe something like "Time between progress updates for > long-running startup operations"? > > Whatever we use here could also be the comment for > log_startup_progress_interval. > > + * Logs the startup progress message if the timer has expired. > > the -> a > In addition I have little concern about ereport_startup_progress() use: + if (!StandbyMode) + ereport_startup_progress("redo in progress, elapsed time: %ld.%02d s, current LSN: %X/%X", + LSN_FORMAT_ARGS(ReadRecPtr)); The format string, its input, input count, and input position is not properly aligned. The input for "elapsed time: %ld.%02d s" is getting value implicitly inside ereport_startup_progress() which could be confusing, usually, the input count should be the same as expected in the format string in the same sequence that is expected to appear in the log message. I think the "elapsed time" part can be implicitly added to the error message inside ereport_startup_progress() which is common to all calls. Regards, Amul
On Wed, Oct 13, 2021 at 9:06 AM Amul Sul <sulamul@gmail.com> wrote: > I think the "elapsed time" part can be implicitly added to the error > message inside ereport_startup_progress() which is common to all > calls. Not if it means having to call psprintf there! If there's some way we could do it with macro tricks, it might be worth considering, but I'm not sure there is, or that it would be less confusing. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Sep 30, 2021 at 5:08 PM Robert Haas <robertmhaas@gmail.com> wrote: > Any thoughts on the patch I attached? Apparently not, but here's a v2 anyway. In this version I made enable_timeout_every() a three-argument function, so that the caller can specify both the first time at which the timeout routine should be called and the interval between them, instead of only the latter. That seems to be more convenient for this use case, and is more powerful in general. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
> Apparently not, but here's a v2 anyway. In this version I made > enable_timeout_every() a three-argument function, so that the caller > can specify both the first time at which the timeout routine should be > called and the interval between them, instead of only the latter. That > seems to be more convenient for this use case, and is more powerful in > general. Thanks for sharing the patch. Overall approach looks good to me. But just one concern about using enable_timeout_every() functionality. As per my understanding the caller should calculate the first scheduled timeout (now + interval) and pass it as the second argument but this is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have done the changes as I have mentioned (like now + interval). Kindly correct me if I am wrong. I am attaching 2 patches here. 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is the same as Robert's v2 patch. I have rebased my patch on top of this and it is 'v19-0002-startup-progress.patch'. > I just noticed something else which I realize is the indirect result > of my own suggestion but which doesn't actually look all that nice. > You've now got a call to RegisterTimeout(STARTUP_PROGRESS_TIMEOUT, > ...) in InitPostgres, guarded by ! IsPostmasterEnvironment, and then > another one in StartupProcessMain(). I think that's so that the > feature works in single-user mode, but that means that technically, > we're not reporting on the progress of the startup process. We're > reporting progress on the startup operations that are normally > performed by the startup process. But that means that the > documentation isn't quite accurate (because it mentions the startup > process specifically) and that the placement of the code in startup.c > is suspect (because that's specifically for the startup process) and > that basically every instance of the word "process" in the patch is > technically a little bit wrong. I'm not sure if that's a big enough > problem to be worth worrying about or exactly what we ought to do > about it, but it doesn't seem fantastic. At a minimum, I think we > should probably try to eliminate as many references to the startup > process as we can, and talk about startup progress or startup > operations or something like that. Yeah right. I have modified the comments accordingly and also fixed the other review comments related to the code comments. I have modified the code to include a call to RegisterTimeout() in only one place than the two calls previously. Initially I thought to call this in begin_startup_progress_phase(). I feel this is not a better choice since begin_startup_progress_phase function is getting called in many places. So it calls RegisterTimeout() many times which is not required. I feel StartupXLOG() is a better place for this since StartupXLOG() gets called during the startup process, bootstrap process or standalone backend. As per earlier discussion we need support for this in the case of startup process and standalone backend. Hence guarded this with '!IsBootstrapProcessingMode()'. Also verified the behaviour in both of the cases. Please correct me if I am wrong. Thanks & Regards, Nitin Jadhav On Mon, Oct 18, 2021 at 9:15 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 30, 2021 at 5:08 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Any thoughts on the patch I attached? > > Apparently not, but here's a v2 anyway. In this version I made > enable_timeout_every() a three-argument function, so that the caller > can specify both the first time at which the timeout routine should be > called and the interval between them, instead of only the latter. That > seems to be more convenient for this use case, and is more powerful in > general. > > -- > Robert Haas > EDB: http://www.enterprisedb.com
Attachment
On Tue, Oct 19, 2021 at 9:06 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Thanks for sharing the patch. Overall approach looks good to me. But > just one concern about using enable_timeout_every() functionality. As > per my understanding the caller should calculate the first scheduled > timeout (now + interval) and pass it as the second argument but this > is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have > done the changes as I have mentioned (like now + interval). Kindly > correct me if I am wrong. I am attaching 2 patches here. > 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is > the same as Robert's v2 patch. I have rebased my patch on top of this > and it is 'v19-0002-startup-progress.patch'. This version looks fine, so I have committed it (and my enable_timeout_every patch also, as a necessary prerequisite). Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Oct 25, 2021 at 9:26 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Oct 19, 2021 at 9:06 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > Thanks for sharing the patch. Overall approach looks good to me. But > > just one concern about using enable_timeout_every() functionality. As > > per my understanding the caller should calculate the first scheduled > > timeout (now + interval) and pass it as the second argument but this > > is not the same in 'v2-0002-Quick-testing-hack.patch'. Anyways I have > > done the changes as I have mentioned (like now + interval). Kindly > > correct me if I am wrong. I am attaching 2 patches here. > > 'v19-0001-Add-enable_timeout_every-to-fire-the-same-timeout.patch' is > > the same as Robert's v2 patch. I have rebased my patch on top of this > > and it is 'v19-0002-startup-progress.patch'. > > This version looks fine, so I have committed it (and my > enable_timeout_every patch also, as a necessary prerequisite). Thanks for getting this in. I have few more thoughts: Can we also log the total time the startup process took to recover, and also the total time each stage of the recovery/redo processing took: 1) into a file or 2) emitting that info via a new hook 3) into a system catalog table (assuming at the end of the recovery the database is in a consistent state, but I'm not sure if we ever update any catalog tables in/after the startup/recovery phase). This will help the users/admins/developers for summarizing, analytical and debugging purposes. This information can easily help us to understand the recovery patterns. Thoughts? If okay, I can spend some more time and start a separate thread to discuss. Regards, Bharath Rupireddy.
On Tue, Oct 26, 2021 at 4:19 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Can we also log the total time the startup process took to recover, > and also the total time each stage of the recovery/redo processing > took: 1) into a file or 2) emitting that info via a new hook 3) into a > system catalog table (assuming at the end of the recovery the database > is in a consistent state, but I'm not sure if we ever update any > catalog tables in/after the startup/recovery phase). #3 would be hard to do because there could be any number of databases and it is unclear which one we ought to update. Also, we'd have to launch a background worker to connect to that database in order to do it, and be prepared for what happens if that worker fails to get the work done for whatever reason. Also, it is unclear why we should only log this specific thing to a system catalog and not anything else. #1 and #2 could certainly be done, but I'm not sure what the use case is, and I think it's a separate proposal from what we did here. So I think a new thread would be appropriate if you want to make a new proposal. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas@gmail.com> wrote: > This version looks fine, so I have committed it (and my > enable_timeout_every patch also, as a necessary prerequisite). I was fooling around with a test setup today, working on an unrelated problem, and this happened: 2021-10-28 14:21:23.145 EDT [92010] LOG: resetting unlogged relations (init), elapsed time: 0.00 s, current path: base/13020 That's not supposed to happen. I assume the problem is that the timeout for the previous phase fired just as we were beginning a new one, and the code got confused. I think we probably need to do something like this to make sure that the timeout from one operation can't trigger a log message for the next: diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c index 28e68dd871..47ec737888 100644 --- a/src/backend/postmaster/startup.c +++ b/src/backend/postmaster/startup.c @@ -320,6 +320,8 @@ begin_startup_progress_phase(void) if (log_startup_progress_interval == 0) return; + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false); + startup_progress_timer_expired = false; startup_progress_phase_start_time = GetCurrentTimestamp(); fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time, log_startup_progress_interval); Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
> I was fooling around with a test setup today, working on an unrelated > problem, and this happened: > > 2021-10-28 14:21:23.145 EDT [92010] LOG: resetting unlogged relations > (init), elapsed time: 0.00 s, current path: base/13020 Nice catch and interesting case. > That's not supposed to happen. I assume the problem is that the > timeout for the previous phase fired just as we were beginning a new > one, and the code got confused. I think we probably need to do > something like this to make sure that the timeout from one operation > can't trigger a log message for the next: > > diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c > index 28e68dd871..47ec737888 100644 > --- a/src/backend/postmaster/startup.c > +++ b/src/backend/postmaster/startup.c > @@ -320,6 +320,8 @@ begin_startup_progress_phase(void) > if (log_startup_progress_interval == 0) > return; > > + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false); > + startup_progress_timer_expired = false; > startup_progress_phase_start_time = GetCurrentTimestamp(); > fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time, > log_startup_progress_interval); > > Thoughts? Yes. I agree that this is not an expected behaviour and I do agree that, probably the timeout for the previous phase fired just as we were beginning a new one. For each operation, we call begin_startup_progress_phase() before starting the operation and one/multiple calls to ereport_startup_progress() to report the progress and intentionally we have not added any functionality to disable the timer at the end of the operation. The timer remains active and there may be multiple calls to startup_progress_timeout_handler() which sets a flag to true. So whenever we call a begin_startup_progress_phase() for the next operation, we do enable the timer (In my understanding it will internally disable the old timer and schedule a new one) but the flag is already set to true by the previous timer. Hence the next call to ereport_startup_progress() logs a message. So I feel just setting 'startup_progress_timer_expired' to false in begin_startup_progress_phase() would work. Please correct me if I am wrong. Thanks & Regards, Nitin Jadhav On Thu, Oct 28, 2021 at 11:59 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Oct 25, 2021 at 11:56 AM Robert Haas <robertmhaas@gmail.com> wrote: > > This version looks fine, so I have committed it (and my > > enable_timeout_every patch also, as a necessary prerequisite). > > I was fooling around with a test setup today, working on an unrelated > problem, and this happened: > > 2021-10-28 14:21:23.145 EDT [92010] LOG: resetting unlogged relations > (init), elapsed time: 0.00 s, current path: base/13020 > > That's not supposed to happen. I assume the problem is that the > timeout for the previous phase fired just as we were beginning a new > one, and the code got confused. I think we probably need to do > something like this to make sure that the timeout from one operation > can't trigger a log message for the next: > > diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c > index 28e68dd871..47ec737888 100644 > --- a/src/backend/postmaster/startup.c > +++ b/src/backend/postmaster/startup.c > @@ -320,6 +320,8 @@ begin_startup_progress_phase(void) > if (log_startup_progress_interval == 0) > return; > > + disable_timeout(STARTUP_PROGRESS_TIMEOUT, false); > + startup_progress_timer_expired = false; > startup_progress_phase_start_time = GetCurrentTimestamp(); > fin_time = TimestampTzPlusMilliseconds(startup_progress_phase_start_time, > log_startup_progress_interval); > > Thoughts? > > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Fri, Oct 29, 2021 at 7:37 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > ereport_startup_progress() logs a message. So I feel just setting > 'startup_progress_timer_expired' to false in > begin_startup_progress_phase() would work. Please correct me if I am > wrong. I think you're wrong. If we did that, the previous timer could fire right after we set startup_progress_timer_expired = false, and before we reschedule the timeout. It seems annoying to have to disable the timeout and immediately turn around and re-enable it, but I don't see how to avoid the race condition otherwise. -- Robert Haas EDB: http://www.enterprisedb.com
> I think you're wrong. If we did that, the previous timer could fire > right after we set startup_progress_timer_expired = false, and before > we reschedule the timeout. It seems annoying to have to disable the > timeout and immediately turn around and re-enable it, but I don't see > how to avoid the race condition otherwise. Right. There is a possibility of race conditions. In that case the above changes look good to me. Thanks & Regards, Nitin Jadhav On Fri, Oct 29, 2021 at 6:10 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Oct 29, 2021 at 7:37 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > ereport_startup_progress() logs a message. So I feel just setting > > 'startup_progress_timer_expired' to false in > > begin_startup_progress_phase() would work. Please correct me if I am > > wrong. > > I think you're wrong. If we did that, the previous timer could fire > right after we set startup_progress_timer_expired = false, and before > we reschedule the timeout. It seems annoying to have to disable the > timeout and immediately turn around and re-enable it, but I don't see > how to avoid the race condition otherwise. > > -- > Robert Haas > EDB: http://www.enterprisedb.com > >
On Fri, Oct 29, 2021 at 9:10 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > I think you're wrong. If we did that, the previous timer could fire > > right after we set startup_progress_timer_expired = false, and before > > we reschedule the timeout. It seems annoying to have to disable the > > timeout and immediately turn around and re-enable it, but I don't see > > how to avoid the race condition otherwise. > > Right. There is a possibility of race conditions. In that case the > above changes look good to me. Committed. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > Committed. Is it expected that an otherwise idle standby's recovery process receives SIGALRM every N seconds, or should the timer be canceled at that point, as there is no further progress to report?
On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Committed. > > Is it expected that an otherwise idle standby's recovery process > receives SIGALRM every N seconds, or should the timer be canceled at > that point, as there is no further progress to report? Nice catch. Yeah, that seems unnecessary, see the below standby logs. I think we need to disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);, something like the attached? I think there'll be no issue with the patch since the StandbyMode gets reset only at the end of recovery (in FinishWalRecovery()) but it can very well be set during recovery (in ReadRecord()). Note that I also added an assertion in has_startup_progress_timeout_expired(), just in case. 2022-11-08 11:28:23.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:23.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:33.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:33.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:43.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:43.563 UTC [980909] LOG: startup_progress_timeout_handler called 2022-11-08 11:28:53.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received 2022-11-08 11:28:53.563 UTC [980909] LOG: startup_progress_timeout_handler called Whilte at it, I noticed that we report redo progress for PITR, but we don't report when standby enters archive recovery mode, say due to a failure in the connection to primary or after the promote signal is found. Isn't it useful to report in this case as well to know the recovery progress? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, 8 Nov 2022 at 12:33, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > On Sat, Oct 30, 2021 at 7:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > Committed. > > > > Is it expected that an otherwise idle standby's recovery process > > receives SIGALRM every N seconds, or should the timer be canceled at > > that point, as there is no further progress to report? > > Nice catch. Yeah, that seems unnecessary, see the below standby logs. > I think we need to disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);, > something like the attached? I think there'll be no issue with the > patch since the StandbyMode gets reset only at the end of recovery (in > FinishWalRecovery()) but it can very well be set during recovery (in > ReadRecord()). Note that I also added an assertion in > has_startup_progress_timeout_expired(), just in case. > > 2022-11-08 11:28:23.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received > 2022-11-08 11:28:23.563 UTC [980909] LOG: > startup_progress_timeout_handler called > 2022-11-08 11:28:33.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received > 2022-11-08 11:28:33.563 UTC [980909] LOG: > startup_progress_timeout_handler called > 2022-11-08 11:28:43.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received > 2022-11-08 11:28:43.563 UTC [980909] LOG: > startup_progress_timeout_handler called > 2022-11-08 11:28:53.563 UTC [980909] LOG: SIGALRM handle_sig_alarm received > 2022-11-08 11:28:53.563 UTC [980909] LOG: > startup_progress_timeout_handler called > > Whilte at it, I noticed that we report redo progress for PITR, but we > don't report when standby enters archive recovery mode, say due to a > failure in the connection to primary or after the promote signal is > found. Isn't it useful to report in this case as well to know the > recovery progress? I think your patch disables progress too early, effectively turning off the standby progress feature. The purpose was to report on things that take long periods during recovery, not just prior to recovery. I would advocate that we disable progress only while waiting, as I've done here: https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com -- Simon Riggs http://www.EnterpriseDB.com/
On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > Whilte at it, I noticed that we report redo progress for PITR, but we > > don't report when standby enters archive recovery mode, say due to a > > failure in the connection to primary or after the promote signal is > > found. Isn't it useful to report in this case as well to know the > > recovery progress? > > I think your patch disables progress too early, effectively turning > off the standby progress feature. The purpose was to report on things > that take long periods during recovery, not just prior to recovery. > > I would advocate that we disable progress only while waiting, as I've done here: > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com Maybe I'm confused here, but I think that, on a standby, startup progress messages are only printed until the main redo loop is reached. Otherwise, we would print a message on a standby every 10s forever, which seems like a thing that most users would not like. So I think that Bharath has the right idea here. I don't think that his patch is right in detail, though. I don't think the call to disable_timeout() needs to be conditional, and I don't think the Assert is correct. Also, I think that your patch has the right idea in encapsulating the disable_timeout() call inside a new function disable_startup_progress_timeout(), rather than having the details known directly by xlogrecovery.c. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Nov 14, 2022 at 9:31 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs > <simon.riggs@enterprisedb.com> wrote: > > > Whilte at it, I noticed that we report redo progress for PITR, but we > > > don't report when standby enters archive recovery mode, say due to a > > > failure in the connection to primary or after the promote signal is > > > found. Isn't it useful to report in this case as well to know the > > > recovery progress? > > > > I think your patch disables progress too early, effectively turning > > off the standby progress feature. The purpose was to report on things > > that take long periods during recovery, not just prior to recovery. > > > > I would advocate that we disable progress only while waiting, as I've done here: > > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com > > Maybe I'm confused here, but I think that, on a standby, startup > progress messages are only printed until the main redo loop is > reached. Otherwise, we would print a message on a standby every 10s > forever, which seems like a thing that most users would not like. So I > think that Bharath has the right idea here. Yes, the idea is to disable the timeout on standby completely since we actually don't report any recovery progress. Keeping it enabled, unnecessarily calls startup_progress_timeout_handler() every log_startup_progress_interval seconds i.e. 10 seconds. That's the intention of the patch. > I don't think that his patch is right in detail, though. I don't think > the call to disable_timeout() needs to be conditional, Yes, disable_timeout() returns if the timeout was previously disabled i.e. all_timeouts[STARTUP_PROGRESS_TIMEOUT].active is false. I changed it in the v2 patch. > and I don't > think the Assert is correct. You're right. My intention there was to check if the timeout is enabled while ereport_startup_progress() is called. In the v2 patch, when we actually disable the timeout startup_progress_timer_expired gets set to false and has_startup_progress_timeout_expired() just returns in such a case. > Also, I think that your patch has the > right idea in encapsulating the disable_timeout() call inside a new > function disable_startup_progress_timeout(), rather than having the > details known directly by xlogrecovery.c. Yes, I too like Simon's idea of {enable, disable}_startup_progress_timeout functions, I utilized them in the v2 patch here. I actually want to get rid of begin_startup_progress_phase() which now becomes a thin wrapper calling disable and enable functions and ensure the callers do follow enable()-report_progress()-disable() way to use the feature, however I didn't code for that as it needs changes across many files. If okay, I can code for that too. Thoughts? Please review the v2 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, 15 Nov 2022 at 13:33, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Nov 14, 2022 at 9:31 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs > > <simon.riggs@enterprisedb.com> wrote: > > > > Whilte at it, I noticed that we report redo progress for PITR, but we > > > > don't report when standby enters archive recovery mode, say due to a > > > > failure in the connection to primary or after the promote signal is > > > > found. Isn't it useful to report in this case as well to know the > > > > recovery progress? > > > > > > I think your patch disables progress too early, effectively turning > > > off the standby progress feature. The purpose was to report on things > > > that take long periods during recovery, not just prior to recovery. > > > > > > I would advocate that we disable progress only while waiting, as I've done here: > > > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com > > > > Maybe I'm confused here, but I think that, on a standby, startup > > progress messages are only printed until the main redo loop is > > reached. Otherwise, we would print a message on a standby every 10s > > forever, which seems like a thing that most users would not like. So I > > think that Bharath has the right idea here. > > Yes, the idea is to disable the timeout on standby completely since we > actually don't report any recovery progress. Keeping it enabled, > unnecessarily calls startup_progress_timeout_handler() every > log_startup_progress_interval seconds i.e. 10 seconds. That's the > intention of the patch. As long as we don't get the SIGALRMs that Thomas identified, then I'm happy. -- Simon Riggs http://www.EnterpriseDB.com/
On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Please review the v2 patch. It seems to me that this will call disable_startup_progress_timeout once per WAL record, which seems like an unnecessary expense. How about leaving the code inside the loop just as we have it, and putting if (StandbyMode) disable_startup_progress_timeout() before entering the loop? -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > Please review the v2 patch. > > It seems to me that this will call disable_startup_progress_timeout > once per WAL record, which seems like an unnecessary expense. How > about leaving the code inside the loop just as we have it, and putting > if (StandbyMode) disable_startup_progress_timeout() before entering > the loop? That can be done, only if we can disable the timeout in another place when the StandbyMode is set to true in ReadRecord(), that is, after the standby server finishes crash recovery and enters standby mode. I'm attaching the v3 patch for further review. Please find the CF entry here - https://commitfest.postgresql.org/41/4012/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, 16 Nov 2022 at 06:47, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Please review the v2 patch. > > > > It seems to me that this will call disable_startup_progress_timeout > > once per WAL record, which seems like an unnecessary expense. How > > about leaving the code inside the loop just as we have it, and putting > > if (StandbyMode) disable_startup_progress_timeout() before entering > > the loop? > > That can be done, only if we can disable the timeout in another place > when the StandbyMode is set to true in ReadRecord(), that is, after > the standby server finishes crash recovery and enters standby mode. > > I'm attaching the v3 patch for further review. Please find the CF > entry here - https://commitfest.postgresql.org/41/4012/. begin_startup_progress_phase() checks to see if feature is disabled twice, so I think you can skip the check and just rely on the check in enable(). Otherwise, all good. -- Simon Riggs http://www.EnterpriseDB.com/
On Wed, Nov 16, 2022 at 2:28 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Wed, 16 Nov 2022 at 06:47, Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Tue, Nov 15, 2022 at 10:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Please review the v2 patch. > > > > > > It seems to me that this will call disable_startup_progress_timeout > > > once per WAL record, which seems like an unnecessary expense. How > > > about leaving the code inside the loop just as we have it, and putting > > > if (StandbyMode) disable_startup_progress_timeout() before entering > > > the loop? > > > > That can be done, only if we can disable the timeout in another place > > when the StandbyMode is set to true in ReadRecord(), that is, after > > the standby server finishes crash recovery and enters standby mode. > > > > I'm attaching the v3 patch for further review. Please find the CF > > entry here - https://commitfest.postgresql.org/41/4012/. > > begin_startup_progress_phase() checks to see if feature is disabled > twice, so I think you can skip the check and just rely on the check in > enable(). Yes, I did that intentionally to avoid begin_startup_progress_phase() calling disable and enable functions when the feature is disabled. I'll leave it to the committer whether to retain it or delete it. > Otherwise, all good. Thanks. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 16, 2022 at 1:47 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > That can be done, only if we can disable the timeout in another place > when the StandbyMode is set to true in ReadRecord(), that is, after > the standby server finishes crash recovery and enters standby mode. Oh, interesting. I didn't realize that we would need to worry about that case. > I'm attaching the v3 patch for further review. Please find the CF > entry here - https://commitfest.postgresql.org/41/4012/. I kind of dislike having to have logic for this in two places. Seems like it could create future bugs. How about the attached approach, instead? This way, the first time the timer expires after we reach standby mode, we reactively disable it. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Thu, Nov 17, 2022 at 7:51 AM Robert Haas <robertmhaas@gmail.com> wrote: + * up, since standby mode is a state that is intendeded to persist typo Otherwise LGTM.
On Thu, Nov 17, 2022 at 12:21 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Nov 16, 2022 at 1:47 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > That can be done, only if we can disable the timeout in another place > > when the StandbyMode is set to true in ReadRecord(), that is, after > > the standby server finishes crash recovery and enters standby mode. > > Oh, interesting. I didn't realize that we would need to worry about that case. > > > I'm attaching the v3 patch for further review. Please find the CF > > entry here - https://commitfest.postgresql.org/41/4012/. > > I kind of dislike having to have logic for this in two places. Seems > like it could create future bugs. Duplication is a problem that I agree with and I have an idea here - how about introducing a new function, say EnableStandbyMode() that sets StandbyMode to true and disables the startup progress timeout, something like the attached? > How about the attached approach, instead? This way, the first time the > timer expires after we reach standby mode, we reactively disable it. Hm. I'm not really sure if it's a good idea. While it simplifies the code, the has_startup_progress_timeout_expired() gets called for every WAL record in standby mode. Isn't this an unnecessary thing? Currently, the if (!StandbyMode) condition blocks the function calls. And I'm also a little concerned that we move the StandbyMode variable to startup.c which so far tiled to xlogrecovery.c. Maybe these are not really concerns at all. Maybe others are okay with this approach. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Duplication is a problem that I agree with and I have an idea here - > how about introducing a new function, say EnableStandbyMode() that > sets StandbyMode to true and disables the startup progress timeout, > something like the attached? That works for me, more or less. But I think that enable_startup_progress_timeout should be amended to either say if (log_startup_progress_interval == 0 || StandbyMode) return; or else it should at least Assert(!StandbyMode), so that we can't accidentally re-enable the timer after we shut it off. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Nov 18, 2022 at 12:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > Duplication is a problem that I agree with and I have an idea here - > > how about introducing a new function, say EnableStandbyMode() that > > sets StandbyMode to true and disables the startup progress timeout, > > something like the attached? > > That works for me, more or less. But I think that > enable_startup_progress_timeout should be amended to either say if > (log_startup_progress_interval == 0 || StandbyMode) return; or else it > should at least Assert(!StandbyMode), so that we can't accidentally > re-enable the timer after we shut it off. Hm, an assertion may not help in typical production servers running on non-assert builds. I've modified the if condition, please see the attached v5 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
At Fri, 18 Nov 2022 15:55:00 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Nov 18, 2022 at 12:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Nov 17, 2022 at 2:22 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Duplication is a problem that I agree with and I have an idea here - > > > how about introducing a new function, say EnableStandbyMode() that > > > sets StandbyMode to true and disables the startup progress timeout, > > > something like the attached? > > > > That works for me, more or less. But I think that > > enable_startup_progress_timeout should be amended to either say if > > (log_startup_progress_interval == 0 || StandbyMode) return; or else it > > should at least Assert(!StandbyMode), so that we can't accidentally > > re-enable the timer after we shut it off. > > Hm, an assertion may not help in typical production servers running on > non-assert builds. I've modified the if condition, please see the > attached v5 patch. I prefer Robert's approach as it is more robust for future changes and simple. I prefer to avoid this kind of piggy-backing and it doesn't seem to be needed in this case. XLogShutdownWalRcv() looks like a similar case to me and honestly I don't like it in the sense of robustness but it is simpler than checking walreceiver status at every site that refers to the flag. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Sun, Nov 20, 2022 at 9:20 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > I prefer Robert's approach as it is more robust for future changes and > simple. I prefer to avoid this kind of piggy-backing and it doesn't > seem to be needed in this case. XLogShutdownWalRcv() looks like a > similar case to me and honestly I don't like it in the sense of > robustness but it is simpler than checking walreceiver status at every > site that refers to the flag. I don't understand what you want changed. Can you be more specific about what you mean by "Robert's approach"? I don't agree with Bharath's logic for preferring an if-test to an Assert. There are some cases where we think we've written the code correctly but also recognize that the logic is complicated enough that something might slip through the cracks. Then, a runtime check makes sense, because otherwise something real bad might happen on a production instance. But here, I don't think that's the main risk. I think the main risk is that some future patch tries to add code that should print startup log messages later on. That would probably be a coding mistake, and Assert would alert the patch author about that, whereas amending the if-test would just make the code do something differently then the author intended. But I don't feel super-strongly about this, which is why I mentioned both options in my previous email. I'm not clear on whether you are expressing an opinion on this point in particular or something more general. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Nov 21, 2022 at 10:37 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sun, Nov 20, 2022 at 9:20 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > I prefer Robert's approach as it is more robust for future changes and > > simple. I prefer to avoid this kind of piggy-backing and it doesn't > > seem to be needed in this case. XLogShutdownWalRcv() looks like a > > similar case to me and honestly I don't like it in the sense of > > robustness but it is simpler than checking walreceiver status at every > > site that refers to the flag. > > I don't understand what you want changed. Can you be more specific > about what you mean by "Robert's approach"? > > I don't agree with Bharath's logic for preferring an if-test to an > Assert. There are some cases where we think we've written the code > correctly but also recognize that the logic is complicated enough that > something might slip through the cracks. Then, a runtime check makes > sense, because otherwise something real bad might happen on a > production instance. But here, I don't think that's the main risk. I > think the main risk is that some future patch tries to add code that > should print startup log messages later on. That would probably be a > coding mistake, and Assert would alert the patch author about that, > whereas amending the if-test would just make the code do something > differently then the author intended. > > But I don't feel super-strongly about this, which is why I mentioned > both options in my previous email. I'm not clear on whether you are > expressing an opinion on this point in particular or something more > general. If we place just the Assert(!StandbyMode); in enable_startup_progress_timeout(), it fails for begin_startup_progress_phase() in ResetUnloggedRelations() because the InitWalRecovery(), that sets StandbyMode to true, is called before ResetUnloggedRelations() . However, with the if (StandbyMode) { return; }, we fail to report progress of ResetUnloggedRelations() in a standby, which isn't a good idea at all because we only want to disable the timeout during the recovery's main loop. I introduced an assert-only function returning true when we're in standby's main redo apply loop and modified the assertion to be Assert(!InStandbyMainRedoApplyLoop()); works better but it complicates the code a bit. FWIW, I'm attaching the v6 patch with this change. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Nov 22, 2022 at 6:05 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > If we place just the Assert(!StandbyMode); in > enable_startup_progress_timeout(), it fails for > begin_startup_progress_phase() in ResetUnloggedRelations() because the > InitWalRecovery(), that sets StandbyMode to true, is called before > ResetUnloggedRelations() . However, with the if (StandbyMode) { > return; }, we fail to report progress of ResetUnloggedRelations() in a > standby, which isn't a good idea at all because we only want to > disable the timeout during the recovery's main loop. Ugh. Well, in that case, I guess my vote is to forget about this whole Assert business and just commit what you had in v4. Does that work for you? Protecting against specifically the situation where we're in the standby's main redo apply loop is not really what I had in mind here, but this is already sort of weirdly complicated-looking, and making it more weirdly complicated-looking to achieve the kind of protection that I had in mind doesn't really seem worth it. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 3, 2023 at 2:29 AM Robert Haas <robertmhaas@gmail.com> wrote: > Thanks for looking at this. > On Tue, Nov 22, 2022 at 6:05 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > If we place just the Assert(!StandbyMode); in > > enable_startup_progress_timeout(), it fails for > > begin_startup_progress_phase() in ResetUnloggedRelations() because the > > InitWalRecovery(), that sets StandbyMode to true, is called before > > ResetUnloggedRelations() . However, with the if (StandbyMode) { > > return; }, we fail to report progress of ResetUnloggedRelations() in a > > standby, which isn't a good idea at all because we only want to > > disable the timeout during the recovery's main loop. > > Ugh. Well, in that case, I guess my vote is to forget about this whole > Assert business and just commit what you had in v4. Does that work for > you? Yes, it seems reasonable to me. > Protecting against specifically the situation where we're in the > standby's main redo apply loop is not really what I had in mind here, > but this is already sort of weirdly complicated-looking, and making it > more weirdly complicated-looking to achieve the kind of protection > that I had in mind doesn't really seem worth it. IMHO, the responsibility of whether or not to report progress of any operation can lie with the developers writing features using the progress reporting mechanism. The progress reporting mechanism can just be independent of all that. I took the v4 patch, added some comments and attached it as the v7 patch here. Please find it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > I took the v4 patch, added some comments and attached it as the v7 > patch here. Please find it. Committed and back-patched to v15. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 2, 2023 at 11:22 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: >> I took the v4 patch, added some comments and attached it as the v7 >> patch here. Please find it. > Committed and back-patched to v15. Umm ... is this really the sort of patch to be committing on a release wrap day? regards, tom lane
On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Umm ... is this really the sort of patch to be committing on a > release wrap day? Oh, shoot, I wasn't thinking about that. Would you like me to revert it in v15 for now? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Umm ... is this really the sort of patch to be committing on a >> release wrap day? > Oh, shoot, I wasn't thinking about that. Would you like me to revert > it in v15 for now? Yeah, seems like the safest course. I wouldn't object to it going in after the release is over, but right now there's zero margin for error. regards, tom lane
On Mon, Feb 6, 2023 at 11:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Umm ... is this really the sort of patch to be committing on a > >> release wrap day? > > > Oh, shoot, I wasn't thinking about that. Would you like me to revert > > it in v15 for now? > > Yeah, seems like the safest course. I wouldn't object to it going in > after the release is over, but right now there's zero margin for error. Done. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Feb 6, 2023 at 9:39 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 6, 2023 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Umm ... is this really the sort of patch to be committing on a > > release wrap day? > > Oh, shoot, I wasn't thinking about that. Would you like me to revert > it in v15 for now? Thanks a lot Robert for taking care of this. The patch is committed on HEAD and reverted on v15. Now that the minor version branches are stamped, is it time for us to get this to v15? I can then close the CF entry - https://commitfest.postgresql.org/42/4012/. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > Thanks a lot Robert for taking care of this. The patch is committed on > HEAD and reverted on v15. Now that the minor version branches are > stamped, is it time for us to get this to v15? I can then close the CF > entry - https://commitfest.postgresql.org/42/4012/. No objection to un-reverting from here. regards, tom lane
On Wed, Feb 8, 2023 at 11:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > > Thanks a lot Robert for taking care of this. The patch is committed on > > HEAD and reverted on v15. Now that the minor version branches are > > stamped, is it time for us to get this to v15? I can then close the CF > > entry - https://commitfest.postgresql.org/42/4012/. > > No objection to un-reverting from here. Thanks Robert, Tom. It is now un-reverted on v15. I've closed the CF entry. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com