Thread: Support a wildcard in backtrace_functions
I would like to be able to add backtraces to all ERROR logs. This is useful to me, because during postgres or extension development any error that I hit is usually unexpected. This avoids me from having to change backtrace_functions every time I get an error based on the function name listed in the LOCATION output (added by "\set VERBOSITY verbose"). Attached is a trivial patch that starts supporting backtrace_functions='*'. By setting that in postgresql.conf for my dev environment it starts logging backtraces always. The main problem it currently has is that it adds backtraces to all LOG level logs too. So probably we want to make backtrace_functions only log backtraces for ERROR and up (or maybe WARNING/NOTICE and up), or add a backtrace_functions_level GUC too control this behaviour. The docs of backtrace_functions currently heavily suggest that it should only be logging backtraces for errors, so either we actually start doing that or we should clarify the docs (emphasis mine): > This parameter contains a comma-separated list of C function > names. If an **error** is raised and the name of the internal C function > where the **error** happens matches a value in the list, then a > backtrace is written to the server log together with the error > message. This can be used to debug specific areas of the > source code.
Attachment
> On 20 Dec 2023, at 12:23, Jelte Fennema-Nio <me@jeltef.nl> wrote: > Attached is a trivial patch that starts supporting > backtrace_functions='*'. By setting that in postgresql.conf for my dev > environment it starts logging backtraces always. I happened to implement pretty much the same diff today during a debugging session, and then stumbled across this when searching the archives, so count me in for +1 on the concept. > The main problem it currently has is that it adds backtraces to all > LOG level logs too. So probably we want to make backtrace_functions > only log backtraces for ERROR and up (or maybe WARNING/NOTICE and up), > or add a backtrace_functions_level GUC too control this behaviour. A wildcard should IMO only apply for ERROR (and higher) so I've hacked that up in the attached v2. I was thinking about WARNING as well but opted against it. > The docs of backtrace_functions currently heavily suggest that it should > only be logging backtraces for errors, so either we actually start > doing that or we should clarify the docs I think we should keep the current functionality and instead adjust the docs. This has already been shipped like this, and restricting it now without a clear usecase for doing so seems invasive (and someone might very well be using this). 0001 in the attached adjusts this. -- Daniel Gustafsson
Attachment
On Mon, 12 Feb 2024 at 14:14, Daniel Gustafsson <daniel@yesql.se> wrote: > > The main problem it currently has is that it adds backtraces to all > > LOG level logs too. So probably we want to make backtrace_functions > > only log backtraces for ERROR and up (or maybe WARNING/NOTICE and up), > > or add a backtrace_functions_level GUC too control this behaviour. > > A wildcard should IMO only apply for ERROR (and higher) so I've hacked that up > in the attached v2. I was thinking about WARNING as well but opted against it. Fine by me patch looks good. Although I think I'd slightly prefer having a backtrace_functions_level GUC, so that we can get this same benefit for non wildcard backtrace_functions and so we keep the behaviour between the two consistent. > I think we should keep the current functionality and instead adjust the docs. > This has already been shipped like this, and restricting it now without a clear > usecase for doing so seems invasive (and someone might very well be using > this). 0001 in the attached adjusts this. Would a backtrace_functions_level GUC that would default to ERROR be acceptable in this case? It's slight behaviour break, but you would be able to get the previous behaviour very easily. And honestly wanting to get backtraces for non-ERROR log entries seems quite niche to me, which to me makes it a weird default. > + If an log entry is raised and the name of the internal C function where s/an log entry/a log entry
On 12.02.24 14:27, Jelte Fennema-Nio wrote: > And honestly wanting > to get backtraces for non-ERROR log entries seems quite niche to me, > which to me makes it a weird default. I think one reason for this is that non-ERRORs are fairly unique in their wording, so you don't have to isolate them by function name.
On Mon, 12 Feb 2024 at 14:27, Jelte Fennema-Nio <me@jeltef.nl> wrote: > Would a backtrace_functions_level GUC that would default to ERROR be > acceptable in this case? I implemented it this way in the attached patchset.
Attachment
> On 27 Feb 2024, at 18:03, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > On Mon, 12 Feb 2024 at 14:27, Jelte Fennema-Nio <me@jeltef.nl> wrote: >> Would a backtrace_functions_level GUC that would default to ERROR be >> acceptable in this case? > > I implemented it this way in the attached patchset. I'm not excited about adding even more GUCs but maybe it's the least bad option here. + If a log entry is raised with a level higher than + <xref linkend="guc-backtrace-functions-min-level"/> and the name of the This should be "equal to or higher" right? -- Daniel Gustafsson
On Wed, 28 Feb 2024 at 19:04, Daniel Gustafsson <daniel@yesql.se> wrote: > This should be "equal to or higher" right? Correct, nicely spotted. Fixed that. I also updated the docs for the new backtrace_functions_min_level GUC itself too, as well as creating a dedicated options array for the GUC. Because when updating its docs I realized that none of the existing level arrays matched what we wanted.
Attachment
> On 28 Feb 2024, at 19:50, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > On Wed, 28 Feb 2024 at 19:04, Daniel Gustafsson <daniel@yesql.se> wrote: >> This should be "equal to or higher" right? > > Correct, nicely spotted. Fixed that. I also updated the docs for the > new backtrace_functions_min_level GUC itself too, as well as creating > a dedicated options array for the GUC. Because when updating its docs > I realized that none of the existing level arrays matched what we > wanted. Looks good, I'm marking this ready for committer for now. I just have a few small comments: + A single <literal>*</literal> character is interpreted as a wildcard and + will cause all errors in the log to contain backtraces. This should mention that it's all error matching the level (or higher) of the newly introduced GUC. + gettext_noop("Sets the message levels that create backtraces when backtrace_functions is configured"), I think we should add the same "Each level includes.." long_desc, and the short_desc should end with period. + <para> + Backtrace support is not available on all platforms, and the quality + of the backtraces depends on compilation options. + </para> I don't think we need to duplicate this para here, having it on backtrace_functions suffice. -- Daniel Gustafsson
On 2024-Feb-28, Jelte Fennema-Nio wrote: > diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c > index 699d9d0a241..553e4785520 100644 > --- a/src/backend/utils/error/elog.c > +++ b/src/backend/utils/error/elog.c > @@ -843,6 +843,8 @@ matches_backtrace_functions(const char *funcname) > if (*p == '\0') /* end of backtrace_function_list */ > break; > > + if (strcmp("*", p) == 0) > + return true; > if (strcmp(funcname, p) == 0) > return true; > p += strlen(p) + 1; Hmm, so if I write "foo,*" this will work but check all function names first and on the second entry. But if I write "foo*" the GUC value will be accepted but match nothing (as will "*foo" or "foo*bar"). I don't like either of these behaviors. I think we should tighten this up: an asterisk should be allowed only if it appears alone in the string (short-circuiting check_backtrace_functions before strspn); and let's leave the strspn() call alone. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
On Thu, Feb 29, 2024 at 4:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Hmm, so if I write "foo,*" this will work but check all function names > first and on the second entry. But if I write "foo*" the GUC value will > be accepted but match nothing (as will "*foo" or "foo*bar"). I don't > like either of these behaviors. I think we should tighten this up: an > asterisk should be allowed only if it appears alone in the string > (short-circuiting check_backtrace_functions before strspn); and let's > leave the strspn() call alone. +1 for disallowing *foo or foo* or foo*bar etc. combinations. I think we need to go a bit further and convert backtrace_functions of type GUC_LIST_INPUT so that check_backtrace_functions can just use SplitIdentifierString to parse the list of identifiers. Then, the strspn can just be something like below for each token: validlen = strspn(*tok, "0123456789_" "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ"); Does anyone see a problem with it? FWIW, I've recently noticed for my work on https://commitfest.postgresql.org/47/2863/ that there isn't any test covering all the backtrace related code - backtrace_functions GUC, backtrace_on_internal_error GUC, set_backtrace(), backtrace(), backtrace_symbols(). I came up with a test module covering these areas https://commitfest.postgresql.org/47/4823/. I could make the TAP tests pass on all the CF bot animals. Interestingly, the new code that gets added for this thread can also be covered with it. Any thoughts are welcome. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2024-Mar-06, Bharath Rupireddy wrote: > +1 for disallowing *foo or foo* or foo*bar etc. combinations. Cool. > I think we need to go a bit further and convert backtrace_functions of > type GUC_LIST_INPUT so that check_backtrace_functions can just use > SplitIdentifierString to parse the list of identifiers. Then, the > strspn can just be something like below for each token: > > validlen = strspn(*tok, > "0123456789_" > "abcdefghijklmnopqrstuvwxyz" > "ABCDEFGHIJKLMNOPQRSTUVWXYZ"); > > Does anyone see a problem with it? IIRC the reason it's coded as it is, is so that we have a single palloc chunk of memory to free when the value changes; we purposefully stayed away from SplitIdentifierString and the like. What problem do you see with the idea I proposed? That was: > On Thu, Feb 29, 2024 at 4:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > I think we should tighten this up: an asterisk should be allowed > > only if it appears alone in the string (short-circuiting > > check_backtrace_functions before strspn); and let's leave the > > strspn() call alone. That means, just add something like this at the top of check_backtrace_functions and don't do anything to this function otherwise (untested code): if (newval[0] == '*' && newval[1] == '\0') { someval = guc_malloc(ERROR, 2); if (someval == NULL) return false; someval[0] = '*'; someval[1] = '\0'; *extra = someval; return true; } (Not sure if a second trailing \0 is necessary.) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)
On Wed, Mar 6, 2024 at 12:41 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > I think we need to go a bit further and convert backtrace_functions of > > type GUC_LIST_INPUT so that check_backtrace_functions can just use > > SplitIdentifierString to parse the list of identifiers. Then, the > > strspn can just be something like below for each token: > > > > validlen = strspn(*tok, > > "0123456789_" > > "abcdefghijklmnopqrstuvwxyz" > > "ABCDEFGHIJKLMNOPQRSTUVWXYZ"); > > > > Does anyone see a problem with it? > > IIRC the reason it's coded as it is, is so that we have a single palloc > chunk of memory to free when the value changes; we purposefully stayed > away from SplitIdentifierString and the like. Why do we even need to prepare another list backtrace_function_list from the parsed identifiers? Why can't we just do something like v4-0003? Existing logic looks a bit complicated to me personally. I still don't understand why we can't just turn backtrace_functions to GUC_LIST_INPUT and use SplitIdentifierString? I see a couple of advantages with this approach: 1. It simplifies the backtrace_functions GUC related code a lot. 2. We don't need assign_backtrace_functions() and a separate variable backtrace_function_list, we can just rely on the GUC value backtrace_functions. 3. All we do now in check_backtrace_functions() is just parse the user entered backtrace_functions value, and quickly exit if we have found '*'. 4. In matches_backtrace_functions(), we iterate over the list as we already do right now. With the v4-0003, all of the below test cases work: ALTER SYSTEM SET backtrace_functions = 'pg_terminate_backend, pg_create_restore_point'; SELECT pg_reload_conf(); SHOW backtrace_functions; -- Must see backtrace SELECT pg_create_restore_point(repeat('A', 1024)); -- Must see backtrace SELECT pg_terminate_backend(1234, -1); ALTER SYSTEM SET backtrace_functions = '*, pg_create_restore_point'; SELECT pg_reload_conf(); SHOW backtrace_functions; -- Must see backtrace as * is specified SELECT pg_terminate_backend(1234, -1); -- Must see an error as * is specified in between the identifier name ALTER SYSTEM SET backtrace_functions = 'pg*_create_restore_point'; ERROR: invalid value for parameter "backtrace_functions": "pg*_create_restore_point" DETAIL: Invalid character > What problem do you see with the idea I proposed? That was: > > > On Thu, Feb 29, 2024 at 4:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > I think we should tighten this up: an asterisk should be allowed > > > only if it appears alone in the string (short-circuiting > > > check_backtrace_functions before strspn); and let's leave the > > > strspn() call alone. > > That means, just add something like this at the top of > check_backtrace_functions and don't do anything to this function > otherwise (untested code): > > if (newval[0] == '*' && newval[1] == '\0') > { > someval = guc_malloc(ERROR, 2); > if (someval == NULL) > return false; > someval[0] = '*'; > someval[1] = '\0'; > *extra = someval; > return true; > } This works only if '* 'is specified as the only one character in backtrace_functions = '*', right? If yes, what if someone sets backtrace_functions = 'foo, bar, *, baz'? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On 2024-Mar-08, Bharath Rupireddy wrote: > This works only if '* 'is specified as the only one character in > backtrace_functions = '*', right? If yes, what if someone sets > backtrace_functions = 'foo, bar, *, baz'? It throws an error, as expected. This is a useless waste of resources: checking for "foo" and "bar" is pointless, since the * is going to give a positive match anyway. And the "baz" is a waste of memory which is never going to be checked. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I love the Postgres community. It's all about doing things _properly_. :-)" (David Garamond)
On Fri, 8 Mar 2024 at 10:59, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Mar-08, Bharath Rupireddy wrote: > > > This works only if '* 'is specified as the only one character in > > backtrace_functions = '*', right? If yes, what if someone sets > > backtrace_functions = 'foo, bar, *, baz'? > > It throws an error, as expected. This is a useless waste of resources: > checking for "foo" and "bar" is pointless, since the * is going to give > a positive match anyway. And the "baz" is a waste of memory which is > never going to be checked. Makes sense. Attached is a new patchset that implements it that way. I've not included Bharath his 0003 patch, since it's a much bigger change than the others, and thus might need some more discussion.
Attachment
> On 8 Mar 2024, at 12:25, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > On Fri, 8 Mar 2024 at 10:59, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >> On 2024-Mar-08, Bharath Rupireddy wrote: >> >>> This works only if '* 'is specified as the only one character in >>> backtrace_functions = '*', right? If yes, what if someone sets >>> backtrace_functions = 'foo, bar, *, baz'? >> >> It throws an error, as expected. This is a useless waste of resources: >> checking for "foo" and "bar" is pointless, since the * is going to give >> a positive match anyway. And the "baz" is a waste of memory which is >> never going to be checked. > > Makes sense. Attached is a new patchset that implements it that way. This version address the concerns raised by Alvaro, and even simplifies the code over earlier revisions. My documentation comments from upthread still stands, but other than those this version LGTM. > I've not included Bharath his 0003 patch, since it's a much bigger > change than the others, and thus might need some more discussion. Agreed. -- Daniel Gustafsson
On Fri, Mar 8, 2024 at 7:12 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 8 Mar 2024, at 12:25, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > > > On Fri, 8 Mar 2024 at 10:59, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> > >> On 2024-Mar-08, Bharath Rupireddy wrote: > >> > >>> This works only if '* 'is specified as the only one character in > >>> backtrace_functions = '*', right? If yes, what if someone sets > >>> backtrace_functions = 'foo, bar, *, baz'? > >> > >> It throws an error, as expected. This is a useless waste of resources: > >> checking for "foo" and "bar" is pointless, since the * is going to give > >> a positive match anyway. And the "baz" is a waste of memory which is > >> never going to be checked. > > > > Makes sense. Attached is a new patchset that implements it that way. > > This version address the concerns raised by Alvaro, and even simplifies the > code over earlier revisions. My documentation comments from upthread still > stands, but other than those this version LGTM. So, to get backtraces of all functions at backtrace_functions_min_level level, one has to specify backtrace_functions = '*'; combining it with function names is not allowed. This looks cleaner. postgres=# ALTER SYSTEM SET backtrace_functions = '*, pg_create_restore_point'; ERROR: invalid value for parameter "backtrace_functions": "*, pg_create_restore_point" DETAIL: Invalid character I have one comment on 0002, otherwise all looks good. + <para> + A single <literal>*</literal> character can be used instead of a list + of C functions. This <literal>*</literal> is interpreted as a wildcard + and will cause all errors in the log to contain backtraces. + </para> It's not always the ERRORs for which backtraces get logged, it really depends on the new GUC backtrace_functions_min_level. If my understanding is right, can we specify that in the above note? > > I've not included Bharath his 0003 patch, since it's a much bigger > > change than the others, and thus might need some more discussion. +1. I'll see if I can start a new thread for this. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> On 8 Mar 2024, at 15:01, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > So, to get backtraces of all functions at > backtrace_functions_min_level level, one has to specify > backtrace_functions = '*'; combining it with function names is not > allowed. This looks cleaner. > > postgres=# ALTER SYSTEM SET backtrace_functions = '*, pg_create_restore_point'; > ERROR: invalid value for parameter "backtrace_functions": "*, > pg_create_restore_point" > DETAIL: Invalid character If we want to be extra helpful here we could add something like the below to give an errhint when a wildcard was found. Also, the errdetail should read like a full sentence so it should be slightly expanded anyways. diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index ca621ea3ff..7bc655ecd2 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2151,7 +2151,9 @@ check_backtrace_functions(char **newval, void **extra, GucSource source) ", \n\t"); if (validlen != newvallen) { - GUC_check_errdetail("Invalid character"); + GUC_check_errdetail("Invalid character in function name."); + if ((*newval)[validlen] == '*') + GUC_check_errhint("For wildcard matching, use a single \"*\" without any other function names."); return false; } -- Daniel Gustafsson
On 08.03.24 12:25, Jelte Fennema-Nio wrote: > On Fri, 8 Mar 2024 at 10:59, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >> On 2024-Mar-08, Bharath Rupireddy wrote: >> >>> This works only if '* 'is specified as the only one character in >>> backtrace_functions = '*', right? If yes, what if someone sets >>> backtrace_functions = 'foo, bar, *, baz'? >> >> It throws an error, as expected. This is a useless waste of resources: >> checking for "foo" and "bar" is pointless, since the * is going to give >> a positive match anyway. And the "baz" is a waste of memory which is >> never going to be checked. > > Makes sense. Attached is a new patchset that implements it that way. > I've not included Bharath his 0003 patch, since it's a much bigger > change than the others, and thus might need some more discussion. What is the relationship of these changes with the recently added backtrace_on_internal_error? We had similar discussions there, I feel like we are doing similar things here but slightly differently. Like, shouldn't backtrace_functions_min_level also affect backtrace_on_internal_error? Don't you really just want backtrace_on_any_error? You are sneaking that in through the backdoor via backtrace_functions. Can we somehow combine all these use cases more elegantly? backtrace_on_error = {all|internal|none}? Btw., your code/documentation sometimes writes "stack trace". Let's stick to backtrace for consistency.
On Fri, 8 Mar 2024 at 14:42, Daniel Gustafsson <daniel@yesql.se> wrote: > My documentation comments from upthread still > stands, but other than those this version LGTM. Ah yeah, I forgot about those. Fixed now.
Attachment
On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut <peter@eisentraut.org> wrote: > What is the relationship of these changes with the recently added > backtrace_on_internal_error? I think that's a reasonable question. And the follow up ones too. I think it all depends on how close we consider backtrace_on_internal_error and backtrace_functions. While they obviously have similar functionality, I feel like backtrace_on_internal_error is probably a function that we'd want to turn on by default in the future. While backtrace_functions seems like it's mostly useful for developers. (i.e. the current grouping of backtrace_on_internal_error under DEVELOPER_OPTIONS seems wrong to me) > shouldn't backtrace_functions_min_level also affect > backtrace_on_internal_error? I guess that depends on the default behaviour that we want. Would we want warnings with ERRCODE_INTERNAL_ERROR to be backtraced by default or not. There is at least one example of such a warning in the codebase: ereport(WARNING, errcode(ERRCODE_INTERNAL_ERROR), errmsg_internal("could not parse XML declaration in stored value"), errdetail_for_xml_code(res_code)); > Btw., your code/documentation sometimes writes "stack trace". Let's > stick to backtrace for consistency. Fixed that in the latest patset in the email I sent right before this one.
On Fri, Mar 8, 2024 at 9:25 PM Jelte Fennema-Nio <me@jeltef.nl> wrote: > > On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut <peter@eisentraut.org> wrote: > > What is the relationship of these changes with the recently added > > backtrace_on_internal_error? > > I think that's a reasonable question. And the follow up ones too. > > I think it all depends on how close we consider > backtrace_on_internal_error and backtrace_functions. While they > obviously have similar functionality, I feel like > backtrace_on_internal_error is probably a function that we'd want to > turn on by default in the future. Hm, we may not want backtrace_on_internal_error to be on by default. AFAICS, all ERRCODE_INTERNAL_ERROR are identifiable with the error message, so it's sort of easy for one to track down the cause of it without actually needing to log backtrace by default. On Fri, Mar 8, 2024 at 8:21 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > What is the relationship of these changes with the recently added > backtrace_on_internal_error? We had similar discussions there, I feel > like we are doing similar things here but slightly differently. Like, > shouldn't backtrace_functions_min_level also affect > backtrace_on_internal_error? Don't you really just want > backtrace_on_any_error? You are sneaking that in through the backdoor > via backtrace_functions. Can we somehow combine all these use cases > more elegantly? backtrace_on_error = {all|internal|none}? I see a merit in Peter's point. I like the idea of backtrace_functions_min_level controlling backtrace_on_internal_error too. Less GUCs for similar functionality is always a good idea IMHO. Here's what I think: 1. Rename the new GUC backtrace_functions_min_level to backtrace_min_level. 2. Add 'internal' to backtrace_min_level_options enum +static const struct config_enum_entry backtrace_functions_level_options[] = { + {"internal", INTERNAL, false}, + {"debug5", DEBUG5, false}, + {"debug4", DEBUG4, false}, 3. Remove backtrace_on_internal_error GUC which is now effectively covered by backtrace_min_level = 'internal'; Does anyone see a problem with it? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, 8 Mar 2024 at 17:17, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Hm, we may not want backtrace_on_internal_error to be on by default. > AFAICS, all ERRCODE_INTERNAL_ERROR are identifiable with the error > message, so it's sort of easy for one to track down the cause of it > without actually needing to log backtrace by default. While maybe all messages uniquely identify the exact error, these errors usually originate somewhere deep down the call stack in a function that is called from many other places. Having the full stack trace can thus greatly help us to find what caused this specific error to occur. I think that would be quite useful to enable by default, if only because it would make many bug reports much more actionable. > 1. Rename the new GUC backtrace_functions_min_level to backtrace_min_level. > 2. Add 'internal' to backtrace_min_level_options enum > +static const struct config_enum_entry backtrace_functions_level_options[] = { > + {"internal", INTERNAL, false}, > + {"debug5", DEBUG5, false}, > + {"debug4", DEBUG4, false}, > 3. Remove backtrace_on_internal_error GUC which is now effectively > covered by backtrace_min_level = 'internal'; > > Does anyone see a problem with it? Honestly, it seems a bit confusing to me to add INTERNAL as a level, since it's an error code not log level. Also merging it in this way, brings up certain questions: 1. How do you get the current backtrace_on_internal_error=true behaviour? Would you need to set both backtrace_functions='*' and backtrace_min_level=INTERNAL? 2. What is the default value for backtrace_min_level? 3. You still wouldn't be able to limit INTERNAL errors to FATAL level I personally think having three GUCs in this patchset make sense, especially since I think it would be good to turn backtrace_on_internal_error on by default. The only additional change that I think might be worth making is to make backtrace_on_internal_error take a level argument, so that you could configure postgres to only add stack traces to FATAL internal errors. (attached is a patch that should fix the CI issue by adding GUC_NOT_IN_SAMPLE backtrace_functions_min_level)
Attachment
On 08.03.24 16:55, Jelte Fennema-Nio wrote: > On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut <peter@eisentraut.org> wrote: >> What is the relationship of these changes with the recently added >> backtrace_on_internal_error? > > I think that's a reasonable question. And the follow up ones too. > > I think it all depends on how close we consider > backtrace_on_internal_error and backtrace_functions. While they > obviously have similar functionality, I feel like > backtrace_on_internal_error is probably a function that we'd want to > turn on by default in the future. While backtrace_functions seems like > it's mostly useful for developers. (i.e. the current grouping of > backtrace_on_internal_error under DEVELOPER_OPTIONS seems wrong to me) Hence the idea backtrace_on_error = {all|internal|none} which could default to 'internal'.
On Wed, Mar 13, 2024 at 7:50 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > > I think it all depends on how close we consider > > backtrace_on_internal_error and backtrace_functions. While they > > obviously have similar functionality, I feel like > > backtrace_on_internal_error is probably a function that we'd want to > > turn on by default in the future. While backtrace_functions seems like > > it's mostly useful for developers. (i.e. the current grouping of > > backtrace_on_internal_error under DEVELOPER_OPTIONS seems wrong to me) > > Hence the idea > > backtrace_on_error = {all|internal|none} > > which could default to 'internal'. So, are you suggesting to just have backtrace_on_error = {all|internal|none} leaving backtrace_functions_min_level aside? In that case, I'd like to understand how backtrace_on_error and backtrace_functions interact with each other? Does one need to set backtrace_on_error = all to get backtrace of functions specified in backtrace_functions? What must be the behaviour of backtrace_on_error = all? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, 13 Mar 2024 at 15:20, Peter Eisentraut <peter@eisentraut.org> wrote: > Hence the idea > > backtrace_on_error = {all|internal|none} > > which could default to 'internal'. I think one use-case that I'd personally at least would like to see covered is being able to get backtraces on all warnings. How would that be done with this setting? backtrace_on_error = all backtrace_min_level = warning In that case backtrace_on_error seems like a weird name, since it can include backtraces for warnings if you change backtrace_min_level. How about the following aproach. It still uses 3 GUCs, but they now all work together. There's one entry point and two additional filters (level and function name) # Says what log entries to log backtraces for log_backtrace = {all|internal|none} (default: internal) # Excludes log entries from log_include_backtrace by level backtrace_min_level = {debug4|...|fatal} (default: error) # Excludes log entries from log_include_backtrace if function name # does not match list, but empty string disables this filter (thus # logging for all functions) backtrace_functions = {...} (default: '') PS. Other naming option for log_backtrace could be log_include_backtrace
On Wed, 13 Mar 2024 at 16:32, Jelte Fennema-Nio <me@jeltef.nl> wrote: > How > about the following aproach. It still uses 3 GUCs, but they now all > work together. There's one entry point and two additional filters > (level and function name) > > # Says what log entries to log backtraces for > log_backtrace = {all|internal|none} (default: internal) > > # Excludes log entries from log_include_backtrace by level > backtrace_min_level = {debug4|...|fatal} (default: error) > > # Excludes log entries from log_include_backtrace if function name > # does not match list, but empty string disables this filter (thus > # logging for all functions) > backtrace_functions = {...} (default: '') Attached is a patch that implements this. Since the more I think about it, the more I like this approach.
Attachment
On Thu, 21 Mar 2024 at 15:41, Jelte Fennema-Nio <me@jeltef.nl> wrote: > Attached is a patch that implements this. Since the more I think about > it, the more I like this approach. I now added a 3rd patch to this patchset which changes the log_backtrace default to "internal", because it seems quite useful to me if user error reports of internal errors included a backtrace.
Attachment
On Thu, Mar 21, 2024 at 8:11 PM Jelte Fennema-Nio <me@jeltef.nl> wrote: > > Attached is a patch that implements this. Since the more I think about > it, the more I like this approach. Thanks. Overall the design looks good. log_backtrace is the entry point for one to control if a backtrace needs to be emitted at all. backtrace_min_level controls at what elevel the backtraces need to be emitted. If one wants to get backtraces for all internal ERRORs, then log_backtrace = 'internal' and backtrace_min_level = 'error' must be set. If backtrace_min_level = 'panic', then backtraces won't get logged for internal ERRORs. But, this is not the case right now, one can just set backtrace_on_internal_error = 'on' to get backtraces for all internal ERROR/FATAL or whatever just the errcode has to be ERRCODE_INTERNAL_ERROR. This is one change of behaviour and looks fine to me. If one wants to get backtrace from a function for all elog/ereport calls, then log_backtrace = either 'internal' or 'all', backtrace_min_level = 'debug5' and backtrace_functions = '<function_name>' must be set. But, right now, one can just set backtrace_functions = '<function_name>' to get backtrace from the functions for any of elog/ereport calls. A few comments: 1. @@ -832,6 +849,9 @@ matches_backtrace_functions(const char *funcname) { const char *p; + if (!backtrace_functions || backtrace_functions[0] == '\0') + return true; + Shouldn't this be returning false to not log set backtrace when backtrace_functions is not set? Am I missing something here? 2. + { + {"log_backtrace", PGC_SUSET, LOGGING_WHAT, + gettext_noop("Sets if logs should include a backtrace."), + NULL IMV, log_backtrace, backtrace_min_level and backtrace_functions are interlinked, so keeping all of them as DEVELOPER_OPTIONS looks fine to me than having log_backtrace at just LOGGING_WHAT kind. Also, we must also mark log_backtrace as GUC_NOT_IN_SAMPLE. 3. I think it's worth adding a few examples to get backtraces in docs. For instance, what to set to get backtraces of all internal errors and what to set to get backtraces of all ERRORs coming from a specific function etc. 4. I see the support for wildcard in backtrace_functions is missing. Is it intentionally left out? If not, can you make it part of 0003 patch? 5. The amount of backtraces generated is going to be too huge when setting log_backtrace = 'all' and backtrace_min_level = 'debug5'. With this setting installcheck generated 12MB worth of log and the test took about 55 seconds (as opposed to 48 seconds without it) The point is if these settings are misused, it can easily slow down the entire system and fill up disk space leading to crashes eventually. This makes a strong case for marking log_backtrace a developer only function. 6. In continuation to comment #5, does anyone need backtrace for elevels like debugX and LOG etc.? What's the use of the backtrace in such cases? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, 22 Mar 2024 at 11:14, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > A few comments: > > 1. > @@ -832,6 +849,9 @@ matches_backtrace_functions(const char *funcname) > { > const char *p; > > + if (!backtrace_functions || backtrace_functions[0] == '\0') > + return true; > + > > Shouldn't this be returning false to not log set backtrace when > backtrace_functions is not set? Am I missing something here? Empty string is considered the new wildcard, i.e. backtrace_functions filtering is not enabled if it is empty. This seemed reasonable to me since you should now disable backtraces by using log_backtrace=none, having backtrace_functions='' mean the same thing seems rather useless. I also documented this in the updated docs. > 2. > + { > + {"log_backtrace", PGC_SUSET, LOGGING_WHAT, > + gettext_noop("Sets if logs should include a backtrace."), > + NULL > > IMV, log_backtrace, backtrace_min_level and backtrace_functions are > interlinked, so keeping all of them as DEVELOPER_OPTIONS looks fine to > me than having log_backtrace at just LOGGING_WHAT kind. Also, we must > also mark log_backtrace as GUC_NOT_IN_SAMPLE. I agree they are linked, but I don't think it's just useful for developers to be able to set log_backtrace to internal (even if we choose not to make "internal" the default). > 3. I think it's worth adding a few examples to get backtraces in docs. > For instance, what to set to get backtraces of all internal errors and > what to set to get backtraces of all ERRORs coming from a specific > function etc. Good idea, I'll try to add those later. For now your specific cases would be: log_backtrace = 'internal' (default in 0003) backtrace_functions = '' (default) backtrace_min_level = 'ERROR' (default) and log_backtrace = 'all' backtrace_functions = 'someFunc' backtrace_min_level = 'ERROR' (default) > 4. I see the support for wildcard in backtrace_functions is missing. > Is it intentionally left out? If not, can you make it part of 0003 > patch? Yes it's intentional, see answer on 1. > 5. The amount of backtraces generated is going to be too huge when > setting log_backtrace = 'all' and backtrace_min_level = 'debug5'. With > this setting installcheck generated 12MB worth of log and the test > took about 55 seconds (as opposed to 48 seconds without it) The point > is if these settings are misused, it can easily slow down the entire > system and fill up disk space leading to crashes eventually. This > makes a strong case for marking log_backtrace a developer only > function. I think the same argument can be made for many other GUCs that are not marked as developer options (e.g. log_min_messages). Normally we "protect" such options by using PGC_SUSET. DEVELOPER_OPTIONS is really only meant for options that are only useful for developers > 6. In continuation to comment #5, does anyone need backtrace for > elevels like debugX and LOG etc.? What's the use of the backtrace in > such cases? I think at least WARNING and NOTICE could be useful in practice, but I agree LOG and DEBUGX seem kinda useless. But it seems kinda strange to not have them in the list, especially given it is pretty much no effort to support them too.
On Fri, 22 Mar 2024 at 11:09, Jelte Fennema-Nio <me@jeltef.nl> wrote: > On Thu, 21 Mar 2024 at 15:41, Jelte Fennema-Nio <me@jeltef.nl> wrote: > > Attached is a patch that implements this. Since the more I think about > > it, the more I like this approach. > > I now added a 3rd patch to this patchset which changes the > log_backtrace default to "internal", because it seems quite useful to > me if user error reports of internal errors included a backtrace. I think patch 0002 should be considered an Open Item for PG17. Since it's proposing changing the name of the newly introduced backtrace_on_internal_error GUC. If we want to change it in this way, we should do it before the release and preferably before the beta. I added it to the Open Items list[1] so we don't forget to at least decide on this. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items
On Wed, Apr 10, 2024 at 11:07:00AM +0200, Jelte Fennema-Nio wrote: > I think patch 0002 should be considered an Open Item for PG17. Since > it's proposing changing the name of the newly introduced > backtrace_on_internal_error GUC. If we want to change it in this way, > we should do it before the release and preferably before the beta. Indeed, it makes little sense to redesign this at the beginning of v18 if we're unhappy with the current outcome of v17. So tweaking that is worth considering at this stage. log_backtrace speaks a bit more to me as a name for this stuff because it logs a backtrace. Now, there is consistency on HEAD as well because these GUCs are all prefixed with "backtrace_". Would something like a backtrace_mode where we have an enum rather than a boolean be better? One thing would be to redesign the existing GUC as having two values on HEAD as of: - "none", to log nothing. - "internal", to log backtraces for internal errors. Then this could be extended with more modes, to discuss in future releases as new features. What you are suggesting with backtrace_min_level is an entirely new feature. Perhaps using an extra GUC to control the interactions of the modes that can be assigned to the primary GUC "log_backtrace" in your 0002 is better, but all that sounds like v18 material at this stage. The code that resolves the interactions between the existing GUC and the new "level" GUC does not use LOGBACKTRACE_ALL. Perhaps it should use a case/switch. + gettext_noop("Each level includes all the levels that follow it. The later" + " the level, the fewer backtraces are created."), > I added it to the Open Items list[1] so we don't forget to at least > decide on this. > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items Thanks. -- Michael
Attachment
On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote: > log_backtrace speaks a bit more to me as a name for this stuff because > it logs a backtrace. Now, there is consistency on HEAD as well > because these GUCs are all prefixed with "backtrace_". Would > something like a backtrace_mode where we have an enum rather than a > boolean be better? One thing would be to redesign the existing GUC as > having two values on HEAD as of: > - "none", to log nothing. > - "internal", to log backtraces for internal errors. > > Then this could be extended with more modes, to discuss in future > releases as new features. As this is an open item, let's move on here. Attached is a proposal of patch for this open item, switching backtrace_on_internal_error to backtrace_mode with two values: - "none", to log no backtraces. - "internal", to log backtraces for internal errors. The rest of the proposals had better happen as a v18 discussion, where extending this GUC is a benefit. -- Michael
Attachment
On 18.04.24 09:02, Michael Paquier wrote: > On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote: >> log_backtrace speaks a bit more to me as a name for this stuff because >> it logs a backtrace. Now, there is consistency on HEAD as well >> because these GUCs are all prefixed with "backtrace_". Would >> something like a backtrace_mode where we have an enum rather than a >> boolean be better? One thing would be to redesign the existing GUC as >> having two values on HEAD as of: >> - "none", to log nothing. >> - "internal", to log backtraces for internal errors. >> >> Then this could be extended with more modes, to discuss in future >> releases as new features. > > As this is an open item, let's move on here. > > Attached is a proposal of patch for this open item, switching > backtrace_on_internal_error to backtrace_mode with two values: > - "none", to log no backtraces. > - "internal", to log backtraces for internal errors. > > The rest of the proposals had better happen as a v18 discussion, where > extending this GUC is a benefit. Why exactly is this an open item? Is there anything wrong with the existing feature?
On Thu, 18 Apr 2024 at 10:50, Peter Eisentraut <peter@eisentraut.org> wrote: > Why exactly is this an open item? Is there anything wrong with the > existing feature? The name of the GUC backtrace_on_internal_error is so specific that it's impossible to extend our backtrace behaviour in future releases without adding yet another backtrace GUC. You started the discussion on renaming it upthread: On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut <peter@eisentraut.org> wrote: > What is the relationship of these changes with the recently added > backtrace_on_internal_error? We had similar discussions there, I feel > like we are doing similar things here but slightly differently. Like, > shouldn't backtrace_functions_min_level also affect > backtrace_on_internal_error? Don't you really just want > backtrace_on_any_error? You are sneaking that in through the backdoor > via backtrace_functions. Can we somehow combine all these use cases > more elegantly? backtrace_on_error = {all|internal|none}?
On Thu, 18 Apr 2024 at 09:02, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote: > > log_backtrace speaks a bit more to me as a name for this stuff because > > it logs a backtrace. Now, there is consistency on HEAD as well > > because these GUCs are all prefixed with "backtrace_". Would > > something like a backtrace_mode where we have an enum rather than a > > boolean be better? I guess it depends what we want consistency with. If we want naming consistency with all other LOGGING_WHAT GUCs or if we want naming consistency with the current backtrace_functions GUC. I personally like log_backtrace slightly better, but I don't have a super strong opinion on this either. Another option could be plain "backtrace". > > One thing would be to redesign the existing GUC as > > having two values on HEAD as of: > > - "none", to log nothing. > > - "internal", to log backtraces for internal errors. If we go for backtrace_mode or backtrace, then I think I'd prefer "disabled"/"off" and "internal_error" for these values. > The rest of the proposals had better happen as a v18 discussion, where > extending this GUC is a benefit. agreed
On Thu, Apr 18, 2024 at 12:21:56PM +0200, Jelte Fennema-Nio wrote: > On Thu, 18 Apr 2024 at 09:02, Michael Paquier <michael@paquier.xyz> wrote: >> On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote: >> log_backtrace speaks a bit more to me as a name for this stuff because >> it logs a backtrace. Now, there is consistency on HEAD as well >> because these GUCs are all prefixed with "backtrace_". Would >> something like a backtrace_mode where we have an enum rather than a >> boolean be better? > > I guess it depends what we want consistency with. If we want naming > consistency with all other LOGGING_WHAT GUCs or if we want naming > consistency with the current backtrace_functions GUC. I personally > like log_backtrace slightly better, but I don't have a super strong > opinion on this either. Another option could be plain "backtrace". "backtrace" is too generic IMO. I'd append a "mode" as an effect of backtrace_functions, which is also a developer option, and has been around for a couple of releases now. >> One thing would be to redesign the existing GUC as >> having two values on HEAD as of: >> - "none", to log nothing. >> - "internal", to log backtraces for internal errors. > > If we go for backtrace_mode or backtrace, then I think I'd prefer > "disabled"/"off" and "internal_error" for these values. "internal_error" as a value sounds fine to me, that speaks more than just "internal". "off" rather that "none" or "disabled", less so, because it requires more enum entries to map with the boolean values that could be expected from it. "disabled" would be mostly a first in the GUCs, icu_validation_level being the first one using it, so I'd rather choose "none" over "disabled". No strong preference on this one, TBH, but as we're bike-shedding that.. -- Michael
Attachment
On 18.04.24 11:54, Jelte Fennema-Nio wrote: > On Thu, 18 Apr 2024 at 10:50, Peter Eisentraut<peter@eisentraut.org> wrote: >> Why exactly is this an open item? Is there anything wrong with the >> existing feature? > The name of the GUC backtrace_on_internal_error is so specific that > it's impossible to extend our backtrace behaviour in future releases > without adding yet another backtrace GUC. You started the discussion > on renaming it upthread: This presupposes that there is consensus about how the future functionality should look like. This topic has gone through half a dozen designs over a few months, and I think it would be premature to randomly freeze that discussion now and backport that design. If a better, more comprehensive design arises for PG18, I think it would be pretty easy to either remove backtrace_on_internal_error or just internally remap it.
On Fri, 19 Apr 2024 at 10:58, Peter Eisentraut <peter@eisentraut.org> wrote: > This presupposes that there is consensus about how the future > functionality should look like. This topic has gone through half a > dozen designs over a few months, and I think it would be premature to > randomly freeze that discussion now and backport that design. While there maybe isn't consensus on what a new design exactly looks like, I do feel like there's consensus on this thread that the backtrace_on_internal_error GUC is almost certainly not the design that we want. I guess a more conservative approach to this problem would be to revert the backtrace_on_internal_error commit and agree on a better design for PG18. But I didn't think that would be necessary if we could agree on the name for a more flexibly named GUC, which seemed quite possible to me (after a little bit of bikeshedding). > If a better, more comprehensive design arises for PG18, I think it would > be pretty easy to either remove backtrace_on_internal_error or just > internally remap it. I think releasing a GUC (even if it's just meant for developers) in PG17 and then deprecating it for a newer version in PG18 wouldn't be a great look. And even if that's not a huge problem, it still seems better not to have the problem at all. Renaming the GUC now seems to have only upsides to me: worst case the new design turns out not to be what we want either, and we have to deprecate the GUC. But in the best case we don't need to deprecate anything.
On Thu, Apr 18, 2024 at 3:02 AM Michael Paquier <michael@paquier.xyz> wrote: > As this is an open item, let's move on here. > > Attached is a proposal of patch for this open item, switching > backtrace_on_internal_error to backtrace_mode with two values: > - "none", to log no backtraces. > - "internal", to log backtraces for internal errors. > > The rest of the proposals had better happen as a v18 discussion, where > extending this GUC is a benefit. -1. That's just weird. There's no reason to replace a Boolean with a non-Boolean without adding a third value. Either we decide this concern is important enough to justify a post-feature-freeze design change, and add the third value now, or we leave it alone and revisit it in a future release. I'm probably OK with either one, but being halfway in between has no appeal for me. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Apr 19, 2024 at 7:31 AM Jelte Fennema-Nio <me@jeltef.nl> wrote: > While there maybe isn't consensus on what a new design exactly looks > like, I do feel like there's consensus on this thread that the > backtrace_on_internal_error GUC is almost certainly not the design > that we want. I guess a more conservative approach to this problem > would be to revert the backtrace_on_internal_error commit and agree on > a better design for PG18. But I didn't think that would be necessary > if we could agree on the name for a more flexibly named GUC, which > seemed quite possible to me (after a little bit of bikeshedding). I think Peter is correct that this presupposes we more or less agree on the final destination. For example, I think that log_backtrace = error | internal | all is a bit odd; why not backtrace_errcodes = <comma-separated list of error codes>? I've written a logging hook for EDB that can filter out error messages by error code, so I don't think it's at all a stretch to think that someone might want to do something similar here. I agree that it's somewhat likely that the name we want going forward isn't backtrace_on_internal_error, but I don't think we know that completely for certain, or what new name we necessarily want. > I think releasing a GUC (even if it's just meant for developers) in > PG17 and then deprecating it for a newer version in PG18 wouldn't be a > great look. And even if that's not a huge problem, it still seems > better not to have the problem at all. Renaming the GUC now seems to > have only upsides to me: worst case the new design turns out not to be > what we want either, and we have to deprecate the GUC. But in the best > case we don't need to deprecate anything. There are some things that are pretty hard to change once we've released them. For example, if we had a function or operator and somebody embeds it in a view definition, removing or renaming it prevents people from upgrading. But GUCs are not as bad. If you don't port your postgresql.conf forward to the new version, or if you haven't uncommented this particular setting, then there's no issue at all, and when there is a problem, removing a GUC setting from postgresql.conf is pretty straightforward compared to getting some construct out of your application code. I agree it's not amazing if we end up changing this exactly one release after it was introduced, but we don't really know that we're going to change it next release, or at all, and even if we do, I still don't think that's a catastrophe. I'm not completely certain that "let's just leave this alone for right now" is the correct conclusion, but the fact that we might need to rename a GUC down the road is not, by itself, a critical flaw in the status quo. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > There are some things that are pretty hard to change once we've > released them. For example, if we had a function or operator and > somebody embeds it in a view definition, removing or renaming it > prevents people from upgrading. But GUCs are not as bad. Really? Are we certain that nobody will put SETs of this GUC into their applications, or even just activate it via ALTER DATABASE SET? If they've done that, then a GUC change means dump/reload/upgrade failure. I've not been following this thread, so I don't have an opinion about what the design ought to be. But if we still aren't settled on it by now, I think the prudent thing is to revert the feature out of v17 altogether, and try again in v18. When we're still designing something after feature freeze, that is a good indication that we are trying to ship something that's not ready for prime time. regards, tom lane
On Fri, Apr 19, 2024 at 2:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > There are some things that are pretty hard to change once we've > > released them. For example, if we had a function or operator and > > somebody embeds it in a view definition, removing or renaming it > > prevents people from upgrading. But GUCs are not as bad. > > Really? Are we certain that nobody will put SETs of this GUC > into their applications, or even just activate it via ALTER DATABASE > SET? If they've done that, then a GUC change means dump/reload/upgrade > failure. That's fair. That feature isn't super-widely used, but it wouldn't be crazy for someone to use it with this feature, either. > I've not been following this thread, so I don't have an opinion > about what the design ought to be. But if we still aren't settled > on it by now, I think the prudent thing is to revert the feature > out of v17 altogether, and try again in v18. When we're still > designing something after feature freeze, that is a good indication > that we are trying to ship something that's not ready for prime time. There are two features at issue here. One is backtrace_on_internal_error, committed as a740b213d4b4d3360ad0cac696e47e5ec0eb8864. The other is a feature to produce backtraces for all errors, which was originally proposed as backtrace_functions='*', backtrace_functions_level=ERROR but which has subsequently been proposed with a few other spellings that involve merging that functionality into backtrace_on_internal_error. To the extent that there's an open question here for PG17, it's not about reverting this patch (which AIUI has never been committed) but about reverting the earlier patch for backtrace_on_internal_error. So is that the right thing to do? Well, on the one hand, I confess to having had a passing thought that backtrace_on_internal_error is awfully specific. Surely, user A's criterion for which messages should have backtraces might be anything, and we cannot reasonably add backtrace_on_$WHATEVER for all $WHATEVER in some large set. And the debate here suggests that I wasn't the only one to have that concern. So that argues for a revert. But on the other hand, in my personal experience, backtrace_on_internal_error would be the right thing in a really large number of cases. I was disappointed to see it added as a developer option with GUC_NOT_IN_SAMPLE. My vote would have been to put it in postgresql.conf and enable it by default. We have a somewhat debatable habit of using the exact same message in many places with similar kinds of problems, and when a production system manages to hit one of those errors, figuring out what actually went wrong is hard. In fact, it's often hard even when the error text only occurs in one or two places, because it's often some very low-level part of the code where you can't get enough context to understand the problem without knowing where that code got called from. So I sort of hate to see one of the most useful extensions of backtrace functionality that I can personally imagine get pulled back out of the tree because it turns out that someone else has something else that they want. I wonder whether a practical solution here might be to replace backtrace_on_internal_error=true|false with backtrace_on_error=true|internal|false. (Yes, I know that more proposed resolutions is not necessarily what we need right now, but I can't resist floating the idea.) -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 19, 2024 at 2:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've not been following this thread, so I don't have an opinion >> about what the design ought to be. But if we still aren't settled >> on it by now, I think the prudent thing is to revert the feature >> out of v17 altogether, and try again in v18. When we're still >> designing something after feature freeze, that is a good indication >> that we are trying to ship something that's not ready for prime time. > There are two features at issue here. One is > backtrace_on_internal_error, committed as > a740b213d4b4d3360ad0cac696e47e5ec0eb8864. The other is a feature to > produce backtraces for all errors, which was originally proposed as > backtrace_functions='*', backtrace_functions_level=ERROR but which has > subsequently been proposed with a few other spellings that involve > merging that functionality into backtrace_on_internal_error. To the > extent that there's an open question here for PG17, it's not about > reverting this patch (which AIUI has never been committed) but about > reverting the earlier patch for backtrace_on_internal_error. So is > that the right thing to do? I can't say that I care for "backtrace_on_internal_error". Re-reading that thread, I see I argued for having *no* GUC and just enabling that behavior all the time. I lost that fight, but it should have been clear that a GUC of this exact shape is a design dead end --- and that's what we're seeing now. > Well, on the one hand, I confess to having had a passing thought that > backtrace_on_internal_error is awfully specific. Surely, user A's > criterion for which messages should have backtraces might be anything, > and we cannot reasonably add backtrace_on_$WHATEVER for all $WHATEVER > in some large set. And the debate here suggests that I wasn't the only > one to have that concern. So that argues for a revert. Exactly. > But on the other hand, in my personal experience, > backtrace_on_internal_error would be the right thing in a really large > number of cases. That's why I thought we could get away with doing it automatically. Sure, more control would be better. But if we just hard-wire it for the moment then we aren't locking in what the eventual design for that control will be. regards, tom lane
On Fri, Apr 19, 2024 at 3:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I can't say that I care for "backtrace_on_internal_error". > Re-reading that thread, I see I argued for having *no* GUC and > just enabling that behavior all the time. I lost that fight, > but it should have been clear that a GUC of this exact shape > is a design dead end --- and that's what we're seeing now. Yeah, I guess I have to agree with that. > > But on the other hand, in my personal experience, > > backtrace_on_internal_error would be the right thing in a really large > > number of cases. > > That's why I thought we could get away with doing it automatically. > Sure, more control would be better. But if we just hard-wire it for > the moment then we aren't locking in what the eventual design for > that control will be. So the question before us right now is whether there's a palatable alternative to completely ripping out a feature that both you and I seem to agree does something useful. I don't think we necessarily need to leap to the conclusion that a revert is radically less risky than some other alternative. Now, if there's not some obvious alternative upon which we can (mostly) all agree, then maybe that's where we end up. But I would like us to be looking to save the features we can. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Apr 19, 2024 at 04:17:18PM -0400, Robert Haas wrote: > On Fri, Apr 19, 2024 at 3:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I can't say that I care for "backtrace_on_internal_error". >> Re-reading that thread, I see I argued for having *no* GUC and >> just enabling that behavior all the time. I lost that fight, >> but it should have been clear that a GUC of this exact shape >> is a design dead end --- and that's what we're seeing now. > > Yeah, I guess I have to agree with that. Ah, I have missed this argument. > So the question before us right now is whether there's a palatable > alternative to completely ripping out a feature that both you and I > seem to agree does something useful. I don't think we necessarily need > to leap to the conclusion that a revert is radically less risky than > some other alternative. Now, if there's not some obvious alternative > upon which we can (mostly) all agree, then maybe that's where we end > up. But I would like us to be looking to save the features we can. Removing this GUC and making the backend react by default the same way as when this GUC was enabled sounds like a sensible route here. This stuff is useful. -- Michael
Attachment
On 19.04.24 21:24, Tom Lane wrote: >> But on the other hand, in my personal experience, >> backtrace_on_internal_error would be the right thing in a really large >> number of cases. > That's why I thought we could get away with doing it automatically. > Sure, more control would be better. But if we just hard-wire it for > the moment then we aren't locking in what the eventual design for > that control will be. Note that a standard test run produces a number of internal errors. I haven't checked how likely these are in production, but one might want to consider that before starting to dump backtraces in routine situations. For example, $ PG_TEST_INITDB_EXTRA_OPTS='-c backtrace_on_internal_error=on' meson test -C build $ grep -r 'BACKTRACE:' build/testrun | wc -l 85
On Sat, 20 Apr 2024 at 01:19, Michael Paquier <michael@paquier.xyz> wrote: > Removing this GUC and making the backend react by default the same way > as when this GUC was enabled sounds like a sensible route here. This > stuff is useful. I definitely agree it's useful. But I feel like changing the default of the GUC and removing the ability to disable it at the same time are pretty radical changes that we should not be doing after a feature freeze. I think we should at least have a way to turn this feature off to be able to avoid log spam. Especially given the fact that extensions use elog much more freely than core. Which afaict from other threads[1] Tom also thinks we should normally be careful about. Of the options to resolve the open item so far, I think there are only three somewhat reasonable to do after the feature freeze: 1. Rename the GUC to something else (but keep behaviour the same) 2. Decide to keep the GUC as is 3. Revert a740b213d4 I hoped 1 was possible, but based on the discussion so far it doesn't seem like we'll be able to get a quick agreement on a name. IMHO 2 is just a version of 1, but with a GUC name that no-one thinks is a good one. So I think we are left with option 3. [1]: https://www.postgresql.org/message-id/flat/524751.1707240550%40sss.pgh.pa.us#59710fd4f3f186e642b8e6b886b2fdff
On Sun, Apr 21, 2024 at 09:26:38AM +0200, Peter Eisentraut wrote: > Note that a standard test run produces a number of internal errors. I > haven't checked how likely these are in production, but one might want to > consider that before starting to dump backtraces in routine situations. > > For example, > > $ PG_TEST_INITDB_EXTRA_OPTS='-c backtrace_on_internal_error=on' meson test > -C build > $ grep -r 'BACKTRACE:' build/testrun | wc -l > 85 Ugh, I would not have expected that much. Isn't the problem you are reporting here entirely unrelated, though? It seems to me that these error paths should be using a proper errcode rather than the internal errcode, as these refer to states that can be reached by the user with a combination of queries and/or cancellations. For example, take this one for the REFRESH matview path which is a valid error, still using an elog(): if (!foundUniqueIndex) elog(ERROR, "could not find suitable unique index on materialized view"); I'd like to think about this stuff in a different way: this is useful if enabled by default because it can also help in finding out error paths that should not use the internal errcode. Normally, there should be zero backtraces produced, except in unexpected never-to-be-reached cases. -- Michael
Attachment
On Mon, Apr 22, 2024 at 2:36 AM Michael Paquier <michael@paquier.xyz> wrote: > I'd like to think about this stuff in a different way: this is useful > if enabled by default because it can also help in finding out error > paths that should not use the internal errcode. Normally, there > should be zero backtraces produced, except in unexpected > never-to-be-reached cases. That's long been my feeling about this. So, if we revert this for now, what we ought to do is put it back right ASAP after feature freeze and then clean all that up. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Apr 22, 2024 at 09:25:15AM -0400, Robert Haas wrote: > That's long been my feeling about this. So, if we revert this for now, > what we ought to do is put it back right ASAP after feature freeze and > then clean all that up. In the 85 backtraces I can find in the tests, we have a mixed bag of: - Code paths that use the internal errcode, but should not. - Code paths that use the internal errcode, and are OK with that in the scope of the tests. - Code paths that use the internal errcode, though the coding assumptions behind their use feels fuzzy to me, like amcheck for some SQL tests or satisfies_hash_partition() in one SQL test. As cleaning up that is a separate topic, I have begun a new thread and with a full analysis of everything I've spotted. See here: https://www.postgresql.org/message-id/Zic_GNgos5sMxKoa@paquier.xyz The first class of issues refers to real problems, and should be assigned errcodes. Having a way to switch the backtraces off can have some benefits in the second cases. However, even if we silence them, it would also mean to miss backtraces that could be legit. The third cases would require a different analysis, behind the designs of the code paths able to trigger the internal codes. At this stage, my opinion would tend in favor of a revert of the GUC. The second class of cases is useful to stress many unexpected cases, and I don't expect this number to go down over time, but increase with more advanced tests added into core (I/O failures with partial writes for availability, etc). -- Michael
Attachment
On Tue, Apr 23, 2024 at 1:05 AM Michael Paquier <michael@paquier.xyz> wrote: > At this stage, my opinion would tend in favor of a revert of the GUC. > The second class of cases is useful to stress many unexpected cases, > and I don't expect this number to go down over time, but increase with > more advanced tests added into core (I/O failures with partial writes > for availability, etc). All right. I think there is a consensus in favor of reverting a740b213d4b4d3360ad0cac696e47e5ec0eb8864. Maybe not an absolutely iron-clad consensus, but there are a couple of votes explicitly in favor of that course of action and the other votes seem to mostly be of the form "well, reverting is ONE option." Peter? -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2024-04-19 15:24:17 -0400, Tom Lane wrote: > I can't say that I care for "backtrace_on_internal_error". > Re-reading that thread, I see I argued for having *no* GUC and > just enabling that behavior all the time. I lost that fight, > but it should have been clear that a GUC of this exact shape > is a design dead end --- and that's what we're seeing now. I don't think enabling backtraces without a way to disable them is a good idea - security vulnerablilities in backtrace generation code are far from unheard of and can make error handling a lot slower... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I don't think enabling backtraces without a way to disable them is a good idea > - security vulnerablilities in backtrace generation code are far from unheard > of and can make error handling a lot slower... Well, in that case we have to have some kind of control GUC, and I think the consensus is that the one we have now is under-designed. So I also vote for a full revert and try again in v18. regards, tom lane
On 2024-04-26 14:39:16 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I don't think enabling backtraces without a way to disable them is a good idea > > - security vulnerablilities in backtrace generation code are far from unheard > > of and can make error handling a lot slower... > > Well, in that case we have to have some kind of control GUC, and > I think the consensus is that the one we have now is under-designed. > So I also vote for a full revert and try again in v18. Yea, makes sense. I just wanted to point out that some level of control is needed, not say that what we have now is right.
On Fri, Apr 26, 2024 at 02:39:16PM -0400, Tom Lane wrote: > Well, in that case we have to have some kind of control GUC, and > I think the consensus is that the one we have now is under-designed. > So I also vote for a full revert and try again in v18. Okay, fine by me to move on with a revert. -- Michael
Attachment
On 27.04.24 00:16, Michael Paquier wrote: > On Fri, Apr 26, 2024 at 02:39:16PM -0400, Tom Lane wrote: >> Well, in that case we have to have some kind of control GUC, and >> I think the consensus is that the one we have now is under-designed. >> So I also vote for a full revert and try again in v18. > > Okay, fine by me to move on with a revert. Ok, it's reverted.
On Mon, Apr 29, 2024 at 5:12 AM Peter Eisentraut <peter@eisentraut.org> wrote: > On 27.04.24 00:16, Michael Paquier wrote: > > On Fri, Apr 26, 2024 at 02:39:16PM -0400, Tom Lane wrote: > >> Well, in that case we have to have some kind of control GUC, and > >> I think the consensus is that the one we have now is under-designed. > >> So I also vote for a full revert and try again in v18. > > > > Okay, fine by me to move on with a revert. > > Ok, it's reverted. Thanks, and sorry. :-( -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Apr 29, 2024 at 08:08:19AM -0400, Robert Haas wrote: > On Mon, Apr 29, 2024 at 5:12 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> Ok, it's reverted. Thanks for taking care of it. > Thanks, and sorry. :-( Sorry for the outcome.. -- Michael
Attachment
Hi, This patch is currently parked in the July CommitFest: https://commitfest.postgresql.org/48/4735/ That's fine, except that I think that what the previous discussion revealed is that we don't have consensus on how backtrace behavior ought to be controlled: backtrace_on_internal_error was one proposal, and this was a competing proposal, and neither one of them seemed to be completely satisfactory. If we don't forge a consensus on what a hypothetical patch ought to do, any particular actual patch is probably doomed. So I would suggest that the effort ought to be deciding what kind of design would be generally acceptable -- and that need not wait for July, nor should it, if the goal is to get something committed in July. ...Robert
On Wed, 15 May 2024 at 20:31, Robert Haas <robertmhaas@gmail.com> wrote: > That's fine, except that I think that what the previous discussion > revealed is that we don't have consensus on how backtrace behavior > ought to be controlled: backtrace_on_internal_error was one proposal, > and this was a competing proposal, and neither one of them seemed to > be completely satisfactory. Attached is a rebased patchset of my previous proposal, including a few changes that Michael preferred: 1. Renames log_backtrace to log_backtrace_mode 2. Rename internal to internal_error I reread the thread since I previously posted the patch and apart from Michaels feedback I don't think there was any more feedback on the current proposal. Rethinking about it myself though, I think the main downside of this proposal is that if you want the previous behaviour of backtrace_functions (add backtraces to all elog/ereports in the given functions) you now need to set three GUCs: log_backtrace_mode='all' backtrace_functions='some_func' backtrace_min_level=DEBUG5 The third one is not needed in the common case where someone only cares about errors, but still needing to set log_backtrace_mode='all' might seem a bit annoying. One way around that would be to make log_backtrace_mode and backtrace_functions be additive instead of subtractive. Personally I think the proposed subtractive nature would be exactly what I want for backtraces I'm interested in. Because I would want to use backtrace_functions in this way: 1. I see an error I want a backtrace of: et log_backtrace_mode='all' and try to trigger again. 2. Oops, there's many backtraces now let's filter by function: set backtrace_functions=some_func So if it's additive, I'd have to also undo log_backtrace_mode='all' again at step 2. So changing two GUCs instead of one to do what I want.
Attachment
> On 18 Feb 2025, at 09:34, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > On Thu, 27 Jun 2024 at 12:43, Jelte Fennema-Nio <me@jeltef.nl> wrote: >> Attached is a rebased patchset of my previous proposal, including a >> few changes that Michael preferred: > > Rebased again. (got notified because of the new commitfest rebase emails) We typically avoid having intra-depencies between multiple GUCs to control a single behavior like this. I'm not sure having three is the right move here but I need to do some more review and testing. > The first patch should be trivial to commit at least as it's just cleanup. Agreed, I've applied that to keep it from getting lost in here. -- Daniel Gustafsson