Thread: proposal: additional error fields
Hello,

I have two goals for 9.3. The first goal is plpgsql_check_function; the second goal is enhancing ErrorData and error management to support new fields: COLUMN_NAME, CONSTRAINT_NAME, CONSTRAINT_SCHEMA, SCHEMA_NAME, TABLE_NAME, ROUTINE_NAME, ROUTINE_SCHEMA, TRIGGER_NAME and TRIGGER_SCHEMA.

The previous discussion is in the thread
http://postgresql.1045698.n5.nabble.com/patch-for-9-2-enhanced-errors-td4470837.html

COLUMN_NAME - the missing or inaccessible column name, or an empty string
CONSTRAINT_NAME - the name of the constraint that caused the error
CONSTRAINT_SCHEMA - the name of the schema where the constraint is defined - usually the same as the table schema in PostgreSQL
SCHEMA_NAME - the schema name of the table that caused the exception
ROUTINE_NAME, ROUTINE_SCHEMA - the name and schema of the function that caused the exception - this does not mean the function where the exception was raised
TABLE_NAME - the name of the table that caused the exception
TRIGGER_NAME, TRIGGER_SCHEMA - the name and schema of the trigger that caused the exception

The attached patch is a redesign of the previous version - at the moment it supports constraints and RI only. A second patch will enhance plpgsql to use these error fields.

Regards
Pavel
Attachment
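For a sense of how these fields might eventually be consumed, here is a minimal sketch of a PL/pgSQL handler reading them via GET STACKED DIAGNOSTICS. The diagnostics item names below simply mirror the proposed field names and are an assumption, not a committed API - the PL/pgSQL side is only planned for the second patch.

-- Hypothetical sketch: assumes the proposed fields become readable as
-- GET STACKED DIAGNOSTICS items named after the proposal (not yet real).
CREATE TABLE t (a int NOT NULL);

CREATE OR REPLACE FUNCTION insert_with_report() RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    v_table      text;
    v_column     text;
    v_constraint text;
BEGIN
    INSERT INTO t VALUES (NULL);   -- violates the NOT NULL constraint
EXCEPTION WHEN others THEN
    GET STACKED DIAGNOSTICS
        v_table      = TABLE_NAME,        -- proposed field
        v_column     = COLUMN_NAME,       -- proposed field
        v_constraint = CONSTRAINT_NAME;   -- proposed field
    RAISE NOTICE 'error on table %, column %, constraint %',
                 v_table, v_column, v_constraint;
END
$$;

With something like this, an application could map the machine-readable names to its own domain-specific message instead of parsing a localized error string.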
On 1 May 2012 13:21, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> COLUMN_NAME - the missing or inaccessible column name, or an empty string
> CONSTRAINT_NAME - the name of the constraint that caused the error
> CONSTRAINT_SCHEMA - the name of the schema where the constraint is
> defined - usually the same as the table schema in PostgreSQL
> SCHEMA_NAME - the schema name of the table that caused the exception
> ROUTINE_NAME, ROUTINE_SCHEMA - the name and schema of the function that
> caused the exception - this does not mean the function where the
> exception was raised
> TABLE_NAME - the name of the table that caused the exception
> TRIGGER_NAME, TRIGGER_SCHEMA - the name and schema of the trigger that
> caused the exception

I'm strongly in favour of this. The need to translate an error into a domain-specific error message within the application is a common one, and there's currently no well-principled way to do so, certainly not across locales.

What I'd also like to see - something I've agitated about in the past without much luck - is a new severity level, along the lines of a "severe error". The idea is to represent that the error in question is one that the DBA should reasonably hope never to see. That is quite distinct from what usually forms the large majority of errors: routine integrity constraint violations and the like. Do you suppose you could incorporate this into your design?

It would also be nice if, in addition to this, a domain-specific error message could be specified within the database, associated with each constraint, but I suppose the details of that API would require a great deal of bikeshedding.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
2012/5/1 Peter Geoghegan <peter@2ndquadrant.com>:
> I'm strongly in favour of this. The need to translate an error into a
> domain-specific error message within the application is a common one,
> and there's currently no well-principled way to do so, certainly not
> across locales.

Yes, this is the reason why I wrote this patch. An additional benefit is a significantly richer exception data model, which can be used from PLs.

> What I'd also like to see - something I've agitated about in the past
> without much luck - is a new severity level, along the lines of a
> "severe error". The idea is to represent that the error in question is
> one that the DBA should reasonably hope never to see. That is quite
> distinct from what usually forms the large majority of errors: routine
> integrity constraint violations and the like. Do you suppose you could
> incorporate this into your design?

I don't understand this well - can you explain it? I don't plan to solve more issues in one patch, but it can be inspiration for future work.

Regards
Pavel

> It would also be nice if, in addition to this, a domain-specific error
> message could be specified within the database, associated with each
> constraint, but I suppose the details of that API would require a
> great deal of bikeshedding.
>
> --
> Peter Geoghegan http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training and Services
On 1 May 2012 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> What I'd also like to see - something I've agitated about in the past
>> without much luck - is a new severity level, along the lines of a
>> "severe error". The idea is to represent that the error in question
>> is one that the DBA should reasonably hope never to see. That is
>> quite distinct from what usually forms the large majority of errors:
>> routine integrity constraint violations and the like. Do you suppose
>> you could incorporate this into your design?
>
> I don't understand this well - can you explain it?

Alright. Table 18-1 describes the current severity levels:

http://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

Currently, the following informal categories of error are bunched together at ERROR severity:

* Integrity constraint violations
* Very serious situations, like running out of disk space
* Serious disasters that often relate to hardware failure, like "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X"
* Errors that, if seen, indicate a bug within PostgreSQL, with obscure error messages - as from most of the elog calls within the planner, for example

The first category of error is something that the DBA will often see very frequently. The latter three are situations that I'd like to be woken up in the middle of the night to respond to. We ought to be facilitating monitoring tools (including very simple ones like grep), so that they can make this very important practical distinction. The hard part is replacing the severity level of many existing elog/ereport call sites, but that's not much of a problem, really.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Currently, the following informal categories of error are bunched
> together at ERROR severity:
> * Integrity constraint violations
> * Very serious situations, like running out of disk space
> * Serious disasters that often relate to hardware failure, like "xlog
> flush request %X/%X is not satisfied --- flushed only to %X/%X"
> * Errors that, if seen, indicate a bug within PostgreSQL, with obscure
> error messages - as from most of the elog calls within the planner,
> for example

> The first category of error is something that the DBA will often see
> very frequently. The latter three are situations that I'd like to be
> woken up in the middle of the night to respond to. We ought to be
> facilitating monitoring tools (including very simple ones like grep),
> so that they can make this very important practical distinction. The
> hard part is replacing the severity level of many existing
> elog/ereport call sites, but that's not much of a problem, really.

The last time you complained about this, I suggested that looking at the SQLSTATE would resolve most of your concern. Have you pursued that angle?

I'm not at all excited about inventing more kinds of "error" severity. For one thing, the fact that ERROR is the only severity level that results in a longjmp is known in more places than you might think, not all of them in the core code. For another, this would result in a protocol break - that is, an incompatibility, not just more fields - in the FE/BE ErrorResponse message.

			regards, tom lane
Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Currently, the following informal categories of error are bunched
> together at ERROR severity:
>
> * Integrity constraint violations
> * Very serious situations, like running out of disk space
> * Serious disasters that often relate to hardware failure, like
> "xlog flush request %X/%X is not satisfied --- flushed only to
> %X/%X"
> * Errors that, if seen, indicate a bug within PostgreSQL, with
> obscure error messages - as from most of the elog calls within the
> planner, for example

You forgot a few:

* Syntax errors in submitted statements.
* State-related errors, like trying to execute VACUUM in a transaction or trying to UPDATE a row in a READ ONLY transaction.
* Serialization failures.
* RAISE EXCEPTION from inside a function, often because of a validation in a BEFORE trigger function (see the sketch after this message). If you tilt your head just right you can see these as constraint violations, but I think they are different enough to deserve specific mention.
* Violations of security policy.
* Loss of connection to a client.

I'm sure I didn't cover them all, but the wide variety of causes should be recognized when considering a change like adding a new severity level.

The fact is, these all are (or at least *should be*) coded with a SQLSTATE to classify the nature of the problem. Adding one more severity level forces us to examine every class of error and determine whether it should be promoted to "severe". Perhaps a better solution would be to allow some filtering on error SQLSTATE values to determine what action is taken? Changes would be more localized, and it would provide greater flexibility.

The comments earlier in the thread about this helping translation don't make sense to me. In what way would a new severity level help with that?

-Kevin
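To make the trigger-validation case concrete, here is an illustrative sketch (table and function names are invented): an ordinary RAISE EXCEPTION in a BEFORE trigger arrives at the client with SQLSTATE P0001 (raise_exception) - exactly the kind of routine, application-level error that should not be promoted to "severe".

-- Illustration only: a routine validation error raised from a trigger.
CREATE TABLE accounts (balance numeric);

CREATE FUNCTION check_balance() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NEW.balance < 0 THEN
        RAISE EXCEPTION 'balance may not be negative';
    END IF;
    RETURN NEW;
END
$$;

CREATE TRIGGER accounts_check
    BEFORE INSERT OR UPDATE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE check_balance();

INSERT INTO accounts VALUES (-1);  -- fails with SQLSTATE P0001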
On Tue, May 1, 2012 at 8:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I have two goals for 9.3. The first goal is plpgsql_check_function;
> the second goal is enhancing ErrorData and error management to support
> new fields: COLUMN_NAME, CONSTRAINT_NAME, CONSTRAINT_SCHEMA,
> SCHEMA_NAME, TABLE_NAME, ROUTINE_NAME, ROUTINE_SCHEMA, TRIGGER_NAME
> and TRIGGER_SCHEMA.
>
> The previous discussion is in the thread
> http://postgresql.1045698.n5.nabble.com/patch-for-9-2-enhanced-errors-td4470837.html

I have some concerns about the performance cost of this. Now, you may think that this is a dumb thing to be concerned about, but some testing I've done seems to indicate that MOST of the cost of rolling back a subtransaction is the cost of generating the error string, and this is why PL/pgSQL exception blocks are slow; and I actually do think that the slowness of PL/pgSQL exception blocks is a real issue for users. It certainly has been for me, in the past. So adding nine more fields that will have to be populated on every error, whether someone cares about them or not, is a little scary to me. If, on the other hand, we can arrange to generate these fields only when they'll be used, that would be a lot more appealing - and obviously we might be able to apply the same technique to the error message itself, which would be neat, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1 May 2012 14:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The last time you complained about this, I suggested that looking at
> the SQLSTATE would resolve most of your concern. Have you pursued that
> angle?

Sure, I did consider that at the time. I think that a new severity level (equivalent to ERROR in all but name) is warranted, to represent that certain errors are more severe than the bulk of ERRORs seen in real production systems but do not warrant being treated as FATAL or PANIC. Leaving aside the practical implications, it seems likely that many will agree that a new severity level is at least intuitively the most natural way to do so.

Another consideration is that it seems unlikely that many elog call sites will ever be replaced by ereport calls that specify an SQLSTATE. As it says in the docs: "Any message that is likely to be of interest to ordinary users should go through ereport. Nonetheless, there are enough internal "can't happen" error checks in the system that elog is still widely used; it is preferred for those messages for its notational simplicity." Certainly, the "xlog flush request not satisfied" example is a simple elog call, as are many other bellwethers of data corruption.

What if the user doesn't specify %e in their log_line_prefix (most don't)? What hope have they then? You might counter "they can add that if they're interested", but who isn't interested in nipping serious problems in the bud? Yes, we ought to be somewhat paternalistically forcing users to take notice of severe problems by drawing their attention to them with special terminology.

Besides, SQLSTATE doesn't attempt to classify errors by severity; it merely characterises them as belonging to some subsystem, or some broad category of error. Sure, some of these are unambiguously serious, like disk_full or data_corrupted, but others clearly straddle the line between errors and severe errors, like internal_error. And who wants to go to the trouble of enumerating all known severe SQLSTATE codes when grepping for such a simple requirement?

> I'm not at all excited about inventing more kinds of "error" severity.
> For one thing, the fact that ERROR is the only severity level that
> results in a longjmp is known in more places than you might think,
> not all of them in the core code. For another, this would result in a
> protocol break - that is, an incompatibility, not just more fields -
> in the FE/BE ErrorResponse message.

I don't think that bumping the protocol version is necessarily that big of a deal. Certainly, it wouldn't be very difficult to make the new version of the protocol compatible with the old - SEVERE_ERROR would simply act as an alias for ERROR there.

Maybe no one is convinced by any of this, but the fact is that the SQLSTATE argument falls down when one considers that we aren't using it in many cases of errors that clearly are severe. I draw a very strong practical distinction between errors that are a serious problem that I need to worry about, and errors that are routine. To my mind it's a pity that Postgres doesn't similarly draw this distinction, even if that does imply a certain amount of grey area.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
2012/5/1 Robert Haas <robertmhaas@gmail.com>:
> I have some concerns about the performance cost of this. Now, you may
> think that this is a dumb thing to be concerned about, but some
> testing I've done seems to indicate that MOST of the cost of rolling
> back a subtransaction is the cost of generating the error string, and
> this is why PL/pgSQL exception blocks are slow. So adding nine more
> fields that will have to be populated on every error, whether someone
> cares about them or not, is a little scary to me. If, on the other
> hand, we can arrange to generate these fields only when they'll be
> used, that would be a lot more appealing.

Yes, it can have an impact, and I have to do some performance tests. But usually most of the fields are NULL - in a typical case only 2, 4, or 5 fields are non-empty. Moreover, only a string copy is used, so it is relatively fast. Another possibility is preallocation, because all of these fields are limited by NAMEDATALEN. We could use the same trick for the SQLSTATE variable.

create table ff(a int not null);

CREATE OR REPLACE FUNCTION public.fx()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
  for i in 1..100000
  loop
    begin
      insert into ff values(null);
    exception when others then
      null;  /* do nothing */
    end;
  end loop;
end;
$function$

This is the worst case - 5 extra fields:

patched: 1500 ms
master:  1380 ms

So this is about an 8% slowdown for unoptimized code where every statement raised an exception. Any other statement in the loop cuts the slowdown in half, and usually not all statements will raise an exception. I think there are some possibilities for optimizing it - minimizing palloc calls.

Regards
Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On Tue, May 1, 2012 at 12:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I have some concerns about the performance cost of this. [...] If, on
>> the other hand, we can arrange to generate these fields only when
>> they'll be used, that would be a lot more appealing, and obviously we
>> might be able to apply the same technique to the error message itself,
>> which would be neat, too.
>
> Yes, it can have an impact, and I have to do some performance tests.
> But usually most of the fields are NULL - in a typical case only 2, 4,
> or 5 fields are non-empty. Moreover, only a string copy is used, so it
> is relatively fast. Another possibility is preallocation, because all
> of these fields are limited by NAMEDATALEN.
>
> create table ff(a int not null);
>
> CREATE OR REPLACE FUNCTION public.fx()
>  RETURNS void
>  LANGUAGE plpgsql
> AS $function$
> begin
>   for i in 1..100000
>   loop
>     begin
>       insert into ff values(null);
>     exception when others then
>       null;  /* do nothing */
>     end;
>   end loop;
> end;
> $function$
>
> This is the worst case - 5 extra fields:
>
> patched: 1500 ms
> master:  1380 ms
>
> So this is about an 8% slowdown for unoptimized code where every
> statement raised an exception.

Yeah. I also noticed in my benchmarking that sprintf() seemed to be very slow, to the point where I wondered whether we ought to have our own minimal version of sprintf() just for error strings.

Another idea would be to pass a flag indicating whether the context that's going to receive the error cares about the additional fields. If not, we can skip generating the information. Not sure how much that helps, but maybe...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>
> Yeah. I also noticed in my benchmarking that sprintf() seemed to be
> very slow, to the point where I wondered whether we ought to have our
> own minimal version of sprintf() just for error strings.
>
> Another idea would be to pass a flag indicating whether the context
> that's going to receive the error cares about the additional fields.
> If not, we can skip generating the information. Not sure how much
> that helps, but maybe...
>

A complex solution could itself cost the time that we would gain by eliminating sprintf. Some special flag for a block could be a solution - it could change the behaviour of the err* functions. Usually we need only the SQLSTATE, and that can be minimized. Maybe a PL option:

create or replace function foo()
returns .. as $$
#option light_exception
begin
...

In this case, only SQLSTATE would be valid. An implementation should not be hard - but do we really need it?

Here is an oprofile report for the bad case:

CREATE OR REPLACE FUNCTION public.fx()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
declare a int = 0;
begin
  for i in 1..100000
  loop
    begin
      perform 1/ (a - (i % 1));
    exception when others then
      null;  /* do nothing */
    end;
  end loop;
end;
$function$

 5871 10.2975  postgres  AllocSetAlloc
 4472  7.8437  postgres  MemoryContextAllocZeroAligned
 2570  4.5077  postgres  MemoryContextAlloc
 2031  3.5623  postgres  AllocSetFree
 1940  3.4027  postgres  SearchCatCache
 1906  3.3430  postgres  expand_fmt_string.clone.0
 1535  2.6923  postgres  copyObject
 1241  2.1767  postgres  fmgr_info_cxt_security
 1058  1.8557  postgres  MemoryContextCreate
 1057  1.8539  postgres  ExecInitExpr
 1050  1.8417  postgres  simplify_function
  968  1.6978  postgres  pfree
  965  1.6926  postgres  check_stack_depth
  812  1.4242  postgres  _copyList.clone.20
  801  1.4049  postgres  hash_seq_search
  777  1.3628  postgres  eval_const_expressions_mutator
  662  1.1611  postgres  expression_tree_mutator
  591  1.0366  postgres  expression_tree_walker

and the good case:

create or replace function fx()
returns void as $$
declare a int = 1;
begin
  for i in 1..100000
  loop
    begin
      perform 1/ (a - (i % 1));
    exception when others then
      null;  /* do nothing */
    end;
  end loop;
end;
$$ language plpgsql;

 7916 10.1067  postgres  AllocSetAlloc
 5956  7.6043  postgres  MemoryContextAllocZeroAligned
 3339  4.2631  postgres  MemoryContextAlloc
 2581  3.2953  postgres  SearchCatCache
 2348  2.9978  postgres  copyObject
 2176  2.7782  postgres  AllocSetFree
 1643  2.0977  postgres  MemoryContextCreate
 1488  1.8998  postgres  ExecInitExpr
 1438  1.8360  postgres  grouping_planner
 1285  1.6406  postgres  check_stack_depth
 1224  1.5627  postgres  fmgr_info_cxt_security
 1094  1.3968  postgres  pfree
 1044  1.3329  postgres  _copyList.clone.20
 1030  1.3151  postgres  eval_const_expressions_mutator
  939  1.1989  postgres  expression_tree_walker
  938  1.1976  postgres  simplify_function
  889  1.1350  postgres  AllocSetDelete
  836  1.0674  postgres  subquery_planner
  834  1.0648  postgres  expression_tree_mutator

I don't see errmsg or errdetail there - the most expensive related item is expand_fmt_string, at about 3%.

Regards
Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On 1 May 2012 17:22, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah. I also noticed in my benchmarking that sprintf() seemed to be
> very slow, to the point where I wondered whether we ought to have our
> own minimal version of sprintf() just for error strings.

Which sprintf()? You're probably aware that we already have a memset replacement, MemSet, and indeed we even have a pg_sprintf(). Without saying that we shouldn't do that, I'd urge caution. I use glibc 2.14.90, and, for example, my system's memcpy is implemented with various optimisations that are only applicable to the architecture of my system - SSE3 optimisations. A quick google throws up http://sources.redhat.com/ml/libc-alpha/2010-01/msg00016.html . Ditto every other architecture (most of these are from a recent revision of glibc from SCM):

[peter@peterlaptop ~]$ locate memcpy.S
/home/peter/glibc/sysdeps/i386/i586/memcpy.S
/home/peter/glibc/sysdeps/i386/i686/memcpy.S
/home/peter/glibc/sysdeps/i386/i686/multiarch/memcpy.S
/home/peter/glibc/sysdeps/ia64/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc32/a2/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc32/cell/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc32/power4/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc32/power6/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc32/power7/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/a2/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/cell/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/power4/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/power6/memcpy.S
/home/peter/glibc/sysdeps/powerpc/powerpc64/power7/memcpy.S
/home/peter/glibc/sysdeps/s390/s390-32/memcpy.S
/home/peter/glibc/sysdeps/s390/s390-64/memcpy.S
/home/peter/glibc/sysdeps/sh/memcpy.S
/home/peter/glibc/sysdeps/sparc/sparc32/memcpy.S
/home/peter/glibc/sysdeps/sparc/sparc32/sparcv9/memcpy.S
/home/peter/glibc/sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy.S
/home/peter/glibc/sysdeps/sparc/sparc64/memcpy.S
/home/peter/glibc/sysdeps/sparc/sparc64/multiarch/memcpy.S
/home/peter/glibc/sysdeps/x86_64/memcpy.S
/home/peter/glibc/sysdeps/x86_64/multiarch/memcpy.S
/home/peter/llvm/projects/compiler-rt/lib/arm/aeabi_memcpy.S
/usr/src/debug/glibc-2.14-394-g8f3b1ff/sysdeps/x86_64/memcpy.S
/usr/src/debug/glibc-2.14-394-g8f3b1ff/sysdeps/x86_64/multiarch/memcpy.S

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
On Tue, May 01, 2012 at 02:19:16PM +0100, Peter Geoghegan wrote:
> Currently, the following informal categories of error are bunched
> together at ERROR severity:
>
> * Integrity constraint violations
> * Very serious situations, like running out of disk space
> * Serious disasters that often relate to hardware failure, like "xlog
> flush request %X/%X is not satisfied --- flushed only to %X/%X"
> * Errors that, if seen, indicate a bug within PostgreSQL, with obscure
> error messages - as from most of the elog calls within the planner,
> for example
>
> The first category of error is something that the DBA will often see
> very frequently. The latter three are situations that I'd like to be
> woken up in the middle of the night to respond to. We ought to be
> facilitating monitoring tools (including very simple ones like grep),
> so that they can make this very important practical distinction. The
> hard part is replacing the severity level of many existing
> elog/ereport call sites, but that's not much of a problem, really.

I agree that some means to mechanically distinguish these cases would constitute a significant boon for admin monitoring. Note, however, that the same split appears at other severity levels:

FATAL, routine:    terminating connection due to conflict with recovery
FATAL, critical:   incorrect checksum in control file
WARNING, routine:  nonstandard use of escape in a string literal
WARNING, critical: locallock table corrupted

We'd be adding at least three new severity levels to cover the necessary messages by this approach.
Excerpts from Noah Misch's message of mar may 01 15:36:22 -0400 2012:
> On Tue, May 01, 2012 at 02:19:16PM +0100, Peter Geoghegan wrote:
> > The first category of error is something that the DBA will often see
> > very frequently. The latter three are situations that I'd like to be
> > woken up in the middle of the night to respond to. We ought to be
> > facilitating monitoring tools (including very simple ones like grep),
> > so that they can make this very important practical distinction. The
> > hard part is replacing the severity level of many existing
> > elog/ereport call sites, but that's not much of a problem, really.
>
> I agree that some means to mechanically distinguish these cases would
> constitute a significant boon for admin monitoring. Note, however,
> that the same split appears at other severity levels:
>
> FATAL, routine:    terminating connection due to conflict with recovery
> FATAL, critical:   incorrect checksum in control file
> WARNING, routine:  nonstandard use of escape in a string literal
> WARNING, critical: locallock table corrupted
>
> We'd be adding at least three new severity levels to cover the
> necessary messages by this approach.

So maybe a new severity is not the answer - perhaps a separate error data field, say "criticality" (just to give it a name). ereport() could be labeled by default with routine (low) criticality, or a new macro errcritical(ERRCRIT_HIGH) could set it higher (which we would introduce manually in the right places); while elog() would have high criticality by default except for DEBUG messages (maybe we'd need a new macro elog_routine() for messages with a higher severity than DEBUG that do not need to be marked as critical).

So in the log we could end up with noncritical messages being displayed as today, and critical ones with a new tag:

FATAL: terminating connection due to conflict with recovery
FATAL: CRITICAL: incorrect checksum in control file

or something like that.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Noah Misch <noah@leadboat.com> writes:
> I agree that some means to mechanically distinguish these cases would
> constitute a significant boon for admin monitoring. Note, however,
> that the same split appears at other severity levels:
> FATAL, routine:    terminating connection due to conflict with recovery
> FATAL, critical:   incorrect checksum in control file
> WARNING, routine:  nonstandard use of escape in a string literal
> WARNING, critical: locallock table corrupted
> We'd be adding at least three new severity levels to cover the
> necessary messages by this approach.

The reason why this is a problem is that the severity level is simply the wrong thing for this. Really, there are four severity levels in elog.c, and what they have to do with is the post-error path of control:

    NOTICE: return to caller
    ERROR:  longjmp to someplace that can recover
    FATAL:  end the session
    PANIC:  crash, forcing database-wide restart

We haven't done ourselves any favors by conflating a log-verbosity setting, which is what the subflavors of NOTICE really are, with this fundamental path-of-control issue. Adding subflavors of ERROR isn't going to make that better.

To make matters worse, the severity level for a given error condition is not fixed, but can be context sensitive; for instance, we sometimes promote ERROR to FATAL or PANIC if the backend's state is such that normal error recovery is inappropriate or possibly unsafe. Not to mention the numerous places that actually have logic to decide which severity level to report to begin with. So to do what Peter wants by means of severity levels, we'd need the same subflavors of FATAL and PANIC as well, which gets out of hand really quickly from a maintenance standpoint, and makes the scheme not nearly as user-friendly or easy to understand as he hopes, anyway. (Now it's possible that you could get away with lumping all PANICs into one "sound the alarms" category, but this is surely not the case for FATAL. Unless you want to be gotten out of bed any time somebody fat-fingers their username or password.)

I continue to maintain that the SQLSTATE is a much better basis for solving this problem. Its categories are already pretty close to what Peter needs: basically, IIUC, he wants to know about classes 53, 58, maybe F0, and XX. We might wish to move a small number of messages from one category to another to make the boundaries a bit clearer, but that would be vastly less painful for both implementors and users than inventing multiple new severity levels.

			regards, tom lane
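As a minimal sketch of what "deriving magnitude from the SQLSTATE" could look like on the monitoring or client side, here is a helper classifying codes by the class list quoted above. The function name and the idea of a fixed list are illustrative assumptions, not a proposed API.

-- Sketch: classify an error as severe by its SQLSTATE class.
-- The class list ('53', '58', 'F0', 'XX') is taken from the message above.
CREATE OR REPLACE FUNCTION is_severe_sqlstate(code text) RETURNS boolean
LANGUAGE sql IMMUTABLE AS $$
    SELECT left($1, 2) IN ('53', '58', 'F0', 'XX');
$$;

SELECT is_severe_sqlstate('XX001');  -- data_corrupted: true
SELECT is_severe_sqlstate('23505');  -- unique_violation: false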
On 1 May 2012 20:36, Noah Misch <noah@leadboat.com> wrote:
> I agree that some means to mechanically distinguish these cases would
> constitute a significant boon for admin monitoring. Note, however,
> that the same split appears at other severity levels:
>
> FATAL, routine:    terminating connection due to conflict with recovery
> FATAL, critical:   incorrect checksum in control file
> WARNING, routine:  nonstandard use of escape in a string literal
> WARNING, critical: locallock table corrupted
>
> We'd be adding at least three new severity levels to cover the
> necessary messages by this approach.

Good point. It might make sense to have an orthogonal property - call it magnitude - and have it specified alongside the existing severity levels. However, there's really no point in bothering if all of the existing elog calls are let off the hook. Someone would need to go around and assign a value to every single existing elog and ereport call site, with the existing elog macro signature incrementally deprecated.

Furthermore, I'd support changing the definition of log_line_prefix within postgresql.conf.sample to include this new property by default. This feature will only be a boon to admin monitoring if there is a reasonable expectation of actually finding it on most or all PostgreSQL production systems in all circumstances.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> Maybe no one is convinced by any of this, but the fact is that the
> SQLSTATE argument falls down when one considers that we aren't using
> it in many cases of errors that clearly are severe.

The reason that argument isn't convincing is that we *are* using a SQLSTATE for every such message; it's just defaulted to XX000. AFAICT, it would be reasonable to treat all XX000 as alarm conditions until proven different. If a given message is, in fact, not supposed to be "can't happen", then it shouldn't be going through elog(). We'd probably need to fix some places that were lazily coded as elogs, but under your proposal we would also have to touch every such place ... and thousands more besides.

			regards, tom lane
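To illustrate the XX000 default: class-XX errors can already be trapped (or filtered) wholesale, since a class-level condition name in PL/pgSQL matches every code in its class. The RAISE below merely simulates an internal error, because a genuine elog failure is hard to trigger on demand; everything else here is standard behavior.

DO $$
BEGIN
    -- Simulated: a "can't happen" elog() surfaces with the defaulted
    -- SQLSTATE XX000.
    RAISE EXCEPTION 'simulated internal error' USING ERRCODE = 'XX000';
EXCEPTION WHEN internal_error THEN   -- condition name for class XX
    RAISE NOTICE 'caught class XX, SQLSTATE = %', SQLSTATE;
END
$$;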
On 1 May 2012 21:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <peter@2ndquadrant.com> writes:
>> Maybe no one is convinced by any of this, but the fact is that the
>> SQLSTATE argument falls down when one considers that we aren't using
>> it in many cases of errors that clearly are severe.
>
> The reason that argument isn't convincing is that we *are* using a
> SQLSTATE for every such message; it's just defaulted to XX000. AFAICT,
> it would be reasonable to treat all XX000 as alarm conditions until
> proven different. If a given message is, in fact, not supposed to be
> "can't happen", then it shouldn't be going through elog(). We'd
> probably need to fix some places that were lazily coded as elogs,
> but under your proposal we would also have to touch every such place
> ... and thousands more besides.

Fair enough. Adjusting all of those elog calls may be excessive. The argument could be made that what I've characterised as severe (which is, as I've said, not entirely clear-cut) could be deduced from the SQLSTATE if we were to formalise the "can't happen errors are only allowed to use elog()" convention into a hard rule. However, I think it's critically important to make all of this easy and well-documented. Severity should probably be part of the default log_line_prefix.

Sorry for hijacking your thread, Pavel.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> wrote:
> The argument could be made that what I've characterised as severe
> (which is, as I've said, not entirely clear-cut) could be deduced
> from the SQLSTATE if we were to formalise the "can't happen errors
> are only allowed to use elog()" convention into a hard rule.

That doesn't seem necessary or desirable. The thing to do is to define, somewhere, a list of what is "severe". It seems likely that different shops may have different opinions on what constitutes a "severe" problem, or may have more than a "severe"/"not severe" dichotomy. So it would be best if it were configurable.

To solve the problem, we mostly seem to need something which can scan the server log and take action based on SQLSTATE values. Since we can already easily log those (e.g., with %e in log_line_prefix), this seems like territory for external monitoring software. I don't see anything for the community here other than to discuss places where we might want to use a different SQLSTATE than we currently do. Or possibly hooks in the logging process, so monitors don't need to scan text.

-Kevin
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Peter Geoghegan
> Sent: Tuesday, May 01, 2012 4:37 PM
> To: Tom Lane
> Cc: Pavel Stehule; PostgreSQL Hackers
> Subject: Re: [HACKERS] proposal: additional error fields
>
> On 1 May 2012 21:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The reason that argument isn't convincing is that we *are* using a
> > SQLSTATE for every such message; it's just defaulted to XX000.
> > AFAICT, it would be reasonable to treat all XX000 as alarm conditions
> > until proven different. If a given message is, in fact, not supposed
> > to be "can't happen", then it shouldn't be going through elog(). We'd
> > probably need to fix some places that were lazily coded as elogs,
> > but under your proposal we would also have to touch every such place
> > ... and thousands more besides.
>
> Fair enough. Adjusting all of those elog calls may be excessive. The
> argument could be made that what I've characterised as severe (which
> is, as I've said, not entirely clear-cut) could be deduced from the
> SQLSTATE if we were to formalise the "can't happen errors are only
> allowed to use elog()" convention into a hard rule. However, I think
> it's critically important to make all of this easy and
> well-documented. Severity should probably be part of the default
> log_line_prefix.
>
> Sorry for hijacking your thread, Pavel.

So the apparent desire is to promote proper usage of SQLSTATE while simultaneously encoding a default severity value for each class/code - call it SQLSTATE_PG_SEVERITY - that can be used for external monitoring and notification. Ideally this could be customized, so that differing opinions on severity classification could be accommodated on a per-client basis without having to output the SQLSTATE code itself and then requiring external software to maintain the association. To that end, any "severity" on the class itself would act as a default; specific codes that share the class's severity can be skipped, while those needing a different value get an override. Since the codes are neither exhaustive nor mandatory, such a default would also apply to any user-chosen code not previously defined.

Simply adding more high-level categories sidesteps the real issue: the current system encodes insufficient information to facilitate the desired reporting requirements. If we encode our messages with a sufficient level of detail, then adding categories and metadata on top of those layers - internally or externally - is simple, and new ideas and techniques can be tried without having to modify the system again in the future.

Supplemental context information such as table and constraint names can be useful if the cost of recording it is low enough and the value sufficient. That said, knowing the SQL that caused the error, and which process it is implementing, should be sufficient to identify possible causes and resolutions without requiring the specific columns and tables involved. Since constraint violations already expose the name of the violated constraint, that particular situation seems to have a sufficient solution.
Given that you should not be exposing end users to that kind of implementation artifact anyway, the developer and DBA should be able to identify the root cause and either avoid it themselves or code an application interface to present to the end user. So, at least from my perspective, the bar to move this forward is pretty high: either it must be fairly simple to implement (which it is not), or there needs to be more value to it than I am currently seeing. And this ignores whether normal runtime performance costs will be a significant factor.

Looking at the SQLSTATE error classes, I am somewhat concerned by the number of items found under "HV" and the apparent intermixing of "client" and "internal" error types.

As for an upgrade path, how about something along the lines of:

1) Make a best-attempt effort at identifying existing elog and ereport calls and modifying them to output specific SQLSTATE codes
2) Modify elog/ereport to catch and log (with a stack trace) any calls that do not set SQLSTATE to a specific value
3) During the development, beta, and RC phases keep such code in place and ask people to look at their logs for missed elog/ereport calls
4) Remove the stack-trace logging within ereport/elog from the final released code

David J.
On 1 May 2012 22:09, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> That doesn't seem necessary or desirable. The thing to do is to
> define, somewhere, a list of what is "severe". It seems likely that
> different shops may have different opinions on what constitutes a
> "severe" problem, or may have more than a "severe"/"not severe"
> dichotomy.

I doubt that opinion would be that widely divided, and in any case I see no reason to let the perfect be the enemy of the good. For example, Tom seemed pretty confident that the definition of non-routine/critical could be "has one of a limited number of existing SQLSTATEs", assuming the use of elog is formally limited to "can't happen" errors...

> So it would be best if it were configurable.

...that said, it could be configurable, if the case for that were argued convincingly. At the very least, I'd expect the vast majority of users to stick to a stock set of SQLSTATEs.

> To solve the problem, we mostly seem to need something which can scan
> the server log and take action based on SQLSTATE values. Since we can
> already easily log those, this seems like territory for external
> monitoring software.

I don't accept the premise that absolutely anything that could be done outside of the core system should not be in the core system. By that standard you could argue that a great deal of existing features shouldn't have been accepted, including, for example, streaming replication, since external systems provide roughly equivalent, albeit non-standardised and in some ways inferior behaviour. (Doing this in some third-party project certainly isn't exactly equivalent, since there is no way to put the magnitude of the message in log_line_prefix to have it directly readable by a human from the logfile, etc.)

The pertinent question to my mind isn't whether we could do this outside the server, but rather whether we *should* do it within the server. I think that we should, because lots of people want to know if their database server has hit our best working definition of a serious problem. That isn't a narrow use case.

There is another advantage to doing this within the core system that I have not touched upon, which is that we could expose another useful GUC to reduce log noise, beyond log_min_messages. It's certainly not inconceivable that smaller sites could find it useful not to have to log every single integrity constraint violation, while still seeing other types of errors.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> There is another advantage to doing this within the core system that I
> have not touched upon, which is that we could expose another useful
> GUC to reduce log noise, beyond log_min_messages.

Yeah, that one is actually a pretty compelling argument. A *lot* of people would like to have a setting for "log only non-routine errors", I think.

			regards, tom lane
On Tue, May 1, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I continue to maintain that the SQLSTATE is a much better basis for
> solving this problem. Its categories are already pretty close to
> what Peter needs: basically, IIUC, he wants to know about classes
> 53, 58, maybe F0, and XX.

This is really too mushy, IMHO. ERRCODE_TOO_MANY_CONNECTIONS isn't what I'd call an oh-shit condition even though it's in class 53, but "could not create archive status file \"%s\"" is definitely an oh-shit regardless of what errcode_for_file_access() returns.

Also, the fact is that most people do not log SQLSTATEs. And even if they did, they're not going to know to grep for 53|58|maybe F0|XX. What we need is an easy way for people to pick out any log entries that represent conditions that should never occur as a result of any legitimate user activity. Like, with grep. And without needing to have a PhD in Postgresology.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 1, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I continue to maintain that the SQLSTATE is a much better basis for
>> solving this problem. Its categories are already pretty close to
>> what Peter needs: basically, IIUC, he wants to know about classes
>> 53, 58, maybe F0, and XX.

> This is really too mushy, IMHO.

I don't deny that we probably need to reclassify a few error cases, and fix some elogs that should be ereports, before this approach would be really workable. My point is that it's *close*, whereas "let's invent some new error severities" is not close to reality and will break all sorts of stuff.

			regards, tom lane
On May 1, 2012, at 20:05, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 1, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I continue to maintain that the SQLSTATE is a much better basis for
>> solving this problem. Its categories are already pretty close to
>> what Peter needs: basically, IIUC, he wants to know about classes
>> 53, 58, maybe F0, and XX.
>
> This is really too mushy, IMHO. ERRCODE_TOO_MANY_CONNECTIONS isn't
> what I'd call an oh-shit condition even though it's in class 53, but
> "could not create archive status file \"%s\"" is definitely an
> oh-shit regardless of what errcode_for_file_access() returns.
>
> Also, the fact is that most people do not log SQLSTATEs. And even if
> they did, they're not going to know to grep for 53|58|maybe F0|XX.
> What we need is an easy way for people to pick out any log entries
> that represent conditions that should never occur as a result of any
> legitimate user activity. Like, with grep. And without needing to
> have a PhD in Postgresology.

If you want something really simple, why not output all elog calls to one file and ereport calls to the current log?

If you recognize the need to fix existing code so that you can determine the severity levels you desire, then go all the way and use SQLSTATE at the call level and then add metadata about those codes higher up. That metadata is then customizable, so those who want the too-many-connections error can see it while those who do not can turn it off.

With the addition of the PostgreSQL-specific severity category, both that value and the SQLSTATE upon which it is based should be something that is considered best practice to output (and the default), and future attention should be given to ensuring that the code is as accurate as possible. Since existing log formats would still be valid, upgrades should not be an issue.

David J.
On 2 May 2012 01:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't deny that we probably need to reclassify a few error cases, and
> fix some elogs that should be ereports, before this approach would be
> really workable. My point is that it's *close*, whereas "let's invent
> some new error severities" is not close to reality and will break all
> sorts of stuff.

I now accept that your proposal to derive magnitude from the SQLSTATE was better than my earlier proposal to invent a new severity level, though I do of course also agree that that approach necessitates refining the SQLSTATEs in some cases.

On 2 May 2012 01:05, Robert Haas <robertmhaas@gmail.com> wrote:
> Also, the fact is that most people do not log SQLSTATEs. And even if
> they did, they're not going to know to grep for 53|58|maybe F0|XX.
> What we need is an easy way for people to pick out any log entries
> that represent conditions that should never occur as a result of any
> legitimate user activity. Like, with grep. And without needing to
> have a PhD in Postgresology.

I couldn't agree more.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
2012/5/2 Robert Haas <robertmhaas@gmail.com>:
> On Tue, May 1, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I continue to maintain that the SQLSTATE is a much better basis for
>> solving this problem. Its categories are already pretty close to
>> what Peter needs: basically, IIUC, he wants to know about classes
>> 53, 58, maybe F0, and XX.
>
> This is really too mushy, IMHO. ERRCODE_TOO_MANY_CONNECTIONS isn't
> what I'd call an oh-shit condition even though it's in class 53, but
> "could not create archive status file \"%s\"" is definitely an
> oh-shit regardless of what errcode_for_file_access() returns.
>
> Also, the fact is that most people do not log SQLSTATEs. And even if
> they did, they're not going to know to grep for 53|58|maybe F0|XX.
> What we need is an easy way for people to pick out any log entries
> that represent conditions that should never occur as a result of any
> legitimate user activity. Like, with grep. And without needing to
> have a PhD in Postgresology.

For a long time I have wished that Pg supported multiple log files, and recently I had to write an extension for forwarding slow queries to another log. So a possible solution could be an extension that supports mapping, forwarding, and post-processing of log items. We still have no example for the log hook.

Regards
Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On tis, 2012-05-01 at 20:13 -0400, Tom Lane wrote:
> I don't deny that we probably need to reclassify a few error cases,
> and fix some elogs that should be ereports, before this approach would
> be really workable. My point is that it's *close*, whereas "let's
> invent some new error severities" is not close to reality and will
> break all sorts of stuff.

We might hit a road block because some of these SQLSTATEs are defined by the SQL standard. But at least we should try this first, and if it doesn't work, make another field that contains the admin/server-level severity instead of the client-side/flow-control severity level.
Peter Eisentraut <peter_e@gmx.net> writes:
> On tis, 2012-05-01 at 20:13 -0400, Tom Lane wrote:
>> I don't deny that we probably need to reclassify a few error cases,
>> and fix some elogs that should be ereports, before this approach
>> would be really workable. My point is that it's *close*, whereas
>> "let's invent some new error severities" is not close to reality and
>> will break all sorts of stuff.

> We might hit a road block because some of these SQLSTATEs are defined
> by the SQL standard.

My guess is that all the ones defined in the SQL standard are "expected" errors, more or less by definition, and thus not interesting according to Peter G's criteria.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> My guess is that all the ones defined in the SQL standard are
> "expected" errors, more or less by definition, and thus not
> interesting according to Peter G's criteria.

On a scan through the list, I didn't see any exceptions to that, except for the "F0" class. To restate what the standard reserves for standard SQLSTATE values in the form of a regular expression, it looks like:

'^[0-4A-H][0-9A-Z][0-4A-H][0-9A-Z][0-9A-Z]$'

Eyeballing the errcode page in the docs, it looks like there are PostgreSQL-assigned values that start with '5', 'P', and 'X'. That "F0" class looks suspicious; are those really defined by the standard, or did we encroach on the standard naming space with PostgreSQL-specific values? We also have PostgreSQL-specific values in standard classes where we use 'P' for the third character, which is fine.

-Kevin
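A quick, illustrative check of that pattern against a few familiar codes (the sample values here are chosen for demonstration):

SELECT code,
       code ~ '^[0-4A-H][0-9A-Z][0-4A-H][0-9A-Z][0-9A-Z]$' AS standard_reserved
FROM (VALUES ('23000'),   -- integrity_constraint_violation: true
             ('23505'),   -- unique_violation, a PG-assigned subclass: false
             ('P0001'),   -- raise_exception, PG class P0: false
             ('F0000'))   -- config_file_error: true, i.e. the suspect class
     AS t(code);

As the last row shows, the "F0" codes fall inside the standard-reserved space, which is exactly the encroachment being questioned here.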
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > That "F0" class looks suspicious; are those really defined by standard or > did we encroach on standard naming space with PostgreSQL-specific > values? I think we screwed up on that :-(. So we ought to renumber those codes anyway. Perhaps use "PF" instead of "F0"? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> That "F0" class looks suspicious; are those really defined by the
>> standard, or did we encroach on the standard naming space with
>> PostgreSQL-specific values?
>
> I think we screwed up on that :-(. So we ought to renumber those
> codes anyway. Perhaps use "PF" instead of "F0"?

Sounds good to me.

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> That "F0" class looks suspicious; are those really defined by >>> standard or did we encroach on standard naming space with >>> PostgreSQL-specific values? >> I think we screwed up on that :-(. So we ought to renumber those >> codes anyway. Perhaps use "PF" instead of "F0"? > Sounds good to me. I thought for a few minutes about whether we ought to try to sneak such a change into 9.2. But given that we're talking about probably doing a number of other SQLSTATE reassignments in the future, it seems likely better to wait and absorb all that pain in a single release cycle. It seems moderately unlikely that any client-side code is dependent on these specific assignments, but still I'd rather not see a dribble of "we changed some SQLSTATEs" compatibility flags across several successive releases. regards, tom lane