Thread: Add value to error message when size extends
Hi all, We have encountered an issue when executing an insert command, when one of the values' length was bigger than the column size limitation. the error message which been displayed was "value too long for type char..." but there was no indication which value has exceeded the limited size. (See bug #8880) Attached is a WIP patch which attend to make the message clearer. Regards, Maor and Daniel
Attachment
Maor Lipchuk <mlipchuk@redhat.com> writes: > We have encountered an issue when executing an insert command, > when one of the values' length was bigger than the column size limitation. > the error message which been displayed was "value too long for type char..." > but there was no indication which value has exceeded the limited size. > (See bug #8880) > Attached is a WIP patch which attend to make the message clearer. This sort of thing has been proposed before ... We have a message style guideline that says that primary error messages should fit on "one line", which is generally understood to mean maybe 80 characters. Complaining about a too-long varchar string in this style seems practically guaranteed to violate that. More, putting the string contents before the meat of the message is the worst possible choice if a client user interface decides to truncate the message. And if the string were *really* long, say in the millions of characters, it's not unlikely that this would end up causing the message to get replaced by "out of memory", which is totally counterproductive. You could avoid the first two of those objections by putting the string contents into a DETAIL message; but not the third one. Aside from that, it's less than clear whether this approach would actually help much: maybe the string is readily identifiable as to which column it's meant for, and maybe not. People have speculated about ways to name the target column in the error report, which would probably be far more useful; but it's not real clear how to do that in our system structure. One thing that occurs to me just now is that perhaps we could report the failure as if it were a syntax error, that is with an error cursor pointing to where in the triggering SQL query the coercion is being done. (Years ago this would not have been possible because we didn't always keep around the query text until runtime, but I think we do now.) It would take some work to make that happen, and I'm not sure it would really resolve the usability problem, but it seems worth proposing. An advantage is that over time such a facility could be extended to run-time errors happening in any function not just this particular one. regards, tom lane
On Sun, Jan 19, 2014 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Complaining about a too-long varchar string in this style > seems practically guaranteed to violate that. Agreed. But I think it would be useful to add the length of the string being inserted; we already have it in the len variable. > One thing that occurs to me just now is that perhaps we could report > the failure as if it were a syntax error That would be cool, if it can be made to work. Regards, Marti
Marti Raudsepp <marti@juffo.org> writes: > Agreed. But I think it would be useful to add the length of the string > being inserted; we already have it in the len variable. Hm, maybe, but I'm having a hard time visualizing cases in which it helps much. regards, tom lane
Marti Raudsepp <marti@juffo.org> writes: > On Sun, Jan 19, 2014 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One thing that occurs to me just now is that perhaps we could report >> the failure as if it were a syntax error > That would be cool, if it can be made to work. Just as a five-minute proof-of-concept hack, attached is a patch that makes varchar() report an error position if it can get one. This is *very* far from being production quality --- debug_query_string is the wrong thing to rely on in general, and we'd certainly want to encapsulate the logic rather than have individual functions know about how to do it. But here's some testing that shows that the idea seems to have promise from a usability standpoint: regression=# create table test (f1 varchar(10), f2 varchar(5), f3 varchar(10)); CREATE TABLE regression=# insert into test values ('a', 'b', 'foobar'); INSERT 0 1 regression=# insert into test values ('foobar', 'foobar', 'foobar'); ERROR: value too long for type character varying(5) LINE 1: insert into test values ('foobar', 'foobar', 'foobar'); ^ regression=# update test set f2 = f3 where f1 = 'a'; ERROR: value too long for type character varying(5) LINE 1: update test set f2 = f3 where f1 = 'a'; ^ The first error case points out a limitation of relying on the contents of the string to figure out where your problem is. The error-cursor approach has its own limitations, of course; for instance the second case might not be thought to be all that helpful. regards, tom lane diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c index 502ca44..4438ed8 100644 *** a/src/backend/utils/adt/varchar.c --- b/src/backend/utils/adt/varchar.c *************** *** 19,24 **** --- 19,25 ---- #include "access/tuptoaster.h" #include "libpq/pqformat.h" #include "nodes/nodeFuncs.h" + #include "tcop/tcopprot.h" #include "utils/array.h" #include "utils/builtins.h" #include "mb/pg_wchar.h" *************** varchar(PG_FUNCTION_ARGS) *** 617,626 **** { for (i = maxmblen; i < len; i++) if (s_data[i] != ' ') ereport(ERROR, (errcode(ERRCODE_STRING_DATA_RIGHT_TRUNCATION), errmsg("value too long for type character varying(%d)", ! maxlen))); } PG_RETURN_VARCHAR_P((VarChar *) cstring_to_text_with_len(s_data, --- 618,645 ---- { for (i = maxmblen; i < len; i++) if (s_data[i] != ' ') + { + int pos = 0; + + if (debug_query_string && + fcinfo->flinfo->fn_expr) + { + int location = exprLocation(fcinfo->flinfo->fn_expr); + + if (location >= 0) + { + /* Convert offset to character number */ + pos = pg_mbstrlen_with_len(debug_query_string, + location) + 1; + } + } + ereport(ERROR, (errcode(ERRCODE_STRING_DATA_RIGHT_TRUNCATION), errmsg("value too long for type character varying(%d)", ! maxlen), ! errposition(pos))); ! } } PG_RETURN_VARCHAR_P((VarChar *) cstring_to_text_with_len(s_data,
Hi Tom and Marti Thank you so much for your respond. The solution Tom proposed is much more better, and it will be a great solution for clarifying many issues regarding this error. Regards, Maor On 01/19/2014 10:00 PM, Tom Lane wrote: > Marti Raudsepp <marti@juffo.org> writes: >> On Sun, Jan 19, 2014 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> One thing that occurs to me just now is that perhaps we could report >>> the failure as if it were a syntax error > >> That would be cool, if it can be made to work. > > Just as a five-minute proof-of-concept hack, attached is a patch that > makes varchar() report an error position if it can get one. This is > *very* far from being production quality --- debug_query_string is the > wrong thing to rely on in general, and we'd certainly want to encapsulate > the logic rather than have individual functions know about how to do it. > But here's some testing that shows that the idea seems to have promise > from a usability standpoint: > > regression=# create table test (f1 varchar(10), f2 varchar(5), f3 varchar(10)); > CREATE TABLE > > regression=# insert into test values ('a', 'b', 'foobar'); > INSERT 0 1 > > regression=# insert into test values ('foobar', 'foobar', 'foobar'); > ERROR: value too long for type character varying(5) > LINE 1: insert into test values ('foobar', 'foobar', 'foobar'); > ^ > > regression=# update test set f2 = f3 where f1 = 'a'; > ERROR: value too long for type character varying(5) > LINE 1: update test set f2 = f3 where f1 = 'a'; > ^ > > The first error case points out a limitation of relying on the contents of > the string to figure out where your problem is. The error-cursor approach > has its own limitations, of course; for instance the second case might not > be thought to be all that helpful. Yes, but in this case you will know specifically which column is the problematic one. The highlight of your approach gains much more benefit when updating/inserting multiple values in one update/insert command. > > regards, tom lane >
Hi, Many thanks for the prompt response and the suggestions! So, regarding the issue of "production quality" you've mentioned, we understand there are two remaining matters to address: 1. debug_query_string: As we can't rely on this flag, is there any alternative we can rely on? 2. encapsulation: Is there any "utility file" we could extract the logic to? Or, any other functions that are used forencapsulation mechanism? Thanks! Daniel ----- Original Message ----- > From: "Maor Lipchuk" <mlipchuk@redhat.com> > To: "Tom Lane" <tgl@sss.pgh.pa.us>, "Marti Raudsepp" <marti@juffo.org> > Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>, "Daniel Erez" <derez@redhat.com> > Sent: Monday, January 20, 2014 2:32:57 AM > Subject: Re: [HACKERS] Add value to error message when size extends > > Hi Tom and Marti > Thank you so much for your respond. > > The solution Tom proposed is much more better, and it will be a great > solution for clarifying many issues regarding this error. > > Regards, > Maor > > > On 01/19/2014 10:00 PM, Tom Lane wrote: > > Marti Raudsepp <marti@juffo.org> writes: > >> On Sun, Jan 19, 2014 at 8:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> One thing that occurs to me just now is that perhaps we could report > >>> the failure as if it were a syntax error > > > >> That would be cool, if it can be made to work. > > > > Just as a five-minute proof-of-concept hack, attached is a patch that > > makes varchar() report an error position if it can get one. This is > > *very* far from being production quality --- debug_query_string is the > > wrong thing to rely on in general, and we'd certainly want to encapsulate > > the logic rather than have individual functions know about how to do it. > > But here's some testing that shows that the idea seems to have promise > > from a usability standpoint: > > > > regression=# create table test (f1 varchar(10), f2 varchar(5), f3 > > varchar(10)); > > CREATE TABLE > > > > regression=# insert into test values ('a', 'b', 'foobar'); > > INSERT 0 1 > > > > regression=# insert into test values ('foobar', 'foobar', 'foobar'); > > ERROR: value too long for type character varying(5) > > LINE 1: insert into test values ('foobar', 'foobar', 'foobar'); > > ^ > > > > regression=# update test set f2 = f3 where f1 = 'a'; > > ERROR: value too long for type character varying(5) > > LINE 1: update test set f2 = f3 where f1 = 'a'; > > ^ > > > > The first error case points out a limitation of relying on the contents of > > the string to figure out where your problem is. The error-cursor approach > > has its own limitations, of course; for instance the second case might not > > be thought to be all that helpful. > Yes, but in this case you will know specifically which column is the > problematic one. > The highlight of your approach gains much more benefit when > updating/inserting multiple values in one update/insert command. > > > > regards, tom lane > > > >
Daniel Erez <derez@redhat.com> writes: > Many thanks for the prompt response and the suggestions! > So, regarding the issue of "production quality" you've mentioned, > we understand there are two remaining matters to address: > 1. debug_query_string: > As we can't rely on this flag, is there any alternative we can rely on? > 2. encapsulation: > Is there any "utility file" we could extract the logic to? > Or, any other functions that are used for encapsulation mechanism? The big picture here is that there are two ways we could go: * Encapsulate some improved version of the logic I posted into an error reporting auxiliary function, along the lines of function_errposition(fcinfo) and then run around and add calls to this function in the ereports where it seems useful. This would be the right answer if we think only a few such ereports need the extra info --- but if we want it to apply to many/most function-level error reports, it sounds pretty darn tedious. Also it'd require that everyplace that is going to throw such reports have access to the appropriate fcinfo, which would require some refactoring in places where SQL functions have been divided into multiple C routines. * Add logic in execQual.c to automatically add the information whenever an error is thrown while executing an identifiable expression node. This solves the problem for all functions at one stroke. The problem is that it would add some overhead to the non-error code path. It's hard to see how to do this without at least half a dozen added instructions per function or operator node (it'd likely involve saving, setting, and restoring a global variable ...). I'm not sure if that would be significant but it would very possibly be measurable on function-call-heavy workloads. We might think that such overhead is worth paying for better error reporting; but given how few people have commented on this thread, I'm not sure many people are excited about it. I'd like to hear some more comments before undertaking either approach. As for getting rid of the use of debug_query_string, it'd take some work, though it seems do-able. I think ActivePortal->sourceText is the right thing to reference in the "normal" case, but it may not be the right thing when executing queries from SQL-language functions for instance. Also, the reason I didn't use that in the quick-hack patch was that the test cases I wanted to post involved calls that will get executed during the planner's constant-simplification pass (since varchar() is marked IMMUTABLE). It turns out there isn't an ActivePortal at that point, so we'd need some thought about how to make the right query string accessible during planning. regards, tom lane