Re: Problems with Error Messages wrt Domains, Checks - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Problems with Error Messages wrt Domains, Checks
Date
Msg-id CAKFQuwbFLvUiVOV4N7a1Fdh6hQthuLZCy-WwcQ6w5Cp6C4MRyw@mail.gmail.com
Whole thread Raw
In response to Problems with Error Messages wrt Domains, Checks  (john frazer <johnfrazer783@gmail.com>)
List pgsql-hackers
On Sat, Mar 17, 2018 at 6:14 AM, john frazer <johnfrazer783@gmail.com> wrote:

Today I realized a number of points where PostgreSQL (v10.3) is rather lackluster and sparse in its error messages.

​You may find the following thread and its predecessors enlightening.


​Basically, the fundamental problem is type input is performed in a relatively isolated fashion since there is no requirement that a table or column of said type even exist.
psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx is created; the fact that it is a RegEx and not a general string is obvious from the semantics of the ~ (tilde) operator at that point in time.

​Yeah, the fact that we don't "compile" expressions is unfortunate.  Not sure if there are any plans to do so or what limitations there are​
 

FAILURE: the offending RegEx is not referred to and not quoted in the error message.

​This seems like an easy enough oversight to correct.​  In all the discussion about being challenging to identify location I don't recall seeing anything about why we aren't at least showing the offending input value.

As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:

​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

RegExes cannot match parentheses,

​Sure they can​.

and PG RegExes do not have a unique syntactic marker to them.

​True​

FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.

​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

FAILURE: I can select from a table with a syntactically invalid definition.

​You're stretching here if you think this is an important failure point.  Since the table cannot not have valid data there would be nothing to select.  Checking constraints during selection is undesireable - they should be an are only checked during insertion or when the constraint itself changes.

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  new row for relation "table_with_constraints" violates
check constraint "field b must have 3 characters"
DETAIL:  Failing row contains (xxxx).

SUCCESS: we get the name of the relation and the name of the violated rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is "X"."table_with_constraints". Neither do we get the name of the column that received the offending value.

​No, you get "check constraint field b must have 3 characters" with the owning table.  You've defined a table constraint so there is no directly attached column to report - the expression as a whole fails and we don't report which boolean aspects of the expression where true and false.  You do get the input value which makes manually resolving the expression possible.  The lack of schema-qualification on the table identifier seems like an oversight but at the moment I'd not willing to go find evidence in support to opposition to that thought.​

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made. 

FAILURE: no reference to the offending piece of data is made

​Repeat of the first example, same explanations apply.  Hopefully this gets improved eventually.​
 

FAILURE: no reference to the offended constraint is made ("column a must start with x").

​It never got that far into the validation process.  It couldn't even form a value of correct type that the constraint expression could evaluate.  I suppose this is just another aspect of the first problem - the isolation of type conversion and the absence of keeping and reporting a stack-trace.  Someone more knowledgeable than I would need to expound on the similarities and differences.

What are the best practices or workarounds for the above shortcomings? I've been trying for several hours to figure out what causes an error message a la value for domain xxx violates check constraint "xxx_check" by rewriting table definitions, inserting data row by row and so on, to no avail. What I need is a full chain of the objects (column -> table -> constraint -> domain -> check) that are involved in the error.

​If the error starts with "value for domain xxx violates" then the only place to look is at your "create domain" statement for "xxx".  Well, at least once you know what the actually offending value is...which right now might be a bit of a challenge depending on the situation (if your first example applies the data doesn't matter and critically evaluating the create domain statement might be sufficient).
 
I'm writing this to the developers' list because I see the above observations as serious shortcomings in an otherwise great piece of software that can probably not be fixed by using client-side code only.

​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

David J.

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: strange failure in plpgsql_control tests (on fulmar, ICC 14.0.3)
Next
From: Christos Maris
Date:
Subject: Re: Google Summer of Code: Potential Applicant