Thread: Problems with Error Messages wrt Domains, Checks

Problems with Error Messages wrt Domains, Checks

From
john frazer
Date:

Today I realized a number of points where PostgreSQL (v10.3) is rather lackluster and sparse in its error messages.

The first point is illustrated by this code:

drop schema if exists X cascade;
create schema X;

create domain X.an_illegal_regex as text check ( value ~ '(' );

create table X.table_with_illegal_constraint ( a text, constraint "column a must have a bogus value" check ( a::X.an_illegal_regex = a ) );

select * from X.table_with_illegal_constraint;

insert into X.table_with_illegal_constraint values ( 'xxx' ), -- ( 'xxx' ), ( 'foo' ), ( 'xyx' );

This code will throw with

psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx is created; the fact that it is a RegEx and not a general string is obvious from the semantics of the ~ (tilde) operator at that point in time.

FAILURE: the offending RegEx is not referred to and not quoted in the error message. As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere: RegExes cannot match parentheses, and PG RegExes do not have a unique syntactic marker to them.

FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.

FAILURE: I can select from a table with a syntactically invalid definition.

The second point is related:

drop schema if exists X cascade;
create schema X;

create domain X.a_legal_regex as text check ( value ~ '^x' );

create table X.table_with_constraints ( a text, constraint "column a must start with x" check ( a::X.a_legal_regex = a ), constraint "field b must have 3 characters" check ( character_length( a ) = 3 ) );

insert into X.table_with_constraints values ( 'xxx' ), ( 'foo' ),        /* A: violates first constraint */ -- ( 'xxxx' ),       /* B: violates second constraint */ ( 'xyx' );

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  new row for relation "table_with_constraints" violates
check constraint "field b must have 3 characters"
DETAIL:  Failing row contains (xxxx).

SUCCESS: we get the name of the relation and the name of the violated rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is "X"."table_with_constraints". Neither do we get the name of the column that received the offending value.

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made.

FAILURE: no reference to the offending piece of data is made.

FAILURE: no reference to the offended constraint is made ("column a must start with x").

What are the best practices or workarounds for the above shortcomings? I've been trying for several hours to figure out what causes an error message a la value for domain xxx violates check constraint "xxx_check" by rewriting table definitions, inserting data row by row and so on, to no avail. What I need is a full chain of the objects (column -> table -> constraint -> domain -> check) that are involved in the error.

I'm writing this to the developers' list because I see the above observations as serious shortcomings in an otherwise great piece of software that can probably not be fixed by using client-side code only.

Re: Problems with Error Messages wrt Domains, Checks

From
"David G. Johnston"
Date:
On Sat, Mar 17, 2018 at 6:14 AM, john frazer <johnfrazer783@gmail.com> wrote:

Today I realized a number of points where PostgreSQL (v10.3) is rather lackluster and sparse in its error messages.

​You may find the following thread and its predecessors enlightening.


​Basically, the fundamental problem is type input is performed in a relatively isolated fashion since there is no requirement that a table or column of said type even exist.
psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx is created; the fact that it is a RegEx and not a general string is obvious from the semantics of the ~ (tilde) operator at that point in time.

​Yeah, the fact that we don't "compile" expressions is unfortunate.  Not sure if there are any plans to do so or what limitations there are​
 

FAILURE: the offending RegEx is not referred to and not quoted in the error message.

​This seems like an easy enough oversight to correct.​  In all the discussion about being challenging to identify location I don't recall seeing anything about why we aren't at least showing the offending input value.

As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:

​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

RegExes cannot match parentheses,

​Sure they can​.

and PG RegExes do not have a unique syntactic marker to them.

​True​

FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.

​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

FAILURE: I can select from a table with a syntactically invalid definition.

​You're stretching here if you think this is an important failure point.  Since the table cannot not have valid data there would be nothing to select.  Checking constraints during selection is undesireable - they should be an are only checked during insertion or when the constraint itself changes.

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  new row for relation "table_with_constraints" violates
check constraint "field b must have 3 characters"
DETAIL:  Failing row contains (xxxx).

SUCCESS: we get the name of the relation and the name of the violated rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is "X"."table_with_constraints". Neither do we get the name of the column that received the offending value.

​No, you get "check constraint field b must have 3 characters" with the owning table.  You've defined a table constraint so there is no directly attached column to report - the expression as a whole fails and we don't report which boolean aspects of the expression where true and false.  You do get the input value which makes manually resolving the expression possible.  The lack of schema-qualification on the table identifier seems like an oversight but at the moment I'd not willing to go find evidence in support to opposition to that thought.​

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made. 

FAILURE: no reference to the offending piece of data is made

​Repeat of the first example, same explanations apply.  Hopefully this gets improved eventually.​
 

FAILURE: no reference to the offended constraint is made ("column a must start with x").

​It never got that far into the validation process.  It couldn't even form a value of correct type that the constraint expression could evaluate.  I suppose this is just another aspect of the first problem - the isolation of type conversion and the absence of keeping and reporting a stack-trace.  Someone more knowledgeable than I would need to expound on the similarities and differences.

What are the best practices or workarounds for the above shortcomings? I've been trying for several hours to figure out what causes an error message a la value for domain xxx violates check constraint "xxx_check" by rewriting table definitions, inserting data row by row and so on, to no avail. What I need is a full chain of the objects (column -> table -> constraint -> domain -> check) that are involved in the error.

​If the error starts with "value for domain xxx violates" then the only place to look is at your "create domain" statement for "xxx".  Well, at least once you know what the actually offending value is...which right now might be a bit of a challenge depending on the situation (if your first example applies the data doesn't matter and critically evaluating the create domain statement might be sufficient).
 
I'm writing this to the developers' list because I see the above observations as serious shortcomings in an otherwise great piece of software that can probably not be fixed by using client-side code only.

​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

David J.

Fwd: Problems with Error Messages wrt Domains, Checks

From
john frazer
Date:

---------- Forwarded message ----------
From: john frazer <johnfrazer783@gmail.com>
Date: Sat, Mar 17, 2018 at 6:28 PM
Subject: Re: Problems with Error Messages wrt Domains, Checks
To: "David G. Johnston" <david.g.johnston@gmail.com>




Thanks for your consideration,

I'll try to be brief.


>     As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:
>
> ​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

In this particular case that is possible, and I did manage to do it. The point is that in the
general case the faulty regular expression could be anywhere, and there's no clue given at all.

>     RegExes cannot match parentheses,
> ​Sure they can​.
>     and PG RegExes do not have a unique syntactic marker to them.
> ​True​

I meant to say they can't detect matching parentheses or lack thereof.

>     FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.
>
> ​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

But there clearly can be tables that are used only now and then and might get checked
for absence of rows. But regardless, I think the point stands that ideally you shouldn't
be able to succesfully declare nonsensical objects and only be told so some kinds of
usage patterns by runtime errors (with defective contexts), and in most cases, pgSQL
does keep that promise.

>     FAILURE: I can select from a table with a syntactically invalid definition.
> ​You're stretching here if you think this is an important failure point.  Since the table cannot not have valid data there would be nothing to select.  Checking constraints during selection is undesireable - they should be an are only checked during insertion or when the constraint itself changes.

To clarify, I do not suggest checking constraints during `select`, I suggest
tighter checks at table creation time. I should not be able to construct
any kind of object that says (in SQL or maybe even plpgsql, too) something
to the effect `x ~ '('` because that string in that syntactic context which must
be a RegexLiteral is syntactically bogus. (Yes operators can be redefined but the
only thing that counts here is the definition of the `~` operator at the point in time
that the table gets created, and that expression is constant with a literal, so it's
not much difference between this and checking against, say, `x > 0` which could and
would fail for illegal literals and non-matching types.)

> ​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

I'm afraid no amount of unit testing of the DDL code can do this for me. Yes,
in the first reported cases (the invalid RegExp), I can make sure I use each
expression at least once so unsyntactic ones will make themselves shown. But
in the other two cases, well, the production environment in which this came up
has an insert statement that takes data from a largish source into the target table
(20k rows of altogether >2m rows), and I *can't* unit test that data.

FWIW the workaround that IÄve found is this:

create table X.table_with_constraints (
  my_column text,
  constraint "my_column must start with 'x'"  check ( Q.starts_with_x( my_column ) ),
  constraint "my_column must have 3 chrs"     check ( Q.has_3_characters( my_column ) ) );

In other words, I dispense with domains and use (small, boolean) functions (defined as `select` one-liners)
because only then do I get told what piece of data comes doen the wrong way and where.
It's a shame because this is essentially what I expect to do in a language like
JavaScript.







On Sat, Mar 17, 2018 at 4:20 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Mar 17, 2018 at 6:14 AM, john frazer <johnfrazer783@gmail.com> wrote:

Today I realized a number of points where PostgreSQL (v10.3) is rather lackluster and sparse in its error messages.

​You may find the following thread and its predecessors enlightening.


​Basically, the fundamental problem is type input is performed in a relatively isolated fashion since there is no requirement that a table or column of said type even exist.
psql:db/experiments/pg-error-fail-illegal-regex.sql:17: ERROR:
invalid regular expression: parentheses () not balanced

There are several problems with this error message:

FAILURE: the error is really in line 5 where a syntactically invalid RegEx is created; the fact that it is a RegEx and not a general string is obvious from the semantics of the ~ (tilde) operator at that point in time.

​Yeah, the fact that we don't "compile" expressions is unfortunate.  Not sure if there are any plans to do so or what limitations there are​
 

FAILURE: the offending RegEx is not referred to and not quoted in the error message.

​This seems like an easy enough oversight to correct.​  In all the discussion about being challenging to identify location I don't recall seeing anything about why we aren't at least showing the offending input value.

As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:

​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

RegExes cannot match parentheses,

​Sure they can​.

and PG RegExes do not have a unique syntactic marker to them.

​True​

FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.

​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

FAILURE: I can select from a table with a syntactically invalid definition.

​You're stretching here if you think this is an important failure point.  Since the table cannot not have valid data there would be nothing to select.  Checking constraints during selection is undesireable - they should be an are only checked during insertion or when the constraint itself changes.

With only line B active, this gives:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  new row for relation "table_with_constraints" violates
check constraint "field b must have 3 characters"
DETAIL:  Failing row contains (xxxx).

SUCCESS: we get the name of the relation and the name of the violated rule.

SUCCESS: the offending piece of data is quoted.

FAILURE: we don't get the full name of the relation, which is "X"."table_with_constraints". Neither do we get the name of the column that received the offending value.

​No, you get "check constraint field b must have 3 characters" with the owning table.  You've defined a table constraint so there is no directly attached column to report - the expression as a whole fails and we don't report which boolean aspects of the expression where true and false.  You do get the input value which makes manually resolving the expression possible.  The lack of schema-qualification on the table identifier seems like an oversight but at the moment I'd not willing to go find evidence in support to opposition to that thought.​

Lastly, with only line A (not line B) active:

psql:db/experiments/pg-error-fail-no-constraint-name.sql:16:
ERROR:  value for domain x.a_legal_regex violates check constraint "a_legal_regex_check"

FAILURE: no reference to the affected table, column is made. 

FAILURE: no reference to the offending piece of data is made

​Repeat of the first example, same explanations apply.  Hopefully this gets improved eventually.​
 

FAILURE: no reference to the offended constraint is made ("column a must start with x").

​It never got that far into the validation process.  It couldn't even form a value of correct type that the constraint expression could evaluate.  I suppose this is just another aspect of the first problem - the isolation of type conversion and the absence of keeping and reporting a stack-trace.  Someone more knowledgeable than I would need to expound on the similarities and differences.

What are the best practices or workarounds for the above shortcomings? I've been trying for several hours to figure out what causes an error message a la value for domain xxx violates check constraint "xxx_check" by rewriting table definitions, inserting data row by row and so on, to no avail. What I need is a full chain of the objects (column -> table -> constraint -> domain -> check) that are involved in the error.

​If the error starts with "value for domain xxx violates" then the only place to look is at your "create domain" statement for "xxx".  Well, at least once you know what the actually offending value is...which right now might be a bit of a challenge depending on the situation (if your first example applies the data doesn't matter and critically evaluating the create domain statement might be sufficient).
 
I'm writing this to the developers' list because I see the above observations as serious shortcomings in an otherwise great piece of software that can probably not be fixed by using client-side code only.

​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

David J.



Re: Problems with Error Messages wrt Domains, Checks

From
"David G. Johnston"
Date:
On Sat, Mar 17, 2018 at 12:54 PM, john frazer <johnfrazer783@gmail.com> wrote:

---------- Forwarded message ----------
From: john frazer <johnfrazer783@gmail.com>
Date: Sat, Mar 17, 2018 at 6:28 PM
Subject: Re: Problems with Error Messages wrt Domains, Checks
To: "David G. Johnston" <david.g.johnston@gmail.com>

>     As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:
>
> ​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

In this particular case that is possible, and I did manage to do it. The point is that in the
general case the faulty regular expression could be anywhere, and there's no clue given at all.

​Frankly, I'm not seeing "invalid constant regular expressions" as being a large scale problem - but I'll agree that having the error include the actual literal being parsed as a RegEx should be done.  If the targeted reporting (e.g., stack trace) gets fixed as a side-effect of the more annoying type input errors - which usually involves dynamic data - that would be swell.
 
>     FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.
>
> ​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

But there clearly can be tables that are used only now and then and might get checked
for absence of rows. But regardless, I think the point stands that ideally you shouldn't
be able to succesfully declare nonsensical objects and only be told so some kinds of
usage patterns by runtime errors (with defective contexts), and in most cases, pgSQL
does keep that promise.

I'm not disagreeing but I'm also not part of the solution.  In terms of importance I'd say its not that high given that I've never really felt the lack personally.  An invalid object, even though it doesn't fail at creation, usually fails immediately after its first use which happens soon enough after creation as to make pin-pointing its location generally trivial.
 

> ​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

I'm afraid no amount of unit testing of the DDL code can do this for me. Yes,
in the first reported cases (the invalid RegExp), I can make sure I use each
expression at least once so unsyntactic ones will make themselves shown. But
in the other two cases, well, the production environment in which this came up
has an insert statement that takes data from a largish source into the target table
(20k rows of altogether >2m rows), and I *can't* unit test that data.

​I'd be inclined to not constrain the table itself at all and instead perform soft validation post-load.​  You can process and remove offending records and then add the constraints as a sanity check/documentation.


FWIW the workaround that IÄve found is this:

create table X.table_with_constraints (
  my_column text,
  constraint "my_column must start with 'x'"  check ( Q.starts_with_x( my_column ) ),
  constraint "my_column must have 3 chrs"     check ( Q.has_3_characters( my_column ) ) );

In other words, I dispense with domains and use (small, boolean) functions (defined as `select` one-liners)
because only then do I get told what piece of data comes doen the wrong way and where.
It's a shame because this is essentially what I expect to do in a language like
JavaScript.


​Yes, hopefully we can decide and implement value reporting for v12 (and consider adding it in for v11) which would at least avoid the need for the functions and thus rely on just the named constraints.

Though since its just you and I on this thread there is no one who can write a patch speaking up...

David J.

Re: Problems with Error Messages wrt Domains, Checks

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Frankly, I'm not seeing "invalid constant regular expressions" as being a
> large scale problem - but I'll agree that having the error include the
> actual literal being parsed as a RegEx should be done.

Agreed.  Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.

At least, it is if we can get consensus on what they should look like :-).
There's at least one place that already includes the regex proper in
its error, in hba.c:

            ereport(LOG,
                    (errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
                     errmsg("invalid regular expression \"%s\": %s",
                            parsedline->ident_user + 1, errstr)));

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie

                     errmsg("invalid regular expression: %s", ...),
                     errdetail("Regular expression is \"%s\".", ...),

The reason I'm concerned about that is I've seen some pretty hairy
regexes, way longer than are reasonable to include in a primary
error message.  The one-line way is nice for short regexes, but
it could get out of hand.  Also, for the principal use-cases in regexp.c,
we could avoid changing the primary message text from what it is now.
That might prevent some unnecessary client breakage (not that people
are supposed to be testing message string contents, but ...)

Thoughts?

            regards, tom lane


Re: Problems with Error Messages wrt Domains, Checks

From
"David G. Johnston"
Date:
On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Frankly, I'm not seeing "invalid constant regular expressions" as being a
> large scale problem - but I'll agree that having the error include the
> actual literal being parsed as a RegEx should be done.

Agreed.  Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.
​[...]

But I wonder if we wouldn't be better off to put the regex into a
detail line, ie

                     errmsg("invalid regular expression: %s", ...),
                     errdetail("Regular expression is \"%s\".", ...),

The reason I'm concerned about that is I've seen some pretty hairy
regexes, way longer than are reasonable to include in a primary
error message.  The one-line way is nice for short regexes, but
it could get out of hand.

​I write many of those - albeit less so when working with database embedded expressions as opposed to application-layer.

I'd consider at least supplying the first 30 or so characters (or maybe up to the first newline, whichever is shorter) in the main message and then the entire regex in the detail line.

invalid regular expression starting with: %s

I think having the typical regex in the message will aid users that use client interfaces that don't propagate error detail by default - and it should be sufficient to narrow down, if not pinpoint, the offending regex in most cases.  And, faced with multiple, the user can add a leading comment to the regexp to help narrow down the options if necessary.

David J.

Re: Problems with Error Messages wrt Domains, Checks

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But I wonder if we wouldn't be better off to put the regex into a
>> detail line, ie
>>         errmsg("invalid regular expression: %s", ...),
>>         errdetail("Regular expression is \"%s\".", ...),

> I'd consider at least supplying the first 30 or so characters (or maybe up
> to the first newline, whichever is shorter) in the main message and then
> the entire regex in the detail line.

That seems like a lot more complication than this is worth, and it'd be
confusing to users as well, if things are formatted differently for short
and long regexes.

Also, by my count there are at least eight places in the code that
need to emit messages like this; if we turn formatting the messages into
a major production, people will take shortcuts.  Some already have, eg
spell.c is failing to report pg_regexec failures at all.  I thought
about trying to discourage deviations by using common error-reporting
subroutines, but there are enough legit differences in what needs to
be done that that might not work well.

            regards, tom lane


Re: Problems with Error Messages wrt Domains, Checks

From
"David G. Johnston"
Date:
On Mon, Mar 19, 2018 at 9:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But I wonder if we wouldn't be better off to put the regex into a
>> detail line, ie
>>              errmsg("invalid regular expression: %s", ...),
>>              errdetail("Regular expression is \"%s\".", ...),

> I'd consider at least supplying the first 30 or so characters (or maybe up
> to the first newline, whichever is shorter) in the main message and then
> the entire regex in the detail line.

That seems like a lot more complication than this is worth, and it'd be
confusing to users as well, if things are formatted differently for short
and long regexes.

They would be formatted the same every time - just need to remember to add the extra function call around the expression variable in the errmsg case.  Definitely not married to the idea though. 

Saying "begins with" and then showing the entire regex shouldn't cause too much confusion I'd hope.
  
I thought
about trying to discourage deviations by using common error-reporting
subroutines, but there are enough legit differences in what needs to
be done that that might not work well.

​Two support functions might suffice:​

present_regexp_for_errmsg(regex_var)
present_regexp_for_detailmsg(regex_var)

We can at least centralize how the expression itself is string-ified.

David J.

Re: Problems with Error Messages wrt Domains, Checks

From
"David G. Johnston"
Date:
On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Frankly, I'm not seeing "invalid constant regular expressions" as being a
> large scale problem - but I'll agree that having the error include the
> actual literal being parsed as a RegEx should be done.

Agreed.  Doing anything about the other stuff discussed in this thread is
fairly large-scale work, but adjusting our regex error messages is easy.

​Another one that seems to fall into the "fairly large-scale work" would be the:

​ERROR:  more than one row returned by a subquery used as an expression
(that's it, nothing else prints in psql when I run the offending query - using "--echo-all" to at least see what query I sent caused the issue)

Found this via Google


Printing out the offending expression would be nice if possible - printing the set (size > 1) of values that were returned would probably help as well, though usually the problem data is in a where clause while the set would contain target list data - and its those where clause items that matter more.  In the case of a correlated subquery exhibiting this problem the outer query vars being passed in is what would be most helpful.

David J.

P.S. I consider this to be on-topic to the general "hard to debug" topic this thread covers even if its doesn't involve domains...I may gripe more prominently later...

Re: Problems with Error Messages wrt Domains, Checks

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Mar 19, 2018 at 8:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ​Another one that seems to fall into the "fairly large-scale work" would be
> the:
> ​ERROR:  more than one row returned by a subquery used as an expression
> (that's it, nothing else prints in psql when I run the offending query -
> using "--echo-all" to at least see what query I sent caused the issue)

Yeah, this falls into the general category of unlocalized run-time errors.
I still think that emitting an error cursor would be a general-purpose
answer that would improve the user experience for this and many other
cases.  You could imagine specific behaviors that would be more optimal
for this specific error report, but the approach of solving this issue
one error at a time doesn't scale even a little bit.

> Printing out the offending expression would be nice if possible - printing
> the set (size > 1) of values that were returned would probably help as
> well, though usually the problem data is in a where clause while the set
> would contain target list data - and its those where clause items that
> matter more.  In the case of a correlated subquery exhibiting this problem
> the outer query vars being passed in is what would be most helpful.

IIRC, this error is thrown as soon as we get a second row out of the
subquery; we don't know what the ultimate result-set size would be,
and I rather doubt that finding that out would be helpful very often.
Nor does it seem like figuring out how to identify the parameter values
passed down to the subquery (if any) would really repay the effort.
The set that the executor thinks it's passing down might not have that
much to do with what the user thinks the query is doing.

            regards, tom lane