Re: Problems with Error Messages wrt Domains, Checks - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Problems with Error Messages wrt Domains, Checks
Date
Msg-id CAKFQuwaaaf11qeRsD8cGKQPecwHdbFLMjdM6BUyONP3WLPny=w@mail.gmail.com
Whole thread Raw
In response to Fwd: Problems with Error Messages wrt Domains, Checks  (john frazer <johnfrazer783@gmail.com>)
Responses Re: Problems with Error Messages wrt Domains, Checks  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Mar 17, 2018 at 12:54 PM, john frazer <johnfrazer783@gmail.com> wrote:

---------- Forwarded message ----------
From: john frazer <johnfrazer783@gmail.com>
Date: Sat, Mar 17, 2018 at 6:28 PM
Subject: Re: Problems with Error Messages wrt Domains, Checks
To: "David G. Johnston" <david.g.johnston@gmail.com>

>     As such, it could be anywhere in my many, many kLOCs big DB definition. I cannot even search the RegEx with a RegEx because all I know is some parenthesis is missing, somewhere:
>
> ​Well, the error does point to the first statement in the chain of issues - working backward a couple of steps is possible.

In this particular case that is possible, and I did manage to do it. The point is that in the
general case the faulty regular expression could be anywhere, and there's no clue given at all.

​Frankly, I'm not seeing "invalid constant regular expressions" as being a large scale problem - but I'll agree that having the error include the actual literal being parsed as a RegEx should be done.  If the targeted reporting (e.g., stack trace) gets fixed as a side-effect of the more annoying type input errors - which usually involves dynamic data - that would be swell.
 
>     FAILURE: before the insert statement, everything runs dandy. We could have built an entire data warehouse application on top of a table definition that can never be syntactically processed but which will only fail when someone accidentally tries to insert a line.
>
> ​Since this is going to fail every single time you add a record I'm lacking sympathy here.  "Accidentally tries to insert a line" - if the table wasn't meant to be used why does it exist in the first place?​  And if it is intended to be used then functional testing should quickly point out something like this.

But there clearly can be tables that are used only now and then and might get checked
for absence of rows. But regardless, I think the point stands that ideally you shouldn't
be able to succesfully declare nonsensical objects and only be told so some kinds of
usage patterns by runtime errors (with defective contexts), and in most cases, pgSQL
does keep that promise.

I'm not disagreeing but I'm also not part of the solution.  In terms of importance I'd say its not that high given that I've never really felt the lack personally.  An invalid object, even though it doesn't fail at creation, usually fails immediately after its first use which happens soon enough after creation as to make pin-pointing its location generally trivial.
 

> ​I suppose the best practice when dealing with a lack of information in the error handle code flows is to limit the amount of context that is in scope by unit testing.  And while they are absolutely short-comings overcoming them has cost in terms of both developer effort and, more importantly, runtime performance.

I'm afraid no amount of unit testing of the DDL code can do this for me. Yes,
in the first reported cases (the invalid RegExp), I can make sure I use each
expression at least once so unsyntactic ones will make themselves shown. But
in the other two cases, well, the production environment in which this came up
has an insert statement that takes data from a largish source into the target table
(20k rows of altogether >2m rows), and I *can't* unit test that data.

​I'd be inclined to not constrain the table itself at all and instead perform soft validation post-load.​  You can process and remove offending records and then add the constraints as a sanity check/documentation.


FWIW the workaround that IÄve found is this:

create table X.table_with_constraints (
  my_column text,
  constraint "my_column must start with 'x'"  check ( Q.starts_with_x( my_column ) ),
  constraint "my_column must have 3 chrs"     check ( Q.has_3_characters( my_column ) ) );

In other words, I dispense with domains and use (small, boolean) functions (defined as `select` one-liners)
because only then do I get told what piece of data comes doen the wrong way and where.
It's a shame because this is essentially what I expect to do in a language like
JavaScript.


​Yes, hopefully we can decide and implement value reporting for v12 (and consider adding it in for v11) which would at least avoid the need for the functions and thus rely on just the named constraints.

Though since its just you and I on this thread there is no one who can write a patch speaking up...

David J.

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Next
From: Claudio Freire
Date:
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.