Thread: Suspected bug: outer WHERE reordered before inner WHERE -> input syntax ERROR

Suspected bug: outer WHERE reordered before inner WHERE -> input syntax ERROR

From
John Keith Hohm
Date:
This fails with ERROR:  invalid input syntax for integer: "JOHN":

select * from (
  select * from (VALUES ('100'), ('JOHN')) as A (n)
  where trim(trim(n), '0123456789') = ''
) as B where n::integer <> -1;

But without the nested trim, this works just fine (although it would not
suffice for my purpose):

select * from (
  select * from (VALUES ('100'), ('JOHN')) as A (n)
  where trim(n, '0123456789') = ''
) as B where n::integer <> -1;

I think they should both work because the documentation says "A
sub-SELECT can appear in the FROM clause. This acts as though its
output were created as a temporary table for the duration of this
single SELECT command."

The cause is clear when you look at the error-ing EXPLAIN:

 Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=1 width=32)
   Filter: (((column1)::integer <> (-1)) AND (btrim(btrim(column1),
'0123456789'::text) = ''::text))

Versus the single-trim EXPLAIN:

 Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=1 width=32)
   Filter: ((btrim(column1, '0123456789'::text) = ''::text) AND
((column1)::integer <> (-1)))

The extra trim causes the cast-and-compare to happen before the
trim-and-compare. By my understanding PostgreSQL should not be allowed
to reorder the clause of the subselect before the outer select.

I'm running the Ubuntu postgresql package version 8.3.3-0ubuntu0.8.04.

I'm also interested in responses of the form "why not just do X?".

--
John Keith Hohm
<john@hohm.net>

Re: Suspected bug: outer WHERE reordered before inner WHERE -> input syntax ERROR

From
Alvaro Herrera
Date:
John Keith Hohm wrote:

> select * from (
>   select * from (VALUES ('100'), ('JOHN')) as A (n)
>   where trim(trim(n), '0123456789') = ''
> ) as B where n::integer <> -1;

> I'm also interested in responses of the form "why not just do X?".

Why not just add an "OFFSET 0" in the subselect so that the optimizer
does not flatten the query?

 select * from (
   select * from (VALUES ('100'), ('JOHN')) as A (n)
   where trim(trim(n), '0123456789') = ''
 offset 0) as B where n::integer <> -1;


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

John Keith Hohm <john@hohm.net> writes:
> This fails with ERROR:  invalid input syntax for integer: "JOHN":
> select * from (
>   select * from (VALUES ('100'), ('JOHN')) as A (n)
>   where trim(trim(n), '0123456789') = ''
> ) as B where n::integer <> -1;

This isn't a bug: the optimizer is entitled to rearrange WHERE clauses
any way it pleases.  If you want an optimization fence between the inner
and outer SELECTS, add OFFSET 0 (or LIMIT ALL if you like).

> I'm also interested in responses of the form "why not just do X?".

It does raise the question of why you aren't just doing
    where trim(n) != '-1'
I'm also wondering whether the logic is even consistent: something
with a minus sign in it will never get through the inner WHERE,
so what is the point of the outer one?

            regards, tom lane

Re: Suspected bug: outer WHERE reordered before inner WHERE -> input syntax ERROR

From
John Keith Hohm
Date:
On Mon, 08 Sep 2008 13:53:03 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> This isn't a bug: the optimizer is entitled to rearrange WHERE clauses
> any way it pleases.  If you want an optimization fence between the
> inner and outer SELECTS, add OFFSET 0 (or LIMIT ALL if you like).

Thanks for telling me about LIMIT ALL being an optimization fence.
I believe you that it is not a bug, but I admin I cannot find any
documentation of that entitlement.

The description of sub-selects in the SELECT statement documentation
suggests that a sub-select is equivalent to creating a temporary table
with the inner select for the duration of that statement, and clearly
that would not have the same effect.  And the documentation of LIMIT
ALL suggests it is optional noise.

http://www.postgresql.org/docs/8.3/interactive/sql-select.html:

    A sub-SELECT can appear in the FROM clause. This acts as though its
output were created as a temporary table for the duration of this
single SELECT command. [...]

http://www.postgresql.org/docs/8.3/interactive/queries-limit.html:

    [...] LIMIT ALL is the same as omitting the LIMIT clause.

Is there a SQL standard document that gives permission for various
optimizations, and the PostgreSQL documentation contains some
technically inaccurate simplifications for exposition?

> It does raise the question of why you aren't just doing
>     where trim(n) != '-1'
> I'm also wondering whether the logic is even consistent: something
> with a minus sign in it will never get through the inner WHERE,
> so what is the point of the outer one?

Sorry, in minimizing my example to demonstrate the problem I made it
appear pointless; the actual query that motivated the discussion is:

select * from (
       select * from Inquiry where nullif(trim(Member_Nbr), '') is not
NULL and trim(trim(Member_Nbr), '0123456789') = ''
    -- and I've added LIMIT ALL here to make it work
) as valid_Inquiry where Member_Nbr::integer not in (
       select Member_Nbr::integer from Member
);

...which is part of the conversion process for data dumped from an
Access database.

--
John Keith Hohm
<john@hohm.net>

On Mon, Sep 08, 2008 at 01:53:03PM -0400, Tom Lane wrote:
> John Keith Hohm <john@hohm.net> writes:
> > This fails with ERROR:  invalid input syntax for integer: "JOHN":
> > select * from (
> >   select * from (VALUES ('100'), ('JOHN')) as A (n)
> >   where trim(trim(n), '0123456789') = ''
> > ) as B where n::integer <> -1;
>
> This isn't a bug: the optimizer is entitled to rearrange WHERE clauses
> any way it pleases.  If you want an optimization fence between the inner
> and outer SELECTS, add OFFSET 0 (or LIMIT ALL if you like).

I'd interpret John's note as pointing out that SQL doesn't distinguish
between type declarations and type casting.  I think he wants A.n to
be of type TEXT, would like to temporarily treat it as INTEGER in one
sub-expression.  PG incorrectly propagates this cast as applying to the
whole query, leading to John getting his confusing error message.

I don't think PG is doing anything incorrect, it's just that there's
nothing in the language that I know of that would allow you to express
what is really intended.

Or have I missed the point entirely?

A hacky fix would be something like the following:

  SELECT * FROM (
    SELECT * FROM (VALUES ('100'), ('JOHN')) AS A (n)
    WHERE trim(trim(n), '0123456789') = '') AS B
  WHERE (n||'')::INTEGER <> -1;

It works because the cast is applying to a new expression and isn't
being interpreted as a type declaration.


  Sam

Sam Mason <sam@samason.me.uk> writes:
> I'd interpret John's note as pointing out that SQL doesn't distinguish
> between type declarations and type casting.  I think he wants A.n to
> be of type TEXT, would like to temporarily treat it as INTEGER in one
> sub-expression.  PG incorrectly propagates this cast as applying to the
> whole query, leading to John getting his confusing error message.

No, that's not correct at all.  John had diagnosed the problem
correctly: the planner was reordering the two separate WHERE conditions
in such a way that the run-time cast to integer was attempted before the
condition that would have eliminated the values for which the cast
fails.  (I believe that the reason is that for conditions that end up in
the same plan node, the planner tries to push the more expensive ones to
the end of the list; so his example with an extra trim() call works
differently, having three operations to the other condition's two.)

            regards, tom lane

John Keith Hohm <john@hohm.net> writes:
> Thanks for telling me about LIMIT ALL being an optimization fence.
> I believe you that it is not a bug, but I admin I cannot find any
> documentation of that entitlement.

It's over here:
http://www.postgresql.org/docs/8.3/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL

            regards, tom lane