Thread: 8.1 substring bug?

8.1 substring bug?

From

Harald Fuchs

Date:

11 November 2005, 09:49:40

Consider the following:
 CREATE TEMP TABLE tbl (   id SERIAL NOT NULL,   PRIMARY KEY (id) );
 COPY tbl (id) FROM stdin; 1 2 3 4 \.
 SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)::int);

This returns '1234', as expected.  But
 SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl));

returns NULL.  I think the problem is that "SELECT count(*)" returns a
BIGINT whereas "substring" expects an INT.  Shouldn't there be a warning?

Re: 8.1 substring bug?

From

Martijn van Oosterhout

Date:

11 November 2005, 10:12:37

It's even sillier than that:

test=# SELECT substring ('1234567890' FOR 4::bigint);substring
-----------
(1 row)

test=# SELECT substring ('1234567890' FOR 4::int);substring
-----------1234
(1 row)

Looking at the explain verbose make it look like it's using the wrong
version of substring. It's using the oid 2074 one:

test=# select oid,  oid::regprocedure from pg_proc where proname =
'substring'; oid  |                 oid
-------+-------------------------------------  936 | "substring"(text,integer,integer)  937 | "substring"(text,integer)
1680| "substring"(bit,integer,integer) 1699 | "substring"(bit,integer) 2012 | "substring"(bytea,integer,integer) 2013 |
"substring"(bytea,integer)2073 | "substring"(text,text) 2074 | "substring"(text,text,text)           <----16579 |
"substring"(citext,integer,integer)16580| "substring"(citext,integer) 
(10 rows)

That substring is for regular expressions. Nasty, not sure how to deal
with that one...

Have a nice day,

On Fri, Nov 11, 2005 at 02:43:23PM +0100, Harald Fuchs wrote:
> Consider the following:
>
>   CREATE TEMP TABLE tbl (
>     id SERIAL NOT NULL,
>     PRIMARY KEY (id)
>   );
>
>   COPY tbl (id) FROM stdin;
>   1
>   2
>   3
>   4
>   \.
>
>   SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)::int);
>
> This returns '1234', as expected.  But
>
>   SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl));
>
> returns NULL.  I think the problem is that "SELECT count(*)" returns a
> BIGINT whereas "substring" expects an INT.  Shouldn't there be a warning?
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: 8.1 substring bug?

From

Tom Lane

Date:

11 November 2005, 11:06:51

Martijn van Oosterhout <kleptog@svana.org> writes:
> It's even sillier than that:

> test=# SELECT substring ('1234567890' FOR 4::bigint);
>  substring 
> -----------
>  
> (1 row)

> test=# SELECT substring ('1234567890' FOR 4::int);
>  substring 
> -----------
>  1234
> (1 row)

This has been complained of before.  The problem is that there is no
implicit cast from bigint to int, but there is one from bigint to text,
so the only acceptable mapping the parser can find is to convert bigint
to text and apply the pattern-match version of substring().  (There are
some other things happening here because of the weird SQL99 syntax, but
that's the bottom line.)

I have opined before that implicit cross-category casts to text are
evil.  Unfortunately, we'll probably break a lot of people's
applications if we remove them...
        regards, tom lane

Re: 8.1 substring bug?

From

Stephan Szabo

Date:

11 November 2005, 11:47:09

On Fri, 11 Nov 2005, Tom Lane wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > It's even sillier than that:
>
> > test=# SELECT substring ('1234567890' FOR 4::bigint);
> >  substring
> > -----------
> >
> > (1 row)
>
> > test=# SELECT substring ('1234567890' FOR 4::int);
> >  substring
> > -----------
> >  1234
> > (1 row)
>
> This has been complained of before.  The problem is that there is no
> implicit cast from bigint to int, but there is one from bigint to text,
> so the only acceptable mapping the parser can find is to convert bigint
> to text and apply the pattern-match version of substring().  (There are
> some other things happening here because of the weird SQL99 syntax, but
> that's the bottom line.)

It looks to me like we should be supporting any exact numeric with scale 0
there (at least AFAICS from SQL92 and SQL03), so I don't think the current
behavior is compliant. It doesn't look like adding a numeric overload
of the function works, and the function also becomes ambiguous for int2
inputs. :(

Re: 8.1 substring bug?

From

Martijn van Oosterhout

Date:

11 November 2005, 12:02:42

On Fri, Nov 11, 2005 at 07:47:12AM -0800, Stephan Szabo wrote:
>
> On Fri, 11 Nov 2005, Tom Lane wrote:
> > This has been complained of before.  The problem is that there is no
> > implicit cast from bigint to int, but there is one from bigint to text,
> > so the only acceptable mapping the parser can find is to convert bigint
> > to text and apply the pattern-match version of substring().  (There are
> > some other things happening here because of the weird SQL99 syntax, but
> > that's the bottom line.)
>
> It looks to me like we should be supporting any exact numeric with scale 0
> there (at least AFAICS from SQL92 and SQL03), so I don't think the current
> behavior is compliant. It doesn't look like adding a numeric overload
> of the function works, and the function also becomes ambiguous for int2
> inputs. :(

Other than adding explicit definitions for each and every numeric type,
is there no way to add a preference for implicit conversions?

In this particular case the syntax makes it unclear that the substring
is the problem. Perhaps here the solution would be to put a cast in the
grammer, like so:

substr_for: FOR a_expr                       { $$ = makeTypeCast($2,"int4"); }               ;

This would make the match guarenteed since the type matches exactly, no?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: 8.1 substring bug?

From

Tom Lane

Date:

11 November 2005, 12:07:36

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> It looks to me like we should be supporting any exact numeric with scale 0
> there (at least AFAICS from SQL92 and SQL03), so I don't think the current
> behavior is compliant. It doesn't look like adding a numeric overload
> of the function works, and the function also becomes ambiguous for int2
> inputs. :(

Currently (see gram.y, about line 7600) the grammar converts
SUBSTRING(foo FOR bar)

into
pg_catalog.substring(foo, 1, bar)

and then leaves the normal function-call-analysis code to make the best
of it with that.  If "bar" isn't implicitly castable to integer then
you've got trouble.

I was toying with the idea of making it translate instead to
pg_catalog.substring(foo, 1, (bar)::int4)

since AFAICS there isn't any other reasonable mapping once you have
committed to having the "1" in there.  This does not solve the general
problem, but it'd address the particular case anyway ...
        regards, tom lane

Re: 8.1 substring bug?

From

Tom Lane

Date:

11 November 2005, 12:22:41

Martijn van Oosterhout <kleptog@svana.org> writes:
> In this particular case the syntax makes it unclear that the substring
> is the problem. Perhaps here the solution would be to put a cast in the
> grammer, like so:

> substr_for: FOR a_expr                       { $$ =3D makeTypeCast($2,"int4"); }
>                 ;

Not there, because it would break the variants where FOR introduces a
character expression, eg
        <regular expression substring function> ::=             SUBSTRING <left paren> <character value expression>
FROM                        <character value expression> FOR             <escape character> <right paren>
 

But I think we could do this in substr_list in the case where we have
just "a_expr substr_for", because there are no variants of that where
the FOR expression is supposed to be string.  See my other message
just now.
        regards, tom lane

Re: 8.1 substring bug?

From

Stephan Szabo

Date:

11 November 2005, 13:31:33

On Fri, 11 Nov 2005, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > It looks to me like we should be supporting any exact numeric with scale 0
> > there (at least AFAICS from SQL92 and SQL03), so I don't think the current
> > behavior is compliant. It doesn't look like adding a numeric overload
> > of the function works, and the function also becomes ambiguous for int2
> > inputs. :(
>
> Currently (see gram.y, about line 7600) the grammar converts
>
>     SUBSTRING(foo FOR bar)
>
> into
>
>     pg_catalog.substring(foo, 1, bar)
>
> and then leaves the normal function-call-analysis code to make the best
> of it with that.  If "bar" isn't implicitly castable to integer then
> you've got trouble.

Right, I was thinking we could get around it with another substring that
took two numerics, but then I think FROM int2 FOR int2 would be
ambiguous.

> I was toying with the idea of making it translate instead to
>
>     pg_catalog.substring(foo, 1, (bar)::int4)
>
> since AFAICS there isn't any other reasonable mapping once you have
> committed to having the "1" in there.  This does not solve the general
> problem, but it'd address the particular case anyway ...

And, it's fairly reasonable to assume at least right now that any
reasonable string offset or length fits in an int4.

Re: 8.1 substring bug?

From

Tom Lane

Date:

11 November 2005, 13:44:16

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 11 Nov 2005, Tom Lane wrote:
>> I was toying with the idea of making it translate instead to
>> 
>> pg_catalog.substring(foo, 1, (bar)::int4)
>> 
>> since AFAICS there isn't any other reasonable mapping once you have
>> committed to having the "1" in there.  This does not solve the general
>> problem, but it'd address the particular case anyway ...

> And, it's fairly reasonable to assume at least right now that any
> reasonable string offset or length fits in an int4.

If we thought differently we'd be changing the substring function,
and we could surely change the translation at the same time.
        regards, tom lane

Re: 8.1 substring bug?

From

Harald Fuchs

Date:

11 November 2005, 16:06:48

In article <20051111141232.GH13177@svana.org>,
Martijn van Oosterhout <kleptog@svana.org> writes:

> It's even sillier than that:
> test=# SELECT substring ('1234567890' FOR 4::bigint);
>  substring 
> -----------
> (1 row)

> test=# SELECT substring ('1234567890' FOR 4::int);
>  substring 
> -----------
>  1234
> (1 row)

> Looking at the explain verbose make it look like it's using the wrong
> version of substring. It's using the oid 2074 one:

> test=# select oid,  oid::regprocedure from pg_proc where proname =
> 'substring';
>   oid  |                 oid                 
> -------+-------------------------------------
>    936 | "substring"(text,integer,integer)
>    937 | "substring"(text,integer)
>   1680 | "substring"(bit,integer,integer)
>   1699 | "substring"(bit,integer)
>   2012 | "substring"(bytea,integer,integer)
>   2013 | "substring"(bytea,integer)
>   2073 | "substring"(text,text)
>   2074 | "substring"(text,text,text)           <----
>  16579 | "substring"(citext,integer,integer)
>  16580 | "substring"(citext,integer)
> (10 rows)

> That substring is for regular expressions. Nasty, not sure how to deal
> with that one...

Ah, so it's using "substring (STRING from PATTERN for ESCAPE)"?
Yes, that explains the NULL.  Looks like we're in the INT/BIGINT
confusion again...

> Have a nice day,

It's a nice day since I have a nice workaround for this misfeature :-)

Re: 8.1 substring bug?

From

Tom Lane

Date:

13 November 2005, 15:13:41

I wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> In this particular case the syntax makes it unclear that the substring
>> is the problem. Perhaps here the solution would be to put a cast in the
>> grammer, like so:
> ...
> But I think we could do this in substr_list in the case where we have
> just "a_expr substr_for", because there are no variants of that where
> the FOR expression is supposed to be string.

I've applied this patch as far back as 8.0.  Not sure whether there's
a need to back-patch further.
        regards, tom lane