Thread: 8.1 substring bug?
Consider the following: CREATE TEMP TABLE tbl ( id SERIAL NOT NULL, PRIMARY KEY (id) ); COPY tbl (id) FROM stdin; 1 2 3 4 \. SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)::int); This returns '1234', as expected. But SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)); returns NULL. I think the problem is that "SELECT count(*)" returns a BIGINT whereas "substring" expects an INT. Shouldn't there be a warning?
It's even sillier than that: test=# SELECT substring ('1234567890' FOR 4::bigint);substring ----------- (1 row) test=# SELECT substring ('1234567890' FOR 4::int);substring -----------1234 (1 row) Looking at the explain verbose make it look like it's using the wrong version of substring. It's using the oid 2074 one: test=# select oid, oid::regprocedure from pg_proc where proname = 'substring'; oid | oid -------+------------------------------------- 936 | "substring"(text,integer,integer) 937 | "substring"(text,integer) 1680| "substring"(bit,integer,integer) 1699 | "substring"(bit,integer) 2012 | "substring"(bytea,integer,integer) 2013 | "substring"(bytea,integer)2073 | "substring"(text,text) 2074 | "substring"(text,text,text) <----16579 | "substring"(citext,integer,integer)16580| "substring"(citext,integer) (10 rows) That substring is for regular expressions. Nasty, not sure how to deal with that one... Have a nice day, On Fri, Nov 11, 2005 at 02:43:23PM +0100, Harald Fuchs wrote: > Consider the following: > > CREATE TEMP TABLE tbl ( > id SERIAL NOT NULL, > PRIMARY KEY (id) > ); > > COPY tbl (id) FROM stdin; > 1 > 2 > 3 > 4 > \. > > SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)::int); > > This returns '1234', as expected. But > > SELECT substring ('1234567890' FOR (SELECT count (*) FROM tbl)); > > returns NULL. I think the problem is that "SELECT count(*)" returns a > BIGINT whereas "substring" expects an INT. Shouldn't there be a warning? > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > It's even sillier than that: > test=# SELECT substring ('1234567890' FOR 4::bigint); > substring > ----------- > > (1 row) > test=# SELECT substring ('1234567890' FOR 4::int); > substring > ----------- > 1234 > (1 row) This has been complained of before. The problem is that there is no implicit cast from bigint to int, but there is one from bigint to text, so the only acceptable mapping the parser can find is to convert bigint to text and apply the pattern-match version of substring(). (There are some other things happening here because of the weird SQL99 syntax, but that's the bottom line.) I have opined before that implicit cross-category casts to text are evil. Unfortunately, we'll probably break a lot of people's applications if we remove them... regards, tom lane
On Fri, 11 Nov 2005, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > It's even sillier than that: > > > test=# SELECT substring ('1234567890' FOR 4::bigint); > > substring > > ----------- > > > > (1 row) > > > test=# SELECT substring ('1234567890' FOR 4::int); > > substring > > ----------- > > 1234 > > (1 row) > > This has been complained of before. The problem is that there is no > implicit cast from bigint to int, but there is one from bigint to text, > so the only acceptable mapping the parser can find is to convert bigint > to text and apply the pattern-match version of substring(). (There are > some other things happening here because of the weird SQL99 syntax, but > that's the bottom line.) It looks to me like we should be supporting any exact numeric with scale 0 there (at least AFAICS from SQL92 and SQL03), so I don't think the current behavior is compliant. It doesn't look like adding a numeric overload of the function works, and the function also becomes ambiguous for int2 inputs. :(
On Fri, Nov 11, 2005 at 07:47:12AM -0800, Stephan Szabo wrote: > > On Fri, 11 Nov 2005, Tom Lane wrote: > > This has been complained of before. The problem is that there is no > > implicit cast from bigint to int, but there is one from bigint to text, > > so the only acceptable mapping the parser can find is to convert bigint > > to text and apply the pattern-match version of substring(). (There are > > some other things happening here because of the weird SQL99 syntax, but > > that's the bottom line.) > > It looks to me like we should be supporting any exact numeric with scale 0 > there (at least AFAICS from SQL92 and SQL03), so I don't think the current > behavior is compliant. It doesn't look like adding a numeric overload > of the function works, and the function also becomes ambiguous for int2 > inputs. :( Other than adding explicit definitions for each and every numeric type, is there no way to add a preference for implicit conversions? In this particular case the syntax makes it unclear that the substring is the problem. Perhaps here the solution would be to put a cast in the grammer, like so: substr_for: FOR a_expr { $$ = makeTypeCast($2,"int4"); } ; This would make the match guarenteed since the type matches exactly, no? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > It looks to me like we should be supporting any exact numeric with scale 0 > there (at least AFAICS from SQL92 and SQL03), so I don't think the current > behavior is compliant. It doesn't look like adding a numeric overload > of the function works, and the function also becomes ambiguous for int2 > inputs. :( Currently (see gram.y, about line 7600) the grammar converts SUBSTRING(foo FOR bar) into pg_catalog.substring(foo, 1, bar) and then leaves the normal function-call-analysis code to make the best of it with that. If "bar" isn't implicitly castable to integer then you've got trouble. I was toying with the idea of making it translate instead to pg_catalog.substring(foo, 1, (bar)::int4) since AFAICS there isn't any other reasonable mapping once you have committed to having the "1" in there. This does not solve the general problem, but it'd address the particular case anyway ... regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > In this particular case the syntax makes it unclear that the substring > is the problem. Perhaps here the solution would be to put a cast in the > grammer, like so: > substr_for: FOR a_expr { $$ =3D makeTypeCast($2,"int4"); } > ; Not there, because it would break the variants where FOR introduces a character expression, eg <regular expression substring function> ::= SUBSTRING <left paren> <character value expression> FROM <character value expression> FOR <escape character> <right paren> But I think we could do this in substr_list in the case where we have just "a_expr substr_for", because there are no variants of that where the FOR expression is supposed to be string. See my other message just now. regards, tom lane
On Fri, 11 Nov 2005, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > It looks to me like we should be supporting any exact numeric with scale 0 > > there (at least AFAICS from SQL92 and SQL03), so I don't think the current > > behavior is compliant. It doesn't look like adding a numeric overload > > of the function works, and the function also becomes ambiguous for int2 > > inputs. :( > > Currently (see gram.y, about line 7600) the grammar converts > > SUBSTRING(foo FOR bar) > > into > > pg_catalog.substring(foo, 1, bar) > > and then leaves the normal function-call-analysis code to make the best > of it with that. If "bar" isn't implicitly castable to integer then > you've got trouble. Right, I was thinking we could get around it with another substring that took two numerics, but then I think FROM int2 FOR int2 would be ambiguous. > I was toying with the idea of making it translate instead to > > pg_catalog.substring(foo, 1, (bar)::int4) > > since AFAICS there isn't any other reasonable mapping once you have > committed to having the "1" in there. This does not solve the general > problem, but it'd address the particular case anyway ... And, it's fairly reasonable to assume at least right now that any reasonable string offset or length fits in an int4.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Fri, 11 Nov 2005, Tom Lane wrote: >> I was toying with the idea of making it translate instead to >> >> pg_catalog.substring(foo, 1, (bar)::int4) >> >> since AFAICS there isn't any other reasonable mapping once you have >> committed to having the "1" in there. This does not solve the general >> problem, but it'd address the particular case anyway ... > And, it's fairly reasonable to assume at least right now that any > reasonable string offset or length fits in an int4. If we thought differently we'd be changing the substring function, and we could surely change the translation at the same time. regards, tom lane
In article <20051111141232.GH13177@svana.org>, Martijn van Oosterhout <kleptog@svana.org> writes: > It's even sillier than that: > test=# SELECT substring ('1234567890' FOR 4::bigint); > substring > ----------- > (1 row) > test=# SELECT substring ('1234567890' FOR 4::int); > substring > ----------- > 1234 > (1 row) > Looking at the explain verbose make it look like it's using the wrong > version of substring. It's using the oid 2074 one: > test=# select oid, oid::regprocedure from pg_proc where proname = > 'substring'; > oid | oid > -------+------------------------------------- > 936 | "substring"(text,integer,integer) > 937 | "substring"(text,integer) > 1680 | "substring"(bit,integer,integer) > 1699 | "substring"(bit,integer) > 2012 | "substring"(bytea,integer,integer) > 2013 | "substring"(bytea,integer) > 2073 | "substring"(text,text) > 2074 | "substring"(text,text,text) <---- > 16579 | "substring"(citext,integer,integer) > 16580 | "substring"(citext,integer) > (10 rows) > That substring is for regular expressions. Nasty, not sure how to deal > with that one... Ah, so it's using "substring (STRING from PATTERN for ESCAPE)"? Yes, that explains the NULL. Looks like we're in the INT/BIGINT confusion again... > Have a nice day, It's a nice day since I have a nice workaround for this misfeature :-)
I wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: >> In this particular case the syntax makes it unclear that the substring >> is the problem. Perhaps here the solution would be to put a cast in the >> grammer, like so: > ... > But I think we could do this in substr_list in the case where we have > just "a_expr substr_for", because there are no variants of that where > the FOR expression is supposed to be string. I've applied this patch as far back as 8.0. Not sure whether there's a need to back-patch further. regards, tom lane