Thread: Substring auto trim
Hi guys, I'm not sure whether this a really dumb question, but I'm curious as to what might be the problem. We have a column 'foo' which is of type character (not varying). select substr(foo, 1, 10) from bar The result of this query are values whose trailing spaces have been trimmed automatically. This causes incorrect results when comparing to a value that may contain trailing spaces. select * from bar where substr(foo, 1, 4) = 'AB ' I should mention that we normally run Oracle and DB2 (and have done for many years), but I have been pushing for Postgres as an alternative. Fortunately this is all handled through Hibernate, and so for now I have wrapped the substr command in rpad which seems to do the trick. Any light you can shed on this issue would be much appreciated. Cheers, Charles O'Farrell PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.1-4ubuntu8) 4.4.1, 32-bit
Hello 2010/1/13 Charles O'Farrell <charleso@gmail.com>: > Hi guys, > > I'm not sure whether this a really dumb question, but I'm curious as to w= hat > might be the problem. > > We have a column 'foo' which is of type character (not varying). > > select substr(foo, 1, 10) from bar > > The result of this query are values whose trailing spaces have been trimm= ed > automatically. This causes incorrect results when comparing to a value th= at > may contain trailing spaces. > > select * from bar where substr(foo, 1, 4) =3D 'AB=C2=A0 ' > You have to write C function substr for type "any" :( Because "char" and char(n) are two different types, and you cannot to write function for char(n) > I should mention that we normally run Oracle and DB2 (and have done for m= any > years), but I have been pushing for Postgres as an alternative. > Fortunately this is all handled through Hibernate, and so for now I have > wrapped the substr command in rpad which seems to do the trick. > > Any light you can shed on this issue would be much appreciated. > Function substr has first parameter of type "text". When pg call this function, then it does conversion from char(x) to text. Regards Pavel Stehule > Cheers, > > Charles O'Farrell > > PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubun= tu > 4.4.1-4ubuntu8) 4.4.1, 32-bit >
2010/1/13 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > 2010/1/13 Charles O'Farrell <charleso@gmail.com>: >> Hi guys, >> >> I'm not sure whether this a really dumb question, but I'm curious as to = what >> might be the problem. >> >> We have a column 'foo' which is of type character (not varying). >> >> select substr(foo, 1, 10) from bar >> >> The result of this query are values whose trailing spaces have been trim= med >> automatically. This causes incorrect results when comparing to a value t= hat >> may contain trailing spaces. >> >> select * from bar where substr(foo, 1, 4) =3D 'AB=C2=A0 ' >> > > You have to write C function substr for type "any" :( Because "char" > and char(n) are two different types, and you cannot to write function > for char(n) > > >> I should mention that we normally run Oracle and DB2 (and have done for = many >> years), but I have been pushing for Postgres as an alternative. >> Fortunately this is all handled through Hibernate, and so for now I have >> wrapped the substr command in rpad which seems to do the trick. >> >> Any light you can shed on this issue would be much appreciated. >> I thing, so there is workaround, create or replace function substr(character, int, int) returns character as= $$ select substr($1::cstring::text,$2,$3) $$ language sql; postgres=3D# create table f(a character(5)); CREATE TABLE postgres=3D# insert into f values('a'),('ab'),('abc'); INSERT 0 3 postgres=3D# select * from f; a ------- a ab abc (3 rows) postgres=3D# select * from f where substr(a,1,3) =3D 'a '; a ------- a (1 row) postgres=3D# select * from f where substr(a,1,3) =3D 'ab '; a ------- ab (1 row) Regards Pavel Stehule > > Function substr has first parameter of type "text". When pg call this > function, then it does conversion from char(x) to text. > > Regards > Pavel Stehule > > >> Cheers, >> >> Charles O'Farrell >> >> PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubu= ntu >> 4.4.1-4ubuntu8) 4.4.1, 32-bit >> >
"Charles O'Farrell" <charleso@gmail.com> writes: > We have a column 'foo' which is of type character (not varying). > select substr(foo, 1, 10) from bar > The result of this query are values whose trailing spaces have been trimmed > automatically. This causes incorrect results when comparing to a value that > may contain trailing spaces. What's the data type of the value being compared to? I get, for instance, postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4); ?column? ---------- t (1 row) The actual value coming out of the substr() is indeed just 'ab', but that ought to be considered equal to 'ab ' anyway in char(n) semantics. Postgres considers that trailing blanks in a char(n) value are semantically insignificant, so it strips them when converting to a type where they would be significant (ie, text or varchar). What's happening in this scenario is that substr() is defined to take and return text, so the stripping happens before substr ever sees it. As Pavel noted, you could possibly work around this particular case by defining a variant of substr() that takes and returns char(n), but on the whole I'd strongly advise switching over to varchar/text if possible. The semantics of char(n) are so weird/braindamaged that it's best avoided. BTW, if you do want to use the workaround, this seems sufficient: create function substr(char,int,int) returns char strict immutable language internal as 'text_substr' ; It's the same C code, you're just avoiding the coercion on input. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > What's the data type of the value being compared to? I get, for > instance, > > postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4); > ?column? > ---------- > t > (1 row) This looks like another situation where we're running into trouble because of non-standard behavior when people might be expecting something consistent with other products and the explicit language in the standard. Quoting from section 5.3 of "WG3:HBA-003 H2-2003-305 August, 2003 (ISO-ANSI Working Draft) Foundation (SQL/Foundation)": | 13) The declared type of a <character string literal> is | fixed-length character string. The length of a <character | string literal> is the number of <character representation>s | that it contains. Each <quote symbol> contained in <character | string literal> represents a single <quote> in both the value | and the length of the <character string literal>. The two | <quote>s contained in a <quote symbol> shall not be separated | by any <separator>. | | NOTE 72 * <character string literal>s are allowed to be | zero-length strings (i.e., to contain no characters) even | though it is not permitted to declare a <data type> that is | CHARACTER with <length> 0 (zero). Based on that, the cast of the literals to char(4) in your example should not be needed. I don't know if there's any reasonable fix or if this should be handled with a doc change or FAQ entry. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What's the data type of the value being compared to? I get, for >> instance, >> >> postgres=# select substr('ab '::char(4), 1, 4) = 'ab '::char(4); > This looks like another situation where we're running into trouble > because of non-standard behavior when people might be expecting > something consistent with other products and the explicit language > in the standard. If we were to change that so that 'ab ' were implicitly typed as char(4), then we'd start getting bug reports from people complaining that "select 'ab' = 'ab '" yields true. I remain of the opinion that char(n) is so hopelessly brain-damaged that we should be very careful to NOT bring it into our mainstream behavior. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> This looks like another situation where we're running into >> trouble because of non-standard behavior when people might be >> expecting something consistent with other products and the >> explicit language in the standard. > > If we were to change that so that 'ab ' were implicitly typed as > char(4), then we'd start getting bug reports from people > complaining that "select 'ab' = 'ab '" yields true. I remain of > the opinion that char(n) is so hopelessly brain-damaged that we > should be very careful to NOT bring it into our mainstream > behavior. I'm inclined to agree with you, but it does present a barrier to those migrating. Are there any "migration considerations" documents where we should mention this? Standards compliance notes in the docs? Some form of this question seems to be asked frequently.... -Kevin
On Thu, Jan 14, 2010 at 3:21 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov > wrote: > > I'm inclined to agree with you, but it does present a barrier to > those migrating. Are there any "migration considerations" documents > where we should mention this? Standards compliance notes in the > docs? Some form of this question seems to be asked frequently.... > > Many thanks for the quick and detailed responses. Looks like we'll have to stick with that work around for now. Part of the problem for us re: varchars is that we are using Cobol where trailing spaces are significant and litter all our data and queries. Thanks again. Charles