Thread: Substring auto trim

Substring auto trim

From
"Charles O'Farrell"
Date:
Hi guys,

I'm not sure whether this a really dumb question, but I'm curious as to what
might be the problem.

We have a column 'foo' which is of type character (not varying).

select substr(foo, 1, 10) from bar

The result of this query are values whose trailing spaces have been trimmed
automatically. This causes incorrect results when comparing to a value that
may contain trailing spaces.

select * from bar where substr(foo, 1, 4) = 'AB  '

I should mention that we normally run Oracle and DB2 (and have done for many
years), but I have been pushing for Postgres as an alternative.
Fortunately this is all handled through Hibernate, and so for now I have
wrapped the substr command in rpad which seems to do the trick.

Any light you can shed on this issue would be much appreciated.

Cheers,

Charles O'Farrell

PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu
4.4.1-4ubuntu8) 4.4.1, 32-bit

Re: Substring auto trim

From
Pavel Stehule
Date:
Hello

2010/1/13 Charles O'Farrell <charleso@gmail.com>:
> Hi guys,
>
> I'm not sure whether this a really dumb question, but I'm curious as to w=
hat
> might be the problem.
>
> We have a column 'foo' which is of type character (not varying).
>
> select substr(foo, 1, 10) from bar
>
> The result of this query are values whose trailing spaces have been trimm=
ed
> automatically. This causes incorrect results when comparing to a value th=
at
> may contain trailing spaces.
>
> select * from bar where substr(foo, 1, 4) =3D 'AB=C2=A0 '
>

You have to write C function substr for type "any" :( Because "char"
and char(n) are two different types, and you cannot to write function
for char(n)


> I should mention that we normally run Oracle and DB2 (and have done for m=
any
> years), but I have been pushing for Postgres as an alternative.
> Fortunately this is all handled through Hibernate, and so for now I have
> wrapped the substr command in rpad which seems to do the trick.
>
> Any light you can shed on this issue would be much appreciated.
>

Function substr has first parameter of type "text". When pg call this
function, then it does conversion from char(x) to text.

Regards
Pavel Stehule


> Cheers,
>
> Charles O'Farrell
>
> PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubun=
tu
> 4.4.1-4ubuntu8) 4.4.1, 32-bit
>

Re: Substring auto trim

From
Pavel Stehule
Date:
2010/1/13 Pavel Stehule <pavel.stehule@gmail.com>:
> Hello
>
> 2010/1/13 Charles O'Farrell <charleso@gmail.com>:
>> Hi guys,
>>
>> I'm not sure whether this a really dumb question, but I'm curious as to =
what
>> might be the problem.
>>
>> We have a column 'foo' which is of type character (not varying).
>>
>> select substr(foo, 1, 10) from bar
>>
>> The result of this query are values whose trailing spaces have been trim=
med
>> automatically. This causes incorrect results when comparing to a value t=
hat
>> may contain trailing spaces.
>>
>> select * from bar where substr(foo, 1, 4) =3D 'AB=C2=A0 '
>>
>
> You have to write C function substr for type "any" :( Because "char"
> and char(n) are two different types, and you cannot to write function
> for char(n)
>
>
>> I should mention that we normally run Oracle and DB2 (and have done for =
many
>> years), but I have been pushing for Postgres as an alternative.
>> Fortunately this is all handled through Hibernate, and so for now I have
>> wrapped the substr command in rpad which seems to do the trick.
>>
>> Any light you can shed on this issue would be much appreciated.
>>

I thing, so there is workaround,

create or replace function substr(character, int, int) returns character as=
 $$
select substr($1::cstring::text,$2,$3)
$$ language sql;

postgres=3D# create table f(a character(5));
CREATE TABLE
postgres=3D# insert into f values('a'),('ab'),('abc');
INSERT 0 3
postgres=3D# select * from f;
   a
-------
 a
 ab
 abc
(3 rows)

postgres=3D# select * from f where substr(a,1,3) =3D 'a  ';
   a
-------
 a
(1 row)

postgres=3D# select * from f where substr(a,1,3) =3D 'ab  ';
   a
-------
 ab
(1 row)

Regards
Pavel Stehule

>
> Function substr has first parameter of type "text". When pg call this
> function, then it does conversion from char(x) to text.
>
> Regards
> Pavel Stehule
>
>
>> Cheers,
>>
>> Charles O'Farrell
>>
>> PostgreSQL 8.4.2 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubu=
ntu
>> 4.4.1-4ubuntu8) 4.4.1, 32-bit
>>
>

Re: Substring auto trim

From
Tom Lane
Date:
"Charles O'Farrell" <charleso@gmail.com> writes:
> We have a column 'foo' which is of type character (not varying).

> select substr(foo, 1, 10) from bar

> The result of this query are values whose trailing spaces have been trimmed
> automatically. This causes incorrect results when comparing to a value that
> may contain trailing spaces.

What's the data type of the value being compared to?  I get, for instance,

postgres=# select substr('ab  '::char(4), 1, 4) = 'ab  '::char(4);
 ?column?
----------
 t
(1 row)

The actual value coming out of the substr() is indeed just 'ab',
but that ought to be considered equal to 'ab  ' anyway in char(n)
semantics.

Postgres considers that trailing blanks in a char(n) value are
semantically insignificant, so it strips them when converting to a type
where they would be significant (ie, text or varchar).  What's happening
in this scenario is that substr() is defined to take and return text,
so the stripping happens before substr ever sees it.

As Pavel noted, you could possibly work around this particular case by
defining a variant of substr() that takes and returns char(n), but on
the whole I'd strongly advise switching over to varchar/text if
possible.  The semantics of char(n) are so weird/braindamaged that
it's best avoided.

BTW, if you do want to use the workaround, this seems sufficient:

create function substr(char,int,int) returns char
  strict immutable language internal as 'text_substr' ;

It's the same C code, you're just avoiding the coercion on input.

            regards, tom lane

Re: Substring auto trim

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> What's the data type of the value being compared to?  I get, for
> instance,
>
> postgres=# select substr('ab  '::char(4), 1, 4) = 'ab  '::char(4);
>  ?column?
> ----------
>  t
> (1 row)

This looks like another situation where we're running into trouble
because of non-standard behavior when people might be expecting
something consistent with other products and the explicit language
in the standard.

Quoting from section 5.3 of "WG3:HBA-003 H2-2003-305 August, 2003
(ISO-ANSI Working Draft) Foundation (SQL/Foundation)":

| 13) The declared type of a <character string literal> is
|     fixed-length character string. The length of a <character
|     string literal> is the number of <character representation>s
|     that it contains. Each <quote symbol> contained in <character
|     string literal> represents a single <quote> in both the value
|     and the length of the <character string literal>. The two
|     <quote>s contained in a <quote symbol> shall not be separated
|     by any <separator>.
|
|     NOTE 72 * <character string literal>s are allowed to be
|     zero-length strings (i.e., to contain no characters) even
|     though it is not permitted to declare a <data type> that is
|     CHARACTER with <length> 0 (zero).

Based on that, the cast of the literals to char(4) in your example
should not be needed.  I don't know if there's any reasonable fix
or if this should be handled with a doc change or FAQ entry.

-Kevin

Re: Substring auto trim

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What's the data type of the value being compared to?  I get, for
>> instance,
>>
>> postgres=# select substr('ab  '::char(4), 1, 4) = 'ab  '::char(4);

> This looks like another situation where we're running into trouble
> because of non-standard behavior when people might be expecting
> something consistent with other products and the explicit language
> in the standard.

If we were to change that so that 'ab  ' were implicitly typed as
char(4), then we'd start getting bug reports from people complaining
that "select 'ab' = 'ab  '" yields true.  I remain of the opinion that
char(n) is so hopelessly brain-damaged that we should be very careful
to NOT bring it into our mainstream behavior.

            regards, tom lane

Re: Substring auto trim

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

>> This looks like another situation where we're running into
>> trouble because of non-standard behavior when people might be
>> expecting something consistent with other products and the
>> explicit language in the standard.
>
> If we were to change that so that 'ab  ' were implicitly typed as
> char(4), then we'd start getting bug reports from people
> complaining that "select 'ab' = 'ab  '" yields true.  I remain of
> the opinion that char(n) is so hopelessly brain-damaged that we
> should be very careful to NOT bring it into our mainstream
> behavior.

I'm inclined to agree with you, but it does present a barrier to
those migrating.  Are there any "migration considerations" documents
where we should mention this?  Standards compliance notes in the
docs?  Some form of this question seems to be asked frequently....

-Kevin

Re: Substring auto trim

From
"Charles O'Farrell"
Date:
On Thu, Jan 14, 2010 at 3:21 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov
> wrote:

>
> I'm inclined to agree with you, but it does present a barrier to
> those migrating.  Are there any "migration considerations" documents
> where we should mention this?  Standards compliance notes in the
> docs?  Some form of this question seems to be asked frequently....
>
>
Many thanks for the quick and detailed responses. Looks like we'll have to
stick with that work around for now.

Part of the problem for us re: varchars is that we are using Cobol where
trailing spaces are significant and litter all our data and queries.

Thanks again.

Charles