Thread: Blank-padding (was: Oracle buys Innobase)

Blank-padding (was: Oracle buys Innobase)

From
"Dean Gibson (DB Administrator)"
Date:
On 2005-10-21 09:47, Tom Lane wrote:
> Alex Turner <armtuk@gmail.com> writes:
>   
>> It appears that casting to a char() causes spaces to be stripped (ignored) from the string:
>>     
> mls=# select length('123 '::char(8));
> length
> --------
> 3
> (1 row)
>
>   
>> I'm not sure about anyone else, but I would personaly consider that a bug?
>>     
>
> No, it's a feature, as per extensive discussion some time ago when we made it do that.  The general rule is that
trailingspaces in a char(n) are semantically insignificant.
 
>
>             regards, tom lane
>   

I remember that discussion, and I was for the change.  However, upon 
doing some testing after reading the above, I wonder if the 
blank-stripping isn't too aggressive.  I have a CHAR(6) field (say, 
named Z) that has "abc   " in it.  Suppose I want to append "x" to Z, 
with any leading spaces in Z PRESERVED.  The following do not work in 8.0.4:

select Z || 'x';

select Z::char(6) || 'x';

select Z::varchar(6) || 'x';

select (Z || '     ')::char(6) || 'x';

There are only two ways I've found:

select rpad( Z, 6) || 'x'; -- but "rpad" is apparently not a 
SQL-standard function.

select cast (Z || '      ' as varchar(6)) || 'x';  -- hokey but 
SQL-compliant

Is there something I'm missing???

-- Dean




Re: Blank-padding (was: Oracle buys Innobase)

From
Tom Lane
Date:
"Dean Gibson (DB Administrator)" <postgresql4@ultimeth.com> writes:
> I remember that discussion, and I was for the change.  However, upon 
> doing some testing after reading the above, I wonder if the 
> blank-stripping isn't too aggressive.  I have a CHAR(6) field (say, 
> named Z) that has "abc   " in it.  Suppose I want to append "x" to Z, 
> with any leading spaces in Z PRESERVED.

(You meant trailing spaces, I assume.)  Why exactly would you want to do
that?  You decided by your choice of datatype that the trailing spaces
weren't significant.  This gripe seems to me exactly comparable to
complaining if a numeric datatype doesn't remember how many trailing
zeroes you typed after the decimal point.  Those zeroes aren't
semantically significant, so you have no case.
        regards, tom lane


Re: Blank-padding

From
Chris Travers
Date:
Tom Lane wrote:

>"Dean Gibson (DB Administrator)" <postgresql4@ultimeth.com> writes:
>  
>
>>I remember that discussion, and I was for the change.  However, upon 
>>doing some testing after reading the above, I wonder if the 
>>blank-stripping isn't too aggressive.  I have a CHAR(6) field (say, 
>>named Z) that has "abc   " in it.  Suppose I want to append "x" to Z, 
>>with any leading spaces in Z PRESERVED.
>>    
>>
>
>(You meant trailing spaces, I assume.)  Why exactly would you want to do
>that?  You decided by your choice of datatype that the trailing spaces
>weren't significant.
>
I once built a telecom billing app where this might be important (fixed 
length fields).  Lets say you have fixed length fields defined as 
char(n) datatypes.  You may want to build a query to generate billing 
records like:
select field1 || field2 || field3 || field4 || field5 ... AS bill_record 
FROM lec_billing_entries;

It seels to me that I would expect trailing spaces to be preserved in 
these cases.  Having an implicit rtrim function is asking for problems.  
Personally I would rather have to call rtrim explicitly than have the 
backend treat the concatenation differently than if I do it on the client.

>  This gripe seems to me exactly comparable to
>complaining if a numeric datatype doesn't remember how many trailing
>zeroes you typed after the decimal point.  Those zeroes aren't
>semantically significant, so you have no case.\
>  
>
My only gripe here is that the implicit rtrimming is going to cause 
problems in cases where you are trying to do things with fixed-length 
fields, which is really where one is likely to use bpchar anyway.  It is 
not a showstopper, but I can see why some people don't like it.  But 
can't please everyone :-) 

Best Wishes,
Chris Travers
Metatron Technology Consulting


Re: Blank-padding

From
"Dean Gibson (DB Administrator)"
Date:
On 2005-10-21 20:23, Tom Lane wrote:
> "Dean Gibson (DB Administrator)" <postgresql4@ultimeth.com> writes:
>   
>> I remember that discussion, and I was for the change.  However, upon 
>> doing some testing after reading the above, I wonder if the 
>> blank-stripping isn't too aggressive.  I have a CHAR(6) field (say, 
>> named Z) that has "abc   " in it.  Suppose I want to append "x" to Z, 
>> with any leading (oops, trailing) spaces in Z PRESERVED.
>>     
>
> (You meant trailing spaces, I assume.)  Why exactly would you want to do that?  You decided by your choice of
datatypethat the trailing spaces weren't significant.  This gripe seems to me exactly comparable to complaining if a
numericdatatype doesn't remember how many trailing zeroes you typed after the decimal point.  Those zeroes aren't
semanticallysignificant, so you have no case.
 
>
>             regards, tom lane
>   

It's one thing, as with floating point values, where the internal 
representation is identical (1.0 stored identically to 1.00), and 
another thing where the internal representation is different (a la 
strings storing 'abc   ' differently from 'abc'.  While programming 
languages may differ as to how they compare those two strings, every one 
that I'm familiar with does not strip trailing spaces when concatenating.

It's not an issue with me, but the 8.0 concatenation does violate the 
"Principle of Least Astonishment".  As does treating 'abc   ' and 'abc' 
different during comparisons (which 8.0 fixed).

So my tentative argument would be, strip trailing spaces on CHAR(n) 
comparisons but not on concatenation or assignment, because you lost 
information.  You can always strip if you need to (as you do in other 
languages).

As I said, this issue doesn't presently affect my databases, it's just 
an observation.

-- Dean


Re: Blank-padding

From
Scott Marlowe
Date:
On Sat, 2005-10-22 at 00:39, Chris Travers wrote:
> Tom Lane wrote:
> 
> >"Dean Gibson (DB Administrator)" <postgresql4@ultimeth.com> writes:
> >  
> >
> >>I remember that discussion, and I was for the change.  However, upon 
> >>doing some testing after reading the above, I wonder if the 
> >>blank-stripping isn't too aggressive.  I have a CHAR(6) field (say, 
> >>named Z) that has "abc   " in it.  Suppose I want to append "x" to Z, 
> >>with any leading spaces in Z PRESERVED.
> >>    
> >>
> >
> >(You meant trailing spaces, I assume.)  Why exactly would you want to do
> >that?  You decided by your choice of datatype that the trailing spaces
> >weren't significant.
> >
> I once built a telecom billing app where this might be important (fixed 
> length fields).  Lets say you have fixed length fields defined as 
> char(n) datatypes.  You may want to build a query to generate billing 
> records like:
> select field1 || field2 || field3 || field4 || field5 ... AS bill_record 
> FROM lec_billing_entries;
> 
> It seels to me that I would expect trailing spaces to be preserved in 
> these cases.  Having an implicit rtrim function is asking for problems.  
> Personally I would rather have to call rtrim explicitly than have the 
> backend treat the concatenation differently than if I do it on the client.

If I rememberate correctificantly, this problem is or was caused by the
|| operator using the same internal logic for varchar / text AND char. 
Tom, is it reasonable / doable to have || have a different set of
internal functions for each of those types.