Re: char(n) to varchar or text conversion should strip trailing spaces - Mailing list pgsql-hackers

From Tom Lane
Subject Re: char(n) to varchar or text conversion should strip trailing spaces
Date
Msg-id 839.1037644958@sss.pgh.pa.us
Whole thread Raw
In response to Re: char(n) to varchar or text conversion should strip trailing spaces  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I said:
> "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> One alternate possible approach would maybe be to change the on-disk
>> representation to really be binary compatible and change the input 
>> output and operator functions ?

> Seems like a great idea to me.

On further thought I've got some reservations about it.  The main
problem is that the physical contents of a char(n) Datum wouldn't
provide the complete semantic meaning: you must know the typmod as well
to know the number of padding spaces that are supposed to be there.
We have special provisions to allow input and output functions to know
the typmod, but in more general cases functions will not receive a
typmod, and in any case they cannot deliver a typmod.  So for any
operation that wanted to behave as though the padding spaces are real,
there'd be a problem.

We could maybe play some games with having char(n) values be expanded
(space-padded) during expression evaluation and then trimmed for
storage, but this strikes me as awfully messy, not to mention redundant
with the compression done by TOAST.


Also: I've been reading through the spec in more detail, and what I now
realize is that they expect trailing blanks to be ignored by default in
both char(n) and varchar(n) comparisons!  In fact, it's not really the
datatype that determines this, but the collation attribute, and they
specify that the default collation must have the PAD SPACE attribute
(which essentially means that trailing spaces are not significant).
What's more, AFAICT padding spaces in char(n) are treated as real data
by every operation except comparison --- for example, they are real
data in concatenation.

I don't think we really want to meet the letter of the spec here :-(
It seems quite schizoid to treat pad spaces as real data for everything
except comparison.  Certainly I do not want to do that for varchar or
text datatypes.

The idea of having char(n)-to-text conversion strip trailing blanks
still appeals to me, but I have to withdraw the claim that it'd improve
our spec compliance; it wouldn't.


I'm now wondering whether it wouldn't be better to leave the data
representation as-is (padding spaces are stored), and still allow binary
compatibility both ways, but add a collection of duplicate pg_proc and
pg_operator entries so that char-ness is preserved where appropriate.
For example, we'd need both these pg_proc entries for UPPER():upper(text) returns textupper(character) returns
character
They could point at the same C routine, but the parser would select the
first when the input is text or varchar, and the second when the input
is character.  This would solve the original complaint aboutupper('foo   '::char(6)) = 'FOO'
needing to yield TRUE.  The extra pg_proc entries would be a tad
tedious, but I think we'd only need a couple dozen to satisfy the spec's
requirements.  Some (perhaps not all) of these functions would need to
be duplicated:

btrim(text)
btrim(text,text)
convert(text,name)
convert(text,name,name)
convert_using(text,text)
initcap(text)
lower(text)
lpad(text,integer)
lpad(text,integer,text)
ltrim(text)
ltrim(text,text)
max(text)
min(text)
overlay(text,text,integer)
overlay(text,text,integer,integer)
repeat(text,integer)
replace(text,text,text)
rpad(text,integer)
rpad(text,integer,text)
rtrim(text)
rtrim(text,text)
split_part(text,text,integer)
substr(text,integer)
substr(text,integer,integer)
substring(text,integer)
substring(text,integer,integer)
text_larger(text,text)
text_smaller(text,text)
textcat(text,text)
translate(text,text,text)
upper(text)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [CYGWIN] ipc-daemon
Next
From: Tom Lane
Date:
Subject: Re: char(n) to varchar or text conversion should strip