Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation - Mailing list pgsql-bugs

From Nicolas Gouteux
Subject Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation
Date
Msg-id CANyQZ-60Y0wYMNTzSN6O=XkEVgrfvMmKAR7nDv=jgaF1apEa2w@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation  (Félix GERZAGUET <felix.gerzaguet@gmail.com>)
Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-bugs
Hi David

Thanks a lot for the explanation.

Don't get me wrong, I almost never used the char datatype for my almost 40 years of DB programming.

Char is a reminiscence from the time when VAX and IBM machines had a single mantra: fixed length data
The fact that white spaces are insignificant is all good. Nobody would like to pad these days.
I hold for proof that client code is full of ltrim() rtrim() all the time...

I was just trying to show more junior programmers the difference between the 'assumed' padded char and the more modern 'trimming' varchar
So imagine my surprise when both length() and concatenation() yielded the exactly opposite behavior than expected!

I come from Sybase & Oracle, and I can tell you data length(char_type) always return the length of the field.

I don't have any issue with this discrepancy with other vendors, but I believe it's good to know (and maybe advertise in the docs?

Thanks again for taking the time to respond


Nicolas Gouteux | Sonar

https://sonarsource.comAre you using SonarLint in your IDE?



On Wed, 9 Aug 2023 at 16:38, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 10 Aug 2023 at 02:05, Nicolas Gouteux
<nicolas.gouteux@sonarsource.com> wrote:
> Sorry, but I still do not get it:

I think the key is this sentence from the documentation:

"Trailing spaces are removed when converting a character value to one
of the other string types."

Your concatenation example from earlier required that the type be
implicitly cast to another type as we have no char-to-char
concatenation operator, per:

postgres=# select oprname,oprleft::regtype,oprright::regtype,oprcode
from pg_operator where oprname = '||';
 oprname |      oprleft       |      oprright      |     oprcode
---------+--------------------+--------------------+-----------------
 ||      | anycompatiblearray | anycompatible      | array_append
 ||      | anycompatible      | anycompatiblearray | array_prepend
 ||      | anycompatiblearray | anycompatiblearray | array_cat
 ||      | text               | text               | textcat
 ||      | bit varying        | bit varying        | bitcat
 ||      | bytea              | bytea              | byteacat
 ||      | text               | anynonarray        | textanycat
 ||      | anynonarray        | text               | anytextcat
 ||      | tsvector           | tsvector           | tsvector_concat
 ||      | tsquery            | tsquery            | tsquery_or
 ||      | jsonb              | jsonb              | jsonb_concat
(11 rows)

Concatenating two char(10)s would just use textcat(), so the above
line from the doc applies since type conversion is required.

> - char type adds padding up to its length
> - select length(charcol), char_length(charcoal)
> both yield 1

There is a length function (bpcharlen) for char, per:

postgres=# select proargtypes[0]::Regtype,prosrc from pg_proc where
proname = 'length';
 proargtypes |       prosrc
-------------+--------------------
 text        | textlen
 character   | bpcharlen
 lseg        | lseg_length
 path        | path_length
 bytea       | length_in_encoding
 bit         | bitlength
 bytea       | byteaoctetlen
 tsvector    | tsvector_length
(8 rows)

However, that function does:

/* get number of bytes, ignoring trailing spaces */
len = bcTruelen(arg);

and that wasn't by accident, per:

commit f27976c85b1fb9002727cce65b9f9567e158f754
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Feb 1 06:27:48 2004 +0000

    Make length() disregard trailing spaces in char(n) values, per discussion
    some time ago and recent patch from Gavin Sherry.  Update documentation
    to point out that trailing spaces are insignificant in char(n).

unfortunately, we didn't link to discussions in commit messages back
then.  It might be worth you searching the archives shortly before
that date to see if anything interesting comes up.

As mentioned by David, there's not much call for using char(N) in
PostgreSQL.  I don't know the history, but I always imagined char(N)
existence was owed to improved internal optimizations in RDBMS
implementations that might have been possible if the tuples were
fixed-sized.  We have no such advantages in PostgreSQL as even a
char(N) will be stored as a variable length field. In theory, we could
take some advantage in that as tuple deformation becomes less
efficient for columns that come after a variable length field due to
the offset into the tuple not being fixed, however, because we store
char(N)s as variable length, we can't take advantage of that and it's
too late as changing it would change the binary format of the type.

David

pgsql-bugs by date:

Previous
From: David Rowley
Date:
Subject: Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation
Next
From: Nicolas Gouteux
Date:
Subject: Re: BUG #18051: char(N) and varchar(N) behave opposite to the documentation