Thread: BUG #17258: Unexpected results in CHAR(1) data type

BUG #17258: Unexpected results in CHAR(1) data type

From

PG Bug reporting form

Date:

29 October 2021, 18:14:56

The following bug has been logged on the website:

Bug reference:      17258
Logged by:          David Calascibetta
Email address:      david@calascibetta.com
PostgreSQL version: 12.8
Operating system:   AWS RDS
Description:

My understanding is that a CHAR(1) column can never be 0 (zero) length.
I expect the results of these two SELECT statements to produce the same
result:

select length(substr('   '::varchar,1,1)), ascii(substr('
'::varchar,1,1));
produces==> 1    32   (as expected)

select length(substr('   '::char,1,1)), ascii(substr('   '::char,1,1));
produces==> 0     0     (not as expected)

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

"David G. Johnston"

Date:

29 October 2021, 18:47:57

On Friday, October 29, 2021, PG Bug reporting form <noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 17258
Logged by: David Calascibetta
Email address: david@calascibetta.com
PostgreSQL version: 12.8
Operating system: AWS RDS
Description:

My understanding is that a CHAR(1) column can never be 0 (zero) length.

https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_char.28n.29

David J.

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 18:57:51

> On Oct 29, 2021, at 11:14 AM, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> select length(substr('   '::varchar,1,1)), ascii(substr('
> '::varchar,1,1));
> produces==> 1    32   (as expected)
>
> select length(substr('   '::char,1,1)), ascii(substr('   '::char,1,1));
> produces==> 0     0     (not as expected)

There are four substr() functions defined in the standard distribution.  They are:

  substr(text, int4, int4)
  substr(text, int4)
  substr(bytea, int4, int4)
  substr(bytea, int4)

There is none defined directly for char(n).  Since "text" is the preferred type (not "bytea"), I expect a cast of your
''::char to text before entry to the function.  So the question is why casting ' '::char to text is different than '
'::text. The answer is that cast is implemented using rtrim1, which trims trailing space. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

FW: BUG #17258: Unexpected results in CHAR(1) data type

From

"David M. Calascibetta"

Date:

29 October 2021, 19:32:48

-----Original Message-----
From: David M. Calascibetta <david@calascibetta.com>
Sent: Friday, October 29, 2021 2:04 PM
To: 'Mark Dilger' <mark.dilger@enterprisedb.com>
Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type

Ok, but my example was just a simplified version of what is going on.
The actual problem stems from a CHAR(1) column data type that is behaving the same way.
I was just trying to create a super-simple example of the problem.
It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.

-----Original Message-----
From: Mark Dilger <mark.dilger@enterprisedb.com>
Sent: Friday, October 29, 2021 1:58 PM
To: david@calascibetta.com; PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type

> On Oct 29, 2021, at 11:14 AM, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> select length(substr('   '::varchar,1,1)), ascii(substr('
> '::varchar,1,1));
> produces==> 1    32   (as expected)
>
> select length(substr('   '::char,1,1)), ascii(substr('   '::char,1,1));
> produces==> 0     0     (not as expected)

There are four substr() functions defined in the standard distribution.  They are:

  substr(text, int4, int4)
  substr(text, int4)
  substr(bytea, int4, int4)
  substr(bytea, int4)

There is none defined directly for char(n).  Since "text" is the preferred type (not "bytea"), I expect a cast of your
''::char to text before entry to the function.  So the question is why casting ' '::char to text is different than '
'::text. The answer is that cast is implemented using rtrim1, which trims trailing space. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: FW: BUG #17258: Unexpected results in CHAR(1) data type

From

"David G. Johnston"

Date:

29 October 2021, 20:39:33

On Fri, Oct 29, 2021 at 1:17 PM David M. Calascibetta <david@calascibetta.com> wrote:

Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type

Ok, but my example was just a simplified version of what is going on.
The actual problem stems from a CHAR(1) column data type that is behaving the same way.
I was just trying to create a super-simple example of the problem.
It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.

This qualifies as a feature request, not a bug. One could write a version of substr that does what you expect (it probably wouldn't be named substr though) and takes in a character data type. It's just no one has, nor is likely to. Thus you are stuck using versions that take in text and you get the char-to-text casting side effects.

If you do octet_length(' ':: character(1)) it will return 1, not zero. So it indeed has a length one.

David J.

RE: FW: BUG #17258: Unexpected results in CHAR(1) data type

From

"David M. Calascibetta"

Date:

29 October 2021, 21:06:02

I disagree about it being a feature request.

I was only using substr to demonstrate the problem.

Here is the problem without using any functions:

create table x (a char(1));

insert into x values ('x');

insert into x values (' ');

select length(a) from x;

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Friday, October 29, 2021 3:40 PM
To: David@calascibetta.com
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: FW: BUG #17258: Unexpected results in CHAR(1) data type

On Fri, Oct 29, 2021 at 1:17 PM David M. Calascibetta <david@calascibetta.com> wrote:

Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type

Ok, but my example was just a simplified version of what is going on.
The actual problem stems from a CHAR(1) column data type that is behaving the same way.
I was just trying to create a super-simple example of the problem.
It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.

If you do octet_length(' ':: character(1)) it will return 1, not zero. So it indeed has a length one.

David J.

Re: FW: BUG #17258: Unexpected results in CHAR(1) data type

From

"David G. Johnston"

Date:

29 October 2021, 21:08:09

On Fri, Oct 29, 2021 at 2:06 PM David M. Calascibetta <david@calascibetta.com> wrote:

I disagree about it being a feature request.
I was only using substr to demonstrate the problem.
Here is the problem without using any functions:

create table x (a char(1));
insert into x values ('x');
insert into x values (' ');
select length(a) from x;
1
0

If length(a) isn't a function call then what is it?

David J.

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 21:16:06

> On Oct 29, 2021, at 12:32 PM, David M. Calascibetta <david@calascibetta.com> wrote:
>
> Ok, but my example was just a simplified version of what is going on.
> The actual problem stems from a CHAR(1) column data type that is behaving the same way.
> I was just trying to create a super-simple example of the problem.
> It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.

Please consider:

mark.dilger=# select ''::char(1) = '               '::char(50);
 ?column?
----------
 t
(1 row)

I infer that you expect a single byte of space to be compared against 50 bytes of space, and to be found unequal.
Postgresdoesn't treat trailing spaces in char(n) the way I infer that you expect.  Even without casting to another type
(andthe equality operator for char(n) does not cast to another type) the comparison logic intentionally ignores the
trailingspaces. 

Consider also:

mark.dilger=# select length('   '::char(50));
 length
--------
      0
(1 row)

Perhaps this behavior is nuts, but I say it is not a bug, just a peculiarity in how char(n) is defined to behave.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

RE: BUG #17258: Unexpected results in CHAR(1) data type

From

"David M. Calascibetta"

Date:

29 October 2021, 21:26:15

OK. I have a work-around so I'm alright.
I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there.
I was only reporting it to improve the product, but if you think this is appropriate behavior,
I'm good with it. You can close this issue.

David

-----Original Message-----
From: Mark Dilger <mark.dilger@enterprisedb.com>
Sent: Friday, October 29, 2021 4:16 PM
To: <David@calascibetta.com> <David@Calascibetta.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type

> On Oct 29, 2021, at 12:32 PM, David M. Calascibetta <david@calascibetta.com> wrote:
>
> Ok, but my example was just a simplified version of what is going on.
> The actual problem stems from a CHAR(1) column data type that is behaving the same way.
> I was just trying to create a super-simple example of the problem.
> It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.

Please consider:

mark.dilger=# select ''::char(1) = '               '::char(50);
 ?column?
----------
 t
(1 row)

I infer that you expect a single byte of space to be compared against 50 bytes of space, and to be found unequal.
Postgresdoesn't treat trailing spaces in char(n) the way I infer that you expect.  Even without casting to another type
(andthe equality operator for char(n) does not cast to another type) the comparison logic intentionally ignores the
trailingspaces. 

Consider also:

mark.dilger=# select length('   '::char(50));
 length
--------
      0
(1 row)

Perhaps this behavior is nuts, but I say it is not a bug, just a peculiarity in how char(n) is defined to behave.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 21:36:08

> On Oct 29, 2021, at 2:26 PM, David M. Calascibetta <david@calascibetta.com> wrote:
>
> OK. I have a work-around so I'm alright.

Glad to hear it.

> I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there.

I haven't studied the behavior of char(n) on other RDBMS products.  I'd be curious if the SQL spec says anything that
we'reviolating in this regard.  If so, we should at least have a warning in the docs about that.  (We already have a
warningabout how char(n) behaves, but nothing I see about the behavior being non-compliant.)  But I'm wondering if
otherRDBMS products really differ in this regard?  Are you perhaps thinking about how varchar(n) works?  

> I was only reporting it to improve the product, but if you think this is appropriate behavior,
> I'm good with it.

I tend to think of char(n) as a misfeature and avoid using it.  Based on your experience with other RDBMSs, would you
expectchar(n) and varchar(n) to behave the same or to behave differently?  In postgres, they are different, and
varchar(n)would behave more like you seem to want. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

RE: BUG #17258: Unexpected results in CHAR(1) data type

From

"David M. Calascibetta"

Date:

29 October 2021, 21:52:46

> I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there.

I haven't studied the behavior of char(n) on other RDBMS products.  I'd be curious if the SQL spec says anything that
we'reviolating in this regard.  If so, we should at least have a warning in the docs about that.  (We already have a
warningabout how char(n) behaves, but nothing I see about the behavior being non-compliant.)  But I'm wondering if
otherRDBMS products really differ in this regard?  Are you perhaps thinking about how varchar(n) works?  

In every other rdbms, if it's fixed length character, values will always be that fixed length.
PG behaves that way unless the value is blank. I believe this is in violation of the spec,
which is why I reported it. Even the PG doc says it will pad with blanks. It does not give
an exception for blank values.



> I was only reporting it to improve the product, but if you think this
> is appropriate behavior, I'm good with it.

I tend to think of char(n) as a misfeature and avoid using it.  Based on your experience with other RDBMSs, would you
expectchar(n) and varchar(n) to behave the same or to behave differently?  In postgres, they are different, and
varchar(n)would behave more like you seem to want. 

They should behave the same, in that if I insert a blank into a char(1) column, it should store a blank value.
If I insert a blank into a varchar(1) column, it should store a blank value.  Fixed length characters would pad
on the right, but it should not distort the value I entered in any other way. In this case, I enter a blank value and
PG removes it.  That is not appropriate behavior.

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

"David G. Johnston"

Date:

29 October 2021, 22:08:12

On Fri, Oct 29, 2021 at 2:59 PM David M. Calascibetta <david@calascibetta.com> wrote:

They should behave the same, in that if I insert a blank into a char(1) column, it should store a blank value.
If I insert a blank into a varchar(1) column, it should store a blank value. Fixed length characters would pad
on the right, but it should not distort the value I entered in any other way. In this case, I enter a blank value and
PG removes it. That is not appropriate behavior.

As implemented it has no way to distinguish between user-added trailing whitespace and padding-added trailing whitespace. So instead it just says "trailing whitespace is insignificant" and, further, insignificant whitespace is discarded when converting to a data type that doesn't have the concept of insignificant whitespace.

I don't necessarily agree with this either, but I'm in favor of the position that char is obsolete and should not be used in new development, nor should it be expected to be maintained.

David J.

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 22:09:46


> On Oct 29, 2021, at 2:52 PM, David M. Calascibetta <david@calascibetta.com> wrote:
>
> In every other rdbms, if it's fixed length character, values will always be that fixed length.
> PG behaves that way unless the value is blank. I believe this is in violation of the spec,
> which is why I reported it. Even the PG doc says it will pad with blanks. It does not give
> an exception for blank values.

Taken from the postgres docs:

> Values of type character are physically padded with spaces to the specified width n, and are stored and displayed
thatway. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values
oftype character. In collations where whitespace is significant, this behavior can produce unexpected results; for
exampleSELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though C locale would consider a space to
begreater than a newline. Trailing spaces are removed when converting a character value to one of the other string
types.Note that trailing spaces are semantically significant in character varying and text values, and when using
patternmatching, that is LIKE and regular expressions. 

The two parts of that which at least hint at the behavior are "However, trailing spaces are treated as semantically
insignificantand disregarded when comparing two values of type character." and "Trailing spaces are removed when
convertinga character value to one of the other string types." 

There aren't that many functions which directly accept a bpchar type (aka, a char(n)), so most of the functionality
involvingchar(n) will involve first casting to text before handing the resultant text off to a function, with the cast
triggeringthe "spaces are removed when converting" bit, and those functions that do directly take a bpchar type for
comparisonpurposes trigger the "semantically insignificant and disregarded" bit.  For practical purposes you will
frequentlyhit the one bit or the other.   

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

"David G. Johnston"

Date:

29 October 2021, 22:13:08

On Fri, Oct 29, 2021 at 3:09 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

and those functions that do directly take a bpchar type for comparison purposes trigger the "semantically insignificant and disregarded" bit. For practical purposes you will frequently hit the one bit or the other.

As I noted in a prior reply, octet_length(char) does the length computation with the padding spaces. So it is possible for char input functions to do the expected thing.

David J.

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 22:18:30

> On Oct 29, 2021, at 3:13 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> For practical purposes you will frequently hit the one bit or the other.
>
>
> As I noted in a prior reply, octet_length(char) does the length computation with the padding spaces.  So it is
possiblefor char input functions to do the expected thing. 

Yes, I saw that.  But there aren't that many functions like octet_length that do so.  If users coming from other RDBMSs
expectCHAR(1) to behave as David expects them to behave, it's cold comfort to say, "hang in there, you can still use
octet_length()on them!"  Better to say that they are going to be bitten by this expectation again and again, and
insteadchoose a different datatype (which you also said.) 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Tom Lane

Date:

29 October 2021, 22:23:27

Mark Dilger <mark.dilger@enterprisedb.com> writes:
> I haven't studied the behavior of char(n) on other RDBMS products.  I'd be curious if the SQL spec says anything that
we'reviolating in this regard. 

It's not a great approximation to the spec.  Postgres views trailing
spaces in a char(n) value as always being semantically insignificant,
where I think the spec treats them as insignificant only for purposes of
comparisons.  Even more to the point, the spec considers that PAD SPACE
is an attribute of *collations* not data types.  Back in the day we
didn't have collations, so the only way to even approximate that
behavior was to make it a data type property.  Now that we do have
collations, it'd be conceivable to reimplement all this in something
closer to the way the spec describes it.  But it'd be a lot of work,
and I'm not sure we'd accept such a patch even if somebody wrote it.
It'd almost inevitably break applications that are relying on the
existing behavior.

> I tend to think of char(n) as a misfeature and avoid using it.

Yeah, that.  I haven't seen any reason to use char(n) rather than
varchar(n) since punched cards stopped being a thing.  So it's hard
to summon the motivation to do a lot of work on that data type.

Perhaps somebody else will feel more motivated, but nobody's stepped
forward, and I wouldn't counsel holding your breath for it.

            regards, tom lane

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Mark Dilger

Date:

29 October 2021, 22:34:35


> On Oct 29, 2021, at 3:09 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> most of the functionality involving char(n)

...

> and those functions that do directly take a bpchar type for comparison purposes trigger the "semantically
insignificantand disregarded" bit 

I meant the "most of the functionality" qualifier to apply to the second part, meaning "most of the functions that
do..."but on re-reading, my grammar didn't accomplish that.  I now understand why David Johnston corrected me here: 

> On Oct 29, 2021, at 3:13 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> As I noted in a prior reply, octet_length(char) does the length computation with the padding spaces.  So it is
possiblefor char input functions to do the expected thing. 

You are correct, sir.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

RE: BUG #17258: Unexpected results in CHAR(1) data type

From

"David M. Calascibetta"

Date:

29 October 2021, 23:10:00

I understand your position. I've been there myself.
We will convert the CHAR(1) columns to VARCHAR and keep going.
I just thought it was strange and you should know about it.
Not a problem.
Thanks for your attention.

David

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Friday, October 29, 2021 5:23 PM
To: Mark Dilger <mark.dilger@enterprisedb.com>
Cc: <David@calascibetta.com> <David@Calascibetta.com>;
pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type

Mark Dilger <mark.dilger@enterprisedb.com> writes:
> I haven't studied the behavior of char(n) on other RDBMS products.  I'd be
curious if the SQL spec says anything that we're violating in this regard.

It's not a great approximation to the spec.  Postgres views trailing spaces
in a char(n) value as always being semantically insignificant, where I think
the spec treats them as insignificant only for purposes of comparisons.
Even more to the point, the spec considers that PAD SPACE is an attribute of
*collations* not data types.  Back in the day we didn't have collations, so
the only way to even approximate that behavior was to make it a data type
property.  Now that we do have collations, it'd be conceivable to
reimplement all this in something closer to the way the spec describes it.
But it'd be a lot of work, and I'm not sure we'd accept such a patch even if
somebody wrote it.
It'd almost inevitably break applications that are relying on the existing
behavior.

> I tend to think of char(n) as a misfeature and avoid using it.

Yeah, that.  I haven't seen any reason to use char(n) rather than
varchar(n) since punched cards stopped being a thing.  So it's hard to
summon the motivation to do a lot of work on that data type.

Perhaps somebody else will feel more motivated, but nobody's stepped
forward, and I wouldn't counsel holding your breath for it.

            regards, tom lane

Re: BUG #17258: Unexpected results in CHAR(1) data type

From

Marek Läll

Date:

30 October 2021, 11:23:01

Hi,

> I haven't studied the behavior of char(n) on other RDBMS products.

At school, at SQL language course, the following was teached:
CHAR( size) is a fixed length string of length "size" characters (not bytes). Whatever data you enter, space is added at the end.

I can offer Oracle behaviour.

SQL> create table test (
id NUMBER
, c2 char(2 char)
, c10 char(10 char)
);

Table TEST created.

SQL> insert into test values (1, ' ', ' ');
1 row inserted.

SQL> insert into test values (2, 'ä', 'ä');
1 row inserted.

SQL> select id, length(c2), length(c10) from test;
ID LENGTH(C2) LENGTH(C10)
-- ---------- -----------
1 2 10
2 2 10

2 rows selected.

-- LENGTH IN BYTES
SQL> select id, lengthb(c2), lengthb(c10) from test;
ID LENGTHB(C2) LENGTHB(C10)
-- ----------- ------------
1 2 10
2 3 11

2 rows selected.

-- LENGTH IN CHARS
SQL> select id, lengthc(c2), lengthc(c10) from test;
ID LENGTHC(C2) LENGTHC(C10)
-- ----------- ------------
1 2 10
2 2 10

2 rows selected.

-- DUMP OF REAL VALUE ON STORAGE
SQL> select id, dump(c10, 1015) from test;
ID DUMP(C10,1015)
-- -----------------------------------------------------------------------
1 Typ=96 Len=10 CharacterSet=AL32UTF8: 32,32,32,32,32,32,32,32,32,32
2 Typ=96 Len=11 CharacterSet=AL32UTF8: 195,164,32,32,32,32,32,32,32,32,32

2 rows selected.

-- ARE THEY EQUAL? YES
SQL> select * from test where c2 = c10;
ID C2 C10
-- -- ----------
1
2 ä ä

2 rows selected.

SQL> select * from test where c10 = 'ä ';
ID C2 C10
-- -- ----------
2 ä ä

1 rows selected.

SQL> select banner_full from v$version;
BANNER_FULL
-----------------------------------------------------------------------------------------
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.2.0.0.0

Regards

Marek