Thread: Bug in concat operator for Char?
People, Severity: Serious Annoyance Reproducable on: 7.4.1, 7.4.3, 7.5devel Summary: Concatination of CHAR() data type field seems to result in a TEXT value instead of a CHAR value. Is there a reason for this? Example: webmergers=> select '"'::char(4) || ''::char(4) || '"'::char(4); ?column? ---------- "" (1 row) Depending on the spec, it seems to me that the above should result either in a char(4) of " " or a char(12) of " " . But we get a text value. Is this the SQL spec? Is there another reason for this behavior? -- -Josh Berkus Aglio Database Solutions San Francisco
Folks, Also: This behavior was different in 7.1: [11:02:45] <DarcyB> darcy=# select '1'::char(4) || '-'::char(1); [11:02:45] <DarcyB> ?column? [11:02:45] <DarcyB> ---------- [11:02:45] <DarcyB> 1 - [11:02:45] <DarcyB> (1 row) [11:02:49] <DarcyB> on 7.1 And there's apparently either an issue, or a change in behavior, in CHAR for 7.5: [11:03:25] <DarcyB> darcy=# SELECT length('1'::char(4)); [11:03:25] <DarcyB> length [11:03:25] <DarcyB> -------- [11:03:25] <DarcyB> 1 [11:03:25] <DarcyB> (1 row) [11:03:29] <DarcyB> is 7.5 pg743=> select length('1'::char(4)); length -------- 4 (1 row) (on 7.4.3) Are these changes intentional, or are they bugs? -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Are these changes intentional, Yes. We've been moving more and more steadily towards the notion that trailing spaces in char(n) values are insignificant noise. If you think that trailing spaces are significant, you shouldn't be using char(n) to store them. regards, tom lane
This means that there is no more difference between CHAR(N) and VARCHAR(N). To bad... '1 ' sould be different from '1'. Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >>Are these changes intentional, > > > Yes. We've been moving more and more steadily towards the notion that > trailing spaces in char(n) values are insignificant noise. If you think > that trailing spaces are significant, you shouldn't be using char(n) > to store them. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Wed, 21 Jul 2004, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > Are these changes intentional, > > Yes. We've been moving more and more steadily towards the notion that > trailing spaces in char(n) values are insignificant noise. If you think > that trailing spaces are significant, you shouldn't be using char(n) > to store them. Well, the problem here is that technically we're returning the wrong type. We should be returning a char(l1+l2) rather than a text for a char concatenate, but similarly to the recent complaint about numerics, we don't really have a fully proper way to do that and it seems non-trivial.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Wed, 21 Jul 2004, Tom Lane wrote: >> Yes. We've been moving more and more steadily towards the notion that >> trailing spaces in char(n) values are insignificant noise. If you think >> that trailing spaces are significant, you shouldn't be using char(n) >> to store them. > Well, the problem here is that technically we're returning the wrong type. > We should be returning a char(l1+l2) rather than a text for a char > concatenate, but similarly to the recent complaint about numerics, we > don't really have a fully proper way to do that and it seems non-trivial. Well, it'd be trivial to implement a char || char yielding char operator; it could just point to the existing textcat function and you'd get what you want. (It would come out as char(-1), ie unspecified length, but I'm not buying into doing the kind of analysis it would take to predict the length.) The real question in my mind is whether that would be more or less consistent with the behavior in other cases. Food for thought: in 7.4, regression=# select ('X '::char) = ('X'::char); ?column? ---------- t (1 row) regression=# select ('Y '::char) = ('Y'::char); ?column? ---------- t (1 row) regression=# select ('X '::char || 'Y '::char) = ('X'::char || 'Y'::char); ?column? ---------- t (1 row) If we change || as is proposed in this thread, then the last case would yield 'false', because the first concatenation would yield 'X Y ' which is not equal to 'XY' no matter what you think about trailing spaces. I find it a bit disturbing that the concatenation of equal values would yield unequal values. IMHO the bottom line here is that the SQL-spec behavior of type char(N) is completely brain-dead. Practically all of the questions in this area would go away if people used varchar(N) or text to store their data. regards, tom lane
On Wed, 21 Jul 2004, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > On Wed, 21 Jul 2004, Tom Lane wrote: > >> Yes. We've been moving more and more steadily towards the notion that > >> trailing spaces in char(n) values are insignificant noise. If you think > >> that trailing spaces are significant, you shouldn't be using char(n) > >> to store them. > > > Well, the problem here is that technically we're returning the wrong type. > > We should be returning a char(l1+l2) rather than a text for a char > > concatenate, but similarly to the recent complaint about numerics, we > > don't really have a fully proper way to do that and it seems non-trivial. > > Well, it'd be trivial to implement a char || char yielding char > operator; it could just point to the existing textcat function and > you'd get what you want. (It would come out as char(-1), ie unspecified > length, but I'm not buying into doing the kind of analysis it would take > to predict the length.) The real question in my mind is whether that The reason that to do it completely means knowing the length comes from case and union afaics. Both of these need to do something consistent with the lengths. case when <blah> then 'f'::char(2) || 'g'::char(2) else 'f'::char(3) || 'g'::char(3) end should return a consistent length char no matter which branch is taken on any given row. This was the basic complaint with numeric in the -sql thread, we return the "correct" actual numeric values with proper seeming precision and scale, but if you then case two of these that gave different precision and scale, you'd get inconsistent scale in the case output. > Food for thought: in 7.4, > > regression=# select ('X '::char) = ('X'::char); > ?column? > ---------- > t > (1 row) > > regression=# select ('Y '::char) = ('Y'::char); > ?column? > ---------- > t > (1 row) > > regression=# select ('X '::char || 'Y '::char) = ('X'::char || 'Y'::char); > ?column? > ---------- > t > (1 row) > > If we change || as is proposed in this thread, then the last case would > yield 'false', because the first concatenation would yield 'X Y ' > which is not equal to 'XY' no matter what you think about trailing > spaces. I find it a bit disturbing that the concatenation of equal > values would yield unequal values. That is somewhat bad, yeah. > IMHO the bottom line here is that the SQL-spec behavior of type char(N) > is completely brain-dead. Practically all of the questions in this area > would go away if people used varchar(N) or text to store their data. It is fairly wierd, yes. I'm not sure if the spec lets you, but a NO PAD default character set probably would have made this simpler, by not requiring that 'Y'::char(4) is equal to 'Y'::char(2), but it's really too late to change that now in any case.
Tom Lane wrote: > Food for thought: in 7.4, > > regression=# select ('X '::char) = ('X'::char); > ?column? > ---------- > t > (1 row) > > regression=# select ('Y '::char) = ('Y'::char); > ?column? > ---------- > t > (1 row) > > regression=# select ('X '::char || 'Y '::char) = ('X'::char || 'Y'::char); > ?column? > ---------- > t > (1 row) > > If we change || as is proposed in this thread, then the last case would > yield 'false', because the first concatenation would yield 'X Y ' > which is not equal to 'XY' no matter what you think about trailing > spaces. I find it a bit disturbing that the concatenation of equal > values would yield unequal values. Well this indicates that the first two examples are questionable. 'X ' is quite-the-same as 'X', but not really-the-same. CREATE OR REPLACE FUNCTION toms_name() RETURNS char(50) as $BODY$ DECLARE fullname char(50); DECLARE firstname char(50) := 'Tom'; DECLARE secondname char(50) := 'G'; DECLARE lastname char(50) := 'Lane'; BEGIN fullname := firstname; IF secondname != '' THEN IF fullname != '' THEN fullname := fullname || ' '; END IF; fullname := fullname || secondname; END IF; IF fullname != '' THEN fullname := fullname || ' '; END IF; fullname := fullname || lastname; RETURN fullname; END; $BODY$ LANGUAGE 'plpgsql' I find the result of this function quite surprising, and certainly not yielding what was intended (yes, this can avoided, I know). Surprise is getting bigger, if fullname is declared as text... > IMHO the bottom line here is that the SQL-spec behavior of type char(N) > is completely brain-dead. Just for COBOL's sake, I suppose. Regards, Andreas
On July 21, 2004 08:22 am, Tom Lane wrote: > > IMHO the bottom line here is that the SQL-spec behavior of type char(N) > is completely brain-dead. Practically all of the questions in this area > would go away if people used varchar(N) or text to store their data. > > regards, tom lane For reference sake oracle treats it as follows: <Dorm> SQL> create table dummy (value char(4)); <Dorm> Table created. <Dorm> SQL> insert into dummy values ('A'); <Dorm> 1 row created. <Dorm> SQL> commit; <Dorm> Commit complete. <Dorm> SQL> select value||value from dummy; <Dorm> VALUE||V <Dorm> -------- <Dorm> A A <Dorm> SQL> select length(value||value) from dummy; <Dorm> LENGTH(VALUE||VALUE) <Dorm> -------------------- <Dorm> 8 -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> Food for thought: in 7.4, >> >> regression=# select ('X '::char) = ('X'::char); >> ?column? >> ---------- >> t >> (1 row) >> >> regression=# select ('Y '::char) = ('Y'::char); >> ?column? >> ---------- >> t >> (1 row) >> >> regression=# select ('X '::char || 'Y '::char) = ('X'::char || 'Y'::char); >> ?column? >> ---------- >> t >> (1 row) >> >> If we change || as is proposed in this thread, then the last case would >> yield 'false', because the first concatenation would yield 'X Y ' >> which is not equal to 'XY' no matter what you think about trailing >> spaces. I find it a bit disturbing that the concatenation of equal >> values would yield unequal values. > Well this indicates that the first two examples are questionable. Indeed, but AFAICS this behavior is mandated by the SQL standard. (Note we are interpreting char(N) as always having the PAD SPACE behavior, though the spec really wants us to associate that with a collation instead.) regards, tom lane