Thread: Varchar standard compliance

Varchar standard compliance

From: Peter Eisentraut
Currently, CHAR is correctly interpreted as CHAR(1), but VARCHAR is
incorrectly interpreted as VARCHAR(<infinity>).  Any reason for that,
besides the fact that it of course makes much more sense than VARCHAR(1)?

Additionally, neither CHAR nor VARCHAR seem to bark on too long input,
they just truncate silently.

I'm wondering because I have to decide: should the bit types be made to
imitate this incorrect behaviour, or should they start out correctly?
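A minimal psql sketch of the two behaviours described above (table and column names are made up for illustration; the silent truncation shown is the behaviour reported in this thread -- later PostgreSQL releases raise an error instead):

```sql
-- Bare CHAR is read as CHAR(1), but bare VARCHAR gets no length limit:
CREATE TABLE t (a CHAR, b VARCHAR);
-- a behaves as CHAR(1); b accepts strings of any length.

-- Over-length input is truncated silently rather than rejected:
CREATE TABLE u (c CHAR(3));
INSERT INTO u VALUES ('abcdef');
SELECT c FROM u;   -- yields 'abc' with no error, per the report above
```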

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Varchar standard compliance

From: "Mitch Vincent"
I've been wondering about the difference between varchar and TEXT in the
aspects of length and indexing - what would happen if you tried to index a
varchar(BLCKSZ)? I know you can index smaller portions of text (at least it
appears you can), so why not larger alphanumeric data? (I'm not complaining,
just trying to understand.)

I just made a varchar(30000) field, inserted some data into it and created
an index on it, it seemed to work OK -- is it really only indexing X
characters or something?
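The experiment described above can be reproduced roughly like this (names are made up, and repeat() is used here only for brevity; whether the index build actually succeeds depends on BLCKSZ and the index method's per-tuple size limit):

```sql
-- Recreate the reported experiment: a wide varchar column with an index.
CREATE TABLE big (v VARCHAR(30000));
INSERT INTO big VALUES (repeat('x', 20000));
CREATE INDEX big_v_idx ON big (v);   -- "seemed to work OK" per the report
```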

-Mitch

----- Original Message -----
From: "Peter Eisentraut" <peter_e@gmx.net>
To: "PostgreSQL Development" <pgsql-hackers@postgresql.org>
Sent: Thursday, November 16, 2000 10:16 AM
Subject: [HACKERS] Varchar standard compliance


> Currently, CHAR is correctly interpreted as CHAR(1), but VARCHAR is
> incorrectly interpreted as VARCHAR(<infinity>).  Any reason for that,
> besides the fact that it of course makes much more sense than VARCHAR(1)?
>
> Additionally, neither CHAR nor VARCHAR seem to bark on too long input,
> they just truncate silently.
>
> I'm wondering because I have to decide: should the bit types be made to
> imitate this incorrect behaviour, or should they start out correctly?
>
> --
> Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
>
>



Re: Varchar standard compliance

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Currently, CHAR is correctly interpreted as CHAR(1), but VARCHAR is
> incorrectly interpreted as VARCHAR(<infinity>).  Any reason for that,
> besides the fact that it of course makes much more sense than VARCHAR(1)?

On what grounds do you claim that behavior is incorrect?
        regards, tom lane


Re: Varchar standard compliance

From: Peter Eisentraut
Tom Lane writes:

> > Currently, CHAR is correctly interpreted as CHAR(1), but VARCHAR is
> > incorrectly interpreted as VARCHAR(<infinity>).  Any reason for that,
> > besides the fact that it of course makes much more sense than VARCHAR(1)?
> 
> On what grounds do you claim that behavior is incorrect?

Because SQL says so:
        <character string type> ::=
              CHARACTER [ <left paren> <length> <right paren> ]
            | CHAR [ <left paren> <length> <right paren> ]
            | CHARACTER VARYING <left paren> <length> <right paren>
            | CHAR VARYING <left paren> <length> <right paren>
            | VARCHAR <left paren> <length> <right paren>

        4) If <length> is omitted, then a <length> of 1 is implicit.

It doesn't make much sense to me either, but it's consistent with the
overall SQL attitude of "no anythings of possibly unlimited length".

If we want to keep this, then there would really be no difference between
VARCHAR and TEXT, right?

I'm not partial to either side, but I wanted to know what the bit types
should do.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Varchar standard compliance

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
>> On what grounds do you claim that behavior is incorrect?

> Because SQL says so:

>          <character string type> ::=
>                 CHARACTER [ <left paren> <length> <right paren> ]
>               | CHAR [ <left paren> <length> <right paren> ]
>               | CHARACTER VARYING <left paren> <length> <right paren>
>               | CHAR VARYING <left paren> <length> <right paren>
>               | VARCHAR <left paren> <length> <right paren>

>          4) If <length> is omitted, then a <length> of 1 is implicit.

Well, what that actually says is that CHAR means CHAR(1).  The syntax
does not allow VARCHAR without (n), so the thing we are noncompliant
on is not what we consider the default n to be, but whether there is
a default length for varchar at all.  The spec is not offering one.

I don't particularly want to enforce the spec's position that leaving
off (n) is illegal, and given the choice between defaulting to
VARCHAR(1) or VARCHAR(large), I'll take the second.  The second one
at least has some usefulness...

> If we want to keep this, then there would really be no difference between
> VARCHAR and TEXT, right?

There's no real difference between VARCHAR without a length limit and
TEXT, no.
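That equivalence can be sketched like this (a hypothetical session; the table and column names are illustrative):

```sql
-- With no length limit, the two columns accept exactly the same values:
CREATE TABLE eq (a VARCHAR, b TEXT);
INSERT INTO eq VALUES (repeat('x', 100000), repeat('x', 100000));
-- Neither column truncates or rejects the long string.
```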

> I'm not partial to either side, but I wanted to know what the bit types
> should do.

I'd be inclined to stick with our existing VARCHAR behavior just on
grounds of backwards compatibility.  If you want to make the bit types
behave differently, I wouldn't say that's indefensible.

However, one advantage of treating BIT VARYING without (n) as unlimited
is that you'd have the equivalent functionality to TEXT without having
to make a third bit type...
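A sketch of that last point (hypothetical; assumes BIT VARYING without (n) is treated as unlimited):

```sql
-- If BIT VARYING with no (n) means "unlimited", it is to bit strings
-- what TEXT is to character strings -- no third bit type is needed.
CREATE TABLE bits (b BIT VARYING);       -- no length limit assumed
INSERT INTO bits VALUES (B'10110');
INSERT INTO bits VALUES (B'1011011101'); -- a longer value is also accepted
```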
        regards, tom lane


Re: Varchar standard compliance

From: Peter Eisentraut
Is there a reason why the conversion from CHAR to CHAR(1) is done in
analyze.c:transformColumnType rather than right in the
grammar?  Currently, you get this incorrect behaviour:

peter=# select cast('voodoo' as char(1));
 ?column?
----------
 v
(1 row)

peter=# select cast('voodoo' as char);
 ?column?
----------
 voodoo
(1 row)

Both should return 'v'.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Varchar standard compliance

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Is there a reason why the conversion from CHAR to CHAR(1) is done in
> analyze.c:transformColumnType rather than right in the
> grammar?

Well, transformColumnType does database access, which is verboten during
the grammar phase.  (The grammar has to execute correctly even if we're
in transaction-abort state, else we'll not be able to recognize the
COMMIT or ROLLBACK command...)

You could possibly do the equivalent work in the grammar based strictly
on recognizing the keywords CHAR, NUMERIC, etc, but I think that
approach will probably run into a dead end at some point.  Really,
the grammar is NOT the place to be making semantic deductions.  It
should give back an undecorated parse tree and let parse_analyze fill
in semantic deductions.  (We've been pretty lax about that in the past,
but I've been trying to move semantics code out of gram.y recently.)

> peter=# select cast('voodoo' as char(1));
>  ?column?
> ----------
>  v
> (1 row)
> peter=# select cast('voodoo' as char);
>  ?column?
> ----------
>  voodoo
> (1 row)

Possibly transformColumnType() should be applied to datatype names
appearing in casts (and other places?) as well as those appearing in
table column declarations.  However, I find your example unconvincing:
I'd expect the result of that cast to be of type char(6), not char(1).
In short, I don't believe the above-quoted behavior is wrong.
        regards, tom lane