Thread: length coerce for bpchar is broken since 7.0

length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
It seems the length coerce for bpchar is broken since 7.0.
In 6.5 when a string is inserted, bpchar() is called to properly clip
the string. However in 7.0 (and probably current) bpchar() is not
called anymore. 

coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
of the bpchar data only if the data type is bpchar (if the data type
is varchar, exprTypmod just returns -1 and the parser add a function
node to call varchar(). so there is no problem for varchar). If
VARSIZE returned from exprTypmod() and atttypmod passed to
coerce_type_typmod() is equal, the function node to call bpchar()
would not be added.

I'm not sure if this was an intended efect of the change. Anyway we
have to do the length coerce for bpchar somewhere and I'm thinking now
is doing in bpcharin(). This would also solve the problem in copy in a
mail I have posted.

Comments?
--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> If VARSIZE returned from exprTypmod() and atttypmod passed to
> coerce_type_typmod() is equal, the function node to call bpchar()
> would not be added.

Um, what's wrong with that?  Seems to me that parse_coerce is doing
exactly what it's supposed to, ie, adding only length coercions
that are needed.
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
> > If VARSIZE returned from exprTypmod() and atttypmod passed to
> > coerce_type_typmod() is equal, the function node to call bpchar()
> > would not be added.
> 
> Um, what's wrong with that?  Seems to me that parse_coerce is doing
> exactly what it's supposed to, ie, adding only length coercions
> that are needed.

Simply clipping multibyte strings by atttypmode might produce
incorrect multibyte strings. Consider a case inserting 3 multibyte
letters (each consisting of 2 bytes) into a char(5) column.

Or this kind of consideration should be in bpcharin() as I said in the
earilier mail?
--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> If VARSIZE returned from exprTypmod() and atttypmod passed to
>>>> coerce_type_typmod() is equal, the function node to call bpchar()
>>>> would not be added.
>> 
>> Um, what's wrong with that?  Seems to me that parse_coerce is doing
>> exactly what it's supposed to, ie, adding only length coercions
>> that are needed.

> Simply clipping multibyte strings by atttypmode might produce
> incorrect multibyte strings. Consider a case inserting 3 multibyte
> letters (each consisting of 2 bytes) into a char(5) column.

It seems to me that this means that atttypmod or exprTypmod() isn't
correctly defined for MULTIBYTE char(n) values.  We should define
typmod in such a way that they agree iff the string is correctly
clipped.  This might be easier said than done, perhaps, but I don't
like the idea of having to apply length-coercion functions all the
time because we can't figure out whether they're needed or not.
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
> > Simply clipping multibyte strings by atttypmode might produce
> > incorrect multibyte strings. Consider a case inserting 3 multibyte
> > letters (each consisting of 2 bytes) into a char(5) column.
> 
> It seems to me that this means that atttypmod or exprTypmod() isn't
> correctly defined for MULTIBYTE char(n) values.  We should define
> typmod in such a way that they agree iff the string is correctly
> clipped.  This might be easier said than done, perhaps, but I don't
> like the idea of having to apply length-coercion functions all the
> time because we can't figure out whether they're needed or not.

Before going further, may I ask you a question. Why in exprTypmod() is
bpchar() treated differently from other data types such as varchar?
            switch (con->consttype)            {                case BPCHAROID:                    if
(!con->constisnull)                       return VARSIZE(DatumGetPointer(con->constvalue));                    break;
            default:                    break;            }
 

--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Before going further, may I ask you a question. Why in exprTypmod() is
> bpchar() treated differently from other data types such as varchar?

It's just hardwired knowledge about that particular datatype.  In the
light of your comments, it seems clear that the code here is wrong
for the MULTIBYTE case: instead of plain VARSIZE(), it should be
returning the number of multibyte characters + 4 (or whatever
atttypmod is defined to mean for MULTIBYTE bpchar).  I think I wrote
this code to start with, so you can blame me for the fact that it
neglects the MULTIBYTE case :-(
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
> It's just hardwired knowledge about that particular datatype.  In the
> light of your comments, it seems clear that the code here is wrong
> for the MULTIBYTE case: instead of plain VARSIZE(), it should be
> returning the number of multibyte characters + 4 (or whatever
> atttypmod is defined to mean for MULTIBYTE bpchar).  I think I wrote
> this code to start with, so you can blame me for the fact that it
> neglects the MULTIBYTE case :-(

I'm going to fix the problem by changing bpcharin() rather than
changing exprTypmod(). Surely we could fix the problem by changing
exprTypmod() for INSERT, however, we could not fix the similar problem
for COPY FROM in the same way.  Changing bpcharin() would solve
problems of both INSERT and COPY FROM. So bpcharin() seems more
appropreate place to fix both problems.
--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I'm going to fix the problem by changing bpcharin() rather than
> changing exprTypmod(). Surely we could fix the problem by changing
> exprTypmod() for INSERT, however, we could not fix the similar problem
> for COPY FROM in the same way.  Changing bpcharin() would solve
> problems of both INSERT and COPY FROM. So bpcharin() seems more
> appropreate place to fix both problems.

bpcharin() will most definitely NOT fix the problem, because it often
will not know the target column's typmod, if indeed there is an
identifiable target column at all.  I agree that it's a good solution
for COPY FROM, but you need to fix exprTypmod() too.
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I'm going to fix the problem by changing bpcharin() rather than
> > changing exprTypmod(). Surely we could fix the problem by changing
> > exprTypmod() for INSERT, however, we could not fix the similar problem
> > for COPY FROM in the same way.  Changing bpcharin() would solve
> > problems of both INSERT and COPY FROM. So bpcharin() seems more
> > appropreate place to fix both problems.
> 
> bpcharin() will most definitely NOT fix the problem, because it often
> will not know the target column's typmod, if indeed there is an
> identifiable target column at all. 

Can you give me any example for this case?

> I agree that it's a good solution
> for COPY FROM, but you need to fix exprTypmod() too.

Anyway, I'm gotoing to fix exprTypmod() also.
--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> bpcharin() will most definitely NOT fix the problem, because it often
>> will not know the target column's typmod, if indeed there is an
>> identifiable target column at all. 

> Can you give me any example for this case?

UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;

UPDATE foo SET bpcharcol = upper('abc');

In the first case bpcharin() will be invoked, but not in the context
of direct assignment to a table column, so it won't receive a valid
typmod.  In the second case bpcharin() will never be invoked at all,
because upper takes and returns text --- so 'abc' is not a bpchar
constant but a text constant.  You have to be sure that the parser
handles type length coercion correctly, and I think the cleanest way to
do that is to fix exprTypmod so that it knows how typmod is defined in
the MULTIBYTE case.
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> bpcharin() will most definitely NOT fix the problem, because it often
> >> will not know the target column's typmod, if indeed there is an
> >> identifiable target column at all. 
> 
> > Can you give me any example for this case?
> 
> UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;
> 
> UPDATE foo SET bpcharcol = upper('abc');
> 
> In the first case bpcharin() will be invoked, but not in the context
> of direct assignment to a table column, so it won't receive a valid
> typmod.  In the second case bpcharin() will never be invoked at all,
> because upper takes and returns text --- so 'abc' is not a bpchar
> constant but a text constant.  You have to be sure that the parser
> handles type length coercion correctly, and I think the cleanest way to
> do that is to fix exprTypmod so that it knows how typmod is defined in
> the MULTIBYTE case.

In those cases above bpchar() will be called anyway, so I don't see
MULTIBYTE length coerce problems there.
--
Tatsuo Ishii


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> Can you give me any example for this case?
>> 
>> UPDATE foo SET bpcharcol = 'a'::char || 'b'::char;
>> 
>> UPDATE foo SET bpcharcol = upper('abc');

> In those cases above bpchar() will be called anyway, so I don't see
> MULTIBYTE length coerce problems there.

So it will, but *only* because the parser realizes that it needs to
add a call to bpchar().  If exprTypmod returns incorrect values then
it's possible that the parser would wrongly decide it didn't need to
call bpchar().
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Bruce Momjian
Date:
Can someone comment on the status of this?

> It seems the length coerce for bpchar is broken since 7.0.
> In 6.5 when a string is inserted, bpchar() is called to properly clip
> the string. However in 7.0 (and probably current) bpchar() is not
> called anymore. 
> 
> coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
> of the bpchar data only if the data type is bpchar (if the data type
> is varchar, exprTypmod just returns -1 and the parser add a function
> node to call varchar(). so there is no problem for varchar). If
> VARSIZE returned from exprTypmod() and atttypmod passed to
> coerce_type_typmod() is equal, the function node to call bpchar()
> would not be added.
> 
> I'm not sure if this was an intended efect of the change. Anyway we
> have to do the length coerce for bpchar somewhere and I'm thinking now
> is doing in bpcharin(). This would also solve the problem in copy in a
> mail I have posted.
> 
> Comments?
> --
> Tatsuo Ishii
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: length coerce for bpchar is broken since 7.0

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Can someone comment on the status of this?

regression=# create table foo (f1 char(7));
CREATE
regression=# insert into foo values ('123456789');
INSERT 145180 1
regression=# select * from foo;  f1
---------1234567
(1 row)

Where's the problem?
        regards, tom lane


Re: length coerce for bpchar is broken since 7.0

From
Tatsuo Ishii
Date:
I believe this has been fixed.

>Subject: [COMMITTERS] pgsql/src/backend/utils/adt (varchar.c)
>From: ishii@postgresql.org
>To: pgsql-committers@postgresql.org
>Date: Sun, 26 Nov 2000 06:35:23 -0500 (EST)

> Can someone comment on the status of this?
> 
> > It seems the length coerce for bpchar is broken since 7.0.
> > In 6.5 when a string is inserted, bpchar() is called to properly clip
> > the string. However in 7.0 (and probably current) bpchar() is not
> > called anymore. 
> > 
> > coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE
> > of the bpchar data only if the data type is bpchar (if the data type
> > is varchar, exprTypmod just returns -1 and the parser add a function
> > node to call varchar(). so there is no problem for varchar). If
> > VARSIZE returned from exprTypmod() and atttypmod passed to
> > coerce_type_typmod() is equal, the function node to call bpchar()
> > would not be added.
> > 
> > I'm not sure if this was an intended efect of the change. Anyway we
> > have to do the length coerce for bpchar somewhere and I'm thinking now
> > is doing in bpcharin(). This would also solve the problem in copy in a
> > mail I have posted.
> > 
> > Comments?
> > --
> > Tatsuo Ishii
> > 
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026