Thread: length coerce for bpchar is broken since 7.0
It seems the length coerce for bpchar is broken since 7.0. In 6.5 when a string is inserted, bpchar() is called to properly clip the string. However in 7.0 (and probably current) bpchar() is not called anymore. coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE of the bpchar data only if the data type is bpchar (if the data type is varchar, exprTypmod just returns -1 and the parser add a function node to call varchar(). so there is no problem for varchar). If VARSIZE returned from exprTypmod() and atttypmod passed to coerce_type_typmod() is equal, the function node to call bpchar() would not be added. I'm not sure if this was an intended efect of the change. Anyway we have to do the length coerce for bpchar somewhere and I'm thinking now is doing in bpcharin(). This would also solve the problem in copy in a mail I have posted. Comments? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > If VARSIZE returned from exprTypmod() and atttypmod passed to > coerce_type_typmod() is equal, the function node to call bpchar() > would not be added. Um, what's wrong with that? Seems to me that parse_coerce is doing exactly what it's supposed to, ie, adding only length coercions that are needed. regards, tom lane
> > If VARSIZE returned from exprTypmod() and atttypmod passed to > > coerce_type_typmod() is equal, the function node to call bpchar() > > would not be added. > > Um, what's wrong with that? Seems to me that parse_coerce is doing > exactly what it's supposed to, ie, adding only length coercions > that are needed. Simply clipping multibyte strings by atttypmode might produce incorrect multibyte strings. Consider a case inserting 3 multibyte letters (each consisting of 2 bytes) into a char(5) column. Or this kind of consideration should be in bpcharin() as I said in the earilier mail? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >>>> If VARSIZE returned from exprTypmod() and atttypmod passed to >>>> coerce_type_typmod() is equal, the function node to call bpchar() >>>> would not be added. >> >> Um, what's wrong with that? Seems to me that parse_coerce is doing >> exactly what it's supposed to, ie, adding only length coercions >> that are needed. > Simply clipping multibyte strings by atttypmode might produce > incorrect multibyte strings. Consider a case inserting 3 multibyte > letters (each consisting of 2 bytes) into a char(5) column. It seems to me that this means that atttypmod or exprTypmod() isn't correctly defined for MULTIBYTE char(n) values. We should define typmod in such a way that they agree iff the string is correctly clipped. This might be easier said than done, perhaps, but I don't like the idea of having to apply length-coercion functions all the time because we can't figure out whether they're needed or not. regards, tom lane
> > Simply clipping multibyte strings by atttypmode might produce > > incorrect multibyte strings. Consider a case inserting 3 multibyte > > letters (each consisting of 2 bytes) into a char(5) column. > > It seems to me that this means that atttypmod or exprTypmod() isn't > correctly defined for MULTIBYTE char(n) values. We should define > typmod in such a way that they agree iff the string is correctly > clipped. This might be easier said than done, perhaps, but I don't > like the idea of having to apply length-coercion functions all the > time because we can't figure out whether they're needed or not. Before going further, may I ask you a question. Why in exprTypmod() is bpchar() treated differently from other data types such as varchar? switch (con->consttype) { case BPCHAROID: if (!con->constisnull) return VARSIZE(DatumGetPointer(con->constvalue)); break; default: break; } -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > Before going further, may I ask you a question. Why in exprTypmod() is > bpchar() treated differently from other data types such as varchar? It's just hardwired knowledge about that particular datatype. In the light of your comments, it seems clear that the code here is wrong for the MULTIBYTE case: instead of plain VARSIZE(), it should be returning the number of multibyte characters + 4 (or whatever atttypmod is defined to mean for MULTIBYTE bpchar). I think I wrote this code to start with, so you can blame me for the fact that it neglects the MULTIBYTE case :-( regards, tom lane
> It's just hardwired knowledge about that particular datatype. In the > light of your comments, it seems clear that the code here is wrong > for the MULTIBYTE case: instead of plain VARSIZE(), it should be > returning the number of multibyte characters + 4 (or whatever > atttypmod is defined to mean for MULTIBYTE bpchar). I think I wrote > this code to start with, so you can blame me for the fact that it > neglects the MULTIBYTE case :-( I'm going to fix the problem by changing bpcharin() rather than changing exprTypmod(). Surely we could fix the problem by changing exprTypmod() for INSERT, however, we could not fix the similar problem for COPY FROM in the same way. Changing bpcharin() would solve problems of both INSERT and COPY FROM. So bpcharin() seems more appropreate place to fix both problems. -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > I'm going to fix the problem by changing bpcharin() rather than > changing exprTypmod(). Surely we could fix the problem by changing > exprTypmod() for INSERT, however, we could not fix the similar problem > for COPY FROM in the same way. Changing bpcharin() would solve > problems of both INSERT and COPY FROM. So bpcharin() seems more > appropreate place to fix both problems. bpcharin() will most definitely NOT fix the problem, because it often will not know the target column's typmod, if indeed there is an identifiable target column at all. I agree that it's a good solution for COPY FROM, but you need to fix exprTypmod() too. regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > I'm going to fix the problem by changing bpcharin() rather than > > changing exprTypmod(). Surely we could fix the problem by changing > > exprTypmod() for INSERT, however, we could not fix the similar problem > > for COPY FROM in the same way. Changing bpcharin() would solve > > problems of both INSERT and COPY FROM. So bpcharin() seems more > > appropreate place to fix both problems. > > bpcharin() will most definitely NOT fix the problem, because it often > will not know the target column's typmod, if indeed there is an > identifiable target column at all. Can you give me any example for this case? > I agree that it's a good solution > for COPY FROM, but you need to fix exprTypmod() too. Anyway, I'm gotoing to fix exprTypmod() also. -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >> bpcharin() will most definitely NOT fix the problem, because it often >> will not know the target column's typmod, if indeed there is an >> identifiable target column at all. > Can you give me any example for this case? UPDATE foo SET bpcharcol = 'a'::char || 'b'::char; UPDATE foo SET bpcharcol = upper('abc'); In the first case bpcharin() will be invoked, but not in the context of direct assignment to a table column, so it won't receive a valid typmod. In the second case bpcharin() will never be invoked at all, because upper takes and returns text --- so 'abc' is not a bpchar constant but a text constant. You have to be sure that the parser handles type length coercion correctly, and I think the cleanest way to do that is to fix exprTypmod so that it knows how typmod is defined in the MULTIBYTE case. regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes: > >> bpcharin() will most definitely NOT fix the problem, because it often > >> will not know the target column's typmod, if indeed there is an > >> identifiable target column at all. > > > Can you give me any example for this case? > > UPDATE foo SET bpcharcol = 'a'::char || 'b'::char; > > UPDATE foo SET bpcharcol = upper('abc'); > > In the first case bpcharin() will be invoked, but not in the context > of direct assignment to a table column, so it won't receive a valid > typmod. In the second case bpcharin() will never be invoked at all, > because upper takes and returns text --- so 'abc' is not a bpchar > constant but a text constant. You have to be sure that the parser > handles type length coercion correctly, and I think the cleanest way to > do that is to fix exprTypmod so that it knows how typmod is defined in > the MULTIBYTE case. In those cases above bpchar() will be called anyway, so I don't see MULTIBYTE length coerce problems there. -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >>>> Can you give me any example for this case? >> >> UPDATE foo SET bpcharcol = 'a'::char || 'b'::char; >> >> UPDATE foo SET bpcharcol = upper('abc'); > In those cases above bpchar() will be called anyway, so I don't see > MULTIBYTE length coerce problems there. So it will, but *only* because the parser realizes that it needs to add a call to bpchar(). If exprTypmod returns incorrect values then it's possible that the parser would wrongly decide it didn't need to call bpchar(). regards, tom lane
Can someone comment on the status of this? > It seems the length coerce for bpchar is broken since 7.0. > In 6.5 when a string is inserted, bpchar() is called to properly clip > the string. However in 7.0 (and probably current) bpchar() is not > called anymore. > > coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE > of the bpchar data only if the data type is bpchar (if the data type > is varchar, exprTypmod just returns -1 and the parser add a function > node to call varchar(). so there is no problem for varchar). If > VARSIZE returned from exprTypmod() and atttypmod passed to > coerce_type_typmod() is equal, the function node to call bpchar() > would not be added. > > I'm not sure if this was an intended efect of the change. Anyway we > have to do the length coerce for bpchar somewhere and I'm thinking now > is doing in bpcharin(). This would also solve the problem in copy in a > mail I have posted. > > Comments? > -- > Tatsuo Ishii > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Can someone comment on the status of this? regression=# create table foo (f1 char(7)); CREATE regression=# insert into foo values ('123456789'); INSERT 145180 1 regression=# select * from foo; f1 ---------1234567 (1 row) Where's the problem? regards, tom lane
I believe this has been fixed. >Subject: [COMMITTERS] pgsql/src/backend/utils/adt (varchar.c) >From: ishii@postgresql.org >To: pgsql-committers@postgresql.org >Date: Sun, 26 Nov 2000 06:35:23 -0500 (EST) > Can someone comment on the status of this? > > > It seems the length coerce for bpchar is broken since 7.0. > > In 6.5 when a string is inserted, bpchar() is called to properly clip > > the string. However in 7.0 (and probably current) bpchar() is not > > called anymore. > > > > coerce_type_typmod() calls exprTypmod(). exprTypmod() returns VARSIZE > > of the bpchar data only if the data type is bpchar (if the data type > > is varchar, exprTypmod just returns -1 and the parser add a function > > node to call varchar(). so there is no problem for varchar). If > > VARSIZE returned from exprTypmod() and atttypmod passed to > > coerce_type_typmod() is equal, the function node to call bpchar() > > would not be added. > > > > I'm not sure if this was an intended efect of the change. Anyway we > > have to do the length coerce for bpchar somewhere and I'm thinking now > > is doing in bpcharin(). This would also solve the problem in copy in a > > mail I have posted. > > > > Comments? > > -- > > Tatsuo Ishii > > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026