Thread: character type value is not padded with spaces
Character type values containing multibyte characters are not padded with spaces. This reproduces on 7.3.x, 7.4.x and 8.0.x.

create table t (a char(10));
insert into t values ('XXXXX'); -- X is a 2-byte character.

I expect 'XXXXX     ' (padded with five spaces) to be inserted, but 'XXXXX' is inserted.

select a, octet_length(a) from t;

   a   | octet_length
-------+--------------
 XXXXX |           10

If the value were padded with spaces, octet_length(a) would be 15. The problem is that exprTypmod() calculates the string length in bytes (VARSIZE) rather than in characters.

I attach a patch for this problem.

Regards,

--
Yoshiyuki Asaba
y-asaba@sra.co.jp

*** parse_expr.c.orig 2005-01-13 02:32:36.000000000 +0900
--- parse_expr.c 2005-05-22 17:12:37.000000000 +0900
***************
*** 18,23 ****
--- 18,24 ----
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
  #include "commands/dbcommands.h"
+ #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/params.h"
***************
*** 34,40 ****
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
- 
  bool Transform_null_equals = false;
  
  static Node *transformColumnRef(ParseState *pstate, ColumnRef *cref);
--- 35,40 ----
***************
*** 1491,1497 ****
  {
      case BPCHAROID:
          if (!con->constisnull)
!             return VARSIZE(DatumGetPointer(con->constvalue));
          break;
      default:
          break;
--- 1491,1503 ----
  {
      case BPCHAROID:
          if (!con->constisnull)
!         {
!             int32 len = VARSIZE(DatumGetPointer(con->constvalue)) - VARHDRSZ;
!
!             if (pg_database_encoding_max_length() > 1)
!                 len = pg_mbstrlen_with_len(VARDATA(DatumGetPointer(con->constvalue)), len);
!             return len + VARHDRSZ;
!         }
          break;
      default:
          break;
Hackers,

The problem he found is not limited to Japanese characters; it affects any multibyte encoding, including UTF-8. The patch looks good to me, and I will commit it to the 7.3, 7.4 and 8.0 stable branches and to current if there are no objections.
--
Tatsuo Ishii
Ahemm,...

UNICODE DB:

create table t (a char(10));
set client_encoding = iso88591;
insert into t VALUES ('æøå');

select a, octet_length(a),length(a) from t;
     a      | octet_length | length
------------+--------------+--------
 æøå        |           13 |      3
(1 row)

This is with 8.0.2.

Just FYI.

... John

> -----Original Message-----
> From: pgsql-patches-owner@postgresql.org
> [mailto:pgsql-patches-owner@postgresql.org] On Behalf Of Tatsuo Ishii
> Sent: Tuesday, May 24, 2005 8:52 AM
> To: y-asaba@sra.co.jp
> Cc: pgsql-patches@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces
I think you need to test with 5 characters, not 3.
--
Tatsuo Ishii
Ahhh...

> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Tuesday, May 24, 2005 9:26 AM
> To: John Hansen
> Cc: y-asaba@sra.co.jp; pgsql-patches@postgresql.org;
> pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces
>
> I think you need to test with 5 characters, not 3.
I see Tatsuo already applied this, which is great. I added a little comment:

    /* if multi-byte, take len and find # characters */

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us