Thread: character type value is not padded with spaces

From
Yoshiyuki Asaba
Date:
A character-type value containing multibyte characters is not padded
with spaces. This reproduces on 7.3.x, 7.4.x and 8.0.x.

   create table t (a char(10));
   insert into t values ('XXXXX'); -- X is a 2-byte character.

I expect 'XXXXX     ' to be inserted, but 'XXXXX' is inserted instead.

   select a, octet_length(a) from t;

      a   | octet_length
   -------+--------------
    XXXXX |           10

If the value were padded with spaces, octet_length(a) would be 15. The
problem is that exprTypmod() calculates the string length as a byte
length (VARSIZE) rather than as a number of characters.
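
A minimal sketch of the mismatch, assuming a UTF-8 database (the report used a 2-byte Japanese encoding, but the arithmetic is the same). It also assumes, as Tatsuo's later remark about testing with exactly five characters suggests, that the parser skips the length coercion (the step that blank-pads char(n)) when the length computed for the constant already equals the declared length. `utf8_char_count()` and `length_coercion_runs()` are hypothetical illustrations, not PostgreSQL functions; the first merely stands in for `pg_mbstrlen_with_len()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_mbstrlen_with_len(), UTF-8 only: counts the
 * characters in the first len bytes of s by skipping continuation bytes
 * (bit pattern 10xxxxxx). */
static int utf8_char_count(const char *s, size_t len)
{
    int count = 0;

    for (size_t i = 0; i < len; i++)
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Hypothetical model of the assumed check: the padding coercion only runs
 * when the length computed for the constant differs from the declared
 * length of the char(n) column. */
static int length_coercion_runs(int declared_len, int computed_len)
{
    assert(declared_len > 0 && computed_len >= 0);
    return computed_len != declared_len;
}

/* 'XXXXX' with X = U+00E6, i.e. five 2-byte characters in UTF-8. */
static const char five_chars[] = "\xC3\xA6\xC3\xA6\xC3\xA6\xC3\xA6\xC3\xA6";
```

Measured in bytes, `five_chars` is 10 long, the same as char(10), so under this model the padding is skipped; measured in characters it is 5, so the padding would run.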

I attach a patch for this problem.

Regards,

--
Yoshiyuki Asaba
y-asaba@sra.co.jp
*** parse_expr.c.orig    2005-01-13 02:32:36.000000000 +0900
--- parse_expr.c    2005-05-22 17:12:37.000000000 +0900
***************
*** 18,23 ****
--- 18,24 ----
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
  #include "commands/dbcommands.h"
+ #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/params.h"
***************
*** 34,40 ****
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"

-
  bool        Transform_null_equals = false;

  static Node *transformColumnRef(ParseState *pstate, ColumnRef *cref);
--- 35,40 ----
***************
*** 1491,1497 ****
                  {
                      case BPCHAROID:
                          if (!con->constisnull)
!                             return VARSIZE(DatumGetPointer(con->constvalue));
                          break;
                      default:
                          break;
--- 1491,1503 ----
                  {
                      case BPCHAROID:
                          if (!con->constisnull)
!                         {
!                             int32 len = VARSIZE(DatumGetPointer(con->constvalue)) - VARHDRSZ;
!
!                             if (pg_database_encoding_max_length() > 1)
!                                 len = pg_mbstrlen_with_len(VARDATA(DatumGetPointer(con->constvalue)), len);
!                             return len + VARHDRSZ;
!                         }
                          break;
                      default:
                          break;
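
The patched BPCHAROID branch can be exercised outside the backend with a small mock. Everything below is a hypothetical sketch: `MOCK_VARHDRSZ` stands in for `VARHDRSZ` (assumed to be the 4-byte length word), the `max_encoding_len` parameter replaces the call to `pg_database_encoding_max_length()`, and `mock_mbstrlen_with_len()` is a UTF-8-only stand-in for `pg_mbstrlen_with_len()`. It compares the value returned before and after the patch.

```c
#include <assert.h>
#include <string.h>

/* Assumption: VARHDRSZ is the 4-byte varlena length word. */
#define MOCK_VARHDRSZ 4

/* Hypothetical stand-in for pg_mbstrlen_with_len(), UTF-8 only. */
static int mock_mbstrlen_with_len(const char *s, int len)
{
    int count = 0;

    for (int i = 0; i < len; i++)
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Before the patch: the whole varlena size, payload bytes plus header. */
static int typmod_before(const char *data, int nbytes)
{
    (void) data;
    assert(nbytes >= 0);
    return nbytes + MOCK_VARHDRSZ;
}

/* After the patch: start from the payload length and, if the database
 * encoding can use more than one byte per character, count characters
 * instead of bytes before adding the header back. */
static int typmod_after(const char *data, int nbytes, int max_encoding_len)
{
    int len = nbytes;

    assert(nbytes >= 0);
    if (max_encoding_len > 1)
        len = mock_mbstrlen_with_len(data, len);
    return len + MOCK_VARHDRSZ;
}

/* Typmod of a char(n) column: n plus the header. */
static int char_n_typmod(int n)
{
    return n + MOCK_VARHDRSZ;
}
```

With these mocks, five 2-byte characters report 14 before the patch, identical to the typmod of char(10), and 9 after it, so the two no longer compare equal. In a single-byte database (`max_encoding_len` of 1) the result is unchanged.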

Re: character type value is not padded with spaces

From
Tatsuo Ishii
Date:
Hackers,

The problem he found is not limited to Japanese characters; it occurs
with any multibyte encoding, including UTF-8. The patch looks good to
me, and I will commit it to the 7.3, 7.4 and 8.0 stable branches and
to current if there are no objections.
--
Tatsuo Ishii

Re: character type value is not padded with spaces

From
"John Hansen"
Date:
Ahemm,...

UNICODE DB:

create table t (a char(10));
set client_encoding = iso88591;
insert into t VALUES ('æøå');

select a, octet_length(a),length(a) from t;
     a      | octet_length | length
------------+--------------+--------
 æøå        |           13 |      3
(1 row)

This is with 8.0.2.
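
Where the 13 comes from can be sketched by converting the client's ISO-8859-1 bytes to UTF-8, as the server does for a UNICODE database: each of the three Latin-1 letters becomes two bytes. `latin1_to_utf8()` is a hypothetical helper written for this illustration, not a PostgreSQL conversion routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts ISO-8859-1 text to UTF-8. Bytes below 0x80
 * are copied as-is; every other byte becomes a 2-byte sequence. Returns the
 * number of bytes written; dst must hold at least 2 * len bytes. */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[out++] = src[i];
        } else {
            dst[out++] = (unsigned char) (0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}
```

Three characters therefore occupy 6 bytes on the server, which is not the declared 10, so the value still gets padded with 7 spaces: 6 + 7 = 13 bytes.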

Just FYI.

... John

> -----Original Message-----
> From: pgsql-patches-owner@postgresql.org
> [mailto:pgsql-patches-owner@postgresql.org] On Behalf Of Tatsuo Ishii
> Sent: Tuesday, May 24, 2005 8:52 AM
> To: y-asaba@sra.co.jp
> Cc: pgsql-patches@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces

Re: character type value is not padded with spaces

From
Tatsuo Ishii
Date:
I think you need to test with 5 characters, not 3.
--
Tatsuo Ishii
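
Why five characters: only then do the 2-byte characters add up to 10 bytes, the same number as the declared length of char(10), so a byte-based length looks "already correct" and no padding is applied. The expected padded size can be sketched with a hypothetical helper (not a PostgreSQL function) that assumes the pad character is a single-byte space.

```c
#include <assert.h>

/* Hypothetical helper: octet_length() of a char(declared) value holding
 * nchars characters that occupy nbytes bytes, after blank-padding with
 * single-byte spaces. Assumes the value fits (nchars <= declared). */
static int padded_octet_length(int nbytes, int nchars, int declared)
{
    assert(nchars >= 0 && nchars <= declared && nchars <= nbytes);
    return nbytes + (declared - nchars);
}
```

`padded_octet_length(6, 3, 10)` gives 13, matching John's result, and `padded_octet_length(10, 5, 10)` gives 15, the value the original report expected instead of 10.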

Re: character type value is not padded with spaces

From
"John Hansen"
Date:
Ahhh...

> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Tuesday, May 24, 2005 9:26 AM
> To: John Hansen
> Cc: y-asaba@sra.co.jp; pgsql-patches@postgresql.org;
> pgsql-hackers@postgresql.org
> Subject: Re: [PATCHES] character type value is not padded with spaces
>
> I think you need to test with 5 characters, not 3.
> --
> Tatsuo Ishii

Re: character type value is not padded with spaces

From
Bruce Momjian
Date:
I see Tatsuo already applied this, which is great.  I added a little
comment:

    /* if multi-byte, take len and find # characters */

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073