Thread: multibyte support by default

multibyte support by default

From
Tatsuo Ishii
Date:
In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?
--
Tatsuo Ishii


Re: multibyte support by default

From
Peter Eisentraut
Date:
Tatsuo Ishii writes:

> In my understanding, our consensus was enabling multibyte support by
> default for 7.3. Any objection?

It was my understanding (or if I was mistaken, then it is my suggestion)
that the build-time option would be removed altogether and certain
performance-critical places (if any) would be wrapped into
   if (encoding_is_single_byte(current_encoding)) { }

That's basically what I did with the locale support.

-- 
Peter Eisentraut   peter_e@gmx.net
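
[Illustration] The run-time check suggested above could look roughly like the
following. This is only a minimal sketch under stated assumptions: the
encoding_id enum, utf8_char_len() and char_count() are hypothetical names
invented for this example, and encoding_is_single_byte() is modelled on the
name used in the message, not on any actual PostgreSQL function. The idea is
that multibyte code is always compiled in, and a cheap test selects the fast
path for single-byte encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding IDs for illustration; not PostgreSQL's real enum. */
typedef enum { ENC_LATIN1, ENC_UTF8 } encoding_id;

/* Hypothetical helper, named after the one mentioned in the message. */
static bool encoding_is_single_byte(encoding_id enc)
{
    return enc == ENC_LATIN1;
}

/* Byte length of a UTF-8 character, judged from its lead byte. */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;               /* invalid lead byte: count it as one character */
}

/* Count characters (not bytes) in a NUL-terminated string. */
static size_t char_count(const char *s, encoding_id enc)
{
    assert(s != NULL);

    /* Fast path: one byte is one character, so strlen() is enough. */
    if (encoding_is_single_byte(enc))
        return strlen(s);

    /* Slow path: step over each multibyte character. */
    size_t n = 0;
    while (*s)
    {
        size_t len = utf8_char_len((unsigned char) *s);

        /* Never step past the terminator on a truncated sequence. */
        while (len > 0 && *s)
        {
            s++;
            len--;
        }
        n++;
    }
    return n;
}
```

With this shape, a single-byte database pays only for one predictable branch
per call, which is the trade-off the message proposes in place of a
build-time option.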



Re: multibyte support by default

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> In my understanding, our consensus was enabling multibyte support by
> default for 7.3. Any objection?

Uh, was it?  I don't recall that.  Do we have any numbers on the
performance overhead?
        regards, tom lane


Re: multibyte support by default

From
Tatsuo Ishii
Date:
> > In my understanding, our consensus was enabling multibyte support by
> > default for 7.3. Any objection?
> 
> Uh, was it?  I don't recall that.  Do we have any numbers on the
> performance overhead?
> 
>             regards, tom lane

See below.

Subject: Re: [HACKERS] Unicode combining characters 
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Tatsuo Ishii <t-ishii@sra.co.jp>
cc: ZeugswetterA@spardat.at, pgman@candle.pha.pa.us, phede-ml@islande.org, pgsql-hackers@postgresql.org
Date: Wed, 03 Oct 2001 23:05:16 -0400
Comments: In-reply-to Tatsuo Ishii <t-ishii@sra.co.jp>    message dated "Thu, 04 Oct 2001 11:16:42 +0900"

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> To accomplish this, I moved MatchText etc. to a separate file and now
> like.c includes it *twice* (similar technique used in regexec()). This
> makes like.o a little bit larger, but I believe this is worth it for
> the optimization.

That sounds great.

What's your feeling now about the original question: whether to enable
multibyte by default now, or not?  I'm still thinking that Peter's
counsel is the wisest: plan to do it in 7.3, not today.  But this fix
seems to eliminate the only hard reason we have not to do it today ...
        regards, tom lane
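
[Illustration] The "compile the matcher twice" technique described in the
quoted text can be sketched in a single file as follows. Note the
substitution: the message describes including a separate source file twice,
whereas this sketch uses a macro as a stand-in for that file so the example
stays self-contained. All names here (DEFINE_MATCHER, sb_len, utf8_len,
like_sb, like_mb) are hypothetical, and the matcher is a simplified LIKE
supporting only '%' and '_', not the actual MatchText code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-byte variant: every character is exactly one byte. */
static size_t sb_len(const char *p)
{
    (void) p;
    return 1;
}

/* UTF-8 variant: length from the lead byte, clamped at the terminator. */
static size_t utf8_len(const char *p)
{
    unsigned char c = (unsigned char) *p;
    size_t want = c < 0x80 ? 1
        : (c & 0xE0) == 0xC0 ? 2
        : (c & 0xF0) == 0xE0 ? 3
        : (c & 0xF8) == 0xF0 ? 4 : 1;
    size_t n = 1;

    while (n < want && p[n] != '\0')
        n++;
    return n;
}

/*
 * One matcher body, instantiated once per character-length function.
 * This macro plays the role of the twice-included file.
 */
#define DEFINE_MATCHER(fn_name, CHARLEN)                              \
static bool fn_name(const char *t, const char *p)                     \
{                                                                     \
    assert(t != NULL && p != NULL);                                   \
    while (*p)                                                        \
    {                                                                 \
        if (*p == '%')                                                \
        {                                                             \
            p++;                                                      \
            for (;;)                                                  \
            {                                                         \
                /* try the rest of the pattern at each position */    \
                if (fn_name(t, p)) return true;                       \
                if (!*t) return false;                                \
                t += CHARLEN(t);                                      \
            }                                                         \
        }                                                             \
        if (!*t) return false;                                        \
        if (*p == '_')                                                \
        {                                                             \
            t += CHARLEN(t);    /* skip one whole character */        \
            p++;                                                      \
        }                                                             \
        else                                                          \
        {                                                             \
            if (*t != *p) return false;                               \
            t++;                                                      \
            p++;                                                      \
        }                                                             \
    }                                                                 \
    return *t == '\0';                                                \
}

DEFINE_MATCHER(like_sb, sb_len)     /* single-byte instantiation */
DEFINE_MATCHER(like_mb, utf8_len)   /* multibyte instantiation */
```

A caller would pick like_sb or like_mb once, based on the database encoding,
so the single-byte case avoids the per-character length computation. The
cost is a somewhat larger object file, as the quoted text notes for like.o.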


Re: multibyte support by default

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> In my understanding, our consensus was enabling multibyte support by
> default for 7.3. Any objection?
>> 
>> Uh, was it?  I don't recall that.  Do we have any numbers on the
>> performance overhead?

> See below.

Oh, okay, now I recall that thread.  You're right, we did agree.
        regards, tom lane


Re: multibyte support by default

From
Hannu Krosing
Date:
On Tue, 2002-04-16 at 03:20, Tatsuo Ishii wrote:
> In my understanding, our consensus was enabling multibyte support by
> default for 7.3. Any objection?

Is there currently some agreed plan for introducing standard
NCHAR/NVARCHAR types?

What does ISO/ANSI say about multibyteness of simple CHAR types?

--------------
Hannu




Re: multibyte support by default

From
Tatsuo Ishii
Date:
> On Tue, 2002-04-16 at 03:20, Tatsuo Ishii wrote:
> > In my understanding, our consensus was enabling multibyte support by
> > default for 7.3. Any objection?
> 
> Is there currently some agreed plan for introducing standard
> NCHAR/NVARCHAR types?

I have that kind of *personal* plan, maybe for 7.4, not for 7.3, due
to the limits of my free time.

BTW, NCHAR/NVARCHAR is just an abbreviation of "CHAR(n) CHARACTER SET
foo" (where foo is an implementation-defined charset). So I'm not too
impressed by the idea of implementing NCHAR/NVARCHAR alone.

> What does ISO/ANSI say about multibyteness of simple CHAR types?

There's no such idea as "multibyteness" in the standard.  In my
understanding the standard does not restrict "normal" CHAR types to
have only ASCII (more precisely "SQL_CHARACTER"). Moreover, CHAR types
without a CHARSET specification will have SQL_TEXT as their default
charset, and its actual charset will be defined by the implementation.

In summary, allowing any characters, including multibyte ones, in CHAR
types is not against the standard at all, IMO.
--
Tatsuo Ishii