pgsql: Fix for Unicode characters above 0x10000.

From
momjian@svr1.postgresql.org (Bruce Momjian)
Date:
Log Message:
-----------
Fix for Unicode characters above 0x10000.

John Hansen

Modified Files:
--------------
    pgsql/src/backend/utils/mb:
        wchar.c (r1.38 -> r1.39)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/wchar.c.diff?r1=1.38&r2=1.39)
    pgsql/src/include/mb:
        pg_wchar.h (r1.53 -> r1.54)
        (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/mb/pg_wchar.h.diff?r1=1.53&r2=1.54)

Re: pgsql: Fix for Unicode characters above 0x10000.

From
Tom Lane
Date:
momjian@svr1.postgresql.org (Bruce Momjian) writes:
> Fix for Unicode characters above 0x10000.

I really have to object to this going in on the day before RC1, as well.
(1) There is no way on God's green earth that this isn't a new feature.
(2) It is mucking with fairly central code.
(3) AFAIK it hasn't been reviewed by anybody who's familiar with the
multibyte code.
(4) How do you know that wider-than-16-bit characters won't break stuff
elsewhere?

            regards, tom lane

Re: pgsql: Fix for Unicode characters above 0x10000.

From
Bruce Momjian
Date:
Tom Lane wrote:
> momjian@svr1.postgresql.org (Bruce Momjian) writes:
> > Fix for Unicode characters above 0x10000.
>
> I really have to object to this going in on the day before RC1, as well.
> (1) There is no way on God's green earth that this isn't a new feature.
> (2) It is mucking with fairly central code.
> (3) AFAIK it hasn't been reviewed by anybody who's familiar with the
> multibyte code.
> (4) How do you know that wider-than-16-bit characters won't break stuff
> elsewhere?

I don't know the answers to any of these questions.  I don't even
understand the purpose of the patch.  However, no one objected to it, nor
did anyone say anything when it went into the queue, so it was applied.

If you are registering concerns, I will remove it. I was unclear about
the patch as well, but assumed the Unicode folks were OK with it.

Should it be backed out and saved for 8.1?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql: Fix for Unicode characters above 0x10000.

From
Neil Conway
Date:
On Thu, 2004-12-02 at 19:48 -0500, Bruce Momjian wrote:
> Should it be backed out and saved for 8.1?

That would be my preference, unless we can get some review from some
Unicode-clueful folks on it.

-Neil



Re: pgsql: Fix for Unicode characters above 0x10000.

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Thu, 2004-12-02 at 19:48 -0500, Bruce Momjian wrote:
> > Should it be backed out and saved for 8.1?
>
> That would be my preference, unless we can get some review from some
> Unicode-clueful folks on it.

OK, backing out. I just don't understand the Unicode stuff myself so
until someone can say it is required, it will be kept for 8.1.

Re: pgsql: Fix for Unicode characters above 0x10000.

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I don't know the answers to any of these questions.  I don't even
> understand the purpose of the patch.  However, no one objected to it, nor
> did anyone say anything when it went into the queue, so it was applied.

I was waiting for Tatsuo to comment on it --- he's certainly the
best-qualified person.  But in any case I was expecting it would be
held for 8.1.  Even if it's fine as far as the MB code itself goes,
I don't think we can assume that the rest of the system is good to
go with wide characters wider than we have tested before.  (Example:
before 7.4 the regex code was definitely unable to handle chars wider
than 16 bits.  I think it would be all right now, but it's untested.)

            regards, tom lane
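
For readers unfamiliar with why "wider than 16 bits" matters here, the following is a minimal sketch in C of how a four-byte UTF-8 sequence decodes to a code point at or above 0x10000. It is not the code from wchar.c or from the patch under discussion; the function names utf8_seq_len, utf8_decode, and fits_in_16_bits are hypothetical and chosen only for illustration, and the decoder deliberately skips validation of continuation bytes and overlong forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not PostgreSQL code): length in bytes of the UTF-8
 * sequence introduced by lead byte b, or 0 if b cannot start a sequence. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((b & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx: up to U+FFFF */
    if ((b & 0xF8) == 0xF0)
        return 4;               /* 11110xxx: U+10000 and above */
    return 0;                   /* continuation byte or invalid lead */
}

/* Hypothetical helper: decode one UTF-8 sequence into a code point.
 * Assumes s points at enough bytes for the whole sequence and does not
 * check continuation bytes or overlong encodings. Returns U+FFFD when
 * the lead byte is invalid. */
uint32_t utf8_decode(const unsigned char *s)
{
    assert(s != NULL);
    switch (utf8_seq_len(s[0]))
    {
        case 1:
            return s[0];
        case 2:
            return ((uint32_t) (s[0] & 0x1F) << 6) |
                   (uint32_t) (s[1] & 0x3F);
        case 3:
            return ((uint32_t) (s[0] & 0x0F) << 12) |
                   ((uint32_t) (s[1] & 0x3F) << 6) |
                   (uint32_t) (s[2] & 0x3F);
        case 4:
            return ((uint32_t) (s[0] & 0x07) << 18) |
                   ((uint32_t) (s[1] & 0x3F) << 12) |
                   ((uint32_t) (s[2] & 0x3F) << 6) |
                   (uint32_t) (s[3] & 0x3F);
        default:
            return 0xFFFD;
    }
}

/* Hypothetical helper: nonzero if the code point would survive being
 * stored in a 16-bit character type without truncation. */
int fits_in_16_bits(uint32_t cp)
{
    return cp <= 0xFFFF;
}
```

As a usage example, the bytes F0 9D 84 9E (U+1D11E, MUSICAL SYMBOL G CLEF) take the four-byte branch and decode to 0x1D11E, which fits_in_16_bits rejects, while E2 82 AC (U+20AC, EURO SIGN) decodes to 0x20AC and fits. Any code elsewhere in a system that stores or indexes characters through a 16-bit type would truncate the first value, which is the kind of untested interaction raised in point (4) and in the regex example above.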