Thread: Java's Unicode Notation

Java's Unicode Notation

From
Jean-Michel POURE
Date:
Dear all,

Would it be possible to use the Java Unicode Notation to define UTF-8 
strings in PostgreSQL 7.2?
Information can be found on http://czyborra.com/utf/

Best regards,
Jean-Michel POURE

************************************************

Java's Unicode Notation
There are some less compact but more readable ASCII transformations, the 
most important of which is the Java Unicode Notation, as allowed in Java 
source code and processed by Java's native2ascii converter:

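putwchar(c)
{
  if (c >= 0x10000) {    /* beyond the BMP: emit a UTF-16 surrogate pair */
    printf ("\\u%04x\\u%04x", 0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
  }
  else if (c >= 0x100)   /* other non-Latin-1 characters: one \uXXXX escape */
    printf ("\\u%04x", c);
  else                   /* U+0000..U+00FF are written out as raw bytes */
    putchar (c);
}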

The advantage of the \u20ac notation is that it is very easy to type in 
on any old ASCII keyboard and easy to look up the intended character if you 
happen to have a copy of the Unicode book or the 
{unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM or 
know that U+20AC is the €.
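Turning this notation into a UTF-8 string ultimately means encoding each code point as UTF-8 bytes. A minimal C sketch of that step, assuming the code point has already been parsed; `utf8_encode` is a hypothetical helper, not a PostgreSQL function:

```c
#include <stddef.h>

/*
 * Encode one Unicode scalar value as UTF-8 into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * (U+D800..U+DFFF) or above U+10FFFF, neither of which may appear in UTF-8.
 */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* lone surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        buf[0] = (unsigned char) (0xF0 | (cp >> 18));
        buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For U+20AC it writes the three bytes E2 82 AC; surrogates and values above U+10FFFF are refused rather than encoded.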

What's not so nice about the \u20ac notation is that lowercase hex letters 
are quite unusual for Unicode code points, the backslashes have to be quoted 
for many Unix tools, the four hex digits without a terminator may merge 
with the following word (as in \u00a333 for £33), it is unclear when and how 
you have to escape the backslash character itself, 6 bytes for one 
character may be considered wasteful, there is no way to clearly 
represent characters beyond \uffff without \ud800\udc00 surrogates, and 
last but not least, plain hex numbers may not be very helpful.
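A decoder for this notation has to handle exactly the points listed above: each escape has a fixed four hex digits with no terminator, and characters beyond \uffff arrive as a high/low surrogate pair that must be combined into one code point. A minimal C sketch, assuming NUL-terminated ASCII input; `java_unescape` and its helpers are hypothetical names:

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Read a \uXXXX escape at s: exactly four hex digits, no terminator. */
static long read_u4(const char *s)
{
    long v = 0;
    if (s[0] != '\\' || s[1] != 'u')
        return -1;
    for (int i = 2; i < 6; i++) {
        int d = hex_value(s[i]);
        if (d < 0)
            return -1;          /* also stops at the string's NUL */
        v = v * 16 + d;
    }
    return v;
}

/*
 * Decode one Java-style escape at s into a code point; *len receives the
 * number of characters consumed. A high surrogate must be followed by a
 * \uDC00..\uDFFF low surrogate; the pair is combined into one code point.
 * Returns -1 on malformed input.
 */
long java_unescape(const char *s, size_t *len)
{
    long hi = read_u4(s);
    if (hi < 0)
        return -1;
    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP character */
        *len = 6;
        return hi;
    }
    if (hi > 0xDBFF)                     /* low surrogate without a high one */
        return -1;
    long lo = read_u4(s + 6);
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;
    *len = 12;
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}
```

Because exactly four digits are read, \u00a333 decodes as U+00A3 followed by the literal text 33, and \ud800\udc00 decodes to the single code point U+10000.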

JAVA is one of the target and source encodings of yudit and its uniconv 
converter.




'postgres' flag

From
"Mike Rogers"
Date:
Hi folks,
   Anyone have a code hack to 7.1 to make PostgreSQL break out of the
'sameuser' jail if a user has the 'postgres' superuser flag?  Or maybe to set
config file lines based also on 'superuser' (like 'crypt superuser' or
something like that).  Otherwise I think I might make one.
--
Mike



Re: 'postgres' flag

From
Tom Lane
Date:
"Mike Rogers" <temp6453@hotmail.com> writes:
>     Anyone have a code hack to 7.1 to make PostgreSQL break out of the
> 'sameuser' jail if a user has the 'postgres' superuser flag?

The difficulty with that idea is that the connection-matching code has
no idea whether a given userid is superuser or not (indeed, that info
is not available to the postmaster at all).

> Or maybe to set
> config file lines based also on 'superuser' (like 'crypt superuser' or
> something like that).  Otherwise I think I might make one.

Did you read the thread a day or two back in pgsql-admin?  Consider
something like
local    sameuser    password
local    all         password crossauth

where crossauth contains the usernames you want to allow to connect
to databases other than their own.
        regards, tom lane
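The suggestion works because the postmaster uses the first pg_hba.conf record whose connection type and database match; there is no fall-through to later records. The following is a simplified C model of that decision, not PostgreSQL source: `struct hba_rule` and `hba_permits` are hypothetical, records are assumed to be 7.1-style `local` lines without a user column, and password checking itself is left out. Following the description above, a user who reaches the `all` line gets through only if listed in crossauth.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of one "local" pg_hba.conf record (hypothetical type). */
struct hba_rule {
    const char *database;        /* "all", "sameuser", or a database name */
    const char *const *allowed;  /* NULL-terminated user list from the
                                    auth-argument file, or NULL if none */
};

static int database_matches(const struct hba_rule *r,
                            const char *db, const char *user)
{
    if (strcmp(r->database, "all") == 0)
        return 1;
    if (strcmp(r->database, "sameuser") == 0)
        return strcmp(db, user) == 0;
    return strcmp(r->database, db) == 0;
}

/*
 * Returns 1 if the connection may go on to password checking, 0 if it is
 * rejected. Only the FIRST record whose database field matches is used.
 */
int hba_permits(const struct hba_rule *rules, size_t n,
                const char *db, const char *user)
{
    for (size_t i = 0; i < n; i++) {
        if (!database_matches(&rules[i], db, user))
            continue;
        if (rules[i].allowed == NULL)
            return 1;                    /* no user list: anyone may try */
        for (const char *const *u = rules[i].allowed; *u != NULL; u++)
            if (strcmp(*u, user) == 0)
                return 1;
        return 0;                        /* matched, but user not listed */
    }
    return 0;                            /* no record matched at all */
}
```

With the two records from the message, alice may open database alice, a user listed in crossauth may open any database, and bob is turned away from alice.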


Re: 'postgres' flag

From
"Mike Rogers"
Date:
There appears to be some delay on the list.  I just received that message
this morning (how helpful).  I will try to implement it now and see how
far I can get.  It looks like it'll work.  Does it work with 'crypt' or only
'password' (I presently use crypted passwords, but I can change that if
it'll make all the difference)?
   Now the even bigger question: why isn't this documented?
--
Mike

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Mike Rogers" <temp6453@hotmail.com>
Cc: <pgsql-hackers@postgresql.org>
Sent: Thursday, November 08, 2001 10:10 AM
Subject: Re: [HACKERS] 'postgres' flag


> "Mike Rogers" <temp6453@hotmail.com> writes:
> >     Anyone have a code hack to 7.1 to make PostgreSQL break out of the
> > 'sameuser' jail if a user has the 'postgres' superuser flag?
>
> The difficulty with that idea is that the connection-matching code has
> no idea whether a given userid is superuser or not (indeed, that info
> is not available to the postmaster at all).
>
> > Or maybe to set
> > config file lines based also on 'superuser' (like 'crypt superuser' or
> > something like that).  Otherwise I think I might make one.
>
> Did you read the thread a day or two back in pgsql-admin?  Consider
> something like
>
> local sameuser password
> local all password crossauth
>
> where crossauth contains the usernames you want to allow to connect
> to databases other than their own.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


Re: 'postgres' flag

From
"Mike Rogers"
Date:
Thank you so much.  I have been trying to do exactly that for months (my
postgres and admin users could never connect to the individual users'
databases because we were using sameuser, unless they were logged in as
certain users so that ident could work, and even then, it's not hard to come
from root@anothermachine or admin@anothermachine).  Thanks so much.  This
should really be documented.  It's not in the sample pg_hba.conf or the web
docs.
--
Mike



Re: Java's Unicode Notation

From
Jean-Michel POURE
Date:
Dear Hiroshi,

We need the Java Unicode Notation because, for example, it is the only one 
accepted by JavaScript. Furthermore, it is ASCII-compatible and therefore 
an alternative when an existing database uses ASCII encoding or PostgreSQL 
is not compiled with the right extensions (e.g. at a hosting provider).

At the very least, I need to be able to convert output to Java Unicode 
Notation. I am learning C and will give it a try, but not for a few weeks. 
This is what I say every day when I wake up: today, I am going to read the 
PostgreSQL internals source code.

But, unfortunately, I always have to postpone this step...
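Converting output to Java Unicode Notation amounts to decoding UTF-8 and re-emitting each code point as an escape. A minimal client-side C sketch, not a PostgreSQL feature; `utf8_to_java` is a hypothetical name. Unlike the putwchar() snippet quoted earlier in the thread, which writes U+0080..U+00FF as raw bytes, it escapes every non-ASCII character so the result is pure ASCII:

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the NUL-terminated UTF-8 string s to out in Java Unicode Notation,
 * in the style of native2ascii: ASCII passes through unchanged, every other
 * character becomes \uXXXX, and code points above U+FFFF become a UTF-16
 * surrogate pair. Validation is minimal (overlong forms are not rejected).
 * Returns 0 on success, -1 on malformed input; output written so far stays.
 */
int utf8_to_java(const char *s, FILE *out)
{
    const unsigned char *p = (const unsigned char *) s;

    while (*p) {
        unsigned long cp;
        int extra;

        /* The lead byte gives the sequence length and the top payload bits. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;
        p++;

        /* Each continuation byte (10xxxxxx) adds six more bits. */
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;          /* also catches a premature NUL */
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp > 0x10FFFF)
            return -1;

        if (cp < 0x80)
            fputc((int) cp, out);
        else if (cp < 0x10000)
            fprintf(out, "\\u%04lx", cp);
        else
            fprintf(out, "\\u%04lx\\u%04lx",
                    0xD7C0 + (cp >> 10), 0xDC00 | (cp & 0x3FF));
    }
    return 0;
}
```

Called on the UTF-8 bytes of "a€b", it writes a\u20acb.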
Thanks for your help.

Best regards,
Jean-Michel POURE

At 10:13 10/11/01 +0900, you wrote:
>Hi,
>
> > -----Original Message-----
> > From: Jean-Michel POURE [mailto:jm.poure@freesurf.fr]
> >
> > Dear Hiroshi,
> >
> > Would it be possible to use the Java Unicode Notation to define UTF-8
> > strings in PostgreSQL 7.2?
>
>We are now in 7.2 beta and it seems impossible to add a new feature
>to 7.2. Am I misunderstanding your point?
>
> > Information can be found on http://czyborra.com/utf/
> >
> > Do you think it is hard to implement?
>
>It seems difficult to get a consensus about it in the PG community
>in the first place. I asked some developers in Japan for their opinion,
>but they (and I, for that matter) aren't enthusiastic about it. However,
>they seem to have another idea, though I don't know the details yet.
>I will mail you about it if I get more information.
>
>regards,
>Hiroshi Inoue



Re: Java's Unicode Notation

From
Patrice Hédé
Date:
Hi,

I'm replying to the original mail, as it contains the description itself.

* Jean-Michel POURE <jm.poure@freesurf.fr> [011107 22:04]:
> Dear all,
> 
> Would it be possible to use the Java Unicode Notation to define UTF-8 
> strings in PostgreSQL 7.2?
> Information can be found on http://czyborra.com/utf/
> 
> Best regards,
> Jean-Michel POURE
> 
> ************************************************
> 
> Java's Unicode Notation
> There are some less compact but more readable ASCII transformations,
> the most important of which is the Java Unicode Notation, as allowed
> in Java source code and processed by Java's native2ascii converter:
> 
> putwchar(c)
> {
>   if (c >= 0x10000) {
>     printf ("\\u%04x\\u%04x" , 0xD7C0 + (c >> 10), 0xDC00 | c & 0x3FF);
>   }
>   else if (c >= 0x100) printf ("\\u%04x", c);
>   else putchar (c);
> }
> 
> The advantage of the \u20ac notation is that it is very easy to type
> in on any old ASCII keyboard and easy to look up the intended
> character if you happen to have a copy of the Unicode book or the
> {unidata2,names2,unihan}.txt files from the Unicode FTP site or
> CD-ROM or know that U+20AC is the €.
                                   ^^^
Was that the Windows proprietary charset's code point for the euro,
disguised in a mail advertising itself as "iso-8859-1", which doesn't
have the euro sign? ;)

[No wonder Unicode is really needed in Europe!]

> What's not so nice about the \u20ac notation is that lowercase hex
> letters are quite unusual for Unicode code points, the backslashes
> have to be quoted for many Unix tools, the four hex digits without a
> terminator may merge with the following word (as in \u00a333
> for £33), it is unclear when and how you have to escape the backslash
> character itself, 6 bytes for one character may be considered
> wasteful, there is no way to clearly represent characters
> beyond \uffff without \ud800\udc00 surrogates, and last but not
> least, plain hex numbers may not be very helpful.
> 
> JAVA is one of the target and source encodings of yudit and its
> uniconv converter.

I have to disagree with this feature... well, not with the idea, but
with the implementation.

First, the use of surrogates to describe code points above 0xFFFF.
Surrogates are NOT Unicode characters. They only exist in the UTF-16
encoding, which is the encoding used by Java and Windows. However,
PostgreSQL, like most Unix tools, uses UTF-8 as its encoding.

Encoding code points above 0xFFFF as two surrogates in UTF-8 is
illegal, so you should forget about this: it is an unnatural
extra step.
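The claim above can be checked mechanically. Encoding U+D800 and U+DC00 separately yields the six bytes ED A0 80 ED B0 80 (a form sometimes called CESU-8), whereas U+10000 proper is the four bytes F0 90 80 80. A strict validator following the rules later codified in RFC 3629 accepts the second and rejects the first. A minimal C sketch; `utf8_is_valid` is a hypothetical name, not a PostgreSQL function:

```c
#include <stddef.h>

/*
 * Strict UTF-8 check following RFC 3629: rejects overlong forms, code points
 * above U+10FFFF and, in particular, encoded surrogates (ED A0..BF xx).
 * Returns 1 if the n bytes in s are well-formed UTF-8, 0 otherwise.
 */
int utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra;
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for byte 2 */

        if (c < 0x80) {                       /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2;
            if (c == 0xE0) lo = 0xA0;         /* no overlong 3-byte forms */
            if (c == 0xED) hi = 0x9F;         /* no surrogates U+D800..U+DFFF */
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3;
            if (c == 0xF0) lo = 0x90;         /* no overlong 4-byte forms */
            if (c == 0xF4) hi = 0x8F;         /* nothing above U+10FFFF */
        } else {
            return 0;                         /* C0, C1, F5..FF, stray continuation */
        }

        if (n - i <= extra)
            return 0;                         /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += extra + 1;
    }
    return 1;
}
```

A converter that turns \ud800\udc00 into UTF-8 should therefore combine the pair first and emit F0 90 80 80.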

I've seen the notation \v010000 somewhere (using \v for 6-digit
code points), but I don't like it much either.

I agree with your idea of being able to express Unicode code points
directly with escape sequences. I personally like Perl's solution:

\x{20ac}
\x{010123}
\x{7e}

The braces make the length of the code point unambiguous (I've often
put one "0" too many or too few in Unicode code point descriptions
myself).
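A minimal C sketch of why the braces help: the parser can accept any number of digits, so \x{a3}33 is unambiguously U+00A3 followed by 33, and \x{010123} needs no surrogates at all. It assumes NUL-terminated input and also rejects surrogate values, in line with the argument above; `parse_x_brace` is a hypothetical name, not an existing psql or PostgreSQL function:

```c
#include <stddef.h>

/*
 * Parse a Perl-style \x{HEX} escape at s. The braces delimit the digits, so
 * any number of leading zeros is fine and the following text cannot merge
 * with the code point. On success stores the number of characters consumed
 * in *len and returns the code point; returns -1 if the escape is malformed
 * or the value is not a Unicode scalar value.
 */
long parse_x_brace(const char *s, size_t *len)
{
    long v = 0;
    size_t i = 3, digits = 0;

    if (s[0] != '\\' || s[1] != 'x' || s[2] != '{')
        return -1;
    for (; s[i] != '}'; i++, digits++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;                 /* includes the terminating NUL */
        v = v * 16 + d;
        if (v > 0x10FFFF)
            return -1;                  /* beyond Unicode; also stops overflow */
    }
    if (digits == 0 || (v >= 0xD800 && v <= 0xDFFF))
        return -1;                      /* empty braces or a lone surrogate */
    *len = i + 1;
    return v;
}
```

Rejecting \x{d800} here keeps surrogates out of whatever UTF-8 string the escapes are turned into.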

I wouldn't mind \u{...} instead of \x{...}, but a lot of PostgreSQL users
would be familiar with the \x{} notation :) [myself first of all]

I think that this is something for psql, however. Where is "\n"
translated, for example? Anyway, this is for 7.3... :)

Patrice.

-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/