Re: Combining chars in psql (pre-patch) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Combining chars in psql (pre-patch)
Msg-id 200202221807.g1MI7nv29511@candle.pha.pa.us
In response to Combining chars in psql (pre-patch)  (Patrice Hédé <phede-ml@islande.org>)
List pgsql-hackers
Patrice, do you have an updated patch you want applied to 7.3?

---------------------------------------------------------------------------

Patrice Hédé wrote:
> Hi,
> 
> I have been working a bit at a patch for that problem in psql. The
> patch is far from being ready for inclusion or whatever, it's just for
> comments...
> 
> By the way, can someone tell me how to generate nice patches showing
> the difference between one's version and the cvs code that has been
> downloaded ? I'm new to this (I've only used cvs for personal projects
> so far, and I don't need to send patches to myself ;) ).
> 
> The good things in this patch :
> 
> - it works for me :)
> 
> - I've used Markus Kuhn's implementation of wcwidth.c : it is locale
>   independent, and is in the public domain. :) [if we keep it, I'll
>   have to tell him, though !]
> 
> - No dependency on the local libc's UTF-8-awareness ;) [I've seen that
>   psql has no such dependency, at least in print.c, so I haven't added
>   any]. Actually, the change is completely self-contained.
> 
> - I've made my own utf-8 -> ucs converter, since I haven't found any
>   without a copyright notice yesterday. It checks invalid and
>   non-optimal UTF-8 sequences, as required by Unicode 3.0.1 (or 3.1,
>   I don't remember).
> 
> - it works for Japanese (and I believe other "full-width" characters).
> 
> - if MULTIBYTE is not defined, the code doesn't change from the
>   committed version.
> 
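For anyone following the thread without the attachments, here is a minimal
sketch of the kind of check described above: a decoder that rejects invalid
and non-optimal (overlong) UTF-8 sequences. It is not the converter from the
patch. The name utf8_decode and its signature are hypothetical, and it
assumes today's four-byte limit and rejection of surrogates, which may not
match what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * Decode one UTF-8 sequence starting at s, with len bytes available.
 * On success, store the code point in *out and return the number of
 * bytes consumed.  Return 0 for invalid, truncated, or overlong input.
 */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    /* smallest code point that legitimately needs n bytes, indexed by n */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t    cp;
    size_t      n;
    size_t      i;

    assert(s != NULL && out != NULL);

    if (len == 0)
        return 0;

    if (s[0] < 0x80)
    {
        *out = s[0];            /* plain ASCII */
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        n = 2;
        cp = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        n = 3;
        cp = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        n = 4;
        cp = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (len < n)
        return 0;               /* sequence is cut short */

    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_cp[n])
        return 0;               /* overlong ("non-optimal") encoding */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;               /* out of range, or a surrogate */

    *out = cp;
    return n;
}
```

A caller would walk a string by advancing by the return value and stopping
(or truncating) at the first 0, which is roughly the behaviour described
for pg_mb_utfs_width further down.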
> The not so good things :
> 
> - I've made my own utf-8 -> ucs converter... It seems to work fine,
>   but it's not tested well enough, it may not be so robust.
> 
> - The printf( "%*s", width, utfstr) doesn't work as expected, so I had
>   to fix by doing printf( "%*s%s", width - utfstrwidth, "", utfstr);
> 
> - everything in #ifdef MULTIBYTE/#endif . Since there is no
>   dependency on anything else (including the rest of the multibyte
>   implementation - which I haven't had the time to look at in detail),
>   it doesn't depend on it.
> 
> - I get this (for each call to pg_mb_utfs_width) and I don't know why :
> 
>   print.c:265: warning: passing arg 1 of `pg_mb_utfs_width' discards
>   qualifiers from pointer target type
> 
> - If pg_mb_utfs_width finds an invalid UTF-8 string, it truncates it.
>   I suppose that's what we want to do, but that's probably not the
>   best place to do it.
> 
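On the printf point: the field width in "%*s" counts bytes, not terminal
columns, so a multi-byte UTF-8 string ends up with too little padding. The
sketch below illustrates the workaround quoted above. All names in it are
hypothetical, and the width table is a deliberately tiny stand-in for
Markus Kuhn's wcwidth (one combining range and a few wide ranges only). It
assumes otherwise valid UTF-8 and stops counting at the first malformed
sequence. Declaring the string parameter const is also the usual way to
avoid a "discards qualifiers" warning when callers pass const pointers,
though whether that is the cause in the patch is only a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for wcwidth(): 0, 1 or 2 columns per code point. */
static int
codepoint_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;               /* combining diacritical marks */
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   /* CJK, kana, ... */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xFF00 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Display width in columns of a NUL-terminated UTF-8 string. */
static int
utf8_display_width(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    int         width = 0;

    assert(str != NULL);

    while (*s)
    {
        uint32_t    cp;
        int         n;
        int         i;

        if (*s < 0x80)
        {
            cp = *s;
            n = 1;
        }
        else if ((*s & 0xE0) == 0xC0)
        {
            cp = *s & 0x1F;
            n = 2;
        }
        else if ((*s & 0xF0) == 0xE0)
        {
            cp = *s & 0x0F;
            n = 3;
        }
        else if ((*s & 0xF8) == 0xF0)
        {
            cp = *s & 0x07;
            n = 4;
        }
        else
            break;              /* invalid lead byte: stop counting */

        for (i = 1; i < n; i++)
        {
            if ((s[i] & 0xC0) != 0x80)
                return width;   /* malformed or truncated: stop counting */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        s += n;
        width += codepoint_width(cp);
    }
    return width;
}

/*
 * Right-align str in a field of `width` columns by emitting the padding
 * separately, i.e. the "%*s%s" trick instead of a plain "%*s".
 */
static int
format_right_aligned(char *buf, size_t size, int width, const char *str)
{
    int         pad = width - utf8_display_width(str);

    if (pad < 0)
        pad = 0;
    return snprintf(buf, size, "%*s%s", pad, "", str);
}
```

With this, a six-byte string that occupies five columns gets two spaces of
padding in a seven-column field, where a plain "%*s" would give it one.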
> The bad things :
> 
> - If MULTIBYTE is defined, the strings must be in UTF-8, it doesn't
>   check any encoding.
> 
> - it is not integrated at all with the rest of the MB code.
> 
> - it doesn't respect the indentation policy ;)
> 
> 
> To do :
> 
> - integrate better with the rest of the MB (client-side encoding), and
>   with the rest of the code of print.c .
> 
> - verify utf8-to-ucs robustness seriously.
> 
> - make the code visually nicer :)
> 
> - find better function names.
> 
> And possibly :
> 
> - consolidate the code, in order to remove the need for the #ifdef's
>   in many places.
> 
> - make it work with some other multi-width encodings (but then, I
>   don't know anything about these encodings myself !).
> 
> - check also utf-8 stream at input time, so that no invalid utf-8 is
>   sent to the backend (at least from psql - the backend will need also
>   a strict checking for UTF-8).
> 
> - add nice UTF-8 borders as an option :)
> 
> - add a command-line parameter to treat Unicode Ambiguous
>   characters (characters which can be narrow or wide, depending on the
>   terminal) as wide characters, as seems to be the case for CJK
>   terminals (as per TR#11).
> 
> - What else ?
> 
> 
> BTW, here is the table I had in the first mail. I would have shown the
> one with all the weird Unicode characters, but my mutt is configured
> with iso-8859-15, and I doubt many of you have utf-8 as a default yet
> :)
> 
> +------+-------+--------+
> | lang | text  |  text  |
> +------+-------+--------+
> | isl  | álíta | áleit  |
> | isl  | álíta | álitum |
> | isl  | álíta | álitið |
> | isl  | maður | mann   |
> | isl  | maður | mönnum |
> | isl  | maður | manna  |
> | isl  | óska  | -aði   |
> +------+-------+--------+
> 
> 
> The files in attachment :
> - a diff for pgsql/src/bin/psql/print.c
> - a diff for pgsql/src/bin/psql/Makefile
> - two new files :
>   pgsql/src/bin/psql/pg_mb_utf8.c
>   pgsql/src/bin/psql/pg_mb_utf8.h
> 
> Have fun !
> 
> Patrice
> 
> -- 
> Patrice HÉDÉ ------------------------------- patrice ? islande org -----
>   --  Isn't it weird  how scientists  can imagine  all the matter of the
> universe exploding out of a dot smaller than the head of a pin, but they
> can't come up with a more evocative name for it than "The Big Bang" ?
>   -- What would _you_ call the creation of the universe ?
>   -- "The HORRENDOUS SPACE KABLOOIE !"               - Calvin and Hobbes
> ------------------------------------------ http://www.islande.org/ -----

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

