Re: Combining chars in psql (pre-patch) - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Combining chars in psql (pre-patch) |
Date | |
Msg-id | 200202221807.g1MI7nv29511@candle.pha.pa.us |
In response to | Combining chars in psql (pre-patch) (Patrice Hédé <phede-ml@islande.org>) |
List | pgsql-hackers |
Patrice, do you have an updated patch you want applied to 7.3?

---------------------------------------------------------------------------

Patrice Hédé wrote:
> Hi,
>
> I have been working a bit on a patch for that problem in psql. The
> patch is far from being ready for inclusion or whatever, it's just for
> comments...
>
> By the way, can someone tell me how to generate nice patches showing
> the difference between one's version and the cvs code that has been
> downloaded ? I'm new to this (I've only used cvs for personal projects
> so far, and I don't need to send patches to myself ;) ).
>
> The good things in this patch :
>
> - it works for me :)
>
> - I've used Markus Kuhn's implementation of wcwidth.c : it is locale
>   independent, and is in the public domain. :) [if we keep it, I'll
>   have to tell him, though !]
>
> - No dependency on the local libc's UTF-8-awareness ;) [I've seen that
>   psql has no such dependency, at least in print.c, so I haven't added
>   any]. Actually, the change is completely self-contained.
>
> - I've made my own utf-8 -> ucs converter, since I haven't found any
>   without a copyright notice yesterday. It checks invalid and
>   non-optimal UTF-8 sequences, as required by Unicode 3.0.1 (or 3.1,
>   I don't remember).
>
> - it works for Japanese (and I believe other "full-width" characters).
>
> - if MULTIBYTE is not defined, the code doesn't change from the
>   committed version.
>
> The not so good things :
>
> - I've made my own utf-8 -> ucs converter... It seems to work fine,
>   but it's not tested well enough, it may not be so robust.
>
> - The printf( "%*s", width, utfstr) doesn't work as expected, so I had
>   to fix by doing printf( "%*s%s", width - utfstrwidth, "", utfstr);
>
> - everything in #ifdef MULTIBYTE/#endif . Since there is no
>   dependency on anything else (including the rest of the multibyte
>   implementation - which I haven't had the time to look at in detail),
>   it doesn't depend on it.
>
> - I get this (for each call to pg_mb_utfs_width) and I don't know why :
>
>   print.c:265: warning: passing arg 1 of `pg_mb_utfs_width' discards
>   qualifiers from pointer target type
>
> - If pg_mb_utfs_width finds an invalid UTF-8 string, it truncates it.
>   I suppose that's what we want to do, but that's probably not the
>   best place to do it.
>
> The bad things :
>
> - If MULTIBYTE is defined, the strings must be in UTF-8, it doesn't
>   check any encoding.
>
> - it is not integrated at all with the rest of the MB code.
>
> - it doesn't respect the indentation policy ;)
>
>
> To do :
>
> - integrate better with the rest of the MB (client-side encoding), and
>   with the rest of the code of print.c .
>
> - verify utf8-to-ucs robustness seriously.
>
> - make a visually nicer code :)
>
> - find better function names.
>
> And possibly :
>
> - consolidate the code, in order to remove the need for the #ifdef's
>   in many places.
>
> - make it work with some other multiwidth encodings (but then, I
>   don't know anything about these encodings myself !).
>
> - check also utf-8 stream at input time, so that no invalid utf-8 is
>   sent to the backend (at least from psql - the backend will need also
>   a strict checking for UTF-8).
>
> - add nice UTF-8 borders as an option :)
>
> - add a command-line parameter to consider Unicode Ambiguous
>   characters (characters which can be narrow or wide, depending on the
>   terminal) wide characters, as it seems to be the case for CJK
>   terminals (as per TR#11).
>
> - What else ?
>
>
> BTW, here is the table I had in the first mail. I would have shown the
> one with all the weird Unicode characters, but my mutt is configured
> with iso-8859-15, and I doubt many of you have utf-8 as a default yet
> :)
>
> +------+-------+--------+
> | lang | text  | text   |
> +------+-------+--------+
> | isl  | ?l?ta | ?leit  |
> | isl  | ?l?ta | ?litum |
> | isl  | ?l?ta | ?liti? |
> | isl  | ma?ur | mann   |
> | isl  | ma?ur | m?nnum |
> | isl  | ma?ur | manna  |
> | isl  | ?ska  | -a?i   |
> +------+-------+--------+
>
>
> The files in attachment :
> - a diff for pgsql/src/bin/psql/print.c
> - a diff for pgsql/src/bin/psql/Makefile
> - two new files :
>   pgsql/src/bin/psql/pg_mb_utf8.c
>   pgsql/src/bin/psql/pg_mb_utf8.h
>
> Have fun !
>
> Patrice
>
> --
> Patrice HÉDÉ ------------------------------- patrice ? islande org -----
> -- Isn't it weird how scientists can imagine all the matter of the
> universe exploding out of a dot smaller than the head of a pin, but they
> can't come up with a more evocative name for it than "The Big Bang" ?
> -- What would _you_ call the creation of the universe ?
> -- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
> ------------------------------------------ http://www.islande.org/ -----

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
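
For readers following the thread without the attachments, here is a minimal sketch in C of the two ideas the quoted message describes: a strict UTF-8 decoder that rejects invalid and non-shortest-form ("non-optimal") sequences, and the padding workaround for printf. The workaround is needed because the field width in "%*s" counts bytes, not characters, so a multibyte string gets too little padding. This is not the code from the patch. The names utf8_decode, utf8_codepoints and format_right_aligned are hypothetical, and the sketch counts code points as a stand-in for display width, so it does not handle combining or full-width characters the way a wcwidth-style function would. The string parameters are declared const, which is one common way to avoid a "discards qualifiers" warning when the caller holds a const pointer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Decode one UTF-8 sequence starting at s into *cp.
 * Returns the number of bytes consumed (1 to 4), or 0 if the sequence is
 * invalid, truncated, or not in shortest form.
 * Hypothetical helper for illustration, not the converter from the patch.
 */
static int
utf8_decode(const unsigned char *s, unsigned long *cp)
{
    /* smallest code point that legitimately needs a sequence of each length */
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    unsigned long c;
    int         len;
    int         i;

    if (s[0] < 0x80)
    {
        len = 1;
        c = s[0];
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        len = 2;
        c = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        len = 3;
        c = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        len = 4;
        c = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation byte or 0xF8..0xFF */

    for (i = 1; i < len; i++)
    {
        /* a NUL terminator fails this test too, so we never read past it */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (unsigned long) (s[i] & 0x3F);
    }

    if (c < min_cp[len])
        return 0;               /* non-shortest ("overlong") form */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* outside Unicode range, or a surrogate */

    *cp = c;
    return len;
}

/*
 * Count code points up to the first invalid sequence, which mirrors the
 * "truncate on invalid UTF-8" behaviour mentioned in the message.
 * Assumption: one column per code point; this is NOT true display width.
 */
static int
utf8_codepoints(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    unsigned long cp;
    int         n = 0;
    int         len;

    while (*s != '\0' && (len = utf8_decode(s, &cp)) > 0)
    {
        s += len;
        n++;
    }
    return n;
}

/*
 * Right-align utfstr in a field of `width` columns by printing the padding
 * and the string separately, as in printf("%*s%s", width - w, "", utfstr).
 * Returns what snprintf returns (bytes that would be written).
 */
static int
format_right_aligned(char *out, size_t outsz, int width, const char *utfstr)
{
    int         pad = width - utf8_codepoints(utfstr);

    if (pad < 0)
        pad = 0;
    return snprintf(out, outsz, "%*s%s", pad, "", utfstr);
}
```

With a 6-column field, a 4-character string that occupies 5 bytes gets 2 spaces of padding here, where a plain "%*s" would add only 1.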