Dear Tatsuo,
> > > 1) a character is not always represented on a terminal proportionally
> > > to its storage size. For example a kanji character in UTF-8 encoding
> > > has a storage size of 3 bytes while it occupies only twice the space
> > > of an ASCII character on a terminal. The same can be said of LATIN
> > > 2,3 etc. in UTF-8 perhaps.
> >
> > I thought I dealt with that in the code by calling PQmblen for every char.
> > Am I wrong ?
>
> PQmblen returns the storage size, which is not necessarily the same as
> the character width represented in a terminal. For example for a kanji
> character in UTF-8 PQmblen returns 3, but it occupies 2 x ASCII
> character space, not x 3. Isn't that a problem for you?
If I read you correctly, you mean that 1 character may take 3 bytes
of storage in the string, but it is not guaranteed to be 1 character
from the terminal perspective... Argh, that's definitely an issue:-(
I assumed that one character, whatever the encoding, would be 1 character
on the display.
If that is not the case, I think I can put/compute this information in the
translation structures that are used by PQmblen, and implement a
PQmbtermlen function...
Maybe you could point me to some source of information about the display
lengths of characters depending on the encoding?
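To make sure I understand the problem, here is a rough sketch of the
difference between the two lengths, for UTF-8 only. Everything in it is an
assumption for illustration: utf8_seq_len, utf8_decode, approx_width and
term_columns are made-up names, not libpq functions, and the table of
double-width ranges is deliberately incomplete (it also ignores combining
characters, which take no column at all).

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 sequence starting at s, judged from the lead
 * byte. Falls back to 1 if the continuation bytes are missing or bad. */
static int utf8_seq_len(const unsigned char *s)
{
    int n, i;

    if (*s < 0x80) return 1;
    else if ((*s & 0xE0) == 0xC0) n = 2;
    else if ((*s & 0xF0) == 0xE0) n = 3;
    else if ((*s & 0xF8) == 0xF0) n = 4;
    else return 1;

    for (i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 1;           /* also stops at the terminating NUL */
    return n;
}

/* Decode the code point of a sequence of length len (1 to 4). */
static unsigned long utf8_decode(const unsigned char *s, int len)
{
    switch (len)
    {
        case 1: return s[0];
        case 2: return ((unsigned long) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        case 3: return ((unsigned long) (s[0] & 0x0F) << 12) |
                       ((unsigned long) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        default: return ((unsigned long) (s[0] & 0x07) << 18) |
                        ((unsigned long) (s[1] & 0x3F) << 12) |
                        ((unsigned long) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    }
}

/* Very rough display width: 2 for a few East Asian ranges, else 1.
 * ASSUMPTION: this range list is partial, for illustration only. */
static int approx_width(unsigned long cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   /* CJK, kana, ... */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK compatibility */
        (cp >= 0xFF00 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Number of terminal columns taken by a NUL-terminated UTF-8 string. */
static int term_columns(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    int cols = 0;

    assert(str != NULL);
    while (*s != '\0')
    {
        int len = utf8_seq_len(s);

        cols += approx_width(utf8_decode(s, len));
        s += len;
    }
    return cols;
}
```

With this, a single kanji stored as the three bytes E6 97 A5 counts as 2
columns, while a two-byte Latin letter such as C3 A9 counts as 1, so neither
the byte count nor the character count gives the right cursor position.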
> > What I mean by "ASCII compatible" is that spaces, new lines, carriage
> > returns, tabs and NUL (C string termination) are one byte characters.
> > This assumption seemed pretty safe to me.
>
> I think you can do it safely using PQmblen.
Ok, what you describe is basically what I've done with the qidx
computation, as suggested by Tom Lane; later on I check that the
encoded length is one to find my special characters.
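Roughly, the idea looks like the sketch below. It is not the actual patch:
sketch_mblen is a UTF-8-only stand-in I made up so the example compiles
without libpq (the real code would call PQmblen with the client encoding),
and char_to_byte_index and is_one_byte_char are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PQmblen, UTF-8 only: byte length of the character that
 * starts at s, judged from its lead byte. ASSUMPTION: illustration only. */
static int sketch_mblen(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;                   /* invalid lead byte: step one byte */
}

/* Byte offset of the character with 0-based index char_pos, or the offset
 * of the terminating NUL if the string has fewer characters. */
static size_t char_to_byte_index(const char *s, size_t char_pos)
{
    size_t off = 0;

    assert(s != NULL);
    while (char_pos > 0 && s[off] != '\0')
    {
        int len = sketch_mblen(s + off);
        int i;

        /* never step over the NUL of a truncated sequence */
        for (i = 1; i < len; i++)
            if (s[off + i] == '\0')
            {
                len = i;
                break;
            }
        off += (size_t) len;
        char_pos--;
    }
    return off;
}

/* True if the character at p is encoded on one byte and equals c. This is
 * the test used to spot spaces, newlines, tabs and so on. */
static int is_one_byte_char(const char *p, char c)
{
    return sketch_mblen(p) == 1 && *p == c;
}
```

For the string "a", one kanji, "\n", "b", the newline is character 2 but
sits at byte offset 4, and it is recognised as a newline only because its
encoded length is one.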
Thanks for your reply,
--
Fabien Coelho - coelho@cri.ensmp.fr