Trevor Talbot wrote:
> The encoding output is the one you specified.
OK.
> Keep in mind,
> underneath Windows is mostly working with Unicode, so all characters
> exist and the locale rules specify their behavior there. The encoding
> is just the byte stream it needs to force them all into after doing
> whatever it does to them. As you've seen, it uses some sort of
> best-fit mapping I don't know the details of. (It will drop accent
> marks and choose characters with similar shape where possible, by
> default.)
Right, that makes sense. The codepages used by setlocale etc. are just
translation tables to/from the internal Unicode representation.
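
To illustrate what I mean by "translation tables", here is a rough Python
sketch. It is not what Windows does internally; the function names are made
up for the example, and the accent-stripping step is only a crude stand-in
for the best-fit mapping, whose real rules neither of us has the details of.

```python
import unicodedata


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between codepages by going through Unicode."""
    text = data.decode(src)                    # codepage -> Unicode
    return text.encode(dst, errors="replace")  # Unicode -> codepage, '?' if unmappable


def strip_accents(text: str) -> str:
    """Crude approximation of a best-fit fallback: drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


cp1252_bytes = "café".encode("cp1252")            # e-acute is byte 0xE9 in cp1252
print(transcode(cp1252_bytes, "cp1252", "cp437"))  # same character, different byte
print(strip_accents("café").encode("ascii"))       # accent dropped before encoding
```

The same character ends up as a different byte in each codepage, and
anything the target table cannot represent has to be replaced or
approximated.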
> I think it's a bit more complex for input/transform cases where you
> operate on the byte stream directly without intermediate conversion to
> Unicode, which is why UTF-8 doesn't work as a codepage, but again I
> don't have the details nearby. I can try to do more digging if
> needed.
It does (sort of) work as a codepage; it just doesn't have the NLS file
to define how things like UPPER() and LOWER() should work.
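
A small Python sketch of why case mapping needs per-character rules rather
than just a byte encoding. This is only an analogy, with hypothetical helper
names: Python's bytes.upper() knows ASCII only, which is roughly the
position you are in when you have UTF-8 bytes but no table that says what
the upper-case form of each character is.

```python
def upper_bytes(data: bytes) -> bytes:
    """Upper-case the raw bytes; only ASCII letters are affected."""
    return data.upper()


def upper_via_unicode(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to Unicode, apply the character-level case rules, re-encode."""
    return data.decode(encoding).upper().encode(encoding)


sample = "é".encode("utf-8")       # two bytes: 0xC3 0xA9
print(upper_bytes(sample))         # unchanged, no rule for these bytes
print(upper_via_unicode(sample))   # bytes for upper-case E-acute
```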
Regards, Dave