A common problem is that the server uses a nontrivial character set
encoding (e.g., Unicode) but users forget to set an appropriate
client-side encoding. They then get bogus output for non-ASCII
characters because their client isn't actually prepared for Unicode.
SUSv2 specifies a standard interface, nl_langinfo(CODESET), for
determining the character set from the locale settings. I suggest that
applications like psql and pg_dump use this by default (where available),
unless it is overridden by the usual mechanisms (e.g., the
PGCLIENTENCODING environment variable). If the character set name
obtained this way is not recognized by PostgreSQL, we fall back to
SQL_ASCII.
Here's a piece of code that shows how this would work:
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>
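int
main(void)
{
    /* Adopt the locale settings from the environment (LANG, LC_ALL, ...). */
    setlocale(LC_ALL, "");
    /* Print the name of the locale's character set, e.g. "UTF-8". */
    printf("%s\n", nl_langinfo(CODESET));
    return 0;
}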
(LC_CTYPE is the locale category that governs the result, so
setlocale(LC_CTYPE, "") would be sufficient.)
Comments?
--
Peter Eisentraut peter_e@gmx.net