Hi all.
I'm fairly new to this particular subject, so I'm asking for advice.
I have PostgreSQL 7.1.2 *without* multibyte support. I have a relatively simple
database; some of its tables hold name/surname fields.
The data was loaded from a set of transformed HTML pages that have those fields
inside HTML tables. The transformation of the data from HTML to TXT went OK.
The local-language data was encoded in the Windows-1250 charset and was loaded
as such into PostgreSQL. The data is there, definitely.
When I access it from PHP4, the text fields are displayed exactly as stored, and
since the encoding of the data matches the declared encoding of the page, it
works as it should.
----
When I access the data in a PostgreSQL table through JDBC (from JSP or a
standalone Java application), all "extended" characters used for letters
specific to our alphabet are displayed as "?".
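To illustrate the symptom, here is a minimal Java sketch of one way such "?"
characters can appear: a string is pushed through a charset that cannot
represent the letters. Whether this is what actually happens inside the JDBC
driver in my setup is an assumption; the class name and the sample surname are
made up for the example.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    /**
     * Encode text to US-ASCII bytes and decode it again.
     * Characters that US-ASCII cannot represent are replaced with '?'.
     */
    public static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical sample surname: S-caron, "imi", c-acute.
        String surname = "\u0160imi\u0107";
        System.out.println(throughAscii(surname)); // prints "?imi?"
    }
}
```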
I understand that Java uses Unicode, hence so does JDBC. I also know that
compiling PostgreSQL with the "multibyte" option will give me a Unicode-capable
database. So far, so good. I am uncertain about the next steps. At the moment
I'm compiling PostgreSQL 7.1.3 with the following options:
  --enable-locale               enable locale support
  --enable-recode               enable character set recode support
  --enable-multibyte            enable multibyte character support
  --enable-unicode-conversion   enable unicode conversion support
  --with-perl                   build Perl interface and PL/Perl
  --with-java                   build JDBC interface and Java tools
  --with-openssl[=DIR]          build with OpenSSL support [/usr/local/ssl]
  --enable-odbc                 build the ODBC driver package
  --enable-syslog               enable logging to syslog
QUESTION 1: Does anybody have any good advice on what steps to take next?
QUESTION 2: What would be the best way to convert Windows-1250 into Unicode?
QUESTION 3: Should I convert the data into Latin-2 (ISO-8859-2) encoding?
QUESTION 4: How would I enter a Unicode character in "psql" or an SQL statement?
QUESTION 5: Does anyone have experience with this problem on the client side?
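For questions 2 and 3, this is a minimal Java sketch of the kind of conversion
I have in mind: decode the raw Windows-1250 bytes into a Java string (which is
Unicode internally), re-encode it as UTF-8, and check whether the text would
also fit into ISO-8859-2. It assumes the JVM ships the "windows-1250" and
"ISO-8859-2" charsets; the class name, method names and sample bytes are
hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1250ToUnicode {
    // Assumption: both charsets are available on this JVM.
    static final Charset CP1250 = Charset.forName("windows-1250");
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    /** Decode raw Windows-1250 bytes into a Java (Unicode) String. */
    public static String decodeCp1250(byte[] raw) {
        return new String(raw, CP1250);
    }

    /** Convert raw Windows-1250 bytes into UTF-8 bytes. */
    public static byte[] toUtf8(byte[] raw) {
        return decodeCp1250(raw).getBytes(StandardCharsets.UTF_8);
    }

    /** True if every character of the text can be encoded in ISO-8859-2. */
    public static boolean fitsLatin2(String text) {
        return LATIN2.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Hypothetical sample: S-caron (0x8A), "imi", c-acute (0xE6) in Windows-1250.
        byte[] raw = {(byte) 0x8A, 'i', 'm', 'i', (byte) 0xE6};
        String name = decodeCp1250(raw);

        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8(raw)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // C5 A0 69 6D 69 C4 87
        System.out.println(fitsLatin2(name));      // true
    }
}
```

Windows-1250 also contains characters that ISO-8859-2 lacks (the euro sign, for
one), so the fitsLatin2 check is there to catch text that would not survive a
conversion to Latin-2.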
I realize that the last question is best asked on some other mailing list, but
I'll ask it anyway. Since this is supposed to be a Web application, how will Web
clients enter Unicode data? Is there a Unicode keyboard mapper?
I imagine that I could accept ISO-8859-2 encoded data in my JSP page. Will the
Java Servlet engine convert ISO-8859-2 into Unicode for me?
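To make that last question concrete, here is a small standalone Java sketch of
the decoding step I mean: a percent-encoded form value is turned into a Unicode
string using a named charset. My assumption is that a servlet container does
something equivalent once it knows the request encoding (for example via
ServletRequest.setCharacterEncoding in Servlet 2.3); the class and method names
below are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodeDemo {
    /** Decode a percent-encoded form value, treating the bytes as the given charset. */
    public static String decodeParam(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical form value: S-caron (0xA9), "imi", c-acute (0xE6) in ISO-8859-2.
        String raw = "%A9imi%E6";

        String asLatin2 = decodeParam(raw, "ISO-8859-2");
        String asLatin1 = decodeParam(raw, "ISO-8859-1");

        // Compare against Unicode escapes so the result does not depend on the console encoding.
        System.out.println(asLatin2.equals("\u0160imi\u0107")); // true
        System.out.println(asLatin1.equals("\u0160imi\u0107")); // false: wrong charset assumed
    }
}
```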
TYIA.
Anxiously awaiting your replies.
Nix.