Thread: Character Encoding Question
Hi all,
I am using Psycopg2 with Python 3 and PostgreSQL 9.1 with a database encoded as UTF-8 (Kubuntu 12.10). I have a question about the encoding and decoding, and why some fetch calls did what I wanted while fetchall() seemed to choke. I got the following error message when calling fetchall():
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
The traceback pointed to the fetchall() statement.
First, I was able to do a pattern matching search on a specific field in my payee table and could get back whatever records matched the user input in my script. Then I ran a simple SELECT on the same table, using fetchall(), but that choked.
I decided to experiment some before asking questions here on the list, and discovered that, if I call fetchone(), I got a single record with the fields presented as rows in my Bash console. Then I tried fetchmany() and got the same record presented as a tuple.
In the psycopg usage doc (http://www.initd.org/psycopg/docs/usage.html#unicode-handling), I found this reference:
conn.set_client_encoding('LATIN9')
I tried that, but set the encoding to UTF-8 instead of LATIN9, supposing that might be useful. Given the successful result, it appears that was just what the doctor ordered!
My first question is, is this really all I need to do - just set the client encoding? Or is there something else I need to do or be aware of?
My second question is why does fetchall() fail without setting the client encoding (while the other fetch calls work)?
Thanks!
Don
--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
On Thu, Mar 28, 2013 at 3:35 PM, Don Parris <parrisdc@gmail.com> wrote:
> I tried that, but set the encoding to UTF-8 instead of the LATIN9, supposing
> that might be useful. Given the successful result, it appears that was just
> what the doctor ordered!

Yes, that should be the case. What was the original encoding? Can you check "conn.client_encoding" (to get what psycopg thinks the encoding is) and get the result from the query "show client_encoding" (to get what the database says it is)? It's strange your connection is not in utf8 encoding by default.

> My first question is, is this really all I need to do - just set the client
> encoding? Or is there something else I need to do or be aware of?

Yes, but as said it is suspicious it wasn't already in utf8.

> My second question is why does fetchall() fail without setting the client
> encoding (while the other fetch calls work)?

Are you sure the error is in fetchall itself and not somewhere downstream? The decoding code path should be common for all the fetch*() methods. Can you produce a synthetic test case with a couple of rows in a table to trigger the error?

What psycopg version are you using?

Thanks,

-- Daniele
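Daniele's remark that the decoding code path is common to all the fetch*() methods can be illustrated without a database. On Python 3, psycopg2 decodes the bytes it receives from the server using the connection's client encoding; the sketch below (the sample text is invented) shows what happens when UTF-8 bytes meet Python's ascii codec:

```python
# What the server sends for a UTF-8 value: plain bytes.
utf8_bytes = "Piña Colada Resort".encode("utf-8")

# An ASCII client encoding effectively attempts this decode, and any
# byte >= 0x80 raises the same UnicodeDecodeError seen in the thread.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 2: ...

# Decoding with the matching codec works fine.
print(utf8_bytes.decode("utf-8"))
```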
Thanks Daniele,
I'm using psycopg 2.4.5 with Python 3.2.3 on Kubuntu 12.10.
This is the connection encoding based on a quick check at the Python console:
>>>print(con.encoding)
SQLASCII
I honestly don't know where the error actually is - not a very advanced programmer by any means. All I really know is that the program breaks when I use fetchall(), but it works with fetchone() and fetchmany(). It could have something to do with the way my main program module imports the module that calls the function, for all I know. Here is the traceback:
File "main_program_file.py", line 63, in <module>
rpt_payee_list()
File "imported_sqlcommand_module.py", line 50, in rpt_payee_list
rows = cur.fetchall()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
Here is some fake data retrieved by the query (when specifying the UTF-8 client encoding):
(79, 'Payee', "Bob's Restaurant", '123 Some St.', 'Big City', 'STATE', 'USA', '337709', 'Restaurant')
(80, 'Payee', "Tony's Fancy Restaurant", '34 Long Rd', 'Big City', 'STATE', 'USA', 'NA', 'Restaurant')
(81, 'Payee', 'Piña Colada Resort', 'Resort Avenue', 'Resort Town', 'Lima', 'Perú', 'NA', 'Resort Hotel')
(82, 'Payee', "Great Foods", '45 Eating Way', 'Big City', 'STATE', 'USA', '2028', 'Grocery Store')
(83, 'Payee', 'The Home Store', '25 Shopping Plaza', 'Big City', 'STATE', 'USA', '17', 'Home Improvement Store')
Here is the actual function in all its glory (try not to laugh too hard - I'm still learning):
def rpt_payee_list():
    con = None
    con = psycopg2.connect(database='financedb_dev', user='donp')
    con.set_client_encoding('UTF-8')
    cur = con.cursor()
    try:
        cur.execute("""SELECT * FROM entity
            ORDER BY entity_name""")
    except:
        print('It ain\'t workin')
    rows = cur.fetchall()
    for row in rows:
        print(row)
    if con:
        con.close()
A little later, I will likely create a connection function that I can call when I want to run a specific query, but for the initial steps, I've included the connection as part of the function that creates/runs the query.
On Thu, Mar 28, 2013 at 11:49 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
<SNIP>
On Thu, Mar 28, 2013 at 4:34 PM, Don Parris <parrisdc@gmail.com> wrote:
> I'm using psycopg 2.4.5 with Python 3.2.3 on Kubuntu 12.10.
>
> This is the connection encoding based on a quick check at the Python
> console:
>>>> print(con.encoding)
> SQLASCII

In this case it's natural that you get an error on decode. But is it really the database encoding? What does this query say:

cur = con.cursor()
cur.execute("show client_encoding")
print(cur.fetchone()[0])

> I honestly don't know where the error actually is - not a very advanced
> programmer by any means. All I really know is that the program breaks when
> I use fetchall(), but it works with fetchone() and fetchmany(). It could
> have something to do with the way my main program module imports the module
> that calls the function, for all I know. Here is the traceback:

I'll try to load your dataset and play with it, thank you.

-- Daniele
Please keep the mailing list in copy.

On Thu, Mar 28, 2013 at 5:17 PM, Don Parris <parrisdc@gmail.com> wrote:
> Here is the result of the query you suggested:
>
>>>> cur = con.cursor()
>>>> cur.execute("show client_encoding")
>>>> print(cur.fetchone()[0])
> SQL_ASCII

This means that everything is going as expected: connection has been requested in ascii and obtained so. I'll check if really the encoding errors happen inconsistently in the fetch*() methods. What I suspect is that you issued fetchone() and got an ascii-only record: if you continued to loop on the entire recordset you'd eventually get an error. Try (without setting the encoding to utf8):

cur.execute("select [ your stuff, including non-ascii records")
while 1:
    record = cur.fetchone()
    if not record:
        break

I'm pretty sure you will get an encoding error too.

> I created the DB in postgresql using the following command:
> CREATE DATABASE mydb
> WITH TEMPLATE template0 ENCODING 'UTF8';
>
> Although I cannot now recall my reason for doing so, there *is* a reason why
> I'm using this approach. :-/ I should have said "why" in my comments!

I don't see anything wrong here. The only question is what is setting the connection to SQL_ASCII. Maybe something in postgresql.conf or a setting of the database user. It's not a psycopg issue anyway: the driver is following the instructions.

-- Daniele
On 28/03/2013 17:41, Daniele Varrazzo wrote:
>> This is the connection encoding based on a quick check at the Python
>> console:
>>>>> print(con.encoding)
>> SQLASCII
>
> In this case it's natural that you get an error on decode. But is it
> really the database encoding? What does this query say:

IMHO this seems very much like the "uh-oh! I put latin9 data into my database and now I am trying to read it as unicode" problem. PostgreSQL does NOT enforce any kind of encoding. It just converts between the client encoding and the database one before storing the data. If you configure either of the two encodings the wrong way it is quite easy to store data in the wrong format.

federico
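Federico's warning can be sketched in plain Python (the sample text is invented): the same string produces different bytes under different encodings, and bytes written under one encoding generally can't be read back under the other:

```python
text = "Piña"

# The same text stored under two different encodings yields different
# bytes; nothing stops both from ending up in the same column if the
# client encoding is misconfigured.
as_utf8 = text.encode("utf-8")      # b'Pi\xc3\xb1a'
as_latin1 = text.encode("latin-1")  # b'Pi\xf1a' (latin9 agrees for ñ)
print(as_utf8, as_latin1)

# Reading the latin-1 bytes back as UTF-8 fails: a lone 0xf1 is not a
# valid UTF-8 sequence.
try:
    as_latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)
```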
For the list:
Here is the result of the query you suggested:
>>> cur = con.cursor()
>>> cur.execute("show client_encoding")
>>> print(cur.fetchone()[0])
SQL_ASCII
I created the DB in postgresql using the following command:
CREATE DATABASE mydb
WITH TEMPLATE template0 ENCODING 'UTF8';

Although I cannot now recall my reason for doing so, there *is* a reason why I'm using this approach. :-/ I should have said "why" in my comments!
On Thu, Mar 28, 2013 at 12:41 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
<SNIP>
On Thu, Mar 28, 2013 at 1:30 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
Please keep the mailing list in copy.
Yes... My apologies - I have gotten accustomed to munging lists.... Will try to remember (or maybe stop using gmail for this list).
On Thu, Mar 28, 2013 at 5:17 PM, Don Parris <parrisdc@gmail.com> wrote:
> Here is the result of the query you suggested:
<SNIP>
> Try (without setting the encoding to utf8):
>
> cur.execute("select [ your stuff, including non-ascii records")
> while 1:
>     record = cur.fetchone()
>     if not record:
>         break
>
> I'm pretty sure you will get an encoding error too.
You are right about this. It stopped at record #7 (entity_id = 7) and gave me the traceback. That is, record #7 was the last record printed. Does that have anything to do with the position mentioned in the traceback? Would #8 have been the 'guilty' record?
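For what it's worth, the "position 9" in the traceback is a byte offset inside the single value being decoded, not a record number. A contrived value (we don't know what Don's record #8 actually contains) that fails at exactly that position:

```python
# "Big City " is 9 ASCII bytes, so the first byte of the en dash's
# UTF-8 encoding (b'\xe2\x80\x93') sits at byte offset 9.
value = "Big City \u2013 North"       # en dash, a non-ASCII character
raw = value.encode("utf-8")           # what the server delivers

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))  # 9 0xe2
```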
This record was loaded (as were all of the first 80-100 in this table) via \copy command from within psql. The remaining records have been inserted manually via PGAdmin, simply opening the table and adding the rows. I also played with my pattern matching search query, which has turned up great results, until I searched on record #8 - then it broke.
Given that most of the early records were loaded via \copy, why & how would the character encoding change in the middle of a \copy command? Or is it possible they were all different when initially saved as a CSV file? I could understand better if the encoding broke the program, say at the first record I loaded via \copy, but maybe worked on the first record I inserted via PGAdmin - or vice versa.
>> I created the DB in postgresql using the following command:
>> CREATE DATABASE mydb
>> WITH TEMPLATE template0 ENCODING 'UTF8';
>>
>> Although I cannot now recall my reason for doing so, there *is* a reason why
>> I'm using this approach. :-/ I should have said "why" in my comments!
>
> I don't see anything wrong here. The only question is what is setting
> the connection to SQL_ASCII. Maybe something in postgresql.conf or a
> setting of the database user. It's not a psycopg issue anyway: the
> driver is following the instructions.
I can't say there's anything wrong with psycopg - I just want to better understand what is happening with my data and front-end code.
On Thu, Mar 28, 2013 at 1:12 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> IMHO this seems very much like the "uh-oh! I put latin9 data into my
> database and now I am trying to read it as unicode" problem. PostgreSQL
> does NOT enforce any kind of encoding. It just converts between the
> client encoding and the database one before storing the data. If you
> configure any of the two encodings the wrong way it is quite easy to store
> data in the wrong format.
Thanks Federico,
If I created the database using the UTF-8 encoding, then why would some data be encoded differently than the rest? And how can I control how the data gets inserted? See my previous post, where I mentioned loading a good chunk of the data via the \copy command in psql, and then later added more via PGAdmin. Many records seem to work just fine, but quite a few others don't - and I was just naively entering or loading data without knowing any encoding was being changed.
On Thu, Mar 28, 2013 at 8:49 PM, Don Parris <parrisdc@gmail.com> wrote:
> You are right about this. It stopped at record #7 (entity_id = 7) and gave
> me the traceback. That is, record #7 was the last record printed. Does
> that have anything to do with the position mentioned in the traceback?
> Would #8 have been the 'guilty' record?

Yes: try again setting the encoding to utf8 and you'll see a record with a non-ascii char: it could be "Piña Colada Resort".

> This record was loaded (as were all of the first 80-100 in this table) via
> \copy command from within psql. The remaining records have been inserted
> manually via PGAdmin

Your database is probably fine and perfectly encoded. If there's anything to be understood, it is why psycopg connects by default with a SQL_ASCII encoding - which is not good to transfer the ñ char or any other char with ascii value > 127. The answer could be, from the most general to the most specific, in the postgresql.conf (global setting), in the pg_user view (per-user setting), or in some environment variable (per-connection setting). Which one of these settings says client_encoding=sql_ascii? I bet it's the first.

-- Daniele
On 28/03/13 16:59, Don Parris wrote:
> If I created the database using the UTF-8 encoding, then why would some
> data be encoded differently than the rest? And how can I control how
> the data gets inserted?

If the database is created with UTF-8 encoding, all character data will be encoded as UTF-8. The problem was that your client was using SQL_ASCII encoding so any UTF-8, non-ASCII data (i.e., characters above decimal 127) received from PG couldn't be decoded.

IIRC the client encoding is set according to the template0 encoding. I would do a psql -l to see the encoding of other databases in your cluster, in particular template0, template1 and postgres.

Joe
On Thu, Mar 28, 2013 at 7:07 PM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> On Thu, Mar 28, 2013 at 8:49 PM, Don Parris <parrisdc@gmail.com> wrote:
> <SNIP>
>
> Your database is probably fine and perfectly encoded. If there's
> anything to be understood, it is why psycopg connects by default with a
> SQL_ASCII encoding - which is not good to transfer the ñ char or any
> other char with ascii value > 127. The answer could be, from the most
> general to the most specific, in the postgresql.conf (global setting),
> in the pg_user view (per-user setting), or in some environment variable
> (per-connection setting). Which one of these settings says
> client_encoding=sql_ascii?
>
> I bet it's the first.
Aha! As it turns out, I started looking into the character set support in the postgresql documentation, and discovered the psql -l command. It showed this test database is actually *not* encoded in UTF-8 at all, but rather in ASCII. I am not sure how I managed to do that, but I did. I was sure I had used the same DB creation script and just changed the DB name, but clearly, I missed something. I am not sure if it is necessary to drop and re-create the database to correct this, but that is what I have done.
When I tried using \encoding or SET client_encoding, I got no errors, but I still saw this test DB set as ASCII when running the psql -l command. Anyway, I'll have to pursue this further later. Many thanks for the help!
On Thu, Mar 28, 2013 at 9:21 PM, Joe Abbate <jma@freedomcircle.com> wrote:
> If the database is created with UTF-8 encoding, all character data will
> be encoded as UTF-8. The problem was that your client was using
> SQL_ASCII encoding so any UTF-8, non-ASCII data (i.e., characters above
> decimal 127) received from PG couldn't be decoded. IIRC the client
> encoding is set according to the template0 encoding. I would do a psql
> -l to see the encoding of other databases in your cluster, in particular
> template0, template1 and postgres.

From the postgresql.conf file:
#client_encoding = sql_ascii        # actually, defaults to database encoding
The way I understand it, the client should default to whatever encoding the DB uses. Sounds like they tried to make it difficult to do what I apparently did.
On 28/03/13 22:07, Don Parris wrote:
> From the postgresql.conf file:
> #client_encoding = sql_ascii        # actually, defaults to database encoding
>
> The way I understand it, the client should default to whatever encoding
> the DB uses. Sounds like they tried to make it difficult to do what I
> apparently did.

Yes, you're right. However, when you do the cluster initdb, the template0 database gets created with whatever encoding is specified to initdb (IIRC nowadays the default is UTF-8, but in some previous version the default used to be SQL_ASCII). After that, any createdb that doesn't specify an encoding defaults to the template0 encoding. So if your template0 has SQL_ASCII encoding, it's very easy to create a database with that encoding.

Joe
On Thu, Mar 28, 2013 at 10:42 PM, Joe Abbate <jma@freedomcircle.com> wrote:
<SNIP>
Thanks Joe. That's good to know.
On Fri, Mar 29, 2013 at 2:01 AM, Don Parris <parrisdc@gmail.com> wrote:
> Aha! As it turns out, I started looking into the character set support in
> the postgresql documentation, and discovered the psql -l command. It showed
> this test database is actually *not* encoded in UTF-8 at all, but rather in
> ASCII. I am not sure how I managed to do that, but I did.

In this case you should convert your database to utf8 (because it contains utf8 data) asap. SQL_ASCII actually doesn't mean ASCII but means "store whatever octet you throw at it as it is": it's more akin to binary data (but without the possibility to store 0x00). From your examples, and with some luck, your database may contain utf8-only data, but if you connect with different clients or encodings and feed it some latin1 etc. the database will be just happy to accept everything, no questions asked; it will just be a nightmare to read the data back or to make it uniform later.

If you don't have familiarity with encodings and their related problems, the Spolsky article is a nice introduction <http://www.joelonsoftware.com/articles/Unicode.html>.

-- Daniele
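Daniele's "nightmare to read the data back" can be sketched in plain Python (the sample rows are invented): once a SQL_ASCII database holds octets written under different client encodings, no single codec decodes every row:

```python
# Two rows of the "same" text, written by clients with different
# encodings; a SQL_ASCII database stores both sets of octets verbatim.
rows = [
    "Piña".encode("utf-8"),    # written by a utf8 client
    "Piña".encode("latin-1"),  # written by a latin1 client
]

for raw in rows:
    try:
        print(raw.decode("utf-8"))
    except UnicodeDecodeError:
        # No single codec fits every row; each failure needs a guess.
        print(raw.decode("latin-1"), "(guessed latin-1)")
```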
On Fri, Mar 29, 2013 at 5:35 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
<SNIP>
Thanks Daniele,
I think I sent a follow-up post to this one saying that I have now converted this db to UTF-8. I appreciate your help in tracking down the problem, as well as the link to this article. Good reading for sure. If I understand the article correctly, I can handle pretty much any language - Korean, Bulgarian, Arabic, etc. - by using the UTF-8 encoding. Is that correct?
Incidentally, my code actually broke on records that were only in English. Or at least that is how it appears. The particular table I was searching on contains no non-English letters. It probably will contain non-English characters in the future, but does not now.
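One hedged guess about the "English-only" records: GUI tools and word processors often substitute typographic characters, and those are non-ASCII even in English text. In UTF-8, a curly apostrophe or an en dash both begin with exactly the 0xe2 byte from the traceback:

```python
# Typographic characters that look like plain English punctuation but
# are non-ASCII; in UTF-8 both encode to three bytes starting 0xe2.
for ch in ("\u2019", "\u2013"):   # right single quote, en dash
    encoded = ch.encode("utf-8")
    print(ch, encoded, hex(encoded[0]))
```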
I am very interested in being able to support multiple languages, as my wife and I speak Castellano (Peruvian flavored) and I speak a little German and a few words in other languages. That's a topic for another day and probably for another list, however. :-)
Again, many thanks to all of you for the help!