Re: Error "invalid byte sequence for encoding UTF8" on insert into BYTEA column - Mailing list pgsql-general
From | Alan Millington |
---|---|
Subject | Re: Error "invalid byte sequence for encoding UTF8" on insert into BYTEA column |
Date | |
Msg-id | 251874.72534.qm@web25405.mail.ukl.yahoo.com |
In response to | Re: Error "invalid byte sequence for encoding UTF8" on insert into BYTEA column (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
>You probably need to ask the mxODBC developers (who AFAIK don't hang out
>on this list) what they are doing with that data. It sounds fairly
>likely to me that the bytea value is just being sent as a string without
>any special encoding. That would explain both the null sensitivity you
>mention later in the thread, and the encoding validity complaints ---
>PG 8.1 was much less picky about string encoding validity than recent
>versions are.

>There are basically two ways that you could make this work reliably:
>arrange for the bytea value to be sent as an out-of-line binary
>parameter, or encode it using backslash sequences (eg, '\000' for a
>null). Whether the former is possible with mxODBC I dunno. The latter
>might be something that mxODBC will do for you if it knows the value
>is supposed to be bytea, but without that knowledge I don't see how it
>could. You might end up having to do the encoding yourself.

Preliminary notes:

1. I have now confirmed that at some point I upgraded from mxODBC 3.0 to 3.0.3. The statement in my original posting that my mxODBC installation had not changed was wrong.

2. The Python 'str' datatype is used for any sequence of single bytes, like C's array of char. One cannot tell from the datatype what these bytes are intended to represent: it could be ASCII characters, characters in any single-byte encoding, Unicode in any encoding, or binary data.

I have discovered a workaround, which is to pass the data to mxODBC in a Python buffer object, which clearly identifies the data as binary.

I wrote to eGenix about this as follows:
Marc-Andre Lemburg replied as follows:
What puzzles me is hinted at in the last sentence: why does Postgres 8.4.1 (though apparently not 8.1.4) try to interpret the bytes as UTF8 when they are being sent to a column that is typed as bytea?
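As a small illustration of the error in the subject line: arbitrary binary data is usually not valid UTF-8, so any layer that treats the value as text in a UTF8 database has grounds to reject it. The sketch below uses modern Python 3 (not the Python version discussed in this thread), and the sample bytes and the helper name `is_valid_utf8` are made up for the example.

```python
# Hypothetical sample: a few octets of the kind found in binary data.
sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"plain ASCII text"))  # True
print(is_valid_utf8(sample))               # False: 0x89 cannot start a UTF-8 sequence
```

This only shows why the bytes fail validation when read as text; it does not show where in the client or server that validation takes place.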
I apologise if this posting is excessively long, but I like to understand the reasons for things, and others may find the information useful.
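For reference, the second option Tom Lane mentions (encoding the value with backslash sequences) could be sketched roughly as follows. This is a minimal illustration in modern Python 3, not what mxODBC does internally; the function name `bytea_escape` is hypothetical, and the choice to escape the single quote and backslash as octal is an assumption made to keep the result safe to embed.

```python
def bytea_escape(data: bytes) -> str:
    """Encode bytes using backslash-octal sequences, as in bytea escape format.

    Printable ASCII other than backslash and single quote is kept as is;
    every other octet becomes a backslash followed by three octal digits.
    """
    out = []
    for b in data:
        if 32 <= b <= 126 and b not in (0x5C, 0x27):
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


# A null byte, a high byte, a single quote and a backslash are all escaped.
print(bytea_escape(b"ab\x00\xff'\\"))  # ab\000\377\047\134
```

Depending on the server's string-literal settings, the backslashes may need to be doubled again when the result is written inside a quoted SQL literal, so sending the value as a bound parameter is usually the less error-prone route where the driver supports it.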