Thread: BUG #5944: COPY FROM doesn't work with international characters
The following bug has been logged online: Bug reference: 5944 Logged by: Nathan Davalos Email address: n.davalos@sharedmarketing.com PostgreSQL version: 9 Operating system: Windows XP/2003 Description: COPY FROM doesn't work with international characters Details: I'm trying to load a tab delimited text file with COPY FROM under version 9. create table tmpintermediate ( AcNumber character varying(20), AcName character varying(50)); SET CLIENT_ENCODING TO 'WIN1251'; copy tmpintermediate from 'thefile.txt'; Sample contents of thefile: 230002 Alto Desempeño, S.A. De C.V. When using WIN1251 or WIN1252 I get nothing in the second field, it just ignores the data. Same thing for LATIN-1. When using UTF8 for client encoding I get this message: ERROR: invalid byte sequence for encoding "UTF8": 0xf16f2c20 CONTEXT: COPY tmpintermediate , line 1 Not sure what the problem is, I assumed encoding the database as UTF8 would just take any character. Inserting the data with an insert statement works perfectly normal. Is it something I'm doing wrong with COPY FROM? The database is UTF8, in 8.3 we were using WIN1252 for the database encoding, but when we migrated to 8.4 everything was forced to UTF8 regardless of what the locale was specified during install using pg_dumpall to migrate the data, so I just left things as UTF8. Not sure how to proceed with this one. We create all our databases using a customized template datatabase (we leave the ones that install with postgresql intact and untouched). The copy commands are being issued by pgadmin3 if that makes any difference. That and we use MicroOlap's PosgresDAC components for Delphi.
On 03/23/11 4:32 PM, Nathan Davalos wrote: > ... > SET CLIENT_ENCODING TO 'WIN1251'; > copy tmpintermediate from 'thefile.txt'; > > > Sample contents of thefile: > 230002 Alto Desempe=C3=B1o, S.A. De C.V. > > When using WIN1251 or WIN1252 I get nothing in the second field, it just > ignores the data. Same thing for LATIN-1. > > When using UTF8 for client encoding I get this message: > ERROR: invalid byte sequence for encoding "UTF8": 0xf16f2c20 > CONTEXT: COPY tmpintermediate , line 1 what is the byte (binary) encoding of the file? in hex, =C3=B1 in win1251 =3D=3D (no such character. win1251 is cyrillic) =C3=B1 in win1252 =3D=3D F1 =C3=B1 in UTF-8 =3D=3D C3 B1 =20=20
MzIzMzMwMzAzMDMyMDk0MTZDNzQ2RjIwNDQ2NTczNjU2RDcwNjVGMTZGMkMy MDUzMkU0MTJFMjA0NDY1MjA0MzJFNTYyRTBEMEENClRoZSBjaGFyYWN0ZXIg aW4gcXVlc3Rpb24gaXMgRjENCg0KLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0t LS0NCkZyb206IEpvaG4gUiBQaWVyY2UgW21haWx0bzpwaWVyY2VAaG9ncmFu Y2guY29tXSANClNlbnQ6IFdlZG5lc2RheSwgTWFyY2ggMjMsIDIwMTEgNjo0 OSBQTQ0KVG86IE5hdGhhbiBNLiBEYXZhbG9zDQpDYzogcGdzcWwtYnVnc0Bw b3N0Z3Jlc3FsLm9yZw0KU3ViamVjdDogUmU6IFtCVUdTXSBCVUcgIzU5NDQ6 IENPUFkgRlJPTSBkb2Vzbid0IHdvcmsgd2l0aCBpbnRlcm5hdGlvbmFsIGNo YXJhY3RlcnMNCg0KT24gMDMvMjMvMTEgNDozMiBQTSwgTmF0aGFuIERhdmFs b3Mgd3JvdGU6DQo+IC4uLg0KPiBTRVQgQ0xJRU5UX0VOQ09ESU5HIFRPICdX SU4xMjUxJzsNCj4gY29weSB0bXBpbnRlcm1lZGlhdGUgZnJvbSAndGhlZmls ZS50eHQnOw0KPg0KPg0KPiBTYW1wbGUgY29udGVudHMgb2YgdGhlZmlsZToN Cj4gMjMwMDAyCUFsdG8gRGVzZW1wZcOxbywgUy5BLiBEZSBDLlYuDQo+DQo+ IFdoZW4gdXNpbmcgV0lOMTI1MSBvciBXSU4xMjUyIEkgZ2V0IG5vdGhpbmcg aW4gdGhlIHNlY29uZCBmaWVsZCwgaXQganVzdA0KPiBpZ25vcmVzIHRoZSBk YXRhLiBTYW1lIHRoaW5nIGZvciBMQVRJTi0xLg0KPg0KPiBXaGVuIHVzaW5n IFVURjggZm9yIGNsaWVudCBlbmNvZGluZyBJIGdldCB0aGlzIG1lc3NhZ2U6 DQo+IEVSUk9SOiAgaW52YWxpZCBieXRlIHNlcXVlbmNlIGZvciBlbmNvZGlu ZyAiVVRGOCI6IDB4ZjE2ZjJjMjANCj4gQ09OVEVYVDogIENPUFkgdG1waW50 ZXJtZWRpYXRlICwgbGluZSAxDQoNCndoYXQgaXMgdGhlIGJ5dGUgKGJpbmFy eSkgZW5jb2Rpbmcgb2YgdGhlIGZpbGU/ICBpbiBoZXgsDQoNCsOxIGluIHdp bjEyNTEgID09IChubyBzdWNoIGNoYXJhY3Rlci4gICAgd2luMTI1MSBpcyBj eXJpbGxpYykNCsOxIGluIHdpbjEyNTIgID09IEYxDQrDsSBpbiBVVEYtOCAg ICA9PSBDMyBCMQ0KDQoNCg0KDQoNCiAgDQoNCg0K
On 03/23/11 5:42 PM, Nathan M. Davalos wrote: > Just in case the reply didn't go through the character in question is F1. > > I replied to the message using pgsql-bugs@postgresql.org, but I'm a wee bit new to the bug reporting stuff using the mailinglist. > > The hex of the file in total is > 32.33.30.30.30.32.09.41.6C.74.6F.20.44.65.73.65.6D.70.65.F1.6F.2C.20.53.2E.41.2E.20.44.65.20.43.2E.56.2E.0D.0A k, thats certainly win-1252. I see a tab after the 230002, then the rest of it is all one field, ending in a <CR><LF> sequence. The table you're copying this data to, what fields does it have?
QWNOdW1iZXIgY2hhcmFjdGVyIHZhcnlpbmcoMjApLA0KQWNOYW1lIGNoYXJh Y3RlciB2YXJ5aW5nKDUwKQ0KDQpUaGUgZGF0YWJhc2UgaXRzZWxmIGlzOg0K ICAgICAgIEVOQ09ESU5HID0gJ1VURjgnDQogICAgICAgVEFCTEVTUEFDRSA9 IHBnX2RlZmF1bHQNCiAgICAgICBMQ19DT0xMQVRFID0gJ0VuZ2xpc2hfVW5p dGVkIFN0YXRlcy4xMjUyJw0KICAgICAgIExDX0NUWVBFID0gJ0VuZ2xpc2hf VW5pdGVkIFN0YXRlcy4xMjUyJw0KDQoNCg0KLS0tLS1PcmlnaW5hbCBNZXNz YWdlLS0tLS0NCkZyb206IEpvaG4gUiBQaWVyY2UgW21haWx0bzpwaWVyY2VA aG9ncmFuY2guY29tXSANClNlbnQ6IFdlZG5lc2RheSwgTWFyY2ggMjMsIDIw MTEgNzo1NCBQTQ0KVG86IE5hdGhhbiBNLiBEYXZhbG9zOyBQb3N0Z3JlU1FM IEJ1Z3MNClN1YmplY3Q6IFJlOiBbQlVHU10gQlVHICM1OTQ0OiBDT1BZIEZS T00gZG9lc24ndCB3b3JrIHdpdGggaW50ZXJuYXRpb25hbCBjaGFyYWN0ZXJz DQoNCk9uIDAzLzIzLzExIDU6NDIgUE0sIE5hdGhhbiBNLiBEYXZhbG9zIHdy b3RlOg0KPiBKdXN0IGluIGNhc2UgdGhlIHJlcGx5IGRpZG4ndCBnbyB0aHJv dWdoIHRoZSBjaGFyYWN0ZXIgaW4gcXVlc3Rpb24gaXMgRjEuDQo+DQo+IEkg cmVwbGllZCB0byB0aGUgbWVzc2FnZSB1c2luZyAgcGdzcWwtYnVnc0Bwb3N0 Z3Jlc3FsLm9yZywgYnV0IEknbSBhIHdlZSBiaXQgbmV3IHRvIHRoZSBidWcg cmVwb3J0aW5nIHN0dWZmIHVzaW5nIHRoZSBtYWlsaW5nIGxpc3QuDQo+DQo+ IFRoZSBoZXggb2YgdGhlIGZpbGUgaW4gdG90YWwgaXMNCj4gMzIuMzMuMzAu MzAuMzAuMzIuMDkuNDEuNkMuNzQuNkYuMjAuNDQuNjUuNzMuNjUuNkQuNzAu NjUuRjEuNkYuMkMuMjAuNTMuMkUuNDEuMkUuMjAuNDQuNjUuMjAuNDMuMkUu NTYuMkUuMEQuMEENCg0KaywgdGhhdHMgY2VydGFpbmx5IHdpbi0xMjUyLiAg SSBzZWUgYSB0YWIgYWZ0ZXIgdGhlIDIzMDAwMiwgdGhlbiB0aGUgDQpyZXN0 IG9mIGl0IGlzIGFsbCBvbmUgZmllbGQsIGVuZGluZyBpbiBhIDxDUj48TEY+ IHNlcXVlbmNlLg0KDQoNClRoZSB0YWJsZSB5b3UncmUgY29weWluZyB0aGlz IGRhdGEgdG8sIHdoYXQgZmllbGRzIGRvZXMgaXQgaGF2ZT8NCg==
SSBqdXN0IHRyaWVkIGl0IGZyb20gdGhlIHBzcWwgY29uc29sZSBhbmQgaXQg c2VlbXMgdG8gd29yaywgd2hpY2ggaXNuJ3QgcmVhbGx5IGFuIG9wdGlvbiBm b3IgdXMgdG8gdXNlIHNpbmNlIHdlIG5lZWQgdG8gYmUgYWJsZSB0byBsZXQg ZW5kIHVzZXJzIGxvYWQgZGF0YSB2aWEgb3VyIGN1c3RvbSBkZXNpZ25lZCBk YXRhYmFzZSBhcHBsaWNhdGlvbi4gTm9uZSBvZiB0aGVtIHdvdWxkIHVuZGVy c3RhbmQgb3Iga25vdyBob3cgdG8gdXNlIHRoZSBjb25zb2xlLiBDdXJyZW50 bHkgdGhlcmUncyBubyBwcmVzc2luZyBuZWVkIHRvIGJlIGFibGUgdG8gZG8g dGhpcyBmb3IgdXMsIHNvIHdlJ3JlIGp1c3QgbG9hZGluZyB0aGUgZGF0YSB3 aXRoIGEgc2VyaWVzIG9mIGluc2VydCBzdGF0ZW1lbnRzIGluc3RlYWQgb2Yg Q09QWSBGUk9NLg0KDQp0ZXN0ZGI9IyBTRVQgQ0xJRU5UX0VOQ09ESU5HIFRP ICdXSU4xMjUyJzsNClNFVA0KdGVzdGRiPSMgY3JlYXRlIHRhYmxlIHRtcGlu dGVybWVkaWF0ZSAoYWNudW1iZXIgY2hhcmFjdGVyIHZhcnlpbmcoMjApLGFj bmFtZSBjaGFyYWN0ZXIgdmFyeWluZyg1MCkNCik7DQpDUkVBVEUgVEFCTEUN CnRlc3RkYj0jIGNvcHkgdG1waW50ZXJtZWRpYXRlIGZyb20gJ2M6XFx0ZW1w XFx0aGVmaWxlLnR4dCc7DQpXQVJOSU5HOiAgbm9uc3RhbmRhcmQgdXNlIG9m IFxcIGluIGEgc3RyaW5nIGxpdGVyYWwNCkxJTkUgMTogY29weSB0bXBpbnRl cm1lZGlhdGUgZnJvbSAnYzpcXHRlbXBcXHRoZWZpbGUudHh0JzsNCiAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBeDQpISU5UOiAgVXNlIHRo ZSBlc2NhcGUgc3RyaW5nIHN5bnRheCBmb3IgYmFja3NsYXNoZXMsIGUuZy4s IEUnXFwnLg0KQ09QWSAxDQp0ZXN0ZGI9IyBzZWxlY3QgKiBmcm9tIHRtcGlu dGVybWVkaWF0ZTsNCiBhY251bWJlciB8ICAgICAgICAgICAgYWNuYW1lDQot LS0tLS0tLS0tKy0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KIDIz MDAwMiAgIHwgQWx0byBEZXNlbXBlwrFvLCBTLkEuIERlIEMuVi4NCigxIHJv dykNCg0KVGhlIG91dHB1dCBpcyB3cm9uZyBpbiB0aGUgc2VsZWN0IHN0YXRl bWVudCBmcm9tIHRoZSBjb25zb2xlLCBidXQgYXBwZWFycyBjb3JyZWN0bHkg dXNpbmcgcGdhZG1pbjMuIEkgdGhpbmsgaXQncyBhIHByb2JsZW0gd2l0aCBv bmUgaWYgdGhlIHJlZGlzdHJpYnV0YWJsZSBsaWJyYXJpZXMgaW5zdGVhZCBv ZiBwb3N0Z3Jlc3FsIGl0c2VsZi4gV2hlbiBsb2FkaW5nIHRocm91Z2ggcGdh ZG1pbjMgb3IgZWxzZXdoZXJlIHRoZSBhY25hbWUgZmllbGQganVzdCBlbmRz IHVwIGJsYW5rIHVzaW5nIFdJTjEyNTIsIHRoZSBDT1BZIEZST00gd2lsbCBq dXN0IGlnbm9yZSB3aGF0ZXZlciBkYXRhIGlzIHN1cHBvc2VkIHRvIGJlIGxv YWRlZCBpbnRvIHRoYXQgZmllbGQgd2hlbiB0aGVyZSBpcyBhbiBpbnRlcm5h dGlvbmFsIGNoYXJhY3Rlci4NCg0KLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0t LS0NCkZyb206IHBnc3FsLWJ1Z3Mtb3duZXJAcG9zdGdyZXNxbC5vcmcgW21h aWx0bzpwZ3NxbC1idWdzLW93bmVyQHBvc3RncmVzcWwub3JnXSBPbiBCZWhh bGYgT2YgTmF0aGFuIE0uIERhdmFsb3MNClNlbnQ6IFdlZG5lc2RheSwgTWFy Y2ggMjMsIDIwMTEgNzo1OSBQTQ0KVG86IFBvc3RncmVTUUwgQnVncw0KU3Vi amVjdDogUmU6IFtCVUdTXSBCVUcgIzU5NDQ6IENPUFkgRlJPTSBkb2Vzbid0 IHdvcmsgd2l0aCBpbnRlcm5hdGlvbmFsIGNoYXJhY3RlcnMNCg0KQWNOdW1i ZXIgY2hhcmFjdGVyIHZhcnlpbmcoMjApLA0KQWNOYW1lIGNoYXJhY3RlciB2 YXJ5aW5nKDUwKQ0KDQpUaGUgZGF0YWJhc2UgaXRzZWxmIGlzOg0KICAgICAg IEVOQ09ESU5HID0gJ1VURjgnDQogICAgICAgVEFCTEVTUEFDRSA9IHBnX2Rl ZmF1bHQNCiAgICAgICBMQ19DT0xMQVRFID0gJ0VuZ2xpc2hfVW5pdGVkIFN0 YXRlcy4xMjUyJw0KICAgICAgIExDX0NUWVBFID0gJ0VuZ2xpc2hfVW5pdGVk IFN0YXRlcy4xMjUyJw0KDQoNCg0KLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0t LS0NCkZyb206IEpvaG4gUiBQaWVyY2UgW21haWx0bzpwaWVyY2VAaG9ncmFu Y2guY29tXSANClNlbnQ6IFdlZG5lc2RheSwgTWFyY2ggMjMsIDIwMTEgNzo1 NCBQTQ0KVG86IE5hdGhhbiBNLiBEYXZhbG9zOyBQb3N0Z3JlU1FMIEJ1Z3MN ClN1YmplY3Q6IFJlOiBbQlVHU10gQlVHICM1OTQ0OiBDT1BZIEZST00gZG9l c24ndCB3b3JrIHdpdGggaW50ZXJuYXRpb25hbCBjaGFyYWN0ZXJzDQoNCk9u IDAzLzIzLzExIDU6NDIgUE0sIE5hdGhhbiBNLiBEYXZhbG9zIHdyb3RlOg0K PiBKdXN0IGluIGNhc2UgdGhlIHJlcGx5IGRpZG4ndCBnbyB0aHJvdWdoIHRo ZSBjaGFyYWN0ZXIgaW4gcXVlc3Rpb24gaXMgRjEuDQo+DQo+IEkgcmVwbGll ZCB0byB0aGUgbWVzc2FnZSB1c2luZyAgcGdzcWwtYnVnc0Bwb3N0Z3Jlc3Fs Lm9yZywgYnV0IEknbSBhIHdlZSBiaXQgbmV3IHRvIHRoZSBidWcgcmVwb3J0 aW5nIHN0dWZmIHVzaW5nIHRoZSBtYWlsaW5nIGxpc3QuDQo+DQo+IFRoZSBo ZXggb2YgdGhlIGZpbGUgaW4gdG90YWwgaXMNCj4gMzIuMzMuMzAuMzAuMzAu MzIuMDkuNDEuNkMuNzQuNkYuMjAuNDQuNjUuNzMuNjUuNkQuNzAuNjUuRjEu NkYuMkMuMjAuNTMuMkUuNDEuMkUuMjAuNDQuNjUuMjAuNDMuMkUuNTYuMkUu MEQuMEENCg0KaywgdGhhdHMgY2VydGFpbmx5IHdpbi0xMjUyLiAgSSBzZWUg YSB0YWIgYWZ0ZXIgdGhlIDIzMDAwMiwgdGhlbiB0aGUgDQpyZXN0IG9mIGl0 IGlzIGFsbCBvbmUgZmllbGQsIGVuZGluZyBpbiBhIDxDUj48TEY+IHNlcXVl bmNlLg0KDQoNClRoZSB0YWJsZSB5b3UncmUgY29weWluZyB0aGlzIGRhdGEg dG8sIHdoYXQgZmllbGRzIGRvZXMgaXQgaGF2ZT8NCg0KLS0gDQpTZW50IHZp YSBwZ3NxbC1idWdzIG1haWxpbmcgbGlzdCAocGdzcWwtYnVnc0Bwb3N0Z3Jl c3FsLm9yZykNClRvIG1ha2UgY2hhbmdlcyB0byB5b3VyIHN1YnNjcmlwdGlv bjoNCmh0dHA6Ly93d3cucG9zdGdyZXNxbC5vcmcvbWFpbHByZWYvcGdzcWwt YnVncw0K
I have another example of this issue with WIN1252. This line in a copy file: 659446 828 1 /6�\bH@^W^Za$H�\b�@\\/ <p><p></p><p>No valid or unique HTTP objects found in XML response.</p></p> Into this table: Table "public.ep_tests" Column | Type | Modifiers ---------+--------------+----------- id | bigint | not null v_id | integer | not null status | character(1) | not null vkey | text | details | text | Results in this error while loading the data: the following error is encounted by the \copy: ERROR: missing data for column "details" CONTEXT: COPY ep_tests, line 1028752: "659446 828 1 /6�\bH@â¨" The dump file was produced by using 9.0.3's pg_dump in text mode to dump an 8.2 database, then using 9.0.3's psql to load the file. Both servers are UTF8, locale WIN1252. So it looks like we're not successfully escaping characters on WIN1252. The characters in question are also latin characters. We've reproduced this on a clean install. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus wrote: > I have another example of this issue with WIN1252. This line in a copy > file: > > 659446 828 1 /6???\bH@^W^Za$H???\b???@\\/ <p><p></p><p>No > valid or unique HTTP objects found in XML response.</p></p> > > Into this table: > > Table "public.ep_tests" > Column | Type | Modifiers > ---------+--------------+----------- > id | bigint | not null > v_id | integer | not null > status | character(1) | not null > vkey | text | > details | text | > > Results in this error while loading the data: > > the following error is encounted by the \copy: ERROR: missing data for > column "details" > CONTEXT: COPY ep_tests, line 1028752: "659446 828 1 /6???\bH@?" > > The dump file was produced by using 9.0.3's pg_dump in text mode to dump > an 8.2 database, then using 9.0.3's psql to load the file. Both servers > are UTF8, locale WIN1252. > > So it looks like we're not successfully escaping characters on WIN1252. > The characters in question are also latin characters. > > We've reproduced this on a clean install. Has this been fixed? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Sep 6, 2011 at 12:48 PM, Bruce Momjian <bruce@momjian.us> wrote: > Josh Berkus wrote: >> I have another example of this issue with WIN1252. =A0This line in a copy >> file: >> >> 659446 =A0828 =A0 =A0 1 =A0 =A0 =A0 /6???\bH@^W^Za$H???\b???@\\/ =A0 =A0= <p><p></p><p>No >> valid or unique HTTP objects found in XML response.</p></p> >> >> Into this table: >> >> =A0 =A0 =A0Table "public.ep_tests" >> Column =A0| =A0 =A0 Type =A0 =A0 | Modifiers >> ---------+--------------+----------- >> id =A0 =A0 =A0| bigint =A0 =A0 =A0 | not null >> v_id =A0 =A0| integer =A0 =A0 =A0| not null >> status =A0| character(1) | not null >> vkey =A0 =A0| text =A0 =A0 =A0 =A0 | >> details | text =A0 =A0 =A0 =A0 | >> >> Results in this error while loading the data: >> >> the following error is encounted by the \copy: ERROR: =A0missing data for >> column "details" >> CONTEXT: =A0COPY ep_tests, line 1028752: "659446 =A0828 =A0 =A0 1 =A0 = =A0 =A0 /6???\bH@?" >> >> The dump file was produced by using 9.0.3's pg_dump in text mode to dump >> an 8.2 database, then using 9.0.3's psql to load the file. =A0Both serve= rs >> are UTF8, locale WIN1252. >> >> So it looks like we're not successfully escaping characters on WIN1252. >> =A0The characters in question are also latin characters. >> >> We've reproduced this on a clean install. > > Has this been fixed? I don't think so. It's not really clear to me what the issue is. Is there some confusion between the encoding of the file and the encoding in use in the client session? It's odd that we would have a bug that only affects win1252. --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10/19/11 5:41 AM, Robert Haas wrote: > I don't think so. It's not really clear to me what the issue is. Is > there some confusion between the encoding of the file and the encoding > in use in the client session? It's odd that we would have a bug that > only affects win1252. I think it's quite possible that this is something broken in the win1252 encoding itself. I've seen a lot of reports online for errors from other software. However, we need to at least find a workaround for users if we can't fix it ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com