BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding - Mailing list pgsql-bugs
From | digoal@126.com |
---|---|
Subject | BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding |
Date | |
Msg-id | 20140213140759.2710.49877@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 9210 Logged by: digoal.zhou Email address: digoal@126.com PostgreSQL version: 9.3.2 Operating system: CentOS 6.4 Description: In PostgreSQL , in addition sql_ascii coded character set encoding does not check the legitimacy of the other characters are encoded legality checks . This has clear instructions in the PostgreSQL manual: Reference http://www.postgresql.org/docs/9.3/static/multibyte.html http://blog.163.com/digoal @ 126/blog/static/163877040201211281407682 / http://blog.163.com/digoal @ 126/blog/static/16387704020132150381348 / http://blog.163.com/digoal @ 126/blog/static/1638770402011718112210516 / For example, we are using a database server UTF8 character set , then the time stored UTF8 database checks whether the agreement , if it does not it will error . For example: ERROR: invalid byte sequence for encoding "UTF8": 0xee 0xc1 0x22 It would appear that illegal characters should not be stored in the database, but recently doing some data migration discovered this problem , the two databases on two servers, operating systems consistent LANG, are UTF8, database the characters are also UTF8, but after export data to another database into some of the data that is reported such an error , and therefore can not normally migrate some data . >From the face of it , it should be stored in the database of the illegal character . Then since there are checks , but also how to store into it? I used here convert_from this function to restore this phenomenon. We see , in addition to sql_ascii will convert character sets. postgres = # select t, t :: bytea from convert_from ('\ xeec1', 'gbk') as g (t);  t | t ---- + ----------  He | \ xe79b8d (1 row) postgres = # select t, t :: bytea from convert_from ('\ xeec1', 'utf8') as g (t); ERROR: invalid byte sequence for encoding "UTF8": 0xee 0xc1 But truthfully use sql_ascii is stored byte stream , without any conversion. postgres = # select t, t :: bytea from convert_from ('\ xeec1', 'sql_ascii') as g (t);  t | t --- + --------    | \ Xeec1 (1 row) Using this method, the illegal byte stream into the database . postgres = # create table test (info text); CREATE TABLE postgres = # insert into test values ââ(convert_from ('\ xeec1', 'sql_ascii')); INSERT 0 1 postgres = # select info, info :: bytea from test;  info | info ------ + --------       | \ Xeec1 (1 row) This record came out , then the backup is not restored , as follows: postgres @ db-192-168-173-55-> pg_dump-t test-a | psql-f - SET SET SET SET SET SET psql: <stdin>: 19: ERROR: invalid byte sequence for encoding "UTF8": 0xee 0xc1 CONTEXT: COPY test, line 1 Can be used directly in the database, for example : postgres = # insert into test select * from test; INSERT 0 1 postgres = # select info, info :: bytea from test;  info | info ------ + --------       | \ Xeec1       | \ Xeec1 (2 rows) This problem makes the backup data reduction becomes a hassle . There may be other ways to put the data into the database illegally , which resulted in the present situation . If you want to import the data across databases currently available through DBLINK or external table to import the illegal character of the problem the way to use the backup to restore occurs . [ References ] 1. Src / backend / utils / mb / mbutils.c 2. Src / backend / utils / mb / wchar.c 3. Postgres = # \ df convert *                               List of functions    Schema | Name | Result data type | Argument data types | Type ------------ + -------------- + ------------------ + --- ------------------ + --------  pg_catalog | convert | bytea | bytea, name, name | normal  pg_catalog | convert_from | text | bytea, name | normal  pg_catalog | convert_to | bytea | text, name | normal (3 rows)
pgsql-bugs by date: