Re: [PATCH] Patch to fix a crash of psql - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [PATCH] Patch to fix a crash of psql
Date
Msg-id 20121130.135229.1173369746228315179.t-ishii@sraoss.co.jp
Whole thread Raw
In response to [PATCH] Patch to fix a crash of psql  (JiangGuiqing <jianggq@cn.fujitsu.com>)
Responses Re: [PATCH] Patch to fix a crash of psql
Re: [PATCH] Patch to fix a crash of psql
List pgsql-hackers
>> 1. some especial character
>> (my sql file contains japanese comment "-- コメント" .  It can cause
>> psql crash.)
>> 2. PGCLIENTENCODING is SJIS
>> 3. the encoding of input sql file is UTF-8

Actually the problem can occur even when importing following 3 byte
UTF8 input file:

ト

(in hexa, 0xe3, 0x83, 0x88)

In this paticular case, psql decides that the total character length is 
5, not 3. Because it just looks at the each first byte by calling PQmblen:

0xe3 -> 1 bytes in SJIS
0x83 -> 2 bytes in SJIS
0x88 -> 2 bytes in SJIS
total: 5 bytes

which is apparently wrong and causes subsequent segfault. Note that it
is possible that "input file > psql decision" case as well if client
encoding is different from file encoding, which will not be good too.
I think we should detect the cases as much as possible and warn users,
rather than silently ignore that fact client encoding != file
encoding. I don't think we can detect it in a reliable way, but at
least we could check the cases above(sum of PQmblen is not equale to
buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

[t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
Pager usage is off.
psql (9.3devel)
Type "help" for help.

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c..a14d6fe 100644
--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -1808,7 +1808,29 @@ prepare_buffer(const char *txt, int len, char **txtcopy)            newtxt[i] = txt[i];
 i++;            while (--thislen > 0)
 
+            {
+                if (i >= len)
+                {
+                    /*
+                     * This could happen if cur_state->encoding is
+                     * different from input file encoding.
+                     */
+                    psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+                               pg_encoding_to_char(cur_state->encoding));
+                    break;
+                }                newtxt[i++] = (char) 0xFF;
+            }
+        }
+
+        if (i != len)
+        {
+            /*
+             * This could happen if cur_state->encoding is
+             * different from input file encoding.
+             */
+            psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+                       pg_encoding_to_char(cur_state->encoding));        }    }

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: initdb.c::main() too large
Next
From: JiangGuiqing
Date:
Subject: Re: [PATCH] Patch to fix a crash of psql