Thread: [PATCH] Patch to fix a crash of psql
hi When i test psql under multi-lingual and different encoding environment, I found a crash of psql. ---------------------------------------------------------------------- $ export PGCLIENTENCODING=SJIS $ psql psql (9.2rc1) Type "help" for help. postgres=# \i sql CREATE DATABASE You are now connected to database "mydb" as user "postgres". CREATE SCHEMA Segmentation fault (core dumped) $ ---------------------------------------------------------------------- I'm look into this problem and found that only some especial character can cause psql crash. conditions is: 1. some especial character (my sql file contains japanese comment "-- コメント" . It can cause psql crash.) 2. PGCLIENTENCODING is SJIS 3. the encoding of input sql file is UTF-8 I investigated this problem. The reasons are as follows. ---------------------------------------------------------------------- src/bin/psql/mainloop.c -> psql_scan_setup() //Set up to perform lexing of the given input line. -->prepare_buffer () //Set up a flex input buffer to scan the given data. ---->malloc character buffer. ---->set two \0 characters. (Flex wants two \0 characters after the actual data.) ---->working in an unsafe encoding, the copy has multibyte sequences replaced by FFs to avoid fooling the lexer rules. ****the encoding of input sql file is different from PGCLIENTENCODING, two \0 characters are replaced by FFs. **** ---->yy_scan_buffer() //Setup the input buffer state to scan directly from a user-specified character buffer. ****because two \0 characters are replaced by FFs,yy_scan_buffer() return 0. input buffer state can not setup correctly.**** -> psql_scan() //Do lexical analysis of SQL command text. --> yylex() //The main scanner function which does all the work. ****because input buffer state is not setup,so when access the input buffer state,segmentation fault is happened.**** ---------------------------------------------------------------------- I modify src/bin/psql/psqlscan.l to resolve this problem. The diff file refer to the attachment "psqlscan.l.patch". Regards, Jiang Guiqing
Attachment
I confirmed the problem. Also I confirmed your patch fixes the problem. In addition to this, all the tests in test/mb and test/regress are passed. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp > hi > > When i test psql under multi-lingual and different encoding > environment, > I found a crash of psql. > > ---------------------------------------------------------------------- > $ export PGCLIENTENCODING=SJIS > $ psql > psql (9.2rc1) > Type "help" for help. > > postgres=# \i sql > CREATE DATABASE > You are now connected to database "mydb" as user "postgres". > CREATE SCHEMA > Segmentation fault (core dumped) > $ > ---------------------------------------------------------------------- > > I'm look into this problem and found that > only some especial character can cause psql crash. > conditions is: > 1. some especial character > (my sql file contains japanese comment "-- コメント" . It can cause > psql crash.) > 2. PGCLIENTENCODING is SJIS > 3. the encoding of input sql file is UTF-8 > > > I investigated this problem. The reasons are as follows. > ---------------------------------------------------------------------- > src/bin/psql/mainloop.c > -> psql_scan_setup() //Set up to perform lexing of the given input line. > -->prepare_buffer () //Set up a flex input buffer to scan the given data. > ---->malloc character buffer. > ---->set two \0 characters. (Flex wants two \0 characters after the > actual data.) > ---->working in an unsafe encoding, the copy has multibyte sequences > replaced by FFs to avoid fooling the lexer rules. > ****the encoding of input sql file is different from PGCLIENTENCODING, two > \0 characters are replaced by FFs. **** > > ---->yy_scan_buffer() //Setup the input buffer state to scan directly > from a user-specified character buffer. > ****because two \0 characters are replaced by FFs,yy_scan_buffer() return > 0. input buffer state can not setup correctly.**** > > -> psql_scan() //Do lexical analysis of SQL command text. > --> yylex() //The main scanner function which does all the work. > ****because input buffer state is not setup,so when access the input > buffer state,segmentation fault is happened.**** > ---------------------------------------------------------------------- > > > I modify src/bin/psql/psqlscan.l to resolve this problem. > The diff file refer to the attachment "psqlscan.l.patch". > > > Regards, > Jiang Guiqing
>> 1. some especial character >> (my sql file contains japanese comment "-- コメント" . It can cause >> psql crash.) >> 2. PGCLIENTENCODING is SJIS >> 3. the encoding of input sql file is UTF-8 Actually the problem can occur even when importing following 3 byte UTF8 input file: ト (in hexa, 0xe3, 0x83, 0x88) In this paticular case, psql decides that the total character length is 5, not 3. Because it just looks at the each first byte by calling PQmblen: 0xe3 -> 1 bytes in SJIS 0x83 -> 2 bytes in SJIS 0x88 -> 2 bytes in SJIS total: 5 bytes which is apparently wrong and causes subsequent segfault. Note that it is possible that "input file > psql decision" case as well if client encoding is different from file encoding, which will not be good too. I think we should detect the cases as much as possible and warn users, rather than silently ignore that fact client encoding != file encoding. I don't think we can detect it in a reliable way, but at least we could check the cases above(sum of PQmblen is not equale to buffer lenghth). So my proposal is, if prepare_buffer() detects possible inconsistency between buffer encoding and file encoding, warn user. [t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres Pager usage is off. psql (9.3devel) Type "help" for help. postgres=# \i ~/sql CREATE DATABASE You are now connected to database "mydb" as user "t-ishii". CREATE SCHEMA psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding CREATE TABLE Comments? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l index d32a12c..a14d6fe 100644 --- a/src/bin/psql/psqlscan.l +++ b/src/bin/psql/psqlscan.l @@ -1808,7 +1808,29 @@ prepare_buffer(const char *txt, int len, char **txtcopy) newtxt[i] = txt[i]; i++; while (--thislen > 0) + { + if (i >= len) + { + /* + * This could happen if cur_state->encoding is + * different from input file encoding. + */ + psql_error("warning: possible conflict between client encoding %s and input file encoding\n", + pg_encoding_to_char(cur_state->encoding)); + break; + } newtxt[i++] = (char) 0xFF; + } + } + + if (i != len) + { + /* + * This could happen if cur_state->encoding is + * different from input file encoding. + */ + psql_error("warning: possible conflict between client encoding %s and input file encoding\n", + pg_encoding_to_char(cur_state->encoding)); } }
> buffer lenghth). So my proposal is, if prepare_buffer() detects> possible inconsistency between buffer encoding and fileencoding, warn> user. I agree with that. On 2012/11/30 12:52, Tatsuo Ishii Wrote: >>> 1. some especial character >>> (my sql file contains japanese comment "-- コメント" . It can cause >>> psql crash.) >>> 2. PGCLIENTENCODING is SJIS >>> 3. the encoding of input sql file is UTF-8 > > Actually the problem can occur even when importing following 3 byte > UTF8 input file: > > ト > > (in hexa, 0xe3, 0x83, 0x88) > > In this paticular case, psql decides that the total character length is > 5, not 3. Because it just looks at the each first byte by calling PQmblen: > > 0xe3 -> 1 bytes in SJIS > 0x83 -> 2 bytes in SJIS > 0x88 -> 2 bytes in SJIS > total: 5 bytes > > which is apparently wrong and causes subsequent segfault. Note that it > is possible that "input file > psql decision" case as well if client > encoding is different from file encoding, which will not be good too. > I think we should detect the cases as much as possible and warn users, > rather than silently ignore that fact client encoding != file > encoding. I don't think we can detect it in a reliable way, but at > least we could check the cases above(sum of PQmblen is not equale to > buffer lenghth). So my proposal is, if prepare_buffer() detects > possible inconsistency between buffer encoding and file encoding, warn > user. > > [t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres > Pager usage is off. > psql (9.3devel) > Type "help" for help. > > postgres=# \i ~/sql > CREATE DATABASE > You are now connected to database "mydb" as user "t-ishii". > CREATE SCHEMA > psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding > CREATE TABLE > > Comments? > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese: http://www.sraoss.co.jp >
Tatsuo Ishii wrote: > I think we should detect the cases as much as possible and warn users, > rather than silently ignore that fact client encoding != file > encoding. I don't think we can detect it in a reliable way, but at > least we could check the cases above(sum of PQmblen is not equale to > buffer lenghth). So my proposal is, if prepare_buffer() detects > possible inconsistency between buffer encoding and file encoding, warn > user. > > [t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres > Pager usage is off. > psql (9.3devel) > Type "help" for help. > > postgres=# \i ~/sql > CREATE DATABASE > You are now connected to database "mydb" as user "t-ishii". > CREATE SCHEMA > psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file > encoding > CREATE TABLE > > Comments? I wonder about the "possible". Could there be false positives with the test? If yes, I don't like the idea. If there is no possibility for false positives, I'd say that the "possible" should go. Maybe it should even be an error and no warning then. Yours, Laurenz Albe
On 11/30/12 3:26 AM, Albe Laurenz wrote: > Tatsuo Ishii wrote: >> postgres=# \i ~/sql >> CREATE DATABASE >> You are now connected to database "mydb" as user "t-ishii". >> CREATE SCHEMA >> psql:/home/t-ishii/sql:7: warning: possible conflict between client > encoding SJIS and input file >> encoding >> CREATE TABLE >> >> Comments? > > I wonder about the "possible". > > Could there be false positives with the test? > If yes, I don't like the idea. > If there is no possibility for false positives, I'd say > that the "possible" should go. Maybe it should even be > an error and no warning then. Yes, encoding mismatches are generally an error. I think the message should be more precise. Nobody will know what an "encoding conflict" is. The error condition is "last multibyte character ran over end of file" or something like that, which should be clearer.
Peter Eisentraut <peter_e@gmx.net> writes: > On 11/30/12 3:26 AM, Albe Laurenz wrote: >> If there is no possibility for false positives, I'd say >> that the "possible" should go. Maybe it should even be >> an error and no warning then. > Yes, encoding mismatches are generally an error. > I think the message should be more precise. Nobody will know what an > "encoding conflict" is. The error condition is "last multibyte > character ran over end of file" or something like that, which should be > clearer. TBH I think that a message here is unnecessary; it's sufficient to ensure psql doesn't crash. The backend will produce a better message than this anyway once the data gets there, and that way we don't have to invent a new error recovery path inside psql. As-is, the patch just creates the question of what to do after issuing the error. regards, tom lane
> Peter Eisentraut <peter_e@gmx.net> writes: >> On 11/30/12 3:26 AM, Albe Laurenz wrote: >>> If there is no possibility for false positives, I'd say >>> that the "possible" should go. Maybe it should even be >>> an error and no warning then. > >> Yes, encoding mismatches are generally an error. > >> I think the message should be more precise. Nobody will know what an >> "encoding conflict" is. The error condition is "last multibyte >> character ran over end of file" or something like that, which should be >> clearer. > > TBH I think that a message here is unnecessary; it's sufficient to > ensure psql doesn't crash. The backend will produce a better message > than this anyway once the data gets there, and that way we don't have to > invent a new error recovery path inside psql. As-is, the patch just > creates the question of what to do after issuing the error. Problem is, backend does not always detect errors. For my example backend caches an error: postgres=# \i ~/a psql:/home/t-ishii/a:1: warning: possible conflict between client encoding SJIS and input file encoding ERROR: invalid byte sequence for encoding "SJIS": 0x88 psql:/home/t-ishii/a:1: ERROR: invalid byte sequence for encoding "SJIS": 0x88 However for Jiang Guiqing's example, backend silently ignores error: postgres=# \i ~/sql CREATE DATABASE You are now connected to database "mydb" as user "t-ishii". CREATE SCHEMA psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding CREATE TABLE IMO it is a benefit for users to detect such errors earlier. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
> I wonder about the "possible". > > Could there be false positives with the test? > If yes, I don't like the idea. Yes, there could be false positives. It was just because the input file was broken. In the real world, I think probably encoding mismatch is the most possible cause, but this is not 100%. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
> I think the message should be more precise. Nobody will know what an > "encoding conflict" is. The error condition is "last multibyte > character ran over end of file" or something like that, which should be > clearer. "last multibyte character ran over" is not precise at all because the error was caused by each byte in the file confused PQmblen, not just last multibyte character. However to explain those in the messgae is too technical for users, I'm afraid. Maybe just "encoding mismatch detected. please make sure that input file encoding is SJIS" or something like that? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes: >> TBH I think that a message here is unnecessary; it's sufficient to >> ensure psql doesn't crash. The backend will produce a better message >> than this anyway once the data gets there, and that way we don't have to >> invent a new error recovery path inside psql. As-is, the patch just >> creates the question of what to do after issuing the error. > Problem is, backend does not always detect errors. The reason it doesn't produce an error in Jiang Guiqing's example is that the encoding error is in a comment and thus is never transmitted to the backend at all. I don't see a big problem with that. If we did have a problem with it, I think the better fix would be to stop having psql strip comments from what it sends. (I've considered proposing such a change anyway, in order that logged statements look more like what the user typed.) > IMO it is a benefit for users to detect such errors earlier. It is not a benefit for users to get randomly different (and less informative) messages and different error behaviors for the same problem. I think we should just do this and call it good: diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l index d32a12c63c856625aa42c708b24d4b58e3cdd677..6c1429815f2eec597f735d18ea86d9c8c9f1f3a2 100644 *** a/src/bin/psql/psqlscan.l --- b/src/bin/psql/psqlscan.l *************** prepare_buffer(const char *txt, int len, *** 1807,1813 **** /* first byte should always be okay... */ newtxt[i] = txt[i]; i++; ! while (--thislen > 0) newtxt[i++] = (char) 0xFF; } } --- 1807,1813 ---- /* first byte should always be okay... */ newtxt[i] = txt[i]; i++; ! while (--thislen > 0 && i < len) newtxt[i++] = (char) 0xFF; } } regards, tom lane
> I confirmed the problem. Also I confirmed your patch fixes the > problem. In addition to this, all the tests in test/mb and > test/regress are passed. Fix committed as you proposed(without any message I proposed). Thanks. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp