Thread: [PATCH] Patch to fix a crash of psql

[PATCH] Patch to fix a crash of psql

From
JiangGuiqing
Date:
hi

When i test psql under multi-lingual and different encoding environment,
I found a crash of psql.

----------------------------------------------------------------------
$ export PGCLIENTENCODING=SJIS
$ psql
psql (9.2rc1)
Type "help" for help.

postgres=# \i sql
CREATE DATABASE
You are now connected to database "mydb" as user "postgres".
CREATE SCHEMA
Segmentation fault (core dumped)
$
----------------------------------------------------------------------

I'm look into this problem and found that
only some especial character can cause psql crash.
conditions is:
1. some especial character
(my sql file contains japanese comment "-- コメント" .  It can cause
psql crash.)
2. PGCLIENTENCODING is SJIS
3. the encoding of input sql file is UTF-8


I investigated this problem. The reasons are as follows.
----------------------------------------------------------------------
src/bin/psql/mainloop.c
-> psql_scan_setup()    //Set up to perform lexing of the given input line.
-->prepare_buffer ()    //Set up a flex input buffer to scan the given data.
---->malloc character buffer.
---->set two \0 characters. (Flex wants two \0 characters after the
actual data.)
---->working in an unsafe encoding, the copy has multibyte sequences
replaced by FFs to avoid fooling the lexer rules.
****the encoding of input sql file is different from PGCLIENTENCODING, two
\0 characters are replaced by FFs. ****

---->yy_scan_buffer()   //Setup the input buffer state to scan directly
from a user-specified character buffer.
****because  two \0 characters are replaced by FFs,yy_scan_buffer() return
0.  input buffer state can not setup correctly.****

-> psql_scan()   //Do lexical analysis of SQL command text.
--> yylex()         //The main scanner function which does all the work.
****because input buffer state is not setup,so when access the input
buffer state,segmentation fault is happened.****
----------------------------------------------------------------------


I modify src/bin/psql/psqlscan.l to resolve this problem.
The diff file refer to the attachment "psqlscan.l.patch".


Regards,
Jiang Guiqing

Attachment

Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
I confirmed the problem. Also I confirmed your patch fixes the
problem.  In addition to this, all the tests in test/mb and
test/regress are passed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> hi
> 
> When i test psql under multi-lingual and different encoding
> environment,
> I found a crash of psql.
> 
> ----------------------------------------------------------------------
> $ export PGCLIENTENCODING=SJIS
> $ psql
> psql (9.2rc1)
> Type "help" for help.
> 
> postgres=# \i sql
> CREATE DATABASE
> You are now connected to database "mydb" as user "postgres".
> CREATE SCHEMA
> Segmentation fault (core dumped)
> $
> ----------------------------------------------------------------------
>     
> I'm look into this problem and found that
> only some especial character can cause psql crash.
> conditions is:
> 1. some especial character
> (my sql file contains japanese comment "-- コメント" .  It can cause
> psql crash.)
> 2. PGCLIENTENCODING is SJIS
> 3. the encoding of input sql file is UTF-8
> 
> 
> I investigated this problem. The reasons are as follows.
> ----------------------------------------------------------------------
> src/bin/psql/mainloop.c
> -> psql_scan_setup() //Set up to perform lexing of the given input line.
> -->prepare_buffer () //Set up a flex input buffer to scan the given data.
> ---->malloc character buffer.
> ---->set two \0 characters. (Flex wants two \0 characters after the
> actual data.)
> ---->working in an unsafe encoding, the copy has multibyte sequences
> replaced by FFs to avoid fooling the lexer rules.
> ****the encoding of input sql file is different from PGCLIENTENCODING, two
> \0 characters are replaced by FFs. ****
> 
> ---->yy_scan_buffer()   //Setup the input buffer state to scan directly
> from a user-specified character buffer.
> ****because  two \0 characters are replaced by FFs,yy_scan_buffer() return
> 0.  input buffer state can not setup correctly.****
> 
> -> psql_scan()   //Do lexical analysis of SQL command text.
> --> yylex()         //The main scanner function which does all the work.
> ****because input buffer state is not setup,so when access the input
> buffer state,segmentation fault is happened.****
> ----------------------------------------------------------------------
> 
> 
> I modify src/bin/psql/psqlscan.l to resolve this problem.
> The diff file refer to the attachment "psqlscan.l.patch".
> 
> 
> Regards,
> Jiang Guiqing



Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
>> 1. some especial character
>> (my sql file contains japanese comment "-- コメント" .  It can cause
>> psql crash.)
>> 2. PGCLIENTENCODING is SJIS
>> 3. the encoding of input sql file is UTF-8

Actually the problem can occur even when importing following 3 byte
UTF8 input file:

ト

(in hexa, 0xe3, 0x83, 0x88)

In this paticular case, psql decides that the total character length is 
5, not 3. Because it just looks at the each first byte by calling PQmblen:

0xe3 -> 1 bytes in SJIS
0x83 -> 2 bytes in SJIS
0x88 -> 2 bytes in SJIS
total: 5 bytes

which is apparently wrong and causes subsequent segfault. Note that it
is possible that "input file > psql decision" case as well if client
encoding is different from file encoding, which will not be good too.
I think we should detect the cases as much as possible and warn users,
rather than silently ignore that fact client encoding != file
encoding. I don't think we can detect it in a reliable way, but at
least we could check the cases above(sum of PQmblen is not equale to
buffer lenghth). So my proposal is, if prepare_buffer() detects
possible inconsistency between buffer encoding and file encoding, warn
user.

[t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
Pager usage is off.
psql (9.3devel)
Type "help" for help.

postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c..a14d6fe 100644
--- a/src/bin/psql/psqlscan.l
+++ b/src/bin/psql/psqlscan.l
@@ -1808,7 +1808,29 @@ prepare_buffer(const char *txt, int len, char **txtcopy)            newtxt[i] = txt[i];
 i++;            while (--thislen > 0)
 
+            {
+                if (i >= len)
+                {
+                    /*
+                     * This could happen if cur_state->encoding is
+                     * different from input file encoding.
+                     */
+                    psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+                               pg_encoding_to_char(cur_state->encoding));
+                    break;
+                }                newtxt[i++] = (char) 0xFF;
+            }
+        }
+
+        if (i != len)
+        {
+            /*
+             * This could happen if cur_state->encoding is
+             * different from input file encoding.
+             */
+            psql_error("warning: possible conflict between client encoding %s and input file encoding\n",
+                       pg_encoding_to_char(cur_state->encoding));        }    }

Re: [PATCH] Patch to fix a crash of psql

From
JiangGuiqing
Date:
> buffer lenghth). So my proposal is, if prepare_buffer() detects> possible inconsistency between buffer encoding and
fileencoding, warn> user. 
I agree with that.


On 2012/11/30 12:52, Tatsuo Ishii Wrote:
>>> 1. some especial character
>>> (my sql file contains japanese comment "-- コメント" .  It can cause
>>> psql crash.)
>>> 2. PGCLIENTENCODING is SJIS
>>> 3. the encoding of input sql file is UTF-8
>
> Actually the problem can occur even when importing following 3 byte
> UTF8 input file:
>
> ト
>
> (in hexa, 0xe3, 0x83, 0x88)
>
> In this paticular case, psql decides that the total character length is
> 5, not 3. Because it just looks at the each first byte by calling PQmblen:
>
> 0xe3 -> 1 bytes in SJIS
> 0x83 -> 2 bytes in SJIS
> 0x88 -> 2 bytes in SJIS
> total: 5 bytes
>
> which is apparently wrong and causes subsequent segfault. Note that it
> is possible that "input file > psql decision" case as well if client
> encoding is different from file encoding, which will not be good too.
> I think we should detect the cases as much as possible and warn users,
> rather than silently ignore that fact client encoding != file
> encoding. I don't think we can detect it in a reliable way, but at
> least we could check the cases above(sum of PQmblen is not equale to
> buffer lenghth). So my proposal is, if prepare_buffer() detects
> possible inconsistency between buffer encoding and file encoding, warn
> user.
>
> [t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
> Pager usage is off.
> psql (9.3devel)
> Type "help" for help.
>
> postgres=# \i ~/sql
> CREATE DATABASE
> You are now connected to database "mydb" as user "t-ishii".
> CREATE SCHEMA
> psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
> CREATE TABLE
>
> Comments?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>




Re: [PATCH] Patch to fix a crash of psql

From
"Albe Laurenz"
Date:
Tatsuo Ishii wrote:
> I think we should detect the cases as much as possible and warn users,
> rather than silently ignore that fact client encoding != file
> encoding. I don't think we can detect it in a reliable way, but at
> least we could check the cases above(sum of PQmblen is not equale to
> buffer lenghth). So my proposal is, if prepare_buffer() detects
> possible inconsistency between buffer encoding and file encoding, warn
> user.
>
> [t-ishii@localhost psql]$ PGCLIENTENCODING=SJIS psql postgres
> Pager usage is off.
> psql (9.3devel)
> Type "help" for help.
>
> postgres=# \i ~/sql
> CREATE DATABASE
> You are now connected to database "mydb" as user "t-ishii".
> CREATE SCHEMA
> psql:/home/t-ishii/sql:7: warning: possible conflict between client
encoding SJIS and input file
> encoding
> CREATE TABLE
>
> Comments?

I wonder about the "possible".

Could there be false positives with the test?
If yes, I don't like the idea.
If there is no possibility for false positives, I'd say
that the "possible" should go.  Maybe it should even be
an error and no warning then.

Yours,
Laurenz Albe



Re: [PATCH] Patch to fix a crash of psql

From
Peter Eisentraut
Date:
On 11/30/12 3:26 AM, Albe Laurenz wrote:
> Tatsuo Ishii wrote:
>> postgres=# \i ~/sql
>> CREATE DATABASE
>> You are now connected to database "mydb" as user "t-ishii".
>> CREATE SCHEMA
>> psql:/home/t-ishii/sql:7: warning: possible conflict between client
> encoding SJIS and input file
>> encoding
>> CREATE TABLE
>>
>> Comments?
> 
> I wonder about the "possible".
> 
> Could there be false positives with the test?
> If yes, I don't like the idea.
> If there is no possibility for false positives, I'd say
> that the "possible" should go.  Maybe it should even be
> an error and no warning then.

Yes, encoding mismatches are generally an error.

I think the message should be more precise.  Nobody will know what an
"encoding conflict" is.  The error condition is "last multibyte
character ran over end of file" or something like that, which should be
clearer.



Re: [PATCH] Patch to fix a crash of psql

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 11/30/12 3:26 AM, Albe Laurenz wrote:
>> If there is no possibility for false positives, I'd say
>> that the "possible" should go.  Maybe it should even be
>> an error and no warning then.

> Yes, encoding mismatches are generally an error.

> I think the message should be more precise.  Nobody will know what an
> "encoding conflict" is.  The error condition is "last multibyte
> character ran over end of file" or something like that, which should be
> clearer.

TBH I think that a message here is unnecessary; it's sufficient to
ensure psql doesn't crash.  The backend will produce a better message
than this anyway once the data gets there, and that way we don't have to
invent a new error recovery path inside psql.  As-is, the patch just
creates the question of what to do after issuing the error.
        regards, tom lane



Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On 11/30/12 3:26 AM, Albe Laurenz wrote:
>>> If there is no possibility for false positives, I'd say
>>> that the "possible" should go.  Maybe it should even be
>>> an error and no warning then.
> 
>> Yes, encoding mismatches are generally an error.
> 
>> I think the message should be more precise.  Nobody will know what an
>> "encoding conflict" is.  The error condition is "last multibyte
>> character ran over end of file" or something like that, which should be
>> clearer.
> 
> TBH I think that a message here is unnecessary; it's sufficient to
> ensure psql doesn't crash.  The backend will produce a better message
> than this anyway once the data gets there, and that way we don't have to
> invent a new error recovery path inside psql.  As-is, the patch just
> creates the question of what to do after issuing the error.

Problem is, backend does not always detect errors.

For my example backend caches an error:
postgres=# \i ~/a
psql:/home/t-ishii/a:1: warning: possible conflict between client encoding SJIS and input file encoding
ERROR:  invalid byte sequence for encoding "SJIS": 0x88
psql:/home/t-ishii/a:1: ERROR:  invalid byte sequence for encoding "SJIS": 0x88

However for Jiang Guiqing's example, backend silently ignores error:
postgres=# \i ~/sql
CREATE DATABASE
You are now connected to database "mydb" as user "t-ishii".
CREATE SCHEMA
psql:/home/t-ishii/sql:7: warning: possible conflict between client encoding SJIS and input file encoding
CREATE TABLE

IMO it is a benefit for users to detect such errors earlier.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
> I wonder about the "possible".
> 
> Could there be false positives with the test?
> If yes, I don't like the idea.

Yes, there could be false positives. It was just because the input
file was broken. In the real world, I think probably encoding mismatch
is the most possible cause, but this is not 100%.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
> I think the message should be more precise.  Nobody will know what an
> "encoding conflict" is.  The error condition is "last multibyte
> character ran over end of file" or something like that, which should be
> clearer.

"last multibyte character ran over" is not precise at all because the
error was caused by each byte in the file confused PQmblen, not just
last multibyte character. However to explain those in the messgae is
too technical for users, I'm afraid. Maybe just "encoding mismatch
detected. please make sure that input file encoding is SJIS" or
something like that?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [PATCH] Patch to fix a crash of psql

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> TBH I think that a message here is unnecessary; it's sufficient to
>> ensure psql doesn't crash.  The backend will produce a better message
>> than this anyway once the data gets there, and that way we don't have to
>> invent a new error recovery path inside psql.  As-is, the patch just
>> creates the question of what to do after issuing the error.

> Problem is, backend does not always detect errors.

The reason it doesn't produce an error in Jiang Guiqing's example is
that the encoding error is in a comment and thus is never transmitted
to the backend at all.  I don't see a big problem with that.  If we
did have a problem with it, I think the better fix would be to stop
having psql strip comments from what it sends.  (I've considered
proposing such a change anyway, in order that logged statements look
more like what the user typed.)

> IMO it is a benefit for users to detect such errors earlier.

It is not a benefit for users to get randomly different (and less
informative) messages and different error behaviors for the same
problem.

I think we should just do this and call it good:

diff --git a/src/bin/psql/psqlscan.l b/src/bin/psql/psqlscan.l
index d32a12c63c856625aa42c708b24d4b58e3cdd677..6c1429815f2eec597f735d18ea86d9c8c9f1f3a2 100644
*** a/src/bin/psql/psqlscan.l
--- b/src/bin/psql/psqlscan.l
*************** prepare_buffer(const char *txt, int len,
*** 1807,1813 ****             /* first byte should always be okay... */             newtxt[i] = txt[i];
i++;
!             while (--thislen > 0)                 newtxt[i++] = (char) 0xFF;         }     }
--- 1807,1813 ----             /* first byte should always be okay... */             newtxt[i] = txt[i];
i++;
!             while (--thislen > 0 && i < len)                 newtxt[i++] = (char) 0xFF;         }     }

        regards, tom lane



Re: [PATCH] Patch to fix a crash of psql

From
Tatsuo Ishii
Date:
> I confirmed the problem. Also I confirmed your patch fixes the
> problem.  In addition to this, all the tests in test/mb and
> test/regress are passed.

Fix committed as you proposed(without any message I proposed). Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp