Thread: postgresql v7.1.3 bug report

postgresql v7.1.3 bug report

From
"pierre"
Date:
Dear Sir,

    How are you? I need your help!

    I built PostgreSQL 7.1.3 on my Linux system with --enable-multibyte=EUC_TW, but
    I got a problem when I execute the SQL command below. In the Chinese character
    (CName ~* '=A6|') the character code is 0xA67C, and 0x7C is ASCII '|'. I guess
    your system rejects this '|' byte, but it is the second byte of a Big5 code. How
    can I avoid this problem?

SELECT * FROM ifabinstn Where((CName ~* '=A6|') OR FALSE) ORDER BY CName

Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty expression or subexpression in DB/pgsql.php on line 163
ERROR: Invalid regular expression: empty expression or subexpression

Would you give me some advice on solving this problem?

Thank you very much

Best Rgds.
Pierre Ho
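
The failure described in this report can be reproduced outside PostgreSQL. The sketch below uses Python's `re` module on raw bytes as a stand-in for any byte-oriented regex parser; it is not PostgreSQL's regex code, and the variable names are only illustrative. Python tolerates an empty alternative where the backend raises an error, but in both cases the 0x7C trail byte is read as the alternation operator rather than as part of the character.

```python
import re

# The two bytes of the Big5 character from the report: lead 0xA6, trail 0x7C.
BIG5_CHAR = b"\xa6\x7c"

# At the byte level the trail byte is indistinguishable from ASCII '|'.
trail_is_pipe = BIG5_CHAR[1] == ord("|")

# A byte-oriented parser therefore reads the pattern as "0xA6 OR <empty>".
pattern = re.compile(BIG5_CHAR)

matches_lead_byte_alone = pattern.fullmatch(b"\xa6") is not None
matches_empty_string = pattern.fullmatch(b"") is not None
matches_the_character = pattern.fullmatch(BIG5_CHAR) is not None

print(trail_is_pipe, matches_lead_byte_alone, matches_empty_string,
      matches_the_character)
```

The pattern matches the lone lead byte and the empty string, but not the two-byte character it was meant to match.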

Re: postgresql v7.1.3 bug report

From
Tom Lane
Date:
"pierre" <cti848@www.textilenet.org.tw> writes:
>     I built PostgreSQL 7.1.3 on my Linux system with --enable-multibyte=EUC_TW, but
>     I got a problem when I execute the SQL command below. In the Chinese character
>     (CName ~* '=A6|') the character code is 0xA67C, and 0x7C is ASCII '|'. I guess
>     your system rejects this '|' byte, but it is the second byte of a Big5 code. How
>     can I avoid this problem?

> SELECT * FROM ifabinstn Where((CName ~* '=A6|') OR FALSE) ORDER BY CName

> Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty expression or subexpression in DB/pgsql.php on line 163
> ERROR: Invalid regular expression: empty expression or subexpression


I am thinking that p_ere's local "char c" (regcomp.c, about line 304 in
current sources) should have been declared "pg_wchar c".  Tatsuo, what
do you think?  Are there any other places in this file where char should
be pg_wchar?

            regards, tom lane

Re: postgresql v7.1.3 bug report

From
Tatsuo Ishii
Date:
> "pierre" <cti848@www.textilenet.org.tw> writes:
> >     I built PostgreSQL 7.1.3 on my Linux system with --enable-multibyte=EUC_TW, but
> >     I got a problem when I execute the SQL command below. In the Chinese character
> >     (CName ~* '=A6|') the character code is 0xA67C, and 0x7C is ASCII '|'. I guess
> >     your system rejects this '|' byte, but it is the second byte of a Big5 code. How
> >     can I avoid this problem?
>
> > SELECT * FROM ifabinstn Where((CName ~* '=A6|') OR FALSE) ORDER BY CName
>
> > Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty expression or subexpression in DB/pgsql.php on line 163
> > ERROR: Invalid regular expression: empty expression or subexpression
>
>
> I am thinking that p_ere's local "char c" (regcomp.c, about line 304 in
> current sources) should have been declared "pg_wchar c".  Tatsuo, what
> do you think?  Are there any other places in this file where char should
> be pg_wchar?

I don't think so. The problem is that he uses EUC_TW as the backend
encoding while using Big5 as the frontend encoding. In this case he should
declare the client-side encoding explicitly so that the backend does the
encoding conversion. To accomplish this in PHP scripts, call:

pg_set_client_encoding($con, "BIG5");

before doing any query ($con is a connection to PostgreSQL).

Note that EUC_TW, like any multibyte encoding allowed on the backend
side, does not use ASCII special characters such as "|" as part of a
multibyte character, so it should be safe for the parser and the regexp
routines.
--
Tatsuo Ishii
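
The safety property mentioned above can be checked byte by byte. What follows is a rough sketch, assuming the usual byte layouts: a Big5 trail byte may fall in 0x40-0x7E (which includes ASCII punctuation), while every byte of a multibyte EUC_TW character is 0xA1 or above, apart from the 0x8E lead byte of four-byte characters. All function names are hypothetical, the lead-byte ranges are deliberately permissive, and none of this is PostgreSQL code.

```python
# ASCII bytes that a regex parser treats as operators.
REGEX_METACHARS = frozenset(b"|()[]{}*+?.^$\\")

def big5_char_len(lead):
    """Byte length of a Big5 character, judged from its lead byte."""
    return 2 if 0x81 <= lead <= 0xFE else 1

def euc_tw_char_len(lead):
    """Byte length of an EUC_TW character, judged from its lead byte."""
    if lead == 0x8E:                      # prefix for four-byte characters
        return 4
    return 2 if 0xA1 <= lead <= 0xFE else 1

def hidden_metachars(data, char_len):
    """Offsets of non-lead bytes that collide with a regex metacharacter."""
    hits = []
    i = 0
    while i < len(data):
        n = char_len(data[i])
        # Inspect only the bytes after the lead byte of this character.
        for j in range(i + 1, min(i + n, len(data))):
            if data[j] in REGEX_METACHARS:
                hits.append(j)
        i += n
    return hits

# The Big5 character from the report: its trail byte 0x7C is '|'.
big5_hits = hidden_metachars(b"\xa6\x7c", big5_char_len)

# An arbitrary well-formed two-byte EUC_TW sequence, used only for its shape.
euc_tw_hits = hidden_metachars(b"\xc4\xa1", euc_tw_char_len)

print(big5_hits, euc_tw_hits)
```

The Big5 input reports a collision at offset 1, while the EUC_TW input reports none, which is why converting to the backend encoding before parsing avoids the problem.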

Re: postgresql v7.1.3 bug report

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Note that EUC_TW, like any multibyte encoding allowed on the backend
> side, does not use ASCII special characters such as "|" as part of a
> multibyte character, so it should be safe for the parser and the regexp
> routines.

But the point is that a pg_wchar is being squeezed down to a char.
PEEK() produces a pg_wchar, no?

            regards, tom lane
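
The concern raised here is a narrowing conversion. A minimal sketch follows, written in Python, which has no fixed-width integers, so the `& 0xFF` mask stands in for storing a wider value into an 8-bit C `char` on a typical platform. The wide value 0xA67C is a hypothetical code chosen only because its low byte is 0x7C; it is not claimed to be the pg_wchar value PostgreSQL produces for any character, and `narrow_to_char` is an illustrative name.

```python
def narrow_to_char(wide_code):
    """Mimic assigning a wider integer to an 8-bit char: keep the low byte."""
    return wide_code & 0xFF

WIDE_CODE = 0xA67C  # hypothetical wide character code with low byte 0x7C

# Compared at full width, the code is clearly not '|'.
wide_equals_pipe = WIDE_CODE == ord("|")

# After narrowing, only 0x7C survives, so the comparison wrongly succeeds.
narrow_equals_pipe = narrow_to_char(WIDE_CODE) == ord("|")

print(wide_equals_pipe, narrow_equals_pipe)
```

Keeping the variable at the full width of the character type avoids this class of false match regardless of which codes can actually occur.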

Re: postgresql v7.1.3 bug report

From
Tatsuo Ishii
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Note that EUC_TW, like any multibyte encoding allowed on the backend
> > side, does not use ASCII special characters such as "|" as part of a
> > multibyte character, so it should be safe for the parser and the regexp
> > routines.
>
> But the point is that a pg_wchar is being squeezed down to a char.
> PEEK() produces a pg_wchar, no?

Oh I see.

Actually "c" is used solely to judge whether it's '|' or some other stop
(ASCII) character, so there is no need to change it to pg_wchar
even if it could be squeezed down to a char. However, someday someone
might use c for other purposes, and it would be a good idea to prepare
for that kind of disaster. Will fix.
--
Tatsuo Ishii