Re: Invalid "trailing junk" error message when non-English letters are used - Mailing list pgsql-hackers

From Pavel Borisov
Subject Re: Invalid "trailing junk" error message when non-English letters are used
Date
Msg-id CALT9ZEFG8u=+pBMkON1Ske+We6wtjf=A2SYGvhsZJn5TaHLwLA@mail.gmail.com
List pgsql-hackers
Hi, Karina!

On Tue, 27 Aug 2024 at 19:06, Karina Litskevich <litskevichkarina@gmail.com> wrote:
Hi hackers,

When the error "trailing junk after numeric literal" occurs at a number
followed by a symbol that is represented by more than one byte, that symbol
is not displayed correctly in the error message. Only its first byte
appears instead of the whole symbol. That makes the error message invalid
UTF-8 (or whatever encoding is set). The whole log file where this error
message goes also becomes invalid, which could lead to problems with
reading logs. You can see an invalid message by trying "SELECT 123ä;".
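
To illustrate the truncation described above, here is a minimal sketch in Python (not PostgreSQL code). It assumes UTF-8 as the encoding, and the helper name is_valid_utf8 is hypothetical. It cuts the token after the first byte of the multibyte character and checks whether the result still decodes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "ä" is encoded as two bytes in UTF-8, so the token is five bytes long.
full_token = "123ä".encode("utf-8")

# Keep the numeric literal plus only the first byte of the next character,
# which is the kind of token the report says ends up in the error message.
truncated_token = full_token[:4]

print(is_valid_utf8(full_token))
print(is_valid_utf8(truncated_token))
```

The truncated token ends in a lone lead byte with no continuation byte, which is why any message that embeds it is no longer valid UTF-8.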

Rejecting trailing junk after numeric literals was introduced in commit
2549f066 to prevent scanning a number immediately followed by an
identifier without whitespace as number and identifier. All the tokens
that were made to catch such cases match a numeric literal and the next byte,
and that is where the problem comes from. I thought that it could be fixed
just by using tokens that match a numeric literal immediately followed by
an identifier, not only one byte. This also improves error messages in
cases with English letters. After these changes, for "SELECT 123abc;" the
error message will say that the error appeared at or near "123abc" instead
of "123a".
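
The difference between the two matching strategies can be sketched with byte-level regular expressions in Python. This is only an approximation for illustration: the character classes below are simplified stand-ins for the lexer's identifier classes, the names old_junk and new_junk are hypothetical, and the actual flex rules in scan.l are not reproduced here.

```python
import re

# Simplified, assumed approximations of identifier character classes,
# operating on raw bytes so that high-bit bytes count as identifier bytes.
IDENT_START = rb"[A-Za-z_\x80-\xff]"
IDENT_CONT = rb"[A-Za-z0-9_$\x80-\xff]"

# Numeric literal followed by exactly one identifier byte.
old_junk = re.compile(rb"[0-9]+" + IDENT_START)

# Numeric literal followed by a whole identifier.
new_junk = re.compile(rb"[0-9]+" + IDENT_START + IDENT_CONT + rb"*")

for text in ("123abc;", "123ä;"):
    data = text.encode("utf-8")
    print(old_junk.match(data).group(), new_junk.match(data).group())
```

With the one-byte pattern the match stops in the middle of "ä", while the whole-identifier pattern consumes both of its bytes and stops before the semicolon.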

I've attached the patch. Are there any pitfalls I can't see? It just keeps
bothering me why it wasn't done this way from the beginning. Matching the whole
identifier after a numeric literal just seems more obvious to me than
matching its first byte.
 
I see the following compile-time warnings:
scan.l:1062: warning, rule cannot be matched
scan.l:1066: warning, rule cannot be matched
scan.l:1070: warning, rule cannot be matched
pgc.l:1030: warning, rule cannot be matched
pgc.l:1033: warning, rule cannot be matched
pgc.l:1036: warning, rule cannot be matched
psqlscan.l:905: warning, rule cannot be matched
psqlscan.l:908: warning, rule cannot be matched
psqlscan.l:911: warning, rule cannot be matched

FWIW, outputting the whole string in the error message doesn't look nice to me, but other places in the code do this anyway, e.g.:
select ('1'||repeat('p',1000000))::integer;
This may be worth fixing.

Regards,
Pavel Borisov
Supabase
