Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific - Mailing list pgsql-hackers
From | Evan Jones |
---|---|
Subject | Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific |
Date | |
Msg-id | CA+HWA9bTRDf52DHyU+JOoqEALgRGRo5uHUYTFuduoj3cBfer+Q@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific
|
List | pgsql-hackers |
Unfortunately I just noticed a possible "bug" with this change. The scanner_isspace() function only recognizes *five* ASCII space characters: ' ' \t \n \r \f. It *excludes* VTAB \v, which the C standard function isspace() includes. This means this patch changed the behavior of hstore parsing for some "unusual" cases where the \v character was previously ignored, and now is not, such as: "select 'k=>\vv'::hstore" . It seems unlikely to me that anyone would be depending on this. The application/programming language library would need to be explicitly depending on VTAB being ignored as leading/trailing characters for hstore key/values. I am hopeful that most implementations encode hstore values the same way Postgres does: always using quoted strings, which avoids this problem.
However, if we think this change could be a problem, one fix would be to switch scanner_isspace() to array_isspace(), which returns true for these *six* ASCII characters. I am happy to submit a patch to do this.
However, I am now wondering if the fact that scanner_isspace() and array_isspace() disagree with each other could be problematic somewhere, but so far I haven found anything.
Problematic example before my hstore change:
$ printf "select 'k=>\vv'::hstore" | psql
hstore
----------
"k"=>"v"
(1 row)
hstore
----------
"k"=>"v"
(1 row)
Same example after my hstore change on postgres master commit a14e75eb0b from 2023-06-16:
$ printf "select 'k=>\vv'::hstore" | psql
hstore
--------------
"k"=>"\x0Bv"
(1 row)
hstore
--------------
"k"=>"\x0Bv"
(1 row)
On Sun, Jun 11, 2023 at 8:18 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Jun 06, 2023 at 10:16:09AM -0400, Evan Jones wrote:
> I did a quick look at the places found with "git grep isspace" yesterday. I
> agree with the comment from commit 9ae2661: "I've left alone isspace()
> calls in places that aren't really expecting any non-ASCII input
> characters, such as float8in()." There are a number of other calls where I
> think it would likely be safe, and possibly even a good idea, to replace
> isspace() with scanner_isspace(). However, I couldn't find any where I
> could cause a bug like the one I hit in hstore parsing.
Yes, I agree with this feeling. Like 9ae2661, I can't get really
excited about plastering more of that, especially if it were for
timezone value input or dictionary options. One area with a new
isspace() since 2017 is multirangetypes.c, but it is just a copy of
rangetypes.c.
> Original mailing list post for commit 9ae2661 in case it is helpful for
> others: https://www.postgresql.org/message-id/10129.1495302480@sss.pgh.pa.us
I have reproduced the original problem reported on macOS 13.4, which
is close to the top of what's available.
Passing to pg_regress some options to use something else than UTF-8
leads to a failure in the tests, so we need a split like
fussyztrmatch to test that:
REGRESS_OPTS='--encoding=SQL_ASCII --no-locale' make check
An other error pattern without a split could be found on Windows, as
of:
select E'key\u0105=>value'::hstore;
- hstore
------------------
- "keyÄ…"=>"value"
-(1 row)
-
+ERROR: character with byte sequence 0xc4 0x85 in encoding "UTF8" has
no equivalent in encoding "WIN1252"
+LINE 1: select E'key\u0105=>value'::hstore;
We don't do that for unaccent, actually, leading to similar failures..
I'll launch a separate thread about that shortly.
With that fixed, the fix has been applied and backpatched. Thanks for
the report, Evan!
--
Michael
pgsql-hackers by date: