Thread: BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

From
bouda@edookit.com
Date:
The following bug has been logged on the website:

Bug reference:      12070
Logged by:          Ondřej Bouda
Email address:      bouda@edookit.com
PostgreSQL version: 9.4beta2
Operating system:   Windows 7 64bit
Description:

The hstore_to_json_loose(hstore) produces an invalid JSON in the following
case:

SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
[]))

Output: {"name": 1.}

Expected:
  either {"name": "1."}
  or     {"name": 1}
  (the latter being the preferred one so that it produces the same JSON as
hstore_to_jsonb_loose(hstore))

The actual output is indeed incorrect as JSON does not permit `1.` - it must
be a string.

Tested with PostgreSQL 9.4 RC1 and still wrong.
bouda@edookit.com writes:
> The hstore_to_json_loose(hstore) produces an invalid JSON in the following
> case:

> SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
> []))

> Output: {"name": 1.}

> The actual output is indeed incorrect as JSON does not permit `1.` - it must
> be a string.

Yeah.  The problem seems to be the ad-hoc (I'm being polite) code in
hstore_to_json_loose to decide whether a string should be treated as a
number.  It does much more work than it needs to, and fails to have any
tight connection to the JSON syntax rules for numbers.

Offhand, it seems like the nicest fix would be if the core json code
exposed a function that would say whether a string matches the JSON
number syntax.  Does that functionality already exist someplace,
or is it too deeply buried in the JSON parser guts?
        regards, tom lane



On 11/26/2014 11:19 AM, Tom Lane wrote:
> bouda@edookit.com writes:
>> The hstore_to_json_loose(hstore) produces an invalid JSON in the following
>> case:
>> SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
>> []))
>> Output: {"name": 1.}
>> The actual output is indeed incorrect as JSON does not permit `1.` - it must
>> be a string.
> Yeah.  The problem seems to be the ad-hoc (I'm being polite) code in
> hstore_to_json_loose to decide whether a string should be treated as a
> number.  It does much more work than it needs to, and fails to have any
> tight connection to the JSON syntax rules for numbers.
>
> Offhand, it seems like the nicest fix would be if the core json code
> exposed a function that would say whether a string matches the JSON
> number syntax.  Does that functionality already exist someplace,
> or is it too deeply buried in the JSON parser guts?
>
>             regards, tom lane
>
>



In json.c we now check numbers like this:
   JsonLexContext dummy_lex;   bool        numeric_error;   ...   dummy_lex.input = *outputstr == '-' ? outputstr + 1 :
outputstr;  dummy_lex.input_length = strlen(dummy_lex.input);   json_lex_number(&dummy_lex, dummy_lex.input,
&numeric_error);

numeric_error is true IFF outputstr is a legal json number.

Exposing a function to do this should be trivial.

cheers

andrew






On 11/26/2014 11:48 AM, Andrew Dunstan wrote:
>
>
>
>
> In json.c we now check numbers like this:
>
>    JsonLexContext dummy_lex;
>    bool        numeric_error;
>    ...
>    dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
>    dummy_lex.input_length = strlen(dummy_lex.input);
>    json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
>
> numeric_error is true IFF outputstr is a legal json number.

er is NOT a legal json number

cheers

andrew






On 11/26/2014 11:48 AM, Andrew Dunstan wrote:
>
> On 11/26/2014 11:19 AM, Tom Lane wrote:
>> bouda@edookit.com writes:
>>> The hstore_to_json_loose(hstore) produces an invalid JSON in the 
>>> following
>>> case:
>>> SELECT hstore_to_json_loose(hstore(ARRAY ['name'], ARRAY ['1.'] :: TEXT
>>> []))
>>> Output: {"name": 1.}
>>> The actual output is indeed incorrect as JSON does not permit `1.` - 
>>> it must
>>> be a string.
>> Yeah.  The problem seems to be the ad-hoc (I'm being polite) code in
>> hstore_to_json_loose to decide whether a string should be treated as a
>> number.  It does much more work than it needs to, and fails to have any
>> tight connection to the JSON syntax rules for numbers.
>>
>> Offhand, it seems like the nicest fix would be if the core json code
>> exposed a function that would say whether a string matches the JSON
>> number syntax.  Does that functionality already exist someplace,
>> or is it too deeply buried in the JSON parser guts?
>>
>>             regards, tom lane
>>
>>
>
>
>
> In json.c we now check numbers like this:
>
>    JsonLexContext dummy_lex;
>    bool        numeric_error;
>    ...
>    dummy_lex.input = *outputstr == '-' ? outputstr + 1 : outputstr;
>    dummy_lex.input_length = strlen(dummy_lex.input);
>    json_lex_number(&dummy_lex, dummy_lex.input, &numeric_error);
>
> numeric_error is true IFF outputstr is a legal json number.
>
> Exposing a function to do this should be trivial.
>
>


Tom,

what do you want to do about this? In the back branches, exposing a 
function like this would be an API change, wouldn't it? Perhaps there we 
just need to pick up the 100 lines or so involved from json.c and copy 
them into hstore_io.c, suitably modified. In the development branch I 
thing adding the function to the API is the best way.

I don't mind doing the work once we agree what is to be done - in the 
development branch it will be trivial.

cheers

andrew




Andrew Dunstan <andrew@dunslane.net> writes:
> what do you want to do about this? In the back branches, exposing a 
> function like this would be an API change, wouldn't it? Perhaps there we 
> just need to pick up the 100 lines or so involved from json.c and copy 
> them into hstore_io.c, suitably modified. In the development branch I 
> thing adding the function to the API is the best way.

If we're going to do it by calling some newly-exposed function, I'd be
inclined to fix it the same way in the back branches.  Otherwise the
discrepancy between the branches is a big back-patching hazard.
(For instance, if we realize we need to fix a bug in the numeric-parsing
code, what are the odds that we remember to fix hstore's additional copy
in the back branches?)

The "API break" isn't a big issue imo.  The net effect would be that eg
hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
thing *all the time* --- at least twice in the past year, according to
a quick scan of the commit logs.  If you were changing or removing a
function that third-party code might depend on, it'd be problematic,
but an addition has no such risk.
        regards, tom lane



On 11/30/2014 11:45 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> what do you want to do about this? In the back branches, exposing a
>> function like this would be an API change, wouldn't it? Perhaps there we
>> just need to pick up the 100 lines or so involved from json.c and copy
>> them into hstore_io.c, suitably modified. In the development branch I
>> thing adding the function to the API is the best way.
> If we're going to do it by calling some newly-exposed function, I'd be
> inclined to fix it the same way in the back branches.  Otherwise the
> discrepancy between the branches is a big back-patching hazard.
> (For instance, if we realize we need to fix a bug in the numeric-parsing
> code, what are the odds that we remember to fix hstore's additional copy
> in the back branches?)
>
> The "API break" isn't a big issue imo.  The net effect would be that eg
> hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
> thing *all the time* --- at least twice in the past year, according to
> a quick scan of the commit logs.  If you were changing or removing a
> function that third-party code might depend on, it'd be problematic,
> but an addition has no such risk.
>
>


OK, here's the patch.

cheers

andrew

Attachment
Andrew Dunstan <andrew@dunslane.net> writes:
> OK, here's the patch.

Can we make IsValidJsonNumber() take a "const char *"?  Also its
comment should specify that it doesn't require nul-terminated
input, if indeed it doesn't.  Otherwise +1.
        regards, tom lane



On 11/30/2014 04:31 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> OK, here's the patch.
> Can we make IsValidJsonNumber() take a "const char *"?  Also its
> comment should specify that it doesn't require nul-terminated
> input, if indeed it doesn't.  Otherwise +1.
>
>             


Then I have to cast away the const-ness when I assign it inside the 
function. Constifying the whole API would be a rather larger project, I 
suspect, assuming it's possible.

cheers

andrew



On 11/30/14 11:45 AM, Tom Lane wrote:
> The "API break" isn't a big issue imo.  The net effect would be that eg
> hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
> thing *all the time* --- at least twice in the past year, according to
> a quick scan of the commit logs.  If you were changing or removing a
> function that third-party code might depend on, it'd be problematic,
> but an addition has no such risk.

This sort of things is actually a bit of an annoyance, because it means
that for minor-version upgrades, you need to stop the server before
unpacking the new version, otherwise the old running server will try to
load the new hstore module and fail with a symbol lookup.  This can
increase the downtime significantly.

Yes, we've done this before, and people have gotten bitten by it before.




On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:
> On 11/30/14 11:45 AM, Tom Lane wrote:
> > The "API break" isn't a big issue imo.  The net effect would be that eg
> > hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
> > thing *all the time* --- at least twice in the past year, according to
> > a quick scan of the commit logs.  If you were changing or removing a
> > function that third-party code might depend on, it'd be problematic,
> > but an addition has no such risk.
> 
> This sort of things is actually a bit of an annoyance, because it means
> that for minor-version upgrades, you need to stop the server before
> unpacking the new version, otherwise the old running server will try to
> load the new hstore module and fail with a symbol lookup.  This can
> increase the downtime significantly.
> 
> Yes, we've done this before, and people have gotten bitten by it before.

Uh, do we ever support installing new binaries while the server is
running?  I would hope not.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



On 1/15/15 2:29 PM, Bruce Momjian wrote:
> On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:
>> On 11/30/14 11:45 AM, Tom Lane wrote:
>>> The "API break" isn't a big issue imo.  The net effect would be that eg
>>> hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
>>> thing *all the time* --- at least twice in the past year, according to
>>> a quick scan of the commit logs.  If you were changing or removing a
>>> function that third-party code might depend on, it'd be problematic,
>>> but an addition has no such risk.
>>
>> This sort of things is actually a bit of an annoyance, because it means
>> that for minor-version upgrades, you need to stop the server before
>> unpacking the new version, otherwise the old running server will try to
>> load the new hstore module and fail with a symbol lookup.  This can
>> increase the downtime significantly.
>>
>> Yes, we've done this before, and people have gotten bitten by it before.
> 
> Uh, do we ever support installing new binaries while the server is
> running?  I would hope not.

Effectively, we don't, but it's not unreasonable to expect it.  Check
how your operating system upgrades other server packages such as apache
or openssh.