Thread: hstore parser incorrectly handles malformed input

hstore parser incorrectly handles malformed input

From
Ryan Kelly
Date:
It seems that the hstore parser has some odd behavior in the the
handling of certain malformed input constructions:

[db]> select 'a=>,b=>1'::hstore;
    hstore
--------------
 "a"=>",b=>1"

[db]> select 'a=> ,b=>1'::hstore;
    hstore
--------------
 "a"=>",b=>1"

[db]> select 'a=>, b=>1'::hstore;
ERROR:  Syntax error near 'b' at position 5
LINE 2: select 'a=>, b=>1'::hstore;

In my mind, all of these should have been rejected as erroneous input.
To that end, I have attached a patch which causes all of these inputs
to be rejected as invalid.

-Ryan Kelly

Attachment

Re: hstore parser incorrectly handles malformed input

From
Tom Lane
Date:
Ryan Kelly <rpkelly22@gmail.com> writes:
> It seems that the hstore parser has some odd behavior in the the
> handling of certain malformed input constructions:

> [db]> select 'a=>,b=>1'::hstore;
>     hstore
> --------------
>  "a"=>",b=>1"

> [db]> select 'a=> ,b=>1'::hstore;
>     hstore
> --------------
>  "a"=>",b=>1"

> [db]> select 'a=>, b=>1'::hstore;
> ERROR:  Syntax error near 'b' at position 5
> LINE 2: select 'a=>, b=>1'::hstore;

> In my mind, all of these should have been rejected as erroneous input.
> To that end, I have attached a patch which causes all of these inputs
> to be rejected as invalid.

Hm, I would have expected all three of these to result in "a" having
an empty-string value.  I see nothing in the hstore documentation
suggesting that I must write a=>"" or some such to get an empty value,
and it also says "whitespace between pairs or around the => sign is
ignored".  So what is your reasoning for calling this malformed input?

(We're in agreement that the current behavior is wrong, though.)

            regards, tom lane

Re: hstore parser incorrectly handles malformed input

From
Tom Lane
Date:
I wrote:
> Ryan Kelly <rpkelly22@gmail.com> writes:
>> In my mind, all of these should have been rejected as erroneous input.
>> To that end, I have attached a patch which causes all of these inputs
>> to be rejected as invalid.

> Hm, I would have expected all three of these to result in "a" having
> an empty-string value.  I see nothing in the hstore documentation
> suggesting that I must write a=>"" or some such to get an empty value,

Attached is an alternative patch that fixes it that way.

Does anybody else have an opinion as to which of these solutions is
more preferable?  And should we regard this as a back-patchable bug
fix, or a definition change suitable only for HEAD?

            regards, tom lane

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index dde6c4b376ea2a82ed7d3cc0ef040e52f4df393a..08eafa18f1600d21e35eb4c3298215318dfebd9d 100644
*** a/contrib/hstore/hstore_io.c
--- b/contrib/hstore/hstore_io.c
*************** get_val(HSParser *state, bool ignoreeq,
*** 74,81 ****
--- 74,88 ----
              }
              else if (*(state->ptr) == '=' && !ignoreeq)
              {
+                 /* Empty key is a syntax error */
                  elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int4) (state->ptr -
state->begin));
              }
+             else if (*(state->ptr) == ',' && ignoreeq)
+             {
+                 /* Empty value is perfectly OK */
+                 state->ptr--;
+                 return true;
+             }
              else if (*(state->ptr) == '\\')
              {
                  st = GV_WAITESCIN;
*************** parse_hstore(HSParser *state)
*** 191,197 ****
          if (st == WKEY)
          {
              if (!get_val(state, false, &escaped))
!                 return;
              if (state->pcur >= state->plen)
              {
                  state->plen *= 2;
--- 198,204 ----
          if (st == WKEY)
          {
              if (!get_val(state, false, &escaped))
!                 return;            /* end of string */
              if (state->pcur >= state->plen)
              {
                  state->plen *= 2;
*************** parse_hstore(HSParser *state)
*** 236,242 ****
          else if (st == WVAL)
          {
              if (!get_val(state, true, &escaped))
!                 elog(ERROR, "Unexpected end of string");
              state->pairs[state->pcur].val = state->word;
              state->pairs[state->pcur].vallen = hstoreCheckValLen(state->cur - state->word);
              state->pairs[state->pcur].isnull = false;
--- 243,252 ----
          else if (st == WVAL)
          {
              if (!get_val(state, true, &escaped))
!             {
!                 /* end of string, treat as empty value */
!                 state->ptr--;
!             }
              state->pairs[state->pcur].val = state->word;
              state->pairs[state->pcur].vallen = hstoreCheckValLen(state->cur - state->word);
              state->pairs[state->pcur].isnull = false;

Re: hstore parser incorrectly handles malformed input

From
Vik Reykja
Date:
On Fri, Apr 27, 2012 at 03:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Does anybody else have an opinion as to which of these solutions is
> more preferable?
>

I think all unquoted whitespace should be ignored, so I prefer your
solution. (note: I haven't actually tested it, I'm going off these emails)


> And should we regard this as a back-patchable bug
> fix, or a definition change suitable only for HEAD?
>

Since this is removing a syntax error and not creating one, I'd say it
should be safe to backpatch.

Re: hstore parser incorrectly handles malformed input

From
Ryan Kelly
Date:
On Fri, Apr 27, 2012 at 11:27:03AM +0200, Vik Reykja wrote:
> On Fri, Apr 27, 2012 at 03:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Does anybody else have an opinion as to which of these solutions is
> > more preferable?
> >
>
> I think all unquoted whitespace should be ignored, so I prefer your
> solution. (note: I haven't actually tested it, I'm going off these emails)
As long as we make it consistent on both sides of the '=>' (and document
it, too), then I don't really care either way. Currently you have to use
quotes to get an empty key, so I thought it natural to that you should
have to quote to get an empty value.

I've attached a modified version of Tom's patch which also allows empty
keys.

>
>
> > And should we regard this as a back-patchable bug
> > fix, or a definition change suitable only for HEAD?
> >
>
> Since this is removing a syntax error and not creating one, I'd say it
> should be safe to backpatch.

-Ryan Kelly

Attachment

Re: hstore parser incorrectly handles malformed input

From
Tom Lane
Date:
Ryan Kelly <rpkelly22@gmail.com> writes:
> As long as we make it consistent on both sides of the '=>' (and document
> it, too), then I don't really care either way. Currently you have to use
> quotes to get an empty key, so I thought it natural to that you should
> have to quote to get an empty value.

> I've attached a modified version of Tom's patch which also allows empty
> keys.

Hm ... I don't agree that keys and values are interchangeable, and
I don't see that empty keys are a good thing (whereas empty values are
clearly a reasonable edge case).  So I think this is going a bit far;
it seems to me it'd be giving up a lot of syntax-error detection
capability in return for some not-actually-helpful symmetry.

On the other hand, I seldom use hstore so I'm probably not the best
person to be judging the appropriateness of these options.  Any other
votes out there?

            regards, tom lane

Re: hstore parser incorrectly handles malformed input

From
Ryan Kelly
Date:
On Fri, Apr 27, 2012 at 10:22:11AM -0400, Tom Lane wrote:
> Ryan Kelly <rpkelly22@gmail.com> writes:
> > As long as we make it consistent on both sides of the '=>' (and document
> > it, too), then I don't really care either way. Currently you have to use
> > quotes to get an empty key, so I thought it natural to that you should
> > have to quote to get an empty value.
>
> > I've attached a modified version of Tom's patch which also allows empty
> > keys.
>
> Hm ... I don't agree that keys and values are interchangeable, and
> I don't see that empty keys are a good thing (whereas empty values are
> clearly a reasonable edge case).  So I think this is going a bit far;
> it seems to me it'd be giving up a lot of syntax-error detection
> capability in return for some not-actually-helpful symmetry.
I don't think any language supports the empty key edge case in this way.
The only language I know of that will let you get away with it is Perl,
and in that case you get undef, not the empty string, and it's a warning
if you have warnings on:

$ use warnings;
$ my $h = { 1 => };
Odd number of elements in anonymous hash
$HASH1 = { 1 => undef };

In fact, it's akin to doing:
[db]> select hstore('{foo}'::text[]);
ERROR:  array must have even number of elements

Or maybe even this:
[db]> select hstore('{foo,}'::text[]);
ERROR:  malformed array literal: "{foo,}"

Python:
>>> h = {1 : }
SyntaxError: invalid syntax

Ruby:
>> h = {1 => }
SyntaxError: compile error

C#:
csharp> var h = new Dictionary<String, String>(){ { "1", } };
{interactive}(1,50): error CS1525: Unexpected symbol `}'
csharp> var h = new Dictionary<String, String>(){ { "1" } };
{interactive}(1,44): error CS1501: No overload for method `Add' takes `1' arguments

JavaScript:
js> a = { 1: }
typein:1: SyntaxError: syntax error:

PHP:
$ echo '<?php $h = array( 1 => ); ?>' | php
PHP Parse error:  syntax error, unexpected ')'

> On the other hand, I seldom use hstore so I'm probably not the best
> person to be judging the appropriateness of these options.  Any other
> votes out there?
As long as we're voting, I vote for my original patch. I think requiring
something on the RHS of => to get a value is a sane and rather common thing.

>
>             regards, tom lane

-Ryan Kelly

Re: hstore parser incorrectly handles malformed input

From
Alex Hunsaker
Date:
On Sat, Apr 28, 2012 at 17:20, Ryan Kelly <rpkelly22@gmail.com> wrote:

> I don't think any language supports the empty key edge case in this way.
> The only language I know of that will let you get away with it is Perl,
> and in that case you get undef, not the empty string, and it's a warning
> if you have warnings on:
>
> $ use warnings;
> $ my $h = { 1 => };
> Odd number of elements in anonymous hash

Well... kind of but not really, perl (as usual) is a bit funky here.
Try it with more than one key. What ends up happening is the 2nd key
is used as the first keys value... Surprise! You also don't get the
warning, it sees 1=>2,(void) or something.

$ perl -E 'my $h = {1=>,2=>}; say $h->{1};'
2

Anyway, +1 for what tom proposed.

Re: hstore parser incorrectly handles malformed input

From
Robert Haas
Date:
On Thu, Apr 26, 2012 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Ryan Kelly <rpkelly22@gmail.com> writes:
>>> In my mind, all of these should have been rejected as erroneous input.
>>> To that end, I have attached a patch which causes all of these inputs
>>> to be rejected as invalid.
>
>> Hm, I would have expected all three of these to result in "a" having
>> an empty-string value. =A0I see nothing in the hstore documentation
>> suggesting that I must write a=3D>"" or some such to get an empty value,
>
> Attached is an alternative patch that fixes it that way.
>
> Does anybody else have an opinion as to which of these solutions is
> more preferable? =A0And should we regard this as a back-patchable bug
> fix, or a definition change suitable only for HEAD?

I vote for not back-patching, regardless of exactly what we decide to do he=
re.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company