Re: json api WIP patch - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: json api WIP patch
Date
Msg-id CAAZKuFb6Ho7MgiBuDt4trCf_gqh6bChJO46WnUfhbegOJUn66g@mail.gmail.com
Whole thread Raw
In response to Re: json api WIP patch  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 01/15/2013 02:47 PM, Merlin Moncure wrote:
>>
>> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david@fetter.org> wrote:
>>>
>>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>>>
>>>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>>>>>
>>>>> While testing this I noticed that integer based 'get' routines are
>>>>> zero based -- was this intentional?  Virtually all other aspects of
>>>>> SQL are 1 based:
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 1);
>>>>>   json_get
>>>>> ----------
>>>>>   2
>>>>> (1 row)
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 0);
>>>>>   json_get
>>>>> ----------
>>>>>   1
>>>>> (1 row)
>>>>
>>>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>>>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>>>> arrays as zero-based. I suspect the Json-using community would not
>>>> thank us for being overly SQL-centric on this - and I say that as
>>>> someone who has always thought zero based arrays were a major design
>>>> mistake, responsible for countless off-by-one errors.
>>>
>>> Perhaps we could compromise by making arrays 0.5-based.
>>
>> Well, I'm not prepared to argue with Andrew in this one.  It was
>> surprising behavior to me, but that's sample size one.
>
> I doubt I'm very representative either. People like David Wheeler, Taras
> Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than
> me. I'm quite prepared to change it if that's the consensus.

Hello.

I'm inclined to go with the same gut feeling you had (zero-based-indexing).

Here is the background for my reasoning:

The downside of zero-based-indexing is that people who want to use
multiple sequential container types will inevitably have to deal with
detailed and not easily type-checked integer coordinates that mean
different things in each domain that will, no doubt, lead to a number
of off-by-one errors.  Nevertheless, this cost is already paid because
one of the first things many people will do in programs generating SQL
queries is try to zero-index a SQL array, swear a bit after figuring
things out (because a NULL will be generated, not an error), and then
adjust all the offsets. So, this is not a new problem.  On many
occasions I'm sure this has caused off-by-one bugs, or the NULLs
slipped through testing and delivered funny results, yet the world
moves on.

On the other hand, the downside of going down the road of 1-based
indexing and attempting to attain relative sameness to SQL arrays, it
would also feel like one would be obliged to implement SQL array
infelicities like 'out of bounds' being SQL NULL rather than an error,
related to other spectres like non-rectangular nested arrays.  SQL
array semantics are complex and The Committee can change them or --
slightly more likely -- add interactions, so it seems like a general
expectation that Postgres container types that happen to have any
reasonable ordinal addressing will implement some level of same-ness
with SQL arrays is a very messy one.  As such, if it becomes customary
to implement one-based indexing of containers, I think such customs
are best carefully circumscribed so that attempts to be 'like' SQL
arrays are only as superficial as that.

What made me come down on the side of zero-based indexing in spite of
the weaknesses are these two reasons:

* The number of people who use JSON and zero-based-indexing is very large, and furthermore, within that set the number
thatknow that SQL even defines array support -- much less that Postgres implements it -- is much smaller. Thus, one is
targetingcohesion with a fairly alien concept that is not understood by the audience.
 

* Maintaining PL integrated code that uses both 1-based indexing in PG functions and 0-based indexing in embedded
languagesthat are likely to be combined with JSON -- doesn't sound very palatable, and the use of such PLs (e.g. plv8)
seemspretty likely, too.  That can probably be a rich source of bugs and frustration.
 

If one wants SQL array semantics, it seems like the right way to get
them is coercion to a SQL array value.  Then one will receive SQL
array semantics exactly.

--
fdr



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [PATCH] COPY .. COMPRESSED
Next
From: Tom Lane
Date:
Subject: Re: unlogged tables vs. GIST