Re: JSON Function Bike Shedding - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: JSON Function Bike Shedding
Msg-id CAHyXU0xdK51OJLr_O0yHaj5g_3F6pe4x4HLZbSmtF7pi8jSQPw@mail.gmail.com
In response to Re: JSON Function Bike Shedding  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: JSON Function Bike Shedding
List pgsql-hackers
On Wed, Feb 20, 2013 at 9:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 19, 2013 at 10:00 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Anyways, as to overloading in general, well, SQL is heavily
>> overloaded.  We don't have int_max, float_max, etc. and it would be
>> usability reduction if we did.
>
> That's true, but max(int) and max(float) are doing pretty much the
> same logical operation - they are taking the maximum of a group of
> numbers.  Overloading in cases where the semantics vary - e.g. + for
> both integer addition and string concatenation - is something else
> altogether, and I have not generally observed it to be a very good
> idea.  Sometimes it works in cases where it's part of the core
> language design, but we don't have the luxury of knowing what other
> data types we'll want to add in the future, and I'm very wary of
> allowing JSON to engage in uncontrolled namespace pollution.

Sure, but that's another straw man: abuse of the + operator is a case
of combining arbitrarily different behaviors (concatenation and
arithmetic aggregation) into a uniform syntax.  This is bad, but it's a
different thing.  The right way to do it is to globally define the
behavior and map it to types if and only if it makes sense.  Again,
you want clean separation of 'what you're doing' vs 'what you're doing
it over'.
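
A minimal sketch of that separation, written in C++ rather than SQL and
not taken from any patch in this thread: the operation is defined once,
and it applies to a type only if comparison makes sense for that type.
The name `largest` is hypothetical, chosen purely for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// 'What you're doing': pick the largest element. Defined exactly once.
// 'What you're doing it over': any element type T that supports operator<.
// A type without operator< simply fails to compile here, so the behavior
// only maps to types where it makes sense.
// Precondition (assumed by this sketch): xs is not empty.
template <typename T>
T largest(const std::vector<T>& xs) {
    return *std::max_element(xs.begin(), xs.end());
}
```

The same call, `largest(values)`, then works for a vector of int, double,
or std::string with no int_largest or float_largest variants, which is
roughly the role max() plays in SQL.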


>> But that's not even the point; the
>> driving philosophy of SQL is that your data structures (and types) are
>> to be strongly decoupled from the manipulation you do -- this keeps
>> the language very general. That philosophy, while not perfect, should
>> be adhered to when possible.
>
> Perhaps, but that goal seems unlikely to be met in this case.  The
> JSON functions and operators are being named by one group of people
> with one set of sensibilities, and the hstore functions and operators
> were named by a different group of people with a different set of
> sensibilities (and therefore don't match), and the next type that
> comes along will be named according to yet another group of people
> with another set of sensibilities.  So we're unlikely to end up with a
> coherent set of primitives that operate on underlying data of a
> variety of types; we are instead likely to end up with an incoherent
> jumble.

json and hstore overlap in the sense that you can use either one to
define a tuple that is independent of the database type system and
therefore free from its restrictions (this is why 9.0+ hstore was a
complete game changer for trigger development).  Also, a json object
is for all intents and purposes an hstore++ -- json is more general
and, if it gets the ability to be manipulated, would probably displace
hstore for most usages.

So I'm not buying that: if the truly overlapping behaviors were
syntactically equivalent, then you would be able to swap out the
implementation by changing only the type, without refactoring all your
code.  The C++ STL works this way, and that principle, at least, is
good despite all the C++ baggage headaches.
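
To make the STL point concrete, here is a small sketch under stated
assumptions: `total`, `Store`, and `sum_of_sample` are hypothetical
names invented for this example. Only the standard containers and
std::accumulate are real library features.

```cpp
#include <list>
#include <numeric>
#include <vector>

// Hypothetical helper: written against the shared begin()/end()
// interface, not against any one container type.
template <typename Container>
int total(const Container& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Changing this alias to std::list<int> swaps the implementation;
// the code below that uses Store does not have to be edited.
using Store = std::vector<int>;

int sum_of_sample() {
    Store s{1, 2, 3};
    s.push_back(4);   // push_back exists on both std::vector and std::list
    return total(s);
}
```

Because both containers spell the overlapping operations the same way,
the type is the only thing that changes. That is the property being
argued for when json and hstore share names for shared behavior.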

> Although we now have a JSON type in core, we should not pretend that
> it's in the same league as text or int4.  If those data types claim
> common function names like max and abs and common operator names like
> + and ||, it can be justified on the grounds that the appeal of those
> data types is pretty near universal.  JSON is a very popular format
> right now and I completely support adding more support for it, but I
> cheerfully submit that if you think it falls into the same category as
> "text" or "int4", you've gotten way too caught up in the hype.  It's
> completely appropriate to apply stricter criteria for namespace
> pollution to JSON than to a basic data type whose semantics are
> dictated by the SQL standard, the behavior of other database products,
> and fourth-grade math class.

I'm not buying into the hype at all.  I've been arguing (without much
success) for years that throwing arcane type-specific functions into
the public namespace is incoherent, not the other way around.
array_upper()?  How about length() or count()?

Well, we need to decide what to do here -- I'll call the vote about
even, and there are plausible arguments to do it either way -- so how
do we resolve this?

merlin


