Re: nested hstore patch - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: nested hstore patch
Date
Msg-id AEC7F7D9-4B3C-4074-AF3C-5260600560E2@justatheory.com
Whole thread Raw
In response to nested hstore patch  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: nested hstore patch
Re: nested hstore patch
List pgsql-hackers
On Nov 12, 2013, at 10:35 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:

> Hi!
>
> Attatched patch adds nesting feature, types (string, boll and numeric values), arrays and scalar to hstore type.

My apologies for not getting to this sooner, work has been a bit nutty. The truth is that I reviewed this patch quite a
bita month back, mostly so I could write documentation, the results of which are included in this patch. And I'm super
excitedfor what's to come in the next iteration, as I hear that Teodor and Andrew are hard at work adding jsonb as a
binary-compatibleJSON data type. 

Meanwhile, for this version, a quick overview of what has changed since 9.2.

Contents & Purpose
==================

Improved Data Type Support
--------------------------

* Added data type support for values. Previously they could only be strings or NULL, but with this patch they can also
benumbers or booleans. 

* Added array support. Values can be arrays of other values. The format for arrays is a bracketed, comma-delimited
list.

* Added nesting support. hstore values can themselves be hstores. Nested hstores are wrapped in braces, but the
root-levelhstore is not (for compatibility with the format of previous versions of hstore). 

* An hstore value is no longer required to be an hstore object. It can now be any scalar value.

These three items make the basic format feature-complete with JSON. Here's an example where the values are scalars:
   =% SELECT 'foo'::hstore, '"hi \"bob\""'::hstore, '1.0'::hstore, 'true'::hstore, NULL::hstore;    hstore |    hstore
 | hstore | hstore | hstore    --------+--------------+--------+--------+--------    "foo"  | "hi \"bob\"" | 1.0    | t
    |  

And here are a couple of arrays with strings, numbers, booleans, and NULLs:
   SELECT '[k,v]'::hstore, '[1.0, "hi there", false, null]'::hstore;      hstore   |           hstore
------------+----------------------------   ["k", "v"] | [1.0, "hi there", f, NULL] 

Here's a complicated example formatted with `hstore.pretty_print` enabled.
=% SET hstore.pretty_print=true;=% SELECT '{  "type" => "Feature",  "bbox" => [-180.0, -90.0, 180.0, 90.0],  "geometry"
=>{    "type" => "Polygon",    "coordinates" => [[      [-180.0, 10.0], [20.0, 90.0], [180.0, -5.0], [-30.0, -90.0]
]]    }}'::hstore;          hstore          -------------------------- "bbox"=>                + [
+     -180.0,             +     -90.0,              +     180.0,              +     90.0                + ],
         + "type"=>"Feature",      + "geometry"=>            + {                       +     "type"=>"Polygon",  +
"coordinates"=>    +     [                   +         [               +             [           +
-180.0,+                 10.0    +             ],          +             [           +                 20.0,   +
        90.0    +             ],          +             [           +                 180.0,  +                 -5.0
+            ],          +             [           +                 -30.0,  +                 -90.0   +             ]
        +         ]               +     ]                   + } 

So, exact feature parity with the JSON data type.

* hstore.pretty_print is a new GUC, specifically to allow an HSTORE value to be pretty-printed. There is also a
functionto pretty-print, so we might be able to just do away with the GUC. 

Interface
---------

* New operators: + `hstore -> int`:     Get string value at array index (starting at 0) + `hstore ^> text`:    Get
numericvalue for key + `hstore ^> int`:     Get numeric value at array index + `hstore ?> text`:    Get boolean value
forkey + `hstore ?> int`:     Get boolean value at array index + `hstore #> text[]`:  Get string value for key path +
`hstore#^> text[]`: Get numeric value for key path + `hstore #?> text[]`: Get boolean value for key path + `hstore %>
text`:   Get hstore value for key + `hstore %> int`:     Get hstore value at array index + `hstore #%> text[]`: Get
hstorevalue for key path + `hstore ? int`:      Does hstore contain array index + `hstore #? text[]`:  Does hstore
containkey path + `hstore - int`:      Delete index from left operand + `hstore #- text[]`:  Delete key path from left
operand

* New functions: + `hstore(text)`:             Make a text scalar hstore + `hstore(numeric)`:          Make a numeric
scalarhstore + `hstore(boolean)`:          Make a boolean scalar hstore + `hstore(text, hstore)`:     Make a nested
hstore+ `hstore(text, numeric)`:    Make an hstore with a key and numeric value + `hstore(text, boolean)`:    Make an
hstorewith a key and boolean value + `array_to_hstore(anyarray): Make an array hstore from an SQL array +
`hvals(hstore)`            Get values as a set of hstore values + `json_to_hstore(json)`      Convert JSON to hstore +
`each_hstore(hstore)`      Get set of hstore key/value pairs + `hstore_typeof(hstore)`     Return text name for the
hstoretype (hash, array, text, numeric, etc.) + `replace(hstore,text[],hstore)`:     Replace value at specified path +
`concat_path(hstore,text[],hstore)`:Concatenate hstore value at specified path + `hstore_print(hstore, params)`:
Formathstore as text 
 The hstore_print() function has a number of optional boolean parameters to affect how the resulting text is formatted.
Theyall default to false: 
   - pretty_print   - array_curly_braces: use {} instead of [] for arrays   - root_hash_decorated: Use {} for the root
hash  - json: Format as JSON   - loose: Try to parse numbers and booleans from text values 

Other Changes
-------------

* New casts: JSON and HSTORE can be cast to each other. I don't think they're implicit, though the forthcoming jsonb
datatype might support explicit casting to and from hstore, since internally they will be identical. 

* The internal representation has been changed, but should be backward (and pg_upgrade) compatible, just as Andrew
Gierth'schange from 8.4 to 9.0 was. One can do an in-place update to rewrite all records at once. Of course, nested
and/ornon-hash hstore values dumped from 9.4 will not be able to be loaded into 9.3. 

* GIN indexing is now supported. This is actually pretty amazing. For an hstore value, even hash keys are considered
values,as far as the index is concerned. This makes it efficient to find hstore values that contain a key. I wrote an
examplein this blog post: 
 http://theory.so/pg/2013/10/25/indexing-nested-hstore/

Submission review
=================
* Is the patch in a patch format which has context? Yes.
* Does it apply cleanly to the current git master? It did for me, though I think Peter has found an issue or two since.
* Does it include reasonable tests, necessary doc patches, etc? Yes.

Usability review
================

* Does the patch actually implement what it says it does? Yes.
* Do we want that? OH yes.
* Do we already have it? No.
* Does it follow SQL spec, or the community-agreed behavior? Yes, though want jsonb, too.
* Does it include pg_dump support? Yes
* Are there dangers?  Could break backward compatibility, though I don't think it does.
* Have all the bases been covered? I think so

Feature test
============

* Does the feature work as advertised? Yes.
* Are there corner cases the author has failed to consider? All I noticed were promptly fixed.
* Are there any assertion failures or crashes? No

Performance review
==================

* Does the patch slow down simple tests?  No
* If it claims to improve performance, does it? Yes, with GIN index support. Loading hstore values is slower than
loadingJSON, but everything else is faster than JSON. 
* Does it slow down other things? No.

Coding review
=============

* Does it follow the project guidelines? Yes.
* Are there portability issues?  Unknown
* Will it work on Windows/BSD etc? Tested on OS X only.
* Are the comments sufficient and accurate? Yes.
* Does it do what it says, correctly? As best I can tell, yes.
* Does it produce compiler warnings? No.
* Can you make it crash? No, but I did find a bug or two that was promptly fixed.

Architecture review
===================

* Is everything done in a way that fits together coherently with other features/modules? yes.
* Are there interdependencies that can cause problems? No

Conclusion
==========

I love where nested hstore is going, especially since it will be used for jsonb, too. The nesting, data type, and GIN
indexsupport is really great, and the new constructors provide a nice SQL API that make it easy to use. I think that
thenext version of this patch will be full of win for the project. 

This was considered a WIP patch, since the jsonb support is still forthcoming, so it's appropriate to leave it marked
“Returnedwith feedback”. As Andrew is doing much of that work, the code itself will get a much closer examination from
him.But for the hstore feature itself, I think the current interface and features are ready to go. 

Best,

David













pgsql-hackers by date:

Previous
From: Gavin Flower
Date:
Subject: Re: make_interval ??
Next
From: Andres Freund
Date:
Subject: Re: nested hstore patch