Re: jsonb and nested hstore - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: jsonb and nested hstore
Date
Msg-id 5320D20E.9030306@fuzzy.cz
Whole thread Raw
In response to Re: jsonb and nested hstore  (Peter Geoghegan <pg@heroku.com>)
Responses Re: jsonb and nested hstore  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 12.3.2014 20:40, Peter Geoghegan wrote:
> On Wed, Mar 12, 2014 at 6:20 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> I'm still not sure how would that look. Does that mean I'd have to
>> create multiple GIN indexes - one for each possible key or
>> something like that? Can you give an example?
> 
> It could mean that you're obliged to create multiple indexes, yes.
> For an example, and to get a better sense of what I mean, look at
> the documentation in the patch.

OK, will do.

> The idea that you're going to create one index on a jsonb, and it's
> going to be able to usefully index a lot of different queries doesn't
> seem practical for most use-cases. Mostly, people will have fairly
> homogeneous json documents, and they'll want to index certain nested
> fields common to all or at least a large majority of those documents.

I think that's unfounded assumption. Many users actually have very
little control over the documents or queries - a nice example may be the
mail archive, with headers stored in a hstore/jsonb. I have absolutely
no control over the headers or queries.

But I think this is a "feedback loop" too - what if many users actually
want that functionality, but realize that expression indexes are not
sufficient for their needs and thus don't even try (and so we don't hear
about them)?

And my experience is that this is actualy one of the very cool hstore
features - being able to index the whole structure and then do arbitrary
queries over that.

The only reason why I'm looking at jsonb is that it the improved support
for data types (especially arrays).

So I have my doubts about the claims that users have homogenous
documents and only want to index some fields with expression indexes.

> By indexing entire jsonb datums, do you hope to get much benefit out
> of the indexed values (as opposed to keys) being stored (in serialized
> form) in the GIN index? Because you *are* indexing a large nested
> structure as a value. Is that large nested structure going to appear
> in your query predicate, or are you just going to subscript the jsonb
> to get to the level that's of interest to query that? I'm pretty sure
> that people want the latter. Are you sure that your complaint isn't
> just that the default GIN opclass indexes values (as distinct from
> keys) that are large and unwieldy, and not terribly useful?

No, I don't expect a large nested structure to appear in the query. And
I expect most people won't need that, although I can imagine queries  @>
doing that (not sure if that checks for equality or 'subset').

But I'm not sure I understand how's this related to my original post?

All I was asking whether it wouldn't be enough to store a hash instead
of the original value, i.e. instead of this:
 {"from" : "john@example.com",  "to" : "jack@example.com",  "content-type" : "text/plain; charset=us-ascii",
"dkim-signature": ".... veeeery long value ...."}
 

this
 {129812 : 29382,  459821 : 1029381,   21083 : 102941,  111390 : 129010292}

which would solve issues with the long values and might still support
the queries (with recheck, of course). I don't know if that's what
jsonb_hash_ops do or if it's even possible / compatible with GIN.

> I don't think expressional indexes are some kind of unfortunate work
> around for a jsonb limitation. I think that they're the natural way to
> approach indexing a nested structure in Postgres. MongoDB, for
> example, does not magically index everything. You're still required to
> make choices about indexing that consider the access patterns.

For many usecases, expressional indexes are the right tool. But not for
all and I see no reason to just throw some tools away.

regards
Tomas




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Next
From: Tomas Vondra
Date:
Subject: Re: jsonb and nested hstore