Home > mailing lists

Re: jsonb and nested hstore - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: jsonb and nested hstore
Date	March 12, 2014 13:15:19
Msg-id	2747709e7ae47794d8a75368382720bd.squirrel@sq.gransy.com Whole thread Raw
In response to	Re: jsonb and nested hstore (Peter Geoghegan <pg@heroku.com>)
Responses	Re: jsonb and nested hstore
List	pgsql-hackers

Tree view

On 12 Březen 2014, 0:41, Peter Geoghegan wrote:
> On Tue, Mar 11, 2014 at 3:58 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>   ERROR:  index row size 1416 exceeds maximum 1352 for index "gin_idx"
>
> All index AMs have similar restrictions.

Yes, I know and I have no problem with restrictions in general. You may
run into similar issues with btree indexes on text columns with long text,
for example. The thing is that people don't generally index text directly,
because it usually does not make much sense, but using tsvector etc.

But with jsonb it's more likely because indexing is one of the goodies (at
least for me). And the discussions with several people interested in
storing json data I had recently went often like this:

me: It seems we'll have a better json datatype in 9.4.
them: Nice!
me: And it will be possible to do searches on arbitrary keys.
them: Yay!
me: And we actually got pretty significant improvements in GIN indexes.
them: Awesome!
me: But the values you may index need to be less than ~1500B.
them: Bummer :-(
me: Well, you can use GIST then.

>> A good example of such header is "dkim-signature" which basically
>> contains the whole message digitally signed with DKIM. The signature
>> tends to be long and non-compressible, thanks to the signature.
>>
>> I'm wondering what's the best way around this, because I suspect many
>> new users (especially those attracted by jsonb and GIN improvements)
>> will run into this. Maybe not immediately, but eventully they'll try to
>> insert a jsonb with long value, and it will fail ...
>
> The jsonb_hash_ops operator class just stores a 32-bit integer hash
> value (it always sets the recheck flag, which only some of the other
> default GIN opclass' strategies do). It only supports containment, and
> not the full variety of operators that the default opclass supports,
> which is why it isn't the default. I think that in practice the
> general recommendation will be that when indexing at the "top level",
> use jsonb_hash_ops. When indexing nested items, use the more flexible
> default GIN opclass. That seems like a pretty smart trade-off to me.

OK, I'll look into the jsonb_hash_ops - that sounds more or less like what
I was thinking about (and sure, storing hashes makes some operations
impossible to support).

The other thing I was thinking about is introducing some kind of upper
limit for the value length - e.g. index just the first 1kB, or something
like that. My experience is most values are way shorter, or actually
differ in the first 1kB, so this should allow most decisions to be made.
But I'm not really that familiar with how GIN works, so maybe this is
nonsense.

> The more I think about it, the more inclined I am to lose GiST support
> entirely for the time being. It lets us throw out about 700 lines of C
> code, which is a very significant fraction of the total, removes the
> one open bug, and removes the least understood part of the code. The
> GiST opclass is not particularly compelling for this.

I disagree with that. I see GiST as a simple fallback option for the cases
I described. I wasn't able to create a GIN index because of exceeding the
max item length, but GiST created just fine. It was considerably slower,
but it worked.

Tomas

pgsql-hackers by date:

From: Julien Tachoires
Date: 12 March 2014, 13:11:58
Subject: Rowtype column and domain subfield with DEFAULT and NOT NULL constraint

From: Andres Freund
Date: 12 March 2014, 13:19:41
Subject: Re: db_user_namespace a "temporary measure"

Re: jsonb and nested hstore - Mailing list pgsql-hackers

Previous

Next