Re: working around JSONB's lack of stats? - Mailing list pgsql-performance

From Jim Nasby
Subject Re: working around JSONB's lack of stats?
Date
Msg-id 54D028E3.9040502@BlueTreble.com
Whole thread Raw
In response to Re: working around JSONB's lack of stats?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On 2/1/15 3:08 PM, Josh Berkus wrote:
> On 01/30/2015 05:34 PM, Jim Nasby wrote:
>> On 1/30/15 2:26 PM, Josh Berkus wrote:
>>> This would probably work because there aren't a lot of data structures
>>> where people would have the same key:value pair in different locations
>>> in the JSON, and care about it stats-wise.  Alternatetly, if the same
>>> key-value pair appears multiple times in the same sample row, we could
>>> cut the MC% by that multiple.
>>
>> Even if there were multiple occurrences, this would probably still be an
>> improvement.
>>
>> Another idea... at one time in the past when discussing statistics on
>> multiple columns, one idea was to build statistics on indexes. If we
>> built that, we could also do the same thing for at least JSONB (not sure
>> about JSON). Obviously doesn't help for stuff you haven't indexed, but
>> presumably if you care about performance and have any significant size
>> of data you've also indexed parts of the JSON, yes?
>
> I'm not clear on what you're suggesting here.  I'm discussing how the
> stats for a JSONB field would be stored and accessed; I don't understand
> what that has to do with indexing.

The JSON problem is similar to the problem of doing multi-column
statistics: there's no way to simply try to keep statistics on all
possible combinations because that's something that's can be extremely
large.

Something that's been proposed is only trying to keep multi-column stats
on column combinations that we have an index on (which in a way is
really just keeping stats on the index itself).

If we built that, we could use the same technique for JSON by simply
defining the indexes you needed for whatever you were searching on.

Obviously that's not as ideal is simply keeping full statistics on
everything in a JSON document, but it might be a heck of a lot easier to
accomplish.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


pgsql-performance by date:

Previous
From: Jim Nasby
Date:
Subject: Re: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?
Next
From: Jim Nasby
Date:
Subject: Re: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?