Re: working around JSONB's lack of stats? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: working around JSONB's lack of stats?
Date
Msg-id 54D11C69.1040406@agliodbs.com
Whole thread Raw
In response to working around JSONB's lack of stats?  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
On 02/02/2015 05:48 PM, Jim Nasby wrote:
> On 2/1/15 3:08 PM, Josh Berkus wrote:
>> I'm not clear on what you're suggesting here.  I'm discussing how the
>> stats for a JSONB field would be stored and accessed; I don't understand
>> what that has to do with indexing.
>
> The JSON problem is similar to the problem of doing multi-column
> statistics: there's no way to simply try to keep statistics on all
> possible combinations because that's something that's can be extremely
> large.
>
> Something that's been proposed is only trying to keep multi-column stats
> on column combinations that we have an index on (which in a way is
> really just keeping stats on the index itself).

The difficulty with column correlation (as with value correlation for
JSONB) is the combination of *values*, not the combination of *columns*.
 Via the brute force method, imagine you have one column with cardinalty
100, and another with cardinality 100,000.  This would require you do
keep 10 million different correlation coefficients in order to be able
to estimate correctly.  Even correlating MCVs would add up to quite a
bit of stats in short order; these days people frequently set statistics
to 1000.  The same goes for JSON keys.

> If we built that, we could use the same technique for JSON by simply
> defining the indexes you needed for whatever you were searching on.
>
> Obviously that's not as ideal is simply keeping full statistics on
> everything in a JSON document, but it might be a heck of a lot easier to
> accomplish.

Walk before trying to run, let alone fly, please.  Right now we don't
have selectivity estimation for a *single* key; let's do that before we
start talking about better estimation for combinations of keys.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


pgsql-performance by date:

Previous
From: Jim Nasby
Date:
Subject: Re: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?
Next
From: belal
Date:
Subject: Copy command Faster than original select