Re: WIP: collect frequency statistics for arrays - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: WIP: collect frequency statistics for arrays
Date:
Msg-id: BANLkTikX+iWCjKMUbkSjGv=K-C+cKSj5gw@mail.gmail.com
In response to: Re: WIP: collect frequency statistics for arrays (Alexander Korotkov <aekorotkov@gmail.com>)
Responses: Re: WIP: collect frequency statistics for arrays
List: pgsql-hackers
On Sun, Jun 12, 2011 at 12:17 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> I'm worried about the performance of "column <@ const" estimation. It takes
> O(m*(n+m)) time, where m is the const length and n is the statistics target.
> It can probably be too slow in some cases.

Hmm, that doesn't sound terribly promising.  The best thing to do here
is probably to construct a pessimal case and test it.  So, say, look
for a column <@ a 100-element array with a statistics target of
400... once you know how bad it is, then we can make intelligent
decisions about where to go with it.
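For scale, a quick back-of-the-envelope sketch (Python here just for illustration; the m and n values are the hypothetical pessimal case from the example above):

```python
# Rough operation count for the O(m*(n+m)) "column <@ const" estimator.
# m = number of elements in the const array, n = statistics target (MCV count).
def estimated_ops(m: int, n: int) -> int:
    return m * (n + m)

# The pessimal case suggested above: a 100-element const array
# against a statistics target of 400.
print(estimated_ops(100, 400))  # 100 * (400 + 100) = 50000
```

Fifty thousand comparisons per clause may or may not matter in practice, which is why measuring it is the right first step.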

If the data type is hashable, you could consider building a hash table
on the MCVs and then do a probe for each element in the array.  I
think that's better than the other way around because there can't be
more than 10k MCVs, whereas the input constant could be arbitrarily
long.  I'm not entirely sure whether this case is important enough to
be worth spending a lot of code on, but then again it might not be
that much code.
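A minimal sketch of that hash-probe idea (Python standing in for the eventual C; `mcv_freqs` and `probe_mcvs` are hypothetical names, not anything in the patch):

```python
# Build a hash table over the MCV list once, then probe it for each
# element of the constant array: O(n + m) instead of O(n * m).
def probe_mcvs(mcv_freqs: dict, const_array: list) -> list:
    # mcv_freqs maps MCV element -> frequency (as stored in pg_statistic).
    # Returns the frequency for each const element, or None if it is
    # not among the MCVs.
    return [mcv_freqs.get(elem) for elem in const_array]

mcv_freqs = {1: 0.30, 2: 0.25, 7: 0.10}   # toy MCV frequencies
print(probe_mcvs(mcv_freqs, [1, 7, 9]))   # [0.3, 0.1, None]
```

The asymmetry Robert points out is what makes probing in this direction attractive: the table is bounded by the MCV list size, while the constant can be arbitrarily long.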

Another option is to bound the number of operations you're willing to
perform to some reasonable limit, say, 10 * default_statistics_target.
Work out ceil((10 * default_statistics_target) /
number-of-elements-in-const) and consider at most that many MCVs.
When this limit kicks in you'll get a less-accurate selectivity
estimate, but that's a reasonable price to pay for not blowing out
planning time.
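That cap could be computed along these lines (a sketch of the formula above; the function name is made up for illustration):

```python
import math

def max_mcvs_to_consider(default_statistics_target: int,
                         num_const_elems: int) -> int:
    # Bound total work to roughly 10 * default_statistics_target
    # comparisons by limiting how many MCVs are checked against the
    # constant array.
    budget = 10 * default_statistics_target
    return math.ceil(budget / num_const_elems)

# With the default target of 100 and a 100-element const array,
# only ceil(1000 / 100) = 10 MCVs would be considered.
print(max_mcvs_to_consider(100, 100))  # 10
```

With a short constant the full MCV list still gets used, so accuracy only degrades in the cases that would otherwise blow out planning time.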

>> At the moment I see no tests. If this code will be exercised by
>> existing tests then you should put some notes with the patch to
>> explain that, or at least provide some pointers as to how I might test
>> this.
>
> I didn't find any existing tests which check selectivity estimation accuracy,
> and I found it difficult to create them, because regression tests give a binary
> result while estimation accuracy is a quantitative value. The existing regression
> tests cover the case where the typanalyze or selectivity estimation function falls
> over. I've added an "ANALYZE array_op_test;" line to the array test so that
> these tests cover that failure case for this patch's functions too.
> It seems that selectivity estimation accuracy should be tested manually on
> various distributions. I've done a very small amount of such testing.
> Unfortunately, a few months passed before I got the idea about the "column <@ const"
> case, and now I don't have sufficient time for it due to my GSoC project.
> It would be great if you could help me with these tests.

AFAIK, we don't have regression tests for any other selectivity
estimator, except perhaps to the extent that we verify they don't
crash.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
