Re: WIP: collect frequency statistics for arrays - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: WIP: collect frequency statistics for arrays
Msg-id: BANLkTikX+iWCjKMUbkSjGv=K-C+cKSj5gw@mail.gmail.com
In response to: Re: WIP: collect frequency statistics for arrays (Alexander Korotkov <aekorotkov@gmail.com>)
Responses: Re: WIP: collect frequency statistics for arrays
List: pgsql-hackers
On Sun, Jun 12, 2011 at 12:17 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I'm worrying about performance of "column <@ const" estimation. It takes
> O(m*(n+m)) time, where m is the const length and n is the statistics
> target. Probably, it can be too slow in some cases.

Hmm, that doesn't sound terribly promising. The best thing to do here
is probably to construct a pessimal case and test it. So, say, look
for a column <@ a 100-element array with a statistics target of
400... once you know how bad it is, then we can make intelligent
decisions about where to go with it.

If the data type is hashable, you could consider building a hash table
on the MCVs and then doing a probe for each element in the array. I
think that's better than the other way around because there can't be
more than 10k MCVs, whereas the input constant could be arbitrarily
long. I'm not entirely sure whether this case is important enough to
be worth spending a lot of code on, but then again it might not be
that much code.

Another option is to bound the number of operations you're willing to
perform to some reasonable limit, say, 10 * default_statistics_target.
Work out ceil((10 * default_statistics_target) /
number-of-elements-in-const) and consider at most that many MCVs. When
this limit kicks in you'll get a less-accurate selectivity estimate,
but that's a reasonable price to pay for not blowing out planning
time.

>> At the moment I see no tests. If this code will be exercised by
>> existing tests then you should put some notes with the patch to
>> explain that, or at least provide some pointers as to how I might
>> test this.
>
> I didn't find any existing tests which check selectivity estimation
> accuracy. And I found it difficult to create them, because regression
> tests give a binary result while estimation accuracy is a quantitative
> value. Existing regression tests cover the case where typanalyze or a
> selectivity estimation function falls down. I've added an "ANALYZE
> array_op_test;" line to the array test so that these tests cover the
> falldown case for this patch's functions too. It seems that
> selectivity estimation accuracy should be tested manually on various
> distributions. I've done a very small amount of such testing.
> Unfortunately, a few months passed before I got the idea about the
> "column <@ const" case, and now I don't have sufficient time for it
> due to my GSoC project. It would be great if you could help me with
> these tests.

AFAIK, we don't have regression tests for any other selectivity
estimator, except perhaps to the extent that we verify they don't
crash.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
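To make the two suggestions above concrete, here is a rough Python sketch (a toy model with invented names, not the patch's actual C implementation): it hashes the MCV list and does one probe per element of the constant, and it computes the MCV cap derived from the 10 * default_statistics_target budget. The element-independence estimate used here is an assumption for illustration only.

```python
import math

def capped_mcv_count(default_statistics_target, num_const_elements):
    # Bound total work to roughly 10 * default_statistics_target
    # element comparisons, per the suggestion above.
    budget = 10 * default_statistics_target
    return math.ceil(budget / max(num_const_elements, 1))

def contained_by_selectivity(mcv, const_elems):
    # mcv: element -> frequency, i.e. a hash table built on the MCVs
    # (at most statistics-target entries). One O(1) probe per element
    # of the constant: O(n + m) instead of the O(m * n) pairwise scan.
    present = set()
    for e in const_elems:
        if e in mcv:
            present.add(e)
    # Toy independence model: a row's array fits inside const_elems
    # iff no element outside the constant appears in it, so multiply
    # (1 - freq) over every MCV element missing from the constant.
    sel = 1.0
    for elem, freq in mcv.items():
        if elem not in present:
            sel *= 1.0 - freq
    return sel

# MCVs {a: 0.5, b: 0.25}, constant {a, c}: only b (freq 0.25) lies
# outside the constant, giving an estimate of 1 - 0.25 = 0.75.
print(contained_by_selectivity({"a": 0.5, "b": 0.25}, ["a", "c"]))
print(capped_mcv_count(400, 100))  # ceil(4000 / 100) = 40
```

In the pessimal case mentioned above (statistics target 400, 100-element constant), the cap would limit the scan to 40 MCVs instead of 400.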