Re: WIP: collect frequency statistics for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: collect frequency statistics for arrays
Date
Msg-id BANLkTikpkO1kkqDscmR_bWPqBrawhnmTAw@mail.gmail.com
In response to Re: WIP: collect frequency statistics for arrays  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: WIP: collect frequency statistics for arrays
List pgsql-hackers
On Fri, Jun 10, 2011 at 9:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Initial comments are that the code is well structured and I doubt
there will be problems at the code level. Looks like a good patch.
I'm worried about the performance of the "column <@ const" estimation. It takes O(m*(n+m)) time, where m is the length of the constant array and n is the statistics target. It may be too slow in some cases.
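To put the O(m*(n+m)) bound in perspective, here is a rough back-of-the-envelope sketch in Python. The function name and the sample values of m and n are illustrative assumptions, not measurements from the patch, and constant factors are ignored.

```python
def contained_by_cost(m: int, n: int) -> int:
    """Rough step count for a "column <@ const" estimate, per the O(m*(n+m)) bound.

    m: number of elements in the constant array
    n: statistics target
    Constant factors are ignored, so only the growth rate is meaningful.
    """
    return m * (n + m)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the bound grows with m.
    for m in (10, 100, 1000):
        for n in (100, 1000):
            print(f"m={m:5d} n={n:5d} steps~{contained_by_cost(m, n):,}")
```

The cost is quadratic in m, so a long constant array dominates the total even when the statistics target is modest.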
 
At the moment I see no tests. If this code will be exercised by
existing tests then you should put some notes with the patch to
explain that, or at least provide some pointers as to how I might test
this.
I didn't find any existing tests that check selectivity estimation accuracy, and I found it difficult to create them, because regression tests give a binary result while estimation accuracy is a quantitative value. The existing regression tests only cover the case where a typanalyze or selectivity estimation function crashes. I've added an "ANALYZE array_op_test;" line to the array test so that these tests cover the crash case for this patch's functions too.
It seems that selectivity estimation accuracy should be tested manually on various distributions. I've done only a very small number of such tests. Unfortunately, a few months passed before I got the idea for the "column <@ const" case, and now I don't have sufficient time for it because of my GSoC project. It would be great if you could help me with these tests.
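One way to turn estimation accuracy into a number is to compare the planner's estimated row count with the actual row count reported by EXPLAIN ANALYZE. The Python sketch below is only an assumption about how such a manual check might be scripted, not part of the patch: the helper names are hypothetical, the sample plan line is made up for illustration, and it handles only one plan node in the text output format.

```python
import re

# Matches the estimated and actual row counts on one EXPLAIN ANALYZE text line, e.g.
#   Seq Scan on t  (cost=0.00..2.29 rows=1 width=36) (actual time=0.01..0.03 rows=3 loops=1)
_PLAN_ROWS = re.compile(
    r"\(cost=[^)]*\brows=(\d+)[^)]*\)\s*\(actual [^)]*\brows=(\d+)[^)]*\)"
)


def parse_rows(plan_line: str) -> tuple[int, int]:
    """Return (estimated_rows, actual_rows) from a single plan node line."""
    match = _PLAN_ROWS.search(plan_line)
    if match is None:
        raise ValueError("no estimated/actual row counts found in plan line")
    return int(match.group(1)), int(match.group(2))


def ratio_error(estimated: int, actual: int) -> float:
    """Symmetric error factor: 1.0 is a perfect estimate, larger is worse.

    Both counts are clamped to at least 1 to avoid division by zero; the
    planner itself never reports an estimate below one row.
    """
    est = max(estimated, 1)
    act = max(actual, 1)
    return max(est / act, act / est)


# Made-up plan line for illustration; real output would come from running
# EXPLAIN ANALYZE on a query such as a "column <@ const" filter.
sample = ("Seq Scan on array_op_test  (cost=0.00..2.29 rows=1 width=36) "
          "(actual time=0.012..0.034 rows=3 loops=1)")
estimated, actual = parse_rows(sample)
print(estimated, actual, ratio_error(estimated, actual))
```

Collecting this error factor over queries against tables with different array distributions would give a quantitative picture that a pass/fail regression test cannot.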
 
Also, I'd like to see some more explanation. Either in comments, or
just as a post to hackers. That saves me time, but we need to be clear
about what this does and does not do, what it might do in the future
etc.. 3+ years from now we need to be able to remember what the code
was supposed to do. You will forget yourself in time, if you write
enough patches. Based on this, I think you'll be writing quite a few
more.
I've added some more comments. I'm afraid they will need to be completely rewritten before committing because of my English. If any particular points should be clarified further, please specify them.
 
And of course, a few lines for the docs also.
I found that the statistics patch for tsvector updated only the article about the pg_stats view. I've updated this article a little bit too.

------
With best regards,
Alexander Korotkov.
Attachment
