Re: Patch Review: Collect frequency statistics and selectivity estimation for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Patch Review: Collect frequency statistics and selectivity estimation for arrays
Msg-id CAPpHfdu_Hn3++doYxAh1sYKUwyH6Pk0jnis6r=RkeMXE5Wb20A@mail.gmail.com
In response to Patch Review: Collect frequency statistics and selectivity estimation for arrays  (Nathan Boley <npboley@gmail.com>)
List pgsql-hackers
Hi!

Thank you for the review. I have a few questions about it.

On Fri, Jul 15, 2011 at 2:13 AM, Nathan Boley <npboley@gmail.com> wrote:
> First, it makes me uncomfortable that you are using the MCV and histogram slot
> kinds in a way that is very different from other data types.
>
> I realize that tsvector uses MCV in the same way that you do but:
>
> 1) I don't like that very much either.
> 2) TS vector is different in that equality ( in the btree sense )
>    doesn't make sense, whereas it does for arrays.
>
> Using the histogram slot for the array lengths is also very surprising to me.
>
> Why not just use a new STA_KIND? It's not like we are running out of
> room, and this will be the second 'container' type that splits the container
> and stores stats about the elements.
So, do you think we should collect both btree statistics and frequency/length statistics for arrays?
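
To make the distinction in this question concrete, here is a minimal Python sketch, assuming a sample of arrays held in memory. `collect_array_stats` and its parameters are hypothetical and do not reflect what ANALYZE or the patch actually does. It gathers whole-array MCVs, which rely on equality of entire arrays (the btree sense), alongside the container-style statistics: per-element frequencies and an equi-depth histogram of array lengths.

```python
from collections import Counter


def collect_array_stats(sample, num_mcv=3, num_bins=4):
    """Hypothetical sketch: gather both kinds of statistics for an array column.

    Returns (mcv, elem_freq, length_bounds):
      - mcv: most common whole arrays with their row fractions (equality-based)
      - elem_freq: fraction of sampled rows containing each element
      - length_bounds: num_bins + 1 equi-depth boundaries over array lengths
    """
    n = len(sample)

    # Whole-array MCVs: tuples let equal arrays hash and compare as one value.
    whole = Counter(tuple(a) for a in sample)
    mcv = [(list(v), c / n) for v, c in whole.most_common(num_mcv)]

    # Element frequencies: count each distinct element at most once per row.
    elem = Counter()
    for a in sample:
        elem.update(set(a))
    elem_freq = {e: c / n for e, c in elem.items()}

    # Equi-depth histogram of lengths: pick evenly spaced sorted positions.
    lengths = sorted(len(a) for a in sample)
    length_bounds = [lengths[(i * (n - 1)) // num_bins] for i in range(num_bins + 1)]

    return mcv, elem_freq, length_bounds
```

The two kinds answer different questions: the MCV list helps with `=` on whole arrays, while element frequencies and the length histogram help with containment-style predicates, which is why they would naturally live in separate statistics slots.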
 
> 1) In calc_distr you go to some lengths to avoid round off errors. Since it is
>   certainly just the order of the estimate that matters, why not just
>   perform the calculation in log space?
It seems to me that I didn't do anything there to avoid round-off errors...
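
The log-space idea from the review can be illustrated with a minimal, hypothetical Python sketch; these helpers are not taken from the patch and do not reproduce calc_distr. Multiplying many small probabilities directly underflows to zero, whereas summing their logarithms keeps the order of magnitude, and sums of probabilities can stay in log space by factoring out the larger term.

```python
import math


def direct_product(probs):
    # Multiply probabilities directly; long inputs can underflow to 0.0.
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_product(probs):
    # log(p1 * p2 * ... * pn) as a sum of logs, never forming the product.
    return sum(math.log(p) for p in probs)


def log_add(log_a, log_b):
    # log(a + b) given log(a) and log(b). Factoring out the larger term
    # means exp() only ever sees a non-positive argument.
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    if lo == -math.inf:
        return hi
    return hi + math.log1p(math.exp(lo - hi))
```

For example, one hundred factors of 1e-5 give 1e-500, which is below the smallest representable double, so `direct_product` returns 0.0 while `log_product` returns about -1151.3 (that is, 10^-500).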

------
With best regards,
Alexander Korotkov.  
