Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Collect frequency statistics for arrays
Date
Msg-id 20120117103322.GA19977@tornado.leadboat.com
Whole thread Raw
In response to Re: Collect frequency statistics for arrays  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Collect frequency statistics for arrays  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. Them looks correct to me. I did some
> fixes in the patch. The proof of some concepts is still needed. I'm going
> to provide it in a few days.

Your further fixes look good.  Could you also answer my question about the
header comment of mcelem_array_contained_selec()?

/** Estimate selectivity of "column <@ const" based on most common element* statistics.    Independent element
occurrencewould imply a particular* distribution of distinct element counts among matching rows.  Real data* usually
falsifiesthat assumption.  For example, in a set of 1-element* integer arrays having elements in the range [0;10],
elementoccurrences are* not independent.  If they were, a sufficiently-large set would include all* distinct element
counts0 through 11.  We correct for this using the* histogram of distinct element counts.** In the "column @> const"
and"column && const" cases, we usually have* "const" with low summary frequency of elements (otherwise we have*
selectivityclose to 0 or 1 correspondingly).  That's why the effect of* dependence related to distinct element counts
distributionis negligible* there.  In the "column <@ const" case, summary frequency of elements is* high (otherwise we
haveselectivity close to 0).  That's why we should do* correction due to array distinct element counts distribution.*/
 

By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const".  For "column @> const", selectivity
cannot exceed the smallest frequency among const elements.  A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.

Thanks,
nm


pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: BGWriter latch, power saving
Next
From: Fujii Masao
Date:
Subject: Re: Online base backup from the hot-standby