Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Collect frequency statistics for arrays
Date
Msg-id CAPpHfdvm1z0dQ-v0=_+QF_Ws8LXfE_75xQ-n4dzR6eyffh213Q@mail.gmail.com
In response to Re: Collect frequency statistics for arrays  (Noah Misch <noah@leadboat.com>)
Responses Re: Collect frequency statistics for arrays  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Hi!

Updated patch is attached. I've updated the comment of mcelem_array_contained_selec with a more detailed description of the probability distribution assumption. Also, I found that the "rest" behaviour is better described by a Poisson distribution; the relevant changes have been made.
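
To illustrate what a Poisson assumption implies, here is a minimal Python sketch. It assumes that "rest" stands for the mean number of elements per array that are not covered by the per-element statistics; that reading, the value 0.4, and the helper name poisson_pmf are assumptions made for illustration and are not taken from the patch.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of seeing exactly k events when the mean count is lam."""
    return lam ** k * exp(-lam) / factorial(k)


# Hypothetical value: on average 0.4 "rest" elements per array.
rest = 0.4

# Probability that an array holds no "rest" element at all.
# For k = 0 the Poisson formula reduces to exp(-rest).
p_none = poisson_pmf(0, rest)

# Probability that an array holds at least one "rest" element.
p_some = 1.0 - p_none

print(p_none, p_some)
```

Under this model the chance that no "rest" element occurs falls off exponentially as the mean grows.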

On Tue, Jan 17, 2012 at 2:33 PM, Noah Misch <noah@leadboat.com> wrote:
> By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
> If so, I can follow the above argument for "column && const" and "column <@
> const", but not for "column @> const".  For "column @> const", selectivity
> cannot exceed the smallest frequency among const elements.  A number of
> high-frequency elements will drive up the sum of the frequencies without
> changing the true selectivity much at all.
Referring to the summary frequency is not really correct. It would be more correct to refer to the number of elements in "const". When there are many elements in "const", the "column @> const" selectivity tends to be close to 0 and the "column <@ const" selectivity tends to be close to 1. Of course, this holds when the elements have moderate frequencies (not very close to 0 and not very close to 1). I've replaced "summary frequency of elements" with "number of elements".
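
The effect of the number of elements in "const" can be sketched in Python under a simplifying assumption that elements occur independently of each other. The function names and the frequencies below are hypothetical and only illustrate the tendency; they are not the estimator from the patch.

```python
from math import prod


def contains_selectivity(freqs_in_const):
    """P(column @> const): every element of const must occur in the column value."""
    return prod(freqs_in_const)


def contained_selectivity(freqs_outside_const):
    """P(column <@ const): no element outside const may occur in the column value."""
    return prod(1.0 - f for f in freqs_outside_const)


# Hypothetical statistics: 10 distinct elements, each present in 30% of rows.
all_freqs = [0.3] * 10

# Grow const from 1 to 9 elements and watch both estimates move.
for n in (1, 3, 5, 9):
    in_const, outside = all_freqs[:n], all_freqs[n:]
    print(n, contains_selectivity(in_const), contained_selectivity(outside))
```

As const grows, the "@>" estimate shrinks toward 0 and the "<@" estimate grows toward 1. The product form also respects the bound mentioned above: the "@>" estimate never exceeds the smallest frequency among the const elements.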

------
With best regards,
Alexander Korotkov.
