Re: WIP: collect frequency statistics for arrays - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: collect frequency statistics for arrays
Msg-id BANLkTikOowSvYoZWUE8b4uS7JdOZ=A-y4w@mail.gmail.com
In response to Re: WIP: collect frequency statistics for arrays  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Jun 13, 2011 at 8:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
If the data type is hashable, you could consider building a hash table
on the MCVs and then do a probe for each element in the array.  I
think that's better than the other way around because there can't be
more than 10k MCVs, whereas the input constant could be arbitrarily
long.  I'm not entirely sure whether this case is important enough to
be worth spending a lot of code on, but then again it might not be
that much code.
Unfortunately, the most time-consuming operation isn't the element comparisons; it is the complex computations in the calc_distr function.
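For what it's worth, the hash-table probe suggested above could look roughly like the toy C sketch below: build a small open-addressing hash set over the MCV list once, then probe each element of the constant array in expected O(1). All names, the fixed table size, and the int32 element type are assumptions for illustration; this is not PostgreSQL's actual statistics or hashing API.

```c
#include <stdint.h>

/* Toy sketch: a fixed-size open-addressing hash set over the MCV list.
 * TABLE_SIZE is a power of two comfortably above the 10k MCV cap.
 * EMPTY is a sentinel, so an MCV equal to INT32_MIN is not supported
 * in this sketch. */
#define TABLE_SIZE 32768
#define EMPTY INT32_MIN

static int32_t table[TABLE_SIZE];

static uint32_t hash_int32(int32_t v)
{
    uint32_t h = (uint32_t) v;

    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Build the set once from the nmcv most-common values. */
void mcv_set_build(const int32_t *mcv, int nmcv)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;

    for (int i = 0; i < nmcv; i++)
    {
        uint32_t pos = hash_int32(mcv[i]) & (TABLE_SIZE - 1);

        /* linear probing; duplicates are stored only once */
        while (table[pos] != EMPTY && table[pos] != mcv[i])
            pos = (pos + 1) & (TABLE_SIZE - 1);
        table[pos] = mcv[i];
    }
}

/* Probe one array element; expected O(1) per call. */
int mcv_set_contains(int32_t v)
{
    uint32_t pos = hash_int32(v) & (TABLE_SIZE - 1);

    while (table[pos] != EMPTY)
    {
        if (table[pos] == v)
            return 1;
        pos = (pos + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

With at most 10k MCVs the build cost is paid once per selectivity estimate, and each element of an arbitrarily long input constant then costs a single probe instead of a scan of the MCV list.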
 
Another option is to bound the number of operations you're willing to
perform to some reasonable limit, say, 10 * default_statistics_target.
 Work out ceil((10 * default_statistics_target) /
number-of-elements-in-const) and consider at most that many MCVs.
When this limit kicks in you'll get a less-accurate selectivity
estimate, but that's a reasonable price to pay for not blowing out
planning time.
Good option. I'm going to add such a condition to my patch.
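The proposed cap could be sketched as below: bound the total work to roughly 10 * default_statistics_target comparisons by considering at most ceil((10 * default_statistics_target) / number-of-elements-in-const) MCVs. The function name and the plain-parameter form of default_statistics_target are assumptions for illustration, not code from the patch.

```c
/* Sketch of the proposed bound on planning work.
 * default_statistics_target is passed in as a plain parameter here;
 * in PostgreSQL it is a GUC. */
int
max_mcvs_to_consider(int default_statistics_target,
                     int num_const_elems,
                     int nmcv)
{
    int budget = 10 * default_statistics_target;
    /* ceil(budget / num_const_elems) in integer arithmetic */
    int limit = (budget + num_const_elems - 1) / num_const_elems;

    return limit < nmcv ? limit : nmcv;
}
```

With the default statistics target of 100 and a 7-element constant, this allows ceil(1000 / 7) = 143 MCVs to be consulted; short constants can still use the full MCV list, while long ones degrade gracefully to a coarser estimate.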

------
With best regards,
Alexander Korotkov.
