Re: Collect frequency statistics for arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Collect frequency statistics for arrays
Date
Msg-id 16932.1330885480@sss.pgh.pa.us
Whole thread Raw
In response to Re: Collect frequency statistics for arrays  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. The tests in the above-mentioned message show that in most cases
>> where mcelem_array_contained_selec falls through to the "rough
>> estimate", the resulting rowcount estimate is just 1, ie we are coming
>> out with very small selectivities.  Although that path will now only be
>> taken when there are no stats, it seems like we'd be better off to
>> return DEFAULT_CONTAIN_SEL instead of what it's doing.  I think there
>> must be something wrong with the "rough estimate" logic.  Could you
>> recheck that?

> I think the wrong think with "rough estimate" is that assumption about
> independent occurrences of items is very unsuitable even for "rough
> estimate". The following example shows that "rough estimate" really works
> in the case of independent occurrences of items. ...
> It this particular case "rough estimate" is quite accurate. But in most
> part of cases it behaves really bad. It is why I started to invent
> calc_distr and etc. So, I think return DEFAULT_CONTAIN_SEL is OK unless
> we've some better ideas.

OK.  Looking again at that code, I notice that it also punts and returns
DEFAULT_CONTAIN_SEL if it's not given MCELEM stats, which it more or
less has to because without even a minfreq the whole calculation is just
hot air.  And there are no plausible scenarios where compute_array_stats
would produce an MCELEM slot but no count histogram.  So that says there
is no point in sweating over this case, unless you have an idea how to
produce useful results without MCELEM.

So I think it's sufficient to punt at the top of the function if no
histogram, and take out the various attempts to cope with the case.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Command Triggers, patch v11
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: Our regex vs. POSIX on "longest match"