Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Date
Msg-id CAFjFpRdb6CAUOWQVMEPbF8YeDFPcG0VNVj2GevyQ_d=tQNfBTA@mail.gmail.com
In response to Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation  (Yuto Hayamizu <y.hayamizu@gmail.com>)
Responses Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On Mon, Jan 29, 2018 at 10:42 AM, Yuto Hayamizu <y.hayamizu@gmail.com> wrote:
> On Fri, Jan 19, 2018 at 5:07 PM, Yuto Hayamizu <y.hayamizu@gmail.com> wrote:
>> My idea for improving this patch is: given a threshold N_limit,
>> for q_1 ... q_{N_limit}, do the same weighted cost estimation as in
>> the current version of this patch.
>> For q_{N_limit+1} ..., stop calling clauselist_selectivity() for
>> calculating the weight
>> and reuse the result of clauselist_selectivity({q_1, q_2, ..., q_{N_limit}}).
>> For example, if N_limit=100, the additional overhead is only
>> sub-milliseconds per range table entry,
>> and the cost estimation is surely better than in the current postgres implementation.
>
> The attached patch implements the improvement idea above.
> With this patch applied, the performance degradation of the test
> query with many quals was <1%.
> An example test query is attached.
>

I looked at the patch and the idea sounds reasonable to me. But I
still have doubts about the usefulness of the patch. What this patch
does is produce a more accurate cost of scanning a single relation.
That's good, but it does so for all the paths created for that
relation, so the more accurate estimate doesn't help us choose one
method of scanning the relation over another. It also does not affect
the costs of different join paths, sort paths etc. IOW, I cannot
think of a query which will be planned differently than on upstream
when we apply the patch, with the new plan being more efficient than
the previous one. I might be missing something, though. Can you
please provide such a query?

In the patch, clauselist_selectivity() gets called repeatedly for
every new qual added to the clause list. Instead, if we combine the
cost/row estimation with order_qual_clauses() or
clauselist_selectivity(), we might be able to do what the current
patch does in O(n). clauselist_selectivity() calculates the combined
selectivity of all the given quals in O(n); we should do something
similar in this case.

As suggested upthread by Tom, it will be more useful to have this
feature work with not just scans but joins as well.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

