Thread: Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats

Re: [COMMITTERS] pgsql: Collect and use multi-column dependency stats

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 6 April 2017 at 10:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The buildfarm is unhappy about the fact that this changed the API
>> for clauselist_selectivity().  I am not convinced that that change
>> was a good idea, so before telling FDW authors that they need to
>> change their code, I'd like to hear a defense of the API change.

> Because varReliId is often passed in as 0, and that meant we'd have to
> write some code to check of the clause was made up of RestrictInfos
> from a single relation or not, and look for extended stats on that
> singleton rel.

Generally, if it's passed as zero, that's a good clue that the clause
*is* a join clause.  In any case, this defense fails to address my
other question, which is what's going to happen to this API when you
want to use extended stats in join-clause estimates, which I'd expect
to surely happen before very long.

Also, I find it hard to believe that a bms_get_singleton_member call is
going to be material in comparison to all the work that will be invoked
indirectly via whatever selectivity estimation function gets called for
each clause.  Even a single catcache fetch would swamp that.

So no, you have not convinced me that this isn't a broken design.

> FWIW, I found this function being called 72 times in a 5 way join
> search problem.

And you measured the overhead of doing it the other way to be ... ?
Premature optimization and all that.
        regards, tom lane



Re: [COMMITTERS] pgsql: Collect and use multi-columndependency stats

From
David Rowley
Date:
On 6 April 2017 at 11:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 6 April 2017 at 10:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The buildfarm is unhappy about the fact that this changed the API
>>> for clauselist_selectivity().  I am not convinced that that change
>>> was a good idea, so before telling FDW authors that they need to
>>> change their code, I'd like to hear a defense of the API change.
>
>> Because varReliId is often passed in as 0, and that meant we'd have to
>> write some code to check of the clause was made up of RestrictInfos
>> from a single relation or not, and look for extended stats on that
>> singleton rel.
>
> Generally, if it's passed as zero, that's a good clue that the clause
> *is* a join clause.  In any case, this defense fails to address my
> other question, which is what's going to happen to this API when you
> want to use extended stats in join-clause estimates, which I'd expect
> to surely happen before very long.
>
> Also, I find it hard to believe that a bms_get_singleton_member call is
> going to be material in comparison to all the work that will be invoked
> indirectly via whatever selectivity estimation function gets called for
> each clause.  Even a single catcache fetch would swamp that.
>
> So no, you have not convinced me that this isn't a broken design.
>
>> FWIW, I found this function being called 72 times in a 5 way join
>> search problem.
>
> And you measured the overhead of doing it the other way to be ... ?
> Premature optimization and all that.

I tested with the attached, and it does not seem to hurt planner
performance executing:

explain select * from ab ab1
inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b
inner join ab ab3 on ab1.a = ab3.a and ab1.b = ab3.b
inner join ab ab4 on ab1.a = ab4.a and ab1.b = ab4.b
inner join ab ab5 on ab1.a = ab5.a and ab1.b = ab5.b
inner join ab ab6 on ab1.a = ab6.a and ab1.b = ab6.b
inner join ab ab7 on ab1.a = ab7.a and ab1.b = ab7.b
inner join ab ab8 on ab1.a = ab8.a and ab1.b = ab8.b;

after having executed:

create table ab (a int, b int);

I get:

find_relation_from_clauses
tps = 48.992918 (excluding connections establishing)
tps = 49.060407 (excluding connections establishing)
tps = 49.075815 (excluding connections establishing)

Master

tps = 48.938027 (excluding connections establishing)
tps = 48.066274 (excluding connections establishing)
tps = 48.727089 (excluding connections establishing)

running pgbench -n -T 60 -f 8wayjoin.sql


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment