Re: WIP: Upper planner pathification - Mailing list pgsql-hackers

From Tom Lane
Subject Re: WIP: Upper planner pathification
Date
Msg-id 8783.1456842626@sss.pgh.pa.us
Whole thread Raw
In response to Re: WIP: Upper planner pathification  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: WIP: Upper planner pathification  (Greg Stark <stark@mit.edu>)
Re: WIP: Upper planner pathification  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
> I tried to look into patch and I had a question (one for now): why LimitPath 
> doesn't contain actual limit/offset value? I saw a lot of subqueries with LIMIT 
> 1 which could be transformed into EXISTS subquery.

Oh, yeah, I intended to change that but didn't get to it yet.  Consider
it done.

> Me too. I applied the patch and can confirm that 'make test' doesn't fail on 
> FreeBSD 10.2. Now I will try to run kind of TPC-H with and without patch.

I do not think the patch will make a lot of performance difference as-is;
its value is more in what it will let us do later.  There are a couple of
regression test cases that change plans for the better, but it's sort of
accidental.  Those cases look like
 select d.* from d left join (select * from b group by b.id, b.c_id) s   on d.a = s.id;

and what happens in HEAD is that the subquery chooses a hashagg plan
and then the upper query decides a mergejoin would be a good idea ...
so it has to sort the output of the hashagg.  With the patch, what
comes back from the subquery is a Path for the hashagg and a Path
for doing the GROUP BY with Sort/Uniq.  The second path is more expensive,
but it survives the add_path tournament because it can produce sorted
output.  Then the outer level discovers that it can use that to do its
mergejoin without a separate sort step, and that way is cheaper overall.
So instead of

!    ->  Sort
!          Sort Key: s.id
!          ->  Subquery Scan on s
!                ->  HashAggregate
!                      Group Key: b.id
!                      ->  Seq Scan on b

we get

!    ->  Group
!          Group Key: b.id
!          ->  Index Scan using b_pkey on b

which is noticeably cheaper, and not just because we got rid of the
Subquery Scan node.  So that's nice --- but it's more or less accidental,
because the outer level isn't telling the inner level that this sort order
might be interesting.

Once this infrastructure is in place, I want to look at passing down more
information to recursive subquery_planner calls so that we're not leaving
this kind of optimization to chance.  But the patch is big enough already,
so that (and a lot of other things) are getting left for later.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: TAP / recovery-test fs-level backups, psql enhancements etc
Next
From: Julien Rouhaud
Date:
Subject: Re: Publish autovacuum informations