Re: Relids in upper relations - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Relids in upper relations |
| Date | |
| Msg-id | CA+Tgmob9GXWvoJ=65Nr5RDpHQVBTvmkSd2-u_dtJQCkfe=vNEg@mail.gmail.com |
| In response to | Re: Relids in upper relations (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On Sun, Oct 9, 2016 at 6:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Oct 5, 2016 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think this is fundamentally wrongheaded. If we go that route,
>>> the only valid relids for any upper path would be the union of all
>>> baserel RTIs, ...
>
>> Hmm, but this is only true if the upper steps are always done last.
>> Hackers on this list have been hoping to reorder joins with aggregates
>> since at least 2008 - probably sooner, but that's when I started
>> reading this mailing list. A simple example is:
>
>> SELECT order_line.order_id, order.customer_id, SUM(order_line.amount)
>> FROM order_line, order WHERE order_line.order_id = order.order_id
>> GROUP BY 1,2;
>
>> Doing the aggregation step first is likely to be much faster than
>> doing the join first here,
>
> Please provide some reason to believe that. It's the nature of an
> aggregate that it's sensitive to the number of rows going through it,
> with only a tiny number of exceptions (and SUM ain't one). So you could
> only push it down past joins that won't change the number of rows the
> aggregate will process, and how is that going to make it any faster?

Hmm, I guess it only works if you know that order (order_id) is unique. If that's true, then each order_line.order_id will match 0 or 1 values in order.order_id. If you had duplicates in order.order_id, then SUM(order_line.amount) would effectively be multiplied by the number of duplicates, so pushing the aggregation below the join doesn't work. But if order (order_id) is unique, then I think it's OK: for any given value of order_line.order_id, either every row in the group matches the one and only row in the order table where that value appears in order.order_id, or there is no such row and the whole group vanishes.

As for why it should be faster, my intuition is that the aggregate itself might be slightly faster because the rows passing through it can be narrower, but the real win is that the join should be quite a lot faster, because the number of rows that need to be joined is probably being reduced by the aggregation step, possibly quite drastically.
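To make the shape of that rewrite concrete, here is a minimal sketch of the aggregate-first form of the example query, assuming (as above) that order (order_id) is unique. The subquery alias x is just for illustration, and the order table is written as "order" because ORDER is a reserved word:

-- Aggregate order_line per order_id first, then join each pre-aggregated
-- group to its single matching order row; groups with no matching order
-- row simply drop out, as described above.
SELECT x.order_id, o.customer_id, x.amount
FROM (SELECT order_id, SUM(amount) AS amount
        FROM order_line
       GROUP BY 1) x,
     "order" o
WHERE x.order_id = o.order_id;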
Here's basically that example with 10GB of TPC-H data:

rhaas=# explain analyze select l_orderkey, o_custkey, sum(l_extendedprice) from lineitem, orders where l_orderkey = o_orderkey group by 1, 2;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=15665868.56..17015554.73 rows=59986052 width=48) (actual time=114794.016..156967.012 rows=15000000 loops=1)
   Group Key: lineitem.l_orderkey, orders.o_custkey
   ->  Sort  (cost=15665868.56..15815833.69 rows=59986052 width=24) (actual time=114793.999..126303.591 rows=59986052 loops=1)
         Sort Key: lineitem.l_orderkey, orders.o_custkey
         Sort Method: external sort  Disk: 2031640kB
         ->  Merge Join  (cost=244.37..4225682.88 rows=59986052 width=24) (actual time=0.020..56368.444 rows=59986052 loops=1)
               Merge Cond: (orders.o_orderkey = lineitem.l_orderkey)
               ->  Index Scan using orders_pkey on orders  (cost=0.43..665749.44 rows=15000000 width=12) (actual time=0.009..5722.540 rows=15000000 loops=1)
               ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.56..2772694.34 rows=59986052 width=16) (actual time=0.008..25766.270 rows=59986052 loops=1)
 Planning time: 0.251 ms
 Execution time: 158320.203 ms
(11 rows)

rhaas=# explain analyze select l_orderkey, o_custkey, l_extendedprice from (select l_orderkey, sum(l_extendedprice) l_extendedprice from lineitem group by 1) x, orders where l_orderkey = o_orderkey;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.00..3789127.49 rows=378670 width=48) (actual time=0.104..76107.364 rows=15000000 loops=1)
   Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
   ->  GroupAggregate  (cost=0.56..3077357.98 rows=378670 width=40) (actual time=0.071..59870.322 rows=15000000 loops=1)
         Group Key: lineitem.l_orderkey
         ->  Index Scan using idx_lineitem_orderkey on lineitem  (cost=0.56..2772694.34 rows=59986052 width=16) (actual time=0.057..24382.053 rows=59986052 loops=1)
   ->  Index Scan using orders_pkey on orders  (cost=0.43..665749.44 rows=15000000 width=12) (actual time=0.030..6115.856 rows=15000000 loops=1)
 Planning time: 0.117 ms
 Execution time: 77266.600 ms
(8 rows)

Actually, the major culprit here is that the plan which does the join first introduces a totally useless sort, but it looks like the second plan would still be modestly faster even if the planner were smart enough not to do that. If I set work_mem to 1GB and max_parallel_workers_per_gather to 8, the time for the aggregate-first plan improves to 66s, but the join-first plan doesn't change. I'm not sure this is actually a great example; 15 million output rows is a lot. But I've seen this kind of thing plenty of times in the real world; see also David Rowley's example.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company