Functional dependency in GROUP BY through JOINs - Mailing list pgsql-hackers
| From | David Rowley |
|---|---|
| Subject | Functional dependency in GROUP BY through JOINs |
| Date | |
| Msg-id | 000001cdd341$7c6cc310$75464930$@gmail.com |
| Responses | Re: Functional dependency in GROUP BY through JOINs |
| List | pgsql-hackers |
<div class="WordSection1"><p class="MsoNormal">Some idle thoughts has provoked me to sending this email. I’m wondering whatI’m seeing here, is a simple missed optimisation or something that would be very difficult to implement, or even perhapssomething I’ve completely misunderstood.<p class="MsoNormal"> <p class="MsoNormal">To set the scene let’s go withthe old products and sales example. In this example each product has been given a unique ID maybe due to perhaps product_codesbeing a bit long and to keep the sales table as narrow as possible.<p class="MsoNormal"> <p class="MsoNormal">Ifwe wanted to see the sales per product we could write something like this:<p class="MsoNormal"> <p class="MsoNormal">SELECTp.product_code,SUM(s.quantity)<p class="MsoNormal">FROM products p<p class="MsoNormal">INNER JOINbigsalestable s ON p.productid = s.productid<p class="MsoNormal">GROUP BY p.product_code;<p class="MsoNormal"> <p class="MsoNormal">Thoughthis plan might not be quite as optimal as it could be as it performs the grouping after the join.<pclass="MsoNormal"> <p class="MsoNormal"> QUERY PLAN<p class="MsoNormal">-------------------------------------------------------------------------------------<p class="MsoNormal">HashAggregate (cost=33195.13..33199.63 rows=450 width=150)<p class="MsoNormal"> -> Hash Join (cost=20.13..28195.13rows=1000000 width=150)<p class="MsoNormal"> Hash Cond: (s.productid = p.productid)<p class="MsoNormal"> -> Seq Scan on bigsalestable s (cost=0.00..14425.00 rows=1000000 width=8)<p class="MsoNormal"> -> Hash (cost=14.50..14.50 rows=450 width=150)<p class="MsoNormal"> -> SeqScan on products p (cost=0.00..14.50 rows=450 width=150)<p class="MsoNormal">(6 rows)<p class="MsoNormal"> <p class="MsoNormal">Ofcourse the query could have been written in the first place as:<p class="MsoNormal"> <p class="MsoNormal">SELECTp.product_code,s.quantity<p class="MsoNormal">FROM products AS p<p class="MsoNormal">INNER JOIN (SELECTproductid,SUM(quantity) AS quantity FROM bigsalestable GROUP BY productid) AS s ON p.productid = s.productid;<p class="MsoNormal"> <pclass="MsoNormal">And that would have given us a more efficient plan.<p class="MsoNormal"> <p class="MsoNormal">Ofcourse, for these actual plans to be equivalent there would naturally have to be a unique index on product_codein the products table.<p class="MsoNormal">I think I’m right in thinking that if a unique index exists to matchthe group by clause, and the join condition is equality (probably using the same operator class as the unique btreeindex?), then the grouping could be pushed up to before the join.<p class="MsoNormal"> <p class="MsoNormal">In the attachedscript the hand optimised query runs about twice as fast with 1 million record sales table.<p class="MsoNormal"> <pclass="MsoNormal">The existence of join removal makes me think we don’t want to always allow it up towho or what is writing the query to allow the best choice in plan.<p class="MsoNormal"> <p class="MsoNormal">I’m not reallyskilled enough in the arts of postgresql’s planner to look into how hard it would be to implement this, but I thoughtI’d throw it out there to collect some comments on the idea.<p class="MsoNormal"> <p class="MsoNormal">Regards<p class="MsoNormal"> <pclass="MsoNormal">David Rowley<p class="MsoNormal"> </div>