Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | 20150316.172855.213501931.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Performance improvement for joins where outer side
is unique
|
List | pgsql-hackers |
Hello, I don't have enough time for now but made some considerations on this. It might be a way that marking unique join peer at bottom and propagate up, not searching from top of join list. Around create_join_clause might be a candidate for it. I'll investigate that later. regards, At Sat, 14 Mar 2015 23:05:24 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvoh-EKF51QQyNoJUe0eHYMZw6OzJjjgYP63Cmw7QfebjA@mail.gmail.com> dgrowleyml> On 14 March 2015 at 14:51, David Rowley <dgrowleyml@gmail.com> wrote: dgrowleyml> dgrowleyml> > On 13 March 2015 at 20:34, Kyotaro HORIGUCHI < dgrowleyml> > horiguchi.kyotaro@lab.ntt.co.jp> wrote: dgrowleyml> > dgrowleyml> >> Unfortunately I can't decide this to be 'ready for commiter' for dgrowleyml> >> dgrowleyml> > now. I think we should have this on smaller footprint, in a dgrowleyml> >> method without separate exhauxtive searching. I also had very dgrowleyml> >> similar problem in the past but I haven't find such a way for my dgrowleyml> >> problem.. dgrowleyml> >> dgrowleyml> >> dgrowleyml> > I don't think it's ready yet either. I've just been going over a few dgrowleyml> > things and looking at Tom's recent commit b557226 in pathnode.c I've got a dgrowleyml> > feeling that this patch would need to re-factor some code that's been dgrowleyml> > modified around the usage of relation_has_unique_index_for() as when this dgrowleyml> > code is called, the semi joins have already been analysed to see if they're dgrowleyml> > unique, so it could be just a case of ripping all of that out dgrowleyml> > of create_unique_path() and just putting a check to say rel->is_unique_join dgrowleyml> > in there. But if I do that then I'm not quite sure if dgrowleyml> > SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would dgrowleyml> > still be needed at all. These were only added in b557226. Changing this dgrowleyml> > would help reduce the extra planning time when the query contains dgrowleyml> > semi-joins. To be quite honest, this type of analysis belongs in dgrowleyml> > analyzejoin.c anyway. I'm tempted to hack at this part some more, but I'd dgrowleyml> > rather Tom had a quick glance at what I'm trying to do here first. dgrowleyml> > dgrowleyml> > dgrowleyml> dgrowleyml> I decided to hack away any change the code Tom added in b557226. I've dgrowleyml> changed it so that create_unique_path() now simply just uses if dgrowleyml> (rel->is_unique_join), instead off all the calls to dgrowleyml> relation_has_unique_index_for() and query_is_distinct_for(). This vastly dgrowleyml> simplifies that code. One small change is that Tom's checks for uniqueness dgrowleyml> on semi joins included checks for volatile functions, this check didn't dgrowleyml> exist in the original join removal code, so I've left it out. We'll never dgrowleyml> match a expression with a volatile function to a unique index as indexes dgrowleyml> don't allow volatile function expressions anyway. So as I understand it dgrowleyml> this only serves as a fast path out if the join condition has a volatile dgrowleyml> function... But I'd assume that check is not all that cheap. dgrowleyml> dgrowleyml> I ended up making query_supports_distinctness() and query_is_distinct_for() dgrowleyml> static in analyzejoins.c as they're not used in any other files. dgrowleyml> relation_has_unique_index_for() is also now only used in analyzejoins.c, dgrowleyml> but I've not moved it into that file yet as I don't want to bloat the dgrowleyml> patch. I just added a comment to say it needs moved. dgrowleyml> dgrowleyml> I've also added a small amount of code to query_is_distinct_for() which dgrowleyml> allows subqueries such as (select 1 a offset 0) to be marked as unique. I dgrowleyml> thought it was a little silly that these were not being detected as unique, dgrowleyml> so I fixed it. This has the side effect of allowing left join removals for dgrowleyml> queries such as: select t1.* from t1 left join (select 1 a offset 0) a on dgrowleyml> t1.id=a.a; dgrowleyml> dgrowleyml> Updated patch attached. dgrowleyml> dgrowleyml> Regards dgrowleyml> dgrowleyml> David Rowley
pgsql-hackers by date: