Home > mailing lists

Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Performance improvement for joins where outer side is unique
Date	March 28, 2015 07:35:09
Msg-id	CAApHDvrvPqgjcko3wUcv2+Sp5UxkVU6HYL54nNyuqQkmrCZ-=g@mail.gmail.com Whole thread
In response to	Re: Performance improvement for joins where outer side is unique (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses	Re: Performance improvement for joins where outer side is unique
List	pgsql-hackers

Tree view

On 25 March 2015 at 01:11, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hi, thanks for the new patch.

I made an additional shrink from your last one. Do you have a
look on the attached?

Thanks, for looking again.

I'm not too sure I like the idea of relying on join removals to mark the is_unique_join property.

By explicitly doing it in mark_unique_joins we have the nice side effect of not having to re-check a relations unique properties after removing another relation, providing the relation was already marked as unique on the first pass.

At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA@mail.gmail.com>
> On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > I can continue working on your patch if you like? Or are you planning to
> > go further with it?

It's fine that you continue to work on this.

# Sorry for the hardly baked patch which had left many things alone:(

> I've been working on this more over the weekend and I've re-factored things
> to allow LEFT JOINs to be properly marked as unique.
> I've also made changes to re-add support for detecting the uniqueness of
> sub-queries.

I don't see the point of calling mark_unique_joins for every
iteration on join_info_list in remove_useless_joins.

I've fixed this in the attached. I must have forgotten to put the test for LEFT JOINs here as I was still thinking that I might make a change to the code that converts unique semi joins to inner joins so that it just checks is_unique_join instead of calling relation_has_unique_index_for().

The loop already iteraltes on whole the join_info_list so
mark_unique_join as an individual function is needless.

Finally, simply marking uniqueness of join in join_is_removable
seems to be enough, inhibiting early bailing out by the checks on
attribute usage and placeholder let it work as expected.

Reducing changes to this extent, I can easily see what is added
to planning computations. It consists of mainly two factors.

- Unique-join chekcs for every candidate inner joins in
add_paths_to_joinrel.

- Uniqueness check of mergejoinable clause in join-removability
check for every left join, some of which would be skipped
by other checks before.

> Also, I've added modified the costing for hash and nested loop joins to
> reduce the cost for unique inner joins to cost the join the same as it does
> for SEMI joins. This has tipped the scales on a few plans in the regression
> tests.

I've forgotten it, but quite important.

I've fixed quite a fundamental bug in my previous costing change. The fix for this makes it so I have to pass the unique_inner bool down to the costing functions. This also changes how the JoinPaths are marked as unique_inner. This is now done when the JoinPath is created, instead of updating it afterwards like it was done previously.

> Also, please see attached unijoin_analysis.patch. This just adds some code
> which spouts out notices when join nodes are initialised which states if
> the join is unique or not. Running the regression tests with this patch in
> places gives:
>
> Unique Inner: Yes == 753 hits
> Unique Inner: No == 1430 hits
>
> So it seems we can increase the speed of about 1 third of joins by about
> 10%.
> A quick scan of the "No"s seems to show quite a few cases which do not look
> that real world like. e.g cartesian join.

I don't have an idea how many queries in the reality hit this but
I suppose it's not a few.

> It would be great if someone could run some PostgreSQL application with
> these 2 patches applied, and then grep the logs for the Unique Inner
> results... Just to get a better idea of how many joins in a real world case
> will benefit from this patch.

Wow. I think the second patch should be DEBUGx, not NOTICE:)

I didn't give that much thought, but you're probably right. It was just a quick test and demo to try to raise some more interest in this. :-)

Regards

David Rowley

Attachment

unijoin_2015-03-28_d1923fb.patch

pgsql-hackers by date:

From: Andrew Gierth
Date: 28 March 2015, 05:16:28
Subject: Re: Rounding to even for numeric data type

From: Dean Rasheed
Date: 28 March 2015, 08:58:47
Subject: Re: Rounding to even for numeric data type

Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

Attachment

Previous

Next