Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing NOT IN to use ANTI joins
Date
Msg-id CAApHDvqRB-iFBy68=dCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ@mail.gmail.com
In response to Re: Allowing NOT IN to use ANTI joins  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Allowing NOT IN to use ANTI joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Allowing NOT IN to use ANTI joins  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Mon, Jul 14, 2014 at 8:55 PM, David Rowley <dgrowleyml@gmail.com> wrote:

> Since the patch has not been marked as rejected I was thinking that I'd take a bash at fixing it up, but if you think this is a waste of time, please let me know.
 


I've made some changes to the patch so that it only allows the conversion to ANTI JOIN to take place if both the outer query's expressions AND the subquery's target list can be proved not to have NULLs.
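As an illustration (just a sketch with made-up table names, not one of the attached test queries), this is the kind of query that the check now allows to be planned as an ANTI JOIN:

CREATE TABLE t1 (id int PRIMARY KEY, ref_id int NOT NULL);
CREATE TABLE t2 (id int PRIMARY KEY);

-- ref_id is declared NOT NULL and t2.id is a primary key, so neither side
-- of the NOT IN can produce a NULL, and the rewrite to an ANTI JOIN is safe.
SELECT * FROM t1 WHERE ref_id NOT IN (SELECT id FROM t2);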

I've attached a delta, which is the changes I've made on top of Tom's cleaned up version of my patch, and also a full patch.

I've also performed some benchmarks to try to determine how much time it takes to execute this NULL-checking code. For the benchmarks I ended up hacking the code a little and just put the NULL-checking function in a tight loop that performs 100000 iterations, like so:
    if (under_not)
    {
        int     x;
        bool    result;

        /*
         * Benchmarking hack only: repeat the NULL-proofing check 100000
         * times so that its cost dominates the measured planning time.
         */
        for (x = 0; x < 100000; x++)
        {
            result = is_NOTANY_compatible_with_antijoin(parse, sublink);
        }
        if (!result)
            return NULL;
    }

I then ran 6 queries through the planner, 3 times each, and grabbed the "Planning Time" from the EXPLAIN ANALYZE output. After that I removed the extra looping code (seen above), compiled the code as it is in the attached patch, and ran each of the 6 queries another 3 times, again noting down the "Planning Time" from the EXPLAIN ANALYZE output.
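To be explicit about how each timing was taken, each run amounts to something like the following (shown here against the illustrative t1/t2 tables from earlier rather than the attached test schema), with the recorded figure being the planning time reported at the end of the output:

EXPLAIN ANALYZE
SELECT * FROM t1 WHERE ref_id NOT IN (SELECT id FROM t2);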

In my results I assumed that the first set of times divided by 100000 would be the time taken to perform one NULL check (i.e. per-check time is roughly the measured planning time / 100000). This is not quite accurate, since the figure also includes the rest of the planning work, but that was quite well drowned out by the 100k loop.

I found that the call to is_NOTANY_compatible_with_antijoin adds between about 0.2% and 2.3% to the total planning time, though the 2.3% was quite an extreme case, and the 0.2% was the simplest case I could think of.

I've attached the complete results in HTML format, along with the schema that I used and all 6 queries tested.

Here are 2 points which I think are important to note about the planning time overhead of this patch:
1. There is no additional overhead if the query has no NOT IN clause.
2. Test queries 3 and 6 were there to benchmark the overhead for the case where the NOT NULL test fails (a sketch of such a query is below). The slower of the two was test 3, which added just under 0.5% to the planning time. The query that added the 2.3% overhead did get an ANTI JOIN, so the reduction in execution time likely more than made up for the extra planning time.
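For illustration of the failing case (again just a made-up sketch, not one of the attached test queries):

CREATE TABLE orders (id int PRIMARY KEY, customer_id int);
CREATE TABLE customers (id int PRIMARY KEY);

-- customer_id is nullable, so it can't be proved free of NULLs. A NULL on
-- either side can make the NOT IN evaluate to unknown rather than true,
-- which an ANTI JOIN wouldn't reproduce, so the conversion is not applied
-- and the only cost paid is the failed check itself.
SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM customers);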

Regards

David Rowley