Thread: Patch to support SEMI and ANTI join removal

Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

05 August 2014, 10:35:56

I've been working away at allowing semi and anti joins to be added to the list of join types that our join removal code supports.

The basic idea is that we can removal a semi or anti join if the left hand relation references the relation that's being semi/anti joined if the join condition matches a foreign key definition on the left hand relation.

To give an example:

Given the 2 tables:

create table t2 (t1_id int primary key);

create table t1 (value int references t2);

The join to t2 would not be required in:

select * from t1 where value in(select t1_id from t2);

Neither would it be here:

select * from t1 where not exists(select 1 from t2 where t1_id=value);

To give a bit of background, I initially proposed the idea here:

http://www.postgresql.org/message-id/CAApHDvq0NAi8cEqTNNdqG6mhFH__7_A6Tn9XU4V0cut9wab4gA@mail.gmail.com

And some issues were raised around the fact that updates to the referenced relation would only flush out changes to the referencing tables on completion of the command, and if we happened to be planning a query that was located inside a volatile function then we wouldn't know that the parent query hadn't updated some of these referenced tables.

Noah raised this concern here: http://www.postgresql.org/message-id/20140603235053.GA351732@tornado.leadboat.com

But proposed a solution here: http://www.postgresql.org/message-id/20140605000407.GA390318@tornado.leadboat.com

In the attached I've used Noah's solution to the problem, and it seems to work just fine. (See regression test in the attached patch)

Tom raised a point here: http://www.postgresql.org/message-id/19326.1401891282@sss.pgh.pa.us

Where he mentioned that it may be possible that the foreign key trigger queue gets added to after planning has taken place.

I've spent some time looking into this and I've not yet managed to find a case where this matters as it seems that updates made in 1 command are not visible to that same command. I've tested various different test cases in all transaction isolation levels and also tested update commands which call volatile functions that perform updates in the same table that the outer update will reach later in the command.

The patch (attached) is also now able to detect when a NOT EXISTS clause cannot produce any records at all.

If I make a simple change to the tables I defined above:

ALTER TABLE t1 ALTER COLUMN value SET NOT NULL;

Then the following will be produced:

explain (costs off) select * from t1 where not exists(select 1 from t2 where t1_id=value);

QUERY PLAN

--------------------------

Result

One-Time Filter: false

-> Seq Scan on t1

A small note on my intentions with this patch:

I'm not seeing the use case for all of this to be massive, I'm more interested in this patch to use it as a stepping stone towards implementing INNER JOIN removals which would use foreign keys in a similar way to attempt to prove that the join is not required. I decided to tackle semi and anti joins first as these are a far more simple case, and it also adds quite a bit of the infrastructure that would be required for inner join removal, plus if nobody manages to poke holes in my ideas with this then I should have good grounds to begin the work on the inner join removal code. I also think if we're bothering to load foreign key constraints at planning time, then only using them for inner join removals wouldn't be making full use of them, so likely this patch would be a good idea anyway.

Currently most of my changes are in analyzejoin.c, but I did also have to make changes to load the foreign key constraints so that they were available to the planner. One thing that is currently lacking, which would likely be needed, before the finished patch is ready, would be a "relhasfkeys" column in pg_class. Such a column would mean that it would be possible to skip scanning pg_constraint for foreign keys when there's none to find. I'll delay implementing that until I get a bit more feedback to weather this patch would be a welcome addition to the existing join removal code or not.

I'm submitting this (a little early) for the August commitfest, but if anyone has any time to glance at it before then then that would be a really good help.

Regards

David Rowley

Attachment

semianti_join_removal_2410c7c_2014-08-05.patch

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

10 August 2014, 11:48:30

On Tue, Aug 5, 2014 at 10:35 PM, David Rowley <dgrowleyml@gmail.com> wrote:

The patch (attached) is also now able to detect when a NOT EXISTS clause cannot produce any records at all.

I've attached an updated version of the patch which fixes up some incorrect logic in the foreign key matching code, plus various comment improvements.

Regards

David Rowley

Attachment

semianti_join_removal_f92541e_2014-08-10.patch

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

17 August 2014, 06:29:47

On Sun, Aug 10, 2014 at 11:48 PM, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached an updated version of the patch which fixes up some incorrect logic in the foreign key matching code, plus various comment improvements.

I've made a few updated to the patch to simplify some logic in the code which analyses the join condition. The result is slightly faster code for detecting either successful or unsuccessful join removal.

I've also been doing a little benchmarking of the overhead that this adds to planning time for a handful of different queries.

With the queries I tested the overhead was between ~20 and ~423 nanoseconds per SEMI or ANTI join, the 20 was for the earliest fast path out on an unsuccessful removal and the 423 was for a successful removal. (tests done on a 4 year old intel i5 laptop). This accounted for between 0.01% and 0.2% of planning time for the tested queries. I was quite happy with this, but I did manage to knock it down a little more with the bms_get_singleton_v1.patch, which I've attached. This reduces the range to between ~15 and ~409 nanoseconds, but probably this is going into micro benchmark territory... so perhaps not worth the extra code...

With the benchmarks I just put semiorantijoin_is_removable() in a tight 1 million iteration loop and grabbed the total planning time for that, I then compared this to an unpatched master's planning time after dividing the time reported for the 1 million removals version by 1 million.

I didn't really find a good way to measure the extra overhead in actually loading the foreign key constraints in get_relation_info()

Regards

David Rowley

Attachment

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

26 August 2014, 12:28:38

On Tue, Aug 5, 2014 at 10:35 PM, David Rowley <dgrowleyml@gmail.com> wrote:

Currently most of my changes are in analyzejoin.c, but I did also have to make changes to load the foreign key constraints so that they were available to the planner. One thing that is currently lacking, which would likely be needed, before the finished patch is ready, would be a "relhasfkeys" column in pg_class. Such a column would mean that it would be possible to skip scanning pg_constraint for foreign keys when there's none to find. I'll delay implementing that until I get a bit more feedback to weather this patch would be a welcome addition to the existing join removal code or not.

I've modified this patch to include a new "relhasfkey" column in pg_class, and then only attempt to load the foreign keys in get_relation_info() if the pg_class flag is true.

Currently what I'm not quite sure on is the best place to be clearing this flag. I see that relhaspkey is cleared during vacuum, but only if there's no indexes at all on the relation. It seems that it will remain set to "true" after vacuum, if the primary key is dropped and there's still other indexes on the relation. My guess here is that this is done so that pg_constraint does not have to be checked to see if a PK exists, which is why I'm not sure if this would be the correct place for me to do the same in order to determine if there's any FKs on the relation. Though I can't quite think where else I might clear this flag.

Any ideas or feedback on this would be welcome

Regards

David Rowley

Attachment

semianti_join_removal_3ab5ccb_2014-08-27.patch.gz

Re: Patch to support SEMI and ANTI join removal

From

Heikki Linnakangas

Date:

26 August 2014, 13:41:39

On 08/26/2014 03:28 PM, David Rowley wrote:
> Any ideas or feedback on this would be welcome

Before someone spends time reviewing this patch, are you sure this is 
worth the effort? It seems like very narrow use case to me. I understand 
removing LEFT and INNER joins, but the case for SEMI and ANTI joins 
seems a lot thinner. Unnecessary LEFT and INNER joins can easily creep 
into a query when views are used, for example, but I can't imagine that 
happening for a SEMI or ANTI join. Maybe I'm lacking imagination. If 
someone has run into a query in the wild that would benefit from this, 
please raise your hand.

If I understood correctly, you're planning to work on INNER join removal 
too. How much of the code in this patch is also required for INNER join 
removal, and how much is specific to SEMI and ANTI joins?

Just so everyone is on the same page on what kind of queries this helps 
with, here are some examples from the added regression tests:

> -- Test join removals for semi and anti joins
> CREATE TEMP TABLE b (id INT NOT NULL PRIMARY KEY, val INT);
> CREATE TEMP TABLE a (id INT NOT NULL PRIMARY KEY, b_id INT REFERENCES b(id));
> -- should remove semi join to b
> EXPLAIN (COSTS OFF)
> SELECT id FROM a WHERE b_id IN(SELECT id FROM b);
>           QUERY PLAN
> ------------------------------
>  Seq Scan on a
>    Filter: (b_id IS NOT NULL)
> (2 rows)
>
> -- should remove semi join to b
> EXPLAIN (COSTS OFF)
> SELECT id FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.b_id = id);
>           QUERY PLAN
> ------------------------------
>  Seq Scan on a
>    Filter: (b_id IS NOT NULL)
> (2 rows)
>
> -- should remove anti join to b
> EXPLAIN (COSTS OFF)
> SELECT id FROM a WHERE NOT EXISTS(SELECT 1 FROM b WHERE a.b_id = id);
>         QUERY PLAN
> --------------------------
>  Seq Scan on a
>    Filter: (b_id IS NULL)
> (2 rows)

- Heikki

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

27 August 2014, 09:16:33

On Wed, Aug 27, 2014 at 1:40 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 08/26/2014 03:28 PM, David Rowley wrote:
Any ideas or feedback on this would be welcome

Before someone spends time reviewing this patch, are you sure this is worth the effort? It seems like very narrow use case to me. I understand removing LEFT and INNER joins, but the case for SEMI and ANTI joins seems a lot thinner. Unnecessary LEFT and INNER joins can easily creep into a query when views are used, for example, but I can't imagine that happening for a SEMI or ANTI join. Maybe I'm lacking imagination. If someone has run into a query in the wild that would benefit from this, please raise your hand.

I agree that the use case for removals of SEMI and ANTI join are a lot thinner than LEFT and INNER joins. My longer term goal here is to add join removal support for INNER joins. In order to do this I need the foreign key infrastructure which is included in this patch. I held back from just going ahead and writing the INNER JOIN removal patch as I didn't want to waste the extra effort in doing that if someone was to find a show stopper problem with using foreign keys the way I am with this patch. I was kind of hoping someone would be able to look at this patch a bit more and confirm to me that it's safe to do this or not before I go ahead and write the inner join version.

If I understood correctly, you're planning to work on INNER join removal too. How much of the code in this patch is also required for INNER join removal, and how much is specific to SEMI and ANTI joins?

Apart from the extra lines of code in remove_useless_joins(), there's 3 functions added here which won't be needed at all for INNER JOINs; semiorantijoin_is_removable(), convert_semijoin_to_isnotnull_quals() and convert_antijoin_to_isnull_quals(). Not including the regression tests, this is 396 lines with comments and 220 lines without. All of these functions are static and in analyzejoin.c.

The benchmarks I posted a few weeks back show that the overhead of performing the semi/anti join removal checks is quite low. I measured an extra 400 or so nanoseconds for a successful removal on my i5 laptop. Or just 15 nanoseconds on the earliest fast path for a non-removal. This accounted for between 0.008% and 0.2% of planning time for the queries I tested.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Jim Nasby

Date:

27 August 2014, 17:14:04

On 8/26/14, 8:40 AM, Heikki Linnakangas wrote:
> Just so everyone is on the same page on what kind of queries this helps with, here are some examples from the added
regressiontests:

>
>> -- Test join removals for semi and anti joins
>> CREATE TEMP TABLE b (id INT NOT NULL PRIMARY KEY, val INT);
>> CREATE TEMP TABLE a (id INT NOT NULL PRIMARY KEY, b_id INT REFERENCES b(id));
>> -- should remove semi join to b
>> EXPLAIN (COSTS OFF)
>> SELECT id FROM a WHERE b_id IN(SELECT id FROM b);
<snip>
>> SELECT id FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.b_id = id);

I also fail to see a use for examples that are that silly *unless* we're talking machine-generated SQL, but I suspect
thatnormally uses JOINS.

Where I would expect this to be useful is in cases where we can pre-evaluate some other condition in the subqueries to
makethe subqueries useless (ie: SELECT id FROM b WHERE 1=1), or where the condition could be passed through (ie: SELECT
idFROM b WHERE id=42). Another possibility would be if there's a condition in the subquery that could trigger
constraintelimination.

Those are the real world cases I'd expect to see from anything reasonably sane (an adjective that doesn't always apply
tosome of the users I have to support...) My $0.01 on the burden of carrying the "useless" tests and code around is
thatit doesn't seem like all that much overhead...

-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

27 August 2014, 18:23:47

Jim Nasby <jim@nasby.net> writes:
> On 8/26/14, 8:40 AM, Heikki Linnakangas wrote:
>> Just so everyone is on the same page on what kind of queries this helps with, here are some examples from the added
regressiontests:
 
>> 
> -- Test join removals for semi and anti joins
> CREATE TEMP TABLE b (id INT NOT NULL PRIMARY KEY, val INT);
> CREATE TEMP TABLE a (id INT NOT NULL PRIMARY KEY, b_id INT REFERENCES b(id));
> -- should remove semi join to b
> EXPLAIN (COSTS OFF)
> SELECT id FROM a WHERE b_id IN(SELECT id FROM b);
> <snip>
> SELECT id FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.b_id = id);

> I also fail to see a use for examples that are that silly *unless* we're talking machine-generated SQL, but I suspect
thatnormally uses JOINS.
 

> Where I would expect this to be useful is in cases where we can pre-evaluate some other condition in the subqueries
tomake the subqueries useless (ie: SELECT id FROM b WHERE 1=1), or where the condition could be passed through (ie:
SELECTid FROM b WHERE id=42). Another possibility would be if there's a condition in the subquery that could trigger
constraintelimination.
 

Unless I'm misunderstanding something, pretty much *any* WHERE restriction
in the subquery would defeat this optimization, since it would no longer
be certain that there was a match to an arbitrary outer-query row.  So
it seems unlikely to me that this would fire in enough real-world cases
to be worth including.  I am definitely not a fan of carrying around
deadwood in the planner.

If the majority of the added code is code that will be needed for
less-bogus optimizations, it might be all right; but I'd kind of want to
see the less-bogus optimizations working first.
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

11 September 2014, 11:14:41

On Thu, Aug 28, 2014 at 6:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If the majority of the added code is code that will be needed for
less-bogus optimizations, it might be all right; but I'd kind of want to
see the less-bogus optimizations working first.

That seems fair. Likely there'd be not a great deal of value to semi and anti joins removal alone. I was more just trying to spread weight of an inner join removal patch...

So In response to this, I've gone off and written an inner join removal support patch (attached), which turned out to be a bit less complex than I thought.

Here's a quick demo, of the patch at work:

test=# create table c (id int primary key);

CREATE TABLE

test=# create table b (id int primary key, c_id int not null references c(id));

CREATE TABLE

test=# create table a (id int primary key, b_id int not null references b(id));

CREATE TABLE

test=#

test=# explain select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;

QUERY PLAN

-----------------------------------------------------

Seq Scan on a (cost=0.00..31.40 rows=2140 width=8)

Planning time: 1.061 ms

(2 rows)

Perhaps not a greatly useful example, but if you can imagine the joins are hidden in a view and the user is just requesting a small subset of columns, then it does seem quite powerful.

There's currently a few things with the patch that I'll list below, which may raise a few questions:

1. I don't think that I'm currently handling removing eclass members properly. So far the code just removes the Vars that belong to the relation being removed. I likely should also be doing bms_del_member(ec->ec_relids, relid); on the eclass, but perhaps I should just be marking the whole class as "ec_broken = true" and adding another eclass all everything that the broken one has minus the parts from the removed relation?

2. Currently the inner join removal is dis-allowed if the (would be) removal relation has *any* baserestrictinfo items. The reason for this is that we must ensure that the inner join gives us exactly 1 row match on the join condition, but a baserestrictinfo can void the proof that a foreign key would give us that a matching row does exist. However there is an exception to this that could allow that restriction to be relaxed. That is if the qual in baserestrictinfo use vars that are in an eclass, where the same eclass also has ec members vars that belong to the rel that we're using the foreign key for to prove the relation not needed.... umm.. that's probably better described by example:

Assume there's a foreign key a (x) reference b(x)

SELECT a.* FROM a INNER JOIN b ON a.x = b.x WHERE b.x = 1

relation b should be removable because an eclass will contain {a.x, b.x} and therefore s baserestrictinfo for a.x = 1 should also exist on relation a. Therefore removing relation b should produce equivalent results, i.e everything that gets filtered out on relation b will also be filtered out on relation a anyway.

I think the patch without this is still worth it, but if someone feels strongly about it I'll take a bash at supporting it.

3. Currently the inner join support does not allow removals using foreign keys which contain duplicate columns on the referencing side. e.g (a,a) references (x,y), this is basically because of the point I made in item 2. In this case a baserestrictinfo would exist on the referenced relation to say WHERE x = y. I'd have to remove the restriction described in item 2 and do a small change to the code that extracts the join condition from the eclass for this to work. But it's likely a corner case that's not worth too much trouble to support. I think probably if I saw an FK like that in the field, I'd probably scratch my head for a while, while trying to understanding why they bothered.

4. The patch currently only allows removals for eclass join types. If the rel has any joininfo items, then the join removal is disallowed. From what I can see equality type inner join conditions get described in eclasses, and only non-equality join conditions make it into the joininfo list, and since foreign keys only support equality operators, then I thought this was a valid restriction, however, if someone can show me a flaw in my assumption then I may need to improve this.

5. I've added a flag to pg_class called relhasfkey. Currently this gets set to true when a foreign key is added, though I've added nothing to set it back to false again. I notice that relhasindex gets set back to false during vacuum, if vacuum happens to find there to not be any indexes on the rel. I didn't put my logic here as I wasn't too sure if scanning pg_constraint during a vacuum seemed very correct, so I just left out the "setting it to false" logic based on the the fact that I noticed that relhaspkey gets away with quite lazy setting back to false logic (only when there's no indexes of any kind left on the relation at all).

The only think else I can think of is perhaps optimising a little. I was thinking likely most queries wont benefit from this too much, so I was thinking of adding some logic to skip all join removals by pulling out the varnos from the target list entries and skipping even attempting to perform a join removal for a relation that has its varno in the targetlist of the query. Though perhaps a few benchmarks will determine if this is worth it or not.

Comments are welcome. -- I'm really hoping this patch generates a bit more interest than the SEMI/ANTI join removal one!

Regards

David Rowley

Attachment

inner_join_removals_2014-09-11_38cf71c.patch

Re: Patch to support SEMI and ANTI join removal

From

Robert Haas

Date:

11 September 2014, 15:35:13

On Thu, Sep 11, 2014 at 7:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> Here's a quick demo, of the patch at work:
>
> test=# create table c (id int primary key);
> CREATE TABLE
> test=# create table b (id int primary key, c_id int not null references
> c(id));
> CREATE TABLE
> test=# create table a (id int primary key, b_id int not null references
> b(id));
> CREATE TABLE
> test=#
> test=# explain select a.* from a inner join b on a.b_id = b.id inner join c
> on b.c_id = c.id;
>                      QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
>  Planning time: 1.061 ms
> (2 rows)

That is just awesome.  You are my new hero.

> 1. I don't think that I'm currently handling removing eclass members
> properly. So far the code just removes the Vars that belong to the relation
> being removed. I likely should also be doing bms_del_member(ec->ec_relids,
> relid); on the eclass, but perhaps I should just be marking the whole class
> as "ec_broken = true" and adding another eclass all everything that the
> broken one has minus the parts from the removed relation?

I haven't read the patch, but I think the question is why this needs
to be different than what we do for left join removal.

> Assume there's a foreign key a (x) reference b(x)
>
> SELECT a.* FROM a INNER JOIN b ON a.x = b.x WHERE b.x = 1
>
> relation b should be removable because an eclass will contain {a.x, b.x} and
> therefore s baserestrictinfo for a.x = 1 should also exist on relation a.
> Therefore removing relation b should produce equivalent results, i.e
> everything that gets filtered out on relation b will also be filtered out on
> relation a anyway.
>
> I think the patch without this is still worth it, but if someone feels
> strongly about it I'll take a bash at supporting it.

That'd be nice to fix, but IMHO not essential.

> 3. Currently the inner join support does not allow removals using foreign
> keys which contain duplicate columns on the referencing side. e.g (a,a)
> references (x,y), this is basically because of the point I made in item 2.
> In this case a baserestrictinfo would exist on the referenced relation to
> say WHERE x = y.

I think it's fine to not bother with this case.  Who cares?

> 4. The patch currently only allows removals for eclass join types. If the
> rel has any joininfo items, then the join removal is disallowed. From what I
> can see equality type inner join conditions get described in eclasses, and
> only non-equality join conditions make it into the joininfo list, and since
> foreign keys only support equality operators, then I thought this was a
> valid restriction, however, if someone can show me a flaw in my assumption
> then I may need to improve this.

Seems OK.

> 5. I've added a flag to pg_class called relhasfkey. Currently this gets set
> to true when a foreign key is added, though I've added nothing to set it
> back to false again. I notice that relhasindex gets set back to false during
> vacuum, if vacuum happens to find there to not be any indexes on the rel. I
> didn't put my logic here as I wasn't too sure if scanning pg_constraint
> during a vacuum seemed very correct, so I just left out the "setting it to
> false" logic based on the the fact that I noticed that relhaspkey gets away
> with quite lazy setting back to false logic (only when there's no indexes of
> any kind left on the relation at all).

The alternative to resetting the flag somehow is not having it in the
first place.  Would that be terribly expensive?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

11 September 2014, 15:47:37

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 11, 2014 at 7:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>> 5. I've added a flag to pg_class called relhasfkey. Currently this gets set
>> to true when a foreign key is added, though I've added nothing to set it
>> back to false again. I notice that relhasindex gets set back to false during
>> vacuum, if vacuum happens to find there to not be any indexes on the rel. I
>> didn't put my logic here as I wasn't too sure if scanning pg_constraint
>> during a vacuum seemed very correct, so I just left out the "setting it to
>> false" logic based on the the fact that I noticed that relhaspkey gets away
>> with quite lazy setting back to false logic (only when there's no indexes of
>> any kind left on the relation at all).

> The alternative to resetting the flag somehow is not having it in the
> first place.  Would that be terribly expensive?

The behavior of relhaspkey is a legacy thing that we've tolerated only
because nothing whatsoever in the backend depends on it at all.  I'm not
eager to add more equally-ill-defined pg_class columns.
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

12 September 2014, 09:17:58

On Fri, Sep 12, 2014 at 3:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 11, 2014 at 7:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>

> 1. I don't think that I'm currently handling removing eclass members
> properly. So far the code just removes the Vars that belong to the relation
> being removed. I likely should also be doing bms_del_member(ec->ec_relids,
> relid); on the eclass, but perhaps I should just be marking the whole class
> as "ec_broken = true" and adding another eclass all everything that the
> broken one has minus the parts from the removed relation?

I haven't read the patch, but I think the question is why this needs
to be different than what we do for left join removal.

I discovered over here -> http://www.postgresql.org/message-id/CAApHDvo5wCRk7uHBuMHJaDpbW-b_oGKQOuisCZzHC25=H3__fA@mail.gmail.com during the early days of the semi and anti join removal code that the planner was trying to generate paths to the dead rel. I managed to track the problem down to eclass members still existing for the dead rel. I guess we must not have eclass members for outer rels? or we'd likely have seen some troubles with left join removals already.

In the meantime I'll fix up the inner join removal code to properly delete the ec_relids member for the dead rel. I guess probably the restrict info should come out too.

I know it's late in the commitfest, but since there was next to no interest in semi and anti join removals, can I rename the patch in the commitfest app to be "Inner join removals"? It's either that or I'd mark that patch as rejected and submit this one in October.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

12 September 2014, 09:37:31

On Fri, Sep 12, 2014 at 3:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Sep 11, 2014 at 7:14 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>> 5. I've added a flag to pg_class called relhasfkey. Currently this gets set
>> to true when a foreign key is added, though I've added nothing to set it
>> back to false again. I notice that relhasindex gets set back to false during
>> vacuum, if vacuum happens to find there to not be any indexes on the rel. I
>> didn't put my logic here as I wasn't too sure if scanning pg_constraint
>> during a vacuum seemed very correct, so I just left out the "setting it to
>> false" logic based on the the fact that I noticed that relhaspkey gets away
>> with quite lazy setting back to false logic (only when there's no indexes of
>> any kind left on the relation at all).

> The alternative to resetting the flag somehow is not having it in the
> first place. Would that be terribly expensive?

I'd imagine not really expensive. I guess I just thought that it would be a good idea to save from having to bother looking in pg_constraint for foreign keys when none exist. The scan uses pg_constraint_conrelid_index so only would ever see the constraints for the rel being cached/loaded.

The behavior of relhaspkey is a legacy thing that we've tolerated only
because nothing whatsoever in the backend depends on it at all. I'm not
eager to add more equally-ill-defined pg_class columns.

I guess it's certainly not required. It would be easier to add it later if we decided it was a good idea, rather than having to keep it forever and a day if it's next to useless.

I'll remove it from the patch.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

12 September 2014, 13:39:05

David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, Sep 12, 2014 at 3:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I haven't read the patch, but I think the question is why this needs
>> to be different than what we do for left join removal.

> I discovered over here ->
> http://www.postgresql.org/message-id/CAApHDvo5wCRk7uHBuMHJaDpbW-b_oGKQOuisCZzHC25=H3__fA@mail.gmail.com
> during the early days of the semi and anti join removal code that the
> planner was trying to generate paths to the dead rel. I managed to track
> the problem down to eclass members still existing for the dead rel. I guess
> we must not have eclass members for outer rels? or we'd likely have seen
> some troubles with left join removals already.

Mere existence of an eclass entry ought not cause paths to get built.
It'd be worth looking a bit harder into what's happening there.
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

16 September 2014, 10:21:07

On Sat, Sep 13, 2014 at 1:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, Sep 12, 2014 at 3:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I haven't read the patch, but I think the question is why this needs
>> to be different than what we do for left join removal.

> I discovered over here ->
> http://www.postgresql.org/message-id/CAApHDvo5wCRk7uHBuMHJaDpbW-b_oGKQOuisCZzHC25=H3__fA@mail.gmail.com
> during the early days of the semi and anti join removal code that the
> planner was trying to generate paths to the dead rel. I managed to track
> the problem down to eclass members still existing for the dead rel. I guess
> we must not have eclass members for outer rels? or we'd likely have seen
> some troubles with left join removals already.

Mere existence of an eclass entry ought not cause paths to get built.
It'd be worth looking a bit harder into what's happening there.

It took me a bit of time to create this problem again as I didn't record the actual query where I hit the assert failure the first time. Though, now I have managed to recreate the problem again by removing the code that I had added which removes eclass members for dead rels.

Using the attached patch, the failing test case is:

create table b (id int primary key);

create table a (id int primary key, b_id int references b);

create index on a (b_id); -- add index to create alternative path

explain select a.* from a inner join b on b.id=a.b_id;

What seems to be happening is that generate_implied_equalities_for_column generates a RestrictInfo for the dead rel due to the eclass member still existing. This new rinfo gets matched to the index by match_clauses_to_index()

The code then later fails in get_loop_count:

TRAP: FailedAssertion("!(outer_rel->rows > 0)", File: "src\backend\optimizer\path\indxpath.c", Line: 1861)

The call stack looks like:

> postgres.exe!get_loop_count(PlannerInfo * root, Bitmapset * outer_relids) Line 1861 C

postgres.exe!build_index_paths(PlannerInfo * root, RelOptInfo * rel, IndexOptInfo * index, IndexClauseSet * clauses, char useful_predicate, SaOpControl saop_control, ScanTypeControl scantype) Line 938 C

postgres.exe!get_index_paths(PlannerInfo * root, RelOptInfo * rel, IndexOptInfo * index, IndexClauseSet * clauses, List * * bitindexpaths) Line 745 C

postgres.exe!get_join_index_paths(PlannerInfo * root, RelOptInfo * rel, IndexOptInfo * index, IndexClauseSet * rclauseset, IndexClauseSet * jclauseset, IndexClauseSet * eclauseset, List * * bitindexpaths, Bitmapset * relids, List * * considered_relids) Line 672 C

postgres.exe!consider_index_join_outer_rels(PlannerInfo * root, RelOptInfo * rel, IndexOptInfo * index, IndexClauseSet * rclauseset, IndexClauseSet * jclauseset, IndexClauseSet * eclauseset, List * * bitindexpaths, List * indexjoinclauses, int considered_clauses, List * * considered_relids) Line 585 C

postgres.exe!consider_index_join_clauses(PlannerInfo * root, RelOptInfo * rel, IndexOptInfo * index, IndexClauseSet * rclauseset, IndexClauseSet * jclauseset, IndexClauseSet * eclauseset, List * * bitindexpaths) Line 485 C

postgres.exe!create_index_paths(PlannerInfo * root, RelOptInfo * rel) Line 308 C

postgres.exe!set_plain_rel_pathlist(PlannerInfo * root, RelOptInfo * rel, RangeTblEntry * rte) Line 403 C

postgres.exe!set_rel_pathlist(PlannerInfo * root, RelOptInfo * rel, unsigned int rti, RangeTblEntry * rte) Line 337 C

postgres.exe!set_base_rel_pathlists(PlannerInfo * root) Line 223 C

postgres.exe!make_one_rel(PlannerInfo * root, List * joinlist) Line 157 C

postgres.exe!query_planner(PlannerInfo * root, List * tlist, void (PlannerInfo *, void *) * qp_callback, void * qp_extra) Line 236 C

postgres.exe!grouping_planner(PlannerInfo * root, double tuple_fraction) Line 1289 C

postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, char hasRecursion, double tuple_fraction, PlannerInfo * * subroot) Line 573 C

postgres.exe!standard_planner(Query * parse, int cursorOptions, ParamListInfoData * boundParams) Line 211 C

postgres.exe!planner(Query * parse, int cursorOptions, ParamListInfoData * boundParams) Line 139 C

postgres.exe!pg_plan_query(Query * querytree, int cursorOptions, ParamListInfoData * boundParams) Line 750 C

postgres.exe!ExplainOneQuery(Query * query, IntoClause * into, ExplainState * es, const char * queryString, ParamListInfoData * params) Line 330 C

postgres.exe!ExplainQuery(ExplainStmt * stmt, const char * queryString, ParamListInfoData * params, _DestReceiver * dest) Line 231 C

postgres.exe!standard_ProcessUtility(Node * parsetree, const char * queryString, ProcessUtilityContext context, ParamListInfoData * params, _DestReceiver * dest, char * completionTag) Line 647 C

postgres.exe!ProcessUtility(Node * parsetree, const char * queryString, ProcessUtilityContext context, ParamListInfoData * params, _DestReceiver * dest, char * completionTag) Line 314 C

postgres.exe!PortalRunUtility(PortalData * portal, Node * utilityStmt, char isTopLevel, _DestReceiver * dest, char * completionTag) Line 1195 C

postgres.exe!FillPortalStore(PortalData * portal, char isTopLevel) Line 1063 C

postgres.exe!PortalRun(PortalData * portal, long count, char isTopLevel, _DestReceiver * dest, _DestReceiver * altdest, char * completionTag) Line 790 C

postgres.exe!exec_simple_query(const char * query_string) Line 1052 C

postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username) Line 4012 C

postgres.exe!BackendRun(Port * port) Line 4113 C

postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4618 C

postgres.exe!main(int argc, char * * argv) Line 207 C

So my solution to this was just to add the code that strips out the eclass members that belong to the newly dead rel in join removal.

The only other solution I can think of would be to add a bitmap set of dead rels onto the PlannerInfo struct or perhaps just generate one and passing that in prohibited_rels in generate_implied_equalities_for_column (). I don't really care for this solution very much as it seems better to make the join removal code pay for this extra processing rather than (probably) most queries.

Of course this is my problem as I'm unable to create the same situation with the existing left join removals. The point here is more to justify why I added code to strip eclass members of dead rels.

Any thoughts? Or arguments against me keeping the code that strips out the eclass members of dead rels?

Regards

David Rowley

Attachment

inner_join_removals_2014-09-16_2bcc9ba.patch

Re: Patch to support SEMI and ANTI join removal

From

Heikki Linnakangas

Date:

25 September 2014, 12:36:48

On 09/16/2014 01:20 PM, David Rowley wrote:
> +    /*
> +     * We mustn't allow any joins to be removed if there are any pending
> +     * foreign key triggers in the queue. This could happen if we are planning
> +     * a query that has been executed from within a volatile function and the
> +     * query which called this volatile function has made some changes to a
> +     * table referenced by a foreign key. The reason for this is that any
> +     * updates to a table which is referenced by a foreign key constraint will
> +     * only have the referencing tables updated after the command is complete,
> +     * so there is a window of time where records may violate the foreign key
> +     * constraint.
> +     *
> +     * Currently this code is quite naive, as we won't even attempt to remove
> +     * the join if there are *any* pending foreign key triggers, on any
> +     * relation. It may be worthwhile to improve this to check if there's any
> +     * pending triggers for the referencing relation in the join.
> +     */
> +    if (!AfterTriggerQueueIsEmpty())
> +        return false;

Hmm. This code runs when the query is planned. There is no guarantee 
that there won't be after triggers pending when the query is later 
*executed*.

- Heikki

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

28 September 2014, 04:32:35

On Fri, Sep 26, 2014 at 12:36 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 09/16/2014 01:20 PM, David Rowley wrote:
+ /*
+ * We mustn't allow any joins to be removed if there are any pending
+ * foreign key triggers in the queue. This could happen if we are planning
+ * a query that has been executed from within a volatile function and the
+ * query which called this volatile function has made some changes to a
+ * table referenced by a foreign key. The reason for this is that any
+ * updates to a table which is referenced by a foreign key constraint will
+ * only have the referencing tables updated after the command is complete,
+ * so there is a window of time where records may violate the foreign key
+ * constraint.
+ *
+ * Currently this code is quite naive, as we won't even attempt to remove
+ * the join if there are *any* pending foreign key triggers, on any
+ * relation. It may be worthwhile to improve this to check if there's any
+ * pending triggers for the referencing relation in the join.
+ */
+ if (!AfterTriggerQueueIsEmpty())
+ return false;

Hi Heikki,

Thanks for having a look at the patch.

Hmm. This code runs when the query is planned. There is no guarantee that there won't be after triggers pending when the query is later *executed*.

Please correct anything that sounds wrong here, but my understanding is that we'll always plan a query right before we execute it, with the exception of PREPARE statements where PostgreSQL will cache the query plan when the prepare statement is first executed. So I think you may have a point here regarding PREPARE'd statements, but I think that it is isolated to those.

I think in all other cases we'll plan right before we execute. So if we happen to be planning an UPDATE statement which has a sub-query that perform some INNER JOINs, I think we're safe to remove INNER JOINs when possible, as the UPDATE statement won't get visibility of its own changes.

We can see that here:

create table updatetest (id int primary key, value int, value2 int);

create or replace function getvalue2(p_id int) returns int

as $$select value2 from updatetest where id = p_id$$

language sql volatile;

insert into updatetest values(0,0,0);

insert into updatetest values(1,10,10);

insert into updatetest values(2,20,20);

insert into updatetest values(3,30,30);

update updatetest set value = COALESCE((select value from updatetest u2 where updatetest.id - 1 = u2.id) + 1,0);

update updatetest set value2 = COALESCE(getvalue2(id - 1) + 1,0);

select * from updatetest;

id | value | value2

----+-------+--------

0 | 0 | 0

1 | 1 | 1

2 | 11 | 2

3 | 21 | 3

The value column appears to have been set based on the value that was previously in the value column, and has not come from the newly set value. The behaviour is different for the value2 column as the value for this has been fetched from another query, which *does* see the newly updated value stored in the value2 column.

My understanding of foreign keys is that any pending foreign key triggers will be executed just before the query completes, so we should only ever encounter pending foreign key triggers during planning when we're planning a query that's being executed from somewhere like a volatile function or trigger function, if the outer query has updated or deleted some records which are referenced by a foreign key.

So I think with the check for pending triggers at planning time this is safe at least for queries being planned right before they're executed, but you've caused me to realise that I'll probably need to do some more work on this for when it comes to PREPARE'd queries, as it looks like if we executed a prepared query from inside a volatile function or trigger function that was called from a DELETE or UPDATE statement that caused foreign key triggers to be queued, and we'd happened to have removed some INNER JOINs when we originally planned that prepare statement, then that would be wrong.

The only thing that comes to mind to fix that right now is to tag something maybe in PlannerInfo to say if we've removed any INNER JOINs in planning, then when we execute a prepared statement we could void the cached plan we see that some INNER JOINs were removed, but only do this if the foreign key trigger queue has pending triggers. (which will hopefully not be very often).

Another thing that comes to mind which may be similar is how we handle something like:

PREPARE a AS SELECT * from tbl WHERE name LIKE $1;

Where, if $1 is 'foo' or 'foo%' we might want to use an index scan, but if $1 was '%foo' then we'd probably not.

I've not yet looked into great detail of what happens here, but from some quick simple tests it seems to replan each time!? But perhaps that'd due to the parameter, where with my other tests the PREPARE statement had no parameters.

There was some other discussion relating to some of this over here-> http://www.postgresql.org/message-id/20140603235053.GA351732@tornado.leadboat.com

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

28 September 2014, 13:41:23

On 2014-09-28 17:32:21 +1300, David Rowley wrote:
> My understanding of foreign keys is that any pending foreign key triggers
> will be executed just before the query completes, so we should only ever
> encounter pending foreign key triggers during planning when we're planning
> a query that's being executed from somewhere like a volatile function or
> trigger function, if the outer query has updated or deleted some records
> which are referenced by a foreign key.

Note that foreign key checks also can be deferred. So the window for
these cases is actually larger.

> So I think with the check for pending triggers at planning time this is
> safe at least for queries being planned right before they're executed, but
> you've caused me to realise that I'll probably need to do some more work on
> this for when it comes to PREPARE'd queries, as it looks like if we
> executed a prepared query from inside a volatile function or trigger
> function that was called from a DELETE or UPDATE statement that caused
> foreign key triggers to be queued, and we'd happened to have removed some
> INNER JOINs when we originally planned that prepare statement, then that
> would be wrong.

I'm wondering whether this wouldn't actually be better handled by some
sort of 'one time filter' capability for joins. When noticing during
planning that one side of the join is nullable attach a check to the
join node. Then, whenever that check returns true, skip checking one
side of the join and return rows without looking at that side.

That capability might also be interesting for more efficient planning of
left joins that partially have a constant join expression.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

28 September 2014, 14:42:08

David Rowley <dgrowleyml@gmail.com> writes:
> Please correct anything that sounds wrong here, but my understanding is
> that we'll always plan a query right before we execute it, with the
> exception of PREPARE statements where PostgreSQL will cache the query plan
> when the prepare statement is first executed.

If this optimization only works in that scenario, it's dead in the water,
because that assumption is unsupportable.  The planner does not in general
use the same query snapshot as the executor, so even in an immediate-
execution workflow there could have been data changes (caused by other
transactions) between planning and execution.

Why do you need such an assumption?
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

29 September 2014, 09:43:08

On Mon, Sep 29, 2014 at 2:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-09-28 17:32:21 +1300, David Rowley wrote:
> My understanding of foreign keys is that any pending foreign key triggers
> will be executed just before the query completes, so we should only ever
> encounter pending foreign key triggers during planning when we're planning
> a query that's being executed from somewhere like a volatile function or
> trigger function, if the outer query has updated or deleted some records
> which are referenced by a foreign key.

Note that foreign key checks also can be deferred. So the window for
these cases is actually larger.

Thanks Andres, I know you had said this before but I had previously failed to realise exactly what you meant. I thought you were talking about defining a foreign key to reference a column that has a deferrable unique index. I now realise you were talking about making the foreign key itself as deferrable. I've made a change to the patch locally to ignore foreign keys that are marked as deferrable.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

29 September 2014, 09:49:52

On 2014-09-28 10:41:56 -0400, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > Please correct anything that sounds wrong here, but my understanding is
> > that we'll always plan a query right before we execute it, with the
> > exception of PREPARE statements where PostgreSQL will cache the query plan
> > when the prepare statement is first executed.
> 
> If this optimization only works in that scenario, it's dead in the water,
> because that assumption is unsupportable.  The planner does not in general
> use the same query snapshot as the executor, so even in an immediate-
> execution workflow there could have been data changes (caused by other
> transactions) between planning and execution.

I don't think the effects of other queries are the problem here. The
effect of other backend's deferred FK checks shouldn't matter for other
backends for normal query purposes. It's the planning backend that might
have deferred checks and thus temporarily violated foreign keys.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

29 September 2014, 11:42:24

On 2014-09-29 22:42:57 +1300, David Rowley wrote:
> On Mon, Sep 29, 2014 at 2:41 AM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > On 2014-09-28 17:32:21 +1300, David Rowley wrote:
> > > My understanding of foreign keys is that any pending foreign key triggers
> > > will be executed just before the query completes, so we should only ever
> > > encounter pending foreign key triggers during planning when we're
> > planning
> > > a query that's being executed from somewhere like a volatile function or
> > > trigger function, if the outer query has updated or deleted some records
> > > which are referenced by a foreign key.
> >
> > Note that foreign key checks also can be deferred. So the window for
> > these cases is actually larger.
> >
> >
> Thanks Andres, I know you had said this before but I had previously failed
> to realise exactly what you meant. I thought you were talking about
> defining a foreign key to reference a column that has a deferrable unique
> index. I now realise you were talking about making the foreign key itself
> as deferrable.

Oh. Don't remember doing that ;)

> I've made a change to the patch locally to ignore foreign
> keys that are marked as deferrable.

I have serious doubts about the general usefulness if this is onlyu
going to be useable in a very restricted set of circumstances (only one
time plans, no deferrable keys). I think it'd be awesome to have the
capability, but I don't think it's ok to restrict it that much.

To me that means you can't make the decision at plan time, but need to
move it to execution time. It really doesn't sound that hard to short
circuit the semi joins whenever, at execution time, there's no entries
in the deferred trigger queue. It's a bit annoying to have to add code
to all of nestloop/hashjoin/mergejoin to not check the outer tuple if
there's no such entry. But I don't think it'll be too bad. That'd mean
it can be used in prepared statements.

What I think would be a bit finnicky is the costing...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

29 September 2014, 14:12:43

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-09-28 10:41:56 -0400, Tom Lane wrote:
>> If this optimization only works in that scenario, it's dead in the water,
>> because that assumption is unsupportable.  The planner does not in general
>> use the same query snapshot as the executor, so even in an immediate-
>> execution workflow there could have been data changes (caused by other
>> transactions) between planning and execution.

> I don't think the effects of other queries are the problem here. The
> effect of other backend's deferred FK checks shouldn't matter for other
> backends for normal query purposes. It's the planning backend that might
> have deferred checks and thus temporarily violated foreign keys.

I see.  So why aren't we simply ignoring deferrable FKs when making the
optimization?  That pushes it back from depending on execution-time state
(unsafe) to depending on table DDL (safe).
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

29 September 2014, 14:28:22

On 2014-09-29 10:12:25 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-09-28 10:41:56 -0400, Tom Lane wrote:
> >> If this optimization only works in that scenario, it's dead in the water,
> >> because that assumption is unsupportable.  The planner does not in general
> >> use the same query snapshot as the executor, so even in an immediate-
> >> execution workflow there could have been data changes (caused by other
> >> transactions) between planning and execution.
> 
> > I don't think the effects of other queries are the problem here. The
> > effect of other backend's deferred FK checks shouldn't matter for other
> > backends for normal query purposes. It's the planning backend that might
> > have deferred checks and thus temporarily violated foreign keys.
> 
> I see.  So why aren't we simply ignoring deferrable FKs when making the
> optimization?  That pushes it back from depending on execution-time state
> (unsafe) to depending on table DDL (safe).

IIRC there's some scenarios where violated FKs are visible to client
code for nondeferrable ones as well. Consider e.g. cascading foreign
keys + triggers. Or, somewhat insane, operators used in fkey triggers
that execute queries themselves.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Tom Lane

Date:

29 September 2014, 14:43:43

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-09-29 10:12:25 -0400, Tom Lane wrote:
>> I see.  So why aren't we simply ignoring deferrable FKs when making the
>> optimization?  That pushes it back from depending on execution-time state
>> (unsafe) to depending on table DDL (safe).

> IIRC there's some scenarios where violated FKs are visible to client
> code for nondeferrable ones as well. Consider e.g. cascading foreign
> keys + triggers. Or, somewhat insane, operators used in fkey triggers
> that execute queries themselves.

Yeah, I had just thought of the query-in-function-called-from-violating-
query case myself.  I plead insufficient caffeine :-(.  I'd been making
a mental analogy to non-deferred uniqueness constraints, but actually
what we will optimize outer joins on is "immediate" unique indexes,
wherein there's no delay at all before the constraint is checked.
Too bad there's no equivalent in foreign key land.

These are certainly corner cases, but it doesn't seem up to project
quality standards to just ignore them.  So I'm thinking you're right
that a run-time short-circuit would be the only way to deal with the
case safely.

On the whole I'm feeling that the scope of applicability of this
optimization is going to be too narrow to justify the maintenance
effort and extra planning/runtime overhead.
        regards, tom lane

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

30 September 2014, 10:25:54

On Tue, Sep 30, 2014 at 12:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-09-29 22:42:57 +1300, David Rowley wrote:

> I've made a change to the patch locally to ignore foreign
> keys that are marked as deferrable.

I have serious doubts about the general usefulness if this is onlyu
going to be useable in a very restricted set of circumstances (only one
time plans, no deferrable keys). I think it'd be awesome to have the
capability, but I don't think it's ok to restrict it that much.

I had a look to see what Oracle does in this situation and I was quite shocked to see that they're blatantly just ignoring the fact that the foreign key is being deferred. I tested by deferring the foreign key in a transaction then updating the referenced record and I see that Oracle just return the wrong results as they're just blindly removing the join. So it appears that they've not solved this one very well.

To me that means you can't make the decision at plan time, but need to
move it to execution time. It really doesn't sound that hard to short
circuit the semi joins whenever, at execution time, there's no entries
in the deferred trigger queue. It's a bit annoying to have to add code
to all of nestloop/hashjoin/mergejoin to not check the outer tuple if
there's no such entry. But I don't think it'll be too bad. That'd mean
it can be used in prepared statements.

I'm starting to think about how this might be done, but I'm a bit confused and I don't know if it's something you've overlooked or something I've misunderstood.

I've not quite gotten my head around how we might stop the unneeded relation from being the primary path to join the other inner relations, i.e. what would stop the planner making a plan that hashed all the other relations and planned to perform a sequence scan on the relation that we have no need to scan (because the foreign key tells us the join is pointless). If we were not use use that relation then we'd just have a bunch of hash tables with no path to join them up. If we did anything to force the planner into creating a plan that would suit skipping relations, then we could possibly be throwing away the optimal plan..... Right?

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

30 September 2014, 11:01:56

On 2014-09-30 23:25:45 +1300, David Rowley wrote:
> On Tue, Sep 30, 2014 at 12:42 AM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > On 2014-09-29 22:42:57 +1300, David Rowley wrote:
> >
> > > I've made a change to the patch locally to ignore foreign
> > > keys that are marked as deferrable.
> >
> > I have serious doubts about the general usefulness if this is onlyu
> > going to be useable in a very restricted set of circumstances (only one
> > time plans, no deferrable keys). I think it'd be awesome to have the
> > capability, but I don't think it's ok to restrict it that much.
> >
> >
> I had a look to see what Oracle does in this situation and I was quite
> shocked to see that they're blatantly just ignoring the fact that the
> foreign key is being deferred. I tested by deferring the foreign key in a
> transaction then updating the referenced record and I see that Oracle just
> return the wrong results as they're just blindly removing the join. So it
> appears that they've not solved this one very well.

Ick. I'm pretty strongly against going that way.

> > To me that means you can't make the decision at plan time, but need to
> > move it to execution time. It really doesn't sound that hard to short
> > circuit the semi joins whenever, at execution time, there's no entries
> > in the deferred trigger queue. It's a bit annoying to have to add code
> > to all of nestloop/hashjoin/mergejoin to not check the outer tuple if
> > there's no such entry. But I don't think it'll be too bad. That'd mean
> > it can be used in prepared statements.
> >
> >
> I'm starting to think about how this might be done, but I'm a bit confused
> and I don't know if it's something you've overlooked or something I've
> misunderstood.
> 
> I've not quite gotten my head around how we might stop the unneeded
> relation from being the primary path to join the other inner relations,
> i.e. what would stop the planner making a plan that hashed all the other
> relations and planned to perform a sequence scan on the relation that we
> have no need to scan (because the foreign key tells us the join is
> pointless). If we were not use use that relation then we'd just have a
> bunch of hash tables with no path to join them up. If we did anything to
> force the planner into creating a plan that would suit skipping relations,
> then we could possibly be throwing away the optimal plan..... Right?

I'm not 100% sure I understand your problem description, but let me
describe how I think this would work. During planning, you'd emit the
exactly same plan as you'd today, with two exceptions:
a) When costing a node where one side of a join is very likely to be  removable, you'd cost it nearly as if there
wasn'ta join.
 
b) The planner would attach some sort of 'one time join qual' to the  'likely removable' join nodes. If, during
executorinit, that qual  returns false, simply don't perform the join. Just check the inner  relation, but entirely
skipthe outer relation.
 

With regard to your comment hash tables that aren't joined up: Currently
hash tables aren't built if they're not used. I.e. it's not
ExecInitHash() that does the hashing, but they're generally only built
when needed. E.g. nodeHashJoin.c:ExecHashJoin() only calls
MultiExecProcNode() when in the HJ_BUILD_HASHTABLE state - which it only
initially and sometimes after rescans is.

Does that clear things up or have I completely missed your angle?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

30 September 2014, 12:03:49

On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-09-30 23:25:45 +1300, David Rowley wrote:
>
> I've not quite gotten my head around how we might stop the unneeded
> relation from being the primary path to join the other inner relations,
> i.e. what would stop the planner making a plan that hashed all the other
> relations and planned to perform a sequence scan on the relation that we
> have no need to scan (because the foreign key tells us the join is
> pointless). If we were not use use that relation then we'd just have a
> bunch of hash tables with no path to join them up. If we did anything to
> force the planner into creating a plan that would suit skipping relations,
> then we could possibly be throwing away the optimal plan..... Right?

I'm not 100% sure I understand your problem description, but let me
describe how I think this would work. During planning, you'd emit the
exactly same plan as you'd today, with two exceptions:
a) When costing a node where one side of a join is very likely to be
removable, you'd cost it nearly as if there wasn't a join.

Ok given the tables:

create table t1 (x int primary key);

create table t2 (y int primary key);

suppose the planner came up with something like:

test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;

QUERY PLAN

----------------------------

Hash Join

Hash Cond: (t1.x = t2.y)

-> Seq Scan on t1

-> Hash

-> Seq Scan on t2

If we had a foreign key...

alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);

...the join to t1 could possibly be "ignored" by the executor... but there's a problem as the plan states we're going to seqscan then hash that relation, then seqscan t1 with a hash lookup on each of t1's rows. In this case how would the executor skip the scan on t1? I can see how this might work if it was t2 that we were removing, as we'd just skip the hash lookup part in the hash join node.

b) The planner would attach some sort of 'one time join qual' to the
'likely removable' join nodes. If, during executor init, that qual
returns false, simply don't perform the join. Just check the inner
relation, but entirely skip the outer relation.

I think in the example that I've given above that the relation we'd want to keep is the outer relation, we'd want to throw away the inner one, but I can't quite see how that would work.

Hopefully that makes sense.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

30 September 2014, 12:35:16

On 2014-10-01 01:03:35 +1300, David Rowley wrote:
> On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> 
> > On 2014-09-30 23:25:45 +1300, David Rowley wrote:
> > >
> > > I've not quite gotten my head around how we might stop the unneeded
> > > relation from being the primary path to join the other inner relations,
> > > i.e. what would stop the planner making a plan that hashed all the other
> > > relations and planned to perform a sequence scan on the relation that we
> > > have no need to scan (because the foreign key tells us the join is
> > > pointless). If we were not use use that relation then we'd just have a
> > > bunch of hash tables with no path to join them up. If we did anything to
> > > force the planner into creating a plan that would suit skipping
> > relations,
> > > then we could possibly be throwing away the optimal plan..... Right?
> >
> > I'm not 100% sure I understand your problem description, but let me
> > describe how I think this would work. During planning, you'd emit the
> > exactly same plan as you'd today, with two exceptions:
> > a) When costing a node where one side of a join is very likely to be
> >    removable, you'd cost it nearly as if there wasn't a join.
> >
> 
> Ok given the tables:
> create table t1 (x int primary key);
> create table t2 (y int primary key);
> 
> suppose the planner came up with something like:
> 
> test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;
>          QUERY PLAN
> ----------------------------
>  Hash Join
>    Hash Cond: (t1.x = t2.y)
>    ->  Seq Scan on t1
>    ->  Hash
>          ->  Seq Scan on t2
> 
> If we had a foreign key...
> 
> alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);
> 
> ...the join to t1 could possibly be "ignored" by the executor... but
> there's a problem as the plan states we're going to seqscan then hash that
> relation, then seqscan t1 with a hash lookup on each of t1's rows. In this
> case how would the executor skip the scan on t1? I can see how this might
> work if it was t2 that we were removing, as we'd just skip the hash lookup
> part in the hash join node.

Hm, right. But that doesn't seem like a fatal problem to me. The planner
knows about t1/t2 and Seq(t1), Seq(t2), not just Hash(Seq(t2)). So it
can tell the HashJoin node that when the 'shortcut' qualifier is true,
it should source everything from Seq(t2). Since the sequence scan
doesn't care about the node ontop that doesn't seem to be overly
dramatic?
Obviously reality makes this a bit more complicated...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

06 October 2014, 09:57:32

On Wed, Oct 1, 2014 at 1:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-01 01:03:35 +1300, David Rowley wrote:
> On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>
> > On 2014-09-30 23:25:45 +1300, David Rowley wrote:
> > >
> > > I've not quite gotten my head around how we might stop the unneeded
> > > relation from being the primary path to join the other inner relations,
> > > i.e. what would stop the planner making a plan that hashed all the other
> > > relations and planned to perform a sequence scan on the relation that we
> > > have no need to scan (because the foreign key tells us the join is
> > > pointless). If we were not use use that relation then we'd just have a
> > > bunch of hash tables with no path to join them up. If we did anything to
> > > force the planner into creating a plan that would suit skipping
> > relations,
> > > then we could possibly be throwing away the optimal plan..... Right?
> >
> > I'm not 100% sure I understand your problem description, but let me
> > describe how I think this would work. During planning, you'd emit the
> > exactly same plan as you'd today, with two exceptions:
> > a) When costing a node where one side of a join is very likely to be
> > removable, you'd cost it nearly as if there wasn't a join.
> >
>
> Ok given the tables:
> create table t1 (x int primary key);
> create table t2 (y int primary key);
>
> suppose the planner came up with something like:
>
> test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;
> QUERY PLAN
> ----------------------------
> Hash Join
> Hash Cond: (t1.x = t2.y)
> -> Seq Scan on t1
> -> Hash
> -> Seq Scan on t2
>
> If we had a foreign key...
>
> alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);
>
> ...the join to t1 could possibly be "ignored" by the executor... but
> there's a problem as the plan states we're going to seqscan then hash that
> relation, then seqscan t1 with a hash lookup on each of t1's rows. In this
> case how would the executor skip the scan on t1? I can see how this might
> work if it was t2 that we were removing, as we'd just skip the hash lookup
> part in the hash join node.

Hm, right. But that doesn't seem like a fatal problem to me. The planner
knows about t1/t2 and Seq(t1), Seq(t2), not just Hash(Seq(t2)). So it
can tell the HashJoin node that when the 'shortcut' qualifier is true,
it should source everything from Seq(t2). Since the sequence scan
doesn't care about the node ontop that doesn't seem to be overly
dramatic?
Obviously reality makes this a bit more complicated...

Ok, after a bit of study on the hash join code I can see what you mean, it shouldn't really matter.

I've started working on this now and I've made some changes in analyzejoins.c so that instead of removing joins, it just marks the RangeTblEntry, setting a new flag named skipJoinPossible to true. (I'll think of a better name later)

I'm starting off with hash joins and I'm hacking a bit at ExecInitHashJoin. I want to add 2 bool flags to HashJoinState, outerIsRequired and innerIsRequired. I think if both of these flags are set then we can just abort the join altogether. Though in order to set these flags I need to identify which relations are for the outer and which are for the inner side of the join. I've got the logic for that only partially worked out. My understanding of it so far is that I just need to look at the hjstate->js.ps.lefttree and righttree. Inner being right, and outer being left. I'm a little stuck on more complex cases where the scan is nested deeper in the tree and I'm not quite sure on the logic I should be using to navigate to it.

Take the following plan: (which I've amended to mark the left and right nodes)

explain select t1.* from t1 inner join t2 on t1.t2_id= t2.id inner join t3 on t2.t3_id=t3.id;

QUERY PLAN

------------------------------------------------------------------------

Hash Join (cost=122.15..212.40 rows=2140 width=8)

Hash Cond: (t2.t3_id = t3.id)

-> Hash Join (cost=58.15..118.98 rows=2140 width=12) (left node)

Hash Cond: (t1.t2_id = t2.id)

-> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=8) (left node)

-> Hash (cost=31.40..31.40 rows=2140 width=8) (right node)

-> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=8) (left node)

-> Hash (cost=34.00..34.00 rows=2400 width=4) (right node)

-> Seq Scan on t3 (cost=0.00..34.00 rows=2400 width=4) (left node)

The schema is set up in such a way that the joins to t2 and t3 can be... "skipped", and my new code in analyzejoin.c marks the RangeTblEntry records for this relation to reflect that.

During ExecInitHashJoin for the join between t2 and t3, if I want to find t2 in the tree, I'd need to do hjstate->js.ps.lefttree->righttree->lefttree... (which I know just from looking at the explain output) I just can't work out the logic behind where the scan node will actually be. At first I had thought something like, loop down the lefttree path until I reach a deadend, and that's the outer scan node, but in this case there's a right turn in there too, so that won't work. If I keep going down the left path I'd end up at t1, which is completely wrong.

Can anyone shed any light on how I might determine where the scan rel is in the tree? I need to find it so I can check if the RangeTblEntry is marked as skip-able.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Robert Haas

Date:

06 October 2014, 14:46:17

On Mon, Oct 6, 2014 at 5:57 AM, David Rowley <dgrowleyml@gmail.com> wrote:
>> Hm, right. But that doesn't seem like a fatal problem to me. The planner
>> knows about t1/t2 and Seq(t1), Seq(t2), not just Hash(Seq(t2)). So it
>> can tell the HashJoin node that when the 'shortcut' qualifier is true,
>> it should source everything from Seq(t2). Since the sequence scan
>> doesn't care about the node ontop that doesn't seem to be overly
>> dramatic?
>> Obviously reality makes this a bit more complicated...
>>
> Ok, after a bit of study on the hash join code I can see what you mean, it
> shouldn't really matter.
> I've started working on this now and I've made some changes in
> analyzejoins.c so that instead of removing joins, it just marks the
> RangeTblEntry, setting a new flag named skipJoinPossible to true. (I'll
> think of a better name later)
>
> I'm starting off with hash joins and I'm hacking a bit at ExecInitHashJoin.
> I want to add 2 bool flags to HashJoinState, outerIsRequired and
> innerIsRequired. I think if both of these flags are set then we can just
> abort the join altogether. Though in order to set these flags I need to
> identify which relations are for the outer and which are for the inner side
> of the join. I've got the logic for that only partially worked out. My
> understanding of it so far is that I just need to look at the
> hjstate->js.ps.lefttree and righttree. Inner being right, and outer being
> left. I'm a little stuck on more complex cases where the scan is nested
> deeper in the tree and I'm not quite sure on the logic I should be using to
> navigate to it.
>
> Take the following plan: (which I've amended to mark the left and right
> nodes)
>
> explain select t1.* from t1 inner join t2 on t1.t2_id= t2.id inner join t3
> on t2.t3_id=t3.id;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=122.15..212.40 rows=2140 width=8)
>    Hash Cond: (t2.t3_id = t3.id)
>    ->  Hash Join  (cost=58.15..118.98 rows=2140 width=12) (left node)
>          Hash Cond: (t1.t2_id = t2.id)
>          ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=8) (left
> node)
>          ->  Hash  (cost=31.40..31.40 rows=2140 width=8) (right node)
>                ->  Seq Scan on t2  (cost=0.00..31.40 rows=2140 width=8)
> (left node)
>    ->  Hash  (cost=34.00..34.00 rows=2400 width=4) (right node)
>          ->  Seq Scan on t3  (cost=0.00..34.00 rows=2400 width=4) (left
> node)
>
> The schema is set up in such a way that the joins to t2 and t3 can be...
> "skipped", and my new code in analyzejoin.c marks the RangeTblEntry records
> for this relation to reflect that.
>
> During ExecInitHashJoin for the join between t2 and t3, if I want to find t2
> in the tree, I'd need to do hjstate->js.ps.lefttree->righttree->lefttree...
> (which I know just from looking at the explain output) I just can't work out
> the logic behind where the scan node will actually be. At first I had
> thought something like, loop down the lefttree path until I reach a deadend,
> and that's the outer scan node, but in this case there's a right turn in
> there too, so that won't work. If I keep going down the left path I'd end up
> at t1, which is completely wrong.
>
> Can anyone shed any light on how I might determine where the scan rel is in
> the tree? I need to find it so I can check if the RangeTblEntry is marked as
> skip-able.

I think you're probably going to need logic that knows about
particular node types and does the right thing for each one.  I think
- but maybe I'm misunderstanding - what you're looking for is a
function of the form Oid ScansOnePlainTableWithoutQuals().  The
algorithm could be something like:

switch (node type)
{   case T_SeqScanState:       if (no quals)           return the_appropriate_oid;       return false;   case
T_HashJoin:      decide whether we can ignore one side of the join       fish out the node from the other side of the
join(including
 
reaching through the Hash node if necessary)       return ScansOnePlainTableWithoutQuals(the node we fished out);
...otherspecific cases...   default:       return false;
 
}

This seems messy, though.  Can't the deferred trigger queue become
non-empty at pretty much any point in time?  At exactly what point are
we making this decision, and how do we know the correct answer can't
change after that point?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

06 October 2014, 15:00:19

On 2014-10-06 10:46:09 -0400, Robert Haas wrote:
> This seems messy, though.  Can't the deferred trigger queue become
> non-empty at pretty much any point in time?  At exactly what point are
> we making this decision, and how do we know the correct answer can't
> change after that point?

What we've been talking about is doing this during executor startup. And
at that point we really don't care about new entries in the queue during
query execution - we can't see them anyway.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Robert Haas

Date:

06 October 2014, 15:05:35

On Mon, Oct 6, 2014 at 10:59 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-06 10:46:09 -0400, Robert Haas wrote:
>> This seems messy, though.  Can't the deferred trigger queue become
>> non-empty at pretty much any point in time?  At exactly what point are
>> we making this decision, and how do we know the correct answer can't
>> change after that point?
>
> What we've been talking about is doing this during executor startup. And
> at that point we really don't care about new entries in the queue during
> query execution - we can't see them anyway.

Ah, OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

08 October 2014, 11:21:56

On Tue, Oct 7, 2014 at 3:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 6, 2014 at 5:57 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> Can anyone shed any light on how I might determine where the scan rel is in
> the tree? I need to find it so I can check if the RangeTblEntry is marked as
> skip-able.

I think you're probably going to need logic that knows about
particular node types and does the right thing for each one. I think
- but maybe I'm misunderstanding - what you're looking for is a
function of the form Oid ScansOnePlainTableWithoutQuals(). The
algorithm could be something like:

switch (node type)
{
case T_SeqScanState:
if (no quals)
return the_appropriate_oid;
return false;
case T_HashJoin:
decide whether we can ignore one side of the join
fish out the node from the other side of the join (including
reaching through the Hash node if necessary)
return ScansOnePlainTableWithoutQuals(the node we fished out);
...other specific cases...
default:
return false;
}

Thanks Robert.

Ok, so I've been hacking away at this for a couple of evenings and I think I have a working prototype finally!

My earlier thoughts about having to descend down until I find a seqscan were wrong. It looks like just need to look at the next node down, if it's a seqscan and it's marked as not needed, then we can skip that side of the join, or if the child node is a HashJoinState then check the skip status of that node, if both sides are marked as not needed, then skip that side of the join.

I've just completed some simple performance tests:

create table t3 (id int primary key);

create table t2 (id int primary key, t3_id int not null references t3);

create table t1 (id int primary key, t2_id int not null references t2);

I've loaded these tables with 4 million rows each.The performance is as follows:

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;

count

---------

4000000

(1 row)

Time: 1022.492 ms

test=# select count(*) from t1;

count

---------

4000000

(1 row)

Time: 693.642 ms

test=# alter table t2 drop constraint t2_t3_id_fkey;

ALTER TABLE

Time: 2.141 ms

test=# alter table t1 drop constraint t1_t2_id_fkey;

ALTER TABLE

Time: 1.678 ms

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;

count

---------

4000000

(1 row)

Time: 11682.525 ms

So it seems it's not quite as efficient as join removal at planning time, but still a big win when it's possible to perform the join skipping.

As yet, I've only added support for hash joins, and I've not really looked into detail on what's needed for nested loop joins or merge joins.

For hash join I just added some code into the case HJ_BUILD_HASHTABLE: in ExecHashJoin(). The code just checks if any side can be skipped, if they can then the node will never move out of the HJ_BUILD_HASHTABLE state, so that each time ExecHashJoin() is called, it'll just return the next tuple from the non-skip side of the join until we run out of tuples... Or if both sides can be skipped NULL is returned as any nodes above this shouldn't attempt to scan, perhaps this should just be an Assert() as I don't think any parent nodes should ever bother executing that node if it's not required. The fact that I've put this code into the switch under HJ_BUILD_HASHTABLE makes me think it should add about close to zero overhead for when the join cannot be skipped.

One thing that we've lost out of this execution time join removal checks method is the ability to still remove joins where the join column is NULLable. The previous patch added IS NOT NULL to ensure the query was equivalent when a join was removed, of course this meant that any subsequent joins may later not have been removed due to the IS NOT NULL quals existing (which could restrict the rows and remove the ability that the foreign key could guarantee the existence of exactly 1 row matching the join condition). I've not yet come up with a nice way to reimplement these null checks at execution time, so I've thought that perhaps I won't bother, at least not for this patch. I'd just disable join skipping at planning time if any of the join columns can have NULLs.

Anyway this is just a progress report, and also to say thanks to Andres, you might just have saved this patch with the execution time checking idea.

I'll post a patch soon, hopefully once I have merge and nestloop join types working.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

08 October 2014, 11:40:19

On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> Ok, so I've been hacking away at this for a couple of evenings and I think
> I have a working prototype finally!

Cool!

> So it seems it's not quite as efficient as join removal at planning time,
> but still a big win when it's possible to perform the join skipping.

Have you checked where the overhead is? Is it really just the additional
node that the tuples are passed through?

Have you measured the overhead of the plan/execution time checks over
master?

> One thing that we've lost out of this execution time join removal checks
> method is the ability to still remove joins where the join column is
> NULLable. The previous patch added IS NOT NULL to ensure the query was
> equivalent when a join was removed, of course this meant that any
> subsequent joins may later not have been removed due to the IS NOT NULL
> quals existing (which could restrict the rows and remove the ability that
> the foreign key could guarantee the existence of exactly 1 row matching the
> join condition). I've not yet come up with a nice way to reimplement these
> null checks at execution time, so I've thought that perhaps I won't bother,
> at least not for this patch. I'd just disable join skipping at planning
> time if any of the join columns can have NULLs.

Sounds fair enough for the first round.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

15 October 2014, 10:03:17

On Thu, Oct 9, 2014 at 12:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> Ok, so I've been hacking away at this for a couple of evenings and I think
> I have a working prototype finally!

Cool!

Patch attached.

> So it seems it's not quite as efficient as join removal at planning time,
> but still a big win when it's possible to perform the join skipping.

Have you checked where the overhead is? Is it really just the additional
node that the tuples are passed through?

I've not checked this yet, but I'd assume that it has to be from the extra node. I'll run some profiles soon.

Have you measured the overhead of the plan/execution time checks over
master?

I did a bit of benchmarking last night, but this was mostly for testing that I've not added too much overhead on the nest loop code.

For the merge and hashjoin code I've managed to keep the special skipping code in the EXEC_MJ_INITIALIZE_OUTER and HJ_BUILD_HASHTABLE part of the main switch statement, so the extra checks should only be performed on the first call of the node when skips are not possible. For nested loop I can't see any other way but to pay the small price of setting the skipflags and checking if there are any skip flags on every call to the node.

I tested the overhead of this on my laptop by creating 2 tables with 1 million rows each, joining them on an INT column, where each value of the int column was unique. I seem to have added about a 2% overhead to this. :( Which I was quite surprised at, giving it's just 1000001 million extra settings if the skipflags and checking that the skip flags are not empty, but perhaps the extra local variable is causing something else to not make it into a register. At the moment I can't quite see another way to do it, but I guess it may not be the end of the world as the chances of having to perform a nest loop join on 2 tables of that size is probably not that high.

Test case:

create table t3 (id int primary key);

create table t2 (id int primary key, t3_id int not null references t3);

create table t1 (id int primary key, t2_id int not null references t2);

insert into t3 select x.x from generate_series(1,1000000) x(x);

insert into t2 select x.x,x.x from generate_series(1,1000000) x(x);

insert into t1 select x.x,x.x from generate_series(1,1000000) x(x);

vacuum;

set enable_hashjoin = off;

set enable_mergejoin = off;

select count(*) from t1 inner join t2 on t1.id=t2.id;

Unpatched:

duration: 120 s

number of transactions actually processed: 45

latency average: 2666.667 ms

tps = 0.371901 (including connections establishing)

tps = 0.371965 (excluding connections establishing)

Patched:

Master

duration: 120 s

number of transactions actually processed: 44

latency average: 2727.273 ms

tps = 0.363933 (including connections establishing)

tps = 0.363987 (excluding connections establishing)

102.19%

Of course if we do the join on the column that has the foreign key, then this is much faster.

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id;

count

---------

1000000

(1 row)

Time: 105.206 ms

The explain analyze from the above query looks like:

test=# explain (analyze, costs off, timing off) select count(*) from t1 inner join t2 on t1.t2_id=t2.id;

QUERY PLAN

------------------------------------------------------------------

Aggregate (actual rows=1 loops=1)

-> Nested Loop (actual rows=1000000 loops=1)

-> Seq Scan on t1 (actual rows=1000000 loops=1)

-> Index Only Scan using t2_pkey on t2 (never executed)

Index Cond: (id = t1.t2_id)

Heap Fetches: 0

Execution time: 124.990 ms

(7 rows)

As you can see the scan on t2 never occurred.

I've so far only managed to come up with 1 useful regression test for this new code. It's not possible to tell if the removal has taken place at plan time, as the plan looks the same as if it didn't get removed. The only way to tell us from the explain analyze, but the problem is that the execution time is shown and can't be removed. Perhaps I should modify the explain to tag join nodes that the planner has marked to say skipping may be possible? But this is not really ideal as it's only the join nodes that know about skipping and they just don't bother executing the child nodes, so it's really up to the executor to decide which child nodes don't get called, so to add something to explain might require making it more smart than it needs to be.

Right now I'm not quite sure if I should modify any costs. Quite possibly hash joins could have the costing reduced a little to try to encourage hashjoins over merge joins, as with merge joins we can't skip sort operations, but with hash joins we can skip the hash table build.

Regards

David Rowley

Attachment

inner_join_removals_2014-10-15_0f3f1ea.patch

Re: Patch to support SEMI and ANTI join removal

From

Simon Riggs

Date:

15 November 2014, 21:09:33

On 15 October 2014 11:03, David Rowley <dgrowleyml@gmail.com> wrote:

> The explain analyze from the above query looks like:
> test=# explain (analyze, costs off, timing off) select count(*) from t1
> inner join t2 on t1.t2_id=t2.id;
>                             QUERY PLAN
> ------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>    ->  Nested Loop (actual rows=1000000 loops=1)
>          ->  Seq Scan on t1 (actual rows=1000000 loops=1)
>          ->  Index Only Scan using t2_pkey on t2 (never executed)
>                Index Cond: (id = t1.t2_id)
>                Heap Fetches: 0
>  Execution time: 124.990 ms
> (7 rows)
>
> As you can see the scan on t2 never occurred.

Very good, happy to see this happening (yay FKs!) and with
PostgreSQL-style rigour.

I've reviewed the patch from cold and I have a few comments.

The plan you end up with here works quite differently from left outer
join removal, where the join is simply absent. That inconsistency
causes most of the other problems I see.

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

>                             QUERY PLAN
> ------------------------------------------------------------------
>  Aggregate (actual rows=1 loops=1)
>          ->  Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

The alternative is accepting some pretty horrible additional code in
most join types, plus a small regression on nested loop joins which I
would have to call out as regrettably unacceptable. (Horrible in this
sense that we don't want that code, not that David's code is poor).

The tests on the patch are pretty poor. If we should use EXPLAINs to
show a join removal that works and a join removal that fails. With a
few of the main permutations.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

15 November 2014, 23:19:37

On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 15 October 2014 11:03, David Rowley <dgrowleyml@gmail.com> wrote:

> The explain analyze from the above query looks like:
> test=# explain (analyze, costs off, timing off) select count(*) from t1
> inner join t2 on t1.t2_id=t2.id;
> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Nested Loop (actual rows=1000000 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)
> -> Index Only Scan using t2_pkey on t2 (never executed)
> Index Cond: (id = t1.t2_id)
> Heap Fetches: 0
> Execution time: 124.990 ms
> (7 rows)
>
> As you can see the scan on t2 never occurred.

Very good, happy to see this happening (yay FKs!) and with
PostgreSQL-style rigour.

I've reviewed the patch from cold and I have a few comments.

Thanks!

The plan you end up with here works quite differently from left outer
join removal, where the join is simply absent. That inconsistency
causes most of the other problems I see.

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

This sounds much cleaner than what I have at the moment, although, you say EXPLAIN would look like that... I don't think that's quite true as the EXPLAIN still would have the un-pruned version, as the pruning would be done as executor start-up. Would it cause problems to have the EXPLAIN have a different looking plan than EXPLAIN ANALYZE?

I'll need to look into how the plan is stored in the case of PREPARE statements, as no doubt I can't go vandalising any plans that are stored in the PREPARE hashtable. I'd need to make a copy first, unless that's already done for me. But I guess I'd only have to do that if some flag on PlannerInfo hasSkippableNodes was true, so likely the overhead of such a copy would be regained by skipping some joins.

The alternative is accepting some pretty horrible additional code in
most join types, plus a small regression on nested loop joins which I
would have to call out as regrettably unacceptable. (Horrible in this
sense that we don't want that code, not that David's code is poor).

Yeah it is quite horrid. I did try and keep it as simple and as non-invasive as possible, but for nest loop it seemed there was just no better way.

The tests on the patch are pretty poor. If we should use EXPLAINs to
show a join removal that works and a join removal that fails. With a
few of the main permutations.

Agreed. To be honest I abandoned the tests due to a problem with EXPLAIN ANALYZE outputting the variable timing information at the bottom. There's no way to disable this! So that makes testing much harder.

I added myself to the list of complainers over here -> http://www.postgresql.org/message-id/CAApHDvoVzBTzLJbD9VfaznWo6jooK1k6-7rFQ8zYM9H7ndCcSA@mail.gmail.com but the proposed solution (diff tool which supports regex matching) is not all that simple, and I've not gotten around to attempting to make one yet.

I've also attached a rebased patch, as the old one is no longer applying.

Regards

David Rowley

Attachment

inner_join_removals_2014-11-16_3a40b4f.patch

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

19 November 2014, 10:49:17

On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

This sounds much cleaner than what I have at the moment, although, you say EXPLAIN would look like that... I don't think that's quite true as the EXPLAIN still would have the un-pruned version, as the pruning would be done as executor start-up. Would it cause problems to have the EXPLAIN have a different looking plan than EXPLAIN ANALYZE?

Oops, It seems you're right about the EXPLAIN output. I had not previously realised that plain old EXPLAIN would initialise the plan. It's nice to see that I'll get my old tests working again!

I've been hacking away at this, and I've now got a function which "implodes" the plan down to just what is required, I'm just calling this function is there are no pending foreign key triggers.

Writing this has made me realise that I may need to remove the functionality that I've added to the planner which, after it removes 1 inner join, it puts that relation in an "ignore list" and tries again to remove other relations again, but this time ignoring any vars from ignored relations. The problem I see with this is that, with a plan such as:

Hash Join

Hash Cond: (t1.id = t4.id)

-> Hash Join

Hash Cond: (t1.id = t3.id)

-> Hash Join

Hash Cond: (t1.id = t2.id)

-> Seq Scan on t1

-> Hash

-> Seq Scan on t2

-> Hash

-> Seq Scan on t3

-> Hash

-> Seq Scan on t4

If t1 and t4 are marked as "can remove", then the code that "implodes" plan to remove the nodes which are no longer required would render the plan a bit useless as there's no join between t2 and t3, we'd need to keep t1 in this case, even though non of it's Vars are required. Perhaps I could fix this by writing some more intelligent code which would leave joins in place in this situation, and maybe I could coerce the planner into not producing plans like this by lowering the costs of joins where 1 of the relations could be removed. Andres did mention lowering costs previously, but at the time I'd not realised why it was required.

I'm also a little concerned around Merge Joins, as if I removed a Merge Join, because one of the relations was not required, and just left, say the SeqScan node for the other relation in place of the Merge Join, then I'd need to somehow check that none of the parent nodes were expecting some specific sort order. Perhaps I could just always leave any Sort node in place, if it existed, and just put the scan below that, but it all feels a bit like executor performing voodoo on the plan... i.e. just feels like a little bit more than the executor should know about plans. I'm a bit worried that I could spend a week on this and Tom or someone else then comes along and throws it out.

So I'm really just looking for some confirmation to if this is a good or bad idea, based on the discoveries I've explained above. I really want to see this stuff working, but at the same time don't want to waste time on it if it's never going to be committed.

Regards

David Rowley

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

23 November 2014, 11:23:56

On Wed, Nov 19, 2014 at 11:49 PM, David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

I've attached an updated patch which works in this way. All of the skipping code that I had added to the executor's join functions has now been removed.

Here's an example output with the plan trimmed, and then untrimmed.

set constraints b_c_id_fkey deferred;

explain (costs off) select b.* from b inner join c on b.c_id = c.id;

QUERY PLAN

---------------

Seq Scan on b

(1 row)

-- add a item to the trigger queue by updating a referenced record.

update c set id = 2 where id=1;

explain (costs off) select b.* from b inner join c on b.c_id = c.id;

QUERY PLAN

------------------------------

Hash Join

Hash Cond: (b.c_id = c.id)

-> Seq Scan on b

-> Hash

-> Seq Scan on c

(5 rows)

A slight quirk with the patch as it stands is that I'm unconditionally NOT removing Sort nodes that sit below a MergeJoin node. The reason for this is that I've not quite figured out a way to determine if the Sort order is required still.

An example of this can be seen in the regression tests:

-- check merge join nodes are removed properly

set enable_hashjoin = off;

-- this should remove joins to b and c.

explain (costs off)

select COUNT(*) from a inner join b on a.b_id = b.id left join c on a.id = c.id;

QUERY PLAN

---------------------------

Aggregate

-> Sort

Sort Key: a.b_id

-> Seq Scan on a

(4 rows)

As the patch stands there's still a couple of FIXMEs in there, so there's still a bit of work to do yet.

Comments are welcome

Regards

David Rowley

Attachment

inner_join_removals_2014-11-24_7cde1e4.patch

Re: Patch to support SEMI and ANTI join removal

From

Michael Paquier

Date:

13 February 2015, 07:52:32

On Sun, Nov 23, 2014 at 8:23 PM, David Rowley <dgrowleyml@gmail.com> wrote:

As the patch stands there's still a couple of FIXMEs in there, so there's still a bit of work to do yet.
Comments are welcome

Hm, if there is still work to do, we may as well mark this patch as rejected as-is, also because it stands in this state for a couple of months.
--

Michael

Re: Patch to support SEMI and ANTI join removal

From

Marko Tiikkaja

Date:

13 February 2015, 07:58:02

On 2/13/15 8:52 AM, Michael Paquier wrote:
> On Sun, Nov 23, 2014 at 8:23 PM, David Rowley <dgrowleyml@gmail.com> wrote:
>> As the patch stands there's still a couple of FIXMEs in there, so there's
>> still a bit of work to do yet.
>> Comments are welcome
>>
>
> Hm, if there is still work to do, we may as well mark this patch as
> rejected as-is, also because it stands in this state for a couple of months.

I didn't bring this up before, but I'm pretty sure this patch should be 
marked "returned with feedback".  From what I've understood, "rejected" 
means "we don't want this thing, not in this form or any other".  That 
doesn't seem to be the case for this patch, nor for a few others marked 
"rejected" in the currently in-progress commit fest.

.m

Re: Patch to support SEMI and ANTI join removal

From

Michael Paquier

Date:

13 February 2015, 08:06:19

<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Fri, Feb 13, 2015 at 4:57 PM, Marko
Tiikkaja<span dir="ltr"><<a href="mailto:marko@joh.to" target="_blank">marko@joh.to</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><divclass=""><div class="h5">On 2/13/15 8:52 AM, Michael Paquier wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">On Sun, Nov 23, 2014 at 8:23 PM, David Rowley <<a
href="mailto:dgrowleyml@gmail.com"target="_blank">dgrowleyml@gmail.com</a>> wrote:<br /><blockquote
class="gmail_quote"style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> As the
patchstands there's still a couple of FIXMEs in there, so there's<br /> still a bit of work to do yet.<br /> Comments
arewelcome<br /><br /></blockquote><br /> Hm, if there is still work to do, we may as well mark this patch as<br />
rejectedas-is, also because it stands in this state for a couple of months.<br /></blockquote><br /></div></div> I
didn'tbring this up before, but I'm pretty sure this patch should be marked "returned with feedback".  From what I've
understood,"rejected" means "we don't want this thing, not in this form or any other".  That doesn't seem to be the
casefor this patch, nor for a few others marked "rejected" in the currently in-progress commit fest.<span
class=""><fontcolor="#888888"><br /></font></span></blockquote></div><br /></div><div class="gmail_extra">In the new CF
app,marking a patch as "returned this feedback" adds it automatically to the next commit fest. And note that it is
actuallywhat I did for now to move on to the next CF in the doubt:<br /><a
href="https://commitfest.postgresql.org/3/27/">https://commitfest.postgresql.org/3/27/</a><br/></div><div
class="gmail_extra">Butif nothing is done, we should as well mark it as "rejected". Not based on the fact that it is
rejectedbased on its content, but to not bloat the CF app with entries that have no activity for months.<br clear="all"
/></div><divclass="gmail_extra">-- <br /><div class="gmail_signature">Michael<br /></div></div></div>

Re: Patch to support SEMI and ANTI join removal

From

Andres Freund

Date:

13 February 2015, 08:13:03

On 2015-02-13 17:06:14 +0900, Michael Paquier wrote:
> On Fri, Feb 13, 2015 at 4:57 PM, Marko Tiikkaja <marko@joh.to> wrote:
> 
> > On 2/13/15 8:52 AM, Michael Paquier wrote:
> >
> >> On Sun, Nov 23, 2014 at 8:23 PM, David Rowley <dgrowleyml@gmail.com>
> >> wrote:
> >>
> >>> As the patch stands there's still a couple of FIXMEs in there, so there's
> >>> still a bit of work to do yet.
> >>> Comments are welcome
> >>>
> >>>
> >> Hm, if there is still work to do, we may as well mark this patch as
> >> rejected as-is, also because it stands in this state for a couple of
> >> months.
> >>
> >
> > I didn't bring this up before, but I'm pretty sure this patch should be
> > marked "returned with feedback".  From what I've understood, "rejected"
> > means "we don't want this thing, not in this form or any other".  That
> > doesn't seem to be the case for this patch, nor for a few others marked
> > "rejected" in the currently in-progress commit fest.
> >
> 
> In the new CF app, marking a patch as "returned this feedback" adds it
> automatically to the next commit fest. And note that it is actually what I
> did for now to move on to the next CF in the doubt:
> https://commitfest.postgresql.org/3/27/
> But if nothing is done, we should as well mark it as "rejected". Not based
> on the fact that it is rejected based on its content, but to not bloat the
> CF app with entries that have no activity for months.

Then the CF app needs to be fixed. Marking patches as rejected on these
grounds is a bad idea.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Patch to support SEMI and ANTI join removal

From

Michael Paquier

Date:

13 February 2015, 08:14:33

<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Fri, Feb 13, 2015 at 5:12 PM, Andres
Freund<span dir="ltr"><<a href="mailto:andres@2ndquadrant.com" target="_blank">andres@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
class="HOEnZb"><divclass="h5">On 2015-02-13 17:06:14 +0900, Michael Paquier wrote:<br /> > On Fri, Feb 13, 2015 at
4:57PM, Marko Tiikkaja <<a href="mailto:marko@joh.to">marko@joh.to</a>> wrote:<br /> ><br /> > > On
2/13/158:52 AM, Michael Paquier wrote:<br /> > ><br /> > >> On Sun, Nov 23, 2014 at 8:23 PM, David
Rowley<<a href="mailto:dgrowleyml@gmail.com">dgrowleyml@gmail.com</a>><br /> > >> wrote:<br /> >
>><br/> > >>> As the patch stands there's still a couple of FIXMEs in there, so there's<br /> >
>>>still a bit of work to do yet.<br /> > >>> Comments are welcome<br /> > >>><br />
>>>><br /> > >> Hm, if there is still work to do, we may as well mark this patch as<br /> >
>>rejected as-is, also because it stands in this state for a couple of<br /> > >> months.<br /> >
>><br/> > ><br /> > > I didn't bring this up before, but I'm pretty sure this patch should be<br />
>> marked "returned with feedback".  From what I've understood, "rejected"<br /> > > means "we don't want
thisthing, not in this form or any other".  That<br /> > > doesn't seem to be the case for this patch, nor for a
fewothers marked<br /> > > "rejected" in the currently in-progress commit fest.<br /> > ><br /> ><br />
>In the new CF app, marking a patch as "returned this feedback" adds it<br /> > automatically to the next commit
fest.And note that it is actually what I<br /> > did for now to move on to the next CF in the doubt:<br /> > <a
href="https://commitfest.postgresql.org/3/27/"target="_blank">https://commitfest.postgresql.org/3/27/</a><br /> >
Butif nothing is done, we should as well mark it as "rejected". Not based<br /> > on the fact that it is rejected
basedon its content, but to not bloat the<br /> > CF app with entries that have no activity for months.<br /><br
/></div></div>Thenthe CF app needs to be fixed. Marking patches as rejected on these<br /> grounds is a bad idea.<br
/></blockquote></div><br/></div><div class="gmail_extra">Yup, definitely the term is incorrect. We need "Returned with
feedbackbut please do not add it to the next CF dear CF app".<br />-- <br /><div class="gmail_signature">Michael<br
/></div></div></div>

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

14 February 2015, 12:41:13

There does not seem to be a delete button, so marking as "rejected" due to this now being a duplicate entry for this
patch.

Re: Patch to support SEMI and ANTI join removal

From

David Rowley

Date:

14 February 2015, 12:48:47

On 13 February 2015 at 20:52, Michael Paquier <michael.paquier@gmail.com> wrote:

On Sun, Nov 23, 2014 at 8:23 PM, David Rowley <dgrowleyml@gmail.com> wrote:

As the patch stands there's still a couple of FIXMEs in there, so there's still a bit of work to do yet.
Comments are welcome

Hm, if there is still work to do, we may as well mark this patch as rejected as-is, also because it stands in this state for a couple of months.

My apologies, I'd not realised that the thread link on the commitfest app was pointing to the wrong thread.

I see now that the patch has been bumped off the december fest that It's now a duplicate on the February commitfest as I'd hastily added it to Feb before I rushed off on my summer holidays 2 weeks ago.

I've now changed the duplicate to "returned with feedback" as there was no option that I could find to delete it. (If anyone has extra rights to do this, could that be done instead?)

The state of the patch is currently ready for review. The FIXME stuff that I had talked about above is old news.

Please can we use the http://www.postgresql.org/message-id/flat/CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com#CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com thread for further communications about this patch. I'm trying to kill this one off due to the out-of-date subject.

Regards

David Rowley