Thread: Performance improvement for joins where outer side is unique
Hi,
I've been hacking a bit at the join code again... This time I've been putting some effort into optimising the case where the inner side of the join is known to be unique.
For example, given the tables:
create table t1 (id int primary key);
create table t2 (id int primary key);
And a query such as:

select * from t1 left outer join t2 on t1.id=t2.id;

It is possible to deduce at planning time that "for each row in the outer relation, only 0 or 1 rows can exist in the inner relation" (the inner being t2).
In all of our join algorithms in the executor, if the join type is SEMI, we skip to the next outer row once we find a matching inner row. This is because we don't want to allow duplicate rows in the inner side to duplicate outer rows in the result set. Obviously this is required per SQL spec. I believe we can also skip to the next outer row in this case when we've managed to prove that no other row can possibly exist that matches the current outer row, due to a unique index or group by/distinct clause (for subqueries).
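The skip-to-next-outer-row idea can be sketched in a few lines of Python (a minimal illustration, not PostgreSQL's actual executor code; the names `join_cond` and `unique_inner` are illustrative):

```python
def nested_loop_join(outer, inner, join_cond, unique_inner=False):
    """Minimal nested-loop join sketch. When the planner has proven the
    inner side unique, we can skip to the next outer row after the first
    match, just as a semi-join does, but still emit the joined row."""
    result = []
    for outer_row in outer:
        for inner_row in inner:
            if join_cond(outer_row, inner_row):
                result.append((outer_row, inner_row))
                if unique_inner:
                    break  # no other inner row can match this outer row
    return result
```

With a genuinely unique inner side, both settings produce identical results; the flag only saves the wasted scan over the remaining inner rows.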
I've attached a patch which aims to prove this idea works. Although so far I've only gotten around to implementing this for outer joins.
Since code already existed in analyzejoins.c which aimed to determine whether a join to a relation was unique on the join's condition, the patch is pretty much just some extra plumbing and a small rewire of analyzejoins.c, which aims to get the "unique_inner" bool value down to the join node.
The performance improvement is somewhere around 5-10% for hash joins, and a bit less for merge join. In theory it could be 50% for nested loop joins.
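A hash join benefits in the same way: once the probe finds a match, the rest of the bucket can be abandoned. A hedged Python sketch (not the real executor; real bucket chains also hold hash collisions that must be re-checked):

```python
from collections import defaultdict

def hash_join(outer, inner, key, unique_inner=False):
    """Build a hash table on the inner side, then probe it with each
    outer row. With a provably unique inner side, each bucket can be
    abandoned after the first match."""
    table = defaultdict(list)
    for inner_row in inner:
        table[key(inner_row)].append(inner_row)
    result = []
    for outer_row in outer:
        for inner_row in table.get(key(outer_row), ()):
            result.append((outer_row, inner_row))
            if unique_inner:
                break  # unique inner: no further match is possible
    return result
```

The `break` is only valid when the inner side really is unique, which is exactly what the planner must prove; applied to a non-unique inner side it would wrongly drop join rows.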
In my life in the OLTP world, these "unique joins" cover pretty much every join that ever happens. Perhaps the OLAP world is different, but from my point of view this is going to be a very nice performance gain.
I'm seeing this patch (or a more complete version) as the first step towards a number of other planner improvements:
A couple of examples of those are:
1.
explain select * from sales s inner join product p on p.productid = s.productid order by s.productid,p.name;
The current plan for this looks like:
QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=149.80..152.80 rows=1200 width=46)
   Sort Key: s.productid, p.name
   ->  Hash Join  (cost=37.00..88.42 rows=1200 width=46)
         Hash Cond: (s.productid = p.productid)
         ->  Seq Scan on sales s  (cost=0.00..31.40 rows=2140 width=8)
         ->  Hash  (cost=22.00..22.00 rows=1200 width=38)
               ->  Seq Scan on product p  (cost=0.00..22.00 rows=1200 width=38)
But in reality we could have executed this with a simple merge join using the PK of product (productid) to provide presorted input. The extra sort column on p.name is redundant due to productid being unique.
UNION planning is crying out for help in cases like this. Where it could provide sorted input from a unique index (providing the union's targetlist contained all of the unique index's columns, and we also managed to find an index on the other part of the union on the same set of columns), we could perform a Merge Append and a Unique. This would give a significant improvement in execution time for these types of queries, as the current planner does an append/sort/unique, which especially hurts for plans with a LIMIT clause.
I think this should solve some of the problems that Kyotaro-san encountered in his episode of dancing with indices over here -> http://www.postgresql.org/message-id/20131031.194310.212107585.horiguchi.kyotaro@lab.ntt.co.jp where he was unable to prove that he could trim down sort nodes once all of the columns of a unique index had been seen in the ORDER BY, due to not knowing whether joins were going to duplicate the outer rows.
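The sort-trimming idea above can be sketched roughly like this (a hypothetical helper, assuming the joins have already been proven not to duplicate outer rows):

```python
def trim_sort_keys(sort_keys, unique_key_sets):
    """Return the shortest prefix of sort_keys whose columns cover all
    columns of some unique key. Once a unique key is fully covered, the
    row order is already total, so any later sort keys are redundant."""
    seen = set()
    for i, col in enumerate(sort_keys, start=1):
        seen.add(col)
        if any(key <= seen for key in unique_key_sets):
            return sort_keys[:i]
    return sort_keys
```

For the example query, ORDER BY s.productid, p.name with productid unique (and the join unique, so rows are not duplicated) lets p.name be dropped, making a merge join on the product PK sufficient.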
2.
It should also be possible to reuse a join in situations like:
create view product_sales as select p.productid, sum(s.qty) soldqty from product p inner join sales s on p.productid = s.productid group by p.productid;
Where the consumer does:
select ps.*,p.current_price from product_sales ps inner join product p on ps.productid = p.productid;
Here we currently join the product table twice, both times on the same condition, but we could safely skip the second join if we knew that it could never match more than 1 row on the inner side. Unfortunately I deal with horrid situations like this daily, where people have used views from within views, and all the same tables end up getting joined multiple times :-(
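The detection step could look roughly like this (a hypothetical sketch, not the patch's code; real matching works on join clauses, not column-name tuples):

```python
def find_reusable_joins(joins):
    """Each join is (table, join_cols, unique_inner). A later join to
    the same table on the same provably-unique condition can reuse the
    row produced by the earlier join instead of joining again."""
    seen = set()
    reusable = []
    for table, cols, unique_inner in joins:
        key = (table, frozenset(cols))
        if unique_inner and key in seen:
            reusable.append((table, cols))
        else:
            seen.add(key)
    return reusable
```

In the view example, the second join to product on productid is flagged as reusable because the first join on the same unique condition already produced the (at most one) matching row.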
Of course, both 1 and 2 should be considered separately from the attached patch, this was just to show where it might lead.
The performance results of the patch are as follows:
Test case:
create table t1 (id int primary key);
create table t2 (id int primary key);
insert into t1 select x.x from generate_series(1,1000000) x(x);
insert into t2 select x.x from generate_series(1,1000000) x(x);
vacuum analyze;
Query:

select count(t1.id) from t1 inner join t2 on t1.id=t2.id;
Values are in transactions per second.
JOIN type Patched Unpatched new runtime
hash 1.295764 1.226188 94.63%
merge 1.786248 1.776605 99.46%
nestloop 0.465356 0.443015 95.20%
Things improve a bit more when using a varchar instead of an int:
hash 1.198821 1.102183 91.94%
I've attached the full benchmark results so as not to spam the thread.
Comments are welcome
Regards
David Rowley
On 1 January 2015 at 02:47, David Rowley <dgrowleyml@gmail.com> wrote:
> I've been hacking a bit at the join code again... This time I've been
> putting some effort into optimising the case where the inner side of the
> join is known to be unique.
I've been hacking at this unique join idea again and I've now got it working for all join types -- Patch attached.
Here's how the performance is looking:
postgres=# create table t1 (id int primary key);
CREATE TABLE
postgres=# create table t2 (id int primary key);
CREATE TABLE
postgres=# insert into t1 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# insert into t2 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# vacuum analyze;
VACUUM
postgres=# \q
With Patch on master as of 32bf6ee
D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 78
latency average: 769.231 ms
tps = 1.288260 (including connections establishing)
tps = 1.288635 (excluding connections establishing)
Master as of 32bf6ee
D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 70
latency average: 857.143 ms
tps = 1.158905 (including connections establishing)
tps = 1.159264 (excluding connections establishing)
That's a 10% performance increase.
I still need to perform more thorough benchmarking with different data types.
One weird thing that I noticed before: in an earlier revision of the patch, in the executor's join node initialisation code, I had set unique_inner to true for semi joins and replaced the SEMI_JOIN check with a unique_join check in the execute node for each join method. With this, the performance results barely changed from standard... I've yet to find out why.
The patch also adds a property to the EXPLAIN (VERBOSE) output which states whether the join was found to be unique.
The patch also still requires a final pass of comment fix-ups. I've just plain run out of time for now.
I'll pick this up in 2 weeks time.
Regards
David Rowley
Hi, I had a look at this patch. Although I haven't understood the whole stuff and all of the related things, I will comment as far as possible.

Performance:

I looked at the performance gain this patch gives. For several on-memory joins, I had gains of about 3% for merge join, 5% for hash join, and 10% for nestloop (@CentOS6), for simple 1-level joins with aggregation similar to what you mentioned in your previous mail, like this:

=# SELECT count(*) FROM t1 JOIN t2 USING (id);

Of course, the gain would be trivial when returning many tuples, or when scans go to disk. I haven't measured the loss from additional computation when the query is regarded as not a "single join".

Explain representation:

I don't like that the 'Unique Join:' occupies its own line in the result of explain; moreover, it doesn't show the meaning clearly. What about a representation like the following? Or, it might not be necessary to appear there.

  Nested Loop ...
    Output: ....
-   Unique Join: Yes
    ->  Seq Scan on public.t2 (cost = ...
-   ->  Seq Scan on public.t1 (cost = ....
+   ->  Seq Scan on public.t1 (unique) (cost = ....

Coding:

The style looks OK. Could be applied on master. It looks to work as you expected for some cases. Random comments follow.

- Looking at specialjoin_is_unique_join, you seem to assume that !delay_upper_joins is a sufficient condition for not being a "unique join". The former simply indicates "don't commute with upper OJs" and the latter seems to indicate "the RHS is unique". Could you tell me what kind of relationship there is between them?

- The naming "unique join" seems a bit obscure to me, but I don't have a better idea :( However, the member name "has_unique_join" seems better as "is_unique_join".

- eclassjoin_is_unique_join involves a seemingly semi-exhaustive scan on ec members for every element in joinlist. Even if not, it might be thought to be a bit too large for the gain. Could you do the equivalent things with smaller code?
regards,

Jan 2015 00:37:19 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvod_uCMoUPovdpXbNkw50O14x3wwKoJmZLxkbBn71zdEg@mail.gmail.com>

(snip)

> One weird thing that I noticed before is that in an earlier revision of
> the patch in the executor's join Initialise node code, I had set the
> unique_inner to true for semi joins and replaced the SEMI_JOIN check for
> a unique_join check in the execute node for each join method. With this
> the performance results barely changed from standard... I've yet to find
> out why.

I don't know even what you did precisely, but if you do such a thing, you perhaps should change the name of "unique_inner" to something representing that it reads up to one row from the inner for each outer row. (Sorry, I have no good idea for this..)

(snip)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Hi David,

I've been looking at this patch, mostly because it seems like a great starting point for improving estimation for joins on multi-column FKs. Currently we do this:

CREATE TABLE parent (a INT, b INT, PRIMARY KEY (a,b));
CREATE TABLE child (a INT, b INT, FOREIGN KEY (a,b) REFERENCES parent (a,b));

INSERT INTO parent SELECT i, i FROM generate_series(1,1000000) s(i);
INSERT INTO child SELECT i, i FROM generate_series(1,1000000) s(i);

ANALYZE;

EXPLAIN SELECT * FROM parent JOIN child USING (a,b);

                             QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=33332.00..66978.01 rows=1 width=8)
   Hash Cond: ((parent.a = child.a) AND (parent.b = child.b))
   ->  Seq Scan on parent  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
         ->  Seq Scan on child  (cost=0.00..14425.00 rows=1000000 width=8)
(5 rows)

Which is of course nonsense, because we know it's a join on a FK, so the join will produce 1M rows (just like the child table).

This seems like a rather natural extension of what you're doing in this patch, except that it only affects the optimizer and not the executor. Do you have any plans in this direction? If not, I'll pick this up as I do have that on my TODO.

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

I tried to do an initdb with the patch applied, and it seems there's a bug somewhere in analyzejoins.c:

tomas@rimmer ~ $ pg_ctl -D tmp/pg-unidata init
The files belonging to this database system will be owned by user "tomas".
This user must also own the server process.

The database cluster will be initialized with locale "en_US".
The default database encoding has accordingly been set to "LATIN1".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory tmp/pg-unidata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... sysv
creating configuration files ... ok
creating template1 database in tmp/pg-unidata/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... TRAP: FailedAssertion("!(index_vars != ((List *) ((void *)0)))", File: "analyzejoins.c", Line: 414)
sh: line 1:   339 Aborted                 "/home/tomas/pg-unijoins/bin/postgres" --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null
child process exited with exit code 134
initdb: removing data directory "tmp/pg-unidata"
pg_ctl: database system initialization failed

The problem seems to be the last command in setup_description() at src/bin/initdb/initdb.c:1843, i.e. this query:

WITH funcdescs AS (
    SELECT p.oid as p_oid, oprname,
           coalesce(obj_description(o.oid, 'pg_operator'),'') as opdesc
      FROM pg_proc p JOIN pg_operator o ON oprcode = p.oid
)
INSERT INTO pg_description
    SELECT p_oid, 'pg_proc'::regclass, 0,
           'implementation of ' || oprname || ' operator'
      FROM funcdescs
     WHERE opdesc NOT LIKE 'deprecated%'
       AND NOT EXISTS (SELECT 1 FROM pg_description
                        WHERE objoid = p_oid
                          AND classoid = 'pg_proc'::regclass)

And particularly the join in the CTE, i.e.
this fails:

SELECT * FROM pg_proc p JOIN pg_operator o ON oprcode = p.oid

I'm not quite sure why, but eclassjoin_is_unique_join() never actually jumps into this part (line ~400):

    if (relvar != NULL && candidaterelvar != NULL)
    {
        ...
        index_vars = lappend(index_vars, candidaterelvar);
        ...
    }

so the index_vars is NIL. Not sure why, but I'm sure you'll spot the issue right away.

BTW, I find this coding (first cast, then check) rather strange:

    Var *var = (Var *) ecm->em_expr;

    if (!IsA(var, Var))
        continue; /* Ignore Consts */

It's probably harmless, but I find it confusing, and I can't remember seeing it elsewhere in the code; (for example clausesel.c and such) use this style:

    ... clause is (Node*) ...

    if (IsA(clause, Var))
    {
        Var *var = (Var *) clause;
        ...
    }

or

    Var *var = NULL;

    if (!IsA(clause, Var))
        // error / continue

    var = (Var *) clause;

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 26.2.2015 01:15, Tomas Vondra wrote:
>
> I'm not quite sure why, but eclassjoin_is_unique_join() never actually
> jumps into this part (line ~400):
>
>     if (relvar != NULL && candidaterelvar != NULL)
>     {
>         ...
>         index_vars = lappend(index_vars, candidaterelvar);
>         ...
>     }
>
> so the index_vars is NIL. Not sure why, but I'm sure you'll spot the
> issue right away.

FWIW this apparently happens because the code only expect that EquivalenceMembers only contain Var, but in this particular case that's not the case - it contains RelabelType (because oprcode is regproc, and needs to be relabeled to oid). Adding this before the IsA(var, Var) check fixes the issue:

    if (IsA(var, RelabelType))
        var = (Var *) ((RelabelType *) var)->arg;

but this makes the code even more confusing, because 'var' suggests it's a Var node, but it's a RelabelType node. Also, specialjoin_is_unique_join() may have the same problem, but I've been unable to come up with an example.

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> FWIW this apparently happens because the code only expect that
> EquivalenceMembers only contain Var, but in this particular case that's
> not the case - it contains RelabelType (because oprcode is regproc, and
> needs to be relabeled to oid).

If it thinks an EquivalenceMember must be a Var, it's outright broken; I'm pretty sure any nonvolatile expression is possible.

regards, tom lane
On 26.2.2015 18:34, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> FWIW this apparently happens because the code only expect that
>> EquivalenceMembers only contain Var, but in this particular case that's
>> not the case - it contains RelabelType (because oprcode is regproc, and
>> needs to be relabeled to oid).
>
> If it thinks an EquivalenceMember must be a Var, it's outright
> broken; I'm pretty sure any nonvolatile expression is possible.

I came to the same conclusion, because even with the RelabelType fix it's trivial to crash it with a query like this:

    SELECT 1 FROM pg_proc p JOIN pg_operator o
              ON oprcode = (p.oid::int4 + 1);

I think fixing this (e.g. by restricting the optimization to Var-only cases) should not be difficult, although it might be nice to handle generic expressions too, and something like examine_variable() should do the trick. But I think UNIQUE indexes on expressions are not all that common.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3 February 2015 at 22:23, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi, I had a look at this patch. Although I haven't understood
the whole stuff and all of the related things, I will comment as far as
possible.
Great, thank you for taking the time to look and review the patch.
Performance:
I looked at the performance gain this patch gives. For several
on-memory joins, I had gains about 3% for merge join, 5% for hash
join, and 10% for nestloop (@CentOS6), for simple 1-level joins
with aggregation similar to what you mentioned in previous
mail like this.
=# SELECT count(*) FROM t1 JOIN t2 USING (id);
Of course, the gain would be trivial when returning many tuples,
or scans go to disk.
That's true, but joins where rows are filtered after the join condition will win here too.
Also queries with a GROUP BY clause. (Since grouping is always performed after the join :-( )
I haven't measured the loss by additional computation when the
query is regarded as not a "single join".
I think this would be hard to measure, but if it is measurable then you'd likely want to look at just planning time, rather than planning plus execution time.
Explain representation:
I don't like that the 'Unique Join:' occupies its own line in
the result of explain; moreover, it doesn't show the meaning
clearly. What about a representation like the following? Or,
it might not be necessary to appear there.
Nested Loop ...
Output: ....
- Unique Join: Yes
-> Seq Scan on public.t2 (cost = ...
- -> Seq Scan on public.t1 (cost = ....
+ -> Seq Scan on public.t1 (unique) (cost = ....
Yeah, I'm not too big a fan of this either. At one evolution of the patch I had "Unique Left Join" in the join node's line of the explain output, but I hated that more and changed it to appear just in the VERBOSE output; after I did that, I didn't want to change the join node's line only when in verbose mode. I do quite like that it's a separate item in the XML and JSON explain output. That's perhaps quite useful when the explain output must be processed by software.
I'm totally open for better ideas on names, but I just don't have any at the moment.
Coding:
The style looks OK. Could be applied on master.
It looks to work as you expected for some cases.
Random comments follow.
- Looking at specialjoin_is_unique_join, you seem to assume that
!delay_upper_joins is a sufficient condition for not being a
"unique join". The former simply indicates "don't commute with
upper OJs" and the latter seems to indicate "the RHS is unique".
Could you tell me what kind of relationship there is between them?
The rationalisation around that comes from the (now changed) version of join_is_removable(), where the code read:
    /*
     * Must be a non-delaying left join to a single baserel, else we aren't
     * going to be able to do anything with it.
     */
    if (sjinfo->jointype != JOIN_LEFT ||
        sjinfo->delay_upper_joins)
        return false;
I have to admit that I didn't go and investigate why delayed upper joins cannot be removed by the left join removal code; I really just assumed that we're unable to prove that a join to such a relation won't match more than one of the outer side's rows. I kept this to maintain that behaviour, as I assumed it was there for a good reason.
- The naming "unique join" seems a bit obscure to me, but I
don't have a better idea :( However, the member name
"has_unique_join" seems better as "is_unique_join".
Yeah, I agree with this. I just can't find something short enough that means "based on the join condition, the inner side of the join will never produce more than 1 row for any single outer row". Unique join was the best I came up with. I'm totally open for better ideas.
I agree that is_unique_join is better than has_unique_join. I must have just copied the has_eclass_joins struct member without thinking too hard about it. I've now changed this in my local copy of the patch.
- eclassjoin_is_unique_join involves a seemingly semi-exhaustive
scan on ec members for every element in joinlist. Even if not,
it might be thought to be a bit too large for the gain. Could
you do the equivalent things with smaller code?
I'd imagine some very complex queries could have many equivalence classes and members, though I hadn't really thought that this number would be so great that processing here would suffer much. There are quite a few fast paths out, like the test to ensure that both rels are mentioned in ec_relids. And for a query complex enough to have a great number of eclasses and members, there would likely be quite a few relations involved and planning would be slower anyway. I'm not quite sure how else I could write this and still find the unique join cases each time; we can't really just give up half way through looking.
Thanks again for the review.
Regards
David Rowley
On 26 February 2015 at 08:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I've been looking at this patch, mostly because it seems like a great
starting point for improving estimation for joins on multi-column FKs.
Currently we do this:
CREATE TABLE parent (a INT, b INT, PRIMARY KEY (a,b));
CREATE TABLE child (a INT, b INT, FOREIGN KEY (a,b)
REFERENCES parent (a,b));
INSERT INTO parent SELECT i, i FROM generate_series(1,1000000) s(i);
INSERT INTO child SELECT i, i FROM generate_series(1,1000000) s(i);
ANALYZE;
EXPLAIN SELECT * FROM parent JOIN child USING (a,b);
QUERY PLAN
---------------------------------------------------------------------
 Hash Join  (cost=33332.00..66978.01 rows=1 width=8)
   Hash Cond: ((parent.a = child.a) AND (parent.b = child.b))
   ->  Seq Scan on parent  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
         ->  Seq Scan on child  (cost=0.00..14425.00 rows=1000000 width=8)
(5 rows)
Which is of course non-sense, because we know it's a join on FK, so the
join will produce 1M rows (just like the child table).
This seems like a rather natural extension of what you're doing in this
patch, except that it only affects the optimizer and not the executor.
Do you have any plans in this direction? If not, I'll pick this up as I
do have that on my TODO.
Hi Tomas,
I guess similar analysis could be done on FKs as I'm doing on unique indexes. Perhaps my patch for inner join removal can help you more with that. You may notice that in this patch I have ended up changing the left join removal code so that it just checks whether has_unique_join is set for the special join. Likely something similar could be done with your idea and the inner join removals, just by adding some sort of flag on RelOptInfo to say "join_row_exists", or some better name. Quite likely, if there are any pending foreign key triggers, then it won't matter at all for the sake of row estimates.
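The FK-driven estimate Tomas describes could look roughly like this (a hypothetical sketch; matching join clauses to FK constraints in the real planner is far more involved, and NULLs in FK columns are ignored here):

```python
def estimate_join_rows(child_rows, parent_rows, join_cols, fk_cols,
                       fallback_selectivity):
    """If the join clauses cover the whole foreign key, every child row
    matches exactly one parent row, so the join returns child_rows.
    Otherwise fall back to the usual per-clause selectivity product,
    which multiplies the per-column estimates and badly underestimates
    correlated multi-column joins."""
    if set(fk_cols) <= set(join_cols):
        return child_rows
    return int(child_rows * parent_rows * fallback_selectivity)
```

For the parent/child example, a join on (a, b) covering the whole FK would yield the 1M-row estimate instead of the nonsensical rows=1.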
Regards
David Rowley
On 27 February 2015 at 06:48, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 26.2.2015 18:34, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> FWIW this apparently happens because the code only expect that
>> EquivalenceMembers only contain Var, but in this particular case that's
>> not the case - it contains RelabelType (because oprcode is regproc, and
>> needs to be relabeled to oid).
>
> If it thinks an EquivalenceMember must be a Var, it's outright
> broken; I'm pretty sure any nonvolatile expression is possible.

I came to the same conclusion, because even with the RelabelType fix it's trivial to crash it with a query like this:
SELECT 1 FROM pg_proc p JOIN pg_operator o
ON oprcode = (p.oid::int4 + 1);
Thanks for looking at this Tomas. Sorry it's taken this long for me to respond, but I wanted to do so with a working patch.
I've made a few changes in the attached version:
1. Fixed Assert failure when eclass contained non-Var types, as reported by you.
2. Added support for expression indexes.
The expression indexes should really be supported; with the previous patch they worked OK with LEFT JOINs but not INNER JOINs. That inconsistency is pretty much a bug in my book, so I've fixed it.
The one weird quirk with the patch is that, if we had some tables like:
create table r1 (id int primary key, value int not null);
create table r2 (id int primary key);
And a query:
The join is not properly detected as a unique join. This is down to the eclass containing 3 members: when the code finds the 2nd ec member for r1, it returns false, as it has already found another one. I'm not quite sure what the fix is for this yet, as it's not clear to me how the code would work if there were 2 vars from each relation in the same eclass. If these Vars were of different types, then which operators would we use for them? I'm not sure whether having eclassjoin_is_unique_join() append every possible combination to index_exprs is the answer. I'm also not quite sure the complexity is worth the extra code.
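The behaviour described above can be sketched like this (a hypothetical simplification of what eclassjoin_is_unique_join() does per equivalence class; the real code works on Var nodes, not (rel, column) pairs):

```python
def single_member_for_rel(ec_members, rel):
    """ec_members is a list of (rel, column) pairs in one equivalence
    class. Return the rel's single member, or None when the class holds
    more than one member for that rel -- the case the patch currently
    gives up on."""
    found = None
    for member_rel, col in ec_members:
        if member_rel == rel:
            if found is not None:
                return None  # second member for this rel: bail out
            found = col
    return found
```

With an eclass like {r1.id, r1.value, r2.id}, the lookup for r1 bails out on the second member, so the join is not marked unique even though r2.id alone would suffice.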
Updated patch attached.
Thank you for looking it and reporting that bug.
Regards
David Rowley
On 26 February 2015 at 13:15, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
BTW, I find this coding (first cast, then check) rather strange:
Var *var = (Var *) ecm->em_expr;
if (!IsA(var, Var))
continue; /* Ignore Consts */
It's probably harmless, but I find it confusing and I can't remember
seeing it elsewhere in the code (for example clausesel.c and such) use
this style:
... clause is (Node*) ...
if (IsA(clause, Var))
{
Var *var = (Var*)clause;
...
}
or
Var * var = NULL;
if (! IsA(clause, Var))
// error / continue
var = (Var*)clause;
Yeah, it does look a bit weird, but if you search the code for "IsA(var, Var)" you'll see it's nothing new.
Regards
David Rowley
Sorry for the dealy, I've returned to this. > > Performance: > > > > I looked on the performance gain this patch gives. For several > > on-memory joins, I had gains about 3% for merge join, 5% for hash > > join, and 10% for nestloop (@CentOS6), for simple 1-level joins > > with aggregation similar to what you mentioned in previous > > mail like this. > > > > =# SELECT count(*) FROM t1 JOIN t2 USING (id); > > > > Of course, the gain would be trivial when returning many tuples, > > or scans go to disk. > > > > > That's true, but joins where rows are filtered after the join condition > will win here too: Yes, when not many tuples was returned, the gain remains in inverse proportion to the number of the returning rows. Heavy (time-consuming) on-memory join returning several rows should have gain from this. > Also queries with a GROUP BY clause. (Since grouping is always performed > after the join :-( ) And in many cases clinged by additional sorts:p As such, so many factors affect the performance so improvements which give linear performance gain are often difficult to gain approval to being applied. I suppose we need more evidence how and in what situation we will receive the gain. > I haven't measured the loss by additional computation when the > > query is regarded as not a "single join". > > > I think this would be hard to measure, but likely if it is measurable then > you'd want to look at just planning time, rather than planning and > execution time. From the another point of view, the patch looks a bit large for the gain (for me). Addition to it, it loops by many levels. [mark_unique_joins()] foreach (joinlist) [eclassjoin_is_unique_join()] foreach(joinlist) foreach(root->eq_classes) foreach(root->ec_members) The innermost loop could be roughly said to iterate about 10^3*2 times for a join of 10 tables all of which have index and no eclass is shared among them. From my expreince, we must have the same effect by far less iteration levels. 
This is caueed the query like this, as a bad case. create table t1 (a int, b int); create index i_t1 (a int); create table t2 (like t1 including indexes); .... create table t10 (.....); insert into t1 (select a, a * 10 from generate_series(0, 100) a); insert into t2 (select a * 10, a * 20 from generate_series(0, 100) a); ... insert into t10 (select a * 90, a * 100 from generate_series(0, 100) a); explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on(t4.b = t5.a) join t6 on (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on (t8.b = t9.a) joint10 on (t9.b = t10.a); The head takes 3ms for planning and the patched version takes around 5ms while pure execution time is 1ms.. I think it is a too long extra time. > > Explain representation: > > > > I don't like that the 'Unique Join:" occupies their own lines in > > the result of explain, moreover, it doesn't show the meaning > > clearly. What about the representation like the following? Or, > > It might not be necessary to appear there. > > > > Nested Loop ... > > Output: .... > > - Unique Jion: Yes > > -> Seq Scan on public.t2 (cost = ... > > - -> Seq Scan on public.t1 (cost = .... > > + -> Seq Scan on public.t1 (unique) (cost = .... > > > > > > > Yeah I'm not too big a fan of this either. I did at one evolution of the > patch I had "Unique Left Join" in the join node's line in the explain > output, but I hated that more and changed it just to be just in the VERBOSE > output, and after I did that I didn't want to change the join node's line > only when in verbose mode. I do quite like that it's a separate item for > the XML and JSON explain output. That's perhaps quite useful when the > explain output must be processed by software. > > I'm totally open for better ideas on names, but I just don't have any at > the moment. "Unique Left Join" looks too bold. Anyway changing there is rather easier. 
> > Coding:
> >
> > The style looks OK. Could be applied on master.
> > It looks to work as you expected for some cases.
> >
> > Random comments follow.
> >
> > - Looking at specialjoin_is_unique_join, you seem to assume that
> >   !delay_upper_joins is a sufficient condition for not being a
> >   "unique join". The former simply indicates "don't commute with
> >   upper OJs" and the latter seems to indicate "the RHS is unique".
> >   Could you tell me what kind of relationship there is between them?
>
> The rationalisation around that comes from the (now changed) version of
> join_is_removable(), where the code read:
>
>  /*
>   * Must be a non-delaying left join to a single baserel, else we aren't
>   * going to be able to do anything with it.
>   */
>  if (sjinfo->jointype != JOIN_LEFT ||
>      sjinfo->delay_upper_joins)
>      return false;
>
> I have to admit that I didn't go and investigate why delayed upper joins
> cannot be removed by the left join removal code; I really just assumed that
> we're just unable to prove that a join to such a relation won't match more
> than one outer side's row. I kept this to maintain that behaviour as I
> assumed it was there for a good reason.

Yes, I suppose you thought so, and it should work as expected as a
logical calculation for now. But delay_upper_joins is the condition for
not being allowed to be removed, and its meaning looks not to have
changed, right? If so, I think they should be treated according to
their proper meanings. Specifically, delay_upper_joins should be
removed from the condition for unique_join and moved back to its
original location, at the very beginning of join_is_removable.

> > - The naming "unique join" seems a bit obscure to me, but I don't
> >   have a better idea :( However, the member name "has_unique_join"
> >   seems better as "is_unique_join".
>
> Yeah, I agree with this.
> I just can't find something short enough that
> means "based on the join condition, the inner side of the join will never
> produce more than 1 row for any single outer row". Unique join was the best
> I came up with. I'm totally open for better ideas.

It would be good to write the precise definition of "unique join" in an
appropriate comment.

> I agree that is_unique_join is better than has_unique_join. I must have
> just copied the has_eclass_joins struct member without thinking too hard
> about it. I've now changed this in my local copy of the patch.
>
> > - eclassjoin_is_unique_join involves a seemingly semi-exhaustive
> >   scan of ec members for every element in joinlist. Even if not, it
> >   might be thought a bit too large for the gain. Could you do the
> >   equivalent with smaller code?
>
> I'd imagine some very complex queries could have many equivalence classes
> and members, though I hadn't really thought that this number would be so
> great that processing here would suffer much. There are quite a few fast
> paths out, like the test to ensure that both rels are mentioned
> in ec_relids. Though for a query complex enough to have a great number
> of eclasses and members, likely there would be quite a few relations
> involved and planning would be slower anyway. I'm not quite sure how else I
> could write this and still find the unique join cases each time. We can't
> really just give up half way through looking.

A rough estimate for a bad case is mentioned above. I had a similar
comment about the complexity of code to find an implicitly uniquely
ordered join peer. Perhaps there are many improvable points that could
be implemented easily by looping over join_list and eclasses, but we
should do them without piled-up loops.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 10 March 2015 at 19:19, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
From another point of view, the patch looks a bit large for
the gain (to me). In addition, it loops over many levels:
[mark_unique_joins()]
foreach (joinlist)
  [eclassjoin_is_unique_join()]
  foreach(joinlist)
    foreach(root->eq_classes)
      foreach(root->ec_members)
The innermost loop could roughly be said to iterate about 10^3*2
times for a join of 10 tables, all of which have an index and no
eclass shared among them. From my experience, we should be able to
get the same effect with far fewer levels of iteration.
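As a back-of-envelope check on that 10^3*2 figure, here is a sketch of the multiplication (the counts and the helper are my illustration, not from the patch: a 10-entry joinlist rescanned once per entry, roughly 10 equivalence classes with about 2 members each):

```c
/*
 * Rough model of the loop nesting quoted above: mark_unique_joins()
 * scans the joinlist, and for each entry eclassjoin_is_unique_join()
 * rescans the joinlist and walks every eclass and its members.
 * Function and parameter names are illustrative, not from the patch.
 */
static long
innermost_iterations(long joinlist_len, long n_eclasses, long ec_members)
{
	return joinlist_len * joinlist_len * n_eclasses * ec_members;
}
```

With joinlist_len = 10, n_eclasses = 10 and ec_members = 2 this gives 2000, i.e. about 2 x 10^3 innermost iterations.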
This is caused by a query like the following, as a bad case:
create table t1 (a int, b int);
create index i_t1 on t1 (a);
create table t2 (like t1 including indexes);
....
create table t10 (.....);
insert into t1 (select a, a * 10 from generate_series(0, 100) a);
insert into t2 (select a * 10, a * 20 from generate_series(0, 100) a);
...
insert into t10 (select a * 90, a * 100 from generate_series(0, 100) a);
explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on (t4.b = t5.a) join t6 on (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on (t8.b = t9.a) join t10 on (t9.b = t10.a);
Head takes 3 ms for planning and the patched version takes around
5 ms, while pure execution time is 1 ms. I think that is too long an
extra time.
This smells quite fishy to me. I'd be quite surprised if your machine took an extra 2 ms to do this.
I've run what I think is the same test on my 5 year old i5 laptop and attached the .sql file which I used to generate the same schema as you've described.
I've also attached the results of the explain analyze "Planning Time:" output from patched and unpatched using your test case.
I was unable to notice any difference in plan times between both versions. In fact master came out slower, which is likely just the noise in the results.
Just to see how long mark_unique_joins() takes with your test case I changed query_planner() to call mark_unique_joins() 1 million times:
{
	int x;

	for (x = 0; x < 1000000; x++)
		mark_unique_joins(root, joinlist);
}
I also got rid of the fast path test which bails out if the join is already marked as unique.
/* check if we've already marked this join as unique on a previous call */
/*if (idxrel->is_unique_join)
return true;
*/
On my machine after making these changes, it takes 800 ms to plan the query, so it seems that's around 800 nanoseconds for a single call to mark_unique_joins().
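The per-call figure is just the measured total divided across the calls; a sketch of the arithmetic (assuming, as above, that the whole 800 ms is attributable to the 1 million mark_unique_joins() calls; the helper is illustrative, not from the patch):

```c
/*
 * Convert a total elapsed time in milliseconds over n_calls calls
 * into nanoseconds per call.  1 ms = 1,000,000 ns.
 */
static double
ns_per_call(double total_ms, long n_calls)
{
	return total_ms * 1000000.0 / (double) n_calls;
}
```

ns_per_call(800.0, 1000000) works out to 800 ns per call.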
Perhaps you've accidentally compiled the patched version with debug asserts?
Are you able to retest this?
Regards
David Rowley
Attachment
Hello. The performance drag was not so large after all.

For all that, I agree with the opinion that this kind of separate
multiple-nested looping over relations, joins, ECs and so on to
search for something should be avoided. I personally feel that
additional time to such an extent (around 1%) would be tolerable if
it affected a wide range of queries or brought a more obvious gain.

Unfortunately I can't mark this 'ready for committer' for now. I
think we should have this with a smaller footprint, in a method
without separate exhaustive searching. I also had a very similar
problem in the past but haven't found such a way for my problem..

At Wed, 11 Mar 2015 01:32:24 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvpEXAjs6mV2ro4=3qbzpx=pLrteinX0J2YHq6wrp85pPw@mail.gmail.com>
> > explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on
> > (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on (t4.b = t5.a) join t6 on
> > (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on
> > (t8.b = t9.a) join t10 on (t9.b = t10.a);
> >
> > The head takes 3ms for planning and the patched version takes
> > around 5ms while pure execution time is 1ms.. I think it is a too
> > long extra time.
>
> This smells quite fishy to me. I'd be quite surprised if your machine took
> an extra 2 ms to do this.

You're right. Sorry. I was amazed by the numbers..

I took the times again for both master and patched master (some
conflicts arose). Configured with no options, so compiled with -O2
and no assertions. Measured the planning time for the test query 10
times and calculated the average.

patched: 1.883 ms (stddev 0.034)
master:  1.861 ms (stddev 0.042)

About 0.02 ms, i.e. 1% extra time, looks to be taken by the extra
processing.

regards,

> I've run what I think is the same test on my 5 year old i5 laptop and
> attached the .sql file which I used to generate the same schema as you've
> described.
> I've also attached the results of the explain analyze "Planning Time:"
> output from patched and unpatched using your test case.
>
> I was unable to notice any difference in plan times between both versions.
> In fact master came out slower, which is likely just the noise in the
> results.
>
> Just to see how long mark_unique_joins() takes with your test case I
> changed query_planner() to call mark_unique_joins() 1 million times:
>
> {
> 	int x;
>
> 	for (x = 0; x < 1000000; x++)
> 		mark_unique_joins(root, joinlist);
> }
>
> I also got rid of the fast path test which bails out if the join is already
> marked as unique.
>
> /* check if we've already marked this join as unique on a previous call */
> /*
> if (idxrel->is_unique_join)
> 	return true;
> */
>
> On my machine after making these changes, it takes 800 ms to plan the
> query. So it seems that's around 800 nanoseconds for a single call to
> mark_unique_joins().
>
> Perhaps you've accidentally compiled the patched version with debug asserts?
>
> Are you able to retest this?

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello. The performance drag was not so large after all.
Great, thanks for retesting this.
For all that, I agree with the opinion that this kind of separate
multiple-nested looping over relations, joins, ECs and so on to
search for something should be avoided. I personally feel that
additional time to such an extent (around 1%) would be tolerable
if it affected a wide range of queries or it brought a more obvious
gain.
Do you have any ideas on an implementation of how we can avoid checking all eclasses for each combination of joins?
The only thing I can think of would be to add a list of eclasses on RelOptInfo, holding all eclasses that the RelOptInfo belongs to. That would save looping over every single eclass in eclassjoin_is_unique_join(), and save having to check if the relation is mentioned in the eclass before continuing. I could certainly make this change, but I'd be worried that it would just add bloat to the patch and cause a committer to question it.
I could also almost completely remove the extra planning time of your test case, either by adding a proper pre-check to ensure the relation has unique indexes, rather than just indexes, or by adding a new list to RelOptInfo which stores unique indexes; then I could just check if that's NIL before going any further in eclassjoin_is_unique_join(). But I come from a world where every relation has a primary key, so I'd imagine that case would not be hit in enough real queries to matter. I'd imagine that pre-check is only there because it's so cheap in the first place.
For testing, I added some code to mark_unique_joins() to spit out a NOTICE:
if (eclassjoin_is_unique_join(root, joinlist, rtr))
{
	root->simple_rel_array[rtr->rtindex]->is_unique_join = true;
	elog(NOTICE, "Unique Join: Yes");
}
else
	elog(NOTICE, "Unique Join: No");
and the same below for special joins too.
On running the regression tests I see:
"Unique Join: Yes" 1557 times
"Unique Join: No" 11563 times
I would imagine the regression tests are not the best thing to base this on, as they tend to exercise more unusual cases.
I have an OLTP type application here which I will give a bit of exercise and see how that compares.
Unfortunately I can't mark this 'ready for committer' for now. I
think we should have this with a smaller footprint, in a method
without separate exhaustive searching. I also had a very similar
problem in the past but haven't found such a way for my problem..
I don't think it's ready yet either. I've just been going over a few things, and looking at Tom's recent commit b557226 in pathnode.c, I've got a feeling that this patch would need to re-factor some code that's been modified around the usage of relation_has_unique_index_for(): when this code is called, the semi joins have already been analysed to see if they're unique, so it could be just a case of ripping all of that out of create_unique_path() and just putting a check on rel->is_unique_join in there. But if I do that then I'm not quite sure if SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would still be needed at all; these were only added in b557226. Changing this would help reduce the extra planning time when the query contains semi-joins. To be quite honest, this type of analysis belongs in analyzejoins.c anyway. I'm tempted to hack at this part some more, but I'd rather Tom had a quick glance at what I'm trying to do here first.
At Wed, 11 Mar 2015 01:32:24 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvpEXAjs6mV2ro4=3qbzpx=pLrteinX0J2YHq6wrp85pPw@mail.gmail.com>
> > explain select t1.a, t10.b from t1 join t2 on (t1.b = t2.a) join t3 on
> > (t2.b = t3.a) join t4 on (t3.b = t4.a) join t5 on (t4.b = t5.a) join t6 on
> > (t5.b = t6.a) join t7 on (t6.b = t7.a) join t8 on (t7.b = t8.a) join t9 on
> > (t8.b = t9.a) join t10 on (t9.b = t10.a);
> >
> > The head takes 3ms for planning and the patched version takes
> > around 5ms while pure execution time is 1ms.. I think it is a too
> > long extra time.
> >
> >
> This smells quite fishy to me. I'd be quite surprised if your machine took
> an extra 2 ms to do this.
You're right. Sorry. I was amazed by the numbers..
I took the times again for both master and patched master
(some conflicts arose). Configured with no options, so compiled
with -O2 and no assertions. Measured the planning time for the
test query 10 times and calculated the average.
patched: 1.883ms (stddev 0.034)
master: 1.861ms (stddev 0.042)
About 0.02 ms, i.e. 1% extra time, looks to be taken by the extra
processing.
Is this still using the same test query as I attached in my previous email?
This is still a 25 times bigger slowdown than what I witnessed by calling mark_unique_joins() 1 million times in a tight loop, but of course much closer to what I would have thought. You're getting 20,000 nanoseconds and I'm getting 800 nanoseconds, yet our overall planning times are very similar, so I assume our processors are of similar speeds.
It's certainly difficult to know if the extra planning will pay off enough to reduce total CPU time across both planning and executing the query. There actually are cases where the planning time will be reduced by this patch: when a LEFT JOIN is removed, the master code currently goes off and re-checks all relations to see if the removal of that one relation has made it possible for other relations to be removed, and this currently involves analysing unique indexes again to see if the join is still unique. With this patch I've cut out most of the work done in join_is_removable(); it no longer does that unique analysis of the join condition, as that's done in mark_unique_joins(). The cases that should be faster are those where a join is already marked as unique, since the code will never test that again.
I have quite a few other ideas lined up which depend on knowing if a join is unique; all of those ideas naturally depend on this patch. Really, the 5%-10% join speed-up was the best excuse that I could find to have this work actually accepted. My first email in this thread explains some of my other ideas.
In addition to those ideas, I've been thinking about how, if we could determine that a plan returns a single row, e.g. a key lookup, then we could likely safely cache those plans, as there would never be any data skew in these situations. I have no immediate plans to go and work on this, but quite possibly I will in the future if I manage to get unique joins in first.
Regards
David Rowley
On 14 March 2015 at 14:51, David Rowley <dgrowleyml@gmail.com> wrote:
On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

For all that, I agree with the opinion that this kind of separate
multiple-nested looping over relations, joins or ECs and so on for
searching something should be avoided. I personally feel that
additional time to such an extent (around 1%) would be tolerable
if it affected a wide range of queries or it brought more obvious
gain.

For testing, I added some code to mark_unique_joins() to spit out a NOTICE:

if (eclassjoin_is_unique_join(root, joinlist, rtr))
{
	root->simple_rel_array[rtr->rtindex]->is_unique_join = true;
	elog(NOTICE, "Unique Join: Yes");
}
else
	elog(NOTICE, "Unique Join: No");

and the same below for special joins too.

On running the regression tests I see:

"Unique Join: Yes" 1557 times
"Unique Join: No" 11563 times
With this notice emitting code in place, I opened up pgAdmin and had a click around for a few minutes.
If I search the log file I see:
Unique Join: No 940 times
Unique Join: Yes 585 times
It seems that joins with a unique inner side are quite common here.
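To put a number on "quite common", here is a small sketch of the ratio (the counts come from the NOTICE tallies quoted above; the helper itself is my illustration, not part of the patch):

```c
/*
 * Percentage of joins reported as unique out of all joins examined,
 * from the "Unique Join: Yes"/"Unique Join: No" NOTICE counts.
 */
static double
unique_join_pct(long n_yes, long n_no)
{
	return 100.0 * (double) n_yes / (double) (n_yes + n_no);
}
```

For the pgAdmin session that's 585 of 1525, around 38%, versus roughly 12% (1557 of 13120) in the regression tests.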
Regards
David Rowley
On 14 March 2015 at 14:51, David Rowley <dgrowleyml@gmail.com> wrote:
On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Unfortunately I can't mark this 'ready for committer' for now. I think
we should have this with a smaller footprint, in a method without
separate exhaustive searching. I also had a very similar problem in
the past but haven't found such a way for my problem..

I don't think it's ready yet either. I've just been going over a few things, and looking at Tom's recent commit b557226 in pathnode.c, I've got a feeling that this patch would need to re-factor some code that's been modified around the usage of relation_has_unique_index_for(): when this code is called, the semi joins have already been analysed to see if they're unique, so it could be just a case of ripping all of that out of create_unique_path() and just putting a check on rel->is_unique_join in there. But if I do that then I'm not quite sure if SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would still be needed at all; these were only added in b557226. Changing this would help reduce the extra planning time when the query contains semi-joins. To be quite honest, this type of analysis belongs in analyzejoins.c anyway. I'm tempted to hack at this part some more, but I'd rather Tom had a quick glance at what I'm trying to do here first.
I decided to hack away at the code Tom added in b557226. I've changed it so that create_unique_path() now simply just uses if (rel->is_unique_join), instead of all the calls to relation_has_unique_index_for() and query_is_distinct_for(). This vastly simplifies that code. One small change is that Tom's checks for uniqueness on semi joins included checks for volatile functions; this check didn't exist in the original join removal code, so I've left it out. We'll never match an expression with a volatile function to a unique index, as indexes don't allow volatile function expressions anyway. So as I understand it, this only serves as a fast path out if the join condition has a volatile function... but I'd assume that check is not all that cheap.
I ended up making query_supports_distinctness() and query_is_distinct_for() static in analyzejoins.c, as they're not used in any other files. relation_has_unique_index_for() is also now only used in analyzejoins.c, but I've not moved it into that file yet as I don't want to bloat the patch; I just added a comment to say it needs to be moved.
I've also added a small amount of code to query_is_distinct_for() which allows subqueries such as (select 1 a offset 0) to be marked as unique. I thought it was a little silly that these were not being detected as unique, so I fixed it. This has the side effect of allowing left join removals for queries such as: select t1.* from t1 left join (select 1 a offset 0) a on t1.id=a.a;
Updated patch attached.
Regards
David Rowley
Attachment
Hello, I don't have enough time for now but made some considerations
on this.

It might be a way to mark unique join peers at the bottom and
propagate up, rather than searching from the top of the join list.
Around create_join_clause might be a candidate for it. I'll
investigate that later.

regards,

At Sat, 14 Mar 2015 23:05:24 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvoh-EKF51QQyNoJUe0eHYMZw6OzJjjgYP63Cmw7QfebjA@mail.gmail.com>
dgrowleyml> On 14 March 2015 at 14:51, David Rowley <dgrowleyml@gmail.com> wrote:
dgrowleyml>
dgrowleyml> > On 13 March 2015 at 20:34, Kyotaro HORIGUCHI <
dgrowleyml> > horiguchi.kyotaro@lab.ntt.co.jp> wrote:
dgrowleyml> >
dgrowleyml> >> Unfortunately I can't decide this to be 'ready for commiter' for
dgrowleyml> >> now. I think we should have this on smaller footprint, in a
dgrowleyml> >> method without separate exhauxtive searching. I also had very
dgrowleyml> >> similar problem in the past but I haven't find such a way for my
dgrowleyml> >> problem..
dgrowleyml> >
dgrowleyml> > I don't think it's ready yet either. I've just been going over a few
dgrowleyml> > things and looking at Tom's recent commit b557226 in pathnode.c I've got a
dgrowleyml> > feeling that this patch would need to re-factor some code that's been
dgrowleyml> > modified around the usage of relation_has_unique_index_for() as when this
dgrowleyml> > code is called, the semi joins have already been analysed to see if they're
dgrowleyml> > unique, so it could be just a case of ripping all of that out
dgrowleyml> > of create_unique_path() and just putting a check to say rel->is_unique_join
dgrowleyml> > in there. But if I do that then I'm not quite sure if
dgrowleyml> > SpecialJoinInfo->semi_rhs_exprs and SpecialJoinInfo->semi_operators would
dgrowleyml> > still be needed at all. These were only added in b557226. Changing this
dgrowleyml> > would help reduce the extra planning time when the query contains
dgrowleyml> > semi-joins. To be quite honest, this type of analysis belongs in
dgrowleyml> > analyzejoins.c anyway. I'm tempted to hack at this part some more, but I'd
dgrowleyml> > rather Tom had a quick glance at what I'm trying to do here first.
dgrowleyml>
dgrowleyml> I decided to hack away at the code Tom added in b557226. I've
dgrowleyml> changed it so that create_unique_path() now simply just uses if
dgrowleyml> (rel->is_unique_join), instead of all the calls to
dgrowleyml> relation_has_unique_index_for() and query_is_distinct_for(). This vastly
dgrowleyml> simplifies that code. One small change is that Tom's checks for uniqueness
dgrowleyml> on semi joins included checks for volatile functions, this check didn't
dgrowleyml> exist in the original join removal code, so I've left it out. We'll never
dgrowleyml> match an expression with a volatile function to a unique index as indexes
dgrowleyml> don't allow volatile function expressions anyway. So as I understand it
dgrowleyml> this only serves as a fast path out if the join condition has a volatile
dgrowleyml> function... But I'd assume that check is not all that cheap.
dgrowleyml>
dgrowleyml> I ended up making query_supports_distinctness() and query_is_distinct_for()
dgrowleyml> static in analyzejoins.c as they're not used in any other files.
dgrowleyml> relation_has_unique_index_for() is also now only used in analyzejoins.c,
dgrowleyml> but I've not moved it into that file yet as I don't want to bloat the
dgrowleyml> patch. I just added a comment to say it needs moved.
dgrowleyml>
dgrowleyml> I've also added a small amount of code to query_is_distinct_for() which
dgrowleyml> allows subqueries such as (select 1 a offset 0) to be marked as unique. I
dgrowleyml> thought it was a little silly that these were not being detected as unique,
dgrowleyml> so I fixed it. This has the side effect of allowing left join removals for
dgrowleyml> queries such as: select t1.* from t1 left join (select 1 a offset 0) a on
dgrowleyml> t1.id=a.a;
dgrowleyml>
dgrowleyml> Updated patch attached.
dgrowleyml>
dgrowleyml> Regards
dgrowleyml>
dgrowleyml> David Rowley
Hello,

The attached is a non-search version of unique join. It is not fully
examined but looks to work as expected. Many small changes make the
patch larger but the affected area is rather small.

What do you think about this?

> Hello, I don't have enough time for now but made some
> considerations on this.
>
> It might be a way that marking unique join peer at bottom and
> propagate up, not searching from top of join list.
> Around create_join_clause might be a candidate for it.
> I'll investigate that later.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a951c55..b8a68b5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1151,9 +1151,16 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				appendStringInfo(es->str, " %s Join", jointype);
 			else if (!IsA(plan, NestLoop))
 				appendStringInfoString(es->str, " Join");
+			if (((Join *) plan)->inner_unique)
+				appendStringInfoString(es->str, "(inner unique)");
+		}
 		else
+		{
 			ExplainPropertyText("Join Type", jointype, es);
+			ExplainPropertyText("Inner unique",
+						((Join *) plan)->inner_unique ? "true" : "false",
+						es);
+		}
 	}
 	break;
 case T_SetOp:
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 1d78cdf..d3b14e5 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -306,10 +306,11 @@ ExecHashJoin(HashJoinState *node)
 				}
 
 				/*
-				 * In a semijoin, we'll consider returning the first
-				 * match, but after that we're done with this outer tuple.
+				 * We'll consider returning the first match if the inner
+				 * is unique, but after that we're done with this outer
+				 * tuple.
 				 */
-				if (node->js.jointype == JOIN_SEMI)
+				if (node->js.inner_unique)
 					node->hj_JoinState = HJ_NEED_NEW_OUTER;
 
 				if (otherqual == NIL ||
@@ -451,6 +452,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	hjstate = makeNode(HashJoinState);
 	hjstate->js.ps.plan = (Plan *) node;
 	hjstate->js.ps.state = estate;
+	hjstate->js.inner_unique = node->join.inner_unique;
 
 	/*
 	 * Miscellaneous initialization
@@ -498,8 +500,10 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
 	/* set up null tuples for outer joins, if needed */
 	switch (node->join.jointype)
 	{
-		case JOIN_INNER:
 		case JOIN_SEMI:
+			hjstate->js.inner_unique = true;
+			/* fall through */
+		case JOIN_INNER:
 			break;
 		case JOIN_LEFT:
 		case JOIN_ANTI:
diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 15742c5..3c21ffe 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -840,10 +840,11 @@ ExecMergeJoin(MergeJoinState *node)
 				}
 
 				/*
-				 * In a semijoin, we'll consider returning the first
-				 * match, but after that we're done with this outer tuple.
+				 * We'll consider returning the first match if the inner
+				 * is unique, but after that we're done with this outer
+				 * tuple.
 				 */
-				if (node->js.jointype == JOIN_SEMI)
+				if (node->js.inner_unique)
 					node->mj_JoinState = EXEC_MJ_NEXTOUTER;
 
 				qualResult = (otherqual == NIL ||
@@ -1486,6 +1487,8 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	mergestate->js.ps.plan = (Plan *) node;
 	mergestate->js.ps.state = estate;
 
+	mergestate->js.inner_unique = node->join.inner_unique;
+
 	/*
 	 * Miscellaneous initialization
 	 *
@@ -1553,8 +1556,10 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	switch (node->join.jointype)
 	{
-		case JOIN_INNER:
 		case JOIN_SEMI:
+			mergestate->js.inner_unique = true;
+			/* fall through */
+		case JOIN_INNER:
 			mergestate->mj_FillOuter = false;
 			mergestate->mj_FillInner = false;
 			break;
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index e66bcda..342c448 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -247,10 +247,10 @@ ExecNestLoop(NestLoopState *node)
 			}
 
 			/*
-			 * In a semijoin, we'll consider returning the first match, but
-			 * after that we're done with this outer tuple.
+			 * We'll consider returning the first match if the inner is
+			 * unique, but after that we're done with this outer tuple.
 			 */
-			if (node->js.jointype == JOIN_SEMI)
+			if (node->js.inner_unique)
 				node->nl_NeedNewOuter = true;
 
 			if (otherqual == NIL || ExecQual(otherqual, econtext, false))
@@ -310,6 +310,8 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	nlstate->js.ps.plan = (Plan *) node;
 	nlstate->js.ps.state = estate;
 
+	nlstate->js.inner_unique = node->join.inner_unique;
+
 	/*
 	 * Miscellaneous initialization
 	 *
@@ -354,8 +356,10 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	switch (node->join.jointype)
 	{
-		case JOIN_INNER:
 		case JOIN_SEMI:
+			nlstate->js.inner_unique = true;
+			/* fall through */
+		case JOIN_INNER:
 			break;
 		case JOIN_LEFT:
 		case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1da953f..8363216 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -26,18 +26,21 @@
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
 
 static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
-					 RelOptInfo *outerrel, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel, bool inner_unique,
 					 List *restrictlist, List *mergeclause_list,
 					 JoinType jointype, SpecialJoinInfo *sjinfo,
 					 Relids param_source_rels, Relids extra_lateral_rels);
 static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
-					 RelOptInfo *outerrel, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel, bool inner_unique,
 					 List *restrictlist, List *mergeclause_list,
 					 JoinType jointype, SpecialJoinInfo *sjinfo,
 					 SemiAntiJoinFactors *semifactors,
 					 Relids param_source_rels, Relids extra_lateral_rels);
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
-					 RelOptInfo *outerrel, RelOptInfo *innerrel,
+					 RelOptInfo *outerrel,
+					 RelOptInfo *innerrel, bool inner_unique,
 					 List *restrictlist,
 					 JoinType jointype, SpecialJoinInfo *sjinfo,
 					 SemiAntiJoinFactors *semifactors,
@@ -49,7 +52,8 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
 						 List *restrictlist,
 						 JoinType jointype,
 						 bool *mergejoin_allowed);
-
+static inline bool clause_sides_match_join(RestrictInfo *rinfo,
+						RelOptInfo *outerrel, RelOptInfo *innerrel);
 
 /*
  * add_paths_to_joinrel
@@ -89,6 +93,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	Relids		param_source_rels = NULL;
 	Relids		extra_lateral_rels = NULL;
 	ListCell   *lc;
+	bool		inner_unique = false;
 
 	/*
 	 * Find potential mergejoin clauses.  We can skip this if we are not
@@ -115,6 +120,31 @@ add_paths_to_joinrel(PlannerInfo *root,
 						   &semifactors);
 
 	/*
+	 * We can optimize inner loop execution for joins on which the inner rel
+	 * is unique on the restrictlist.
+	 */
+	if (jointype == JOIN_INNER &&
+		innerrel->rtekind == RTE_RELATION &&
+		restrictlist)
+	{
+		/* relation_has_unique_index_for adds some restrictions */
+		int			org_len = list_length(restrictlist);
+		ListCell   *lc;
+
+		foreach(lc, restrictlist)
+		{
+			clause_sides_match_join((RestrictInfo *) lfirst(lc),
+									outerrel, innerrel);
+		}
+		if (relation_has_unique_index_for(root, innerrel, restrictlist,
+										  NIL, NIL))
+			inner_unique = true;
+
+		/* Remove restrictions added by the function */
+		list_truncate(restrictlist, org_len);
+	}
+
+	/*
 	 * Decide whether it's sensible to generate parameterized paths for this
 	 * joinrel, and if so, which relations such paths should require.  There
 	 * is usually no need to create a parameterized result path unless there
@@ -212,7 +242,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * sorted.  Skip this if we can't mergejoin.
 	 */
 	if (mergejoin_allowed)
-		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
+		sort_inner_and_outer(root, joinrel, outerrel, innerrel, inner_unique,
 							 restrictlist, mergeclause_list, jointype,
 							 sjinfo, param_source_rels, extra_lateral_rels);
 
@@ -225,7 +255,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * joins at all, so it wouldn't work in the prohibited cases either.)
 	 */
 	if (mergejoin_allowed)
-		match_unsorted_outer(root, joinrel, outerrel, innerrel,
+		match_unsorted_outer(root, joinrel, outerrel, innerrel, inner_unique,
 							 restrictlist, mergeclause_list, jointype,
 							 sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
@@ -256,7 +286,7 @@ add_paths_to_joinrel(PlannerInfo *root,
 	 * joins, because there may be no other alternative.
	 */
 	if (enable_hashjoin || jointype == JOIN_FULL)
-		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
+		hash_inner_and_outer(root, joinrel, outerrel, innerrel, inner_unique,
 							 restrictlist, jointype, sjinfo, &semifactors,
 							 param_source_rels, extra_lateral_rels);
 
@@ -277,6 +307,7 @@ try_nestloop_path(PlannerInfo *root,
 				  Relids extra_lateral_rels,
 				  Path *outer_path,
 				  Path *inner_path,
+				  bool inner_unique,
 				  List *restrict_clauses,
 				  List *pathkeys)
 {
@@ -349,6 +380,7 @@ try_nestloop_path(PlannerInfo *root,
 									  semifactors,
 									  outer_path,
 									  inner_path,
+									  inner_unique,
 									  restrict_clauses,
 									  pathkeys,
 									  required_outer));
@@ -374,6 +406,7 @@ try_mergejoin_path(PlannerInfo *root,
 				   Relids extra_lateral_rels,
 				   Path *outer_path,
 				   Path *inner_path,
+				   bool inner_unique,
 				   List *restrict_clauses,
 				   List *pathkeys,
 				   List *mergeclauses,
@@ -434,6 +467,7 @@ try_mergejoin_path(PlannerInfo *root,
 									   sjinfo,
 									   outer_path,
 									   inner_path,
+									   inner_unique,
 									   restrict_clauses,
 									   pathkeys,
 									   required_outer,
@@ -463,6 +497,7 @@ try_hashjoin_path(PlannerInfo *root,
 				  Relids extra_lateral_rels,
 				  Path *outer_path,
 				  Path *inner_path,
+				  bool inner_unique,
 				  List *restrict_clauses,
 				  List *hashclauses)
 {
@@ -510,6 +545,7 @@ try_hashjoin_path(PlannerInfo *root,
 									  semifactors,
 									  outer_path,
 									  inner_path,
+									  inner_unique,
 									  restrict_clauses,
 									  required_outer,
 									  hashclauses));
@@ -574,6 +610,7 @@ sort_inner_and_outer(PlannerInfo *root,
 					 RelOptInfo *joinrel,
 					 RelOptInfo *outerrel,
 					 RelOptInfo *innerrel,
+					 bool inner_unique,
 					 List *restrictlist,
 					 List *mergeclause_list,
 					 JoinType jointype,
@@ -629,6 +666,7 @@ sort_inner_and_outer(PlannerInfo *root,
 												   inner_path, sjinfo);
 			Assert(inner_path);
 			jointype = JOIN_INNER;
+ inner_unique = true; } /* @@ -712,6 +750,7 @@ sort_inner_and_outer(PlannerInfo *root, extra_lateral_rels, outer_path, inner_path, + inner_unique, restrictlist, merge_pathkeys, cur_mergeclauses, @@ -762,6 +801,7 @@ match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, + bool inner_unique, List *restrictlist, List *mergeclause_list, JoinType jointype, @@ -832,6 +872,7 @@ match_unsorted_outer(PlannerInfo *root, inner_cheapest_total = (Path *) create_unique_path(root,innerrel, inner_cheapest_total, sjinfo); Assert(inner_cheapest_total); + inner_unique = true; } else if (nestjoinOK) { @@ -901,6 +942,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, inner_cheapest_total, + inner_unique, restrictlist, merge_pathkeys); } @@ -927,6 +969,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, innerpath, + inner_unique, restrictlist, merge_pathkeys); } @@ -942,6 +985,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, matpath, + inner_unique, restrictlist, merge_pathkeys); } @@ -998,6 +1042,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, inner_cheapest_total, + inner_unique, restrictlist, merge_pathkeys, mergeclauses, @@ -1097,6 +1142,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, innerpath, + inner_unique, restrictlist, merge_pathkeys, newclauses, @@ -1143,6 +1189,7 @@ match_unsorted_outer(PlannerInfo *root, extra_lateral_rels, outerpath, innerpath, + inner_unique, restrictlist, merge_pathkeys, newclauses, @@ -1182,6 +1229,7 @@ hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, + bool inner_unique, List *restrictlist, JoinType jointype, SpecialJoinInfo *sjinfo, @@ -1264,6 +1312,7 @@ hash_inner_and_outer(PlannerInfo *root, extra_lateral_rels, cheapest_total_outer, cheapest_total_inner, + inner_unique, restrictlist, hashclauses); /* no possibility 
of cheap startup here */ @@ -1284,6 +1333,7 @@ hash_inner_and_outer(PlannerInfo *root, extra_lateral_rels, cheapest_total_outer, cheapest_total_inner, + true, restrictlist, hashclauses); if (cheapest_startup_outer != NULL && @@ -1297,6 +1347,7 @@ hash_inner_and_outer(PlannerInfo *root, extra_lateral_rels, cheapest_startup_outer, cheapest_total_inner, + true, restrictlist, hashclauses); } @@ -1322,6 +1373,7 @@ hash_inner_and_outer(PlannerInfo *root, extra_lateral_rels, cheapest_startup_outer, cheapest_total_inner, + inner_unique, restrictlist, hashclauses); @@ -1360,6 +1412,7 @@ hash_inner_and_outer(PlannerInfo *root, extra_lateral_rels, outerpath, innerpath, + inner_unique, restrictlist, hashclauses); } diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index cb69c03..448c556 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -131,12 +131,12 @@ static BitmapAnd *make_bitmap_and(List *bitmapplans);static BitmapOr *make_bitmap_or(List *bitmapplans);staticNestLoop *make_nestloop(List *tlist, List *joinclauses, List *otherclauses, List *nestParams, - Plan *lefttree, Plan *righttree, + Plan *lefttree, Plan *righttree, bool inner_unique, JoinType jointype);static HashJoin *make_hashjoin(List*tlist, List *joinclauses, List *otherclauses, List *hashclauses, - Plan *lefttree, Plan *righttree, + Plan *lefttree, Plan *righttree, bool inner_unique, JoinType jointype);static Hash *make_hash(Plan*lefttree, Oid skewTable, @@ -151,7 +151,7 @@ static MergeJoin *make_mergejoin(List *tlist, Oid *mergecollations, int *mergestrategies, bool *mergenullsfirst, - Plan *lefttree, Plan *righttree, + Plan *lefttree, Plan *righttree, bool inner_unique, JoinType jointype);static Sort *make_sort(PlannerInfo*root, Plan *lefttree, int numCols, AttrNumber *sortColIdx, Oid *sortOperators, @@ -2192,6 +2192,7 @@ create_nestloop_plan(PlannerInfo *root, nestParams, outer_plan, inner_plan, + 
best_path->inner_unique, best_path->jointype); copy_path_costsize(&join_plan->join.plan,&best_path->path); @@ -2486,6 +2487,7 @@ create_mergejoin_plan(PlannerInfo *root, mergenullsfirst, outer_plan, inner_plan, + best_path->jpath.inner_unique, best_path->jpath.jointype); /* Costs of sort and material steps are included in path cost already */ @@ -2612,6 +2614,7 @@ create_hashjoin_plan(PlannerInfo *root, hashclauses, outer_plan, (Plan *) hash_plan, + best_path->jpath.inner_unique, best_path->jpath.jointype); copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path); @@ -3717,6 +3720,7 @@ make_nestloop(List *tlist, List *nestParams, Plan *lefttree, Plan *righttree, + bool inner_unique, JoinType jointype){ NestLoop *node = makeNode(NestLoop); @@ -3729,6 +3733,7 @@ make_nestloop(List *tlist, plan->righttree = righttree; node->join.jointype = jointype; node->join.joinqual= joinclauses; + node->join.inner_unique = inner_unique; node->nestParams = nestParams; return node; @@ -3741,6 +3746,7 @@ make_hashjoin(List *tlist, List *hashclauses, Plan *lefttree, Plan *righttree, + bool inner_unique, JoinType jointype){ HashJoin *node = makeNode(HashJoin); @@ -3754,6 +3760,7 @@ make_hashjoin(List *tlist, node->hashclauses = hashclauses; node->join.jointype = jointype; node->join.joinqual = joinclauses; + node->join.inner_unique = inner_unique; return node;} @@ -3801,6 +3808,7 @@ make_mergejoin(List *tlist, bool *mergenullsfirst, Plan *lefttree, Plan *righttree, + bool inner_unique, JoinType jointype){ MergeJoin *node = makeNode(MergeJoin); @@ -3818,6 +3826,7 @@ make_mergejoin(List *tlist, node->mergeNullsFirst = mergenullsfirst; node->join.jointype = jointype; node->join.joinqual = joinclauses; + node->join.inner_unique = inner_unique; return node;} @@ -4586,7 +4595,6 @@ make_unique(Plan *lefttree, List *distinctList) node->numCols = numCols; node->uniqColIdx = uniqColIdx; node->uniqOperators = uniqOperators; - return node;} diff --git 
a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index faca30b..299a51d 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1135,8 +1135,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, */ if (rel->rtekind ==RTE_RELATION && sjinfo->semi_can_btree && relation_has_unique_index_for(root, rel, NIL, - sjinfo->semi_rhs_exprs, - sjinfo->semi_operators)) + sjinfo->semi_rhs_exprs, + sjinfo->semi_operators)) { pathnode->umethod = UNIQUE_PATH_NOOP; pathnode->path.rows = rel->rows; @@ -1534,6 +1534,7 @@ create_nestloop_path(PlannerInfo *root, SemiAntiJoinFactors *semifactors, Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, List *pathkeys, Relids required_outer) @@ -1581,6 +1582,7 @@ create_nestloop_path(PlannerInfo *root, pathnode->jointype = jointype; pathnode->outerjoinpath= outer_path; pathnode->innerjoinpath = inner_path; + pathnode->inner_unique = inner_unique; pathnode->joinrestrictinfo = restrict_clauses; final_cost_nestloop(root,pathnode, workspace, sjinfo, semifactors); @@ -1615,6 +1617,7 @@ create_mergejoin_path(PlannerInfo *root, SpecialJoinInfo *sjinfo, Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, List *pathkeys, Relids required_outer, @@ -1638,6 +1641,7 @@ create_mergejoin_path(PlannerInfo *root, pathnode->jpath.jointype = jointype; pathnode->jpath.outerjoinpath= outer_path; pathnode->jpath.innerjoinpath = inner_path; + pathnode->jpath.inner_unique = inner_unique; pathnode->jpath.joinrestrictinfo = restrict_clauses; pathnode->path_mergeclauses= mergeclauses; pathnode->outersortkeys = outersortkeys; @@ -1674,6 +1678,7 @@ create_hashjoin_path(PlannerInfo *root, SemiAntiJoinFactors *semifactors, Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, Relids required_outer, List *hashclauses) @@ -1706,6 +1711,7 @@ create_hashjoin_path(PlannerInfo *root, 
pathnode->jpath.jointype = jointype; pathnode->jpath.outerjoinpath= outer_path; pathnode->jpath.innerjoinpath = inner_path; + pathnode->jpath.inner_unique = inner_unique; pathnode->jpath.joinrestrictinfo = restrict_clauses; pathnode->path_hashclauses= hashclauses; /* final_cost_hashjoin will fill in pathnode->num_batches */ diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 59b17f3..f86f806 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -1562,6 +1562,7 @@ typedef struct JoinState PlanState ps; JoinType jointype; List *joinqual; /* JOIN quals (in addition to ps.qual) */ + bool inner_unique;} JoinState;/* ---------------- diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 21cbfa8..122f2f4 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -543,6 +543,7 @@ typedef struct Join Plan plan; JoinType jointype; List *joinqual; /*JOIN quals (in addition to plan.qual) */ + bool inner_unique;} Join;/* ---------------- diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 334cf51..c1ebfdb 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -1030,6 +1030,7 @@ typedef struct JoinPath Path *outerjoinpath; /* path for the outer side of the join */ Path *innerjoinpath; /* path for the inner side of the join */ + bool inner_unique; List *joinrestrictinfo; /* RestrictInfos to apply to join */ diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 9923f0e..cefcecc 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -94,6 +94,7 @@ extern NestPath *create_nestloop_path(PlannerInfo *root, SemiAntiJoinFactors *semifactors, Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, List *pathkeys, Relids required_outer); @@ -105,6 +106,7 @@ extern MergePath *create_mergejoin_path(PlannerInfo *root, SpecialJoinInfo *sjinfo, 
Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, List *pathkeys, Relids required_outer, @@ -120,6 +122,7 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root, SemiAntiJoinFactors *semifactors, Path *outer_path, Path *inner_path, + bool inner_unique, List *restrict_clauses, Relids required_outer, List *hashclauses); diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out index dfae84e..ad1d673 100644 --- a/src/test/regress/expected/equivclass.out +++ b/src/test/regress/expected/equivclass.out @@ -186,7 +186,7 @@ explain (costs off) select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias2; QUERYPLAN ----------------------------------------- - Nested Loop + Nested Loop(inner unique) -> Seq Scan on ec2 Filter: (x1 = '42'::int8alias2) -> Index Scan using ec1_pkeyon ec1 @@ -310,7 +310,7 @@ explain (costs off) -> Index Scan using ec1_expr3 on ec1 ec1_5 -> Index Scan usingec1_expr4 on ec1 ec1_6 -> Materialize - -> Merge Join + -> Merge Join(inner unique) Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> MergeAppend Sort Key: (((ec1_1.ff + 2) + 1)) @@ -365,7 +365,7 @@ explain (costs off) where ss1.x = ec1.f1 and ec1.ff = 42::int8; QUERY PLAN ----------------------------------------------------- - Merge Join + Merge Join(inner unique) Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> Merge Append Sort Key: (((ec1_1.ff+ 2) + 1)) diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 57fc910..b6e3024 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -2614,8 +2614,8 @@ from nt3 as nt3where nt3.id = 1 and ss2.b3; QUERY PLAN ----------------------------------------------- - Nested Loop - -> Nested Loop + Nested Loop(inner unique) + -> Nested Loop(inner unique) -> Index Scan using nt3_pkey on nt3 Index Cond: (id = 1) -> Index Scan using nt2_pkey on nt2 diff --git a/src/test/regress/expected/rowsecurity.out 
b/src/test/regress/expected/rowsecurity.out index f41bef1..aaf585d 100644 --- a/src/test/regress/expected/rowsecurity.out +++ b/src/test/regress/expected/rowsecurity.out @@ -248,7 +248,7 @@ EXPLAIN (COSTS OFF) SELECT * FROM document WHERE f_leak(dtitle);EXPLAIN (COSTS OFF) SELECT * FROM documentNATURAL JOIN category WHERE f_leak(dtitle); QUERY PLAN ---------------------------------------------------- - Nested Loop + Nested Loop(inner unique) -> Subquery Scan on document Filter: f_leak(document.dtitle) -> Seq Scanon document document_1
Hello,

The attached is a non-search version of unique join. The previous one was quite a bucket brigade, in that it made many functions take a new parameter only to convey the new bool information. I know that was very bad. After some consideration, I removed almost all of those parameters from the path creation functions and confined the main work within add_paths_to_joinrel. As a result the patch shrank and looks better. In addition, somehow, some additional regression test plans were found to be inner-unique.

I think this satisfies your wish, implemented in a non exhaustive-search-in-jointree manner. It still doesn't have regression tests of its own, but I don't see much significance in adding them...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On 20 March 2015 at 16:11, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I think this satisfies your wish and implemented in non
exhaustive-seearch-in-jointree manner. It still don't have
regressions for itself but I don't see siginificance in adding
it so much...
This seems quite a bit better. Having the inner_unique variable as part of the JoinPath struct is much nicer than what I had, and it removes my patch's requirement that all joins to that RelOptInfo be unique.
I also quite like the more recent change to make_hashjoin and co. just pass the JoinPath as a parameter.
I don't really like the "(inner unique)" being tagged onto the end of the join node, but there's not much point in spending too much time talking about that right now. There are much better things to talk about; I'm sure we can all bikeshed around that one later.
In joinpath.c you have a restriction to only perform the unique check for inner joins. This should work OK for left joins too, but it would probably be more efficient to have the left join removal code analyse the SpecialJoinInfos during checks for left join removals. I think it would just be a matter of breaking down the join removal code similarly to how I did in my patch, but this time adding a bool inner_unique to the SpecialJoinInfo struct. The join_is_legal() function seems to select the correct SpecialJoinInfo if one exists, so add_paths_to_joinrel() shouldn't need to call relation_has_unique_index_for() for a LEFT JOIN, as we'll already know whether it's unique just by looking at that property.
You've also lost the ability to detect that subqueries are unique:
create table j1(id int primary key);
create table j2(value int not null);
explain select * from j1 inner join (select distinct value from j2) j2 on j1.id=j2.value;
The left join removal code properly detects this, so I think unique joins should too.
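A hedged illustration of why the DISTINCT subquery case is safe (my own toy model, not planner code): once the inner side's join keys have been deduplicated, each outer row can match at most one inner row, which is exactly the property the unique-join optimisation relies on:

```c
#include <stdlib.h>

/* Comparator for qsort over ints. */
static int
cmp_int(const void *a, const void *b)
{
    return (*(const int *) a > *(const int *) b) -
           (*(const int *) a < *(const int *) b);
}

/* Deduplicate keys in place (as DISTINCT would), returning the new count. */
static int
dedup(int *keys, int n)
{
    int out = 0;

    qsort(keys, n, sizeof(int), cmp_int);
    for (int i = 0; i < n; i++)
        if (out == 0 || keys[out - 1] != keys[i])
            keys[out++] = keys[i];
    return out;
}

/* After dedup, the match count for any one outer key can only be 0 or 1. */
static int
matches_for_key(const int *keys, int n, int key)
{
    int m = 0;

    for (int i = 0; i < n; i++)
        if (keys[i] == key)
            m++;
    return m;
}
```

The planner, of course, proves this statically from the subquery's DISTINCT (or GROUP BY) clause rather than by inspecting the data.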
I can continue working on your patch if you like? Or are you planning to go further with it?
Regards
David Rowley
On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
I can continue working on your patch if you like? Or are you planning to go further with it?
I've been working on this more over the weekend and I've re-factored things to allow LEFT JOINs to be properly marked as unique.
I've also made changes to re-add support for detecting the uniqueness of sub-queries.
Also, I've modified the costing for hash and nested loop joins so that a unique inner join is costed the same as a SEMI join. This has tipped the scales on a few plans in the regression tests.
Also, please see attached unijoin_analysis.patch. This just adds some code which spouts out notices when join nodes are initialised which states if the join is unique or not. Running the regression tests with this patch in places gives:
Unique Inner: Yes == 753 hits
Unique Inner: No == 1430 hits
So it seems we can increase the speed of about one third of joins by about 10%.
A quick scan of the "No"s shows quite a few cases which do not look very real-world-like, e.g. cartesian joins.
It would be great if someone could run some PostgreSQL application with these 2 patches applied, and then grep the logs for the Unique Inner results... Just to get a better idea of how many joins in a real world case will benefit from this patch.
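As a back-of-the-envelope check on the "one third" figure quoted above (my own arithmetic, not part of the patch), 753 unique-inner joins out of 753 + 1430 = 2183 total is about 34.5%:

```c
/*
 * Fraction of initialised join nodes whose inner side was detected as
 * unique, from the regression-test hit counts reported above.
 */
static double
unique_inner_fraction(int yes_hits, int no_hits)
{
    return (double) yes_hits / (double) (yes_hits + no_hits);
}
```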
Regards
David Rowley
Attachment
Hi, thanks for the new patch.

I made an additional shrink from your last one. Could you have a look at the attached?

At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA@mail.gmail.com>
> On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > I can continue working on your patch if you like? Or are you planning to
> > go further with it?

It's fine that you continue to work on this.

# Sorry for the half-baked patch which had left many things alone :(

> I've been working on this more over the weekend and I've re-factored things
> to allow LEFT JOINs to be properly marked as unique.
> I've also made changes to re-add support for detecting the uniqueness of
> sub-queries.

I don't see the point of calling mark_unique_joins for every iteration on join_info_list in remove_useless_joins. The loop already iterates over the whole join_info_list, so mark_unique_joins as an individual function is needless. Finally, simply marking the uniqueness of a join in join_is_removable seems to be enough; inhibiting the early bail-out by the checks on attribute usage and placeholders lets it work as expected.

Reducing the changes to this extent, I can easily see what is added to the planning computations. It consists of mainly two factors.

- Unique-join checks for every candidate inner join in add_paths_to_joinrel.

- A uniqueness check of the mergejoinable clauses in the join-removability check for every left join, some of which would be skipped by other checks before.

> Also, I've added modified the costing for hash and nested loop joins to
> reduce the cost for unique inner joins to cost the join the same as it does
> for SEMI joins. This has tipped the scales on a few plans in the regression
> tests.

I'd forgotten about that, but it's quite important.

> Also, please see attached unijoin_analysis.patch. This just adds some code
> which spouts out notices when join nodes are initialised which states if
> the join is unique or not. Running the regression tests with this patch in
> places gives:
>
> Unique Inner: Yes == 753 hits
> Unique Inner: No == 1430 hits
>
> So it seems we can increase the speed of about 1 third of joins by about
> 10%.
> A quick scan of the "No"s seems to show quite a few cases which do not look
> that real world like. e.g cartesian join.

I don't have an idea how many queries in reality hit this, but I suppose it's not a few.

> It would be great if someone could run some PostgreSQL application with
> these 2 patches applied, and then grep the logs for the Unique Inner
> results... Just to get a better idea of how many joins in a real world case
> will benefit from this patch.

Wow. I think the second patch should be DEBUGx, not NOTICE :)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On 25 March 2015 at 01:11, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi, thanks for the new patch.
I made an additional shrink from your last one. Do you have a
look on the attached?
Thanks, for looking again.
I'm not too sure I like the idea of relying on join removals to mark the is_unique_join property.
By explicitly doing it in mark_unique_joins we get the nice side effect of not having to re-check a relation's unique properties after removing another relation, provided the relation was already marked as unique on the first pass.
At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA@mail.gmail.com>
> On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > I can continue working on your patch if you like? Or are you planning to
> > go further with it?
It's fine that you continue to work on this.
# Sorry for the hardly baked patch which had left many things alone:(
> I've been working on this more over the weekend and I've re-factored things
> to allow LEFT JOINs to be properly marked as unique.
> I've also made changes to re-add support for detecting the uniqueness of
> sub-queries.
I don't see the point of calling mark_unique_joins for every
iteration on join_info_list in remove_useless_joins.
I've fixed this in the attached. I must have forgotten to put the test for LEFT JOINs here as I was still thinking that I might make a change to the code that converts unique semi joins to inner joins so that it just checks is_unique_join instead of calling relation_has_unique_index_for().
The loop already iterates over the whole join_info_list, so
mark_unique_joins as an individual function is needless.
Finally, simply marking the uniqueness of a join in join_is_removable
seems to be enough; inhibiting the early bail-out by the checks on
attribute usage and placeholders lets it work as expected.
Reducing changes to this extent, I can easily see what is added
to planning computations. It consists of mainly two factors.
- Unique-join checks for every candidate inner join in
add_paths_to_joinrel.
- Uniqueness check of mergejoinable clause in join-removability
check for every left join, some of which would be skipped
by other checks before.
> Also, I've added modified the costing for hash and nested loop joins to
> reduce the cost for unique inner joins to cost the join the same as it does
> for SEMI joins. This has tipped the scales on a few plans in the regression
> tests.
I've forgotten it, but quite important.
I've fixed quite a fundamental bug in my previous costing change. The fix for this makes it so I have to pass the unique_inner bool down to the costing functions. This also changes how the JoinPaths are marked as unique_inner. This is now done when the JoinPath is created, instead of updating it afterwards like it was done previously.
> Also, please see attached unijoin_analysis.patch. This just adds some code
> which spouts out notices when join nodes are initialised which states if
> the join is unique or not. Running the regression tests with this patch in
> places gives:
>
> Unique Inner: Yes == 753 hits
> Unique Inner: No == 1430 hits
>
> So it seems we can increase the speed of about 1 third of joins by about
> 10%.
> A quick scan of the "No"s seems to show quite a few cases which do not look
> that real world like. e.g cartesian join.
I don't have an idea how many queries in reality hit this, but
I suppose it's not a few.
> It would be great if someone could run some PostgreSQL application with
> these 2 patches applied, and then grep the logs for the Unique Inner
> results... Just to get a better idea of how many joins in a real world case
> will benefit from this patch.
Wow. I think the second patch should be DEBUGx, not NOTICE:)
I didn't give that much thought, but you're probably right. It was just a quick test and demo to try to raise some more interest in this. :-)
Regards
David Rowley
Attachment
Hi!
On Sat, Mar 28, 2015 at 10:35 AM, David Rowley <dgrowleyml@gmail.com> wrote:
On 25 March 2015 at 01:11, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi, thanks for the new patch.
I made an additional shrink from your last one. Do you have a
look on the attached?
Thanks for looking again.
I'm not too sure I like the idea of relying on join removals to mark the is_unique_join property.
By explicitly doing it in mark_unique_joins we get the nice side effect of not having to re-check a relation's unique properties after removing another relation, provided the relation was already marked as unique on the first pass.
At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml@gmail.com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA@mail.gmail.com>
> On 20 March 2015 at 21:11, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > I can continue working on your patch if you like? Or are you planning to
> > go further with it?
It's fine that you continue to work on this.
# Sorry for the hardly baked patch which had left many things alone:(
> I've been working on this more over the weekend and I've re-factored things
> to allow LEFT JOINs to be properly marked as unique.
> I've also made changes to re-add support for detecting the uniqueness of
> sub-queries.
I don't see the point of calling mark_unique_joins for every
iteration on join_info_list in remove_useless_joins.
I've fixed this in the attached. I must have forgotten to put the test for LEFT JOINs here as I was still thinking that I might make a change to the code that converts unique semi joins to inner joins so that it just checks is_unique_join instead of calling relation_has_unique_index_for().
Patch doesn't apply to current master. Could you please rebase it?
patching file src/backend/commands/explain.c
Hunk #1 succeeded at 1193 (offset 42 lines).
patching file src/backend/executor/nodeHashjoin.c
patching file src/backend/executor/nodeMergejoin.c
patching file src/backend/executor/nodeNestloop.c
patching file src/backend/nodes/copyfuncs.c
Hunk #1 succeeded at 2026 (offset 82 lines).
patching file src/backend/nodes/equalfuncs.c
Hunk #1 succeeded at 837 (offset 39 lines).
patching file src/backend/nodes/outfuncs.c
Hunk #1 succeeded at 2010 (offset 62 lines).
patching file src/backend/optimizer/path/costsize.c
Hunk #1 succeeded at 1780 (offset 68 lines).
Hunk #2 succeeded at 1814 with fuzz 2 (offset 68 lines).
Hunk #3 succeeded at 1887 with fuzz 2 (offset 38 lines).
Hunk #4 succeeded at 2759 (offset 97 lines).
patching file src/backend/optimizer/path/joinpath.c
Hunk #1 succeeded at 19 with fuzz 2 (offset 1 line).
Hunk #2 FAILED at 30.
Hunk #3 succeeded at 46 (offset -5 lines).
Hunk #4 succeeded at 84 with fuzz 2 (offset -8 lines).
Hunk #5 FAILED at 241.
Hunk #6 FAILED at 254.
Hunk #7 FAILED at 284.
Hunk #8 FAILED at 299.
Hunk #9 succeeded at 373 with fuzz 2 (offset 4 lines).
Hunk #10 succeeded at 385 with fuzz 2 (offset 4 lines).
Hunk #11 FAILED at 411.
Hunk #12 succeeded at 470 with fuzz 2 (offset 1 line).
Hunk #13 FAILED at 498.
Hunk #14 succeeded at 543 with fuzz 2 (offset -3 lines).
Hunk #15 FAILED at 604.
Hunk #16 FAILED at 617.
Hunk #17 FAILED at 748.
Hunk #18 FAILED at 794.
Hunk #19 FAILED at 808.
Hunk #20 FAILED at 939.
Hunk #21 FAILED at 966.
Hunk #22 FAILED at 982.
Hunk #23 FAILED at 1040.
Hunk #24 FAILED at 1140.
Hunk #25 FAILED at 1187.
Hunk #26 FAILED at 1222.
Hunk #27 FAILED at 1235.
Hunk #28 FAILED at 1310.
Hunk #29 FAILED at 1331.
Hunk #30 FAILED at 1345.
Hunk #31 FAILED at 1371.
Hunk #32 FAILED at 1410.
25 out of 32 hunks FAILED -- saving rejects to file src/backend/optimizer/path/joinpath.c.rej
patching file src/backend/optimizer/path/joinrels.c
patching file src/backend/optimizer/plan/analyzejoins.c
patching file src/backend/optimizer/plan/createplan.c
Hunk #1 succeeded at 135 (offset 4 lines).
Hunk #2 succeeded at 155 (offset 4 lines).
Hunk #3 succeeded at 2304 (offset 113 lines).
Hunk #4 succeeded at 2598 (offset 113 lines).
Hunk #5 succeeded at 2724 (offset 113 lines).
Hunk #6 succeeded at 3855 (offset 139 lines).
Hunk #7 succeeded at 3865 (offset 139 lines).
Hunk #8 succeeded at 3880 (offset 139 lines).
Hunk #9 succeeded at 3891 (offset 139 lines).
Hunk #10 succeeded at 3941 (offset 139 lines).
Hunk #11 succeeded at 3956 (offset 139 lines).
patching file src/backend/optimizer/plan/initsplan.c
patching file src/backend/optimizer/plan/planmain.c
patching file src/backend/optimizer/util/pathnode.c
Hunk #1 succeeded at 1541 (offset 27 lines).
Hunk #2 succeeded at 1557 (offset 27 lines).
Hunk #3 succeeded at 1610 (offset 27 lines).
Hunk #4 succeeded at 1625 (offset 27 lines).
Hunk #5 succeeded at 1642 (offset 27 lines).
Hunk #6 succeeded at 1670 (offset 27 lines).
Hunk #7 succeeded at 1688 (offset 27 lines).
Hunk #8 succeeded at 1703 (offset 27 lines).
Hunk #9 succeeded at 1741 (offset 27 lines).
patching file src/include/nodes/execnodes.h
Hunk #1 succeeded at 1636 (offset 71 lines).
patching file src/include/nodes/plannodes.h
Hunk #1 succeeded at 586 (offset 43 lines).
patching file src/include/nodes/relation.h
Hunk #1 succeeded at 1045 (offset 14 lines).
Hunk #2 succeeded at 1422 (offset 14 lines).
patching file src/include/optimizer/cost.h
Hunk #1 succeeded at 114 (offset 1 line).
patching file src/include/optimizer/pathnode.h
Hunk #1 succeeded at 91 (offset 2 lines).
Hunk #2 succeeded at 104 (offset 2 lines).
Hunk #3 succeeded at 119 (offset 2 lines).
patching file src/include/optimizer/planmain.h
Hunk #1 succeeded at 124 (offset 2 lines).
patching file src/test/regress/expected/equivclass.out
patching file src/test/regress/expected/join.out
Hunk #1 succeeded at 2656 (offset 42 lines).
Hunk #2 succeeded at 3428 (offset 90 lines).
Hunk #3 succeeded at 4506 (offset 90 lines).
patching file src/test/regress/expected/rowsecurity.out
Hunk #1 succeeded at 274 (offset 26 lines).
patching file src/test/regress/expected/select_views.out
Hunk #1 succeeded at 1411 (offset 46 lines).
Hunk #2 succeeded at 1432 (offset 46 lines).
Hunk #3 succeeded at 1466 (offset 46 lines).
Hunk #4 succeeded at 1497 (offset 46 lines).
patching file src/test/regress/expected/select_views_1.out
Hunk #1 succeeded at 1411 (offset 46 lines).
Hunk #2 succeeded at 1432 (offset 46 lines).
Hunk #3 succeeded at 1466 (offset 46 lines).
Hunk #4 succeeded at 1497 (offset 46 lines).
patching file src/test/regress/sql/join.sql
Hunk #1 succeeded at 1344 (offset 37 lines).
The Russian Postgres Company
On 8 July 2015 at 02:00, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Patch doesn't apply to current master. Could you please rebase it?
Attached. Thanks.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Wednesday 08 July 2015 12:29:38 David Rowley wrote:
> On 8 July 2015 at 02:00, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> > Patch doesn't apply to current master. Could you, please, rebase it?
>
> Attached. Thanks.

Hello.
What is the current status of the patch?

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2015-08-06 15:36, Uriy Zhuravlev wrote:
> Hello.
> What is the current status of the patch?
> [unique_joins_9e6d4e4_2015-07-08.patch]

FWIW, I just happened to be looking at the latest patch:

applies OK (against current HEAD)
compiles OK
make check fail: join ... FAILED
installs OK

I was just going to repeat David's 2 queries (hardware: x86_64, Linux 2.6.18-402.el5) with these tables:

create table t1 (id int primary key);
create table t2 (id int primary key);
insert into t1 select x.x from generate_series(1,1000000) x(x);
insert into t2 select x.x from generate_series(1,1000000) x(x);
create table s1 (id varchar(32) primary key);
insert into s1 select x.x::varchar from generate_series(1,1000000) x(x);
create table s2 (id varchar(32) primary key);
insert into s2 select x.x::varchar from generate_series(1,1000000) x(x);
vacuum analyze;

I do not see much difference between patched and unpatched after running both David's statements (unijoin.sql and unijoin2.sql):

-- pgbench -f unijoin.sql -n -T 300 -P30 testdb
select count(t2.id) from t1 left outer join t2 on t1.id=t2.id;
tps = 2.323222 - unpatched
tps = 2.356906 - patched

-- pgbench -f unijoin2.sql -n -T 300 -P30 testdb
tps = 1.257656 - unpatched
tps = 1.225758 - patched

So as far as I can tell it does not look very promising.

Erik Rijkers
Hi,

I did some initial performance evaluation of this patch today, and I see a clear improvement on larger joins. The scenario I've chosen for the experiments is a simple fact-dimension join, i.e. a small table referenced by a large table. So effectively something like this:

CREATE TABLE dim (id INT PRIMARY KEY, ...);
CREATE TABLE fact (dim_d INT REFERENCES dim(id), ...);

except that I haven't used the foreign key constraint. In all the experiments I've used a fact table 10x the size of the dimension, but I believe what really matters most is the size of the dimension (and how the hash table fits into the L2/L3 cache).

The query tested is very simple:

select count(1) from (
    select * from f join d on (f.fact_id = d.dim_id)
) foo;

The outer aggregation is intentional - the join produces many rows, and formatting them as strings would completely eliminate any gains from the patch (even with "\o /dev/null" or such).

The following numbers come from current master, running on E5-2630 v3 (2.40GHz), 64GB of RAM and this configuration:

checkpoint_timeout = 15min
effective_cache_size = 48GB
maintenance_work_mem = 1GB
max_wal_size = 4GB
min_wal_size = 1GB
random_page_cost = 1.5
shared_buffers = 4GB
work_mem = 1GB

all the other values are set to default. I did 10 runs for each combination of sizes - the numbers seem quite consistent and repeatable. I also looked at the median values.
dim 100k rows, fact 1M rows
---------------------------
       master    patched
      -------    -------
   1  286.184    265.489
   2  284.827    264.961
   3  281.040    264.768
   4  280.926    267.720
   5  280.984    261.348
   6  280.878    261.463
   7  280.875    261.338
   8  281.042    261.265
   9  281.003    261.236
  10  280.939    261.185
      -------    -------
 med  280.994    261.406 (-7%)

dim 1M rows, fact 10M rows
--------------------------
        master     patched
      --------    --------
   1  4316.235    3690.373
   2  4399.539    3738.097
   3  4360.551    3655.602
   4  4359.763    3626.142
   5  4361.821    3621.941
   6  4359.205    3654.835
   7  4371.438    3631.212
   8  4361.857    3626.237
   9  4357.317    3676.651
  10  4359.561    3641.830
      --------    --------
 med  4360.157    3648.333 (-17%)

dim 10M rows, fact 100M rows
----------------------------
         master      patched
      ---------    ---------
   1  46246.467    39561.597
   2  45982.937    40083.352
   3  45818.118    39674.661
   4  45716.281    39616.585
   5  45651.117    40463.966
   6  45979.036    41395.390
   7  46045.400    41358.047
   8  45978.698    41253.946
   9  45801.343    41156.440
  10  45720.722    41374.688
      ---------    ---------
 med  45898.408    40810.203 (-10%)

So the gains seem quite solid - it's not something that would make the query an order of magnitude faster, but it's well above the noise.

Of course, in practice the queries will be more complicated, making the improvement less significant, but I don't think that's a reason not to apply it.

Two minor comments on the patch:

1) the 'subquery' variable in specialjoin_is_unique_join is unused

2) in the explain output, there should probably be a space before the '(inner unique)' text, so

    Hash Join (inner unique) ...

instead of

    Hash Join(inner unique)

but that's just nitpicking at this point. Otherwise the patch seems quite solid to me.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 24 August 2015 at 07:31, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
dim 100k rows, fact 1M rows
---------------------------
master patched
------- -------
..
med 280.994 261.406 (-7%)
dim 1M rows, fact 10M rows
--------------------------
master patched
-------- --------
..
med 4360.157 3648.333 (-17%)
dim 10M rows, fact 100M rows
----------------------------
master patched
-------- --------
..
med 45898.408 40810.203 (-10%)
So the gains seem quite solid - it's not something that would make the query an order of magnitude faster, but it's well above the noise.
Of course, in practice the queries will be more complicated, making the improvement less significant, but I don't think that's a reason not to apply it.
Many thanks for doing that performance testing.
Two minor comments on the patch:
1) the 'subquery' variable in specialjoin_is_unique_join is unused
2) in the explain output, there should probably be a space before the
'(inner unique)' text, so
Hash Join (inner unique) ...
instead of
Hash Join(inner unique)
but that's just nitpicking at this point. Otherwise the patch seems quite solid to me.
The attached fixes these two issues.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 24 August 2015 at 07:31, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> 2) in the explain output, there should probably be a space before the
>> '(inner unique)' text, so
>>
>> Hash Join (inner unique) ...
>>
>> instead of
>>
>> Hash Join(inner unique)
>>
>> but that's just nitpicking at this point. Otherwise the patch seems quite
>> solid to me.
>
> The attached fixes these two issues.

Please, no. Randomly sticking additional bits of information into Join Type is a good way to render EXPLAIN output permanently unintelligible, not only to humans but to whatever programs are still trying to read the text output format (I think explain.depesz.com still does). It is also not a great idea to put more semantic distance between the text and non-text output representations.

I am not exactly convinced that this behavior needs to be visible in EXPLAIN output at all, but if it does, it should be a separate field.

regards, tom lane
On 24 August 2015 at 12:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 24 August 2015 at 07:31, Tomas Vondra <tomas.vondra@2ndquadrant.com>
> wrote:
>> 2) in the explain output, there should probably be a space before the
>> '(inner unique)' text, so
>>
>> Hash Join (inner unique) ...
>>
>> instead of
>>
>> Hash Join(inner unique)
>>
>> but that's just nitpicking at this point. Otherwise the patch seems quite
>> solid to me.
> The attached fixes these two issues.
Please, no. Randomly sticking additional bits of information into Join
Type is a good way to render EXPLAIN output permanently unintelligible,
not only to humans but to whatever programs are still trying to read the
text output format (I think explain.depesz.com still does). It is also
not a great idea to put more semantic distance between the text and
non-text output representations.
I am not exactly convinced that this behavior needs to be visible in
EXPLAIN output at all, but if it does, it should be a separate field.
I have to admit I don't much like it either, originally I had this as an extra property that was only seen in EXPLAIN VERBOSE.
-> Nested Loop
Output: a.ctid, wcte.*
+ Unique Join: No
There was a debate somewhere about this, and it ended up the way it is now. I didn't feel the need to argue for the EXPLAIN VERBOSE field as I thought that a committer would likely change it anyway. However, if I remove all changes to explain.c, then it becomes very difficult to write regression tests which ensure the new code is doing what it's meant to.
What do you think of the extra EXPLAIN VERBOSE field?
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 24 August 2015 at 12:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I am not exactly convinced that this behavior needs to be visible in
>> EXPLAIN output at all, but if it does, it should be a separate field.
>
> I have to admit I don't much like it either, originally I had this as an
> extra property that was only seen in EXPLAIN VERBOSE.

Seems like a reasonable design from here. (Note that for non-text output, I'd say the field should come out unconditionally. We only care about abbreviating in text mode.)

regards, tom lane
On 24 August 2015 at 14:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I have to admit I don't much like it either, originally I had this as an
> extra property that was only seen in EXPLAIN VERBOSE.
Seems like a reasonable design from here.
The attached patch has the format in this way.
(Note that for non-text output,
I'd say the field should come out unconditionally. We only care about
abbreviating in text mode.)
If that's the case then why do we not enable verbose for all of the non-text outputs?
It seems strange to start making exceptions on a case-by-case basis.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Aug 25, 2015 at 2:25 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 24 August 2015 at 14:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Note that for non-text output, I'd say the field should come out
>> unconditionally. We only care about abbreviating in text mode.)
>
> If that's the case then why do we not enable verbose for all of the
> non-text outputs?
> It seems strange to start making exceptions on a case-by-case basis.

Moved to CF 2015-09.

--
Michael
On Tue, Aug 25, 2015 at 1:25 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> If that's the case then why do we not enable verbose for all of the
> non-text outputs?
> It seems strange to start making exceptions on a case-by-case basis.

+1. FORMAT and VERBOSE are separate options, and each one should control what the name suggests, not the other thing.

Also: very nice performance results.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4 September 2015 at 04:50, Robert Haas <robertmhaas@gmail.com> wrote:
Also: very nice performance results.
Thanks.
On following a thread in [General] [1] it occurred to me that this patch can give a massive improvement on Merge joins where the mark and restore causes an index scan to have to skip over many filtered rows again and again.
I mocked up some tables and some data from the scenario on the [General] thread:
create table customers (id bigint, group_id bigint not null);
insert into customers select x.x,x.x%27724+1 from generate_series(1,473733) x(x);
alter table customers add constraint customer_pkey primary key (id);
create table balances (id bigint, balance int not null, tracking_number int not null, customer_id bigint not null);
insert into balances select x.x, 100, 12345, x.x % 45 + 1 from generate_Series(1,16876) x(x);
create index balance_customer_id_index on balances (customer_id);
create index balances_customer_tracking_number_index on balances (customer_id,tracking_number);
analyze;
Unpatched I get:
test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id = ac.customer_id WHERE o.group_id = 45;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual time=6.110..1491.408 rows=375 loops=1)
Merge Cond: (ac.customer_id = o.id)
-> Index Scan using balance_customer_id_index on balances ac (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..5.206 rows=16876 loops=1)
-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75 rows=17 width=8) (actual time=0.014..1484.382 rows=376 loops=1)
Filter: (group_id = 45)
Rows Removed by Filter: 10396168
Planning time: 0.207 ms
Execution time: 1491.469 ms
(8 rows)
Patched:
test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id = ac.customer_id WHERE o.group_id = 45;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual time=6.037..11.528 rows=375 loops=1)
Merge Cond: (ac.customer_id = o.id)
-> Index Scan using balance_customer_id_index on balances ac (cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..4.978 rows=16876 loops=1)
-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75 rows=17 width=8) (actual time=0.015..5.141 rows=2 loops=1)
Filter: (group_id = 45)
Rows Removed by Filter: 27766
Planning time: 0.204 ms
Execution time: 11.575 ms
(8 rows)
Now it could well be that the merge join costs need a bit more work to avoid a merge join in this case, but as it stands as of today, this is your performance gain.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
2015-10-13 23:28 GMT+02:00 David Rowley <david.rowley@2ndquadrant.com>:
On 4 September 2015 at 04:50, Robert Haas <robertmhaas@gmail.com> wrote:
Also: very nice performance results.

Thanks.

On following a thread in [General] [1] it occurred to me that this patch can give a massive improvement on Merge joins where the mark and restore causes an index scan to have to skip over many filtered rows again and again.

[...]

Now it could well be that the merge join costs need a bit more work to avoid a merge join in this case, but as it stands as of today, this is your performance gain.

Regards
it is great
Pavel
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 14, 2015 at 1:03 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> it is great

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 25 August 2015 at 17:25, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 24 August 2015 at 14:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I have to admit I don't much like it either, originally I had this as an
> extra property that was only seen in EXPLAIN VERBOSE.
Seems like a reasonable design from here.

The attached patch has the format in this way.
I've attached a rebased patch against current master as there were some conflicts from the recent changes to LATERAL join.
On reviewing the patch again I was reminded that the bulk of the changes in the patch are in analyzejoins.c. These are mostly just a refactor of the current code to make it more reusable. The diff looks a bit more scary than it actually is.
--
Attachment
Hi,

On 12/16/2015 01:27 AM, David Rowley wrote:
> I've attached a rebased patch against current master as there were
> some conflicts from the recent changes to LATERAL join.

Thanks. I've looked at the rebased patch and have a few minor comments.

0) I know the patch does not tweak costing - any plans in this direction? Would it be possible to simply use the costing used by semijoin?

1) nodeHashjoin.c (and other join nodes)

I've noticed we have this in the ExecHashJoin() method:

    /*
     * When the inner side is unique or we're performing a
     * semijoin, we'll consider returning the first match, but
     * after that we're done with this outer tuple.
     */
    if (node->js.unique_inner)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;

That seems a bit awkward because the comment speaks about unique joins *OR* semijoins, but the check only references unique_inner. That of course works because we do this in ExecInitHashJoin():

    switch (node->join.jointype)
    {
        case JOIN_SEMI:
            hjstate->js.unique_inner = true;
            /* fall through */

Either some semijoins may not be unique - in that case the naming is misleading at best, and we either need to find a better name or simply check two conditions like this:

    if (node->js.unique_inner || node->join.type == JOIN_SEMI)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;

Or all semijoins are unique joins, and in that case the comment may need rephrasing. But more importantly, it begs the question why we're detecting this in the executor and not in the planner? Because if we detect it in executor, we can't use this bit in costing, for example.

FWIW the misleading naming was actually mentioned in the thread by Horiguchi-san, but I don't see any response to that (might have missed it though).
2) analyzejoins.c

I see that this code in join_is_removable()

    innerrel = find_base_rel(root, innerrelid);

    if (innerrel->reloptkind != RELOPT_BASEREL)
        return false;

was rewritten like this:

    innerrel = find_base_rel(root, innerrelid);

    Assert(innerrel->reloptkind == RELOPT_BASEREL);

That suggests that previously the function expected cases where reloptkind may not be RELOPT_BASEREL, but now it'll error out in such cases. I haven't noticed any changes in the surrounding code that'd guarantee this won't happen, but I also haven't been able to come up with an example triggering the assert (haven't been trying too hard). How do we know the assert() makes sense?

3) joinpath.c - either "were" or "been" seems redundant

    /* left joins were already been analyzed for uniqueness in mark_unique_joins() */

4) analyzejoins.c - comment format broken

    /*
     * mark_unique_joins
    Analyze joins in order to determine if their inner side is unique based
    on the join condition.
     */

5) analyzejoins.c - missing line before the comment

    }
    /*
     * rel_supports_distinctness
     *    Returns true if rel has some properties which can prove the relation
     *    to be unique over some set of columns.
     *

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 December 2015 at 05:02, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
0) I know the patch does not tweak costing - any plans in this
direction? Would it be possible to simply use the costing used by
semijoin?
Many thanks for looking at this.
The patch does tweak the costings so that unique joins are costed in the same way as semi joins. It's a very small change, so easy to miss.
For example, see the change in initial_cost_nestloop()
- if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+ if (jointype == JOIN_SEMI ||
+ jointype == JOIN_ANTI ||
+ unique_inner)
{
Also see the changes in final_cost_nestloop() and final_cost_hashjoin()
1) nodeHashjoin.c (and other join nodes)
I've noticed we have this in the ExecHashJoin() method:
/*
* When the inner side is unique or we're performing a
* semijoin, we'll consider returning the first match, but
* after that we're done with this outer tuple.
*/
if (node->js.unique_inner)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
That seems a bit awkward because the comment speaks about unique joins *OR* semijoins, but the check only references unique_inner. That of course works because we do this in ExecInitHashJoin():
switch (node->join.jointype)
{
case JOIN_SEMI:
hjstate->js.unique_inner = true;
/* fall through */
Either some semijoins may not be unique - in that case the naming is misleading at best, and we either need to find a better name or simply check two conditions like this:
if (node->js.unique_inner || node->join.type == JOIN_SEMI)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
Or all semijoins are unique joins, and in that case the comment may need rephrasing. But more importantly, it begs the question why we're detecting this in the executor and not in the planner? Because if we detect it in executor, we can't use this bit in costing, for example.
The reason not to detect that in the planner is simply that unique_join is meant to mean that the relation on the inner side of the join will contain at most a single tuple which matches each outer relation tuple, based on the join condition. I've added no detection for this in semi joins, and I'd rather not go blindly setting the flag to true in the planner, as it simply may not be true for the semi join. At the moment that might not matter, since we're only using the unique_join flag as an optimization in the join nodes, but I'd prefer not to do this as it's likely we'll want to do more with this flag later, and I'd rather we keep the meaning well defined. You might argue that this is also true for semi joins, but down the road we may want to perform some checks on the inner relation before the join takes place, and in that case the tuples of the relation might not have the same properties we claim they do.
But you're right that reusing the flag in the join nodes is not ideal, and the comment is not that great either. I'd really rather go down the path of either renaming the variable, or explaining this better in the comment. It seems unnecessary to check both for each tuple being joined. I'd imagine that might add a little overhead to joins which are not unique.
How about changing the comment to:
/*
* In the event that the inner side has been detected to be
* unique, as an optimization we can skip searching for any
* subsequent matching inner tuples, as none will exist.
* For semijoins unique_inner will always be true, although
* in this case we don't match another inner tuple as this
* is the required semi join behavior.
*/
Alternatively or additionally we can rename the variable in the executor state, although I've not thought of a name yet that I don't find overly verbose: unique_inner_or_semi_join, match_first_tuple_only.
2) analyzejoins.c
I see that this code in join_is_removable()
innerrel = find_base_rel(root, innerrelid);
if (innerrel->reloptkind != RELOPT_BASEREL)
return false;
was rewritten like this:
innerrel = find_base_rel(root, innerrelid);
Assert(innerrel->reloptkind == RELOPT_BASEREL);
That suggests that previously the function expected cases where reloptkind may not be RELOPT_BASEREL, but now it'll error out int such cases. I haven't noticed any changes in the surrounding code that'd guarantee this won't happen, but I also haven't been able to come up with an example triggering the assert (haven't been trying too hard). How do we know the assert() makes sense?
I'd changed this because it should be covered by the "if (!sjinfo->is_unique_join ||" test a few lines up. mark_unique_joins() only sets is_unique_join to true if specialjoin_is_unique_join() returns true, and that function tests reloptkind to ensure it's set to RELOPT_BASEREL, returning false if the relation is not. Perhaps what is missing is a comment to explain that the function should only be used on RELOPT_BASEREL type relations.
3) joinpath.c
- either "were" or "been" seems redundant
/* left joins were already been analyzed for uniqueness in mark_unique_joins() */
Good catch. Fixed
4) analyzejoins.c
comment format broken
/*
* mark_unique_joins
Analyze joins in order to determine if their inner side is unique based
on the join condition.
*/
Fixed
5) analyzejoins.c
missing line before the comment
}
/*
* rel_supports_distinctness
* Returns true if rel has some properties which can prove the relation
* to be unique over some set of columns.
*
Fixed
I've attached updated patches.
Thanks again for the review!
--
Attachment
Hi,

On 12/16/2015 11:40 PM, David Rowley wrote:
> The patch does tweak the costings so that unique joins are costed in the
> same way as semi joins. It's a very small change, so easy to miss.

Thanks. I missed that bit somehow.

> But you're right that reusing the flag in the join nodes is not ideal,
> and the comment is not that great either. I'd really rather go down the
> path of either renaming the variable, or explaining this better in the
> comment. It seems unnecessary to check both for each tuple being joined.
> I'd imagine that might add a little overhead to joins which are not unique.

I'd be very surprised if that had any measurable impact.

> Alternatively or additionally we can rename the variable in the executor
> state, although I've not thought of a name yet that I don't find overly
> verbose: unique_inner_or_semi_join, match_first_tuple_only.

I'd go with match_first_tuple_only.

> I'd have changed this as this should be covered by the
> if (!sjinfo->is_unique_join || a few lines up. mark_unique_joins() only
> sets is_unique_join to true if specialjoin_is_unique_join() returns true
> and that function tests reloptkind to ensure its set to RELOPT_BASEREL
> and return false if the relation is not. Perhaps what is missing is a
> comment to explain the function should only be used on RELOPT_BASEREL
> type relations.

Yeah, I think it'd be good to document this contract somewhere. Either in the comment before the function, or perhaps right above the Assert().

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
I'd be very surprised if that had any measurable impact.

1) nodeHashjoin.c (and other join nodes)
I've noticed we have this in the ExecHashJoin() method:
/*
* When the inner side is unique or we're performing a
* semijoin, we'll consider returning the first match, but
* after that we're done with this outer tuple.
*/
if (node->js.unique_inner)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
That seems a bit awkward because the comment speaks about unique
joins *OR* semijoins, but the check only references unique_inner.
That of course works because we do this in ExecInitHashJoin():
switch (node->join.jointype)
{
case JOIN_SEMI:
hjstate->js.unique_inner = true;
/* fall through */
Either some semijoins may not be unique - in that case the naming is
misleading at best, and we either need to find a better name or
simply check two conditions like this:
if (node->js.unique_inner || node->join.type == JOIN_SEMI)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
Or all semijoins are unique joins, and in that case the comment may
need rephrasing. But more importantly, it begs the question why
we're detecting this in the executor and not in the planner? Because
if we detect it in executor, we can't use this bit in costing, for
example.
The reason not to detect that in the planner is simply that unique_join
is meant to mean that the relation on the inner side of the join will at
most contain a single Tuple which matches each outer relation tuple,
based on the join condition. I've added no detection for this in semi
joins, and I'd rather not go blindly setting the flag to true in the
planner as it simply may not be true for the semi join. At the moment
that might not matter as we're only using the unique_join flag as an
optimization in the join nodes, but I'd prefer not to do this as it's
likely we'll want to do more with this flag later, and I'd rather we
keep the meaning well defined. You might argue that this is also true
for semi joins, but if down the road somewhere we want to perform some
checks on the inner relation before the join takes place, and in that
case the Tuples of the relation might not have the same properties we
claim they do.
But you're right that reusing the flag in the join nodes is not ideal,
and the comment is not that great either. I'd really rather go down the
path of either renaming the variable, or explaining this better in the
comment.

It seems unnecessary to check both for each tuple being joined.
I'd imagine that might add a little overhead to joins which are not unique.

How about changing the comment to:
/*
* In the event that the inner side has been detected to be
* unique, as an optimization we can skip searching for any
* subsequent matching inner tuples, as none will exist.
* For semijoins unique_inner will always be true, although
* in this case we don't match another inner tuple as this
* is the required semi join behavior.
*/
Alternatively or additionally we can rename the variable in the executor
state, although I've not thought of a name yet that I don't find overly
verbose: unique_inner_or_semi_join, match_first_tuple_only.
I'd go with match_first_tuple_only.
+1
unique_inner is a state that has been detected, match_first_tuple_only is the action we take as a result.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 December 2015 at 19:11, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I'd go with match_first_tuple_only.
>
> +1
>
> unique_inner is a state that has been detected, match_first_tuple_only
> is the action we take as a result.
Ok great. I've made it so in the attached. This means the comment in the join code where we perform the skip can be a bit less verbose and all the details can go in where we're actually setting the match_first_tuple_only to true.
--
On Thu, Dec 17, 2015 at 10:17 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 17 December 2015 at 19:11, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>> I'd go with match_first_tuple_only.
>>
>> +1
>>
>> unique_inner is a state that has been detected, match_first_tuple_only is
>> the action we take as a result.
>
> Ok great. I've made it so in the attached. This means the comment in the
> join code where we perform the skip can be a bit less verbose and all the
> details can go in where we're actually setting the match_first_tuple_only
> to true.

Patch moved to next CF because of a lack of reviews on the new patch, and
because the last patch has been posted not so long ago.
--
Michael
Hi,

On 12/17/2015 02:17 PM, David Rowley wrote:
> On 17 December 2015 at 19:11, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>> I'd go with match_first_tuple_only.
>>
>> +1
>>
>> unique_inner is a state that has been detected, match_first_tuple_only
>> is the action we take as a result.
>
> Ok great. I've made it so in the attached. This means the comment in the
> join code where we perform the skip can be a bit less verbose and all
> the details can go in where we're actually setting the
> match_first_tuple_only to true.

OK. I've looked at the patch again today, and it seems broken by 45be99f8,
as the partial paths were not passing the unique_inner to the
create_*_path() functions. The attached patch should fix that.

Otherwise I think the patch is ready for committer - I think this is a
valuable optimization and I see nothing wrong with the code.

Any objections to marking it accordingly?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23 January 2016 at 05:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> OK. I've looked at the patch again today, and it seems broken by 45be99f8,
> as the partial paths were not passing the unique_inner to the
> create_*_path() functions. The attached patch should fix that.

Thanks for looking at this again, and thanks for fixing that problem.
I've attached an updated patch which incorporates your change, and also
another 2 small tweaks that I made after making another pass over this
myself.

> Otherwise I think the patch is ready for committer - I think this is a
> valuable optimization and I see nothing wrong with the code.
>
> Any objections to marking it accordingly?

None from me. I think it's had plenty of review time over the last year.

The only other thing I can possibly think of is to apply most of the
analyzejoins.c changes as a refactor patch first. If the committer wants
that, I can separate these out, but I'll hold off for a response before
doing that.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 23/01/2016 12:42 am, "David Rowley" <david.rowley@2ndquadrant.com> wrote:
> On 23 January 2016 at 05:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Otherwise I think the patch is ready for committer - I think this is a
>> valuable optimization and I see nothing wrong with the code.
>>
>> Any objections to marking it accordingly?

I've just set this patch back to ready for committer. It was previously
marked as so but lost that status when it was moved to the March fest.
I wonder why we have two identical copies of clause_sides_match_join ...

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 5 March 2016 at 10:43, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I wonder why we have two identical copies of clause_sides_match_join ...

Yeah, I noticed the same a while back, and posted about it. Here was the
response:

http://www.postgresql.org/message-id/26820.1405522310@sss.pgh.pa.us

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 23 January 2016 at 05:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> On 12/17/2015 02:17 PM, David Rowley wrote:
>> On 17 December 2015 at 19:11, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 17 December 2015 at 00:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>> I'd go with match_first_tuple_only.
>>>
>>> +1
>>>
>>> unique_inner is a state that has been detected, match_first_tuple_only
>>> is the action we take as a result.
>>
>> Ok great. I've made it so in the attached. This means the comment in the
>> join code where we perform the skip can be a bit less verbose and all
>> the details can go in where we're actually setting the
>> match_first_tuple_only to true.
>
> OK. I've looked at the patch again today, and it seems broken by 45be99f8,
> as the partial paths were not passing the unique_inner to the
> create_*_path() functions. The attached patch should fix that.
>
> Otherwise I think the patch is ready for committer - I think this is a
> valuable optimization and I see nothing wrong with the code.

I've attached an updated patch which updates it to fix the conflicts with
the recent upper planner changes.

I also notice that some regression test outputs, some of which I think
Tom updated in the upper planner changes, have now changed back again due
to the slightly reduced costs on hash and nested loop joins where the
inner side is unique. I checked the costs of one of these by disabling
hash join and noticed that the final total cost is the same, so it's not
too surprising that they keep switching plans with these planner changes
going in. I verified that these remain as-is when I comment out the
cost-changing code in this patch.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> I also notice that some regression test outputs, some of which I think
> Tom updated in the upper planner changes, have now changed back again
> due to the slightly reduced costs on hash and nested loop joins where
> the inner side is unique.

?? I don't see anything in this patch that touches the same queries that
changed plans in my previous patch.

I do think that the verbosity this adds to the EXPLAIN output is not
desirable at all, at least not in text mode. Another line for every
darn join is a pretty high price.

regards, tom lane
On 9 March 2016 at 13:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I do think that the verbosity this adds to the EXPLAIN output is not
> desirable at all, at least not in text mode. Another line for every
> darn join is a pretty high price.

For me it seems like a good idea to give some sort of indication in
EXPLAIN about this, although I'm not wedded to what I have currently as
being the best and neatest way to do it. I was however under the
impression that you thought this was reasonable [1], so I didn't go
seeking out any other way. Did you have another idea, or are you just
proposing to remove it from text EXPLAIN? Perhaps, if you are, then the
tests can still exist with non-text output.

Also in [1], I disagree with your proposal of being inconsistent about
what's shown with VERBOSE depending on the EXPLAIN output format. If we
were going to do that then I'd rather just switch on verbose for
non-text, but that might make some people unhappy, so I'd rather we
didn't do that either.

So how about I remove it from the TEXT output and just include it in
non-text formats when the VERBOSE flag is set? Then change the tests to
use... something other than text... YAML seems most compact.

[1] http://www.postgresql.org/message-id/8907.1440383377@sss.pgh.pa.us

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> [1] http://www.postgresql.org/message-id/8907.1440383377@sss.pgh.pa.us

Oh, okay, I had looked at the many changes in the regression outputs and
jumped to the conclusion that you were printing the info all the time.
Looking closer, I see it's only coming out in VERBOSE mode. That's
probably fine.

<emilylitella>Never mind</emilylitella>

regards, tom lane
So I started re-reading this thread in preparation for looking at the
patch, and this bit in your initial message jumped out at me:

> In all of our join algorithms in the executor, if the join type is SEMI,
> we skip to the next outer row once we find a matching inner row. This is
> because we don't want to allow duplicate rows in the inner side to
> duplicate outer rows in the result set. Obviously this is required per SQL
> spec. I believe we can also skip to the next outer row in this case when
> we've managed to prove that no other row can possibly exist that matches
> the current outer row, due to a unique index or group by/distinct clause
> (for subqueries).

I wondered why, instead of inventing an extra semantics-modifying flag,
we couldn't just change the jointype to *be* JOIN_SEMI when we've
discovered that the inner side is unique.

Now of course this only works if the join type was INNER to start with.
If it was a LEFT join, you'd need an "outer semi join" jointype which we
haven't got at the moment. But I wonder whether inventing that jointype
wouldn't let us arrive at a less messy handling of things in the
executor and EXPLAIN. I'm not very enamored of plastering this
"match_first_tuple_only" flag on every join, in part because it doesn't
appear to have sensible semantics for other jointypes such as
JOIN_RIGHT. And I'd really be happier to see the information reflected
by join type than a new line in EXPLAIN, also.

regards, tom lane
I wrote:
> I wondered why, instead of inventing an extra semantics-modifying flag,
> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> discovered that the inner side is unique.

BTW, to clarify: I'm not imagining that we'd make this change in the
query jointree, as for example prepjointree.c might do. That would
appear to add join order constraints, which we don't want. But I'm
thinking that at the instant where we form a join Path, we could change
the Path's jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to
prove the inner side unique, rather than annotating the Path with a
separate flag. Then that representation is what propagates forward.

It seems like the major intellectual complexity here is to figure out
how to detect inner-side-unique at reasonable cost. I see that for LEFT
joins you're caching that in the SpecialJoinInfos, which is probably
fine. But for INNER joins it looks like you're just doing it over again
for every candidate join, and that seems mighty expensive.

regards, tom lane
On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do. That would appear
> to add join order constraints, which we don't want. But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

Thanks for looking at this.

Yes that might work, since we'd just be changing the jointype in the
JoinPath; if that path was discarded in favour of, say, the commutative
variant, which was not "unique inner", then it shouldn't matter, as the
join type for that path would be the original one.

The thing that might matter is that this;

explain (costs off) select * from t1 inner join t2 on t1.id=t2.id

          QUERY PLAN
------------------------------
 Hash Join
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2

could become;

          QUERY PLAN
------------------------------
 Hash Semi Join
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2

Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN
output might be a bit confused at the lack of an EXISTS/IN clause in a
query which is showing a Semi Join. Now, we could get around that by
adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
inner join, yet it'll behave exactly like JOIN_SEMI!
And if we do that, then the join node code changes from;

    if (node->js.match_first_tuple_only)
        node->nl_NeedNewOuter = true;

to;

    if (node->js.jointype == JOIN_SEMI ||
        node->js.jointype == JOIN_SEMI_INNER ||
        node->js.jointype == JOIN_SEMI_LEFT)
        node->nl_NeedNewOuter = true;

which is kinda horrid and adds a few cycles and code without a real need
for it. We could keep match_first_tuple_only and set it in the node
startup similar to how it is now, or change JoinType to be some bitmask
that has a bit for each piece of a Venn diagram, plus an extra for the
unique inner... but I'm not so sure I like that idea... To me it seems
neater just to keep match_first_tuple_only and just update the planner
stuff to use the new jointypes.

Thoughts?

> It seems like the major intellectual complexity here is to figure out
> how to detect inner-side-unique at reasonable cost. I see that for
> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> fine. But for INNER joins it looks like you're just doing it over again
> for every candidate join, and that seems mighty expensive.

Hmm yeah, perhaps something can be done to cache that. I'm just not all
that sure how much the cache will get used, since we generate the Paths
for the nestloop/merge/hash join methods all at once, once the
unique_inner has been determined for that inner rel and the restrict
info for whatever the outer rel(s) are. I didn't expect that to happen
twice, so I'm not all that sure if a cache will get reused...

... thinks more ...

Perhaps if rel x was found to be unique with the join conditions of rel
y, then we can be sure that x is unique against y, z, as that could only
possibly add more to the join condition, never less. Is this what you
meant? ... I'll look into that.
The other thing I thought of was to add a dedicated list for unique
indexes in RelOptInfo; this would also allow rel_supports_distinctness()
to do something a bit smarter than just return false if there are no
indexes. That might not buy us much though, but at least relations tend
to have very few unique indexes, even when they have lots of indexes.

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Saturday, March 12, 2016, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do. That would appear
> to add join order constraints, which we don't want. But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.
Thanks for looking at this.
Yes that might work, since we'd just be changing the jointype in the
JoinPath, if that path was discarded if favour of, say the commutative
variant, which was not "unique inner", then it shouldn't matter, as
the join type for that path would be the original one.
The thing that might matter is that, this;
explain (costs off) select * from t1 inner join t2 on t1.id=t2.id
QUERY PLAN
------------------------------
Hash Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
could become;
QUERY PLAN
------------------------------
Hash Semi Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN
output might be a bit confused at the lack of EXISTS/IN clause in a
query which is showing a Semi Join. Now, we could get around that by
adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
inner join, yet it'll behave exactly like JOIN_SEMI!
Don't the semantics of a SEMI JOIN also state that the output columns only come from the outer relation? i.e., the inner relation doesn't contribute either rows or columns to the final result? Or is that simply an implementation artifact of the fact that the only current way to perform a semi-join explicitly is via exists/in?
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Don't the semantics of a SEMI JOIN also state that the output columns only
> come from the outer relation? i.e., the inner relation doesn't contribute
> either rows or columns to the final result? Or is that simply
> an implementation artifact of the fact that the only current way to perform
> a semi-join explicitly is via exists/in?

I think it's an artifact. What nodes.h actually says about it is you get
the values of one randomly-selected matching inner row, which seems like
a fine definition for the purposes we plan to put it to.

regards, tom lane
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wondered why, instead of inventing an extra semantics-modifying flag,
>>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>>> discovered that the inner side is unique.

> The thing that might matter is that, this;
> explain (costs off) select * from t1 inner join t2 on t1.id=t2.id
>           QUERY PLAN
> ------------------------------
>  Hash Join
>    Hash Cond: (t1.id = t2.id)
>    ->  Seq Scan on t1
>    ->  Hash
>          ->  Seq Scan on t2
> could become;
>           QUERY PLAN
> ------------------------------
>  Hash Semi Join
>    Hash Cond: (t1.id = t2.id)
>    ->  Seq Scan on t1
>    ->  Hash
>          ->  Seq Scan on t2
> Wouldn't that cause quite a bit of confusion?

Well, no more than was introduced when we invented semi joins at all.

> Now, we could get around that by
> adding JOIN_SEMI_INNER I guess, and just displaying that as a normal
> inner join, yet it'll behave exactly like JOIN_SEMI!

I'm not that thrilled with having EXPLAIN hide real differences in the
plan from you; if I was, I'd have just lobbied to drop the "unique
inner" annotation from EXPLAIN output altogether.

(I think at one point we'd discussed displaying this in EXPLAIN output
as a different join type, and I'd been against it at the time. What
changed my thinking was realizing that it could be mapped on to the
existing jointype "semi join". We still need one new concept, "outer
semi join" or whatever we decide to call it, but it's less of a stretch
than I'd supposed originally.)

regards, tom lane
On Fri, Mar 11, 2016 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I started re-reading this thread in preparation for looking at the
> patch, and this bit in your initial message jumped out at me:
>
>> In all of our join algorithms in the executor, if the join type is SEMI,
>> we skip to the next outer row once we find a matching inner row. This is
>> because we don't want to allow duplicate rows in the inner side to
>> duplicate outer rows in the result set. Obviously this is required per SQL
>> spec. I believe we can also skip to the next outer row in this case when
>> we've managed to prove that no other row can possibly exist that matches
>> the current outer row, due to a unique index or group by/distinct clause
>> (for subqueries).
>
> I wondered why, instead of inventing an extra semantics-modifying flag,
> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> discovered that the inner side is unique.
>
> Now of course this only works if the join type was INNER to start with.
> If it was a LEFT join, you'd need an "outer semi join" jointype which
> we haven't got at the moment. But I wonder whether inventing that
> jointype wouldn't let us arrive at a less messy handling of things in
> the executor and EXPLAIN. I'm not very enamored of plastering this
> "match_first_tuple_only" flag on every join, in part because it doesn't
> appear to have sensible semantics for other jointypes such as JOIN_RIGHT.
> And I'd really be happier to see the information reflected by join type
> than a new line in EXPLAIN, also.

The new join pushdown code in postgres_fdw does not grok SEMI and ANTI
joins because there is no straightforward way of reducing those back to
SQL. They can originate in multiple ways, and not all of those can be
represented easily. I think it would be nice to do something to fix
this. For example, if a LEFT join with WHERE outer_column IS NULL turns
into an ANTI join, it would be nice if that were marked in some way so
that postgres_fdw could conveniently emit it in the original form.

Maybe the people who have been working on that patch just haven't been
creative enough in thinking about how to solve this problem, but it
makes me greet the idea of more join types that don't map directly back
to SQL with somewhat mixed feelings.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> The new join pushdown code in postgres_fdw does not grok SEMI and ANTI
> joins because there is no straightforward way of reducing those back
> to SQL. They can originate in multiple ways and not all of those can
> be represented easily. I think it would be nice to do something to
> fix this. For example, if a LEFT join WHERE outer_column IS NULL
> turns into an ANTI join, it would be nice if that were marked in some
> way so that postgres_fdw could conveniently emit it in the original
> form.

> Maybe the people who have been working on that patch just haven't been
> creative enough in thinking about how to solve this problem, but it
> makes me greet the idea of more join types that don't map directly
> back to SQL with somewhat mixed feelings.

I can't summon a whole lot of sympathy for that objection. These cases
won't arise with postgres_fdw as it stands, because we'd never be able
to prove uniqueness on a foreign table. When and if someone tries to
improve that, we can think about how the whole thing ought to map to
FDWs.

Having said that, your point does add a bit of weight to David's
suggestion of inventing two new join-type codes rather than overloading
JOIN_SEMI. I'd still be a bit inclined to display JOIN_INNER_UNIQUE or
whatever we call it as a "Semi" join in EXPLAIN, though, just to
minimize the amount of newness there.

regards, tom lane
On Saturday, March 12, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Given that this is largely internals (aside from the plan explanations
themselves) I guess we can punt for now, but calling an inner or outer
join a semijoin in this case relies on a non-standard definition of
semijoin - namely that it is an optimized variation of the other joins
instead of a join type in its own right. This is complicated further in
that we do implement a true semijoin (using exists), while we also allow
an anti join to be expressed non-standardly using "left join ... is
null" instead of via "not exists".
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Don't the semantics of a SEMI JOIN also state that the output columns only
> come from the outer relation? i.e., the inner relation doesn't contribute
> either rows or columns to the final result? Or is that simply
> an implementation artifact of the fact that the only current way to perform
> a semi-join explicitly is via exists/in?
I think it's an artifact. What nodes.h actually says about it is you get
the values of one randomly-selected matching inner row, which seems like
a fine definition for the purposes we plan to put it to.
But is it a definition that actually materializes anywhere presently?
I'm not sure what we consider an authoritative source, but relational algebra does define the results of semi and anti joins as only containing rows from the main relation.
Calling these optimizations outer/inner+semi/anti preserves the ability to distinguish these versions from the standard definitions. I do like the idea of it being exposed and encapsulated as a distinct join type instead of being an attribute.
David J.
On Saturday, March 12, 2016, David G. Johnston <david.g.johnston@gmail.com> wrote:
In short, "semi" communicates a semantic meaning as to the intended output of the query irrespective of the data upon which it is executed. We are now hijacking the term and calling something "semi" if by some chance the data the query is operating against happens to be accommodating to some particular optimization.
On Saturday, March 12, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Don't the semantics of a SEMI JOIN also state that the output columns only
> come from the outer relation? i.e., the inner relation doesn't contribute
> either rows or columns to the final result? Or is that simply
> an implementation artifact of the fact that the only current way to perform
> a semi-join explicitly is via exists/in?
I think it's an artifact. What nodes.h actually says about it is you get
the values of one randomly-selected matching inner row, which seems like
a fine definition for the purposes we plan to put it to.

But is it a definition that actually materializes anywhere presently?

I'm not sure what we consider an authoritative source, but relational
algebra does define the results of semi and anti joins as only
containing rows from the main relation.
Pondering it more, calling these optimizations "semi" joins further distances us from the meaning of "semi" as used in relational algebra. The half that "semi" refers to IS that only one half of the tables is returned. That you only get a single row of output regardless of multiple potential matches is simply a consequence of this and general set theory.
This seems wrong on definitional and cleanliness grounds.
So while I'm still liking the idea of introducing specializations of outer and inner joins I think calling them "semi" joins adds a definitional inconsistency we are better off avoiding.
This came about because calling something "outer semi join" struck me as odd.
Something like "outer only join" and "inner only join" comes to mind. Consider the parallel between this and "index only scan". Learning that "only" means "join the outer row to the (at most, for outer) one and only row in the inner relation" doesn't seem too much of a challenge.
David J.
On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do. That would appear
> to add join order constraints, which we don't want. But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

I've attached a patch which implements this, although I did call the new
join type JOIN_LEFT_UNIQUE rather than JOIN_SEMI_OUTER.

I'm not all that confident that I've not added handling for
JOIN_LEFT_UNIQUE in places where it's not possible to get that join
type, though I did leave out a good number of places where I know it's
not possible. I'm also not sure with too much certainty that I've got
all cases correct, but the regression tests pass. The patch is more
intended for assisting discussion than as a ready-to-commit patch.

> It seems like the major intellectual complexity here is to figure out
> how to detect inner-side-unique at reasonable cost. I see that for
> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> fine. But for INNER joins it looks like you're just doing it over again
> for every candidate join, and that seems mighty expensive.

I have a rough idea for this, but I need to think about it a bit more to
make sure it's bulletproof.

I also just noticed (since it's been a while since I wrote the patch)
that in add_paths_to_joinrel() the innerrel can naturally be a joinrel
too, and we can fail to find uniqueness in that joinrel.
I think it should be possible to analyse the join rel too and search for a base rel which supports the distinctness, and then ensure none of the other rels which make up the join rel can cause tuple duplication of that rel. But this just causes missed optimisation opportunities. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
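For illustration, the executor-side effect being discussed here, treating a proven-unique inner side like a SEMI join and skipping to the next outer row after the first match, can be sketched with a standalone toy nested loop. This is a simplified model, not the actual nodeNestloop.c code, and `nestloop_probes` is an invented name:

```c
#include <stddef.h>

/* Toy model of the executor behaviour under discussion: count how many
 * inner-side probes a nested loop performs joining outer[] to inner[].
 * When inner_unique is true we break to the next outer row as soon as
 * a match is found, exactly as the JOIN_SEMI path already does. */
static int
nestloop_probes(const int *outer, size_t nouter,
                const int *inner, size_t ninner,
                int inner_unique)
{
    int probes = 0;

    for (size_t o = 0; o < nouter; o++)
    {
        for (size_t i = 0; i < ninner; i++)
        {
            probes++;
            if (outer[o] == inner[i] && inner_unique)
                break;      /* no second match can exist; next outer row */
        }
    }
    return probes;
}
```

With three outer rows each matching a distinct inner row of three, the naive loop does 9 probes while the early-out does 6; the saving grows with the position of each match, which is where the potential ~50% nested loop improvement mentioned upthread comes from.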
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It seems like the major intellectual complexity here is to figure out
>> how to detect inner-side-unique at reasonable cost. I see that for
>> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
>> fine. But for INNER joins it looks like you're just doing it over again
>> for every candidate join, and that seems mighty expensive.

> I have a rough idea for this, but I need to think of it a bit more to
> make sure it's bulletproof.

Where are we on this? I had left it alone for a while because I supposed that you were going to post a new patch, but it's been a couple of weeks...

regards, tom lane
On 2 April 2016 at 05:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It seems like the major intellectual complexity here is to figure out
>>> how to detect inner-side-unique at reasonable cost. I see that for
>>> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
>>> fine. But for INNER joins it looks like you're just doing it over again
>>> for every candidate join, and that seems mighty expensive.
>
>> I have a rough idea for this, but I need to think of it a bit more to
>> make sure it's bulletproof.
>
> Where are we on this? I had left it alone for awhile because I supposed
> that you were going to post a new patch, but it's been a couple weeks...

Apologies for the delay. I was fully booked. I worked on this today to try and get it into shape.

Changes:

* Added unique_rels and non_unique_rels caches to PlannerInfo. These are both arrays of Lists, indexed by the relid of the relation which is either proved to be unique, or proved not to be unique, by each listed set of relations. Each List item is a Bitmapset containing the relids of the relations which proved the uniqueness, or proved no possibility of uniqueness for the non_unique_rels case.

The non_unique_rels cache seems far less useful than the unique one, as with the standard join search we start at level one, comparing singleton relations, and work our way up, so it's only towards the end of the search that we'll have more rels on each side of the join search. Many of the unique cases are found early on in the search, but the non-unique cases must be rechecked as more relations are added to the search, as that introduces a new possibility of uniqueness. In fact, the only cache hit of the non-unique case in the regression tests is in a test that's using the GEQO, so I'm not quite sure if the standard join search will ever get a cache hit here.
Nevertheless, I kept the cache, as it's likely to be most useful when the GEQO is running the show, since that's when we're going to see the largest number of relations to join.

There's also a small quirk which could lead to false negatives for uniqueness and might still need ironing out: I don't yet attempt to get the minimum set of relations which proved the uniqueness. This perhaps does not matter much for the standard join search, but likely matters more for GEQO cases. Fixing it will require changes to relation_has_unique_index_for() to get it to record and return the relids of the clauses which matched the index.

* Renamed JOIN_LEFT_UNIQUE to JOIN_SEMI_LEFT. It seems better to maintain the SEMI part. I thought about renaming JOIN_SEMI to JOIN_SEMI_INNER, but thought I'd better not touch that here.

* Made a pass to update comments in the areas where I've added handling for JOIN_SEMI_LEFT.

* I also updated comments in the regression tests to remove references to unique joins. "Join conversion" seems to be a better term now.

There were also a couple of places where I wasn't quite sure how to handle JOIN_SEMI_LEFT. I've marked these with an XXX comment. Perhaps how to handle them is more clear to you.

I was also not quite sure of the best place to allocate memory for these caches. I've ended up doing that in setup_simple_rel_arrays(), which is likely the wrong spot. I originally did this in standard_join_search(), after join_rel_level is allocated memory, but I soon realised that it can't happen there, as the GEQO requires the caches too, and naturally it does not come through standard_join_search().

I was not quite sure if I should update any docs to mention that Inner Joins can become Semi Joins in some cases. I was also a bit unsure if I should move the two new functions I added to joinpath.c into analyzejoins.c.

Thanks for taking an interest in this.
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
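The cache structure described above can be sketched as a standalone toy model. Bitmapsets are modelled here as uint64_t bitmasks, and `MAX_RELS`, `cache_unique()` and `cached_unique()` are invented names for this sketch, not anything from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model of the proposed unique_rels cache: for each inner relid we
 * keep a list of relid sets that have been proven to make that relation
 * unique. */
#define MAX_RELS 64

typedef struct RelidSetNode
{
    uint64_t relids;                /* set of relids proving uniqueness */
    struct RelidSetNode *next;
} RelidSetNode;

static RelidSetNode *unique_rels[MAX_RELS];

/* Remember that joining to 'outer_relids' proved 'innerrelid' unique. */
static void
cache_unique(int innerrelid, uint64_t outer_relids)
{
    RelidSetNode *n = malloc(sizeof(RelidSetNode));

    n->relids = outer_relids;
    n->next = unique_rels[innerrelid];
    unique_rels[innerrelid] = n;
}

/* A proof by some set of rels still holds for any superset of that set,
 * so a cached entry hits whenever it is a subset of 'outer_relids'. */
static int
cached_unique(int innerrelid, uint64_t outer_relids)
{
    for (RelidSetNode *n = unique_rels[innerrelid]; n != NULL; n = n->next)
    {
        if ((n->relids & ~outer_relids) == 0)
            return 1;
    }
    return 0;
}
```

The subset test is why positive entries stay valid as the join search grows, while (as noted above) negative results must be rechecked whenever new relations enter the mix.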
On 2 April 2016 at 23:26, David Rowley <david.rowley@2ndquadrant.com> wrote: > I worked on this today to try and get it into shape. In the last patch I failed to notice that there's an alternative expected results file for one of the regression tests. The attached patch includes the fix to update that file to match the new expected EXPLAIN output. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> In the last patch I failed to notice that there's an alternative
> expected results file for one of the regression tests.
> The attached patch includes the fix to update that file to match the
> new expected EXPLAIN output.

Starting to look at this again. I wonder, now that you have the generic caching mechanism for remembering whether join inner sides have been proven unique, is it still worth having the is_unique_join field in SpecialJoinInfo? It seems like that's creating a separate code path for special joins vs. inner joins that may not be buying us much. It does potentially save lookups in the unique_rels cache, if you already have the SpecialJoinInfo at hand, but I'm not sure what that's worth.

Also, as I'm looking around at the planner some more, I'm beginning to get uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far as the executor is concerned, no doubt, but there's a lot of planner expectations about the use of JOIN_SEMI that we'd be violating. One is that there should be a SpecialJoinInfo for any SEMI join. Another is that JOIN_SEMI can be implemented by unique-ifying the inner side and then doing a regular inner join; that's not a code path we wish to trigger in these situations. The patch might avoid tripping over these hazards as it stands, but it seems fragile, and third-party FDWs could easily contain code that'll be broken. So I'm starting to feel that we'd better invent two new JoinTypes after all, to make sure we can distinguish plain-join-with-inner-side-known-unique from a real SEMI join when we need to.

What's your thoughts on these matters?

regards, tom lane
On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> In the last patch I failed to notice that there's an alternative
>> expected results file for one of the regression tests.
>> The attached patch includes the fix to update that file to match the
>> new expected EXPLAIN output.
>
> Starting to look at this again. I wonder, now that you have the generic
> caching mechanism for remembering whether join inner sides have been
> proven unique, is it still worth having the is_unique_join field in
> SpecialJoinInfo? It seems like that's creating a separate code path
> for special joins vs. inner joins that may not be buying us much.
> It does potentially save lookups in the unique_rels cache, if you already
> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.

I quite like that field where it is, as it should make remove_useless_joins() a bit more efficient: after a LEFT JOIN is removed, the previous code would go off and try to make sure all the joins are unique again, but now we cache that, and save it from having to bother doing that again on joins already marked as unique.

Certainly changing that would mean one less special case in joinpath.c, as the JOIN_LEFT case can be handled the same as the other cases. Although it looks like, if I do change that, then I'd probably move is_innerrel_unique_for() into analyzejoins.c, and put the special case for JOIN_LEFT in that function, so that it calls specialjoin_is_unique_join(), then caches the sjinfo->min_righthand in the unique_rels cache if the result comes back positive, and in the non_unique_rels cache if negative... But it seems a bit crazy to go to the trouble of all that caching when we can just throw the result in a struct field in the case of Special Joins. Maybe we could just hide both the new joinpath.c functions in analyzejoins.c and call it quits. It's not as if there are no special cases for JOIN_LEFT in that file.
> Also, as I'm looking around at the planner some more, I'm beginning to get
> uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far
> as the executor is concerned, no doubt, but there's a lot of planner
> expectations about the use of JOIN_SEMI that we'd be violating. One is
> that there should be a SpecialJoinInfo for any SEMI join. Another is that
> JOIN_SEMI can be implemented by unique-ifying the inner side and then
> doing a regular inner join; that's not a code path we wish to trigger in
> these situations. The patch might avoid tripping over these hazards as it
> stands, but it seems fragile, and third-party FDWs could easily contain
> code that'll be broken. So I'm starting to feel that we'd better invent
> two new JoinTypes after all, to make sure we can distinguish plain-join-
> with-inner-side-known-unique from a real SEMI join when we need to.
>
> What's your thoughts on these matters?

I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before, but that was for EXPLAIN reasons. So I do think it's better to have JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that change, but I'm unsure how EXPLAIN will display these now. I'd need to pull out my tests if we don't have anything to show in EXPLAIN, and I'd really rather have tests for this.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 7 April 2016 at 08:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Starting to look at this again. I wonder, now that you have the generic
>> caching mechanism for remembering whether join inner sides have been
>> proven unique, is it still worth having the is_unique_join field in
>> SpecialJoinInfo? It seems like that's creating a separate code path
>> for special joins vs. inner joins that may not be buying us much.
>> It does potentially save lookups in the unique_rels cache, if you already
>> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.
>
> I quite like that field where it is, as it should make
> remove_useless_joins() a bit more efficient, as after a LEFT JOIN is
> removed, the previous code would go off and try to make sure all the
> joins are unique again, but now we cache that, and save it from having
> to bother doing that again, on joins already marked as unique.
>
> Certainly changing that would mean one less special case in
> joinpath.c, as the JOIN_LEFT case can be handled the same as the other
> cases, although it looks like probably, if I do change that, then I'd
> probably move is_innerrel_unique_for() into analyzejoins.c, and put
> the special case for JOIN_LEFT in that function, so that it calls
> specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in
> the unique_rels cache if the result comes back positive, and in the
> non_unique_rels cache if negative... But it seems a bit crazy to go to
> the trouble of all that caching, when we can just throw the result in
> a struct field in the case of Special Joins. Maybe we could just hide
> both the new joinpath.c functions in analyzejoins.c and call it quits.
> It's not as if there's no special cases for JOIN_LEFT in that file.
We could also get rid of the SpecialJoinInfo.is_unique_join and just store this as optimal_jointype, where it would be initialised to jointype in make_outerjoininfo() and then set in mark_unique_joins(). This would simplify the test in get_optimal_jointype(), to perhaps:

    if (IS_OUTER_JOIN(jointype))
        return sjinfo->optimal_jointype;

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
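A rough standalone sketch of the shape this proposal gives get_optimal_jointype(). All type and field definitions below are stand-ins for this sketch, not the actual PostgreSQL ones, and the per-pairing uniqueness proof is reduced to a boolean argument:

```c
/* Stand-in definitions to make the sketch self-contained. */
typedef enum JoinType
{
    JOIN_INNER,
    JOIN_LEFT,
    JOIN_FULL,
    JOIN_INNER_UNIQUE,
    JOIN_LEFT_UNIQUE
} JoinType;

typedef struct SpecialJoinInfo
{
    JoinType jointype;          /* as written in the query */
    JoinType optimal_jointype;  /* initialised to jointype, possibly
                                 * upgraded by mark_unique_joins() */
} SpecialJoinInfo;

#define IS_OUTER_JOIN(jt) ((jt) == JOIN_LEFT || (jt) == JOIN_FULL)

/* Outer joins return the answer already cached in the SpecialJoinInfo;
 * inner joins fall back to the per-pairing uniqueness proof. */
static JoinType
get_optimal_jointype(JoinType jointype, SpecialJoinInfo *sjinfo,
                     int inner_proven_unique)
{
    if (IS_OUTER_JOIN(jointype))
        return sjinfo->optimal_jointype;
    if (jointype == JOIN_INNER && inner_proven_unique)
        return JOIN_INNER_UNIQUE;
    return jointype;
}
```

This keeps the special-join answer in a struct field, as discussed above, while inner joins still consult whatever uniqueness proof (or cache) is available for the particular pairing.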
On 7 April 2016 at 08:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Starting to look at this again. I wonder, now that you have the generic
>> caching mechanism for remembering whether join inner sides have been
>> proven unique, is it still worth having the is_unique_join field in
>> SpecialJoinInfo? It seems like that's creating a separate code path
>> for special joins vs. inner joins that may not be buying us much.
>> It does potentially save lookups in the unique_rels cache, if you already
>> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.
>
> I quite like that field where it is, as it should make
> remove_useless_joins() a bit more efficient, as after a LEFT JOIN is
> removed, the previous code would go off and try to make sure all the
> joins are unique again, but now we cache that, and save it from having
> to bother doing that again, on joins already marked as unique.
>
> Certainly changing that would mean one less special case in
> joinpath.c, as the JOIN_LEFT case can be handled the same as the other
> cases, although it looks like probably, if I do change that, then I'd
> probably move is_innerrel_unique_for() into analyzejoins.c, and put
> the special case for JOIN_LEFT in that function, so that it calls
> specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in
> the unique_rels cache if the result comes back positive, and in the
> non_unique_rels cache if negative... But it seems a bit crazy to go to
> the trouble of all that caching, when we can just throw the result in
> a struct field in the case of Special Joins. Maybe we could just hide
> both the new joinpath.c functions in analyzejoins.c and call it quits.
> It's not as if there's no special cases for JOIN_LEFT in that file.
>
>> Also, as I'm looking around at the planner some more, I'm beginning to get
>> uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far
>> as the executor is concerned, no doubt, but there's a lot of planner
>> expectations about the use of JOIN_SEMI that we'd be violating. One is
>> that there should be a SpecialJoinInfo for any SEMI join. Another is that
>> JOIN_SEMI can be implemented by unique-ifying the inner side and then
>> doing a regular inner join; that's not a code path we wish to trigger in
>> these situations. The patch might avoid tripping over these hazards as it
>> stands, but it seems fragile, and third-party FDWs could easily contain
>> code that'll be broken. So I'm starting to feel that we'd better invent
>> two new JoinTypes after all, to make sure we can distinguish plain-join-
>> with-inner-side-known-unique from a real SEMI join when we need to.
>>
>> What's your thoughts on these matters?
>
> I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before,
> but that was for EXPLAIN reasons. So I do think it's better to have
> JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that
> change, but unsure on how EXPLAIN will display these now. I'd need to
> pull out my tests if we don't have anything to show in EXPLAIN, and
> I'd really rather have tests for this.

I've attached an updated patch which introduces JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE, so unique inner joins no longer borrow JOIN_SEMI.

I also made some changes around the is_unique_join flag in SpecialJoinInfo. I've changed this to become optimal_jointype, which is initially set to jointype and updated by what used to be called mark_unique_joins(), now called optimize_outerjoin_types(). The LEFT JOIN removal code now only bothers with special joins whose optimal_jointype is JOIN_LEFT_UNIQUE. This still saves any double work analyzing the unique properties of LEFT JOINs.

I've moved the two new functions I put in joinpath.c into analyzejoins.c.

In EXPLAIN, I named these new join types "Unique Inner" and "Unique Left".
Going by a comment in explain.c, there's some historic reason that we display "Hash Join" rather than "Hash Inner Join", and "Nested Loop" rather than "Nested Loop Join" or "Nested Loop Inner Join". I wasn't quite sure if Unique Inner Joins should use a similar shorthand form, so I didn't touch that code. We'll get "Nested Loop Unique Inner Join" now instead of "Nested Loop". I wasn't able to think of an equivalent shorthand form that made sense.

I know we're close to the feature freeze, so I just want to say that I'll be AFK starting 5:30am Friday New Zealand time (13:30 on the 8th New York time), until Sunday ~4pm. I really hope that if this needs any tweaking it will be minor. I'll check my email before I leave on Friday in the hope that if there's anything, then I can quickly fix it up before I go, but time will be short.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > I've attached an updated patch which introduces JOIN_INNER_UNIQUE and > JOIN_LEFT_UNIQUE. So unique inner joins no longer borrow JOIN_SEMI. OK. > In EXPLAIN, I named these new join types "Unique Inner" and "Unique > Left". Hm. I'm back to being unhappy about the amount of churn introduced into the regression test outputs by this patch. I wonder whether we could get away with only mentioning the "unique" aspect in VERBOSE mode. I'm also a bit suspicious of the fact that some of the plans in aggregates.out changed from merge to hash joins; with basically no stats at hand in those tests, that seems dubious. A quick look at what the patch touched in costsize.c suggests that this might be because you've effectively allowed cost_hashjoin to give a cost discount for inner unique, but provided no similar intelligence in cost_mergejoin. > I know we're close to the feature freeze, so I just want to say that > I'll be AFK starting 5:30am Friday New Zealand time (13:30 on the 8th > New York time), until Sunday ~4pm. Understood. I don't know if we'll get this in or not, but I'll work on it. regards, tom lane
David Rowley <david.rowley@2ndquadrant.com> writes: > [ unique_joins_2016-04-07.patch ] Just had a thought about this, which should have crystallized a long time ago perhaps. Where I'd originally imagined you were going with this idea is to do what the thread title actually says, and check for joins in which the *outer* side is unique. I can't see that that's of any value for nestloop or hash joins, but for merge joins, knowing that the outer side is unique could be extremely valuable because we could skip doing mark/restore backups on the inner side, hugely reducing the cost when the inner side has many duplicates. Now I'm not asking for the initially-committed version of the patch to do that, but I think we need to design it to be readily extensible to do so. The problem with this is that it blows the current factorization around add_paths_to_joinrel out of the water. What we'd want is for the caller (make_join_rel) to determine uniqueness on both sides, and pass that info down to each of its two calls of add_paths_to_joinrel; otherwise we have to do double the work because each run of add_paths_to_joinrel will have to make those same two determinations. This probably also means that encoding the uniqueness into JoinType is a lost cause. Expanding JOIN_INNER into four variants depending on whether either or both sides are known unique, and ditto for JOIN_LEFT, doesn't seem attractive at all. I suspect we want to go back to your original design with a separate bool flag (well, two bools now, but anyway separate from JoinType). Or maybe the variant JoinTypes still are better, since they'd fit into switch() tests more naturally, but it's a lot more debatable as to whether that notation is a win. I apologize for leading you down the wrong path on the notational aspect, but sometimes the right answer isn't clear till you've tried all the possibilities. 
Anyway, while refactoring the make_join_rel/add_paths_to_joinrel division of labor wouldn't be such a big deal in itself, I don't want to commit a change to JoinType only to undo it later; that would be too much churn. So I think we need to resolve this question before we can move forward. I don't know if you have time to look at this now --- my clock says it's already Friday morning in New Zealand. regards, tom lane
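The outer-side-unique payoff for merge joins described above can be shown with a standalone toy merge join that counts how often the inner side must be backed up to a marked position. `mergejoin_restores` is an invented name for this sketch, not executor code:

```c
#include <stddef.h>

/* Toy merge join over two sorted int arrays.  A "restore" (backing the
 * inner cursor up to the start of a matching group) is needed only when
 * consecutive outer tuples share a key; if the outer side is known
 * unique that can never happen, so the mark/restore machinery and its
 * backup copying could be skipped entirely. */
static int
mergejoin_restores(const int *outer, size_t nouter,
                   const int *inner, size_t ninner)
{
    int restores = 0;
    size_t i = 0;

    for (size_t o = 0; o < nouter; o++)
    {
        if (o > 0 && outer[o] == outer[o - 1])
        {
            /* back up to the marked start of the matching inner group */
            while (i > 0 && inner[i - 1] == outer[o])
                i--;
            restores++;
        }
        while (i < ninner && inner[i] < outer[o])
            i++;                /* advance past smaller inner keys */
        while (i < ninner && inner[i] == outer[o])
            i++;                /* join row(s) would be emitted here */
    }
    return restores;
}
```

With duplicate outer keys (e.g. outer {1, 1, 2}) at least one restore is required, while any unique outer side yields zero restores regardless of inner duplicates, which is the saving Tom describes.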
Tom Lane wrote: > Anyway, while refactoring the make_join_rel/add_paths_to_joinrel division > of labor wouldn't be such a big deal in itself, I don't want to commit a > change to JoinType only to undo it later; that would be too much churn. > So I think we need to resolve this question before we can move forward. > I don't know if you have time to look at this now --- my clock says it's > already Friday morning in New Zealand. FWIW the feature freeze rules state that it is allowed for a committer to request an extension to the feature freeze date for individual patches: https://www.postgresql.org/message-id/CA%2BTgmoY56w5FOzeEo%2Bi48qehL%2BBsVTwy-Q1M0xjUhUCwgGW7-Q%40mail.gmail.com It seems to me that the restrictions laid out there are well met for this patch, if you only need a couple of additional days for this patch to get in. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> I don't know if you have time to look at this now --- my clock says it's >> already Friday morning in New Zealand. > FWIW the feature freeze rules state that it is allowed for a committer > to request an extension to the feature freeze date for individual > patches: > https://www.postgresql.org/message-id/CA%2BTgmoY56w5FOzeEo%2Bi48qehL%2BBsVTwy-Q1M0xjUhUCwgGW7-Q%40mail.gmail.com > It seems to me that the restrictions laid out there are well met for > this patch, if you only need a couple of additional days for this patch > to get in. Hmm ... the changes I'm thinking about here are certainly pretty mechanical, if tedious. The main question mark I have hanging over this patch is whether the planning-time penalty is too large --- but that's something that can be tested with the patch as it stands. Let me go investigate that a bit before requesting an extension. regards, tom lane
I wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> FWIW the feature freeze rules state that it is allowed for a committer >> to request an extension to the feature freeze date for individual >> patches: >> https://www.postgresql.org/message-id/CA%2BTgmoY56w5FOzeEo%2Bi48qehL%2BBsVTwy-Q1M0xjUhUCwgGW7-Q%40mail.gmail.com >> It seems to me that the restrictions laid out there are well met for >> this patch, if you only need a couple of additional days for this patch >> to get in. > Hmm ... the changes I'm thinking about here are certainly pretty > mechanical, if tedious. The main question mark I have hanging over > this patch is whether the planning-time penalty is too large --- but > that's something that can be tested with the patch as it stands. > Let me go investigate that a bit before requesting an extension. I did some performance testing on the attached somewhat-cleaned-up patch, and convinced myself that the planning time penalty is fairly minimal: on the order of a couple percent in simple one-join queries, and less than that in very large queries. Oddly, it seems that the result cacheing done in get_optimal_jointype() is only barely worth the trouble in typical cases; though if you get a query large enough to require GEQO, it's a win because successive GEQO attempts can re-use cache entries made by earlier attempts. (This may indicate something wrong with my testing procedure? Seems like diking out the cache should have made more difference.) I did find by measurement that the negative-cache-entry code produces exactly zero hits unless you're in GEQO mode, which is not really surprising given the order in which the join search occurs. So in the attached patch I made the code not bother with making negative cache entries unless using GEQO, to hopefully save a few nanoseconds. 
I rebased over f338dd758, did a little bit of code cleanup and fixed some bugs in the uniqueness detection logic, but have not reviewed the rest of the patch since it's likely all gonna change if we reconsider the JoinType representation. Anyway, I think it would be reasonable to give this patch a few more days in view of David's being away through the weekend. But the RMT has final say on that. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 50f1261..f9df294 100644 *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *************** EXPLAIN (VERBOSE, COSTS false) *** 386,392 **** --------------------------------------------------------------------------------------- Limit Output: t1.c1, t2."C 1" ! -> Merge Join Output: t1.c1, t2."C 1" Merge Cond: (t1.c1 = t2."C 1") -> Foreign Scan on public.ft2 t1 --- 386,392 ---- --------------------------------------------------------------------------------------- Limit Output: t1.c1, t2."C 1" ! -> Merge Unique Inner Join Output: t1.c1, t2."C 1" Merge Cond: (t1.c1 = t2."C 1") -> Foreign Scan on public.ft2 t1 *************** EXPLAIN (VERBOSE, COSTS false) *** 419,425 **** --------------------------------------------------------------------------------------- Limit Output: t1.c1, t2."C 1" ! -> Merge Left Join Output: t1.c1, t2."C 1" Merge Cond: (t1.c1 = t2."C 1") -> Foreign Scan on public.ft2 t1 --- 419,425 ---- --------------------------------------------------------------------------------------- Limit Output: t1.c1, t2."C 1" ! -> Merge Unique Left Join Output: t1.c1, t2."C 1" Merge Cond: (t1.c1 = t2."C 1") -> Foreign Scan on public.ft2 t1 *************** explain (verbose, costs off) select * fr *** 2520,2526 **** where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo'; QUERY PLAN ------------------------------------------------------------- ! 
Hash Join Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3 Hash Cond: ((f.f3)::text = (l.f3)::text) -> Foreign Scan on public.ft3 f --- 2520,2526 ---- where f.f3 = l.f3 COLLATE "POSIX" and l.f1 = 'foo'; QUERY PLAN ------------------------------------------------------------- ! Hash Unique Inner Join Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3 Hash Cond: ((f.f3)::text = (l.f3)::text) -> Foreign Scan on public.ft3 f diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index ee0220a..59084c6 100644 *** a/contrib/postgres_fdw/postgres_fdw.c --- b/contrib/postgres_fdw/postgres_fdw.c *************** foreign_join_ok(PlannerInfo *root, RelOp *** 3923,3932 **** /* * We support pushing down INNER, LEFT, RIGHT and FULL OUTER joins. * Constructing queries representing SEMI and ANTI joins is hard, hence ! * not considered right now. */ if (jointype != JOIN_INNER && jointype != JOIN_LEFT && ! jointype != JOIN_RIGHT && jointype != JOIN_FULL) return false; /* --- 3923,3935 ---- /* * We support pushing down INNER, LEFT, RIGHT and FULL OUTER joins. * Constructing queries representing SEMI and ANTI joins is hard, hence ! * not considered right now. INNER_UNIQUE and LEFT_UNIQUE joins are ok ! * here, since they're merely an optimization their non-unique ! * counterparts. */ if (jointype != JOIN_INNER && jointype != JOIN_LEFT && ! jointype != JOIN_RIGHT && jointype != JOIN_FULL && ! 
jointype != JOIN_INNER_UNIQUE && jointype != JOIN_LEFT_UNIQUE) return false; /* *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4052,4057 **** --- 4055,4061 ---- switch (jointype) { case JOIN_INNER: + case JOIN_INNER_UNIQUE: fpinfo->remote_conds = list_concat(fpinfo->remote_conds, list_copy(fpinfo_i->remote_conds)); fpinfo->remote_conds = list_concat(fpinfo->remote_conds, *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4059,4064 **** --- 4063,4069 ---- break; case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: fpinfo->joinclauses = list_concat(fpinfo->joinclauses, list_copy(fpinfo_i->remote_conds)); fpinfo->remote_conds = list_concat(fpinfo->remote_conds, diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 713cd0e..e05925b 100644 *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *************** ExplainNode(PlanState *planstate, List * *** 1118,1126 **** --- 1118,1132 ---- case JOIN_INNER: jointype = "Inner"; break; + case JOIN_INNER_UNIQUE: + jointype = "Unique Inner"; + break; case JOIN_LEFT: jointype = "Left"; break; + case JOIN_LEFT_UNIQUE: + jointype = "Unique Left"; + break; case JOIN_FULL: jointype = "Full"; break; diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 369e666..196a075 100644 *** a/src/backend/executor/nodeHashjoin.c --- b/src/backend/executor/nodeHashjoin.c *************** ExecHashJoin(HashJoinState *node) *** 306,315 **** } /* ! * In a semijoin, we'll consider returning the first ! * match, but after that we're done with this outer tuple. */ ! if (node->js.jointype == JOIN_SEMI) node->hj_JoinState = HJ_NEED_NEW_OUTER; if (otherqual == NIL || --- 306,318 ---- } /* ! * In an inner unique, left unique or semi join, we'll ! * consider returning the first match, but after that ! * we're done with this outer tuple. */ ! if (node->js.jointype == JOIN_INNER_UNIQUE || ! node->js.jointype == JOIN_LEFT_UNIQUE || ! 
node->js.jointype == JOIN_SEMI) node->hj_JoinState = HJ_NEED_NEW_OUTER; if (otherqual == NIL || *************** ExecInitHashJoin(HashJoin *node, EState *** 499,507 **** --- 502,512 ---- switch (node->join.jointype) { case JOIN_INNER: + case JOIN_INNER_UNIQUE: case JOIN_SEMI: break; case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_ANTI: hjstate->hj_NullInnerTupleSlot = ExecInitNullTupleSlot(estate, diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index 6db09b8..aa3e144 100644 *** a/src/backend/executor/nodeMergejoin.c --- b/src/backend/executor/nodeMergejoin.c *************** ExecMergeJoin(MergeJoinState *node) *** 840,849 **** } /* ! * In a semijoin, we'll consider returning the first ! * match, but after that we're done with this outer tuple. */ ! if (node->js.jointype == JOIN_SEMI) node->mj_JoinState = EXEC_MJ_NEXTOUTER; qualResult = (otherqual == NIL || --- 840,852 ---- } /* ! * In an inner unique, left unique or semi join, we'll ! * consider returning the first match, but after that ! * we're done with this outer tuple. */ ! if (node->js.jointype == JOIN_INNER_UNIQUE || ! node->js.jointype == JOIN_LEFT_UNIQUE || ! 
node->js.jointype == JOIN_SEMI) node->mj_JoinState = EXEC_MJ_NEXTOUTER; qualResult = (otherqual == NIL || *************** ExecInitMergeJoin(MergeJoin *node, EStat *** 1555,1564 **** --- 1558,1569 ---- { case JOIN_INNER: case JOIN_SEMI: + case JOIN_INNER_UNIQUE: mergestate->mj_FillOuter = false; mergestate->mj_FillInner = false; break; case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_ANTI: mergestate->mj_FillOuter = true; mergestate->mj_FillInner = false; diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c index 555fa09..0526170 100644 *** a/src/backend/executor/nodeNestloop.c --- b/src/backend/executor/nodeNestloop.c *************** ExecNestLoop(NestLoopState *node) *** 182,187 **** --- 182,188 ---- if (!node->nl_MatchedOuter && (node->js.jointype == JOIN_LEFT || + node->js.jointype == JOIN_LEFT_UNIQUE || node->js.jointype == JOIN_ANTI)) { /* *************** ExecNestLoop(NestLoopState *node) *** 247,256 **** } /* ! * In a semijoin, we'll consider returning the first match, but ! * after that we're done with this outer tuple. */ ! if (node->js.jointype == JOIN_SEMI) node->nl_NeedNewOuter = true; if (otherqual == NIL || ExecQual(otherqual, econtext, false)) --- 248,260 ---- } /* ! * In an inner unique, left unique or semi join, we'll consider ! * returning the first match, but after that we're done with this ! * outer tuple. */ ! if (node->js.jointype == JOIN_INNER_UNIQUE || ! node->js.jointype == JOIN_LEFT_UNIQUE || ! 
node->js.jointype == JOIN_SEMI) node->nl_NeedNewOuter = true; if (otherqual == NIL || ExecQual(otherqual, econtext, false)) *************** ExecInitNestLoop(NestLoop *node, EState *** 355,363 **** --- 359,369 ---- switch (node->join.jointype) { case JOIN_INNER: + case JOIN_INNER_UNIQUE: case JOIN_SEMI: break; case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_ANTI: nlstate->nl_NullInnerTupleSlot = ExecInitNullTupleSlot(estate, diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index a21928b..a2a91e5 100644 *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *************** _copySpecialJoinInfo(const SpecialJoinIn *** 2065,2070 **** --- 2065,2071 ---- COPY_BITMAPSET_FIELD(syn_lefthand); COPY_BITMAPSET_FIELD(syn_righthand); COPY_SCALAR_FIELD(jointype); + COPY_SCALAR_FIELD(optimal_jointype); COPY_SCALAR_FIELD(lhs_strict); COPY_SCALAR_FIELD(delay_upper_joins); COPY_SCALAR_FIELD(semi_can_btree); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 3c6c567..3302a69 100644 *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *************** _equalSpecialJoinInfo(const SpecialJoinI *** 837,842 **** --- 837,843 ---- COMPARE_BITMAPSET_FIELD(syn_lefthand); COMPARE_BITMAPSET_FIELD(syn_righthand); COMPARE_SCALAR_FIELD(jointype); + COMPARE_SCALAR_FIELD(optimal_jointype); COMPARE_SCALAR_FIELD(lhs_strict); COMPARE_SCALAR_FIELD(delay_upper_joins); COMPARE_SCALAR_FIELD(semi_can_btree); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index f783a49..06323d4 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *************** _outSpecialJoinInfo(StringInfo str, cons *** 2276,2281 **** --- 2276,2282 ---- WRITE_BITMAPSET_FIELD(syn_lefthand); WRITE_BITMAPSET_FIELD(syn_righthand); WRITE_ENUM_FIELD(jointype, JoinType); + WRITE_ENUM_FIELD(optimal_jointype, JoinType); WRITE_BOOL_FIELD(lhs_strict); WRITE_BOOL_FIELD(delay_upper_joins); 
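The executor changes above (nodeMergejoin.c, nodeNestloop.c, and the hash join equivalent) all implement the same first-match skip. A toy Python model of a nested-loop join makes the saving concrete; this is a sketch only, with join types reduced to plain strings, and is not the patch's C code:

```python
def nestloop(outer, inner, jointype, match):
    """Toy model of the nested-loop first-match skip (the real logic
    lives in ExecNestLoop).  Returns the joined rows plus the number of
    inner tuples scanned, to show the work saved when the inner side is
    known to be unique."""
    skip_after_match = jointype in ("INNER_UNIQUE", "LEFT_UNIQUE", "SEMI")
    result, scanned = [], 0
    for o in outer:
        matched = False
        for i in inner:
            scanned += 1
            if match(o, i):
                matched = True
                result.append((o,) if jointype == "SEMI" else (o, i))
                if skip_after_match:
                    break  # done with this outer tuple
        if not matched and jointype in ("LEFT", "LEFT_UNIQUE"):
            result.append((o, None))  # null-extend unmatched outer rows
    return result, scanned

rows, scanned = nestloop([1, 2], [1, 2, 3], "INNER_UNIQUE",
                         lambda a, b: a == b)
```

With a unique inner side, the INNER_UNIQUE case above scans 3 inner tuples where plain INNER would scan 6, which is where the "in theory it could be 50% for nested loop joins" estimate comes from.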
WRITE_BOOL_FIELD(semi_can_btree); diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 70a4c27..e4d5cc9 100644 *** a/src/backend/optimizer/path/costsize.c --- b/src/backend/optimizer/path/costsize.c *************** cost_group(Path *path, PlannerInfo *root *** 1893,1900 **** * estimate and getting a tight lower bound. We choose to not examine the * join quals here, since that's by far the most expensive part of the * calculations. The end result is that CPU-cost considerations must be ! * left for the second phase; and for SEMI/ANTI joins, we must also postpone ! * incorporation of the inner path's run cost. * * 'workspace' is to be filled with startup_cost, total_cost, and perhaps * other data to be used by final_cost_nestloop --- 1893,1901 ---- * estimate and getting a tight lower bound. We choose to not examine the * join quals here, since that's by far the most expensive part of the * calculations. The end result is that CPU-cost considerations must be ! * left for the second phase; and for INNER_UNIQUE, LEFT_UNIQUE, SEMI, and ! * ANTI joins, we must also postpone incorporation of the inner path's run ! * cost. * * 'workspace' is to be filled with startup_cost, total_cost, and perhaps * other data to be used by final_cost_nestloop *************** cost_group(Path *path, PlannerInfo *root *** 1902,1908 **** * 'outer_path' is the outer input to the join * 'inner_path' is the inner input to the join * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if jointype is SEMI or ANTI */ void initial_cost_nestloop(PlannerInfo *root, JoinCostWorkspace *workspace, --- 1903,1910 ---- * 'outer_path' is the outer input to the join * 'inner_path' is the inner input to the join * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if jointype is INNER_UNIQUE, LEFT_UNIQUE, ! 
* SEMI, or ANTI */ void initial_cost_nestloop(PlannerInfo *root, JoinCostWorkspace *workspace, *************** initial_cost_nestloop(PlannerInfo *root, *** 1940,1949 **** inner_run_cost = inner_path->total_cost - inner_path->startup_cost; inner_rescan_run_cost = inner_rescan_total_cost - inner_rescan_start_cost; ! if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) { /* ! * SEMI or ANTI join: executor will stop after first match. * * Getting decent estimates requires inspection of the join quals, * which we choose to postpone to final_cost_nestloop. --- 1942,1955 ---- inner_run_cost = inner_path->total_cost - inner_path->startup_cost; inner_rescan_run_cost = inner_rescan_total_cost - inner_rescan_start_cost; ! if (jointype == JOIN_INNER_UNIQUE || ! jointype == JOIN_LEFT_UNIQUE || ! jointype == JOIN_SEMI || ! jointype == JOIN_ANTI) { /* ! * INNER_UNIQUE, LEFT_UNIQUE, SEMI, or ANTI join: executor will stop ! * after first match. * * Getting decent estimates requires inspection of the join quals, * which we choose to postpone to final_cost_nestloop. *************** initial_cost_nestloop(PlannerInfo *root, *** 1977,1983 **** * 'path' is already filled in except for the rows and cost fields * 'workspace' is the result from initial_cost_nestloop * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if path->jointype is SEMI or ANTI */ void final_cost_nestloop(PlannerInfo *root, NestPath *path, --- 1983,1990 ---- * 'path' is already filled in except for the rows and cost fields * 'workspace' is the result from initial_cost_nestloop * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if jointype is INNER_UNIQUE, LEFT_UNIQUE, ! * SEMI, or ANTI */ void final_cost_nestloop(PlannerInfo *root, NestPath *path, *************** final_cost_nestloop(PlannerInfo *root, N *** 2017,2026 **** /* cost of inner-relation source data (we already dealt with outer rel) */ ! 
if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI) { /* ! * SEMI or ANTI join: executor will stop after first match. */ Cost inner_run_cost = workspace->inner_run_cost; Cost inner_rescan_run_cost = workspace->inner_rescan_run_cost; --- 2024,2037 ---- /* cost of inner-relation source data (we already dealt with outer rel) */ ! if (path->jointype == JOIN_INNER_UNIQUE || ! path->jointype == JOIN_LEFT_UNIQUE || ! path->jointype == JOIN_SEMI || ! path->jointype == JOIN_ANTI) { /* ! * INNER_UNIQUE, LEFT_UNIQUE, SEMI, or ANTI join: executor will stop ! * after first match. */ Cost inner_run_cost = workspace->inner_run_cost; Cost inner_rescan_run_cost = workspace->inner_rescan_run_cost; *************** initial_cost_mergejoin(PlannerInfo *root *** 2250,2255 **** --- 2261,2267 ---- innerendsel = cache->leftendsel; } if (jointype == JOIN_LEFT || + jointype == JOIN_LEFT_UNIQUE || jointype == JOIN_ANTI) { outerstartsel = 0.0; *************** initial_cost_hashjoin(PlannerInfo *root, *** 2773,2779 **** * num_batches * 'workspace' is the result from initial_cost_hashjoin * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if path->jointype is SEMI or ANTI */ void final_cost_hashjoin(PlannerInfo *root, HashPath *path, --- 2785,2792 ---- * num_batches * 'workspace' is the result from initial_cost_hashjoin * 'sjinfo' is extra info about the join for selectivity estimation ! * 'semifactors' contains valid data if path->jointype is INNER_UNIQUE, ! * LEFT_UNIQUE, SEMI or ANTI */ void final_cost_hashjoin(PlannerInfo *root, HashPath *path, *************** final_cost_hashjoin(PlannerInfo *root, H *** 2896,2908 **** /* CPU costs */ ! if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI) { double outer_matched_rows; Selectivity inner_scan_frac; /* ! * SEMI or ANTI join: executor will stop after first match. 
* * For an outer-rel row that has at least one match, we can expect the * bucket scan to stop after a fraction 1/(match_count+1) of the --- 2909,2925 ---- /* CPU costs */ ! if (path->jpath.jointype == JOIN_INNER_UNIQUE || ! path->jpath.jointype == JOIN_LEFT_UNIQUE || ! path->jpath.jointype == JOIN_SEMI || ! path->jpath.jointype == JOIN_ANTI) { double outer_matched_rows; Selectivity inner_scan_frac; /* ! * INNER_UNIQUE, LEFT_UNIQUE, SEMI or ANTI join: executor will stop ! * after first match. * * For an outer-rel row that has at least one match, we can expect the * bucket scan to stop after a fraction 1/(match_count+1) of the *************** final_cost_hashjoin(PlannerInfo *root, H *** 2937,2946 **** clamp_row_est(inner_path_rows / virtualbuckets) * 0.05; /* Get # of tuples that will pass the basic join */ ! if (path->jpath.jointype == JOIN_SEMI) ! hashjointuples = outer_matched_rows; ! else hashjointuples = outer_path_rows - outer_matched_rows; } else { --- 2954,2963 ---- clamp_row_est(inner_path_rows / virtualbuckets) * 0.05; /* Get # of tuples that will pass the basic join */ ! if (path->jpath.jointype == JOIN_ANTI) hashjointuples = outer_path_rows - outer_matched_rows; + else + hashjointuples = outer_matched_rows; } else { *************** get_restriction_qual_cost(PlannerInfo *r *** 3469,3481 **** /* * compute_semi_anti_join_factors ! * Estimate how much of the inner input a SEMI or ANTI join ! * can be expected to scan. * ! * In a hash or nestloop SEMI/ANTI join, the executor will stop scanning ! * inner rows as soon as it finds a match to the current outer row. ! * We should therefore adjust some of the cost components for this effect. ! * This function computes some estimates needed for these adjustments. * These estimates will be the same regardless of the particular paths used * for the outer and inner relation, so we compute these once and then pass * them to all the join cost estimation functions. --- 3486,3498 ---- /* * compute_semi_anti_join_factors ! 
* Estimate how much of the inner input an INNER_UNIQUE, LEFT_UNIQUE, SEMI, ! * or ANTI join can be expected to scan. * ! * In a hash or nestloop INNER_UNIQUE/LEFT_UNIQUE/SEMI/ANTI join, the executor ! * will stop scanning inner rows as soon as it finds a match to the current ! * outer row. We should therefore adjust some of the cost components for this ! * effect. This function computes some estimates needed for these adjustments. * These estimates will be the same regardless of the particular paths used * for the outer and inner relation, so we compute these once and then pass * them to all the join cost estimation functions. *************** get_restriction_qual_cost(PlannerIn *** 3483,3489 **** * Input parameters: * outerrel: outer relation under consideration * innerrel: inner relation under consideration ! * jointype: must be JOIN_SEMI or JOIN_ANTI * sjinfo: SpecialJoinInfo relevant to this join * restrictlist: join quals * Output parameters: --- 3500,3507 ---- * Input parameters: * outerrel: outer relation under consideration * innerrel: inner relation under consideration ! * jointype: must be JOIN_INNER_UNIQUE, JOIN_LEFT_UNIQUE, JOIN_SEMI, or ! * JOIN_ANTI * sjinfo: SpecialJoinInfo relevant to this join * restrictlist: join quals * Output parameters: *************** compute_semi_anti_join_factors(PlannerIn *** 3506,3512 **** ListCell *l; /* Should only be called in these cases */ ! Assert(jointype == JOIN_SEMI || jointype == JOIN_ANTI); /* * In an ANTI join, we must ignore clauses that are "pushed down", since --- 3524,3533 ---- ListCell *l; /* Should only be called in these cases */ ! Assert(jointype == JOIN_INNER_UNIQUE || ! jointype == JOIN_LEFT_UNIQUE || ! jointype == JOIN_SEMI || ! jointype == JOIN_ANTI); /* * In an ANTI join, we must ignore clauses that are "pushed down", since *************** compute_semi_anti_join_factors(PlannerIn *** 3530,3536 **** joinquals = restrictlist; /* ! * Get the JOIN_SEMI or JOIN_ANTI selectivity of the join clauses.
*/ jselec = clauselist_selectivity(root, joinquals, --- 3551,3558 ---- joinquals = restrictlist; /* ! * Get the JOIN_INNER_UNIQUE, JOIN_LEFT_UNIQUE, JOIN_SEMI or JOIN_ANTI ! * selectivity of the join clauses. */ jselec = clauselist_selectivity(root, joinquals, *************** calc_joinrel_size_estimate(PlannerInfo * *** 3967,3974 **** * pushed-down quals are applied after the outer join, so their * selectivity applies fully. * ! * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the fraction ! * of LHS rows that have matches, and we apply that straightforwardly. */ switch (jointype) { --- 3989,4001 ---- * pushed-down quals are applied after the outer join, so their * selectivity applies fully. * ! * For JOIN_INNER_UNIQUE, JOIN_SEMI and JOIN_ANTI, the selectivity is ! * defined as the fraction of LHS rows that have matches, and we apply ! * that straightforwardly. ! * ! * For JOIN_LEFT_UNIQUE, the selectivity is likewise the fraction of the ! * LHS rows that have matches; but unlike JOIN_SEMI, unmatched LHS rows ! * are still emitted with a NULL-extended RHS, so we clamp the estimate to ! * no less than the LHS row count.
*/ switch (jointype) { *************** calc_joinrel_size_estimate(PlannerInfo * *** 3990,3998 **** --- 4017,4032 ---- nrows *= pselec; break; case JOIN_SEMI: + case JOIN_INNER_UNIQUE: nrows = outer_rows * jselec; /* pselec not used */ break; + case JOIN_LEFT_UNIQUE: + nrows = outer_rows * jselec; + if (nrows < outer_rows) + nrows = outer_rows; + nrows *= pselec; + break; case JOIN_ANTI: nrows = outer_rows * (1.0 - jselec); nrows *= pselec; diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 41b60d0..c92f561 100644 *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *************** *** 19,24 **** --- 19,25 ---- #include "executor/executor.h" #include "foreign/fdwapi.h" #include "optimizer/cost.h" + #include "optimizer/planmain.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" *************** add_paths_to_joinrel(PlannerInfo *root, *** 88,93 **** --- 89,101 ---- bool mergejoin_allowed = true; ListCell *lc; + /* + * There may be a more optimal JoinType to use. Check for such cases + * first. + */ + jointype = get_optimal_jointype(root, outerrel, innerrel, jointype, sjinfo, + restrictlist); + extra.restrictlist = restrictlist; extra.mergeclause_list = NIL; extra.sjinfo = sjinfo; *************** add_paths_to_joinrel(PlannerInfo *root, *** 109,118 **** &mergejoin_allowed); /* ! * If it's SEMI or ANTI join, compute correction factors for cost ! * estimation. These will be the same for all paths. */ ! if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) compute_semi_anti_join_factors(root, outerrel, innerrel, jointype, sjinfo, restrictlist, &extra.semifactors); --- 117,130 ---- &mergejoin_allowed); /* ! * If it's INNER_UNIQUE, LEFT_UNIQUE, SEMI or ANTI join, compute ! * correction factors for cost estimation. These will be the same for all ! * paths. */ ! if (jointype == JOIN_INNER_UNIQUE || ! jointype == JOIN_LEFT_UNIQUE || ! jointype == JOIN_SEMI || ! 
jointype == JOIN_ANTI) compute_semi_anti_join_factors(root, outerrel, innerrel, jointype, sjinfo, restrictlist, &extra.semifactors); *************** match_unsorted_outer(PlannerInfo *root, *** 827,842 **** ListCell *lc1; /* ! * Nestloop only supports inner, left, semi, and anti joins. Also, if we ! * are doing a right or full mergejoin, we must use *all* the mergeclauses ! * as join clauses, else we will not have a valid plan. (Although these ! * two flags are currently inverses, keep them separate for clarity and ! * possible future changes.) */ switch (jointype) { case JOIN_INNER: case JOIN_LEFT: case JOIN_SEMI: case JOIN_ANTI: nestjoinOK = true; --- 839,856 ---- ListCell *lc1; /* ! * Nestloop only supports inner, inner unique, left, left unique, semi, ! * and anti joins. Also, if we are doing a right or full mergejoin, we ! * must use *all* the mergeclauses as join clauses, else we will not have ! * a valid plan. (Although these two flags are currently inverses, keep ! * them separate for clarity and possible future changes.) */ switch (jointype) { case JOIN_INNER: + case JOIN_INNER_UNIQUE: case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_SEMI: case JOIN_ANTI: nestjoinOK = true; diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 01d4fea..547d09d 100644 *** a/src/backend/optimizer/path/joinrels.c --- b/src/backend/optimizer/path/joinrels.c *************** join_is_legal(PlannerInfo *root, RelOptI *** 490,499 **** /* * The proposed join could still be legal, but only if we're * allowed to associate it into the RHS of this SJ. That means ! * this SJ must be a LEFT join (not SEMI or ANTI, and certainly ! * not FULL) and the proposed join must not overlap the LHS. */ ! if (sjinfo->jointype != JOIN_LEFT || bms_overlap(joinrelids, sjinfo->min_lefthand)) return false; /* invalid join path */ --- 490,501 ---- /* * The proposed join could still be legal, but only if we're * allowed to associate it into the RHS of this SJ. 
That means ! * this SJ must be a LEFT or LEFT_UNIQUE join (not SEMI or ANTI, ! * and certainly not FULL) and the proposed join must not overlap ! * the LHS. */ ! if ((sjinfo->jointype != JOIN_LEFT && ! sjinfo->jointype != JOIN_LEFT_UNIQUE) || bms_overlap(joinrelids, sjinfo->min_lefthand)) return false; /* invalid join path */ *************** join_is_legal(PlannerInfo *root, RelOptI *** 508,515 **** } /* ! * Fail if violated any SJ's RHS and didn't match to a LEFT SJ: the ! * proposed join can't associate into an SJ's RHS. * * Also, fail if the proposed join's predicate isn't strict; we're * essentially checking to see if we can apply outer-join identity 3, and --- 510,517 ---- } /* ! * Fail if violated any SJ's RHS and didn't match to a LEFT or LEFT_UNIQUE ! * SJ: the proposed join can't associate into an SJ's RHS. * * Also, fail if the proposed join's predicate isn't strict; we're * essentially checking to see if we can apply outer-join identity 3, and *************** join_is_legal(PlannerInfo *root, RelOptI *** 518,524 **** */ if (must_be_leftjoin && (match_sjinfo == NULL || ! match_sjinfo->jointype != JOIN_LEFT || !match_sjinfo->lhs_strict)) return false; /* invalid join path */ --- 520,527 ---- */ if (must_be_leftjoin && (match_sjinfo == NULL || ! (match_sjinfo->jointype != JOIN_LEFT && ! 
match_sjinfo->jointype != JOIN_LEFT_UNIQUE) || !match_sjinfo->lhs_strict)) return false; /* invalid join path */ *************** make_join_rel(PlannerInfo *root, RelOptI *** 698,703 **** --- 701,707 ---- sjinfo->syn_lefthand = rel1->relids; sjinfo->syn_righthand = rel2->relids; sjinfo->jointype = JOIN_INNER; + sjinfo->optimal_jointype = JOIN_INNER; /* we don't bother trying to make the remaining fields valid */ sjinfo->lhs_strict = false; sjinfo->delay_upper_joins = false; diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 3d305eb..f00d4a1 100644 *** a/src/backend/optimizer/plan/analyzejoins.c --- b/src/backend/optimizer/plan/analyzejoins.c *************** *** 34,39 **** --- 34,41 ---- /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); + static bool specialjoin_is_unique_join(PlannerInfo *root, + SpecialJoinInfo *sjinfo); static void remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids); static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved); *************** static bool rel_supports_distinctness(Pl *** 41,46 **** --- 43,80 ---- static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list); static Oid distinct_col_search(int colno, List *colnos, List *opids); + static bool is_innerrel_unique_for(PlannerInfo *root, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + List *restrictlist); + + + /* + * optimize_outerjoin_types + * Determine the most optimal JoinType for each outer join, and update + * SpecialJoinInfo's optimal_jointype to that join type. + */ + void + optimize_outerjoin_types(PlannerInfo *root) + { + ListCell *lc; + + foreach(lc, root->join_info_list) + { + SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); + + /* + * Currently we're only interested in LEFT JOINs. 
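The proof that optimize_outerjoin_types relies on (specialjoin_is_unique_join, ultimately rel_is_distinct_for) boils down to checking whether the join clauses pin down a unique index on the inner rel. A rough Python sketch of that idea, with columns as plain strings and unique indexes as tuples of column names (simplified assumptions, not the planner's real data structures):

```python
def inner_side_is_unique(join_clause_inner_cols, unique_indexes):
    """Sketch of the uniqueness proof: the inner rel can yield at most
    one row per outer row if the inner columns constrained by
    mergejoinable (equality) join clauses cover every column of some
    unique index on the inner rel."""
    constrained = set(join_clause_inner_cols)
    return any(set(index_cols) <= constrained
               for index_cols in unique_indexes)
```

For the t1/t2 example at the top of this mail, a clause like t1.id = t2.id constrains t2's primary key column, so a join with t2 on the inner side is provably unique.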
If we can prove + these to have a unique inner side, based on the join condition, then + this can save the executor from having to attempt fruitless + searches for additional matching inner tuples. + */ + if (sjinfo->jointype == JOIN_LEFT && + sjinfo->optimal_jointype != JOIN_LEFT_UNIQUE && + specialjoin_is_unique_join(root, sjinfo)) + sjinfo->optimal_jointype = JOIN_LEFT_UNIQUE; + } + } /* *************** restart: *** 95,100 **** --- 129,140 ---- root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo); /* + * We may now be able to optimize some joins which we could not + * optimize before. (XXX huh? how would that happen?) + */ + optimize_outerjoin_types(root); + + /* * Restart the scan. This is necessary to ensure we find all * removable joins independently of ordering of the join_info_list * (note that removal of attr_needed bits may make a join appear
* Join must have a unique inner side and must be a non-delaying join to a ! * single baserel, else we aren't going to be able to do anything with it. */ ! if (sjinfo->optimal_jointype != JOIN_LEFT_UNIQUE || sjinfo->delay_upper_joins) return false; + /* Now we know specialjoin_is_unique_join() succeeded for this join */ + if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid)) ! return false; /* should not happen */ innerrel = find_base_rel(root, innerrelid); /* Compute the relid set for the join we are considering */ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); *************** join_is_removable(PlannerInfo *root, Spe *** 190,196 **** * * Note that this test only detects use of inner-rel attributes in higher * join conditions and the target list. There might be such attributes in ! * pushed-down conditions at this join, too. We check that case below. * * As a micro-optimization, it seems better to start with max_attr and * count down rather than starting with min_attr and counting up, on the --- 223,230 ---- * * Note that this test only detects use of inner-rel attributes in higher * join conditions and the target list. There might be such attributes in ! * pushed-down conditions at this join, too, but in this case the join ! * would not have been optimized into a LEFT_UNIQUE join. * * As a micro-optimization, it seems better to start with max_attr and * count down rather than starting with min_attr and counting up, on the *************** join_is_removable(PlannerInfo *root, Spe *** 231,236 **** --- 265,305 ---- return false; /* it does reference innerrel */ } + return true; + } + + /* + * specialjoin_is_unique_join + * True if it can be proved that this special join can only ever match at + * most one inner row for any single outer row. False is returned if + * there's insufficient evidence to prove the join is unique. 
+ */ + static bool + specialjoin_is_unique_join(PlannerInfo *root, SpecialJoinInfo *sjinfo) + { + int innerrelid; + RelOptInfo *innerrel; + Relids joinrelids; + List *clause_list = NIL; + ListCell *lc; + + /* if there's more than one relation involved, then punt */ + if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid)) + return false; + + innerrel = find_base_rel(root, innerrelid); + + /* + * Before we go to the effort of pulling out the join condition's columns, + * make a quick check to eliminate cases in which we will surely be unable + * to prove uniqueness of the innerrel. + */ + if (!rel_supports_distinctness(root, innerrel)) + return false; + + /* Compute the relid set for the join we are considering */ + joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + /* * Search for mergejoinable clauses that constrain the inner rel against * either the outer rel or a pseudoconstant. If an operator is *************** join_is_removable(PlannerInfo *root, Spe *** 238,246 **** * it's what we want. The mergejoinability test also eliminates clauses * containing volatile functions, which we couldn't depend on. */ ! foreach(l, innerrel->joininfo) { ! RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l); /* * If it's not a join clause for this outer join, we can't use it. --- 307,315 ---- * it's what we want. The mergejoinability test also eliminates clauses * containing volatile functions, which we couldn't depend on. */ ! foreach(lc, innerrel->joininfo) { ! RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc); /* * If it's not a join clause for this outer join, we can't use it. *************** join_is_removable(PlannerInfo *root, Spe *** 252,261 **** !bms_equal(restrictinfo->required_relids, joinrelids)) { /* ! * If such a clause actually references the inner rel then join ! * removal has to be disallowed. We have to check this despite ! * the previous attr_needed checks because of the possibility of ! 
* pushed-down clauses referencing the rel. */ if (bms_is_member(innerrelid, restrictinfo->clause_relids)) return false; --- 321,331 ---- !bms_equal(restrictinfo->required_relids, joinrelids)) { /* ! * If such a clause actually references the inner rel then we ! * can't say we're unique. (XXX this is confusing conditions for ! * join removability with conditions for uniqueness, and could ! * probably stand to be improved. But for the moment, keep on ! * applying the stricter condition.) */ if (bms_is_member(innerrelid, restrictinfo->clause_relids)) return false; *************** distinct_col_search(int colno, List *col *** 837,839 **** --- 907,1074 ---- } return InvalidOid; } + + + /* + * is_innerrel_unique_for + * Determine if this innerrel can, at most, return a single tuple for each + * outer tuple, based on the 'restrictlist'. + */ + static bool + is_innerrel_unique_for(PlannerInfo *root, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + List *restrictlist) + { + List *clause_list = NIL; + ListCell *lc; + + /* Fall out quickly if we certainly can't prove anything */ + if (restrictlist == NIL || + !rel_supports_distinctness(root, innerrel)) + return false; + + /* + * Search for mergejoinable clauses that constrain the inner rel against + * the outer rel. If an operator is mergejoinable then it behaves like + * equality for some btree opclass, so it's what we want. The + * mergejoinability test also eliminates clauses containing volatile + * functions, which we couldn't depend on. + */ + foreach(lc, restrictlist) + { + RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc); + + /* Ignore if it's not a mergejoinable clause */ + if (!restrictinfo->can_join || + restrictinfo->mergeopfamilies == NIL) + continue; /* not mergejoinable */ + + /* + * Check if clause has the form "outer op inner" or "inner op outer", + * and if so mark which side is inner. 
+ */ + if (!clause_sides_match_join(restrictinfo, outerrel->relids, + innerrel->relids)) + continue; /* no good for these input relations */ + + /* OK, add to list */ + clause_list = lappend(clause_list, restrictinfo); + } + + /* Let rel_is_distinct_for() do the hard work */ + return rel_is_distinct_for(root, innerrel, clause_list); + } + + /* + * get_optimal_jointype + * We may be able to optimize some joins by converting the JoinType to one + * which the executor is able to run more efficiently. Here we look for + * such cases, and if we find a better choice we return it; otherwise we + * return the original JoinType. + */ + JoinType + get_optimal_jointype(PlannerInfo *root, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + SpecialJoinInfo *sjinfo, + List *restrictlist) + { + int innerrelid; + + /* left joins were already optimized in optimize_outerjoin_types() */ + if (jointype == JOIN_LEFT) + return sjinfo->optimal_jointype; + + if (!bms_get_singleton_member(innerrel->relids, &innerrelid)) + return jointype; + + /* + * Any INNER JOIN which can be proven to return at most one inner tuple + * for each outer tuple can be converted into a JOIN_INNER_UNIQUE. + */ + if (jointype == JOIN_INNER) + { + MemoryContext old_context; + ListCell *lc; + + /* can't optimize jointype with an empty restrictlist */ + if (restrictlist == NIL) + return jointype; + + /* + * First let's query the unique and non-unique caches to see if we've + * managed to prove that innerrel is unique for some subset of this + * outerrel. We don't need an exact match: any outerrels beyond those + * previously cached can't make the innerrel any less unique. + */ + foreach(lc, root->unique_rels[innerrelid]) + { + Bitmapset *unique_rels = (Bitmapset *) lfirst(lc); + + if (bms_is_subset(unique_rels, outerrel->relids)) + return JOIN_INNER_UNIQUE; /* Success!
*/ + } + + /* + * We may have previously determined that this outerrel, or some + * superset thereof, cannot prove this innerrel to be unique. + */ + foreach(lc, root->non_unique_rels[innerrelid]) + { + Bitmapset *unique_rels = (Bitmapset *) lfirst(lc); + + if (bms_is_subset(outerrel->relids, unique_rels)) + return jointype; + } + + /* No cached information, so try to make the proof. */ + if (is_innerrel_unique_for(root, outerrel, innerrel, restrictlist)) + { + jointype = JOIN_INNER_UNIQUE; /* Success! */ + + /* + * Cache the positive result for future probes, being sure to keep + * it in the planner_cxt even if we are working in GEQO. + * + * Note: one might consider trying to isolate the minimal subset + * of the outerrels that proved the innerrel unique. But it's not + * worth the trouble, because the planner builds up joinrels + * incrementally and so we'll see the minimally sufficient + * outerrels before any supersets of them anyway. + */ + old_context = MemoryContextSwitchTo(root->planner_cxt); + root->unique_rels[innerrelid] = + lappend(root->unique_rels[innerrelid], + bms_copy(outerrel->relids)); + MemoryContextSwitchTo(old_context); + } + else + { + /* + * None of the join conditions for outerrel proved innerrel + * unique, so we can safely reject this outerrel or any subset of + * it in future checks. + * + * However, in normal planning mode, caching this knowledge is + * totally pointless; it won't be queried again, because we build + * up joinrels from smaller to larger. It is useful in GEQO mode, + * where the knowledge can be carried across successive planning + * attempts; and it's likely to be useful when using join-search + * plugins, too. Hence cache only when join_search_private is + * non-NULL. (Yeah, that's a hack, but it seems reasonable.) 
+ */ + if (root->join_search_private) + { + old_context = MemoryContextSwitchTo(root->planner_cxt); + root->non_unique_rels[innerrelid] = + lappend(root->non_unique_rels[innerrelid], + bms_copy(outerrel->relids)); + MemoryContextSwitchTo(old_context); + } + } + } + return jointype; + } diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 9999eea..a615680 100644 *** a/src/backend/optimizer/plan/initsplan.c --- b/src/backend/optimizer/plan/initsplan.c *************** make_outerjoininfo(PlannerInfo *root, *** 1128,1133 **** --- 1128,1135 ---- sjinfo->syn_lefthand = left_rels; sjinfo->syn_righthand = right_rels; sjinfo->jointype = jointype; + /* this may be changed later */ + sjinfo->optimal_jointype = jointype; /* this always starts out false */ sjinfo->delay_upper_joins = false; diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index edd95d8..690349d 100644 *** a/src/backend/optimizer/plan/planmain.c --- b/src/backend/optimizer/plan/planmain.c *************** query_planner(PlannerInfo *root, List *t *** 124,129 **** --- 124,132 ---- */ setup_simple_rel_arrays(root); + /* Allocate memory for caching which joins are unique. */ + setup_unique_join_caches(root); + /* * Construct RelOptInfo nodes for all base relations in query, and * indirectly for all appendrel member relations ("other rels"). This *************** query_planner(PlannerInfo *root, List *t *** 185,190 **** --- 188,196 ---- */ fix_placeholder_input_needed_levels(root); + /* Determine if there's a better JoinType for each outer join */ + optimize_outerjoin_types(root); + /* * Remove any useless outer joins. 
Ideally this would be done during * jointree preprocessing, but the necessary information isn't available diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 5537c14..95c3a2f 100644 *** a/src/backend/optimizer/plan/setrefs.c --- b/src/backend/optimizer/plan/setrefs.c *************** set_join_references(PlannerInfo *root, J *** 1617,1622 **** --- 1617,1623 ---- switch (join->jointype) { case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_SEMI: case JOIN_ANTI: inner_itlist->has_non_vars = false; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 802eab3..c11769e 100644 *** a/src/backend/optimizer/util/relnode.c --- b/src/backend/optimizer/util/relnode.c *************** setup_simple_rel_arrays(PlannerInfo *roo *** 81,86 **** --- 81,102 ---- } /* + * setup_unique_join_caches + * Prepare the arrays we use for caching which joins are proved to be + * unique and non-unique. + */ + void + setup_unique_join_caches(PlannerInfo *root) + { + int size = list_length(root->parse->rtable) + 1; + + /* initialize the unique relation cache to all NULLs */ + root->unique_rels = (List **) palloc0(size * sizeof(List *)); + + root->non_unique_rels = (List **) palloc0(size * sizeof(List *)); + } + + /* * build_simple_rel * Construct a new RelOptInfo for a base relation or 'other' relation. */ diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 751de4b..66eb670 100644 *** a/src/backend/parser/parse_clause.c --- b/src/backend/parser/parse_clause.c *************** transformFromClauseItem(ParseState *psta *** 978,988 **** /* * Make the left-side RTEs available for LATERAL access within the * right side, by temporarily adding them to the pstate's namespace ! * list. Per SQL:2008, if the join type is not INNER or LEFT then the ! * left-side names must still be exposed, but it's an error to ! * reference them. (Stupid design, but that's what it says.) Hence, ! 
* we always push them into the namespace, but mark them as not ! * lateral_ok if the jointype is wrong. * * Notice that we don't require the merged namespace list to be * conflict-free. See the comments for scanNameSpaceForRefname(). --- 978,988 ---- /* * Make the left-side RTEs available for LATERAL access within the * right side, by temporarily adding them to the pstate's namespace ! * list. Per SQL:2008, if the join type is not INNER, INNER_UNIQUE, ! * LEFT or LEFT_UNIQUE then the left-side names must still be exposed, ! * but it's an error to reference them. (Stupid design, but that's ! * what it says.) Hence, we always push them into the namespace, but ! * mark them as not lateral_ok if the jointype is wrong. * * Notice that we don't require the merged namespace list to be * conflict-free. See the comments for scanNameSpaceForRefname(). *************** transformFromClauseItem(ParseState *psta *** 990,996 **** * NB: this coding relies on the fact that list_concat is not * destructive to its second argument. */ ! lateral_ok = (j->jointype == JOIN_INNER || j->jointype == JOIN_LEFT); setNamespaceLateralState(l_namespace, true, lateral_ok); sv_namespace_length = list_length(pstate->p_namespace); --- 990,999 ---- * NB: this coding relies on the fact that list_concat is not * destructive to its second argument. */ ! lateral_ok = (j->jointype == JOIN_INNER || ! j->jointype == JOIN_INNER_UNIQUE || ! j->jointype == JOIN_LEFT || ! 
j->jointype == JOIN_LEFT_UNIQUE); setNamespaceLateralState(l_namespace, true, lateral_ok); sv_namespace_length = list_length(pstate->p_namespace); diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c index 2e39687..f14a3cb 100644 *** a/src/backend/utils/adt/network_selfuncs.c --- b/src/backend/utils/adt/network_selfuncs.c *************** networkjoinsel(PG_FUNCTION_ARGS) *** 215,221 **** --- 215,223 ---- switch (sjinfo->jointype) { case JOIN_INNER: + case JOIN_INNER_UNIQUE: case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_FULL: /* diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index cc2a9a1..346952e 100644 *** a/src/backend/utils/adt/selfuncs.c --- b/src/backend/utils/adt/selfuncs.c *************** eqjoinsel(PG_FUNCTION_ARGS) *** 2202,2207 **** --- 2202,2208 ---- { case JOIN_INNER: case JOIN_LEFT: + case JOIN_LEFT_UNIQUE: case JOIN_FULL: selec = eqjoinsel_inner(operator, &vardata1, &vardata2); break; diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index 84efa8e..2ba7b07 100644 *** a/src/include/nodes/nodes.h --- b/src/include/nodes/nodes.h *************** typedef enum JoinType *** 640,645 **** --- 640,655 ---- JOIN_ANTI, /* 1 copy of each LHS row that has no match */ /* + * The following join types are variants of JOIN_INNER and JOIN_LEFT for + * when the inner side of the join is known to be unique. This serves + * solely as an optimization to allow the executor to skip looking for + * another matching tuple in the inner side, when it's known that another + * cannot exist. + */ + JOIN_INNER_UNIQUE, + JOIN_LEFT_UNIQUE, + + /* + * These codes are used internally in the planner, but are not supported + * by the executor (nor, indeed, by most of the planner). 
*/ *************** typedef enum JoinType *** 668,673 **** --- 678,684 ---- #define IS_OUTER_JOIN(jointype) \ (((1 << (jointype)) & \ ((1 << JOIN_LEFT) | \ + (1 << JOIN_LEFT_UNIQUE) | \ (1 << JOIN_FULL) | \ (1 << JOIN_RIGHT) | \ (1 << JOIN_ANTI))) != 0) diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index d430f6e..b00aba3 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct PlannerInfo *** 221,226 **** --- 221,238 ---- List **join_rel_level; /* lists of join-relation RelOptInfos */ int join_cur_level; /* index of list being extended */ + /* + * During the join search we attempt to optimize joins to try to prove + * their inner side to be unique based on the join condition. This is a + * rather expensive thing to do as it requires checking each relation's + * unique indexes to see if the relation can, at most, return one tuple + * for each outer tuple. We use this cache during the join search to + * record lists of the sets of relations which both prove, and disprove + * the uniqueness properties for the relid indexed by these arrays. 
+ */ + List **unique_rels; /* cache for proven unique rels */ + List **non_unique_rels; /* cache for proven non-unique rels */ + List *init_plans; /* init SubPlans for query */ List *cte_plan_ids; /* per-CTE-item list of subplan IDs */ *************** typedef struct SpecialJoinInfo *** 1750,1755 **** --- 1762,1769 ---- Relids syn_lefthand; /* base relids syntactically within LHS */ Relids syn_righthand; /* base relids syntactically within RHS */ JoinType jointype; /* always INNER, LEFT, FULL, SEMI, or ANTI */ + JoinType optimal_jointype; /* as above but may also be + * INNER_UNIQUE or LEFT_UNIQUE */ bool lhs_strict; /* joinclause is strict for some LHS rel */ bool delay_upper_joins; /* can't commute with upper RHS */ /* Remaining fields are set only for JOIN_SEMI jointype: */ diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index acc827d..13b212f 100644 *** a/src/include/optimizer/pathnode.h --- b/src/include/optimizer/pathnode.h *************** extern Path *reparameterize_path(Planner *** 236,241 **** --- 236,242 ---- * prototypes for relnode.c */ extern void setup_simple_rel_arrays(PlannerInfo *root); + extern void setup_unique_join_caches(PlannerInfo *root); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index da9c640..a10146c 100644 *** a/src/include/optimizer/planmain.h --- b/src/include/optimizer/planmain.h *************** extern RestrictInfo *build_implied_join_ *** 99,107 **** --- 99,114 ---- /* * prototypes for plan/analyzejoins.c */ + extern void optimize_outerjoin_types(PlannerInfo *root); extern List *remove_useless_joins(PlannerInfo *root, List *joinlist); extern bool query_supports_distinctness(Query *query); extern bool query_is_distinct_for(Query *query, List *colnos, List *opids); + extern JoinType get_optimal_jointype(PlannerInfo 
*root, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + SpecialJoinInfo *sjinfo, + List *restrictlist); /* * prototypes for plan/setrefs.c diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 3ff6691..691fa11 100644 *** a/src/test/regress/expected/aggregates.out --- b/src/test/regress/expected/aggregates.out *************** explain (costs off) select a,c from t1 g *** 880,908 **** explain (costs off) select * from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z; ! QUERY PLAN ! ------------------------------------------------------- ! Group Group Key: t1.a, t1.b, t2.x, t2.y ! -> Merge Join ! Merge Cond: ((t1.a = t2.x) AND (t1.b = t2.y)) ! -> Index Scan using t1_pkey on t1 ! -> Index Scan using t2_pkey on t2 ! (6 rows) -- Test case where t1 can be optimized but not t2 explain (costs off) select t1.*,t2.x,t2.z from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z; ! QUERY PLAN ! ------------------------------------------------------- HashAggregate Group Key: t1.a, t1.b, t2.x, t2.z ! -> Merge Join ! Merge Cond: ((t1.a = t2.x) AND (t1.b = t2.y)) ! -> Index Scan using t1_pkey on t1 ! -> Index Scan using t2_pkey on t2 ! (6 rows) -- Cannot optimize when PK is deferrable explain (costs off) select * from t3 group by a,b,c; --- 880,910 ---- explain (costs off) select * from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z; ! QUERY PLAN ! ------------------------------------------------------ ! HashAggregate Group Key: t1.a, t1.b, t2.x, t2.y ! -> Hash Unique Inner Join ! Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b)) ! -> Seq Scan on t2 ! -> Hash ! -> Seq Scan on t1 ! (7 rows) -- Test case where t1 can be optimized but not t2 explain (costs off) select t1.*,t2.x,t2.z from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z; ! QUERY PLAN ! 
------------------------------------------------------ HashAggregate Group Key: t1.a, t1.b, t2.x, t2.z ! -> Hash Unique Inner Join ! Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b)) ! -> Seq Scan on t2 ! -> Hash ! -> Seq Scan on t1 ! (7 rows) -- Cannot optimize when PK is deferrable explain (costs off) select * from t3 group by a,b,c; diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out index 0391b8e..f446284 100644 *** a/src/test/regress/expected/equivclass.out --- b/src/test/regress/expected/equivclass.out *************** explain (costs off) *** 186,192 **** select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias2; QUERY PLAN ----------------------------------------- ! Nested Loop -> Seq Scan on ec2 Filter: (x1 = '42'::int8alias2) -> Index Scan using ec1_pkey on ec1 --- 186,192 ---- select * from ec1, ec2 where ff = x1 and x1 = '42'::int8alias2; QUERY PLAN ----------------------------------------- ! Nested Loop Unique Inner Join -> Seq Scan on ec2 Filter: (x1 = '42'::int8alias2) -> Index Scan using ec1_pkey on ec1 *************** explain (costs off) *** 310,316 **** -> Index Scan using ec1_expr3 on ec1 ec1_5 -> Index Scan using ec1_expr4 on ec1 ec1_6 -> Materialize ! -> Merge Join Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> Merge Append Sort Key: (((ec1_1.ff + 2) + 1)) --- 310,316 ---- -> Index Scan using ec1_expr3 on ec1 ec1_5 -> Index Scan using ec1_expr4 on ec1 ec1_6 -> Materialize ! -> Merge Unique Inner Join Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> Merge Append Sort Key: (((ec1_1.ff + 2) + 1)) *************** explain (costs off) *** 365,371 **** where ss1.x = ec1.f1 and ec1.ff = 42::int8; QUERY PLAN ----------------------------------------------------- ! Merge Join Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> Merge Append Sort Key: (((ec1_1.ff + 2) + 1)) --- 365,371 ---- where ss1.x = ec1.f1 and ec1.ff = 42::int8; QUERY PLAN ----------------------------------------------------- ! 
Merge Unique Inner Join Merge Cond: ((((ec1_1.ff + 2) + 1)) = ec1.f1) -> Merge Append Sort Key: (((ec1_1.ff + 2) + 1)) diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index cafbc5e..7d0aaa7 100644 *** a/src/test/regress/expected/join.out --- b/src/test/regress/expected/join.out *************** from nt3 as nt3 *** 2756,2763 **** where nt3.id = 1 and ss2.b3; QUERY PLAN ----------------------------------------------- ! Nested Loop ! -> Nested Loop -> Index Scan using nt3_pkey on nt3 Index Cond: (id = 1) -> Index Scan using nt2_pkey on nt2 --- 2756,2763 ---- where nt3.id = 1 and ss2.b3; QUERY PLAN ----------------------------------------------- ! Nested Loop Unique Inner Join ! -> Nested Loop Unique Inner Join -> Index Scan using nt3_pkey on nt3 Index Cond: (id = 1) -> Index Scan using nt2_pkey on nt2 *************** explain (costs off) *** 4035,4041 **** on (p.k = ss.k); QUERY PLAN --------------------------------- ! Hash Left Join Hash Cond: (p.k = c.k) -> Seq Scan on parent p -> Hash --- 4035,4041 ---- on (p.k = ss.k); QUERY PLAN --------------------------------- ! Hash Unique Left Join Hash Cond: (p.k = c.k) -> Seq Scan on parent p -> Hash *************** ERROR: invalid reference to FROM-clause *** 5260,5262 **** --- 5260,5497 ---- LINE 1: ...xx1 using lateral (select * from int4_tbl where f1 = x1) ss; ^ HINT: There is an entry for table "xx1", but it cannot be referenced from this part of the query. 
+ -- + -- test planner's ability to change joins into their appropriate semi join + -- type + -- + create table j1 (id int primary key); + create table j2 (id int primary key); + create table j3 (id int); + insert into j1 values(1),(2),(3); + insert into j2 values(1),(2),(3); + insert into j3 values(1),(1); + analyze j1; + analyze j2; + analyze j3; + -- ensure join is changed to a semi join + explain (verbose, costs off) + select * from j1 inner join j2 on j1.id = j2.id; + QUERY PLAN + ----------------------------------- + Hash Unique Inner Join + Output: j1.id, j2.id + Hash Cond: (j1.id = j2.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Hash + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (9 rows) + + -- ensure join not changed when not an equi-join + explain (verbose, costs off) + select * from j1 inner join j2 on j1.id > j2.id; + QUERY PLAN + ----------------------------------- + Nested Loop + Output: j1.id, j2.id + Join Filter: (j1.id > j2.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Materialize + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (9 rows) + + -- don't change, as j3 has no unique index or pk on id + explain (verbose, costs off) + select * from j1 inner join j3 on j1.id = j3.id; + QUERY PLAN + ----------------------------------- + Hash Unique Inner Join + Output: j1.id, j3.id + Hash Cond: (j3.id = j1.id) + -> Seq Scan on public.j3 + Output: j3.id + -> Hash + Output: j1.id + -> Seq Scan on public.j1 + Output: j1.id + (9 rows) + + -- ensure left join is converted to left semi join + explain (verbose, costs off) + select * from j1 left join j2 on j1.id = j2.id; + QUERY PLAN + ----------------------------------- + Hash Unique Left Join + Output: j1.id, j2.id + Hash Cond: (j1.id = j2.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Hash + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (9 rows) + + -- ensure right join is converted too + explain (verbose, costs off) + select * from j1 right join j2 on 
j1.id = j2.id; + QUERY PLAN + ----------------------------------- + Hash Unique Left Join + Output: j1.id, j2.id + Hash Cond: (j2.id = j1.id) + -> Seq Scan on public.j2 + Output: j2.id + -> Hash + Output: j1.id + -> Seq Scan on public.j1 + Output: j1.id + (9 rows) + + -- a clauseless (cross) join can't be converted + explain (verbose, costs off) + select * from j1 cross join j2; + QUERY PLAN + ----------------------------------- + Nested Loop + Output: j1.id, j2.id + -> Seq Scan on public.j1 + Output: j1.id + -> Materialize + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (8 rows) + + -- ensure a natural join is converted to a semi join + explain (verbose, costs off) + select * from j1 natural join j2; + QUERY PLAN + ----------------------------------- + Hash Unique Inner Join + Output: j1.id + Hash Cond: (j1.id = j2.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Hash + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (9 rows) + + -- ensure distinct clause allows the inner to become a semi join + explain (verbose, costs off) + select * from j1 + inner join (select distinct id from j3) j3 on j1.id = j3.id; + QUERY PLAN + ----------------------------------------------- + Nested Loop Unique Inner Join + Output: j1.id, j3.id + Join Filter: (j1.id = j3.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Materialize + Output: j3.id + -> Unique + Output: j3.id + -> Sort + Output: j3.id + Sort Key: j3.id + -> Seq Scan on public.j3 + Output: j3.id + (14 rows) + + -- ensure group by clause allows the inner to become a semi join + explain (verbose, costs off) + select * from j1 + inner join (select id from j3 group by id) j3 on j1.id = j3.id; + QUERY PLAN + ----------------------------------------------- + Nested Loop Unique Inner Join + Output: j1.id, j3.id + Join Filter: (j1.id = j3.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Materialize + Output: j3.id + -> Group + Output: j3.id + Group Key: j3.id + -> Sort + Output: j3.id + Sort Key: 
j3.id + -> Seq Scan on public.j3 + Output: j3.id + (15 rows) + + -- ensure a full join is not altered + explain (verbose, costs off) + select * from j1 full join j2 on j1.id = j2.id; + QUERY PLAN + ----------------------------------- + Hash Full Join + Output: j1.id, j2.id + Hash Cond: (j1.id = j2.id) + -> Seq Scan on public.j1 + Output: j1.id + -> Hash + Output: j2.id + -> Seq Scan on public.j2 + Output: j2.id + (9 rows) + + drop table j1; + drop table j2; + drop table j3; + -- test a more complex permutations of join conversions + create table j1 (id1 int, id2 int, primary key(id1,id2)); + create table j2 (id1 int, id2 int, primary key(id1,id2)); + create table j3 (id1 int, id2 int, primary key(id1,id2)); + insert into j1 values(1,1),(2,2); + insert into j2 values(1,1); + insert into j3 values(1,1); + analyze j1; + analyze j2; + analyze j3; + -- ensure there's no join conversion when not all columns which are part of + -- the unique index are part of the join clause + explain (verbose, costs off) + select * from j1 + inner join j2 on j1.id1 = j2.id1; + QUERY PLAN + ------------------------------------------ + Nested Loop + Output: j1.id1, j1.id2, j2.id1, j2.id2 + Join Filter: (j1.id1 = j2.id1) + -> Seq Scan on public.j2 + Output: j2.id1, j2.id2 + -> Seq Scan on public.j1 + Output: j1.id1, j1.id2 + (7 rows) + + -- ensure inner is converted to semi join when there's multiple columns in the + -- join condition + explain (verbose, costs off) + select * from j1 + inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2; + QUERY PLAN + ---------------------------------------------------------- + Nested Loop Unique Inner Join + Output: j1.id1, j1.id2, j2.id1, j2.id2 + Join Filter: ((j1.id1 = j2.id1) AND (j1.id2 = j2.id2)) + -> Seq Scan on public.j1 + Output: j1.id1, j1.id2 + -> Materialize + Output: j2.id1, j2.id2 + -> Seq Scan on public.j2 + Output: j2.id1, j2.id2 + (9 rows) + + drop table j1; + drop table j2; + drop table j3; diff --git 
a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out index 067aa8d..9fdc695 100644 *** a/src/test/regress/expected/rowsecurity.out --- b/src/test/regress/expected/rowsecurity.out *************** EXPLAIN (COSTS OFF) SELECT * FROM docume *** 276,282 **** EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle); QUERY PLAN ---------------------------------------------------- ! Nested Loop -> Subquery Scan on document Filter: f_leak(document.dtitle) -> Seq Scan on document document_1 --- 276,282 ---- EXPLAIN (COSTS OFF) SELECT * FROM document NATURAL JOIN category WHERE f_leak(dtitle); QUERY PLAN ---------------------------------------------------- ! Nested Loop Unique Inner Join -> Subquery Scan on document Filter: f_leak(document.dtitle) -> Seq Scan on document document_1 diff --git a/src/test/regress/expected/select_views.out b/src/test/regress/expected/select_views.out index 7f57526..2651ff8 100644 *** a/src/test/regress/expected/select_views.out --- b/src/test/regress/expected/select_views.out *************** NOTICE: f_leak => 9801-2345-6789-0123 *** 1411,1417 **** EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum); QUERY PLAN --------------------------------------------------------- ! Hash Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r Filter: f_leak(cnum) --- 1411,1417 ---- EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum); QUERY PLAN --------------------------------------------------------- ! Hash Unique Inner Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r Filter: f_leak(cnum) *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1432,1438 **** --------------------------------------------------------------- Subquery Scan on my_credit_card_secure Filter: f_leak(my_credit_card_secure.cnum) ! 
-> Hash Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r -> Hash --- 1432,1438 ---- --------------------------------------------------------------- Subquery Scan on my_credit_card_secure Filter: f_leak(my_credit_card_secure.cnum) ! -> Hash Unique Inner Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r -> Hash *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1466,1472 **** -> Materialize -> Subquery Scan on l Filter: f_leak(l.cnum) ! -> Hash Join Hash Cond: (r_1.cid = l_1.cid) -> Seq Scan on credit_card r_1 -> Hash --- 1466,1472 ---- -> Materialize -> Subquery Scan on l Filter: f_leak(l.cnum) ! -> Hash Unique Inner Join Hash Cond: (r_1.cid = l_1.cid) -> Seq Scan on credit_card r_1 -> Hash *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1497,1503 **** -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) -> Materialize ! -> Hash Join Hash Cond: (r_1.cid = l.cid) -> Seq Scan on credit_card r_1 -> Hash --- 1497,1503 ---- -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) -> Materialize ! -> Hash Unique Inner Join Hash Cond: (r_1.cid = l.cid) -> Seq Scan on credit_card r_1 -> Hash diff --git a/src/test/regress/expected/select_views_1.out b/src/test/regress/expected/select_views_1.out index 5275ef0..81795d8 100644 *** a/src/test/regress/expected/select_views_1.out --- b/src/test/regress/expected/select_views_1.out *************** NOTICE: f_leak => 9801-2345-6789-0123 *** 1411,1417 **** EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum); QUERY PLAN --------------------------------------------------------- ! Hash Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r Filter: f_leak(cnum) --- 1411,1417 ---- EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum); QUERY PLAN --------------------------------------------------------- ! 
Hash Unique Inner Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r Filter: f_leak(cnum) *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1432,1438 **** --------------------------------------------------------------- Subquery Scan on my_credit_card_secure Filter: f_leak(my_credit_card_secure.cnum) ! -> Hash Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r -> Hash --- 1432,1438 ---- --------------------------------------------------------------- Subquery Scan on my_credit_card_secure Filter: f_leak(my_credit_card_secure.cnum) ! -> Hash Unique Inner Join Hash Cond: (r.cid = l.cid) -> Seq Scan on credit_card r -> Hash *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1466,1472 **** -> Materialize -> Subquery Scan on l Filter: f_leak(l.cnum) ! -> Hash Join Hash Cond: (r_1.cid = l_1.cid) -> Seq Scan on credit_card r_1 -> Hash --- 1466,1472 ---- -> Materialize -> Subquery Scan on l Filter: f_leak(l.cnum) ! -> Hash Unique Inner Join Hash Cond: (r_1.cid = l_1.cid) -> Seq Scan on credit_card r_1 -> Hash *************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre *** 1497,1503 **** -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) -> Materialize ! -> Hash Join Hash Cond: (r_1.cid = l.cid) -> Seq Scan on credit_card r_1 -> Hash --- 1497,1503 ---- -> Seq Scan on credit_usage r Filter: ((ymd >= '10-01-2011'::date) AND (ymd < '11-01-2011'::date)) -> Materialize ! 
-> Hash Unique Inner Join Hash Cond: (r_1.cid = l.cid) -> Seq Scan on credit_card r_1 -> Hash diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index 3430f91..38cc39c 100644 *** a/src/test/regress/sql/join.sql --- b/src/test/regress/sql/join.sql *************** update xx1 set x2 = f1 from xx1, lateral *** 1696,1698 **** --- 1696,1791 ---- delete from xx1 using (select * from int4_tbl where f1 = x1) ss; delete from xx1 using (select * from int4_tbl where f1 = xx1.x1) ss; delete from xx1 using lateral (select * from int4_tbl where f1 = x1) ss; + + -- + -- test planner's ability to change joins into their appropriate semi join + -- type + -- + + create table j1 (id int primary key); + create table j2 (id int primary key); + create table j3 (id int); + + insert into j1 values(1),(2),(3); + insert into j2 values(1),(2),(3); + insert into j3 values(1),(1); + + analyze j1; + analyze j2; + analyze j3; + + -- ensure join is changed to a semi join + explain (verbose, costs off) + select * from j1 inner join j2 on j1.id = j2.id; + + -- ensure join not changed when not an equi-join + explain (verbose, costs off) + select * from j1 inner join j2 on j1.id > j2.id; + + -- don't change, as j3 has no unique index or pk on id + explain (verbose, costs off) + select * from j1 inner join j3 on j1.id = j3.id; + + -- ensure left join is converted to left semi join + explain (verbose, costs off) + select * from j1 left join j2 on j1.id = j2.id; + + -- ensure right join is converted too + explain (verbose, costs off) + select * from j1 right join j2 on j1.id = j2.id; + + -- a clauseless (cross) join can't be converted + explain (verbose, costs off) + select * from j1 cross join j2; + + -- ensure a natural join is converted to a semi join + explain (verbose, costs off) + select * from j1 natural join j2; + + -- ensure distinct clause allows the inner to become a semi join + explain (verbose, costs off) + select * from j1 + inner join (select distinct id from j3) j3 
on j1.id = j3.id; + + -- ensure group by clause allows the inner to become a semi join + explain (verbose, costs off) + select * from j1 + inner join (select id from j3 group by id) j3 on j1.id = j3.id; + + -- ensure a full join is not altered + explain (verbose, costs off) + select * from j1 full join j2 on j1.id = j2.id; + + drop table j1; + drop table j2; + drop table j3; + + -- test a more complex permutations of join conversions + + create table j1 (id1 int, id2 int, primary key(id1,id2)); + create table j2 (id1 int, id2 int, primary key(id1,id2)); + create table j3 (id1 int, id2 int, primary key(id1,id2)); + + insert into j1 values(1,1),(2,2); + insert into j2 values(1,1); + insert into j3 values(1,1); + + analyze j1; + analyze j2; + analyze j3; + + -- ensure there's no join conversion when not all columns which are part of + -- the unique index are part of the join clause + explain (verbose, costs off) + select * from j1 + inner join j2 on j1.id1 = j2.id1; + + -- ensure inner is converted to semi join when there's multiple columns in the + -- join condition + explain (verbose, costs off) + select * from j1 + inner join j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2; + + drop table j1; + drop table j2; + drop table j3;
On Thu, Apr 7, 2016 at 7:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, I think it would be reasonable to give this patch a few more
> days in view of David's being away through the weekend. But the RMT
> has final say on that.

The RMT has considered this request (sorry for the delay) and thought
it had merit, but ultimately voted not to give this patch an extension.

Robert Haas
PostgreSQL 9.6 Release Management Team
On 8 April 2016 at 02:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm also a bit suspicious of the fact that some of the plans in
> aggregates.out changed from merge to hash joins; with basically
> no stats at hand in those tests, that seems dubious. A quick look
> at what the patch touched in costsize.c suggests that this might
> be because you've effectively allowed cost_hashjoin to give a cost
> discount for inner unique, but provided no similar intelligence
> in cost_mergejoin.

(catching up)

The only possible reason for the merge join plans to have become hash
joins is that a hash join is costed more cheaply (the same as a SEMI
join) when the inner side is unique. The reason I didn't cost merge
join differently is that there's no special costing done there for
SEMI joins today. I might be wrong, but I didn't feel it was up to
this patch to introduce that; perhaps a separate patch should go in
before this one to do so.

I understand this is 9.7 stuff now, but I feel like I should tie up
the loose ends before they get forgotten.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
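For intuition on why a unique inner side lets a hash join be costed like a SEMI join, here's a toy, self-contained sketch in plain Python (not PostgreSQL code; the deliberately collision-prone two-bucket "hash" and the comparison counter are invented for illustration): once a probe finds its match, an inner-unique join can stop scanning the bucket, so it can never do more comparisons than the plain join.

```python
# Toy illustration (not PostgreSQL code) of why a unique inner side makes a
# hash join cheaper: once a probe finds a match, no further bucket entries
# need to be checked.  The two-bucket "hash" is deliberately collision-prone.

from collections import defaultdict

def hash_join(outer, inner, inner_unique):
    """Join two lists of ints; return (result, number of comparisons)."""
    buckets = defaultdict(list)
    for row in inner:
        buckets[row % 2].append(row)        # toy bucketing with collisions

    comparisons = 0
    result = []
    for o in outer:
        for i in buckets[o % 2]:
            comparisons += 1
            if i == o:
                result.append((o, i))
                if inner_unique:
                    break                   # no second match can exist
    return result, comparisons

outer = [1, 2, 3]
inner = [1, 2, 3]                           # unique, like a primary-key column
r_plain, c_plain = hash_join(outer, inner, inner_unique=False)
r_uniq, c_uniq = hash_join(outer, inner, inner_unique=True)
assert r_plain == r_uniq                    # identical join output
assert c_uniq < c_plain                     # fewer comparisons: 4 vs 5 here
```

The early exit is the same thing the executor already does for JOIN_SEMI; the patch extends it to joins whose inner side is proved unique.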
On 8 April 2016 at 11:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I did some performance testing on the attached somewhat-cleaned-up patch,
> and convinced myself that the planning time penalty is fairly minimal:
> on the order of a couple percent in simple one-join queries, and less
> than that in very large queries. Oddly, it seems that the result cacheing
> done in get_optimal_jointype() is only barely worth the trouble in typical
> cases; though if you get a query large enough to require GEQO, it's a win
> because successive GEQO attempts can re-use cache entries made by earlier
> attempts. (This may indicate something wrong with my testing procedure?
> Seems like diking out the cache should have made more difference.)

It'll really depend on whether you're testing a positive case or a
negative one. A GEQO join search is the only time the non-unique cache
is filled, so the negative case only sees cache benefits under GEQO.
On the other hand, if you were testing a positive case, then I guess
the unique index check is cheaper than we thought. Maybe adding a
couple of other unique indexes before defining the one which proves
the join unique (so that they've got a lower OID) would be enough to
start showing the benefits of the unique rel cache.

> I did find by measurement that the negative-cache-entry code produces
> exactly zero hits unless you're in GEQO mode, which is not really
> surprising given the order in which the join search occurs. So in the
> attached patch I made the code not bother with making negative cache
> entries unless using GEQO, to hopefully save a few nanoseconds.

Yeah, I also noticed this in my testing, and it makes sense given how
the standard join search works. Thanks for making that improvement;
I've not yet looked at the code, but it sounds like a good idea.
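The caching being discussed boils down to subset tests on relid sets, as in the get_optimal_jointype() hunk earlier in the thread. Here's a toy model in plain Python: set() stands in for Bitmapset, prove() stands in for the expensive unique-index proof, and the positive-probe direction (a proof made with a subset of the outer rels still holds for a superset) is my reading of the patch, since that hunk isn't quoted above.

```python
# Toy model of the unique/non-unique join cache discussed above.
# Python sets stand in for Bitmapsets of relids; prove() is a stand-in
# for the real (expensive) unique-index proof.

def make_cache():
    return {"unique": [], "non_unique": []}

def probe_or_prove(cache, outer_relids, prove, geqo=False):
    # A proof made with a subset of outer_relids still holds for a superset.
    for proven in cache["unique"]:
        if proven <= outer_relids:
            return True
    # A proof that failed with a superset of outer_relids also fails
    # for any subset of it.
    for failed in cache["non_unique"]:
        if outer_relids <= failed:
            return False
    result = prove(outer_relids)
    if result:
        cache["unique"].append(set(outer_relids))
    elif geqo:
        # Negative entries only pay off when the join search revisits the
        # same combinations, as GEQO does (per Tom's measurements above).
        cache["non_unique"].append(set(outer_relids))
    return result

calls = []
def prove(rels):
    calls.append(set(rels))
    return 1 in rels        # pretend rel 1's unique index proves uniqueness

cache = make_cache()
assert probe_or_prove(cache, {1}, prove) is True
assert probe_or_prove(cache, {1, 2}, prove) is True   # cache hit: no new proof
assert len(calls) == 1
```

Under GEQO the same shape applies to failed proofs: the first `probe_or_prove(cache, {2}, prove, geqo=True)` calls prove() and caches the failure, and repeat probes for {2} (or any subset) return False without proving again.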
> I rebased over f338dd758, did a little bit of code cleanup and fixed some
> bugs in the uniqueness detection logic, but have not reviewed the rest of
> the patch since it's likely all gonna change if we reconsider the JoinType
> representation.
>
> Anyway, I think it would be reasonable to give this patch a few more
> days in view of David's being away through the weekend. But the RMT
> has final say on that.

Thanks for making the updates and committing the refactor of
analyzejoins.c.

The RMT have ruled no extension will be given, so I'll get this into
shape for 9.7 CF1.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 April 2016 at 06:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> Just had a thought about this, which should have crystallized a long
> time ago perhaps. Where I'd originally imagined you were going with
> this idea is to do what the thread title actually says, and check for
> joins in which the *outer* side is unique. I can't see that that's
> of any value for nestloop or hash joins, but for merge joins, knowing
> that the outer side is unique could be extremely valuable because
> we could skip doing mark/restore backups on the inner side, hugely
> reducing the cost when the inner side has many duplicates. Now I'm
> not asking for the initially-committed version of the patch to do that,
> but I think we need to design it to be readily extensible to do so.

I've rebased the changes I made to address this back in April to current
master. The changes get rid of the JOIN_INNER to JOIN_SEMI conversion and
revert back to the original "inner_unique" marking of the joins. In
addition to this I've also added "outer_unique", which is only made use
of in merge join to control whether or not the outer side needs to enable
mark and restore.

However, having said that, I'm not sure why we'd need outer_unique
available so we'd know that we could skip mark/restore. I think
inner_unique is enough for this purpose. Take the comment from
nodeMergejoin.c:

 *		outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
 *		inner: (1 ^3 5 5 5 5 6)			current tuple: 3
...
 *
 *		Consider the above relations and suppose that the executor has
 *		just joined the first outer "5" with the last inner "5". The
 *		next step is of course to join the second outer "5" with all
 *		the inner "5's". This requires repositioning the inner "cursor"
 *		to point at the first inner "5". This is done by "marking" the
 *		first inner 5 so we can restore the "cursor" to it before joining
 *		with the second outer 5. The access method interface provides
 *		routines to mark and restore to a tuple.

If only one inner "5" can exist (unique_inner), then isn't the inner side
already in the correct place, and no restore is required?

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 31 October 2016 at 18:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've rebased the changes I made to address this back in April to current master.

Please note that I went ahead and marked this as "Ready for committer".
It was previously marked as such in a previous commitfest. The changes
made since the last version were based on feedback from Tom. If anyone
thinks this is not correct then please mark it as "Ready for review".

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 2, 2016 at 1:21 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 31 October 2016 at 18:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've rebased the changes I made to address this back in April to current master.
> Please note that I went ahead and marked this as "Ready for
> committer". It was previously marked as such in a previous commitfest.
> The changes made since last version was based on feedback from Tom.
> If anyone thinks this is not correct then please mark as "Ready for review".
Patch still applies fine to HEAD.
Moved to next CF with "ready for committer" status.
Regards,
Hari Babu
Fujitsu Australia
On Dec 2, 2016, at 7:47 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Wed, Nov 2, 2016 at 1:21 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 31 October 2016 at 18:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
>>> I've rebased the changes I made to address this back in April to current master.
>> Please note that I went ahead and marked this as "Ready for
>> committer". It was previously marked as such in a previous commitfest.
>> The changes made since last version was based on feedback from Tom.
>> If anyone thinks this is not correct then please mark as "Ready for review".
> Patch still applies fine to HEAD.
> Moved to next CF with "ready for committer" status.
Tom, are you picking this up?
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Dec 2, 2016, at 7:47 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> Patch still applies fine to HEAD.
>> Moved to next CF with "ready for committer" status.

> Tom, are you picking this up?

Yeah, I apologize for not having gotten to it in this commitfest, but
it's definitely something I will look at.

			regards, tom lane
On 3 December 2016 at 10:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Dec 2, 2016, at 7:47 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>>> Patch still applies fine to HEAD.
>>> Moved to next CF with "ready for committer" status.
>
>> Tom, are you picking this up?
>
> Yeah, I apologize for not having gotten to it in this commitfest, but
> it's definitely something I will look at.

Old patch no longer applies, so I've attached a rebased patch. This also
re-adds a comment line which I mistakenly removed.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 19 January 2017 at 11:06, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Old patch no longer applies, so I've attached a rebased patch. This
> also re-adds a comment line which I mistakenly removed.

(meanwhile Andres commits 69f4b9c)

I should've waited a bit longer. Here's another that fixes the new
conflicts.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
[ getting back to this at long last ]

David Rowley <david.rowley@2ndquadrant.com> writes:
> However, having said that, I'm not sure why we'd need outer_unique
> available so we'd know that we could skip mark/restore. I think
> inner_unique is enough for this purpose. Take the comment from
> nodeMergejoin.c:

> *		outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
> *		inner: (1 ^3 5 5 5 5 6)			current tuple: 3
> ...
> *
> *		Consider the above relations and suppose that the executor has
> *		just joined the first outer "5" with the last inner "5". The
> *		next step is of course to join the second outer "5" with all
> *		the inner "5's". This requires repositioning the inner "cursor"
> *		to point at the first inner "5". This is done by "marking" the
> *		first inner 5 so we can restore the "cursor" to it before joining
> *		with the second outer 5. The access method interface provides
> *		routines to mark and restore to a tuple.

> If only one inner "5" can exist (unique_inner), then isn't the inner
> side already in the correct place, and no restore is required?

Hmm ... let me see whether I have my head wrapped around this correctly.

What I had in mind is a different optimization. When the inner is not
unique, we can't know we're done joining the first outer "5" until we
reach the inner "6". At that point, what we normally do is advance the
outer by one row and back up the inner to the mark point (the first inner
"5"). But the only reason to back up is that the new outer row might join
to the same inner rows the previous one did. If we know the outer side is
unique, then those inner rows can't join to the new outer row, so no need
to back up. So this requires no real change to the mergejoin algorithm;
we just skip the mark and restore steps.

I think what you're saying is that, if we know the inner side is unique,
we can also skip mark/restore overhead; but it would work a bit
differently.
After joining two rows with equal keys, instead of advancing the inner as
per the standard algorithm, we'd need to advance the outer side. (This is
OK because we know the next inner can't join to this same outer. But we
don't know if the next outer can join to this inner.) We advance inner
only when current inner < current outer, so we're done with that inner and
need never rewind. So this is a more fundamental algorithm change, but it
gets the same performance benefit.

So the question is, if we can skip mark/restore overhead when we know that
either input is unique, is it necessary for the planner to account for
both ways explicitly? Given the general symmetry of mergejoins, it might
be okay for the planner to preferentially generate plans with the unique
input on the inside, and not worry about optimizing in this way when it's
on the outside.

Now, JOIN_SEMI and JOIN_ANTI cases are *not* symmetric, since we don't
implement reverse-semi or reverse-anti join modes. But I don't know that
there's anything to be won by noticing that the outer side is unique in
those cases. I think probably we can apply the same technique of
advancing outer not inner, and never rewinding, in those join modes
whether or not either input is known unique.

Another asymmetry is that if one input is likely to be empty, it's better
to put that one on the outside, because if it is empty we don't have to
fetch anything from the other input (thus saving its startup cost). But
the planner doesn't account for that anyway, because it never believes
non-dummy relations are truly empty; so even if it's getting this right
today, it's purely by chance.

I can't think of any other asymmetries offhand.

In short, I think you are right that it might be enough to account for
inner uniqueness only, and not worry about it for the outer side, even
for mergejoin. This means my previous objection is wrong and we don't
really have to allow for a future extension of that kind while choosing
the notation the planner uses.
So ... would you rather go back to the previous notation (extra
JoinTypes), or do you like the separate boolean better anyway?

Sorry for jerking you back and forth like this, but sometimes the
correct path isn't apparent from the start.

			regards, tom lane
Re: [HACKERS] Performance improvement for joins where outer side is unique
From: Antonin Houska
David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 19 January 2017 at 11:06, David Rowley <david.rowley@2ndquadrant.com> wrote:
> > Old patch no longer applies, so I've attached a rebased patch. This
> > also re-adds a comment line which I mistakenly removed.
>
> (meanwhile Andres commits 69f4b9c)
>
> I should've waited a bit longer.
>
> Here's another that fixes the new conflicts.

I suspect that "inner" and "outer" relation / tuple are sometimes
confused in comments:

* analyzejoins.c:70

"searches for subsequent matching outer tuples."

* analyzejoins.c:972

/*
 * innerrel_is_unique
 *	  Check for proofs which prove that 'innerrel' can, at most, match a
 *	  single tuple in 'outerrel' based on the join condition in
 *	  'restrictlist'.
 */

* relation.h:1831

	bool		inner_unique;	/* inner side of join matches no more than one
								 * outer side tuple */

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Thanks for looking at this again.

On 25 January 2017 at 06:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> However, having said that, I'm not sure why we'd need outer_unique
>> available so we'd know that we could skip mark/restore. I think
>> inner_unique is enough for this purpose. Take the comment from
>> nodeMergejoin.c:
>
>> *		outer: (0 ^1 1 2 5 5 5 6 6 7)	current tuple: 1
>> *		inner: (1 ^3 5 5 5 5 6)			current tuple: 3
>> ...
>> *
>> *		Consider the above relations and suppose that the executor has
>> *		just joined the first outer "5" with the last inner "5". The
>> *		next step is of course to join the second outer "5" with all
>> *		the inner "5's". This requires repositioning the inner "cursor"
>> *		to point at the first inner "5". This is done by "marking" the
>> *		first inner 5 so we can restore the "cursor" to it before joining
>> *		with the second outer 5. The access method interface provides
>> *		routines to mark and restore to a tuple.
>
>> If only one inner "5" can exist (unique_inner), then isn't the inner
>> side already in the correct place, and no restore is required?
>
> Hmm ... let me see whether I have my head wrapped around this correctly.
>
> What I had in mind is a different optimization. When the inner is not
> unique, we can't know we're done joining the first outer "5" until we
> reach the inner "6". At that point, what we normally do is advance the
> outer by one row and back up the inner to the mark point (the first inner
> "5"). But the only reason to back up is that the new outer row might join
> to the same inner rows the previous one did. If we know the outer side is
> unique, then those inner rows can't join to the new outer row so no need
> to back up. So this requires no real change to the mergejoin algorithm,
> we just skip the mark and restore steps.
I'd not thought of that possibility, although with a unique outer, I
don't think it'll ever save a restore, as the comparison to the new outer
tuple will always fail to match the marked inner tuple. Nevertheless, we
can save that useless comparison, so I've written code to that effect in
the attached. Perhaps detecting unique outer is not worthwhile for just
this?

> I think what you're saying is that, if we know the inner side is unique,
> we can also skip mark/restore overhead; but it would work a bit
> differently. After joining two rows with equal keys, instead of advancing
> the inner as per standard algorithm, we'd need to advance the outer side.
> (This is OK because we know the next inner can't join to this same outer.

Yeah, this is the same optimisation as applies to the other join types,
which happens to be the same point where semi joins must give up too.

> But we don't know if the next outer can join to this inner.) We advance
> inner only when current inner < current outer, so we're done with that
> inner and need never rewind. So this is a more fundamental algorithm
> change but it gets the same performance benefit.

Yes, I think if inner is unique and outer is unique then after joining in
EXEC_MJ_JOINTUPLES, we could have a new state EXEC_MJ_NEXTINNERANDOUTER,
as the next outer won't match this inner, so we might as well skip to the
next on both. But quite possibly such a join is just not common enough to
be worth worrying about, and it might not give us much better performance
anyway.

> So the question is, if we can skip mark/restore overhead when we know that
> either input is unique, is it necessary for the planner to account for
> both ways explicitly? Given the general symmetry of mergejoins, it might
> be okay for the planner to preferentially generate plans with the unique
> input on the inside, and not worry about optimizing in this way when it's
> on the outside.

I'm inclined to agree.
> Now, JOIN_SEMI and JOIN_ANTI cases are *not* symmetric, since we don't
> implement reverse-semi or reverse-anti join modes. But I don't know that
> there's anything to be won by noticing that the outer side is unique in
> those cases. I think probably we can apply the same technique of
> advancing outer not inner, and never rewinding, in those join modes
> whether or not either input is known unique.
>
> Another asymmetry is that if one input is likely to be empty, it's better
> to put that one on the outside, because if it is empty we don't have to
> fetch anything from the other input (thus saving its startup cost). But
> the planner doesn't account for that anyway because it never believes
> non-dummy relations are truly empty; so even if it's getting this right
> today, it's purely by chance.
>
> I can't think of any other asymmetries offhand.
>
> In short, I think you are right that it might be enough to account for
> inner uniqueness only, and not worry about it for the outer side, even
> for mergejoin. This means my previous objection is wrong and we don't
> really have to allow for a future extension of that kind while choosing
> the notation the planner uses.
>
> So ... would you rather go back to the previous notation (extra
> JoinTypes), or do you like the separate boolean better anyway?

I didn't really ever care for adding the new join types. I think if
someone writes SELECT ... FROM ... INNER JOIN ... and sees a "Semi Join"
in the plan, then we'll be bombarded with bug reports. Also, if we ever
changed our minds about outer unique, then we'd not be in a very great
position to go and add it again.

> Sorry for jerking you back and forth like this, but sometimes the
> correct path isn't apparent from the start.

It's fine. I learned quite a bit today thinking about all this again.

The attached has my Merge Join changes, to show what I think can be done
to make use of unique outer.
Let me know what you think, but I get the idea that we're both leaning
towards ripping the outer unique stuff out, so I'll go do that now...

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 26 January 2017 at 04:56, Antonin Houska <ah@cybertec.at> wrote:
> I suspect that "inner" and "outer" relation / tuple are sometimes confused in
> comments:
>
> * analyzejoins.c:70
>
> "searches for subsequent matching outer tuples."
>
> * analyzejoins.c:972
>
> /*
>  * innerrel_is_unique
>  *	  Check for proofs which prove that 'innerrel' can, at most, match a
>  *	  single tuple in 'outerrel' based on the join condition in
>  *	  'restrictlist'.
>  */
>
> * relation.h:1831
>
> 	bool	inner_unique;	/* inner side of join matches no more than one
> 				 * outer side tuple */

Thanks for looking over the patch. I believe I've fixed all of those now.
I'm not too surprised I got some wrong; after all, I did mess up the
subject of this email too, which caused quite a bit of confusion.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 27 January 2017 at 00:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
> The attached has my Merge Join changes, to show what I think can be
> done to make use of unique outer. Let me know what you think, but I
> get that idea that we're both leaning towards ripping the outer unique
> stuff out, so I'll go do that now...

I've attached a version without outer unique.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a version without outer unique.

I looked through this a bit, and the first thing I noticed was it doesn't
touch costsize.c at all. That seems pretty wrong; it's little help to
have a performance improvement if the planner won't pick the right plan
type. There were costsize.c changes in there back in April; what happened
to them?

			regards, tom lane
On 27 January 2017 at 08:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I've attached a version without outer unique.
>
> I looked through this a bit, and the first thing I noticed was it doesn't
> touch costsize.c at all. That seems pretty wrong; it's little help to
> have a performance improvement if the planner won't pick the right plan
> type. There were costsize.c changes in there back in April; what happened
> to them?

Hmm. I must've taken them out when I changed everything around to use the
join types. Nothing special was needed when changing JOIN_INNER to
JOIN_SEMI. I must've forgotten to put them back again... I'll do that now.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
To re-familiarize myself with this patch, I've been re-reading the
thread, which has gotten quite long. It seemed like it would be a good
idea to stop and try to summarize what the patch ought to accomplish,
because there's been some drift over the more than 2 years the patch has
been in the works. So: ISTM there are two core ideas at this point.

1. We should recognize when the inner side of a join is unique (that is,
there is provably no more than one inner tuple joining to any given outer
tuple), and make use of that knowledge to short-circuit execution in the
same way we already do for JOIN_SEMI and JOIN_ANTI cases. That is, once
we find a match we can immediately move on to the next outer tuple rather
than continuing to scan for inner-side matches.

2. In these same cases (unique/semi/anti joins), it is possible to avoid
mark/restore overhead in a mergejoin, because we can tweak the executor
logic to not require backing up the inner side. This goes further than
just tweaking the executor logic, though, because if we know we don't
need mark/restore then that actually makes some plan shapes legal that
weren't before: we don't need to interpose a Material node to protect
join inputs that can't mark/restore.

Maybe I missed something, but it doesn't look like the current patch
(unique_joins_2017-01-27_no_outer_unique.patch) has anything concerning
point #2 at all. It might make sense to address that idea as a follow-on
patch, but I think it can be a quite significant win and we shouldn't
just lose track of it.

Anyway, having laid out that scope of work, I have some concerns:

* The patch applies point #1 to only INNER and LEFT join modes, but I
don't really see why the idea wouldn't work for RIGHT and FULL modes, ie
the optimization seems potentially interesting for all executable join
types. Once you've got a match, you can immediately go to the next outer
tuple instead of continuing to scan inner. (Am I missing something?)
* Particularly in view of the preceding point, I'm not that happy with
the way that management/caching of the "is it unique" knowledge is done
completely differently for INNER and LEFT joins. I wonder if there's
actually a good argument for that, or is it mostly a development sequence
artifact. IOW, would it hurt to drop the SpecialJoinInfo tie-in and just
rely on the generic cache?

* Because of where we apply the short-circuit logic in the executor, it's
only safe to consider the inner rel as unique if it is provably unique
using only the join clauses that drive the primary join mechanism (ie,
the "joinquals" not the "otherquals"). We already do ignore quals that
are pushed-down to an outer join, so that's good, but there's an
oversight: we will use a qual that is mergejoinable even if it's not
hashjoinable. That means we could get the wrong answers in a hash join.
I think probably the appropriate fix for the moment is just to consider
only clauses that are both mergeable and hashable while trying to prove
uniqueness. We do have some equality operators that support only one or
the other, but they're corner cases, and I'm dubious that it's worth
having to make separate proofs for merge and hash joins in order to cater
to those cases.

			regards, tom lane
From: Antonin Houska
I thought about the patch from the perspective of "grouped relations"
(especially [1]). When looking for the appropriate context within the
thread, I picked this message.

David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > It seems like the major intellectual complexity here is to figure out
> > how to detect inner-side-unique at reasonable cost. I see that for
> > LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> > fine. But for INNER joins it looks like you're just doing it over again
> > for every candidate join, and that seems mighty expensive.
>
> ... I'll look into that.
>
> The other thing I thought of was to add a dedicated list for unique
> indexes in RelOptInfo; this would also allow
> rel_supports_distinctness() to do something a bit smarter than just
> return false if there's no indexes. That might not buy us much though,
> but at least relations tend to have very few unique indexes, even
> when they have lots of indexes.

I'm thinking of a concept of "unique keys", similar to the path keys that
the planner already uses. Besides the current evaluation of uniqueness of
the inner side of a join, the planner would (kind of) union the unique
keys of the joined rels, ie compute a list of expressions which generates
a unique row throughout the new join result. (The requirement is that
each key must be usable in a join expression, as opposed to a filter.)

To figure out whether at most one inner row exists per outer row, each
unique key of the inner relation which references the outer relation
needs to match a unique key of the outer relation (but it's probably
wrong if multiple unique keys of the inner rel reference the same key of
the outer rel). Like a path key, the unique key would also point to an
equivalence class. Thus mere equality of the EC pointers could perhaps be
used to evaluate the match of the inner and outer keys.
Given that rel_is_distinct_for() currently does not accept joins, this
change would make the patch more generic. (BTW, with this approach, the
unique_rels and non_unique_rels caches would have to be stored
per-relation (RelOptInfo), as opposed to PlannerInfo.)

The reason I'd like the unique keys is that - from the "grouped relation"
point of view - the relation uniqueness also needs to be checked against
the GROUP BY clause. Thus the "unique keys" concept seems to me like a
useful abstraction.

Does this proposal seem to have a serious flaw?

[1] https://www.postgresql.org/message-id/CAKJS1f_h1CLff92B%3D%2BbdrMK2Nf3EfGWaJu2WbzQUYcSBUi02ag%40mail.gmail.com

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On 27 January 2017 at 12:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2. In these same cases (unique/semi/anti joins), it is possible to avoid
> mark/restore overhead in a mergejoin, because we can tweak the executor
> logic to not require backing up the inner side. This goes further than
> just tweaking the executor logic, though, because if we know we don't
> need mark/restore then that actually makes some plan shapes legal that
> weren't before: we don't need to interpose a Material node to protect
> join inputs that can't mark/restore.

I've made modifications in the attached to add this optimization, and
it's quite a significant improvement.

Setup:
create table t1 (id text primary key);
insert into t1 select x from generate_series(1,1000000) x(x);
create table t2 (id text primary key);
insert into t2 select x from generate_series(1,1000000) x(x);
vacuum freeze;

Query:
select count(*) from t1 inner join t2 on t1.id=t2.id;

Unpatched:
Time: 369.618 ms
Time: 358.208 ms
Time: 357.094 ms
Time: 355.642 ms
Time: 358.193 ms
Time: 371.272 ms
Time: 354.386 ms
Time: 364.277 ms
Time: 346.091 ms
Time: 358.757 ms

Patched:
Time: 273.154 ms
Time: 258.520 ms
Time: 269.456 ms
Time: 252.861 ms
Time: 271.015 ms
Time: 252.567 ms
Time: 271.132 ms
Time: 267.505 ms
Time: 265.295 ms
Time: 257.068 ms

That's about a 26.5% improvement for this case.

> * The patch applies point #1 to only INNER and LEFT join modes, but
> I don't really see why the idea wouldn't work for RIGHT and FULL modes,
> ie the optimization seems potentially interesting for all executable join
> types. Once you've got a match, you can immediately go to the next outer
> tuple instead of continuing to scan inner. (Am I missing something?)

No, I believe I started development with LEFT JOINs, moved to INNER, then
didn't progress to think of RIGHT or FULL. I've lifted this restriction
in the patch. It seems no other work is required to have that just work.
> * Particularly in view of the preceding point, I'm not that happy with
> the way that management/caching of the "is it unique" knowledge is
> done completely differently for INNER and LEFT joins. I wonder if
> there's actually a good argument for that or is it mostly a development
> sequence artifact. IOW, would it hurt to drop the SpecialJoinInfo
> tie-in and just rely on the generic cache?

I agree that special handling of one join type is not so pretty. However,
LEFT JOINs still remain a bit special as they're the only ones we
currently perform join removal on, and the patch modifies that code to
make use of the new flag for those. This can improve planner performance
of join removal when a join is removed successfully, as the previous code
had to recheck uniqueness of each remaining LEFT JOIN again, whereas the
new code only checks uniqueness of ones not previously marked as unique.
This too likely could be done with the cache, although I'm a bit
concerned with populating the cache, then performing a bunch of LEFT JOIN
removals and leaving relids in the cache which no longer exist. Perhaps
it's OK; I've just not found proofs in my head yet that it is.

> * Because of where we apply the short-circuit logic in the executor,
> it's only safe to consider the inner rel as unique if it is provably
> unique using only the join clauses that drive the primary join mechanism
> (ie, the "joinquals" not the "otherquals"). We already do ignore quals
> that are pushed-down to an outer join, so that's good, but there's an
> oversight: we will use a qual that is mergejoinable even if it's not
> hashjoinable. That means we could get the wrong answers in a hash join.
> I think probably the appropriate fix for the moment is just to consider
> only clauses that are both mergeable and hashable while trying to prove
> uniqueness.
> We do have some equality operators that support only one
> or the other, but they're corner cases, and I'm dubious that it's worth
> having to make separate proofs for merge and hash joins in order to
> cater to those cases.

Hmm. I'm having trouble understanding why this is a problem for unique
joins, but not for join removal?

However, your mentioning this caused me to notice that this is only true
in the patch for left joins, and the other join types are not consistent
with that.

create table a1 (a int, b int, primary key(a,b));
create table a2 (a int, b int, primary key(a,b));

explain (verbose, costs off) select * from a1 left join a2 on a1.a = a2.a and a2.b=1;
                    QUERY PLAN
--------------------------------------------------
 Merge Left Join
   Output: a1.a, a1.b, a2.a, a2.b
   Inner Unique: Yes
   Merge Cond: (a1.a = a2.a)
   ->  Index Only Scan using a1_pkey on public.a1
         Output: a1.a, a1.b
   ->  Sort
         Output: a2.a, a2.b
         Sort Key: a2.a
         ->  Bitmap Heap Scan on public.a2
               Output: a2.a, a2.b
               Recheck Cond: (a2.b = 1)
               ->  Bitmap Index Scan on a2_pkey
                     Index Cond: (a2.b = 1)

explain (verbose, costs off) select * from a1 inner join a2 on a1.a = a2.a and a2.b=1;
                    QUERY PLAN
--------------------------------------------------
 Nested Loop
   Output: a1.a, a1.b, a2.a, a2.b
   Inner Unique: No
   ->  Bitmap Heap Scan on public.a2
         Output: a2.a, a2.b
         Recheck Cond: (a2.b = 1)
         ->  Bitmap Index Scan on a2_pkey
               Index Cond: (a2.b = 1)
   ->  Index Only Scan using a1_pkey on public.a1
         Output: a1.a, a1.b
         Index Cond: (a1.a = a2.a)

Notice the inner join is not detected as unique, but the left join is.
This is still not right in the attached. I'm not quite sure what to do
about it yet.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 27 January 2017 at 12:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 2. In these same cases (unique/semi/anti joins), it is possible to avoid
>> mark/restore overhead in a mergejoin, because we can tweak the executor
>> logic to not require backing up the inner side.

> I've made modifications in the attached to add this optimization, and
> it's quite a significant improvement.

Cool ... validates my gut feeling that that was worth incorporating.

>> ... IOW, would it hurt to drop the SpecialJoinInfo
>> tie-in and just rely on the generic cache?

> I agree that special handling of one join type is not so pretty.
> However, LEFT JOINs still remain a bit special as they're the only
> ones we currently perform join removal on, and the patch modifies that
> code to make use of the new flag for those. This can improve planner
> performance of join removal when a join is removed successfully, as
> the previous code had to recheck uniqueness of each remaining LEFT
> JOIN again, whereas the new code only checks uniqueness of ones not
> previously marked as unique. This too likely could be done with the
> cache, although I'm a bit concerned with populating the cache, then
> performing a bunch of LEFT JOIN removals and leaving relids in the
> cache which no longer exist. Perhaps it's OK. I've just not found
> proofs in my head yet that it is.

TBH, I do not like that tie-in at all. I don't believe that it improves
any performance, because if analyzejoins.c detects that the join is
unique, it will remove the join; therefore there is nothing to cache.
(This statement depends on the uniqueness test being the last
removability test, but it is.) And running mark_unique_joins() multiple
times is ugly and adds cycles whenever it cannot prove a join unique,
because it'll keep trying to do so.
So I'm pretty inclined to drop the connection to analyzejoins.c altogether, along with mark_unique_joins(), and just use the generic positive/negative cache mechanism you added for all join types. It's possible that it'd make sense for analyzejoins.c to add a negative cache entry about any join that it tries and fails to prove unique. Your point about cache maintenance is valid, but a simple answer there would just be to flush the cache whenever we remove a rel. (Although I'm dubious that we need to: how could a removable rel be part of the min outer rels for any surviving rel?) >> * Because of where we apply the short-circuit logic in the executor, >> it's only safe to consider the inner rel as unique if it is provably >> unique using only the join clauses that drive the primary join mechanism >> (ie, the "joinquals" not the "otherquals"). We already do ignore quals >> that are pushed-down to an outer join, so that's good, but there's an >> oversight: we will use a qual that is mergejoinable even if it's not >> hashjoinable. That means we could get the wrong answers in a hash join. > hmm. I'm having trouble understanding why this is a problem for Unique > joins, but not for join removal? Ah, you know what, that's just mistaken. I was thinking that we short-circuited the join on the strength of the hash (or merge) quals only, but actually we check all the joinquals first. As long as the uniqueness proof uses only joinquals and not conditions that will end up as otherquals, it's fine. > However you mentioning this cause me to notice that this is only true > in the patch for left joins, and the other join types are not > consistent with that. Hm, perhaps, but the example you show isn't proving that, because it's choosing to put a1 on the inside in the innerjoin case, and a1 certainly isn't unique for this query. We can't see whether a2 was detected as unique. regards, tom lane
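The "generic positive/negative cache mechanism" Tom refers to can be pictured with a toy sketch. This is an illustration only, not the patch's code: the names (cache_store, cache_lookup, demo_populate) and the plain bitmask standing in for a Bitmapset are invented for the example.

```c
/* A toy sketch of a positive/negative uniqueness-proof cache.  A
 * positive entry recorded for outer rel set S also proves uniqueness
 * for any superset of S, since the same join clauses are still
 * available; a negative entry recorded for S also rules out any subset
 * of S, which had fewer clauses to work with. */
#include <stdbool.h>

typedef unsigned long Relids;   /* bitmask stand-in for Bitmapset */

#define MAX_CACHE 64

typedef struct { Relids outerrelids; int innerrelid; } UniqueCacheEntry;

static UniqueCacheEntry unique_rels[MAX_CACHE];     /* proofs that held */
static int n_unique = 0;
static UniqueCacheEntry non_unique_rels[MAX_CACHE]; /* proofs that failed */
static int n_non_unique = 0;

static void cache_store(Relids outerrelids, int innerrelid, bool unique)
{
    UniqueCacheEntry e = { outerrelids, innerrelid };

    if (unique && n_unique < MAX_CACHE)
        unique_rels[n_unique++] = e;
    else if (!unique && n_non_unique < MAX_CACHE)
        non_unique_rels[n_non_unique++] = e;
}

/* Returns 1 = cached proof of uniqueness, 0 = cached failure, -1 = unknown. */
static int cache_lookup(Relids outerrelids, int innerrelid)
{
    for (int i = 0; i < n_unique; i++)
        if (unique_rels[i].innerrelid == innerrelid &&
            (unique_rels[i].outerrelids & outerrelids) == unique_rels[i].outerrelids)
            return 1;   /* cached outer set is a subset of ours */

    for (int i = 0; i < n_non_unique; i++)
        if (non_unique_rels[i].innerrelid == innerrelid &&
            (outerrelids & non_unique_rels[i].outerrelids) == outerrelids)
            return 0;   /* our outer set is a subset of the failed one */

    return -1;
}

/* Demo: rels {1,2} (mask 0x3) proved rel 5 unique; an attempt with
 * rels {3,4} (mask 0xC) failed to prove rel 6 unique. */
static void demo_populate(void)
{
    cache_store(0x3UL, 5, true);
    cache_store(0xCUL, 6, false);
}
```

Positive hits apply to supersets of the recorded outer rels and negative hits to subsets, which is also why Tom's later observation that removing a rel shouldn't invalidate negative entries is plausible.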
I wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> hmm. I'm having trouble understanding why this is a problem for Unique
>> joins, but not for join removal?

> Ah, you know what, that's just mistaken. I was thinking that we
> short-circuited the join on the strength of the hash (or merge) quals
> only, but actually we check all the joinquals first. As long as the
> uniqueness proof uses only joinquals and not conditions that will end up
> as otherquals, it's fine.

Actually, after thinking about that some more, it seems to me that there
is a performance (not correctness) issue here: suppose that we have
something like

    select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y

If there's a unique index on t2.x, we'll be able to mark the join
inner-unique. However, short-circuiting would only occur after finding a
row that passes both joinquals. If the y condition is true for only a
few rows, this would pretty nearly disable the optimization. Ideally we
would short-circuit after testing the x condition only, but there's no
provision for that.

This might not be a huge problem for outer joins. My sense of typical
SQL style is that the joinquals (ON conditions) are likely to be exactly
what you need to prove inner uniqueness, while random other conditions
will be pushed-down from WHERE and hence will be otherquals. But I'm
afraid it is quite a big deal for inner joins, where we dump all
available conditions into the joinquals. We might need to rethink that
choice.

At least for merge and hash joins, it's tempting to think about a
short-circuit test being made after testing just the merge/hash quals.
But we'd have to prove uniqueness using only the merge/hash quals, so
the planning cost might be unacceptably high --- particularly for merge
joins which often don't use all available mergeable quals.

In the end I think we probably want to keep the short-circuit in the
same place where it is for SEMI/ANTI cases (which have to have it
exactly there for semantic correctness). I'm afraid though that we may
have to do something about the irrelevant-joinquals issue in order for
this to be of much real-world use for inner joins.

regards, tom lane
On Fri, Jan 27, 2017 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm afraid though that we may have to do something about the
> irrelevant-joinquals issue in order for this to be of much real-world
> use for inner joins.

Maybe, but it's certainly not the case that all inner joins are highly
selective. There are plenty of inner joins in real-world applications
where the join product is 10% or 20% or 50% of the size of the larger
input.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 27, 2017 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm afraid though that we may have to do something about the
>> irrelevant-joinquals issue in order for this to be of much real-world
>> use for inner joins.

> Maybe, but it's certainly not the case that all inner joins are highly
> selective. There are plenty of inner joins in real-world applications
> where the join product is 10% or 20% or 50% of the size of the larger
> input.

Um ... what's that got to do with the point at hand?

regards, tom lane
On Fri, Jan 27, 2017 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 27, 2017 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm afraid though that we may have to do something about the
>>> irrelevant-joinquals issue in order for this to be of much real-world
>>> use for inner joins.
>
>> Maybe, but it's certainly not the case that all inner joins are highly
>> selective. There are plenty of inner joins in real-world applications
>> where the join product is 10% or 20% or 50% of the size of the larger
>> input.
>
> Um ... what's that got to do with the point at hand?

I thought it was directly relevant, but maybe I'm confused. Further up
in that email, you wrote: "If there's a unique index on t2.x, we'll be
able to mark the join inner-unique. However, short-circuiting would only
occur after finding a row that passes both joinquals. If the y condition
is true for only a few rows, this would pretty nearly disable the
optimization. Ideally we would short-circuit after testing the x
condition only, but there's no provision for that."

So I assumed from that that the issue was that you'd have to wait for
the first time the irrelevant-joinqual got satisfied before the
optimization kicked in. But, if the join is emitting lots of rows,
that'll happen pretty quickly. I mean, if the join emits even as many as
20 rows, the time after the first one is, all things being equal, 95% of
the runtime of the join. There could certainly be bad cases where it
takes a long time to produce the first row, but I wouldn't say that's a
particularly common thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 27, 2017 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... what's that got to do with the point at hand?

> So I assumed from that that the issue was that you'd have to wait for
> the first time the irrelevant-joinqual got satisfied before the
> optimization kicked in.

No, the problem is that that needs to happen for *each* outer row, and
there's only one chance for it to happen. Given the previous example,

    select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y

once we've found an x match for a given outer row, there aren't going to
be any more and we should move on to the next outer row. But as the
patch stands, we only recognize that if t1.y < t2.y happens to be true
for that particular row pair. Otherwise we'll keep searching and we'll
never find another match for that outer row. So if the y condition is,
say, 50% selective then the optimization only wins for 50% of the outer
rows (that have an x partner in the first place).

Now certainly that's better than a sharp stick in the eye, and maybe we
should just press forward anyway. But it feels like this is leaving a
lot more on the table than I originally thought. Especially for the
inner-join case, where *all* the WHERE conditions get a chance to break
the optimization this way.

regards, tom lane
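Tom's per-outer-row point can be made concrete with a toy nested-loop scan. This is illustrative only, not executor code: the Row type, the scan_count/demo_scan helpers, and the data are all invented. The inner-unique short-circuit fires only after a row passes all joinquals, so an outer row whose y condition never holds scans the whole inner side even though x alone is unique.

```c
/* Toy model: count how many inner rows one outer row inspects when the
 * short-circuit requires the full joinqual (x equality AND y condition)
 * to pass before it can stop. */
typedef struct { int x; int y; } Row;

static int scan_count(Row outer, const Row *inner, int ninner)
{
    int inspected = 0;

    for (int i = 0; i < ninner; i++)
    {
        inspected++;
        /* full joinqual: x equality AND the y condition */
        if (inner[i].x == outer.x && outer.y < inner[i].y)
            break;      /* inner proved unique: stop at first full match */
    }
    return inspected;
}

/* Inner side is unique on x. */
static const Row demo_inner[3] = { {1, 10}, {2, 10}, {3, 10} };

static int demo_scan(int ox, int oy)
{
    Row o = { ox, oy };

    return scan_count(o, demo_inner, 3);
}
```

For an outer row whose y qual passes, the scan stops right at the x match; for one whose y qual fails, the unique x key buys nothing and the scan runs to the end.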
On Fri, Jan 27, 2017 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> So I assumed from that that the issue was that you'd have to wait for
>> the first time the irrelevant-joinqual got satisfied before the
>> optimization kicked in.
>
> No, the problem is that that needs to happen for *each* outer row,
> and there's only one chance for it to happen. Given the previous
> example,
>
>     select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y
>
> once we've found an x match for a given outer row, there aren't going to
> be any more and we should move on to the next outer row. But as the patch
> stands, we only recognize that if t1.y < t2.y happens to be true for that
> particular row pair. Otherwise we'll keep searching and we'll never find
> another match for that outer row. So if the y condition is, say, 50%
> selective then the optimization only wins for 50% of the outer rows
> (that have an x partner in the first place).
>
> Now certainly that's better than a sharp stick in the eye, and
> maybe we should just press forward anyway. But it feels like
> this is leaving a lot more on the table than I originally thought.
> Especially for the inner-join case, where *all* the WHERE conditions
> get a chance to break the optimization this way.

OK, now I understand why you were concerned. Given the size of some of
the speedups David's reported on this thread, I'd be tempted to press
forward even if no solution to this part of the problem presents itself,
but I also agree with you that it's leaving quite a bit on the table if
we can't do better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 January 2017 at 05:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I agree that special handling of one join type is not so pretty.
>> However, LEFT JOINs still remain a bit special as they're the only
>> ones we currently perform join removal on, and the patch modifies that
>> code to make use of the new flag for those. This can improve planner
>> performance of join removal when a join is removed successfully, as
>> the previous code had to recheck uniqueness of each remaining LEFT
>> JOIN again, whereas the new code only checks uniqueness of ones not
>> previously marked as unique. This too likely could be done with the
>> cache, although I'm a bit concerned with populating the cache, then
>> performing a bunch of LEFT JOIN removals and leaving relids in the
>> cache which no longer exist. Perhaps it's OK. I've just not found
>> proofs in my head yet that it is.
>
> TBH, I do not like that tie-in at all. I don't believe that it improves
> any performance, because if analyzejoins.c detects that the join is
> unique, it will remove the join; therefore there is nothing to cache.
> (This statement depends on the uniqueness test being the last removability
> test, but it is.) And running mark_unique_joins() multiple times is ugly
> and adds cycles whenever it cannot prove a join unique, because it'll keep
> trying to do so. So I'm pretty inclined to drop the connection to
> analyzejoins.c altogether, along with mark_unique_joins(), and just use
> the generic positive/negative cache mechanism you added for all join types.

I can make this change, but before I do I just want to point out that I
don't think what you've said here is entirely accurate.

Let's assume unique joins are very commonplace, and join removals are
not so common. If a query has 5 left joins, only one of which can be
removed, then the new code will most likely perform 5 unique join
checks, whereas the old code would perform 9, as those unique checks are
performed again for the remaining 4 once the 1 relation is removed.

However, I'll go make the change, as something needs to be fixed in that
area anyway: LEFT JOINs use the additional quals, whereas other join
types don't, which is broken.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> I can make this change, but before I do I just want to point that I
> don't think what you've said here is entirely accurate.

> Let's assume unique joins are very common place, and join removals are
> not so common. If a query has 5 left joins, and only one of which can
> be removed, then the new code will most likely perform 5 unique join
> checks, whereas the old code would perform 9, as those unique checks
> are performed again once the 1 relation is removed for the remaining
> 4.

I'm not following. If the join removal code had reached the stage of
making a uniqueness check, and that check had succeeded, the join would
be gone and there would be no repeat check later. If it didn't reach the
stage of making a uniqueness check, then again there's no duplication.

There will be some advantage in making a negative cache entry if join
removal performs a uniqueness check that fails, but I don't really see
why that's hard. It does not seem like removal of a relation could cause
another rel to become unique that wasn't before, so keeping negative
cache entries across join removals ought to be safe.

regards, tom lane
On 31 January 2017 at 04:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I can make this change, but before I do I just want to point that I
>> don't think what you've said here is entirely accurate.
>
>> Let's assume unique joins are very common place, and join removals are
>> not so common. If a query has 5 left joins, and only one of which can
>> be removed, then the new code will most likely perform 5 unique join
>> checks, whereas the old code would perform 9, as those unique checks
>> are performed again once the 1 relation is removed for the remaining
>> 4.
>
> I'm not following. If the join removal code had reached the stage of
> making a uniqueness check, and that check had succeeded, the join would be
> gone and there would be no repeat check later. If it didn't reach the
> stage of making a uniqueness check, then again there's no duplication.

I had forgotten the unique check was performed last. In that case the
check for unused columns is duplicated needlessly each time. But let's
drop it, as putting the code back is not making things any worse.

> There will be some advantage in making a negative cache entry if join
> removal performs a uniqueness check that fails, but I don't really see
> why that's hard. It does not seem like removal of a relation could
> cause another rel to become unique that wasn't before, so keeping
> negative cache entries across join removals ought to be safe.

I don't think that's possible. The whole point of the current join
removal code retrying joins which it already tried to remove, after a
successful removal, is exactly that it is possible for a join to become
provably unique on the removal of another join. If you remove that retry
code, a regression test fails. I believe this is because there initially
would have been more than one RHS rel, and the bitmap singleton check
would have failed. After all, a unique index is on a single relation, so
proofs don't exist when more than one rel is on the RHS.

In any case, it's not possible to use the cache with join removals, as
we use the otherquals for unique tests in join removals, but we can't
for unique joins, as I'm adding optimizations to the executor which rely
on the uniqueness being proven on the join condition alone, so that we
can skip to the next outer tuple when the join condition matches even if
the other quals don't. If I change how join removals work in that regard
it will disallow join removals where they were previously possible, so
that's a no go area.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 31 January 2017 at 04:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not following. If the join removal code had reached the stage of
>> making a uniqueness check, and that check had succeeded, the join would be
>> gone and there would be no repeat check later. If it didn't reach the
>> stage of making a uniqueness check, then again there's no duplication.

> I had forgotten the unique check was performed last. In that case the
> check for unused columns is duplicated needlessly each time.

I think we do need to repeat that each time, as columns that were
formerly used in a join condition to a now-dropped relation might
thereby have become unused.

> But let's
> drop it, as putting the code back is not making things any worse.

Agreed, if there is something to be won there, we can address it
separately.

> I don't think that's possible. The whole point that the current join
> removal code retries to remove joins which it already tried to remove,
> after a successful removal is exactly because it is possible for a
> join to become provability unique on the removal of another join.

Not seeing that ... example please?

> If you remove that retry code, a regression test fails.

Probably because of the point about unused columns...

regards, tom lane
On 28 January 2017 at 05:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> David Rowley <david.rowley@2ndquadrant.com> writes:
>>> hmm. I'm having trouble understanding why this is a problem for Unique
>>> joins, but not for join removal?
>
>> Ah, you know what, that's just mistaken. I was thinking that we
>> short-circuited the join on the strength of the hash (or merge) quals
>> only, but actually we check all the joinquals first. As long as the
>> uniqueness proof uses only joinquals and not conditions that will end up
>> as otherquals, it's fine.
>
> Actually, after thinking about that some more, it seems to me that there
> is a performance (not correctness) issue here: suppose that we have
> something like
>
>     select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y
>
> If there's a unique index on t2.x, we'll be able to mark the join
> inner-unique. However, short-circuiting would only occur after
> finding a row that passes both joinquals. If the y condition is
> true for only a few rows, this would pretty nearly disable the
> optimization. Ideally we would short-circuit after testing the x
> condition only, but there's no provision for that.

I've attached a patch which implements this, though only for MergeJoin,
else I'd imagine we'd also need to ensure all proofs used for testing
the uniqueness were also hash-able too. I added some XXX comments in
analyzejoin.c around the mergeopfamilies == NIL tests to mention that
Merge Join depends on all the unique proof quals having mergeopfamilies.
This also assumes we'll never use some subset of mergejoin-able quals
for a merge join, which could be an interesting area in the future, as
we might have some btree index on a subset of those columns to provide
pre-sorted input. In short, it's a great optimisation, but I'm a little
scared we might break it one day.

Implementing this meant removing match_first_row_only being set for
JOIN_SEMI, as this optimisation only applies to unique_inner and not
JOIN_SEMI alone. This also means we should be checking for unique
properties on JOIN_SEMI now too, which I've enabled. Also, I've removed
all jointype checks from innerrel_is_unique().

I took a bit of time to think about JOIN_UNIQUE_OUTER and
JOIN_UNIQUE_INNER. I think these are fine too; in fact one of the
regression test plans moved away from using a Semi Join due to proving
that the inner side was unique. That could perhaps allow a better plan,
since the join order can be swapped. I'd really like someone else to
have a think about that too, just to make sure I've not blundered it.

I've put the join removal code back to the way it was before. As I
mentioned, we can't use the caches here, since we're also using
additional quals for proofs in this case.

I wasn't sure if I should add some regression tests which exercise
MergeJoin a bit to test the new optimisation. Any thoughts on that?

David

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 31 January 2017 at 13:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached a patch which implements this.

Please disregard the previous patch. (I forgot git commit before git
diff when making the patch.) I've attached the correct patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 31 January 2017 at 10:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I don't think that's possible. The whole point that the current join
>> removal code retries to remove joins which it already tried to remove,
>> after a successful removal is exactly because it is possible for a
>> join to become provability unique on the removal of another join.
>
> Not seeing that ... example please?

I had a quick look at this again, as it had been a while since I noticed
that. The sample case was:

create temp table uniquetbl (f1 text unique);

explain (costs off)
select t1.* from uniquetbl as t1
  left join (select *, '***'::text as d1 from uniquetbl) t2 on t1.f1 = t2.f1
  left join uniquetbl t3 on t2.d1 = t3.f1;

However, what it actually fails on depends on whether you check for
unused columns or uniqueness first, as initially the subquery fails both
of the tests. I was under the impression it was failing the unique test,
as that's what I was doing first in my patch.

If you test uniqueness first it'll fail on:

    /*
     * If such a clause actually references the inner rel then join
     * removal has to be disallowed.  We have to check this despite
     * the previous attr_needed checks because of the possibility of
     * pushed-down clauses referencing the rel.
     */
    if (bms_is_member(innerrelid, restrictinfo->clause_relids))
        return false;

but if you test for unused columns first, it'll fail on:

    if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
        return false;       /* there isn't any other place to eval PHV */

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Performance improvement for joins where outer side is unique
From: Michael Paquier
On Tue, Jan 31, 2017 at 9:13 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 31 January 2017 at 13:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached a patch which implements this.
>
> Please disregards previous patch. (I forgot git commit before git diff
> to make the patch)
>
> I've attached the correct patch.

Moved to CF 2017-03. (You are the last one.)

--
Michael
[ getting back to this patch finally... ]

David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a patch which implements this, though only for
> MergeJoin, else I'd imagine we'd also need to ensure all proofs used
> for testing the uniqueness were also hash-able too. I added some XXX
> comments in analyzejoin.c around the mergeopfamilies == NIL tests to
> mention that Merge Join depends on all the unique proof quals having
> mergeopfamilies. This also assumes we'll never use some subset of
> mergejoin-able quals for a merge join, which could be an interesting
> area in the future, as we might have some btree index on a subset of
> those columns to provide pre-sorted input. In short, it's a great
> optimisation, but I'm a little scared we might break it one day.

Umm ... it's broken already isn't it? See the comment for
generate_mergejoin_paths:

 * We generate mergejoins if mergejoin clauses are available.  We have
 * two ways to generate the inner path for a mergejoin: sort the cheapest
 * inner path, or use an inner path that is already suitably ordered for the
 * merge.  If we have several mergeclauses, it could be that there is no inner
 * path (or only a very expensive one) for the full list of mergeclauses, but
 * better paths exist if we truncate the mergeclause list (thereby discarding
 * some sort key requirements).  So, we consider truncations of the
 * mergeclause list as well as the full list.  (Ideally we'd consider all
 * subsets of the mergeclause list, but that seems way too expensive.)

There's another, more subtle, way in which it could fail in
sort_inner_and_outer():

 * Each possible ordering of the available mergejoin clauses will generate
 * a differently-sorted result path at essentially the same cost.  We have
 * no basis for choosing one over another at this level of joining, but
 * some sort orders may be more useful than others for higher-level
 * mergejoins, so it's worth considering multiple orderings.
 *
 * Actually, it's not quite true that every mergeclause ordering will
 * generate a different path order, because some of the clauses may be
 * partially redundant (refer to the same EquivalenceClasses).  Therefore,
 * what we do is convert the mergeclause list to a list of canonical
 * pathkeys, and then consider different orderings of the pathkeys.

I'm fairly sure that it's possible to end up with fewer pathkeys than
there are mergeclauses in this code path. Now, you might be all right
anyway given that the mergeclauses must refer to the same ECs in such a
case --- maybe they're fully redundant and we can take testing the
included clause as proving the omitted one(s) too. I'm not certain
right now what I meant by "partially redundant" in this comment. But in
any case, it's moot for the present purpose because
generate_mergejoin_paths certainly breaks your assumption.

regards, tom lane
I had to fix up some conflicts before testing this again on a recent master, so I've attached my rebased version. No fix yet for the above issue.

On 14 March 2017 at 07:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ getting back to this patch finally... ]
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a patch which implements this, though only for
> MergeJoin, else I'd imagine we'd also need to ensure all proofs used
> for testing the uniqueness were also hash-able too. I added some XXX
> comments in analyzejoin.c around the mergeopfamilies == NIL tests to
> mention that Merge Join depends on all the unique proof quals having
> mergeopfamilies. This also assumes we'll never use some subset of
> mergejoin-able quals for a merge join, which could be an interesting
> area in the future, as we might have some btree index on a subset of
> those columns to provide pre-sorted input. In short, it's a great
> optimisation, but I'm a little scared we might break it one day.
Umm ... it's broken already isn't it? See the comment for
generate_mergejoin_paths:
* We generate mergejoins if mergejoin clauses are available. We have
* two ways to generate the inner path for a mergejoin: sort the cheapest
* inner path, or use an inner path that is already suitably ordered for the
* merge. If we have several mergeclauses, it could be that there is no inner
* path (or only a very expensive one) for the full list of mergeclauses, but
* better paths exist if we truncate the mergeclause list (thereby discarding
* some sort key requirements). So, we consider truncations of the
* mergeclause list as well as the full list. (Ideally we'd consider all
* subsets of the mergeclause list, but that seems way too expensive.)
There's another, more subtle, way in which it could fail in
sort_inner_and_outer():
* Each possible ordering of the available mergejoin clauses will generate
* a differently-sorted result path at essentially the same cost. We have
* no basis for choosing one over another at this level of joining, but
* some sort orders may be more useful than others for higher-level
* mergejoins, so it's worth considering multiple orderings.
*
* Actually, it's not quite true that every mergeclause ordering will
* generate a different path order, because some of the clauses may be
* partially redundant (refer to the same EquivalenceClasses). Therefore,
* what we do is convert the mergeclause list to a list of canonical
* pathkeys, and then consider different orderings of the pathkeys.
I'm fairly sure that it's possible to end up with fewer pathkeys than
there are mergeclauses in this code path. Now, you might be all right
anyway given that the mergeclauses must refer to the same ECs in such a
case --- maybe they're fully redundant and we can take testing the
included clause as proving the omitted one(s) too. I'm not certain
right now what I meant by "partially redundant" in this comment.
But in any case, it's moot for the present purpose because
generate_mergejoin_paths certainly breaks your assumption.
Thanks for looking at this again.
Yeah confirmed. It's broken. I guess I just need to remember in the Path whether we got all the join quals, although I've not looked in detail at what the best fix is. I imagine it'll require storing something else in the JoinPath.
Here's my test case:
select 'create tablespace slow_disk LOCATION ''' || current_setting('data_directory') || ''' WITH (random_page_cost=1000);';
\gexec
create table ab (a int not null, b int not null);
create unique index ab_pkey on ab (a,b) tablespace slow_disk;
alter table ab add constraint ab_pkey primary key using index ab_pkey;
create index ab_a_idx on ab (a);
insert into ab values(1,1);
insert into ab values(1,2);
set enable_nestloop to 0;
set enable_hashjoin to 0;
set enable_sort to 0;
explain verbose select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b;
QUERY PLAN
-------------------------------------------------------------------------------------
Merge Join (cost=0.26..24.35 rows=1 width=16)
Output: ab1.a, ab1.b, ab2.a, ab2.b
Inner Unique: Yes
Merge Cond: (ab1.a = ab2.a)
Join Filter: (ab1.b = ab2.b)
-> Index Scan using ab_a_idx on public.ab ab1 (cost=0.13..12.16 rows=2 width=8)
Output: ab1.a, ab1.b
-> Index Scan using ab_a_idx on public.ab ab2 (cost=0.13..12.16 rows=2 width=8)
Output: ab2.a, ab2.b
(9 rows)
select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b; -- wrong results
a | b | a | b
---+---+---+---
1 | 2 | 1 | 2
(1 row)
drop index ab_a_idx;
-- same query again. This time it'll use the PK index on the slow_disk tablespace
select * from ab ab1 inner join ab ab2 on ab1.a = ab2.a and ab1.b = ab2.b;
a | b | a | b
---+---+---+---
1 | 1 | 1 | 1
1 | 2 | 1 | 2
(2 rows)
Attachment
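The failure mode behind the wrong results above can be modeled in a few lines. This is a toy model, not the executor: the Tup type, the merge_join/demo helpers, and the data are invented, and for brevity the inner loop skips non-matching keys instead of using the sort order to stop early. With the merge condition on a only and the b condition left as a join filter, never backing up the inner side loses a match whenever two outer rows share an a value.

```c
#include <stdbool.h>

typedef struct { int a; int b; } Tup;

/* Joins on key a with an extra filter on b.  If can_backup is true, the
 * inner position is restored to its "mark" before each new outer row,
 * as real mark/restore would do for duplicate outer keys; if false, the
 * inner side never backs up. */
static int merge_join(const Tup *outer, int nouter,
                      const Tup *inner, int ninner, bool can_backup)
{
    int emitted = 0;
    int ipos = 0;

    for (int o = 0; o < nouter; o++)
    {
        int mark = ipos;

        for (int i = mark; i < ninner; i++)
        {
            if (inner[i].a != outer[o].a)
                continue;                 /* merge cond fails */
            if (inner[i].b == outer[o].b) /* join filter */
            {
                emitted++;
                ipos = i + 1;             /* "unique": move past the match */
                break;
            }
        }
        if (can_backup)
            ipos = mark;                  /* restore for the next outer row */
    }
    return emitted;
}

/* Both inputs hold (1,1) and (1,2); outer happens to arrive (1,2) first,
 * so without backup the inner row (1,1) is skipped past and lost. */
static int demo(bool can_backup)
{
    Tup outer[2] = { {1, 2}, {1, 1} };
    Tup inner[2] = { {1, 1}, {1, 2} };

    return merge_join(outer, 2, inner, 2, can_backup);
}
```

With backup the model emits both matching rows; without it, only one, which is the same shape of row loss the test case demonstrates.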
On 14 March 2017 at 11:35, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 14 March 2017 at 07:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ getting back to this patch finally... ]
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a patch which implements this, though only for
> MergeJoin, else I'd imagine we'd also need to ensure all proofs used
> for testing the uniqueness were also hash-able too. I added some XXX
> comments in analyzejoin.c around the mergeopfamilies == NIL tests to
> mention that Merge Join depends on all the unique proof quals having
> mergeopfamilies. This also assumes we'll never use some subset of
> mergejoin-able quals for a merge join, which could be an interesting
> area in the future, as we might have some btree index on a subset of
> those columns to provide pre-sorted input. In short, it's a great
> optimisation, but I'm a little scared we might break it one day.
Umm ... it's broken already isn't it? See the comment for
generate_mergejoin_paths:

Thanks for looking at this again.

Yeah confirmed. It's broken. I guess I just need to remember in the Path whether we got all the join quals, although I've not looked in detail at what the best fix is. I imagine it'll require storing something else in the JoinPath.
OK, so I've spent some more time on this and I've come up with a solution.
Basically the solution is to not skip mark and restore when joinquals contains any items. This is a requirement for SEMI joins, but overly cautious for unique joins. However, I think it'll apply in most cases when Merge Join will be a win, and that's when there's a btree index on both sides of the join which covers all columns in the join condition. I carefully commented this part of the code to explain what can be done to have it apply in more cases.
This caused me to go and change the following code too:
@@ -2676,6 +2688,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* it off does not entitle us to deliver an invalid plan.
*/
else if (innersortkeys == NIL &&
+ !((extra->inner_unique || path->jpath.jointype == JOIN_SEMI) &&
+ list_length(path->jpath.joinrestrictinfo) ==
+ list_length(path->path_mergeclauses)) &&
!ExecSupportsMarkRestore(inner_path))
path->materialize_inner = true;
I've been staring at this for a while and I think it's correct. If it's wrong then we'd get "ERROR: unrecognized node type: <n>" from ExecMarkPos().
Here we must make sure never to skip materializing the inner side if we're not going to skip mark and restore in ExecInitMergeJoin().
I've attached a patch which fixes the issue.
Also added a regression test to try to make sure it stays fixed. There's a few other mostly cosmetic fixes in there too.
Attachment
On 14 March 2017 at 16:37, David Rowley <david.rowley@2ndquadrant.com> wrote:
> [ quoted text of the above message omitted ]

Patch is attached which fixes up the conflict between the expression evaluation performance patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 27 March 2017 at 09:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Patch is attached which fixes up the conflict between the expression
> evaluation performance patch.

Seems I forgot to commit locally before creating the patch... Here's the actual patch I meant to attach earlier.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 27 March 2017 at 15:51, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 27 March 2017 at 09:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Patch is attached which fixes up the conflict between the expression
> evaluation performance patch.
Seems I forgot to commit locally before creating the patch... Here's
the actual patch I meant to attach earlier.
I've attached an updated patch which updates the regression test output of a recent commit to include the "Unique Inner" in the expected results.
I've also performed a pgindent run on the patch.
Tom, I'm wondering if you think you'll get time to look at this before the feature freeze?
David
On Sun, Apr 2, 2017 at 5:21 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached an updated patch which updates the regression test output of a
> recent commit to include the "Unique Inner" in the expected results.

Was this email supposed to have a patch attached?

> Tom, I'm wondering if you think you'll get time to look at this before the
> feature freeze?

/me crosses fingers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2 April 2017 at 21:21, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached an updated patch which updates the regression test output of a
> recent commit to include the "Unique Inner" in the expected results.

The patch must've fallen off. Attempt number 2 at attaching.
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> Tom, I'm wondering if you think you'll get time to look at this before the
> feature freeze?

Yeah, I intend to. Thanks for updating the patch.

regards, tom lane
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 2 April 2017 at 21:21, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which updates the regression test output of
>> a recent commit to include the "Unique Inner" in the expected results.
> The patch must've fallen off. Attempt number 2 at attaching.

I'm looking through this, and I'm failing to see where it deals with the problem we discussed last time, namely that you can't apply the optimization unless all clauses that were used in the uniqueness proof are included in the join's merge/hash conditions + joinquals. It might be sufficient to ignore is_pushed_down conditions (at an outer join only) while trying to make the proofs; but if it's doing that, I don't see where.

I don't especially like the centralized unique_rels cache structure. It's not terribly clear what it's for, and you're making uncomfortably large assumptions about never indexing off the end of the array, despite not having any range checks for the subscripts. Wouldn't it be better to add simple List fields into RelOptInfo, representing the outer rels this rel has been proven unique or not-unique for? That would dodge the array size question and would be more readily extensible to someday applying this to join rels.

I also think some more thought is needed about handling JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER cases. In the first place, the patch cavalierly violates the statement in joinpath.c that those jointype values never propagate outside that module. In the second place, shouldn't JOIN_UNIQUE_INNER automatically result in the path getting marked inner_unique? I suspect the logic in add_paths_to_joinrel ought to look something like

    if (jointype == JOIN_UNIQUE_INNER)
        extra.inner_unique = true;
    else
        extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
                                                (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER : jointype),
                                                restrictlist);

regards, tom lane
On 7 April 2017 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm looking through this, and I'm failing to see where it deals with
> the problem we discussed last time, namely that you can't apply the
> optimization unless all clauses that were used in the uniqueness
> proof are included in the join's merge/hash conditions + joinquals.

Many thanks for looking at this again.

The test in question is in nodeMergeJoin.c. I believe the join is still unique no matter where the clauses are evaluated. It should be up to the executor code to make use of the knowledge how it sees fit. The planner should not make assumptions on how the executor will make use of this knowledge. I've carefully crafted a comment in nodeMergejoin.c which explains all of this, and also the limitations and where things might be improved later. The code in question is:

    mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
                                     mergestate->js.first_inner_tuple_only;

> I don't especially like the centralized unique_rels cache structure.
> It's not terribly clear what it's for, and you're making uncomfortably
> large assumptions about never indexing off the end of the array, despite
> not having any range checks for the subscripts. Wouldn't it be better to
> add simple List fields into RelOptInfo, representing the outer rels this
> rel has been proven unique or not-unique for? That would dodge the array
> size question and would be more readily extensible to someday applying
> this to join rels.

hmm, perhaps bounds checking could be done, but it's no worse than planner_rt_fetch(). I don't really think the List idea would be nearly as efficient. The array provides a direct lookup for the List of proof relations. A List of Lists would require a linear lookup just to find the correct List, then the existing linear lookup to find the proofs. The cache is designed to be fast. Slowing it down seems like a bad idea.

Perhaps throwing it away would be better, since it's not required and was only added as an optimisation. The non_unique_rels will most often have a NULL bitmap set due to the incremental join search by the standard planner, so access to this as an array should be very fast, as we'll quickly realise there are no proofs to be found.

> I also think some more thought is needed about handling JOIN_UNIQUE_OUTER
> and JOIN_UNIQUE_INNER cases. In the first place, the patch cavalierly
> violates the statement in joinpath.c that those jointype values never
> propagate outside that module. In the second place, shouldn't
> JOIN_UNIQUE_INNER automatically result in the path getting marked
> inner_unique? I suspect the logic in add_paths_to_joinrel ought to
> look something like
>
>     if (jointype == JOIN_UNIQUE_INNER)
>         extra.inner_unique = true;
>     else
>         extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
>             (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER : jointype),
>             restrictlist);

hmm, innerrel_is_unique() is without prejudice as to the jointypes it supports, so much so that the argument is completely ignored. It was probably left over from some previous incarnation of the patch.

If we treat JOIN_UNIQUE_INNER specially, then we'd better be sure that it's made unique on the RHS join quals. It looks like create_unique_path() uses sjinfo->semi_rhs_exprs to uniquify the relation, and compute_semijoin_info() seems to take all of the join conditions there or nothing at all, so I think it's safe to automatically mark JOIN_UNIQUE_INNERs this way.

Updated patch is attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 7 April 2017 at 07:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm looking through this, and I'm failing to see where it deals with
>> the problem we discussed last time, namely that you can't apply the
>> optimization unless all clauses that were used in the uniqueness
>> proof are included in the join's merge/hash conditions + joinquals.

> The code in question is:
> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
>                                  mergestate->js.first_inner_tuple_only;

Uh, AFAICS that only protects the skip-mark-and-restore logic. What I'm on about is that you can't do the early advance to the next outer tuple unless you're sure that all the quals that were relevant to the uniqueness proof have been checked for the current inner tuple. That affects all three join types not only merge. The case that would be relevant to this is, eg,

    create table t1 (f1 int, f2 int, primary key(f1,f2));

    select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
    where t_outer.f2 = t2.f2;

Your existing patch would think t1 is unique-inner, but the qual pushed down from WHERE would not be a joinqual so the wrong thing would happen at runtime.

(Hm ... actually, this example wouldn't fail as written because the WHERE qual is probably strict, so the left join would get reduced to an inner join and then pushed-down-ness no longer matters. But hopefully you get my drift.)

> I don't really think the List idea would be nearly as efficient.

No, what I'm saying is that each RelOptInfo would contain a single List of Relids of proven-unique-for outer rels (and another one for the negative cache). No array, no more searching than you have now, just removal of an uncertainly-safe array fetch.

regards, tom lane
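The cache shape Tom suggests here -- each RelOptInfo carrying one list of outer-rel sets it was proven unique for, plus a second list for the negative cache -- can be sketched as a stand-alone model. In this hypothetical sketch a 64-bit mask stands in for a Relids Bitmapset and fixed-size arrays stand in for Lists; none of these names are PostgreSQL's:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long Relids;  /* stand-in for a Bitmapset of rel ids */

#define MAX_CACHE 32

/* Stand-in for the two List fields added to RelOptInfo */
typedef struct
{
    Relids unique_for[MAX_CACHE];       /* outer-rel sets proven unique */
    int nunique;
    Relids non_unique_for[MAX_CACHE];   /* negative cache */
    int nnon_unique;
} RelUniqueCache;

/* Returns +1 if cached unique, -1 if cached not-unique, 0 if unknown */
static int
lookup_unique_cache(const RelUniqueCache *c, Relids outer)
{
    for (int i = 0; i < c->nunique; i++)
        if (c->unique_for[i] == outer)      /* real code would use bms_equal() */
            return 1;
    for (int i = 0; i < c->nnon_unique; i++)
        if (c->non_unique_for[i] == outer)
            return -1;
    return 0;
}

/* Remember a proof result so it need not be re-derived for this rel pair */
static void
cache_unique_result(RelUniqueCache *c, Relids outer, bool unique)
{
    if (unique && c->nunique < MAX_CACHE)
        c->unique_for[c->nunique++] = outer;
    else if (!unique && c->nnon_unique < MAX_CACHE)
        c->non_unique_for[c->nnon_unique++] = outer;
}
```

The lookups stay linear, as Tom notes, but there is no subscripting past the end of anything: an unknown pair simply falls through to a fresh proof attempt.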
On 7 April 2017 at 11:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, AFAICS that only protects the skip-mark-and-restore logic.
> What I'm on about is that you can't do the early advance to the
> next outer tuple unless you're sure that all the quals that were
> relevant to the uniqueness proof have been checked for the current
> inner tuple. That affects all three join types not only merge.

Well, look at the join code and you'll see this only happens after the joinqual is evaluated. I didn't make a special effort here. I just borrowed the location that JOIN_SEMI was already using.

For example, from hash join:

    if (joinqual == NULL || ExecQual(joinqual, econtext))
    {
        node->hj_MatchedOuter = true;
        HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));

        /* In an antijoin, we never return a matched tuple */
        if (node->js.jointype == JOIN_ANTI)
        {
            node->hj_JoinState = HJ_NEED_NEW_OUTER;
            continue;
        }

        /*
         * Skip to the next outer tuple if we only need to join to
         * the first inner matching tuple.
         */
        if (node->js.first_inner_tuple_only)
            node->hj_JoinState = HJ_NEED_NEW_OUTER;

Note the first line and the final two lines.

Here's the one from nested loop:

    if (ExecQual(joinqual, econtext))
    {
        node->nl_MatchedOuter = true;

        /* In an antijoin, we never return a matched tuple */
        if (node->js.jointype == JOIN_ANTI)
        {
            node->nl_NeedNewOuter = true;
            continue;       /* return to top of loop */
        }

        /*
         * Skip to the next outer tuple if we only need to join to the
         * first inner matching tuple.
         */
        if (node->js.first_inner_tuple_only)
            node->nl_NeedNewOuter = true;

Again, note the first line and final 2 lines.

> The case that would be relevant to this is, eg,
>
> create table t1 (f1 int, f2 int, primary key(f1,f2));
>
> select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
> where t_outer.f2 = t2.f2;

hmm, that query is not valid, unless you have created some table named t_outer. I don't know how you've defined that. So I guess you must have meant:

postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2;
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=66.50..133.57 rows=128 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: Yes
   Hash Cond: ((t_outer.f1 = t1.f1) AND (t_outer.f2 = t1.f2))
   ->  Seq Scan on public.t1 t_outer  (cost=0.00..32.60 rows=2260 width=8)
         Output: t_outer.f1, t_outer.f2
   ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
         Output: t1.f1, t1.f2
         ->  Seq Scan on public.t1  (cost=0.00..32.60 rows=2260 width=8)
               Output: t1.f1, t1.f2
(10 rows)

Which did become an INNER JOIN due to the strict WHERE clause.

If you had done:

postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2 or t1.f1 is null;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.31..608.67 rows=255 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: No
   Merge Cond: (t_outer.f1 = t1.f1)
   Filter: ((t_outer.f2 = t1.f2) OR (t1.f1 IS NULL))
   ->  Index Only Scan using t1_pkey on public.t1 t_outer  (cost=0.16..78.06 rows=2260 width=8)
         Output: t_outer.f1, t_outer.f2
   ->  Materialize  (cost=0.16..83.71 rows=2260 width=8)
         Output: t1.f1, t1.f2
         ->  Index Only Scan using t1_pkey on public.t1  (cost=0.16..78.06 rows=2260 width=8)
               Output: t1.f1, t1.f2
(11 rows)

You'll notice "Inner Unique: No".

> Your existing patch would think t1 is unique-inner, but the qual pushed
> down from WHERE would not be a joinqual so the wrong thing would happen
> at runtime.
>
> (Hm ... actually, this example wouldn't fail as written because
> the WHERE qual is probably strict, so the left join would get
> reduced to an inner join and then pushed-down-ness no longer
> matters. But hopefully you get my drift.)

Oh yeah. I get it, but that's why we ignore !can_join clauses:

    /* Ignore if it's not a mergejoinable clause */
    if (!restrictinfo->can_join ||
        restrictinfo->mergeopfamilies == NIL)
        continue;           /* not mergejoinable */

no?

>> I don't really think the List idea would be nearly as efficient.
>
> No, what I'm saying is that each RelOptInfo would contain a single List of
> Relids of proven-unique-for outer rels (and another one for the negative
> cache). No array, no more searching than you have now, just removal of an
> uncertainly-safe array fetch.

That's way cleaner. Thanks. I've changed it that way. Feeling silly now for having done it the original way.

Updated patch is attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
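The borrowed JOIN_SEMI skip shown in the snippets above can be modelled in a few lines. Here is a hypothetical sketch (plain int arrays standing in for tuple slots, not executor code) of a nested loop where an `inner_unique` flag triggers the early advance to the next outer row after the first joinqual match:

```c
#include <assert.h>

/*
 * Simplified nested-loop inner join: emit a result for each (outer, inner)
 * pair where the join qual outer[i] == inner[j] holds.  When inner_unique
 * is set, advance to the next outer row as soon as the first match is
 * found, just like the JOIN_SEMI skip -- safe only when at most one inner
 * row can match each outer row.
 */
static int
nestloop_join(const int *outer, int nouter,
              const int *inner, int ninner,
              int inner_unique,
              long *ncompares)      /* out: inner tuples examined */
{
    int nresult = 0;

    *ncompares = 0;
    for (int i = 0; i < nouter; i++)
    {
        for (int j = 0; j < ninner; j++)
        {
            (*ncompares)++;
            if (outer[i] == inner[j])
            {
                nresult++;
                if (inner_unique)
                    break;      /* skip to the next outer tuple */
            }
        }
    }
    return nresult;
}
```

On two identical 3-row inputs the flag leaves the result unchanged (3 rows) but cuts the inner tuples examined from 9 to 6; the saving grows with the width of the inner side, which is where the nested-loop gain comes from.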
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 7 April 2017 at 11:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm on about is that you can't do the early advance to the
>> next outer tuple unless you're sure that all the quals that were
>> relevant to the uniqueness proof have been checked for the current
>> inner tuple. That affects all three join types not only merge.

> Well, look at the join code and you'll see this only happens after the
> joinqual is evaluated. I didn't make a special effort here. I just
> borrowed the location that JOIN_SEMI was already using.

Right, and that's exactly the point: some of the conditions you're depending on might have ended up in the otherqual not the joinqual.

We'd discussed rearranging the executor logic enough to deal with such situations and agreed that it seemed too messy; but that means that the optimization needs to take care not to use otherqual (ie pushed-down) conditions in the uniqueness proofs.

> Oh yeah. I get it, but that's why we ignore !can_join clauses

can_join seems to me to be not particularly relevant ... there's nothing that prevents that from getting set for pushed-down clauses.

It's possible that the case I'm worried about is unreachable in practice because all the conditions that could be of interest would be strict and therefore would have forced join strength reduction. But I'm not comfortable with assuming that.

regards, tom lane
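The hazard Tom describes is easy to model. In this hypothetical sketch (not executor code, names taken from Tom's example) the inner side is unique on (f1, f2) -- say, a two-column primary key -- but only the f1 comparison is a joinqual; the f2 comparison is a pushed-down filter checked afterwards. Advancing to the next outer tuple on the first joinqual match then loses a row the filter would have accepted:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int f1, f2; } Row;

/*
 * The inner side is unique on (f1, f2), but only f1 equality is a
 * joinqual; f2 equality is a pushed-down filter ("otherqual")
 * evaluated after it.  The early advance fires on the first joinqual
 * match, before we know whether the filter passes.
 */
static int
join_with_filter(const Row *outer, int nouter,
                 const Row *inner, int ninner,
                 bool early_advance, Row *result)
{
    int nresult = 0;

    for (int i = 0; i < nouter; i++)
    {
        for (int j = 0; j < ninner; j++)
        {
            if (outer[i].f1 == inner[j].f1)         /* joinqual */
            {
                if (outer[i].f2 == inner[j].f2)     /* pushed-down filter */
                    result[nresult++] = inner[j];
                if (early_advance)
                    break;  /* wrongly skip the remaining inner rows */
            }
        }
    }
    return nresult;
}
```

With outer row (1,2) and inner rows (1,1) and (1,2), the correct plan returns one row; the early advance stops at (1,1), the filter rejects it, and nothing is returned -- which is why the uniqueness proofs must not rely on pushed-down conditions.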
On 7 April 2017 at 13:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right, and that's exactly the point: some of the conditions you're
> depending on might have ended up in the otherqual not the joinqual.
>
> We'd discussed rearranging the executor logic enough to deal with
> such situations and agreed that it seemed too messy; but that means
> that the optimization needs to take care not to use otherqual
> (ie pushed-down) conditions in the uniqueness proofs.
>
> can_join seems to me to be not particularly relevant ... there's
> nothing that prevents that from getting set for pushed-down clauses.
>
> It's possible that the case I'm worried about is unreachable in
> practice because all the conditions that could be of interest would
> be strict and therefore would have forced join strength reduction.
> But I'm not comfortable with assuming that.

Okay, well how about we protect against that by not using such quals as unique proofs? We'd just need to ignore anything that's outerjoin_delayed. If we're struggling to think of a case that this will affect, then we shouldn't be too worried about any missed optimisations.

A patch which does this is attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes:
[ unique_joins_2017-04-07b.patch ]

It turned out that this patch wasn't as close to committable as I'd thought, but after a full day of whacking at it, I got to a place where I thought it was OK. So, pushed.

[ and that's a wrap for v10 feature freeze, I think ]

regards, tom lane
On 8 April 2017 at 14:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> [ unique_joins_2017-04-07b.patch ]
>
> It turned out that this patch wasn't as close to committable as I'd
> thought, but after a full day of whacking at it, I got to a place
> where I thought it was OK. So, pushed.

Many thanks for taking the time to do this, and committing too!

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services