Re: Performance on update from join - Mailing list pgsql-sql
From: Tom Lane
Subject: Re: Performance on update from join
Date:
Msg-id: 29566.1021052657@sss.pgh.pa.us
In response to: Re: Performance on update from join (Jean-Luc Lachance <jllachan@nsd.ca>)
List: pgsql-sql
Jean-Luc Lachance <jllachan@nsd.ca> writes:
> Try:
> create table a (a1 int primary key, a2 int, a3 int, a4 int);
> create table b (b1 int, b2 int, b3 int, b4 int, primary key (b1, b2));
> create table d (d1 int, d2 int, d3 int, d4 int, primary key (d1, d2));
> explain update a set a4 = d.d2 from b,d where a.a2 = b.b1 and a.a3 = b.b2 and
> b.b3 = d.d1 and b.b4 = d.d2 and a.a4 >= d.d3 and a.a4 <= d.d4;
> Which is closer to what I have.

> +-----------------------------------
> |                            /\
> A1 A2 A3 A4 B1 B2 B3 B4 D1 D2 D3 D4
>    |  |     |  |  |  |   |  |
>    +--------------+ | +---------+ |
>       |        |  |         |
>       +--------------+ +---------+

Well, I still don't see any tendency to want to form a "cartesian
product", if by that you mean an unconstrained join.  I tried these
cases:

After creating tables as above:

Hash Join  (cost=129.77..162.27 rows=1 width=54)
  Hash Cond: ("outer".a2 = "inner".b1)
  Join Filter: (("outer".a3 = "inner".b2) AND ("outer".a4 >= "inner".d3) AND ("outer".a4 <= "inner".d4))
  ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=22)
  ->  Hash  (cost=129.70..129.70 rows=25 width=32)
        ->  Merge Join  (cost=69.83..129.70 rows=25 width=32)
              Merge Cond: (("outer".d1 = "inner".b3) AND ("outer".d2 = "inner".b4))
              ->  Index Scan using d_pkey on d  (cost=0.00..52.00 rows=1000 width=16)
              ->  Sort  (cost=69.83..72.33 rows=1000 width=16)
                    ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=16)

(I'm using current development sources here because of the nicer EXPLAIN
display, but the planner behavior hasn't changed from 7.2 AFAIR.)

This doesn't tell us very much because the default assumptions for the
table sizes are all the same, 1000 rows/10 pages.  I tried goosing it
up:

test=# update pg_class set reltuples=1000000, relpages=10000 where relname = 'a';
UPDATE 1
test=# update pg_class set reltuples=1000000, relpages=10000 where relname = 'b';
UPDATE 1
test=# update pg_class set reltuples=1000000, relpages=10000 where relname = 'd';
UPDATE 1
test=# explain ...
Merge Join  (cost=8294766.62..20924766.62 rows=69444444 width=54)
  Merge Cond: (("outer".b2 = "inner".a3) AND ("outer".b1 = "inner".a2))
  Join Filter: (("inner".a4 >= "outer".d3) AND ("inner".a4 <= "outer".d4))
  ->  Sort  (cost=8093208.78..8155708.78 rows=25000000 width=32)
        ->  Merge Join  (cost=373805.69..758805.69 rows=25000000 width=32)
              Merge Cond: (("outer".b4 = "inner".d2) AND ("outer".b3 = "inner".d1))
              ->  Sort  (cost=186902.84..189402.84 rows=1000000 width=16)
                    ->  Seq Scan on b  (cost=0.00..20000.00 rows=1000000 width=16)
              ->  Sort  (cost=186902.84..189402.84 rows=1000000 width=16)
                    ->  Seq Scan on d  (cost=0.00..20000.00 rows=1000000 width=16)
  ->  Sort  (cost=201557.84..204057.84 rows=1000000 width=22)
        ->  Seq Scan on a  (cost=0.00..20000.00 rows=1000000 width=22)

Or if we make table a smaller than the other two:

test=# update pg_class set reltuples=100000, relpages=1000 where relname = 'a';
UPDATE 1
test=# explain ...

Merge Join  (cost=981132.44..2248632.44 rows=6944444 width=54)
  Merge Cond: (("outer".d2 = "inner".b4) AND ("outer".d1 = "inner".b3))
  Join Filter: (("inner".a4 >= "outer".d3) AND ("inner".a4 <= "outer".d4))
  ->  Sort  (cost=186902.84..189402.84 rows=1000000 width=16)
        ->  Seq Scan on d  (cost=0.00..20000.00 rows=1000000 width=16)
  ->  Sort  (cost=794229.60..800479.60 rows=2500000 width=38)
        ->  Merge Join  (cost=198580.89..241580.89 rows=2500000 width=38)
              Merge Cond: (("outer".a3 = "inner".b2) AND ("outer".a2 = "inner".b1))
              ->  Sort  (cost=11678.05..11928.05 rows=100000 width=22)
                    ->  Seq Scan on a  (cost=0.00..2000.00 rows=100000 width=22)
              ->  Sort  (cost=186902.84..189402.84 rows=1000000 width=16)
                    ->  Seq Scan on b  (cost=0.00..20000.00 rows=1000000 width=16)

The only thing I see here that is unhappy-making is that the planner
does not recognize that joining both columns of a 2-column primary key
guarantees a unique result (for instance, the join of a and b here
should be estimated at 100000 rows, not 2500000, since there can't be
more than one b row joined to any a row).
It would do better if it had any ANALYZE statistics to work with, but
ideally it ought to be able to deduce that even without stats, just from
the presence of a unique index.  It will do so for single-column unique
indexes, but I haven't yet figured out a reasonable (read: inexpensive)
way to make the corresponding deduction for the multi-column case.

In any case, toy examples like this aren't going to prove much about the
behavior with real tables and real statistics ... if you want
constructive answers you're going to need to show us your actual
EXPLAIN ANALYZE results and the pg_stats entries for the tables.

			regards, tom lane
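[For readers following along, a sketch of how one might gather what is being asked for here, using the example schema above; table and column names are illustrative, and the exact pg_stats columns shown (null_frac, n_distinct, correlation) are a selection from that view:]

```sql
-- Refresh planner statistics so the estimates reflect real data
ANALYZE a;
ANALYZE b;
ANALYZE d;

-- Show estimated vs. actual row counts and timings for the query in question
EXPLAIN ANALYZE
UPDATE a SET a4 = d.d2
  FROM b, d
 WHERE a.a2 = b.b1 AND a.a3 = b.b2
   AND b.b3 = d.d1 AND b.b4 = d.d2
   AND a.a4 >= d.d3 AND a.a4 <= d.d4;

-- Dump the per-column statistics the planner is working from
SELECT tablename, attname, null_frac, n_distinct, correlation
  FROM pg_stats
 WHERE tablename IN ('a', 'b', 'd');
```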