Thread: Missed index opportunity for outer join?
I have a case where an outer join's taking 10X more time than a non-outer join; and it looks to me like the outer join could have taken advantage of the same indexes that the non-outer join did. In both cases, the outermost thing is a nested loop. The top subplan gets all "point features" whre featureid=120. The outer join did not use an index for this. The non-outer join did use an index for this. Any reason it couldn't have use the index there? Also - in both cases the second part of the nested loop is using the same multi-column index on the table "facets". The non-outer-join uses both columns of this multi-column index. The outer-join only uses one of the columns and is much slower. Any reason it couldn't have use both columns of the index there? Attached below are explain analyze for the slow outer join and the fast non-outer join. This is using 8.1.0. Thanks in advance, Ron =============================================================================== == The outer join - slow =============================================================================== fli=# explain analyze select * from userfeatures.point_features upf left join facets b on (b.entity_id = upf.entity_id andb.fac_id=261) where featureid in (120); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=2.11..90317.33 rows=1207 width=505) (actual time=8.985..734.761 rows=917 loops=1) -> Seq Scan on point_features upf (cost=0.00..265.85 rows=948 width=80) (actual time=8.792..14.270 rows=917 loops=1) Filter: (featureid = 120) -> Bitmap Heap Scan on facets b (cost=2.11..94.60 rows=31 width=425) (actual time=0.101..0.770 rows=1 loops=917) Recheck Cond: (b.entity_id = "outer".entity_id) Filter: (fac_id = 261) -> Bitmap Index Scan on "fac_val(entity_id,fac_id)" (cost=0.00..2.11 rows=31 width=0) (actual time=0.067..0.067rows=32 loops=917) Index Cond: (b.entity_id = "outer".entity_id) Total runtime: 736.444 ms (9 rows) =============================================================================== == The non-outer join - fast =============================================================================== fli=# explain analyze select * from userfeatures.point_features upf join facets b on (b.entity_id = upf.entity_id and b.fac_id=261) where featureid in (120); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=23.32..4942.48 rows=1207 width=505) (actual time=0.571..55.867 rows=917 loops=1) -> Bitmap Heap Scan on point_features upf (cost=23.32..172.17 rows=948 width=80) (actual time=0.468..2.226 rows=917loops=1) Recheck Cond: (featureid = 120) -> Bitmap Index Scan on point_features__featureid (cost=0.00..23.32 rows=948 width=0) (actual time=0.413..0.413rows=917 loops=1) Index Cond: (featureid = 120) -> Index Scan using "fac_val(entity_id,fac_id)" on facets b (cost=0.00..5.02 rows=1 width=425) (actual time=0.051..0.053rows=1 loops=917) Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261)) Total runtime: 56.892 ms (8 rows) =============================================================================== == The tables involved. =============================================================================== fli=# \d facets Table "facet.facets" Column | Type | Modifiers -----------+---------+----------- entity_id | integer | nam_hash | integer | val_hash | integer | fac_id | integer | dis_id | integer | fac_val | text | fac_ival | integer | fac_tval | text | fac_nval | numeric | fac_raval | real[] | fac_bval | bytea | Indexes: "fac_val(entity_id,fac_id)" btree (entity_id, fac_id) "facets__dis_id" btree (dis_id) "facets__ent_id" btree (entity_id) "facets__fac_id" btree (fac_id) "facets__id_value" btree (fac_id, fac_val) CLUSTER Foreign-key constraints: "facets_entity_id_fkey" FOREIGN KEY (entity_id) REFERENCES entity(entity_id) ON DELETE CASCADE "facets_fac_id_fkey" FOREIGN KEY (fac_id) REFERENCES facet_lookup(fac_id) ON DELETE CASCADE fli=# \d point_features Table "userfeatures.point_features" Column | Type | Modifiers -----------+----------+------------------------------------------------------------------ pointid | integer | not null default nextval('point_features_pointid_seq'::regclass) entity_id | integer | featureid | integer | sessionid | integer | userid | integer | extid | text | label | text | iconid | integer | the_geom | geometry | Indexes: "point_features__featureid" btree (featureid) "point_features__postgis" gist (the_geom) Check constraints: "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2) "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL) "enforce_srid_the_geom" CHECK (srid(the_geom) = -1) =============================================================================== == version info =============================================================================== fli=# select version(); version ------------------------------------------------------------------------------------- PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux) (1 row)
rm_pg@cheapcomplexdevices.com writes: > In both cases, the outermost thing is a nested loop. The > top subplan gets all "point features" whre featureid=120. > The outer join did not use an index for this. > The non-outer join did use an index for this. Hm, I can't duplicate this in a simple test (see below). There were some changes in this area between 8.1.0 and branch tip, but a quick look at the CVS logs doesn't suggest that any of them would be related (AFAICS the intentions of the patches were to change behavior only for OR clauses, and you haven't got any here). Can you try updating to 8.1 branch tip and see if the problem goes away? Or if not, generate a self-contained test case that shows the problem starting from an empty database? Actually, a quick and dirty thing would be to try my would-be test case below, and see if you get a seqscan on your copy. regards, tom lane regression=# create table point_features(entity_id int, featureid int); CREATE TABLE regression=# create index point_features__featureid on point_features(featureid); CREATE INDEX regression=# create table facets(entity_id int, fac_id int); CREATE TABLE regression=# create index "fac_val(entity_id,fac_id)" on facets(entity_id,fac_id); CREATE INDEX regression=# set enable_hashjoin TO 0; SET regression=# set enable_mergejoin TO 0; SET regression=# explain select * from point_features upf join facets b on (b.entity_id = upf.entity_id and b.fac_id=261) wherefeatureid in (120); QUERY PLAN -------------------------------------------------------------------------------------------------- Nested Loop (cost=1.03..59.90 rows=1 width=16) -> Bitmap Heap Scan on point_features upf (cost=1.03..11.50 rows=10 width=8) Recheck Cond: (featureid = 120) -> Bitmap Index Scan on point_features__featureid (cost=0.00..1.03 rows=10 width=0) Index Cond: (featureid = 120) -> Index Scan using "fac_val(entity_id,fac_id)" on facets b (cost=0.00..4.83 rows=1 width=8) Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261)) (7 rows) regression=# explain select * from point_features upf left join facets b on (b.entity_id = upf.entity_id and b.fac_id=261) where featureid in (120); QUERY PLAN ------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=2.07..127.70 rows=10 width=16) -> Bitmap Heap Scan on point_features upf (cost=1.03..11.50 rows=10 width=8) Recheck Cond: (featureid = 120) -> Bitmap Index Scan on point_features__featureid (cost=0.00..1.03 rows=10 width=0) Index Cond: (featureid = 120) -> Bitmap Heap Scan on facets b (cost=1.03..11.50 rows=10 width=8) Recheck Cond: (b.entity_id = "outer".entity_id) Filter: (fac_id = 261) -> Bitmap Index Scan on "fac_val(entity_id,fac_id)" (cost=0.00..1.03 rows=10 width=0) Index Cond: (b.entity_id = "outer".entity_id) (10 rows) (Note to self: it is a bit odd that fac_id=261 is pushed down to become an indexqual in one case but not the other ...)
On Mon, 5 Dec 2005, Tom Lane wrote: > > Hm, I can't duplicate this in a simple test... > Can you try updating to 8.1 branch tip ... > Actually, a quick and dirty thing would be to try my would-be test case > below, and see if you get a seqscan on your copy. With your simple test-case I did not get the seqscan on 8.1.0. Output shown below that looks just like yours. I'll try upgrading a devel machine too - but will only be able to try on smalller test databases in the near term. > (Note to self: it is a bit odd that fac_id=261 is pushed down to become > an indexqual in one case but not the other ...) I speculate that the seq_scan wasn't really the slow part compared to not using using both parts of the index in the second part of the plan. The table point_features is tens of thousands of rows, while the table facets is tens of millions. Thanks, Ron =============================================================================== === Output of Tom's test case showing the same results he got. =============================================================================== greenie /home/pg2> createdb foo CREATE DATABASE greenie /home/pg2> psql foo [...] foo=# create table point_features(entity_id int, featureid int); CREATE TABLE foo=# create index point_features__featureid on point_features(featureid); CREATE INDEX foo=# create table facets(entity_id int, fac_id int); CREATE TABLE foo=# create index "fac_val(entity_id,fac_id)" on facets(entity_id,fac_id); CREATE INDEX foo=# set enable_hashjoin TO 0; SET foo=# set enable_mergejoin TO 0; SET foo=# explain select * from point_features upf join facets b on (b.entity_id = upf.entity_id and b.fac_id=261) where featureidin (120); QUERY PLAN -------------------------------------------------------------------------------------------------- Nested Loop (cost=1.03..49.15 rows=1 width=16) -> Bitmap Heap Scan on point_features upf (cost=1.03..10.27 rows=10 width=8) Recheck Cond: (featureid = 120) -> Bitmap Index Scan on point_features__featureid (cost=0.00..1.03 rows=10 width=0) Index Cond: (featureid = 120) -> Index Scan using "fac_val(entity_id,fac_id)" on facets b (cost=0.00..3.88 rows=1 width=8) Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261)) (7 rows) foo=# explain select * from point_features upf left join facets b on (b.entity_id = upf.entity_id and b.fac_id=261) wherefeatureid in (120); QUERY PLAN ------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=2.07..114.25 rows=10 width=16) -> Bitmap Heap Scan on point_features upf (cost=1.03..10.27 rows=10 width=8) Recheck Cond: (featureid = 120) -> Bitmap Index Scan on point_features__featureid (cost=0.00..1.03 rows=10 width=0) Index Cond: (featureid = 120) -> Bitmap Heap Scan on facets b (cost=1.03..10.27 rows=10 width=8) Recheck Cond: (b.entity_id = "outer".entity_id) Filter: (fac_id = 261) -> Bitmap Index Scan on "fac_val(entity_id,fac_id)" (cost=0.00..1.03 rows=10 width=0) Index Cond: (b.entity_id = "outer".entity_id) (10 rows) foo=#
rm_pg@cheapcomplexdevices.com writes: > On Mon, 5 Dec 2005, Tom Lane wrote: >> (Note to self: it is a bit odd that fac_id=261 is pushed down to become >> an indexqual in one case but not the other ...) > I speculate that the seq_scan wasn't really the slow part > compared to not using using both parts of the index in the > second part of the plan. The table point_features is tens of > thousands of rows, while the table facets is tens of millions. Agreed, but it's still odd that it would use a seqscan in one case and not the other. I found the reason why the fac_id=261 clause isn't getting used as an index qual; it's a bit of excessive paranoia that goes back to 2002. I've fixed that for 8.1.1, but am still wondering about the seqscan on the other side of the join. regards, tom lane
Tom Lane wrote: > rm_pg@cheapcomplexdevices.com writes: >>On Mon, 5 Dec 2005, Tom Lane wrote: > >>I speculate that the seq_scan wasn't really the slow part >>compared to not using using both parts of the index in the >>second part of the plan. The table point_features is tens of >>thousands of rows, while the table facets is tens of millions. > > Agreed, but it's still odd that it would use a seqscan in one case and > not the other. Hmm. Unfortunately that was happening on a production system and the amount of data in the tables has changed - and now I'm no longer getting a seq_scan when I try to reproduce it. That system is still using 8.1.0. The "point_features" table is pretty dynamic and it's possible that the data changed between my 'explain analyze' statement in the first post in this thread. However since both of them show an estimate of "rows=948" and returned an actual of 917 I don't think that happened. > I found the reason why the fac_id=261 clause isn't getting used as an > index qual; it's a bit of excessive paranoia that goes back to 2002. > I've fixed that for 8.1.1, but am still wondering about the seqscan > on the other side of the join. I now have a development system with cvs-tip; but have not yet reproduced the seq scan on it either. I'm using the same data that was in "point_features" with "featureid=120" - but don't have any good way of knowing what other data may have been in the table at the time. If desired, I could set up a cron job to periodically explain analyze that query and see if it recurs.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > The "point_features" table is pretty dynamic and it's possible > that the data changed between my 'explain analyze' statement in > the first post in this thread. However since both of them > show an estimate of "rows=948" and returned an actual of 917 I > don't think that happened. Yeah, I had considered the same explanation and rejected it for the same reason. Also, the difference in estimated cost is significant (265.85 for the seqscan vs 172.17 for the bitmap scan) so it's hard to think that a small change in stats --- so small as to not reflect in estimated row count --- would change the estimate by that much. [ thinks some more... ] Of course, what we have to remember is that the planner is actually going to choose based on the ultimate join cost, not on the subplan costs. The reason the seqscan survived initial comparisons at all is that it has a cheaper startup cost (less time to return the first tuple) than the bitmap scan, and this will be reflected into a cheaper startup cost for the overall nestloop. The extra hundred units of total cost would only reflect into the nestloop total cost --- and there, they would be considered "down in the noise" compared to a 90k total estimate. So probably what happened is that the planner preferred this plan on the basis that the total costs are the same to within estimation error while the startup cost is definitely less. In this explanation, the reason for the change in plans over time could be a change in the statistics for the other table. Is "facets" more dynamic than "point_features"? regards, tom lane
Tom Lane wrote: > ...planner is actually going to choose based on the ultimate join cost, > not on the subplan costs... > > In this explanation, the reason for the change in plans over time could > be a change in the statistics for the other table. Is "facets" more > dynamic than "point_features"? In total rows changing it's more dynamic, but percentage-wise, it's less dynamic (point_features probably turns round 50% of it's rows in a day -- while facets turns over about 3% per day -- but facets is 1000X larger). Facets is a big table with rather odd distributions of values. Many of the values in the indexed columns show up only once, others show up hundreds-of-thousands of times. Perhaps an analyze ran and just randomly sampled differently creating different stats on that table?
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Tom Lane wrote: >> In this explanation, the reason for the change in plans over time could >> be a change in the statistics for the other table. Is "facets" more >> dynamic than "point_features"? > Facets is a big table with rather odd distributions of values. > Many of the values in the indexed columns show up only > once, others show up hundreds-of-thousands of times. Perhaps > an analyze ran and just randomly sampled differently creating > different stats on that table? If you have background tasks doing ANALYZEs then this explanation seems plausible enough. I'm willing to accept it anyway ... regards, tom lane
Tom Lane wrote: > If you have background tasks doing ANALYZEs then this explanation seems > plausible enough. I'm willing to accept it anyway ... Yup, there are such tasks. I could dig through logs to try to confirm or reject it; but I think it's reasonably likely that this happened. Basically, data gets added to that table as it becomes ready from other systems, and after each batch a vacuum analyze is run.