Thread: Missed index opportunity for outer join?

Missed index opportunity for outer join?

From
rm_pg@cheapcomplexdevices.com
Date:
I have a case where an outer join's taking 10X more time than
a non-outer join; and it looks to me like the outer join could
have taken advantage of the same indexes that the non-outer join did.


In both cases, the outermost thing is a nested loop. The
top subplan gets all "point features" whre featureid=120.
The outer join did not use an index for this.
The non-outer join did use an index for this.

Any reason it couldn't have use the index there?


Also - in both cases the second part of the nested loop
is using the same multi-column index on the table "facets".
The non-outer-join uses both columns of this multi-column index.
The outer-join only uses one of the columns and is much slower.

Any reason it couldn't have use both columns of the index there?


Attached below are explain analyze for the slow outer join
and the fast non-outer join.  This is using 8.1.0.

  Thanks in advance,
  Ron

===============================================================================
==  The outer join - slow
===============================================================================
fli=# explain analyze select *  from userfeatures.point_features upf left join facets b on (b.entity_id = upf.entity_id
andb.fac_id=261)  where featureid in (120); 
                                                                  QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=2.11..90317.33 rows=1207 width=505) (actual time=8.985..734.761 rows=917 loops=1)
   ->  Seq Scan on point_features upf  (cost=0.00..265.85 rows=948 width=80) (actual time=8.792..14.270 rows=917
loops=1)
         Filter: (featureid = 120)
   ->  Bitmap Heap Scan on facets b  (cost=2.11..94.60 rows=31 width=425) (actual time=0.101..0.770 rows=1 loops=917)
         Recheck Cond: (b.entity_id = "outer".entity_id)
         Filter: (fac_id = 261)
         ->  Bitmap Index Scan on "fac_val(entity_id,fac_id)"  (cost=0.00..2.11 rows=31 width=0) (actual
time=0.067..0.067rows=32 loops=917) 
               Index Cond: (b.entity_id = "outer".entity_id)
 Total runtime: 736.444 ms
(9 rows)



===============================================================================
==  The non-outer join - fast
===============================================================================
fli=# explain analyze select *  from userfeatures.point_features upf join facets b on (b.entity_id = upf.entity_id and
b.fac_id=261) where featureid in (120); 
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=23.32..4942.48 rows=1207 width=505) (actual time=0.571..55.867 rows=917 loops=1)
   ->  Bitmap Heap Scan on point_features upf  (cost=23.32..172.17 rows=948 width=80) (actual time=0.468..2.226
rows=917loops=1) 
         Recheck Cond: (featureid = 120)
         ->  Bitmap Index Scan on point_features__featureid  (cost=0.00..23.32 rows=948 width=0) (actual
time=0.413..0.413rows=917 loops=1) 
               Index Cond: (featureid = 120)
   ->  Index Scan using "fac_val(entity_id,fac_id)" on facets b  (cost=0.00..5.02 rows=1 width=425) (actual
time=0.051..0.053rows=1 loops=917) 
         Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261))
 Total runtime: 56.892 ms
(8 rows)





===============================================================================
==  The tables involved.
===============================================================================

fli=# \d facets
      Table "facet.facets"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 entity_id | integer |
 nam_hash  | integer |
 val_hash  | integer |
 fac_id    | integer |
 dis_id    | integer |
 fac_val   | text    |
 fac_ival  | integer |
 fac_tval  | text    |
 fac_nval  | numeric |
 fac_raval | real[]  |
 fac_bval  | bytea   |
Indexes:
    "fac_val(entity_id,fac_id)" btree (entity_id, fac_id)
    "facets__dis_id" btree (dis_id)
    "facets__ent_id" btree (entity_id)
    "facets__fac_id" btree (fac_id)
    "facets__id_value" btree (fac_id, fac_val) CLUSTER
Foreign-key constraints:
    "facets_entity_id_fkey" FOREIGN KEY (entity_id) REFERENCES entity(entity_id) ON DELETE CASCADE
    "facets_fac_id_fkey" FOREIGN KEY (fac_id) REFERENCES facet_lookup(fac_id) ON DELETE CASCADE

fli=# \d point_features
                           Table "userfeatures.point_features"
  Column   |   Type   |                            Modifiers
-----------+----------+------------------------------------------------------------------
 pointid   | integer  | not null default nextval('point_features_pointid_seq'::regclass)
 entity_id | integer  |
 featureid | integer  |
 sessionid | integer  |
 userid    | integer  |
 extid     | text     |
 label     | text     |
 iconid    | integer  |
 the_geom  | geometry |
Indexes:
    "point_features__featureid" btree (featureid)
    "point_features__postgis" gist (the_geom)
Check constraints:
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL)
    "enforce_srid_the_geom" CHECK (srid(the_geom) = -1)




===============================================================================
==  version info
===============================================================================

fli=# select version();
                                       version
-------------------------------------------------------------------------------------
 PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

Re: Missed index opportunity for outer join?

From
Tom Lane
Date:
rm_pg@cheapcomplexdevices.com writes:
> In both cases, the outermost thing is a nested loop. The
> top subplan gets all "point features" whre featureid=120.
> The outer join did not use an index for this.
> The non-outer join did use an index for this.

Hm, I can't duplicate this in a simple test (see below).  There were
some changes in this area between 8.1.0 and branch tip, but a quick
look at the CVS logs doesn't suggest that any of them would be related
(AFAICS the intentions of the patches were to change behavior only for
OR clauses, and you haven't got any here).

Can you try updating to 8.1 branch tip and see if the problem goes away?
Or if not, generate a self-contained test case that shows the problem
starting from an empty database?

Actually, a quick and dirty thing would be to try my would-be test case
below, and see if you get a seqscan on your copy.

            regards, tom lane

regression=# create table point_features(entity_id int, featureid int);
CREATE TABLE
regression=# create index  point_features__featureid on point_features(featureid);
CREATE INDEX
regression=# create table  facets(entity_id int,  fac_id int);
CREATE TABLE
regression=# create index "fac_val(entity_id,fac_id)" on facets(entity_id,fac_id);
CREATE INDEX
regression=# set enable_hashjoin TO 0;
SET
regression=# set enable_mergejoin TO 0;
SET
regression=# explain select *  from point_features upf join facets b on (b.entity_id = upf.entity_id and b.fac_id=261)
wherefeatureid in (120); 
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.03..59.90 rows=1 width=16)
   ->  Bitmap Heap Scan on point_features upf  (cost=1.03..11.50 rows=10 width=8)
         Recheck Cond: (featureid = 120)
         ->  Bitmap Index Scan on point_features__featureid  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (featureid = 120)
   ->  Index Scan using "fac_val(entity_id,fac_id)" on facets b  (cost=0.00..4.83 rows=1 width=8)
         Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261))
(7 rows)

regression=# explain select *  from point_features upf left join facets b on (b.entity_id = upf.entity_id and
b.fac_id=261) where featureid in (120); 
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=2.07..127.70 rows=10 width=16)
   ->  Bitmap Heap Scan on point_features upf  (cost=1.03..11.50 rows=10 width=8)
         Recheck Cond: (featureid = 120)
         ->  Bitmap Index Scan on point_features__featureid  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (featureid = 120)
   ->  Bitmap Heap Scan on facets b  (cost=1.03..11.50 rows=10 width=8)
         Recheck Cond: (b.entity_id = "outer".entity_id)
         Filter: (fac_id = 261)
         ->  Bitmap Index Scan on "fac_val(entity_id,fac_id)"  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (b.entity_id = "outer".entity_id)
(10 rows)

(Note to self: it is a bit odd that fac_id=261 is pushed down to become
an indexqual in one case but not the other ...)

Re: Missed index opportunity for outer join?

From
rm_pg@cheapcomplexdevices.com
Date:
On Mon, 5 Dec 2005, Tom Lane wrote:
>
> Hm, I can't duplicate this in a simple test...
> Can you try updating to 8.1 branch tip ...
> Actually, a quick and dirty thing would be to try my would-be test case
> below, and see if you get a seqscan on your copy.

With your simple test-case I did not get the seqscan on 8.1.0.
Output shown below that looks just like yours.

I'll try upgrading a devel machine too - but will only be
able to try on smalller test databases in the near term.

> (Note to self: it is a bit odd that fac_id=261 is pushed down to become
> an indexqual in one case but not the other ...)

I speculate that the seq_scan wasn't really the slow part
compared to not using using both parts of the index in the
second part of the plan.  The table point_features is tens of
thousands of rows, while the table facets is tens of millions.

    Thanks,
    Ron

===============================================================================
=== Output of Tom's test case showing the same results he got.
===============================================================================

greenie /home/pg2> createdb foo
CREATE DATABASE
greenie /home/pg2> psql foo
[...]
foo=# create table point_features(entity_id int, featureid int);
CREATE TABLE
foo=#  create index  point_features__featureid on point_features(featureid);
CREATE INDEX
foo=# create table  facets(entity_id int,  fac_id int);
CREATE TABLE
foo=#  create index "fac_val(entity_id,fac_id)" on facets(entity_id,fac_id);
CREATE INDEX
foo=# set enable_hashjoin TO 0;
SET
foo=#  set enable_mergejoin TO 0;
SET
foo=# explain select *  from point_features upf join facets b on (b.entity_id = upf.entity_id and b.fac_id=261)  where
featureidin (120); 
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.03..49.15 rows=1 width=16)
   ->  Bitmap Heap Scan on point_features upf  (cost=1.03..10.27 rows=10 width=8)
         Recheck Cond: (featureid = 120)
         ->  Bitmap Index Scan on point_features__featureid  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (featureid = 120)
   ->  Index Scan using "fac_val(entity_id,fac_id)" on facets b  (cost=0.00..3.88 rows=1 width=8)
         Index Cond: ((b.entity_id = "outer".entity_id) AND (b.fac_id = 261))
(7 rows)

foo=#  explain select *  from point_features upf left join facets b on (b.entity_id = upf.entity_id and b.fac_id=261)
wherefeatureid in (120); 
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=2.07..114.25 rows=10 width=16)
   ->  Bitmap Heap Scan on point_features upf  (cost=1.03..10.27 rows=10 width=8)
         Recheck Cond: (featureid = 120)
         ->  Bitmap Index Scan on point_features__featureid  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (featureid = 120)
   ->  Bitmap Heap Scan on facets b  (cost=1.03..10.27 rows=10 width=8)
         Recheck Cond: (b.entity_id = "outer".entity_id)
         Filter: (fac_id = 261)
         ->  Bitmap Index Scan on "fac_val(entity_id,fac_id)"  (cost=0.00..1.03 rows=10 width=0)
               Index Cond: (b.entity_id = "outer".entity_id)
(10 rows)

foo=#



Re: Missed index opportunity for outer join?

From
Tom Lane
Date:
rm_pg@cheapcomplexdevices.com writes:
> On Mon, 5 Dec 2005, Tom Lane wrote:
>> (Note to self: it is a bit odd that fac_id=261 is pushed down to become
>> an indexqual in one case but not the other ...)

> I speculate that the seq_scan wasn't really the slow part
> compared to not using using both parts of the index in the
> second part of the plan.  The table point_features is tens of
> thousands of rows, while the table facets is tens of millions.

Agreed, but it's still odd that it would use a seqscan in one case and
not the other.

I found the reason why the fac_id=261 clause isn't getting used as an
index qual; it's a bit of excessive paranoia that goes back to 2002.
I've fixed that for 8.1.1, but am still wondering about the seqscan
on the other side of the join.

            regards, tom lane

Re: Missed index opportunity for outer join?

From
Ron Mayer
Date:
Tom Lane wrote:
> rm_pg@cheapcomplexdevices.com writes:
>>On Mon, 5 Dec 2005, Tom Lane wrote:
>
>>I speculate that the seq_scan wasn't really the slow part
>>compared to not using using both parts of the index in the
>>second part of the plan.  The table point_features is tens of
>>thousands of rows, while the table facets is tens of millions.
>
> Agreed, but it's still odd that it would use a seqscan in one case and
> not the other.

Hmm.  Unfortunately that was happening on a production system
and the amount of data in the tables has changed - and now I'm
no longer getting a seq_scan when I try to reproduce it.   That
system is still using 8.1.0.

The "point_features" table is pretty dynamic and it's possible
that the data changed between my 'explain analyze' statement in
the first post in this thread.   However since both of them
show an estimate of "rows=948" and returned an actual of 917 I
don't think that happened.

> I found the reason why the fac_id=261 clause isn't getting used as an
> index qual; it's a bit of excessive paranoia that goes back to 2002.
> I've fixed that for 8.1.1, but am still wondering about the seqscan
> on the other side of the join.

I now have a development system with cvs-tip; but have not yet
reproduced the seq scan on it either.  I'm using the same data
that was in "point_features" with "featureid=120" - but don't have
any good way of knowing what other data may have been in the table
at the time.   If desired, I could set up a cron job to periodically
explain analyze that query and see if it recurs.

Re: Missed index opportunity for outer join?

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> The "point_features" table is pretty dynamic and it's possible
> that the data changed between my 'explain analyze' statement in
> the first post in this thread.   However since both of them
> show an estimate of "rows=948" and returned an actual of 917 I
> don't think that happened.

Yeah, I had considered the same explanation and rejected it for the same
reason.  Also, the difference in estimated cost is significant (265.85
for the seqscan vs 172.17 for the bitmap scan) so it's hard to think
that a small change in stats --- so small as to not reflect in estimated
row count --- would change the estimate by that much.

[ thinks some more... ]  Of course, what we have to remember is that the
planner is actually going to choose based on the ultimate join cost, not
on the subplan costs.  The reason the seqscan survived initial
comparisons at all is that it has a cheaper startup cost (less time to
return the first tuple) than the bitmap scan, and this will be reflected
into a cheaper startup cost for the overall nestloop.  The extra hundred
units of total cost would only reflect into the nestloop total cost ---
and there, they would be considered "down in the noise" compared to a
90k total estimate.  So probably what happened is that the planner
preferred this plan on the basis that the total costs are the same to
within estimation error while the startup cost is definitely less.

In this explanation, the reason for the change in plans over time could
be a change in the statistics for the other table.  Is "facets" more
dynamic than "point_features"?

            regards, tom lane

Re: Missed index opportunity for outer join?

From
Ron Mayer
Date:
Tom Lane wrote:
> ...planner is actually going to choose based on the ultimate join cost,
> not on the subplan costs...
>
> In this explanation, the reason for the change in plans over time could
> be a change in the statistics for the other table.  Is "facets" more
> dynamic than "point_features"?

In total rows changing it's more dynamic, but percentage-wise, it's
less dynamic (point_features probably turns round 50% of it's rows
in a day -- while facets turns over about 3% per day -- but facets
is 1000X larger).

Facets is a big table with rather odd distributions of values.
Many of the values in the indexed columns show up only
once, others show up hundreds-of-thousands of times.  Perhaps
an analyze ran and just randomly sampled differently creating
different stats on that table?

Re: Missed index opportunity for outer join?

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Tom Lane wrote:
>> In this explanation, the reason for the change in plans over time could
>> be a change in the statistics for the other table.  Is "facets" more
>> dynamic than "point_features"?

> Facets is a big table with rather odd distributions of values.
> Many of the values in the indexed columns show up only
> once, others show up hundreds-of-thousands of times.  Perhaps
> an analyze ran and just randomly sampled differently creating
> different stats on that table?

If you have background tasks doing ANALYZEs then this explanation seems
plausible enough.  I'm willing to accept it anyway ...

            regards, tom lane

Re: Missed index opportunity for outer join?

From
Ron Mayer
Date:
Tom Lane wrote:
> If you have background tasks doing ANALYZEs then this explanation seems
> plausible enough.  I'm willing to accept it anyway ...

Yup, there are such tasks.   I could dig through logs to try to confirm
or reject it; but I think it's reasonably likely that this happened.
Basically, data gets added to that table as it becomes ready from other
systems, and after each batch a vacuum analyze is run.