Spatial join insists on sequential scan of larger table - Mailing list pgsql-performance

From Clive Page
Subject Spatial join insists on sequential scan of larger table
Date
Msg-id Pine.LNX.4.44L0.0404021540020.2713-100000@peneca.star.le.ac.uk
Whole thread Raw
Responses Re: Spatial join insists on sequential scan of larger table
List pgsql-performance
I am trying to do a spatial join between two tables each of which has a
column of type BOX called ERRBOX, with R-TREE indices created on both.

The smaller table, xmm1, has      56,711 rows,
the larger one, twomass, has 177,757,299 rows.

The most efficient way to join these is to do a sequential scan of the
smaller table, and an R-tree lookup on the larger.  However for a simple
inner join the optimiser seems to want to do the reverse, for example:

EXPLAIN
SELECT x.ra AS xra, x.decl AS xdecl, t.ra AS tra, t.decl AS tdecl
FROM xmm1 AS x INNER JOIN twomass AS t
ON x.errbox && t.errbox;

                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..196642756520.34 rows=49506496044 width=32)
   ->  Seq Scan on twomass t  (cost=0.00..9560002.72 rows=177023872 width=48)
   ->  Index Scan using xmm1box on xmm1 x  (cost=0.00..1107.28 rows=280 width=48)
         Index Cond: (x.errbox && "outer".errbox)


Reversing the join condition (i.e. t.errbox && x.errbox) and similar make
no difference, nor does using the old implicit join syntax.

If, however, I specify an outer join such as:

EXPLAIN
SELECT x.ra AS xra, x.decl AS xdecl, t.ra AS tra, t.decl AS tdecl
FROM xmm1 AS x LEFT OUTER JOIN twomass AS t
ON x.errbox && t.errbox;

                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..198945259325.90 rows=49506496044
width=32)
   ->  Seq Scan on xmm1 x  (cost=0.00..8592.32 rows=55932 width=48)
   ->  Index Scan using tbox on twomass t  (cost=0.00..3545848.88 rows=885119 width=48)
         Index Cond: ("outer".errbox && t.errbox)


This executes, it need hardly be said, a whole lot faster.

I found that I can also force a sequential scan of the smaller table by
dropping its R-tree index, but I may need this in other operations, so
this isn't a very satisfactory solution.  It's odd that an outer join
should be faster than an inner one, or to put it another way, after
dropping an index there is more than an order of magnitude speed increase.

I'm using Postgres 7.4.1 on Red Hat Linux.  Has anyone had similar
problems with spatial joins?


--
Clive Page
Dept of Physics & Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.



pgsql-performance by date:

Previous
From: "Aaron Werman"
Date:
Subject: Re: single index on more than two coulumns a bad thing?
Next
From: Tom Lane
Date:
Subject: Re: Spatial join insists on sequential scan of larger table