Re: Why is this doing a seq scan? - Mailing list pgsql-sql

From Tom Lane
Subject Re: Why is this doing a seq scan?
Date
Msg-id 14217.974498877@sss.pgh.pa.us
Whole thread Raw
In response to Re: Why is this doing a seq scan?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
I said:
> Ah, there's your problem --- the planner is not very smart about
> optimizing cross-datatype comparisons.  Make these columns both text,
> or both varchar, and I'll bet you get a more intelligent plan.

After a little further thought, I realize that the planner may be
handicapped by not realizing it can do a merge or hash join across
datatypes, but even without that problem, this is not going to be
a fast query.  What you've got is
select ... from atms x, zips y where x.zip = y.ziporder by 1 limit 3;

and there just isn't any way to process this without forming the
full join product --- ie, the thing will sit there and form a join
tuple for *every* valid combination of ATM and ZIP in your database,
and then compute the distance to the target point for every one of
those ATMs, and then sort that result, and finally give you only
the top three rows.  A smarter kind of join isn't going to help
all that much; to make this fast, you need to be filtering
using the really selective condition (distance to the target point)
*before* you do the join.

If you are sufficiently interested in the speed of this query to want to
maintain a specialized index for it, I'd suggest looking at an r-tree
index on the location data, and then using a WHERE condition on the
r-tree index to prefilter the rows before you join.  r-trees only work
on boxes and polygons AFAICT --- what would work nicely is to store a
"box" of very small dimensions surrounding the location of each ATM,
index that column, and then use a WHERE test for overlap between that
box column and a box surrounding the target point out to as far as you
think is likely to be interesting.  This gives you back a fairly small
number of candidate ATMs for which you compute the exact distance to
the target, sort, and limit.  Not sure that you need to join to zips
at all if you do it this way.
        regards, tom lane


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: Why is this doing a seq scan?
Next
From: "Robert B. Easter"
Date:
Subject: Re: how many rows? [was Re: fetching rows]