Thread: Self-join query and index usage

Self-join query and index usage

From
worky.workerson@gmail.com
Date:
I'm doing a self join of some shipping data and wanted to get the best
query possible.  The interesting table is the event table, and it has
the following structure:

  startnode int,
  endnode int,
  weight int,
  starttime timestamp,
  endtime timestamp

and the query that I would like to run is:

SELECT e1.endnode, count(*), sum(e1.weight) AS weight1, sum(e2.weight)
AS weight2
FROM event e1, event e2
WHERE e1.endnode = e2.startnode AND e1.starttime < e2.starttime AND
e2.starttime < e1.endtime
GROUP BY e1.endnode

Assuming that I have indexes on all the columns, should this query be
able to make use of the indexes on starttime and endtime?

The "best" plan that I could see is a merge join between a sorted
sequential scan on e2.startnode and an index scan on e1.endnode, which
I figure takes care of the "e1.endnode = e2.startnode".  The join
filter is then "e1.starttime < e2.starttime AND e2.starttime <
e1.endtime" ... does this use an index?  Can the planner to use a
bitmap index scan to use the indexes on the start/endtimes in the join?

Table is about 3GB.


Re: Self-join query and index usage

From
Tom Lane
Date:
worky.workerson@gmail.com writes:
> and the query that I would like to run is:

> SELECT e1.endnode, count(*), sum(e1.weight) AS weight1, sum(e2.weight)
> AS weight2
> FROM event e1, event e2
> WHERE e1.endnode = e2.startnode AND e1.starttime < e2.starttime AND
> e2.starttime < e1.endtime
> GROUP BY e1.endnode

> Assuming that I have indexes on all the columns, should this query be
> able to make use of the indexes on starttime and endtime?

This is just really poorly suited for btree indexes.  What you're
looking for is an interval overlap test, which is something that can be
handled by rtree or gist indexes, but you'll need to change the form of
the query ...

            regards, tom lane