I have a table called 'jobs' with several million rows, and the only
columns that are important to this discussion are 'start_time' and
'completion_time'.
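For concreteness, the relevant part of the schema looks something like
this (simplified, and give or take the exact timestamp type):

CREATE TABLE jobs (
    -- several million rows; other columns omitted
    start_time      TIMESTAMP,
    completion_time TIMESTAMP
);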
The sort of query I want to execute (among others) looks like this:
SELECT * FROM jobs
WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;
In plain English: all the jobs that were running at SOMEDATE. The
result of the query is on the order of 500 rows.
I've got separate indexes on 'start_time' and 'completion_time'.
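That is, something along these lines (the names aren't important):

CREATE INDEX jobs_start_time_idx ON jobs (start_time);
CREATE INDEX jobs_completion_time_idx ON jobs (completion_time);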
Now, if SOMEDATE is such that the number of rows with completion_time
> SOMEDATE is small (say, tens of thousands), the query uses index scans
and executes quickly. If not, the query uses sequential scans and is
unacceptably slow (a couple of minutes). I've used EXPLAIN and
EXPLAIN ANALYZE to confirm this. This makes perfect sense to me.
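(For the record, the plans came from invocations like:

EXPLAIN ANALYZE
SELECT * FROM jobs
WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;

with SOMEDATE standing in for a literal timestamp.)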
I've played with some of the memory settings for PostgreSQL, but none
has had a significant impact.
Any ideas on how to structure the query or add/change indexes in such
a way as to improve its performance? In desperation, I tried using a
subquery, but unsurprisingly it made no (positive) difference. I feel
like there might be a way of using an index on both 'completion_time'
and 'start_time', but can't put a temporal lobe on the details.
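What I have in mind is something like a multicolumn index:

CREATE INDEX jobs_running_idx ON jobs (completion_time, start_time);

(the name is arbitrary). But I don't know whether the planner can make
effective use of it when both columns carry range conditions.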
Mark