Thread: Query not using the indexes properly.

Query not using the indexes properly.

From

Tim Uckun

Date:

01 October 2009, 22:47:36

I have a pretty simple query on a pretty simple table with about 60
million records in it.

This is the query.

SELECT * FROM "changes" WHERE (id > 1935759 and company_id = 4 and
source_model_name = 'CommissionedVisit') ORDER BY id ASC LIMIT 1


The id field is the primary key. The other fields are indexed
(company_id and source_model_name).

This query takes about 30 seconds to run on a pretty beefy machine.

Here is the explain.

"Limit  (cost=0.00..7.46 rows=1 width=45) (actual
time=28799.712..28799.712 rows=0 loops=1)"
"  ->  Index Scan using changes_pkey on changes
(cost=0.00..2331939.52 rows=312519 width=45) (actual
time=28799.710..28799.710 rows=0 loops=1)"
"        Index Cond: (id > 1935759)"
"        Filter: ((company_id = 4) AND ((source_model_name)::text =
'CommissionedVisit'::text))"
"Total runtime: 28799.749 ms"


It seem to me that it's ignoring the indexes on the text fields. Is
that right?

I did a vacuum analyze on the table but that didn't seem to help at all.

Re: Query not using the indexes properly.

From

Chris

Date:

01 October 2009, 23:04:51

Tim Uckun wrote:
> I have a pretty simple query on a pretty simple table with about 60
> million records in it.
>
> This is the query.
>
> SELECT * FROM "changes" WHERE (id > 1935759 and company_id = 4 and
> source_model_name = 'CommissionedVisit') ORDER BY id ASC LIMIT 1
>
>
> The id field is the primary key. The other fields are indexed
> (company_id and source_model_name).
>
> This query takes about 30 seconds to run on a pretty beefy machine.
>
> Here is the explain.
>
> "Limit  (cost=0.00..7.46 rows=1 width=45) (actual
> time=28799.712..28799.712 rows=0 loops=1)"
> "  ->  Index Scan using changes_pkey on changes
> (cost=0.00..2331939.52 rows=312519 width=45) (actual
> time=28799.710..28799.710 rows=0 loops=1)"
> "        Index Cond: (id > 1935759)"
> "        Filter: ((company_id = 4) AND ((source_model_name)::text =
> 'CommissionedVisit'::text))"
> "Total runtime: 28799.749 ms"
>
>
> It seem to me that it's ignoring the indexes on the text fields. Is
> that right?

It probably thinks the id check is going to be better to limit the
result set.

How many records are there for id > 1935759 ?
vs
How many records for company_id = 4 and source_model_name =
'CommissionedVisit' ?

If this is a common query you could probably do a multi-column index on
all 3 columns (id, company_id, source_model_name) - but if company_id
and source_model_name have a low number of distinct values, then it's
not going to help.

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: Query not using the indexes properly.

From

David Wilson

Date:

01 October 2009, 23:19:09

On Thu, Oct 1, 2009 at 10:04 PM, Chris <dmagick@gmail.com> wrote:

Tim Uckun wrote:

If this is a common query you could probably do a multi-column index on all 3 columns (id, company_id, source_model_name) - but if company_id and source_model_name have a low number of distinct values, then it's not going to help.

If you try the multi-column index (which is a good idea), be sure that "id" is the last of the three columns, since that's the column on which you have an inequality test rather than an equality test; eg, (company_id,source_model_name,id).

--
- David T. Wilson
david.t.wilson@gmail.com

Re: Query not using the indexes properly.

From

Tim Uckun

Date:

01 October 2009, 23:20:11

> It probably thinks the id check is going to be better to limit the result
> set.
>
> How many records are there for id > 1935759 ?

About 40 million or so.

> vs
> How many records for company_id = 4 and source_model_name =
> 'CommissionedVisit' ?
>
> If this is a common query you could probably do a multi-column index on all
> 3 columns (id, company_id, source_model_name) - but if company_id and
> source_model_name have a low number of distinct values, then it's not going
> to help.


Both of those will indeed have a pretty low number of distinct values.

Looks like I need to figure out something else.

Re: Query not using the indexes properly.

From

Tim Uckun

Date:

01 October 2009, 23:21:49

>
> If you try the multi-column index (which is a good idea), be sure that "id"
> is the last of the three columns, since that's the column on which you have
> an inequality test rather than an equality test; eg,
> (company_id,source_model_name,id).
>


Interesting. I would have thought the order of the fields would not
matter. I don't have to rewrite the query do I?

Re: Query not using the indexes properly.

From

David Wilson

Date:

01 October 2009, 23:25:58

On Thu, Oct 1, 2009 at 10:21 PM, Tim Uckun <timuckun@gmail.com> wrote:

Interesting. I would have thought the order of the fields would not
matter. I don't have to rewrite the query do I?

No. For multi-column indices, however, postgres can, starting at the leftmost in the index, use as many columns as match equality comparisons plus one column using an inequality comparison.

From our fine manual, section 11.3:

"A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index."

--
- David T. Wilson
david.t.wilson@gmail.com