Forcing use of indexes

From
Pedro Alves
Date:
  Hi! I'm having some difficulties using indexes:

1. I run the same query (select blah ... order by foo limit bar) on 2
"virtually" identical machines, both running postgres v7.3.2. The database is
the same (the amount of data is a bit different) and machine A has (much)
more shared buffers than machine B; postgres uses indexes on B but not on
A. If I change the limit from 200 to 100, machine A starts using indexes.
On machine B, the optimizer only stops using indexes at limit 800. Why does
this happen? Is there any memory parameter that controls this behaviour?



2. Is there any way to force the use of indexes?


3. I have a composite index on columns foo and bar and an index on foo. I
noticed that for a query such as select * from table where foo=1 and
bar=2, postgres correctly uses foo_bar_idx. But if I use select * from
table where foo=1 and bar IN (1,2), postgres uses foo_idx, which is much
less efficient. I can write select * from table where foo=1 and bar=2 UNION
select * from table where foo=1 and bar=1, but it's quite ugly. Is it
supposed to work like this?



  Thanks in advance



--
Pedro Miguel G. Alves           pmalves@think.pt
THINK - Tecnologias de Informação   www.think.pt
Tel:   +351 21 412 56 56  Av. José Gomes Ferreira
Fax:   +351 21 412 56 57     nº 13 1495-139 ALGÉS


Re: Forcing use of indexes

From
Tom Lane
Date:
Pedro Alves <pmalves@think.pt> writes:
> 1. I run the same query (select blah ... order by foo limit bar) on 2
> "virtually" identical machines, both running postgres v7.3.2. The database is
> the same (the amount of data is a bit different) and machine A has (much)
> more shared buffers than machine B; postgres uses indexes on B but not on
> A. If I change the limit from 200 to 100, machine A starts using indexes.
> On machine B, the optimizer only stops using indexes at limit 800. Why does
> this happen? Is there any memory parameter that controls this behaviour?

Have you vacuum analyzed recently on both machines?

shared_buffers doesn't affect the estimated cost of an indexscan.
effective_cache_size does, also random_page_cost, but you didn't mention
having touched those.
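
[Editor's note: for experimentation, both of those planner inputs — and the enable_* switches, which are the blunt answer to question 2 — can be set per session. A sketch with illustrative values, not recommendations; the table name is hypothetical:]

```sql
-- Per-session planner experiments:
SET effective_cache_size = 131072;  -- measured in 8 kB pages in 7.3; ~1 GB
SET random_page_cost = 2;           -- default is 4; lower favors index scans
-- Heavy-handed way to "force" an index, useful only for testing a plan:
SET enable_seqscan = off;
EXPLAIN SELECT * FROM some_table ORDER BY foo LIMIT 200;
```

[Disabling sequential scans globally is almost never a good production setting; it only makes the planner price seqscans prohibitively high so you can see what the index plan would cost.]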


> 3. I have a composite index on columns foo and bar and an index on foo. I
> noticed that for a query such as select * from table where foo=1 and
> bar=2, postgres correctly uses foo_bar_idx. But if I use select * from
> table where foo=1 and bar IN (1,2), postgres uses foo_idx, which is much
> less efficient.

Presently, you'd need an index on (bar,foo) to get a good plan for a
query expressed that way.

            regards, tom lane


Re: Forcing use of indexes

From
Pedro Alves
Date:
On Wed, Apr 02, 2003 at 09:52:19AM -0500, Tom Lane wrote:
> Pedro Alves <pmalves@think.pt> writes:
> > 1. I run the same query (select blah ... order by foo limit bar) on 2
> > "virtually" identical machines, both running postgres v7.3.2. The database is
> > the same (the amount of data is a bit different) and machine A has (much)
> > more shared buffers than machine B; postgres uses indexes on B but not on
> > A. If I change the limit from 200 to 100, machine A starts using indexes.
> > On machine B, the optimizer only stops using indexes at limit 800. Why does
> > this happen? Is there any memory parameter that controls this behaviour?
>
> Have you vacuum analyzed recently on both machines?

  Yep, in both machines


>
> shared_buffers doesn't affect the estimated cost of an indexscan.
> effective_cache_size does, also random_page_cost, but you didn't mention
> having touched those.
>


  No, I didn't

>
> > 3. I have a composite index on columns foo and bar and an index on foo. I
> > noticed that for a query such as select * from table where foo=1 and
> > bar=2, postgres correctly uses foo_bar_idx. But if I use select * from
> > table where foo=1 and bar IN (1,2), postgres uses foo_idx, which is much
> > less efficient.
>
> Presently, you'd need an index on (bar,foo) to get a good plan for a
> query expressed that way.
>

  That's what I meant when I said 'I have a composite index on columns foo
and bar'. In the second query, it doesn't use this index.


  Thanks






Re: Forcing use of indexes

From
Tom Lane
Date:
Pedro Alves <pmalves@think.pt> writes:
> On Wed, Apr 02, 2003 at 09:52:19AM -0500, Tom Lane wrote:
>> Presently, you'd need an index on (bar,foo) to get a good plan for a
>> query expressed that way.

>   That's what I meant when I said 'I have a composite index on columns foo
> and bar'. In the second query, it doesn't use this index.

An index on (foo,bar) isn't equivalent to one on (bar,foo).
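
[Editor's note: a multicolumn btree index is sorted by its first column first, so the two column orders are genuinely different objects. Sketched with a hypothetical table name:]

```sql
-- The existing index: useful when foo, the leading column, is constrained.
CREATE INDEX foo_bar_idx ON some_table (foo, bar);
-- Tom's suggestion for the "foo = 1 AND bar IN (1, 2)" form of the query:
CREATE INDEX bar_foo_idx ON some_table (bar, foo);
```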

            regards, tom lane


Re: Forcing use of indexes

From
Lincoln Yeoh
Date:
At 09:52 AM 4/2/03 -0500, Tom Lane wrote:

>shared_buffers doesn't affect the estimated cost of an indexscan.
>effective_cache_size does, also random_page_cost, but you didn't mention
>having touched those.

Hi,

If my O/S has a cache of say 1GB and my DB is < 1GB and is totally in cache
would setting effective_cache_size to 1GB make the optimizer decide on
index usage just as setting random_page_cost to 1?

If random_page_cost is high but so is effective_cache_size, does postgresql
use sequential scans the first time round and then index scans the second
time round once everything is cached? Of course if random_page_cost is 1 it
would always use index scans, even for the first read. This is probably
"academic" and not really an issue in the real world.

But the main thing is: is it hard for the optimizer to tell whether a
DB/table/index is completely in effective_cache_size?

There's mention of something like this (see below), but the final
suggestion in the thread was to set random_page_cost to 1, so I'm wondering
how one would use effective_cache_size.

Brian Hirt (bhirt@mobygames.com)
Re: Performance Tuning Question
Date: 2002-09-09 10:17:52 PST

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=1031586091.1345.722.camel%40loopy.tr.berkhirt.com&rnum=3

Regards,
Link.


Re: Forcing use of indexes

From
Tom Lane
Date:
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> If my O/S has a cache of say 1GB and my DB is < 1GB and is totally in cache
> would setting effective_cache_size to 1GB make the optimizer decide on
> index usage just as setting random_page_cost to 1?

I don't feel like going through the equations at the moment (it's open
source, read for yourself) but certainly if table+index are less than
effective_cache_size the cost estimate should be pretty low.
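
[Editor's note: one way to see this without reading the source is to vary effective_cache_size within a session and compare the estimated costs that EXPLAIN reports. Table and column names are hypothetical:]

```sql
SET effective_cache_size = 1000;    -- the 7.3 default, in 8 kB pages
EXPLAIN SELECT * FROM some_table WHERE foo = 1;
SET effective_cache_size = 131072;  -- ~1 GB of assumed OS cache
EXPLAIN SELECT * FROM some_table WHERE foo = 1;
-- The index scan's estimated cost should be lower in the second plan.
```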

> If random_page_cost is high but so is effective_cache_size, does postgresql
> use sequential scans the first time round and then index scans the second
> time round once everything is cached?

No, there is no notion of "first time round" vs "second time round".

> But the main thing is: is it hard for the optimizer to tell whether a
> DB/table/index is completely in effective_cache_size?

It knows the table & index size as last recorded by VACUUM.  This might
not match up with current reality, of course...

            regards, tom lane