Thread: Forcing use of indexes
Hi! I'm having some difficulties using indexes:

1. I run the same query (select blah ... order by foo limit bar) on two
"virtually" identical machines, both running postgres v7.3.2. The database is
the same (the amount of data differs slightly) and machine A has (much) more
shared buffers than machine B; postgres uses indexes on B but not on A. If I
change the limit from 200 to 100, machine A starts using indexes. On machine
B, the optimizer only stops using indexes at limit 800. Why does this happen?
Is there any memory parameter that controls this behaviour?

2. Is there any way to force the use of indexes?

3. I have a composite index on columns foo and bar, and an index on foo. I
noticed that for a query such as

    select * from table where foo=1 and bar=2

postgres correctly uses foo_bar_idx. But if I use

    select * from table where foo=1 and bar IN (1,2)

postgres uses foo_idx, which is much less efficient. I can write

    select * from table where foo=1 and bar=2
    UNION
    select * from table where foo=1 and bar=1

but it's quite ugly. Is this supposed to work like this?

Thanks in advance

-- 
Pedro Miguel G. Alves               pmalves@think.pt
THINK - Tecnologias de Informação   www.think.pt
Tel: +351 21 412 56 56              Av. José Gomes Ferreira
Fax: +351 21 412 56 57              nº 13 1495-139 ALGÉS
Pedro Alves <pmalves@think.pt> writes:
> 1. I run the same query (select blah ... order by foo limit bar) on two
> "virtually" identical machines, both running postgres v7.3.2. The database
> is the same (the amount of data differs slightly) and machine A has (much)
> more shared buffers than machine B; postgres uses indexes on B but not on
> A. If I change the limit from 200 to 100, machine A starts using indexes.
> On machine B, the optimizer only stops using indexes at limit 800. Why
> does this happen? Is there any memory parameter that controls this
> behaviour?

Have you vacuum analyzed recently on both machines?

shared_buffers doesn't affect the estimated cost of an indexscan.
effective_cache_size does, also random_page_cost, but you didn't mention
having touched those.

> 3. I have a composite index on columns foo and bar, and an index on foo.
> I noticed that for a query such as select * from table where foo=1 and
> bar=2, postgres correctly uses foo_bar_idx. But if I use select * from
> table where foo=1 and bar IN (1,2), postgres uses foo_idx, which is much
> less efficient.

Presently, you'd need an index on (bar,foo) to get a good plan for a
query expressed that way.

			regards, tom lane
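[For reference, the planner settings mentioned above can be tried per-session
before touching postgresql.conf. A sketch, assuming a hypothetical table
`mytable` and the query shape from the original question — the values are
illustrative, not recommendations:]

```sql
-- Session-level experiments only; on 7.3 effective_cache_size is
-- measured in 8kB disk pages (65536 pages = 512MB).
SET effective_cache_size = 65536;
SET random_page_cost = 2;          -- default is 4; lower favours indexscans

-- Compare what the optimizer now chooses:
EXPLAIN ANALYZE SELECT * FROM mytable ORDER BY foo LIMIT 200;

-- As a diagnostic (not for production use), sequential scans can be
-- discouraged to see what the indexscan plan would have cost:
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM mytable ORDER BY foo LIMIT 200;
SET enable_seqscan = on;
```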
On Wed, Apr 02, 2003 at 09:52:19AM -0500, Tom Lane wrote:
> Have you vacuum analyzed recently on both machines?

Yep, on both machines.

> shared_buffers doesn't affect the estimated cost of an indexscan.
> effective_cache_size does, also random_page_cost, but you didn't mention
> having touched those.

No, I didn't.

> Presently, you'd need an index on (bar,foo) to get a good plan for a
> query expressed that way.

That's what I meant when I said "I have a composite index on columns foo
and bar". In the second query, it doesn't use this index.

Thanks
Pedro Alves <pmalves@think.pt> writes:
> On Wed, Apr 02, 2003 at 09:52:19AM -0500, Tom Lane wrote:
>> Presently, you'd need an index on (bar,foo) to get a good plan for a
>> query expressed that way.

> That's what I meant when I said "I have a composite index on columns foo
> and bar". In the second query, it doesn't use this index.

An index on (foo,bar) isn't equivalent to one on (bar,foo).

			regards, tom lane
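[To make the column-order point concrete, a sketch using the hypothetical
names from the question — `mytable` and the query from point 3:]

```sql
-- The existing index, leading with foo:
CREATE INDEX foo_bar_idx ON mytable (foo, bar);

-- What Tom is suggesting: a separate index leading with bar,
-- which is what the 7.3 planner needs for this query shape:
CREATE INDEX bar_foo_idx ON mytable (bar, foo);

-- Check which index the plan picks up:
EXPLAIN SELECT * FROM mytable WHERE foo = 1 AND bar IN (1, 2);
```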
At 09:52 AM 4/2/03 -0500, Tom Lane wrote:
> shared_buffers doesn't affect the estimated cost of an indexscan.
> effective_cache_size does, also random_page_cost, but you didn't mention
> having touched those.

Hi,

If my O/S has a cache of, say, 1GB and my DB is < 1GB and is totally in
cache, would setting effective_cache_size to 1GB make the optimizer decide
on index usage just as setting random_page_cost to 1 would?

If random_page_cost is high but so is effective_cache_size, does postgresql
use sequential scans the first time round and then index scans the second
time round once everything is cached? Of course, if random_page_cost is 1
then it should always use an index scan, even for the first read.

This is probably "academic" and not really an issue for the real world. But
the main thing is: is it hard for the optimizer to tell whether a
DB/table/index fits completely within effective_cache_size?

There's mention of something like this (see below), but the final
suggestion in that thread was to set random_page_cost to 1, so I'm
wondering how one would use effective_cache_size instead.

Brian Hirt (bhirt@mobygames.com)
Re: Performance Tuning Question
Date: 2002-09-09 10:17:52 PST
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=1031586091.1345.722.camel%40loopy.tr.berkhirt.com&rnum=3

Regards,
Link.
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> If my O/S has a cache of, say, 1GB and my DB is < 1GB and is totally in
> cache, would setting effective_cache_size to 1GB make the optimizer
> decide on index usage just as setting random_page_cost to 1 would?

I don't feel like going through the equations at the moment (it's open
source, read for yourself), but certainly if table+index are smaller than
effective_cache_size the cost estimate should be pretty low.

> If random_page_cost is high but so is effective_cache_size, does
> postgresql use sequential scans the first time round and then index
> scans the second time round once everything is cached?

No, there is no notion of "first time round" vs "second time round".

> But the main thing is: is it hard for the optimizer to tell whether a
> DB/table/index fits completely within effective_cache_size?

It knows the table & index size as last recorded by VACUUM. This might
not match up with current reality, of course...

			regards, tom lane
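[For the curious, the sizes "as last recorded by VACUUM" are visible in the
system catalogs. A sketch, with `mytable` and `foo_bar_idx` as placeholder
names for a table and its index:]

```sql
-- relpages is measured in 8kB disk pages and is only refreshed by
-- VACUUM / ANALYZE, so it can lag behind the table's true size.
SELECT relname, relkind, relpages
FROM pg_class
WHERE relname IN ('mytable', 'foo_bar_idx');

-- Refresh the statistics, then re-check:
VACUUM ANALYZE mytable;
```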