Thread: Ok, why isn't it using *this* index?

Ok, why isn't it using *this* index?

From
Paul Tomblin
Date:
I have a table with columns 'country' and 'state'.  I put indexes on both
of them.  I've done the "vacuum analyze" as per the faq.  But when I ask
it to explain, it says it will use the index on 'state' if I do a
    select * from waypoint where state = 'ON';
but it won't use the index on 'country' if I do a
    select * from waypoint where country = 'CANADA';

Some other interesting things are that it uses the index on state even if
I say "where state in ('ON','QC','BC','AB')", and it uses the index on
state but not the one on country if I combine "where state = 'ON' and
country = 'CANADA'".

Here's what it says:
    waypoint=> explain select * from waypoint where state = 'ON';
    NOTICE:  QUERY PLAN:

    Index Scan using waypoint_state on waypoint  (cost=7.17 rows=84
    width=130)

    EXPLAIN
    waypoint=> explain select * from waypoint where country = 'CANADA';
    NOTICE:  QUERY PLAN:

    Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)

    EXPLAIN

Also, can anybody explain why the "rows=" doesn't correspond to anything
logical?  For instance, in the first one it says "rows=84" even though
there are 107 matching records, and 71 different states.


--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Diplomacy is the ability to let someone else have your way.

Re: Ok, why isn't it using *this* index?

From
"ADBAAMD"
Date:
Paul Tomblin wrote:

> I have a table with columns 'country' and 'state'.  I put indexes on both
> of them.  I've done the "vacuum analyze" as per the faq.  But when I ask
> it to explain, it says it will use the index on 'state' if I do a
>     select * from waypoint where state = 'ON';
> but it won't use the index on 'country' if I do a
>     select * from waypoint where country = 'CANADA';
>
> Some other interesting things are that it uses the index on state even if
> I say "where state in ('ON','QC','BC','AB')", and it uses the index on
> state but not the one on country if I combine "where state = 'ON' and
> country = 'CANADA'".

    Maybe it's just my Oracle side, but doesn't country has a too low
selectivity?



--
  _
/ \   Leandro Guimarães Faria Corsetti Dutra         +55 (11) 3040 8913
\ /   Amdocs at Bell Canada                          +1 (514) 786 87 47
  X    Support Center, São Paulo, Brazil          mailto:adbaamd@bell.ca
/ \   http://terravista.pt./Enseada/1989/    mailto:leandrod@amdocs.com



RE: Ok, why isn't it using *this* index?

From
Mike Mascari
Date:
I would guess that the optimizer is choosing a sequential scan when
the country is CANADA because the number of rows fetched as a
percentage of total rows would warrant it. For example, country =
'CANADA' might be true for %30 of the total rows whereas STATE = 'ON'
might only represent 2%, and thus the index scan.

The EXPLAIN shows estimates and an excellent explanation of them can
be found here:

http://www.postgresql.org/users-lounge/docs/7.0/user/c4884.htm#AEN
4889

Hope that helps,

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From:    Paul Tomblin [SMTP:ptomblin@xcski.com]
Sent:    Sunday, April 01, 2001 3:39 PM
To:    pgsql-general@postgresql.org
Subject:    [GENERAL] Ok, why isn't it using *this* index?

I have a table with columns 'country' and 'state'.  I put indexes on
both
of them.  I've done the "vacuum analyze" as per the faq.  But when I
ask
it to explain, it says it will use the index on 'state' if I do a
    select * from waypoint where state = 'ON';
but it won't use the index on 'country' if I do a
    select * from waypoint where country = 'CANADA';

Some other interesting things are that it uses the index on state
even if
I say "where state in ('ON','QC','BC','AB')", and it uses the index
on
state but not the one on country if I combine "where state = 'ON' and
country = 'CANADA'".

Here's what it says:
    waypoint=> explain select * from waypoint where state = 'ON';
    NOTICE:  QUERY PLAN:

    Index Scan using waypoint_state on waypoint  (cost=7.17 rows=84
    width=130)

    EXPLAIN
    waypoint=> explain select * from waypoint where country =
'CANADA';
    NOTICE:  QUERY PLAN:

    Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)

    EXPLAIN

Also, can anybody explain why the "rows=" doesn't correspond to
anything
logical?  For instance, in the first one it says "rows=84" even
though
there are 107 matching records, and 71 different states.


--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Diplomacy is the ability to let someone else have your way.


Re: Ok, why isn't it using *this* index?

From
Tom Lane
Date:
Paul Tomblin <ptomblin@xcski.com> writes:
>     waypoint=> explain select * from waypoint where country = 'CANADA';
>     NOTICE:  QUERY PLAN:

>     Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)

I take it the majority of rows have country = 'CANADA'?  How many rows
in the table all together, anyway?  Presumably you're seeing the results
of an estimate that this WHERE clause is too unselective for an index
scan to be profitable.  But I can't tell if the rows estimate is any
good or not.

A rule of thumb is that an indexscan will only be used if the index
clauses select no more than a few percent of the rows in the table.
Otherwise the additional I/O to scan the index and to read the table
in nonsequential fashion costs more than a sequential scan does.

> Also, can anybody explain why the "rows=" doesn't correspond to anything
> logical?  For instance, in the first one it says "rows=84" even though
> there are 107 matching records, and 71 different states.

ROTFL ... given the thinness of the statistics used to make the
estimate, I'd call rows=84 practically dead on, if the true value
is 107.  The system is doing real good here.  You can read in the
archives about lots of cases where the estimate is off by a factor
of ten or worse, leading to bad plan choices.  This estimate is
plenty close enough to arrive at a reasonable plan.

            regards, tom lane

Re: Ok, why isn't it using *this* index?

From
Paul Tomblin
Date:
Quoting ADBAAMD (adba.amdocs@bell.ca):
> Paul Tomblin wrote:
> > I have a table with columns 'country' and 'state'.  I put indexes on both
> > of them.  I've done the "vacuum analyze" as per the faq.  But when I ask
> > it to explain, it says it will use the index on 'state' if I do a
> >     select * from waypoint where state = 'ON';
> > but it won't use the index on 'country' if I do a
> >     select * from waypoint where country = 'CANADA';
>
>     Maybe it's just my Oracle side, but doesn't country has a too low
> selectivity?

If I try
    explain select * from waypoint where country = 'BELIZE';
a query that will only select one record out of the 8300-odd, it still
doesn't use the index.
    Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)


--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Mommy, what does "Formatting Drive C:" mean?

Re: Ok, why isn't it using *this* index?

From
"ADBAAMD"
Date:
Paul Tomblin wrote:

> Quoting ADBAAMD (adba.amdocs@bell.ca):
>
>> Paul Tomblin wrote:
>>
>>> I have a table with columns 'country' and 'state'.  I put indexes on both
>>> of them.  I've done the "vacuum analyze" as per the faq.  But when I ask
>>> it to explain, it says it will use the index on 'state' if I do a
>>>     select * from waypoint where state = 'ON';
>>> but it won't use the index on 'country' if I do a
>>>     select * from waypoint where country = 'CANADA';
>>
>>     Maybe it's just my Oracle side, but doesn't country has a too low
>> selectivity?
>
>
> If I try
>     explain select * from waypoint where country = 'BELIZE';
> a query that will only select one record out of the 8300-odd, it still
> doesn't use the index.
>     Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)

    Selectivity isn't about specific values, but about averages.

    If the planner would know statistics about each and every indexed value
on the database, it would take a lot of effort to ANALYZE indexed data,
the memory and disk consumption by statistics would be high, and CPU
usage by the planner would go gaga.  So it analyzes just averages.

    It doesn't matter that BELIZE has a high selectivity, but that country
has a low one.



--
  _
/ \   Leandro Guimarães Faria Corsetti Dutra         +55 (11) 3040 8913
\ /   Amdocs at Bell Canada                          +1 (514) 786 87 47
  X    Support Center, São Paulo, Brazil          mailto:adbaamd@bell.ca
/ \   http://terravista.pt./Enseada/1989/    mailto:leandrod@amdocs.com



Re: Ok, why isn't it using *this* index?

From
Paul Tomblin
Date:
Quoting ADBAAMD (adba.amdocs@bell.ca):
> Paul Tomblin wrote:
> > If I try
> >     explain select * from waypoint where country = 'BELIZE';
> > a query that will only select one record out of the 8300-odd, it still
> > doesn't use the index.
> >     Seq Scan on waypoint  (cost=455.13 rows=6813 width=130)
>
>     Selectivity isn't about specific values, but about averages.
>
>     If the planner would know statistics about each and every indexed value
> on the database, it would take a lot of effort to ANALYZE indexed data,
> the memory and disk consumption by statistics would be high, and CPU
> usage by the planner would go gaga.  So it analyzes just averages.
>
>     It doesn't matter that BELIZE has a high selectivity, but that country
> has a low one.

Ok, so if I understand you correctly, the fact that about 90% of the
records have country='USA' and about 9% of the records have
country='CANADA' means that it's never going to use the index because it
on average, a query is going to be for USA, and a sequential scan is going
to be better.

I think I understand now.  If this is correct, then doesn't it make sense
just to drop that index?  At least until I get a lot more data from other
countries?

waypoint=> select count(*), country from waypoint group by country;
count|country
-----+--------------------
    2|ANTIGUA AND BARBUDA
   15|BAHAMAS
    1|BARBADOS
    1|BELIZE
  741|CANADA
    1|CAYMAN ISLANDS
    5|COLOMBIA
    2|COSTA RICA
   23|CUBA
    1|DOMINICA
    3|DOMINICAN REPUBLIC
    1|ECUADOR
    3|FED STS MICRONESIA
    4|FRENCH WEST INDIES
    1|GRENADA
    1|GUYANA
    2|HAITI
    2|HONDURAS
    4|JAMAICA
    2|MARSHALL ISLANDS
   31|MEXICO
    3|NETHERLANDS ANTILLES
    2|NICARAGUA
    1|PALAU
    8|PANAMA
    2|TRINIDAD AND TOBAGO
    2|TRUST TERRITORIES
    2|TURKS AND CAICOS ISL
 7436|USA
    5|VENEZUELA
(30 rows)

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Every program has two purposes -- one for which it was written and
another for which it wasn't.

Re: Ok, why isn't it using *this* index?

From
"ADBAAMD"
Date:
Paul Tomblin wrote:

> Quoting ADBAAMD (adba.amdocs@bell.ca):
>
> Ok, so if I understand you correctly, the fact that about 90% of the
> records have country='USA' and about 9% of the records have
> country='CANADA' means that it's never going to use the index because it
> on average, a query is going to be for USA, and a sequential scan is going
> to be better.
>
> I think I understand now.  If this is correct, then doesn't it make sense
> just to drop that index?  At least until I get a lot more data from other
> countries?

    Probably you are right.

    You could also try other index access methods besides the standard
b-tree.  I don't know about the situation in pgsql, but in Oracle we
have even an index access method create specifically to serve low
selectivity data: bitmaps.

    While I could find a succint description of access methods at
http://www.postgresql.org/users-lounge/docs/7.0/postgres/indices.html, I
don't know how each of them work exactly, nor if any would be any good
for low selectivity situations.  Is there any docs on that?




--
  _
/ \   Leandro Guimarães Faria Corsetti Dutra         +55 (11) 3040 8913
\ /   Amdocs at Bell Canada                          +1 (514) 786 87 47
  X    Support Center, São Paulo, Brazil          mailto:adbaamd@bell.ca
/ \   http://terravista.pt./Enseada/1989/    mailto:leandrod@amdocs.com



Re: Ok, why isn't it using *this* index?

From
Paul Tomblin
Date:
Quoting ADBAAMD (adba.amdocs@bell.ca):
> Paul Tomblin wrote:
> > I think I understand now.  If this is correct, then doesn't it make sense
> > just to drop that index?  At least until I get a lot more data from other
> > countries?
>
>     Probably you are right.

Ok, thanks for all the help everybody.

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Don't use a big word where a diminutive one will suffice.

Re: Ok, why isn't it using *this* index?

From
"Denis A. Doroshenko"
Date:
On Sun, Apr 01, 2001 at 05:11:46PM -0400, Paul Tomblin wrote:
> Ok, so if I understand you correctly, the fact that about 90% of the
> records have country='USA' and about 9% of the records have
> country='CANADA' means that it's never going to use the index because it
> on average, a query is going to be for USA, and a sequential scan is going
> to be better.
>
> I think I understand now.  If this is correct, then doesn't it make sense
> just to drop that index?  At least until I get a lot more data from other
> countries?

huh, this seems to be ineteresting to know, just because this
"feature" may seriously affect effectivity. by the way, have you tried
to disable sequential scan? that may force pgsql to use an index in
any case, AFAIK.

--
Denis A. Doroshenko  [GPRS/IN/WAP, VAS group engineer] .-.        _|_  |
[Omnitel Ltd., T.Sevcenkos st. 25, Vilnius, Lithuania] | | _ _  _ .| _ |
[Phone: +370 9863486 E-mail: d.doroshenko@omnitel.net] |_|| | || |||(/_|_

Re: Ok, why isn't it using *this* index?

From
"ADBAAMD"
Date:
Denis A. Doroshenko wrote:

> On Sun, Apr 01, 2001 at 05:11:46PM -0400, Paul Tomblin wrote:
>
>> Ok, so if I understand you correctly, the fact that about 90% of the
>> records have country='USA' and about 9% of the records have
>> country='CANADA' means that it's never going to use the index because it
>> on average, a query is going to be for USA, and a sequential scan is going
>> to be better.
>>
>> I think I understand now.  If this is correct, then doesn't it make sense
>> just to drop that index?  At least until I get a lot more data from other
>> countries?
>
>
> huh, this seems to be ineteresting to know, just because this
> "feature" may seriously affect effectivity. by the way, have you tried
> to disable sequential scan? that may force pgsql to use an index in
> any case, AFAIK.

    It is not always a good idea to force the use of the index.  Overriding
the planner should only be done after tests on realistic volumes of data
and with a thorough understanding about what's going on.

    In fact it is wise to work only with the indices you will use frequently
and tables that have some significant queries as compared to updates,
because updates get *slower* with indices!



--
  _
/ \   Leandro Guimarães Faria Corsetti Dutra         +55 (11) 3040 8913
\ /   Amdocs at Bell Canada                          +1 (514) 786 87 47
  X    Support Center, São Paulo, Brazil          mailto:adbaamd@bell.ca
/ \   http://terravista.pt./Enseada/1989/    mailto:leandrod@amdocs.com



Re: Ok, why isn't it using *this* index?

From
Tom Lane
Date:
Paul Tomblin <ptomblin@xcski.com> writes:
> Ok, so if I understand you correctly, the fact that about 90% of the
> records have country='USA' and about 9% of the records have
> country='CANADA' means that it's never going to use the index because it
> on average, a query is going to be for USA, and a sequential scan is going
> to be better.

Actually, 7.0 and later (which you are not using, I gather from your
EXPLAIN display) do know the difference between the most common value
in the column and the rest of 'em.  I think that 7.0 would choose an
indexscan in the case where it can see that you are not looking for
'USA'.  Which would be the right choice for 'BELIZE', but probably
not the right choice for 'CANADA'.

For 7.2 I am hoping to extend the stored stats to know about the top
three or so common values, not just one, so that we can deal more
effectively with data distributions like this one.  But in any case,
most of the respondents in this thread have been assuming that you
were running a reasonably current Postgres.  Try upgrading ...

            regards, tom lane

Re: Ok, why isn't it using *this* index?

From
Paul Tomblin
Date:
Quoting Tom Lane (tgl@sss.pgh.pa.us):
> Paul Tomblin <ptomblin@xcski.com> writes:
> > Ok, so if I understand you correctly, the fact that about 90% of the
> > records have country='USA' and about 9% of the records have
> > country='CANADA' means that it's never going to use the index because it
> > on average, a query is going to be for USA, and a sequential scan is going
> > to be better.
>
> effectively with data distributions like this one.  But in any case,
> most of the respondents in this thread have been assuming that you
> were running a reasonably current Postgres.  Try upgrading ...

Does anybody know if the RPM that comes with RedHat 7.0 will work with
RedHat 6.2?  I've just upgraded my workstation machine to RedHat 7.0, and
I want to wait a bit before upgrading my server as well.  (If only because
lpr suddenly stopped working with the last "update" from RedHat.)

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
"American 999, Heathrow Approach.  Descend and maintain 4,000 feet, QNH 1011."
"Uhhh Heathrow Approach, could you give us that in inches?"
"American 999, descend and maintain 48,000 inches, QNH 1011."

Re: Ok, why isn't it using *this* index?

From
Doug McNaught
Date:
Paul Tomblin <ptomblin@xcski.com> writes:

> Does anybody know if the RPM that comes with RedHat 7.0 will work with
> RedHat 6.2?  I've just upgraded my workstation machine to RedHat 7.0, and
> I want to wait a bit before upgrading my server as well.  (If only because
> lpr suddenly stopped working with the last "update" from RedHat.)

It's unlikely that the binary RPM will work, but if you get the source
RPM it *should* build on 6.2, though you might have to upgrade RPM on
your 6.2 machine (if you haven't already).

I'm running 7.1beta (compiled from source tarball) on multiple 6.2
machines.  It's not too painful to manage because everything goes into
/usr/local/pgsql by default--it doesn't scatter files all over the
machine.

-Doug