Why is PostgreSQL 9.1 not using index for simple equality select - Mailing list pgsql-general

From Yang Zhang
Subject Why is PostgreSQL 9.1 not using index for simple equality select
Date
Msg-id CAKxBDU8u8sOWy-hSoM7YCR-AhzbN+cYFpS0oc-s0yJbXg2uG_Q@mail.gmail.com
Whole thread Raw
Responses Re: Why is PostgreSQL 9.1 not using index for simple equality select
Re: Why is PostgreSQL 9.1 not using index for simple equality select
List pgsql-general
Any hints with this question I had posted to SO?

http://stackoverflow.com/questions/15965785/why-is-postgresql-9-1-not-using-index-for-simple-equality-select

Pasted here as well.  Thanks.

My table `lead` has an index:

    \d lead
    ...
    Indexes:
        "lead_pkey" PRIMARY KEY, btree (id)
        "lead_account__c" btree (account__c)
        ...
        "lead_email" btree (email)
        "lead_id_prefix" btree (id text_pattern_ops)

Why doesn't PG (9.1) use the index for this straightforward equality
selection?  Emails are almost all unique....

    db=> explain select * from lead where email = 'blah';
                             QUERY PLAN
    ------------------------------------------------------------
     Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
       Filter: (email = 'blah'::text)
    (2 rows)

Other index-hitting queries seem to be OK (though I don't know why
this one doesn't just use the pkey index):

    db=> explain select * from lead where id = '';
                                      QUERY PLAN
    ------------------------------------------------------------------------------
     Index Scan using lead_id_prefix on lead  (cost=0.00..8.57 rows=1
width=5108)
       Index Cond: (id = ''::text)
    (2 rows)

    db=> explain select * from lead where account__c = '';
                                        QUERY PLAN
    ----------------------------------------------------------------------------------
     Index Scan using lead_account__c on lead  (cost=0.00..201.05
rows=49 width=5108)
       Index Cond: (account__c = ''::text)
    (2 rows)

At first I thought it may be due to not enough distinct values of
`email`.  For instance, if the stats claim that `email` is `blah` for
most of the table, then a seq scan is faster.  But that's not the
case:

    db=> select count(*), count(distinct email) from lead;
     count  | count
    --------+--------
     749148 | 733416
    (1 row)

Even if I force seq scans to be off, the planner behaves as if it has
no other choice:

    db=> set enable_seqscan = off;
    SET
    db=> show enable_seqscan;
     enable_seqscan
    ----------------
     off
    (1 row)

    db=> explain select * from lead where email = 'foo@blah.com';
                                QUERY PLAN
    ---------------------------------------------------------------------------
     Seq Scan on lead  (cost=10000000000.00..10000319599.38 rows=1 width=5108)
       Filter: (email = 'foo@blah.com'::text)
    (2 rows)

I searched over a good number of past SO questions but none were about
a simple equality query like this one.


pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: How to convert US date format to European date format ?
Next
From: John R Pierce
Date:
Subject: Re: Why is PostgreSQL 9.1 not using index for simple equality select