Thread: Why is PostgreSQL 9.1 not using index for simple equality select
Any hints with this question I had posted to SO? http://stackoverflow.com/questions/15965785/why-is-postgresql-9-1-not-using-index-for-simple-equality-select Pasted here as well. Thanks. My table `lead` has an index: \d lead ... Indexes: "lead_pkey" PRIMARY KEY, btree (id) "lead_account__c" btree (account__c) ... "lead_email" btree (email) "lead_id_prefix" btree (id text_pattern_ops) Why doesn't PG (9.1) use the index for this straightforward equality selection? Emails are almost all unique.... db=> explain select * from lead where email = 'blah'; QUERY PLAN ------------------------------------------------------------ Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108) Filter: (email = 'blah'::text) (2 rows) Other index-hitting queries seem to be OK (though I don't know why this one doesn't just use the pkey index): db=> explain select * from lead where id = ''; QUERY PLAN ------------------------------------------------------------------------------ Index Scan using lead_id_prefix on lead (cost=0.00..8.57 rows=1 width=5108) Index Cond: (id = ''::text) (2 rows) db=> explain select * from lead where account__c = ''; QUERY PLAN ---------------------------------------------------------------------------------- Index Scan using lead_account__c on lead (cost=0.00..201.05 rows=49 width=5108) Index Cond: (account__c = ''::text) (2 rows) At first I thought it may be due to not enough distinct values of `email`. For instance, if the stats claim that `email` is `blah` for most of the table, then a seq scan is faster. But that's not the case: db=> select count(*), count(distinct email) from lead; count | count --------+-------- 749148 | 733416 (1 row) Even if I force seq scans to be off, the planner behaves as if it has no other choice: db=> set enable_seqscan = off; SET db=> show enable_seqscan; enable_seqscan ---------------- off (1 row) db=> explain select * from lead where email = 'foo@blah.com'; QUERY PLAN --------------------------------------------------------------------------- Seq Scan on lead (cost=10000000000.00..10000319599.38 rows=1 width=5108) Filter: (email = 'foo@blah.com'::text) (2 rows) I searched over a good number of past SO questions but none were about a simple equality query like this one.
On 4/12/2013 1:03 AM, Yang Zhang wrote: > db=> explain select * from lead where email = 'blah'; > QUERY PLAN > ------------------------------------------------------------ > Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108) > Filter: (email = 'blah'::text) try EXPLAIN ANALYZE ..... its more useful. my guess is, there's no statistics on this table, and doing an ANALYZE lead; would rectify this. -- john r pierce 37N 122W somewhere on the middle of the left coast
Doesn't seem to be the case. This table has been around for a while and should have been auto-analyzed by now. But anyway: db=> analyze lead; ANALYZE db=> explain select * from lead where email = 'foo@blah.com'; QUERY PLAN --------------------------------------------------------------------------- Seq Scan on lead (cost=10000000000.00..10000319666.99 rows=1 width=5208) Filter: (email = 'foo@blah.com'::text) (2 rows) On Fri, Apr 12, 2013 at 1:13 AM, John R Pierce <pierce@hogranch.com> wrote: > On 4/12/2013 1:03 AM, Yang Zhang wrote: >> >> db=> explain select * from lead where email = 'blah'; >> QUERY PLAN >> ------------------------------------------------------------ >> Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108) >> Filter: (email = 'blah'::text) > > > > try EXPLAIN ANALYZE ..... its more useful. > > my guess is, there's no statistics on this table, and doing an ANALYZE lead; > would rectify this. > > > > -- > john r pierce 37N 122W > somewhere on the middle of the left coast > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- Yang Zhang http://yz.mit.edu/
On 12 April 2013 10:45, Yang Zhang <yanghatespam@gmail.com> wrote:
What about:
explain analyze select * from lead where email = 'foo@blah.com';
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
explain select * from lead where email = 'foo@blah.com';
explain analyze select * from lead where email = 'foo@blah.com';
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
On 4/12/2013 1:45 AM, Yang Zhang wrote:
db=> explain select * from lead where email = 'foo@blah.com';
can you try
explain analyze select * from lead where email = 'foo@blah.com';?
-- john r pierce 37N 122W somewhere on the middle of the left coast
Yang Zhang <yanghatespam@gmail.com> writes: > db=> explain select * from lead where email = 'blah'; > QUERY PLAN > ------------------------------------------------------------ > Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108) > Filter: (email = 'blah'::text) > (2 rows) That's awfully odd. What data type is the "email" column? It seems possible also that the index on it is marked invalid. I'd have expected \d to tell you so, but maybe you're using a version of psql that doesn't know about that. It'd be interesting to look at select * from pg_index where indexrelid = 'index name here'::regclass; regards, tom lane
It's actually just `text`. I updated my SO question with some more info including explain analyze (no difference), \d, and your last incantation. Thanks! On Fri, Apr 12, 2013 at 7:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yang Zhang <yanghatespam@gmail.com> writes: >> db=> explain select * from lead where email = 'blah'; >> QUERY PLAN >> ------------------------------------------------------------ >> Seq Scan on lead (cost=0.00..319599.38 rows=1 width=5108) >> Filter: (email = 'blah'::text) >> (2 rows) > > That's awfully odd. What data type is the "email" column? > > It seems possible also that the index on it is marked invalid. I'd have > expected \d to tell you so, but maybe you're using a version of psql that > doesn't know about that. It'd be interesting to look at > select * from pg_index where indexrelid = 'index name here'::regclass; > > > regards, tom lane -- Yang Zhang http://yz.mit.edu/
Yang Zhang <yanghatespam@gmail.com> writes: > I updated my SO question with some more info including explain analyze > (no difference), \d, > and your last incantation. The question is being asked here, not in SO, and I find it rather impolite of you to expect me to go chasing off to some other forum to answer your question. regards, tom lane
Apologies for that Tom. I will paste the information in line once I'm back at my computer. I do appreciate your help. On Fri, Apr 12, 2013 at 10:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yang Zhang <yanghatespam@gmail.com> writes: >> I updated my SO question with some more info including explain analyze >> (no difference), \d, >> and your last incantation. > > The question is being asked here, not in SO, and I find it rather > impolite of you to expect me to go chasing off to some other forum > to answer your question. > > regards, tom lane -- Yang Zhang http://yz.mit.edu/