Tom Lane wrote:
> [ enlarging on Richard's response a bit ]
>
> John Meinel <john@johnmeinel.com> writes:
>
>>jfmeinel=> explain analyze execute myget(30000);
>> QUERY PLAN
>>--------------------------------------------------------------------
>> Seq Scan on tdata (cost=0.00..9773.10 rows=181923 width=4)
>> (actual time=1047.000..1047.000 rows=0 loops=1)
>> Filter: (project_id = $1)
>> Total runtime: 1047.000 ms
>
>
>>jfmeinel=> explain analyze select id from tdata where project_id = 30000;
>> QUERY PLAN
>
>
>>-------------------------------------------------------------------------
>> Index Scan using tdata_project_id_idx on tdata (cost=0.00..4.20
>>rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
>> Index Cond: (project_id = 30000)
>> Total runtime: 0.000 ms
>
>
>>So notice that when doing the actual select it is able to do the index
>>query. But for some reason with a prepared statement, it is not able to
>>do it.
>
>
> This isn't a "can't do it" situation, it's a "doesn't want to do it"
> situation, and it's got nothing whatever to do with null or not null.
> The issue is the estimated row count, which in the first case is so high
> as to make the seqscan approach look cheaper. So the real question here
> is what are the statistics on the column that are making the planner
> guess such a large number when it has no specific information about the
> compared-to value. Do you have one extremely common value in the column?
> Have you done an ANALYZE recently on the table, and if so can you show
> us the pg_stats row for the column?
>
> regards, tom lane
>
The answer is "yes" that particular column has very common numbers in
it. Project id is a number from 1->21. I ended up modifying my query
such that I do the bulk of the work in a regular UNION SELECT so that
all that can be optimized, and then I later do another query for this
row in an 'EXECUTE ...' so that unless I'm actually requesting a small
number, the query planner can notice that it can do an indexed query.
I'm pretty sure this is just avoiding worst case scenario. Because it is
true that if I use the number 18, it will return 500,000 rows. Getting
those with an indexed lookup would be very bad. But typically, I'm doing
numbers in a very different range, and so the planner was able to know
that it would not likely find that number.
Thanks for pointing out what the query planner was thinking, I was able
to work around it.
John
=:->