Re: [pgsql-hackers-win32] Poor Performance for large queries - Mailing list pgsql-performance

From John Meinel
Subject Re: [pgsql-hackers-win32] Poor Performance for large queries
Date
Msg-id 415AE65E.3030200@johnmeinel.com
Whole thread Raw
In response to Re: [pgsql-hackers-win32] Poor Performance for large queries in functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> [ enlarging on Richard's response a bit ]
>
> John Meinel <john@johnmeinel.com> writes:
>
>>jfmeinel=> explain analyze execute myget(30000);
>>                                                  QUERY PLAN
>>--------------------------------------------------------------------
>>  Seq Scan on tdata  (cost=0.00..9773.10 rows=181923 width=4)
>>    (actual time=1047.000..1047.000 rows=0 loops=1)
>>    Filter: (project_id = $1)
>>  Total runtime: 1047.000 ms
>
>
>>jfmeinel=> explain analyze select id from tdata where project_id = 30000;
>>                                                          QUERY PLAN
>
>
>>-------------------------------------------------------------------------
>>  Index Scan using tdata_project_id_idx on tdata  (cost=0.00..4.20
>>rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
>>    Index Cond: (project_id = 30000)
>>  Total runtime: 0.000 ms
>
>
>>So notice that when doing the actual select it is able to do the index
>>query. But for some reason with a prepared statement, it is not able to
>>do it.
>
>
> This isn't a "can't do it" situation, it's a "doesn't want to do it"
> situation, and it's got nothing whatever to do with null or not null.
> The issue is the estimated row count, which in the first case is so high
> as to make the seqscan approach look cheaper.  So the real question here
> is what are the statistics on the column that are making the planner
> guess such a large number when it has no specific information about the
> compared-to value.  Do you have one extremely common value in the column?
> Have you done an ANALYZE recently on the table, and if so can you show
> us the pg_stats row for the column?
>
>             regards, tom lane
>

The answer is "yes" that particular column has very common numbers in
it. Project id is a number from 1->21. I ended up modifying my query
such that I do the bulk of the work in a regular UNION SELECT so that
all that can be optimized, and then I later do another query for this
row in an 'EXECUTE ...' so that unless I'm actually requesting a small
number, the query planner can notice that it can do an indexed query.

I'm pretty sure this is just avoiding worst case scenario. Because it is
true that if I use the number 18, it will return 500,000 rows. Getting
those with an indexed lookup would be very bad. But typically, I'm doing
numbers in a very different range, and so the planner was able to know
that it would not likely find that number.

Thanks for pointing out what the query planner was thinking, I was able
to work around it.

John
=:->

Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: [pgsql-hackers-win32] Poor Performance for large queries in functions
Next
From: SZŰCS Gábor
Date:
Subject: stubborn query confuses two different servers