Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> Also, as I said before, I have done extensive query analysis and found
>> that the plans for the queries that are taking a long time are in fact
>> very reasonable. Here's an example from the application log of a query
>> that took way more time than its plan would seem to indicate it should:
>
>> [2007-06-18 09:39:49,797]ERROR Plan: Index Scan using i1181764142395 on
>> intrinsiclink (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR Plan: Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR Plan: Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
> I see the discussion thread has moved on to consider lack-of-vacuuming
> as the main problem, but I didn't want to let this pass without
> comment. The above plan is not necessarily good at all --- it depends
> on how many rows are selected by the index condition alone (ie, jobid
> and childidhash) versus how many are selected by the index and filter
> conditions. If the index retrieves many rows, most of which are
> eliminated by the filter condition, it's still gonna take a long time.
>
> In this case it looks like the planner is afraid that that's exactly
> what will happen --- a cost of 14177 suggests that several thousand row
> fetches are expected to happen, and yet it's only predicting 5 rows out
> after the filter. It's using this plan anyway because it has no better
> alternative, but you should think about whether a different index
> definition would help.
>
> regards, tom lane
>
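For concreteness, the kind of redefinition Tom is suggesting would presumably pull childid (and perhaps isnew) out of the Filter and into the Index Cond. A sketch only — the index name is made up here, and the column list is inferred from the logged plan, so it would need checking against the real schema:

```sql
-- Hypothetical wider index: with childid in the key, the planner can
-- apply the childid equality as an index condition rather than fetching
-- rows and discarding them in the filter step.
CREATE INDEX intrinsiclink_jobid_childidhash_childid
    ON intrinsiclink (jobid, childidhash, childid);
```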
Well, that's odd, because the hash it is using is the SHA-1 hash of a
URL. There's essentially one row per URL in this table, so even with a
large table I would not expect more than a couple of collisions at most.
How does it arrive at that cost estimate of 14,000?
Karl
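
P.S. One way to test the theory that the filter is discarding many rows
would be to count the rows matched by the index condition alone versus
the full predicate. A sketch — the literal values below are placeholders
standing in for the bound parameters $2..$6 in the logged plan:

```sql
-- Rows fetched by the index condition alone:
SELECT count(*) FROM intrinsiclink
 WHERE jobid = 0                      -- placeholder for $2
   AND childidhash = 'somehash';      -- placeholder for $3

-- Rows surviving the filter condition as well:
SELECT count(*) FROM intrinsiclink
 WHERE jobid = 0
   AND childidhash = 'somehash'
   AND childid = 'someid'             -- placeholder for $4
   AND (isnew = 'X' OR isnew = 'Y');  -- placeholders for $5, $6
```

A large gap between the two counts would confirm that the index
condition is much less selective than the plan's 5-row output suggests.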