Re: Performance query about large tables, lots of concurrent access - Mailing list pgsql-performance

From Karl Wright
Subject Re: Performance query about large tables, lots of concurrent access
Date
Msg-id 4677ECA9.3000402@metacarta.com
Whole thread Raw
In response to Re: Performance query about large tables, lots of concurrent access  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance query about large tables, lots of concurrent access
Re: Performance query about large tables, lots of concurrent access
List pgsql-performance
Tom Lane wrote:
> Karl Wright <kwright@metacarta.com> writes:
>> Also, as I said before, I have done extensive query analysis and found
>> that the plans for the queries that are taking a long time are in fact
>> very reasonable.  Here's an example from the application log of a query
>> that took way more time than its plan would seem to indicate it should:
>
>> [2007-06-18 09:39:49,797]ERROR  Plan: Index Scan using i1181764142395 on
>> intrinsiclink  (cost=0.00..14177.29 rows=5 width=253)
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Index Cond: ((jobid = $2) AND
>> ((childidhash)::text = ($3)::text))
>> [2007-06-18 09:39:49,797]ERROR  Plan:   Filter: ((childid = ($4)::text)
>> AND ((isnew = ($5)::bpchar) OR (isnew = ($6)::bpchar)))
>
> I see the discussion thread has moved on to consider lack-of-vacuuming
> as the main problem, but I didn't want to let this pass without
> comment.  The above plan is not necessarily good at all --- it depends
> on how many rows are selected by the index condition alone (ie, jobid
> and childidhash) versus how many are selected by the index and filter
> conditions.  If the index retrieves many rows, most of which are
> eliminated by the filter condition, it's still gonna take a long time.
>
> In this case it looks like the planner is afraid that that's exactly
> what will happen --- a cost of 14177 suggests that several thousand row
> fetches are expected to happen, and yet it's only predicting 5 rows out
> after the filter.  It's using this plan anyway because it has no better
> alternative, but you should think about whether a different index
> definition would help.
>
>             regards, tom lane
>

Well, that's odd, because the hash in question that it is using is the
SHA-1 hash of a URL.  There's essentially one row per URL in this table.
  Even with a large table I would not expect more than a couple of
collisions at most.

How does it arrive at that estimate of 14,000?

Karl


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Performance query about large tables, lots of concurrent access
Next
From: Tom Lane
Date:
Subject: Re: Performance query about large tables, lots of concurrent access