
From: Richard Huxton
Subject: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Msg-id: 4B59ED7C.2060908@archonet.com
In response to: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL (Tory M Blue <tmblue@gmail.com>)
List: pgsql-performance
On 22/01/10 18:03, Tory M Blue wrote:
> On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton<dev@archonet.com>  wrote:
>
>> On 21/01/10 22:15, Tory M Blue wrote:

>> 2. If it's mostly tagged=true you are interested in you can always use a
>> partial index: CREATE INDEX ... (makeid) WHERE tagged
>> This might be a win even if you need a second index with WHERE NOT tagged.
>>
>
> Partial index doesn't seem to fit here due to the fact that there are 35-40%
> Marked True.
>
> Didn't think about creating a second index for false, may give that a shot.

If you're mostly searching tagged=true, try the partial index - it'll mean
the planner is just scanning the index for the one term.
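
For example, something like this (the table name here is only guessed from
the index name in your plan, so adjust to suit):

   CREATE INDEX idx_retarget_makeid_tagged
       ON retargetuserstats (makeid) WHERE tagged;

   -- and, if you also query the false case a lot:
   CREATE INDEX idx_retarget_makeid_nottagged
       ON retargetuserstats (makeid) WHERE NOT tagged;

Each partial index only contains the rows matching its WHERE clause, so the
"makeid = ... AND tagged" search only has to scan index entries for makeid.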

>> Also, either I've not had enough coffee yet, or a bitmap scan is an odd
>> choice for only ~ 13000 rows out of 100 million.
>>
>>> "        ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)"
>>> "              Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))"
>>
>> Otherwise, see what Craig said.
>>
>> I'm assuming this isn't the query that is CPU bound for a long time. Unless
>> your table is horribly bloated, there's no reason for that judging by this
>> plan.
>
> It is, but not always, only when there are 10K or more matches. And the
> explain unfortunately is sometimes way high or way low, so the explain is
> hit and miss.
>
> But the same SQL that returns maybe 500 rows is pretty fast; it's the return
> of 10K+ rows that seems to stall and is CPU bound.

Hmm - might be able to push that cross-over point up a bit by tweaking
various costs, but you've got to be careful you don't end up making all
your other queries worse. It'd be good to figure out what the problem is
first.
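
(The usual knobs for that are the planner cost settings - something like the
below, done per-session so it's easy to test and undo. Whether any of them
actually helps here is something you'd have to measure.)

   SET random_page_cost = 2.0;   -- default 4.0; lower makes index/bitmap scans look cheaper
   EXPLAIN ANALYZE <your query here>;
   RESET random_page_cost;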

Looking at the query there are four stages:
  1. Scan the index, build a bitmap of heap pages with matching rows
  2. Scan those pages, find the rows that match
  3. Run DISTINCT on the uids
  4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or
count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful
than straight EXPLAIN here. That will show actual times for each stage.
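
Something along these lines (I'm guessing at the table name and query shape
from your plan - substitute the real query):

   -- roughly the current form
   EXPLAIN ANALYZE
   SELECT count(DISTINCT uid)
     FROM retargetuserstats
    WHERE makeid = 'b1mw-ferman' AND tagged;

   -- same thing without the DISTINCT, for comparison
   EXPLAIN ANALYZE
   SELECT count(uid)
     FROM retargetuserstats
    WHERE makeid = 'b1mw-ferman' AND tagged;

If the second one is much faster, the DISTINCT step is where the time goes.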

On Craig's branch of this thread, you say you call it 6000 times with
different "makeid"s. Any reason why you can't join to a temp table and
just do it in one query?
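
Rough sketch (same guessed table name as above; load the ~6000 makeids into
the temp table however is convenient - COPY, INSERT, etc.):

   CREATE TEMP TABLE wanted_makeids (makeid text PRIMARY KEY);
   -- ... populate wanted_makeids with the 6000 ids ...

   SELECT s.makeid, count(DISTINCT s.uid)
     FROM retargetuserstats s
     JOIN wanted_makeids w ON w.makeid = s.makeid
    WHERE s.tagged
    GROUP BY s.makeid;

That's one pass over the big table instead of 6000 separate scans.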

--
   Richard Huxton
   Archonet Ltd
