Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL - Mailing list pgsql-performance

From Richard Huxton
Subject Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Date
Msg-id 4B5D7866.5090900@archonet.com
Whole thread Raw
In response to Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL  (Tory M Blue <tmblue@gmail.com>)
Responses Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL  (Matthew Wakeling <matthew@flymine.org>)
List pgsql-performance
On 22/01/10 19:06, Tory M Blue wrote:

 > Here is the explain plan for the query. Actual rows that the query
 > returns is 6369

Actually, it processes 19,799 rows (see the actual rows= below).

> SLOW

> "  ->   Bitmap Heap Scan on userstats  (cost=797.69..118850.46
> rows=13399 width=8) (actual time=281.604..31190.290 rows=19799
> loops=1)"

> "Total runtime: 31219.536 ms"

> FAST

> "  ->   Bitmap Heap Scan on userstats a  (cost=802.66..118855.43
> rows=33276 width=23) (actual time=55.400..3807.908 rows=2606 loops=1)"

> "Total runtime: 3813.626 ms"

OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
per row)

The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

You are asking for DISTINCT user-ids, so it's seems reasonable that it
will take slightly longer to check a larger set of user-ids.

Otherwise, both queries are the same. I'm still a little puzzled by the
bitmap scan, but the planner probably knows more about your data than I do.

The main time is spent in the "bitmap heap scan" which is where it's
grabbing actual row data (and presumably building a hash over the uid
column). you can see how long in the "actual time" the first number
(e.g. 281.604) is the time spent before it starts, and the second is the
total time at finish (31190.290). If "loops" was greater than 1 you
would multiply the times by the number of loops to get a total.

So - there's nothing "wrong" in the sense that the second query does the
same as the first. Let's take a step back. What you really want is your
reports to be faster.

You mentioned you were running this query thousands of times with a
different "makeid" each time. Running it once for all possible values
and stashing the results in a temp table will probably be *much* faster.
The planner can just scan the whole table once and build up its results
as it goes.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "A. Kretschmer"
Date:
Subject: Re: Sql result b where condition
Next
From: Matthew Wakeling
Date:
Subject: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL