Re: Slow query on OS X box - Mailing list pgsql-performance

From Patrick Hatcher
Subject Re: Slow query on OS X box
Date
Msg-id OFD9C08201.63194703-ON88256CB6.006EA449@fds.com
Whole thread Raw
In response to Slow query on OS X box  ("Patrick Hatcher" <PHatcher@macys.com>)
Responses Re: Slow query on OS X box  (Rod Taylor <rbt@rbt.ca>)
List pgsql-performance
Sorry I'm being really dense today.  I didn't even notice the 3.2 million
row being returned. :(

To answer your question, no, all fields would not have data.  The data we
receive is from a Web log file.  It's parsed and then uploaded to this
table.

I guess the bigger issue is that when trying to do aggregates, grouping by
the wizard field, it takes just as long.

Ex:
mdc_oz=# explain analyze select wizard,count(wizard) from search_log where
sdate
 between '2002-12-01' and '2003-01-15' group by wizard;
                                                                  QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual
time=229503.85..302617.75 rows=14 loops=1)
   ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual
time=229503.60..286014.83 rows=3717161 loops=1)
         ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10)
(actual time=229503.57..248415.81 rows=3717161 loops=1)
               Sort Key: wizard
               ->  Seq Scan on search_log  (cost=0.00..575217.57
rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161
loops=1)
                     Filter: ((sdate >= '2002-12-01'::date) AND (sdate
<= '2003-01-15'::date))
 Total runtime: 302712.48 msec
(7 rows)

Thanks again for the help
Patrick Hatcher





                    Rod Taylor
                    <rbt@rbt.ca>         To:     Patrick Hatcher <PHatcher@macys.com>
                                         cc:     Postgresql Performance <pgsql-performance@postgresql.org>
                    01/22/2003           Subject:     Re: [PERFORM] Slow query on OS X box
                    12:02 PM






On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> I have a table that contains over 13 million rows.  This query takes an
> extremely long time to return.  I've vacuum full, analyzed, and
re-indexed
> the table. Still the results are the same.  Any ideas?

Yeah, you're pulling out 3.2 million rows from (possibly) a wide table
bytewise.  Do all of those fields actually have data?  Thats always
going to take a while -- and I find it hard to believe you're actually
doing something with all of those rows that runs regularly.

If every one of those rows was maxed out (ignoring the text field at the
end) you could have ~ 15GB of data to pull out.  Without knowing the
type of data actually in the table, I'm going to bet it's a harddrive
limitation.

The index on 'wizard' is next to useless as at least 1/4 of the data in
the table is under the same key.  You might try a partial index on
'wizard' (skip the value 'Keyword').  It won't help this query, but
it'll help ones looking for values other than 'Keyword'.

Anyway, you might try a CURSOR.  Fetch rows out 5000 at a time, do some
work with them, then grab some more.  This -- more or less -- will allow
you to process the rows received while awaiting the remaining lines to
be processed by the database. Depending on what you're doing with them
it'll give a chance for the diskdrive to catch up.  If the kernels smart
it'll read ahead of the scan.  This doesn't remove read time, but hides
it while you're transferring the data out (from the db to your client)
or processing it.

> mdc_oz=# explain analyze select wizard from search_log where wizard
> ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
>                                                          QUERY PLAN
>

-----------------------------------------------------------------------------------------------------------------------------

>  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10)
> (actual time=99833.83..162951.25 rows=3280573 loops=1)
>    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >
> = '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
>  Total runtime: 174713.25 msec
> (3 rows)
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
(See attached file: signature.asc)


Attachment

pgsql-performance by date:

Previous
From: Ron Johnson
Date:
Subject: Re: Slow query on OS X box
Next
From: Tomasz Myrta
Date:
Subject: Re: Slow query on OS X box