Re: Slow query on OS X box - Mailing list pgsql-performance

From Rod Taylor
Subject Re: Slow query on OS X box
Date
Msg-id 1043276059.83856.180.camel@jester
Whole thread Raw
In response to Re: Slow query on OS X box  ("Patrick Hatcher" <PHatcher@macys.com>)
List pgsql-performance
Yup, since you still need to pull everything off the disk (the slowest
part), which is quite a bit of data.  You're simply dealing with a lot
of data for a single query -- not much you can do.

Is this a dedicated -- one client doing big selects like this?

Knock your shared_buffers down to about 2000, bump your sort mem up to
around 32MB (128MB or so if it's a dedicated box with a vast majority of
queries like the below).


Okay, need to do something about the rest of the data. 13million * 2k is
a big number.  Do you have a set of columns that are rarely used?  If
so, toss them into a separate table and link via a unique identifier
(int4).  It'll cost extra when you do hit them, but pulling out a few of
the large ones information wise would buy quite a bit.

Now, wizard.  For that particular query it would be best if entries were
made for all the values of wizard into a lookup table, and change
search_log.wizard into a reference to that entry in the lookup.  Index
the lookup table well (one in the wizard primary key -- int4, and a
unique index on the 'wizard' varchar).  Group by the number, join to the
lookup table for the name.

Any other values with highly repetitive data?  Might want to consider
doing the same for them.

In search_log, index the numeric representation of 'wizard' (key from
lookup table), but don't bother indexing numbers that occur regularly.
Look up how to create a partial index.  Ie. The value 'Keyword' could be
skipped as it occurs once in four tuples -- too often for an index to be
useful.


On Wed, 2003-01-22 at 15:49, Patrick Hatcher wrote:
> Sorry I'm being really dense today.  I didn't even notice the 3.2 million
> row being returned. :(
>
> To answer your question, no, all fields would not have data.  The data we
> receive is from a Web log file.  It's parsed and then uploaded to this
> table.
>
> I guess the bigger issue is that when trying to do aggregates, grouping by
> the wizard field, it takes just as long.
>
> Ex:
> mdc_oz=# explain analyze select wizard,count(wizard) from search_log where
> sdate
>  between '2002-12-01' and '2003-01-15' group by wizard;
>                                                                   QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual
> time=229503.85..302617.75 rows=14 loops=1)
>    ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual
> time=229503.60..286014.83 rows=3717161 loops=1)
>          ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10)
> (actual time=229503.57..248415.81 rows=3717161 loops=1)
>                Sort Key: wizard
>                ->  Seq Scan on search_log  (cost=0.00..575217.57
> rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161
> loops=1)
>                      Filter: ((sdate >= '2002-12-01'::date) AND (sdate
> <= '2003-01-15'::date))
>  Total runtime: 302712.48 msec
> (7 rows)

> On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> > I have a table that contains over 13 million rows.  This query takes an
> > extremely long time to return.  I've vacuum full, analyzed, and
> re-indexed
> > the table. Still the results are the same.  Any ideas?

> > mdc_oz=# explain analyze select wizard from search_log where wizard
> > ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
> >                                                          QUERY PLAN
> >
>
-----------------------------------------------------------------------------------------------------------------------------
>
> >  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10)
> > (actual time=99833.83..162951.25 rows=3280573 loops=1)
> >    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >
> > = '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
> >  Total runtime: 174713.25 msec
> > (3 rows)
> --
> Rod Taylor <rbt@rbt.ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
> (See attached file: signature.asc)
>
>
> ______________________________________________________________________
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Slow query on OS X box
Next
From: Tom Lane
Date:
Subject: Proposal: relaxing link between explicit JOINs and execution order