Re: Slow query on OS X box - Mailing list pgsql-performance
From: Patrick Hatcher
Subject: Re: Slow query on OS X box
Date:
Msg-id: OF2F8C390D.61ED20AE-ON88256CB7.00006621@fds.com
In response to: Slow query on OS X box ("Patrick Hatcher" <PHatcher@macys.com>)
List: pgsql-performance
Thanks everyone. I'll give your suggestions a try and report back.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM

Rod Taylor <rbt@rbt.ca>
Sent by: pgsql-performance-owner@postgresql.org
To: Patrick Hatcher <PHatcher@macys.com>
cc: Postgresql Performance <pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Slow query on OS X box
01/22/2003 02:54 PM

Yup, since you still need to pull everything off the disk (the slowest part), which is quite a bit of data. You're simply dealing with a lot of data for a single query -- not much you can do.

Is this a dedicated box -- one client doing big selects like this? Knock your shared_buffers down to about 2000, and bump your sort_mem up to around 32MB (128MB or so if it's a dedicated box where the vast majority of queries look like the one below).

Okay, now to do something about the rest of the data. 13 million rows * 2k is a big number. Do you have a set of columns that are rarely used? If so, toss them into a separate table and link them via a unique identifier (int4). It'll cost extra when you do hit them, but pulling out a few of the larger, information-heavy ones would buy quite a bit.

Now, wizard. For that particular query it would be best if an entry were made in a lookup table for each distinct value of wizard, and search_log.wizard were changed into a reference to that entry. Index the lookup table well (one index on the int4 primary key, and a unique index on the 'wizard' varchar). Group by the number, then join to the lookup table for the name.

Any other columns with highly repetitive data? You might want to consider doing the same for them.

In search_log, index the numeric representation of 'wizard' (the key from the lookup table), but don't bother indexing values that occur frequently -- look up how to create a partial index. E.g. the value 'Keyword' could be skipped, as it occurs in one of every four tuples -- too often for an index to be useful.
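A minimal sketch of the lookup-table rewrite Rod describes; the table and column names (wizard_lookup, wizard_id) are hypothetical, only search_log, wizard, and sdate come from the thread:

```sql
-- Hypothetical lookup table for the repetitive 'wizard' values.
CREATE TABLE wizard_lookup (
    wizard_id  int4 PRIMARY KEY,        -- small surrogate key
    wizard     varchar NOT NULL UNIQUE  -- the original text value
);

-- Replace the varchar column with a reference to the lookup table.
ALTER TABLE search_log ADD COLUMN wizard_id int4 REFERENCES wizard_lookup;
UPDATE search_log
   SET wizard_id = w.wizard_id
  FROM wizard_lookup w
 WHERE search_log.wizard = w.wizard;

-- Group by the number, then join to the lookup table for the name.
SELECT w.wizard, s.cnt
  FROM (SELECT wizard_id, count(*) AS cnt
          FROM search_log
         WHERE sdate BETWEEN '2002-12-01' AND '2003-01-15'
         GROUP BY wizard_id) s
  JOIN wizard_lookup w USING (wizard_id);
```

The grouping and sorting then operate on a 4-byte integer rather than a varchar, and the join touches only the handful of rows that survive the GROUP BY (14 in the plan below).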
On Wed, 2003-01-22 at 15:49, Patrick Hatcher wrote:
> Sorry I'm being really dense today. I didn't even notice the 3.2 million
> rows being returned. :(
>
> To answer your question, no, all fields would not have data. The data we
> receive is from a Web log file. It's parsed and then uploaded to this
> table.
>
> I guess the bigger issue is that when trying to do aggregates, grouping by
> the wizard field, it takes just as long.
>
> Ex:
> mdc_oz=# explain analyze select wizard,count(wizard) from search_log where
> sdate between '2002-12-01' and '2003-01-15' group by wizard;
>                                                            QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1083300.43..1112411.55 rows=388148 width=10) (actual time=229503.85..302617.75 rows=14 loops=1)
>    ->  Group  (cost=1083300.43..1102707.84 rows=3881482 width=10) (actual time=229503.60..286014.83 rows=3717161 loops=1)
>          ->  Sort  (cost=1083300.43..1093004.14 rows=3881482 width=10) (actual time=229503.57..248415.81 rows=3717161 loops=1)
>                Sort Key: wizard
>                ->  Seq Scan on search_log  (cost=0.00..575217.57 rows=3881482 width=10) (actual time=91235.76..157559.58 rows=3717161 loops=1)
>                      Filter: ((sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
>  Total runtime: 302712.48 msec
> (7 rows)
>
> On Wed, 2003-01-22 at 13:26, Patrick Hatcher wrote:
> > I have a table that contains over 13 million rows. This query takes an
> > extremely long time to return. I've vacuum full, analyzed, and re-indexed
> > the table. Still the results are the same. Any ideas?
> > mdc_oz=# explain analyze select wizard from search_log where wizard
> > ='Keyword' and sdate between '2002-12-01' and '2003-01-15';
> >                                                       QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------
> >  Seq Scan on search_log  (cost=0.00..609015.34 rows=3305729 width=10) (actual time=99833.83..162951.25 rows=3280573 loops=1)
> >    Filter: ((wizard = 'Keyword'::character varying) AND (sdate >= '2002-12-01'::date) AND (sdate <= '2003-01-15'::date))
> >  Total runtime: 174713.25 msec
> > (3 rows)
> --
> Rod Taylor <rbt@rbt.ca>
>
> PGP Key: http://www.rbt.ca/rbtpub.asc
> (See attached file: signature.asc)

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc
(See attached file: signature.asc)
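For reference, the partial-index idea from Rod's reply could be sketched as below. This is an illustration, not from the thread: the point is that 'Keyword' itself is excluded from the index precisely because it matches roughly one in four rows, so the index stays small and is used only for the rarer values.

```sql
-- Partial index on wizard that skips the overly common value.
-- Queries like  WHERE wizard = 'SomeRareWizard'  can use it;
-- WHERE wizard = 'Keyword' cannot (and shouldn't -- a seq scan wins there).
CREATE INDEX search_log_wizard_part_idx
    ON search_log (wizard)
 WHERE wizard <> 'Keyword';
```

And the configuration advice, as a postgresql.conf fragment (values are Rod's suggestions; shared_buffers is in 8k pages, sort_mem in kilobytes on PostgreSQL of this era):

```
shared_buffers = 2000   # ~16MB of shared buffer cache
sort_mem = 32768        # 32MB per sort; raise toward 131072 on a dedicated box
```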