Re: Query tuning - Mailing list pgsql-performance

From Kevin Kempter
Subject Re: Query tuning
Date
Msg-id 200908191136.55428.kevink@consistentstate.com
Whole thread Raw
In response to Re: Query tuning  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Wednesday 19 August 2009 11:17:26 Scott Carey wrote:
> On 8/19/09 9:28 AM, "Kevin Kempter" <kevink@consistentstate.com> wrote:
> > Hi all;
> >
> > we've been fighting this query for a few days now. we bumped up the
> > statistict target for the a.id , c.url_hits_id and the b.id columns below
> > to 250 and ran an analyze on the relevant tables.  we killed it after
> > 8hrs.
> >
> > Note the url_hits table has > 1.4billion rows
> >
> > Any suggestions?
>
> Have you tried setting work_mem higher for just this query?

Yes, we upped it to 500Meg


>
> The big estimated cost is the sequential scan on url_hits.  But in reality,
> if the estimates are off the sort and index scan at the end might be your
> bottleneck.  Larger work_mem might make it choose another plan there.
>
> But if the true cost is the sequential scan on url_hits, then only an index
> there will help.
>
> > $ psql -ef expl.sql pwreport
> > explain
> > select
> > a.id,
> > ident_id,
> > time,
> > customer_name,
> > extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
> > category_id
> > from
> > pwreport.url_hits a left outer join
> > pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
> > pwreport.ident b,
> > pwreport.timezone e
> > where
> > a.ident_id = b.id
> > and b.timezone_id = e.id
> > and time >= extract ('epoch' from timestamp '2009-08-12')
> > and time < extract ('epoch' from timestamp '2009-08-13' )
> > and direction = 'REQUEST'
> > ;
> >
> > QUERY
> > PLAN
> > -------------------------------------------------------------------------
> >-----
> > -------------------------------------------------------------------------
> >----- --------------------------------------------------------
> >  Merge Right Join  (cost=47528508.61..180424544.59 rows=10409251
> > width=53) Merge Cond: (c.url_hits_id = a.id)
> >    ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on
> > url_hits_category_jt c  (cost=0.00..122162596.63 rows=4189283233 width=8)
> >    ->  Sort  (cost=47528508.61..47536931.63 rows=3369210 width=49)
> >          Sort Key: a.id
> >          ->  Hash Join  (cost=2565.00..47163219.21 rows=3369210 width=49)
> >                Hash Cond: (b.timezone_id = e.id)
> >                ->  Hash Join  (cost=2553.49..47116881.07 rows=3369210
> > width=37)
> >                      Hash Cond: (a.ident_id = b.id)
> >                      ->  Seq Scan on url_hits a  (cost=0.00..47051154.89
> > rows=3369210 width=12)
> >                            Filter: ((direction =
> > 'REQUEST'::proxy_direction_enum) AND (("time")::double precision >=
> > 1250035200::double precision) AND (("time")::double precision <
> > 1250121600::double precision))
> >                      ->  Hash  (cost=2020.44..2020.44 rows=42644
> > width=29) ->  Seq Scan on ident b  (cost=0.00..2020.44 rows=42644
> > width=29)
> >                ->  Hash  (cost=6.78..6.78 rows=378 width=20)
> >                      ->  Seq Scan on timezone e  (cost=0.00..6.78
> > rows=378 width=20)
> > (15 rows)
> >
> >
> > --
> > Sent via pgsql-performance mailing list
> > (pgsql-performance@postgresql.org) To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance


pgsql-performance by date:

Previous
From: Nikolas Everett
Date:
Subject: Re: Query tuning
Next
From: Kevin Kempter
Date:
Subject: Re: Query tuning