Re: improving performance of UNION and ORDER BY - Mailing list pgsql-general

From: Darren Ferguson
Subject: Re: improving performance of UNION and ORDER BY
Date:
Msg-id: Pine.LNX.4.10.10203041034090.27113-100000@thread.crystalballinc.com
In response to: Re: improving performance of UNION and ORDER BY  (Chris Gamache <cgg007@yahoo.com>)
List: pgsql-general
This is not always faster, but I have sometimes found it to be significantly
faster with Postgres. See the lines below marked --->>

I have removed the join from the WHERE clause and put it in the FROM clause.

It probably won't make a difference, but I have had 7 occasions where it was
significantly faster, cutting a 9-second query down to 1 second, etc.
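
For example, here is a cut-down sketch of the two join styles, using the tables
from Chris's query below (only the placement of the join condition changes):

-- Join expressed implicitly in the WHERE clause:
SELECT a.username, tl.longtype
FROM a_trans_log a, addtypelong tl
WHERE a.trans_type = tl.shorttype
  AND a.trans_date >= '12/31/01'::TIMESTAMP;

-- The same join written explicitly in the FROM clause:
SELECT a.username, tl.longtype
FROM a_trans_log a
JOIN addtypelong tl ON a.trans_type = tl.shorttype
WHERE a.trans_date >= '12/31/01'::TIMESTAMP;

The two return identical results; the difference is that, at least in Postgres
releases of this era, explicit JOIN syntax also constrains the join order the
planner considers, which is why the plan (and the timing) can change.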

Hope this helps

Darren Ferguson

On Mon, 4 Mar 2002, Chris Gamache wrote:

> New Query...
>
> (select
>    a.username as "User",
>    a.trans_date as "Date",
>    tl.longtype as "Type",
>    a.trans_data as "Query Data",
>    a.trans_charge as "Charged",
>    a.user_reference_id as "Reference ID"
>  from a_trans_log a
--->>   join addtypelong tl on a.trans_type = tl.shorttype
>  where (
>    (a.trans_date >= '12/31/01'::TIMESTAMP)
> ) order by 4 desc, 2 limit 20)
> union
> (select
>    b.username as "User",
>    b.trans_date as "Date",
>    tl.longtype as "Type",
>    b.trans_data as "Query Data",
>    b.trans_charge as "Charged",
>    b.user_reference_id as "Reference ID"
>  from b_trans_log b
--->>   join addtypelong tl on b.trans_type = tl.shorttype
>  where (
>    (b.trans_date >= '12/31/01'::TIMESTAMP)
> ) order by 4 desc, 2 limit 20)
> order by 4 desc, 2 limit 20;
>
> Limit  (cost=12674.47..12674.47 rows=4 width=84)
>   ->  Sort  (cost=12674.47..12674.47 rows=4 width=84)
>         ->  Unique  (cost=12673.83..12674.43 rows=4 width=84)
>               ->  Sort  (cost=12673.83..12673.83 rows=40 width=84)
>                     ->  Append  (cost=12055.58..12672.77 rows=40 width=84)
>                           ->  Subquery Scan *SELECT* 1  (cost=12055.58..12055.58 rows=20 width=84)
>                                 ->  Limit  (cost=12055.58..12055.58 rows=20 width=84)
>                                       ->  Sort  (cost=12055.58..12055.58 rows=23724 width=84)
>                                             ->  Hash Join  (cost=1.20..9674.89 rows=23724 width=84)
>                                                   ->  Seq Scan on a_trans_log a  (cost=0.00..8695.30 rows=24455 width=60)
>                                                   ->  Hash  (cost=1.16..1.16 rows=16 width=24)
>                                                         ->  Seq Scan on addtypelong tl  (cost=0.00..1.16 rows=16 width=24)
>                           ->  Subquery Scan *SELECT* 2  (cost=617.19..617.19 rows=20 width=84)
>                                 ->  Limit  (cost=617.19..617.19 rows=20 width=84)
>                                       ->  Sort  (cost=617.19..617.19 rows=2462 width=84)
>                                             ->  Hash Join  (cost=1.20..478.50 rows=2462 width=84)
>                                                   ->  Seq Scan on b_trans_log b  (cost=0.00..378.61 rows=2462 width=60)
>                                                   ->  Hash  (cost=1.16..1.16 rows=16 width=24)
>                                                         ->  Seq Scan on addtypelong tl  (cost=0.00..1.16 rows=16 width=24)
>
> That cut the query down to 6 seconds, which is still about 5 seconds longer than I
> would like. I'll take any performance increase, though. I tried SET enable_seqscan=off;
> and it takes 8 seconds. Strange indeed.
>
> The reason I don't put the two tables together is that the tables don't have
> the exact same structure... I suppose I could merge the two, but it would be
> counterintuitive to logically apply some fields in table a to data collected
> for table b. Plus, I do so many more operations on the single tables than the
> joined tables that it wouldn't make good use of programming time to rewrite all
> the single table ops.
>
> I know... excuses, excuses! Thanks for the input. Any more ideas?
>
> CG
>
> --- Jean-Luc Lachance <jllachan@nsd.ca> wrote:
> > Chris,
> >
> > I believe you can cut down the processing time by including the order by
> > and limit in each select:
> >
> > (SELECT ... ORDER BY 2 DESC, 4 LIMIT 20)
> > union
> > (SELECT ... ORDER BY 2 DESC, 4 LIMIT 20)
> > ORDER BY 2 DESC, 4 LIMIT 20
> >
> > Give it a try and let me know
> >
> >
> > JLL
> >
> > P.S.
> >
> > Question:  Why not have a single table with a field log_type in ( 'A',
> > 'B') ???
> >
> >
> >
> > Chris Gamache wrote:
> > >
> > > Three tables...
> > > [...]
> > >
> > > I imagine the combination of UNION and ORDER BY causes the problem, since
> > > Postgres has to locate all the rows that match the search criteria, merge
> > > them, order them, then return the top 20...
> > >
> > > Any suggestions? Did I forget to provide any data that would make things
> > > clearer?
> >

