Re: a JOIN on same table, but 'slided over' - Mailing list pgsql-general
From | Rafal Pietrak |
---|---|
Subject | Re: a JOIN on same table, but 'slided over' |
Date | |
Msg-id | 1182892293.28091.72.camel@zorro.isa-geek.com Whole thread Raw |
In response to | a JOIN on same table, but 'slided over' (Rafal Pietrak <rafal@zorro.isa-geek.com>) |
Responses |
Re: a JOIN on same table, but 'slided over'
|
List | pgsql-general |
I see. (Have actually tried it on a larger dataset - to see it for myself ... it is optimised :) Thenx again! -R On Tue, 2007-06-26 at 19:56 +0530, Gurjeet Singh wrote: > It _is_ the optimised version.... as you can see from the explain > plans posted in the other mail, the planner shows that the cost is > drastically less than the 'distinct on' version. > > For smaller data-sets 'distinct-on' version might seem faster, but > for reasonably larger datasets, it's performance deteriorates > exponentially... This is because of the Nested-loops involved in the > plan... > > I increased your data-set to 10240 rows by executing the following > query 10 times: > > insert into test select id+(select max(id) from test), thread, info > from test; > > On such data-set (which is not very large by any means), the > standard SQL version executes in almost a second, and on the other > hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query > after letting it run for over three minutes!!! > > postgres=# explain analyze > postgres-# select t1.id as id, t2.id as "id+1", > postgres-# t1.thread as thread, t2.thread as "thread+1", > postgres-# t1.info as info, t2.info as "info+1" > postgres-# from test as t1, test as t2 > postgres-# where t2.id = ( select min(id) from test as t3 where t3.id > > t1.id ) > postgres-# order by t1.id asc; > > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Sort (cost=2971.36..2996.96 rows=10240 width=24) (actual > time=1004.031..1030.116 rows=10239 loops=1) > Sort Key: t1.id > Sort Method: external sort Disk: 416kB > -> Merge Join (cost=840.48..2289.28 rows=10240 width=24) (actual > time=834.218..956.595 rows=10239 loops=1) > Merge Cond: (t2.id = ((subplan))) > -> Index Scan using test_id_key on test t2 > (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503 > rows=10240 loops=1) > -> Sort (cost=840.48..866.08 rows=10240 width=12) (actual > time=834.129..854.776 rows=10240 loops=1) > Sort Key: ((subplan)) > Sort Method: quicksort Memory: 928kB > -> Seq Scan on test t1 (cost=0.00..158.40 rows=10240 > width=12)(actual time=0.196..797.752 rows=10240 loops=1) > SubPlan > -> Result (cost= 0.04..0.05 rows=1 width=0) > (actual time=0.062..0.064 rows=1 loops=10240) > InitPlan > -> Limit (cost=0.00..0.04 rows=1 > width=4) (actual time=0.047..0.050 rows=1 loops=10240) > -> Index Scan using test_id_key > on test t3 (cost=0.00..121.98 rows=3413 width=4) (actual time= > 0.038..0.038 rows=1 loops=10240) > Index Cond: (id > $0) > Filter: (id IS NOT NULL) > Total runtime: 1052.802 ms > (18 rows) > Time: 1056.740 ms > > postgres=# explain analyze > postgres-# select > postgres-# distinct on (t1.id) > postgres-# t1.*, t2.* > postgres-# from > postgres-# test t1 > postgres-# join test t2 on t2.id > t1.id > postgres-# order by t1.id asc, t2.id asc; > Cancel request sent > ERROR: canceling statement due to user request > postgres=# > > > > On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote: > OK. Have tried this one.... looks like close to 6 times slower > then the > 'non-standard' phrase with 'distinct on'. > > On the small dataset that I've included in my original post > (ten rows of > data within TEST), I've run both queries through EXPLAIN > ANALYSE, with > the following result summary (for clearity, I've cut away the > details > from EXPLAIN output): > > -----------STANDARD > Total runtime: 10.660 ms > -----------DISTINCT-ON > Total runtime: 1.479 ms > ----------- > > Would there be ways to optimise the standard query to get the > performance closer to the none-standard one? > > > -R > > > On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote: > > Hi Rafal, > > > > Just a note that this is not standard SQL... 'distinct > on' is an > > extension to SQL provided by postgres. > > > > Following query utilizes the standard SQL to get the same > results: > > > > select t1.id as id, t2.id as "id+1", > > t1.thread as thread, t2.thread as "thread+1", > > t1.info as info, t2.info as "info+1" > > from test as t1, test as t2 > > where t2.id = ( select min(id) from test as t3 where t3.id > > t1.id); > > > > HTH > > -- > > gurjeet[.singh]@EnterpriseDB.com > > singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com > > > > 17°29'34.37"N 78°30' 59.76"E - Hyderabad * > > 18°32'57.25"N 73°56'25.42 "E - Pune > > > > Sent from my BlackLaptop device > > > > On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote: > > Marvelous! Thenx! > > > > -R > > > > On Tue, 2007-06-26 at 10:06 +0200, hubert depesz > lubaczewski > > wrote: > > > On 6/26/07, Rafal Pietrak < > rafal@zorro.isa-geek.com> wrote: > > > Is there an SQL construct to get it? > > > > > > select > > > distinct on ( t1.id) > > > t1.*, t2.* > > > from > > > test t1 > > > join test t2 on t2.id > t1.id > > > order by t1.id asc, t2.id asc > > > > > > should do the trick. > > > > > > depesz > > > > > > -- > > > http://www.depesz.com/ - nowy, lepszy depesz > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org/ > > > > > > -- > gurjeet[.singh]@EnterpriseDB.com > singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com > > 17°29'34.37"N 78°30'59.76"E - Hyderabad * > 18°32'57.25"N 73°56'25.42"E - Pune > > Sent from my BlackLaptop device
pgsql-general by date: