Re: a JOIN on same table, but 'slided over' - Mailing list pgsql-general
| From | Rafal Pietrak |
|---|---|
| Subject | Re: a JOIN on same table, but 'slided over' |
| Date | |
| Msg-id | 1182892293.28091.72.camel@zorro.isa-geek.com Whole thread Raw |
| In response to | a JOIN on same table, but 'slided over' (Rafal Pietrak <rafal@zorro.isa-geek.com>) |
| Responses |
Re: a JOIN on same table, but 'slided over'
|
| List | pgsql-general |
I see. (Have actually tried it on a larger dataset - to see it for
myself ... it is optimised :)
Thenx again!
-R
On Tue, 2007-06-26 at 19:56 +0530, Gurjeet Singh wrote:
> It _is_ the optimised version.... as you can see from the explain
> plans posted in the other mail, the planner shows that the cost is
> drastically less than the 'distinct on' version.
>
> For smaller data-sets 'distinct-on' version might seem faster, but
> for reasonably larger datasets, it's performance deteriorates
> exponentially... This is because of the Nested-loops involved in the
> plan...
>
> I increased your data-set to 10240 rows by executing the following
> query 10 times:
>
> insert into test select id+(select max(id) from test), thread, info
> from test;
>
> On such data-set (which is not very large by any means), the
> standard SQL version executes in almost a second, and on the other
> hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query
> after letting it run for over three minutes!!!
>
> postgres=# explain analyze
> postgres-# select t1.id as id, t2.id as "id+1",
> postgres-# t1.thread as thread, t2.thread as "thread+1",
> postgres-# t1.info as info, t2.info as "info+1"
> postgres-# from test as t1, test as t2
> postgres-# where t2.id = ( select min(id) from test as t3 where t3.id
> > t1.id )
> postgres-# order by t1.id asc;
>
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=2971.36..2996.96 rows=10240 width=24) (actual
> time=1004.031..1030.116 rows=10239 loops=1)
> Sort Key: t1.id
> Sort Method: external sort Disk: 416kB
> -> Merge Join (cost=840.48..2289.28 rows=10240 width=24) (actual
> time=834.218..956.595 rows=10239 loops=1)
> Merge Cond: (t2.id = ((subplan)))
> -> Index Scan using test_id_key on test t2
> (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503
> rows=10240 loops=1)
> -> Sort (cost=840.48..866.08 rows=10240 width=12) (actual
> time=834.129..854.776 rows=10240 loops=1)
> Sort Key: ((subplan))
> Sort Method: quicksort Memory: 928kB
> -> Seq Scan on test t1 (cost=0.00..158.40 rows=10240
> width=12)(actual time=0.196..797.752 rows=10240 loops=1)
> SubPlan
> -> Result (cost= 0.04..0.05 rows=1 width=0)
> (actual time=0.062..0.064 rows=1 loops=10240)
> InitPlan
> -> Limit (cost=0.00..0.04 rows=1
> width=4) (actual time=0.047..0.050 rows=1 loops=10240)
> -> Index Scan using test_id_key
> on test t3 (cost=0.00..121.98 rows=3413 width=4) (actual time=
> 0.038..0.038 rows=1 loops=10240)
> Index Cond: (id > $0)
> Filter: (id IS NOT NULL)
> Total runtime: 1052.802 ms
> (18 rows)
> Time: 1056.740 ms
>
> postgres=# explain analyze
> postgres-# select
> postgres-# distinct on (t1.id)
> postgres-# t1.*, t2.*
> postgres-# from
> postgres-# test t1
> postgres-# join test t2 on t2.id > t1.id
> postgres-# order by t1.id asc, t2.id asc;
> Cancel request sent
> ERROR: canceling statement due to user request
> postgres=#
>
>
>
> On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> OK. Have tried this one.... looks like close to 6 times slower
> then the
> 'non-standard' phrase with 'distinct on'.
>
> On the small dataset that I've included in my original post
> (ten rows of
> data within TEST), I've run both queries through EXPLAIN
> ANALYSE, with
> the following result summary (for clearity, I've cut away the
> details
> from EXPLAIN output):
>
> -----------STANDARD
> Total runtime: 10.660 ms
> -----------DISTINCT-ON
> Total runtime: 1.479 ms
> -----------
>
> Would there be ways to optimise the standard query to get the
> performance closer to the none-standard one?
>
>
> -R
>
>
> On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
> > Hi Rafal,
> >
> > Just a note that this is not standard SQL... 'distinct
> on' is an
> > extension to SQL provided by postgres.
> >
> > Following query utilizes the standard SQL to get the same
> results:
> >
> > select t1.id as id, t2.id as "id+1",
> > t1.thread as thread, t2.thread as "thread+1",
> > t1.info as info, t2.info as "info+1"
> > from test as t1, test as t2
> > where t2.id = ( select min(id) from test as t3 where t3.id >
> t1.id);
> >
> > HTH
> > --
> > gurjeet[.singh]@EnterpriseDB.com
> > singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
> >
> > 17°29'34.37"N 78°30' 59.76"E - Hyderabad *
> > 18°32'57.25"N 73°56'25.42 "E - Pune
> >
> > Sent from my BlackLaptop device
> >
> > On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> > Marvelous! Thenx!
> >
> > -R
> >
> > On Tue, 2007-06-26 at 10:06 +0200, hubert depesz
> lubaczewski
> > wrote:
> > > On 6/26/07, Rafal Pietrak <
> rafal@zorro.isa-geek.com> wrote:
> > > Is there an SQL construct to get it?
> > >
> > > select
> > > distinct on ( t1.id)
> > > t1.*, t2.*
> > > from
> > > test t1
> > > join test t2 on t2.id > t1.id
> > > order by t1.id asc, t2.id asc
> > >
> > > should do the trick.
> > >
> > > depesz
> > >
> > > --
> > > http://www.depesz.com/ - nowy, lepszy depesz
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org/
> >
>
>
>
> --
> gurjeet[.singh]@EnterpriseDB.com
> singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
>
> 17°29'34.37"N 78°30'59.76"E - Hyderabad *
> 18°32'57.25"N 73°56'25.42"E - Pune
>
> Sent from my BlackLaptop device
pgsql-general by date: