Re: a JOIN on same table, but 'slided over' - Mailing list pgsql-general

From Gurjeet Singh
Subject Re: a JOIN on same table, but 'slided over'
Msg-id 65937bea0706260726i4b904361y37a701418a52c992@mail.gmail.com
In response to Re: a JOIN on same table, but 'slided over'  (Rafal Pietrak <rafal@zorro.isa-geek.com>)
List pgsql-general
    It _is_ the optimised version... as you can see from the explain plans posted in the other mail, the planner shows that its cost is drastically lower than that of the 'distinct on' version.

    For smaller data-sets the 'distinct on' version might seem faster, but for reasonably large data-sets its performance deteriorates quadratically... This is because of the nested loops involved in the plan, which have to produce every pair of rows satisfying t2.id > t1.id before 'distinct on' can discard the duplicates.

    I increased your data-set to 10240 rows by executing the following query 10 times:

insert into test select id+(select max(id) from test), thread, info from test;
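
    For reference, here is a minimal sketch of the table this thread assumes. The original DDL was not quoted, so the column types below are guesses from the queries; only the unique constraint is inferred from the "test_id_key" index scans visible in the plans:

```sql
-- Hypothetical DDL (not from the original post): column names are
-- taken from the queries in this thread, the types are assumptions.
-- The unique constraint matches the "test_id_key" index used in the
-- plans below.
create table test (
    id     integer not null unique,
    thread integer,
    info   text
);
```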

    On such a data-set (which is not very large by any means), the standard SQL version executes in about a second; the 'distinct on' query, on the other hand, I had to cancel after letting its EXPLAIN ANALYZE run for over three minutes!
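
    A quick back-of-the-envelope calculation shows why: the join condition t2.id > t1.id matches n*(n-1)/2 row pairs before 'distinct on' throws away all but one per t1.id. A sketch of the arithmetic for n = 10240:

```sql
-- Pairs the nested loop must produce before DISTINCT ON filters them:
-- n * (n - 1) / 2 for n = 10240.
select 10240::bigint * 10239 / 2 as pairs_produced;
-- => 52423680, i.e. over 52 million pairs versus 10240 result rows
```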

postgres=# explain analyze
postgres-# select       t1.id as id, t2.id as "id+1",
postgres-#              t1.thread as thread, t2.thread as "thread+1",
postgres-#              t1.info as info, t2.info as "info+1"
postgres-# from test as t1, test as t2
postgres-# where t2.id = ( select min(id) from test as t3 where t3.id > t1.id )
postgres-# order by t1.id asc;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2971.36..2996.96 rows=10240 width=24) (actual time=1004.031..1030.116 rows=10239 loops=1)
   Sort Key: t1.id
   Sort Method:  external sort  Disk: 416kB
   ->  Merge Join  (cost=840.48..2289.28 rows=10240 width=24) (actual time=834.218..956.595 rows=10239 loops=1)
         Merge Cond: (t2.id = ((subplan)))
         ->  Index Scan using test_id_key on test t2  (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503 rows=10240 loops=1)
         ->  Sort  (cost=840.48..866.08 rows=10240 width=12) (actual time=834.129..854.776 rows=10240 loops=1)
               Sort Key: ((subplan))
               Sort Method:  quicksort  Memory: 928kB
                ->  Seq Scan on test t1  (cost=0.00..158.40 rows=10240 width=12) (actual time=0.196..797.752 rows=10240 loops=1)
                      SubPlan
                        ->  Result  (cost=0.04..0.05 rows=1 width=0) (actual time=0.062..0.064 rows=1 loops=10240)
                              InitPlan
                                ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.047..0.050 rows=1 loops=10240)
                                      ->  Index Scan using test_id_key on test t3  (cost=0.00..121.98 rows=3413 width=4) (actual time=0.038..0.038 rows=1 loops=10240)
                                           Index Cond: (id > $0)
                                           Filter: (id IS NOT NULL)
 Total runtime: 1052.802 ms
(18 rows)
Time: 1056.740 ms
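
    One further knob for the standard query: the plan above spills its top-level sort to disk ("Sort Method:  external sort  Disk: 416kB"). Raising work_mem would keep that sort in memory; this is only a sketch, and the right value depends on your memory budget:

```sql
-- Raising work_mem for the session lets the sort stay in memory
-- instead of spilling to disk.  4MB is an arbitrary example value,
-- not a recommendation.
set work_mem = '4MB';
```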

postgres=# explain analyze
postgres-# select
postgres-#     distinct on (t1.id)
postgres-#     t1.*, t2.*
postgres-# from
postgres-#     test t1
postgres-#     join test t2 on t2.id > t1.id
postgres-# order by t1.id asc, t2.id asc;
Cancel request sent
ERROR:  canceling statement due to user request
postgres=#



On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
OK. Have tried this one... looks like close to 6 times slower than the
'non-standard' phrase with 'distinct on'.

On the small dataset that I've included in my original post (ten rows of
data within TEST), I've run both queries through EXPLAIN ANALYSE, with
the following result summary (for clarity, I've cut away the details
from EXPLAIN output):

-----------STANDARD
Total runtime: 10.660 ms
-----------DISTINCT-ON
Total runtime: 1.479 ms
-----------

Would there be ways to optimise the standard query to get its
performance closer to the non-standard one?


-R


On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
> Hi Rafal,
>
>     Just a note that this is not standard SQL... 'distinct on' is an
> extension to SQL provided by Postgres.
>
> The following query uses standard SQL to get the same results:
>
> select    t1.id as id, t2.id as "id+1",
>         t1.thread as thread, t2.thread as "thread+1",
>         t1.info as info, t2.info as "info+1"
> from test as t1, test as t2
> where t2.id = ( select min(id) from test as t3 where t3.id > t1.id);
>
> HTH
> --
> gurjeet[.singh]@EnterpriseDB.com
> singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
>
> 17°29'34.37"N  78°30'59.76"E - Hyderabad *
> 18°32'57.25"N  73°56'25.42"E - Pune
>
> Sent from my BlackLaptop device
>
> On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
>         Marvelous! Thenx!
>
>         -R
>
>         On Tue, 2007-06-26 at 10:06 +0200, hubert depesz lubaczewski
>         wrote:
>         > On 6/26/07, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
>         >         Is there an SQL construct to get it?
>         >
>         > select
>         >     distinct on ( t1.id)
>         >     t1.*, t2.*
>         > from
>         >     test t1
>         >     join test t2 on t2.id > t1.id
>         > order by t1.id asc, t2.id asc
>         >
>         > should do the trick.
>         >
>         > depesz
>         >
>         > --
>         > http://www.depesz.com/ - new, better depesz
>
>



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

17°29'34.37"N  78°30'59.76"E - Hyderabad *
18°32'57.25"N  73°56'25.42"E - Pune

Sent from my BlackLaptop device
