Re: Very specialised query - Mailing list pgsql-performance

From Matthew Wakeling
Subject Re: Very specialised query
Date
Msg-id alpine.DEB.2.00.0903311757030.21772@aragorn.flymine.org
Whole thread Raw
In response to Re: Very specialised query  (Віталій Тимчишин <tivv00@gmail.com>)
Responses Re: Very specialised query
List pgsql-performance
On Mon, 30 Mar 2009, Віталій Тимчишин wrote:
> select
> case when n == 1 then id1 else id2 end,
> case when n == 2 then id1 else id2 end
>
> from (
> SELECT
>    l1.id AS id1,
>    l2.id AS id2
> FROM
>    location l1,
>    location l2
> WHERE
>        l1.objectid = 228000093
>    AND l2.objectid = 228000093
>    AND l1.id <> l2.id
>    AND l1.start < l2.end
>    AND l1.end > l2.start
>    AND l1.start < l2.start) a, (values (1),(2)) b(n)

It is a nice idea. However, the planner gets the join the wrong way round:

select distinct
     case when n = 1 then id1 else id2 end,
     case when n = 1 then id2 else id1 end
FROM (
     select
         l1.id AS id1,
         l2.id AS id2
     FROM
         location l1,
         location l2
     WHERE
             l1.id <> l2.id
         AND l1.objectid = l2.objectid
         AND l1.start <= l2.end
         AND l2.start <= l1.end
         AND l1.start <= l2.start
     ) AS a,
     (values (1), (2)) b(n);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Unique  (cost=7366497963.75..7637346831.94 rows=36113182426 width=12)
          (actual time=1642178.623..2206678.691 rows=139606782 loops=1)
    ->  Sort  (cost=7366497963.75..7456780919.81 rows=36113182426 width=12)
              (actual time=1642178.619..1899057.147 rows=166377424 loops=1)
          Sort Key: (CASE WHEN ("*VALUES*".column1 = 1) THEN l1.subjectid ELSE l2.subjectid END), (CASE WHEN
("*VALUES*".column1= 1) THEN l2.subjectid ELSE l1.subjectid END) 
          Sort Method:  external merge  Disk: 3903272kB
          ->  Nested Loop  (cost=0.00..592890483.66 rows=36113182426 width=12)
                           (actual time=85.333..984211.011 rows=166377424 loops=1)
                ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
                                               (actual time=0.002..0.008 rows=2 loops=1)
                ->  Nested Loop  (cost=0.00..25596373.62 rows=18056591213 width=8)
                                 (actual time=42.684..322743.335 rows=83188712 loops=2)
                      Join Filter: ((l1.subjectid <> l2.subjectid) AND (l1.intermine_start <= l2.intermine_end))
                      ->  Seq Scan on location l1
                                 (cost=0.00..78076.79 rows=3490079 width=16)
                                 (actual time=0.008..3629.672 rows=3490079 loops=2)
                      ->  Index Scan using location_test_obj_start on location l2
                                 (cost=0.00..3.89 rows=152 width=16)
                                 (actual time=0.005..0.038 rows=25 loops=6980158)
                            Index Cond: ((l2.objectid = l1.objectid) AND (l2.intermine_start <= l1.intermine_end) AND
(l1.intermine_start<= l2.intermine_start)) 
  Total runtime: 2339619.383 ms

The outer nested join has the VALUES as the main loop, and the complicated
join as the leaf. So, the complicated overlap-finding join gets run twice.

Oh, there's also the great big sort and unique, but I think I can get rid
of that.

Matthew

--
 Contrary to popular belief, Unix is user friendly. It just happens to be
 very selective about who its friends are.                 -- Kyle Hearn

pgsql-performance by date:

Previous
From: Chris Browne
Date:
Subject: Re: I have a fusion IO drive available for testing
Next
From: Віталій Тимчишин
Date:
Subject: Re: Very specialised query