Re: Self-referencing table question - Mailing list pgsql-sql

From Edmund Bacon
Subject Re: Self-referencing table question
Date
Msg-id 4242E732.3050600@onesystem.com
Whole thread Raw
In response to Re: Self-referencing table question  (Richard Huxton <dev@archonet.com>)
Responses Re: Self-referencing table question  (Sean Davis <sdavis2@mail.nih.gov>)
List pgsql-sql
Sometimes using a temp table is a better idea:

e.g.

-- start by creating a temp table 'tids' that hold the to_ids that
-- we are interested in. SELECT to_id   INTO TEMP TABLE tids   FROM correlation  WHERE from_id  = 1234  ORDER BY val
DESClimit 100;
 

-- The following temp table makes use of the primary key on
-- the correlation table, and the stated goal from the original
-- question that:
--         from_id > to_id
--     and from_id in (tids.to_id)
--    and to_id in (tids.to_id)
 SELECT t1.to_id AS from_id, t2.to_id   INTO TEMP TABLE from_to   FROM tids t1, tids t2  WHERE t1.to_id > t2.to_id;

-- Now we can use the from_to table as an index into the correlation
-- table.
 SELECT c.from_id, c.to_id, c.val   FROM from_to   JOIN correlation c USING(from_id, to_id)  WHERE val > 0.5;


The explain analyze for the final select works out to:Nested Loop  (cost=0.00..50692.00 rows=8488 width=16) (actual 
time=0.171..150.095 rows=2427 loops=1)  ->  Seq Scan on from_to  (cost=0.00..79.38 rows=5238 width=8) (actual 
time=0.006..7.660 rows=4950 loops=1)  ->  Index Scan using correlation_pkey on correlation c  
(cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 
loops=4950)        Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id = 
c.to_id))        Filter: (val > 0.5::double precision)Total runtime: 152.261 ms


Richard Huxton wrote:

> Sean Davis wrote:
>
>> I answer my own question, if only for my own records.  The following  
>> query is about 5-6 times faster than the original.  Of course, if  
>> anyone else has other ideas, I'd be happy to hear them.
>>
>> Sean
>>
>> explain analyze select from_id,to_id,val from exprsdb.correlation 
>> where  from_id in (select to_id from exprsdb.correlation where 
>> from_id=2424  order by val desc limit 100) and to_id in (select to_id 
>> from  exprsdb.correlation where from_id=2424 order by val desc limit 
>> 100) and  val>0.6 and to_id<from_id;
>
>
> Might not be any faster, but you can do this as a self-join with 
> subquery:
>
> SELECT c1.from_id, c1.to_id, c1.val
> FROM
>   correlation c1,
>   (
>     SELECT to_id FROM correlation WHERE from_id=2424
>     ORDER BY val DESC LIMIT 100
>   ) AS c2
>   (
>     SELECT to_id FROM correlation WHERE from_id=2424
>     ORDER BY val DESC LIMIT 100
>   ) AS c3
> WHERE
>   c1.from_id = c2.to_id
>   AND c1.to_id = c3.to_id
>   AND c1.val > 0.5
>   AND c1.to_id < from_id
> ;
>
> I think PG should be smart enough nowadays to figure out these two 
> queries are basically the same.


-- 
Edmund Bacon <ebacon@onesystem.com>



pgsql-sql by date:

Previous
From: "Greg Sabino Mullane"
Date:
Subject: Re: How do I do this?
Next
From: "Moran.Michael"
Date:
Subject: Funtions + plpgsql + contrib/pgcrypto = ??