Re: Self-referencing table question - Mailing list pgsql-sql

From Sean Davis
Subject Re: Self-referencing table question
Date
Msg-id 1303f92d183bfd2ff7c522d4ad5afde8@mail.nih.gov
Whole thread Raw
In response to Re: Self-referencing table question  (Edmund Bacon <ebacon@onesystem.com>)
Responses Re: Self-referencing table question  (Edmund Bacon <ebacon@onesystem.com>)
List pgsql-sql
Thanks.  I thought about that a bit and it seems like it is highly 
likely to be expensive for a single query (though I should probably try 
it at some point).  If I do find myself reformatting results after 
response to user input (i.e., reusing the query), though, then your 
solution is likely to be very useful.

Sean

On Mar 24, 2005, at 11:13 AM, Edmund Bacon wrote:

> Sometimes using a temp table is a better idea:
>
> e.g.
>
> -- start by creating a temp table 'tids' that hold the to_ids that
> -- we are interested in.
>  SELECT to_id
>    INTO TEMP TABLE tids
>    FROM correlation
>   WHERE from_id  = 1234
>   ORDER BY val DESC limit 100;
>
> -- The following temp table makes use of the primary key on
> -- the correlation table, and the stated goal from the original
> -- question that:
> --         from_id > to_id
> --     and from_id in (tids.to_id)
> --    and to_id in (tids.to_id)
>
>  SELECT t1.to_id AS from_id, t2.to_id
>    INTO TEMP TABLE from_to
>    FROM tids t1, tids t2
>   WHERE t1.to_id > t2.to_id;
>
> -- Now we can use the from_to table as an index into the correlation
> -- table.
>
>  SELECT c.from_id, c.to_id, c.val
>    FROM from_to
>    JOIN correlation c USING(from_id, to_id)
>   WHERE val > 0.5;
>
>
> The explain analyze for the final select works out to:
> Nested Loop  (cost=0.00..50692.00 rows=8488 width=16) (actual 
> time=0.171..150.095 rows=2427 loops=1)
>   ->  Seq Scan on from_to  (cost=0.00..79.38 rows=5238 width=8) 
> (actual time=0.006..7.660 rows=4950 loops=1)
>   ->  Index Scan using correlation_pkey on correlation c  
> (cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 
> loops=4950)
>         Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id 
> = c.to_id))
>         Filter: (val > 0.5::double precision)
> Total runtime: 152.261 ms
>
>
> Richard Huxton wrote:
>
>> Sean Davis wrote:
>>
>>> I answer my own question, if only for my own records.  The following 
>>>  query is about 5-6 times faster than the original.  Of course, if  
>>> anyone else has other ideas, I'd be happy to hear them.
>>>
>>> Sean
>>>
>>> explain analyze select from_id,to_id,val from exprsdb.correlation 
>>> where  from_id in (select to_id from exprsdb.correlation where 
>>> from_id=2424  order by val desc limit 100) and to_id in (select 
>>> to_id from  exprsdb.correlation where from_id=2424 order by val desc 
>>> limit 100) and  val>0.6 and to_id<from_id;
>>
>>
>> Might not be any faster, but you can do this as a self-join with 
>> subquery:
>>
>> SELECT c1.from_id, c1.to_id, c1.val
>> FROM
>>   correlation c1,
>>   (
>>     SELECT to_id FROM correlation WHERE from_id=2424
>>     ORDER BY val DESC LIMIT 100
>>   ) AS c2
>>   (
>>     SELECT to_id FROM correlation WHERE from_id=2424
>>     ORDER BY val DESC LIMIT 100
>>   ) AS c3
>> WHERE
>>   c1.from_id = c2.to_id
>>   AND c1.to_id = c3.to_id
>>   AND c1.val > 0.5
>>   AND c1.to_id < from_id
>> ;
>>
>> I think PG should be smart enough nowadays to figure out these two 
>> queries are basically the same.
>
>
> -- 
> Edmund Bacon <ebacon@onesystem.com>
>
>
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly



pgsql-sql by date:

Previous
From: "Jim Buttafuoco"
Date:
Subject: Re: Funtions + plpgsql + contrib/pgcrypto = ??
Next
From: "Moran.Michael"
Date:
Subject: Re: Funtions + plpgsql + contrib/pgcrypto = ??