Re: Self-referencing table question - Mailing list pgsql-sql
From | Edmund Bacon |
---|---|
Subject | Re: Self-referencing table question |
Date | |
Msg-id | 4242E732.3050600@onesystem.com Whole thread Raw |
In response to | Re: Self-referencing table question (Richard Huxton <dev@archonet.com>) |
Responses |
Re: Self-referencing table question
|
List | pgsql-sql |
Sometimes using a temp table is a better idea: e.g. -- start by creating a temp table 'tids' that hold the to_ids that -- we are interested in. SELECT to_id INTO TEMP TABLE tids FROM correlation WHERE from_id = 1234 ORDER BY val DESClimit 100; -- The following temp table makes use of the primary key on -- the correlation table, and the stated goal from the original -- question that: -- from_id > to_id -- and from_id in (tids.to_id) -- and to_id in (tids.to_id) SELECT t1.to_id AS from_id, t2.to_id INTO TEMP TABLE from_to FROM tids t1, tids t2 WHERE t1.to_id > t2.to_id; -- Now we can use the from_to table as an index into the correlation -- table. SELECT c.from_id, c.to_id, c.val FROM from_to JOIN correlation c USING(from_id, to_id) WHERE val > 0.5; The explain analyze for the final select works out to:Nested Loop (cost=0.00..50692.00 rows=8488 width=16) (actual time=0.171..150.095 rows=2427 loops=1) -> Seq Scan on from_to (cost=0.00..79.38 rows=5238 width=8) (actual time=0.006..7.660 rows=4950 loops=1) -> Index Scan using correlation_pkey on correlation c (cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 loops=4950) Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id = c.to_id)) Filter: (val > 0.5::double precision)Total runtime: 152.261 ms Richard Huxton wrote: > Sean Davis wrote: > >> I answer my own question, if only for my own records. The following >> query is about 5-6 times faster than the original. Of course, if >> anyone else has other ideas, I'd be happy to hear them. >> >> Sean >> >> explain analyze select from_id,to_id,val from exprsdb.correlation >> where from_id in (select to_id from exprsdb.correlation where >> from_id=2424 order by val desc limit 100) and to_id in (select to_id >> from exprsdb.correlation where from_id=2424 order by val desc limit >> 100) and val>0.6 and to_id<from_id; > > > Might not be any faster, but you can do this as a self-join with > subquery: > > SELECT c1.from_id, c1.to_id, c1.val > FROM > correlation c1, > ( > SELECT to_id FROM correlation WHERE from_id=2424 > ORDER BY val DESC LIMIT 100 > ) AS c2 > ( > SELECT to_id FROM correlation WHERE from_id=2424 > ORDER BY val DESC LIMIT 100 > ) AS c3 > WHERE > c1.from_id = c2.to_id > AND c1.to_id = c3.to_id > AND c1.val > 0.5 > AND c1.to_id < from_id > ; > > I think PG should be smart enough nowadays to figure out these two > queries are basically the same. -- Edmund Bacon <ebacon@onesystem.com>