Self-referencing table question - Mailing list pgsql-sql

From Sean Davis
Subject Self-referencing table question
Date
Msg-id 0af8c6d3baaf3d4a6d918df294019507@mail.nih.gov
Whole thread Raw
Responses Re: Self-referencing table question  (Sean Davis <sdavis2@mail.nih.gov>)
List pgsql-sql
I have a table that looks like:
 Column  |     Type     | Modifiers | Description
---------+--------------+-----------+------------- from_id | integer      | not null  | to_id   | integer      | not
null | val     | numeric(4,3) |           |
 
Indexes:    "correlation_pkey" PRIMARY KEY, btree (from_id, to_id)    "correlation_from_id_idx" btree (from_id)
"correlation_to_id_idx"btree (to_id)    "correlation_val_idx" btree (val)
 
Has OIDs: yes

The table describes a pairwise correlation matrix between about 7700  
vectors (so the table has n^2= 60652944 rows, to be exact).  I am  
trying to choose the top 100 correlated vectors with a seed vector;  
this is easily:

select to_id from correlation where from_id=623 order by val desc limit  
100;

Then, I want to take those 100 values and find all from_id,to_id tuples  
where val>0.5 (to construct a graph where all "ids" are nodes and are  
connected to each other when their correlation is >0.5).  I can do this  
like:

explain analyze selectfrom_id,to_id,valfrom exprsdb.correlationwhere from_id in  
(1,2,3,4,5,6,7,8,10,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27 
,28,29,30)and to_id in  
(1,2,3,4,5,6,7,8,10,9,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27 
,28,29,30)and from_id>to_idand val>0.5;

However, this does not scale well AT ALL.  The actual (very messy)  
explain analyze output is below.  The thing I notice is that the index  
on to_id is not used.  Also, the primary key index on (from_id, to_id  
is not used, it seems.  Finally, with only 30 values, this already  
takes 2.6 seconds and I am proposing to do this on 100-200 values.  Any  
hints on how better to accomplish this set of tasks?
 Index Scan using correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx,  
correlation_from_id_idx, correlation_from_id_idx on correlation   
(cost=0.00..129377.49 rows=62 width=17) (actual time=340.563..2603.967  
rows=19 loops=1)   Index Cond: ((from_id = 1) OR (from_id = 2) OR (from_id = 3) OR  
(from_id = 4) OR (from_id = 5) OR (from_id = 6) OR (from_id = 7) OR  
(from_id = 8) OR (from_id = 10) OR (from_id = 9) OR (from_id = 11) OR  
(from_id = 12) OR (from_id = 13) OR (from_id = 14) OR (from_id = 15) OR  
(from_id = 16) OR (from_id = 17) OR (from_id = 18) OR (from_id = 19) OR  
(from_id = 20) OR (from_id = 21) OR (from_id = 22) OR (from_id = 23) OR  
(from_id = 24) OR (from_id = 25) OR (from_id = 26) OR (from_id = 27) OR  
(from_id = 28) OR (from_id = 29) OR (from_id = 30))   Filter: (((to_id = 1) OR (to_id = 2) OR (to_id = 3) OR (to_id =
4) 
 
OR (to_id = 5) OR (to_id = 6) OR (to_id = 7) OR (to_id = 8) OR (to_id =  
10) OR (to_id = 9) OR (to_id = 11) OR (to_id = 12) OR (to_id = 13) OR  
(to_id = 14) OR (to_id = 15) OR (to_id = 16) OR (to_id = 17) OR (to_id  
= 18) OR (to_id = 19) OR (to_id = 20) OR (to_id = 21) OR (to_id = 22)  
OR (to_id = 23) OR (to_id = 24) OR (to_id = 25) OR (to_id = 26) OR  
(to_id = 27) OR (to_id = 28) OR (to_id = 29) OR (to_id = 30)) AND  
(from_id > to_id) AND (val > 0.5)) Total runtime: 2604.383 ms

Thanks,
Sean



pgsql-sql by date:

Previous
From: Béatrice Yueksel
Date:
Subject: Re: Your question about date
Next
From: Sean Davis
Date:
Subject: Re: Self-referencing table question