Thread: Enforce Symmetric Matrix
I have a table with N (over 35k) rows. I need to compare each of those elements to every other element in the list, which is the handshake problem. This results in a symmetric matrix (an adjacency matrix for an undirected graph). Because all relations are symmetric, I only need to store the upper triangle of this matrix. A ~ B and B ~ A are the same. I need some advice on how to implement this. At first, I thought that I'd only store the upper triangle, resulting in N^2 / 2 rows of the form: id1 id2 value ---- ---- ----- A B 1 A C 2 A D 3 B C 4 B D 5 C D 6 Where value is the result of an expensive computation, and with the constraint that id1 < id2. But there are problems. To get a list of all things compared to B, I have to look in both columns. Complicated queries with selects nested inside updates become very difficult or impossible to do in one go. I'd also need to index both columns, which seems like a waste. So I thought maybe I'd just store the full matrix, with A,B and B,A having the same data id1 id2 value ---- ---- ----- A B 1 A C 2 A D 3 B A 1 B C 4 B D 5 C A 2 C B 4 C D 6 With the constraint that id1 != id2 because I don't need the diagonal. The table is twice as big, but still O(n^2). This would let me get the list of all things compared to B with SELECT ... WHERE id1 = B, which is super easy. My problem here is that I'm not sure how to enforce the rule that A,B and B,A have the same value. I want to use a rule or trigger such that when row A,B is updated or inserted, row B,A is updated or inserted with the same date. But then it would get called recursively, and that doesn't work. For my application, the total number of items I have (N) will be growing over time and this table will need to be updated accordingly. This seems like a problem that many people must have come across before, but I've been strangely unable to find advice online. Any recommendations?
Attachment
SELECT l.id, u.id, func(l.id, u.id) FROM ids l CROSS JOIN ids u WHERE l.id < u.id Depending on whether you always update a known pair, or instead invalidate all rows where either id is a given value, you can use various means to manage the resultant materialized view. Triggers or interface functions mainly. Without calling the value function you would also know, at any given time, whether a given pair is present. The usefulness of this depends on how real-time you need the updates to be; which is a trade-off with performance during changes. Adding a simple limit on the two ids sub-queries, and doing the incremental add in a loop, you can appropriately scale the updates to limit memory usage during the bulk load phase. Likely ongoing updates will not have the same requirement since you only have N updates instead of N^2/2; but can be done all the same. SELECT LID, UID, FUNC(lid, uid) FROM SELECT CASE WHEN c1 < c2 THEN c1 ELSE c2 END AS LID , CASE WHEN c1 < c2 THEN c2 ELSE c1 END AS UID FROM SELECT * FROM -> WHERE c1 <> c2 SELECT :newval AS c1, ids.id AS c2 FROM ids David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Enforce-Symmetric-Matrix-tp5803064p5803126.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Wed, May 07, 2014 at 04:07:22PM -0500, John McKown wrote: > On Wed, May 7, 2014 at 3:36 PM, Randy Westlund <rwestlun@gmail.com> wrote: > > > I have a table with N (over 35k) rows. I need to compare each of > > those elements to every other element in the list, which is the > > handshake problem. > > > > This results in a symmetric matrix (an adjacency matrix for an > > undirected graph). Because all relations are symmetric, I only need to > > store the upper triangle of this matrix. A ~ B and B ~ A are the same. > > > > I need some advice on how to implement this. > > Just a crazy idea, but have you considered something like the below: > > create view view1 as select id1 as v_id1, id2 as v_id2, value from abovetbl > union select id2, id1, value from abovetbl; > > The above basically "expands" the "small matrix" table to have the same > values that the "full matrix" table would have. because id1, id2 has the > same "value" as id2, id1 . > > You can now create an index on "view1" like: > create index view1_index on view1(v_id1); > > To find all the values for "B", you can now: > select * from view1 where v_id1='B'; > > The view does not take up much space. But the index on the view might. But > only about as much space as an index on the "full matrix" table would. Oh, > I changed the column names in the view so that it would hopefully not be as > confusing has having multiple "id1" and "id2" column names with different > meanings. > > I'm not a real heavy, but this may be of some use. > -- > There is nothing more pleasant than traveling and meeting new people! > Genghis Khan > > Maranatha! <>< > John McKown This was exactly what I needed, thank you. Everything is working now :)