Re: SEVEN cross joins?!?!? - Mailing list pgsql-sql

From Frank Bax
Subject Re: SEVEN cross joins?!?!?
Date
Msg-id 5.2.1.1.0.20051011145705.00a31440@pop6.sympatico.ca
Whole thread Raw
In response to Re: SEVEN cross joins?!?!?  (Richard Huxton <dev@archonet.com>)
Responses Re: SEVEN cross joins?!?!?  ("Greg Patnude" <gpatnude@hotmail.com>)
Re: SEVEN cross joins?!?!?  (Richard Huxton <dev@archonet.com>)
List pgsql-sql
At 08:29 AM 10/11/05, Richard Huxton wrote:

>Frank Bax wrote:
>>I have a table with only 434 rows in it.  Two important columns are 
>>"itemid" and "locn".  Each item must be in one of seven locations.  We 
>>need to create a "combo" by selecting one item from each of seven 
>>locations; then determine which "combo" is the "best" according to our 
>>analysis (see below).
>>A subselect for items in a location looks something like:
>>(select * from suit_item where locn='Head' AND username='Walter' ORDER BY 
>>itemid LIMIT 10) as Head
>>One subselect for each location, cross join them all and the query 
>>generates 10,000,000 combinations!  Without the "LIMIT 10",  there are
>>78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for 
>>username='Walter' (the only user at the moment).  The large volume is 
>>causing a problem for my systems!  The "ORDER BY itemid" was added only 
>>so that same 10 items were processed on different computer systems I 
>>tested this query on.  Only one item for 7th locn in the database at the 
>>moment.
>
>Frank - it might just be me, but I've read your email twice and despite 
>all the information I still don't have any idea what you are trying to do.
>
>Are you saying that you WANT to generate a cross-join, score the millions 
>of results and then pick the best 10? It's doing what you want, but you'd 
>like it to be faster.
>
>Or are you saying that you'd like to avoid the explosion in rows altogether?
>
>In either case - I don't suppose you could provide a real example of the 
>query, so we can see exactly what you're trying to do.


There is no "best 10".  I currently limit each subselect to 10 items so 
that query will actually run without crashing.  I would like to remove the 
"ORDER BY itemid LIMIT 10" mentioned above.  At the end of the query I have 
a "LIMIT 100" clause which will stay and produces a list of "best 100" combos.

Either of your solutions would be acceptable; since avoiding the 
"explosion" would also make the query faster.  Current calculations 
indicate that running the query without "LIMIT 10" in subselect would take 
years to process.

The query is filled with expressions.  I'm not sure I can shorten it 
without making typos or deleting something important, so I'll make it 
available on web here:        http://www.execulink.com/~fbax/JOINS/
Results of "explain analyse" is also there. 



pgsql-sql by date:

Previous
From: Richard Huxton
Date:
Subject: Re: ichar
Next
From: "Rick Schumeyer"
Date:
Subject: pg, mysql comparison with "group by" clause