Re: SEVEN cross joins?!?!? - Mailing list pgsql-sql

From Frank Bax
Subject Re: SEVEN cross joins?!?!?
Date
Msg-id 5.2.1.1.0.20051012211223.00a34dd0@pop6.sympatico.ca
Whole thread Raw
In response to Re: SEVEN cross joins?!?!?  (Daryl Richter <daryl@brandywine.com>)
Responses Re: SEVEN cross joins?!?!?  (Daryl Richter <daryl@brandywine.com>)
List pgsql-sql
At 09:00 AM 10/12/05, Daryl Richter wrote:

>Richard Huxton wrote:
>>Frank Bax wrote:
>>
>>>>Are you saying that you WANT to generate a cross-join, score the 
>>>>millions of results and then pick the best 10? It's doing what you 
>>>>want, but you'd like it to be faster.
>>>>
>>>>Or are you saying that you'd like to avoid the explosion in rows 
>>>>altogether?
>>>>
>>>>In either case - I don't suppose you could provide a real example of 
>>>>the query, so we can see exactly what you're trying to do.
>>>
>>>
>>>
>>>There is no "best 10".  I currently limit each subselect to 10 items so 
>>>that query will actually run without crashing.  I would like to remove 
>>>the "ORDER BY itemid LIMIT 10" mentioned above.  At the end of the query 
>>>I have a "LIMIT 100" clause which will stay and produces a list of "best 
>>>100" combos.
>>>
>>>Either of your solutions would be acceptable; since avoiding the 
>>>"explosion" would also make the query faster.  Current calculations 
>>>indicate that running the query without "LIMIT 10" in subselect would 
>>>take years to process.
>>
>>OK - so at the heart of the problem is the fact that you want to search a 
>>space with 100 billion possible states. There are three main options
>>1. Brute force and patience - simple and is guaranteed to produce the 
>>"best" answers. You can use cursors/multiple queries to manage the 
>>workload. The disadvantage is that it's probably slower than you'd like.
>>2. Smarter algorithms - possibly something genetic to work towards local 
>>maxima. Or, if you'd like to keep it simple, just split your 7 locations 
>>into 2,2,3 and solve for each separately.
>>3. Statistical - examine a subset of possible states and accept you'll 
>>only be able to say "almost best" to 99% confidence or similar.
>>I'd be tempted by #2 - there are probably some combinations you can rule 
>>out, which combined with a split/recombine should reduce the number of 
>>states to query.
>
>I'll second this recommendation.  The OP is trying to drive a nail with a 
>screwdriver.  He needs a program, not a query.


Richard, you've summed it up nicely.

Splitting locations into subsets (like 2,2,3) doesn't work because it is 
possible that low values in one location can be offset by high values in 
another location, and still result in an excellent combo.

The good news is these suggestions got me thinking outside the box.  I 
think I can program a modified brute-force that bypasses large numbers of 
combos early.  It might still be too large/slow, so I'd be interested in 
finding more info about these "smarter algorithms" in option 2.  Where do I 
look?

Greg: my son's the gamer; I'm just trying to help him out.



pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: Update timestamp on update
Next
From: Tom Lane
Date:
Subject: Re: pg, mysql comparison with "group by" clause