Home > mailing lists

SEVEN cross joins?!?!? - Mailing list pgsql-sql

From	Frank Bax
Subject	SEVEN cross joins?!?!?
Date	October 11, 2005 11:15:50
Msg-id	5.2.1.1.0.20051011071510.0364d850@pop6.sympatico.ca Whole thread Raw
Responses	Re: SEVEN cross joins?!?!?
List	pgsql-sql

Tree view

I have a table with only 434 rows in it.  Two important columns are 
"itemid" and "locn".  Each item must be in one of seven locations.  We need 
to create a "combo" by selecting one item from each of seven locations; 
then determine which "combo" is the "best" according to our analysis (see 
below).

A subselect for items in a location looks something like:
(select * from suit_item where locn='Head' AND username='Walter' ORDER BY 
itemid LIMIT 10) as Head

One subselect for each location, cross join them all and the query 
generates 10,000,000 combinations!  Without the "LIMIT 10",  there are  78 
* 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for 
username='Walter' (the only user at the moment).  The large volume is 
causing a problem for my systems!  The "ORDER BY itemid" was added only so 
that same 10 items were processed on different computer systems I tested 
this query on.  Only one item for 7th locn in the database at the moment.

Every item has three key properties val1, val2 and val3.  For each combo, 
we calculate:(Head.val1 + Arm.val1 + ... Leg.val1) AS Calc1(Head.val2 + Arm.val2 + ... Leg.val2) AS Calc2(Head.val3 +
Arm.val3+ ... Leg.val3) AS Calc3
 
Each calculation has a pseudo "max" value coded so that values above this 
"max" are considered equal:CASE WHEN calc1 > 70 then 70 else calc1 END as ordcalc1CASE WHEN calc2 > 15 then 15 else
calc2END as ordcalc2CASE WHEN calc3 > 60 then 60 else calc3 END as ordcalc3
 
Then I use:ORDER BY ordcalc1 DESC, ordcalc2 DESC, ordcalc3 DESC

When I activated a couple of my brain cells, I realised that adding "WHERE 
ordcalc1 >= 70 AND ordcalc2 >= 15 AND ordcalc3 >= 60" after the cross joins 
might help things out a bit.  The 10,000,000 results was reduced 
significantly (8K - 30K with different samples).  Because the "ordcalc" 
cannot be used in a WHERE clause, the entire expression was repeated.

I used php to generate the query from pieces so that I could avoid lots of 
repetition in coding (but still there in final query).  The query itself is 
about 6K when assembled.

After that big introduction, I have a couple of questions:

1) Did I approach the problem incorrectly?  Is there another way to 
approach this query so that fewer combos are analysed?

2) Are there any optimisations that could improve query speed?  Since the 
table is so small, I guessed that indexes wouldn't help.  I created an 
index on (username, itemid), but it doesn't get used.  Output of EXPLAIN 
ANALYSE found here:http://www.execulink.com/~fbax/JOINS/

3) When run on P2 and P4 systems, I would expect to see huge improvement in 
time taken to process query, but I don't (only 35-40% better)?

i = number of items in LIMIT of subselect
rc = raw record count
rcw = record count with "limits" in WHERE clause
p2 = seconds for query to run on P2-400M pg=7.4.3 ram=32M
p4 = seconds for query to run on P4-2.8G pg=7.3.5 ram=1G

i=10 - rc=1,000,000 rcw=27,086 p2=81  p4=49
i=11 - rc=1,771,561 rcw=41,121 p2=141 p4=86
i=12 - rc=2,985,984 rcw=56,425 p2=216 p4=142
i=13 - rc=4,826,809 rcw=81,527 p2=??? p4=228

On P2 system i=13 query returns empty page with no errors on server.

On P4 system i=15 results in:
PostgreSQL Error: 1 (ERROR: tuplestore: write failed)

I suppose this is a temp file - is it created in $DATA?  OpenBSD has 
several partitions, so I'll need to know which one is too small.

pgsql-sql by date:

From: "Premsun Choltanwanich"
Date: 11 October 2005, 03:10:54
Subject: Re: How to delete Large Object from Database?

From: Richard Huxton
Date: 11 October 2005, 12:32:17
Subject: Re: SEVEN cross joins?!?!?

SEVEN cross joins?!?!? - Mailing list pgsql-sql

Previous

Next