Thread: Why is this a cross join?
I have a seemingly simple query that returns way too many records. Here is the count query select count(*) from ( select crm.* from sap_crm_sales crm inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6) ) as s The crm table has about 44K records, the model table has about 1K the join produces about 9 million records It doesn't matter if I make it a left join or inner join the count ends up the same.
On Feb 17, 2013, at 9:16, Tim Uckun <timuckun@gmail.com> wrote: > I have a seemingly simple query that returns way too many records. > Here is the count query >=20 > select count(*) from ( > select crm.* > from sap_crm_sales crm > inner join model_configurations mc on left(crm.customer_class, 6) > =3D left(mc.sap_code,6) > ) as s >=20 > The crm table has about 44K records, the model table has about 1K the > join produces about 9 million records Apparently the first 6 characters of those fields are quite common, = which gives you a result for every possible combination of the same = 6-character value. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
> Apparently the first 6 characters of those fields are quite common, which > gives you a result for every possible combination of the same 6-character > value. Mmmmm. That seems kind of weird. Is there any way to NOT have this be a cross join? For example if I extracted the first six characters into a field and then joined on them it would not be a cross join right?
On 18/02/2013, at 7:58 AM, Tim Uckun <timuckun@gmail.com> wrote: >> Apparently the first 6 characters of those fields are quite common, = which >> gives you a result for every possible combination of the same = 6-character >> value. >=20 >=20 > Mmmmm. That seems kind of weird. Is there any way to NOT have this > be a cross join? For example if I extracted the first six characters > into a field and then joined on them it would not be a cross join > right? In some way, every join is a cross join, with the results filtered = according to the specificity of the join conditions. In this case: inner join model_configurations mc on left(crm.customer_class, 6) =3D = left(mc.sap_code,6) "customer_class" sounds like a fairly generic sort of field, so you'd = expect many matches. Truncating the fields is likely to make this even = less specific, returning more results. Cheers, Tony =20=
> > In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions.In this case: > > inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6) > > "customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likelyto make this even less specific, returning more results. > I guess I am not explaining it properly.. Say I created new columns on both tables called "first_6" and populated them with the substrings. If I did a inner join or a left join on those fields would I still get a cross join? inner join model_configurations mc on mc.first_6 = crm.first_6
On 02/17/2013 02:09 PM, Tim Uckun wrote: >> >> In some way, every join is a cross join, with the results filtered according to the specificity of the join conditions.In this case: >> >> inner join model_configurations mc on left(crm.customer_class, 6) = left(mc.sap_code,6) >> >> "customer_class" sounds like a fairly generic sort of field, so you'd expect many matches. Truncating the fields is likelyto make this even less specific, returning more results. >> > > I guess I am not explaining it properly.. > > Say I created new columns on both tables called "first_6" and > populated them with the substrings. If I did a inner join or a left > join on those fields would I still get a cross join? > > inner join model_configurations mc on mc.first_6 = crm.first_6 > > http://www.postgresql.org/docs/9.2/interactive/sql-select.html ...CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as you get from listing the two items at the top level of FROM, but restricted by the join condition (if any). CROSS JOIN is equivalent to INNER JOIN ON (TRUE), that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you couldn't do with plain FROM and WHERE... -- Adrian Klaver adrian.klaver@gmail.com
On 18/02/2013, at 9:09 AM, Tim Uckun <timuckun@gmail.com> wrote: >>=20 >> In some way, every join is a cross join, with the results filtered = according to the specificity of the join conditions. In this case: >>=20 >> inner join model_configurations mc on left(crm.customer_class, 6) =3D = left(mc.sap_code,6) >>=20 >> "customer_class" sounds like a fairly generic sort of field, so you'd = expect many matches. Truncating the fields is likely to make this even = less specific, returning more results. >>=20 >=20 > I guess I am not explaining it properly.. >=20 > Say I created new columns on both tables called "first_6" and > populated them with the substrings. If I did a inner join or a left > join on those fields would I still get a cross join? No, it would be no different if you created new columns - the join = condition is the same. You're not actually getting a cross join, that would be many more = records than 9million - you're just not setting a specific enough = filter. Cheers, Tony
On 2/17/2013 2:09 PM, Tim Uckun wrote: > Say I created new columns on both tables called "first_6" and > populated them with the substrings. If I did a inner join or a left > join on those fields would I still get a cross join? > > inner join model_configurations mc on mc.first_6 = crm.first_6 without additional filtering, that will return a row for every row in crm that matches a row in mc. if there's a 1000 rows in crm that each match an average of 6 rows in mc, that would return 6000 rows. crm left join mc would return all of those, plus all the rows in crm that don't have any match in mc (the mc.* fields would be NULL in these result rows). -- john r pierce 37N 122W somewhere on the middle of the left coast
Tim Uckun <timuckun@gmail.com> wrote:=0A=0A> I guess I am not explaining it= properly..=0A>=0A> Say I created new columns on both tables called "first_= 6" and=0A> populated them with the substrings.=A0 If I did a inner join or = a=0A> left join on those fields would I still get a cross join?=0A>=0A> inn= er join model_configurations mc on mc.first_6 =3D crm.first_6=0A=0AAs other= s have said, that would not change the results -- you are=0Agetting exactly= what you are requesting, but apparently not what=0Ayou want.=A0 Can you ex= plain what it is that you do want?=A0 If you=0Ahave ten rows in one table w= ith the same "first six" value, and 20=0Arows in the other table with that = same "first six" value, how do=0Ayou want them to be matched up?=A0 Do you = want one row in the result=0Afor every row in one of the tables?=A0 If so, = how do you want to=0Adetermine which of the matching rows in the other tabl= e to choose,=0Aand which to ignore?=0A=0A-- =0AKevin Grittner=0AEnterpriseD= B: http://www.enterprisedb.com=0AThe Enterprise PostgreSQL Company
> for every row in one of the tables? If so, how do you want to > determine which of the matching rows in the other table to choose, > and which to ignore? In this case it's a simple lookup. Any of the matching rows will do but I just want one.
Tim Uckun <timuckun@gmail.com> wrote:=0A=0A>> how do you want to determine = which of the matching rows in the=0A>> other table to choose, and which to = ignore?=0A>=0A> Any of the matching rows will do but I just want one.=0A=0A= Well, if you don't care, this might give you what you want.=0A=0Aselect dis= tinct on (x.first6) crm.*, x.*=0A=A0 from from sap_crm_sales crm=0A=A0 join= (select left(mc.sap_code,6) as first6, mc.*=0A=A0=A0=A0=A0=A0=A0=A0=A0=A0 = from model_configurations mc) x=0A=A0=A0=A0 on x.first6 =3D left(crm.custom= er_class, 6)=0A=A0 order by x.first6;=0A=0AI can think of about a dozen dif= ferent things which fit the=0Adescription that you've given so far that you= *might* want instead,=0Abut unless you get more specific we can only guess= .=0A=0AIf this isn't what you want, your best bet might might be to come=0A= up with a self-contained example, where you create an populate=0Atables wit= h a small amount of data, and show what you would like as=0Aa result given = that data set.=0A=0A--=0AKevin Grittner=0AEnterpriseDB: http://www.enterpri= sedb.com=0AThe Enterprise PostgreSQL Company