Thread: Is not equal to query...

Is not equal to query...

From
James David Smith
Date:
Hi everyone,

A bit of help please. This query works for me, and gives me 332,000
rows of records out of my table of 400,000 rows. It takes about a
minute to run.

SELECT vehicle.*
FROM vehicle, incidents
WHERE incidents.key = vehicle.incident_link;

Would I would like to do however, is remove the data from this table
that does not match the above query. So to start with, I thought that
I'd better try and identify this data using the query below. You'll
note I have simply changed the '=' for '!=' instead.

SELECT vehicle.*
FROM vehicle, incidents
WHERE incidents.key != vehicle.incident_link;

However when I run this query, after about 10-15 minutes, I still
don't have a result. I don't really understand why not.

Cheers as always.

James

Re: Is not equal to query...

From
Merlin Moncure
Date:
On Wed, Jun 8, 2011 at 4:57 PM, James David Smith
<james.david.smith@gmail.com> wrote:
> Hi everyone,
>
> A bit of help please. This query works for me, and gives me 332,000
> rows of records out of my table of 400,000 rows. It takes about a
> minute to run.
>
> SELECT vehicle.*
> FROM vehicle, incidents
> WHERE incidents.key = vehicle.incident_link;
>
> Would I would like to do however, is remove the data from this table
> that does not match the above query. So to start with, I thought that
> I'd better try and identify this data using the query below. You'll
> note I have simply changed the '=' for '!=' instead.
>
> SELECT vehicle.*
> FROM vehicle, incidents
> WHERE incidents.key != vehicle.incident_link;
>
> However when I run this query, after about 10-15 minutes, I still
> don't have a result. I don't really understand why not.

You're getting a combination of every record from vehicle combined
with every record incidents except in the very specific case where the
identifiers match -- not want you want.  There are a ton of ways to do
what you want in sql.  Typically the best/fastest is left join/not
null:

if you are trying to find vehicles with an incident key that is not in
the incident table:
SELECT vehicle.*
FROM vehicle LEFT JOIN incidents ON incidents.key = vehicle.incident_link
WHERE incidents.key IS NULL

Incidents with no record in the vehicle table:
SELECT incidents .*
FROM incidents LEFT JOIN vehicle ON incidents.key = vehicle.incident_link
WHERE vehicle.incident_link IS NULL

Another way to do it is with 'where not exists' -- which is a lot
easier to fold into delete syntax:
SELECT vehicle.*
FROM vehicle
WHERE NOT EXISTS
(
  SELECT 1 FROM incidents WHERE incidents.key = vehicle.incident_link
)

which you can turn into:
DELETE FROM vehicle
WHERE NOT EXISTS
(
  SELECT 1 FROM incidents WHERE incidents.key = vehicle.incident_link
)

etc (don't jump directly to the delete, test it first!)

merlin

Re: Is not equal to query...

From
Merlin Moncure
Date:
On Thu, Jun 9, 2011 at 6:17 AM, James David Smith
<james.david.smith@gmail.com> wrote:
> Hi Merlin,
>
> Thank you very much for the full and brilliant reply. The last query
> you wrote does exactly what I want it too. I wonder whether it's not
> too much trouble whether you could explain to me a couple of things
> though...?
>
> 1) I understand what you are saying my query does. Like an outer join
> I think? But don't get why. Using '!=' is the opposite of '=' is it
> not?

It is the opposite -- just not in the way you are thinking.  In SQL,
joins between tables means 'give me every combination of data from
table A combined with table B given a condition'.  If A and B each
have 100 records with identifiers 1-100, the not equal join would give
you a join result of 99 records for A=1 (with B 2-100), 99 records for
A=2 (with B 1, 3-100) etc. for a total of 9900 records.

Your problem is that you are still associating tables A and B in your
head in a way that is not expressed in the join.  You have to imagine
both tables as pools of unassociated records with no ordering except
for what you give in the query.

> 2) I understand the 'where not exists' query you suggest, and have
> used that, but I don't see why you use '1' in it. What does the 1 do?

where not exists means 'return this record if this query does not
return at least 1 record' -- since we don't care what is in the record
that is actually returned, I just use 1 as a shorthand because in SQL
you can't write queries that return 0 fields -- this is a pretty
common convention.  I instead of 1, I could have used *, -999, or
anything -- it doesn't matter.

Aside: exists/not exists in relational parlance are called 'semi
joins'.  They can often be faster than regular joins because the
server can bail early without having to express the full join.

merlin