Thread: subselect syntax

subselect syntax

From
Steve Lefevre
Date:
Hello everyone -

I'm moving from MySQL to Postgres and I am in a situation where I am
trying to use subselects properly.

I have a table of projects, users, and user_projects. The table
user_projects creates a many-to-many relationship between users and
projects.

I'm creating a select list on a web form for adding a new user to a
project. I want to select users from the user table, who aren't
already listed in the join table under that project.

Here's my select:
SELECT * FROM users
WHERE user_id <> $current_user_id
AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
$project_id )

This query returns no rows, even on projects that have no records in
the user_projects table!

I am certain that I am not the $current_user_id. If I run this query:
SELECT * FROM users
WHERE user_id <> $current_user_id

I get all the user records besides myself.

What am I doing wrong?

Re: subselect syntax

From
"Phillip Smith"
Date:
> Here's my select:
> SELECT * FROM users
> WHERE user_id <> $current_user_id
> AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
> $project_id )

Instead of "AND user_id <> (..." try using NOT IN:

SELECT    *
FROM        users
WHERE        user_id <> $current_user_id
 AND        user_id NOT IN (SELECT user_id FROM user_projects WHERE
project_id = $project_id)


*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments

Re: subselect syntax

From
Tom Lane
Date:
Steve Lefevre <lefevre.10@osu.edu> writes:
> SELECT * FROM users
> WHERE user_id <> $current_user_id
> AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
> $project_id )

> This query returns no rows, even on projects that have no records in
> the user_projects table!

Well, that's not too surprising --- the subselect would deliver a NULL
result, and "user_id <> NULL" can't succeed (it'll always give NULL).

Perhaps what you want is

SELECT * FROM users
WHERE user_id <> $current_user_id
AND user_id NOT IN ( SELECT user_id FROM user_projects WHERE project_id =
$project_id )

although this has its own set of gotchas --- if you have any NULL
user_id entries in user_projects, it'll fail.

            regards, tom lane

Re: subselect syntax

From
Michael Glaesemann
Date:
On Jun 6, 2007, at 16:39 , Steve Lefevre wrote:

> I have a table of projects, users, and user_projects. The table
> user_projects creates a many-to-many relationship between users and
> projects.
>
> I'm creating a select list on a web form for adding a new user to a
> project. I want to select users from the user table, who aren't
> already listed in the join table under that project.

What your looking for is sometimes called a semijoin: getting
everything that *isn't* associated with something. SQL doesn't have a
semijoin operator, but there are a couple of workarounds.

First, here's the setup I'm using:

CREATE TABLE users (user_id TEXT PRIMARY KEY);

INSERT INTO users (user_id)
VALUES ('fred')
     , ('wilma')
     , ('barney')
     , ('betty');

CREATE TABLE projects ( project_id TEXT PRIMARY KEY);

INSERT INTO projects (project_id)
VALUES ('foo')
     , ('bar')
     , ('baz');

CREATE TABLE user_projects
(
     user_id TEXT NOT NULL REFERENCES users
     , project_id TEXT NOT NULL REFERENCES projects
     , PRIMARY KEY (user_id, project_id)
);

INSERT INTO user_projects (user_id, project_id)
VALUES ('fred', 'foo')
     , ('wilma', 'foo')
     , ('barney', 'bar')
     , ('betty', 'bar')
     , ('wilma', 'baz')
     , ('betty', 'baz');

> Here's my select:
> SELECT * FROM users
> WHERE user_id <> $current_user_id
> AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
> $project_id )
>
> This query returns no rows, even on projects that have no records in
> the user_projects table!

Oh, really? I get an error when I try to run your example:

SELECT *
FROM users
WHERE user_id <> 'barney'
     AND user_id <> (
         SELECT user_id
         FROM user_projects
         WHERE project_id = 'baz');
ERROR:  more than one row returned by a subquery used as an expression

Note that this will work (at least in 8.2.4) if the subquery only
returns a single row, but you'd run into problems as soon as you have
more than one user affiliated with a given project.

One of the ways to get a semijoin is using EXCEPT. We want to get all
of the users *except* those affiliated with a certain project (in
this case 'baz'):

-- get all users
SELECT user_id
FROM users
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
barney
fred
(2 rows)

Let's say barney is the current user, so we add a restriction:

SELECT user_id
FROM (
     -- get all users
     SELECT user_id
     FROM users
     EXCEPT
     -- all of the users who are associated with project baz
     SELECT user_id
     FROM user_projects
     WHERE project_id = 'baz'
     ) non_baz_users
WHERE user_id <> 'barney';
user_id
---------
fred
(1 row)

We can also reformulate that by moving the restriction up into the
first query:
-- get all users other than barney
SELECT user_id
FROM users
WHERE user_id <> 'barney'
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
fred
(1 row)

Another way to do it is to use a LEFT JOIN, taking advantage of the
fact that columns will be filled with NULL where there is no
association. This is often faster than using EXCEPT.

SELECT user_id, project_id
FROM users
LEFT JOIN (
     -- users for project baz
     SELECT *
     FROM user_projects
     WHERE project_id = 'baz') u_p USING (user_id);
user_id | project_id
---------+------------
fred    |
wilma   | baz
barney  |
betty   | baz
(4 rows)

Note that all users are returned. Now we just restrict the result to
those whose project_id is null (because they're not associated with
project baz) and those who aren't barney:

SELECT user_id
FROM users
LEFT JOIN (
     -- users for project baz
     SELECT *
     FROM user_projects
     WHERE project_id = 'baz') u_p USING (user_id)
WHERE project_id IS NULL -- all users who aren't associated with
project baz
                          -- so their project_id column is NULL
     AND user_id <> 'barney'; -- and we don't want barney
user_id
---------
fred
(1 row)

A third way would have been to rewrite your query using NOT IN rather
than <>:

SELECT *
FROM users
WHERE user_id <> 'barney'
     AND user_id NOT IN (
         SELECT user_id
         FROM user_projects
         WHERE project_id = 'baz');
user_id
---------
fred
(1 row)

Hope that helps.

Michael Glaesemann
grzm seespotcode net