Re: subselect syntax - Mailing list pgsql-novice

From Michael Glaesemann
Subject Re: subselect syntax
Date
Msg-id 1821C578-8D90-4837-9894-2F1E0264F7EE@seespotcode.net
Whole thread Raw
In response to subselect syntax  (Steve Lefevre <lefevre.10@osu.edu>)
List pgsql-novice
On Jun 6, 2007, at 16:39 , Steve Lefevre wrote:

> I have a table of projects, users, and user_projects. The table
> user_projects creates a many-to-many relationship between users and
> projects.
>
> I'm creating a select list on a web form for adding a new user to a
> project. I want to select users from the user table, who aren't
> already listed in the join table under that project.

What your looking for is sometimes called a semijoin: getting
everything that *isn't* associated with something. SQL doesn't have a
semijoin operator, but there are a couple of workarounds.

First, here's the setup I'm using:

CREATE TABLE users (user_id TEXT PRIMARY KEY);

INSERT INTO users (user_id)
VALUES ('fred')
     , ('wilma')
     , ('barney')
     , ('betty');

CREATE TABLE projects ( project_id TEXT PRIMARY KEY);

INSERT INTO projects (project_id)
VALUES ('foo')
     , ('bar')
     , ('baz');

CREATE TABLE user_projects
(
     user_id TEXT NOT NULL REFERENCES users
     , project_id TEXT NOT NULL REFERENCES projects
     , PRIMARY KEY (user_id, project_id)
);

INSERT INTO user_projects (user_id, project_id)
VALUES ('fred', 'foo')
     , ('wilma', 'foo')
     , ('barney', 'bar')
     , ('betty', 'bar')
     , ('wilma', 'baz')
     , ('betty', 'baz');

> Here's my select:
> SELECT * FROM users
> WHERE user_id <> $current_user_id
> AND user_id <> ( SELECT user_id FROM user_projects WHERE project_id =
> $project_id )
>
> This query returns no rows, even on projects that have no records in
> the user_projects table!

Oh, really? I get an error when I try to run your example:

SELECT *
FROM users
WHERE user_id <> 'barney'
     AND user_id <> (
         SELECT user_id
         FROM user_projects
         WHERE project_id = 'baz');
ERROR:  more than one row returned by a subquery used as an expression

Note that this will work (at least in 8.2.4) if the subquery only
returns a single row, but you'd run into problems as soon as you have
more than one user affiliated with a given project.

One of the ways to get a semijoin is using EXCEPT. We want to get all
of the users *except* those affiliated with a certain project (in
this case 'baz'):

-- get all users
SELECT user_id
FROM users
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
barney
fred
(2 rows)

Let's say barney is the current user, so we add a restriction:

SELECT user_id
FROM (
     -- get all users
     SELECT user_id
     FROM users
     EXCEPT
     -- all of the users who are associated with project baz
     SELECT user_id
     FROM user_projects
     WHERE project_id = 'baz'
     ) non_baz_users
WHERE user_id <> 'barney';
user_id
---------
fred
(1 row)

We can also reformulate that by moving the restriction up into the
first query:
-- get all users other than barney
SELECT user_id
FROM users
WHERE user_id <> 'barney'
EXCEPT
-- all of the users who are associated with project baz
SELECT user_id
FROM user_projects
WHERE project_id = 'baz';
user_id
---------
fred
(1 row)

Another way to do it is to use a LEFT JOIN, taking advantage of the
fact that columns will be filled with NULL where there is no
association. This is often faster than using EXCEPT.

SELECT user_id, project_id
FROM users
LEFT JOIN (
     -- users for project baz
     SELECT *
     FROM user_projects
     WHERE project_id = 'baz') u_p USING (user_id);
user_id | project_id
---------+------------
fred    |
wilma   | baz
barney  |
betty   | baz
(4 rows)

Note that all users are returned. Now we just restrict the result to
those whose project_id is null (because they're not associated with
project baz) and those who aren't barney:

SELECT user_id
FROM users
LEFT JOIN (
     -- users for project baz
     SELECT *
     FROM user_projects
     WHERE project_id = 'baz') u_p USING (user_id)
WHERE project_id IS NULL -- all users who aren't associated with
project baz
                          -- so their project_id column is NULL
     AND user_id <> 'barney'; -- and we don't want barney
user_id
---------
fred
(1 row)

A third way would have been to rewrite your query using NOT IN rather
than <>:

SELECT *
FROM users
WHERE user_id <> 'barney'
     AND user_id NOT IN (
         SELECT user_id
         FROM user_projects
         WHERE project_id = 'baz');
user_id
---------
fred
(1 row)

Hope that helps.

Michael Glaesemann
grzm seespotcode net




pgsql-novice by date:

Previous
From: Tom Lane
Date:
Subject: Re: subselect syntax
Next
From: "Loredana Curugiu"
Date:
Subject: Query with tables from 2 different databases in Java