Re: Retrieving multiple columns from a subquery - Mailing list pgsql-general

From Chris Hanks
Subject Re: Retrieving multiple columns from a subquery
Date
Msg-id CAK7KUdA_tm4YK89f5EMc5G_XAhP80BV8=h8RUc6Ocafn44mbqg@mail.gmail.com
Whole thread Raw
In response to Retrieving multiple columns from a subquery  (Chris Hanks <christopher.m.hanks@gmail.com>)
Responses Re: Retrieving multiple columns from a subquery
List pgsql-general
Nothing? Are subqueries just not meant to be used this way?


On Wed, May 9, 2012 at 9:42 AM, Chris Hanks <christopher.m.hanks@gmail.com> wrote:
Hello -

I have two tables:

CREATE TABLE users
(
  id serial NOT NULL,
  created_at timestamp with time zone NOT NULL,
  last_seen_at timestamp with time zone NOT NULL,
  -- some other columns...
)

CREATE TABLE emails
(
  user_id integer NOT NULL,
  address text NOT NULL,
  created_at timestamp with time zone NOT NULL,
  confirmed_at timestamp with time zone,
  -- some other columns...
  CONSTRAINT emails_user_id_fkey FOREIGN KEY (user_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)

The gist is that a single user can be related to multiple emails, and some email addresses are confirmed (they've clicked a link I've sent there, so I know it's valid) and some aren't.

Routinely, when I'm fetching users from the db I also want to get the best email address for each user. That is, the email address that they've confirmed the most recently, or failing that, the one that they created the most recently. I've been doing this via a subselect:

SELECT *,
  (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email"
FROM "users"

I like the subquery approach because I can use my ORM to easily append it to whatever query I'm running against the users table (whether I'm looking up one user or many), without having to do an explicit join and trim out the unnecessary rows. Also, in the future I'm planning on adding additional subqueries to get (for example) each user's current subscription status, and I'm afraid that the joins will get ungainly. Besides, I find subqueries much easier to reason about than joins.

My problem is that now I need to get not only the best email's address, but whether it is confirmed (whether confirmed_at is not null). My first attempt was to simply repeat the subquery:

SELECT *,
  (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email",
  ((SELECT "confirmed_at" FROM "emails" WHERE ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) IS NOT NULL) AS "best_email_confirmed"
FROM "users"

I had hoped Postgres would recognize that the two subqueries were identical, but judging from the explain output from my development database it's not, and it's simply running the subquery twice instead:

"Seq Scan on users  (cost=0.00..333.65 rows=13 width=81)"
"  SubPlan 1"
"    ->  Limit  (cost=12.79..12.79 rows=1 width=48)"
"          ->  Sort  (cost=12.79..12.80 rows=5 width=48)"
"                Sort Key: public.emails.confirmed_at, public.emails.created_at"
"                ->  Bitmap Heap Scan on emails  (cost=4.29..12.76 rows=5 width=48)"
"                      Recheck Cond: (user_id = users.id)"
"                      ->  Bitmap Index Scan on emails_pkey  (cost=0.00..4.29 rows=5 width=0)"
"                            Index Cond: (user_id = users.id)"
"  SubPlan 2"
"    ->  Limit  (cost=12.79..12.79 rows=1 width=16)"
"          ->  Sort  (cost=12.79..12.80 rows=5 width=16)"
"                Sort Key: public.emails.confirmed_at, public.emails.created_at"
"                ->  Bitmap Heap Scan on emails  (cost=4.29..12.76 rows=5 width=16)"
"                      Recheck Cond: (user_id = users.id)"
"                      ->  Bitmap Index Scan on emails_pkey  (cost=0.00..4.29 rows=5 width=0)"
"                            Index Cond: (user_id = users.id)"

It would be ideal if I could pull both results from the same subquery, something like:

SELECT *,
  (SELECT "address", "confirmed_at" IS NOT NULL FROM "emails" WHERE ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS ("best_email", "best_email_confirmed")
FROM "users"

But this isn't valid syntax. I tried putting the subquery under a FROM clause, but it won't work with my "user_id" = "id" condition, and throws "ERROR:  subquery in FROM cannot refer to other relations of same query level". I think CTEs might be an answer, but I'm stuck on 8.3 for the foreseeable future, which doesn't support them.

Does anyone have any suggestions?

pgsql-general by date:

Previous
From: Basil Bourque
Date:
Subject: Cannot find installers for 9.2 Beta
Next
From: Merlin Moncure
Date:
Subject: Re: Retrieving multiple columns from a subquery