Re: Retrieving multiple columns from a subquery - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Retrieving multiple columns from a subquery
Date
Msg-id CAHyXU0z9dAV1NG_58HJLsCyDNOYhpo1nnWR9NxXU-j59q=YhFQ@mail.gmail.com
Whole thread Raw
In response to Retrieving multiple columns from a subquery  (Chris Hanks <christopher.m.hanks@gmail.com>)
List pgsql-general
On Wed, May 9, 2012 at 11:42 AM, Chris Hanks
<christopher.m.hanks@gmail.com> wrote:
> Hello -
>
> I have two tables:
>
> CREATE TABLE users
> (
>   id serial NOT NULL,
>   created_at timestamp with time zone NOT NULL,
>   last_seen_at timestamp with time zone NOT NULL,
>   -- some other columns...
> )
>
> CREATE TABLE emails
> (
>   user_id integer NOT NULL,
>   address text NOT NULL,
>   created_at timestamp with time zone NOT NULL,
>   confirmed_at timestamp with time zone,
>   -- some other columns...
>   CONSTRAINT emails_user_id_fkey FOREIGN KEY (user_id)
>       REFERENCES users (id) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE CASCADE
> )
>
> The gist is that a single user can be related to multiple emails, and some
> email addresses are confirmed (they've clicked a link I've sent there, so I
> know it's valid) and some aren't.
>
> Routinely, when I'm fetching users from the db I also want to get the best
> email address for each user. That is, the email address that they've
> confirmed the most recently, or failing that, the one that they created the
> most recently. I've been doing this via a subselect:
>
> SELECT *,
>   (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email"
> FROM "users"
>
> I like the subquery approach because I can use my ORM to easily append it to
> whatever query I'm running against the users table (whether I'm looking up
> one user or many), without having to do an explicit join and trim out the
> unnecessary rows. Also, in the future I'm planning on adding additional
> subqueries to get (for example) each user's current subscription status, and
> I'm afraid that the joins will get ungainly. Besides, I find subqueries much
> easier to reason about than joins.
>
> My problem is that now I need to get not only the best email's address, but
> whether it is confirmed (whether confirmed_at is not null). My first attempt
> was to simply repeat the subquery:
>
> SELECT *,
>   (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email",
>   ((SELECT "confirmed_at" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) IS NOT NULL) AS
> "best_email_confirmed"
> FROM "users"
>
> I had hoped Postgres would recognize that the two subqueries were identical,
> but judging from the explain output from my development database it's not,
> and it's simply running the subquery twice instead:
>
> "Seq Scan on users  (cost=0.00..333.65 rows=13 width=81)"
> "  SubPlan 1"
> "    ->  Limit  (cost=12.79..12.79 rows=1 width=48)"
> "          ->  Sort  (cost=12.79..12.80 rows=5 width=48)"
> "                Sort Key: public.emails.confirmed_at,
> public.emails.created_at"
> "                ->  Bitmap Heap Scan on emails  (cost=4.29..12.76 rows=5
> width=48)"
> "                      Recheck Cond: (user_id = users.id)"
> "                      ->  Bitmap Index Scan on emails_pkey
>  (cost=0.00..4.29 rows=5 width=0)"
> "                            Index Cond: (user_id = users.id)"
> "  SubPlan 2"
> "    ->  Limit  (cost=12.79..12.79 rows=1 width=16)"
> "          ->  Sort  (cost=12.79..12.80 rows=5 width=16)"
> "                Sort Key: public.emails.confirmed_at,
> public.emails.created_at"
> "                ->  Bitmap Heap Scan on emails  (cost=4.29..12.76 rows=5
> width=16)"
> "                      Recheck Cond: (user_id = users.id)"
> "                      ->  Bitmap Index Scan on emails_pkey
>  (cost=0.00..4.29 rows=5 width=0)"
> "                            Index Cond: (user_id = users.id)"
>
> It would be ideal if I could pull both results from the same subquery,
> something like:
>
> SELECT *,
>   (SELECT "address", "confirmed_at" IS NOT NULL FROM "emails" WHERE
> ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at"
> DESC LIMIT 1) AS ("best_email", "best_email_confirmed")
> FROM "users"
>
> But this isn't valid syntax. I tried putting the subquery under a FROM
> clause, but it won't work with my "user_id" = "id" condition, and throws
> "ERROR:  subquery in FROM cannot refer to other relations of same query
> level". I think CTEs might be an answer, but I'm stuck on 8.3 for the
> foreseeable future, which doesn't support them.
>
> Does anyone have any suggestions?

try this.  sometimes, but not always, it will avoid the extra subplans:

SELECT u.*, (emails).* FROM
(
   SELECT *,
   (SELECT e FROM "emails" e WHERE ("user_id" = "id") ORDER BY
"confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS emails,
  FROM "users" u
) q

merlin

pgsql-general by date:

Previous
From: Chris Hanks
Date:
Subject: Re: Retrieving multiple columns from a subquery
Next
From: Tom Lane
Date:
Subject: Re: Retrieving multiple columns from a subquery