Re: Retrieving multiple columns from a subquery - Mailing list pgsql-general
From | Merlin Moncure |
---|---|
Subject | Re: Retrieving multiple columns from a subquery |
Date | |
Msg-id | CAHyXU0z9dAV1NG_58HJLsCyDNOYhpo1nnWR9NxXU-j59q=YhFQ@mail.gmail.com Whole thread Raw |
In response to | Retrieving multiple columns from a subquery (Chris Hanks <christopher.m.hanks@gmail.com>) |
List | pgsql-general |
On Wed, May 9, 2012 at 11:42 AM, Chris Hanks <christopher.m.hanks@gmail.com> wrote: > Hello - > > I have two tables: > > CREATE TABLE users > ( > id serial NOT NULL, > created_at timestamp with time zone NOT NULL, > last_seen_at timestamp with time zone NOT NULL, > -- some other columns... > ) > > CREATE TABLE emails > ( > user_id integer NOT NULL, > address text NOT NULL, > created_at timestamp with time zone NOT NULL, > confirmed_at timestamp with time zone, > -- some other columns... > CONSTRAINT emails_user_id_fkey FOREIGN KEY (user_id) > REFERENCES users (id) MATCH SIMPLE > ON UPDATE CASCADE ON DELETE CASCADE > ) > > The gist is that a single user can be related to multiple emails, and some > email addresses are confirmed (they've clicked a link I've sent there, so I > know it's valid) and some aren't. > > Routinely, when I'm fetching users from the db I also want to get the best > email address for each user. That is, the email address that they've > confirmed the most recently, or failing that, the one that they created the > most recently. I've been doing this via a subselect: > > SELECT *, > (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY > "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email" > FROM "users" > > I like the subquery approach because I can use my ORM to easily append it to > whatever query I'm running against the users table (whether I'm looking up > one user or many), without having to do an explicit join and trim out the > unnecessary rows. Also, in the future I'm planning on adding additional > subqueries to get (for example) each user's current subscription status, and > I'm afraid that the joins will get ungainly. Besides, I find subqueries much > easier to reason about than joins. > > My problem is that now I need to get not only the best email's address, but > whether it is confirmed (whether confirmed_at is not null). My first attempt > was to simply repeat the subquery: > > SELECT *, > (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY > "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email", > ((SELECT "confirmed_at" FROM "emails" WHERE ("user_id" = "id") ORDER BY > "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) IS NOT NULL) AS > "best_email_confirmed" > FROM "users" > > I had hoped Postgres would recognize that the two subqueries were identical, > but judging from the explain output from my development database it's not, > and it's simply running the subquery twice instead: > > "Seq Scan on users (cost=0.00..333.65 rows=13 width=81)" > " SubPlan 1" > " -> Limit (cost=12.79..12.79 rows=1 width=48)" > " -> Sort (cost=12.79..12.80 rows=5 width=48)" > " Sort Key: public.emails.confirmed_at, > public.emails.created_at" > " -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5 > width=48)" > " Recheck Cond: (user_id = users.id)" > " -> Bitmap Index Scan on emails_pkey > (cost=0.00..4.29 rows=5 width=0)" > " Index Cond: (user_id = users.id)" > " SubPlan 2" > " -> Limit (cost=12.79..12.79 rows=1 width=16)" > " -> Sort (cost=12.79..12.80 rows=5 width=16)" > " Sort Key: public.emails.confirmed_at, > public.emails.created_at" > " -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5 > width=16)" > " Recheck Cond: (user_id = users.id)" > " -> Bitmap Index Scan on emails_pkey > (cost=0.00..4.29 rows=5 width=0)" > " Index Cond: (user_id = users.id)" > > It would be ideal if I could pull both results from the same subquery, > something like: > > SELECT *, > (SELECT "address", "confirmed_at" IS NOT NULL FROM "emails" WHERE > ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" > DESC LIMIT 1) AS ("best_email", "best_email_confirmed") > FROM "users" > > But this isn't valid syntax. I tried putting the subquery under a FROM > clause, but it won't work with my "user_id" = "id" condition, and throws > "ERROR: subquery in FROM cannot refer to other relations of same query > level". I think CTEs might be an answer, but I'm stuck on 8.3 for the > foreseeable future, which doesn't support them. > > Does anyone have any suggestions? try this. sometimes, but not always, it will avoid the extra subplans: SELECT u.*, (emails).* FROM ( SELECT *, (SELECT e FROM "emails" e WHERE ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS emails, FROM "users" u ) q merlin
pgsql-general by date: