Nested selects - Mailing list pgsql-sql

From Glenn Maynard
Subject Nested selects
Date
Msg-id d18085b50904071934g7ad206f1i14ac05f7bd29f05e@mail.gmail.com
Whole thread Raw
List pgsql-sql
I'm deriving high scores from two tables: one containing data for each
time a user played (rounds), and one containing a list of stages:

CREATE TABLE stage (id SERIAL NOT NULL PRIMARY KEY, name VARCHAR);
CREATE TABLE round (id SERIAL NOT NULL PRIMARY KEY, score REAL,
stage_id INTEGER REFERENCES stage (id));
INSERT INTO stage (name) VALUES ('stage 1'), ('stage 2'), ('stage 3');
INSERT INTO round (stage_id, score) VALUES (1, 100), (1, 150), (1, 175), (2, 250), (2, 275), (2, 220), (3, 350), (3,
380),(3, 322);
 

SELECT r.* FROM round r
WHERE r.id IN (   -- Get the high scoring round ID for each stage:   SELECT   (       -- Get the high score for stage
s:      SELECT r.id FROM round r       WHERE r.stage_id = s.id       ORDER BY r.score DESC LIMIT 1   )   FROM stage s
 
);

This works fine, and with a (stage_id, score DESC) index, is
reasonably fast with around 1000 stages.  round may expand to millions
of rows.

Unfortunately, it doesn't generalize to getting the top N scores for
each stage; LIMIT 2 isn't valid ("more than one row returned by a
subquery used as an expression").

I fiddled with putting the inner results in an array, without much
luck, and I'm not sure how well that'd optimize.  Having the results
in any particular order isn't important.  (In practice, the inner
select will often be more specific--"high scores on the west coast",
"high scores this month", and so on.)

This seems embarrassingly simple: return the top rounds for each
stage--but I'm banging my head on it for some reason.

-- 
Glenn Maynard


pgsql-sql by date:

Previous
From: Peter Koczan
Date:
Subject: Re: pl/pgsql or control structures outside of a function?
Next
From: Stuart McGraw
Date:
Subject: changing multiple pk's in one update