Thread: LIMIT within UNION?
Greetings- I have a table of participants to be contacted for a study. Some are in the "exposure" group, others in the "control" group. This is designated by a column, typenr, that contains 1 for exposure, 2 for control. The complication is this: I need to select 200 total. The 200 number should include *all* those eligible in the exposure group, plus enough from the control group to bring the total number up to 200. (Yes, there is a valid reason for this.) Furthermore, I need to sort the output of the two groups *together* by zip code. What I've done is to write a script that counts the number of eligible exposure candidates: SELECT count(*) FROM participants WHERE <eligibility>AND typenr=1 Then subtract that number (currently 28) from 200 to get 172 control participants. Then the problem starts. SELECT ... FROM participants WHERE typenr=1 AND <eligibility> UNION SELECT ... FROM participants WHERE typenr=2 LIMIT 172 ORDER BY zip; returns ERROR: parser: parse error at or near "ORDER" I've tried a variety of parentheses to no avail. Can someone shed some light? Thanks! ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
Andrew Perrin <clists@perrin.socsci.unc.edu> writes: > SELECT ... FROM participants > WHERE typenr=1 AND <eligibility> > UNION > SELECT ... FROM participants > WHERE typenr=2 LIMIT 172 > ORDER BY zip; I think you need SELECT * FROM ( SELECT ... FROM participants WHERE typenr=1 AND <eligibility> UNION (SELECT ... FROM participants WHERE typenr=2 LIMIT 172) ) ss ORDER BY zip; Not sure if the inner set of parens is essential, but it might be. The outer SELECT superstructure is definitely necessary to give a place to hang the ORDER BY on. regards, tom lane
>>>>> "Andrew" == Andrew Perrin <clists@perrin.socsci.unc.edu> writes: Andrew> What I've done is to write a script that counts the number Andrew> of eligible exposure candidates: Andrew> SELECT count(*) FROM participants WHERE <eligibility> Andrew> AND typenr=1 Andrew> Then subtract that number (currently 28) from 200 to get Andrew> 172 control participants. Then the problemstarts. [...] Andrew> returns ERROR: parser: parse error at or near "ORDER" Can you do this via a subselect: SELECT * FROM ( SELECT ... FROM participants WHERE typenr=1 AND <eligibility> UNION SELECT ... FROM participants WHERE typenr=2 LIMIT 172 ) ORDER BY zip; roland -- PGP Key ID: 66 BC 3B CD Roland B. Roberts, PhD RL Enterprises roland@rlenter.com 76-15 113th Street, Apt 3B roland@astrofoto.org Forest Hills, NY 11375
Thanks! That did it. The inner parens are necessary - without them the ORDER BY seems to be parsed as part of the second subquery and is therefore a syntax error. Best, Andy ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu On Thu, 12 Sep 2002, Tom Lane wrote: > Andrew Perrin <clists@perrin.socsci.unc.edu> writes: > > SELECT ... FROM participants > > WHERE typenr=1 AND <eligibility> > > UNION > > SELECT ... FROM participants > > WHERE typenr=2 LIMIT 172 > > ORDER BY zip; > > I think you need > > SELECT * FROM > ( > SELECT ... FROM participants > WHERE typenr=1 AND <eligibility> > UNION > (SELECT ... FROM participants > WHERE typenr=2 LIMIT 172) > ) ss > ORDER BY zip; > > Not sure if the inner set of parens is essential, but it might be. > The outer SELECT superstructure is definitely necessary to give a > place to hang the ORDER BY on. > > regards, tom lane >
On 12 Sep 2002, Roland Roberts wrote: > >>>>> "Andrew" == Andrew Perrin <clists@perrin.socsci.unc.edu> writes: > ... > Can you do this via a subselect: > > SELECT * FROM > ( SELECT ... FROM participants > WHERE typenr=1 AND <eligibility> > UNION > SELECT ... FROM participants > WHERE typenr=2 LIMIT 172 ) > ORDER BY zip; > Unfortunately in this case the LIMIT is applied to the fully-UNIONed set, limiting the total number of cases to 172 instead of just those from the second subquery. Tom Lane's example worked, though. Thanks, Andy ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
On Thu, 12 Sep 2002, Andrew Perrin wrote: > Greetings- > > I have a table of participants to be contacted for a study. Some are in > the "exposure" group, others in the "control" group. This is designated by > a column, typenr, that contains 1 for exposure, 2 for control. > > The complication is this: I need to select 200 total. The 200 number > should include *all* those eligible in the exposure group, plus enough > from the control group to bring the total number up to 200. (Yes, there is > a valid reason for this.) Furthermore, I need to sort the output of the > two groups *together* by zip code. Do you get more than 200 if there are more eligible people and does the ... ever include the same person in both sides of the union? If not in the second case, union all would probably save the database some extra work since it won't have to try to weed out duplicates. If not in the first case, then wouldn't a limit 200 on the after union result set work rather than a separate count and subtraction?
On Thu, 12 Sep 2002, Stephan Szabo wrote: > On Thu, 12 Sep 2002, Andrew Perrin wrote: > > > Greetings- > > > > I have a table of participants to be contacted for a study. Some are in > > the "exposure" group, others in the "control" group. This is designated by > > a column, typenr, that contains 1 for exposure, 2 for control. > > > > The complication is this: I need to select 200 total. The 200 number > > should include *all* those eligible in the exposure group, plus enough > > from the control group to bring the total number up to 200. (Yes, there is > > a valid reason for this.) Furthermore, I need to sort the output of the > > two groups *together* by zip code. > > Do you get more than 200 if there are more eligible people Yes - in the (rather rare) case that there are 200 or more eligible exposure subjects, the result set should be the total number of eligible exposure subjects. > and does the > ... ever include the same person in both sides of the union? No; each person is only in one of the two sides. > > If not in the second case, union all would probably save the database > some extra work since it won't have to try to weed out duplicates. I'll try that. > > If not in the first case, then wouldn't a limit 200 on the after union > result set work rather than a separate count and subtraction? > Interesting - this would count on the UNION including all cases in the first query before those in the second query. Are UNIONed records presented in any predictable order? ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
Andrew Perrin <clists@perrin.socsci.unc.edu> writes: > Interesting - this would count on the UNION including all cases in the > first query before those in the second query. Are UNIONed records > presented in any predictable order? If you can use UNION ALL then this would be pretty safe to rely on. If you must use UNION (to eliminate dups) then it won't work. regards, tom lane
Andrew, Another approach, one that does not require you to know before constructing the query how many eligible subjects there are, is not to use a union at all: SELECT * FROM ( SELECT ... , zip FROM participants WHERE (typenr = 1 and <eligibility criteria) OR (typenr = 2) ORDER BY (typenr = 1 and <eligibility criteria>) DESC LIMIT 200 ) p1 ORDER BY zip; The inner query gives you all of the records that meet the eligibility criteria, plus all of the records that have typenr = 2, in the order of whether or not they meet the criteria (as a boolean value) and truncates it at 200 records. The outer query then re-sorts this result in zip order. This seems, to me, much more flexible than using a UNION query. -- -Josh BerkusAglio Database SolutionsSan Francisco
This is certainly the most elegant solution - thanks! Andy ---------------------------------------------------------------------- Andrew J Perrin - http://www.unc.edu/~aperrin Assistant Professor of Sociology, U of North Carolina, Chapel Hill clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu On Thu, 12 Sep 2002, Josh Berkus wrote: > Andrew, > > Another approach, one that does not require you to know before constructing > the query how many eligible subjects there are, is not to use a union at all: > > SELECT * FROM ( > SELECT ... , zip > FROM participants > WHERE (typenr = 1 and <eligibility criteria) > OR (typenr = 2) > ORDER BY (typenr = 1 and <eligibility criteria>) DESC > LIMIT 200 ) p1 > ORDER BY zip; > > The inner query gives you all of the records that meet the eligibility > criteria, plus all of the records that have typenr = 2, in the order of > whether or not they meet the criteria (as a boolean value) and truncates it > at 200 records. > The outer query then re-sorts this result in zip order. > > This seems, to me, much more flexible than using a UNION query. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >