Thread: hopefully a brain teaser, can't quite figure out query

hopefully a brain teaser, can't quite figure out query

From
edfialk
Date:
Hi guys, maybe this is just a teaser for me, but any help would be
awesome.  My best crack at the solution is not returning yet after a
good ten minutes.  I'll post the explain analyze if it ever comes
back.  I have no indexing, which is probably embarrassing, I'm just
not quite sure what to index or really how.

So, I've got two tables, which I'm going to try to simplify and still
get across:

The small table is a listing of county fips codes, their name, and the
geometry for the county.  Each fips is only listed once. The big table
is multiple emissions for each county, the parameter for the emission,
and the source code for the emission (scc).  Each county in big tbale
has many entries, variable number of pollutant types, variable number
of scc's.

small table:
fips (int), name(string), geom(geometry)
123, "some county", "some geometry"
124, "some other county", "some other geometry"
etc.

big table:
fips (int), pollutant(string), value(double), scc(int)
123, "co", 1000, 1000
123, "co", 1500, 1000
123, "co", 500, 1001
123, "co", 550, 1001
123, "co", 1500, 1002
123, "co", 50, 1002

123, "so2", 1100, 1000
123, "so2", 510, 1001
123, "so2", 1510, 1002
etc.

So the user supplies the pollutant, a number of scc's, and a minimum
value.
I need to select all the fips from the small table, as well as the sum
of values in the big table (for that pollutant and scc codes) WHERE
the sum of the values in the big table (for that pollutant and those
scc codes) are larger than the given value.

So, for the above, say the user picked "CO" for pollutant, 1001 and
1002 for SCC.
the rows in big table with 1001 OR 1002 for scc AND "co" for pollutant
are:
123, "co", 500, 1001
123, "co", 550, 1001
123, "co", 1500, 1002
123, "co", 50, 1002

I would return the 123 fips ONLY if the value provided was less than
the sum of the values for all scc's (500+550+1500+50 = 2600), as well
as the sum for those values.

so, return set would be
fips, name, value
123, "some county", 2600

for each fips in small table...

Boy I wonder if that makes sense, maybe some pseudo SQL speak would be
better:

SELECT small.fips, small.name, sum(big.value)
FROM small, big
WHERE
small.fips in (
  SELECT fips from big
  WHERE ((pollutant='co') AND
  (
    (scc LIKE '21%') OR (scc LIKE '2301%') OR (scc LIKE '2801000000%')
  )
  HAVING SUM(value > 2000)
)
GROUP BY small.fips, small.name;

This is the query that isn't returning yet.
If anyone has any questions, comments, or any suggestions at all, I'll
do my best to respond ASAP.
Any help is extremely appreciated!
-Ed

Re: hopefully a brain teaser, can't quite figure out query

From
"Adam Rich"
Date:
> -----Original Message-----
> The small table is a listing of county fips codes, their name, and the
> geometry for the county.  Each fips is only listed once. The big table
> is multiple emissions for each county, the parameter for the emission,
> and the source code for the emission (scc).  Each county in big tbale
> has many entries, variable number of pollutant types, variable number
> of scc's.
>
>
> SELECT small.fips, small.name, sum(big.value)
> FROM small, big
> WHERE
> small.fips in (
>   SELECT fips from big
>   WHERE ((pollutant='co') AND
>   (
>     (scc LIKE '21%') OR (scc LIKE '2301%') OR (scc LIKE '2801000000%')
>   )
>   HAVING SUM(value > 2000)
> )
> GROUP BY small.fips, small.name;
>
> This is the query that isn't returning yet.
> If anyone has any questions, comments, or any suggestions at all, I'll
> do my best to respond ASAP.

This sounds like what you want:

SELECT small.fips, small.name, sum(big.value) as big_sum
FROM small
INNER JOIN big on small.fips = big.fips
WHERE pollutant = 'co'
AND (scc LIKE '21%' OR scc LIKE '2301%' OR scc LIKE '2801000000%')
GROUP BY small.fips, small.name

However, I'm not sure I understand this part:

> I would return the 123 fips ONLY if the value provided was less
> than the sum of the values for all scc's (500+550+1500+50 = 2600),
> as well as the sum for those values.

Can you clarify?