Thread: Need Help With a A Simple Query That's Not So Simple
I think this should be easy, but I can't seem to put the SQL together correctly and would appreciate any help. (I'm using Pg 8.4 in CentOS 5.5, if that matters.)
I have a table of Farms and a table of crops in a 1:M relationship of Farms : Crops. There are lots of different crops to choose form but for now I'm only interested in two crops; corn and soybeans. Some farms grow only corn and some grow only soybeans, and some grow both. What I'd like to know is, which Farms and how many are growing only corn, which and how many are growing soybeans and which and how many are growing both? I can easily get all the corn growers with:
SELECT a.*
FROM farms a
JOIN crops b
ON a.farm_id=b.farm_id
WHERE crop_cd='0041'
I can do the same with soybeans (crop_cd= '0081') and then I could subtract the sum of these from the total of all farms that grow either corn or soybeans to get the number of farms growing both, but having to do all those queries sounds very time consuming and inefficient. Is there a better way to get the farm counts or data by categories like farms growing only corn, farms growing only soybeans, farms growing both? I'm also interested in possibly expanding to a general case where I could select more than two crops. and get counts of the permutations.
Here's a sketch of the relevant pieces of the data base.
Tables:
farms crops
======= =======
farm_id bigint (pkey) crop_id (pkey)
type farm_id foreign key to farms
size crop_cd 0041 = corn 0081=soybeans
... year
...
Any help would be much appreciated.
TIA,
- Bill Thoen
I have a table of Farms and a table of crops in a 1:M relationship of Farms : Crops. There are lots of different crops to choose form but for now I'm only interested in two crops; corn and soybeans. Some farms grow only corn and some grow only soybeans, and some grow both. What I'd like to know is, which Farms and how many are growing only corn, which and how many are growing soybeans and which and how many are growing both? I can easily get all the corn growers with:
SELECT a.*
FROM farms a
JOIN crops b
ON a.farm_id=b.farm_id
WHERE crop_cd='0041'
I can do the same with soybeans (crop_cd= '0081') and then I could subtract the sum of these from the total of all farms that grow either corn or soybeans to get the number of farms growing both, but having to do all those queries sounds very time consuming and inefficient. Is there a better way to get the farm counts or data by categories like farms growing only corn, farms growing only soybeans, farms growing both? I'm also interested in possibly expanding to a general case where I could select more than two crops. and get counts of the permutations.
Here's a sketch of the relevant pieces of the data base.
Tables:
farms crops
======= =======
farm_id bigint (pkey) crop_id (pkey)
type farm_id foreign key to farms
size crop_cd 0041 = corn 0081=soybeans
... year
...
Any help would be much appreciated.
TIA,
- Bill Thoen
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Thoen Sent: Monday, October 31, 2011 6:51 PM To: Postgrresql Subject: [GENERAL] Need Help With a A Simple Query That's Not So Simple I think this should be easy, but I can't seem to put the SQL together correctly and would appreciate any help. (I'm using Pg 8.4 in CentOS 5.5, if that matters.) I have a table of Farms and a table of crops in a 1:M relationship of Farms : Crops. There are lots of different crops to choose form but for now I'm only interested in two crops; corn and soybeans. Some farms grow only corn and some grow only soybeans, and some grow both. What I'd like to know is, which Farms and how many are growing only corn, which and how many are growing soybeans and which and how many are growing both? I can easily get all the corn growers with: SELECT a.* FROM farms a JOIN crops b ON a.farm_id=b.farm_id WHERE crop_cd='0041' I can do the same with soybeans (crop_cd= '0081') and then I could subtract the sum of these from the total of all farms that grow either corn or soybeans to get the number of farms growing both, but having to do all those queries sounds very time consuming and inefficient. Is there a better way to get the farm counts or data by categories like farms growing only corn, farms growing only soybeans, farms growing both? I'm also interested in possibly expanding to a general case where I could select more than two crops. and get counts of the permutations. Here's a sketch of the relevant pieces of the data base. Tables: farms crops ======= ======= farm_id bigint (pkey) crop_id (pkey) type farm_id foreign key to farms size crop_cd 0041 = corn 0081=soybeans ... year ... Any help would be much appreciated. TIA, - Bill Thoen --------------------------------------------------------------- General Idea: WITH crop_one AS ( SELECT farm_id, crop_cd AS crop_one_cd ... ), crop_two AS ( SELECT farm_id, crop_cd AS crop_two_cd ) SELECT * FROM crop_one FULL OUTER JOIN crop_two USING (farm_id) ; Records with NULL for "crop_one_cd" only grow crop 2, records with NULL for "crop_two_cd" only grow crop 1, records where neither field is NULL grow both. Not sure regarding the general case. You likely want to use ARRAY_AGG to get a result like: Farm_id_100, { 'CROP_CD_1', 'CROP_CD_2' } You could then probably get a query to output something like: (crop_id, farms_exclusive, farms_shared, farms_without) Where each of the "farms_" columns is an array of farm_ids that match the particular conditional = ALL (exclusive); != ALL && = ANY (shared); != ANY (without) David J.
On 10/31/2011 5:05 PM, David Johnston wrote:
WITH crop_one AS (
SELECT farm_id, crop_cd AS corn FROM gfc_inilmoidia_2007 WHERE crop_cd ='0041'
), crop_two AS (
SELECT farm_id, crop_cd AS soybeans FROM gfc_inilmoidia_2007 WHERE crop_cd = '0081'
)
SELECT *
FROM crop_one
FULL OUTER JOIN crop_two USING (farm_id)
;
It produced the following (which is essentially the base of what I'm looking for):
farm_id | corn | soybeans
---------+------+----------
1473 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1475 | 0041 |
1475 | 0041 |
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1477 | 0041 |
1478 | 0041 | 0081
1479 | 0041 |
1480 | | 0081
1480 | | 0081
Thanks so much for the quick reply. You've also just opened up a whole new area of query possibilities for me of which I wasn't aware
- Bill Thoen
Thanks David! That worked great! When I filled in the the query from the "general idea" in your example above like so:From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Bill Thoen Sent: Monday, October 31, 2011 6:51 PM To: Postgrresql Subject: [GENERAL] Need Help With a A Simple Query That's Not So Simple [...] What I'd like to know is, which Farms and how many are growing only corn, which and how many are growing soybeans and which and how many are growing both? [...] Is there a better way to get the farm counts or data by categories like farms growing only corn, farms growing only soybeans, farms growing both? I'm also interested in possibly expanding to a general case where I could select more than two crops. and get counts of the permutations. [...] --------------------------------------------------------------- General Idea: WITH crop_one AS (SELECT farm_id, crop_cd AS crop_one_cd ... ), crop_two AS (SELECT farm_id, crop_cd AS crop_two_cd ) SELECT * FROM crop_one FULL OUTER JOIN crop_two USING (farm_id) ; Records with NULL for "crop_one_cd" only grow crop 2, records with NULL for "crop_two_cd" only grow crop 1, records where neither field is NULL grow both. Not sure regarding the general case. You likely want to use ARRAY_AGG to get a result like: Farm_id_100, { 'CROP_CD_1', 'CROP_CD_2' } You could then probably get a query to output something like:(crop_id, farms_exclusive, farms_shared, farms_without) Where each of the "farms_" columns is an array of farm_ids that match the particular conditional = ALL (exclusive); != ALL && = ANY (shared); != ANY (without) David J.
WITH crop_one AS (
SELECT farm_id, crop_cd AS corn FROM gfc_inilmoidia_2007 WHERE crop_cd ='0041'
), crop_two AS (
SELECT farm_id, crop_cd AS soybeans FROM gfc_inilmoidia_2007 WHERE crop_cd = '0081'
)
SELECT *
FROM crop_one
FULL OUTER JOIN crop_two USING (farm_id)
;
It produced the following (which is essentially the base of what I'm looking for):
farm_id | corn | soybeans
---------+------+----------
1473 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1474 | 0041 | 0081
1475 | 0041 |
1475 | 0041 |
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1476 | 0041 | 0081
1477 | 0041 |
1478 | 0041 | 0081
1479 | 0041 |
1480 | | 0081
1480 | | 0081
Thanks so much for the quick reply. You've also just opened up a whole new area of query possibilities for me of which I wasn't aware
- Bill Thoen