Thread: Calculation of per Capita on-the-fly - problems with SQL syntax
Hi there, I need to calculate per Capita data on-the-fly. My table for a given variable looks like this: year | value | id_country --------------------------------------- 2001 | 123 | 1 2002 | 125 | 1 2003 | 128 | 1 2004 | 132 | 1 2005 | 135 | 1 2001 | 412 | 2 2002 | 429 | 2 2003 | 456 | 2 2004 | 465 | 2 2005 | 477 | 2 Now, I can't get the calc working correctly. I use the query below, but a) it just takes too much time to come up with a result; and b) the results has three lines for each country, one with a value for y_2003 and a NULL for y_2002 one with a NULL for y_2003 and a value for y_2002 one with a NULL for both y_2003 and y_2002 SELECT DISTINCT ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2003, ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2002, c.name FROM public_multiple_tables.agri_area AS d LEFT JOIN public_multiple_tables.pop_total AS pt ON pt.id_country = d.id_country LEFT JOIN countries_view AS c ON c.id = d.id_country ORDER BY name ASC What am I doing wrong? Thanks for any advice, Stef
Stefan Schwarzer skrev: > Hi there, > > I need to calculate per Capita data on-the-fly. My table for a given > variable looks like this: > > year | value | id_country > --------------------------------------- > 2001 | 123 | 1 > 2002 | 125 | 1 > 2003 | 128 | 1 > 2004 | 132 | 1 > 2005 | 135 | 1 > > 2001 | 412 | 2 > 2002 | 429 | 2 > 2003 | 456 | 2 > 2004 | 465 | 2 > 2005 | 477 | 2 > > > Now, I can't get the calc working correctly. I use the query below, but > a) it just takes too much time to come up with a result; and b) the > results has three lines for each country, > one with a value for y_2003 and a NULL for y_2002 > one with a NULL for y_2003 and a value for y_2002 > one with a NULL for both y_2003 and y_2002 > > > SELECT DISTINCT > ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND > pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2003, > ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND > pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2002, > c.name > FROM > public_multiple_tables.agri_area AS d > LEFT JOIN > public_multiple_tables.pop_total AS pt ON pt.id_country = d.id_country > LEFT JOIN > countries_view AS c ON c.id = d.id_country > ORDER BY > name ASC > > > What am I doing wrong? Thanks for any advice, You are trying to do the join on the year in the SELECT expression. Also, you are trying to do the formatting into year-columns in your query. You are left joining to tables in which there should always be a mathing row (I assume). This should give you the same data out in a different format. Note that most of the NULL values will be excluded from this result. SELECT cname, year, d.value/pt.value FROM public_multiple_tables.agri_area AS d INNER JOIN public_multiple_tables.pop_total AS pt ON pt.id_country = d.id_country AND pt.year = d.year INNER JOIN countries_view AS c ON c.id = d.id_country WHERE d.year in (2002,2003,2004) AND pt.value <> 0 ORDER by c.name, year;
> This should give you the same data out in a different format. Note > that > most of the NULL values will be excluded from this result. > > SELECT cname, year, d.value/pt.value > FROM > public_multiple_tables.agri_area AS d > INNER JOIN > public_multiple_tables.pop_total AS pt ON pt.id_country = > d.id_country AND pt.year = d.year > INNER JOIN > countries_view AS c ON c.id = d.id_country > WHERE d.year in (2002,2003,2004) > AND pt.value <> 0 > ORDER by c.name, year; Ah... Thanks a lot for that. Just one thing: As my year columns can have as well values like "1970-75", they are not integers, but text fields. Thus, the "IN" parameter in the "WHERE" clause doesn't work. Do you have any other idea how ti could work? Thanks a lot for your help! Stef
Stefan Schwarzer wrote: >> This should give you the same data out in a different format. Note that >> most of the NULL values will be excluded from this result. >> >> SELECT cname, year, d.value/pt.value >> FROM >> public_multiple_tables.agri_area AS d >> INNER JOIN >> public_multiple_tables.pop_total AS pt ON pt.id_country = >> d.id_country AND pt.year = d.year >> INNER JOIN >> countries_view AS c ON c.id = d.id_country >> WHERE d.year in (2002,2003,2004) >> AND pt.value <> 0 >> ORDER by c.name, year; > > > Ah... Thanks a lot for that. > > Just one thing: As my year columns can have as well values like > "1970-75", they are not integers, but text fields. Thus, the "IN" > parameter in the "WHERE" clause doesn't work. Do you have any other > idea how ti could work? > Adjust your data so that table agri_area has a row for each year? Otherwise, i think you'd need to parse your year column by expanding any entries with a dash into a range. But i'm not sure that you could then compare a range against a list (WHERE ... IN ... ) Probably best done in a function. brian
On Mon, Oct 15, 2007 at 02:08:24PM +0200, Stefan Schwarzer wrote: > Just one thing: As my year columns can have as well values like "1970-75", > they are not integers, but text fields. Thus, the "IN" parameter in the > "WHERE" clause doesn't work. Do you have any other idea how ti could work? I'd be tempted to split your "year" column (maybe more appropriately named as "year range") into two integer columns, one for the start year and one for the end year. Once you've done that it would become much easier to write queries. Generally, you want to split the information that the database uses into one piece per column. If you do split it then you could do something like: WHERE d.year_min <= 2004 AND d.year_max >= 2002 Sam
On Oct 15, 2007, at 10:50 , brian wrote: > Adjust your data so that table agri_area has a row for each year? I can imagine that some of his raw data is a multi-year aggregate, so it may be difficult for him to do this. > Otherwise, i think you'd need to parse your year column by > expanding any entries with a dash into a range. I suggest using two *date* (or possibly integer) columns for each row and consider each row an year interval (in the mathematical sense, not to be confused with SQL intervals, which are actually durations). Depending on the interval representation you choose (closed-open or closed-closed), a single-year interval (say, 1970), would be represented as either ('1970-01-01', '1970-01-01') or ('1970-01-01', '1971-01-01'). The reason I suggest using dates rather than integers is that it gives you more flexibility for the future, and you have access to all of the date functions available in Postgres that you will probably find useful. > But i'm not sure that you could then compare a range against a list > (WHERE ... IN ... ) The only reference for managing temporal data in ANSI SQL I'm aware of is Snodgrass' "Developing Time-Oriented Database Applications in SQL"[1]. Date, Darwen, and Lorentzos' "Temporal Data and the Relational Model"[2] is a more general text on the same topic, but isn't directly applicable to ANSI SQL. I suggest looking at least at the first. Hope this helps. Michael Glaesemann grzm seespotcode net [1](http://www.cs.arizona.edu/~rts/tdbbook.pdf) [2](http://www.amazon.com/gp/product/1558608559/)
> I suggest using two *date* (or possibly integer) columns for each > row and consider each row an year interval (in the mathematical > sense, not to be confused with SQL intervals, which are actually > durations). Depending on the interval representation you choose > (closed-open or closed-closed), a single-year interval (say, 1970), > would be represented as either ('1970-01-01', '1970-01-01') or > ('1970-01-01', '1971-01-01'). While trying to move step by step into the "right" direction, I discover at every second step some "riddles" I have to solve... Although not so much in favor of using two columns for the year (start, end), I did it - only to discover that I have a couple of variables that don't have a "numerical" year (such as 1970, or even 1970-75), but a textual called "various". "Various" means that there is no common year for all countries for the retrieval/assessment of the given value. That is, for one country it can be 1990, for another 1992, for a third 1995. As the actual year is not important (and not always defined), one tries to use this kind of common describer: various. Evidently, it doesn't work with any of the envisaged or proposed table layouts. Does anyone have a nice solution to this problem? Thanks for any ideas! Yours (slightly frustrated), Stef
On Oct 16, 2007, at 8:09 , Stefan Schwarzer wrote: > That is, for one country it can be 1990, for another 1992, for a > third 1995. As the actual year is not important (and not always > defined), one tries to use this kind of common describer: various. When you don't have the year, put them in another table: they're not the type of data as the rest. If you do, include the year just like normal and modify the query so it's not joining on the year. If you need to include both with years and without, write your query appropriately to include rows from both tables. Michael Glaesemann grzm seespotcode net
>> SELECT DISTINCT >> ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND >> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS >> y_2003, >> ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND >> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS >> y_2002, >> c.name >> FROM >> public_multiple_tables.agri_area AS d >> LEFT JOIN >> public_multiple_tables.pop_total AS pt ON pt.id_country = >> d.id_country >> LEFT JOIN >> countries_view AS c ON c.id = d.id_country >> ORDER BY >> name ASC >> >> >> What am I doing wrong? Thanks for any advice, > > You are trying to do the join on the year in the SELECT expression. > Also, you are trying to do the formatting into year-columns in your > query. You are left joining to tables in which there should always > be a > mathing row (I assume). > > This should give you the same data out in a different format. Note > that > most of the NULL values will be excluded from this result. > > SELECT cname, year, d.value/pt.value > FROM > public_multiple_tables.agri_area AS d > INNER JOIN > public_multiple_tables.pop_total AS pt ON pt.id_country = > d.id_country AND pt.year = d.year > INNER JOIN > countries_view AS c ON c.id = d.id_country > WHERE d.year in (2002,2003,2004) > AND pt.value <> 0 > ORDER by c.name, year; Hmmm.... Actually, my intention was to get a more "excel" like output, that is the formatting into year-columns. This eases a lot the PHP/HTML display/loop. Otherwise I would have to start to do some joggling inside PHP to get it that way....
On Wed, Oct 17, 2007 at 08:29:58AM +0200, Stefan Schwarzer wrote: > Hmmm.... Actually, my intention was to get a more "excel" like output, that > is the formatting into year-columns. This eases a lot the PHP/HTML > display/loop. Otherwise I would have to start to do some joggling inside > PHP to get it that way.... When you say "excel-like output" do you mean some sort of cross- tabulated output? i.e. instead of n y v x 01 5 x 02 7 y 01 3 z 02 4 you want: n 01 02 x 5 7 y 3 z 4 or something like that? If you do then I tend to get that effect by doing something like: SELECT n, SUM(CASE WHEN y = 01 THEN v END) AS y01, SUM(CASE WHEN y = 02 THEN v END) AS y02 FROM table GROUP BY n ORDER BY n; This gets a bit fiddly if you're doing lots of columns/categories, but gives lots of flexibility with the columns you produce. There's a subdirectory in the contrib directory called "tablefunc" that provides some code to do crosstabs if all you want to do is something simple. If you're doing it by hand, you have the advantage of being able to have something like: SUM(v) AS total at the end, or maybe a mean (AVG) or standard deviation (STDDEV) if it makes sense to do anything like that. Sam