Thread: Calculation of per Capita on-the-fly - problems with SQL syntax

Calculation of per Capita on-the-fly - problems with SQL syntax

From
Stefan Schwarzer
Date:
Hi there,

I need to calculate per Capita data on-the-fly. My table for a given
variable looks like this:

     year    |    value     |    id_country
---------------------------------------
       2001    |     123       |   1
       2002    |     125       |   1
       2003    |     128       |   1
       2004    |     132       |   1
       2005    |     135       |   1

       2001    |     412       |   2
       2002    |     429       |   2
       2003    |     456       |   2
       2004    |     465       |   2
       2005    |     477       |   2


Now, I can't get the calc working correctly. I use the query below,
but a) it just takes too much time to come up with a result; and b)
the results has three lines for each country,
    one with a value for y_2003 and a NULL for y_2002
    one with a NULL for y_2003 and a value for y_2002
    one with a NULL for both y_2003 and y_2002


SELECT DISTINCT
    ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND
pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2003,
    ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND
pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2002,
    c.name
FROM
    public_multiple_tables.agri_area AS d
LEFT JOIN
     public_multiple_tables.pop_total AS pt ON pt.id_country = d.id_country
LEFT JOIN
    countries_view AS c ON c.id = d.id_country
ORDER BY
    name ASC


What am I doing wrong? Thanks for any advice,

Stef




Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Nis Jørgensen
Date:
Stefan Schwarzer skrev:
> Hi there,
>
> I need to calculate per Capita data on-the-fly. My table for a given
> variable looks like this:
>
>     year    |    value     |    id_country
> ---------------------------------------
>       2001    |     123       |   1
>       2002    |     125       |   1
>       2003    |     128       |   1
>       2004    |     132       |   1
>       2005    |     135       |   1
>
>       2001    |     412       |   2
>       2002    |     429       |   2
>       2003    |     456       |   2
>       2004    |     465       |   2
>       2005    |     477       |   2
>
>
> Now, I can't get the calc working correctly. I use the query below, but
> a) it just takes too much time to come up with a result; and b) the
> results has three lines for each country,
>     one with a value for y_2003 and a NULL for y_2002
>     one with a NULL for y_2003 and a value for y_2002
>     one with a NULL for both y_2003 and y_2002
>
>
> SELECT DISTINCT
>     ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND
> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2003,
>     ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND
> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS y_2002,
>     c.name
> FROM
>     public_multiple_tables.agri_area AS d
> LEFT JOIN
>      public_multiple_tables.pop_total AS pt ON pt.id_country = d.id_country
> LEFT JOIN
>     countries_view AS c ON c.id = d.id_country
> ORDER BY
>     name ASC
>
>
> What am I doing wrong? Thanks for any advice,

You are trying to do the join on the year in the SELECT expression.
Also, you are trying to do the formatting into year-columns in your
query. You are left joining to tables in which there should always be a
mathing row (I assume).

This should give you the same data out in a different format. Note that
most of the NULL values will be excluded from this result.

SELECT cname, year, d.value/pt.value
FROM
    public_multiple_tables.agri_area AS d
INNER JOIN
     public_multiple_tables.pop_total AS pt ON pt.id_country =
d.id_country AND pt.year = d.year
INNER JOIN
    countries_view AS c ON c.id = d.id_country
WHERE d.year in (2002,2003,2004)
AND pt.value <> 0
ORDER by c.name, year;

Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Stefan Schwarzer
Date:
> This should give you the same data out in a different format. Note
> that
> most of the NULL values will be excluded from this result.
>
> SELECT cname, year, d.value/pt.value
> FROM
>     public_multiple_tables.agri_area AS d
> INNER JOIN
>      public_multiple_tables.pop_total AS pt ON pt.id_country =
> d.id_country AND pt.year = d.year
> INNER JOIN
>     countries_view AS c ON c.id = d.id_country
> WHERE d.year in (2002,2003,2004)
> AND pt.value <> 0
> ORDER by c.name, year;

Ah... Thanks a lot for that.

Just one thing: As my year columns can have as well values like
"1970-75", they are not integers, but text fields. Thus, the "IN"
parameter in the "WHERE" clause doesn't work. Do you have any other
idea how ti could work?

Thanks a lot for your help!

Stef

Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
brian
Date:
Stefan Schwarzer wrote:
>> This should give you the same data out in a different format. Note  that
>> most of the NULL values will be excluded from this result.
>>
>> SELECT cname, year, d.value/pt.value
>> FROM
>>     public_multiple_tables.agri_area AS d
>> INNER JOIN
>>      public_multiple_tables.pop_total AS pt ON pt.id_country =
>> d.id_country AND pt.year = d.year
>> INNER JOIN
>>     countries_view AS c ON c.id = d.id_country
>> WHERE d.year in (2002,2003,2004)
>> AND pt.value <> 0
>> ORDER by c.name, year;
>
>
> Ah... Thanks a lot for that.
>
> Just one thing: As my year columns can have as well values like
> "1970-75", they are not integers, but text fields. Thus, the "IN"
> parameter in the "WHERE" clause doesn't work. Do you have any other
> idea how ti could work?
>

Adjust your data so that table agri_area has a row for each year?

Otherwise, i think you'd need to parse your year column by expanding any
entries with a dash into a range. But i'm not sure that you could then
compare a range against a list (WHERE ... IN ... )

Probably best done in a function.

brian

Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Sam Mason
Date:
On Mon, Oct 15, 2007 at 02:08:24PM +0200, Stefan Schwarzer wrote:
> Just one thing: As my year columns can have as well values like "1970-75",
> they are not integers, but text fields. Thus, the "IN" parameter in the
> "WHERE" clause doesn't work. Do you have any other idea how ti could work?

I'd be tempted to split your "year" column (maybe more appropriately
named as "year range") into two integer columns, one for the start year
and one for the end year.  Once you've done that it would become much
easier to write queries.  Generally, you want to split the information
that the database uses into one piece per column.  If you do split it
then you could do something like:

  WHERE d.year_min <= 2004 AND d.year_max >= 2002


  Sam

Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Michael Glaesemann
Date:
On Oct 15, 2007, at 10:50 , brian wrote:

> Adjust your data so that table agri_area has a row for each year?

I can imagine that some of his raw data is a multi-year aggregate, so
it may be difficult for him to do this.

> Otherwise, i think you'd need to parse your year column by
> expanding any entries with a dash into a range.

I suggest using two *date* (or possibly integer) columns for each row
and consider each row an year interval (in the mathematical sense,
not to be confused with SQL intervals, which are actually durations).
Depending on the interval representation you choose (closed-open or
closed-closed), a single-year interval (say, 1970), would be
represented as either ('1970-01-01', '1970-01-01') or ('1970-01-01',
'1971-01-01').

The reason I suggest using dates rather than integers is that it
gives you more flexibility for the future, and you have access to all
of the date functions available in Postgres that you will probably
find useful.

> But i'm not sure that you could then compare a range against a list
> (WHERE ... IN ... )

The only reference for managing temporal data in ANSI SQL I'm aware
of is Snodgrass' "Developing Time-Oriented Database Applications in
SQL"[1]. Date, Darwen, and Lorentzos' "Temporal Data and the
Relational Model"[2] is a more general text on the same topic, but
isn't directly applicable to ANSI SQL. I suggest looking at least at
the first.

Hope this helps.

Michael Glaesemann
grzm seespotcode net

[1](http://www.cs.arizona.edu/~rts/tdbbook.pdf)
[2](http://www.amazon.com/gp/product/1558608559/)


Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Stefan Schwarzer
Date:
> I suggest using two *date* (or possibly integer) columns for each
> row and consider each row an year interval (in the mathematical
> sense, not to be confused with SQL intervals, which are actually
> durations). Depending on the interval representation you choose
> (closed-open or closed-closed), a single-year interval (say, 1970),
> would be represented as either ('1970-01-01', '1970-01-01') or
> ('1970-01-01', '1971-01-01').

While trying to move step by step into the "right" direction, I
discover at every second step some "riddles" I have to solve...
Although not so much in favor of using two columns for the year
(start, end), I did it - only to discover that I have a couple of
variables that don't have a "numerical" year (such as 1970, or even
1970-75), but a textual called "various". "Various" means that there
is no common year for all countries for the retrieval/assessment of
the given value. That is, for one country it can be 1990, for another
1992, for a third 1995. As the actual year is not important (and not
always defined), one tries to use this kind of common describer:
various.

Evidently, it doesn't work with any of the envisaged or proposed
table layouts.

Does anyone have a nice solution to this problem? Thanks for any ideas!

Yours (slightly frustrated),

Stef

Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Michael Glaesemann
Date:
On Oct 16, 2007, at 8:09 , Stefan Schwarzer wrote:

> That is, for one country it can be 1990, for another 1992, for a
> third 1995. As the actual year is not important (and not always
> defined), one tries to use this kind of common describer: various.

When you don't have the year, put them in another table: they're not
the type of data as the rest. If you do, include the year just like
normal and modify the query so it's not joining on the year. If you
need to include both with years and without, write your query
appropriately to include rows from both tables.

Michael Glaesemann
grzm seespotcode net



Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Stefan Schwarzer
Date:
>> SELECT DISTINCT
>>     ( CASE WHEN d.year=2003 AND pt.year=2003 AND pt.value <> '0' AND
>> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS
>> y_2003,
>>     ( CASE WHEN d.year=2002 AND pt.year=2002 AND pt.value <> '0' AND
>> pt.value IS NOT NULL THEN d.value / pt.value ELSE NULL END ) AS
>> y_2002,
>>     c.name
>> FROM
>>     public_multiple_tables.agri_area AS d
>> LEFT JOIN
>>      public_multiple_tables.pop_total AS pt ON pt.id_country =
>> d.id_country
>> LEFT JOIN
>>     countries_view AS c ON c.id = d.id_country
>> ORDER BY
>>     name ASC
>>
>>
>> What am I doing wrong? Thanks for any advice,
>
> You are trying to do the join on the year in the SELECT expression.
> Also, you are trying to do the formatting into year-columns in your
> query. You are left joining to tables in which there should always
> be a
> mathing row (I assume).
>
> This should give you the same data out in a different format. Note
> that
> most of the NULL values will be excluded from this result.
>
> SELECT cname, year, d.value/pt.value
> FROM
>     public_multiple_tables.agri_area AS d
> INNER JOIN
>      public_multiple_tables.pop_total AS pt ON pt.id_country =
> d.id_country AND pt.year = d.year
> INNER JOIN
>     countries_view AS c ON c.id = d.id_country
> WHERE d.year in (2002,2003,2004)
> AND pt.value <> 0
> ORDER by c.name, year;

Hmmm.... Actually, my intention was to get a more "excel" like
output, that is the formatting into year-columns. This eases a lot
the PHP/HTML display/loop. Otherwise I would have to start to do some
joggling inside PHP to get it that way....


Re: Calculation of per Capita on-the-fly - problems with SQL syntax

From
Sam Mason
Date:
On Wed, Oct 17, 2007 at 08:29:58AM +0200, Stefan Schwarzer wrote:
> Hmmm.... Actually, my intention was to get a more "excel" like output, that
> is the formatting into year-columns. This eases a lot the PHP/HTML
> display/loop. Otherwise I would have to start to do some joggling inside
> PHP to get it that way....

When you say "excel-like output" do you mean some sort of cross-
tabulated output? i.e. instead of

  n  y  v
  x 01  5
  x 02  7
  y 01  3
  z 02  4

you want:

  n  01 02
  x   5  7
  y   3
  z      4

or something like that?  If you do then I tend to get that effect by
doing something like:

  SELECT n,
    SUM(CASE WHEN y = 01 THEN v END) AS y01,
    SUM(CASE WHEN y = 02 THEN v END) AS y02
  FROM table
  GROUP BY n
  ORDER BY n;

This gets a bit fiddly if you're doing lots of columns/categories, but
gives lots of flexibility with the columns you produce.  There's a
subdirectory in the contrib directory called "tablefunc" that provides
some code to do crosstabs if all you want to do is something simple.

If you're doing it by hand, you have the advantage of being able to have
something like:

  SUM(v) AS total

at the end, or maybe a mean (AVG) or standard deviation (STDDEV) if it
makes sense to do anything like that.



  Sam