Thread: Find min year and min value

Find min year and min value

From
Stefan Schwarzer
Date:
Hi there,

I am trying to find in a table with different variables, countries
and years the

    lowest year

and within that year the

    lowest value


The following SELECT works, but I wonder if it is "elegant". Can you
recommend any other solution?


    SELECT
            MIN(value) AS minv
    FROM
            public_one_table.data
    WHERE
            year =
            (
                SELECT
                        MIN(year) AS min_year
                FROM
                        public_one_table.data
                WHERE
                        id_variable = 1
            ) AND
            id_variable = 1



Thanks for any help!

Stef

Re: Find min year and min value

From
Richard Huxton
Date:
Stefan Schwarzer wrote:
> Hi there,
>
> I am trying to find in a table with different variables, countries and
> years the
>
>     lowest year
>
> and within that year the
>
>     lowest value
>
>
> The following SELECT works, but I wonder if it is "elegant". Can you
> recommend any other solution?

SELECT value AS minv FROM public_on_table.data
WHERE id_variable = 1
ORDER BY year, value LIMIT 1


--
   Richard Huxton
   Archonet Ltd

Re: Find min year and min value

From
Stefan Schwarzer
Date:
>> Hi there,
>> I am trying to find in a table with different variables, countries
>> and years the
>>     lowest year
>> and within that year the
>>     lowest value
>> The following SELECT works, but I wonder if it is "elegant". Can
>> you recommend any other solution?
>
> SELECT value AS minv FROM public_on_table.data
> WHERE id_variable = 1
> ORDER BY year, value LIMIT 1

But that brings only the min value, not the min year. I need to know
both of them, something like

      (min year = ) 1972, (min value = ) 20

Stef


Re: Find min year and min value

From
Richard Huxton
Date:
Stefan Schwarzer wrote:
>>> Hi there,
>>> I am trying to find in a table with different variables, countries
>>> and years the
>>>     lowest year
>>> and within that year the
>>>     lowest value
>>> The following SELECT works, but I wonder if it is "elegant". Can you
>>> recommend any other solution?
>>
>> SELECT value AS minv FROM public_on_table.data
>> WHERE id_variable = 1
>> ORDER BY year, value LIMIT 1
>
> But that brings only the min value, not the min year. I need to know
> both of them, something like
>
>      (min year = ) 1972, (min value = ) 20

SELECT year, value FROM ...

--
   Richard Huxton
   Archonet Ltd

Re: Find min year and min value

From
Stefan Schwarzer
Date:
> SELECT year, value FROM ...

I feel ashamed.... such a simple solution... gush.... Thanks for that!

Unfortunately it doesn't stop there...

If I want to find the "common smallest year" for two given variables
(say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and
1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up
with 1980 for a given country, if there is a value for that year in
both variables. Otherwise 1981, etc...

How would I do that? I really have no clue...

(my table looks something like this:

id_variable   |    year    |    value     |    id_country
---------------------------------------
         1             |   2001    |     123       |   1
         1             |   2002    |     125       |   1
         1             |   2003    |     128       |   1
         1             |   2004    |     132       |   1
         1             |   2005    |     135       |   1

         1             |   2001    |     412       |   2
         1             |   2002    |     429       |   2
         1             |   2003    |     456       |   2
         1             |   2004    |     465       |   2
         1             |   2005    |     477       |   2

....

         2             |   1980    |      83       |   1
         2             |   1981    |      89       |   1
....

)

Thanks for any hints,

Stef

Re: Find min year and min value

From
Richard Huxton
Date:
Stefan Schwarzer wrote:
>> SELECT year, value FROM ...
>
> I feel ashamed.... such a simple solution... gush.... Thanks for that!

Can be easy to over-complicate things when you've been thinking about
them too long.

> Unfortunately it doesn't stop there...
>
> If I want to find the "common smallest year" for two given variables
> (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980,
> 1981,... 2003) for variable 2 (Fish Catch) ). It should come up with
> 1980 for a given country, if there is a value for that year in both
> variables. Otherwise 1981, etc...

In that case you will need two subqueries, but it's just a matter of
converting your description to SQL.

SELECT
   yr1,
   gdp.val1 AS gdp_val,
   fish_catch.val2 AS fish_catch_val
FROM
   (SELECT year AS yr1, value AS val1 FROM data WHERE id_variable = 1
   ) AS gdp,
   (SELECT year AS yr2, value AS val2 FROM data WHERE id_variable = 2
   ) AS fish_catch
WHERE
   gdp.yr1 = fish_catch.yr2
ORDER BY
   gdp.yr1
LIMIT 1;

Here I've aliases (renamed) the columns and the sub-queries, but I'd
probably just alias the sub-queries in real-life.

You could write it as a JOIN if you prefer that style, or use the MIN()
aggregate (although I'd guess that the ORDER BY/LIMIT might prove faster).

So, I'd perhaps use:

SELECT gdp.year, gdp.val AS gdp_val, fish_catch.val AS fish_catch_val
FROM
   (SELECT year,value FROM data WHERE id_variable=1) AS gdp
JOIN
   (SELECT year, value FROM data WHERE id_variable=2) AS fish_catch
USING (year)
ORDER BY gdp.year
LIMIT 1;

--
   Richard Huxton
   Archonet Ltd

Re: Find min year and min value

From
Michael Glaesemann
Date:
On Oct 2, 2007, at 9:29 , Stefan Schwarzer wrote:

> How would I do that? I really have no clue...

The key is to build it up in steps.

select id_country, year, var_1, val_1, var_2, val_2
-- Second step:
-- value for year for each country of var_1
from (select id_country, year, id_variable as var_1, "value" as val_1
         from my_table) as val_1
-- value for year for each country for var_2
natural join (select id_country, year, id_variable as var_2, "value"
as val_2
                 from my_table) as val_2
-- First step
-- for each country, find the minimum common year (which the join
will do) for the two
-- variables you're interested in (var_1 and var_2).
natural join (select id_country, var_1, var_2, min(year) as year
                 from (select id_country, year, id_variable as var_1
                         from my_table) as var_1
                 natural join (select id_country, year, id_variable
as var_2
                                 from my_table) as var_2
                 group by id_country, var_1, var_2) as min_common_year
where id_country = :id_country
   and var_1 = :var_1
   and var_2 = :var_2;

Check your explain analyze output: if the planner doesn't push up
the :var_1, :var_2, and :id_country_id values up into subqueries, you
might want to add them as where clauses.

As an aside, I assume you've rewritten the table column names: if you
haven't, as it's an SQL keyword, "value" is a particularly poor
choice of column name. I'd probably rename "year" as well.

Hope this helps.

Michael Glaesemann
grzm seespotcode net



Re: Find min year and min value

From
Steve Crawford
Date:
Stefan Schwarzer wrote:
>> SELECT year, value FROM ...
>
> I feel ashamed.... such a simple solution... gush.... Thanks for that!
>
> Unfortunately it doesn't stop there...
>
> If I want to find the "common smallest year" for two given variables
> (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980,
> 1981,... 2003) for variable 2 (Fish Catch) ). It should come up with
> 1980 for a given country, if there is a value for that year in both
> variables. Otherwise 1981, etc...
>
> How would I do that? I really have no clue...
>
> (my table looks something like this:
>
> id_variable   |    year    |    value     |    id_country
> ---------------------------------------
>         1             |   2001    |     123       |   1
>         1             |   2002    |     125       |   1
> ....
>
>         2             |   1980    |      83       |   1
>         2             |   1981    |      89       |   1
> ....
>
> )
>
> Thanks for any hints,

As others have noted, the query *can* be written. But it appears to me
that you are struggling against your table layout. Before struggling
with ever more complicated queries, I'd consider restructuring your
table(s). There are many possibilities depending on the current nature
of your data, how you expect it to change and the queries you expect to
run against it. For example:
country_id
data_year
gdp
fish_catch

Then your query may be as simple as, say:
select min(year) from your_table
where country_id = xxx
and gdp is not null
and fish_catch is not null;

or

select year, gdp, fish_catch from your_table
where country_id = xxx
and gdp is not null
and fish_catch is not null
order by year desc, gdp desc, fish_catch desc
limit 1;

Alternately, you could have a gdp table and a fish_catch table which
would be easily joined to give the same result.

Cheers,
Steve

Re: Find min year and min value

From
hubert depesz lubaczewski
Date:
On Tue, Oct 02, 2007 at 04:29:02PM +0200, Stefan Schwarzer wrote:
> If I want to find the "common smallest year" for two given variables
> (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and
> 1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up
> with 1980 for a given country, if there is a value for that year in
> both variables. Otherwise 1981, etc...
> How would I do that? I really have no clue...
> (my table looks something like this:
> id_variable   |    year    |    value     |    id_country
> ---------------------------------------
>         1             |   2001    |     123       |   1

select min(year) from (select year from table where id_variable in (1,2)
group by year having count(distinct id_variable) = 2) x;

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Find min year and min value

From
Michael Glaesemann
Date:
On Oct 2, 2007, at 11:10 , Steve Crawford wrote:

> As others have noted, the query *can* be written. But it appears to me
> that you are struggling against your table layout.

The current schema he has is commonly called EAV (entity-attribute-
value) and is generally frowned upon. Now, in his particular case it
may be justified if the "value" column values are actually all of the
same type, such as currency amounts for each category. If this is the
case, I suggest renaming the column to be more descriptive of what is
actually stored: likewise the id_variable column.

> Before struggling
> with ever more complicated queries, I'd consider restructuring your
> table(s). There are many possibilities depending on the current nature
> of your data, how you expect it to change and the queries you
> expect to
> run against it. For example:
> country_id
> data_year
> gdp
> fish_catch

This would be one way to do it. However, each time you add a new
category you'd need to add a new column to the table: not very
flexible. You can also have the same functionality by adding a new
table for each category:


> Alternately, you could have a gdp table and a fish_catch table which
> would be easily joined to give the same result.

Expanding on this:

create table fish_catches (country text not null,
                            data_year date not null,
                            primary key (country, data_year),
                            fish_catch numeric not null);

create table gdp (country text not null reference countries
                   data_year date not null,
                   primary key (country, data_year),
                   gdp numeric not null);

This makes your queries quite simple:

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
where country = :country
order by data_year
limit 1;

or

select country, data_year, fish_catch, gdp
from fish_catches
natural join gdp
natural join (select country, min(data_year) as data_year
               from gdp
               natural join fish_catch
               group by country) min_data_year
where country = :country;

Splitting categories into separate tables also eliminates the
necessity of worrying about NULL, which can lead to unexpected
behavior if you aren't careful.

Michael Glaesemann
grzm seespotcode net



Re: Find min year and min value

From
Stefan Schwarzer
Date:
>> As others have noted, the query *can* be written. But it appears
>> to me
>> that you are struggling against your table layout.
>
> The current schema he has is commonly called EAV (entity-attribute-
> value) and is generally frowned upon. Now, in his particular case
> it may be justified if the "value" column values are actually all
> of the same type, such as currency amounts for each category. If
> this is the case, I suggest renaming the column to be more
> descriptive of what is actually stored: likewise the id_variable
> column.

Hmmm.... I am somewhat surprised to here so. After being told in this
forum how "bad" my old table design was, I changed it to the current
(which is less than alpha). Perhaps to summarize: Having 500
statistical global national variables for about 240 countries/
territories. Need to do regional aggregations, per Capita
calculations and some completeness computations on-the-fly.

The design was a table like this for each variable:


id_country   |    1970    |    1971    |    ...    |    2004    |
2005
-------------------------------------------------------------------
  1                   |   NULL    |      36       |   ...     |
42      |      45
  2 ......


The new like this:

id_variable   |    year    |    value     |    id_country
---------------------------------------
         1             |   2001    |     123       |   1
         1             |   2002    |     125       |   1
         1             |   2003    |     128       |   1
         1             |   2004    |     132       |   1
         1             |   2005    |     135       |   1

         1             |   2001    |     412       |   2
         1             |   2002    |     429       |   2
         1             |   2003    |     456       |   2
         1             |   2004    |     465       |   2
         1             |   2005    |     477       |   2

....

         2             |   1980    |      83       |   1
         2             |   1981    |      89       |   1
....


I thought (and did ask) about the possibility to put nevertheless -
with the new table design - the variables into different tables, but
nobody really got my on a track for that. So I thought the most
"common" way would be to have this central table.

But I am at a stage where I still can change - and would very much
like to get your advice.

Thanks a lot!

Stef

  ____________________________________________________________________

   Stefan Schwarzer

   Lean Back and Relax - Enjoy some Nature Photography:
   http://photoblog.la-famille-schwarzer.de

   Appetite for Global Data? UNEP GEO Data Portal:
   http://geodata.grid.unep.ch
   ____________________________________________________________________






Re: Find min year and min value

From
Michael Glaesemann
Date:
On Oct 3, 2007, at 1:29 , Stefan Schwarzer wrote:

>>> As others have noted, the query *can* be written. But it appears
>>> to me
>>> that you are struggling against your table layout.
>>
>> The current schema he has is commonly called EAV (entity-attribute-
>> value) and is generally frowned upon. Now, in his particular case
>> it may be justified if the "value" column values are actually all
>> of the same type, such as currency amounts for each category. If
>> this is the case, I suggest renaming the column to be more
>> descriptive of what is actually stored: likewise the id_variable
>> column.
>
> Having 500 statistical global national variables for about 240
> countries/territories. Need to do regional aggregations, per Capita
> calculations and some completeness computations on-the-fly.


> id_variable   |    year    |    value     |    id_country

Both Steve and I have given you alternatives and reasons for choosing
alternative schema. You haven't provided any additional information
to really help us guide you in any particular direction from what we
already have. For example, in the section from me which you quoted
above, I wrote that this schema may be appropriate if the "value"
column values are actually all of the same type (e..g, all currency
amounts, all masses, all counts). You haven't said whether or not
this is the case. We can't read your mind :)

Again, one thing that would help is if you use a more descriptive
column name than "value" that gives an indication of what *kind* of
values are in the column.

> I thought (and did ask) about the possibility to put nevertheless -
> with the new table design - the variables into different tables,
> but nobody really got my on a track for that.

Steve first suggested it and I provided an example of what that would
look like (using "gdp" and "fish_catches" tables) in the same post
you quoted from above.

http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php

Is this not what you mean?

Michael Glaesemann
grzm seespotcode net



Re: Find min year and min value

From
Stefan Schwarzer
Date:
>> Having 500 statistical global national variables for about 240
>> countries/territories. Need to do regional aggregations, per
>> Capita calculations and some completeness computations on-the-fly.
>
>
>> id_variable   |    year    |    value     |    id_country
>
> Both Steve and I have given you alternatives and reasons for
> choosing alternative schema. You haven't provided any additional
> information to really help us guide you in any particular direction
> from what we already have. For example, in the section from me
> which you quoted above, I wrote that this schema may be appropriate
> if the "value" column values are actually all of the same type
> (e..g, all currency amounts, all masses, all counts). You haven't
> said whether or not this is the case. We can't read your mind :)
>
> Again, one thing that would help is if you use a more descriptive
> column name than "value" that gives an indication of what *kind* of
> values are in the column.
>
>> I thought (and did ask) about the possibility to put nevertheless
>> - with the new table design - the variables into different tables,
>> but nobody really got my on a track for that.
>
> Steve first suggested it and I provided an example of what that
> would look like (using "gdp" and "fish_catches" tables) in the same
> post you quoted from above.
>
> http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php
>
> Is this not what you mean?

Hmm... sorry that I didn't give you more info. I will see it I can do
better now.

So, as mentioned above, we have 500 different variables (such as GDP,
Fish Catch, Population, Forest Cover, Fertility Rate etc.). Many of
them have indeed different units (Metric Tons, Thousand People, Sqkm,
Persons per Year etc.). And many do cover different time periods
(1970-2005; 2000-2002; 1970-75, 1975-80,..., 2000-05; ...).

So, "value" is something general - it can be (measured in) (thousand)
Dollars, (Million) People, (Hundred) Hectares etc...

We update the variables something like once a year or every two
years, in an ongoing manner. Updating means especially adding new
years (as they are being published by World Bank, UNSD, etc.). So, no
real "hard core" editing/adding, I'd say. A slow, but steady progress.

Queries, as mentioned too, are mainly simple "SELECt * FROM
specific_variable_like_gdp". But as well on-the-fly aggregations to
subregions and regions (such as West Africa, and Africa). And some
calculations which enables to indicate the "completeness" of the
aggregations (parameters being used are "Number of countries included
in aggregation in respect to Number of Countries, Total Population,
Land Area, GDP).

Does this help?

Thanks anyway already for all your feedback. Your really great people
out there!!

Best wishes,

Stef

Re: Find min year and min value

From
Michael Glaesemann
Date:
On Oct 4, 2007, at 6:41 , Stefan Schwarzer wrote:

> So, "value" is something general - it can be (measured in)
> (thousand) Dollars, (Million) People, (Hundred) Hectares etc...

Then I would make a separate table for each variable (as I described
above). For one thing, you're going to have keep track of what units
are associated with which variables are somewhere, whether that be in
your database or possibly your middleware. Separating the variables
into separate tables provides better separation of these value types.

> Queries, as mentioned too, are mainly simple "SELECt * FROM
> specific_variable_like_gdp". But as well on-the-fly aggregations to
> subregions and regions (such as West Africa, and Africa).

If these aggregations are going to be common, you might want to set
up another table that arranges the countries into hierarchies using
nested sets or some other hierarchical strategy (like the adjacency
model or contrib/ltree). I've found nested sets are convenient for
calculating aggregations. You can google or search the archives for
these methods.

> And some calculations which enables to indicate the "completeness"
> of the aggregations (parameters being used are "Number of countries
> included in aggregation in respect to Number of Countries, Total
> Population, Land Area, GDP).

You can use nested sets to handle these calculations as well.

I'm sure there are resources out there that describe database schema
for reporting things such as you describe, as it's not a new problem.
I don't know any off the top of my head, but again, Google is your
friend.

Hope this helps!

Michael Glaesemann
grzm seespotcode net



Re: Find min year and min value

From
Stefan Schwarzer
Date:
>> Alternately, you could have a gdp table and a fish_catch table which
>> would be easily joined to give the same result.
>
> Expanding on this:
>
> create table fish_catches (country text not null,
>                            data_year date not null,
>                            primary key (country, data_year),
>                            fish_catch numeric not null);
>
> create table gdp (country text not null reference countries
>                   data_year date not null,
>                   primary key (country, data_year),
>                   gdp numeric not null);
>
> This makes your queries quite simple:
>
> select country, data_year, fish_catch, gdp
> from fish_catches
> natural join gdp
> where country = :country
> order by data_year
> limit 1;

Hmmm..... Don't really get that query working. My SQL looks like this
now:

SELECT
    id_country,
    year,
    value
FROM
   internet_users
NATURAL JOIN
   gdp
WHERE
    id_country = 8
ORDER BY
    year
LIMIT
    1

But there is no result.

My table looks like this (for each variable one table):

id_country            year                value
4                           1980               6.6
4                           1981               7.0
...
6                           1980               5.1


Thanks for any advice!

Stef


Re: Find min year and min value

From
Stefan Schwarzer
Date:
> Hmmm..... Don't really get that query working. My SQL looks like
> this now:
>
> SELECT
>    id_country,
>    year,
>    value
> FROM
>   internet_users
> NATURAL JOIN
>   gdp
> WHERE
>    id_country = 8
> ORDER BY
>    year
> LIMIT
>    1

Ok, got it working with another proposed SQL SELECT. This is a
solution (comes out of an PHP loop, so one could easily have more
than 2 subselects). Thanks for the help to everyone!!

SELECT
    year0 AS common_year,
    v0.val0 AS v0_value,
    v1.val1 AS v1_value
FROM
    (
        SELECT
            year AS year0,
            value AS val0
        FROM
        gdp_capita
        WHERE
            id_country = 672 AND
            value IS NOT NULL
    ) AS v0,
    (
        SELECT
            year AS year1,
            value AS val1
        FROM
        agri_add_gdp
        WHERE
            id_country = 672 AND
            value IS NOT NULL
    ) AS v1
WHERE
    v1.year1 = v0.year0 AND
    v1.year1 = v1.year1
ORDER BY
    v0.year0
LIMIT
    1


Re: Find min year and min value

From
Michael Glaesemann
Date:
On Oct 5, 2007, at 4:11 , Stefan Schwarzer wrote:

>>> Alternately, you could have a gdp table and a fish_catch table which
>>> would be easily joined to give the same result.
>>
>> Expanding on this:
>>
>> create table fish_catches (country text not null,
>>                            data_year date not null,
>>                            primary key (country, data_year),
>>                            fish_catch numeric not null);
>>
>> create table gdp (country text not null reference countries
>>                   data_year date not null,
>>                   primary key (country, data_year),
>>                   gdp numeric not null);
>>
>> This makes your queries quite simple:
>>
>> select country, data_year, fish_catch, gdp
>> from fish_catches
>> natural join gdp
>> where country = :country
>> order by data_year
>> limit 1;
>
> Hmmm..... Don't really get that query working. My SQL looks like
> this now:
>
> SELECT
>    id_country,
>    year,
>    value
> FROM
>   internet_users
> NATURAL JOIN
>   gdp
> WHERE
>    id_country = 8
> ORDER BY
>    year
> LIMIT
>    1
>
> But there is no result.
>
> My table looks like this (for each variable one table):
>
> id_country            year                value

The natural join operator joins on common columns: if columns are
named "value" in both tables, the join condition is (id_country,
year, value) = (id_country, year, value). In the example I provided
above, the tables were fish_catches {country, year, fish_catch} and
gdp {country, year, gdp}: the join condition is (country, year) =
(country, year).

Also, note that there are *four* output columns in the query I used:
{country, data_year, fish_catch, gdp}. You've only got three, which
is bound to be confusing.

In your case you can use subqueries to rename the columns or an
explicit join:

-- using subqueries
select id_country, year, internet_users, gdp
   from (select id_country, year, value as internet users
           from internet_users) i
   natural join (select id_country, year, value as gdp
                   from gdp) g
   where id_country = 8
   order by year
   limit 1

-- using an explicit join

select id_country, year, internet_users.value as internet_users,
gdp.value as gdp
from internet_users
join gdp using (id_country, year)
where id_country = 8
order by year
limit 1


Michael Glaesemann
grzm seespotcode net