Thread: Find min year and min value
Hi there, I am trying to find in a table with different variables, countries and years the lowest year and within that year the lowest value The following SELECT works, but I wonder if it is "elegant". Can you recommend any other solution? SELECT MIN(value) AS minv FROM public_one_table.data WHERE year = ( SELECT MIN(year) AS min_year FROM public_one_table.data WHERE id_variable = 1 ) AND id_variable = 1 Thanks for any help! Stef
Stefan Schwarzer wrote: > Hi there, > > I am trying to find in a table with different variables, countries and > years the > > lowest year > > and within that year the > > lowest value > > > The following SELECT works, but I wonder if it is "elegant". Can you > recommend any other solution? SELECT value AS minv FROM public_on_table.data WHERE id_variable = 1 ORDER BY year, value LIMIT 1 -- Richard Huxton Archonet Ltd
>> Hi there, >> I am trying to find in a table with different variables, countries >> and years the >> lowest year >> and within that year the >> lowest value >> The following SELECT works, but I wonder if it is "elegant". Can >> you recommend any other solution? > > SELECT value AS minv FROM public_on_table.data > WHERE id_variable = 1 > ORDER BY year, value LIMIT 1 But that brings only the min value, not the min year. I need to know both of them, something like (min year = ) 1972, (min value = ) 20 Stef
Stefan Schwarzer wrote: >>> Hi there, >>> I am trying to find in a table with different variables, countries >>> and years the >>> lowest year >>> and within that year the >>> lowest value >>> The following SELECT works, but I wonder if it is "elegant". Can you >>> recommend any other solution? >> >> SELECT value AS minv FROM public_on_table.data >> WHERE id_variable = 1 >> ORDER BY year, value LIMIT 1 > > But that brings only the min value, not the min year. I need to know > both of them, something like > > (min year = ) 1972, (min value = ) 20 SELECT year, value FROM ... -- Richard Huxton Archonet Ltd
> SELECT year, value FROM ... I feel ashamed.... such a simple solution... gush.... Thanks for that! Unfortunately it doesn't stop there... If I want to find the "common smallest year" for two given variables (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up with 1980 for a given country, if there is a value for that year in both variables. Otherwise 1981, etc... How would I do that? I really have no clue... (my table looks something like this: id_variable | year | value | id_country --------------------------------------- 1 | 2001 | 123 | 1 1 | 2002 | 125 | 1 1 | 2003 | 128 | 1 1 | 2004 | 132 | 1 1 | 2005 | 135 | 1 1 | 2001 | 412 | 2 1 | 2002 | 429 | 2 1 | 2003 | 456 | 2 1 | 2004 | 465 | 2 1 | 2005 | 477 | 2 .... 2 | 1980 | 83 | 1 2 | 1981 | 89 | 1 .... ) Thanks for any hints, Stef
Stefan Schwarzer wrote: >> SELECT year, value FROM ... > > I feel ashamed.... such a simple solution... gush.... Thanks for that! Can be easy to over-complicate things when you've been thinking about them too long. > Unfortunately it doesn't stop there... > > If I want to find the "common smallest year" for two given variables > (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980, > 1981,... 2003) for variable 2 (Fish Catch) ). It should come up with > 1980 for a given country, if there is a value for that year in both > variables. Otherwise 1981, etc... In that case you will need two subqueries, but it's just a matter of converting your description to SQL. SELECT yr1, gdp.val1 AS gdp_val, fish_catch.val2 AS fish_catch_val FROM (SELECT year AS yr1, value AS val1 FROM data WHERE id_variable = 1 ) AS gdp, (SELECT year AS yr2, value AS val2 FROM data WHERE id_variable = 2 ) AS fish_catch WHERE gdp.yr1 = fish_catch.yr2 ORDER BY gdp.yr1 LIMIT 1; Here I've aliases (renamed) the columns and the sub-queries, but I'd probably just alias the sub-queries in real-life. You could write it as a JOIN if you prefer that style, or use the MIN() aggregate (although I'd guess that the ORDER BY/LIMIT might prove faster). So, I'd perhaps use: SELECT gdp.year, gdp.val AS gdp_val, fish_catch.val AS fish_catch_val FROM (SELECT year,value FROM data WHERE id_variable=1) AS gdp JOIN (SELECT year, value FROM data WHERE id_variable=2) AS fish_catch USING (year) ORDER BY gdp.year LIMIT 1; -- Richard Huxton Archonet Ltd
On Oct 2, 2007, at 9:29 , Stefan Schwarzer wrote: > How would I do that? I really have no clue... The key is to build it up in steps. select id_country, year, var_1, val_1, var_2, val_2 -- Second step: -- value for year for each country of var_1 from (select id_country, year, id_variable as var_1, "value" as val_1 from my_table) as val_1 -- value for year for each country for var_2 natural join (select id_country, year, id_variable as var_2, "value" as val_2 from my_table) as val_2 -- First step -- for each country, find the minimum common year (which the join will do) for the two -- variables you're interested in (var_1 and var_2). natural join (select id_country, var_1, var_2, min(year) as year from (select id_country, year, id_variable as var_1 from my_table) as var_1 natural join (select id_country, year, id_variable as var_2 from my_table) as var_2 group by id_country, var_1, var_2) as min_common_year where id_country = :id_country and var_1 = :var_1 and var_2 = :var_2; Check your explain analyze output: if the planner doesn't push up the :var_1, :var_2, and :id_country_id values up into subqueries, you might want to add them as where clauses. As an aside, I assume you've rewritten the table column names: if you haven't, as it's an SQL keyword, "value" is a particularly poor choice of column name. I'd probably rename "year" as well. Hope this helps. Michael Glaesemann grzm seespotcode net
Stefan Schwarzer wrote: >> SELECT year, value FROM ... > > I feel ashamed.... such a simple solution... gush.... Thanks for that! > > Unfortunately it doesn't stop there... > > If I want to find the "common smallest year" for two given variables > (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and 1980, > 1981,... 2003) for variable 2 (Fish Catch) ). It should come up with > 1980 for a given country, if there is a value for that year in both > variables. Otherwise 1981, etc... > > How would I do that? I really have no clue... > > (my table looks something like this: > > id_variable | year | value | id_country > --------------------------------------- > 1 | 2001 | 123 | 1 > 1 | 2002 | 125 | 1 > .... > > 2 | 1980 | 83 | 1 > 2 | 1981 | 89 | 1 > .... > > ) > > Thanks for any hints, As others have noted, the query *can* be written. But it appears to me that you are struggling against your table layout. Before struggling with ever more complicated queries, I'd consider restructuring your table(s). There are many possibilities depending on the current nature of your data, how you expect it to change and the queries you expect to run against it. For example: country_id data_year gdp fish_catch Then your query may be as simple as, say: select min(year) from your_table where country_id = xxx and gdp is not null and fish_catch is not null; or select year, gdp, fish_catch from your_table where country_id = xxx and gdp is not null and fish_catch is not null order by year desc, gdp desc, fish_catch desc limit 1; Alternately, you could have a gdp table and a fish_catch table which would be easily joined to give the same result. Cheers, Steve
On Tue, Oct 02, 2007 at 04:29:02PM +0200, Stefan Schwarzer wrote: > If I want to find the "common smallest year" for two given variables > (say, I have years 1970, 1971,.... 2005 for variable 1 (GDP) and > 1980, 1981,... 2003) for variable 2 (Fish Catch) ). It should come up > with 1980 for a given country, if there is a value for that year in > both variables. Otherwise 1981, etc... > How would I do that? I really have no clue... > (my table looks something like this: > id_variable | year | value | id_country > --------------------------------------- > 1 | 2001 | 123 | 1 select min(year) from (select year from table where id_variable in (1,2) group by year having count(distinct id_variable) = 2) x; depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On Oct 2, 2007, at 11:10 , Steve Crawford wrote: > As others have noted, the query *can* be written. But it appears to me > that you are struggling against your table layout. The current schema he has is commonly called EAV (entity-attribute- value) and is generally frowned upon. Now, in his particular case it may be justified if the "value" column values are actually all of the same type, such as currency amounts for each category. If this is the case, I suggest renaming the column to be more descriptive of what is actually stored: likewise the id_variable column. > Before struggling > with ever more complicated queries, I'd consider restructuring your > table(s). There are many possibilities depending on the current nature > of your data, how you expect it to change and the queries you > expect to > run against it. For example: > country_id > data_year > gdp > fish_catch This would be one way to do it. However, each time you add a new category you'd need to add a new column to the table: not very flexible. You can also have the same functionality by adding a new table for each category: > Alternately, you could have a gdp table and a fish_catch table which > would be easily joined to give the same result. Expanding on this: create table fish_catches (country text not null, data_year date not null, primary key (country, data_year), fish_catch numeric not null); create table gdp (country text not null reference countries data_year date not null, primary key (country, data_year), gdp numeric not null); This makes your queries quite simple: select country, data_year, fish_catch, gdp from fish_catches natural join gdp where country = :country order by data_year limit 1; or select country, data_year, fish_catch, gdp from fish_catches natural join gdp natural join (select country, min(data_year) as data_year from gdp natural join fish_catch group by country) min_data_year where country = :country; Splitting categories into separate tables also eliminates the necessity of worrying about NULL, which can lead to unexpected behavior if you aren't careful. Michael Glaesemann grzm seespotcode net
>> As others have noted, the query *can* be written. But it appears >> to me >> that you are struggling against your table layout. > > The current schema he has is commonly called EAV (entity-attribute- > value) and is generally frowned upon. Now, in his particular case > it may be justified if the "value" column values are actually all > of the same type, such as currency amounts for each category. If > this is the case, I suggest renaming the column to be more > descriptive of what is actually stored: likewise the id_variable > column. Hmmm.... I am somewhat surprised to here so. After being told in this forum how "bad" my old table design was, I changed it to the current (which is less than alpha). Perhaps to summarize: Having 500 statistical global national variables for about 240 countries/ territories. Need to do regional aggregations, per Capita calculations and some completeness computations on-the-fly. The design was a table like this for each variable: id_country | 1970 | 1971 | ... | 2004 | 2005 ------------------------------------------------------------------- 1 | NULL | 36 | ... | 42 | 45 2 ...... The new like this: id_variable | year | value | id_country --------------------------------------- 1 | 2001 | 123 | 1 1 | 2002 | 125 | 1 1 | 2003 | 128 | 1 1 | 2004 | 132 | 1 1 | 2005 | 135 | 1 1 | 2001 | 412 | 2 1 | 2002 | 429 | 2 1 | 2003 | 456 | 2 1 | 2004 | 465 | 2 1 | 2005 | 477 | 2 .... 2 | 1980 | 83 | 1 2 | 1981 | 89 | 1 .... I thought (and did ask) about the possibility to put nevertheless - with the new table design - the variables into different tables, but nobody really got my on a track for that. So I thought the most "common" way would be to have this central table. But I am at a stage where I still can change - and would very much like to get your advice. Thanks a lot! Stef ____________________________________________________________________ Stefan Schwarzer Lean Back and Relax - Enjoy some Nature Photography: http://photoblog.la-famille-schwarzer.de Appetite for Global Data? UNEP GEO Data Portal: http://geodata.grid.unep.ch ____________________________________________________________________
On Oct 3, 2007, at 1:29 , Stefan Schwarzer wrote: >>> As others have noted, the query *can* be written. But it appears >>> to me >>> that you are struggling against your table layout. >> >> The current schema he has is commonly called EAV (entity-attribute- >> value) and is generally frowned upon. Now, in his particular case >> it may be justified if the "value" column values are actually all >> of the same type, such as currency amounts for each category. If >> this is the case, I suggest renaming the column to be more >> descriptive of what is actually stored: likewise the id_variable >> column. > > Having 500 statistical global national variables for about 240 > countries/territories. Need to do regional aggregations, per Capita > calculations and some completeness computations on-the-fly. > id_variable | year | value | id_country Both Steve and I have given you alternatives and reasons for choosing alternative schema. You haven't provided any additional information to really help us guide you in any particular direction from what we already have. For example, in the section from me which you quoted above, I wrote that this schema may be appropriate if the "value" column values are actually all of the same type (e..g, all currency amounts, all masses, all counts). You haven't said whether or not this is the case. We can't read your mind :) Again, one thing that would help is if you use a more descriptive column name than "value" that gives an indication of what *kind* of values are in the column. > I thought (and did ask) about the possibility to put nevertheless - > with the new table design - the variables into different tables, > but nobody really got my on a track for that. Steve first suggested it and I provided an example of what that would look like (using "gdp" and "fish_catches" tables) in the same post you quoted from above. http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php Is this not what you mean? Michael Glaesemann grzm seespotcode net
>> Having 500 statistical global national variables for about 240 >> countries/territories. Need to do regional aggregations, per >> Capita calculations and some completeness computations on-the-fly. > > >> id_variable | year | value | id_country > > Both Steve and I have given you alternatives and reasons for > choosing alternative schema. You haven't provided any additional > information to really help us guide you in any particular direction > from what we already have. For example, in the section from me > which you quoted above, I wrote that this schema may be appropriate > if the "value" column values are actually all of the same type > (e..g, all currency amounts, all masses, all counts). You haven't > said whether or not this is the case. We can't read your mind :) > > Again, one thing that would help is if you use a more descriptive > column name than "value" that gives an indication of what *kind* of > values are in the column. > >> I thought (and did ask) about the possibility to put nevertheless >> - with the new table design - the variables into different tables, >> but nobody really got my on a track for that. > > Steve first suggested it and I provided an example of what that > would look like (using "gdp" and "fish_catches" tables) in the same > post you quoted from above. > > http://archives.postgresql.org/pgsql-general/2007-10/msg00108.php > > Is this not what you mean? Hmm... sorry that I didn't give you more info. I will see it I can do better now. So, as mentioned above, we have 500 different variables (such as GDP, Fish Catch, Population, Forest Cover, Fertility Rate etc.). Many of them have indeed different units (Metric Tons, Thousand People, Sqkm, Persons per Year etc.). And many do cover different time periods (1970-2005; 2000-2002; 1970-75, 1975-80,..., 2000-05; ...). So, "value" is something general - it can be (measured in) (thousand) Dollars, (Million) People, (Hundred) Hectares etc... We update the variables something like once a year or every two years, in an ongoing manner. Updating means especially adding new years (as they are being published by World Bank, UNSD, etc.). So, no real "hard core" editing/adding, I'd say. A slow, but steady progress. Queries, as mentioned too, are mainly simple "SELECt * FROM specific_variable_like_gdp". But as well on-the-fly aggregations to subregions and regions (such as West Africa, and Africa). And some calculations which enables to indicate the "completeness" of the aggregations (parameters being used are "Number of countries included in aggregation in respect to Number of Countries, Total Population, Land Area, GDP). Does this help? Thanks anyway already for all your feedback. Your really great people out there!! Best wishes, Stef
On Oct 4, 2007, at 6:41 , Stefan Schwarzer wrote: > So, "value" is something general - it can be (measured in) > (thousand) Dollars, (Million) People, (Hundred) Hectares etc... Then I would make a separate table for each variable (as I described above). For one thing, you're going to have keep track of what units are associated with which variables are somewhere, whether that be in your database or possibly your middleware. Separating the variables into separate tables provides better separation of these value types. > Queries, as mentioned too, are mainly simple "SELECt * FROM > specific_variable_like_gdp". But as well on-the-fly aggregations to > subregions and regions (such as West Africa, and Africa). If these aggregations are going to be common, you might want to set up another table that arranges the countries into hierarchies using nested sets or some other hierarchical strategy (like the adjacency model or contrib/ltree). I've found nested sets are convenient for calculating aggregations. You can google or search the archives for these methods. > And some calculations which enables to indicate the "completeness" > of the aggregations (parameters being used are "Number of countries > included in aggregation in respect to Number of Countries, Total > Population, Land Area, GDP). You can use nested sets to handle these calculations as well. I'm sure there are resources out there that describe database schema for reporting things such as you describe, as it's not a new problem. I don't know any off the top of my head, but again, Google is your friend. Hope this helps! Michael Glaesemann grzm seespotcode net
>> Alternately, you could have a gdp table and a fish_catch table which >> would be easily joined to give the same result. > > Expanding on this: > > create table fish_catches (country text not null, > data_year date not null, > primary key (country, data_year), > fish_catch numeric not null); > > create table gdp (country text not null reference countries > data_year date not null, > primary key (country, data_year), > gdp numeric not null); > > This makes your queries quite simple: > > select country, data_year, fish_catch, gdp > from fish_catches > natural join gdp > where country = :country > order by data_year > limit 1; Hmmm..... Don't really get that query working. My SQL looks like this now: SELECT id_country, year, value FROM internet_users NATURAL JOIN gdp WHERE id_country = 8 ORDER BY year LIMIT 1 But there is no result. My table looks like this (for each variable one table): id_country year value 4 1980 6.6 4 1981 7.0 ... 6 1980 5.1 Thanks for any advice! Stef
> Hmmm..... Don't really get that query working. My SQL looks like > this now: > > SELECT > id_country, > year, > value > FROM > internet_users > NATURAL JOIN > gdp > WHERE > id_country = 8 > ORDER BY > year > LIMIT > 1 Ok, got it working with another proposed SQL SELECT. This is a solution (comes out of an PHP loop, so one could easily have more than 2 subselects). Thanks for the help to everyone!! SELECT year0 AS common_year, v0.val0 AS v0_value, v1.val1 AS v1_value FROM ( SELECT year AS year0, value AS val0 FROM gdp_capita WHERE id_country = 672 AND value IS NOT NULL ) AS v0, ( SELECT year AS year1, value AS val1 FROM agri_add_gdp WHERE id_country = 672 AND value IS NOT NULL ) AS v1 WHERE v1.year1 = v0.year0 AND v1.year1 = v1.year1 ORDER BY v0.year0 LIMIT 1
On Oct 5, 2007, at 4:11 , Stefan Schwarzer wrote: >>> Alternately, you could have a gdp table and a fish_catch table which >>> would be easily joined to give the same result. >> >> Expanding on this: >> >> create table fish_catches (country text not null, >> data_year date not null, >> primary key (country, data_year), >> fish_catch numeric not null); >> >> create table gdp (country text not null reference countries >> data_year date not null, >> primary key (country, data_year), >> gdp numeric not null); >> >> This makes your queries quite simple: >> >> select country, data_year, fish_catch, gdp >> from fish_catches >> natural join gdp >> where country = :country >> order by data_year >> limit 1; > > Hmmm..... Don't really get that query working. My SQL looks like > this now: > > SELECT > id_country, > year, > value > FROM > internet_users > NATURAL JOIN > gdp > WHERE > id_country = 8 > ORDER BY > year > LIMIT > 1 > > But there is no result. > > My table looks like this (for each variable one table): > > id_country year value The natural join operator joins on common columns: if columns are named "value" in both tables, the join condition is (id_country, year, value) = (id_country, year, value). In the example I provided above, the tables were fish_catches {country, year, fish_catch} and gdp {country, year, gdp}: the join condition is (country, year) = (country, year). Also, note that there are *four* output columns in the query I used: {country, data_year, fish_catch, gdp}. You've only got three, which is bound to be confusing. In your case you can use subqueries to rename the columns or an explicit join: -- using subqueries select id_country, year, internet_users, gdp from (select id_country, year, value as internet users from internet_users) i natural join (select id_country, year, value as gdp from gdp) g where id_country = 8 order by year limit 1 -- using an explicit join select id_country, year, internet_users.value as internet_users, gdp.value as gdp from internet_users join gdp using (id_country, year) where id_country = 8 order by year limit 1 Michael Glaesemann grzm seespotcode net