Thread: Database/Table Design for Global Country Statistics
Hi there, I learned in another posting that my table design - in a polite way - "could be improved". So, before doing any additional design errors, I would like to get feedback, if possible. I am dealing with some 500 tables for worldwide national statistics (GDP, population, environment etc.), covering approx. 30 years each. For each of these variables, I usually have as well (pre-prepared) subregional and regional aggregations too. These could - and should - at the end be calculated on-the-fly, and not pre-calculated and imported from Excel as it is for the moment. My (national) table for a given variable is in the moment as follows (id being the identifier for a specific country): id | 1970 | 1971 | ... | 2004 | 2005 ------------------------------------------------------------------- 1 | NULL | 36 | ... | 42 | 45 2 ...... The new design would be like this: id | year | value ------------------------------- 1 | 1970 | NULL 1 | 1971 | 36 1 .... 1 | 2005 | 45 2 | 1970 | .... 2 ..... Would that be considered as "good table design" then? Thanks for any advice! Stef
Stefan Schwarzer wrote: > Hi there, > > I learned in another posting that my table design - in a polite way - > "could be improved". > > So, before doing any additional design errors, I would like to get > feedback, if possible. > > I am dealing with some 500 tables for worldwide national statistics > (GDP, population, environment etc.), covering approx. 30 years each. For > each of these variables, I usually have as well (pre-prepared) > subregional and regional aggregations too. These could - and should - at > the end be calculated on-the-fly, and not pre-calculated and imported > from Excel as it is for the moment. Might want a trigger to keep them up to date. > My (national) table for a given variable is in the moment as follows (id > being the identifier for a specific country): > > id | 1970 | 1971 | ... | 2004 | 2005 > ------------------------------------------------------------------- > 1 | NULL | 36 | ... | 42 | 45 > 2 ...... Ick - fiddly if you we ever reach 2006... > The new design would be like this: > > id | year | value > ------------------------------- > 1 | 1970 | NULL > 1 | 1971 | 36 > 1 .... > 1 | 2005 | 45 > 2 | 1970 | .... > 2 ..... > > > Would that be considered as "good table design" then? Well if "id" is a country I'd rename it "country_id". If all years+countries should have the same set of measurements you might want it in one table: (country_id, year, population, area, roads_in_km, ...) However, if some measurements aren't relevant you're probably better off with separate table for each measurement: country_population, country_area etc. The issue with NULLs is what do they mean. They should mean "unknown", nothing more and nothing less. However, frequently you want to distinguish between "no figure available" and "not applicable" (e.g. "monarch" isn't relevant except in monarchies). In this case it's probably best practice to separate out the values: (country_id integer, year integer, has_monarch boolean, monarch text) This way you can distinguish between has_monarch=false and monarch='' -- Richard Huxton Archonet Ltd
Stefan Schwarzer schrieb: > Hi there, > > I learned in another posting that my table design - in a polite way - > "could be improved". > > So, before doing any additional design errors, I would like to get > feedback, if possible. > > I am dealing with some 500 tables for worldwide national statistics > (GDP, population, environment etc.), covering approx. 30 years each. For > each of these variables, I usually have as well (pre-prepared) > subregional and regional aggregations too. These could - and should - at > the end be calculated on-the-fly, and not pre-calculated and imported > from Excel as it is for the moment. > > My (national) table for a given variable is in the moment as follows (id > being the identifier for a specific country): > > id | 1970 | 1971 | ... | 2004 | 2005 > ------------------------------------------------------------------- > 1 | NULL | 36 | ... | 42 | 45 > 2 ...... > > The new design would be like this: > > id | year | value > ------------------------------- > 1 | 1970 | NULL > 1 | 1971 | 36 > 1 .... > 1 | 2005 | 45 > 2 | 1970 | .... > 2 ..... > > > Would that be considered as "good table design" then? I'd not save null values but otherwise you are fine. If you want to select over stable ranges of years w/o bothering with non existent data (and therefore year) it might be usefull to have a separate mapping table like year -> year_id (sequential) and use the Ids for year. This can even be simplified to a table just providing you the coverage of years you have as a whole and use the number directly as foreign key in your table. table:years year 1970 1971 1972 ... table:measures measure_id,description 1 foovar 2 barvar ... table:values year (fk from years), measure_id (fk from measures), value 1971 1 36 every column of "values" table would be not null and primary key (year,measure_id) This way you can easily create any report you want by selectiong a year range, then left join the values table as often as you want for a given measure and so on. Regards Tino
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/12/07 03:28, Stefan Schwarzer wrote: [snip] > > The new design would be like this: > > id | year | value > ------------------------------- > 1 | 1970 | NULL > 1 | 1971 | 36 > 1 .... > 1 | 2005 | 45 > 2 | 1970 | .... > 2 ..... > > > Would that be considered as "good table design" then? What Richard says, plus: 1. The PK of New_Design should be country_id/year. 2. You also should not store the records where "value is NULL". A left outer join would handle that. - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFG57E/S9HxQb37XmcRAsb/AJ97lFt25sLwIYhkhQgGdJq2m8NaXgCfTxIV rVuX5FF0XRDkXJgdSlV1qZg= =jyv3 -----END PGP SIGNATURE-----
Thanks for the feedback and the suggestions. A problem I have now when using the new design is the following: As a result from my PostGres query I get something like this: year | value | name --------------------------------------- 2001 | 123 | Afghanistan 2002 | 125 | Afghanistan 2003 | 128 | Afghanistan 2004 | 132 | Afghanistan 2005 | 135 | Afghanistan 2001 | 412 | Albania 2002 | 429 | Albania 2003 | 456 | Albania 2004 | 465 | Albania 2005 | 477 | Albania ..... The way it is displayed on the web (in table form) is the "usual" way: name 2001 2002 2003 2004 2005 ----------------------------------------------------------------- Afghanistan .... Albania .... Is there any "simple", elegant solution for PHP, so that it does this transformation? I can't imagine that I would have to write a couple of IFs to achieve that. But I have no other idea. Or is it a question of writing a more elegant SQL query? Thanks for any advice! Stef
Stefan Schwarzer wrote: > Thanks for the feedback and the suggestions. > > A problem I have now when using the new design is the following: > > As a result from my PostGres query I get something like this: > > year | value | name > --------------------------------------- > 2001 | 123 | Afghanistan > 2002 | 125 | Afghanistan > 2003 | 128 | Afghanistan [etc] > The way it is displayed on the web (in table form) is the "usual" way: > > name 2001 2002 2003 2004 2005 > ----------------------------------------------------------------- > Afghanistan .... > Albania .... > > > Is there any "simple", elegant solution for PHP, so that it does this > transformation? I can't imagine that I would have to write a couple of > IFs to achieve that. But I have no other idea. Or is it a question of > writing a more elegant SQL query? $curr_yr = -1 $cols = array(); while (<fetch rows>) { if ($row['year'] != $curr_yr) { if (sizeof($cols) > 0) { display_table_row($cols); } $cols = array(); $curr_year = $row['year']; } $cols[] = $row['value']; } // handle possible last row of table if (sizeof($cols) > 0) { display_table_row($cols); } Make sure your query is ordered properly and you don't have gaps in your years. -- Richard Huxton Archonet Ltd
Many, many thanks for that and the other advices from everybody. You're great! As I am neither expert in PHP, nor in Postgres, MySQL, Javascript, HTML etc. etc., all I do in the forums is asking questions; and not "adding value" by helping others (due to my limited knowledge). All I can do is offer you a moment of relaxation by looking at some really nice nature/landscape photos @ http://photoblog.la-famille-schwarzer.de Cheers, Stef On Sep 12, 2007, at 11:45 AM, Richard Huxton wrote: > Stefan Schwarzer wrote: >> Thanks for the feedback and the suggestions. >> A problem I have now when using the new design is the following: >> As a result from my PostGres query I get something like this: >> year | value | name >> --------------------------------------- >> 2001 | 123 | Afghanistan >> 2002 | 125 | Afghanistan >> 2003 | 128 | Afghanistan > [etc] >> The way it is displayed on the web (in table form) is the "usual" >> way: >> name 2001 2002 2003 2004 2005 >> ----------------------------------------------------------------- >> Afghanistan .... >> Albania .... >> Is there any "simple", elegant solution for PHP, so that it does >> this transformation? I can't imagine that I would have to write a >> couple of IFs to achieve that. But I have no other idea. Or is it >> a question of writing a more elegant SQL query? > > $curr_yr = -1 > $cols = array(); > while (<fetch rows>) { > if ($row['year'] != $curr_yr) { > if (sizeof($cols) > 0) { display_table_row($cols); } > $cols = array(); > $curr_year = $row['year']; > } > $cols[] = $row['value']; > } > // handle possible last row of table > if (sizeof($cols) > 0) { display_table_row($cols); } > > > Make sure your query is ordered properly and you don't have gaps in > your years. > > -- > Richard Huxton > Archonet Ltd
Stefan Schwarzer wrote: > Many, many thanks for that and the other advices from everybody. You're > great! > > As I am neither expert in PHP, nor in Postgres, MySQL, Javascript, HTML > etc. etc., all I do in the forums is asking questions; and not "adding > value" by helping others (due to my limited knowledge). All I can do is > offer you a moment of relaxation by looking at some really nice > nature/landscape photos @ http://photoblog.la-famille-schwarzer.de There's always the pgsql-novice mailing list, I'm sure you could be of assistance there. -- Richard Huxton Archonet Ltd
> $curr_yr = -1 > $cols = array(); > while (<fetch rows>) { > if ($row['year'] != $curr_yr) { > if (sizeof($cols) > 0) { display_table_row($cols); } > $cols = array(); > $curr_year = $row['year']; > } > $cols[] = $row['value']; > } > // handle possible last row of table > if (sizeof($cols) > 0) { display_table_row($cols); } > Thanks for the code. I got it working with a couple of changes. But then I realized that with the new table design I can't anymore easily sort by a given year (1970 or 2000). This is surely one of the advantages of the "old" design, that the use via PHP was quite straight forward. Do I have to transfer the query results into a PHP array to sort it in there, then? Thanks for any hints! Stef
Stefan Schwarzer wrote: >> $curr_yr = -1 >> $cols = array(); >> while (<fetch rows>) { >> if ($row['year'] != $curr_yr) { >> if (sizeof($cols) > 0) { display_table_row($cols); } >> $cols = array(); >> $curr_year = $row['year']; >> } >> $cols[] = $row['value']; >> } >> // handle possible last row of table >> if (sizeof($cols) > 0) { display_table_row($cols); } >> > > Thanks for the code. I got it working with a couple of changes. But then > I realized that with the new table design I can't anymore easily sort by > a given year (1970 or 2000). This is surely one of the advantages of the > "old" design, that the use via PHP was quite straight forward. > > Do I have to transfer the query results into a PHP array to sort it in > there, then? Umm - not sure what you're after. What's wrong with one of: SELECT ... ORDER BY year, value SELECT ... ORDER BY value, year Or did you want a particular year pulled out of the general list, in which case try something like: SELECT ... ORDER BY (year = 1970), year, value SELECT ... ORDER BY (year <> 1970), year, value This works because booleans are considered sortable too. -- Richard Huxton Archonet Ltd
>>> $curr_yr = -1 >>> $cols = array(); >>> while (<fetch rows>) { >>> if ($row['year'] != $curr_yr) { >>> if (sizeof($cols) > 0) { display_table_row($cols); } >>> $cols = array(); >>> $curr_year = $row['year']; >>> } >>> $cols[] = $row['value']; >>> } >>> // handle possible last row of table >>> if (sizeof($cols) > 0) { display_table_row($cols); } >>> >> Thanks for the code. I got it working with a couple of changes. >> But then I realized that with the new table design I can't anymore >> easily sort by a given year (1970 or 2000). This is surely one of >> the advantages of the "old" design, that the use via PHP was quite >> straight forward. >> Do I have to transfer the query results into a PHP array to sort >> it in there, then? > > Umm - not sure what you're after. What's wrong with one of: > SELECT ... ORDER BY year, value > SELECT ... ORDER BY value, year > > Or did you want a particular year pulled out of the general list, > in which case try something like: > SELECT ... ORDER BY (year = 1970), year, value > SELECT ... ORDER BY (year <> 1970), year, value > This works because booleans are considered sortable too. Wow, didn't know about the "(year = 1970)" thing. Cool. But nevertheless, the problem is then with the PHP code above; a different sorting in the query result, means as well a different coding. Or I have completely miscoded your draft. But I don't see how it would handle a resulting array of any order - by year, by name, ascending, descending... I guess I need to go with the PHP array, no? $current_country = -1; $count = -1; $cols = array(); while ($row = pg_fetch_array($result)) { if ($row['name'] != $current_country) { if (sizeof($cols) > 0) { $count++; $code7 .= display_table_row($current_country, $cols, $count, $selectedDataset -> name, $selectedID, $selectedTime); } $cols = array(); $current_country = $row['name']; } $cols[] = $row['value']; } // handle possible last row of table if (sizeof($cols) > 0) { $code7 .= display_table_row($current_country, $cols, ($count + 1), $selectedDataset -> name, $selectedID, $selectedTime); }
Stefan Schwarzer wrote: > >>>> $curr_yr = -1 >>>> $cols = array(); >>>> while (<fetch rows>) { >>>> if ($row['year'] != $curr_yr) { >>>> if (sizeof($cols) > 0) { display_table_row($cols); } >>>> $cols = array(); >>>> $curr_year = $row['year']; >>>> } >>>> $cols[] = $row['value']; >>>> } >>>> // handle possible last row of table >>>> if (sizeof($cols) > 0) { display_table_row($cols); } >>>> >>> Thanks for the code. I got it working with a couple of changes. But >>> then I realized that with the new table design I can't anymore easily >>> sort by a given year (1970 or 2000). This is surely one of the >>> advantages of the "old" design, that the use via PHP was quite >>> straight forward. >>> Do I have to transfer the query results into a PHP array to sort it >>> in there, then? >> >> Umm - not sure what you're after. What's wrong with one of: >> SELECT ... ORDER BY year, value >> SELECT ... ORDER BY value, year >> >> Or did you want a particular year pulled out of the general list, in >> which case try something like: >> SELECT ... ORDER BY (year = 1970), year, value >> SELECT ... ORDER BY (year <> 1970), year, value >> This works because booleans are considered sortable too. > > Wow, didn't know about the "(year = 1970)" thing. Cool. But > nevertheless, the problem is then with the PHP code above; a different > sorting in the query result, means as well a different coding. Or I have > completely miscoded your draft. But I don't see how it would handle a > resulting array of any order - by year, by name, ascending, > descending... I guess I need to go with the PHP array, no? Ah, I see - your query-results do need to be ordered the same as the table, yes. Of course you should really have a data model that knows what it wants to sort by and constructs the query appropriately. The table-drawing code can then ask the data-model for heading-names and sort-order details. It's more work up-front, but you only have to do it once and then you can generate new table layouts very easily. -- Richard Huxton Archonet Ltd
> Umm - not sure what you're after. What's wrong with one of: > SELECT ... ORDER BY year, value > SELECT ... ORDER BY value, year > > Or did you want a particular year pulled out of the general list, > in which case try something like: > SELECT ... ORDER BY (year = 1970), year, value > SELECT ... ORDER BY (year <> 1970), year, value > This works because booleans are considered sortable too. Uiuiui.... and it gets even worse... I want to implement the possibility to calculate on-the-fly the per Capita values for the selected data set. With the "old" table design it would be something like this: SELECT (fish_catch.y_1970 / pop_total.y_1970), (fish_catch.y_1971 / pop_total.y_1971) FROM ..... But how would this work with the new design? Sorry for these repeated questions.... (But it's you guys' fault - you said I had a bad table design.... :-)) Thanks for any hints! Stef
Stefan Schwarzer wrote: > >> Umm - not sure what you're after. What's wrong with one of: >> SELECT ... ORDER BY year, value >> SELECT ... ORDER BY value, year >> >> Or did you want a particular year pulled out of the general list, in >> which case try something like: >> SELECT ... ORDER BY (year = 1970), year, value >> SELECT ... ORDER BY (year <> 1970), year, value >> This works because booleans are considered sortable too. > > Uiuiui.... and it gets even worse... I want to implement the possibility > to calculate on-the-fly the per Capita values for the selected data set. > With the "old" table design it would be something like this: > > SELECT (fish_catch.y_1970 / pop_total.y_1970), (fish_catch.y_1971 > / pop_total.y_1971) FROM ..... SELECT year, country, fish_catch, pop_total, (fish_catch / pop_total) AS fish_per_capita FROM my_table ORDER BY year,country Or, if the fish + population data are in different tables: SELECT f.year, f.country, f.fish_catch p.pop_total (f.fish_catch / p.pop_total) AS fish_per_capita FROM fish_figures f JOIN popn_figures p USING (year, country) ORDER BY f.year, f.country; HTH -- Richard Huxton Archonet Ltd
>> Uiuiui.... and it gets even worse... I want to implement the >> possibility to calculate on-the-fly the per Capita values for the >> selected data set. With the "old" table design it would be >> something like this: >> SELECT (fish_catch.y_1970 / pop_total.y_1970), >> (fish_catch.y_1971 / pop_total.y_1971) FROM ..... > > Or, if the fish + population data are in different tables: > > SELECT > f.year, > f.country, > f.fish_catch > p.pop_total > (f.fish_catch / p.pop_total) AS fish_per_capita > FROM > fish_figures f > JOIN > popn_figures p > USING (year, country) > ORDER BY f.year, f.country; > Muchos gracias, Mr. Postgres! I am really touched by your help! Just for the completeness, I attach the final working SQL query: SELECT f.year, f.id, c.name, (f.value / p.value) AS per_capita FROM fish_catch AS f JOIN pop_total AS p USING (year, id) INNER JOIN countries AS c ON f.id = c.id ORDER BY year
Stefan Schwarzer wrote: > Just for the completeness, I attach the final working SQL query: > > SELECT > f.year, > f.id, > c.name, > (f.value / p.value) AS per_capita > FROM > fish_catch AS f > JOIN > pop_total AS p > USING > (year, id) > INNER JOIN > countries AS c ON f.id = c.id > ORDER BY > year Make sure you fully specify the order: ORDER BY year, f.id, c.name It might work by chance a few times, but then return rows in an unexpected order later. -- Richard Huxton Archonet Ltd
> Of course you should really have a data model that knows what it > wants to sort by and constructs the query appropriately. The table- > drawing code can then ask the data-model for heading-names and sort- > order details. It's more work up-front, but you only have to do it > once and then you can generate new table layouts very easily. Can you recommend some literature for this subject? Tutorials or something online? I know what a data model is. But I have no idea how "the table-drawing code can then ask the data model".... Many thanks! Stef
Stefan Schwarzer wrote: > >> Of course you should really have a data model that knows what it wants >> to sort by and constructs the query appropriately. The table-drawing >> code can then ask the data-model for heading-names and sort-order >> details. It's more work up-front, but you only have to do it once and >> then you can generate new table layouts very easily. > > Can you recommend some literature for this subject? Tutorials or > something online? I know what a data model is. But I have no idea how > "the table-drawing code can then ask the data model".... Google around for "metadata" and "data driven/led development" Basically your data model needs to be able to describe itself in enough detail that your code can draw it without having to have lots of parameters set. -- Richard Huxton Archonet Ltd
SELECT f.year, f.id, c.name, (f.value / p.value) AS per_capita FROM fish_catch AS f JOIN pop_total AS p USING (year, id) INNER JOIN countries AS c ON f.id = c.id ORDER BY (year = 2005), value, name Seems to never end.... Why is redesigning tables so difficult?! :-)) And furthermore, it is really difficult to grab and formulate the problem that I have now. Let's see: The above SQL leads to a correct SQL result; but the "(year = 2005)" changes the "pattern" of the output completely. Before, without sorting by a specific year, it would look like this: year value name 1995 NULL Afghanistan 2000 NULL Afghanistan 2005 NULL Afghanistan 2000 2365 Albania 2005 2065 Albania 1995 1160 Albania 2000 113157 Algeria 2005 126259 Algeria 1995 105872 Algeria 2000 832 American Samoa 2005 3943 American Samoa 1995 152 American Samoa With specifying "ORDER BY (y_2005), value, name" I have this: year value name 1995 0 Ethiopia 2000 0 Ethiopia 2000 0.5 Bosnia and Herzegovina 1995 0.5 Bosnia and Herzegovina 2000 0.5 Christmas Island 1995 0.5 Christmas Island .... 2005 0 Bosnia and Herzegovina 2005 0 Ethiopia 2005 0.5 Christmas Island 2005 0.5 Cocos (Keeling) Islands But what I would need is this: 1995 0.5 Bosnia and Herzegovina 2000 0.5 Bosnia and Herzegovina 2005 0 Bosnia and Herzegovina 1995 0 Ethiopia 2000 0 Ethiopia 2005 0 Ethiopia 1995 0.5 Christmas Island 2000 0.5 Christmas Island 2005 0.5 Christmas Island Looks similar to the first result, but all content would be sorted by the year 2005 without separating it from the other years. Hmmm.... don't know if this is clear... Most grateful for any feedback, Stef