On Fri, 2002-12-13 at 11:43, javier garcia wrote:
> Hi;
> I've got date data, extracted from rain gauge stations. The date of a row in
> my data are structured in three integer fields, and as a result of a query I
> can get the following (what is quite good for my):
> cod_station | year | month | day | rain
> -------------+------+-------+-----+------
> 7250 | 1933 | 8 | 1 | 45
> 7250 | 1933 | 8 | 2 | 3
> 7250 | 1933 | 8 | 3 | 0
> ...
> and this for several rain station for about forty years. My problem is that I
> need to find missing data (a missing data here is a non-existent row).
>
> I think I could prepare a table with a complete series of date field, and
> ALTER it to ADD aditional fields for every station I need to add:
>
> date | stat_7250 | stat_7237 ...
> ----------------+--------------------+---------------+------...
> 1/11/1999
> 2/12/1999
> ...
> , and make an UPDATE that in someway, will compare the "date" of this table
> with an extracted date from "year", "day" and "rain" from the aforementioned
> SELECT, leaving the non-coincident dates in blank and filling the rain data
> of the coincident dates. Is this possible? How?
>
> Please help.
> Thank you for any possible advice
>
seems like a big job. not knowing how many rain guages there are (I am
assuming they are mainly in the plain) this is probably a fair amount of
data, roughly 25,000 data points per gauge?
I think I would approach this differently based on what I knew about the
data but it may change after I think about it for a minute or two. I am
sort of interested in this issue since I am right now scheming on the
most effcient way to sum integer counts of events in seperate rows by 3
columns and where dates are contiguous.
Anyway, are there tens, hundreds or thousands of guages? do most guages
have some missing data? Is diskspace an issue for duplicating the data
in an easier to use fashion?
you could approach it as you were talking about but I think I would
first try to identify the missing data. that seems pretty straight
forward to code in the language of you choice I would probably do it in
perl but that is just because that is what I am most familier with.
buzz the table with a
select * order by cod_station,year, month, day
initialize the comparison values to the first row and
and then for each row compare the station and if it is the same
compare the date and see if it is equal to the old date + 1 day if not
write it out to a file or inster it into a badentry table as a bad
cod_station/date combo
you will need to get the dates into a format you can add a day to
without having to worry about the month year wrapping stuff
if the station is different reinitialize all the variables
and keep going
if the station is the same and the date check worked then set the vars
to the current values and keep on cruising.
This can probably be done in PL/psql or what ever the postgres language
is but I nave not gotten that far yet.
then you can do allsorts of stuff to the table based on the dates that
are found bad.
if all stations are supposed to current then an additional check would
have to be made to see if the last date for a station is the current
date and create entries in the bad file/table for each of the missing
days.
Bret
What I have been dreaming of is a select max(date) min(date) where dates
are contiguous and blah blah. I can't seem to find it in the docs
though :)