Home > mailing lists
Re: partial time stamp query - Mailing list pgsql-general

From	Kirk Wythers
Subject	Re: partial time stamp query
Date	February 5, 2013 03:40:33
Msg-id	14FB805F-E32C-4E58-8156-06B902FBB999@umn.edu Whole thread
In response to	Re: partial time stamp query (Misa Simic <misa.simic@gmail.com>)
List	pgsql-general
Tree view
On Feb 4, 2013, at 7:03 PM, Misa Simic <misa.simic@gmail.com> wrote:

> Select time2::date, extract('hour' from time2), AVG(avg) from =
tablename group by time2::date, extract('hour' from time2)

Thanks Misa,=20

But this gives the same result as the way I was using date_trunc (not =
GROUPING BY the hour portion of the timestamp, or in this case the =
re-cast date). I have simplified the query, as much as I can, and it is =
below:

--COPY (
SELECT
    derived_tsoil_fifteen_min_stacked.time2::date, =20
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    data_key.plot,
    data_key.variable_name,
    AVG(derived_tsoil_fifteen_min_stacked.value)
FROM
    data_key,
    derived_tsoil_fifteen_min_stacked
WHERE
    data_key.variable_channel =3D =
derived_tsoil_fifteen_min_stacked.variable AND data_key.block_name =3D =
derived_tsoil_fifteen_min_stacked.block_name
    AND data_key.plot =3D 'a2'
    AND derived_tsoil_fifteen_min_stacked.time2::date =3D =
'2010-07-07'::date
    AND derived_tsoil_fifteen_min_stacked.variable =3D =
'tsoil_avg1_sc'
GROUP BY
    derived_tsoil_fifteen_min_stacked.time2::date,=20
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    derived_tsoil_fifteen_min_stacked.time2,
    data_key.variable_name,
    data_key.plot
ORDER BY
    derived_tsoil_fifteen_min_stacked.time2
--) TO '/tmp/derived_tsoil_hourly.csv' WITH CSV HEADER
;

This query returns 96 records (again, one for each 15 minute interval in =
the 24 hour day).=20

2010-07-07    0    a2    tsoil_sc    21.0599994659424
2010-07-07    0    a2    tsoil_sc    20.9599990844727
2010-07-07    0    a2    tsoil_sc    20.8799991607666
2010-07-07    0    a2    tsoil_sc    20.7999992370605
2010-07-07    1    a2    tsoil_sc    20.7199993133545
2010-07-07    1    a2    tsoil_sc    20.6399993896484
2010-07-07    1    a2    tsoil_sc    20.5499992370605
2010-07-07    1    a2    tsoil_sc    20.4699993133545
2010-07-07    2    a2    tsoil_sc    20.3899993896484
2010-07-07    2    a2    tsoil_sc    20.3099994659424
2010-07-07    2    a2    tsoil_sc    20.25
2010-07-07    2    a2    tsoil_sc    20.1599998474121
2010-07-07    3    a2    tsoil_sc    20.1000003814697
2010-07-07    3    a2    tsoil_sc    20.0100002288818
2010-07-07    3    a2    tsoil_sc    19.9400005340576
2010-07-07    3    a2    tsoil_sc    19.8600006103516
2010-07-07    4    a2    tsoil_sc    19.7700004577637
2010-07-07    4    a2    tsoil_sc    19.7199993133545
2010-07-07    4    a2    tsoil_sc    19.6499996185303
2010-07-07    4    a2    tsoil_sc    19.5599994659424
etc=85.

Could there be anything in the JOIN part of this query that is causing =
problems? I'm really grasping at straws now!

Thanks again,

Kirk


>=20
>=20
> On Monday, February 4, 2013, Kirk Wythers wrote:
> Hi Brent,
>=20
> Nice to hear from you. I hope your world is good.
>=20
> On Feb 4, 2013, at 2:14 PM, Brent Wood <Brent.Wood@niwa.co.nz> wrote:
>=20
> > Hi Kirk,
> >
> > We have a (near) real time data database for instrument observations =
from our research vessels. All observations (summarised to one minute =
intervals - the actual raw data is in netCDF, this database makes for =
easier access & meets most users needs) go into a single table, with =
other tables for metadata about platforms, instruments, etc. Now =
approaching 350m records, so reasonably substantial.
> >
> > Underlying optimisations include
> >
> > partitioned readings table, with a separate partition for each year =
(now 23 years)
> > clustered index on timestamp for the previous years partitions.
> > largeish filesystem block size - tested to work well with the =
clustered index & small size records)
> >
> > These generally make a big difference to performance. To address one =
issue, much like yours, where some users want hourly data for a year, =
some want daily data for 10 years & some want 1 minute data for the last =
month (& some, no doubt, want one minute data for 20+ years!) I =
introduced an integer column called timer. This value is set according =
to the time (not date) of each record.
>=20
> Very similar to what I need to do. Our main table consists of records =
that have been standardized to 15 minute timestamps. Here is a =
simplified example
>=20
> record  timestamp               variable                value
> 1               12:00:00                temp            12.6
> 2               12:15:00                temp            12.3
> 3               12:30:00                temp            11.7
> 4               12:45:00                temp            12.3
> 5               13:00:00                temp            13.9
> 6               13:15:00                temp            12.5
> 7               13.30:00                temp            13.7
> 8               13:45:00                temp            12.0
>=20
> You are exactly right, some people will want the original 15 minute =
version, some people will want these summarized to hourly data, and =
others will want these summarized to daily data. Still others may be =
satisfied with monthly summaries.
>=20
> >
> > Along the lines of (from memory) :an even no of minutes after the =
hour is 2, 5 minutes is 4, 10 minutes is 8, 15 minute is 16, 30 minutes =
is 32, 60 minutes is 64, 6 hourly is 128, 12:00 AM is 256 & 12:00PM is =
512.   When any timestamp is in more than one category (eg: 12:00 is all =
of even, 5, 15m 30m 60 minutes), the timer value is set to the largest =
appropriate one.
>=20
> I'm not quite following. In my case, if I want hourly data, I'd be =
looking for=85
>=20
> record  timestamp               variable                value
> 1               12:00:00                temp            12.225
> 2               13:00:00                temp            13.025
>=20
> Are you saying that I could use an approach that WHILE statement? =
Something like:
>=20
> WHILE data_truc('hour', timestamp) =3D 12:00:00, then calulate =
AVG(value)?
>=20
> >
> > So a request for:
> > 1 minute data is select from table;
> > 2 minute data is select from table where timer >=3D2 and timer !=3D15 =
and timer !=3D4;
> > hourly data is select from table where timer >=3D64 and timer !=3D =
15 and timer !=3D 4;
> > etc
> >
> > 5 & 15 minute add a bit of complexity, but we gave the users what =
they wanted. This has worked well for years now, & we have an internal =
web (mapserver/openlayers based) application allowing users to visualise =
& download their selected data - they choose from an interval pick list =
& the SQL is hidden. Some extra enhancements are the automatic collation =
of lat & lon gps readings into a Postgis point for each reading record, =
& the automatic aggregation of daily points into daily track lines, so =
the track for any selected set of dates can easily be displayed on a map =
(the platforms are mobile vessels - not fixed sites)
> >
> > You might adapt some of these ideas for your use case?
> >
> > Cheers
> >
> > Brent Wood
> >
> > Programme leader: Environmental Information Delivery
> > NIWA
> > DDI:  +64 (4) 3860529
> > ________________________________________
> > From: pgsql-general-owner@postgresql.org =
[pgsql-general-owner@postgresql.org] on behalf of Kirk Wythers =
[wythe001@umn.edu]
> > Sent: Tuesday, February 05, 2013 5:58 AM
> > To: pgsql-general@postgresql.org
> > Subject: Fwd: [GENERAL] partial time stamp query
> >
> > Thanks. That worked great! Now I am trying to aggregate these same =
fifteen minute to hourly. I have tried using date_trunk:
> >
> > date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
> >
> > but date_truck only seems to aggriage the timestamp. I thought I =
could use
> >
> > AVG(derived_tsoil_fifteen_min_stacked.value)
> >
> > in combination with date_trunk, but I still get 15 minute values, =
not the hourly average from the four 15 minute records.
> >
> > rowid date_truck time2 site canopy plot variable name value avg
> > 2010-07-07_00:00:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 =
00:00:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 21.06 21.0599994659424
> > 2010-07-07_00:15:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 =
00:15:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.96 20.9599990844727
> > 2010-07-07_00:30:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 =
00:30:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.88 20.8799991607666
> > 2010-07-07_00:45:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 =
00:45:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.8 20.7999992370605
> > 2010-07-07_01:00:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 =
01:00:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.72 20.7199993133545
> > 2010-07-07_01:15:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 =
01:15:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.64 20.6399993896484
> > 2010-07-07_01:30:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 =
01:30:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.55 20.5499992370605
> > 2010-07-07_01:45:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 =
01:45:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.47 20.4699993133545
> >
> > I was tying to get two records out of this set, with the 'avg" =
column representing the mean of the first and last four of each 15 =
minute records.
> >
> > Perhaps date_trunk only works for the timestamp?
> >
> >
> >
> > On Feb 4, 2013, at 8:50 AM, Misa Simic =
<misa.simic@gmail.com<mailto:misa.simic@gmail.com>> wrote:
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2::date =3D =
'2010-07-07'::date
> >
> > On Monday, February 4, 2013, Kirk Wythers wrote:
> > I am trying to write a query that grabs one particular day from a =
timestamp column. The data are ordered in 15 minute chunks like this:
> >
> > 2010-07-07 12:45:00
> > 2010-07-07 13:00:00
> > 2010-07-07 13:15:00
> > 2010-07-07 13:30:00
> > etc=85
> >
> > If I wanted all records from july 7th 2010, I would expect 4 x 24 =3D =
96 records per day.
> >
> > I have tried the '=3D' operator, like this
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2 =3D '2010-07-07*'
> >
> > but that grabs nothing, and using the '~' operator grabs everything =
with a 2010 or 07 in it=85 in other word
pgsql-general by date:
From: Moshe Jacobson
Date: 05 February 2013, 01:27:03
Subject: Re: partial time stamp query
From: Bruce Momjian
Date: 05 February 2013, 05:06:36
Subject: Re: What language is faster, C or PL/PgSQL?
Re: partial time stamp query - Mailing list pgsql-general

Previous

Next