Thread: trying to summarize into a new table by time...

trying to summarize into a new table by time...

From
Larry Rosenman
Date:
I have a LARGE table with 5 minute summary information:

--
-- TOC Entry ID 9 (OID 539300)
--
-- Name: traffic Type: TABLE Owner: ler
--

CREATE TABLE "traffic" ("asn" integer,"protocol" integer,"pkts_src" integer,"pkts_dst" integer,"bytes_src"
integer,"bytes_dst"integer,"secs_src" integer,"secs_dst" integer,"early" timestamp with time zone,"late" timestamp with
timezone
 
);

I'd like to summarize it into:

--
-- TOC Entry ID 10 (OID 539319)
--
-- Name: traffic_summary Type: TABLE Owner: ler
--

CREATE TABLE "traffic_summary" ("asn" integer,"protocol" integer,"pkts_src" double precision,"pkts_dst" double
precision,"bytes_src"double precision,"bytes_dst" double precision,"secs_src" double precision,"secs_dst" double
precision,"early"timestamp with time zone,"late" timestamp with time zone
 
);

Where we group into six hour groupings.

I came up with the following:

insert into traffic_summary
select asn,protocol,
cast(sum(pkts_src) as float) as pkts_src,
cast(sum(pkts_dst) as float) as pkts_dst,
cast(sum(bytes_src) as float) as bytes_src,
cast(sum(bytes_dst) as float) as bytes_dst,
cast(sum(secs_src) as float)  as secs_src,
cast(sum(secs_dst) as float) as secs_dst,
min(early) as early,
max(late) as late 
from traffic 
where early >= '2001-01-01 00:00:00' and     early <= '2001-01-02 05:59:59'
GROUP BY asn,protocol;

BUT, I'm wondering if there is an easy way to generate the obvious 
where clauses automatically?
-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


Re: trying to summarize into a new table by time...

From
Larry Rosenman
Date:
* Larry Rosenman <ler@lerctr.org> [010602 05:17]:
> Where we group into six hour groupings.
> 
> I came up with the following:
> 
> insert into traffic_summary
> select asn,protocol,
> cast(sum(pkts_src) as float) as pkts_src,
> cast(sum(pkts_dst) as float) as pkts_dst,
> cast(sum(bytes_src) as float) as bytes_src,
> cast(sum(bytes_dst) as float) as bytes_dst,
> cast(sum(secs_src) as float)  as secs_src,
> cast(sum(secs_dst) as float) as secs_dst,
> min(early) as early,
> max(late) as late 
> from traffic 
> where early >= '2001-01-01 00:00:00' and
>       early <= '2001-01-02 05:59:59'
> GROUP BY asn,protocol;
> 
I *THINK* I can change the GROUP BY to add date_trunc('hour',early)
and get what I want to the hour. 

How can I get 6 hours? 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


Re: trying to summarize into a new table by time...

From
Tom Lane
Date:
Larry Rosenman <ler@lerctr.org> writes:
> where early >= '2001-01-01 00:00:00' and
>       early <= '2001-01-02 05:59:59'

> BUT, I'm wondering if there is an easy way to generate the obvious 
> where clauses automatically?

Assuming this is being fired from a cron job started at or just after
each interval, you could do something with date_trunc:

regression=# select now();         now
------------------------2001-06-02 10:12:37-04
(1 row)

regression=# select date_trunc('hour', now());      date_trunc
------------------------2001-06-02 10:00:00-04
(1 row)

regression=# select date_trunc('hour', now()) - '6 hours'::interval;       ?column?
------------------------2001-06-02 04:00:00-04
(1 row)

Or, if you're running the script once a day at midnight, use
current_date (or equivalently date_trunc('day',...)) and subtract
appropriate intervals from that.
        regards, tom lane


Re: trying to summarize into a new table by time...

From
Larry Rosenman
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [010602 09:19]:
> Larry Rosenman <ler@lerctr.org> writes:
> > where early >= '2001-01-01 00:00:00' and
> >       early <= '2001-01-02 05:59:59'
> 
> > BUT, I'm wondering if there is an easy way to generate the obvious 
> > where clauses automatically?
> 
> Assuming this is being fired from a cron job started at or just after
> each interval, you could do something with date_trunc:
> 
> regression=# select now();
>           now
> ------------------------
>  2001-06-02 10:12:37-04
> (1 row)
> 
> regression=# select date_trunc('hour', now());
>        date_trunc
> ------------------------
>  2001-06-02 10:00:00-04
> (1 row)
> 
> regression=# select date_trunc('hour', now()) - '6 hours'::interval;
>         ?column?
> ------------------------
>  2001-06-02 04:00:00-04
> (1 row)
> 
> Or, if you're running the script once a day at midnight, use
> current_date (or equivalently date_trunc('day',...)) and subtract
> appropriate intervals from that.
Thanks, Tom!

You are Terrific!

One more question, will the sum() on a bigint column cast to float
happen in floating point? 

I hope so!  This data is huge numbers!

Thanks again for always answering most of my queries.

LER

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


Re: trying to summarize into a new table by time...

From
Tom Lane
Date:
> One more question, will the sum() on a bigint column cast to float
> happen in floating point? 

Yes.  You don't really need that though.  These days there are only
two flavors of sum(): float8 accumulator (for float8 and float4 inputs)
and numeric accumulator (for everything else).  You don't need to worry
about overflow anymore.  float8 is faster if you don't mind loss of
precision...
        regards, tom lane


Re: trying to summarize into a new table by time...

From
Larry Rosenman
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [010602 09:59]:
> > One more question, will the sum() on a bigint column cast to float
> > happen in floating point? 
> 
> Yes.  You don't really need that though.  These days there are only
> two flavors of sum(): float8 accumulator (for float8 and float4 inputs)
> and numeric accumulator (for everything else).  You don't need to worry
> about overflow anymore.  float8 is faster if you don't mind loss of
> precision...
Since I'm looking at order of magnitude type comparisons, the float8
is fine.  I don't need to know down to the byte, just comparisons
between orders of magnitudes.  ISP's move a *LOT* of data :-) 

LER

> 
>             regards, tom lane

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


Re: trying to summarize into a new table by time...

From
Alex Pilosov
Date:
Funny, I just yesterday wrote the same exact thing (ip/asn accounting
based on netflow)

The problem is to iterate over range, which SQL doesn't like to do, unless
given an explicit range table. Thus, there are possible solutions.

a) do it like this:
insert into traffic_summary
...
group by date_part('epoch',early)/60/60/2

(assuming 2-hour aggregation interval)

However, using this method, you won't get any data in traffic_summary 
when there was no traffic. If you want that, use...

b) have an explicit table traffic_periods(period_start timestamp,
period_end timestamp), prepopulated with data, and then doing

insert ...
select ...
from traffic t,traffic_periods tp
where t.early between period_start and period_end
group by tp.oid

On Sat, 2 Jun 2001, Larry Rosenman wrote:

> insert into traffic_summary
> select asn,protocol,
> cast(sum(pkts_src) as float) as pkts_src,
> cast(sum(pkts_dst) as float) as pkts_dst,
> cast(sum(bytes_src) as float) as bytes_src,
> cast(sum(bytes_dst) as float) as bytes_dst,
> cast(sum(secs_src) as float)  as secs_src,
> cast(sum(secs_dst) as float) as secs_dst,
> min(early) as early,
> max(late) as late 
> from traffic 
> where early >= '2001-01-01 00:00:00' and
>       early <= '2001-01-02 05:59:59'
> GROUP BY asn,protocol;
> 
> BUT, I'm wondering if there is an easy way to generate the obvious 
> where clauses automatically?
> 



help with a function

From
"Clayton Cottingham aka drfrog"
Date:
consider this function:

CREATE FUNCTION "test" (integer) RETURNS text AS 'return "select * from testt
limit $_[o];";' LANGUAGE 'plperl';


how would i get it to just run the query instead of just return it as text
using plperl?





Re: help with a function

From
Peter Eisentraut
Date:
Clayton Cottingham aka drfrog writes:

> CREATE FUNCTION "test" (integer) RETURNS text AS 'return "select * from testt
> limit $_[o];";' LANGUAGE 'plperl';
>
> how would i get it to just run the query instead of just return it as text
> using plperl?

You would extend PL/Perl to support querying the database from within a
function.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter