Thread: trying to summarize into a new table by time...
I have a LARGE table with 5 minute summary information: -- -- TOC Entry ID 9 (OID 539300) -- -- Name: traffic Type: TABLE Owner: ler -- CREATE TABLE "traffic" ("asn" integer,"protocol" integer,"pkts_src" integer,"pkts_dst" integer,"bytes_src" integer,"bytes_dst"integer,"secs_src" integer,"secs_dst" integer,"early" timestamp with time zone,"late" timestamp with timezone ); I'd like to summarize it into: -- -- TOC Entry ID 10 (OID 539319) -- -- Name: traffic_summary Type: TABLE Owner: ler -- CREATE TABLE "traffic_summary" ("asn" integer,"protocol" integer,"pkts_src" double precision,"pkts_dst" double precision,"bytes_src"double precision,"bytes_dst" double precision,"secs_src" double precision,"secs_dst" double precision,"early"timestamp with time zone,"late" timestamp with time zone ); Where we group into six hour groupings. I came up with the following: insert into traffic_summary select asn,protocol, cast(sum(pkts_src) as float) as pkts_src, cast(sum(pkts_dst) as float) as pkts_dst, cast(sum(bytes_src) as float) as bytes_src, cast(sum(bytes_dst) as float) as bytes_dst, cast(sum(secs_src) as float) as secs_src, cast(sum(secs_dst) as float) as secs_dst, min(early) as early, max(late) as late from traffic where early >= '2001-01-01 00:00:00' and early <= '2001-01-02 05:59:59' GROUP BY asn,protocol; BUT, I'm wondering if there is an easy way to generate the obvious where clauses automatically? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
* Larry Rosenman <ler@lerctr.org> [010602 05:17]: > Where we group into six hour groupings. > > I came up with the following: > > insert into traffic_summary > select asn,protocol, > cast(sum(pkts_src) as float) as pkts_src, > cast(sum(pkts_dst) as float) as pkts_dst, > cast(sum(bytes_src) as float) as bytes_src, > cast(sum(bytes_dst) as float) as bytes_dst, > cast(sum(secs_src) as float) as secs_src, > cast(sum(secs_dst) as float) as secs_dst, > min(early) as early, > max(late) as late > from traffic > where early >= '2001-01-01 00:00:00' and > early <= '2001-01-02 05:59:59' > GROUP BY asn,protocol; > I *THINK* I can change the GROUP BY to add date_trunc('hour',early) and get what I want to the hour. How can I get 6 hours? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Larry Rosenman <ler@lerctr.org> writes: > where early >= '2001-01-01 00:00:00' and > early <= '2001-01-02 05:59:59' > BUT, I'm wondering if there is an easy way to generate the obvious > where clauses automatically? Assuming this is being fired from a cron job started at or just after each interval, you could do something with date_trunc: regression=# select now(); now ------------------------2001-06-02 10:12:37-04 (1 row) regression=# select date_trunc('hour', now()); date_trunc ------------------------2001-06-02 10:00:00-04 (1 row) regression=# select date_trunc('hour', now()) - '6 hours'::interval; ?column? ------------------------2001-06-02 04:00:00-04 (1 row) Or, if you're running the script once a day at midnight, use current_date (or equivalently date_trunc('day',...)) and subtract appropriate intervals from that. regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [010602 09:19]: > Larry Rosenman <ler@lerctr.org> writes: > > where early >= '2001-01-01 00:00:00' and > > early <= '2001-01-02 05:59:59' > > > BUT, I'm wondering if there is an easy way to generate the obvious > > where clauses automatically? > > Assuming this is being fired from a cron job started at or just after > each interval, you could do something with date_trunc: > > regression=# select now(); > now > ------------------------ > 2001-06-02 10:12:37-04 > (1 row) > > regression=# select date_trunc('hour', now()); > date_trunc > ------------------------ > 2001-06-02 10:00:00-04 > (1 row) > > regression=# select date_trunc('hour', now()) - '6 hours'::interval; > ?column? > ------------------------ > 2001-06-02 04:00:00-04 > (1 row) > > Or, if you're running the script once a day at midnight, use > current_date (or equivalently date_trunc('day',...)) and subtract > appropriate intervals from that. Thanks, Tom! You are Terrific! One more question, will the sum() on a bigint column cast to float happen in floating point? I hope so! This data is huge numbers! Thanks again for always answering most of my queries. LER > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
> One more question, will the sum() on a bigint column cast to float > happen in floating point? Yes. You don't really need that though. These days there are only two flavors of sum(): float8 accumulator (for float8 and float4 inputs) and numeric accumulator (for everything else). You don't need to worry about overflow anymore. float8 is faster if you don't mind loss of precision... regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [010602 09:59]: > > One more question, will the sum() on a bigint column cast to float > > happen in floating point? > > Yes. You don't really need that though. These days there are only > two flavors of sum(): float8 accumulator (for float8 and float4 inputs) > and numeric accumulator (for everything else). You don't need to worry > about overflow anymore. float8 is faster if you don't mind loss of > precision... Since I'm looking at order of magnitude type comparisons, the float8 is fine. I don't need to know down to the byte, just comparisons between orders of magnitudes. ISP's move a *LOT* of data :-) LER > > regards, tom lane -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Funny, I just yesterday wrote the same exact thing (ip/asn accounting based on netflow) The problem is to iterate over range, which SQL doesn't like to do, unless given an explicit range table. Thus, there are possible solutions. a) do it like this: insert into traffic_summary ... group by date_part('epoch',early)/60/60/2 (assuming 2-hour aggregation interval) However, using this method, you won't get any data in traffic_summary when there was no traffic. If you want that, use... b) have an explicit table traffic_periods(period_start timestamp, period_end timestamp), prepopulated with data, and then doing insert ... select ... from traffic t,traffic_periods tp where t.early between period_start and period_end group by tp.oid On Sat, 2 Jun 2001, Larry Rosenman wrote: > insert into traffic_summary > select asn,protocol, > cast(sum(pkts_src) as float) as pkts_src, > cast(sum(pkts_dst) as float) as pkts_dst, > cast(sum(bytes_src) as float) as bytes_src, > cast(sum(bytes_dst) as float) as bytes_dst, > cast(sum(secs_src) as float) as secs_src, > cast(sum(secs_dst) as float) as secs_dst, > min(early) as early, > max(late) as late > from traffic > where early >= '2001-01-01 00:00:00' and > early <= '2001-01-02 05:59:59' > GROUP BY asn,protocol; > > BUT, I'm wondering if there is an easy way to generate the obvious > where clauses automatically? >
consider this function: CREATE FUNCTION "test" (integer) RETURNS text AS 'return "select * from testt limit $_[o];";' LANGUAGE 'plperl'; how would i get it to just run the query instead of just return it as text using plperl?
Clayton Cottingham aka drfrog writes: > CREATE FUNCTION "test" (integer) RETURNS text AS 'return "select * from testt > limit $_[o];";' LANGUAGE 'plperl'; > > how would i get it to just run the query instead of just return it as text > using plperl? You would extend PL/Perl to support querying the database from within a function. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter