summarizing traffic logs - Mailing list pgsql-novice

From Tamas MEZEI
Subject summarizing traffic logs
Date
Msg-id 4169CE9B.9010403@bazmag.hu
Whole thread Raw
List pgsql-novice
Hi,

I've just started to use PgSQL, and having some trouble with aggregation
in SQL.

I have a huge-and-growing table described as:

current_traffic (
from_timestamp timestamp without timezone,
to_timestamp timestamp without timezone,
source_host inet,
dest_host inet,
flow_size bigint,
flow_kind smallint,
flow_proto smallint
);

The input is coming from a named pipe, and inserted into the table by a
perl script. I hope the field names are speaking from themselves, but
some additional info:
- source_host and dest_host: we have a campus network in the subnet
x.y.z.0/21 and an university network in the range of x.y.0.0/16, and
this will be inportant on summarizing
- flow_kind is a smallint value with discrete numbers (ie. it can  be
one of {1,2,3,4}) and contains the meaning "this flow is web traffic",
"this is mail" etc.
- flow proto can be 6 or 17 (tcp and udp).

Input is coming like crazy (~60000 rows in 15 mins, we have ~2000 hosts)
and I'd like to do some aggregation in every 15 mins to a table like
below, and then truncate the current_traffic log table. (Is it ok, and
is truncating "atomically"?)

quarterly_sum (
from_timestamp timestamp without timezone,
to_timestamp timestamp without timezone,
host inet,
tcp_to_uni_kind1 bigint,
tcp_to_uni_kind2 bigint,
tcp_to_uni_kind3 bigint,
tcp_to_uni_kind4 bigint,
tcp_from_uni_kind1 bigint,
tcp_from_uni_kind2 bigint,
tcp_from_uni_kind3 bigint,
tcp_from_uni_kind4 bigint,
tcp_to_world_kind1 bigint,
tcp_to_world_kind2 bigint,
tcp_to_world_kind3 bigint,
tcp_to_world_kind4 bigint,
tcp_from_world_kind1 bigint,
tcp_from_world_kind2 bigint,
tcp_from_world_kind3 bigint,
tcp_from_world_kind4 bigint,
udp_to_uni_kind1 bigint,
udp_to_uni_kind2 bigint,
udp_to_uni_kind3 bigint,
udp_to_uni_kind4 bigint,
udp_from_uni_kind1 bigint,
udp_from_uni_kind2 bigint,
udp_from_uni_kind3 bigint,
udp_from_uni_kind4 bigint,
udp_to_world_kind1 bigint,
udp_to_world_kind2 bigint,
udp_to_world_kind3 bigint,
udp_to_world_kind4 bigint,
udp_from_world_kind1 bigint,
udp_from_world_kind2 bigint,
udp_from_world_kind3 bigint,
udp_from_world_kind4 bigint
);

Classifying "world" and "university" traffic is quite easy with PgSQL
inet functions, but how should i create the aggregations grouped by the
kind of the flow?

I had some thoughts about creating some views, or using
triggers/cursors, but I'm not that deep in PgSQL to fully understand
every bit.

If anybody would help me solving this problem, that would be highly
appreciated.

Thanks,

Tamas

pgsql-novice by date:

Previous
From: "V i s h a l Kashyap @ [Sai Hertz And Control Systems]"
Date:
Subject: View or Function as default field
Next
From: Number One
Date:
Subject: Q: parameters to functions