Re: Interesting question - Mailing list pgsql-hackers

From Larry Rosenman
Subject Re: Interesting question
Date
Msg-id 20010518222750.A21941@lerami.lerctr.org
Whole thread Raw
In response to Re: Interesting question  (Larry Rosenman <ler@lerctr.org>)
Responses Re: Interesting question  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
* Larry Rosenman <ler@lerctr.org> [010518 21:48]:
> * Tom Lane <tgl@sss.pgh.pa.us> [010518 21:09]:
> > Larry Rosenman <ler@lerctr.org> writes:
> > >  EXPLAIN
> > >  SELECT ...
> > >  FROM attack_db   
> > >  WHERE (start_time >= now() - '02:00:00'::interval OR   
> > >         end_time >= now() - '02:00:00'::interval)   
> > >         AND host(src_ip) NOT IN (select host(ip) from exempt_ips)    
> > >         AND host(dst_ip) NOT IN (select host(ip) from exempt_ips)    
> > >  ORDER BY bytes DESC;
> > 
> > > NOTICE:  QUERY PLAN:
> > 
> > > Sort  (cost=10870.77..10870.77 rows=5259 width=120)
> > >   ->  Seq Scan on attack_db  (cost=0.00..10358.95 rows=5259 width=120)
> > >         SubPlan
> > >           ->  Seq Scan on exempt_ips  (cost=0.00..1.04 rows=4 width=12)
> > >           ->  Seq Scan on exempt_ips  (cost=0.00..1.04 rows=4 width=12)
> > 
> > 
> > Making use of the indexes on start_time and end_time would be a good
> > thing.  The reason it's not doing that is it doesn't think that the
> > expressions "now() - '02:00:00'::interval" reduce to constants.  We
> > may have a proper solution for that by the time 7.2 comes out, but
> > in the meantime you could fake it with a function that hides the
> > noncacheable function and operator --- see previous discussions of
> > this identical issue in the archives.
> OK. What would you suggest for the function?  I'd like the
> '02:00:00'::interval to be a variable somehow to change the 
> interval we're searching.    What fills the table is a daemon that is
> looking at the netflow data, and when a packet that matches one of the
> attack profiles comes along, it does an insert into attack_db. 
I tried the following function:

--
-- TOC Entry ID 15 (OID 35180)
--
-- Name: "nowminus" (interval) Type: FUNCTION Owner: ler
--

CREATE FUNCTION "nowminus" (interval) RETURNS timestamp with time zone AS 'SELECT now() - $1;' LANGUAGE 'sql';

and the following query:
EXPLAINSELECT to_char(start_time,'MM/DD/YY') as mmddyy,         to_char(start_time,'HH24:MI:SS') as hhmmss,
getattack_type(attack_type)as type,         src_router as router,         input_int as ii,         output_int as oi,
    src_as as srcas,host(src_ip) || '/' || masklen(src_ip) || ':' ||          format_port(protocol,src_port) as
src_address,        dst_as as dstas,host(dst_ip) || '/' || masklen(dst_ip) || ':' ||
format_port(protocol,dst_port)as dst_address,         format_protocol(protocol) as prot,
tos,format_flags(protocol,pr_flags)as tcpflags,         pkts,bytes,         bytes/pkts as bytes_per_packet,
to_char(end_time,'MM/DD/YY')as end_mmddyy,         to_char(end_time,'HH24:MI:SS') as end_hhmmss,         next_hop
FROMattack_db   WHERE (start_time >= nowminus('02:00:00'::interval) OR          end_time >=
nowminus('02:00:00'::interval))       AND host(src_ip) NOT IN (select host(ip) from exempt_ips)           AND
host(dst_ip)NOT IN (select host(ip) from exempt_ips)    ORDER BY bytes DESC;   ;
 

And got the following plan:

NOTICE:  QUERY PLAN:

Sort  (cost=11313.95..11313.95 rows=5497 width=120) ->  Seq Scan on attack_db  (cost=0.00..10777.58 rows=5497
width=120)      SubPlan         ->  Seq Scan on exempt_ips  (cost=0.00..1.04 rows=4 width=12)         ->  Seq Scan on
exempt_ips (cost=0.00..1.04 rows=4 width=12)
 

EXPLAIN
> 
> 
> > 
> > The NOT INs are pretty ugly too (and do you need the host() conversion
> > there?  Seems like a waste of cycles...).  You might be able to live
> > with that if the timestamp condition will always be pretty restrictive,
> > but otherwise they'll be a no go.  Consider NOT EXISTS with an index
> > on exempt_ips(ip).
> Yes, because the masks will probably be different each time (this is
> from netflow data from my cisco's).  The exempt IP's table is, at the
> moment 4 ip's, so that's quick anyway. 
> 
> > 
> >             regards, tom lane
> 
> -- 
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
> US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Fix for tablename in targetlist
Next
From: Bruce Momjian
Date:
Subject: Re: Plans for solving the VACUUM problem