Re: Performance of query (fwd) - Mailing list pgsql-general

From Edmund Dengler
Subject Re: Performance of query (fwd)
Date
Msg-id Pine.BSO.4.44.0306111125510.15310-100000@cyclops4.esentire.com
Whole thread Raw
In response to Re: Performance of query (fwd)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance of query (fwd)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Ok, tried it with a constant. No improvements. The estimator is still very
high.

Regards,
Ed

explain analyze
select
  event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst,
  icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq,
  udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum,
  tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off,
  tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp,
  sensor.hostname, sensor.interface, data.data_payload
from (
  select * from event
  where timestamp > '2003-06-11 9:22 EDT'::timestamp
    and exists (select 1 from hack_pull_sid where sid = event.sid)
) as event
  left join signature on signature.sig_id = event.signature
  left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
  left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
  left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
  left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
  left join sensor on sensor.sid = event.sid
  left join data on data.sid = event.sid and data.cid = event.cid
order by
  timestamp desc, event.sid desc, event.cid desc
;



---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1710629.85..1711222.69 rows=237136 width=853) (actual time=308403.45..308403.61 rows=139 loops=1)
   Sort Key: public.event."timestamp", public.event.sid, public.event.cid
   ->  Merge Join  (cost=1470394.12..1550828.17 rows=237136 width=853) (actual time=299837.82..308402.64 rows=139
loops=1)
         Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
         ->  Merge Join  (cost=727702.20..801483.40 rows=237136 width=258) (actual time=45098.40..50164.32
rows=139loops=1)
               Merge Cond: ("outer".sid = "inner".sid)
               ->  Merge Join  (cost=727700.17..797923.97 rows=237136 width=226) (actual time=45077.16..50140.32
rows=139loops=1) 
                     Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                     ->  Merge Join  (cost=727700.17..735968.33 rows=237136 width=172) (actual time=25446.85..26861.14
rows=139loops=1) 
                           Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                           ->  Merge Join  (cost=727700.17..730447.45 rows=237136 width=144) (actual
time=23709.07..24201.98rows=139 loops=1) 
                                 Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                 ->  Sort  (cost=700380.22..700973.06 rows=237136 width=116) (actual
time=17821.61..17821.81rows=139 loops=1) 
                                       Sort Key: public.event.sid, public.event.cid
                                       ->  Merge Join  (cost=590252.70..671029.48 rows=237136 width=116) (actual
time=11569.28..17821.25rows=139 loops=1) 
                                             Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
                                             ->  Index Scan using iphdr_pkey on iphdr  (cost=0.00..73422.89
rows=1432042width=28) (actual time=18.79..16195.02 rows=730166 loops=1) 
                                             ->  Sort  (cost=590252.70..590845.54 rows=237136 width=88) (actual
time=52.35..52.55rows=139 loops=1) 
                                                   Sort Key: public.event.sid, public.event.cid
                                                   ->  Hash Join  (cost=185.07..562558.03 rows=237136 width=88) (actual
time=34.28..51.89rows=139 loops=1) 
                                                         Hash Cond: ("outer".signature = "inner".sig_id)
                                                         ->  Index Scan using timestamp_idx on event
(cost=0.00..557630.23rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1) 
                                                               Index Cond: ("timestamp" > ('2003-06-11
09:22:00'::timestampwithout time zone)::timestamp with time zone) 
                                                               Filter: (subplan)
                                                               SubPlan
                                                                 ->  Seq Scan on hack_pull_sid  (cost=0.00..1.15 rows=1
width=0)(actual time=0.01..0.01 rows=0 loops=953) 
                                                                       Filter: (sid = $0)
                                                         ->  Hash  (cost=167.26..167.26 rows=7126 width=64)
(actualtime=33.91..33.91rows=0 loops=1) 
                                                               ->  Seq Scan on signature  (cost=0.00..167.26 rows=7126
width=64)(actual time=0.04..17.74 rows=7131 loops=1) 
                                 ->  Sort  (cost=27319.94..27897.87 rows=231171 width=28) (actual time=5671.28..6088.28
rows=130551loops=1) 
                                       Sort Key: icmphdr.sid, icmphdr.cid
                                       ->  Seq Scan on icmphdr  (cost=0.00..3784.71 rows=231171 width=28) (actual
time=3.87..1278.01rows=232877 loops=1) 
                           ->  Index Scan using udphdr_pkey on udphdr  (cost=0.00..4411.44 rows=115228 width=28)
(actualtime=79.77..2532.43 rows=56946 loops=1) 
                     ->  Index Scan using tcphdr_pkey on tcphdr  (cost=0.00..56586.06 rows=1139141 width=54) (actual
time=89.40..22157.07rows=504556 loops=1) 
               ->  Sort  (cost=2.04..2.11 rows=30 width=32) (actual time=21.18..21.40 rows=158 loops=1)
                     Sort Key: sensor.sid
                     ->  Seq Scan on sensor  (cost=0.00..1.30 rows=30 width=32) (actual time=20.98..21.05 rows=30
loops=1)
         ->  Sort  (cost=742691.92..745059.80 rows=947154 width=595) (actual time=246938.33..257244.21 rows=444980
loops=1)
               Sort Key: data.sid, data.cid
               ->  Seq Scan on data  (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.68..198574.76
rows=955002loops=1) 
 Total runtime: 308903.04 msec
(41 rows)



On Tue, 10 Jun 2003, Tom Lane wrote:

> Edmund Dengler <edmundd@eSentire.com> writes:
> >   select * from event
> >   where timestamp > (select now() - '2 hours'::interval)
> >     and exists (select 1 from hack_pull_sid where sid = event.sid)
>
> > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> > avoid the issues with IN)
>
> I think you're creating more issues than you're avoiding.  With the
> above query, the planner has little chance of guessing how many rows
> will be retrieved from "event" ... and indeed the EXPLAIN output shows
> that its guess is off by more than a factor of 1000:
>
> >      ->  Index Scan using timestamp_idx on event  (cost=0.00..558165.62 rows=237893 width=24) (actual
time=0.18..3.05rows=129 loops=1) 
>
> With a misestimate of that magnitude at the core of the query, it's
> unsurprising that all the other plan choices are bad too.
>
> But actually I suspect the easiest point of attack is not the EXISTS
> subquery, but the timestamp comparison.  Can you get your application
> to supply a simple literal constant to compare to the timestamp, viz
> '2003-06-10 21:44' rather than now()-'2 hours'?  The former gives the
> planner something to compare to its statistics, the latter doesn't.
>
> Oh ... you have done an ANALYZE on event reasonably recently, no?
>
>             regards, tom lane
>


pgsql-general by date:

Previous
From: Chris Gamache
Date:
Subject: Re: Index not being used in MAX function (7.2.3)
Next
From: "listrec"
Date:
Subject: Re: Index not being used in MAX function (7.2.3)