Re: Performance of query (fwd) - Mailing list pgsql-general
From | Edmund Dengler |
---|---|
Subject | Re: Performance of query (fwd) |
Date | |
Msg-id | Pine.BSO.4.44.0306111125510.15310-100000@cyclops4.esentire.com Whole thread Raw |
In response to | Re: Performance of query (fwd) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Performance of query (fwd)
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-general |
Ok, tried it with a constant. No improvements. The estimator is still very high. Regards, Ed explain analyze select event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst, icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq, udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum, tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off, tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp, sensor.hostname, sensor.interface, data.data_payload from ( select * from event where timestamp > '2003-06-11 9:22 EDT'::timestamp and exists (select 1 from hack_pull_sid where sid = event.sid) ) as event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid order by timestamp desc, event.sid desc, event.cid desc ; --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1710629.85..1711222.69 rows=237136 width=853) (actual time=308403.45..308403.61 rows=139 loops=1) Sort Key: public.event."timestamp", public.event.sid, public.event.cid -> Merge Join (cost=1470394.12..1550828.17 rows=237136 width=853) (actual time=299837.82..308402.64 rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727702.20..801483.40 rows=237136 width=258) (actual time=45098.40..50164.32 rows=139loops=1) Merge Cond: ("outer".sid = "inner".sid) -> Merge Join (cost=727700.17..797923.97 rows=237136 width=226) (actual time=45077.16..50140.32 rows=139loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727700.17..735968.33 rows=237136 width=172) (actual time=25446.85..26861.14 rows=139loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=727700.17..730447.45 rows=237136 width=144) (actual time=23709.07..24201.98rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Sort (cost=700380.22..700973.06 rows=237136 width=116) (actual time=17821.61..17821.81rows=139 loops=1) Sort Key: public.event.sid, public.event.cid -> Merge Join (cost=590252.70..671029.48 rows=237136 width=116) (actual time=11569.28..17821.25rows=139 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Index Scan using iphdr_pkey on iphdr (cost=0.00..73422.89 rows=1432042width=28) (actual time=18.79..16195.02 rows=730166 loops=1) -> Sort (cost=590252.70..590845.54 rows=237136 width=88) (actual time=52.35..52.55rows=139 loops=1) Sort Key: public.event.sid, public.event.cid -> Hash Join (cost=185.07..562558.03 rows=237136 width=88) (actual time=34.28..51.89rows=139 loops=1) Hash Cond: ("outer".signature = "inner".sig_id) -> Index Scan using timestamp_idx on event (cost=0.00..557630.23rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1) Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestampwithout time zone)::timestamp with time zone) Filter: (subplan) SubPlan -> Seq Scan on hack_pull_sid (cost=0.00..1.15 rows=1 width=0)(actual time=0.01..0.01 rows=0 loops=953) Filter: (sid = $0) -> Hash (cost=167.26..167.26 rows=7126 width=64) (actualtime=33.91..33.91rows=0 loops=1) -> Seq Scan on signature (cost=0.00..167.26 rows=7126 width=64)(actual time=0.04..17.74 rows=7131 loops=1) -> Sort (cost=27319.94..27897.87 rows=231171 width=28) (actual time=5671.28..6088.28 rows=130551loops=1) Sort Key: icmphdr.sid, icmphdr.cid -> Seq Scan on icmphdr (cost=0.00..3784.71 rows=231171 width=28) (actual time=3.87..1278.01rows=232877 loops=1) -> Index Scan using udphdr_pkey on udphdr (cost=0.00..4411.44 rows=115228 width=28) (actualtime=79.77..2532.43 rows=56946 loops=1) -> Index Scan using tcphdr_pkey on tcphdr (cost=0.00..56586.06 rows=1139141 width=54) (actual time=89.40..22157.07rows=504556 loops=1) -> Sort (cost=2.04..2.11 rows=30 width=32) (actual time=21.18..21.40 rows=158 loops=1) Sort Key: sensor.sid -> Seq Scan on sensor (cost=0.00..1.30 rows=30 width=32) (actual time=20.98..21.05 rows=30 loops=1) -> Sort (cost=742691.92..745059.80 rows=947154 width=595) (actual time=246938.33..257244.21 rows=444980 loops=1) Sort Key: data.sid, data.cid -> Seq Scan on data (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.68..198574.76 rows=955002loops=1) Total runtime: 308903.04 msec (41 rows) On Tue, 10 Jun 2003, Tom Lane wrote: > Edmund Dengler <edmundd@eSentire.com> writes: > > select * from event > > where timestamp > (select now() - '2 hours'::interval) > > and exists (select 1 from hack_pull_sid where sid = event.sid) > > > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I > > avoid the issues with IN) > > I think you're creating more issues than you're avoiding. With the > above query, the planner has little chance of guessing how many rows > will be retrieved from "event" ... and indeed the EXPLAIN output shows > that its guess is off by more than a factor of 1000: > > > -> Index Scan using timestamp_idx on event (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05rows=129 loops=1) > > With a misestimate of that magnitude at the core of the query, it's > unsurprising that all the other plan choices are bad too. > > But actually I suspect the easiest point of attack is not the EXISTS > subquery, but the timestamp comparison. Can you get your application > to supply a simple literal constant to compare to the timestamp, viz > '2003-06-10 21:44' rather than now()-'2 hours'? The former gives the > planner something to compare to its statistics, the latter doesn't. > > Oh ... you have done an ANALYZE on event reasonably recently, no? > > regards, tom lane >
pgsql-general by date: