Re: Select running slow on Postgres - Mailing list pgsql-performance
| From | Scott Marlowe |
|---|---|
| Subject | Re: Select running slow on Postgres |
| Date | |
| Msg-id | dcc563d10807021207q2fd1166cj16216fd5ba8e0234@mail.gmail.com |
| In response to | Re: Select running slow on Postgres ("samantha mahindrakar" <sam.mahindrakar@gmail.com>) |
| List | pgsql-performance |
On Wed, Jul 2, 2008 at 1:01 PM, samantha mahindrakar <sam.mahindrakar@gmail.com> wrote:
> I ran the explain analyze. Here is what I got:
>
> "Group  (cost=112266.37..112266.40 rows=1 width=56) (actual time=5583.399..5615.476 rows=13373 loops=1)"
> "  ->  Sort  (cost=112266.37..112266.38 rows=1 width=56) (actual time=5583.382..5590.890 rows=13373 loops=1)"
> "        Sort Key: lane_data_07_08.lane_id, lane_data_07_08.measurement_start, lane_data_07_08.measurement_end, lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy, lane_data_07_08.quality, lane_data_07_08.effective_date"
> "        ->  Nested Loop IN Join  (cost=0.00..112266.36 rows=1 width=56) (actual time=1100.307..5547.768 rows=13373 loops=1)"
> "              ->  Seq Scan on lane_data_07_08  (cost=0.00..112241.52 rows=3 width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"

You can see here that the seq scan on lane_data is what's eating up all your time. Also, since the row estimate is WAY off (3 estimated versus 20581 actual), the planner chose a nested loop thinking it would be joining only 1 row, when it actually ran across 20k rows.
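One common way to chase a row estimate that far off is to raise the statistics target on the filtered columns and re-analyze. A sketch (column choices are based on the filter in the plan; the target value of 1000 is illustrative, the default is 100):

```sql
-- Give the planner finer-grained statistics on the columns that appear
-- in the filter, then re-collect statistics for the table.
ALTER TABLE tss.lane_data_07_08 ALTER COLUMN volume SET STATISTICS 1000;
ALTER TABLE tss.lane_data_07_08 ALTER COLUMN speed  SET STATISTICS 1000;
ANALYZE tss.lane_data_07_08;
```

With better estimates the planner may switch from the nested loop to a join strategy that handles 20k rows gracefully.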
> "                    Filter: (((volume = 255::double precision) OR (speed = 255::double precision) OR (occupancy = 255::double precision) OR (occupancy >= 100::double precision) OR (volume > 52::double precision) OR (volume < 0::double precision) OR (speed > 120::double precision) OR (speed < 0::double precision)) AND (date_part('hour'::text, measurement_start) >= 5::double precision) AND (date_part('hour'::text, measurement_start) <= 23::double precision) AND (date_part('day'::text, measurement_start) = 1::double precision))"
> "              ->  Index Scan using lane_info_pk on lane_info  (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=20581)"
> "                    Index Cond: (lane_data_07_08.lane_id = lane_info.lane_id)"
> "                    Filter: (inactive IS NULL)"
> "Total runtime: 5621.409 ms"
>
> Well, instead of creating extra indexes (since they eat up a lot of space) I made use of the whole measurement_start field, so that it uses the index property and makes the search faster. So I changed the query to include the measurement_start as follows:
>
> SELECT lane_id, measurement_start, measurement_end, speed, volume, occupancy, quality, effective_date
> FROM tss.lane_data_06_08
> WHERE lane_id IN (SELECT lane_id FROM lane_info WHERE inactive IS NULL)
>   AND measurement_start BETWEEN '2008-06-30 05:00:00-04' AND '2008-06-30 23:00:00-04'
> GROUP BY lane_id, measurement_start, measurement_end, speed, volume, occupancy, quality, effective_date
> ORDER BY lane_id, measurement_start

Yeah, anytime you can just compare a date / timestamp against an indexed field you'll do better. If you find yourself needing the other syntax, so you can, for instance, grab the data for 5 days in a row from 5am to 11am, then the method I mentioned of making indexes on date_part is a good choice. Note that you need a regular timestamp, not timestamptz, to create such indexes.
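The two indexing approaches discussed above might look like this (index names are made up; the expression index assumes measurement_start is a plain timestamp, since date_part() on a timestamptz is not immutable and cannot be indexed directly):

```sql
-- Expression index supporting filters like
-- date_part('hour', measurement_start) BETWEEN 5 AND 23.
-- Requires a plain timestamp column, per the note above.
CREATE INDEX lane_data_07_08_hour_idx
    ON tss.lane_data_07_08 (date_part('hour', measurement_start));

-- Plain composite index supporting the rewritten BETWEEN query,
-- which filters on lane_id and a measurement_start range.
CREATE INDEX lane_data_06_08_lane_mstart_idx
    ON tss.lane_data_06_08 (lane_id, measurement_start);
```

The composite index matches the rewritten query's WHERE clause and its ORDER BY lane_id, measurement_start, so the planner can avoid both the seq scan and the sort.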