Re: difficulties with time based queries - Mailing list pgsql-performance

From Rainer Mager
Subject Re: difficulties with time based queries
Date
Msg-id 000001c9b99a$743da930$5cb8fb90$@com
Whole thread Raw
In response to difficulties with time based queries  ("Rainer Mager" <rainer@vanten.com>)
Responses Re: difficulties with time based queries
List pgsql-performance

Thanks for all of the suggestions so far. I’ve been trying to reduce the number of indices I have, but I’m running into a problem. I have a need to do queries on this table with criteria applied to the date and possibly any or all of the other key columns. As a reminder, here’s my table:

 

                                         Table "public.ad_log"

   Column   |            Type             |                         Modifiers

------------+-----------------------------+------------------------------------------------------------

 ad_log_id  | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)

 channel    | integer                     | not null

 player     | integer                     | not null

 ad         | integer                     | not null

 start_time | timestamp without time zone |

 end_time   | timestamp without time zone |

 

So, I need indices that make it fast querying against start_time as well as all possible combinations of channel, player, and ad. Below is a sample query that uses all of these (note that I’ve removed actual strings to protect customer data). The result is fine in terms of speed, but since it’s using the ad_log_ad_date index I’m wondering what the best strategy is to cover queries that don’t specify an ad. Should I have 4 indices, one with just the start_time (for when no other columns are specified) and the other three each with the start_time and the three other criteria: channel, player, and ad? I’m currently experimenting with various options, but since it takes a couple of hours to create a particular index this is taking a while.

 

 

# explain analyze SELECT ad_log.ad_log_id, channels.name as channel_name, players.name as player_name, ads.name as ad_name, start_time, end_time, (data IS NOT NULL) AS has_screenshot FROM channels, players, players_history, ads,  ad_log LEFT OUTER JOIN ad_log_screenshot USING (ad_log_id) WHERE channel=channels.id AND player=players_history.id AND players_history.player_instance = players.id AND ad=ads.id AND channels.name LIKE '<some channel>' AND players.name LIKE '<some player>' AND ads.name LIKE '<some ad>' AND  date(start_time) BETWEEN '2009-01-20' AND date('2009-01-21') ORDER BY channels.name, players.name, start_time, ads.name LIMIT 100 OFFSET 100;

 

                                                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=7425.26..7425.26 rows=1 width=120) (actual time=1256.116..1256.202 rows=39 loops=1)

   ->  Sort  (cost=7425.26..7425.26 rows=1 width=120) (actual time=1255.965..1256.068 rows=139 loops=1)

         Sort Key: channels.name, players.name, ad_log.start_time, ads.name

         Sort Method:  quicksort  Memory: 35kB

         ->  Nested Loop Left Join  (cost=0.01..7425.25 rows=1 width=120) (actual time=179.086..1255.451 rows=139 loops=1)

               ->  Nested Loop  (cost=0.01..7417.06 rows=1 width=88) (actual time=137.488..1212.531 rows=139 loops=1)

                     Join Filter: (ad_log.channel = channels.id)

                     ->  Nested Loop  (cost=0.01..7415.73 rows=1 width=60) (actual time=120.308..1192.867 rows=139 loops=1)

                           Join Filter: (players_history.id = ad_log.player)

                           ->  Nested Loop  (cost=0.00..36.92 rows=1 width=17) (actual time=21.960..23.405 rows=1 loops=1)

                                 Join Filter: (players.id = players_history.player_instance)

                                 ->  Seq Scan on players  (cost=0.00..11.80 rows=1 width=17) (actual time=5.981..6.083 rows=1 loops=1)

                                       Filter: (name ~~ '<some player>'::text)

                                 ->  Seq Scan on players_history  (cost=0.00..14.50 rows=850 width=8) (actual time=15.880..16.592 rows=850 loops=1)

                           ->  Nested Loop  (cost=0.01..7371.03 rows=622 width=51) (actual time=75.161..1156.076 rows=15600 loops=1)

                                 ->  Seq Scan on ads  (cost=0.00..72.79 rows=1 width=27) (actual time=15.776..31.975 rows=1 loops=1)

                                       Filter: (name ~~ '<some ad>'::text)

                                 ->  Index Scan using ad_log_ad_date on ad_log  (cost=0.01..7267.77 rows=2438 width=32) (actual time=59.375..1095.229 rows=15600 loops=1)

                                       Index Cond: ((ad_log.ad = ads.id) AND (date(ad_log.start_time) >= '2009-01-20'::date) AND (date(ad_log.start_time) <= '2009-01-21'::date))

                     ->  Seq Scan on channels  (cost=0.00..1.31 rows=1 width=36) (actual time=0.128..0.132 rows=1 loops=139)

                           Filter: (channels.name ~~ '<some channel>'::text)

               ->  Index Scan using ad_log_screenshot_pkey on ad_log_screenshot  (cost=0.00..8.18 rows=1 width=36) (actual time=0.304..0.304 rows=0 loops=139)

                     Index Cond: (ad_log.ad_log_id = ad_log_screenshot.ad_log_id)

 Total runtime: 1256.572 ms

 

 

 

Thanks,

 

--Rainer

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Nested query performance issue
Next
From: Tom Lane
Date:
Subject: Re: difficulties with time based queries