Slow index scan on B-Tree index over timestamp field - Mailing list pgsql-performance
From | Caio Casimiro |
---|---|
Subject | Slow index scan on B-Tree index over timestamp field |
Date | |
Msg-id | CAK42QYGWHKk+5HZAW4+z1+A2xMYUd1KmwsXp9OVtSud7ztA=4g@mail.gmail.com Whole thread Raw |
Responses |
Re: Slow index scan on B-Tree index over timestamp field
Re: Slow index scan on B-Tree index over timestamp field Re: Slow index scan on B-Tree index over timestamp field |
List | pgsql-performance |
Hello all,
I have one query running at ~ 7 seconds and I would like to know if it's possible to make it run faster, once this query runs lots of time in my experiment.
Basically the query return the topics of tweets published by users that the user N follows and that are published between D1 and D2.
Query:
SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
(SELECT followed_id FROM relationship WHERE follower_id = N) ORDER BY tt.tweet_id;
Explain (Analyze, Buffers):
Sort (cost=3950701.24..3950708.22 rows=2793 width=20) (actual time=24062.951..24064.475 rows=1640 loops=1)
Sort Key: tt.tweet_id
Sort Method: quicksort Memory: 97kB
Buffers: shared hit=2390 read=32778
I/O Timings: read=15118.402
-> Nested Loop (cost=247.58..3950541.38 rows=2793 width=20) (actual time=532.578..24057.319 rows=1640 loops=1)
Buffers: shared hit=2387 read=32778
I/O Timings: read=15118.402
-> Hash Semi Join (cost=229.62..73239.03 rows=1361 width=8) (actual time=391.768..15132.889 rows=597 loops=1)
Hash Cond: (t.user_id = relationship.followed_id)
Buffers: shared hit=539 read=31862
I/O Timings: read=6265.279
-> Index Scan using tweet_creation_time_index on tweet t (cost=0.57..68869.39 rows=1472441 width=16) (actual time=82.752..11418.043 rows=175
9645 loops=1)
Index Cond: ((creation_time >= '2013-05-05 00:00:00-03'::timestamp with time zone) AND (creation_time <= '2013-05-06 00:00:00-03'::times
tamp with time zone))
Buffers: shared hit=534 read=31859
I/O Timings: read=6193.764
-> Hash (cost=227.12..227.12 rows=154 width=8) (actual time=72.175..72.175 rows=106 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 3kB
Buffers: shared hit=5 read=3
I/O Timings: read=71.515
-> Index Only Scan using relationship_id on relationship (cost=0.42..227.12 rows=154 width=8) (actual time=59.395..71.972 rows=106 loo
ps=1)
Index Cond: (follower_id = 335093362)
Heap Fetches: 0
Buffers: shared hit=5 read=3
I/O Timings: read=71.515
-> Bitmap Heap Scan on tweet_topic tt (cost=17.96..2841.63 rows=723 width=20) (actual time=14.909..14.917 rows=3 loops=597)
Recheck Cond: (tweet_id = t.id)
Buffers: shared hit=1848 read=916
I/O Timings: read=8853.123
-> Bitmap Index Scan on tweet_topic_pk (cost=0.00..17.78 rows=723 width=0) (actual time=9.793..9.793 rows=3 loops=597)
Index Cond: (tweet_id = t.id)
Buffers: shared hit=1764 read=631
I/O Timings: read=5811.532
Total runtime: 24066.145 ms
(34 rows)
Table structure:
Table "public.tweet"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------+--------------------------------------+--------------+-------------+------------------+-----------------
id | bigint | not null | plain | |
content | text | | extended | |
creation_time | timestamp with time zone | | plain | |
user_id | bigint | | plain | |
retweeted | bigint | | plain | |
retweet_count | integer | | plain | |
Indexes:
"tweet_plk" PRIMARY KEY, btree (id) CLUSTER
"tweet_creation_time_index" btree (creation_time)
"tweet_id_index" hash (id)
"tweet_ios_index" btree (id, user_id, creation_time)
"tweet_retweeted_idx" hash (retweeted)
"tweet_user_id_creation_time_index" btree (creation_time, user_id)
"tweet_user_id_index" hash (user_id)
System Information:
OS: Slackware 14.0
Postgresql Version: 9.3 Beta2
postgresql.conf Settings:
work_mem = 128MB
shared_buffers = 1GB
maintenance_work_mem = 1536MB
fsync = off
synchronous_commit = off
effective_cache_size = 2GB
Additional information:
All tables in this database are read only tables. I haven't post the details about other tables to not let the email big, as it seems the problem is with the 'tweet' table.
Any help would be much appreciated.
Best regards,
Caio Casimiro.
pgsql-performance by date: