BUG #8943: optimizer appears to not be efficient when there is little data in the query results - Mailing list pgsql-bugs

From nghia.le@postano.com
Subject BUG #8943: optimizer appears to not be efficient when there is little data in the query results
Date
Msg-id 20140124010352.26703.57438@wrigleys.postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8943
Logged by:          Nghia Le
Email address:      nghia.le@postano.com
PostgreSQL version: 9.3.1
Operating system:   Amazon RDS
Description:

I think the issue is The issue is we order by post_time, scraped time. and
for the most part when there is a lot of data, the composite index created
works wonders. However when there is little data
(ie feed_id =8924 ) has about 54 items. Then it doesn't know to just use a
sequence scan, instead it uses the entire index and takes forever in doing
so.


Query Plan with Index_scan off: Limit  (cost=154847.18..154847.18 rows=1
width=684) (actual time=1.001..1.002 rows=1 loops=1)
   Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title,
p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height,
p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname,
p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time
   Buffers: shared hit=119 read=4
   ->  Sort  (cost=154847.18..154877.28 rows=12038 width=684) (actual
time=0.998..0.998 rows=1 loops=1)
         Output: p.id, p.guid, p.source_type, p.post_time, p.source_id,
p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
         Sort Key: p.post_time, p.scraped_time
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=119 read=4
         ->  Nested Loop Anti Join  (cost=294.53..154786.99 rows=12038
width=684) (actual time=0.073..0.927 rows=23 loops=1)
               Output: p.id, p.guid, p.source_type, p.post_time,
p.source_id, p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
               Buffers: shared hit=119 read=4
               ->  Bitmap Heap Scan on public.post p  (cost=290.11..48544.89
rows=12586 width=684) (actual time=0.042..0.583 rows=23 loops=1)
                     Output: p.id, p.guid, p.scraped_time, p.source_type,
p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url,
p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url,
p.user_id, p.user_displayname, p.user_fullname, p.text_config_name,
p.feed_id, p.tsv, p.original_source_id
                     Recheck Cond: (p.feed_id = 8924)
                     Buffers: shared hit=5 read=3
                     ->  Bitmap Index Scan on feed_id_idx
(cost=0.00..286.96 rows=12586 width=0) (actual time=0.033..0.033 rows=23
loops=1)
                           Index Cond: (p.feed_id = 8924)
                           Buffers: shared hit=3 read=1
               ->  Bitmap Heap Scan on public.post p1  (cost=4.42..8.44
rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=23)
                     Output: p1.id, p1.guid, p1.scraped_time,
p1.source_type, p1.post_time, p1.source_id, p1.title, p1.url,
p1.picture_url, p1.video_url, p1.media_mime_type, p1.media_height,
p1.media_width, p1.text, p1.user_icon_url, p1.user_id, p1.user_displayname,
p1.user_fullname, p1.text_config_name, p1.feed_id, p1.tsv,
p1.original_source_id
                     Recheck Cond: (p1.id = p.id)
                     Filter: ((p1.original_source_id IS NOT NULL) AND
((p1.source_type)::text = ANY
('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[])))
                     Rows Removed by Filter: 1
                     Buffers: shared hit=114 read=1
                     ->  Bitmap Index Scan on post_pkey  (cost=0.00..4.42
rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=23)
                           Index Cond: (p1.id = p.id)
                           Buffers: shared hit=91 read=1
 Total runtime: 1.078 ms


---------
Query Scan with Index On:

Limit  (cost=1.13..61.38 rows=5 width=684) (actual time=0.058..0.110 rows=5
loops=1)
   Output: p.id, p.guid, p.source_type, p.post_time, p.source_id, p.title,
p.url, p.picture_url, p.video_url, p.media_mime_type, p.media_height,
p.media_width, p.text, p.user_icon_url, p.user_id, p.user_displayname,
p.user_fullname, p.text_config_name, p.feed_id, p.scraped_time
   Buffers: shared hit=28 read=3
   ->  Nested Loop Anti Join  (cost=1.13..144978.35 rows=12032 width=684)
(actual time=0.055..0.098 rows=5 loops=1)
         Output: p.id, p.guid, p.source_type, p.post_time, p.source_id,
p.title, p.url, p.picture_url, p.video_url, p.media_mime_type,
p.media_height, p.media_width, p.text, p.user_icon_url, p.user_id,
p.user_displayname, p.user_fullname, p.text_config_name, p.feed_id,
p.scraped_time
         Buffers: shared hit=28 read=3
         ->  Index Scan using feed_id_idx on public.post p
(cost=0.56..38902.40 rows=12580 width=684) (actual time=0.028..0.043 rows=5
loops=1)
               Output: p.id, p.guid, p.scraped_time, p.source_type,
p.post_time, p.source_id, p.title, p.url, p.picture_url, p.video_url,
p.media_mime_type, p.media_height, p.media_width, p.text, p.user_icon_url,
p.user_id, p.user_displayname, p.user_fullname, p.text_config_name,
p.feed_id, p.tsv, p.original_source_id
               Index Cond: (p.feed_id = 8924)
               Buffers: shared hit=4 read=2
         ->  Index Scan using post_pkey on public.post p1  (cost=0.56..8.43
rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=5)
               Output: p1.id, p1.guid, p1.scraped_time, p1.source_type,
p1.post_time, p1.source_id, p1.title, p1.url, p1.picture_url, p1.video_url,
p1.media_mime_type, p1.media_height, p1.media_width, p1.text,
p1.user_icon_url, p1.user_id, p1.user_displayname, p1.user_fullname,
p1.text_config_name, p1.feed_id, p1.tsv, p1.original_source_id
               Index Cond: (p1.id = p.id)
               Filter: ((p1.original_source_id IS NOT NULL) AND
((p1.source_type)::text = ANY
('{twitter_stream,twitter_search,twitter_lists,twitter_user,twitter_hashtag}'::text[])))
               Rows Removed by Filter: 1
               Buffers: shared hit=24 read=1
 Total runtime: 0.186 ms


original query:

explain(analyze,buffers,verbose) SELECT p.id, p.guid, source_type,
post_time, source_id, title, url, picture_url, video_url, media_mime_type,
media_height, media_width, text, user_icon_url, user_id, user_displayname,
user_fullname, text_config_name, feed_id,scraped_time AS t FROM post AS p
LEFT JOIN location AS l on l.post_id=p.id WHERE (   feed_id =8924 )  AND NOT
EXISTS( SELECT 1 FROM post p1 WHERE p1.id=p.id AND p1.source_type
IN('twitter_stream','twitter_search', 'twitter_lists', 'twitter_user',
'twitter_hashtag') AND  p1.original_source_id IS NOT NULL)  ORDER BY
post_time DESC, scraped_time DESC LIMIT 1;

pgsql-bugs by date:

Previous
From: Marcus Overheu
Date:
Subject: Re: BUG #8934: value != ANY (uuid[]) AND expr does not work in all cases
Next
From: Tom Lane
Date:
Subject: Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.