Re: Slow index scan on B-Tree index over timestamp field - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Slow index scan on B-Tree index over timestamp field
Date
Msg-id CAMkU=1wU4trachr2jnTaZMbV97KQEMFe91C57A7qXX6MoRg+sw@mail.gmail.com
Whole thread Raw
In response to Re: Slow index scan on B-Tree index over timestamp field  (Caio Casimiro <casimiro.listas@gmail.com>)
Responses Re: Slow index scan on B-Tree index over timestamp field  (Caio Casimiro <casimiro.listas@gmail.com>)
List pgsql-performance
On Mon, Nov 4, 2013 at 12:44 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:
Thank you very much for your answers guys!


On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
            WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
            (SELECT followed_id FROM relationship WHERE follower_id = N) ORDER BY tt.tweet_id;


I don't know if this affects the plan at all, but it is silly to do a left join to "tweet" when the WHERE clause has conditions that can't be satisfied with a null row.  Also, you could try changing the IN-list to an EXISTS subquery.

I'm sorry the ignorance, but I don't understand the issue with the left join, could you explain more?

A left join means you are telling it to make up an all-NULL tweet row for any tweet_topic that does not have a corresponding tweet.  But then once it did so, it would just filter out that row later, because the null creation_time and user_id cannot pass the WHERE criteria--so doing a left join can't change the answer, but it can fool the planner into making a worse choice.
 


Is there some patterns to D1 and D2 that could help the caching?  For example, are they both usually in the just-recent past?
The only pattern is that it is always a one day interval, e.g. D1 = '2013-05-01' and  D2 = '2013-05-02'.

If you only compare creation_time to dates, rather than ever using date+time, then it would probably be better to store them in the table as date, not timestamp.  This might make the index smaller, and can also lead to better estimates and index usage.

But why would you want to offer suggestions to someone based on tweets that were made on exactly one day, over 5 months ago?  I can see why would want a brief period in the immediate past, or a long period; but a brief period that is not the recent past just seems like a strange thing to want to do.  (And it is going to be hard to get good performance with that requirement.)
 
Cheers,

Jeff

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: postgresql recommendation memory
Next
From: Caio Casimiro
Date:
Subject: Re: Slow index scan on B-Tree index over timestamp field