Re: Slow index scan on B-Tree index over timestamp field - Mailing list pgsql-performance

From Igor Neyman
Subject Re: Slow index scan on B-Tree index over timestamp field
Date
Msg-id A76B25F2823E954C9E45E32FA49D70EC4283BB3B@mail.corp.perceptron.com
Whole thread Raw
In response to Re: Slow index scan on B-Tree index over timestamp field  (Caio Casimiro <casimiro.listas@gmail.com>)
Responses Re: Slow index scan on B-Tree index over timestamp field
List pgsql-performance

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Caio Casimiro
Sent: Monday, November 04, 2013 3:44 PM
To: Jeff Janes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Thank you very much for your answers guys!

On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:
Hello all,

I have one query running at ~ 7 seconds and I would like to know if it's possible to make it run faster, once this
queryruns lots of time in my experiment. 


Do you mean you want it to be fast because it runs many times, or that you want it to become fast after it runs many
times(i.e. once the data is fully cached)?  The plan you show takes 24 seconds, not 7 seconds. 

I want it to be fast because it runs many times. I have an experiment that evaluates recommendation algorithms  for a
setof twitter users. This query returns recommendation candidates so it is called a lot of times for different users
andtime intervals. 
 
 

Basically the query return the topics of tweets published by users that the user N follows and that are published
betweenD1 and D2. 

Query:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
            WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
            (SELECT followed_id FROM relationship WHERE follower_id = N) ORDER BY tt.tweet_id;


I don't know if this affects the plan at all, but it is silly to do a left join to "tweet" when the WHERE clause has
conditionsthat can't be satisfied with a null row.  Also, you could try changing the IN-list to an EXISTS subquery. 

I'm sorry the ignorance, but I don't understand the issue with the left join, could you explain more?
...........................................
Thank you very much again!
Caio


Just try the following:

SELECT tt.tweet_id, tt.topic, tt.topic_value
            FROM tweet_topic AS tt  JOIN tweet AS t ON (tt.tweet_id = t.id
                              AND t.creation_time BETWEEN 'D1' AND 'D2' AND t.user_id in
                            (SELECT followed_id FROM relationship WHERE follower_id = N))
 ORDER BY tt.tweet_id;

And see if it helps with performance.

Regards,
Igor Neyman



pgsql-performance by date:

Previous
From: Caio Casimiro
Date:
Subject: Re: Slow index scan on B-Tree index over timestamp field
Next
From: Caio Casimiro
Date:
Subject: Re: Slow index scan on B-Tree index over timestamp field