Home > mailing lists

Re: Slow index scan on B-Tree index over timestamp field - Mailing list pgsql-performance

From	Caio Casimiro
Subject	Re: Slow index scan on B-Tree index over timestamp field
Date	November 4, 2013 20:44:16
Msg-id	CAK42QYHJif8Qek=oYnwxYcWhJC7GHAqcEPTgP6xq__8TxRNz_A@mail.gmail.com Whole thread
In response to	Re: Slow index scan on B-Tree index over timestamp field (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: Slow index scan on B-Tree index over timestamp field Re: Slow index scan on B-Tree index over timestamp field Re: Slow index scan on B-Tree index over timestamp field
List	pgsql-performance

Tree view

Thank you very much for your answers guys!

On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.listas@gmail.com> wrote:
Hello all,

I have one query running at ~ 7 seconds and I would like to know if it's possible to make it run faster, once this query runs lots of time in my experiment.

Do you mean you want it to be fast because it runs many times, or that you want it to become fast after it runs many times (i.e. once the data is fully cached)? The plan you show takes 24 seconds, not 7 seconds.

I want it to be fast because it runs many times. I have an experiment that evaluates recommendation algorithms for a set of twitter users. This query returns recommendation candidates so it is called a lot of times for different users and time intervals.

Basically the query return the topics of tweets published by users that the user N follows and that are published between D1 and D2.

Query:

SELECT tt.tweet_id, tt.topic, tt.topic_value
FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = t.id
WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
(SELECT followed_id FROM relationship WHERE follower_id = N) ORDER BY tt.tweet_id;

I don't know if this affects the plan at all, but it is silly to do a left join to "tweet" when the WHERE clause has conditions that can't be satisfied with a null row. Also, you could try changing the IN-list to an EXISTS subquery.

I'm sorry the ignorance, but I don't understand the issue with the left join, could you explain more?

Is there some patterns to D1 and D2 that could help the caching? For example, are they both usually in the just-recent past?

The only pattern is that it is always a one day interval, e.g. D1 = '2013-05-01' and D2 = '2013-05-02'.

Indexes:
"tweet_plk" PRIMARY KEY, btree (id) CLUSTER
"tweet_creation_time_index" btree (creation_time)
"tweet_id_index" hash (id)
"tweet_ios_index" btree (id, user_id, creation_time)
"tweet_retweeted_idx" hash (retweeted)
"tweet_user_id_creation_time_index" btree (creation_time, user_id)
"tweet_user_id_index" hash (user_id)

Are all of those indexes important? If your table is heavily updated/inserted, which I assume it is, maintaining those indexes is going to take up precious RAM that could probably be better used elsewhere.

Probably not. But once this database is read only, the quantity of index grew following my desperation. =)

Cheers,

Jeff

Thank you very much again!

Caio

pgsql-performance by date:

From: Jeff Janes
Date: 04 November 2013, 19:15:38
Subject: Re: Slow index scan on B-Tree index over timestamp field

From: Caio Casimiro
Date: 04 November 2013, 20:50:30
Subject: Re: Slow index scan on B-Tree index over timestamp field

Re: Slow index scan on B-Tree index over timestamp field - Mailing list pgsql-performance

Previous

Next