Thread: can't win
I have a query that it seems is destined to be slow one way or another. I have a table of around 30k articles, categorized by topic and ordered by date: create table articles ( topic varchar(50), created date, data text ); create index articles_topic_idx on articles(topic); create index articles_created_idx on articles(created); If I want to get the 5 most recent articles in a topic, I get a nice query plan and a fast query: # explain select * from articles where topic = 'Example' order by created desc limit 5 ; QUERY PLAN ------------------------------------------------------------------------------- ----------------------------- Limit (cost=0.00..646.71 rows=5 width=828) -> Index Scan Backward using articles_created_idx on articles (cost=0.00..85202.16 rows=659 width=828) Filter: (topic = 'Example'::character varying) (3 rows) # select * from articles where topic = 'Example' order by created desc limit 5 ; [.....] Time: 18.42 ms However, if the topic happens to not exist, this query takes a very long time: # select * from articles where topic = 'NO-Example' order by created desc limit 5 ; [.....] Time: 1075.36 ms If I drop the date index or get more articles (so it doesn't do the backward scan on articles_created_idx), then the situation is reversed: getting the most recent articles for a topic that exists takes a fair amount of time, while getting a topic that does not exist is nearly instantaneous. Is there any way I can get the best of both worlds? -J
It might be worthwhile to experiment with 2 new indexes: Create UNIQUE index articles_created_topic_idx on articles(created, topic); Create UNIQUE index articles_topic_created_idx on articles(topic, created); Probably, one of the two should become your primary key. That will give the optimizer some new choices for plans. > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jeff Rogers > Sent: Friday, June 18, 2004 12:28 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] can't win > > > > I have a query that it seems is destined to be slow one way > or another. I > have a table of around 30k articles, categorized by topic and > ordered by date: > > create table articles ( > topic varchar(50), > created date, > data text > ); > > create index articles_topic_idx on articles(topic); > create index articles_created_idx on articles(created); > > If I want to get the 5 most recent articles in a topic, I get > a nice query > plan and a fast query: > # explain select * from articles where topic = 'Example' > order by created desc > limit 5 ; > QUERY PLAN > > > -------------------------------------------------------------- > ----------------- > ----------------------------- > Limit (cost=0.00..646.71 rows=5 width=828) > -> Index Scan Backward using articles_created_idx on articles > (cost=0.00..85202.16 rows=659 width=828) > Filter: (topic = 'Example'::character varying) > (3 rows) > > # select * from articles where topic = 'Example' order by > created desc limit 5 > ; > [.....] > Time: 18.42 ms > > However, if the topic happens to not exist, this query takes > a very long time: # select * from articles where topic = > 'NO-Example' order by created desc > limit 5 ; > [.....] > Time: 1075.36 ms > > If I drop the date index or get more articles (so it doesn't > do the backward > scan on articles_created_idx), then the situation is > reversed: getting the > most recent articles for a topic that exists takes a fair > amount of time, > while getting a topic that does not exist is nearly instantaneous. > > Is there any way I can get the best of both worlds? > > -J > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so > that your > message can get through to the mailing list cleanly >
On Fri, 18 Jun 2004, Jeff Rogers wrote: > create table articles ( > topic varchar(50), > created date, > data text > ); > > create index articles_topic_idx on articles(topic); > create index articles_created_idx on articles(created); > > If I want to get the 5 most recent articles in a topic, I get a nice query > plan and a fast query: > # explain select * from articles where topic = 'Example' order by created desc > limit 5 ; > QUERY PLAN > > ------------------------------------------------------------------------------- > ----------------------------- > Limit (cost=0.00..646.71 rows=5 width=828) > -> Index Scan Backward using articles_created_idx on articles > (cost=0.00..85202.16 rows=659 width=828) > Filter: (topic = 'Example'::character varying) > (3 rows) > > # select * from articles where topic = 'Example' order by created desc limit 5 > ; > [.....] > Time: 18.42 ms > > However, if the topic happens to not exist, this query takes a very long time: > # select * from articles where topic = 'NO-Example' order by created desc > limit 5 ; > [.....] > Time: 1075.36 ms > > If I drop the date index or get more articles (so it doesn't do the backward > scan on articles_created_idx), then the situation is reversed: getting the > most recent articles for a topic that exists takes a fair amount of time, > while getting a topic that does not exist is nearly instantaneous. > > Is there any way I can get the best of both worlds? Hmm, how about a multiple column index on (topic, created)? You might need to use select * from articles where topic='Example' order by topic desc, created desc limit 5; to get it to use the index though.
>> Is there any way I can get the best of both worlds? > > Hmm, how about a multiple column index on (topic, created)? You might > need to use > select * from articles where topic='Example' order by topic desc, created > desc limit 5; > to get it to use the index though. That works beautifully, thanks! -J