Thread: Determining which index to create
I have the following table : Attribute | Type | Modifier -----------+--------------------------+---------- motid | integer | not null objid | integer | not null date | timestamp with time zone | not null ...with 140 million rows. For each distinct value of motid there are many rows (with different objid/dates). I would like to optimize the following query: => select * from dico_frs where motid=4742 order by date desc limit 10; Creating an index on 'date' makes the query use that index: Limit (cost=0.00..17591.91 rows=10 width=16) -> Index Scan Backward using dico_frs_date on dico_frs (cost=0.00..20023641.63 rows=11382 width=16) But it's still quite slow. I'm thinking an index on (motid, date desc) would be best but that doesn't seem to be possible. How can I optimize this query? -- Eric Cholet
On Wed, Nov 21, 2001 at 12:23:07PM +0100, Eric Cholet wrote: > I would like to optimize the following query: > > => select * from dico_frs where motid=4742 order by date desc limit 10; > > But it's still quite slow. I'm thinking an index on (motid, date desc) > would be best > but that doesn't seem to be possible. How can I optimize this query? Indexes (at least btree ones) can be scanned in either forward or backward directions. So an index on (motid,date) should be fine. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
--On mercredi 21 novembre 2001 22:48 +1100 Martijn van Oosterhout <kleptog@svana.org> wrote: > On Wed, Nov 21, 2001 at 12:23:07PM +0100, Eric Cholet wrote: >> I would like to optimize the following query: >> >> => select * from dico_frs where motid=4742 order by date desc limit 10; >> >> But it's still quite slow. I'm thinking an index on (motid, date desc) >> would be best >> but that doesn't seem to be possible. How can I optimize this query? > > Indexes (at least btree ones) can be scanned in either forward or backward > directions. So an index on (motid,date) should be fine. I should have mentionned I tried that, but it isn't being used: => \d dico_frs_motid_date Index "dico_frs_motid_date" Attribute | Type -----------+-------------------------- motid | integer date | timestamp with time zone btree => explain select * from dico_frs where motid=4742 order by date desc limit 10; NOTICE: QUERY PLAN: Limit (cost=0.00..17591.91 rows=10 width=16) -> Index Scan Backward using dico_frs_date on dico_frs (cost=0.00..20023641.63 rows=11382 width=16) -- Eric Cholet
On Wed, Nov 21, 2001 at 12:53:09PM +0100, Eric Cholet wrote: > I should have mentionned I tried that, but it isn't being used: > > => \d dico_frs_motid_date > Index "dico_frs_motid_date" > Attribute | Type > -----------+-------------------------- > motid | integer > date | timestamp with time zone > btree > > => explain select * from dico_frs where motid=4742 order by date desc limit > 10; > NOTICE: QUERY PLAN: > > Limit (cost=0.00..17591.91 rows=10 width=16) > -> Index Scan Backward using dico_frs_date on dico_frs > (cost=0.00..20023641.63 rows=11382 width=16) Well, it is doing the scan backwards, which is good. But it's not using the index. If you drop dico_frs_date index, does it do it then? Oh, and what version of postgres was this again? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
--On jeudi 22 novembre 2001 01:04 +1100 Martijn van Oosterhout <kleptog@svana.org> wrote: > On Wed, Nov 21, 2001 at 12:53:09PM +0100, Eric Cholet wrote: >> I should have mentionned I tried that, but it isn't being used: >> >> => \d dico_frs_motid_date >> Index "dico_frs_motid_date" >> Attribute | Type >> -----------+-------------------------- >> motid | integer >> date | timestamp with time zone >> btree >> >> => explain select * from dico_frs where motid=4742 order by date desc >> limit 10; >> NOTICE: QUERY PLAN: >> >> Limit (cost=0.00..17591.91 rows=10 width=16) >> -> Index Scan Backward using dico_frs_date on dico_frs >> (cost=0.00..20023641.63 rows=11382 width=16) > > Well, it is doing the scan backwards, which is good. But it's not using > the index. If you drop dico_frs_date index, does it do it then? => explain select * from dico_frs where motid=4742 order by date desc limit 10; NOTICE: QUERY PLAN: Limit (cost=46172.25..46172.25 rows=10 width=16) -> Sort (cost=46172.25..46172.25 rows=11382 width=16) -> Index Scan using dico_frs_motid_date on dico_frs (cost=0.00..45405.39 rows=11382 width=16) It's a bit better but still quite long, depending on how many rows for a particular motid. Dropping the "desc" in the "order by date" clause makes things much faster, but I need the results in reverse chronological order! > Oh, and what version of postgres was this again? => select version(); version --------------------------------------------------------------------- PostgreSQL 7.1.3 on i386-unknown-freebsd4.4, compiled by GCC 2.95.3 (1 row) Thanks for your help, -- Eric Cholet
On Wed, 21 Nov 2001, Eric Cholet wrote: > => explain select * from dico_frs where motid=4742 order by date desc limit > 10; > NOTICE: QUERY PLAN: > > Limit (cost=46172.25..46172.25 rows=10 width=16) > -> Sort (cost=46172.25..46172.25 rows=11382 width=16) > -> Index Scan using dico_frs_motid_date on dico_frs > (cost=0.00..45405.39 rows=11382 width=16) > > > It's a bit better but still quite long, depending on how many rows for a > particular motid. > Dropping the "desc" in the "order by date" clause makes things much faster, > but I need the > results in reverse chronological order! Hmm, it looks like the sort is the expensive bit even though it's estimating something low for it (relative to the index scan). Have you tried setting sort_mem higher than the defaults (which are really low) to see if it's just going out to disk for the sort.
On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote: > => explain select * from dico_frs where motid=4742 order by date desc limit > 10; > NOTICE: QUERY PLAN: > > Limit (cost=46172.25..46172.25 rows=10 width=16) > -> Sort (cost=46172.25..46172.25 rows=11382 width=16) > -> Index Scan using dico_frs_motid_date on dico_frs > (cost=0.00..45405.39 rows=11382 width=16) That's wrong. It doesn't seem to realise that a reverse scan on the index would give the right answer. Note that that's only true because you're selecting only a single motid. If there were multiple, a reverse scan would definitly not be appropriate. Not sure how to fix this though. Functional index? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
Martijn van Oosterhout wrote: > > On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote: > > => explain select * from dico_frs where motid=4742 order by date desc limit > > 10; > > NOTICE: QUERY PLAN: > > > > Limit (cost=46172.25..46172.25 rows=10 width=16) > > -> Sort (cost=46172.25..46172.25 rows=11382 width=16) > > -> Index Scan using dico_frs_motid_date on dico_frs > > (cost=0.00..45405.39 rows=11382 width=16) > > That's wrong. It doesn't seem to realise that a reverse scan on the index > would give the right answer. Note that that's only true because you're > selecting only a single motid. If there were multiple, a reverse scan would > definitly not be appropriate. Please try select * from dico_frs where motid=4742 order by motid desc, date desc limit 10; regards, Hiroshi Inoue
Martijn van Oosterhout <kleptog@svana.org> writes: > That's wrong. It doesn't seem to realise that a reverse scan on the index > would give the right answer. Note that that's only true because you're > selecting only a single motid. Yeah, there's currently little connection between the planner's code that recognizes "you could use this index to meet these WHERE conditions" and the code that recognizes "you could use this index to produce this sort ordering". In particular it has no clue that exact equality constraints on earlier index columns might allow it to consider the indexscan result as being sorted by later index columns. > Not sure how to fix this though. Offhand it doesn't seem like a trivial change :-( I'd be wary of expending a lot of planning cycles to detect this, because it doesn't seem like a very common case. But if we can find a way to do it cheaply, it'd be a cool optimization. regards, tom lane
--On jeudi 22 novembre 2001 11:00 +0900 Hiroshi Inoue <Inoue@tpf.co.jp> wrote: > Martijn van Oosterhout wrote: >> >> On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote: >> > => explain select * from dico_frs where motid=4742 order by date desc >> > limit 10; >> > NOTICE: QUERY PLAN: >> > >> > Limit (cost=46172.25..46172.25 rows=10 width=16) >> > -> Sort (cost=46172.25..46172.25 rows=11382 width=16) >> > -> Index Scan using dico_frs_motid_date on dico_frs >> > (cost=0.00..45405.39 rows=11382 width=16) >> >> That's wrong. It doesn't seem to realise that a reverse scan on the index >> would give the right answer. Note that that's only true because you're >> selecting only a single motid. If there were multiple, a reverse scan >> would definitly not be appropriate. > > Please try > select * from dico_frs where motid=4742 order by motid desc, > date desc limit 10; Wow, I am speechless. Sub-second response time, whether the result set is large or very small. Very impressive. I have resisted pressure to use Oracle for this application, trusting open source software would do an equivalent or better job (this table has 140 million records). Thank you very much. -- Eric Cholet