Thread: where+orderby+limit not (always) using appropriate index?

where+orderby+limit not (always) using appropriate index?

From
Szűcs Gábor
Date:
Dear Gurus,

I don't think it's a bug, I just don't understand what's behind this. If
there's a paper or something on this, please point me there.

Version: 7.4.6
Locale: hu_HU (in case that matters)
Dump: see below sig.

Abstract:
Create a table with (at least) two fields, say i and o.
Create three indexes on (i), (o), (i,o)
Insert enough rows to test.
Try to replace min/max aggregates with indexable queries such as:

SELECT o FROM t WHERE i = 1 ORDER BY o LIMIT 1;

Problem #1: This tends to use one of the single-column indexes (depending on
the frequency of the indexed element), not the two-column index. Also, I'm
not perfectly sure but maybe the planner is right. Why?

Problem #2: If I drop the problematic 1-col index, it uses the 2-col index,
but sorts after that. (and probably that's why the planner was right in #1) Why?

Below is an example that isn't perfect; also, IRL I use a second field of
type date.

Problem #3: It seems that an opposite index (o, i) works differently but
still not always. Why?

In case it matters, I may be able to reproduce the original problem with
original data.

TIA,
--
G.

# CREATE TABLE t(i int, o int);
CREATE TABLE
# CREATE INDEX t_i on t (i);
CREATE INDEX
# CREATE INDEX t_o on t (o);
CREATE INDEX
# CREATE INDEX t_io on t (i, o);
CREATE INDEX
# INSERT INTO t SELECT 1, p.oid::int FROM pg_proc p WHERE Pronamespace=11;
INSERT 0 1651
# explain analyze select * from t where i=1 order by o limit 1;
                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..3.37 rows=1 width=8) (actual time=0.028..0.029 rows=1
loops=1)
    ->  Index Scan using t_o on t  (cost=0.00..20.20 rows=6 width=8) (actual
time=0.025..0.025 rows=1 loops=1)
          Filter: (i = 1)
  Total runtime: 0.082 ms
(4 rows)

# drop index t_o;
DROP INDEX
# explain analyze select * from t where i=1 order by o limit 1;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
  Limit  (cost=6.14..6.14 rows=1 width=8) (actual time=4.624..4.625 rows=1
loops=1)
    ->  Sort  (cost=6.14..6.15 rows=6 width=8) (actual time=4.619..4.619
rows=1 loops=1)
          Sort Key: o
          ->  Index Scan using t_io on t  (cost=0.00..6.11 rows=6 width=8)
(actual time=0.026..2.605 rows=1651 loops=1)
                Index Cond: (i = 1)
  Total runtime: 4.768 ms
(6 rows)

[local]:tir=#



> SELECT o FROM t WHERE i = 1 ORDER BY o LIMIT 1;

    use :
    ORDER BY i, o

    If you have a multicol index and want to order on it, you should help the
planner by ORDERing BY all of the columns in the index...
    It bit me a few times ;)

Re: where+orderby+limit not (always) using appropriate index?

From
Tom Lane
Date:
=?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes:
> Create a table with (at least) two fields, say i and o.
> Create three indexes on (i), (o), (i,o)
> Insert enough rows to test.
> Try to replace min/max aggregates with indexable queries such as:

> SELECT o FROM t WHERE i = 1 ORDER BY o LIMIT 1;

> Problem #1: This tends to use one of the single-column indexes (depending on
> the frequency of the indexed element), not the two-column index. Also, I'm
> not perfectly sure but maybe the planner is right. Why?

To get the planner to use the double-column index, you have to use an
ORDER BY that matches the index, eg

    SELECT o FROM t WHERE i = 1 ORDER BY i,o LIMIT 1;

            regards, tom lane