Thread: Index usage in order by with multiple columns in order-by-clause

Index usage in order by with multiple columns in order-by-clause

From

Andreas Joseph Krogh

Date:

10 August 2007, 11:53:34

I have the following test-case:

CREATE TABLE test(
name varchar PRIMARY KEY,
value varchar NOT NULL,
created timestamp not null
);

create index test_lowernamevalue_idx ON test ((lower(name) || lower(value)));
create index test_lowernamevaluecreated_idx ON test ((lower(name) ||
lower(value)), created);

andreak=# EXPLAIN ANALYZE select * from test order by lower(name) ||
lower(value) ASC, created ASC;                                                              QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------Index
Scanusing test_lowernamevaluecreated_idx on test  (cost=0.00..61.58  
rows=770 width=72) (actual time=0.013..0.013 rows=0 loops=1)Total runtime: 0.127 ms
(2 rows)

andreak=# EXPLAIN ANALYZE select * from test order by lower(name) ||
lower(value) ASC, created DESC;                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------Sort
(cost=60.39..62.32rows=770 width=72) (actual time=0.034..0.034 rows=0  
loops=1)  Sort Key: (lower((name)::text) || lower((value)::text)), created  ->  Seq Scan on test  (cost=0.00..23.47
rows=770width=72) (actual  
time=0.004..0.004 rows=0 loops=1)Total runtime: 0.123 ms
(4 rows)

As the EXPLAIN-output shows, the index is not used when sort-ordering differs
in the two order-by-columns.
Is there a way I can have multiple columns in the ORDER BY clause, each with
different ASC/DESC-order and still use an index to speed up sorting?

In my application I often have a need to sort by more than 3 columns, so I'm
really wondering if there is a way to make sorting of multiple columsn (each
which may have different sort-order) use an index? Preferrably without having
to create 2^N indexes.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+

Re: Index usage in order by with multiple columns in order-by-clause

From

Tom Lane

Date:

10 August 2007, 18:30:27

Andreas Joseph Krogh <andreak@officenet.no> writes:
> Is there a way I can have multiple columns in the ORDER BY clause, each with
> different ASC/DESC-order and still use an index to speed up sorting?

A btree index isn't magic, it's just an ordered list of entries.  So you
can't just randomly flip the ordering of individual columns.  For
instance, the natural sort order of a 2-column index on (x,y) is like
x    y
1    11    21    32    12    22    33    13    23    3

If you scan this index forwards, you get the equivalent ofORDER BY x ASC, y ASC
If you scan it backwards, you get the equivalent ofORDER BY x DESC, y DESC
But there is no way to get the equivalent of x ASC, y DESC from
a scan of this index, nor x DESC, y ASC.

If you have a specific requirement for one of those combinations,
what you can do is build an index in which one of the columns is
"reverse sorted".  For instance, if we reverse-sort y, the index
ordering looks like
x    y
1    31    21    12    32    22    13    33    23    1

Now we can get ORDER BY x ASC, y DESC from a forwards indexscan,
or ORDER BY x DESC, y ASC from a backwards scan.  But there's no
way to get ASC/ASC or DESC/DESC from this index.  If you really need
all four orderings to be available, you're stuck with maintaining
two indexes.

Reverse-sorted index columns are possible but not well supported in
existing PG releases (you need a custom operator class, and the planner
is not all that bright about using them).  8.3 will have full support.
        regards, tom lane

Re: Index usage in order by with multiple columns in order-by-clause

From

Andreas Joseph Krogh

Date:

11 August 2007, 06:22:26

On Friday 10 August 2007 23:30:14 Tom Lane wrote:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
> > Is there a way I can have multiple columns in the ORDER BY clause, each
> > with different ASC/DESC-order and still use an index to speed up sorting?
>
> A btree index isn't magic, it's just an ordered list of entries.  So you
> can't just randomly flip the ordering of individual columns.  For
> instance, the natural sort order of a 2-column index on (x,y) is like
>
>     x    y
>
>     1    1
>     1    2
>     1    3
>     2    1
>     2    2
>     2    3
>     3    1
>     3    2
>     3    3
>
> If you scan this index forwards, you get the equivalent of
>     ORDER BY x ASC, y ASC
> If you scan it backwards, you get the equivalent of
>     ORDER BY x DESC, y DESC
> But there is no way to get the equivalent of x ASC, y DESC from
> a scan of this index, nor x DESC, y ASC.
>
> If you have a specific requirement for one of those combinations,
> what you can do is build an index in which one of the columns is
> "reverse sorted".  For instance, if we reverse-sort y, the index
> ordering looks like
>
>     x    y
>
>     1    3
>     1    2
>     1    1
>     2    3
>     2    2
>     2    1
>     3    3
>     3    2
>     3    1
>
> Now we can get ORDER BY x ASC, y DESC from a forwards indexscan,
> or ORDER BY x DESC, y ASC from a backwards scan.  But there's no
> way to get ASC/ASC or DESC/DESC from this index.  If you really need
> all four orderings to be available, you're stuck with maintaining
> two indexes.
>
> Reverse-sorted index columns are possible but not well supported in
> existing PG releases (you need a custom operator class, and the planner
> is not all that bright about using them).  8.3 will have full support.

Thank you for your in-depth reply (a always)!

How exactly do I build an index in which one of the columns is "reverse
sorted" in 8.2 (and 8.3)? This may be *the* reason to upgrade for me if 8.3
is better at this.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+

Re: Index usage in order by with multiple columns in order-by-clause

From

hubert depesz lubaczewski

Date:

11 August 2007, 16:05:36

On Fri, Aug 10, 2007 at 04:53:12PM +0200, Andreas Joseph Krogh wrote:
> I have the following test-case:
> 
> CREATE TABLE test(
> name varchar PRIMARY KEY,
> value varchar NOT NULL,
> created timestamp not null
> );
> 
> create index test_lowernamevalue_idx ON test ((lower(name) || lower(value)));
> create index test_lowernamevaluecreated_idx ON test ((lower(name) || 
> lower(value)), created);
> andreak=# EXPLAIN ANALYZE select * from test order by lower(name) || 
> lower(value) ASC, created DESC;
>                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Sort  (cost=60.39..62.32 rows=770 width=72) (actual time=0.034..0.034 rows=0 
> loops=1)
>    Sort Key: (lower((name)::text) || lower((value)::text)), created
>    ->  Seq Scan on test  (cost=0.00..23.47 rows=770 width=72) (actual 
> time=0.004..0.004 rows=0 loops=1)
>  Total runtime: 0.123 ms
> (4 rows)
> In my application I often have a need to sort by more than 3 columns, so I'm 
> really wondering if there is a way to make sorting of multiple columsn (each 
> which may have different sort-order) use an index? Preferrably without having 
> to create 2^N indexes.

first of all - you can try with separate indexes on lower()||lower(),
and created.

then - you can use a trick.
create a function that will reverse order of your date (using a simple
"-" operator)
and then index your lower() and output of this function.

you will need to modify the query, but it's perfectly doable.

for example:
create function test_ts(timestamp) returns interval as $BODY$
begin
return '2000-01-01 00:00:00'::timestamp-$1;
end;
$BODY$ language plpgsql immutable;

of course this particular date is irrelevant, we just have to substract
from something.

then:
create index test_lowernamevaluecreated_idx2 ON test ((lower(name) ||
lower(value)), test_ts(created));

and change your query to:
select * from test order by lower(name) || lower(value) ASC, test_ts(created);
it would show you what you need.

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Index usage in order by with multiple columns in order-by-clause

From

Andreas Joseph Krogh

Date:

11 August 2007, 16:45:59

On Saturday 11 August 2007 21:05:22 hubert depesz lubaczewski wrote:
> On Fri, Aug 10, 2007 at 04:53:12PM +0200, Andreas Joseph Krogh wrote:
> > I have the following test-case:
> >
> > CREATE TABLE test(
> > name varchar PRIMARY KEY,
> > value varchar NOT NULL,
> > created timestamp not null
> > );
> >
> > create index test_lowernamevalue_idx ON test ((lower(name) ||
> > lower(value))); create index test_lowernamevaluecreated_idx ON test
> > ((lower(name) || lower(value)), created);
> > andreak=# EXPLAIN ANALYZE select * from test order by lower(name) ||
> > lower(value) ASC, created DESC;
> >                                                QUERY PLAN
> > -------------------------------------------------------------------------
> >------------------------------- Sort  (cost=60.39..62.32 rows=770
> > width=72) (actual time=0.034..0.034 rows=0 loops=1)
> >    Sort Key: (lower((name)::text) || lower((value)::text)), created
> >    ->  Seq Scan on test  (cost=0.00..23.47 rows=770 width=72) (actual
> > time=0.004..0.004 rows=0 loops=1)
> >  Total runtime: 0.123 ms
> > (4 rows)
> > In my application I often have a need to sort by more than 3 columns, so
> > I'm really wondering if there is a way to make sorting of multiple
> > columsn (each which may have different sort-order) use an index?
> > Preferrably without having to create 2^N indexes.
>
> first of all - you can try with separate indexes on lower()||lower(),
> and created.
>
> then - you can use a trick.
> create a function that will reverse order of your date (using a simple
> "-" operator)
> and then index your lower() and output of this function.
>
> you will need to modify the query, but it's perfectly doable.
>
> for example:
> create function test_ts(timestamp) returns interval as $BODY$
> begin
> return '2000-01-01 00:00:00'::timestamp-$1;
> end;
> $BODY$ language plpgsql immutable;
>
> of course this particular date is irrelevant, we just have to substract
> from something.
>
> then:
> create index test_lowernamevaluecreated_idx2 ON test ((lower(name) ||
> lower(value)), test_ts(created));
>
> and change your query to:
> select * from test order by lower(name) || lower(value) ASC,
> test_ts(created); it would show you what you need.
>
> depesz

Thanks. I actaully do have an index on lower(a) || lower(b). Then, as Tom Lane
explained, I need to have lots of indexes if I want to sort with different
ordering (ASC|DESC) on each column.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+

Re: Index usage in order by with multiple columns in order-by-clause

From

Tom Lane

Date:

11 August 2007, 16:56:01

Andreas Joseph Krogh <andreak@officenet.no> writes:
> On Friday 10 August 2007 23:30:14 Tom Lane wrote:
>> Reverse-sorted index columns are possible but not well supported in
>> existing PG releases (you need a custom operator class, and the planner
>> is not all that bright about using them).  8.3 will have full support.

> How exactly do I build an index in which one of the columns is "reverse
> sorted" in 8.2 (and 8.3)?

Here's a minimal example (tested in 8.2).  The pain-in-the-neck part
is creating a btree comparison function that reverses the normal one's
comparisons.  For the example I just did it in plpgsql, but if you were
to do this sort of thing on large tables you'd probably find you needed
a function written in C for speed:

regression=# create function btrevfloat8cmp(float8,float8) returns int as
regression-# $$begin return btfloat8cmp($2, $1); end$$
regression-# language plpgsql strict immutable;
CREATE FUNCTION

You then make the opclass using the regular comparison operators listed
in backwards order, plus the reverse comparison function:

regression=# create operator class rev_float8_ops for type float8 using btree
regression-# as
regression-#   operator 1 > ,
regression-#   operator 2 >= ,
regression-#   operator 3 = ,
regression-#   operator 4 <= ,
regression-#   operator 5 < ,
regression-#   function 1 btrevfloat8cmp(float8,float8) ;
CREATE OPERATOR CLASS

And you're off:

regression=# create table myt (f1 float8, f2 float8);
CREATE TABLE
regression=# create index myi on myt using btree (f1, f2 rev_float8_ops);
CREATE INDEX
regression=# insert into myt values(1,1),(1,2),(1,3),(2,1),(2,2),(2,3);
INSERT 0 6
regression=# explain select * from myt order by f1 asc, f2 desc;                            QUERY PLAN
          
 
--------------------------------------------------------------------Index Scan using myi on myt  (cost=0.00..72.70
rows=1630width=16)
 
(1 row)

regression=# select * from myt order by f1 asc, f2 desc;f1 | f2 
----+---- 1 |  3 1 |  2 1 |  1 2 |  3 2 |  2 2 |  1
(6 rows)

regression=# explain select * from myt order by f1 desc, f2 asc;                                QUERY PLAN
                   
 
-----------------------------------------------------------------------------Index Scan Backward using myi on myt
(cost=0.00..72.70rows=1630 width=16)
 
(1 row)

regression=# select * from myt order by f1 desc, f2 asc;f1 | f2 
----+---- 2 |  1 2 |  2 2 |  3 1 |  1 1 |  2 1 |  3
(6 rows)

This is only a minimal example because I didn't bother with any
cross-type comparisons; you might need those depending on how much
use you expect to get out of the index.

The main problem with this is that you don't have any control over the
NULLS FIRST/LAST behavior.  Pre-8.3, btree indexes will always put nulls
at the end; the opclass has no control over that.  So the effective sort
order here is like ORDER BY f1 ASC NULLS LAST, f2 DESC NULLS LAST (or
NULLS FIRST for the backward scan), which might not be what you'd want.
I'm also pretty sure that the pre-8.3 planner will not figure out how to
use such an index for mergejoins (and it might not work if it did figure
it out, because of the nulls-ordering issue).  You might be able to
finesse all that if you can choose to put the reverse-sort opclass on a
NOT NULL column.
        regards, tom lane

Re: Index usage in order by with multiple columns in order-by-clause

From

Andreas Joseph Krogh

Date:

11 August 2007, 17:15:42

On Saturday 11 August 2007 21:55:49 Tom Lane wrote:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
> > On Friday 10 August 2007 23:30:14 Tom Lane wrote:
> >> Reverse-sorted index columns are possible but not well supported in
> >> existing PG releases (you need a custom operator class, and the planner
> >> is not all that bright about using them).  8.3 will have full support.
> >
> > How exactly do I build an index in which one of the columns is "reverse
> > sorted" in 8.2 (and 8.3)?
>
> Here's a minimal example (tested in 8.2).  The pain-in-the-neck part
> is creating a btree comparison function that reverses the normal one's
> comparisons.  For the example I just did it in plpgsql, but if you were
> to do this sort of thing on large tables you'd probably find you needed
> a function written in C for speed:
>
> regression=# create function btrevfloat8cmp(float8,float8) returns int as
> regression-# $$begin return btfloat8cmp($2, $1); end$$
> regression-# language plpgsql strict immutable;
> CREATE FUNCTION
>
> You then make the opclass using the regular comparison operators listed
> in backwards order, plus the reverse comparison function:
>
> regression=# create operator class rev_float8_ops for type float8 using
> btree regression-# as
> regression-#   operator 1 > ,
> regression-#   operator 2 >= ,
> regression-#   operator 3 = ,
> regression-#   operator 4 <= ,
> regression-#   operator 5 < ,
> regression-#   function 1 btrevfloat8cmp(float8,float8) ;
> CREATE OPERATOR CLASS
>
> And you're off:
>
> regression=# create table myt (f1 float8, f2 float8);
> CREATE TABLE
> regression=# create index myi on myt using btree (f1, f2 rev_float8_ops);
> CREATE INDEX
> regression=# insert into myt values(1,1),(1,2),(1,3),(2,1),(2,2),(2,3);
> INSERT 0 6
> regression=# explain select * from myt order by f1 asc, f2 desc;
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Index Scan using myi on myt  (cost=0.00..72.70 rows=1630 width=16)
> (1 row)
>
> regression=# select * from myt order by f1 asc, f2 desc;
>  f1 | f2
> ----+----
>   1 |  3
>   1 |  2
>   1 |  1
>   2 |  3
>   2 |  2
>   2 |  1
> (6 rows)
>
> regression=# explain select * from myt order by f1 desc, f2 asc;
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>-- Index Scan Backward using myi on myt  (cost=0.00..72.70 rows=1630
> width=16) (1 row)
>
> regression=# select * from myt order by f1 desc, f2 asc;
>  f1 | f2
> ----+----
>   2 |  1
>   2 |  2
>   2 |  3
>   1 |  1
>   1 |  2
>   1 |  3
> (6 rows)
>
> This is only a minimal example because I didn't bother with any
> cross-type comparisons; you might need those depending on how much
> use you expect to get out of the index.
>
> The main problem with this is that you don't have any control over the
> NULLS FIRST/LAST behavior.  Pre-8.3, btree indexes will always put nulls
> at the end; the opclass has no control over that.  So the effective sort
> order here is like ORDER BY f1 ASC NULLS LAST, f2 DESC NULLS LAST (or
> NULLS FIRST for the backward scan), which might not be what you'd want.
> I'm also pretty sure that the pre-8.3 planner will not figure out how to
> use such an index for mergejoins (and it might not work if it did figure
> it out, because of the nulls-ordering issue).  You might be able to
> finesse all that if you can choose to put the reverse-sort opclass on a
> NOT NULL column.

Thank you, really neat stuff.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+