Thread: Multiple Order By Criteria

Multiple Order By Criteria

From
J@Planeti.Biz
Date:
I'm trying to query a table with 250,000+ rows. My query requires I provide 5 colums in my "order by" clause:
 
select
   column
from table
where
 column >= '2004-3-22 0:0:0'
order by
    ds.receipt desc,
    ds.carrier_id asc,
    ds.batchnum asc,
    encounternum asc,
    ds.encounter_id ASC
limit 100 offset 0
 
I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain:
 
Limit  (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt >= '2004-03-22'::date)
 
When I have the order by just have 1 criteria, it's fine (just ds.receipt DESC)
 
Limit  (cost=0.00..177.71 rows=100 width=717)
  ->  Index Scan Backward using detail_summary_receipt_id_idx on detail_summary ds  (cost=0.00..370756.84 rows=208636 width=717)
        Index Cond: (receipt >= '2004-03-22'::date)
 
I've increased my work_mem to up to 256meg with no speed increase. I think there's something here I just don't understand.
 
How do I make this go fast ?

 
 
 
 

Re: Multiple Order By Criteria

From
Josh Berkus
Date:
J,

> I have an index built for each of these columns in my order by clause.
> This query takes an unacceptable amount of time to execute. Here are the
> results of the explain:

You need a single index which has all five columns, in order.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Multiple Order By Criteria

From
J@Planeti.Biz
Date:
I created the index, in order. Did a vacuum analyze on the table and my
explain still says:

Limit  (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636
width=717)
              Filter: (receipt >= '2004-03-22'::date)


So, for fun I did
set enable_seqscan to off

But that didn't help. For some reason, the sort wants to do a seq scan and
not use my super new index.

Am I doing something wrong ?

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: <pgsql-performance@postgresql.org>
Cc: <J@planeti.biz>
Sent: Tuesday, January 17, 2006 5:25 PM
Subject: Re: [PERFORM] Multiple Order By Criteria


> J,
>
>> I have an index built for each of these columns in my order by clause.
>> This query takes an unacceptable amount of time to execute. Here are the
>> results of the explain:
>
> You need a single index which has all five columns, in order.
>

> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>


Re: Multiple Order By Criteria

From
Stephan Szabo
Date:
On Tue, 17 Jan 2006, Josh Berkus wrote:

> J,
>
> > I have an index built for each of these columns in my order by clause.
> > This query takes an unacceptable amount of time to execute. Here are the
> > results of the explain:
>
> You need a single index which has all five columns, in order.

I think he'll also need a reverse opclass for the first column in the
index or for the others since he's doing desc, asc, asc, asc, asc.

Re: Multiple Order By Criteria

From
Fredrick O Jackson
Date:
try adding the keyword 'date' before the date in your query.
I ran into this quite a while back, but I'm not sure I remember the solution.


 > In Reply to: Tuesday January 17 2006 04:29 pm, J@planeti.biz J@planeti.biz
wrote:
> I created the index, in order. Did a vacuum analyze on the table and my
> explain still says:
>
> Limit  (cost=229610.78..229611.03 rows=100 width=717)
>   ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
>         Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
>         ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636
> width=717)
>               Filter: (receipt >= '2004-03-22'::date)
>
>
> So, for fun I did
> set enable_seqscan to off
>
> But that didn't help. For some reason, the sort wants to do a seq scan and
> not use my super new index.
>
> Am I doing something wrong ?
>
> ----- Original Message -----
> From: "Josh Berkus" <josh@agliodbs.com>
> To: <pgsql-performance@postgresql.org>
> Cc: <J@planeti.biz>
> Sent: Tuesday, January 17, 2006 5:25 PM
> Subject: Re: [PERFORM] Multiple Order By Criteria
>
> > J,
> >
> >> I have an index built for each of these columns in my order by clause.
> >> This query takes an unacceptable amount of time to execute. Here are the
> >> results of the explain:
> >
> > You need a single index which has all five columns, in order.
> >
> >
> > --
> > --Josh
> >
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >       subscribe-nomail command to majordomo@postgresql.org so that your
> >       message can get through to the mailing list cleanly
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

Re: Multiple Order By Criteria

From
J@Planeti.Biz
Date:
I created the index like this:

CREATE INDEX rcbee_idx
  ON detail_summary
  USING btree
  (receipt, carrier_id, batchnum, encounternum, encounter_id);

Is this correct ?

How do I make a reverse opclass ?

----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone.bigpanda.com>
To: "Josh Berkus" <josh@agliodbs.com>
Cc: <pgsql-performance@postgresql.org>; <J@planeti.biz>
Sent: Tuesday, January 17, 2006 5:40 PM
Subject: Re: [PERFORM] Multiple Order By Criteria


>
> On Tue, 17 Jan 2006, Josh Berkus wrote:
>
>> J,
>>
>> > I have an index built for each of these columns in my order by clause.
>> > This query takes an unacceptable amount of time to execute. Here are
>> > the
>> > results of the explain:
>>
>> You need a single index which has all five columns, in order.
>
> I think he'll also need a reverse opclass for the first column in the
> index or for the others since he's doing desc, asc, asc, asc, asc.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>


Re: Multiple Order By Criteria

From
Stephan Szabo
Date:
On Tue, 17 Jan 2006 J@Planeti.Biz wrote:

> I created the index like this:
>
> CREATE INDEX rcbee_idx
>   ON detail_summary
>   USING btree
>   (receipt, carrier_id, batchnum, encounternum, encounter_id);
>
> Is this correct ?

That would work if you were asking for all the columns ascending or
descending, but we don't currently use it for mixed orders.

> How do I make a reverse opclass ?

There's some information at the following:
http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php

Re: Multiple Order By Criteria

From
J@Planeti.Biz
Date:
I've read all of this info, closely. I wish when I was searching for an
answer for my problem these pages came up. Oh well.

I am getting an idea of what I need to do to make this work well. I was
wondering if there is more information to read on how to implement this
solution in a more simple way. Much of what's written seems to be towards an
audience that should understand certain things automatically.
----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone.bigpanda.com>
To: <J@planeti.biz>
Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org>
Sent: Tuesday, January 17, 2006 6:39 PM
Subject: Re: [PERFORM] Multiple Order By Criteria


>
> On Tue, 17 Jan 2006 J@Planeti.Biz wrote:
>
>> I created the index like this:
>>
>> CREATE INDEX rcbee_idx
>>   ON detail_summary
>>   USING btree
>>   (receipt, carrier_id, batchnum, encounternum, encounter_id);
>>
>> Is this correct ?
>
> That would work if you were asking for all the columns ascending or
> descending, but we don't currently use it for mixed orders.
>
>> How do I make a reverse opclass ?
>
> There's some information at the following:
> http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
> http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
> http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Re: Multiple Order By Criteria

From
"Ahmad Fajar"
Date:

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of J@Planeti.Biz
Sent: Rabu, 18 Januari 2006 07:23
To: Stephan Szabo
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Multiple Order By Criteria

I've read all of this info, closely. I wish when I was searching for an
answer for my problem these pages came up. Oh well.
Well, I think you have to know about btree index. Btree is good enough,
although it's not better. It will perform best, if it doesn't index
too many multiple column.
In your case, you have to consentrate on 2 or 3 fields that will
use frequently. Put the most duplicate value on the front and others
are behind.
Eq:
receipt, carrier_id, batchnum is the most frequently use,
but the most duplicate value are: carrier_id, receipt, and batchnum
so make btree index (carrier_id, receipt, batchnum).
Btree will not suffer, and we also will advantage if the table
have relationship with other table with the same fields order. We have
not to make another index for that relation.

Best regards,
ahmad fajar.


> I am getting an idea of what I need to do to make this work well. I was
> wondering if there is more information to read on how to implement this
> solution in a more simple way. Much of what's written seems to be towards
> audience that should understand certain things automatically.
----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone.bigpanda.com>
To: <J@planeti.biz>
Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org>
Sent: Tuesday, January 17, 2006 6:39 PM
Subject: Re: [PERFORM] Multiple Order By Criteria


>
> On Tue, 17 Jan 2006 J@Planeti.Biz wrote:
>
>> I created the index like this:
>>
>> CREATE INDEX rcbee_idx
>>   ON detail_summary
>>   USING btree
>>   (receipt, carrier_id, batchnum, encounternum, encounter_id);
>>
>> Is this correct ?
>
> That would work if you were asking for all the columns ascending or
> descending, but we don't currently use it for mixed orders.
>
>> How do I make a reverse opclass ?
>
> There's some information at the following:
> http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
> http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
> http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php
>


Re: Multiple Order By Criteria

From
J@Planeti.Biz
Date:
I have the answer I've been looking for and I'd like to share with all.
After help from you guys, it appeared that the real issue was using an index
for my order by X DESC clauses. For some reason that doesn't make good
sense, postgres doesn't support this, when it kinda should automatically.

Take the following end of an SQL statement.

order by
    col1 DESC
    col2 ASC
    col3 ASC

The first thing I learned is that you need an index that contains all these
columns in it, in this order. If one of them has DESC then you have to
create a function / operator class for each data type, in this case let's
assume it's an int4. So, first thing you do is create a function that you're
going to use in your operator:

create function
    int4_revcmp(int4,int4)  // --> cal the function whatever you want
    returns int4
    as 'select $2 - $1'
language sql;

Then you make your operator class.
CREATE OPERATOR CLASS int4_revop
      FOR TYPE int4 USING btree AS
          OPERATOR        1       > ,
          OPERATOR        2       >= ,
          OPERATOR        3       = ,
          OPERATOR        4       <= ,
          OPERATOR        5       < ,
          FUNCTION        1       int4_revcmp(int4, int4); // --> must be
the name of your function you created.

Then when you make your index

create index rev_idx on table
    using btree(
        col1 int4_revop,    // --> must be name of operator class you
defined.
        col2,
        col3
);

What I don't understand is how to make this function / operator class work
with a text datatype. I tried interchanging the int4 with char and text and
postgres didn't like the (as 'select $2 - $1') in the function, which I can
kinda understand. Since I'm slighlty above my head at this point, I don't
really know how to do it. Does any smart people here know how ?


Re: Multiple Order By Criteria

From
Stephan Szabo
Date:
On Wed, 18 Jan 2006 J@Planeti.Biz wrote:

> I have the answer I've been looking for and I'd like to share with all.
> After help from you guys, it appeared that the real issue was using an index
> for my order by X DESC clauses. For some reason that doesn't make good
> sense, postgres doesn't support this, when it kinda should automatically.

Well, the problem is that we do order with the index simply by through
following index order.  Standard index order is going to give you a sorted
order only in some particular order and its inverse.  IIRC, there are ways
to use an index in all ascending order to do mixed orders, but I think
those may involve traversing parts of the index multiple times and hasn't
been implemented.

> The first thing I learned is that you need an index that contains all these
> columns in it, in this order. If one of them has DESC then you have to
> create a function / operator class for each data type, in this case let's
> assume it's an int4. So, first thing you do is create a function that you're
> going to use in your operator:
>
> create function
>     int4_revcmp(int4,int4)  // --> cal the function whatever you want
>     returns int4
>     as 'select $2 - $1'
> language sql;
>
> Then you make your operator class.
> CREATE OPERATOR CLASS int4_revop
>       FOR TYPE int4 USING btree AS
>           OPERATOR        1       > ,
>           OPERATOR        2       >= ,
>           OPERATOR        3       = ,
>           OPERATOR        4       <= ,
>           OPERATOR        5       < ,
>           FUNCTION        1       int4_revcmp(int4, int4); // --> must be
> the name of your function you created.
>
> Then when you make your index
>
> create index rev_idx on table
>     using btree(
>         col1 int4_revop,    // --> must be name of operator class you
> defined.
>         col2,
>         col3
> );
>
> What I don't understand is how to make this function / operator class work
> with a text datatype. I tried interchanging the int4 with char and text and
> postgres didn't like the (as 'select $2 - $1') in the function, which I can
> kinda understand. Since I'm slighlty above my head at this point, I don't
> really know how to do it. Does any smart people here know how ?

I think having the function call the helper function for the normal
operator class for the type function with the arguments in reverse order
may work (or negating its output).

If you have any interest, there's an outstanding call for C versions of
the helper functions that we could then package up in contrib with the
operator class definitions.

Re: Multiple Order By Criteria

From
J@Planeti.Biz
Date:
Here's some C to use to create the operator classes, seems to work ok.
---

#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include "utils/date.h"

/* For date sorts */

PG_FUNCTION_INFO_V1(ddd_date_revcmp);

Datum   ddd_date_revcmp(PG_FUNCTION_ARGS){
        DateADT         arg1=PG_GETARG_DATEADT(0);
        DateADT         arg2=PG_GETARG_DATEADT(1);

        PG_RETURN_INT32(arg2 - arg1);
}

/* For integer sorts */

PG_FUNCTION_INFO_V1(ddd_int_revcmp);

Datum   ddd_int_revcmp(PG_FUNCTION_ARGS){
        int32           arg1=PG_GETARG_INT32(0);
        int32           arg2=PG_GETARG_INT32(1);

        PG_RETURN_INT32(arg2 - arg1);
}

/* For string sorts */

PG_FUNCTION_INFO_V1(ddd_text_revcmp);

Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
        text*           arg1=PG_GETARG_TEXT_P(0);
        text*           arg2=PG_GETARG_TEXT_P(1);

        PG_RETURN_INT32(strcmp((char*)VARDATA(arg2),(char*)VARDATA(arg1)));
}


/*
create function ddd_date_revcmp(date,date) returns int4 as
'/data/postgres/contrib/cmplib.so', 'ddd_date_revcmp' LANGUAGE C STRICT;
create function ddd_int_revcmp(int4,int4) returns int4 as
'/data/postgres/contrib/cmplib.so', 'ddd_int_revcmp' LANGUAGE C STRICT;
create function ddd_text_revcmp(text,text) returns int4 as
'/data/postgres/contrib/cmplib.so', 'ddd_text_revcmp' LANGUAGE C STRICT;
 */

----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone.bigpanda.com>
To: <J@Planeti.Biz>
Sent: Wednesday, January 18, 2006 2:24 PM
Subject: Re: [PERFORM] Multiple Order By Criteria


> On Wed, 18 Jan 2006 J@Planeti.Biz wrote:
>
>> Could you explain to me how do create this operator class for a text data
>> type ? I think it will give me more of an understanding of what's going
>> on
>> if I could see this example.
>
> Using an SQL function (mostly because I'm too lazy to look up the C call
> syntax) I think it'd be something like:
>
> create function bttextrevcmp(text, text) returns int4 as
> 'select bttextcmp($2, $1)' language 'sql';
>
> CREATE OPERATOR CLASS text_revop
>      FOR TYPE text USING btree AS
>          OPERATOR        1       > ,
>          OPERATOR        2       >= ,
>          OPERATOR        3       = ,
>          OPERATOR        4       <= ,
>          OPERATOR        5       < ,
>          FUNCTION        1       bttextrevcmp(text,text);
>
> I believe bttextcmp is the standard text btree operator class helper
> function, so we call it with reverse arguments to try to flip its results
> (I think -bttextcmp($1,$2) would also work).
>