Thread: Multiple Order By Criteria
I'm trying to query a table with 250,000+ rows. My query requires I provide 5 colums in my "order by" clause:
select
column
column
from table
where
column >= '2004-3-22 0:0:0'
order by
order by
ds.receipt desc,
ds.carrier_id asc,
ds.batchnum asc,
encounternum asc,
ds.encounter_id ASC
limit 100 offset 0
I have an index built for each of these columns in my order by clause. This query takes an unacceptable amount of time to execute. Here are the results of the explain:
Limit (cost=229610.78..229611.03 rows=100 width=717)
-> Sort (cost=229610.78..230132.37 rows=208636 width=717)
Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
-> Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717)
Filter: (receipt >= '2004-03-22'::date)
-> Sort (cost=229610.78..230132.37 rows=208636 width=717)
Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
-> Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717)
Filter: (receipt >= '2004-03-22'::date)
When I have the order by just have 1 criteria, it's fine (just ds.receipt DESC)
Limit (cost=0.00..177.71 rows=100 width=717)
-> Index Scan Backward using detail_summary_receipt_id_idx on detail_summary ds (cost=0.00..370756.84 rows=208636 width=717)
Index Cond: (receipt >= '2004-03-22'::date)
-> Index Scan Backward using detail_summary_receipt_id_idx on detail_summary ds (cost=0.00..370756.84 rows=208636 width=717)
Index Cond: (receipt >= '2004-03-22'::date)
I've increased my work_mem to up to 256meg with no speed increase. I think there's something here I just don't understand.
How do I make this go fast ?
J, > I have an index built for each of these columns in my order by clause. > This query takes an unacceptable amount of time to execute. Here are the > results of the explain: You need a single index which has all five columns, in order. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
I created the index, in order. Did a vacuum analyze on the table and my explain still says: Limit (cost=229610.78..229611.03 rows=100 width=717) -> Sort (cost=229610.78..230132.37 rows=208636 width=717) Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id -> Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 width=717) Filter: (receipt >= '2004-03-22'::date) So, for fun I did set enable_seqscan to off But that didn't help. For some reason, the sort wants to do a seq scan and not use my super new index. Am I doing something wrong ? ----- Original Message ----- From: "Josh Berkus" <josh@agliodbs.com> To: <pgsql-performance@postgresql.org> Cc: <J@planeti.biz> Sent: Tuesday, January 17, 2006 5:25 PM Subject: Re: [PERFORM] Multiple Order By Criteria > J, > >> I have an index built for each of these columns in my order by clause. >> This query takes an unacceptable amount of time to execute. Here are the >> results of the explain: > > You need a single index which has all five columns, in order. > > -- > --Josh > > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
On Tue, 17 Jan 2006, Josh Berkus wrote: > J, > > > I have an index built for each of these columns in my order by clause. > > This query takes an unacceptable amount of time to execute. Here are the > > results of the explain: > > You need a single index which has all five columns, in order. I think he'll also need a reverse opclass for the first column in the index or for the others since he's doing desc, asc, asc, asc, asc.
try adding the keyword 'date' before the date in your query. I ran into this quite a while back, but I'm not sure I remember the solution. > In Reply to: Tuesday January 17 2006 04:29 pm, J@planeti.biz J@planeti.biz wrote: > I created the index, in order. Did a vacuum analyze on the table and my > explain still says: > > Limit (cost=229610.78..229611.03 rows=100 width=717) > -> Sort (cost=229610.78..230132.37 rows=208636 width=717) > Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id > -> Seq Scan on detail_summary ds (cost=0.00..22647.13 rows=208636 > width=717) > Filter: (receipt >= '2004-03-22'::date) > > > So, for fun I did > set enable_seqscan to off > > But that didn't help. For some reason, the sort wants to do a seq scan and > not use my super new index. > > Am I doing something wrong ? > > ----- Original Message ----- > From: "Josh Berkus" <josh@agliodbs.com> > To: <pgsql-performance@postgresql.org> > Cc: <J@planeti.biz> > Sent: Tuesday, January 17, 2006 5:25 PM > Subject: Re: [PERFORM] Multiple Order By Criteria > > > J, > > > >> I have an index built for each of these columns in my order by clause. > >> This query takes an unacceptable amount of time to execute. Here are the > >> results of the explain: > > > > You need a single index which has all five columns, in order. > > > > > > -- > > --Josh > > > > Josh Berkus > > Aglio Database Solutions > > San Francisco > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org
I created the index like this: CREATE INDEX rcbee_idx ON detail_summary USING btree (receipt, carrier_id, batchnum, encounternum, encounter_id); Is this correct ? How do I make a reverse opclass ? ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone.bigpanda.com> To: "Josh Berkus" <josh@agliodbs.com> Cc: <pgsql-performance@postgresql.org>; <J@planeti.biz> Sent: Tuesday, January 17, 2006 5:40 PM Subject: Re: [PERFORM] Multiple Order By Criteria > > On Tue, 17 Jan 2006, Josh Berkus wrote: > >> J, >> >> > I have an index built for each of these columns in my order by clause. >> > This query takes an unacceptable amount of time to execute. Here are >> > the >> > results of the explain: >> >> You need a single index which has all five columns, in order. > > I think he'll also need a reverse opclass for the first column in the > index or for the others since he's doing desc, asc, asc, asc, asc. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
On Tue, 17 Jan 2006 J@Planeti.Biz wrote: > I created the index like this: > > CREATE INDEX rcbee_idx > ON detail_summary > USING btree > (receipt, carrier_id, batchnum, encounternum, encounter_id); > > Is this correct ? That would work if you were asking for all the columns ascending or descending, but we don't currently use it for mixed orders. > How do I make a reverse opclass ? There's some information at the following: http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php
I've read all of this info, closely. I wish when I was searching for an answer for my problem these pages came up. Oh well. I am getting an idea of what I need to do to make this work well. I was wondering if there is more information to read on how to implement this solution in a more simple way. Much of what's written seems to be towards an audience that should understand certain things automatically. ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone.bigpanda.com> To: <J@planeti.biz> Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org> Sent: Tuesday, January 17, 2006 6:39 PM Subject: Re: [PERFORM] Multiple Order By Criteria > > On Tue, 17 Jan 2006 J@Planeti.Biz wrote: > >> I created the index like this: >> >> CREATE INDEX rcbee_idx >> ON detail_summary >> USING btree >> (receipt, carrier_id, batchnum, encounternum, encounter_id); >> >> Is this correct ? > > That would work if you were asking for all the columns ascending or > descending, but we don't currently use it for mixed orders. > >> How do I make a reverse opclass ? > > There's some information at the following: > http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php > http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php > http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
-----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of J@Planeti.Biz Sent: Rabu, 18 Januari 2006 07:23 To: Stephan Szabo Cc: Josh Berkus; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Multiple Order By Criteria I've read all of this info, closely. I wish when I was searching for an answer for my problem these pages came up. Oh well. Well, I think you have to know about btree index. Btree is good enough, although it's not better. It will perform best, if it doesn't index too many multiple column. In your case, you have to consentrate on 2 or 3 fields that will use frequently. Put the most duplicate value on the front and others are behind. Eq: receipt, carrier_id, batchnum is the most frequently use, but the most duplicate value are: carrier_id, receipt, and batchnum so make btree index (carrier_id, receipt, batchnum). Btree will not suffer, and we also will advantage if the table have relationship with other table with the same fields order. We have not to make another index for that relation. Best regards, ahmad fajar. > I am getting an idea of what I need to do to make this work well. I was > wondering if there is more information to read on how to implement this > solution in a more simple way. Much of what's written seems to be towards > audience that should understand certain things automatically. ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone.bigpanda.com> To: <J@planeti.biz> Cc: "Josh Berkus" <josh@agliodbs.com>; <pgsql-performance@postgresql.org> Sent: Tuesday, January 17, 2006 6:39 PM Subject: Re: [PERFORM] Multiple Order By Criteria > > On Tue, 17 Jan 2006 J@Planeti.Biz wrote: > >> I created the index like this: >> >> CREATE INDEX rcbee_idx >> ON detail_summary >> USING btree >> (receipt, carrier_id, batchnum, encounternum, encounter_id); >> >> Is this correct ? > > That would work if you were asking for all the columns ascending or > descending, but we don't currently use it for mixed orders. > >> How do I make a reverse opclass ? > > There's some information at the following: > http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php > http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php > http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php >
I have the answer I've been looking for and I'd like to share with all. After help from you guys, it appeared that the real issue was using an index for my order by X DESC clauses. For some reason that doesn't make good sense, postgres doesn't support this, when it kinda should automatically. Take the following end of an SQL statement. order by col1 DESC col2 ASC col3 ASC The first thing I learned is that you need an index that contains all these columns in it, in this order. If one of them has DESC then you have to create a function / operator class for each data type, in this case let's assume it's an int4. So, first thing you do is create a function that you're going to use in your operator: create function int4_revcmp(int4,int4) // --> cal the function whatever you want returns int4 as 'select $2 - $1' language sql; Then you make your operator class. CREATE OPERATOR CLASS int4_revop FOR TYPE int4 USING btree AS OPERATOR 1 > , OPERATOR 2 >= , OPERATOR 3 = , OPERATOR 4 <= , OPERATOR 5 < , FUNCTION 1 int4_revcmp(int4, int4); // --> must be the name of your function you created. Then when you make your index create index rev_idx on table using btree( col1 int4_revop, // --> must be name of operator class you defined. col2, col3 ); What I don't understand is how to make this function / operator class work with a text datatype. I tried interchanging the int4 with char and text and postgres didn't like the (as 'select $2 - $1') in the function, which I can kinda understand. Since I'm slighlty above my head at this point, I don't really know how to do it. Does any smart people here know how ?
On Wed, 18 Jan 2006 J@Planeti.Biz wrote: > I have the answer I've been looking for and I'd like to share with all. > After help from you guys, it appeared that the real issue was using an index > for my order by X DESC clauses. For some reason that doesn't make good > sense, postgres doesn't support this, when it kinda should automatically. Well, the problem is that we do order with the index simply by through following index order. Standard index order is going to give you a sorted order only in some particular order and its inverse. IIRC, there are ways to use an index in all ascending order to do mixed orders, but I think those may involve traversing parts of the index multiple times and hasn't been implemented. > The first thing I learned is that you need an index that contains all these > columns in it, in this order. If one of them has DESC then you have to > create a function / operator class for each data type, in this case let's > assume it's an int4. So, first thing you do is create a function that you're > going to use in your operator: > > create function > int4_revcmp(int4,int4) // --> cal the function whatever you want > returns int4 > as 'select $2 - $1' > language sql; > > Then you make your operator class. > CREATE OPERATOR CLASS int4_revop > FOR TYPE int4 USING btree AS > OPERATOR 1 > , > OPERATOR 2 >= , > OPERATOR 3 = , > OPERATOR 4 <= , > OPERATOR 5 < , > FUNCTION 1 int4_revcmp(int4, int4); // --> must be > the name of your function you created. > > Then when you make your index > > create index rev_idx on table > using btree( > col1 int4_revop, // --> must be name of operator class you > defined. > col2, > col3 > ); > > What I don't understand is how to make this function / operator class work > with a text datatype. I tried interchanging the int4 with char and text and > postgres didn't like the (as 'select $2 - $1') in the function, which I can > kinda understand. Since I'm slighlty above my head at this point, I don't > really know how to do it. Does any smart people here know how ? I think having the function call the helper function for the normal operator class for the type function with the arguments in reverse order may work (or negating its output). If you have any interest, there's an outstanding call for C versions of the helper functions that we could then package up in contrib with the operator class definitions.
Here's some C to use to create the operator classes, seems to work ok. --- #include "postgres.h" #include <string.h> #include "fmgr.h" #include "utils/date.h" /* For date sorts */ PG_FUNCTION_INFO_V1(ddd_date_revcmp); Datum ddd_date_revcmp(PG_FUNCTION_ARGS){ DateADT arg1=PG_GETARG_DATEADT(0); DateADT arg2=PG_GETARG_DATEADT(1); PG_RETURN_INT32(arg2 - arg1); } /* For integer sorts */ PG_FUNCTION_INFO_V1(ddd_int_revcmp); Datum ddd_int_revcmp(PG_FUNCTION_ARGS){ int32 arg1=PG_GETARG_INT32(0); int32 arg2=PG_GETARG_INT32(1); PG_RETURN_INT32(arg2 - arg1); } /* For string sorts */ PG_FUNCTION_INFO_V1(ddd_text_revcmp); Datum ddd_text_revcmp(PG_FUNCTION_ARGS){ text* arg1=PG_GETARG_TEXT_P(0); text* arg2=PG_GETARG_TEXT_P(1); PG_RETURN_INT32(strcmp((char*)VARDATA(arg2),(char*)VARDATA(arg1))); } /* create function ddd_date_revcmp(date,date) returns int4 as '/data/postgres/contrib/cmplib.so', 'ddd_date_revcmp' LANGUAGE C STRICT; create function ddd_int_revcmp(int4,int4) returns int4 as '/data/postgres/contrib/cmplib.so', 'ddd_int_revcmp' LANGUAGE C STRICT; create function ddd_text_revcmp(text,text) returns int4 as '/data/postgres/contrib/cmplib.so', 'ddd_text_revcmp' LANGUAGE C STRICT; */ ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone.bigpanda.com> To: <J@Planeti.Biz> Sent: Wednesday, January 18, 2006 2:24 PM Subject: Re: [PERFORM] Multiple Order By Criteria > On Wed, 18 Jan 2006 J@Planeti.Biz wrote: > >> Could you explain to me how do create this operator class for a text data >> type ? I think it will give me more of an understanding of what's going >> on >> if I could see this example. > > Using an SQL function (mostly because I'm too lazy to look up the C call > syntax) I think it'd be something like: > > create function bttextrevcmp(text, text) returns int4 as > 'select bttextcmp($2, $1)' language 'sql'; > > CREATE OPERATOR CLASS text_revop > FOR TYPE text USING btree AS > OPERATOR 1 > , > OPERATOR 2 >= , > OPERATOR 3 = , > OPERATOR 4 <= , > OPERATOR 5 < , > FUNCTION 1 bttextrevcmp(text,text); > > I believe bttextcmp is the standard text btree operator class helper > function, so we call it with reverse arguments to try to flip its results > (I think -bttextcmp($1,$2) would also work). >