Thread: Indexes with descending date columns
Hi, I have a performance problem when traversing a table in index order on multiple columns, including a date column in reverse date order. Below is a simplified description of the table, the index and the associated query.

\d prcdedit
 prcdedit_prcd | character(20)               |
 prcdedit_date | timestamp without time zone |
Indexes:
    "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date)

When invoking a query such as

select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd, prcdedit_date desc;

the performance is dismal. However, if I remove the 'desc' qualifier as follows, the query flies:

select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd, prcdedit_date;

PostgreSQL version = 8.1.2
Row count on the table is > 300000

Explain is as follows for desc:

 Limit  (cost=81486.35..81486.41 rows=25 width=230) (actual time=116619.652..116619.861 rows=25 loops=1)
   ->  Sort  (cost=81486.35..82411.34 rows=369997 width=230) (actual time=116619.646..116619.729 rows=25 loops=1)
         Sort Key: prcdedit_prcd, prcdedit_date, oid
         ->  Bitmap Heap Scan on prcdedit  (cost=4645.99..23454.94 rows=369997 width=230) (actual time=376.952..11798.834 rows=369630 loops=1)
               Recheck Cond: (prcdedit_prcd > '063266 '::bpchar)
               ->  Bitmap Index Scan on prcdedit_idx  (cost=0.00..4645.99 rows=369997 width=0) (actual time=366.048..366.048 rows=369630 loops=1)
                     Index Cond: (prcdedit_prcd > '063266 '::bpchar)
 Total runtime: 116950.175 ms

and as follows when I remove the 'desc':

 Limit  (cost=0.00..2.34 rows=25 width=230) (actual time=0.082..0.535 rows=25 loops=1)
   ->  Index Scan using prcdedit_idx on prcdedit  (cost=0.00..34664.63 rows=369997 width=230) (actual time=0.075..0.405 rows=25 loops=1)
         Index Cond: (prcdedit_prcd > '063266 '::bpchar)
 Total runtime: 0.664 ms

Any assistance/advice much appreciated.

-- 
Regards
Theo
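As a rough illustration of why the two plans differ so much, the Python model below (not PostgreSQL code) treats prcdedit_idx as a list of (prcdedit_prcd, prcdedit_date) pairs in ascending order, filled with made-up data. When the ORDER BY matches that order, LIMIT 25 can stop after 25 index entries, which is what the fast Index Scan plan does. When the date is wanted descending within each code, every qualifying entry has to be read and sorted before the first row comes out, which is what the Sort node over roughly 370,000 rows is doing. `CountingScan` and the two `limit_*` functions are hypothetical names.

```python
import itertools
from datetime import datetime, timedelta

# Hypothetical stand-in for prcdedit_idx: 100 codes x 50 timestamps,
# stored in ascending (prcdedit_prcd, prcdedit_date) order.
base = datetime(2006, 1, 1)
index = sorted(
    (f"{code:06d}", base + timedelta(hours=h))
    for code in range(100)
    for h in range(50)
)


class CountingScan:
    """Iterates over index entries, counting how many were read."""

    def __init__(self, entries):
        self.entries = entries
        self.read = 0

    def __iter__(self):
        for entry in self.entries:
            self.read += 1
            yield entry


def limit_matching_order(entries, n):
    """ORDER BY prcd, date LIMIT n: index order already matches, stop after n."""
    scan = CountingScan(entries)
    rows = list(itertools.islice(scan, n))
    return rows, scan.read


def limit_mixed_order(entries, n):
    """ORDER BY prcd, date DESC LIMIT n: every entry must be read and sorted."""
    scan = CountingScan(entries)
    rows = sorted(scan, key=lambda r: r[1], reverse=True)  # date descending
    rows.sort(key=lambda r: r[0])  # stable sort keeps date DESC within each prcd
    return rows[:n], scan.read


fast, fast_read = limit_matching_order(index, 25)
slow, slow_read = limit_mixed_order(index, 25)
print(fast_read, slow_read)  # 25 5000
```

Both functions return 25 rows; only the number of entries they had to touch differs, and that gap grows with the table.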
> I have a performance problem when traversing a table in index order with > multiple columns including a date column in date reverse order. Below > follows a simplified description of the table, the index and the > associated query > > \d prcdedit > prcdedit_prcd | character(20) | > prcdedit_date | timestamp without time zone | > > Indexes: > "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date) Depending on how you use the table, there are three possible solutions. First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG search the index in reverse and will be just as fast as when both are searched by the default ascending. Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch from prcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to use ORDER BY prcdedit_prcd, dummy_column. Third: Create an index on a function which sorts in the order you want, and then always sort using the function index (you could use the -extract(epoch...) gimmick for that, among other possibilities.) HTH.
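Andrew's second and third suggestions rest on one property: for a fixed prcdedit_prcd, ordering ascending by -extract(epoch from prcdedit_date) is the same as ordering descending by prcdedit_date, so an ordinary ascending index on (prcdedit_prcd, negated value) can serve the mixed ORDER BY. The Python sketch below only checks that property on invented rows. `epoch` is a hypothetical helper standing in for extract(epoch from ...), and it interprets the naive timestamps as UTC, which is an assumption of the sketch.

```python
from datetime import datetime, timezone


def epoch(ts: datetime) -> float:
    """Seconds since 1970-01-01, treating the naive timestamp as UTC
    (a stand-in for extract(epoch from prcdedit_date))."""
    return ts.replace(tzinfo=timezone.utc).timestamp()


rows = [
    ("000002", datetime(2006, 3, 1, 9, 0)),
    ("000001", datetime(2006, 3, 17, 8, 25)),
    ("000001", datetime(2006, 1, 5, 12, 0)),
    ("000002", datetime(2006, 2, 14, 18, 30)),
    ("000001", datetime(2006, 2, 1, 0, 0)),
]

# What ORDER BY prcdedit_prcd, prcdedit_date DESC asks for:
# sort by date descending, then stably by code ascending.
wanted = sorted(rows, key=lambda r: r[1], reverse=True)
wanted.sort(key=lambda r: r[0])

# What a plain ascending index on (prcdedit_prcd, dummy_column) delivers,
# with dummy_column = -extract(epoch from prcdedit_date).
via_dummy = sorted(rows, key=lambda r: (r[0], -epoch(r[1])))

print(via_dummy == wanted)
for code, ts in via_dummy:
    print(code, ts)
```

The two-pass `sorted`/`sort` used for `wanted` is the usual Python idiom for mixed directions and depends on sort stability; the negated key gets the same order in a single ascending pass, which is why one ascending index can cover it.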
On Fri, 2006-03-17 at 08:25, andrew@pillette.com wrote: > > I have a performance problem when traversing a table in index order with > > multiple columns including a date column in date reverse order. Below > > follows a simplified description of the table, the index and the > > associated query > > > > \d prcdedit > > prcdedit_prcd | character(20) | > > prcdedit_date | timestamp without time zone | > > > > Indexes: > > "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date) > > Depending on how you use the table, there are three possible solutions. > > First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG search the index in reverse and will be just as fast as when both are searched by the default ascending. > > Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch from prcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to use ORDER BY prcdedit_prcd, dummy_column. > > Third: Create an index on a function which sorts in the order you want, and then always sort using the function index (you could use the -extract(epoch...) gimmick for that, among other possibilities.) > > HTH. All good input - thanks, however, before I start messing with my stuff which I know will be complex - some questions to any of the developers on the list. i Is it feasible to extend index creation to support descending columns? ... this is supported on other commercial and non commercial databases, but I do not know if this is a SQL standard. ii If no to i, is it feasible to extend PostgreSQL to allow traversing an index in column descending and column ascending order - assuming an order by on more than one column with column order not in the same direction and indexes existing? ... if that makes sense. -- Regards Theo
Theo Kramer wrote: > All good input - thanks, however, before I start messing with my stuff > which I know will be complex - some questions to any of the developers > on the list. > > i Is it feasible to extend index creation to support descending > columns? ... this is supported on other commercial and non > commercial databases, but I do not know if this is a SQL standard. This can be done. You need to create an operator class which specifies the reverse sort order (i.e. reverse the operators), and then use it in the new index. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
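By analogy (this is Python, not the operator-class machinery itself), a reverse-sort opclass on the second column means the index compares that column with the sign of the comparison flipped. Entries stored under such a composite comparison come out in (c1 ASC, c2 DESC) order on a forward scan, and in (c1 DESC, c2 ASC) order on a backward scan. `cmp`, `reverse_cmp` and `composite` are illustrative helpers.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison, like a btree support function: -1, 0 or 1."""
    return (a > b) - (a < b)


def reverse_cmp(a, b):
    """The reverse-sort idea: the same comparison with the sign flipped."""
    return cmp(b, a)


def composite(*column_cmps):
    """Compare tuples column by column, each column with its own comparator."""
    def compare(x, y):
        for column_cmp, a, b in zip(column_cmps, x, y):
            result = column_cmp(a, b)
            if result:
                return result
        return 0
    return compare


rows = [("b", 1), ("a", 3), ("b", 7), ("a", 2), ("a", 9)]

# An "index" on (c1, c2 desc): default comparator for c1, reversed for c2.
mixed_index = sorted(rows, key=cmp_to_key(composite(cmp, reverse_cmp)))
print(mixed_index)        # [('a', 9), ('a', 3), ('a', 2), ('b', 7), ('b', 1)]
print(mixed_index[::-1])  # [('b', 1), ('b', 7), ('a', 2), ('a', 3), ('a', 9)]
```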
On Thu, 2006-03-23 at 16:16, Alvaro Herrera wrote: > Theo Kramer wrote: > > > All good input - thanks, however, before I start messing with my stuff > > which I know will be complex - some questions to any of the developers > > on the list. > > > > i Is it feasible to extend index creation to support descending > > columns? ... this is supported on other commercial and non > > commercial databases, but I do not know if this is a SQL standard. > > This can be done. You need to create an operator class which specifies > the reverse sort order (i.e. reverse the operators), and then use it in > the new index. Hmmm, would that then result in the following syntax being valid? create index my_idx on my_table (c1, c2 desc, c3, c4 desc) ; where my_table is defined as create table my_table ( c1 text, c2 timestamp, c3 integer, c4 integer ); If so, I would appreciate any pointers on where to start on this - already fumbling my way through Interfacing Extensions To Indexes in the manual... -- Regards Theo
Theo Kramer <theo@flame.co.za> writes: > If so, I would appreciate any pointers on where to start on this - > already fumbling my way through Interfacing Extensions To Indexes in the > manual... Search the PG list archives for discussions of reverse-sort opclasses. It's really pretty trivial, once you've created a negated btree comparison function for the datatype. This is the sort of thing that we are almost but not quite ready to put into the standard distribution. The issues that are bugging me have to do with whether NULLs sort low or high --- right now, if you make a reverse-sort opclass, it will effectively sort NULLs low instead of high, and that has some unpleasant consequences because the rest of the system isn't prepared for variance on the point (in particular I'm afraid this could break mergejoins). I'd like to see us make "NULLs low" vs "NULLs high" be a defined property of opclasses, and deal with the fallout from that, and then we could put reverse-sort opclasses for all the standard datatypes into the regular distribution. regards, tom lane
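Tom's NULL concern can be reproduced in a small Python model. Assume, as PostgreSQL's default btree ordering does, that the ascending comparison places NULL (None here) above every non-null value. Negating that comparison to get a reverse-sort opclass also moves NULLs to the front, so "NULLs high" quietly becomes "NULLs low". Keeping NULL placement fixed means handling it as a separate property, which is roughly what Tom proposes. All function names are illustrative.

```python
from functools import cmp_to_key


def asc_nulls_high(a, b):
    """Ascending comparison in which None sorts after every non-None value."""
    if a is None and b is None:
        return 0
    if a is None:
        return 1
    if b is None:
        return -1
    return (a > b) - (a < b)


def negated(a, b):
    """A naive reverse-sort comparator: just flip the sign."""
    return -asc_nulls_high(a, b)


def desc_nulls_high(a, b):
    """Reverse the non-null values but keep None at the high end."""
    if a is None or b is None:
        return asc_nulls_high(a, b)
    return (b > a) - (b < a)


values = [3, None, 1, 2]
print(sorted(values, key=cmp_to_key(asc_nulls_high)))   # [1, 2, 3, None]
print(sorted(values, key=cmp_to_key(negated)))          # [None, 3, 2, 1]
print(sorted(values, key=cmp_to_key(desc_nulls_high)))  # [3, 2, 1, None]
```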
On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote: > ii If no to i, is it feasible to extend PostgreSQL to allow traversing > an index in column descending and column ascending order - assuming > an order by on more than one column with column order not > in the same direction and indexes existing? ... if that makes sense. Yes. stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..31.76 rows=10 width=24) -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24) (2 rows) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
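The rule behind Jim's plan can be written as a small Python function, as a simplified model that ignores NULL ordering and operator classes: an index whose columns have directions d1..dn can deliver an ORDER BY on a leading prefix of those columns either when every requested direction matches (forward scan) or when every one is flipped (backward scan). Mixed requests such as Theo's fall into neither case. `scan_direction` is a hypothetical name, and the example assumes, as the plan suggests, that email_contrib_pkey covers (project_id, id, date) in ascending order.

```python
from typing import Optional, Sequence


def scan_direction(index_dirs: Sequence[str], order_dirs: Sequence[str]) -> Optional[str]:
    """Return 'forward', 'backward', or None if a separate sort is needed.

    Directions are 'asc' or 'desc'; order_dirs must cover a leading
    prefix of the index columns.
    """
    if len(order_dirs) > len(index_dirs):
        return None
    flip = {"asc": "desc", "desc": "asc"}
    prefix = list(index_dirs[: len(order_dirs)])
    if list(order_dirs) == prefix:
        return "forward"
    if list(order_dirs) == [flip[d] for d in prefix]:
        return "backward"
    return None


# Assumed: email_contrib_pkey is (project_id, id, date), all ascending.
pkey = ["asc", "asc", "asc"]
print(scan_direction(pkey, ["desc", "desc", "desc"]))  # backward
print(scan_direction(pkey, ["asc", "asc", "desc"]))    # None
```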
Hi, I have a select like SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total FROM table WHERE (array[20]+array[21]+ ... +array[50]+array[51])<5000 AND array[20]<>0 AND array[21]<>0 ... AND array[50]<>0 AND array[51])<>0 Any ideas to make this query faster?
On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote: > Hi, > > I have a select like > > SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total > FROM table > WHERE > (array[20]+array[21]+ ... +array[50]+array[51])<5000 http://www.varlena.com/GeneralBits/109.php might provide some useful insights. I also recall seeing something about sum operators for arrays, but I can't recall where. > AND array[20]<>0 > AND array[21]<>0 > ... > AND array[50]<>0 > AND array[51])<>0 Uhm... please don't tell me that you're using 0 in place of NULL... You might be able to greatly simplify that by use of ANY; you'd need to ditch elements 1-19 though: ... WHERE NOT ANY(array) = 0 See http://www.postgresql.org/docs/8.1/interactive/arrays.html -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
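For reference, the WHERE clause Ruben quotes can be modelled in Python as below. PostgreSQL arrays are 1-based by default, so array[20] through array[51] correspond to the Python slice values[19:51], and an out-of-range subscript in PostgreSQL yields NULL, which makes the row fail the WHERE clause. On the SQL side, a slice combined with ALL, e.g. `0 <> ALL (arr[20:51])`, may express the "no zeros" part without ditching elements 1-19; treat that spelling as a suggestion to verify against the 8.1 array documentation. `matches` and the sample arrays are hypothetical.

```python
def matches(values, first=20, last=51, limit=5000):
    """Model of: array[first] + ... + array[last] < limit
    AND array[first] <> 0 AND ... AND array[last] <> 0.
    """
    # PostgreSQL arrays are 1-based by default: elements first..last
    # are the Python slice values[first - 1:last].
    window = values[first - 1:last]
    if len(window) != last - first + 1:
        # Out-of-range subscripts are NULL in PostgreSQL, and a NULL
        # condition does not pass a WHERE clause.
        return False
    return sum(window) < limit and all(v != 0 for v in window)


# Elements 1-19 are zero here but play no part in the test.
row_ok = [0] * 19 + [100] * 32          # sum 3200, no zeros in 20..51
row_zero = [0] * 19 + [100] * 31 + [0]  # element 51 is zero
row_big = [0] * 19 + [200] * 32         # sum 6400, too large
print(matches(row_ok), matches(row_zero), matches(row_big))
```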
Jim C. Nasby wrote: >On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote: > > >>Hi, >> >>I have a select like >> >>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total >>FROM table >>WHERE >>(array[20]+array[21]+ ... +array[50]+array[51])<5000 >> >> > >http://www.varlena.com/GeneralBits/109.php might provide some useful >insights. I also recall seeing something about sum operators for arrays, >but I can't recall where. > > I'll check it out; it seems to be very useful. Would it be faster to create a function to sum the array? > > >>AND array[20]<>0 >>AND array[21]<>0 >>... >>AND array[50]<>0 >>AND array[51])<>0 >> >> > >Uhm... please don't tell me that you're using 0 in place of NULL... > > mmm ... I have read in the postgres documentation that null values in arrays are not supported ... >You might be able to greatly simplify that by use of ANY; you'd need to >ditch elements 1-19 though: > >... WHERE NOT ANY(array) = 0 > > Yep, this is much better. >See http://www.postgresql.org/docs/8.1/interactive/arrays.html > >
On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: > >http://www.varlena.com/GeneralBits/109.php might provide some useful > >insights. I also recall seeing something about sum operators for arrays, > >but I can't recall where. > > > > > I'll check it out; it seems to be very useful. > Would it be faster to create a function to sum the array? There's been talk of having one, but I don't think any such thing currently exists. > >>AND array[20]<>0 > >>AND array[21]<>0 > >>... > >>AND array[50]<>0 > >>AND array[51])<>0 > >> > >> > > > >Uhm... please don't tell me that you're using 0 in place of NULL... > > > > > mmm ... I have read in the postgres documentation that null values in arrays > are not supported ... Damn, you're right. Another reason I tend to stay away from them... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote: > On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: > > mmm ... I have read in the postgres documentation that null values in arrays > > are not supported ... > > Damn, you're right. Another reason I tend to stay away from them... 8.2 will support NULL array elements. http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php http://developer.postgresql.org/docs/postgres/arrays.html test=> SELECT '{1,2,NULL,3,4}'::integer[]; int4 ---------------- {1,2,NULL,3,4} (1 row) -- Michael Fuhr
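Once NULL elements are allowed, Jim's 0-versus-NULL remark matters for ANY. Under SQL's three-valued logic, `x = ANY (arr)` is true if some element equals x, null if none does but some comparison involved a null, and false otherwise. A Python model with None standing in for SQL NULL; `sql_eq` and `eq_any` are illustrative names.

```python
from typing import Optional, Sequence


def sql_eq(a, b) -> Optional[bool]:
    """SQL '=': null if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def eq_any(x, arr: Optional[Sequence]) -> Optional[bool]:
    """Model of x = ANY (arr) with SQL three-valued logic."""
    if arr is None:
        return None  # a null array gives a null result
    saw_null = False
    for element in arr:
        result = sql_eq(x, element)
        if result is True:
            return True
        if result is None:
            saw_null = True
    return None if saw_null else False


print(eq_any(3, [1, 2, None, 3, 4]))  # True
print(eq_any(0, [1, 2, None, 3, 4]))  # None, not False
print(eq_any(0, [1, 2, 3, 4]))        # False
```

For an array like '{1,2,NULL,3,4}', `NOT (0 = ANY (arr))` is therefore null rather than true, so a WHERE clause would filter that row out.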
With 8.1.3, I get an error when trying to do this on a Text[] column : .. WHERE ANY(array) LIKE 'xx%' Indeed, I get rejected even with: .. WHERE ANY(array) = 'xx' In both cases, the error is: ERROR: syntax error at or near "any" ... It would only work as documented in the manual (8.10.5): SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter); It appears that this restriction is still in place in 8.2:
http://developer.postgresql.org/docs/postgres/arrays.html
Is that the case?
Thanks in advance,
KC.
Ruben Rubio Rey <ruben@rentalia.com> writes: > SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total > FROM table > WHERE > (array[20]+array[21]+ ... +array[50]+array[51])<5000 > AND array[20]<>0 > AND array[21]<>0 > ... > AND array[50]<>0 > AND array[51])<>0 > Any ideas to make this query faster? What's the array datatype? Integer or float would probably go a lot faster than NUMERIC, if that's what you're using now. regards, tom lane
K C Lau <kclau60@netvigator.com> writes:
> Indeed, I get rejected even with:
> .. WHERE ANY(array) = 'xx'

> It would only work as documented in the manual (8.10.5):
> SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

That's not changing any time soon; the SQL spec defines only the second syntax for ANY, and I believe there would be syntactic ambiguity if we tried to allow the other.

> With 8.1.3, I get an error when trying to do this on a Text[] column :
> .. WHERE ANY(array) LIKE 'xx%'

If you're really intent on doing that, make an operator for "reverse LIKE" and use it with the ANY on the right-hand side.

regression=# create function rlike(text,text) returns bool as
regression-# 'select $2 like $1' language sql strict immutable;
CREATE FUNCTION
regression=# create operator ~~~ (procedure = rlike, leftarg = text,
regression(# rightarg = text, commutator = ~~);
CREATE OPERATOR
regression=# select 'xx%' ~~~ any(array['aaa','bbb']);
 ?column?
----------
 f
(1 row)

regression=# select 'xx%' ~~~ any(array['aaa','xxb']);
 ?column?
----------
 t
(1 row)

regression=#

			regards, tom lane
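The only trick in Tom's operator is argument order: rlike(pattern, value) evaluates value LIKE pattern, so the pattern can sit on the left-hand side of ANY. Here is a Python model of the same idea. The LIKE translation is simplified (it handles % and _ but not escape characters), and `like_to_regex`, `like` and `rlike_any` are illustrative helpers; `rlike` mirrors Tom's SQL function.

```python
import re


def like_to_regex(pattern: str):
    """Translate a LIKE pattern: % -> any run, _ -> one char (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """value LIKE pattern: the whole value must match."""
    return like_to_regex(pattern).fullmatch(value) is not None


def rlike(pattern: str, value: str) -> bool:
    """Mirrors rlike($1, $2) = $2 LIKE $1: the arguments are swapped."""
    return like(value, pattern)


def rlike_any(pattern: str, values) -> bool:
    """pattern ~~~ ANY (values), ignoring NULLs for simplicity."""
    return any(rlike(pattern, v) for v in values)


print(rlike_any("xx%", ["aaa", "bbb"]))  # False
print(rlike_any("xx%", ["aaa", "xxb"]))  # True
```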
Thank you very much, Tom. We'll try it and report if there is any significant impact performance-wise. Best regards, KC. At 00:25 06/03/25, Tom Lane wrote: >If you're really intent on doing that, make an operator for "reverse >LIKE" and use it with the ANY on the right-hand side.
Tom Lane wrote: >Ruben Rubio Rey <ruben@rentalia.com> writes: > > >>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total >>FROM table >>WHERE >>(array[20]+array[21]+ ... +array[50]+array[51])<5000 >>AND array[20]<>0 >>AND array[21]<>0 >> ... >>AND array[50]<>0 >>AND array[51])<>0 >> >Any ideas to make this query faster? > > > >What's the array datatype? Integer or float would probably go a lot >faster than NUMERIC, if that's what you're using now. > > It's already integer[].
On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote: > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote: > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing > > an index in column descending and column ascending order - assuming > > an order by on more than one column with column order not > > in the same direction and indexes existing? ... if that makes sense. > > Yes. > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..31.76 rows=10 width=24) > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24) > (2 rows) Not quite what I mean - redo the above as follows and then see what explain returns explain select * from email_contrib order by project_id, id, date desc limit 10; -- Regards Theo
On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote: > On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote: > > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote: > > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing > > > an index in column descending and column ascending order - assuming > > > an order by on more than one column with column order not > > > in the same direction and indexes existing? ... if that makes sense. > > > > Yes. > > > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10; > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------ > > Limit (cost=0.00..31.76 rows=10 width=24) > > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24) > > (2 rows) > > Not quite what I mean - redo the above as follows and then see what > explain returns > > explain select * from email_contrib order by project_id, id, date desc > limit 10; Ahh. There's a hack to do that by defining a new opclass that reverses < and >, and then doing ORDER BY project_id, id, date USING new_opclass. I think there's a TODO about this, but I'm not sure... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote: > > On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote: > > > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote: > > > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing > > > > an index in column descending and column ascending order - assuming > > > > an order by on more than one column with column order not > > > > in the same direction and indexes existing? ... if that makes sense. > > > > > > Yes. > > > > > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10; > > > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------ > > > Limit (cost=0.00..31.76 rows=10 width=24) > > > -> Index Scan Backward using email_contrib_pkey on email_contrib (cost=0.00..427716532.18 rows=134656656 width=24) > > > (2 rows) > > > > Not quite what I mean - redo the above as follows and then see what > > explain returns > > > > explain select * from email_contrib order by project_id, id, date desc > > limit 10; > > Ahh. There's a hack to do that by defining a new opclass that reverses < > and >, and then doing ORDER BY project_id, id, date USING new_opclass. > > I think there's a TODO about this, but I'm not sure... Yes, and updated: * Allow the creation of indexes with mixed ascending/descending specifiers This is possible now by creating an operator class with reversed sort operators. One complexity is that NULLs would then appear at the start of the result set, and this might affect certain sort types, like merge join. -- Bruce Momjian http://candle.pha.pa.us + If your life is a hard drive, Christ can be your backup. +
Hi, Bruce, Bruce Momjian wrote: >>Ahh. There's a hack to do that by defining a new opclass that reverses < >>and >, and then doing ORDER BY project_id, id, date USING new_opclass. >> >>I think there's a TODO about this, but I'm not sure... > > Yes, and updated: > > * Allow the creation of indexes with mixed ascending/descending > specifiers > > This is possible now by creating an operator class with reversed sort > operators. One complexity is that NULLs would then appear at the start > of the result set, and this might affect certain sort types, like > merge join. I think it would be better to allow "index zig-zag scans" for multi-column indexes.[1] That is, the scan traverses the higher-order column in the given order, and the subtree for each specific higher-order value is traversed in reverse order. From my knowledge of BTrees at least, and given correct commutator definitions, this should not be very complicated to implement.[2] This would allow the query planner to use the same index for arbitrary ASC/DESC combinations of the given columns. Just a thought, Markus [1] It may make sense to implement the mixed specifiers on indices as well, to allow CLUSTERing on mixed search order. [2] But I admit that I currently don't have enough knowledge of PostgreSQL index scan internals to know whether it really is easy to implement. -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
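A minimal Python sketch of the zig-zag order Markus describes, assuming index entries stored ascending on (c1, c2): walk the groups of equal c1 forward and emit each group's entries backward, which yields ORDER BY c1, c2 DESC with no separate sort step. This models only the output order over an in-memory list; it says nothing about how PostgreSQL's btree code would locate group boundaries, and `zigzag_scan` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def zigzag_scan(index_entries, reverse_inner=True):
    """Yield entries grouped by ascending c1, each group in reverse stored
    order when reverse_inner is True.

    index_entries must already be sorted ascending on (c1, c2), as a
    two-column btree would store them.
    """
    for _, group in groupby(index_entries, key=itemgetter(0)):
        entries = list(group)  # materialise the group so it can be reversed
        yield from (reversed(entries) if reverse_inner else entries)


index_entries = sorted([
    ("063266", "2006-03-17"), ("063266", "2006-01-02"),
    ("063267", "2006-02-10"), ("063266", "2006-02-28"),
    ("063267", "2006-03-01"),
])

for entry in zigzag_scan(index_entries):
    print(entry)
```

The sketch materialises each group before reversing it; a btree implementation would presumably position at the last entry of each group instead.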