Thread: Indexes with descending date columns

Indexes with descending date columns

From
Theo Kramer
Date:
Hi

I have a performance problem when traversing a table in index order with
multiple columns including a date column in date reverse order. Below
follows a simplified description of the table, the index and the
associated query

\d prcdedit
 prcdedit_prcd       | character(20)               |
 prcdedit_date       | timestamp without time zone |

Indexes:
    "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date)

When invoking a query such as

select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as
mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd,
prcdedit_date desc;

the peformance is dismal.

However removing the 'desc' qualifier as follows the query flys

select oid, prcdedit_prcd, prcdedit_date, 'dd/mm/yyyy hh24:mi:ss') as
mydate where prcdedit_prcd > 'somevalue' order by prcdedit_prcd,
prcdedit_date;

PostgreSQL Version = 8.1.2

Row count on the table is > 300000

Explain is as follows for desc
 Limit  (cost=81486.35..81486.41 rows=25 width=230) (actual
time=116619.652..116619.861 rows=25 loops=1)
   ->  Sort  (cost=81486.35..82411.34 rows=369997 width=230) (actual
time=116619.646..116619.729 rows=25 loops=1)
         Sort Key: prcdedit_prcd, prcdedit_date, oid
         ->  Bitmap Heap Scan on prcdedit  (cost=4645.99..23454.94
rows=369997 width=230) (actual time=376.952..11798.834 rows=369630
loops=1)
               Recheck Cond: (prcdedit_prcd > '063266
'::bpchar)
               ->  Bitmap Index Scan on prcdedit_idx
(cost=0.00..4645.99 rows=369997 width=0) (actual time=366.048..366.048
rows=369630 loops=1)
                     Index Cond: (prcdedit_prcd > '063266
'::bpchar)
 Total runtime: 116950.175 ms

and as follows when I remove the 'desc'

 Limit  (cost=0.00..2.34 rows=25 width=230) (actual time=0.082..0.535
rows=25 loops=1)
   ->  Index Scan using prcdedit_idx on prcdedit  (cost=0.00..34664.63
rows=369997 width=230) (actual time=0.075..0.405 rows=25 loops=1)
         Index Cond: (prcdedit_prcd > '063266              '::bpchar)
 Total runtime: 0.664 ms


Any assistance/advice much appreciated.

--
Regards
Theo


Re: Indexes with descending date columns

From
andrew@pillette.com
Date:
> I have a performance problem when traversing a table in index order with
> multiple columns including a date column in date reverse order. Below
> follows a simplified description of the table, the index and the
> associated query
>
> \d prcdedit
>  prcdedit_prcd       | character(20)               |
>  prcdedit_date       | timestamp without time zone |
>
> Indexes:
>     "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date)

Depending on how you use the table, there are three possible solutions.

First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG search
theindex in reverse and will be just as fast as when both as searched by the default ascending. 

Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch
fromprcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to
useORDER BY prcdedit_prod, dummy_column. 

Third: Create an index on a function which sorts in the order you want, and then always sort using the function index
(youcould use the -extract(epoch...) gimmick for that, among other possibilities.) 

HTH.

Re: Indexes with descending date columns

From
Theo Kramer
Date:
On Fri, 2006-03-17 at 08:25, andrew@pillette.com wrote:
> > I have a performance problem when traversing a table in index order with
> > multiple columns including a date column in date reverse order. Below
> > follows a simplified description of the table, the index and the
> > associated query
> >
> > \d prcdedit
> >  prcdedit_prcd       | character(20)               |
> >  prcdedit_date       | timestamp without time zone |
> >
> > Indexes:
> >     "prcdedit_idx" btree (prcdedit_prcd, prcdedit_date)
>
> Depending on how you use the table, there are three possible solutions.
>
> First, if it makes sense in the domain, using an ORDER BY where _both_ columns are used descending will make PG
searchthe index in reverse and will be just as fast as when both as searched by the default ascending. 
>
> Second possibility: Create a dummy column whose value depends on the negative of prcdedit_date, e.g., -extract(epoch
fromprcdedit_date), keep the dummy column in sync with the original column using triggers, and rewrite your queries to
useORDER BY prcdedit_prod, dummy_column. 
>
> Third: Create an index on a function which sorts in the order you want, and then always sort using the function index
(youcould use the -extract(epoch...) gimmick for that, among other possibilities.) 
>
> HTH.

All good input - thanks, however, before I start messing with my stuff
which I know will be complex - some questions to any of the developers
on the list.

i  Is it feasible to extend index creation to support descending
   columns? ... this is supported on other commercial and non
   commercial databases, but I do not know if this is a SQL standard.

ii If no to i, is it feasible to extend PostgreSQL to allow traversing
   an index in column descending and column ascending order - assuming
   an order by on more than one column with column order not
   in the same direction and indexes existing? ... if that makes sense.

--
Regards
Theo


Re: Indexes with descending date columns

From
Alvaro Herrera
Date:
Theo Kramer wrote:

> All good input - thanks, however, before I start messing with my stuff
> which I know will be complex - some questions to any of the developers
> on the list.
>
> i  Is it feasible to extend index creation to support descending
>    columns? ... this is supported on other commercial and non
>    commercial databases, but I do not know if this is a SQL standard.

This can be done.  You need to create an operator class which specifies
the reverse sort order (i.e. reverse the operators), and then use it in
the new index.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Indexes with descending date columns

From
Theo Kramer
Date:
On Thu, 2006-03-23 at 16:16, Alvaro Herrera wrote:
> Theo Kramer wrote:
>
> > All good input - thanks, however, before I start messing with my stuff
> > which I know will be complex - some questions to any of the developers
> > on the list.
> >
> > i  Is it feasible to extend index creation to support descending
> >    columns? ... this is supported on other commercial and non
> >    commercial databases, but I do not know if this is a SQL standard.
>
> This can be done.  You need to create an operator class which specifies
> the reverse sort order (i.e. reverse the operators), and then use it in
> the new index.

Hmmm, would that then result in the following syntax  being valid?

  create index my_idx on my_table (c1, c2 desc, c3, c4 desc) ;

where my_table is defined as

  create table my_table (
    c1 text,
    c2 timestamp,
    c3 integer,
    c4 integer
  );

If so, I would appreciate any pointers on where to start on this -
already fumbling my way through Interfacing Extensions To Indexes in the
manual...

Regards
Theo
--
Regards
Theo


Re: Indexes with descending date columns

From
Tom Lane
Date:
Theo Kramer <theo@flame.co.za> writes:
> If so, I would appreciate any pointers on where to start on this -
> already fumbling my way through Interfacing Extensions To Indexes in the
> manual...

Search the PG list archives for discussions of reverse-sort opclasses.
It's really pretty trivial, once you've created a negated btree
comparison function for the datatype.

This is the sort of thing that we are almost but not quite ready to put
into the standard distribution.  The issues that are bugging me have to
do with whether NULLs sort low or high --- right now, if you make a
reverse-sort opclass, it will effectively sort NULLs low instead of
high, and that has some unpleasant consequences because the rest of the
system isn't prepared for variance on the point (in particular I'm
afraid this could break mergejoins).  I'd like to see us make "NULLs
low" vs "NULLs high" be a defined property of opclasses, and deal with
the fallout from that, and then we could put reverse-sort opclasses for
all the standard datatypes into the regular distribution.

            regards, tom lane

Re: Indexes with descending date columns

From
"Jim C. Nasby"
Date:
On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> ii If no to i, is it feasible to extend PostgreSQL to allow traversing
>    an index in column descending and column ascending order - assuming
>    an order by on more than one column with column order not
>    in the same direction and indexes existing? ... if that makes sense.

Yes.

stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..31.76 rows=10 width=24)
   ->  Index Scan Backward using email_contrib_pkey on email_contrib  (cost=0.00..427716532.18 rows=134656656 width=24)
(2 rows)

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Array performance

From
Ruben Rubio Rey
Date:
Hi,

I have a select like

SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
FROM table
WHERE
(array[20]+array[21]+ ... +array[50]+array[51])<5000
AND array[20]<>0
AND array[21]<>0
 ...
AND array[50]<>0
AND array[51])<>0

Any ideas to make this query faster?

Re: Array performance

From
"Jim C. Nasby"
Date:
On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote:
> Hi,
>
> I have a select like
>
> SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
> FROM table
> WHERE
> (array[20]+array[21]+ ... +array[50]+array[51])<5000

http://www.varlena.com/GeneralBits/109.php might provide some useful
insights. I also recall seeing something about sum operators for arrays,
but I can't recall where.

> AND array[20]<>0
> AND array[21]<>0
> ...
> AND array[50]<>0
> AND array[51])<>0

Uhm... please don't tell me that you're using 0 in place of NULL...

You might be able to greatly simplify that by use of ANY; you'd need to
ditch elements 1-19 though:

... WHERE NOT ANY(array) = 0

See http://www.postgresql.org/docs/8.1/interactive/arrays.html
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Array performance

From
Ruben Rubio Rey
Date:
Jim C. Nasby wrote:

>On Fri, Mar 24, 2006 at 01:41:50PM +0100, Ruben Rubio Rey wrote:
>
>
>>Hi,
>>
>>I have a select like
>>
>>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
>>FROM table
>>WHERE
>>(array[20]+array[21]+ ... +array[50]+array[51])<5000
>>
>>
>
>http://www.varlena.com/GeneralBits/109.php might provide some useful
>insights. I also recall seeing something about sum operators for arrays,
>but I can't recall where.
>
>
I ll check it out, seems to be very useful
Is faster create a function to sum the array?

>
>
>>AND array[20]<>0
>>AND array[21]<>0
>>...
>>AND array[50]<>0
>>AND array[51])<>0
>>
>>
>
>Uhm... please don't tell me that you're using 0 in place of NULL...
>
>
mmm ... i have read in postgres documentation that null values on arrays
are not supported ...

>You might be able to greatly simplify that by use of ANY; you'd need to
>ditch elements 1-19 though:
>
>... WHERE NOT ANY(array) = 0
>
>
Yep this is much better.

>See http://www.postgresql.org/docs/8.1/interactive/arrays.html
>
>



Re: Array performance

From
"Jim C. Nasby"
Date:
On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote:
> >http://www.varlena.com/GeneralBits/109.php might provide some useful
> >insights. I also recall seeing something about sum operators for arrays,
> >but I can't recall where.
> >
> >
> I ll check it out, seems to be very useful
> Is faster create a function to sum the array?

There's been talk of having one, but I don't think any such thing
currently exists.

> >>AND array[20]<>0
> >>AND array[21]<>0
> >>...
> >>AND array[50]<>0
> >>AND array[51])<>0
> >>
> >>
> >
> >Uhm... please don't tell me that you're using 0 in place of NULL...
> >
> >
> mmm ... i have read in postgres documentation that null values on arrays
> are not supported ...

Damn, you're right. Another reason I tend to stay away from them...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Array performance

From
Michael Fuhr
Date:
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote:
> On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote:
> > mmm ... i have read in postgres documentation that null values on arrays
> > are not supported ...
>
> Damn, you're right. Another reason I tend to stay away from them...

8.2 will support NULL array elements.

http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php
http://developer.postgresql.org/docs/postgres/arrays.html

test=> SELECT '{1,2,NULL,3,4}'::integer[];
      int4
----------------
 {1,2,NULL,3,4}
(1 row)

--
Michael Fuhr

limitation using LIKE on ANY(array)

From
K C Lau
Date:

With 8.1.3, I get an error when trying to do this on a Text[] column
:
.. WHERE ANY(array) LIKE 'xx%'

Indeed, I get rejected even with:
.. WHERE ANY(array) = 'xx'

In both cases, the error is: ERROR:  syntax error at or near
"any" ... 

It would only work as documented in the manual (8.10.5):
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

It appears that this restriction is still in place in 8.2:
http://developer.postgresql.org/docs/postgres/arrays.html

Is that the case?

Thanks in advance,
KC.

Re: Array performance

From
Tom Lane
Date:
Ruben Rubio Rey <ruben@rentalia.com> writes:
> SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
> FROM table
> WHERE
> (array[20]+array[21]+ ... +array[50]+array[51])<5000
> AND array[20]<>0
> AND array[21]<>0
>  ...
> AND array[50]<>0
> AND array[51])<>0

> Any ideas to make this query faster?

What's the array datatype?  Integer or float would probably go a lot
faster than NUMERIC, if that's what you're using now.

            regards, tom lane

Re: limitation using LIKE on ANY(array)

From
Tom Lane
Date:
K C Lau <kclau60@netvigator.com> writes:
> Indeed, I get rejected even with:
> .. WHERE ANY(array) = 'xx'

> It would only work as documented in the manual (8.10.5):
> SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

That's not changing any time soon; the SQL spec defines only the second
syntax for ANY, and I believe there would be syntactic ambiguity if we
tried to allow the other.

> With 8.1.3, I get an error when trying to do this on a Text[] column :
> .. WHERE ANY(array) LIKE 'xx%'

If you're really intent on doing that, make an operator for "reverse
LIKE" and use it with the ANY on the right-hand side.

regression=# create function rlike(text,text) returns bool as
regression-# 'select $2 like $1' language sql strict immutable;
CREATE FUNCTION
regression=# create operator ~~~ (procedure = rlike, leftarg = text,
regression(# rightarg = text, commutator = ~~);
CREATE OPERATOR
regression=# select 'xx%' ~~~ any(array['aaa','bbb']);
 ?column?
----------
 f
(1 row)

regression=# select 'xx%' ~~~ any(array['aaa','xxb']);
 ?column?
----------
 t
(1 row)

regression=#

            regards, tom lane

Re: limitation using LIKE on ANY(array)

From
K C Lau
Date:
Thank you very much, Tom. We'll try it and report if there is any
significant impact performance-wise.

Best regards,
KC.

At 00:25 06/03/25, Tom Lane wrote:
>K C Lau <kclau60@netvigator.com> writes:
> > Indeed, I get rejected even with:
> > .. WHERE ANY(array) = 'xx'
>
> > It would only work as documented in the manual (8.10.5):
> > SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
>
>That's not changing any time soon; the SQL spec defines only the second
>syntax for ANY, and I believe there would be syntactic ambiguity if we
>tried to allow the other.
>
> > With 8.1.3, I get an error when trying to do this on a Text[] column :
> > .. WHERE ANY(array) LIKE 'xx%'
>
>If you're really intent on doing that, make an operator for "reverse
>LIKE" and use it with the ANY on the right-hand side.
>
>regression=# create function rlike(text,text) returns bool as
>regression-# 'select $2 like $1' language sql strict immutable;
>CREATE FUNCTION
>regression=# create operator ~~~ (procedure = rlike, leftarg = text,
>regression(# rightarg = text, commutator = ~~);
>CREATE OPERATOR
>regression=# select 'xx%' ~~~ any(array['aaa','bbb']);
>  ?column?
>----------
>  f
>(1 row)
>
>regression=# select 'xx%' ~~~ any(array['aaa','xxb']);
>  ?column?
>----------
>  t
>(1 row)
>
>regression=#
>
>                         regards, tom lane


Re: Array performance

From
Ruben Rubio Rey
Date:
Tom Lane wrote:

>Ruben Rubio Rey <ruben@rentalia.com> writes:
>
>
>>SELECT (array[20]+array[21]+ ... +array[50]+array[51]) as total
>>FROM table
>>WHERE
>>(array[20]+array[21]+ ... +array[50]+array[51])<5000
>>AND array[20]<>0
>>AND array[21]<>0
>> ...
>>AND array[50]<>0
>>AND array[51])<>0
>>
>Any ideas to make this query faster?
>
>
>
>What's the array datatype?  Integer or float would probably go a lot
>faster than NUMERIC, if that's what you're using now.
>
>
Already its integer[]

Re: Indexes with descending date columns

From
Theo Kramer
Date:
On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> >    an index in column descending and column ascending order - assuming
> >    an order by on more than one column with column order not
> >    in the same direction and indexes existing? ... if that makes sense.
>
> Yes.
>
> stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
>                                                        QUERY PLAN
  
>
------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..31.76 rows=10 width=24)
>    ->  Index Scan Backward using email_contrib_pkey on email_contrib  (cost=0.00..427716532.18 rows=134656656
width=24)
> (2 rows)

Not quite what I mean - redo the above as follows and then see what
explain returns

explain select * from email_contrib order by project_id, id, date desc
limit 10;

--
Regards
Theo


Re: Indexes with descending date columns

From
"Jim C. Nasby"
Date:
On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote:
> On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> > >    an index in column descending and column ascending order - assuming
> > >    an order by on more than one column with column order not
> > >    in the same direction and indexes existing? ... if that makes sense.
> >
> > Yes.
> >
> > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
> >                                                        QUERY PLAN
    
> >
------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.00..31.76 rows=10 width=24)
> >    ->  Index Scan Backward using email_contrib_pkey on email_contrib  (cost=0.00..427716532.18 rows=134656656
width=24)
> > (2 rows)
>
> Not quite what I mean - redo the above as follows and then see what
> explain returns
>
> explain select * from email_contrib order by project_id, id, date desc
> limit 10;

Ahh. There's a hack to do that by defining a new opclass that reverses <
and >, and then doing ORDER BY project_id, id, date USING new_opclass.

I think there's a TODO about this, but I'm not sure...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Indexes with descending date columns

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Wed, Mar 29, 2006 at 12:52:31PM +0200, Theo Kramer wrote:
> > On Fri, 2006-03-24 at 12:21, Jim C. Nasby wrote:
> > > On Thu, Mar 23, 2006 at 01:09:49PM +0200, Theo Kramer wrote:
> > > > ii If no to i, is it feasible to extend PostgreSQL to allow traversing
> > > >    an index in column descending and column ascending order - assuming
> > > >    an order by on more than one column with column order not
> > > >    in the same direction and indexes existing? ... if that makes sense.
> > >
> > > Yes.
> > >
> > > stats=# explain select * from email_contrib order by project_id desc, id desc, date desc limit 10;
> > >                                                        QUERY PLAN
      
> > >
------------------------------------------------------------------------------------------------------------------------
> > >  Limit  (cost=0.00..31.76 rows=10 width=24)
> > >    ->  Index Scan Backward using email_contrib_pkey on email_contrib  (cost=0.00..427716532.18 rows=134656656
width=24)
> > > (2 rows)
> >
> > Not quite what I mean - redo the above as follows and then see what
> > explain returns
> >
> > explain select * from email_contrib order by project_id, id, date desc
> > limit 10;
>
> Ahh. There's a hack to do that by defining a new opclass that reverses <
> and >, and then doing ORDER BY project_id, id, date USING new_opclass.
>
> I think there's a TODO about this, but I'm not sure...

Yes, and updated:

    * Allow the creation of indexes with mixed ascending/descending
      specifiers

      This is possible now by creating an operator class with reversed sort
      operators.  One complexity is that NULLs would then appear at the start
      of the result set, and this might affect certain sort types, like
      merge join.

--
  Bruce Momjian   http://candle.pha.pa.us

  + If your life is a hard drive, Christ can be your backup. +

Re: Indexes with descending date columns

From
Markus Schaber
Date:
Hi, Bruce,

Bruce Momjian wrote:

>>Ahh. There's a hack to do that by defining a new opclass that reverses <
>>and >, and then doing ORDER BY project_id, id, date USING new_opclass.
>>
>>I think there's a TODO about this, but I'm not sure...
>
> Yes, and updated:
>
>     * Allow the creation of indexes with mixed ascending/descending
>       specifiers
>
>       This is possible now by creating an operator class with reversed sort
>       operators.  One complexity is that NULLs would then appear at the start
>       of the result set, and this might affect certain sort types, like
>       merge join.

I think it would be better to allow "index zig-zag scans" for
multi-column index.[1]

So it traverses in a given order on the higher order column, and the sub
trees for each specific high order value is traversed in reversed order.
From my knowledge at least of BTrees, and given correct commutator
definitions, this should be not so complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary
ASC/DESC combinations of the given columns.


Just a thought,
Markus


[1] It may make sense to implement the mixed specifiers on indices as
well, to allow CLUSTERing on mixed search order.

[2] But I admit that I currently don't have enough knowledge in
PostgreSQL index scan internals to know whether it really is easy to
implement.


--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org