Thread: Indexes slower when used in descending vs. ascending order?
I was under the impression that Postgres did not need explicit descending
order indexes created and that standard indexes would work fine. I am
noticing a significant speed difference between the following two queries
(the one using "ORDER BY clientkey desc, premiseskey desc, logtime desc,
logkey desc" takes 19 seconds to execute, versus almost immediate execution
of the "ORDER BY clientkey, premiseskey, logtime, logkey") and was wondering
if there was anything I could do to make execution any faster. Any thoughts?
Queries follow:

The slow one:

vigprem=> explain SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') and logicaldel = 'N' /*VIG_GEN*/
ORDER BY clientkey desc, premiseskey desc, logtime desc, logkey desc
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.89 rows=20 width=548)
   ->  Index Scan Backward using logtime_index on log  (cost=0.00..6195163.59 rows=1654578 width=548)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
         Filter: (logicaldel = 'N'::bpchar)
(4 rows)

The fast one:

vigprem=> explain SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') and logicaldel = 'N' /*VIG_GEN*/
ORDER BY clientkey, premiseskey, logtime, logkey
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.89 rows=20 width=548)
   ->  Index Scan using logtime_index on log  (cost=0.00..6195163.59 rows=1654578 width=548)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
         Filter: (logicaldel = 'N'::bpchar)
(4 rows)
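[For reference: a plain multi-column btree can indeed be read in either
direction, which is why the descending ORDER BY shows up above as an
"Index Scan Backward" on the same index. In later PostgreSQL releases
(8.3 and up) you can also declare the sort order per column, so the
descending query gets a forward scan of a purpose-built index. A sketch
using the column names from this thread; the index name here is made up:

    -- Requires PostgreSQL 8.3 or later; not an option on 7.3.
    CREATE INDEX logtime_index_desc
        ON log (clientkey DESC, premiseskey DESC, logtime DESC, logkey DESC);

On the versions discussed in this thread, the backward scan is the only
mechanism available, which is what the rest of the thread digs into.]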
Alasdair Young <ayoung@vigilos.com> writes:
> I am noticing a significant speed difference between the following two
> queries (the one using "ORDER BY clientkey desc, premiseskey desc,
> logtime desc, logkey desc" takes 19 seconds to execute, versus almost
> immediate execution of the "ORDER BY clientkey, premiseskey, logtime,
> logkey") and was wondering if there was anything I could do to make
> execution any faster.

I'd bet that the problem is the "filter" on logicaldel --- is the value
of that heavily correlated with the index ordering?  You could be
scanning through many many rows of the index to reach the first (last)
one with logicaldel = 'N'.

			regards, tom lane
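[One quick way to test that theory is to look at how logicaldel is
distributed within the key range being scanned. A sketch using the key
values from the thread; the actual counts are not known here:

    SELECT logicaldel, count(*)
    FROM log
    WHERE clientkey = '000000004000000000010000000001'
      AND premiseskey = '000000004000000000030000000001'
    GROUP BY logicaldel;

If nearly everything at the high-logtime end of that range had
logicaldel <> 'N', the backward scan would have to skip past all of it
before it could return its first 20 rows.]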
On Tue, 2006-04-11 at 14:18 -0400, Tom Lane wrote:
> Alasdair Young <ayoung@vigilos.com> writes:
> > I am noticing a significant speed difference between the following two
> > queries (the one using "ORDER BY clientkey desc, premiseskey desc,
> > logtime desc, logkey desc" takes 19 seconds to execute, versus almost
> > immediate execution of the "ORDER BY clientkey, premiseskey, logtime,
> > logkey") and was wondering if there was anything I could do to make
> > execution any faster.
>
> I'd bet that the problem is the "filter" on logicaldel --- is the value
> of that heavily correlated with the index ordering?  You could be
> scanning through many many rows of the index to reach the first (last)
> one with logicaldel = 'N'.
>
> 			regards, tom lane

Removing the logicaldel seems to give the same results. (The archives
seem to indicate the two queries should take roughly the same amount of
time.)

Any other ideas?

- alasdair

The slow version:

vigprem=> explain analyze SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') /*VIG_GEN*/
ORDER BY clientkey desc, premiseskey desc, logtime desc, logkey desc
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.84 rows=20 width=548) (actual time=19799.54..19799.95 rows=20 loops=1)
   ->  Index Scan Backward using logtime_index on log  (cost=0.00..6191056.91 rows=1654586 width=548) (actual time=19799.54..19799.92 rows=21 loops=1)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
 Total runtime: 19800.03 msec
(4 rows)

The fast version:

vigprem=> explain analyze SELECT * FROM log WHERE clientkey in
('000000004000000000010000000001') AND premiseskey in
('000000004000000000030000000001') /*VIG_GEN*/
ORDER BY clientkey, premiseskey, logtime, logkey
LIMIT 20 OFFSET 0;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..74.84 rows=20 width=548) (actual time=0.35..0.75 rows=20 loops=1)
   ->  Index Scan using logtime_index on log  (cost=0.00..6191056.91 rows=1654586 width=548) (actual time=0.35..0.72 rows=21 loops=1)
         Index Cond: ((clientkey = '000000004000000000010000000001'::bpchar) AND (premiseskey = '000000004000000000030000000001'::bpchar))
 Total runtime: 0.80 msec
(4 rows)
Alasdair Young <ayoung@vigilos.com> writes:
> On Tue, 2006-04-11 at 14:18 -0400, Tom Lane wrote:
>> I'd bet that the problem is the "filter" on logicaldel --- is the value
>> of that heavily correlated with the index ordering?

> Removing the logicaldel seems to give the same results.

Hmm.  Maybe a whole lot of recently-dead row versions near the upper end
of the index range?

> (The archives seem to indicate the two queries should take roughly the
> same amount of time)

Yeah, the scan speed should be essentially the same in either direction,
I'd think.  I have to suppose that the backwards scan is fetching a
whole lot of rows that it ends up not returning.  Offhand the only
reasons I can think of for that are that the rows are not visible
according to the current MVCC snapshot, or because of a post-index
filter condition.

> Limit  (cost=0.00..74.84 rows=20 width=548) (actual
> time=19799.54..19799.95 rows=20 loops=1)
>    ->  Index Scan Backward using logtime_index on log
> (cost=0.00..6191056.91 rows=1654586 width=548) (actual
> time=19799.54..19799.92 rows=21 loops=1)
>          Index Cond: ((clientkey =
> '000000004000000000010000000001'::bpchar) AND (premiseskey =
> '000000004000000000030000000001'::bpchar))
>  Total runtime: 19800.03 msec
> (4 rows)

That's pretty spectacular.  There is no way that Postgres is only
fetching one row per second; it's got to be discarding a whole lot
of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
this table and see what it's got to say.

			regards, tom lane
> That's pretty spectacular.  There is no way that Postgres is only
> fetching one row per second; it's got to be discarding a whole lot
> of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
> this table and see what it's got to say.

vigprem=# vacuum verbose log;
INFO:  --Relation public.log--
INFO:  Pages 82731: Changed 0, Empty 0; Tup 1654586: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.70s/0.26u sec elapsed 10.63 sec.
VACUUM

Doesn't mean a lot to me. (I've only just started using postgres, coming
from an Oracle background.)

I'm going to go try to work out what possible post-index filter
conditions could be in place...

Thanks for your help. If you have any further ideas, please post them.
I'm convinced I can get this to go faster :)

- alasdair
Alasdair Young <ayoung@vigilos.com> writes:
>> That's pretty spectacular.  There is no way that Postgres is only
>> fetching one row per second; it's got to be discarding a whole lot
>> of rows under the hood.  It'd be useful to run VACUUM VERBOSE on
>> this table and see what it's got to say.

> vigprem=# vacuum verbose log;
> INFO:  --Relation public.log--
> INFO:  Pages 82731: Changed 0, Empty 0; Tup 1654586: Vac 0, Keep 0,
> UnUsed 0.
>         Total CPU 0.70s/0.26u sec elapsed 10.63 sec.
> VACUUM

[ looks again... ]  Hmm, I can give you one quick performance tip: use
something newer than PG 7.3.x.  VACUUM's output hasn't looked like that
in a long time.

Realizing that you're using such an old version, I now have another
theory: are there by any chance a whole lot of rows matching the WHERE
condition?  If so I think this is explained by a problem we fixed in
PG 8.0: the original coding for btree index search didn't handle lots of
equal keys very well.  IIRC, what it's doing is descending the search
tree to find the first entry matching the WHERE condition --- and then,
because you asked for a backwards scan, it sequentially steps forward
over all the matching keys to get in position to scan them backwards.
In 8.0 we made the tree descent logic smart enough to arrive at the end
of the run of matching keys to start with.

2003-12-20 20:23  tgl

	* src/: backend/access/nbtree/nbtinsert.c,
	backend/access/nbtree/nbtpage.c, backend/access/nbtree/nbtsearch.c,
	include/access/nbtree.h: Improve btree's initial-positioning-strategy
	code so that we never need to step more than one entry after
	descending the search tree to arrive at the correct place to start
	the scan.  This can improve the behavior substantially when there
	are many entries equal to the chosen boundary value.  Per suggestion
	from Dmitry Tkach, 14-Jul-03.

			regards, tom lane
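[The shape of this problem can be reproduced on a scratch table that has
one long run of identical leading key values. The sketch below is purely
illustrative: the table name, index name, and data are made up, and
generate_series() itself only exists from 8.0 onward, so on a 7.3 server
the filler rows would have to be loaded some other way:

    -- Hypothetical scratch table mimicking the thread's schema.
    CREATE TABLE log_demo (
        clientkey   char(30),
        premiseskey char(30),
        logtime     timestamp,
        logkey      char(30)
    );

    CREATE INDEX log_demo_idx
        ON log_demo (clientkey, premiseskey, logtime, logkey);

    -- One big run of identical (clientkey, premiseskey) values.
    INSERT INTO log_demo
    SELECT '000000004000000000010000000001',
           '000000004000000000030000000001',
           timestamp '2006-01-01' + (i || ' seconds')::interval,
           lpad(i::text, 30, '0')
    FROM generate_series(1, 1000000) AS g(i);

    ANALYZE log_demo;

    -- On a pre-8.0 server the backward scan first walks forward over
    -- the whole run of equal keys before returning anything; on 8.0+
    -- it starts at the end of the run and returns almost immediately.
    EXPLAIN ANALYZE SELECT * FROM log_demo
    WHERE clientkey = '000000004000000000010000000001'
      AND premiseskey = '000000004000000000030000000001'
    ORDER BY clientkey DESC, premiseskey DESC, logtime DESC, logkey DESC
    LIMIT 20;
]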
This is exactly what I needed to know. Thanks again :)

(Now I just need to get Postgres 8.0 running on a RedHat 7.2/7.3 box :)
(Compiling 8.0 from source seems to require a whole stack of things that
are missing from RH7.2.) Ah well :)

- alasdair
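[For the record, the source build itself is the standard configure/make
sequence; the usual blockers on a stock RH7.2 install are the readline
and zlib development headers, which configure will complain about. A
sketch only, with a made-up install prefix and an 8.0.x version picked
for illustration; --without-readline and --without-zlib are real
configure switches but make psql and pg_dump less pleasant to live with:

    # Hypothetical paths and version; adjust as needed.
    tar xzf postgresql-8.0.7.tar.gz
    cd postgresql-8.0.7
    ./configure --prefix=/usr/local/pgsql-8.0
    make
    make install
    /usr/local/pgsql-8.0/bin/initdb -D /usr/local/pgsql-8.0/data
]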
Oh: and as a followup, I ran the same query against the same data in 8.1
and it returned in 0.5 seconds. Looks like this bug/issue/feature in
pre-8.0 versions was to blame.

Thanks once again for your help.

- alasdair
"Alasdair Young" <ayoung@vigilos.com> writes: > (Now I just need to get Postgres 8.0 running on a RedHat 7.2/7.3 box :) Um ... I work for Red Hat, so I think I have to slip you the bad news twice today: no sane person would still be running that OS version either. 7.2 was obsolete about the time I went to work at Red Hat (in 2001). If you want something that was current last year and is probably pretty stable now, PG 8.0.x on Fedora Core 4 would be a good bet. Personally I'd recommend PG 8.1.x on FC5 if you are just looking at deploying now. It'll be much more maintainable a year or two out. regards, tom lane
> Um ... I work for Red Hat, so I think I have to slip you the bad news
> twice today: no sane person would still be running that OS version
> either.  7.2 was obsolete about the time I went to work at Red Hat
> (in 2001).

Sadly, some users are stuck. For instance, if you have Dialogic D240SC/T1
Rev 1 cards then the latest drivers you can get are the 5.x series, and
they are very picky about kernel versions, streams libs, etc. If you want
to build a supported system, you must start with Red Hat 7.2/7.3 and must,
among other detailed steps, apply specific kernel patches. Or you can go
through a similar set of steps on Windows. I'll take Linux - even an old
version.

Fortunately I have it on the authority of an Intel employee that they will
be open-sourcing the Dialogic drivers sometime in the next year, so over
time this problem will probably disappear.

Cheers,
Steve
I'm basically stuck with that as well: we run a custom kernel and a stack
of drivers that have no support past 7.2 (at least presently), so it looks
like I'll just need to work out a way of getting 8.0 to compile on RH7.2
(or patch Postgres 7.3 with the bugfix and recompile :)

None of those look like fun, so this issue will probably be added to our
stack of "reasons to upgrade to FC5" :)

- alasdair