Thread: Shouldn't the planner have a higher cost for reverse index scans?

From:
Josh Berkus
Date:

All,

I was looking at these IOZone results for some NAS hardware and thinking
about index scans:

Children see throughput for  6 readers         =   72270.04 KB/sec
Parent sees throughput for  6 readers         =   72269.06 KB/sec
Min throughput per process             =   11686.53 KB/sec
Max throughput per process             =   12506.65 KB/sec
Avg throughput per process             =   12045.01 KB/sec
Min xfer                     = 3919344.00 KB

Children see throughput for 6 reverse readers     =   17313.57 KB/sec
Parent sees throughput for 6 reverse readers     =   17313.52 KB/sec
Min throughput per process             =    2569.21 KB/sec
Max throughput per process             =    3101.18 KB/sec
Avg throughput per process             =    2885.60 KB/sec
Min xfer                     = 3474840.00 KB

Now, what that says to me is that for this system reverse sequential
reads are 1/4 the speed of forwards reads.  And from my testing
elsewhere, that seems fairly typical of disk systems in general.

Now, while index scans (for indexes on disk) aren't 100% sequential
reads, it seems like we should be increasing (substantially) the
estimated cost of reverse index scans if the index is likely to be on
disk.  No?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:
Tom Lane
Date:

Josh Berkus <> writes:
> Now, what that says to me is that for this system reverse sequential
> reads are 1/4 the speed of forwards reads.  And from my testing
> elsewhere, that seems fairly typical of disk systems in general.

Well, that's because filesystems try to lay out files so that logically
successive sectors are about as far apart as needed to support the
disk's maximum transfer rate.  If you fetch them in reverse order,
then instead of optimizing the rotational latency you find you are
pessimizing it.  This has got approximately nothing to do with
indexscans, either forward or reverse, because then we aren't fetching
blocks in a pre-optimized order.

> Now, while index scans (for indexes on disk) aren't 100% sequential
> reads, it seems like we should be increasing (substantially) the
> estimated cost of reverse index scans if the index is likely to be on
> disk.  No?

AFAICS this is already folded into random_page_cost.

            regards, tom lane

From:
Josh Berkus
Date:

Tom,

>> Now, while index scans (for indexes on disk) aren't 100% sequential
>> reads, it seems like we should be increasing (substantially) the
>> estimated cost of reverse index scans if the index is likely to be on
>> disk.  No?
>
> AFAICS this is already folded into random_page_cost.

Not as far as I can tell.   It looks to me like the planner is assuming
that a forwards index scan and a reverse index scan will have the same
cost.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:
Tom Lane
Date:

Josh Berkus <> writes:
> Not as far as I can tell.   It looks to me like the planner is assuming
> that a forwards index scan and a reverse index scan will have the same
> cost.

Right, because they do.  If you think otherwise, demonstrate it.
(bonnie tests approximating a reverse seqscan are not relevant
to the performance of indexscans.)

            regards, tom lane

From:
Josh Berkus
Date:

Tom,

> Right, because they do.  If you think otherwise, demonstrate it.
> (bonnie tests approximating a reverse seqscan are not relevant
> to the performance of indexscans.)

Working on it.  I *think* I've seen this issue in the field, which is
why I brought it up in the first place, but getting a good test case is,
of course, difficult.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:
Matthew Wakeling
Date:

On Fri, 10 Apr 2009, Tom Lane wrote:
> Josh Berkus <> writes:
>> Not as far as I can tell.   It looks to me like the planner is assuming
>> that a forwards index scan and a reverse index scan will have the same
>> cost.
>
> Right, because they do.  If you think otherwise, demonstrate it.

They do when the correlation of indexed value versus position in the table
is low, resulting in random access. However, when the correlation is near
1, then the index scan approximates to sequential access to disc. In that
case, scan direction would be important.

Of course, there's the separate issue that correlation isn't actually that
good a measure of the cost of an index scan, but I'm not sure what is
better, and feasible.

Matthew


--
 Our riverbanks and seashores have a beauty all can share, provided
 there's at least one boot, three treadless tyres, a half-eaten pork
 pie, some oil drums, an old felt hat, a lorry-load of tar blocks,
 and a broken bedstead there.                   -- Flanders and Swann

From:
Lists
Date:

Josh Berkus wrote:
Tom,

Right, because they do.  If you think otherwise, demonstrate it.
(bonnie tests approximating a reverse seqscan are not relevant
to the performance of indexscans.)

Working on it.  I *think* I've seen this issue in the field, which is why I brought it up in the first place, but getting a good test case is, of course, difficult.


I think I may be experiencing this situation now.

The query
select comment_date
    from user_comments
    where user_comments.uid=1
    order by comment_date desc limit 1

Explain:
"Limit  (cost=0.00..2699.07 rows=1 width=8) (actual time=52848.785..52848.787 rows=1 loops=1)"
"  ->  Index Scan Backward using idx_user_comments_comment_date on user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual time=52848.781..52848.781 rows=1 loops=1)"
"        Filter: (uid = 1)"
"Total runtime: 52848.840 ms"

takes 10's of seconds to complete (52 sec last run). However
select comment_date
    from user_comments
    where user_comments.uid=1
    order by comment_date limit 1

Explain:
"Limit  (cost=0.00..2699.07 rows=1 width=8) (actual time=70.402..70.403 rows=1 loops=1)"
"  ->  Index Scan using idx_user_comments_comment_date on user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual time=70.398..70.398 rows=1 loops=1)"
"        Filter: (uid = 1)"
"Total runtime: 70.453 ms"
takes well under 1 sec.


reply_date is a timestamp with time zone and has the index
CREATE INDEX idx_user_comments_comment_date
  ON user_comments
  USING btree
  (comment_date);

I don't understand why it is so much slower to scan it reverse

It's a fairly big table. About 4.4 million rows, 888MB. That index is 96MB. I tried dropping and recreating the index, but it doesn't seem to have helped any.


Can I create a reverse index on the dates so it can do a forward scan of the reverse index?
From:
Grzegorz Jaśkiewicz
Date:

create index foobar on table(row desc);

From:
Merlin Moncure
Date:

On Thu, Apr 16, 2009 at 2:02 AM, Lists <> wrote:
>
> Right, because they do.  If you think otherwise, demonstrate it.
> (bonnie tests approximating a reverse seqscan are not relevant
> to the performance of indexscans.)
>
> Working on it.  I *think* I've seen this issue in the field, which is why I
> brought it up in the first place, but getting a good test case is, of
> course, difficult.
>
>
> I think I may be experiencing this situation now.
>
> The query
>
> select comment_date
>     from user_comments
>     where user_comments.uid=1
>     order by comment_date desc limit 1

try this:
create index comment_data_uid_idx on user_comments(uid, comment_date);

select * from user_comments where (uid, comment_date) < (1, high_date)
  order by uid desc, comment_date desc limit 1;

select * from user_comments where (uid, comment_date) > (1, low_date)
  order by uid, comment_date limit 1;

low_date and high_date are arbitrarily chosen to be lower and higher
than the lowest and highest dates found in the table, respectively.
You will be amazed how much faster this is than what you are doing
now.  You will not need to make an index for the 'desc' case.

for ranges, (give me some comments for user x from now back to particular time:
set enable_seqscan = false;
select * from user_comments where (uid, comment_date)
  between(1, time_of_interest) and (1, high_date)
  order by uid desc, comment_date desc;

enable_seqscan is required because the server will suddenly and
spectacularly switch to sequential scans because it can't use the non
leftmost portion of the index in range queries (this only mainly
matters when the left-most field is inselective and the comparison is
equal).

merlin

From:
Tom Lane
Date:

Merlin Moncure <> writes:
> On Thu, Apr 16, 2009 at 2:02 AM, Lists <> wrote:
>> select comment_date
>>     from user_comments
>>     where user_comments.uid=1
>>     order by comment_date desc limit 1

> try this:
> create index comment_data_uid_idx on user_comments(uid, comment_date);

> select * from user_comments where (uid, comment_date) < (1, high_date)
>   order by uid desc, comment_date desc limit 1;

You don't really need to complicate your queries like that.  Having the
two-column index will suffice to make the given query work fine, at
least in reasonably modern PG versions (since 8.1 I think).

            regards, tom lane

From:
Tom Lane
Date:

Lists <> writes:
> The query

>     select comment_date
>         from user_comments
>         where user_comments.uid=1
>         order by comment_date desc limit 1

>     Explain:
>     "Limit  (cost=0.00..2699.07 rows=1 width=8) (actual
>     time=52848.785..52848.787 rows=1 loops=1)"
>     "  ->  Index Scan Backward using idx_user_comments_comment_date on
>     user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual
>     time=52848.781..52848.781 rows=1 loops=1)"
>     "        Filter: (uid = 1)"
>     "Total runtime: 52848.840 ms"

> takes 10's of seconds to complete (52 sec last run). However

>     select comment_date
>         from user_comments
>         where user_comments.uid=1
>         order by comment_date limit 1

>     Explain:
>     "Limit  (cost=0.00..2699.07 rows=1 width=8) (actual
>     time=70.402..70.403 rows=1 loops=1)"
>     "  ->  Index Scan using idx_user_comments_comment_date on
>     user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual
>     time=70.398..70.398 rows=1 loops=1)"
>     "        Filter: (uid = 1)"
>     "Total runtime: 70.453 ms"

> takes well under 1 sec.

AFAICS this is pure chance --- it is based on when we happen to hit the
first row with uid = 1 while scanning in forward or reverse comment_date
order.  Unless you have evidence that the number of rows skipped over
is similar in both cases, there is no reason to suppose that this
example bears on Josh's concern.

As noted by Merlin, if you're willing to create another index to help
this type of query, then a two-column index on (uid, comment_date) would
be ideal.

            regards, tom lane

From:
Lists
Date:

Tom Lane wrote:
Lists <> writes: 
The query   
 
    select comment_date       from user_comments       where user_comments.uid=1       order by comment_date desc limit 1   
 
    Explain:   "Limit  (cost=0.00..2699.07 rows=1 width=8) (actual   time=52848.785..52848.787 rows=1 loops=1)"   "  ->  Index Scan Backward using idx_user_comments_comment_date on   user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual   time=52848.781..52848.781 rows=1 loops=1)"   "        Filter: (uid = 1)"   "Total runtime: 52848.840 ms"   
 
takes 10's of seconds to complete (52 sec last run). However   
 
    select comment_date       from user_comments       where user_comments.uid=1       order by comment_date limit 1   
 
    Explain:   "Limit  (cost=0.00..2699.07 rows=1 width=8) (actual   time=70.402..70.403 rows=1 loops=1)"   "  ->  Index Scan using idx_user_comments_comment_date on   user_comments  (cost=0.00..5789515.40 rows=2145 width=8) (actual   time=70.398..70.398 rows=1 loops=1)"   "        Filter: (uid = 1)"   "Total runtime: 70.453 ms"   
 
takes well under 1 sec.   
AFAICS this is pure chance --- it is based on when we happen to hit the
first row with uid = 1 while scanning in forward or reverse comment_date
order.  Unless you have evidence that the number of rows skipped over
is similar in both cases, there is no reason to suppose that this
example bears on Josh's concern.

As noted by Merlin, if you're willing to create another index to help
this type of query, then a two-column index on (uid, comment_date) would
be ideal.
		regards, tom lane 

Thank you Tom and Merlin (and Grzegorz for the answer to my other question I no longer need). The composite index seems to do the trick. The reverse index scan is now taking about the same time.

Rows with uid=1 should be spread throughout the table but there should be a larger amount earlier in the table (based on insert order).

I already had a separate index on uid
CREATE INDEX idx_user_comments_uid
  ON user_comments
  USING btree
  (uid);
Under the circumstances, shouldn't a bitmap of those 2 indexes be far faster than using just the date index (compared to the old plan, not the new composite index). Why would the planner not choose that plan?
From:
Tom Lane
Date:

Lists <> writes:
> I already had a separate index on uid

>     CREATE INDEX idx_user_comments_uid
>       ON user_comments
>       USING btree
>       (uid);

> Under the circumstances, shouldn't a bitmap of those 2 indexes be far
> faster than using just the date index (compared to the old plan, not the
> new composite index). Why would the planner not choose that plan?

It wouldn't produce sorted output; you'd have to read all the rows with
uid 1 and then sort them to find the lowest [highest] comment_date.
I'm sure the planner did consider that, but guessed that the other way
would win on average.  The fact that you have lots of rows with uid=1
would tend to push its cost estimates in that direction.  Unfortunately
it doesn't have any clue that the rows with uid=1 are concentrated in
older comment_dates, making the approach a loser for the highest-date
flavor of the problem.

            regards, tom lane