Thread: Shouldn't the planner have a higher cost for reverse index scans?
All, I was looking at these IOZone results for some NAS hardware and thinking about index scans:

Children see throughput for 6 readers         =   72270.04 KB/sec
Parent sees throughput for 6 readers          =   72269.06 KB/sec
Min throughput per process                    =   11686.53 KB/sec
Max throughput per process                    =   12506.65 KB/sec
Avg throughput per process                    =   12045.01 KB/sec
Min xfer                                      = 3919344.00 KB

Children see throughput for 6 reverse readers =   17313.57 KB/sec
Parent sees throughput for 6 reverse readers  =   17313.52 KB/sec
Min throughput per process                    =    2569.21 KB/sec
Max throughput per process                    =    3101.18 KB/sec
Avg throughput per process                    =    2885.60 KB/sec
Min xfer                                      = 3474840.00 KB

Now, what that says to me is that for this system reverse sequential reads are 1/4 the speed of forwards reads. And from my testing elsewhere, that seems fairly typical of disk systems in general.

Now, while index scans (for indexes on disk) aren't 100% sequential reads, it seems like we should be increasing (substantially) the estimated cost of reverse index scans if the index is likely to be on disk. No?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
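[Editorial note: the "1/4 the speed" claim can be checked directly against the aggregate figures quoted above; a trivial arithmetic sketch, using only numbers from the post.]

```python
# Aggregate throughputs copied from the IOZone output quoted above (KB/sec).
forward_kb_s = 72270.04   # "Children see throughput for 6 readers"
reverse_kb_s = 17313.57   # "Children see throughput for 6 reverse readers"

# Reverse sequential reads run at roughly a quarter of forward speed.
ratio = reverse_kb_s / forward_kb_s
print(f"reverse/forward throughput ratio: {ratio:.2f}")
```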
Josh Berkus <josh@agliodbs.com> writes:
> Now, what that says to me is that for this system reverse sequential
> reads are 1/4 the speed of forwards reads. And from my testing
> elsewhere, that seems fairly typical of disk systems in general.

Well, that's because filesystems try to lay out files so that logically successive sectors are about as far apart as needed to support the disk's maximum transfer rate. If you fetch them in reverse order, then instead of optimizing the rotational latency you find you are pessimizing it. This has got approximately nothing to do with indexscans, either forward or reverse, because then we aren't fetching blocks in a pre-optimized order.

> Now, while index scans (for indexes on disk) aren't 100% sequential
> reads, it seems like we should be increasing (substantially) the
> estimated cost of reverse index scans if the index is likely to be on
> disk. No?

AFAICS this is already folded into random_page_cost.

			regards, tom lane
Tom,

>> Now, while index scans (for indexes on disk) aren't 100% sequential
>> reads, it seems like we should be increasing (substantially) the
>> estimated cost of reverse index scans if the index is likely to be on
>> disk. No?
>
> AFAICS this is already folded into random_page_cost.

Not as far as I can tell. It looks to me like the planner is assuming that a forwards index scan and a reverse index scan will have the same cost.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Not as far as I can tell. It looks to me like the planner is assuming
> that a forwards index scan and a reverse index scan will have the same
> cost.

Right, because they do. If you think otherwise, demonstrate it. (bonnie tests approximating a reverse seqscan are not relevant to the performance of indexscans.)

			regards, tom lane
Tom,

> Right, because they do. If you think otherwise, demonstrate it.
> (bonnie tests approximating a reverse seqscan are not relevant
> to the performance of indexscans.)

Working on it. I *think* I've seen this issue in the field, which is why I brought it up in the first place, but getting a good test case is, of course, difficult.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On Fri, 10 Apr 2009, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Not as far as I can tell. It looks to me like the planner is assuming
>> that a forwards index scan and a reverse index scan will have the same
>> cost.
>
> Right, because they do. If you think otherwise, demonstrate it.

They do when the correlation of indexed value versus position in the table is low, resulting in random access. However, when the correlation is near 1, then the index scan approximates to sequential access to disc. In that case, scan direction would be important.

Of course, there's the separate issue that correlation isn't actually that good a measure of the cost of an index scan, but I'm not sure what is better, and feasible.

Matthew

--
Our riverbanks and seashores have a beauty all can share, provided
there's at least one boot, three treadless tyres, a half-eaten pork
pie, some oil drums, an old felt hat, a lorry-load of tar blocks,
and a broken bedstead there.                  -- Flanders and Swann
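[Editorial note: Matthew's correlation point can be illustrated with a toy model. This is pure Python with synthetic "heap positions", not anything PostgreSQL actually computes; the jump count is only a crude stand-in for the seeks an index scan would trigger.]

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(42)
n = 10_000
positions = list(range(n))

# Heap position of each row when visited in index (key) order.
clustered = positions[:]                  # key order matches heap order
scattered = random.sample(positions, n)   # key order unrelated to heap order

def jumps(order):
    # Non-adjacent moves between consecutively visited heap positions.
    return sum(1 for a, b in zip(order, order[1:]) if b != a + 1)

corr_hi = pearson(positions, clustered)   # near 1: scan is effectively sequential
corr_lo = pearson(positions, scattered)   # near 0: scan is effectively random
print(corr_hi, jumps(clustered))
print(round(corr_lo, 3), jumps(scattered))
```

With correlation near 1 the scan makes essentially zero seeks in one direction, which is exactly the case where forward versus backward could plausibly matter.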
> Tom,
>
>> Right, because they do. If you think otherwise, demonstrate it.
>> (bonnie tests approximating a reverse seqscan are not relevant
>> to the performance of indexscans.)
>
> Working on it. I *think* I've seen this issue in the field, which is why
> I brought it up in the first place, but getting a good test case is, of
> course, difficult.

I think I may be experiencing this situation now.

The query

select comment_date
from user_comments
where user_comments.uid=1
order by comment_date desc limit 1

takes 10's of seconds to complete (52 sec last run).

Explain:
"Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=52848.785..52848.787 rows=1 loops=1)"
" -> Index Scan Backward using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=52848.781..52848.781 rows=1 loops=1)"
" Filter: (uid = 1)"
"Total runtime: 52848.840 ms"

However

select comment_date
from user_comments
where user_comments.uid=1
order by comment_date limit 1

takes well under 1 sec.

Explain:
"Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=70.402..70.403 rows=1 loops=1)"
" -> Index Scan using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=70.398..70.398 rows=1 loops=1)"
" Filter: (uid = 1)"
"Total runtime: 70.453 ms"

comment_date is a timestamp with time zone and has the index

CREATE INDEX idx_user_comments_comment_date
ON user_comments
USING btree
(comment_date);

I don't understand why it is so much slower to scan it in reverse. It's a fairly big table: about 4.4 million rows, 888MB. That index is 96MB. I tried dropping and recreating the index, but it doesn't seem to have helped any.

Can I create a reverse index on the dates so it can do a forward scan of the reverse index?
Grzegorz Jaśkiewicz wrote:
create index foobar on table(row desc);
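[Editorial note: for anyone wanting to try the DESC-index idea without a PostgreSQL instance handy, here is a minimal sketch using Python's bundled SQLite, which accepts the same DESC syntax; the table and index names are made up, and SQLite's planner details differ from PostgreSQL's.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# A descending index stores keys high-to-low, so "ORDER BY d DESC"
# becomes a *forward* walk of the index.
conn.execute("CREATE INDEX t_d_desc ON t (d DESC)")

plan = " ".join(str(row) for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT d FROM t ORDER BY d DESC LIMIT 1"))
newest = conn.execute("SELECT d FROM t ORDER BY d DESC LIMIT 1").fetchone()[0]
print(plan)
print(newest)
```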
On Thu, Apr 16, 2009 at 2:02 AM, Lists <lists@on-track.ca> wrote:
> I think I may be experiencing this situation now.
>
> The query
>
> select comment_date
> from user_comments
> where user_comments.uid=1
> order by comment_date desc limit 1

try this:

create index comment_data_uid_idx on user_comments(uid, comment_date);

select * from user_comments where (uid, comment_date) < (1, high_date)
order by uid desc, comment_date desc limit 1;

select * from user_comments where (uid, comment_date) > (1, low_date)
order by uid, comment_date limit 1;

low_date and high_date are arbitrarily chosen to be lower and higher than the lowest and highest dates found in the table, respectively. You will be amazed how much faster this is than what you are doing now. You will not need to make an index for the 'desc' case.

For ranges (give me some comments for user x from now back to a particular time):

set enable_seqscan = false;
select * from user_comments
where (uid, comment_date) between (1, time_of_interest) and (1, high_date)
order by uid desc, comment_date desc;

enable_seqscan is required because the server will suddenly and spectacularly switch to sequential scans, because it can't use the non-leftmost portion of the index in range queries (this mainly matters when the left-most field is unselective and the comparison is equality).

merlin
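[Editorial note: Merlin's composite-index suggestion is easy to sanity-check in miniature. The sketch below uses Python's bundled SQLite rather than PostgreSQL, with synthetic data; the point it illustrates is that an index on (uid, comment_date) keeps each uid's entries contiguous and date-sorted, so the first and last date for a uid are each one probe away in either direction.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_comments (uid INTEGER, comment_date INTEGER)")
# Synthetic data: five uids interleaved over 10,000 integer "dates".
conn.executemany("INSERT INTO user_comments VALUES (?, ?)",
                 [(1 + d % 5, d) for d in range(10_000)])
conn.execute("CREATE INDEX comment_data_uid_idx "
             "ON user_comments (uid, comment_date)")

# Within the index, all uid=1 entries are adjacent and already ordered by
# comment_date, so no unrelated rows are skipped in either scan direction.
newest = conn.execute(
    "SELECT comment_date FROM user_comments "
    "WHERE uid = 1 ORDER BY comment_date DESC LIMIT 1").fetchone()[0]
oldest = conn.execute(
    "SELECT comment_date FROM user_comments "
    "WHERE uid = 1 ORDER BY comment_date LIMIT 1").fetchone()[0]
print(oldest, newest)
```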
Merlin Moncure <mmoncure@gmail.com> writes:
> On Thu, Apr 16, 2009 at 2:02 AM, Lists <lists@on-track.ca> wrote:
>> select comment_date
>> from user_comments
>> where user_comments.uid=1
>> order by comment_date desc limit 1
>
> try this:
> create index comment_data_uid_idx on user_comments(uid, comment_date);
> select * from user_comments where (uid, comment_date) < (1, high_date)
> order by uid desc, comment_date desc limit 1;

You don't really need to complicate your queries like that. Having the two-column index will suffice to make the given query work fine, at least in reasonably modern PG versions (since 8.1 I think).

			regards, tom lane
Lists <lists@on-track.ca> writes:
> The query
>
> select comment_date
> from user_comments
> where user_comments.uid=1
> order by comment_date desc limit 1
>
> takes 10's of seconds to complete (52 sec last run).
>
> "Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=52848.785..52848.787 rows=1 loops=1)"
> " -> Index Scan Backward using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=52848.781..52848.781 rows=1 loops=1)"
> " Filter: (uid = 1)"
> "Total runtime: 52848.840 ms"
>
> However
>
> select comment_date
> from user_comments
> where user_comments.uid=1
> order by comment_date limit 1
>
> takes well under 1 sec.
>
> "Limit (cost=0.00..2699.07 rows=1 width=8) (actual time=70.402..70.403 rows=1 loops=1)"
> " -> Index Scan using idx_user_comments_comment_date on user_comments (cost=0.00..5789515.40 rows=2145 width=8) (actual time=70.398..70.398 rows=1 loops=1)"
> " Filter: (uid = 1)"
> "Total runtime: 70.453 ms"

AFAICS this is pure chance --- it is based on when we happen to hit the first row with uid = 1 while scanning in forward or reverse comment_date order. Unless you have evidence that the number of rows skipped over is similar in both cases, there is no reason to suppose that this example bears on Josh's concern.

As noted by Merlin, if you're willing to create another index to help this type of query, then a two-column index on (uid, comment_date) would be ideal.

			regards, tom lane
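[Editorial note: Tom's "where does the first uid=1 row happen to sit" point can be made concrete with a toy model. Pure Python; the row distribution below is an assumption chosen to mimic the poster's later description of his data, with uid=1 rows dense among the oldest dates and sparse afterwards.]

```python
n = 100_000

# Synthetic table in comment_date order: uid=1 is common among the oldest
# 10% of rows and rare after that (an assumed distribution, for illustration).
def uid_at(i):
    if i < n // 10:
        return 1 if i % 10 == 3 else 0
    return 1 if i % 2000 == 0 else 0

uids = [uid_at(i) for i in range(n)]

# Rows an index scan on comment_date must skip before its first uid=1 hit.
fwd_skipped = next(i for i, u in enumerate(uids) if u == 1)
bwd_skipped = next(i for i, u in enumerate(reversed(uids)) if u == 1)
print(fwd_skipped, bwd_skipped)  # forward stops almost immediately; backward does not
```

The hundreds-fold gap in rows skipped mirrors the 52848 ms vs. 70 ms gap in the posted EXPLAIN output, without any difference in the per-row cost of forward and backward scanning.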
Thank you Tom and Merlin (and Grzegorz for the answer to my other question I no longer need). The composite index seems to do the trick. The reverse index scan is now taking about the same time.
Rows with uid=1 should be spread throughout the table but there should be a larger amount earlier in the table (based on insert order).
I already had a separate index on uid
CREATE INDEX idx_user_comments_uid
ON user_comments
USING btree
(uid);

Under the circumstances, shouldn't a bitmap of those 2 indexes be far faster than using just the date index (compared to the old plan, not the new composite index)? Why would the planner not choose that plan?
Lists <lists@on-track.ca> writes:
> I already had a separate index on uid
>
> CREATE INDEX idx_user_comments_uid
> ON user_comments
> USING btree
> (uid);
>
> Under the circumstances, shouldn't a bitmap of those 2 indexes be far
> faster than using just the date index (compared to the old plan, not the
> new composite index). Why would the planner not choose that plan?

It wouldn't produce sorted output; you'd have to read all the rows with uid 1 and then sort them to find the lowest [highest] comment_date. I'm sure the planner did consider that, but guessed that the other way would win on average. The fact that you have lots of rows with uid=1 would tend to push its cost estimates in that direction. Unfortunately it doesn't have any clue that the rows with uid=1 are concentrated in older comment_dates, making the approach a loser for the highest-date flavor of the problem.

			regards, tom lane
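[Editorial note: a back-of-the-envelope model of Tom's argument. Pure Python with a uniform synthetic layout; it only counts rows touched, ignoring heap-fetch patterns and sort costs, so it is a sketch of the planner's average-case reasoning rather than a cost model.]

```python
n = 100_000
# Synthetic table in comment_date order; uid=1 rows spread uniformly.
rows = [(1 if i % 50 == 0 else 2, i) for i in range(n)]  # (uid, comment_date)

# Bitmap-style plan: a bitmap yields no ordering, so every uid=1 row must be
# fetched and the dates sorted before the maximum can be taken.
matches = sorted(d for u, d in rows if u == 1)
bitmap_rows = len(matches)
latest_bitmap = matches[-1]

# Backward scan of the date index: walk dates high-to-low, stop at first hit.
scanned = 0
for u, d in reversed(rows):
    scanned += 1
    if u == 1:
        latest_scan = d
        break

print(bitmap_rows, scanned)
```

With this uniform layout the backward scan touches far fewer rows than the bitmap approach, which is why the planner's average-case guess favors it; the poster's skewed distribution is exactly the case where that guess goes wrong.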