Thread: The planner chooses seqscan+sort when there is an index on the sort column
Hi all, I wonder why this happens: - postgres: 8.1.3 - the table has ~200 million rows; - there is a primary key on (col_1, col_2); - the table was ANALYZEd; - the planner chooses seqscan+sort for the following query even with enable_seqscan=off: select * from table order by col_1; Isn't it supposed to choose the index scan at least when enable_seqscan=off ? Even if it is indeed not faster to do the index scan than seqscan+sort. The actual plan looks like (names changed): db=# set enable_seqscan = off; SET db=# explain select * from table order by col_1; QUERY PLAN ----------------------------------------------------------------------------------------- Sort (cost=165585198.70..166102386.06 rows=206874944 width=60) Sort Key: col_1 -> Seq Scan on table (cost=100000000.00..104552091.44 rows=206874944 width=60) (3 rows) db=# \d table Table "public.table" Column | Type | Modifiers -----------------+-----------------------------+---------------------------------------------------- col_1 | bigint | not null col_2 | bigint | not null ... Indexes: "pk_table" PRIMARY KEY, btree (col_1, col_2) ... Cheers, Csaba.
Re: The planner chooses seqscan+sort when there is an index on the sort column
From
"John D. Burger"
Date:
Csaba Nagy wrote: > select * from table order by col_1; > > Isn't it supposed to choose the index scan at least when > enable_seqscan=off ? Even if it is indeed not faster to do the index > scan than seqscan+sort. I think because you've asked for every row, it's going to have to scan the whole table anyway, to determine MVCC "liveness" of the rows (sorry, dunno what the correct word is). - John Burger MITRE
Re: The planner chooses seqscan+sort when there is an index on the sort column
From
Andreas Kretschmer
Date:
Csaba Nagy <nagy@ecircle-ag.com> schrieb: > select * from table order by col_1; Without a WHERE you get the whole table. A Index-Scan is, in this case, expensive. HTH, Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Wed, 2006-05-03 at 17:48, John D. Burger wrote: > Csaba Nagy wrote: > > > select * from table order by col_1; > > > > Isn't it supposed to choose the index scan at least when > > enable_seqscan=off ? Even if it is indeed not faster to do the index > > scan than seqscan+sort. > > I think because you've asked for every row, it's going to have to scan > the whole table anyway, to determine MVCC "liveness" of the rows > (sorry, dunno what the correct word is). But I also asked for _ordered_ results, which the seq scan is not covering, but the index does... and I specifically disabled sequential scan. That means the planner is not even considering the primary key index, and I would like to know why... Actually this is a problem for me in a more complex query, which also needs this table ordered by that column, and it results in the same plan fragment with sequential scan + sort. Thanks, Csaba.
> Without a WHERE you get the whole table. > > A Index-Scan is, in this case, expensive. I know that, and I would agree if I wouldn't have asked for _ordered_ result, and wouldn't have turned enable_seqscan=off. In these conditions I would have expected an index scan, even if it is more expensive than the sequential scan + sort. Actually I wanted to see how expensive it thinks it is, but I can't get to that plan at all. Thanks, Csaba.
> But I also asked for _ordered_ results, which the seq scan is not > covering, but the index does... and I specifically disabled sequential > scan. Docs say: > Enables or disables the query planner's use of sequential scan plan > types. It's not possible to suppress sequential scans entirely, but > turning this variable off discourages the planner from using one if > there are other methods available. Note the second sentence. Again, I think it will have to scan the whole table anyway, because that's what you've asked for, and given that, enable_seqscan=off doesn't apply. - John Burger MITRE
> Docs say: > > > Enables or disables the query planner's use of sequential scan plan > > types. It's not possible to suppress sequential scans entirely, but > > turning this variable off discourages the planner from using one if > > there are other methods available. > > Note the second sentence. Again, I think it will have to scan the > whole table anyway, because that's what you've asked for, and given > that, enable_seqscan=off doesn't apply. OK, maybe that's the point... the "cost bust" given to the sequential scan by enable_seqscan=off is not enough in this case to exceed the cost of the index scan ? The table is quite big, might be possible. I still wonder why would be seqscan+sort faster than index scan... the sort will for sure have to write to disk too given the size of the table... Cheers, Csaba.
On Wed, May 03, 2006 at 06:42:00PM +0200, Csaba Nagy wrote: > OK, maybe that's the point... the "cost bust" given to the sequential > scan by enable_seqscan=off is not enough in this case to exceed the cost > of the index scan ? The table is quite big, might be possible. I still > wonder why would be seqscan+sort faster than index scan... the sort will > for sure have to write to disk too given the size of the table... Have you tuned the values of effective_cache_size and random_page_cost? These have significant effects on index scans. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Csaba Nagy wrote: >> Docs say: >> >>> Enables or disables the query planner's use of sequential scan plan >>> types. It's not possible to suppress sequential scans entirely, but >>> turning this variable off discourages the planner from using one if >>> there are other methods available. >> Note the second sentence. Again, I think it will have to scan the >> whole table anyway, because that's what you've asked for, and given >> that, enable_seqscan=off doesn't apply. > > OK, maybe that's the point... the "cost bust" given to the sequential > scan by enable_seqscan=off is not enough in this case to exceed the cost > of the index scan ? The table is quite big, might be possible. I still > wonder why would be seqscan+sort faster than index scan... the sort will > for sure have to write to disk too given the size of the table... When using an indexscan, postgres will read the actual rows in index-order, rathen then in the order they appear on-disk. For 200 million rows this means doing at least 200 million disk seeks. Assuming that each seek takes just 1ms, thats still amount to 200.000 seconds. greetings, Florian Pflug
Csaba Nagy <nagy@ecircle-ag.com> writes: > OK, maybe that's the point... the "cost bust" given to the sequential > scan by enable_seqscan=off is not enough in this case to exceed the cost > of the index scan ? Looks that way to me. You could try setting enable_sort off as well, which will penalize the seqscan+sort plan another 100million cost units. And maybe try reducing random_page_cost to make the indexscan look cheaper. However, if there's a 100million delta between the two plans, I suspect you really really don't want the indexscan anyway ;-) regards, tom lane
On Wed, 2006-05-03 at 13:34, Tom Lane wrote: > Csaba Nagy <nagy@ecircle-ag.com> writes: > > OK, maybe that's the point... the "cost bust" given to the sequential > > scan by enable_seqscan=off is not enough in this case to exceed the cost > > of the index scan ? > > Looks that way to me. You could try setting enable_sort off as well, > which will penalize the seqscan+sort plan another 100million cost units. > And maybe try reducing random_page_cost to make the indexscan look > cheaper. However, if there's a 100million delta between the two plans, > I suspect you really really don't want the indexscan anyway ;-) I imagine the followup post: So, I've had this query running for six weeks now, and...
am 03.05.2006, um 20:20:55 +0200 mailte Florian G. Pflug folgendes: > >of the index scan ? The table is quite big, might be possible. I still > >wonder why would be seqscan+sort faster than index scan... the sort will > >for sure have to write to disk too given the size of the table... > > When using an indexscan, postgres will read the actual rows in index-order, > rathen then in the order they appear on-disk. > For 200 million rows this means doing at least 200 million > disk seeks. Assuming that each seek takes just 1ms, thats > still amount to 200.000 seconds. Yepp, it is much cheaper to read the table seq and order later. Andreas -- Andreas Kretschmer (Kontakt: siehe Header) Heynitz: 035242/47215, D1: 0160/7141639 GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net === Schollglas Unternehmensgruppe ===
> > Looks that way to me. You could try setting enable_sort off as well, > > which will penalize the seqscan+sort plan another 100million cost units. > > And maybe try reducing random_page_cost to make the indexscan look > > cheaper. However, if there's a 100million delta between the two plans, > > I suspect you really really don't want the indexscan anyway ;-) > > I imagine the followup post: > > So, I've had this query running for six weeks now, and... Well, I guess that's it then... I will let the query run with the seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6 hours ;-) Actually it would be nice to have some kind of feedback on what is it doing so I can estimate how long it will still take... cause I'm not sure the seqscan+sort won't run itself for 6 weeks... Thanks, Csaba.
On Thu, May 04, 2006 at 10:09:33AM +0200, Csaba Nagy wrote: > Well, I guess that's it then... I will let the query run with the > seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6 > hours ;-) Actually it would be nice to have some kind of feedback on > what is it doing so I can estimate how long it will still take... cause > I'm not sure the seqscan+sort won't run itself for 6 weeks... There are a number of ways you can (indirectly) see how far it has got. If it's in the first phase (the seq scan), by looking at which file it has open you should be able to see how far along the table it is. Once it's in the sort stage you should be able to see from the tape files approimately how far it is. As for the number of sort stages required, that depends on the size of the data to be sorted, but you should be able to estimate that... At least, you should be able to tell the difference between a 6 hour and 6 day query. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
> There are a number of ways you can (indirectly) see how far it has got. > If it's in the first phase (the seq scan), by looking at which file it > has open you should be able to see how far along the table it is. Once > it's in the sort stage you should be able to see from the tape files > approimately how far it is. As for the number of sort stages required, > that depends on the size of the data to be sorted, but you should be > able to estimate that... > > At least, you should be able to tell the difference between a 6 hour and > 6 day query. Thanks for the hints, I'll try to look up on the details... Cheers, Csaba.