Thread: The planner chooses seqscan+sort when there is an index on the sort column

The planner chooses seqscan+sort when there is an index on the sort column

From

Csaba Nagy

Date:

03 May 2006, 12:38:42

Hi all,

I wonder why this happens:

 - postgres: 8.1.3
 - the table has ~200 million rows;
 - there is a primary key on (col_1, col_2);
 - the table was ANALYZEd;
 - the planner chooses seqscan+sort for the following query even with
enable_seqscan=off:

select * from table order by col_1;

Isn't it supposed to choose the index scan at least when
enable_seqscan=off ? Even if it is indeed not faster to do the index
scan than seqscan+sort.

The actual plan looks like (names changed):

db=# set enable_seqscan = off;
SET
db=# explain select * from table order by col_1;
                                  QUERY PLAN
-----------------------------------------------------------------------------------------
 Sort  (cost=165585198.70..166102386.06 rows=206874944 width=60)
   Sort Key: col_1
   ->  Seq Scan on table  (cost=100000000.00..104552091.44
rows=206874944 width=60)
(3 rows)


db=# \d table
                                     Table "public.table"
     Column      |            Type             |
Modifiers
-----------------+-----------------------------+----------------------------------------------------
 col_1           | bigint                      | not null
 col_2           | bigint                      | not null
...
Indexes:
    "pk_table" PRIMARY KEY, btree (col_1, col_2)
...


Cheers,
Csaba.

Re: The planner chooses seqscan+sort when there is an index on the sort column

From

"John D. Burger"

Date:

03 May 2006, 12:49:14

Csaba Nagy wrote:

> select * from table order by col_1;
>
> Isn't it supposed to choose the index scan at least when
> enable_seqscan=off ? Even if it is indeed not faster to do the index
> scan than seqscan+sort.

I think because you've asked for every row, it's going to have to scan
the whole table anyway, to determine MVCC "liveness" of the rows
(sorry, dunno what the correct word is).

- John Burger
   MITRE

Re: The planner chooses seqscan+sort when there is an index on the sort column

From

Andreas Kretschmer

Date:

03 May 2006, 12:53:48

Csaba Nagy <nagy@ecircle-ag.com> schrieb:
> select * from table order by col_1;

Without a WHERE you get the whole table.

A Index-Scan is, in this case, expensive.


HTH, Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: The planner chooses seqscan+sort when there is an

From

Csaba Nagy

Date:

03 May 2006, 12:59:03

On Wed, 2006-05-03 at 17:48, John D. Burger wrote:
> Csaba Nagy wrote:
>
> > select * from table order by col_1;
> >
> > Isn't it supposed to choose the index scan at least when
> > enable_seqscan=off ? Even if it is indeed not faster to do the index
> > scan than seqscan+sort.
>
> I think because you've asked for every row, it's going to have to scan
> the whole table anyway, to determine MVCC "liveness" of the rows
> (sorry, dunno what the correct word is).

But I also asked for _ordered_ results, which the seq scan is not
covering, but the index does... and I specifically disabled sequential
scan. That means the planner is not even considering the primary key
index, and I would like to know why...

Actually this is a problem for me in a more complex query, which also
needs this table ordered by that column, and it results in the same plan
fragment with sequential scan + sort.

Thanks,
Csaba.

Re: The planner chooses seqscan+sort when there is an

From

Csaba Nagy

Date:

03 May 2006, 13:02:16

> Without a WHERE you get the whole table.
>
> A Index-Scan is, in this case, expensive.

I know that, and I would agree if I wouldn't have asked for _ordered_
result, and wouldn't have turned enable_seqscan=off.

In these conditions I would have expected an index scan, even if it is
more expensive than the sequential scan + sort. Actually I wanted to see
how expensive it thinks it is, but I can't get to that plan at all.

Thanks,
Csaba.

Re: The planner chooses seqscan+sort when there is an

From

"John D. Burger"

Date:

03 May 2006, 13:33:18

> But I also asked for _ordered_ results, which the seq scan is not
> covering, but the index does... and I specifically disabled sequential
> scan.

Docs say:

> Enables or disables the query planner's use of sequential scan plan
> types. It's not possible to suppress sequential scans entirely, but
> turning this variable off discourages the planner from using one if
> there are other methods available.

Note the second sentence.  Again, I think it will have to scan the
whole table anyway, because that's what you've asked for, and given
that, enable_seqscan=off doesn't apply.

- John Burger
   MITRE

Re: The planner chooses seqscan+sort when there is an

From

Csaba Nagy

Date:

03 May 2006, 13:42:00

> Docs say:
>
> > Enables or disables the query planner's use of sequential scan plan
> > types. It's not possible to suppress sequential scans entirely, but
> > turning this variable off discourages the planner from using one if
> > there are other methods available.
>
> Note the second sentence.  Again, I think it will have to scan the
> whole table anyway, because that's what you've asked for, and given
> that, enable_seqscan=off doesn't apply.

OK, maybe that's the point... the "cost bust" given to the sequential
scan by enable_seqscan=off is not enough in this case to exceed the cost
of the index scan ? The table is quite big, might be possible. I still
wonder why would be seqscan+sort faster than index scan... the sort will
for sure have to write to disk too given the size of the table...

Cheers,
Csaba.

Re: The planner chooses seqscan+sort when there is an

From

Martijn van Oosterhout

Date:

03 May 2006, 13:45:27

On Wed, May 03, 2006 at 06:42:00PM +0200, Csaba Nagy wrote:
> OK, maybe that's the point... the "cost bust" given to the sequential
> scan by enable_seqscan=off is not enough in this case to exceed the cost
> of the index scan ? The table is quite big, might be possible. I still
> wonder why would be seqscan+sort faster than index scan... the sort will
> for sure have to write to disk too given the size of the table...

Have you tuned the values of effective_cache_size and random_page_cost?
These have significant effects on index scans.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: The planner chooses seqscan+sort when there is an

From

"Florian G. Pflug"

Date:

03 May 2006, 15:21:50

Csaba Nagy wrote:
>> Docs say:
>>
>>> Enables or disables the query planner's use of sequential scan plan
>>> types. It's not possible to suppress sequential scans entirely, but
>>> turning this variable off discourages the planner from using one if
>>> there are other methods available.
>> Note the second sentence.  Again, I think it will have to scan the
>> whole table anyway, because that's what you've asked for, and given
>> that, enable_seqscan=off doesn't apply.
>
> OK, maybe that's the point... the "cost bust" given to the sequential
> scan by enable_seqscan=off is not enough in this case to exceed the cost
> of the index scan ? The table is quite big, might be possible. I still
> wonder why would be seqscan+sort faster than index scan... the sort will
> for sure have to write to disk too given the size of the table...

When using an indexscan, postgres will read the actual rows in
index-order, rathen then in the order they appear on-disk.
For 200 million rows this means doing at least 200 million
disk seeks. Assuming that each seek takes just 1ms, thats
still amount to 200.000 seconds.

greetings, Florian Pflug

Re: The planner chooses seqscan+sort when there is an

From

Tom Lane

Date:

03 May 2006, 15:34:32

Csaba Nagy <nagy@ecircle-ag.com> writes:
> OK, maybe that's the point... the "cost bust" given to the sequential
> scan by enable_seqscan=off is not enough in this case to exceed the cost
> of the index scan ?

Looks that way to me.  You could try setting enable_sort off as well,
which will penalize the seqscan+sort plan another 100million cost units.
And maybe try reducing random_page_cost to make the indexscan look
cheaper.  However, if there's a 100million delta between the two plans,
I suspect you really really don't want the indexscan anyway ;-)

            regards, tom lane

Re: The planner chooses seqscan+sort when there is an

From

Scott Marlowe

Date:

03 May 2006, 15:36:47

On Wed, 2006-05-03 at 13:34, Tom Lane wrote:
> Csaba Nagy <nagy@ecircle-ag.com> writes:
> > OK, maybe that's the point... the "cost bust" given to the sequential
> > scan by enable_seqscan=off is not enough in this case to exceed the cost
> > of the index scan ?
>
> Looks that way to me.  You could try setting enable_sort off as well,
> which will penalize the seqscan+sort plan another 100million cost units.
> And maybe try reducing random_page_cost to make the indexscan look
> cheaper.  However, if there's a 100million delta between the two plans,
> I suspect you really really don't want the indexscan anyway ;-)

I imagine the followup post:

So, I've had this query running for six weeks now, and...

Re: The planner chooses seqscan+sort when there is an

From

"A. Kretschmer"

Date:

03 May 2006, 15:45:54

am  03.05.2006, um 20:20:55 +0200 mailte Florian G. Pflug folgendes:
> >of the index scan ? The table is quite big, might be possible. I still
> >wonder why would be seqscan+sort faster than index scan... the sort will
> >for sure have to write to disk too given the size of the table...
>
> When using an indexscan, postgres will read the actual rows in index-order,
> rathen then in the order they appear on-disk.
> For 200 million rows this means doing at least 200 million
> disk seeks. Assuming that each seek takes just 1ms, thats
> still amount to 200.000 seconds.

Yepp, it is much cheaper to read the table seq and order later.


Andreas
--
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47215,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===    Schollglas Unternehmensgruppe    ===

Re: The planner chooses seqscan+sort when there is an

From

Csaba Nagy

Date:

04 May 2006, 05:09:46

> > Looks that way to me.  You could try setting enable_sort off as well,
> > which will penalize the seqscan+sort plan another 100million cost units.
> > And maybe try reducing random_page_cost to make the indexscan look
> > cheaper.  However, if there's a 100million delta between the two plans,
> > I suspect you really really don't want the indexscan anyway ;-)
>
> I imagine the followup post:
>
> So, I've had this query running for six weeks now, and...

Well, I guess that's it then... I will let the query run with the
seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6
hours ;-) Actually it would be nice to have some kind of feedback on
what is it doing so I can estimate how long it will still take... cause
I'm not sure the seqscan+sort won't run itself for 6 weeks...

Thanks,
Csaba.

Re: The planner chooses seqscan+sort when there is an

From

Martijn van Oosterhout

Date:

04 May 2006, 05:18:43

On Thu, May 04, 2006 at 10:09:33AM +0200, Csaba Nagy wrote:
> Well, I guess that's it then... I will let the query run with the
> seqscan+sort. It will still run 1-2 days, yesterday I stopped it after 6
> hours ;-) Actually it would be nice to have some kind of feedback on
> what is it doing so I can estimate how long it will still take... cause
> I'm not sure the seqscan+sort won't run itself for 6 weeks...

There are a number of ways you can (indirectly) see how far it has got.
If it's in the first phase (the seq scan), by looking at which file it
has open you should be able to see how far along the table it is. Once
it's in the sort stage you should be able to see from the tape files
approimately how far it is. As for the number of sort stages required,
that depends on the size of the data to be sorted, but you should be
able to estimate that...

At least, you should be able to tell the difference between a 6 hour and
6 day query.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: The planner chooses seqscan+sort when there is an

From

Csaba Nagy

Date:

04 May 2006, 05:36:49

> There are a number of ways you can (indirectly) see how far it has got.
> If it's in the first phase (the seq scan), by looking at which file it
> has open you should be able to see how far along the table it is. Once
> it's in the sort stage you should be able to see from the tape files
> approimately how far it is. As for the number of sort stages required,
> that depends on the size of the data to be sorted, but you should be
> able to estimate that...
>
> At least, you should be able to tell the difference between a 6 hour and
> 6 day query.

Thanks for the hints, I'll try to look up on the details...

Cheers,
Csaba.