Thread: Understanding sequential versus index scans.

Understanding sequential versus index scans.

From

Robert James

Date:

19 July 2009, 19:58:32

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why is this so? And how can I shut this off?

select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) -- this is my X above
OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual time=16.635..28.580 rows=8 loops=1)
Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019 rows=2 loops=1)
Total runtime: 28.658 ms

(Using just X or Y alone uses the index, and completes in 0.150 ms)

Is this a bug?

Re: Understanding sequential versus index scans.

From

Robert James

Date:

19 July 2009, 20:01:00

PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)"

On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames@gmail.com> wrote:

Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why is this so? And how can I shut this off?
select * from dict
where
word in (select substr('moon', 0, generate_series(3,length('moon')))) -- this is my X above
OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual time=16.635..28.580 rows=8 loops=1)
Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019 rows=2 loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)
Is this a bug?

Re: Understanding sequential versus index scans.

From

Greg Stark

Date:

19 July 2009, 20:27:56

On Sun, Jul 19, 2009 at 11:59 PM, Robert James<srobertjames@gmail.com> wrote:
> PS Running "PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)"
>
> On Sun, Jul 19, 2009 at 6:58 PM, Robert James <srobertjames@gmail.com>
> wrote:
>>
>> Hi.  I notice that when I do a WHERE x, Postgres uses an index, and when I
>> do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't.  Why
>> is this so? And how can I shut this off?
>> select * from dict
>> where
>>  word in (select substr('moon', 0, generate_series(3,length('moon')))) --
>> this is my X above
>>  OR word like 'moon%' -- this is my Y above
>> Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
>> time=16.635..28.580 rows=8 loops=1)
>>  Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
>>  SubPlan
>>  -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019
>> rows=2 loops=1)
>> Total runtime: 28.658 ms
>> (Using just X or Y alone uses the index, and completes in 0.150 ms)
>> Is this a bug?

Well there are known bugs in 8.2.1 -- that's why the current 8.2
release is 8.2.13.

The next step here is to set enable_seqscan=off and run explain
analyze again. You may have to adjust some costs to sync the estimated
cost with actual run-time.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Understanding sequential versus index scans.

From

Tom Lane

Date:

19 July 2009, 20:47:54

Robert James <srobertjames@gmail.com> writes:
> Hi.  I notice that when I do a WHERE x, Postgres uses an index, and when I
> do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

> select * from dict
> where
>  word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> this is my X above
>  OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary.  You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

            regards, tom lane

Re: Understanding sequential versus index scans.

From

John R Pierce

Date:

19 July 2009, 20:50:59

Robert James wrote:
> Hi.  I notice that when I do a WHERE x, Postgres uses an index, and
> when I do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't.  Why is this so? And how can I shut this off?

maybe its because you have no index on (X OR Y) ?  or maybe because the
analyzer thinks that X or Y includes enough rows that a sequential scan
is more effective ?

Re: Understanding sequential versus index scans.

From

Robert James

Date:

19 July 2009, 21:11:04

UNION was better, but still 5 times as slow as either query done individually.

set enable_seqscan=off didn't help at all - it was totally ignored

Is there anything else I can do?

On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert James <srobertjames@gmail.com> writes:
> Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
> do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

> select * from dict
> where
> word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> this is my X above
> OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane

Re: Understanding sequential versus index scans.

From

Robert James

Date:

19 July 2009, 21:59:26

Is there anyway to tell Postgres "Run these two queries, and union their results, but don't change the plan as to a UNION - just run them separately"?

Something seems funny to me that running a UNION should be twice as slow as running the two queries one after the other.

On Sun, Jul 19, 2009 at 8:10 PM, Robert James <srobertjames@gmail.com> wrote:

UNION was better, but still 5 times as slow as either query done individually.
set enable_seqscan=off didn't help at all - it was totally ignored
Is there anything else I can do?

On Sun, Jul 19, 2009 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert James <srobertjames@gmail.com> writes:
> Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
> do WHERE y, it does so as well, but when I do WHERE x OR y, it
> doesn't.

It can use indexes for OR conditions, but not for arbitrary OR
conditions...

> select * from dict
> where
> word in (select substr('moon', 0, generate_series(3,length('moon')))) --
> this is my X above
> OR word like 'moon%' -- this is my Y above

... and that one is pretty arbitrary. You might have some luck with
using a UNION instead, viz

select * from dict where X
union all
select * from dict where Y

regards, tom lane

Re: Understanding sequential versus index scans.

From

Scott Marlowe

Date:

19 July 2009, 22:05:15

On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames@gmail.com> wrote:
> UNION was better, but still 5 times as slow as either query done
> individually.
> set enable_seqscan=off didn't help at all - it was totally ignored
> Is there anything else I can do?

Did you try union, or union all?

Re: Understanding sequential versus index scans.

From

Robert James

Date:

20 July 2009, 10:22:33

Yes, I had done UNION. UNION ALL achives the expected plan and speed! Thank you!

BTW, this is interesting, because there are only about 5 or 6 rows max returned from both queries - but I guess the planner expects more and hence changes the plan to remove duplicates.

On Sun, Jul 19, 2009 at 9:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Sun, Jul 19, 2009 at 6:10 PM, Robert James<srobertjames@gmail.com> wrote:
> UNION was better, but still 5 times as slow as either query done
> individually.
> set enable_seqscan=off didn't help at all - it was totally ignored
> Is there anything else I can do?

Did you try union, or union all?

Re: Understanding sequential versus index scans.

From

Greg Stark

Date:

20 July 2009, 11:14:14

On Mon, Jul 20, 2009 at 2:22 PM, Robert James<srobertjames@gmail.com> wrote:
> BTW, this is interesting, because there are only about 5 or 6 rows max
> returned from both queries - but I guess the planner expects more and hence
> changes the plan to remove duplicates.


If you sent the plans for the various attempts we might be able to
explain what's going on.

--
greg
http://mit.edu/~gsstark/resume.pdf