Thread: Postgres 8.3 only uses seq scan

Postgres 8.3 only uses seq scan

From

Clemens Schwaighofer

Date:

25 November 2008, 23:39:24

Hi,

I have system here with Debian/Testing and the latest 8.2 and 8.3
database installed.

on a blank database I create two very simple tables

                                Table "public.foo"
 Column |       Type        |                      Modifiers
--------+-------------------+------------------------------------------------------
 foo_id | integer           | not null default
nextval('foo_foo_id_seq'::regclass)
 test   | character varying |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (foo_id)


                                Table "public.bar"
 Column |       Type        |                      Modifiers
--------+-------------------+------------------------------------------------------
 bar_id | integer           | not null default
nextval('bar_bar_id_seq'::regclass)
 foo_id | integer           | not null
 test   | character varying |
Indexes:
    "bar_pkey" PRIMARY KEY, btree (bar_id)
    "bar_foo_id_idx" btree (foo_id)
Foreign-key constraints:
    "bar_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foo(foo_id) MATCH
FULL ON UPDATE CASCADE ON DELETE CASCADE

now if I run a simple join query over both tables Postgres 8.2 gives
this back for the explain:

# explain select * from foo f, bar b where f.foo_id = b.foo_id;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..33.14 rows=3 width=76)
   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..12.30
rows=3 width=40)
   ->  Index Scan using foo_pkey on foo f  (cost=0.00..6.93 rows=1 width=36)
         Index Cond: (f.foo_id = b.foo_id)


but on the 8.3 version i get this back

# explain select * from foo f, bar b where f.foo_id = b.foo_id;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=1.07..2.14 rows=3 width=24)
   Hash Cond: (b.foo_id = f.foo_id)
   ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
   ->  Hash  (cost=1.03..1.03 rows=3 width=10)
         ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)

once I insert a million rows he does use the index:

# explain select * from foo f, bar b where f.foo_id = b.foo_id;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..26.39 rows=9 width=35)
   ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
width=14)
         Index Cond: (b.foo_id = f.foo_id)


I have seen this behavior on all of my postgres 8.3 installs. The
indexes are there, auto vacuum is turned on. even a reindex of the
tables does not help. The configuration files are identical in grounds
of memory usage, query planning, etc.

I see this on RPM packages for RedHat Enterprise, self compiled for
FreeBSD 4, and debian packages. I am seriously very very confused.

What can I do to debug this further, or find out why this happens?

Does this mean Postgres 8.3 thinks a sequence scan is faster than an
index scan? Even on tables with hundred thousands rows?

--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager                                        ]
[ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp                                      ]

Attachment

Re: Postgres 8.3 only uses seq scan

From

"Scott Marlowe"

Date:

26 November 2008, 01:04:18

On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer
<clemens.schwaighofer@tequila.jp> wrote:
> but on the 8.3 version i get this back
>
> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>                            QUERY PLAN
> ------------------------------------------------------------------
>  Hash Join  (cost=1.07..2.14 rows=3 width=24)
>   Hash Cond: (b.foo_id = f.foo_id)
>   ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
>   ->  Hash  (cost=1.03..1.03 rows=3 width=10)
>         ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)

Of course it uses a seq scan.  All the data fits handily into a single
page I assume.

> once I insert a million rows he does use the index:
>
> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>                                    QUERY PLAN
> -----------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..26.39 rows=9 width=35)
>   ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
>   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
> width=14)
>         Index Cond: (b.foo_id = f.foo_id)

I don't see a million rows here, only three.  Have you run analyze
after loading all that data?  Or is it retrieving 3 rows out of a
million?  If so then an index scan does make sense.

Re: Postgres 8.3 only uses seq scan

From

Clemens Schwaighofer

Date:

26 November 2008, 01:07:31

On 11/26/2008 02:04 PM, Scott Marlowe wrote:
> On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer
> <clemens.schwaighofer@tequila.jp> wrote:
>> but on the 8.3 version i get this back
>>
>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>                            QUERY PLAN
>> ------------------------------------------------------------------
>>  Hash Join  (cost=1.07..2.14 rows=3 width=24)
>>   Hash Cond: (b.foo_id = f.foo_id)
>>   ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
>>   ->  Hash  (cost=1.03..1.03 rows=3 width=10)
>>         ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)
>
> Of course it uses a seq scan.  All the data fits handily into a single
> page I assume.

okay, the strange thing is, that in 8.2 it always used an index scan.

>> once I insert a million rows he does use the index:
>>
>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>                                    QUERY PLAN
>> -----------------------------------------------------------------------------------
>>  Nested Loop  (cost=0.00..26.39 rows=9 width=35)
>>   ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
>>   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
>> width=14)
>>         Index Cond: (b.foo_id = f.foo_id)
>
> I don't see a million rows here, only three.  Have you run analyze
> after loading all that data?  Or is it retrieving 3 rows out of a
> million?  If so then an index scan does make sense.

yeah, there are 3 matching rows, and the rest is just data to make the
table big.

I am just still confused, because if Postgres does only use seq scan
even in very large databases, I am worried I do something very wrong in
my DB design ...


--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager                                        ]
[ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp                                      ]

Attachment

signature.asc

Re: Postgres 8.3 only uses seq scan

From

"Scott Marlowe"

Date:

26 November 2008, 01:15:36

On Tue, Nov 25, 2008 at 10:07 PM, Clemens Schwaighofer
<clemens.schwaighofer@tequila.jp> wrote:
> On 11/26/2008 02:04 PM, Scott Marlowe wrote:
>> On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer
>> <clemens.schwaighofer@tequila.jp> wrote:
>>> but on the 8.3 version i get this back
>>>
>>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>>                            QUERY PLAN
>>> ------------------------------------------------------------------
>>>  Hash Join  (cost=1.07..2.14 rows=3 width=24)
>>>   Hash Cond: (b.foo_id = f.foo_id)
>>>   ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
>>>   ->  Hash  (cost=1.03..1.03 rows=3 width=10)
>>>         ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)
>>
>> Of course it uses a seq scan.  All the data fits handily into a single
>> page I assume.
>
> okay, the strange thing is, that in 8.2 it always used an index scan.

Are there more rows in the 8.2 table you're testing on?  Or is the
whole table small enough to fit on a few pages?

>>> once I insert a million rows he does use the index:
>>>
>>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>>                                    QUERY PLAN
>>> -----------------------------------------------------------------------------------
>>>  Nested Loop  (cost=0.00..26.39 rows=9 width=35)
>>>   ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
>>>   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
>>> width=14)
>>>         Index Cond: (b.foo_id = f.foo_id)
>>
>> I don't see a million rows here, only three.  Have you run analyze
>> after loading all that data?  Or is it retrieving 3 rows out of a
>> million?  If so then an index scan does make sense.
>
> yeah, there are 3 matching rows, and the rest is just data to make the
> table big.
>
> I am just still confused, because if Postgres does only use seq scan
> even in very large databases, I am worried I do something very wrong in
> my DB design ...

Postgresql has no visibility in its indexes, meaning that whether it
uses an index or not, it still has to go to the table to see if the
tuple is actually visible to this transaction.  For this reason,
PostgreSQL switches to sequential scans quicker than other dbs that
have visibility information in their indexes.

The planner is pretty smart, but if you're going to hit a large % of
the table anyway, it switches to sequential scans since it will have
to retreive the majority of the table anyway.

Re: Postgres 8.3 only uses seq scan

From

Clemens Schwaighofer

Date:

26 November 2008, 01:22:37

On 11/26/2008 02:15 PM, Scott Marlowe wrote:
> On Tue, Nov 25, 2008 at 10:07 PM, Clemens Schwaighofer
> <clemens.schwaighofer@tequila.jp> wrote:
>> On 11/26/2008 02:04 PM, Scott Marlowe wrote:
>>> On Tue, Nov 25, 2008 at 8:39 PM, Clemens Schwaighofer
>>> <clemens.schwaighofer@tequila.jp> wrote:
>>>> but on the 8.3 version i get this back
>>>>
>>>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>>>                            QUERY PLAN
>>>> ------------------------------------------------------------------
>>>>  Hash Join  (cost=1.07..2.14 rows=3 width=24)
>>>>   Hash Cond: (b.foo_id = f.foo_id)
>>>>   ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
>>>>   ->  Hash  (cost=1.03..1.03 rows=3 width=10)
>>>>         ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)
>>> Of course it uses a seq scan.  All the data fits handily into a single
>>> page I assume.
>> okay, the strange thing is, that in 8.2 it always used an index scan.
>
> Are there more rows in the 8.2 table you're testing on?  Or is the
> whole table small enough to fit on a few pages?

I highly doubt that. I have right now in one of the DBs I transfered
tables from ~100.000 down to ~40.000 rows that all join together. I
somehow really doubt that fit in a few pages.

That is why I was so surprised to see such a big difference in the explain.

>
>>>> once I insert a million rows he does use the index:
>>>>
>>>> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>>>>                                    QUERY PLAN
>>>> -----------------------------------------------------------------------------------
>>>>  Nested Loop  (cost=0.00..26.39 rows=9 width=35)
>>>>   ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
>>>>   ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
>>>> width=14)
>>>>         Index Cond: (b.foo_id = f.foo_id)
>>> I don't see a million rows here, only three.  Have you run analyze
>>> after loading all that data?  Or is it retrieving 3 rows out of a
>>> million?  If so then an index scan does make sense.
>> yeah, there are 3 matching rows, and the rest is just data to make the
>> table big.
>>
>> I am just still confused, because if Postgres does only use seq scan
>> even in very large databases, I am worried I do something very wrong in
>> my DB design ...
>
> Postgresql has no visibility in its indexes, meaning that whether it
> uses an index or not, it still has to go to the table to see if the
> tuple is actually visible to this transaction.  For this reason,
> PostgreSQL switches to sequential scans quicker than other dbs that
> have visibility information in their indexes.
>
> The planner is pretty smart, but if you're going to hit a large % of
> the table anyway, it switches to sequential scans since it will have
> to retreive the majority of the table anyway.

So, I am fine when I trust the Postgresql planner :) Because speed wise
I see no difference that 8.3 would be slower than 8.2

--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager                                        ]
[ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp                                      ]

Attachment

signature.asc

Re: Postgres 8.3 only uses seq scan

From

"Scott Marlowe"

Date:

26 November 2008, 02:21:01

On Tue, Nov 25, 2008 at 10:22 PM, Clemens Schwaighofer
<clemens.schwaighofer@tequila.jp> wrote:
> On 11/26/2008 02:15 PM, Scott Marlowe wrote:
>>
>> Are there more rows in the 8.2 table you're testing on?  Or is the
>> whole table small enough to fit on a few pages?
>
> I highly doubt that. I have right now in one of the DBs I transfered
> tables from ~100.000 down to ~40.000 rows that all join together. I
> somehow really doubt that fit in a few pages.

Right, with more rows, and choosing fewer, pgsql will go for an index
scan.  if choosing a good %, the seq scan.

> So, I am fine when I trust the Postgresql planner :) Because speed wise
> I see no difference that 8.3 would be slower than 8.2

Well, the planner's not perfect.  Some off corner cases can catch it
out, or if your database isn't analyzed after a lot of changes it may
make an uninformed decision.  But most the time it makes the right, or
close enough, decision.

Re: Postgres 8.3 only uses seq scan

From

Clemens Schwaighofer

Date:

26 November 2008, 02:28:27

On 11/26/2008 03:20 PM, Scott Marlowe wrote:
> On Tue, Nov 25, 2008 at 10:22 PM, Clemens Schwaighofer
> <clemens.schwaighofer@tequila.jp> wrote:
>> On 11/26/2008 02:15 PM, Scott Marlowe wrote:
>>> Are there more rows in the 8.2 table you're testing on?  Or is the
>>> whole table small enough to fit on a few pages?
>> I highly doubt that. I have right now in one of the DBs I transfered
>> tables from ~100.000 down to ~40.000 rows that all join together. I
>> somehow really doubt that fit in a few pages.
>
> Right, with more rows, and choosing fewer, pgsql will go for an index
> scan.  if choosing a good %, the seq scan.

okay, then I think I get it. One of my test queries was actually
selecting a very big chunk (90%) of the data, so it makes sense the
planner chooses seq scan over index scan here.

>> So, I am fine when I trust the Postgresql planner :) Because speed wise
>> I see no difference that 8.3 would be slower than 8.2
>
> Well, the planner's not perfect.  Some off corner cases can catch it
> out, or if your database isn't analyzed after a lot of changes it may
> make an uninformed decision.  But most the time it makes the right, or
> close enough, decision.

Well, I have autovacuum turned on, so this should hopefully keep the
planner up to date.

Anyway, thanks a lot for your help.

--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager                                        ]
[ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp                                      ]

Attachment

signature.asc

Re: Postgres 8.3 only uses seq scan

From

tv@fuzzy.cz

Date:

26 November 2008, 05:44:16

Try running EXPLAIN ANALYZE - that gives much more information. For
example it may show differences in number of rows between the two
machines, that the statistics are not up to date, etc.

regards
Tomas

> Hi,
>
> I have system here with Debian/Testing and the latest 8.2 and 8.3
> database installed.
>
> on a blank database I create two very simple tables
>
>                                 Table "public.foo"
>  Column |       Type        |                      Modifiers
> --------+-------------------+------------------------------------------------------
>  foo_id | integer           | not null default
> nextval('foo_foo_id_seq'::regclass)
>  test   | character varying |
> Indexes:
>     "foo_pkey" PRIMARY KEY, btree (foo_id)
>
>
>                                 Table "public.bar"
>  Column |       Type        |                      Modifiers
> --------+-------------------+------------------------------------------------------
>  bar_id | integer           | not null default
> nextval('bar_bar_id_seq'::regclass)
>  foo_id | integer           | not null
>  test   | character varying |
> Indexes:
>     "bar_pkey" PRIMARY KEY, btree (bar_id)
>     "bar_foo_id_idx" btree (foo_id)
> Foreign-key constraints:
>     "bar_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foo(foo_id) MATCH
> FULL ON UPDATE CASCADE ON DELETE CASCADE
>
> now if I run a simple join query over both tables Postgres 8.2 gives
> this back for the explain:
>
> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..33.14 rows=3 width=76)
>    ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..12.30
> rows=3 width=40)
>    ->  Index Scan using foo_pkey on foo f  (cost=0.00..6.93 rows=1
> width=36)
>          Index Cond: (f.foo_id = b.foo_id)
>
>
> but on the 8.3 version i get this back
>
> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>                             QUERY PLAN
> ------------------------------------------------------------------
>  Hash Join  (cost=1.07..2.14 rows=3 width=24)
>    Hash Cond: (b.foo_id = f.foo_id)
>    ->  Seq Scan on bar b  (cost=0.00..1.03 rows=3 width=14)
>    ->  Hash  (cost=1.03..1.03 rows=3 width=10)
>          ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=10)
>
> once I insert a million rows he does use the index:
>
> # explain select * from foo f, bar b where f.foo_id = b.foo_id;
>                                     QUERY PLAN
> -----------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..26.39 rows=9 width=35)
>    ->  Seq Scan on foo f  (cost=0.00..1.03 rows=3 width=21)
>    ->  Index Scan using bar_foo_id_idx on bar b  (cost=0.00..8.42 rows=3
> width=14)
>          Index Cond: (b.foo_id = f.foo_id)
>
>
> I have seen this behavior on all of my postgres 8.3 installs. The
> indexes are there, auto vacuum is turned on. even a reindex of the
> tables does not help. The configuration files are identical in grounds
> of memory usage, query planning, etc.
>
> I see this on RPM packages for RedHat Enterprise, self compiled for
> FreeBSD 4, and debian packages. I am seriously very very confused.
>
> What can I do to debug this further, or find out why this happens?
>
> Does this mean Postgres 8.3 thinks a sequence scan is faster than an
> index scan? Even on tables with hundred thousands rows?
>
> --
> [ Clemens Schwaighofer                      -----=====:::::~ ]
> [ IT Engineer/Manager                                        ]
> [ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
> [                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
> [ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
> [ http://www.tequila.jp                                      ]
>
>

Re: Postgres 8.3 only uses seq scan

From

Clemens Schwaighofer

Date:

26 November 2008, 05:46:55

On 11/26/2008 06:44 PM, tv@fuzzy.cz wrote:
> Try running EXPLAIN ANALYZE - that gives much more information. For
> example it may show differences in number of rows between the two
> machines, that the statistics are not up to date, etc.

Thanks a lot for this tip

--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager                                        ]
[ E-Graphics Communications, TEQUILA\ Japan IT Group         ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7706            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp                                      ]

Attachment

signature.asc