Thread: Indicies work on FreeBSD, not on Linux

Indicies work on FreeBSD, not on Linux

From
Christopher Farley
Date:
I'm sure this is something simple, but my Linux development machine
running Postgresql 7.4.7 has very different behavior than my FreeBSD
production machine running Postgresql 7.4.2. I've got the same table
definitions, but I do have different data in the databases.

On FreeBSD, indicies do what I expect they would -- everything speeds up
nice and 'explain analyze' shows me that an index scan is happening.

On my Linux machine, the CREATE INDEX statement works, and I can see
the index when I view the table information. But 'explain analyze'
indicates that a sequential scan is still occuring.

Any ideas? 

############ FREEBSD BEFORE & AFTER INDEXING ###############

freebsd=# explain analyze select * from orders where or_status = 'N';
QUERYPLAN 
 
-------------------------------------------------------------------------------------------------------------Seq Scan
onorders  (cost=0.00..2590.03 rows=47 width=759) (actual
 
time=6029.894..6090.079 rows=38 loops=1)  Filter: (or_status = 'N'::bpchar)Total runtime: 6090.476 ms
(3 rows)

freebsd=# create index or_status_idx on orders(or_status);
CREATE INDEX
freebsd=# explain analyze select * from orders where or_status = 'N';
    QUERY PLAN 
 

----------------------------------------------------------------------------------------------------------------------------Index
Scanusing or_status_idx on orders  (cost=0.00..382.26 rows=96
 
width=759) (actual time=0.098..1.648 rows=38 loops=1)  Index Cond: (or_status = 'N'::bpchar)Total runtime: 1.866 ms
(3 rows)


############ LINUX BEFORE & AFTER INDEXING ###############

linux=# explain analyze select * from orders where or_status = 'N' order
by or_number;                                                   QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------Sort
(cost=3978.80..3985.74 rows=2776 width=770) (actual
 
time=1304.325..1351.477 rows=7760 loops=1)  Sort Key: or_number  ->  Seq Scan on orders  (cost=0.00..3260.35 rows=2776
width=770)
(actual time=0.192..869.504 rows=7760 loops=1)        Filter: (or_status = 'N'::bpchar)Total runtime: 1361.277 ms
(5 rows)

linux=# create index or_status_idx on orders(or_status);
CREATE INDEX
linux=# explain analyze select * from orders where or_status = 'N' order
by or_number;                                                   QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------Sort
(cost=3981.40..3988.35 rows=2779 width=770) (actual
 
time=1338.112..1384.465 rows=7760 loops=1)  Sort Key: or_number  ->  Seq Scan on orders  (cost=0.00..3260.54 rows=2779
width=770)
(actual time=0.185..891.342 rows=7760 loops=1)        Filter: (or_status = 'N'::bpchar)Total runtime: 1394.538 ms
(5 rows)


-- 
Christopher Farley
www.northernbrewer.com


Re: Indicies work on FreeBSD, not on Linux

From
Michael Fuhr
Date:
On Sat, Mar 05, 2005 at 02:18:37AM -0600, Christopher Farley wrote:

> I'm sure this is something simple, but my Linux development machine
> running Postgresql 7.4.7 has very different behavior than my FreeBSD
> production machine running Postgresql 7.4.2. I've got the same table
> definitions, but I do have different data in the databases.

FreeBSD vs. Linux is probably a red herring -- different data is
more likely the cause of different behavior.  You also ran different
queries (one set has ORDER BY, one set doesn't); that might not
matter in this case, but when comparing test results the queries
should be identical.  And it might be a good idea to ANALYZE the
tables to make sure the planners' statistics are current.

Here's what you show for FreeBSD after the index is created:

> freebsd=# explain analyze select * from orders where or_status = 'N';
>                                                          QUERY PLAN 
>
----------------------------------------------------------------------------------------------------------------------------
>  Index Scan using or_status_idx on orders  (cost=0.00..382.26 rows=96
> width=759) (actual time=0.098..1.648 rows=38 loops=1)
>    Index Cond: (or_status = 'N'::bpchar)
>  Total runtime: 1.866 ms
> (3 rows)

The planner estimates that this query will return 96 rows (it
actually returns 38 rows).  How many pages and rows does the planner
think this table has?  That is, what's the result of the following
query?

SELECT relpages, reltuples FROM pg_class WHERE relname = 'orders';

Retrieving a small percentage of the total number of rows is more
likely to favor an index scan than retrieving a large percentage.

Here's what you show for Linux after creating the index:

> linux=# explain analyze select * from orders where or_status = 'N' order
> by or_number;
>                                                     QUERY PLAN 
> -------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=3981.40..3988.35 rows=2779 width=770) (actual
> time=1338.112..1384.465 rows=7760 loops=1)
>    Sort Key: or_number
>    ->  Seq Scan on orders  (cost=0.00..3260.54 rows=2779 width=770)
> (actual time=0.185..891.342 rows=7760 loops=1)
>          Filter: (or_status = 'N'::bpchar)
>  Total runtime: 1394.538 ms
> (5 rows)

The planner estimates that this query will return 2779 rows (it
actually returns 7760).  Again, how many pages and rows does the
planner think this table has?  If 2279 rows (the estimate) is a
significant percentage of the total then the planner might decide
that a sequential scan would be faster than an index scan.

What output do you get if you ANALYZE the tables?  What output do
you get on Linux if you disable sequential scans ("SET enable_seqscan
TO off")?  Updating the statistics might not help the Linux query
since a more accurate row estimate (7760) would be higher than the
inaccurate estimate (2779), which might make a sequential scan even
more likely (unless perhaps a more accurate reltuples is also much
higher), but in general it's a good idea to let the planner work
with accurate statistics.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/