Re: index / sequential scan problem - Mailing list pgsql-performance

From Fabian Kreitner
Subject Re: index / sequential scan problem
Date
Msg-id 5.1.0.14.0.20030718063650.0397b238@195.145.148.245
Whole thread Raw
In response to Re: index / sequential scan problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: index / sequential scan problem  (Fabian Kreitner <fabian.kreitner@ainea-ag.de>)
List pgsql-performance
At 20:12 17.07.2003, Tom Lane wrote:
>Fabian Kreitner <fabian.kreitner@ainea-ag.de> writes:
> > That is what I read too and is why Im confused that the index is indeed
> > executing faster. Can this be a problem with the hardware and/or postgress
> > installation?
>
>I think the actual issue here is that you are executing the EXISTS
>subplan over and over, once for each outer row.  The planner's cost
>estimate for EXISTS is based on the assumption that you do it once
>... in which scenario the seqscan very possibly is cheaper.  However,
>when you do the EXISTS subplan over and over for many outer rows, you
>get a savings from the fact that the index and table pages soon get
>cached in memory.  The seqscan plan gets a savings too, since the table
>is small enough to fit in memory, but once everything is in memory the
>indexscan plan is faster.
>
>There's been some discussion on pghackers about how to teach the planner
>to account for repeated executions of subplans, but we have not come up
>with a good solution yet.
>
>For the moment, what people tend to do if they know their database is
>small enough to mostly stay in memory is to reduce random_page_cost to
>make the planner favor indexscans.  If you know the database is entirely
>cached then the theoretically correct value of random_page_cost is 1.0
>(since fetching any page will cost the same, if it's all in RAM).  I'd
>recommend against adopting that as a default, but a lot of people find
>that setting it to 2.0 or so seems to model their situation better than
>the out-of-the-box 4.0.

Thanks for the explanation :)


However .... :(

perg_1097=# vacuum analyze;
VACUUM
perg_1097=# set random_page_cost to 1.0;
SET VARIABLE
perg_1097=#  EXPLAIN ANALYZE    select  notiz_id, obj_id, obj_typ
perg_1097-#       from    notiz_objekt a
perg_1097-#       where not exists
perg_1097-#      (
perg_1097(#        select  1
perg_1097(#        from    notiz_gelesen b
perg_1097(#        where   ma_id  = 2001
perg_1097(#        and     ma_pid = 1097
perg_1097(#        and     a.notiz_id = b.notiz_id
perg_1097(#      )
perg_1097-#   ;
NOTICE:  QUERY PLAN:

Seq Scan on notiz_objekt a  (cost=0.00..56125.80 rows=15561 width=12)
(actual time=0.27..2299.09 rows=31122 loops=1)
   SubPlan
     ->  Seq Scan on notiz_gelesen b  (cost=0.00..1.79 rows=1 width=0)
(actual time=0.07..0.07 rows=0 loops=31122)
Total runtime: 2328.05 msec

EXPLAIN
perg_1097=#

...

perg_1097=# set enable_seqscan to false;
SET VARIABLE
perg_1097=# set random_page_cost to 1.0;
SET VARIABLE
perg_1097=#  EXPLAIN ANALYZE    select  notiz_id, obj_id, obj_typ
perg_1097-#       from    notiz_objekt a
perg_1097-#       where not exists
perg_1097-#      (
perg_1097(#        select  1
perg_1097(#        from    notiz_gelesen b
perg_1097(#        where   ma_id  = 2001
perg_1097(#        and     ma_pid = 1097
perg_1097(#        and     a.notiz_id = b.notiz_id
perg_1097(#      )
perg_1097-#   ;
NOTICE:  QUERY PLAN:

Seq Scan on notiz_objekt a  (cost=100000000.00..100093380.36 rows=15561
width=12) (actual time=0.07..550.07 rows=31122 loops=1)
   SubPlan
     ->  Index Scan using idx_notiz_gelesen_2 on notiz_gelesen
b  (cost=0.00..2.98 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=31122)
Total runtime: 582.90 msec

EXPLAIN
perg_1097=#


Even with a random page cost of 1 it thinks using the index should/could
take significantly longer which it doesnt for some reason :-/


pgsql-performance by date:

Previous
From: Robert Creager
Date:
Subject: Re: Hardware performance
Next
From: Fabian Kreitner
Date:
Subject: Re: index / sequential scan problem