Re: Optimizer improvements: to do or not to do? - Mailing list pgsql-hackers

From Say42
Subject Re: Optimizer improvements: to do or not to do?
Date
Msg-id 1158145203.102440.55670@b28g2000cwb.googlegroups.com
Whole thread Raw
In response to Re: Optimizer improvements: to do or not to do?  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Optimizer improvements: to do or not to do?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any
> speculation about normalization, usual cases, caching effects and so on are
> unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I mean when I spoke about caching
effect. Let's take my pervious example (I repost query and some lines
from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where   exists (select code from belg_mobile tc       where c.bnum >= tc.code
andc.bnum like tc.code || '%'       order by tc.code desc limit 1)
 

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows (1 heap page) and has PK on code
column (2 index pages)). indexCorrelation is equal to 0.0445 and almost
don't affect cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap pages) * 4
(random_page_cost) + ( 0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001
(cpu_index_tuple_cost) + 0.01 (cpu_tuple_cost) ) * 68 (record count) * 0.5 (selectivity of
subquery) ~ 6 (pages fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1(heap page) + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.01 (cpu_tuple_cost)) * 68 (record count) = 1
(pagesfetch cost) + 1.19 (cpu cost) = 2.19
 

The estimation is ok if we touch the belg_mobile table only once. In
the subquery we do it many times. After the first iteration of the
subquery all the belg_mobile's heap and index pages are in the cache
and cost per iteration should be estimated using formulae:

Index scan cost = ( 6 (pages fetch cost) + 0.42 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 0.42

Seq scan cost = ( 1 (pages fetch cost) + 1.19 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 1.19

Index scan actually more cheaper because less than one tenth of conn
rows have appropriate codes in the belg_mobile table.

That's what I want to say. I am not a veteran DBMS user so I can not
gauge importance of this cost inaccuracy in the whole. I hope you help
me to look at the problem (?) more widely than I can at the moment.



pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: GIN documentation
Next
From: Alvaro Herrera
Date:
Subject: Re: Getting a move on for 8.2 beta