Home > mailing lists

Re: Optimizer improvements: to do or not to do? - Mailing list pgsql-hackers

From	Say42
Subject	Re: Optimizer improvements: to do or not to do?
Date	September 13, 2006 11:00:22
Msg-id	1158145203.102440.55670@b28g2000cwb.googlegroups.com Whole thread Raw
In response to	Re: Optimizer improvements: to do or not to do? (Peter Eisentraut <peter_e@gmx.net>)
Responses	Re: Optimizer improvements: to do or not to do? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any
> speculation about normalization, usual cases, caching effects and so on are
> unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I mean when I spoke about caching
effect. Let's take my pervious example (I repost query and some lines
from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where   exists (select code from belg_mobile tc       where c.bnum >= tc.code
andc.bnum like tc.code || '%'       order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows (1 heap page) and has PK on code
column (2 index pages)). indexCorrelation is equal to 0.0445 and almost
don't affect cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap pages) * 4
(random_page_cost) + ( 0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001
(cpu_index_tuple_cost) + 0.01 (cpu_tuple_cost) ) * 68 (record count) * 0.5 (selectivity of
subquery) ~ 6 (pages fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1(heap page) + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.01 (cpu_tuple_cost)) * 68 (record count) = 1
(pagesfetch cost) + 1.19 (cpu cost) = 2.19

The estimation is ok if we touch the belg_mobile table only once. In
the subquery we do it many times. After the first iteration of the
subquery all the belg_mobile's heap and index pages are in the cache
and cost per iteration should be estimated using formulae:

Index scan cost = ( 6 (pages fetch cost) + 0.42 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 0.42

Seq scan cost = ( 1 (pages fetch cost) + 1.19 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 1.19

Index scan actually more cheaper because less than one tenth of conn
rows have appropriate codes in the belg_mobile table.

That's what I want to say. I am not a veteran DBMS user so I can not
gauge importance of this cost inaccuracy in the whole. I hope you help
me to look at the problem (?) more widely than I can at the moment.

pgsql-hackers by date:

From: Teodor Sigaev
Date: 13 September 2006, 10:25:20
Subject: GIN documentation

From: Alvaro Herrera
Date: 13 September 2006, 12:26:35
Subject: Re: Getting a move on for 8.2 beta

Re: Optimizer improvements: to do or not to do? - Mailing list pgsql-hackers

Previous

Next