Thread: Optimizer improvements: to do or not to do?

Optimizer improvements: to do or not to do?

From
"Say42"
Date:
I intend to play with some optimizer aspects. Just for fun. I'm a
novice in DBMS development, so I cannot promise any usable results,
but if it can be useful even as yet another failed attempt, I will
try.

That's what I want to do:
1. Replace not very useful indexCorrelation with indexClustering.
2. Consider caching of the inner table in a nested loops join when
estimating the total cost of the join.

More details:
1. During ANALYZE we have sample rows. For every N-th sample row we
can scan the indexes with a qual like 'value >= index_first_column'
and fetch the first N row TIDs. Estimating the count of heap pages
fetched is not hard; to get the index clustering value, just divide
the page count by the sample row count.
2. This one is much harder and may be beyond me entirely. The main
ideas:
- split the page-fetch cost and the CPU cost into separate variables
and don't sum them before join estimation.
- the final path cost estimation should be done in the join cost
estimation and should take into account the number of inner table
accesses (= K). CPU cost is directly proportional to K, but page
fetches can be estimated by the Mackert and Lohman formula using the
total tuple count (K * inner_table_selectivity *
inner_table_total_tuples); I have sketched the formula below as I
understand it.
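
For reference, here is the Mackert and Lohman approximation as I read
it from the comments in costsize.c (this is my paraphrase of those
comments, so treat the exact form with suspicion):

    T  = # pages in the inner table
    b  = # buffer pages available
    Ns = total tuples fetched over all K inner scans
         (i.e. K * inner_table_selectivity * inner_table_total_tuples)

    pages_fetched = min(2*T*Ns / (2*T + Ns), T)              when T <= b
    pages_fetched = 2*T*Ns / (2*T + Ns)                      when T > b and Ns <= 2*T*b / (2*T - b)
    pages_fetched = b + (Ns - 2*T*b/(2*T - b)) * (T - b)/T   when T > b and Ns >  2*T*b / (2*T - b)

The point is that pages_fetched grows much more slowly than K, so
charging K times the single-scan page cost overestimates the I/O for a
small, repeatedly scanned inner table.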

Any thoughts?



Re: Optimizer improvements: to do or not to do?

From
Simon Riggs
Date:
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
> I intend to play with some optimizer aspects. Just for fun. 

Cool. If you think it's fun (it is), you're halfway there.

> I'm a novice in DBMS development, so I cannot promise any usable
> results, but if it can be useful even as yet another failed attempt,
> I will try.

This type of work is 90% analysis, 10% coding. You'll need to do a lot
of investigation, lots of discussion and listening.

> That's what I want to do:
> 1. Replace not very useful indexCorrelation with indexClustering.

An opinion such as "not very useful" isn't considered sufficient
explanation or justification for a change around here.

> 2. Consider caching of the inner table in a nested loops join when
> estimating the total cost of the join.
> 
> More details:
> 1. During ANALYZE we have sample rows. For every N-th sample row we
> can scan the indexes with a qual like 'value >= index_first_column'
> and fetch the first N row TIDs. Estimating the count of heap pages
> fetched is not hard; to get the index clustering value, just divide
> the page count by the sample row count.
> 2. This one is much harder and may be beyond me entirely. The main
> ideas:
> - split the page-fetch cost and the CPU cost into separate variables
> and don't sum them before join estimation.
> - the final path cost estimation should be done in the join cost
> estimation and should take into account the number of inner table
> accesses (= K). CPU cost is directly proportional to K, but page
> fetches can be estimated by the Mackert and Lohman formula using the
> total tuple count (K * inner_table_selectivity *
> inner_table_total_tuples); I have sketched the formula below as I
> understand it.

I'd work on one thing at a time and go into it deeply.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Optimizer improvements: to do or not to do?

From
Gregory Stark
Date:
Simon Riggs <simon@2ndquadrant.com> writes:

>> That's what I want to do:
>> 1. Replace not very useful indexCorrelation with indexClustering.
>
> An opinion such as "not very useful" isn't considered sufficient
> explanation or justification for a change around here.

There's been some previous discussion about how "correlation" was not really
what we wanted to be measuring. But that discussion was in regards to
cross-column "correlation". In that case we're trying to predict how selective
a clause will be. If we read x% of the table due to a restriction on X what
percentage of the values of Y will be represented?

In this case I think we do need to know correlation or something like it.
That's because what we're trying to predict is how close to sequential the i/o
accesses will be. If there's no correlation between index order and disk order
then they'll be random. If they're highly correlated then accesses will be
close to sequential.

It's possible there's some sort of "block-wise correlated" measure which would
be even better for our needs. We don't care if all the high values are towards
the start and low values towards the end as long as each section is in order,
for example.

It's also possible that we could use something like what you describe to
predict how many physical i/os will happen altogether. If the table is highly
clustered but disordered then the io will be random access but the cache will
be more effective than if the table is highly correlated but not clustered
(though it would take a large table to make that possible I think).

In short I think what's needed is someone to review a lot of different stats
metrics for correlation and clustering and do some analysis of how each would
be useful for cost modelling. 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Simon Riggs wrote:
> This type of work is 90% analysis, 10% coding. You'll need to do a lot
> of investigation, lots of discussion and listening.

I absolutely agree with you, and I am not about to rush into coding
right now. First of all I'm going to dig a lot into the PG sources,
READMEs and so on. It's a good way to learn the coding style and the
DBMS internals.

> > That's what I want to do:
> > 1. Replace not very useful indexCorrelation with indexClustering.
>
> An opinion such as "not very useful" isn't considered sufficient
> explanation or justification for a change around here.

Sometimes indexCorrelation is even misleading. There are many examples
of overestimation of index scan cost (data well clustered but not
ordered, so correlation is low) and some cases of underestimation where
tuples look well ordered with a high degree of correlation, but the
index scan actually causes random page fetches (1-3-2-4-6-5, for
example; on a server without RAID it is VERY slow, 25 times slower than
a bitmap index scan). If we had a dedicated clustering measure we could
estimate the page count more precisely.
The next step could be to introduce 'ordering' as a measure of page
access sequentiality. Without 'ordering', all we can do is assume that
pages are fetched in random order. Anyhow, if index access cost is
overestimated we can set random_page_cost = 2. (Is that true in a
production database with a smart RAID?)
Moreover, I think the problem is more complex. With the assumption that
index access is always random we run into another problem:
overestimation of master table index scans. If the master table is
small enough, PG can choose a seq scan instead of an index scan even if
the latter is actually much cheaper because of caching. That is why
caching should be taken into account during join cost calculation.

> > 2. Consider caching of the inner table in a nested loops join when
> > estimating the total cost of the join.

> I'd work on one thing at a time and go into it deeply.

Good news. So I'm very interested in what you think about my ideas.
Are they wrong, or too naive?



Re: Optimizer improvements: to do or not to do?

From
Peter Eisentraut
Date:
On Tuesday, 12 September 2006 12:48, Say42 wrote:
> That is why caching should be taken into account
> during join cost calculation.

If you know of a more effective way to do that beyond the effective_cache_size 
parameter that we have now, let us know.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Optimizer improvements: to do or not to do?

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> It's possible there's some sort of "block-wise correlated" measure
> which would be even better for our needs.

Actually, it seems obvious to me that the correlation measure ought to
ignore within-block ordering, but right now it does not.  OTOH it's not
clear how important that is, as on a decent-size table you'll probably
not have more than one sample row in a block anyway.
        regards, tom lane


Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Peter Eisentraut wrote:

> If you know of a more effective way to do that beyond the effective_cache_size
> parameter that we have now, let us know.

I don't know of a better way, and that is not my goal at all. I am
thinking about more accurate cost estimation of nested loop joins and
subqueries. The usual case in a data request is joining a detail table
and some master tables into a single relation. Often the master tables
are small and, after a few nested loop iterations, are well (perhaps
wholly) cached. Cost estimation of the tables' access paths doesn't
take such caching into account, which causes overestimation. In some
cases this can lead to choosing a plan that is not the best one.

An example from real life: the following query returns the count of
national calls from the call registration table.

select count(*) from conn.conn20060803 c
where
    exists (select code from trunk_codes tc
        where c.bnum >= tc.code and c.bnum like tc.code || '%'
        order by tc.code desc limit 1)

enable_seqscan = off:

"Aggregate  (cost=103185258.68..103185258.69 rows=1 width=0) (actual
time=13385.674..13385.676 rows=1 loops=1)"
"  ->  Seq Scan on conn20060803 c  (cost=100000000.00..103184640.52
rows=247264 width=0) (actual time=0.409..13307.254 rows=38739 loops=1)"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=0.00..6.42 rows=1 width=10) (actual
time=0.020..0.020 rows=0 loops=494527)"
"                ->  Index Scan Backward using belg_mobile_pkey on
belg_mobile tc  (cost=0.00..6.42 rows=1 width=10) (actual
time=0.012..0.012 rows=0 loops=494527)"
"                      Index Cond: (($0)::text >= (code)::text)"
"                      Filter: (($0)::text ~~ ((code)::text ||
'%'::text))"
"Total runtime: 13385.808 ms"

enable_seqscan =on:

"Aggregate  (cost=1101623.47..1101623.48 rows=1 width=0) (actual
time=63724.508..63724.509 rows=1 loops=1)"
"  ->  Seq Scan on conn20060803 c  (cost=0.00..1101005.30 rows=247264
width=0) (actual time=2.244..63640.413 rows=38739 loops=1)"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=2.20..2.20 rows=1 width=10) (actual
time=0.121..0.121 rows=0 loops=494527)"
"                ->  Sort  (cost=2.20..2.20 rows=1 width=10) (actual
time=0.114..0.114 rows=0 loops=494527)"
"                      Sort Key: code"
"                      ->  Seq Scan on belg_mobile tc  (cost=0.00..2.19
rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527)"
"                            Filter: ((($0)::text >= (code)::text) AND
(($0)::text ~~ ((code)::text || '%'::text)))"
"Total runtime: 63724.630 ms"



Re: Optimizer improvements: to do or not to do?

From
Tom Lane
Date:
"Say42" <andrews42@yandex.ru> writes:
> The usual case in a data request is joining a detail table and some
> master tables into a single relation.

Optimizing on the basis of only one example is seldom a good idea...
and I think you are falling into that trap by supposing that there
is a "usual case".
        regards, tom lane


Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Say42 wrote:

> select count(*) from conn.conn20060803 c
> where
>     exists (select code from belg_mobile tc
...

Correction: replace 'trunk_codes' with 'belg_mobile'.



Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Tom Lane wrote:
> Optimizing on the basis of only one example is seldom a good idea...
> and I think you are falling into that trap by supposing that there
> is a "usual case".

Perhaps I am wrong, but I assume normalization is the usual case, and
small master (parent) tables are not very rare either.
Yes, my example is unusual, but it is _real_ and demonstrates PG
optimizer inaccuracy. Why don't we make the PG optimizer closer to
reality if we can? Is it so needless, and am I making a mountain out of
a molehill?



Re: Optimizer improvements: to do or not to do?

From
Peter Eisentraut
Date:
Say42 wrote:
> Perhaps I am wrong, but I assume normalization is the usual case, and
> small master (parent) tables are not very rare either.
> Yes, my example is unusual, but it is _real_ and demonstrates PG
> optimizer inaccuracy. Why don't we make the PG optimizer closer to
> reality if we can? Is it so needless, and am I making a mountain out
> of a molehill?

All you have shown so far is that one particular query runs faster on 
your machine when sequential scans are turned off.  That is certainly a 
problem that is worth addressing.  But you haven't offered any analysis 
about the cause of this problem, so any speculation about 
normalization, usual cases, caching effects and so on is unfounded and 
premature.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any
> speculation about normalization, usual cases, caching effects and so on is
> unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I meant when I spoke about the caching
effect. Let's take my previous example (I repost the query and some
lines from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where
    exists (select code from belg_mobile tc
        where c.bnum >= tc.code and c.bnum like tc.code || '%'
        order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows in 1 heap page) and has a PK on the
code column (2 index pages). indexCorrelation is 0.0445 and has almost
no effect on the cost estimate.

PG's cost estimation (as far as I understand it, of course):

Index scan cost = 2 (index pages) + 1 (heap page) * 4 (random_page_cost)
                  + (0.0025 (cpu_operator_cost) * 3 (# ops)
                     + 0.001 (cpu_index_tuple_cost)
                     + 0.01 (cpu_tuple_cost)) * 68 (record count) * 0.5 (subquery selectivity)
                ~ 6 (page fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1 (heap page)
                + (0.0025 (cpu_operator_cost) * 3 (# ops)
                   + 0.01 (cpu_tuple_cost)) * 68 (record count)
              = 1 (page fetch cost) + 1.19 (cpu cost) = 2.19

The estimate is OK if we touch the belg_mobile table only once, but in
the subquery we touch it many times. After the first iteration of the
subquery all of belg_mobile's heap and index pages are in the cache,
so the cost per iteration should be estimated using these formulae:

Index scan cost = (6 (page fetch cost) + 0.42 (cpu cost) * 500K (conn table row count)) / 500K ~ 0.42

Seq scan cost = (1 (page fetch cost) + 1.19 (cpu cost) * 500K (conn table row count)) / 500K ~ 1.19

The index scan is actually even cheaper, because less than one tenth
of the conn rows have matching codes in the belg_mobile table.

That's what I wanted to say. I am not a veteran DBMS user, so I cannot
gauge the overall importance of this cost inaccuracy. I hope you can
help me look at the problem (?) more broadly than I can at the moment.



Re: Optimizer improvements: to do or not to do?

From
Tom Lane
Date:
"Say42" <andrews42@yandex.ru> writes:
> ... Let's take my previous example (I repost the query and some lines
> from 'explain' here for convenience):

> select count(*) from conn.conn20060803 c where
>     exists (select code from belg_mobile tc
>         where c.bnum >= tc.code and c.bnum like tc.code || '%'
>         order by tc.code desc limit 1)

I'm having a hard time getting excited about improving this query when
it's so badly coded in the first place.  What's an ORDER BY doing in
an EXISTS subquery?  The LIMIT is unnecessary too.  And the inner WHERE
says nothing so much as "I don't know how to design a database" :-(.
If we're going to look at specific examples we should at least look
at examples that are representative of typical good practice.
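
(For what it's worth, dropping the pointless ORDER BY and LIMIT, the
same condition can be written as just

    select count(*) from conn.conn20060803 c
    where exists (select 1 from belg_mobile tc
                  where c.bnum >= tc.code and c.bnum like tc.code || '%')

since EXISTS only cares whether at least one row matches.)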

It is true that EXISTS() subqueries are planned independently without
any idea of how often they might get re-executed.  This would be good
to fix but I don't see any clear way to do it --- at the time we are
processing the outer WHERE, we don't have enough context to judge
how many times a particular clause might be evaluated.  (Yeah, in this
case it's pretty obvious that it'll be executed once per conn20060803
row, but in join situations, or even just with additional outer WHERE
clauses, it's not nearly so obvious.)
        regards, tom lane


Re: Optimizer improvements: to do or not to do?

From
Ron Mayer
Date:
Simon Riggs wrote:
> On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
>> That's what I want to do:
>> 1. Replace not very useful indexCorrelation with indexClustering.
> 
> An opinion such as "not very useful" isn't considered sufficient
> explanation or justification for a change around here.

"Not sufficient for some types of data" would have been more fair.

I speculate that a new additional stat of "average # of unique values
for a column within a block" would go a long way toward helping my
worst queries.


It's common here for queries to vastly overestimate the
number of pages that would need to be read because
postgresql's guess at the correlation is practically 0
despite the fact that the distinct values for any given
column are closely packed on a few pages.

Our biggest tables (180G or so) are mostly spatial data with columns
like "City" "State" "Zip" "County" "Street" "School District", "Police
Beat", "lat/long" etc; and we cluster the table on zip,street.

Note that practically all the rows for any single value of any
of the columns will lie in the same few blocks.  However, the
calculated "correlation" is low because the total ordering
of the other values doesn't match that of zip codes.  This
makes the optimizer vastly overestimate the cost of index
scans because it guesses that most of the table will need
to be read, even though in reality just a few pages are needed.


If someone does look at the correlation calculations, I hope
this type of data gets considered as well.


I speculate that a new stat of "average # of unique values for a column
within a block" could be useful here in addition to correlation.  For
almost all the columns in my big table, this stat would be 1 or 2,
which I think would be a useful hint that despite a low
"correlation", the distinct values are indeed packed together
in blocks.   That way the optimizer can see that a
smaller number of pages would need to be accessed than
correlation alone would suggest, as the rough query below tries to show.
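
Something like this is how I'd eyeball that stat on one of our tables
today (the table and column names are made up, and the ctid -> text ->
point cast is just a quick hack to pull out the block number; a real
implementation would presumably compute this from the ANALYZE sample
rather than scanning everything):

    -- average number of distinct zip values per heap block
    SELECT avg(nvals) AS avg_distinct_per_block
    FROM (
        SELECT (ctid::text::point)[0] AS blkno,
               count(DISTINCT zip)    AS nvals
        FROM   addresses
        GROUP  BY 1
    ) AS per_block;

On our data I'd expect this to come out around 1 or 2 for most of those
columns, despite the near-zero correlation.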

Does this make sense, or am I missing something?


Re: Optimizer improvements: to do or not to do?

From
Gregory Stark
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

> It's common here for queries to vastly overestimate the
> number of pages that would need to be read because
> postgresql's guess at the correlation is practically 0
> despite the fact that the distinct values for any given
> column are closely packed on a few pages.

I think we need a serious statistics jock to pipe up with some standard
metrics that do what we need. Otherwise we'll never have a solid footing for
the predictions we make and will never know how much we can trust them.

That said I'm now going to do exactly what I just said we should stop doing
and brain storm about an ad-hoc metric that might help:

I wonder if what we need is something like: sort the sampled values by value
and count up the average number of distinct blocks per value. That might let
us predict how many pages a fetch of a specific value would retrieve. Or
perhaps we need a second histogram where the quantities are of distinct pages
rather than total records.

We might also need a separate "average number of n-block spans per value"
metric to predict how sequential the i/o will be in addition to how many pages
will be fetched.
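
As a straw man, the distinct-blocks-per-value number could be eyeballed
on an existing table with something like this (table and column names
are invented, and it scans the whole table rather than the sample, so
it's purely illustrative):

    -- average number of distinct heap blocks per value of "zip"
    SELECT avg(nblocks) AS avg_blocks_per_value
    FROM (
        SELECT zip,
               count(DISTINCT (ctid::text::point)[0]) AS nblocks
        FROM   addresses
        GROUP  BY zip
    ) AS per_value;

Whether a statistic like that, gathered from the sample rows during
ANALYZE, would be stable enough to plan on is exactly the kind of
question a statistics jock could answer.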

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Optimizer improvements: to do or not to do?

From
AgentM
Date:
On Sep 13, 2006, at 14:44 , Gregory Stark wrote:

> I think we need a serious statistics jock to pipe up with some standard
> metrics that do what we need. Otherwise we'll never have a solid footing
> for the predictions we make and will never know how much we can trust
> them.
>
> That said I'm now going to do exactly what I just said we should stop
> doing and brain storm about an ad-hoc metric that might help:
>
> I wonder if what we need is something like: sort the sampled values by
> value and count up the average number of distinct blocks per value. That
> might let us predict how many pages a fetch of a specific value would
> retrieve. Or perhaps we need a second histogram where the quantities are
> of distinct pages rather than total records.
>
> We might also need a separate "average number of n-block spans per value"
> metric to predict how sequential the i/o will be in addition to how many
> pages will be fetched.

Currently, statistics are only collected during an "ANALYZE". Why
aren't statistics collected during actual query runs such as seq
scans? One could turn such a beast off in order to get repeatable,
deterministic optimizer results.

-M


Re: Optimizer improvements: to do or not to do?

From
Ron Mayer
Date:
Gregory Stark wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> ...vastly overestimate the number of pages .. because postgresql's guess 
>> at the correlation is practically 0 despite the fact that the distinct 
>> values for any given column are closely packed on a few pages.
> 
> I think we need a serious statistics jock to pipe up with some standard
> metrics that do what we need. Otherwise we'll never have a solid footing for
> the predictions we make and will never know how much we can trust them.

Do we know if any such people participate/lurk on this list, or
if the conversation should go elsewhere?

> That said I'm now going to do exactly what I just said we should stop doing
> and brain storm about an ad-hoc metric that might help:
> 
> I wonder if what we need is something like: sort the sampled values by value
> and count up the average number of distinct blocks per value. That might let
> us predict how many pages a fetch of a specific value would retrieve. Or
> perhaps we need a second histogram where the quantities are of distinct pages
> rather than total records.

Either of these sounds like it might be an improvement over correlation
itself for estimating the number of pages it'd need to read.  Would it be
relatively easy or hard for a programmer not too familiar with the code
to experiment with these ideas?  Where would be a good place to look?

> We might also need a separate "average number of n-block spans per value"
> metric to predict how sequential the i/o will be in addition to how many pages
> will be fetched.

I'm wildly guessing that the # of pages itself is a bigger factor
than the sequential/random nature.  For example, if I do a query for
data from a particular small city I'd only need dozens of pages, not
many thousands.

OTOH, it'd be neat to know if this were true.  Is there any
good way to make something like explain analyze show both
the expected and actual # of pages and # of seeks?


Re: Optimizer improvements: to do or not to do?

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> I'm wildly guessing that the # of pages itself is a bigger factor
> than the sequential/random nature.

No, they're both important: fetching N pages in sequence is way cheaper
than fetching the same number of pages scattered all over.  This is
partly because you reduce seeking at the hardware level, and partly
because sequential reads cue the kernel to do read-ahead, allowing
overlap of I/O and computation.
        regards, tom lane


Re: Optimizer improvements: to do or not to do?

From
Joshua Reich
Date:
Ron Mayer wrote:
> Gregory Stark wrote:
>   
>> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>>     
>>> ...vastly overestimate the number of pages .. because postgresql's guess 
>>> at the correlation is practically 0 despite the fact that the distinct 
>>> values for any given column are closely packed on a few pages.
>>>       
>> I think we need a serious statistics jock to pipe up with some standard
>> metrics that do what we need. Otherwise we'll never have a solid footing for
>> the predictions we make and will never know how much we can trust them.
>>     
>
> Do we know if any such people participate/lurk on this list, or
> if the conversation should go elsewhere?
>   
I lurk... I don't know if I'm a 'statistics jock', but I might be of
value if only I had a better understanding of how the optimizer
works. I have been following this thread with interest, but could really
do with a good pointer to background information beyond what I have read
in the main postgres manual. Does such information exist, and if so,
where?

Josh Reich


Re: Optimizer improvements: to do or not to do?

From
Tom Lane
Date:
Joshua Reich <josh@root.net> writes:
> I lurk... I don't know if I'm a 'statistics jock', but I might be of
> value if only I had a better understanding of how the optimizer
> works. I have been following this thread with interest, but could really
> do with a good pointer to background information beyond what I have read
> in the main postgres manual. Does such information exist, and if so,
> where?

Well, there's the 20000-foot view here:
http://developer.postgresql.org/pgdocs/postgres/planner-optimizer.html
but after that you have to start reading code.

The optimizer README file may be useful:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/README
but it goes into a lot of details that probably aren't interesting for
your purposes.  Most of the planner is just mechanism associated with
generating different possible plans.  The policy that determines which
plan is chosen is the cost-estimation equations, and those are all in
costsize.c and selfuncs.c:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/costsize.c
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c
The division between these two files is a bit historical, but roughly
speaking selfuncs.c knows about the behavior of specific WHERE-clause
operators and index access methods, while costsize.c knows about the
behavior of particular plan types.

I'd like to think that costsize.c is well enough commented that you can
follow it even without any C knowledge, but selfuncs.c may be a bit more
daunting.  Still, the comments are pretty extensive, and feel free to
ask questions on pg-hackers.
        regards, tom lane


Re: Optimizer improvements: to do or not to do?

From
"Say42"
Date:
Tom Lane wrote:

> I'm having a hard time getting excited about improving this query when
> it's so badly coded in the first place.  What's an ORDER BY doing in
> an EXISTS subquery?  The LIMIT is unnecessary too.  And the inner WHERE
> says nothing so much as "I don't know how to design a database" :-(.

It was a test query which has the same execution plan for belg_mobile
(and the same problem) as the production query below:

select (select max(code) from belg_mobile tc
        where c.bnum >= tc.code and c.bnum like tc.code || '%') as code,
       c.cause, c.ilno, extract(hour from c.datetime) as hour,
       count(*) as cnt, sum(c.dur) as dur
from conn.conn20060803 c
where itgrp = :itgrp
group by 1,2,3,4

It's a simple OLAP query for analyzing the distribution of telephone
traffic over time and trunk codes.
'max(code)' is used to get the most specific matching code. For example,
84725 and 8472 are both valid codes, and the number 84725123456 must
match 84725 but not 8472. The 'c.bnum >= tc.code' qual significantly
reduces the index scan work and the execution time. A minimal
illustration of the prefix-matching trick is below.
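
A standalone toy example (contents invented) just to show how max()
picks the most specific prefix:

    create temp table codes (code text primary key);
    insert into codes values ('8472'), ('84725');

    -- returns 84725, the longest code that is a prefix of the number
    select max(code)
    from codes tc
    where '84725123456' >= tc.code
      and '84725123456' like tc.code || '%';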