Thread: why do optimizer parameters have to be set manually?

why do optimizer parameters have to be set manually?

From
"Marinos J. Yannikos"
Date:
Hi,

it seems to me that the optimizer parameters (like random_page_cost
etc.) could easily be calculated and adjusted dynamically by the DB
backend, based on the planner's cost estimates and actual run times for
different queries. Perhaps the developers could comment on that?

I'm not sure how the parameters are used internally (apart from whatever
"EXPLAIN" shows), but if cpu_operator_cost is the same for all
operators, this should probably also be adjusted for individual
operators (I suppose that ">" is not as costly as "~*").

As far as the static configuration is concerned, I'd be interested in
other users' parameters and hardware configurations. Here's ours (for a
write-intensive db that also performs many queries with regular
expression matching):

effective_cache_size = 1000000  # typically 8KB each
#random_page_cost = 0.2     # units are one sequential page fetch cost
random_page_cost = 3        # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01      # (same)
cpu_index_tuple_cost = 0.01 # (same) 0.1
#cpu_operator_cost = 0.0025 # (same)
cpu_operator_cost = 0.025   # (same)

other options:

shared_buffers = 240000 # 2*max_connections, min 16, typically 8KB each
max_fsm_relations = 10000   # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000000    # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 20 # min 10
wal_buffers = 128       # min 4, typically 8KB each
sort_mem = 800000       # min 64, size in KB
vacuum_mem = 100000     # min 1024, size in KB
checkpoint_segments = 80    # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300    # range 30-3600, in seconds
commit_delay = 100000       # range 0-100000, in microseconds
commit_siblings = 5     # range 1-1000

12GB RAM, dual 2.80GHz Xeon, 6x 10K rpm disks in a RAID-5, Linux 2.4.23
with HT enabled.

Regards,
  Marinos


Re: why do optimizer parameters have to be set manually?

From
Tom Lane
Date:
"Marinos J. Yannikos" <mjy@geizhals.at> writes:
> it seems to me that the optimizer parameters (like random_page_cost
> etc.) could easily be calculated and adjusted dynamically by the DB
> backend, based on the planner's cost estimates and actual run times for
> different queries. Perhaps the developers could comment on that?

No, they are not that easy to determine.  In particular I think the idea
of automatically feeding back error measurements is hopeless, because
you cannot tell which parameters are wrong.

> I'm not sure how the parameters are used internally (apart from whatever
> "EXPLAIN" shows), but if cpu_operator_cost is the same for all
> operators, this should probably also be adjusted for individual
> operators (I suppose that ">" is not as costly as "~*").

In theory perhaps, but in practice this is far down in the noise in most
situations.

            regards, tom lane

Re: why do optimizer parameters have to be set manually?

From
Christopher Browne
Date:
mjy@geizhals.at ("Marinos J. Yannikos") writes:
> it seems to me that the optimizer parameters (like random_page_cost
> etc.) could easily be calculated and adjusted dynamically by the DB
> backend, based on the planner's cost estimates and actual run times for
> different queries. Perhaps the developers could comment on that?

Yes, it seems like a Small Matter Of Programming.

http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?SMOP

In seriousness, yes, it would seem a reasonable idea to calculate some
of these values a bit more dynamically.

I would be inclined to start with something that ran a workload, and
provided static values based on how that workload went.  That would
require NO intervention inside the DB server; it could be accomplished
simply by writing a database script.  Feel free to contribute either a
script or a backend "hack"...
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

is it possible to get the optimizer to use indexes with a like clause

From
Dave Cramer
Date:
It appears that the optimizer only uses indexes for = clauses?

Dave


Re: is it possible to get the optimizer to use indexes

From
Christopher Kings-Lynne
Date:
> It appears that the optimizer only uses indexes for = clauses?

The optimizer will use indexes for LIKE clauses, so long as the clause
is a prefix search, e.g.:

SELECT * FROM test WHERE a LIKE 'prf%';

Chris


Re: is it possible to get the optimizer to use indexes

From
Dave Cramer
Date:
after vacuum verbose analyze, I still get

explain select * from isppm where item_upc_cd like '06038301234';
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on isppm  (cost=100000000.00..100009684.89 rows=2 width=791)
   Filter: (item_upc_cd ~~ '06038301234'::text)
(2 rows)

isp=# explain select * from isppm where item_upc_cd = '06038301234';
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using isppm_x0 on isppm  (cost=0.00..5.86 rows=2 width=791)
   Index Cond: (item_upc_cd = '06038301234'::bpchar)
(2 rows)


Dave
On Thu, 2003-12-18 at 20:38, Christopher Kings-Lynne wrote:
> > It appears that the optimizer only uses indexes for = clauses?
>
> The optimizer will use indexes for LIKE clauses, so long as the clause
> is a prefix search, e.g.:
>
> SELECT * FROM test WHERE a LIKE 'prf%';
>
> Chris


Re: is it possible to get the optimizer to use indexes with a like clause

From
Christopher Browne
Date:
pg@fastcrypt.com (Dave Cramer) wrote:
> It appears that the optimizer only uses indexes for = clauses?

It can use indexes only if the pattern has a literal prefix.

Thus:
   where text_field like 'A%'

can use the index, essentially transforming this into the clauses

   where text_field >= 'A' and
         text_field < 'B'.

You can't get much out of an index for
   where text_field like '%SOMETHING'
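
A toy check of the same transformation in Python (byte-wise string
comparison here plays the role of the "C" locale ordering that comes up
elsewhere in this thread):

prefix = "A"
# Upper bound: bump the last character of the prefix ("A" -> "B").
upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
# Every string matching LIKE 'A%' falls inside the half-open range.
assert all(prefix <= s < upper for s in ["A", "ABC", "Apple"])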
--
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/wp.html
"When the grammar checker identifies an error, it suggests a
correction and can even makes some changes for you."
-- Microsoft Word for Windows 2.0 User's Guide, p.35:

Re: is it possible to get the optimizer to use indexes

From
Stephan Szabo
Date:
On Thu, 18 Dec 2003, Dave Cramer wrote:

> after vacuum verbose analyze, I still get
>
> explain select * from isppm where item_upc_cd like '06038301234';
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on isppm  (cost=100000000.00..100009684.89 rows=2 width=791)
>    Filter: (item_upc_cd ~~ '06038301234'::text)
> (2 rows)

IIRC, the other limitation is that it only does so in the "C" locale, due
to weirdnesses in other locales.

Re: is it possible to get the optimizer to use indexes

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> after vacuum verbose analyze, I still get [a seqscan]

The other gating factor is that you have to have initdb'd in C locale.
Non-C locales tend to use wild and wooly sort orders that are not
compatible with what LIKE needs.

            regards, tom lane

Re: is it possible to get the optimizer to use indexes

From
Dave Cramer
Date:
So even in a North American locale, such as en_CA, this will be a
problem?

Dave
On Thu, 2003-12-18 at 22:44, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > after vacuum verbose analyze, I still get [a seqscan]
>
> The other gating factor is that you have to have initdb'd in C locale.
> Non-C locales tend to use wild and wooly sort orders that are not
> compatible with what LIKE needs.
>
>             regards, tom lane


Re: why do optimizer parameters have to be set manually?

From
"Marinos J. Yannikos"
Date:
Tom Lane wrote:
> No, they are not that easy to determine.  In particular I think the idea
> of automatically feeding back error measurements is hopeless, because
> you cannot tell which parameters are wrong.

Isn't it just a matter of solving an equation system with n variables (n
being the number of parameters), where each equation stands for the
calculation of the run time of a particular query? E.g., for a
sequential scan over 1000 rows with 2 operators applied per row that
took 2 seconds (simplified so that the costs are actual timings rather
than costs relative to a base value):

1000 * sequential_scan_cost + 1000 * 2 * cpu_operator_cost = 2.0 seconds

With a sufficient number of equations (not just n, since not all query
plans use all the parameters) this system can be solved for the
particular query mix that was used. E.g. with a second sequential scan
over 2000 rows with 1 operator per iteration that took 3 seconds you can
derive:

sequential_scan_cost = 1ms
cpu_operator_cost = 0.5ms

This could probably be implemented with very little overhead compared to
the actual run times of the queries.
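
For illustration, a minimal sketch of that fit using the numbers above
(Python; numpy's least-squares solver merely stands in for whatever the
backend would actually use):

import numpy as np

# One row per observed query; the columns say how often each parameter
# is charged in that plan: sequential_scan_cost and cpu_operator_cost,
# both per row in the simplified model above.
A = np.array([[1000.0, 1000.0 * 2],    # 1000 rows, 2 operators per row
              [2000.0, 2000.0 * 1]])   # 2000 rows, 1 operator per row
t = np.array([2.0, 3.0])               # observed run times in seconds

costs, _, _, _ = np.linalg.lstsq(A, t, rcond=None)
print(costs * 1000.0)                  # -> [1.0, 0.5]: 1ms and 0.5ms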

Regards,
  Marinos




Re: is it possible to get the optimizer to use indexes

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> So even in a North American locale, such as en_CA, this will be a
> problem?

If it's not "C" we won't try to optimize LIKE.  I know en_US does not
work (case-insensitive, funny rules about spaces, etc) and I would
expect en_CA has the same issues.

If you're using 7.4 you have the option to create a special index
instead of re-initdb'ing your whole database.

            regards, tom lane

Re: why do optimizer parameters have to be set manually?

From
Tom Lane
Date:
"Marinos J. Yannikos" <mjy@geizhals.at> writes:
> Tom Lane wrote:
>> No, they are not that easy to determine.  In particular I think the idea
>> of automatically feeding back error measurements is hopeless, because
>> you cannot tell which parameters are wrong.

> Isn't it just a matter of solving an equation system with n variables (n
> being the number of parameters), where each equation stands for the
> calculation of the run time of a particular query?

If we knew all the variables involved, it might be (though since the
equations would be nonlinear, the solution would be more difficult than
you suppose).  The real problems are:

1. There is lots of noise in any real-world measurement, mostly due to
competition from other processes.

2. There are effects we don't even try to model, such as the current
contents of kernel cache.  Everybody who's done any work with Postgres
knows that for small-to-middling tables, running the same query twice in
a row will yield considerably different runtimes, because the second
time through all the data will be in kernel cache.  But we don't have
any useful way to model that in the optimizer, since we can't see what
the kernel has in its buffers.

3. Even for the effects we do try to model, some of the equations are
pretty ad-hoc and might not fit real data very well.  (I have little
confidence in the current correction for index order correlation, for
example.)

In short, if you just try to fit the present cost equations to real
data, what you'll get will inevitably be "garbage in, garbage out".
You could easily end up with parameter values that are much less
realistic than the defaults.

Over time we'll doubtless improve the optimizer's cost models, and
someday we might get to a point where this wouldn't be a fool's errand,
but I don't see it happening in the foreseeable future.

I think a more profitable approach is to set up special test code to try
to approximate the value of individual parameters measured in isolation.
For instance, the current default of 4.0 for random_page_cost was
developed through rather extensive testing a few years ago, and I think
it's still a decent average value (for the case where you are actually
doing I/O, mind you).  But if your disks have particularly fast or slow
seek times, maybe it's not good for you.  It might be useful to package
up a test program that repeats those measurements on particular systems
--- though the problem of noisy measurements still applies.  It is not
easy or cheap to get a measurement that isn't skewed by kernel caching
behavior.  (You need a test file significantly larger than RAM, and
even then you'd better repeat the measurement quite a few times to see
how much noise there is in it.)
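
For what it's worth, a rough sketch of such a test program (Python; the
path is hypothetical, and the scratch file must be created beforehand,
much larger than RAM, or kernel caching will make the random reads look
nearly free):

import os, random, time

PAGE = 8192                            # PostgreSQL page size
PATH = "/scratch/bigfile"              # hypothetical; must be >> RAM
NPAGES = os.path.getsize(PATH) // PAGE
SAMPLES = 20000

fd = os.open(PATH, os.O_RDONLY)

start = time.time()
for _ in range(SAMPLES):               # sequential page fetches
    os.read(fd, PAGE)
seq = (time.time() - start) / SAMPLES

start = time.time()
for _ in range(SAMPLES):               # random page fetches
    os.lseek(fd, random.randrange(NPAGES) * PAGE, os.SEEK_SET)
    os.read(fd, PAGE)
rnd = (time.time() - start) / SAMPLES

os.close(fd)
print("estimated random_page_cost: %.1f" % (rnd / seq))

As noted above, repeat the run several times and look at the spread
before trusting any single number.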

            regards, tom lane

Re: is it possible to get the optimizer to use indexes

From
Erki Kaldjärv
Date:
Hello,

I got indexes to work with "text_pattern_ops" for locale et_EE.

So instead of:

create index some_index_name on some_table(some_text_field);

or

create index some_index_name on some_table(some_text_field text_ops);

try creating the index as follows:

create index some_index_name on some_table(some_text_field text_pattern_ops);

Note that text_pattern_ops is available in pg >= 7.4.

Regards,

Erki Kaldjärv
Webware OÜ
www.webware.ee

Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > So even in a North American locale, such as en_CA, this will be a
> > problem?
>
> If it's not "C" we won't try to optimize LIKE.  I know en_US does not
> work (case-insensitive, funny rules about spaces, etc) and I would
> expect en_CA has the same issues.
>
> If you're using 7.4 you have the option to create a special index
> instead of re-initdb'ing your whole database.
>
>             regards, tom lane


Re: why do optimizer parameters have to be set manually?

From
William Yu
Date:
Tom Lane wrote:
> easy or cheap to get a measurement that isn't skewed by kernel caching
> behavior.  (You need a test file significantly larger than RAM, and
> even then you'd better repeat the measurement quite a few times to see
> how much noise there is in it.)

I found a really fast way in Linux to flush the kernel cache: unmount
the filesystem and then remount it. Beats having to read through a file
larger than RAM every time.
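
A minimal sketch of that (the mount point is hypothetical; assumes the
data sits on its own filesystem listed in /etc/fstab, nothing has it
open, and you are root):

import subprocess

# Remounting discards the kernel's cached pages for that filesystem.
subprocess.run(["umount", "/scratch"], check=True)
subprocess.run(["mount", "/scratch"], check=True)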


Re: is it possible to get the optimizer to use indexes

From
Doug McNaught
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

>> It appears that the optimizer only uses indexes for = clauses?
>
> The optimizer will use indexes for LIKE clauses, so long as the
> clause is a prefix search, e.g.:
>
> SELECT * FROM test WHERE a LIKE 'prf%';

Doesn't this still depend on your locale?

-Doug

Re: is it possible to get the optimizer to use indexes

From
Dave Cramer
Date:
Doug,

Yes, it does depend on the locale; you can get around this in 7.4 by
building the index with the pattern-matching operators.

Dave
On Thu, 2003-12-18 at 20:38, Doug McNaught wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>
> >> It appears that the optimizer only uses indexes for = clauses?
> >
> > The optimizer will use indexes for LIKE clauses, so long as the
> > clause is a prefix search, e.g.:
> >
> > SELECT * FROM test WHERE a LIKE 'prf%';
>
> Doesn't this still depend on your locale?
>
> -Doug