Thread: cpu_tuple_cost

cpu_tuple_cost

From: Daniel Schuchardt
Hi List,

I have a query whose plan is bad with the standard cpu_tuple_cost and
good if I raise it. Is it good practice to raise it if I want to force
Postgres to use indexes more often? Or is it better to disable
sequential scans?
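
Both knobs can be tried per session without touching postgresql.conf; a
minimal sketch (the values here are experiments, not recommendations):

CIMSOFT=# SET cpu_tuple_cost = 0.07;  -- raise per-tuple CPU cost (tried below)
CIMSOFT=# SET enable_seqscan = off;   -- or discourage seq scans (testing only;
                                      -- the planner can still fall back to one)
CIMSOFT=# EXPLAIN ANALYSE SELECT ...; -- re-check which plan wins
CIMSOFT=# RESET ALL;                  -- back to the configured defaults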

CIMSOFT=# ANALYSE mitpln;
ANALYZE

CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE
date_to_yearmonth_dec(mpl_date)='20050';
                                                   QUERY PLAN

--------------------------------------------------------------------------------
  Seq Scan on mitpln  (cost=0.00..1411.85 rows=2050 width=69) (actual
time=562.000..1203.000 rows=1269 loops=1)
    Filter: ((date_to_yearmonth_dec((mpl_date)::timestamp without time
zone))::text = '20050'::text)
  Total runtime: 1203.000 ms
(3 rows)

CIMSOFT=# SET cpu_tuple_cost = 0.07;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE
date_to_yearmonth_dec(mpl_date)='20050';
                                                               QUERY PLAN

--------------------------------------------------------------------------------
  Index Scan using mitpln_yearmonth_dec on mitpln  (cost=0.00..2962.86
rows=2050 width=69) (actual time=0.000..0.000 rows=1269 loops=1)
    Index Cond: ((date_to_yearmonth_dec((mpl_date)::timestamp without
time zone))::text = '20050'::text)
  Total runtime: 16.000 ms
(3 rows)


CIMSOFT=# \d mitpln
                                        Table "public.mitpln"
     Column    |         Type          |                         Modifiers

--------------+-----------------------+-----------------------------------------
  mpl_id       | integer               | not null default
nextval('public.mitpln_mpl_id_seq'::text)
  mpl_date     | date                  |
  mpl_minr     | integer               | not null
  mpl_tpl_name | character varying(20) |
  mpl_feiertag | character varying(50) |
  mpl_min      | real                  |
  mpl_saldo    | real                  |
  mpl_buch     | boolean               | not null default false
  mpl_absaldo  | real                  |
  mpl_vhz      | real                  |
  dbrid        | character varying     | default nextval('db_id_seq'::text)
Indexes:
     "mitpln_pkey" PRIMARY KEY, btree (mpl_id)
     "mitpln_idindex" UNIQUE, btree (dbrid)
     "xtt5126" UNIQUE, btree (mpl_date, mpl_minr)
     "mitpln_yearmonth_dec" btree
(date_to_yearmonth_dec(mpl_date::timestamp with
out time zone))


CIMSOFT=# SELECT count(*) FROM mitpln;
  count
-------
  26128
(1 row)

Re: cpu_tuple_cost

From: Tom Lane
Daniel Schuchardt <daniel_schuchardt@web.de> writes:
> I have a query whose plan is bad with the standard cpu_tuple_cost and
> good if I raise it. Is it good practice to raise it if I want to force
> Postgres to use indexes more often?

Reducing random_page_cost is usually the best way to get the planner to
favor indexscans more.
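
A minimal sketch of that, tried per session first and persisted per
database only if it clearly helps (the value 2 is just an example, not
a recommendation):

    SET random_page_cost = 2;      -- session-local experiment
    EXPLAIN ANALYSE SELECT ...;    -- re-check which plan wins
    ALTER DATABASE "CIMSOFT" SET random_page_cost = 2;  -- persist per database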

            regards, tom lane

Re: cpu_tuple_cost

From: "Greg Sabino Mullane"

> Reducing random_page_cost is usually the best way to get the
> planner to favor indexscans more.

On that note, can I raise the idea again of dropping the default
value for random_page_cost in postgresql.conf? I think 4 is too
conservative in this day and age. Certainly the person who will
be negatively impacted by a default drop of 4 to 3 will be the
exception and not the rule.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200503140702
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



Re: cpu_tuple_cost

From: Bruce Momjian
Greg Sabino Mullane wrote:
> > Reducing random_page_cost is usually the best way to get the
> > planner to favor indexscans more.
>
> On that note, can I raise the idea again of dropping the default
> value for random_page_cost in postgresql.conf? I think 4 is too
> conservative in this day and age. Certainly the person who will
> be negatively impacted by a default drop of 4 to 3 will be the
> exception and not the rule.

Agreed.  I think we should reduce it at least to 3.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: cpu_tuple_cost

From: Tom Lane
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> On that note, can I raise the idea again of dropping the default
> value for random_page_cost in postgresql.conf? I think 4 is too
> conservative in this day and age. Certainly the person who will
> be negatively impacted by a default drop of 4 to 3 will be the
> exception and not the rule.

The ones who'd be negatively impacted are the ones we haven't
been hearing from ;-).  To assume that they aren't out there
is a logical fallacy.

I still think that 4 is about right for large databases (where
"large" is in comparison to available RAM).

Also, to the extent that we think these numbers mean anything at all,
we should try to keep them matching the physical parameters we think
they represent.  I think that the "reduce random_page_cost" mantra
is not an indication that that parameter is wrong, but that the
cost models it feeds into need more work.  One thing we *know*
is wrong is the costing of nestloop inner indexscans: there needs
to be a correction for caching of index blocks across repeated
scans.  I've looked at this a few times but not come up with
anything that seemed convincing.  Another thing I've wondered
about more than once is if we shouldn't discount fetching of
higher-level btree pages on the grounds that they're probably
in RAM already, even if the indexscan isn't inside a loop.
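
A sketch of the nestloop case in question, on hypothetical tables (the
inner indexscan repeats once per outer row, so its upper btree pages are
almost certainly cached after the first repetition, yet each repetition
is costed as if it paid full random I/O):

    CREATE TABLE outer_t AS
      SELECT g AS id FROM generate_series(1, 1000) g;
    CREATE TABLE inner_t AS
      SELECT (g % 1000) + 1 AS outer_id, g AS payload
        FROM generate_series(1, 100000) g;
    CREATE INDEX inner_t_outer_id ON inner_t (outer_id);
    ANALYZE outer_t;
    ANALYZE inner_t;

    SET enable_hashjoin = off;   -- force the nestloop, for demonstration only
    SET enable_mergejoin = off;
    EXPLAIN ANALYSE
    SELECT * FROM outer_t o JOIN inner_t i ON i.outer_id = o.id;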

            regards, tom lane

Re: cpu_tuple_cost

From: Gregory Stark
Bruce Momjian <pgman@candle.pha.pa.us> wrote:

> Agreed.  I think we should reduce it at least to 3.

Note that changing it from 4 to 3 or even 2 is unlikely to really change much.
Many of the plans people complain about turn out to have critical points
closer to 1.2 or 1.1.

The only reason things work out better with such low values is because people
have data sets that fit more or less entirely in RAM. So values close to 1 or
even equal to 1 actually represent the reality.

The "this day and age" argument isn't very convincing. Hard drive capacity
growth has far outstripped hard drive seek time and bandwidth improvements.
Random access has more penalty than ever.

--
greg

Re: cpu_tuple_cost

From: Josh Berkus
Greg,

> On that note, can I raise the idea again of dropping the default
> value for random_page_cost in postgresql.conf? I think 4 is too
> conservative in this day and age. Certainly the person who will
> be negatively impacted by a default drop of 4 to 3 will be the
> exception and not the rule.

I don't agree.  The defaults are there for people who aren't going to read
enough of the documentation to set them.  As such, conservative for the
defaults is appropriate.

If we were going to change anything automatically, it would be to set
effective_cache_size to 1/3 of RAM at initdb time.  However, I don't know any
method to determine RAM size that works on all the platforms we support.
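
Done by hand, that would look something like this (a sketch assuming a
machine with 1 GB of RAM and the default 8 kB block size;
effective_cache_size is measured in disk pages):

    -- 1/3 of 1 GB is ~341 MB; 341 MB / 8 kB per page is ~43700 pages
    SET effective_cache_size = 43700;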

Tom,

> Also, to the extent that we think these numbers mean anything at all,
> we should try to keep them matching the physical parameters we think
> they represent.

Personally, what I would love to see is the system determining and caching
some of these parameters automatically.   For example, in a database which
has been running in production for a couple of days, it should be possible to
determine the ratio of average random seek tuple cost to average seq scan
tuple cost.

Other parameters should really work the same way.   Effective_cache_size, for
example, is a blunt instrument to replace what the database should ideally do
through automated interactive fine tuning.  Particularly since we have 2
separate caches (or 3, if you count t1 and t2 from 2Q).   What the planner
really needs to know is: is this table or index already in the t1 or t2 cache
(can't we determine this?)?   How likely is it to be in the filesystem cache?
The latter question is not just one of size (table < memory), but one of
frequency of access.

Of course, this stuff is really, really hard, which is why we rely on the
GUCs ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Changing the random_page_cost default (was: cpu_tuple_cost)

From: "Greg Sabino Mullane"

Josh Berkus wrote:
> I don't agree.  The defaults are there for people who aren't going to read
> enough of the documentation to set them.  As such, conservative for the
> defaults is appropriate.

Sure, but I would argue that 4 is *too* conservative. We've certainly changed
other values over the years. As I see it, those most affected by this change
are those least likely to have the know-how to change the default, and they
are also the majority of our users. I've often had to reduce this value,
working on many databases, on many versions of PostgreSQL. Granted, I don't
work on any huge, complex, hundreds-of-gigabytes databases, but that supports
my point - if you are really better off with a /higher/ (than 3)
random_page_cost, you should already be tweaking a lot of stuff yourself
anyway. Tom Lane has a good point about tweaking other default parameters as
well, and that's a worthy goal, but I don't think an extended search for a
"sweet spot" should prevent us from making a small yet important (IMO!)
change in the default of this one variable.

N.B. My own personal starting default is 2, but I thought 3 was a nice
middle ground more likely to reach consensus here. :)

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200503141727
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



Re: Changing the random_page_cost default (was: cpu_tuple_cost)

From: Jeff Hoffmann
On Mar 15, 2005, at 6:35 AM, Greg Sabino Mullane wrote:

> Granted, I don't work on any huge, complex, hundreds-of-gigabytes
> databases, but that supports my point - if you are really better off
> with a /higher/ (than 3) random_page_cost, you should already be
> tweaking a lot of stuff yourself anyway.

I think this is a good point.  The people that tend to benefit from the
lower cost are precisely the people least likely to know to change it.
It's the "install & go" crowd with smaller databases and only a few
users/low concurrency that expect it to "just work".  The bigger
installations are more likely to have dedicated DB admins who understand
tuning.

Wasn't there an idea on the table once to ship with several different
configuration files with different defaults for small, medium, large,
etc. installs?  Wouldn't it make sense to ask the user during initdb to
pick from one of the default config files?  Or even have a few simple
questions like "How much memory do you expect to be available to
PostgreSQL?" and "How many concurrent users do you expect to have?".
It's one thing to know how much memory is in a machine; it's quite
another thing to know how much the user wants dedicated to PostgreSQL.
A couple of questions like that can go a long way to coming up with
better ballpark figures.

--
Jeff Hoffmann
jeff@propertykey.com


Re: Changing the random_page_cost default (was: cpu_tuple_cost)

From: Tom Lane
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> N.B. My own personal starting default is 2, but I thought 3 was a nice
> middle ground more likely to reach consensus here. :)

Your argument seems to be "this produces nice results for me", not
"I have done experiments to measure the actual value of the parameter
and it is X".  I *have* done experiments of that sort, which is where
the default of 4 came from.  I remain of the opinion that reducing
random_page_cost is a band-aid that compensates (but only partially)
for problems elsewhere.  We can see that it's not a real fix from
the not-infrequent report that people have to reduce random_page_cost
below 1.0 to get results anywhere near local reality.  That doesn't say
that the parameter value is wrong, it says that the model it's feeding
into is wrong.

            regards, tom lane

Re: Changing the random_page_cost default (was: cpu_tuple_cost)

From: Mark Kirkwood
Tom Lane wrote:
> I remain of the opinion that reducing random_page_cost is a band-aid
> that compensates (but only partially) for problems elsewhere. [...]
> That doesn't say that the parameter value is wrong, it says that the
> model it's feeding into is wrong.

I would like to second that. A while back I performed a number of
experiments on differing hardware and came to the conclusion that the
*real* random_page_cost was often higher than 4 (like 10-15 for
multi-disk RAID systems).

However, I have frequently adjusted Pg's random_page_cost to less than 4
when it helped queries perform better.

So yes, it looks like the model is the issue - not the value of the
parameter!

regards

Mark


Re: cpu_tuple_cost

From: David Brown
Gregory Stark wrote:

>The "this day and age" argument isn't very convincing. Hard drive capacity
>growth has far outstripped hard drive seek time and bandwidth improvements.
>Random access has more penalty than ever.
>
>
In point of fact, there haven't been noticeable seek time improvements
for years. Transfer rates, on the other hand, have gone through the roof.

Which is why I would question the published tuning advice that
recommends lowering it to 2 for arrays. Arrays increase the effective
transfer rate more than they reduce random access times. Dropping from 4
to 2 would reflect going from a typical single 7200rpm ATA drive to a
15000rpm SCSI drive, but striping will move it back up again - probably
even higher than 4 with a big array (at a guess, perhaps the
relationship might be approximated as a square root after allowing for
the array type?).

With default settings, I've seen the planner pick the wrong index unless
random_page_cost was set to 2. But in testing on an ATA drive, I
achieved slightly better plan costings by increasing cpu_tuple_cost
(relative to cpu_index_tuple_cost - by default it's only a factor of 10)
and actually *raising* random_page_cost to 5! So why pick on one
parameter? It's all going to vary according to the query and the data.
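
For concreteness, that kind of experiment looks something like the
following (the values are illustrative, not the ones actually used; the
defaults are cpu_tuple_cost = 0.01, cpu_index_tuple_cost = 0.001 and
random_page_cost = 4):

    SET cpu_tuple_cost = 0.03;    -- raise it relative to cpu_index_tuple_cost
    SET random_page_cost = 5;     -- *raise* the random cost too
    EXPLAIN ANALYSE SELECT ...;   -- compare plan costs against actual times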

I agree with Tom 100%. Pulling levers on a wonky model is no solution.

Re: Changing the random_page_cost default (was: cpu_tuple_cost)

From: "Greg Sabino Mullane"

> Your argument seems to be "this produces nice results for me", not
> "I have done experiments to measure the actual value of the parameter
> and it is X".  I *have* done experiments of that sort, which is where
> the default of 4 came from.  I remain of the opinion that reducing
> random_page_cost is a band-aid that compensates (but only partially)
> for problems elsewhere.  We can see that it's not a real fix from
> the not-infrequent report that people have to reduce random_page_cost
> below 1.0 to get results anywhere near local reality.  That doesn't say
> that the parameter value is wrong, it says that the model it's feeding
> into is wrong.

Good points: allow me to rephrase my question then:

When I install a new version of PostgreSQL and start testing my
applications, one of the most common problems is that many of my queries
are not hitting an index. I typically drop random_page_cost to 2 or
lower, and this speeds things up very significantly. How can I determine
a better way to speed up my queries, and why would that be advantageous
over simply dropping random_page_cost? How can I use my particular
situation to help develop a better model, and perhaps make the defaults
work better for my queries and for other people with databases like mine
(fairly simple schema, not too large (~2 GB total), SCSI, medium-to-high
complexity queries, a good amount of RAM available)?

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200503150600
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




Re: cpu_tuple_cost

From: Greg Stark
David Brown <time@bigpond.net.au> writes:

> Gregory Stark wrote:
>
> >The "this day and age" argument isn't very convincing. Hard drive capacity
> >growth has far outstripped hard drive seek time and bandwidth improvements.
> >Random access has more penalty than ever.
>
> In point of fact, there haven't been noticeable seek time improvements for
> years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and
throughput.

Typical 7200RPM drives have average seek times in the area of 10ms.
Typical sustained transfer rates are in the range of 40MB/s. Postgres
reads 8kB blocks at a time.

So that's 800kB/s for random access reads versus 40MB/s for sequential
reads: a factor of about 50. I don't think anyone wants random_page_cost
to be set to 50, though.

For a high-end 15k drive I see average seek times as low as 3ms, and
sustained transfer rates as high as 100MB/s. So about 2.7MB/s for random
access reads, or a random_page_cost of about 37. Still pretty extreme.

So what's going on with the empirically derived value of 4? Perhaps it's
because even though Postgres is reading an entire table sequentially,
it's unlikely to be the only I/O consumer: the sequential reads would
occasionally be interleaved with other I/O, forcing a seek before the
scan can continue.

In which case the true random_page_cost seems like it would be extremely
sensitive to the amount of readahead the OS does. To reach a random_page_cost
of 4 given the numbers above for a 7200RPM drive requires that just under 25%
of the I/O of a sequential table scan be random seeks [*]. That translates to
32kB of sequential reading, which actually does sound like a typical value for
OS readahead.

I wonder if those same empirical tests would show even higher values of
random_page_cost if the readahead were turned up to 64kB or 128kB.




[*] A bit of an algebraic diversion:

    Random reads: 1s / 10ms = 100 buffers/s.
    Sequential reads: 40MB/s / 8kB = 5000 buffers/s, i.e. 0.2ms/buffer.
    random_page_cost = 4 implies the interleaved scan nets
    4 * 100 = 400 buffers/s. Solve for rnd and seq buffers per second:

        rnd + seq = 400 buffers
        10*rnd + 0.2*seq = 1000ms

    which gives rnd ~ 94 and seq ~ 306: just under 25% random seeks,
    or roughly 4 buffers (32kB) read per seek.


--
greg

Re: cpu_tuple_cost

From: "Magnus Hagander"
> > >The "this day and age" argument isn't very convincing. Hard drive
> > >capacity growth has far outstripped hard drive seek time
> and bandwidth improvements.
> > >Random access has more penalty than ever.
> >
> > In point of fact, there haven't been noticeable seek time
> improvements
> > for years. Transfer rates, on the other hand, have gone
> through the roof.
>
> Er, yeah. I stated it wrong. The real ratio here is between
> seek time and throughput.
>
> Typical 7200RPM drives have average seek times are in the
> area of 10ms.
> Typical sustained transfer rates are in the range of 40Mb/s.
> Postgres reads 8kB blocks at a time.
>
> So 800kB/s for random access reads. And 40Mb/s for sequential
> reads. That's a factor of 49. I don't think anyone wants
> random_page_cost to be set to 50 though.
>
> For a high end 15k drive I see average seek times get as low
> as 3ms. And sustained transfer rates get as high as 100Mb/s.
> So about 2.7Mb/s for random access reads or about a
> random_page_cost of 37. Still pretty extreme.
>
Greg Stark wrote:
> So what's going on with the empirically derived value of 4? Perhaps
> it's because even though Postgres is reading an entire table
> sequentially, it's unlikely to be the only I/O consumer: the
> sequential reads would occasionally be interleaved with other I/O,
> forcing a seek before the scan can continue.

What about the cache memory on the disk? Even IDE disks have some 8MB of
cache today, which makes a lot of difference for fairly short scans, even
if it's just read cache. That'll bring the speed of random access down to
a 1:1 relationship with sequential access, assuming everything fits in
the cache.


//Magnus

Re: cpu_tuple_cost

From: Greg Stark
"Magnus Hagander" <mha@sollentuna.net> writes:

> What about the cache memory on the disk? Even IDE disks have some 8MB
> of cache today, which makes a lot of difference for fairly short scans,
> even if it's just read cache. That'll bring the speed of random access
> down to a 1:1 relationship with sequential access, assuming everything
> fits in the cache.

8MB cache is really insignificant compared to the hundreds or thousands of
megabytes the OS would be using to cache. You could just add the 8MB to your
effective_cache_size (except it's not really 100% effective since it would
contain some of the same blocks as the OS cache).
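
In page terms the addition really is tiny; a sketch with the default
8 kB block size and an illustrative base setting:

    -- 8 MB of on-disk cache = 1024 pages on top of, e.g., a 43700-page setting
    SET effective_cache_size = 44724;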

--
greg

Re: cpu_tuple_cost

From: Josh Berkus
Greg,

> So that's 800kB/s for random access reads versus 40MB/s for sequential
> reads: a factor of about 50. I don't think anyone wants
> random_page_cost to be set to 50, though.
>
> For a high-end 15k drive I see average seek times as low as 3ms, and
> sustained transfer rates as high as 100MB/s. So about 2.7MB/s for
> random access reads, or a random_page_cost of about 37. Still pretty
> extreme.

Actually, what you're demonstrating here is that there's really no point in
having a random_page_cost GUC, since the seek/scan ratio is going to be high
regardless.

Although I can point out that you left out the fact that the disk needs to do
a seek to find the beginning of the seq scan area, and even then some file
fragmentation is possible.   Finally, I've never seen PostgreSQL manage more
than 70% of the maximum read rate, and in most cases more like 30%.

> So what's going on with the empirically derived value of 4?

It's not empirically derived; it's a value we plug into an
internal-to-postgresql formula.   And "4" is a fairly conservative value that
works for a lot of systems.

Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware "facts" into formulas that allow us to
calculate the actual cost of a query execution plan is somebody's PhD paper.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: cpu_tuple_cost

From: Greg Stark
Josh Berkus <josh@agliodbs.com> writes:

> Although I can point out that you left out the fact that the disk needs to do
> a seek to find the beginning of the seq scan area, and even then some file
> fragmentation is possible.   Finally, I've never seen PostgreSQL manage more
> than 70% of the maximum read rate, and in most cases more like 30%.

Hm. I just did a quick test. It wasn't really long enough to get a good
estimate, but it seemed to reach about 30MB/s on this drive that's only
capable of 40-50MB/s depending on the location on the platters.

That's true though, some of my calculated 25% random seeks could be caused by
fragmentation. But it seems like that would be a small part.

> > So what's going on with the empirically derived value of 4?
>
> It's not empirically derived; it's a value we plug into an
> internal-to-postgresql formula.

I thought Tom said he got the value by doing empirical tests.

--
greg

Re: cpu_tuple_cost

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
>> So what's going on with the empirically derived value of 4?

> It's not empirically derived;

Yes it is.  I ran experiments back in the late 90s to derive it.
Check the archives.

Disks have gotten noticeably bigger since then, but I don't think
the ratio of seek time to rotation rate has changed much.

            regards, tom lane

Re: cpu_tuple_cost

From: Manfred Koizar
On Mon, 14 Mar 2005 21:23:29 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think that the "reduce random_page_cost" mantra
> is not an indication that that parameter is wrong, but that the
> cost models it feeds into need more work.

One of these areas is the cost interpolation depending on correlation.
This has been discussed on -hackers in October 2002 and August 2003
("Correlation in cost_index()").  My Postgres installations contain the
patch presented during that discussion (and another index correlation
patch), and I use *higher* values for random_page_cost (up to 10).

Servus
 Manfred

Re: cpu_tuple_cost

From: Josh Berkus
Tom,

> Yes it is.  I ran experiments back in the late 90s to derive it.
> Check the archives.

Hmmmm ... which list?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: cpu_tuple_cost

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
>> Yes it is.  I ran experiments back in the late 90s to derive it.
>> Check the archives.

> Hmmmm ... which list?

-hackers, no doubt.  -performance didn't exist then.

            regards, tom lane

Re: cpu_tuple_cost

From: Michael Fuhr
On Thu, Mar 17, 2005 at 09:54:29AM -0800, Josh Berkus wrote:
>
> > Yes it is.  I ran experiments back in the late 90s to derive it.
> > Check the archives.
>
> Hmmmm ... which list?

These look like relevant threads:

http://archives.postgresql.org/pgsql-hackers/2000-01/msg00910.php
http://archives.postgresql.org/pgsql-hackers/2000-02/msg00215.php

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: cpu_tuple_cost

From: Daniel Schuchardt
Tom Lane wrote:

>
> Reducing random_page_cost is usually the best way to get the planner to
> favor indexscans more.
>

Ok, I tried a bit with random_page_cost, and I had to set it to 1 to get
PG to use the index on mitpln:

CIMSOFT=# ANALYSE mitpln;
ANALYZE
CIMSOFT=# SET random_page_cost=2;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE
date_to_yearmonth_dec(mpl_date)='20050';
                                                   QUERY PLAN

--------------------------------------------------------------------------------
  Seq Scan on mitpln  (cost=0.00..1173.78 rows=1431 width=69) (actual
time=219.000..1125.000 rows=1266 loops=1)
    Filter: ((date_to_yearmonth_dec((mpl_date)::timestamp without time
zone))::text = '20050'::text)
  Total runtime: 1125.000 ms
(3 rows)

CIMSOFT=# SET random_page_cost=1;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE
date_to_yearmonth_dec(mpl_date)='20050';
                                                               QUERY PLAN

--------------------------------------------------------------------------------
  Index Scan using mitpln_yearmonth_dec on mitpln  (cost=0.00..699.01
rows=1431 width=69) (actual time=0.000..16.000 rows=1266 loops=1)
    Index Cond: ((date_to_yearmonth_dec((mpl_date)::timestamp without
time zone))::text = '20050'::text)
  Total runtime: 16.000 ms
(3 rows)


CIMSOFT=# \d mitpln
                                        Table "public.mitpln"
     Column    |         Type          |                         Modifiers

--------------+-----------------------+-----------------------------------------
  mpl_id       | integer               | not null default
nextval('public.mitpln_mpl_id_seq'::text)
  mpl_date     | date                  |
  mpl_minr     | integer               | not null
  mpl_tpl_name | character varying(20) |
  mpl_feiertag | character varying(50) |
  mpl_min      | real                  |
  mpl_saldo    | real                  |
  mpl_buch     | boolean               | not null default false
  mpl_absaldo  | real                  |
  mpl_vhz      | real                  |
  dbrid        | character varying     | default nextval('db_id_seq'::text)
Indexes:
     "mitpln_pkey" PRIMARY KEY, btree (mpl_id)
     "mitpln_idindex" UNIQUE, btree (dbrid)
     "xtt5126" UNIQUE, btree (mpl_date, mpl_minr)
     "mitpln_yearmonth_dec" btree
(date_to_yearmonth_dec(mpl_date::timestamp with
out time zone))
     "mpl_minr" btree (mpl_minr)
     "mpl_minr_nobuch" btree (mpl_minr) WHERE NOT mpl_buch


CIMSOFT=# SELECT count(*) FROM mitpln;
  count
-------
  26330
(1 row)


CREATE OR REPLACE FUNCTION date_to_yearmonth_dec(TIMESTAMP) RETURNS VARCHAR AS '
BEGIN
  RETURN extract(year FROM $1) || extract(month FROM $1) - 1;
END' LANGUAGE plpgsql IMMUTABLE;
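
For the record, that definition explains the '20050' literal above: the
function concatenates the year with (month - 1), since - binds tighter
than ||. A couple of illustrative calls (derived from the definition,
not shown in the original session):

CIMSOFT=# SELECT date_to_yearmonth_dec('2005-01-15'::timestamp);  -- '20050'
CIMSOFT=# SELECT date_to_yearmonth_dec('2005-12-15'::timestamp);  -- '200511'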


Daniel

PS: that's a 2.4 GHz P4 server with 1 GB RAM and SCSI RAID
(Win2000, PG 8.0.1).