Thread: cost_nonsequential_access()

cost_nonsequential_access()

From
Manfred Koizar
Date:
The comment describing cost_nonsequential_access() says that the two
functions "meet in the middle".  They meet at random_page_cost/2,
however, not in the middle between 1 and random_page_cost.  For
random_page_cost < 2 the result can be less than 1 for relpages near
effective_cache_size.  I don't think that this was intended.

This patch makes sure that cost_nonsequential_access() is always between
1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.

Servus
 Manfred
diff -Ncr ../base/src/backend/optimizer/path/costsize.c src/backend/optimizer/path/costsize.c
*** ../base/src/backend/optimizer/path/costsize.c    Tue Jun  1 05:02:52 2004
--- src/backend/optimizer/path/costsize.c    Tue Jun  8 08:34:27 2004
***************
*** 189,199 ****
   * for now by assuming we are given an effective_cache_size parameter.
   *
   * Given a guesstimated cache size, we estimate the actual I/O cost per page
!  * with the entirely ad-hoc equations:
   *    if relpages >= effective_cache_size:
!  *        random_page_cost * (1 - (effective_cache_size/relpages)/2)
   *    if relpages < effective_cache_size:
!  *        1 + (random_page_cost/2-1) * (relpages/effective_cache_size) ** 2
   * These give the right asymptotic behavior (=> 1.0 as relpages becomes
   * small, => random_page_cost as it becomes large) and meet in the middle
   * with the estimate that the cache is about 50% effective for a relation
--- 189,200 ----
   * for now by assuming we are given an effective_cache_size parameter.
   *
   * Given a guesstimated cache size, we estimate the actual I/O cost per page
!  * with the entirely ad-hoc equations (writing rpc for random_page_cost and
!  * relsize for relpages/effective_cache_size):
   *    if relpages >= effective_cache_size:
!  *        rpc - (rpc-1)/2 * 1/relsize
   *    if relpages < effective_cache_size:
!  *        1 + (rpc-1)/2 * relsize
   * These give the right asymptotic behavior (=> 1.0 as relpages becomes
   * small, => random_page_cost as it becomes large) and meet in the middle
   * with the estimate that the cache is about 50% effective for a relation
***************
*** 213,221 ****
      relsize = relpages / effective_cache_size;

      if (relsize >= 1.0)
!         return random_page_cost * (1.0 - 0.5 / relsize);
      else
!         return 1.0 + (random_page_cost * 0.5 - 1.0) * relsize * relsize;
  }

  /*
--- 214,222 ----
      relsize = relpages / effective_cache_size;

      if (relsize >= 1.0)
!         return random_page_cost - (random_page_cost - 1.0) / 2.0 / relsize;
      else
!         return 1.0 + (random_page_cost - 1.0) / 2.0 * relsize;
  }

  /*

Re: cost_nonsequential_access()

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> The comment describing cost_nonsequential_access() says that the two
> functions "meet in the middle".  They meet at random_page_cost/2,
> however, not in the middle between 1 and random_page_cost.  For
> random_page_cost < 2 the result can be less than 1 for relpages near
> effective_cache_size.  I don't think that this was intended.

You're right, I failed to consider that random_page_cost might be less
than 2.

> This patch makes sure that cost_nonsequential_access() is always between
> 1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.

This patch seems to do considerably more violence to the equations than
is needed to cover that oversight, though.  The old behavior was
intentionally nonlinear in relsize; this is not.

            regards, tom lane

Re: cost_nonsequential_access()

From
Manfred Koizar
Date:
On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>This patch seems to do considerably more violence to the equations than
>is needed to cover that oversight, though.  The old behavior was
>intentionally nonlinear in relsize; this is not.

The comment says "entirely ad-hoc" and I didn't see a particular reason
why the lower half should be nonlinear in relsize while the upper half
is linear in 1/relsize.  So I opted for the "more esthetic" symmetrical
function.  :-)

http://www.pivot.at/pg/costsize.jpg original
http://www.pivot.at/pg/costsize_2.jpg nonlinear/linear
http://www.pivot.at/pg/costsize_3a.jpg nonlinear
http://www.pivot.at/pg/costsize_3b.jpg linear

I don't have a strong opinion for either relsize or relsize^2.  So
please add " * relsize" or " / relsize" as appropriate before you apply
(if you intend to apply).

Servus
 Manfred

Re: cost_nonsequential_access()

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This patch seems to do considerably more violence to the equations than
>> is needed to cover that oversight, though.  The old behavior was
>> intentionally nonlinear in relsize; this is not.

> The comment says "entirely ad-hoc" and I didn't see a particular reason
> why the lower half should be nonlinear in relsize while the upper half
> is linear in 1/relsize.

Incremental changes in the relsize fraction are not going to change the
cost much except near 1, so I was after a curve that went like this
(pardon the crude artwork):


rpc                                            ***********************
                                        *******
                                    ****
                                  **
cost                             *
                                *
                               *
                             **
                         ****
                 ********
1.0 *************
    0                           1                                large
                          relsize fraction

I don't think replacing the lower half of this with a straight line
segment is an improvement.

Possibly the relsize axis ought to be measured on a log scale, or
something like that, but that didn't seem to work nicely when relsize
approaches zero.

            regards, tom lane

Re: cost_nonsequential_access()

From
Manfred Koizar
Date:
On Tue, 08 Jun 2004 13:13:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Possibly the relsize axis ought to be measured on a log scale, or
>something like that, but that didn't seem to work nicely when relsize
>approaches zero.

In my experiments I used log(relsize) on the x axis, and I don't think
that the graph looks unpleasant for small relsize.  My thought was (and
is) that we are much more interested in whether relpages is 1/100, 1/10,
1, 10, 100 times effective_cache_size than whether it is relpages +/-
1000, 2000, 3000, ...

I played around with some numbers that could be considered fairly
realistic.  You might want to look at the graphs I linked to in the
previous message or download http://www.pivot.at/pg/costsize.sxc.

But I think we are wasting too much effort.  The graphs don't look too
different, whether you use relsize or relsize^2.  Maybe relsize^3 is
optimal?  Nobody knows.  The important part of the patch is that the
result is scaled and shifted into the range 1 to random_page_cost.
Whatever you decide to do is ok with me.

Servus
 Manfred