Thread: cost_nonsequential_access()
The comment describing cost_nonsequential_access() says that the two functions "meet in the middle". They meet at random_page_cost/2, however, not in the middle between 1 and random_page_cost. For random_page_cost < 2 the result can be less than 1 for relpages near effective_cache_size. I don't think that this was intended. This patch makes sure that cost_nonsequential_access() is always between 1 and randon_page_cost and the functions meet a (1+random_page_cost)/2. Servus Manfred diff -Ncr ../base/src/backend/optimizer/path/costsize.c src/backend/optimizer/path/costsize.c *** ../base/src/backend/optimizer/path/costsize.c Tue Jun 1 05:02:52 2004 --- src/backend/optimizer/path/costsize.c Tue Jun 8 08:34:27 2004 *************** *** 189,199 **** * for now by assuming we are given an effective_cache_size parameter. * * Given a guesstimated cache size, we estimate the actual I/O cost per page ! * with the entirely ad-hoc equations: * if relpages >= effective_cache_size: ! * random_page_cost * (1 - (effective_cache_size/relpages)/2) * if relpages < effective_cache_size: ! * 1 + (random_page_cost/2-1) * (relpages/effective_cache_size) ** 2 * These give the right asymptotic behavior (=> 1.0 as relpages becomes * small, => random_page_cost as it becomes large) and meet in the middle * with the estimate that the cache is about 50% effective for a relation --- 189,200 ---- * for now by assuming we are given an effective_cache_size parameter. * * Given a guesstimated cache size, we estimate the actual I/O cost per page ! * with the entirely ad-hoc equations (writing rpc for random_page_cost and ! * relsize for relpages/effective_cache_size): * if relpages >= effective_cache_size: ! * rpc - (rpc-1)/2 * 1/relsize * if relpages < effective_cache_size: ! * 1 + (rpc-1)/2 * relsize * These give the right asymptotic behavior (=> 1.0 as relpages becomes * small, => random_page_cost as it becomes large) and meet in the middle * with the estimate that the cache is about 50% effective for a relation *************** *** 213,221 **** relsize = relpages / effective_cache_size; if (relsize >= 1.0) ! return random_page_cost * (1.0 - 0.5 / relsize); else ! return 1.0 + (random_page_cost * 0.5 - 1.0) * relsize * relsize; } /* --- 214,222 ---- relsize = relpages / effective_cache_size; if (relsize >= 1.0) ! return random_page_cost - (random_page_cost - 1.0) / 2.0 / relsize; else ! return 1.0 + (random_page_cost - 1.0) / 2.0 * relsize; } /*
Manfred Koizar <mkoi-pg@aon.at> writes: > The comment describing cost_nonsequential_access() says that the two > functions "meet in the middle". They meet at random_page_cost/2, > however, not in the middle between 1 and random_page_cost. For > random_page_cost < 2 the result can be less than 1 for relpages near > effective_cache_size. I don't think that this was intended. You're right, I failed to consider that random_page_cost might be less than 2. > This patch makes sure that cost_nonsequential_access() is always between > 1 and randon_page_cost and the functions meet a (1+random_page_cost)/2. This patch seems to do considerably more violence to the equations than is needed to cover that oversight, though. The old behavior was intentionally nonlinear in relsize; this is not. regards, tom lane
On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >This patch seems to do considerably more violence to the equations than >is needed to cover that oversight, though. The old behavior was >intentionally nonlinear in relsize; this is not. The comment says "entirely ad-hoc" and I didn't see a particular reason why the lower half should be nonlinear in relsize while the upper half is linear in 1/relsize. So I opted for the "more esthetic" symmetrical function. :-) http://www.pivot.at/pg/costsize.jpg original http://www.pivot.at/pg/costsize_2.jpg nonlinear/linear http://www.pivot.at/pg/costsize_3a.jpg nonlinear http://www.pivot.at/pg/costsize_3b.jpg linear I don't have a strong opinion for either relsize or relsize^2. So please add " * relsize" or " / relsize" as appropriate before you apply (if you intend to apply). Servus Manfred
Manfred Koizar <mkoi-pg@aon.at> writes: > On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This patch seems to do considerably more violence to the equations than >> is needed to cover that oversight, though. The old behavior was >> intentionally nonlinear in relsize; this is not. > The comment says "entirely ad-hoc" and I didn't see a particular reason > why the lower half should be nonlinear in relsize while the upper half > is linear in 1/relsize. Incremental changes in the relsize fraction are not going to change the cost much except near 1, so I was after a curve that went like this (pardon the crude artwork): rpc *********************** ******* **** ** cost * * * ** **** ******** 1.0 ************* 0 1 large relsize fraction I don't think replacing the lower half of this with a straight line segment is an improvement. Possibly the relsize axis ought to be measured on a log scale, or something like that, but that didn't seem to work nicely when relsize approaches zero. regards, tom lane
On Tue, 08 Jun 2004 13:13:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Possibly the relsize axis ought to be measured on a log scale, or >something like that, but that didn't seem to work nicely when relsize >approaches zero. In my experiments I used log(relsize) on the x axis, and I don't think that the graph looks unpleasant for small relsize. My thought was (and is) that we are much more interested in whether relpages is 1/100, 1/10, 1, 10, 100 times effective_cache_size than whether it is relpages +/- 1000, 2000, 3000, ... I played around with some numbers that could be considered fairly realistic. You might want to look at the graphs I linked to in the previous message or download http://www.pivot.at/pg/costsize.sxc. But I think we are wasting too much effort. The graphs don't look too different, whether you use relsize or relsize^2. Maybe relsize^3 is optimal? Nobody knows. The important part of the patch is that the result is scaled and shifted into the range 1 to random_page_cost. Whatever you decide to do is ok with me. Servus Manfred