Re: Erroneous cost estimation for nested loop join - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Erroneous cost estimation for nested loop join |
| Date | |
| Msg-id | CA+TgmoYkNQXRcRZekNK=qVQkgfp5zD+1_7vYa7fDASCx8hiqzA@mail.gmail.com |
| In response to | Re: Erroneous cost estimation for nested loop join (Jeff Janes <jeff.janes@gmail.com>) |
| Responses | Re: Erroneous cost estimation for nested loop join |
| List | pgsql-hackers |
On Mon, Nov 16, 2015 at 6:50 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Nov 9, 2015 at 6:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> kawamichi@tkl.iis.u-tokyo.ac.jp writes:
>>> - cost parameter calibration: random_page_cost = 92.89
>>
>> TBH, you lost me there already. I know of no hardware on which that would
>> be a sane depiction of reality, so I think you've probably overfitted the
>> model to some particular case it was already inaccurate on.
>
> I can easily get a ratio of random to sequential of 50, and my RAID is
> nothing special. I don't see why a high-end RAID couldn't justify 100
> or more, as long as they know the cache-hit rate is very low. (The default
> value of 4 seems to bake in the notion that 90% of even random I/O is
> going to be satisfied from the cache, which might be a good estimate
> if you have frequently used smallish lookup tables that always get
> joined to the RAM-busters, but some people aren't going to have that
> type of database query as their main load.)

I agree. What I've been thinking about is:

- If we're sequentially scanning a small table, let's say less than 1/4 of
  shared_buffers, which is the point where synchronized scans kick in, then
  assume the data is coming from shared_buffers.
- If we're scanning a medium-sized table, let's say less than
  effective_cache_size, then assume the data is coming from the OS cache.
  Maybe this is the same cost as the previous case, or maybe it's slightly
  more.
- Otherwise, assume that the first effective_cache_size pages are coming
  from cache and the rest have to be read from disk. This is perhaps
  unrealistic, but we don't want the cost curve to be discontinuous.

The effect of this would be that sequentially scanning a small table would
look cheaper per page than sequentially scanning a large table, which is a
good idea, because it's likely to be true. Similar adaptations could be made
for index scans, index-only scans, bitmap index scans, and bitmap heap
scans. The default value of random_page_cost would go up, but there would be
a new cached_page_cost GUC that would apply in some cases where
seq_page_cost and/or random_page_cost apply today.

Previous attempts to improve the modeling of cache effects have faltered
because we don't know what will be cached at execution time, and even if we
did, it can change very quickly, and we don't want plan instability. But it
seems to me that guessing based on the size of the relation is pretty
reasonable - essentially we'd be trying to refine a model which says that
every page is equally likely to be cached, and that shouldn't be too high a
bar to clear.

A problem with this sort of thing, of course, is that it's really hard to
test a proposed change broadly enough to be certain how it will play out in
the real world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
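[Editorial note] The three-tier idea sketched in the message above can be illustrated with a small standalone C program. This is not code from the thread or from PostgreSQL: cached_page_cost is only a proposed GUC, and the function name (seqscan_io_cost) and the settings values are invented here purely for illustration, assuming all sizes are expressed in pages.

```c
/*
 * Minimal sketch of the proposed size-based sequential-scan cost model.
 * All names and constants below are illustrative, not actual PostgreSQL
 * planner code; cached_page_cost is a hypothetical GUC.
 */
#include <stdio.h>

/* Illustrative planner settings, expressed in 8 kB pages. */
static const double shared_buffers = 16384.0;        /* 128 MB */
static const double effective_cache_size = 524288.0; /* 4 GB */

/* Hypothetical per-page costs. */
static const double cached_page_cost = 0.1;
static const double seq_page_cost = 1.0;

/*
 * Estimate the I/O cost of sequentially scanning rel_pages pages, guessing
 * cache residency purely from relation size, as the message suggests.
 */
static double
seqscan_io_cost(double rel_pages)
{
    if (rel_pages < shared_buffers / 4)
    {
        /* Small table: assume it is entirely in shared_buffers. */
        return rel_pages * cached_page_cost;
    }
    else if (rel_pages < effective_cache_size)
    {
        /*
         * Medium table: assume the OS cache holds it.  The message leaves
         * open whether this should cost slightly more than the
         * shared_buffers case; here we charge the same cached_page_cost.
         */
        return rel_pages * cached_page_cost;
    }
    else
    {
        /*
         * Large table: assume the first effective_cache_size pages are
         * cached and the remainder must be read from disk.
         */
        return effective_cache_size * cached_page_cost +
               (rel_pages - effective_cache_size) * seq_page_cost;
    }
}

int
main(void)
{
    double sizes[] = {1000.0, 100000.0, 1000000.0};

    for (int i = 0; i < 3; i++)
        printf("%.0f pages -> estimated I/O cost %.1f\n",
               sizes[i], seqscan_io_cost(sizes[i]));

    return 0;
}
```

Note that at rel_pages = effective_cache_size the third branch reduces to the second, which is the continuity of the cost curve the message asks for.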