Re: SeqScan costs - Mailing list pgsql-hackers
| From | Simon Riggs |
|---|---|
| Subject | Re: SeqScan costs |
| Date | |
| Msg-id | 1218601837.5343.204.camel@ebony.2ndQuadrant |
| In response to | Re: SeqScan costs (Gregory Stark <stark@enterprisedb.com>) |
| List | pgsql-hackers |
On Tue, 2008-08-12 at 23:58 +0100, Gregory Stark wrote:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
> > On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
> >> Simon Riggs <simon@2ndquadrant.com> writes:
> >> > Proposal: Make the first block of a seq scan cost random_page_cost, then
> >> > after that every additional block costs seq_page_cost.
> >>
> >> This is only going to matter for a table of 1 block (or at least very
> >> few blocks), and for such a table it's highly likely that it's in RAM
> >> anyway. So I'm unconvinced that the proposed change represents a
> >> better model of reality.
>
> I think the first block of a sequential scan is clearly a random access.

Agreed

> If that doesn't represent reality well then perhaps we need to tackle
> both problems together.

It does represent reality.

> Somehow we need to discount scan i/o cost based on how much of the
> table we expect to be in cache. For 1-block tables if we should expect
> them to be in cache we should be zeroing out all the i/o cost whether
> random or sequential.

This isn't relevant.

> > The access cost should be the same for a 1 block table, whether it's on
> > disk or in memory.
>
> Uhm, huh? That can't be what you meant to write?

It can be. The *comparative* access cost (in the optimizer) between random
and sequential access should be exactly the same for a 1 block table,
whether the block is on disk or in memory.

In reality the block access time is identical for a 1 block table at each
level of the storage hierarchy:

* in shared_buffers (ReadBuffer)
* in filesystem cache (read() to cache)
* on disk (read() to disk)

> > AFAICS the cost cross-over is much higher than the actual elapsed time
> > cross-over for both narrow and wide tables.
> >
> > That's why using SET enable_seqscan=off helps performance in many cases,
> > or why people reduce random_page_cost to force index selection.
>
> People lower random_page_cost because we're not doing a good job
> estimating how much of a table is in cache. I think that would be a
> great target for some careful analysis. If you can come up with specific
> places and reasonable heuristics to discount i/o costs based on
> effective_cache_size and then demonstrate cases where it produces
> consistently better cost estimates that would be a huge help.
>
> I've been running benchmarks where I see accurate random_page_costs of
> 13-80 on uncached data on a moderate sized raid array. But of course
> when some of the data is cached the effective random_page_cost is much
> much lower than that.

Agreed, but that's a harder problem and nothing I wish to raise here.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
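To make the arithmetic being debated concrete, here is a minimal sketch
assuming the default planner settings (seq_page_cost = 1.0,
random_page_cost = 4.0). The function names and the program are
illustrative only; this is not the actual cost_seqscan() code in
src/backend/optimizer/path/costsize.c.

```c
#include <stdio.h>

/* Default planner cost parameters assumed for illustration. */
static const double seq_page_cost = 1.0;
static const double random_page_cost = 4.0;

/* Current model: every page of a seq scan costs seq_page_cost. */
static double
seqscan_io_cost_current(double pages)
{
    return seq_page_cost * pages;
}

/* Proposed model: the first page is charged as a random fetch,
 * every page after that as a sequential fetch. */
static double
seqscan_io_cost_proposed(double pages)
{
    if (pages <= 0)
        return 0.0;
    return random_page_cost + seq_page_cost * (pages - 1);
}

int
main(void)
{
    double sizes[] = {1, 2, 10, 100, 1000};

    for (int i = 0; i < 5; i++)
        printf("pages=%6.0f  current=%8.1f  proposed=%8.1f\n",
               sizes[i],
               seqscan_io_cost_current(sizes[i]),
               seqscan_io_cost_proposed(sizes[i]));
    return 0;
}
```

For a 1-page table the current model charges 1.0 for the seq scan but 4.0
for a single random (index) fetch, biasing the planner toward the seq scan
even though a single-block read costs the same either way; under the
proposed model both paths see 4.0, which is the comparative-cost point made
above. For large tables the constant offset is negligible, so the change
only matters for very small relations.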