Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers
From | Kevin Grittner
---|---
Subject | Re: Gsoc2012 idea, tablesample
Date |
Msg-id | 4FAB8A310200002500047ABA@gw.wicourts.gov
In response to | Re: Gsoc2012 idea, tablesample (Florian Pflug &lt;fgp@phlo.org&gt;)
Responses | Re: Gsoc2012 idea, tablesample
List | pgsql-hackers
Florian Pflug <fgp@phlo.org> wrote:

> One problem I see with this approach is that its efficiency
> depends on the average tuple length, at least with a naive
> approach to random ctid generator. The simplest way to generate
> those randomly without introducing bias is to generate a random
> page index between 0 and the relation's size in pages,

Right.

> and then generate random tuple index between 0 and
> MaxHeapTuplesPerPage, which is 291 on x86-64 assuming the standard
> page size of 8k.

I think we can do better than that without moving too far from "the
simplest way". It seems like we should be able to get a more
accurate calculation of a minimum base tuple size based on the table
definition, and calculate the maximum number of those which could
fit on a page. On the other hand, ctid uses a line pointer, doesn't
it? Do we need to worry about dead line pointers allowing higher
tuple indexes than the calculated maximum number of tuples per page?

> The current toasting threshold (TOAST_TUPLE_THRESHOLD) is
> approximately 2k, so having tables with an average heap tuple size
> of a few hundred bytes doesn't seem unlikely. Now, assume the
> average tuple length is 128 bytes, i.e. on average you'll have ~
> 8k/128 = 64 live tuples / page if the fill factor is 100% and all
> tuples are live. To account for lower fill factors and dead
> tuples, let's thus say there are 50 live tuples / page. Then, on
> average, only every 6th randomly generated ctid will point to a
> live tuple. But whether or not it does can only be decided after
> reading the page from disk, so you end up with a rate of 6
> random-access reads per returned tuple.
>
> IIRC, the cutoff point where an index scan loses compared to a
> sequential scan is somewhere around 10% of the table read, i.e. if
> a predicate selects more than 10% of the available rows, a
> sequential scan is more efficient than an index scan.

That ratio *might* be better for a ctid scan, since you don't have
the index access in the mix.

> Scaling that with the 1/6-th success rate from above means that
> Kevin's approach would only beat a sequential scan if the sampling
> percentage isn't much larger than 1%, assuming an average row size
> of 128 bytes.
>
> The algorithm still seems like a good choice for very small
> sampling percentages, though.

Yeah, even with a maximum tuple count calculated by page, there are
certainly going to be cases where another approach will be faster,
especially where the sample is a relatively high percentage of the
table. It would be good to have multiple plans compete on costs, if
possible. It would not surprise me at all if the typical break-even
point between the two techniques was somewhere on the order of a 1%
sample size, but one would hope we could get there on the basis of
estimated costs rather than using arbitrary rules or forcing the
user to guess and code a choice explicitly.

-Kevin
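A minimal standalone sketch of the naive sampling loop discussed above may make the arithmetic concrete. This is not code from the thread: the relation size and live-tuple count are made-up illustrative numbers, and the "is this slot live" check is stubbed out with the assumed average, since answering that question for real is exactly the part that costs a random page read.

```c
/* Sketch only: naive random-ctid sampling as discussed above, with the
 * visibility check replaced by an assumed average of live tuples per page. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_HEAP_TUPLES_PER_PAGE 291    /* figure quoted above for 8k pages on x86-64 */

typedef struct
{
    unsigned    block;                  /* page index within the relation */
    unsigned    offset;                 /* line pointer slot, 1-based */
} SampleCtid;

/* Stub: a real implementation must read the page to decide this. */
static int
slot_is_live(const SampleCtid *ctid, int live_tuples_per_page)
{
    return ctid->offset <= (unsigned) live_tuples_per_page;
}

int
main(void)
{
    const unsigned rel_pages = 10000;   /* assumed relation size in pages */
    const int   live_per_page = 50;     /* assumed live tuples per page, as above */
    const long  wanted = 100000;        /* number of sample tuples to draw */
    long        probes = 0;
    long        got = 0;

    srand(2012);
    while (got < wanted)
    {
        SampleCtid  ctid;

        ctid.block = (unsigned) (rand() % rel_pages);
        ctid.offset = (unsigned) (rand() % MAX_HEAP_TUPLES_PER_PAGE) + 1;
        probes++;                       /* each probe implies a random page read */

        if (slot_is_live(&ctid, live_per_page))
            got++;                      /* this ctid would be returned */
    }

    printf("probes per returned tuple: %.2f (expected ~%.2f)\n",
           (double) probes / wanted,
           (double) MAX_HEAP_TUPLES_PER_PAGE / live_per_page);
    return 0;
}
```

With 50 live tuples out of 291 possible line pointer slots, this prints roughly 5.8 probes per returned tuple, which is where the "every 6th randomly generated ctid" and "6 random-access reads per returned tuple" figures above come from; tightening the per-page slot limit from a table-specific minimum tuple size, as suggested, shrinks that ratio directly.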