Re: Gsoc2012 idea, tablesample - Mailing list pgsql-hackers
From | Florian Pflug |
---|---|
Subject | Re: Gsoc2012 idea, tablesample |
Date | |
Msg-id | 798891F2-2E56-41CB-89A3-5BF5F332F77C@phlo.org Whole thread Raw |
In response to | Re: Gsoc2012 idea, tablesample (Qi Huang <huangqiyx@hotmail.com>) |
Responses |
Re: Gsoc2012 idea, tablesample
|
List | pgsql-hackers |
On May10, 2012, at 10:43 , Qi Huang wrote: > 2. use TIDSCAN to directly access tuples. The below way of using ctid proposed by Kevin looks good. > > -One technique which might be suitably random without reading the > -whole table would be to figure out a maximum block number and tuple > -ID for the table, and generate a series of random ctid values to > -read. If the tuple doesn't exist or is not visible to the snapshot, > -you ignore it and continue, until you have read the requisite number > -of rows. You could try to generate them in advance and sort them by > -block number, but then you need to solve the problems of what to do > -if that set of ctids yields too many rows or too few rows, both of > -which have sticky issues. > I think this technique could be considered as an implementation algo for BERNOULLI method. It looks that it could stillreduce a lot of cost compared to just assign random number to every tuple and then retrieve. One problem I see with this approach is that its efficiency depends on the average tuple length, at least with a naive approachto random ctid generator. The simplest way to generate those randomly without introducing bias is to generate a randompage index between 0 and the relation's size in pages, and then generate random tuple index between 0 and MaxHeapTuplesPerPage,which is 291 on x86-64 assuming the standard page size of 8k. The current toasting threshold (TOAST_TUPLE_THRESHOLD) is approximately 2k, so having tables with an average heap tuple sizeof a few hundred bytes doesn't seem unlikely. Now, assume the average tuple length is 128 bytes, i.e. on average you'llhave ~ 8k/128 = 64 live tuples / page if the fill factor is 100% and all tuples are live. To account for lower fillfactors and dead tuples, let's thus say there are 50 live tuples / page. Then, on average, only every 6th randomly generatedctid will point to a live tuple. But whether or not it does can only be decided after reading the page from disk,so you end up with a rate of 6 random-access reads per returned tuple. IIRC, the cutoff point where an index scan loses compared to a sequential scan is somewhere around 10% of the table read,i.e. if a predicate selects more than 10% of the available rows, a sequential scan is more efficient than an index scan.Scaling that with the 1/6-th success rate from above means that Kevin's approach would only beat a sequential scan ifthe sampling percentage isn't much larger than 1%, assuming an average row size of 128 bytes. The algorithm still seems like a good choice for very small sampling percentages, though. best regards, Florian Pflug
pgsql-hackers by date: