Thread: Query Sampling
Hi everybody, I would like to add query sampling support to postgresql (atleast as a part ofmy project, if someone feelsstrongly against checking it in the main branch). I have been going over the code and I do see a lot of sampling stuff in backend/commands/analyze.c. However, I plan to add sampling support to the executor, allowing the following types of queries: SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLEBERNOULLI(10) REPEATABLE(5) GROUP BY STORE (This is supported by DB2). For starters I think this should be doable in the executor by cannibalizing nodeSeqscan.c and adding sampling support to it. However I am concerned about the planner optimizations as it might decide to run an index scan (instead of a sequential scan) for a particular base relation. My question is: Is there any easy way of forcing the optimizer to choose sequential scan for a particular relation? (I apologize if this is documented in the planner code as I am still going over it). I would appreciate any other comments. Thanks much, Varun
On Sat, 2005-08-27 at 17:00 -0700, Varun Kacholia wrote: > Hi everybody, > I would like to add query sampling support to postgresql (atleast as a part of > my project, if someone feels strongly against checking it in the main branch). > I have been going over the code and I do see a lot of sampling stuff > in backend/commands/analyze.c. However, I plan to add sampling support > to the > executor, allowing the following types of queries: > > SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE > BERNOULLI(10) REPEATABLE(5) GROUP BY STORE > > (This is supported by DB2). This is part of the ISO/ANSI SQL standard, so would no doubt be accepted as a feature into PostgreSQL. I assume you realise that Bernoulli sampling is currently possibly using the random() function and setseed() ? SYSTEM sampling could be a good performance gain for SELECTs requiring fast sampling operations against large tables. That's a common situation for data mining applications. > For starters I think this should be doable in the executor by cannibalizing > nodeSeqscan.c and adding sampling support to it. > However I am concerned about the planner optimizations as it might decide > to run an index scan (instead of a sequential scan) for a particular > base relation. > My question is: Is there any easy way of forcing the optimizer to > choose sequential > scan for a particular relation? (I apologize if this is documented in > the planner code > as I am still going over it). > I would appreciate any other comments. I can't see why TABLESAMPLE effects a sequential scan *only*, in all cases. I agree that there seems little point in sampling rows from a table when it is already sufficiently restricted that the query could use an index. AFAICS this clause would potentially effect Index and Bitmap scans also, and would be required for full correctness to the standard. In terms of the standard syntax, BERNOULLI sampling might be slightly easier to implement within the row-oriented logic of the executor. However, I'd suggest that SYSTEM might be a more useful starting place since it could greatly improve performance for sampling operations. SYSTEM sampling allows you to skip whole blocks altogether, or even groups of blocks so that filesystem readahead could still give you sequential I/O, even while sampling. (The standard doesn't say anything about how you achieve this, hence "SYSTEM" - but it does relax the requirement that the probability of a row's inclusion in the result set is independent of other rows). You would need to be careful to sample the whole table though, rather than to follow the temptation to just scan the first X% of it. The start of a table has a tendency to be dead tuples, which was an error that the sampling logic in 7.4 made, so it would be wise to avoid repeating that. I'd be willing to lend a hand over the coming year - since 8.1 just went beta we can expect a good few months before the next code deadline. Best Regards, Simon Riggs
> I assume you realise that Bernoulli sampling is currently possibly using > the random() function and setseed() ? Yes, select * from table where random() < x, does the job. > I can't see why TABLESAMPLE effects a sequential scan *only*, in all > cases. I agree that there seems little point in sampling rows from a > table when it is already sufficiently restricted that the query could > use an index. > AFAICS this clause would potentially effect Index and Bitmap scans also, > and would be required for full correctness to the standard. As I see it, there are 3 ways of implementing the sample operator: 1. modify node[Seq|Tid|Index|..]scan.c to consider sampling 2. create new nodes for each of the possible scans..sequential, index, tid et al 3. support sequential scan only for sampling. (1) does not seem to be attractive, while (2) is a lot of work to begin with. I was planning to start with (3) and approach to (2) in the long run. I would appreciate your opinion on this. > You would need to be careful to sample the whole table though, rather > than to follow the temptation to just scan the first X% of it. The start > of a table has a tendency to be dead tuples, which was an error that the > sampling logic in 7.4 made, so it would be wise to avoid repeating that. Correct. I have that in mind. > I'd be willing to lend a hand over the coming year - since 8.1 just went > beta we can expect a good few months before the next code deadline. Great! I would appreciate an extra hand. Thanks Varun >
On Mon, 2005-08-29 at 14:58 -0700, Varun Kacholia wrote: > > I assume you realise that Bernoulli sampling is currently possibly using > > the random() function and setseed() ? > Yes, select * from table where random() < x, does the job. > > > I can't see why TABLESAMPLE effects a sequential scan *only*, in all > > cases. I agree that there seems little point in sampling rows from a > > table when it is already sufficiently restricted that the query could > > use an index. > > AFAICS this clause would potentially effect Index and Bitmap scans also, > > and would be required for full correctness to the standard. > > As I see it, there are 3 ways of implementing the sample operator: > 1. modify node[Seq|Tid|Index|..]scan.c to consider sampling > 2. create new nodes for each of the possible scans..sequential, index, tid et al > 3. support sequential scan only for sampling. > > (1) does not seem to be attractive, while (2) is a lot of work to > begin with. I was > planning to start with (3) and approach to (2) in the long run. > I would appreciate your opinion on this. IMHO creating new nodes just for sampling would duplicate too much code. To me, the Bernoulli sampling sounds like 2-3 carefully placed statements in the executor nodes and a couple of additions to the node data structures. (As well as logic in the parser). Sounds like you would be better off prototyping something for sequential scans. If you can get Bernoulli working, you can move on to get SYSTEM working - which needs a deeper reach into the guts of the block request logic. We might be able to get away with the thought that SYSTEM sampling will actually use BERNOULLI sampling when an index or bitmap scan is used. That would give us standards compliant behaviour without too much effort (since that effort is essentially wasted, as previously observed). Best Regards, Simon Riggs