Thread: Query Sampling

Query Sampling

From: Varun Kacholia
Hi everybody, I would like to add query sampling support to PostgreSQL (at least as part of
my project, if someone feels strongly against checking it into the main branch).
I have been going over the code and I do see a lot of sampling stuff
in backend/commands/analyze.c. However, I plan to add sampling support
to the
executor, allowing the following types of queries:

SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE BERNOULLI(10) REPEATABLE(5) GROUP BY STORE

(This is supported by DB2).

For starters I think this should be doable in the executor by cannibalizing
nodeSeqscan.c and adding sampling support to it.
However I am concerned about the planner optimizations as it might decide
to run an index scan (instead of a sequential scan) for a particular
base relation.
My question is: is there an easy way of forcing the optimizer to choose a
sequential scan for a particular relation? (I apologize if this is documented
in the planner code, as I am still going over it.)
I would appreciate any other comments.

Thanks much,
Varun


Re: Query Sampling

From: Simon Riggs
On Sat, 2005-08-27 at 17:00 -0700, Varun Kacholia wrote:
> Hi everybody,
>   I would like to add query sampling support to PostgreSQL (at least as part of
>  my project, if someone feels strongly against checking it into the main branch).
> I have been going over the code and I do see a lot of sampling stuff
> in backend/commands/analyze.c. However, I plan to add sampling support
> to the
> executor, allowing the following types of queries:
> 
> SELECT STORE, AVG(SALES) FROM TRANSACTIONS TABLESAMPLE
>  BERNOULLI(10) REPEATABLE(5) GROUP BY STORE
> 
> (This is supported by DB2).

This is part of the ISO/ANSI SQL standard, so would no doubt be accepted
as a feature into PostgreSQL. 

I assume you realise that Bernoulli sampling is currently possible using
the random() function and setseed()?

SYSTEM sampling could be a big performance win for SELECTs that need fast
sampling against large tables, which is a common situation for data mining
applications.

> For starters I think this should be doable in the executor by cannibalizing
> nodeSeqscan.c and adding sampling support to it.
> However I am concerned about the planner optimizations as it might decide
> to run an index scan (instead of a sequential scan) for a particular
> base relation.
> My question is: is there an easy way of forcing the optimizer to choose a
> sequential scan for a particular relation? (I apologize if this is documented
> in the planner code, as I am still going over it.)
> I would appreciate any other comments.

I can't see why TABLESAMPLE should force a sequential scan *only*, in all
cases. I agree that there seems little point in sampling rows from a
table when it is already sufficiently restricted that the query could
use an index.

AFAICS this clause would potentially affect Index and Bitmap scans also,
and handling those would be required for full correctness to the standard.

In terms of the standard syntax, BERNOULLI sampling might be slightly
easier to implement within the row-oriented logic of the executor.
However, I'd suggest that SYSTEM might be a more useful starting place
since it could greatly improve performance for sampling operations.
SYSTEM sampling allows you to skip whole blocks altogether, or even
groups of blocks so that filesystem readahead could still give you
sequential I/O, even while sampling. (The standard doesn't say anything
about how you achieve this, hence "SYSTEM" - but it does relax the
requirement that the probability of a row's inclusion in the result set
is independent of other rows).
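
To make the block-skipping idea concrete, here is a rough standalone
sketch - not real heap access code; nblocks, RUN_LEN and read_block()
are all invented for illustration:

#include <stdio.h>
#include <stdlib.h>

#define RUN_LEN 16              /* consecutive blocks fetched together */

static void read_block(unsigned blkno)
{
    /* stand-in for fetching and scanning one heap block */
    printf("scanning block %u\n", blkno);
}

int main(void)
{
    unsigned nblocks = 1000;    /* pretend table size in blocks */
    double   pct = 10.0;        /* TABLESAMPLE SYSTEM (10) */

    srand(5);                   /* REPEATABLE(5): fix the seed */

    for (unsigned run = 0; run * RUN_LEN < nblocks; run++)
    {
        /* keep or skip this whole run of blocks with probability pct/100 */
        if ((double) rand() / ((double) RAND_MAX + 1.0) >= pct / 100.0)
            continue;

        for (unsigned i = 0; i < RUN_LEN && run * RUN_LEN + i < nblocks; i++)
            read_block(run * RUN_LEN + i);
    }
    return 0;
}

Reading a run of consecutive blocks whenever it is chosen is what keeps
the I/O readahead-friendly, while each run still gets an independent
decision.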

You would need to be careful to sample the whole table though, rather
than give in to the temptation to just scan the first X% of it. The start
of a table tends to be full of dead tuples; the sampling logic in 7.4 made
that mistake, so it would be wise to avoid repeating it.

I'd be willing to lend a hand over the coming year - since 8.1 just went
beta we can expect a good few months before the next code deadline.

Best Regards, Simon Riggs




Re: Query Sampling

From: Varun Kacholia
> I assume you realise that Bernoulli sampling is currently possible using
> the random() function and setseed()?
Yes, select * from table where random() < x does the job.
> I can't see why TABLESAMPLE should force a sequential scan *only*, in all
> cases. I agree that there seems little point in sampling rows from a
> table when it is already sufficiently restricted that the query could
> use an index.
> AFAICS this clause would potentially affect Index and Bitmap scans also,
> and handling those would be required for full correctness to the standard.

As I see it, there are 3 ways of implementing the sample operator:
1. modify node[Seq|Tid|Index|..]scan.c to consider sampling
2. create new nodes for each of the possible scans: sequential, index, TID, and so on
3. support sequential scan only for sampling.

(1) does not seem attractive, while (2) is a lot of work to begin with.
I was planning to start with (3) and move toward (2) in the long run.
I would appreciate your opinion on this.

> You would need to be careful to sample the whole table though, rather
> than give in to the temptation to just scan the first X% of it. The start
> of a table tends to be full of dead tuples; the sampling logic in 7.4 made
> that mistake, so it would be wise to avoid repeating it.
Correct. I have that in mind.
> I'd be willing to lend a hand over the coming year - since 8.1 just went
> beta we can expect a good few months before the next code deadline.
Great! I would appreciate an extra hand.

Thanks
Varun



Re: Query Sampling

From: Simon Riggs
On Mon, 2005-08-29 at 14:58 -0700, Varun Kacholia wrote:
> > I assume you realise that Bernoulli sampling is currently possible using
> > the random() function and setseed()?
> Yes, select * from table where random() < x does the job.
>  
> > I can't see why TABLESAMPLE should force a sequential scan *only*, in all
> > cases. I agree that there seems little point in sampling rows from a
> > table when it is already sufficiently restricted that the query could
> > use an index.
> > AFAICS this clause would potentially affect Index and Bitmap scans also,
> > and handling those would be required for full correctness to the standard.
> 
> As I see it, there are 3 ways of implementing the sample operator:
> 1. modify node[Seq|Tid|Index|..]scan.c to consider sampling
> 2. create new nodes for each of the possible scans: sequential, index, TID, and so on
> 3. support sequential scan only for sampling.
> 
> (1) does not seem attractive, while (2) is a lot of work to begin with.
> I was planning to start with (3) and move toward (2) in the long run.
> I would appreciate your opinion on this.

IMHO creating new nodes just for sampling would duplicate too much code.
To me, Bernoulli sampling sounds like 2-3 carefully placed statements in
the executor nodes and a couple of additions to the node data structures
(as well as some logic in the parser).
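
As a toy standalone sketch of where that statement would sit (tuples are
just integers here; none of this is actual node code, and the fraction
and seed simply echo the earlier BERNOULLI(10) REPEATABLE(5) example):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int    ntuples = 100;           /* pretend heap contents */
    double sample_fraction = 0.10;  /* BERNOULLI(10) */

    srand(5);                       /* REPEATABLE(5): fix the seed */

    for (int tup = 0; tup < ntuples; tup++)
    {
        /* the carefully placed statement: keep each row independently */
        if ((double) rand() / ((double) RAND_MAX + 1.0) >= sample_fraction)
            continue;

        printf("returning tuple %d\n", tup);
    }
    return 0;
}

Because the test only looks at the tuple stream, the same statement would
slot into a sequential, index or bitmap scan node without new node types.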

Sounds like you would be better off prototyping something for sequential
scans. If you can get Bernoulli working, you can move on to get SYSTEM
working - which needs a deeper reach into the guts of the block request
logic.

We might be able to get away with having SYSTEM sampling silently fall back
to BERNOULLI sampling when an index or bitmap scan is used. That would give
us standards-compliant behaviour without too much effort (since that effort
is essentially wasted anyway, as previously observed).
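
In code terms that policy is tiny - roughly this toy sketch, with all
the names invented:

#include <stdio.h>

typedef enum { SCAN_SEQ, SCAN_INDEX, SCAN_BITMAP } ScanType;
typedef enum { SAMPLE_BERNOULLI, SAMPLE_SYSTEM } SampleMethod;

static SampleMethod
effective_sample_method(SampleMethod requested, ScanType scan)
{
    /* block-level SYSTEM sampling only pays off on a plain heap scan */
    if (requested == SAMPLE_SYSTEM && scan != SCAN_SEQ)
        return SAMPLE_BERNOULLI;
    return requested;
}

int main(void)
{
    SampleMethod m = effective_sample_method(SAMPLE_SYSTEM, SCAN_INDEX);

    printf("index scan asked for SYSTEM, got %s\n",
           m == SAMPLE_BERNOULLI ? "BERNOULLI" : "SYSTEM");
    return 0;
}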

Best Regards, Simon Riggs