On 11/12/14 00:24, Petr Jelinek wrote:
> Hello,
>
> Attached is a basic implementation of TABLESAMPLE clause. It's SQL
> standard clause and couple of people tried to submit it before so I
> think I don't need to explain in length what it does - basically returns
> "random" sample of a table using a specified sampling method.
>
> I implemented both SYSTEM and BERNOULLI sampling as specified by SQL
> standard. The SYSTEM sampling does block level sampling using same
> algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
>
> There is API for sampling methods which consists of 4 functions at the
> moment - init, end, nextblock and nexttuple. I added catalog which maps
> the sampling method to the functions implementing this API. The grammar
> creates new TableSampleRange struct that I added for sampling. Parser
> then uses the catalog to load information about the sampling method into
> TableSampleClause which is then attached to RTE. Planner checks for if
> this parameter is present in the RTE and if it finds it it will create
> plan with just one path - SampleScan. SampleScan implements standard
> executor API and calls the sampling method API as needed.
>
> It is possible to write custom sampling methods. The sampling method
> parameters are not limited to just percent number as in standard but
> dynamic list of expressions which is checked against the definition of
> the init function in a similar fashion (although much simplified) as
> function calls are.
>
> Notable lacking parts are:
> - proper costing and returned row count estimation - given the dynamic
> nature of parameters I think for we'll need to let the sampling method
> do this, so there will have to be fifth function in the API.
> - ruleutils support (it needs a bit of code in get_from_clause_item
> function)
> - docs are sparse at the moment
>
Forgot the obligatory:
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under grant
agreement n° 318633.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services