Chapter 54. Writing a Table Sampling Method
Postgres Pro's implementation of the TABLESAMPLE clause supports custom table sampling methods, in addition to the BERNOULLI and SYSTEM methods that are required by the SQL standard. The sampling method determines which rows of the table will be selected when the TABLESAMPLE clause is used.
At the SQL level, a table sampling method is represented by a single SQL function, typically implemented in C, having the signature
method_name(internal) RETURNS tsm_handler
The name of the function is the same method name appearing in the TABLESAMPLE clause. The internal argument is a dummy (always having value zero) that simply serves to prevent this function from being called directly from a SQL command. The result of the function must be a palloc'd struct of type TsmRoutine, which contains pointers to support functions for the sampling method. These support functions are plain C functions and are not visible or callable at the SQL level. The support functions are described in Section 54.1.
In addition to function pointers, the TsmRoutine struct must provide the following fields:
List *parameterTypes
This is an OID list containing the data type OIDs of the parameter(s) that will be accepted by the TABLESAMPLE clause when this sampling method is used. For example, for the built-in methods, this list contains a single item with value FLOAT4OID, which represents the sampling percentage. Custom sampling methods can have more or different parameters.
bool repeatable_across_queries
If true, the sampling method can deliver identical samples across successive queries, if the same parameters and REPEATABLE seed value are supplied each time and the table contents have not changed. When this is false, the REPEATABLE clause is not accepted for use with the sampling method.
bool repeatable_across_scans
If true, the sampling method can deliver identical samples across successive scans in the same query (assuming unchanging parameters, seed value, and snapshot). When this is false, the planner will not select plans that would require scanning the sampled table more than once, since that might result in inconsistent query output.
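The two repeatability flags are easier to reason about with a concrete model. The following standalone C sketch is not server code and does not use any Postgres Pro headers; every name in it (demo_hash, demo_row_in_sample, demo_sample_rows) is hypothetical, and the hash is an arbitrary choice made for illustration. It assumes a method that decides each row's membership purely from the seed, the row's position, and the percentage. Under that assumption, repeating the scan with the same inputs over unchanged data yields the same sample, which is the property both flags describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: mix a seed and a row number into a 32-bit value.
 * The constants are borrowed from the SplitMix64 mixing function and are
 * used here only for illustration; any well-distributed hash would do.
 */
static uint32_t
demo_hash(uint32_t seed, uint32_t row)
{
    uint64_t x = ((uint64_t) seed << 32) | row;

    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return (uint32_t) x;
}

/*
 * Decide whether one row belongs to the sample.  The answer depends only
 * on (seed, row, percent), never on scan state, so a rescan of the same
 * data gives the same answer for every row.
 */
static int
demo_row_in_sample(uint32_t seed, uint32_t row, double percent)
{
    uint64_t cutoff;

    assert(percent >= 0.0 && percent <= 100.0);

    /* Scale the percentage onto the 32-bit hash range [0, 2^32]. */
    cutoff = (uint64_t) (percent / 100.0 * 4294967296.0);
    return (uint64_t) demo_hash(seed, row) < cutoff;
}

/*
 * "Scan" rows 0..nrows-1 and store the selected row numbers in out, which
 * must have room for nrows entries.  Returns the number of rows selected.
 */
static size_t
demo_sample_rows(uint32_t seed, double percent, uint32_t nrows, uint32_t *out)
{
    size_t n = 0;

    for (uint32_t row = 0; row < nrows; row++)
    {
        if (demo_row_in_sample(seed, row, percent))
            out[n++] = row;
    }
    return n;
}
```

Calling demo_sample_rows twice with the same seed, percentage, and row count fills both output arrays identically. A method that instead drew from a shared random-number stream, or stopped after a fixed number of rows in an order that can vary, would not have this property and would have to report false for the corresponding flag.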
The table sampling methods included in the standard distribution are good references when trying to write your own. Look into the contrib subdirectory for add-on methods.