Chapter 55. Writing A Table Sampling Method
Table of Contents
PostgreSQL's implementation of the
TABLESAMPLE clause supports custom table sampling methods, in addition to the
SYSTEM methods that are required by the SQL standard. The sampling method determines which rows of the table will be selected when the
TABLESAMPLE clause is used.
At the SQL level, a table sampling method is represented by a single SQL function, typically implemented in C, having the signature
method_name(internal) RETURNS tsm_handler
The name of the function is the same method name appearing in the
TABLESAMPLE clause. The
internal argument is a dummy (always having value zero) that simply serves to prevent this function from being called directly from a SQL command. The result of the function must be a palloc'd struct of type
TsmRoutine, which contains pointers to support functions for the sampling method. These support functions are plain C functions and are not visible or callable at the SQL level. The support functions are described in Section 55.1.
In addition to function pointers, the
TsmRoutine struct must provide these additional fields:
This is an OID list containing the data type OIDs of the parameter(s) that will be accepted by the
TABLESAMPLEclause when this sampling method is used. For example, for the built-in methods, this list contains a single item with value
FLOAT4OID, which represents the sampling percentage. Custom sampling methods can have more or different parameters.
true, the sampling method can deliver identical samples across successive queries, if the same parameters and
REPEATABLEseed value are supplied each time and the table contents have not changed. When this is
REPEATABLEclause is not accepted for use with the sampling method.
true, the sampling method can deliver identical samples across successive scans in the same query (assuming unchanging parameters, seed value, and snapshot). When this is
false, the planner will not select plans that would require scanning the sampled table more than once, since that might result in inconsistent query output.
TsmRoutine struct type is declared in
src/include/access/tsmapi.h, which see for additional details.
The table sampling methods included in the standard distribution are good references when trying to write your own. Look into the
src/backend/access/tablesample subdirectory of the source tree for the built-in sampling methods, and into the
contrib subdirectory for add-on methods.