From 8f5117806ab74241821ad33e2e8df9f90f2f6ffc Mon Sep 17 00:00:00 2001
From: Kommi
Date: Mon, 11 Mar 2019 15:44:44 +1100
Subject: [PATCH 5/5] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 579 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 574 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 8d9edff622..85a94aefca 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,14 +18,583 @@
 All Tables in PostgreSQL are the primary data store. Each table is stored as its own physical relation
- and so is described by an entry in the pg_class
- catalog. The table contents are entirely under the control of its
- access method. (All the access methods furthermore use the standard page
- layout described in .)
+ and is described by an entry in the pg_class
+ catalog. A table's content is entirely controlled by its access method, although
+ all access methods use the same standard page layout described in .
+
+ Table access method API
+
+ Each table access method is described by a row in the
+ pg_am system
+ catalog. The pg_am entry specifies the type
+ of the access method and a handler function for the
+ access method. These entries can be created and deleted using the
+ and SQL commands.
+
+ A table access method handler function must be declared to accept a
+ single argument of type internal and to return the
+ pseudo-type table_am_handler. The argument is a dummy value that
+ simply serves to prevent handler functions from being called directly from
+ SQL commands. The result of the function must be a pointer to a struct of
+ type TableAmRoutine, which contains everything
+ that the core code needs to know to make use of the table access method.
+ The struct must remain valid for the lifetime of the server process, which is
+ typically achieved by defining it as a static const variable.
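The handler contract can be illustrated with a small standalone model. The sketch below does not use the real PostgreSQL headers: ToyTableAmRoutine, ToyRelation, ToyScanDesc and toy_am_handler are hypothetical stand-ins for TableAmRoutine, Relation, TableScanDesc and a real handler (which would be declared with PG_FUNCTION_ARGS and return its struct with PG_RETURN_POINTER). It only shows the shape of the pattern: a table of callbacks defined once with static storage, and a handler that returns a pointer to it so the pointer stays valid for the life of the process.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for PostgreSQL types. */
typedef struct ToyRelation
{
    const char *name;
} ToyRelation;

typedef struct ToyScanDesc
{
    const ToyRelation *rel;
    int         position;
} ToyScanDesc;

/* Hypothetical "API struct": a table of callbacks, in the role of TableAmRoutine. */
typedef struct ToyTableAmRoutine
{
    const char *am_name;
    void        (*scan_begin) (ToyScanDesc *scan, const ToyRelation *rel);
    void        (*scan_end) (ToyScanDesc *scan);
} ToyTableAmRoutine;

static void
toy_scan_begin(ToyScanDesc *scan, const ToyRelation *rel)
{
    scan->rel = rel;
    scan->position = 0;
}

static void
toy_scan_end(ToyScanDesc *scan)
{
    scan->rel = NULL;
}

/* Defined once with static storage, so the returned pointer never dangles. */
static const ToyTableAmRoutine toy_methods = {
    .am_name = "toy",
    .scan_begin = toy_scan_begin,
    .scan_end = toy_scan_end,
};

/* Hypothetical handler: returns the same callback table on every call. */
const ToyTableAmRoutine *
toy_am_handler(void)
{
    return &toy_methods;
}
```

Because the struct is static const, every call to the handler yields the same pointer, and the callbacks are reached only through that struct rather than from SQL.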
+
+ The TableAmRoutine struct, also called the access
+ method's API struct, includes fields specifying assorted
+ fixed properties of the access method, such as whether it can support
+ bitmap scans. More importantly, it contains pointers to support
+ functions for the access method, which do all of the real work to access
+ tables. These support functions are plain C functions and are not
+ visible or callable at the SQL level. The TableAmRoutine
+ structure and its support functions are declared, with further details, in
+ src/include/access/tableam.h.
+
+ Developers of a new table access method can refer to the existing heap
+ implementation in src/backend/access/heap/heapam_handler.c
+ for details of how the heap access method implements these functions.
+
+ The support functions fall into several groups, described below.
+
+ Slot implementation functions
+
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+
+ This function must return the slot implementation that the access method
+ uses for its tuples. The predefined slot implementations are
+ TTSOpsVirtual, TTSOpsHeapTuple,
+ TTSOpsMinimalTuple and TTSOpsBufferHeapTuple;
+ an access method can use any one of them. For details of these slot
+ implementations, refer to src/include/executor/tuptable.h.
+
+ Table scan functions
+
+ The following functions are used to scan a table.
+
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc pscan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+
+ This function starts a scan of the relation rel and returns a
+ TableScanDesc, which is typically embedded in a larger,
+ AM-specific struct. nkeys is the number of scan keys in key;
+ when it is non-zero, the returned tuples must be filtered using those keys.
+ pscan is non-NULL when the scan is part of a parallel scan; an AM
+ that supports parallel scans uses it to coordinate with the other participants.
+ allow_strat, allow_sync and allow_pagemode
+ specify whether the scan may use a buffer access strategy, synchronized scans
+ and page-at-a-time mode, respectively (not every AM is required to support these).
+ is_bitmapscan and is_samplescan indicate whether
+ the scan is being started for a bitmap scan or a sample scan.
+ temp_snap indicates that the provided snapshot is temporary and must
+ be unregistered when the scan ends.
+
+void (*scan_end) (TableScanDesc scan);
+
+ This function ends a scan started by scan_begin and releases its
+ resources. If TableScanDesc.temp_snap is set, the snapshot in
+ TableScanDesc.rs_snapshot must be unregistered.
+
+void (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                     bool allow_strat, bool allow_sync, bool allow_pagemode);
+
+ This function restarts a scan previously started by scan_begin.
+ If set_params is true, the allow_strat, allow_sync and
+ allow_pagemode values passed here replace the scan's current settings.
+
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+
+ This function fetches the next tuple, in the given direction, that satisfies
+ the scan started by scan_begin and stores it in slot.
+
+ Parallel table scan functions
+
+ The following functions are used to perform parallel table scans.
+
+Size (*parallelscan_estimate) (Relation rel);
+
+ This function returns the amount of shared memory the AM needs to perform a
+ parallel scan of rel. The size must include the
+ ParallelTableScanDesc, which is typically embedded in an AM-specific struct.
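The convention of embedding the generic descriptor as the first member of a larger, AM-specific struct, and of counting that larger struct in parallelscan_estimate, can be sketched with a standalone toy. All names below (ToyScanDescData, ToyArrayScanData, ToyParallelScanDescData, ToyArrayParallelScanData, toy_scan_begin, toy_scan_getnext, toy_scan_end, toy_parallelscan_estimate) are hypothetical. The toy scans an in-memory array of integers, reduces ScanKeyData to a single "value >= min_value" condition and takes no snapshot; it is not heap AM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types, not PostgreSQL definitions. */
typedef struct ToyRelation
{
    const int  *rows;
    size_t      nrows;
} ToyRelation;

/* Generic scan descriptor, in the role of TableScanDescData. */
typedef struct ToyScanDescData
{
    const ToyRelation *rs_rd;
    int         rs_nkeys;
} ToyScanDescData;
typedef ToyScanDescData *ToyScanDesc;

/* AM-specific scan state: the generic descriptor must be the first member. */
typedef struct ToyArrayScanData
{
    ToyScanDescData base;
    size_t      next;           /* index of the next row to examine */
    int         min_value;      /* toy "scan key": keep rows >= min_value */
} ToyArrayScanData;

/* Shared parallel state: generic part first, AM-specific fields after it. */
typedef struct ToyParallelScanDescData
{
    unsigned    relid;
} ToyParallelScanDescData;

typedef struct ToyArrayParallelScanData
{
    ToyParallelScanDescData base;
    size_t      next_row;       /* next row to hand out to a worker */
} ToyArrayParallelScanData;

ToyScanDesc
toy_scan_begin(const ToyRelation *rel, int nkeys, int min_value)
{
    ToyArrayScanData *scan = malloc(sizeof(ToyArrayScanData));

    if (scan == NULL)
        return NULL;
    scan->base.rs_rd = rel;
    scan->base.rs_nkeys = nkeys;
    scan->next = 0;
    scan->min_value = min_value;
    return &scan->base;         /* callers only see the generic part */
}

/* Returns true and sets *out while matching rows remain. */
bool
toy_scan_getnext(ToyScanDesc sscan, int *out)
{
    ToyArrayScanData *scan = (ToyArrayScanData *) sscan;
    const ToyRelation *rel = sscan->rs_rd;

    while (scan->next < rel->nrows)
    {
        int         value = rel->rows[scan->next++];

        /* apply the toy key only when the scan was started with keys */
        if (sscan->rs_nkeys == 0 || value >= scan->min_value)
        {
            *out = value;
            return true;
        }
    }
    return false;
}

void
toy_scan_end(ToyScanDesc sscan)
{
    free((ToyArrayScanData *) sscan);
}

/* The estimate covers the whole AM-specific struct, not just the base. */
size_t
toy_parallelscan_estimate(void)
{
    return sizeof(ToyArrayParallelScanData);
}
```

Because base is the first member, a pointer to it can be converted back to the AM-specific struct, which is what toy_scan_getnext and toy_scan_end rely on.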
+
+Size (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
+
+ This function initializes pscan, the shared state needed for the
+ parallel scan to be performed by the AM, and returns the size that
+ parallelscan_estimate estimated.
+
+void (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
+
+ This function reinitializes the parallel scan state in pscan so that
+ the same relation can be scanned again.
+
+ Index scan functions
+
+struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+
+ This function prepares to fetch tuples from the relation, as needed when
+ fetching tuples for an index scan. It must return an allocated and initialized
+ IndexFetchTableData struct, which is typically embedded in an
+ AM-specific struct.
+
+void (*index_fetch_reset) (struct IndexFetchTableData *data);
+
+ This function resets an index fetch; typically it releases the AM-specific
+ resources held by the IndexFetchTableData of an index scan.
+
+void (*index_fetch_end) (struct IndexFetchTableData *data);
+
+ This function releases the AM-specific resources held by the
+ IndexFetchTableData and frees the IndexFetchTableData itself.
+
+bool (*index_fetch_tuple) (struct IndexFetchTableData *scan,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           bool *call_again, bool *all_dead);
+
+ This function fetches the tuple identified by tid, checks its
+ visibility according to snapshot and, if it is visible, stores it in
+ slot. It returns true if a visible tuple was found and false otherwise.
+ *call_again is false when the function is called for the first time
+ with a given tid; if more tuples may match the same tid (for example,
+ other versions of the row reachable from that TID), the function sets
+ *call_again to true to tell the caller to call it again. If
+ all_dead is not NULL, *all_dead is set to true only when it is
+ certain that no transaction needs to see the tuple any more, so that the index
+ AM can avoid returning that TID in future searches.
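The call_again and all_dead protocol of index_fetch_tuple can be modeled with a toy version chain. Everything here is hypothetical and simplified: ToyVersion, ToyChain, ToySnapshot, ToyIndexFetch, toy_index_fetch_tuple and toy_count_matches are made-up names, visibility is reduced to comparing toy transaction numbers (all transactions are assumed committed and XIDs never wrap), an "ANY" snapshot stands in for non-MVCC snapshots that may see several versions, and oldest_xmin decides when a version is dead to everyone. It is not how the heap AM checks visibility.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the versions reachable from one TID. */
typedef struct ToyVersion
{
    unsigned    xmin;           /* inserting transaction */
    unsigned    xmax;           /* deleting/updating transaction, 0 if none */
    int         value;
} ToyVersion;

typedef struct ToyChain
{
    const ToyVersion *versions;
    size_t      nversions;
} ToyChain;

/* MVCC snapshots see at most one version; "ANY" sees every version. */
typedef enum { TOY_SNAPSHOT_MVCC, TOY_SNAPSHOT_ANY } ToySnapshotKind;
typedef struct ToySnapshot
{
    ToySnapshotKind kind;
    unsigned    xid;
} ToySnapshot;

/* AM-specific fetch state, in the role of IndexFetchTableData. */
typedef struct ToyIndexFetch
{
    size_t      next;
} ToyIndexFetch;

static bool
toy_visible(const ToyVersion *v, const ToySnapshot *snap)
{
    if (snap->kind == TOY_SNAPSHOT_ANY)
        return true;
    return v->xmin <= snap->xid && (v->xmax == 0 || v->xmax > snap->xid);
}

bool
toy_index_fetch_tuple(ToyIndexFetch *fetch, const ToyChain *chain,
                      const ToySnapshot *snap, unsigned oldest_xmin,
                      int *value, bool *call_again, bool *all_dead)
{
    if (!*call_again)
        fetch->next = 0;        /* first call for this TID: start at the head */

    if (all_dead != NULL)
    {
        /* dead to everyone only if every version was deleted before oldest_xmin */
        *all_dead = true;
        for (size_t i = 0; i < chain->nversions; i++)
            if (chain->versions[i].xmax == 0 || chain->versions[i].xmax >= oldest_xmin)
                *all_dead = false;
    }

    for (size_t i = fetch->next; i < chain->nversions; i++)
    {
        if (toy_visible(&chain->versions[i], snap))
        {
            *value = chain->versions[i].value;
            fetch->next = i + 1;
            /* only a non-MVCC snapshot can match further versions */
            *call_again = snap->kind != TOY_SNAPSHOT_MVCC && fetch->next < chain->nversions;
            return true;
        }
    }
    *call_again = false;
    return false;
}

/* Caller side: keep calling while the toy AM asks to be called again. */
int
toy_count_matches(const ToyChain *chain, const ToySnapshot *snap)
{
    ToyIndexFetch fetch;
    bool        call_again = false;
    int         value,
                count = 0;

    do
    {
        if (toy_index_fetch_tuple(&fetch, chain, snap, 0, &value, &call_again, NULL))
            count++;
    } while (call_again);
    return count;
}
```

The do/while loop in toy_count_matches is the caller's half of the contract: it starts with call_again set to false and repeats only while the fetch function asks for another call.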
+
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+
+ This function computes the newest transaction ID among the nitems
+ tuples identified by items. The result is used to determine which
+ snapshots conflict when replaying WAL records for page-level index vacuums.
+
+ Non-modifying tuple functions
+
+bool (*tuple_satisfies_snapshot) (Relation rel,
+                                  TupleTableSlot *slot,
+                                  Snapshot snapshot);
+
+ This function checks whether the tuple stored in slot is visible
+ according to snapshot, returning true if it is visible and false otherwise.
+
+bool (*tuple_fetch_row_version) (Relation rel,
+                                 ItemPointer tid,
+                                 Snapshot snapshot,
+                                 TupleTableSlot *slot,
+                                 Relation stats_relation);
+
+ This function fetches the tuple version identified by tid, checks its
+ visibility according to snapshot and, if it is visible, stores it in
+ slot. It returns true if the tuple was found and is visible, false
+ otherwise. It does not follow the update chain; tuple_get_latest_tid
+ is used to find the newest version of a row.
+
+void (*tuple_get_latest_tid) (Relation rel,
+                              Snapshot snapshot,
+                              ItemPointer tid);
+
+ This function finds the latest version of the tuple identified by tid
+ that is visible according to snapshot, and updates tid to point to it.
+ For example, the heap AM creates an update chain whenever a tuple is
+ updated; this function follows that chain to find the latest version.
+
+bool (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                            ItemPointer tid,
+                            Snapshot snapshot,
+                            TupleTableSlot *slot,
+                            bool *call_again, bool *all_dead);
+
+ This function fetches the tuple identified by tid using the index fetch
+ state in scan, stores it in slot and updates *call_again and
+ *all_dead. It is called during index scans.
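Following an update chain, as tuple_get_latest_tid does, can be modeled with an array of toy tuple versions linked by a ctid field that points to the next version (or to itself for the newest one). ToyTuple and toy_get_latest_tid are hypothetical, and visibility is reduced to "created by a transaction numbered at or below snapshot_xid". The sketch walks the whole chain and leaves tid at the newest visible version, or unchanged if no version is visible.

```c
#include <stddef.h>

/* Hypothetical toy tuple version, addressed by its array index. */
typedef struct ToyTuple
{
    size_t      ctid;           /* index of the next version; own index if newest */
    unsigned    xmin;           /* toy transaction that created this version */
} ToyTuple;

/*
 * Hypothetical toy version of tuple_get_latest_tid: walk the update chain
 * starting at *tid and leave *tid at the newest version visible to the
 * snapshot. The step counter guards against malformed (cyclic) links.
 */
void
toy_get_latest_tid(const ToyTuple *tuples, size_t ntuples,
                   unsigned snapshot_xid, size_t *tid)
{
    size_t      cur = *tid;

    for (size_t steps = 0; cur < ntuples && steps < ntuples; steps++)
    {
        if (tuples[cur].xmin <= snapshot_xid)
            *tid = cur;         /* newest visible version seen so far */
        if (tuples[cur].ctid == cur)
            break;              /* reached the end of the chain */
        cur = tuples[cur].ctid;
    }
}
```

A chain 0 → 2 → 4 read with a snapshot that cannot yet see version 4 leaves tid at 2; a later snapshot moves it to 4.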
+
+ Physical tuple manipulation functions
+
+void (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                      int options, struct BulkInsertStateData *bistate);
+
+ This function inserts the tuple contained in slot into the relation and
+ stores the new tuple's identifier (its ItemPointerData) in the slot.
+ If bistate is provided, the BulkInsertStateData is used for the insertion.
+
+void (*tuple_insert_speculative) (Relation rel,
+                                  TupleTableSlot *slot,
+                                  CommandId cid,
+                                  int options,
+                                  struct BulkInsertStateData *bistate,
+                                  uint32 specToken);
+
+ This function is similar to tuple_insert, but inserts the tuple with the
+ additional information needed for speculative insertion, identified by
+ specToken; the insertion is confirmed or aborted later, depending on whether
+ the corresponding index insertions succeed.
+
+void (*tuple_complete_speculative) (Relation rel,
+                                    TupleTableSlot *slot,
+                                    uint32 specToken,
+                                    bool succeeded);
+
+ This function completes a speculative insertion started by
+ tuple_insert_speculative. It is called after the index insertions have
+ finished; succeeded indicates whether the speculative insertion should be
+ confirmed (true) or aborted (false).
+
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+
+ This function deletes the tuple identified by tid and returns the
+ result of the operation. If the deletion fails, details of the failure
+ are stored in hufd.
+
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+
+ This function replaces the tuple identified by otid with the new tuple
+ contained in slot and returns the result of the operation. It also sets
+ *update_indexes to indicate whether new index entries must be inserted
+ for the new tuple. If the update fails, details of the failure are stored
+ in hufd.
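The interplay of the result code, the failure data and *update_indexes in tuple_update can be modeled with a toy table of row versions. ToyUpdateResult, ToyFailureData, ToyRow, ToyTable and toy_tuple_update are hypothetical: the result codes only loosely mirror HTSU_Result (TOY_UPDATE_TABLE_FULL exists only because the toy has a fixed capacity), there is no locking or waiting, and the rule that new index entries are needed only when the indexed column changes is a simplification chosen for illustration, loosely in the spirit of the heap AM's heap-only tuples.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy result codes, loosely in the role of HTSU_Result. */
typedef enum
{
    TOY_UPDATE_OK,
    TOY_UPDATE_CONCURRENTLY_UPDATED,
    TOY_UPDATE_TABLE_FULL       /* toy-only: fixed-capacity table */
} ToyUpdateResult;

/* Hypothetical toy failure details, in the role of HeapUpdateFailureData. */
typedef struct ToyFailureData
{
    size_t      ctid;           /* where the newer version lives */
    unsigned    xmax;           /* transaction that superseded the row */
} ToyFailureData;

typedef struct ToyRow
{
    int         key;            /* indexed column */
    int         payload;        /* non-indexed column */
    unsigned    xmin,
                xmax;           /* xmax == 0: not superseded */
    size_t      ctid;           /* next version, or own position if newest */
} ToyRow;

typedef struct ToyTable
{
    ToyRow      rows[16];
    size_t      nrows;
} ToyTable;

/* Precondition (toy): otid < table->nrows. */
ToyUpdateResult
toy_tuple_update(ToyTable *table, size_t otid, int new_key, int new_payload,
                 unsigned xid, ToyFailureData *fd, bool *update_indexes)
{
    ToyRow     *old = &table->rows[otid];

    if (old->xmax != 0)
    {
        /* already updated or deleted: report where the row went */
        fd->ctid = old->ctid;
        fd->xmax = old->xmax;
        return TOY_UPDATE_CONCURRENTLY_UPDATED;
    }
    if (table->nrows == sizeof(table->rows) / sizeof(table->rows[0]))
        return TOY_UPDATE_TABLE_FULL;

    /* append the new version and link the old one to it */
    size_t      ntid = table->nrows++;
    ToyRow     *newrow = &table->rows[ntid];

    newrow->key = new_key;
    newrow->payload = new_payload;
    newrow->xmin = xid;
    newrow->xmax = 0;
    newrow->ctid = ntid;
    old->xmax = xid;
    old->ctid = ntid;

    /* toy rule: only a change to the indexed column needs new index entries */
    *update_indexes = (new_key != old->key);
    return TOY_UPDATE_OK;
}
```

A second update aimed at the superseded version fails and reports, through the failure data, where the newer version lives; that is the information a caller needs to decide whether to retry against it.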
+
+void (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                      CommandId cid, int options, struct BulkInsertStateData *bistate);
+
+ This function inserts the nslots tuples in slots into the
+ relation in one call, which is faster than inserting them one at a time.
+ If bistate is provided, the BulkInsertStateData is used for the insertion.
+
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+
+ This function locks the tuple identified by tid (depending on
+ flags, following its update chain to the newest version), stores it in
+ slot and returns the result of the operation. If locking fails,
+ details of the failure are stored in hufd.
+
+void (*finish_bulk_insert) (Relation rel, int options);
+
+ This function performs any operations needed to complete insertions made
+ via tuple_insert and multi_insert with a
+ BulkInsertState specified. For example, it may flush the relation
+ to disk when the insertions skipped WAL, or it may do nothing at all.
+
+ DDL related functions
+
+void (*relation_set_new_filenode) (Relation rel,
+                                   char persistence,
+                                   TransactionId *freezeXid,
+                                   MultiXactId *minmulti);
+
+ This function creates the storage needed to hold the tuples of the relation,
+ with durability appropriate to persistence, and sets *freezeXid and
+ *minmulti to the values to be stored in the relation's
+ relfrozenxid and relminmxid. For example, the heap AM creates the
+ relfilenode needed to store heap tuples.
+
+void (*relation_nontransactional_truncate) (Relation rel);
+
+ This function truncates the specified relation. The truncation is
+ non-transactional, so it cannot be rolled back.
+
+void (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+
+ This function copies the relation's data from its existing filenode to the new
+ filenode specified by newrnode, and removes the existing filenode.
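The copy-then-drop behaviour described for relation_copy_data can be sketched with an in-memory stand-in for storage. ToyStorage, TOY_BLCKSZ and toy_relation_copy_data are hypothetical; a real implementation works through the storage manager, copies every fork of the relation and WAL-logs the copied pages where required, none of which this toy attempts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tiny block size; PostgreSQL's default BLCKSZ is 8192. */
#define TOY_BLCKSZ 32

/* Hypothetical toy storage: a heap-allocated array of fixed-size blocks. */
typedef struct ToyStorage
{
    unsigned char (*blocks)[TOY_BLCKSZ];
    size_t      nblocks;
} ToyStorage;

/*
 * Hypothetical toy relation_copy_data: copy every block of the old storage
 * into newly allocated storage, then drop the old storage.
 * Returns 0 on success, -1 if allocation fails (old storage is kept then).
 */
int
toy_relation_copy_data(ToyStorage *old, ToyStorage *newstorage)
{
    if (old->nblocks > 0)
    {
        newstorage->blocks = malloc(old->nblocks * TOY_BLCKSZ);
        if (newstorage->blocks == NULL)
            return -1;
        memcpy(newstorage->blocks, old->blocks, old->nblocks * TOY_BLCKSZ);
    }
    else
        newstorage->blocks = NULL;
    newstorage->nblocks = old->nblocks;

    /* the old storage is removed only after the copy is complete */
    free(old->blocks);
    old->blocks = NULL;
    old->nblocks = 0;
    return 0;
}
```

Dropping the old storage only after the copy has finished means a failed allocation leaves the original data untouched.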
+
+void (*relation_vacuum) (Relation onerel, int options,
+                         struct VacuumParams *params, BufferAccessStrategy bstrategy);
+
+ This function vacuums the relation according to params. It collects
+ the dead tuples of the relation and removes them, along with the index
+ entries that point to them.
+
+void (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                 BufferAccessStrategy bstrategy);
+
+ This function prepares block blockno of the relation for sampling by
+ ANALYZE. The statistics that ANALYZE gathers are used by the planner
+ to plan queries on this relation.
+
+bool (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                 double *liverows, double *deadrows, TupleTableSlot *slot);
+
+ This function fetches the next tuple to sample from the block being
+ scanned, using OldestXmin to determine visibility, and stores it in
+ slot. It also updates *liverows and *deadrows with the
+ numbers of live and dead tuples encountered, and returns false when the block
+ has no more tuples.
+
+void (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                   bool use_sort,
+                                   TransactionId OldestXmin, TransactionId FreezeXid,
+                                   MultiXactId MultiXactCutoff,
+                                   double *num_tuples, double *tups_vacuumed,
+                                   double *tups_recently_dead);
+
+ This function copies the contents of OldHeap into NewHeap,
+ optionally ordered, either by scanning OldIndex or, if use_sort
+ is true, by sorting the tuples explicitly. Dead tuples are removed during the
+ copy, and the numbers of tuples copied, removed and recently dead are
+ returned in *num_tuples, *tups_vacuumed and
+ *tups_recently_dead.
+
+double (*index_build_range_scan) (Relation heap_rel,
+                                  Relation index_rel,
+                                  IndexInfo *index_info,
+                                  bool allow_sync,
+                                  bool anyvisible,
+                                  BlockNumber start_blockno,
+                                  BlockNumber end_blockno,
+                                  IndexBuildCallback callback,
+                                  void *callback_state,
+                                  TableScanDesc scan);
+
+ This function scans the blocks of heap_rel in the range given by
+ start_blockno and end_blockno and passes each tuple to be
+ indexed to callback, which inserts it into index_rel. It
+ returns the number of tuples found.
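The callback-driven shape of index_build_range_scan can be sketched with a toy table. ToyItem, ToyTable, ToyTid, ToyBuildCallback, toy_index_build_range_scan, ToySumIndex and toy_sum_callback are hypothetical. The sketch treats the block range as half-open (an assumption of this toy, not a statement about the real API), skips dead tuples entirely and ignores snapshots, whereas a real AM must decide per tuple what to pass to the IndexBuildCallback.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TUPLES_PER_BLOCK 4

/* Hypothetical toy table: fixed-size blocks of items. */
typedef struct ToyItem
{
    bool        live;
    int         value;
} ToyItem;

typedef struct ToyTable
{
    const ToyItem (*blocks)[TOY_TUPLES_PER_BLOCK];
    unsigned    nblocks;
} ToyTable;

/* Hypothetical toy TID: block number plus offset within the block. */
typedef struct ToyTid
{
    unsigned    block;
    unsigned    offset;
} ToyTid;

/* Plays the role of IndexBuildCallback: receives each tuple to index. */
typedef void (*ToyBuildCallback) (ToyTid tid, int value, void *state);

/* Scans blocks [start_blockno, end_blockno); returns the number of tuples passed on. */
double
toy_index_build_range_scan(const ToyTable *table, unsigned start_blockno,
                           unsigned end_blockno, ToyBuildCallback callback,
                           void *callback_state)
{
    double      ntuples = 0;

    if (end_blockno > table->nblocks)
        end_blockno = table->nblocks;
    for (unsigned blk = start_blockno; blk < end_blockno; blk++)
    {
        for (unsigned off = 0; off < TOY_TUPLES_PER_BLOCK; off++)
        {
            const ToyItem *item = &table->blocks[blk][off];

            if (!item->live)
                continue;       /* dead tuples are not indexed in this toy */
            ToyTid      tid = {blk, off};

            callback(tid, item->value, callback_state);
            ntuples += 1;
        }
    }
    return ntuples;
}

/* Example callback state: a toy "index" that only sums keys and counts entries. */
typedef struct ToySumIndex
{
    long        sum;
    int         entries;
} ToySumIndex;

void
toy_sum_callback(ToyTid tid, int value, void *state)
{
    ToySumIndex *idx = state;

    (void) tid;
    idx->sum += value;
    idx->entries++;
}
```

Separating the scan from the callback lets the same range scan feed any index: the scan decides which tuples exist, and the callback decides how they are stored.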
+
+void (*index_validate_scan) (Relation heap_rel,
+                             Relation index_rel,
+                             IndexInfo *index_info,
+                             Snapshot snapshot,
+                             struct ValidateIndexState *state);
+
+ This function scans the table according to snapshot and inserts into
+ index_rel the tuples that satisfy the snapshot but whose TIDs are not
+ yet present in the index, as recorded in the ValidateIndexState
+ struct. It is used in the last phase of a concurrent index build.
+
+ Planner functions
+
+void (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                BlockNumber *pages, double *tuples, double *allvisfrac);
+
+ This function estimates the current size of the relation, returning the
+ number of pages, the number of tuples and the fraction of all-visible pages in
+ *pages, *tuples and *allvisfrac.
+
+ Executor functions
+
+bool (*scan_bitmap_pagescan) (TableScanDesc scan,
+                              TBMIterateResult *tbmres);
+
+ This function reads the relation block identified by tbmres and collects
+ the tuples on it that tbmres requests and that are visible to the scan's
+ snapshot, storing them in the scan descriptor.
+
+bool (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                   TupleTableSlot *slot);
+
+ This function returns, in slot, the next tuple among those collected
+ for the current page by scan_bitmap_pagescan; it returns false when
+ there are no more tuples.
+
+bool (*scan_sample_next_block) (TableScanDesc scan,
+                                struct SampleScanState *scanstate);
+
+ This function selects the next block of the relation to sample, either using
+ the sampling method or sequentially, and records it in the scan descriptor.
+
+bool (*scan_sample_next_tuple) (TableScanDesc scan,
+                                struct SampleScanState *scanstate,
+                                TupleTableSlot *slot);
+
+ This function fetches the next tuple to sample from the current block, chosen
+ either by the sampling method or, otherwise, as the next visible tuple of the
+ block selected by scan_sample_next_block, and stores it in slot.
+
-
+
 Overview of Index access methods
-- 
2.20.1.windows.1