From 8f5117806ab74241821ad33e2e8df9f90f2f6ffc Mon Sep 17 00:00:00 2001
From: Kommi <haribabuk@fast.au.fujitsu.com>
Date: Mon, 11 Mar 2019 15:44:44 +1100
Subject: [PATCH 5/5] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 579 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 574 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 8d9edff622..85a94aefca 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,14 +18,583 @@
   <para>
    All Tables in <productname>PostgreSQL</productname> are the primary
    data store. Each table is stored as its own physical <firstterm>relation</firstterm>
-   and so is described by an entry in the <structname>pg_class</structname>
-   catalog. The table contents are entirely under the control of its
-   access method. (All the access methods furthermore use the standard page
-   layout described in <xref linkend="storage-page-layout"/>.)
+   and is described by an entry in the <structname>pg_class</structname>
+   catalog. A table's content is entirely controlled by its access method, although
+   all access methods use the same standard page layout described in <xref linkend="storage-page-layout"/>.
   </para>
 
+  <sect2 id="table-access-methods-api">
+   <title>Table access method API</title>
+
+   <para>
+    Each table access method is described by a row in the
+    <link linkend="catalog-pg-am"><structname>pg_am</structname></link> system
+    catalog. The <structname>pg_am</structname> entry specifies a <firstterm>type</firstterm>
+    of the access method and a <firstterm>handler function</firstterm> for the
+    access method. These entries can be created and deleted using the <xref linkend="sql-create-access-method"/>
+    and <xref linkend="sql-drop-access-method"/> SQL commands.
+   </para>
+
+   <para>
+    A table access method handler function must be declared to accept a
+    single argument of type <type>internal</type> and to return the
+    pseudo-type <type>table_am_handler</type>.  The argument is a dummy value that
+    simply serves to prevent handler functions from being called directly from
+    SQL commands.  The result of the function must be a palloc'd struct of
+    type <structname>TableAmRoutine</structname>, which contains everything
+    that the core code needs to know to make use of the table access method.
+    The <structname>TableAmRoutine</structname> struct, also called the access
+    method's <firstterm>API struct</firstterm>, includes fields specifying assorted
+    fixed properties of the access method, such as whether it can support
+    bitmap scans.  More importantly, it contains pointers to support
+    functions for the access method, which do all of the real work to access
+    tables.  These support functions are plain C functions and are not
+    visible or callable at the SQL level.  The support functions are declared
+    in the <structname>TableAmRoutine</structname> structure. For more details,
+    refer to the file <filename>src/include/access/tableam.h</filename>.
+   </para>
+
+   <para>
+    Developers of a new <literal>TABLE ACCESS METHOD</literal> can refer to the existing <literal>HEAP</literal>
+    implementation in <filename>src/backend/access/heap/heapam_handler.c</filename> for details of
+    how these functions are implemented for the heap access method.
+   </para>
+
+   <para>
+    The API functions are grouped by purpose and described below.
+   </para>
+
+   <sect3 id="slot-implementation-function">
+    <title>Slot implementation functions</title>
+
+   <para>
+<programlisting>
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+</programlisting>
+
+    This API must return the slot implementation that is specific to the AM.
+    The following predefined slot implementations are available:
+    <literal>TTSOpsVirtual</literal>, <literal>TTSOpsHeapTuple</literal>,
+    <literal>TTSOpsMinimalTuple</literal> and <literal>TTSOpsBufferHeapTuple</literal>.
+    An AM implementation can use any one of them. For more details on these slot
+    implementations, refer to <filename>src/include/executor/tuptable.h</filename>.
+   </para>
+   </sect3>
+
+   <sect3 id="table-scan-functions">
+    <title>Table scan functions</title>
+
+    <para>
+     The following APIs are used for scanning a table.
+    </para>
+
+    <para>
+<programlisting>
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc pscan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+</programlisting>
+
+     This API starts a scan of the relation pointed to by <literal>rel</literal> and returns a
+     <structname>TableScanDesc</structname>, which is typically embedded in a larger AM-specific
+     struct. <literal>nkeys</literal> is the number of scan keys in <literal>key</literal> used to filter the results.
+     <literal>pscan</literal> can be used by the AM if it supports parallel scans.
+     <literal>allow_strat</literal>, <literal>allow_sync</literal> and <literal>allow_pagemode</literal>
+     specify whether the scan may use a buffer access strategy, synchronized scans and
+     pagemode scans (although an AM is not required to support these). <literal>is_bitmapscan</literal>
+     and <literal>is_samplescan</literal> specify whether the scan is intended to support
+     those types of scans. <literal>temp_snap</literal> indicates that the provided snapshot is
+     temporarily allocated and needs to be freed at the end of the scan.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_end) (TableScanDesc scan);
+</programlisting>
+
+     This API ends a scan started by <literal>scan_begin</literal>
+     and releases its resources. <structfield>TableScanDesc.rs_snapshot</structfield>
+     needs to be unregistered, and it can be deallocated based on <structfield>TableScanDesc.temp_snap</structfield>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                            bool allow_strat, bool allow_sync, bool allow_pagemode);
+</programlisting>
+
+     This API restarts a relation scan that was previously started by
+     <literal>scan_begin</literal>. If <literal>set_params</literal> is
+     true, the provided options are applied to the scan.
+    </para>
+
+    <para>
+<programlisting>
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+</programlisting>
+
+     This API returns the next tuple satisfying the scan started by
+     <literal>scan_begin</literal> and stores it in the <literal>slot</literal>.
+    </para>
+
+   </sect3>
+
+   <sect3 id="parallel-table-scan-function">
+    <title>Parallel table scan functions</title>
+
+    <para>
+     The following APIs are used to perform a parallel table scan.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_estimate) (Relation rel);
+</programlisting>
+
+     This API returns the total size of the shared memory that the AM requires to perform
+     a parallel table scan. The required size must include the <structname>ParallelTableScanDesc</structname>,
+     which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+Size        (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This API initializes the <literal>pscan</literal> structure
+     that the AM requires to perform a parallel scan, and returns
+     the same size that <literal>parallelscan_estimate</literal> reported.
+    </para>
+
+    <para>
+<programlisting>
+void        (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc pscan);
+</programlisting>
+
+     This API reinitializes the parallel scan structure pointed to by <literal>pscan</literal>
+     for the same relation.
+    </para>
+
+   </sect3>
+
+   <sect3 id="index-scan-functions">
+    <title>Index scan functions</title>
+
+    <para>
+<programlisting>
+struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+</programlisting>
+
+     This API prepares to fetch tuples from the relation, as needed when fetching
+     tuples for an index scan. It must return an allocated and initialized <structname>IndexFetchTableData</structname>
+     structure, which is typically embedded in an AM-specific struct.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_reset) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This API resets the index fetch. Typically it releases the AM-specific resources
+     held by the <structname>IndexFetchTableData</structname> of an index scan.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_fetch_end) (struct IndexFetchTableData *data);
+</programlisting>
+
+     This API releases the AM-specific resources held by the <structname>IndexFetchTableData</structname>
+     and frees the memory of the <structname>IndexFetchTableData</structname> itself.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*index_fetch_tuple) (struct IndexFetchTableData *scan,
+                                  ItemPointer tid,
+                                  Snapshot snapshot,
+                                  TupleTableSlot *slot,
+                                  bool *call_again, bool *all_dead);
+</programlisting>
+
+     This API fetches the tuple pointed to by <literal>tid</literal> and stores it in the
+     <literal>slot</literal> after performing a visibility check according to the provided <literal>snapshot</literal>.
+     It returns true if a tuple was found, otherwise false. <literal>call_again</literal> is false when the API
+     is called for the first time with a given <literal>tid</literal>. If another tuple may potentially match
+     the same <literal>tid</literal>, <literal>call_again</literal> must be set to true to tell the caller to call the
+     API again for that <literal>tid</literal>. <literal>all_dead</literal> should be set to true only when it is
+     guaranteed that no backend needs to see the tuple.
+    </para>
+
+    <para>
+<programlisting>
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+</programlisting>
+
+     This API returns the newest xid among the tuples pointed to by <literal>items</literal>. This is used
+     to compute which snapshots conflict with the removal of <literal>items</literal> when replaying WAL records
+     for page-level index vacuums.
+    </para>
+
+   </sect3>
+
+   <sect3 id="non-modifying-tuple-functions">
+    <title>Non-modifying tuple functions</title>
+
+    <para>
+<programlisting>
+bool        (*tuple_satisfies_snapshot) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         Snapshot snapshot);
+</programlisting>
+
+     This API performs a visibility check on the tuple stored in the <literal>slot</literal>
+     against the provided snapshot and returns true if the tuple is visible, otherwise false.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*tuple_fetch_row_version) (Relation rel,
+                                        ItemPointer tid,
+                                        Snapshot snapshot,
+                                        TupleTableSlot *slot,
+                                        Relation stats_relation);
+</programlisting>
+
+     This API fetches the latest tuple specified by the ItemPointer <literal>tid</literal>
+     and stores it in the slot. For example, in the heap AM, an update chain is created
+     whenever a tuple is updated, so the function should fetch the latest tuple.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_get_latest_tid) (Relation rel,
+                                     Snapshot snapshot,
+                                     ItemPointer tid);
+</programlisting>
+
+     This API gets the TID of the latest version of the tuple identified by the specified
+     ItemPointer. For example, in the heap AM, an update chain is created whenever
+     a tuple is updated. This API is used to find the latest ItemPointer in that chain.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                                   ItemPointer tid,
+                                   Snapshot snapshot,
+                                   TupleTableSlot *slot,
+                                   bool *call_again, bool *all_dead);
+</programlisting>
+
+     This API fetches the tuple pointed to by the ItemPointer using the
+     IndexFetchTableData, stores it in the specified slot and updates the flags.
+     It is called from the index scan operation.
+    </para>
+
+   </sect3>
+
+   <sect3 id="manipulation-of-physical-tuples-functions">
+    <title>Manipulation of physical tuples functions</title>
+
+    <para>
+<programlisting>
+void        (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                             int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This API inserts the tuple contained in the provided slot into the relation
+     and stores the tuple's unique identifier (<literal>ItemPointerData</literal>)
+     in the slot. It uses the BulkInsertStateData if one is provided.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_insert_speculative) (Relation rel,
+                                         TupleTableSlot *slot,
+                                         CommandId cid,
+                                         int options,
+                                         struct BulkInsertStateData *bistate,
+                                         uint32 specToken);
+</programlisting>
+
+     This API is similar to <literal>tuple_insert</literal>, but it inserts the tuple
+     with the additional information that is necessary for speculative insertion. The insertion is
+     confirmed later, depending on whether the index insertion succeeds.
+    </para>
+
+    <para>
+<programlisting>
+void        (*tuple_complete_speculative) (Relation rel,
+                                           TupleTableSlot *slot,
+                                           uint32 specToken,
+                                           bool succeeded);
+</programlisting>
+
+     This API completes the speculative insertion of a tuple started by <literal>tuple_insert_speculative</literal>.
+     It is invoked after the index insertion has finished; <literal>succeeded</literal> indicates whether the operation was successful.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+</programlisting>
+
+     This API deletes the tuple of the relation pointed to by the ItemPointer and returns the
+     result of the operation. In case of failure, it fills in <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+</programlisting>
+
+     This API replaces the tuple pointed to by the ItemPointer with the new tuple in the slot, returns
+     the result of the operation and sets the flag indicating whether the indexes need to be updated.
+     In case of failure, it should fill in <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                             CommandId cid, int options, struct BulkInsertStateData *bistate);
+</programlisting>
+
+     This API inserts multiple tuples into the relation at once for faster data insertion.
+     It uses the BulkInsertStateData if one is provided.
+    </para>
+
+    <para>
+<programlisting>
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+</programlisting>
+
+     This API locks the newest version of the tuple pointed to by the ItemPointer <literal>tid</literal>
+     and returns the result of the operation. In case of failure, it fills in <literal>hufd</literal>.
+    </para>
+
+    <para>
+<programlisting>
+void        (*finish_bulk_insert) (Relation rel, int options);
+</programlisting>
+
+     This API performs the operations necessary to complete insertions made
+     via <literal>tuple_insert</literal> and <literal>multi_insert</literal> with a
+     BulkInsertState specified. For example, it may be used to flush the relation when
+     inserting without WAL, or it may be a no-op.
+    </para>
+
+   </sect3>
+
+   <sect3 id="ddl-related-functions">
+    <title>DDL related functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_set_new_filenode) (Relation rel,
+                                          char persistence,
+                                          TransactionId *freezeXid,
+                                          MultiXactId *minmulti);
+</programlisting>
+
+     This API creates the storage that is necessary to store the tuples of the relation
+     and sets <literal>freezeXid</literal> and <literal>minmulti</literal> to the minimum XID and MultiXactId that its tuples can contain. For example, the heap AM
+     should create the relfilenode that is necessary to store the heap tuples.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_nontransactional_truncate) (Relation rel);
+</programlisting>
+
+     This API truncates the specified relation. The operation is non-transactional and cannot be rolled back.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+</programlisting>
+
+     This API copies the data of the relation from its existing filenode to the new filenode
+     specified by <literal>newrnode</literal> and removes the existing filenode.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_vacuum) (Relation onerel, int options,
+                                struct VacuumParams *params, BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This API vacuums the relation according to the specified params.
+     It gathers all the dead tuples of the relation and removes them from
+     the relation and its indexes.
+    </para>
+
+    <para>
+<programlisting>
+void        (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                        BufferAccessStrategy bstrategy);
+</programlisting>
+
+     This API prepares the relation block <literal>blockno</literal> for tuple analysis. The statistics gathered by
+     the analysis are used by the planner to optimize query plans for this relation.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                        double *liverows, double *deadrows, TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next visible tuple from the block being scanned, using <literal>OldestXmin</literal> to judge visibility,
+     stores it in the slot and updates the counts of live and dead tuples encountered.
+    </para>
+
+    <para>
+<programlisting>
+void        (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                          bool use_sort,
+                                          TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                          double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+</programlisting>
+
+     This API copies the content of a relation into a new relation, optionally ordered either by scanning the specified index or by sorting
+     explicitly. It also removes dead tuples.
+    </para>
+
+    <para>
+<programlisting>
+double      (*index_build_range_scan) (Relation heap_rel,
+                                       Relation index_rel,
+                                       IndexInfo *index_info,
+                                       bool allow_sync,
+                                       bool anyvisible,
+                                       BlockNumber start_blockno,
+                                       BlockNumber end_blockno,
+                                       IndexBuildCallback callback,
+                                       void *callback_state,
+                                       TableScanDesc scan);
+</programlisting>
+
+     This API scans the specified blocks of a given relation and inserts their tuples into the specified index
+     using the provided callback function.
+    </para>
+
+    <para>
+<programlisting>
+void        (*index_validate_scan) (Relation heap_rel,
+                                    Relation index_rel,
+                                    IndexInfo *index_info,
+                                    Snapshot snapshot,
+                                    struct ValidateIndexState *state);
+</programlisting>
+
+     This API scans the table according to the given snapshot and inserts tuples
+     satisfying the snapshot into the specified index, provided their TIDs are
+     not already recorded in the <structname>ValidateIndexState</structname> struct.
+     This API is used as the last phase of a concurrent index build.
+    </para>
+
+   </sect3>
+
+   <sect3 id="planner-functions">
+    <title>Planner functions</title>
+
+    <para>
+<programlisting>
+void        (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                       BlockNumber *pages, double *tuples, double *allvisfrac);
+</programlisting>
+
+     This API estimates the size of the relation and returns the estimated number of
+     pages and tuples and the all-visible fraction of the relation.
+    </para>
+
+   </sect3>
+
+   <sect3 id="executor-functions">
+    <title>Executor functions</title>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan) (TableScanDesc scan,
+                                     TBMIterateResult *tbmres);
+</programlisting>
+
+     This API scans the relation block specified in the scan descriptor and collects the
+     tuples requested by <literal>tbmres</literal> that pass the visibility check.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                          TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next tuple from the set of tuples of the page specified in the scan descriptor
+     and stores it in the provided slot; it returns false if there are no more tuples.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_block) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate);
+</programlisting>
+
+     This API selects the next block of the relation, either using the given sampling method or sequentially, and
+     records its information in the scan descriptor.
+    </para>
+
+    <para>
+<programlisting>
+bool        (*scan_sample_next_tuple) (TableScanDesc scan,
+                                       struct SampleScanState *scanstate,
+                                       TupleTableSlot *slot);
+</programlisting>
+
+     This API gets the next tuple to sample from the current sampling block according to
+     the sampling method; otherwise it gets the next visible tuple of the block that was
+     chosen by <literal>scan_sample_next_block</literal>.
+    </para>
+
+  </sect3>
+  </sect2>
  </sect1>
- 
+
  <sect1 id="index-access-methods">
   <title>Overview of Index access methods</title>
 
-- 
2.20.1.windows.1