Re: Extension of Thick Indexes - Mailing list pgsql-hackers

From Shrish Purohit
Subject Re: Extension of Thick Indexes
Date
Msg-id 001801c9a93f$8caf1cd0$49d8580a@persistent.co.in
Whole thread Raw
In response to Re: Extension of Thick Indexes  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Hi All,

Some brief information about the thick index patch.

The patch adds snapshot (MVCC) information to the indexes to enable them
being used independently.  With this information, the indexes need not refer
to the heap data to check an index key's visibility. Various functions such
as IndexTupleSatisfiesMVCC() are uesd to verify if the MVCC satisfies a
snapshot.

The following data structure is added to the thick index key data structure:
struct SnapshotFieldsData{      TransactionId t_xmin;      TransactionId t_xmax;      TransactionId t_cid;      uint16
infomask;};

Thus the pg_index catalog is added with flag to indicate thick index.
(FormData_pg_index ) is modified accordingly.

The IndexScanDescData structure, which stores information about how an index
can be scanned, was modified to include:SnapshotIndexParams  sindex_params; 
// params of thick index that can be used for scanningvoid *opaque;

The following data data-structure is used by the database execution engine
to scan thick index:
typedef struct SnapshotIndexParamsData{      bool         isIndexOnlyScan;    // true if index should be used without
accessingthe table.      bool         is_minmax_scan;    // true if this the query contains min/max aggregate
 
functions.      bool         is_count_only_scan;    // if the query contains only count(), then true. 
// If true, then index rows are not cached.      bool         cache_stack;    // If true, remembers path from root to
leafin the stack (a
 
member 
// var defined below). Used For optimization of updates of a row, which 
// requires delete and insert.  
// While deleting a key from index, its stack is tracked, and the same stack

// is used for inserting the updated key.      CmdType      operation;    // insert, update, delete, .      HeapTuple
htup;    // reference to the heap tuple that's being operated      ScanKey      insertion_skey;  /* Insertion Scan Key
*/     void*      stack;// reference to the stack mentioned in comments of cache_stack.      TupleTableSlot* slot;
// stores heap data structures containing index keys} SnapshotIndexParamsData;
 

IndexScan data structure, which stores information about how an index is
scanned, is updated to include following members:
Bool indexOnlyScan;
Bool is_minmax_scan;
Bool is_count_only_scan;

The Postgres optimizer figures out various ways to access each relation used
by the input SQL statement in set_plain_rel_pathlist() function, which
indirectly invokes create_index_path(). Create_index_path()  does the
following:
Identifies index path to access a table, figures out if index covers all the
columns used by the SQL statement to initialize index_covers_all_clauses
sets the cost of accessing table using index (or just using index only scan)
by invoking cost_index() function.

Functions index_covers_rel_clauses, index_covers_having_quals,
index_covers_target_list are used to set flag index_covers_all_clauses. 
index_covers_all_clauses flag indicates that index covers all clauses and
can be used In IndexOnlyScan mode.

--How updates and deletes are performed? 

find_old_slot(estate) identifies the old_slot information from estates'
ExprContexts. ExecUpdate uses oldslot, newslot and function
ExecUpdateIndexTuples is used to update the index. ExecUpdateIndexTuples
prepares iscan IndexScanDescriptor which is used to scan index.
Index is scanded by index_getnext(iscan,ForwardScanDirection).for b-tree
index it internall calls _bt_readpage. for update and delete operations
_bt_readpage calls SnapshotIndexDeleteTuple which updates mvcc information
of the index tuple based on MVCC information includeed in the slot.
ExecUpdateIndexTuples then inserts the new index tuple. 

FormIndexDatum stores the MVCC information along with values []. 
index_form_tuple is also modified accordingly to store MVCC information. 

Functions StoreMinimalIndexTuple and GetNextMinimalIndexTuple are used to
store index key column values into mintup and to get values from mintup
respectively.

For countOnlyScans we count same static slot, while for minmax_scan first
non-null tuple as the tuple with minium/maximum value.


Thanks,
Shrish Purohit | Software Engineer | Persistent Systems
shrish_purohit@persistent.co.in  | Cell: +91 98509 59940 | Tel: +91 (20)
3023 4493
Innovation in software product design, development and delivery-
www.persistentsys.com

-----Original Message-----
From: Heikki Linnakangas [mailto:heikki.linnakangas@enterprisedb.com] 
Sent: Friday, March 20, 2009 2:01 PM
To: Gokulakannan Somasundaram
Cc: Alvaro Herrera; Josh Berkus; Shrish Purohit; Amit Gupta;
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Extension of Thick Indexes

Gokulakannan Somasundaram wrote:
>> It would be helpful to explain how this solves the lack of atomicity of
>> visibility data updates, which last time I checked was the killer
>> problem for this feature.
> 
> Hmmm... To put it more clearly, this problem occurs when there is a thick
> index on a mutable function(marked as immutable). In order to avoid the
> problem, i wrote the code that would not support functional indexes, it
> would only support the normal ones. I  think the main argument against
Thick
> Index was
>  - Visibility Map, which supports "Index only Scans" partially but by
> occupying lesser space and doesn't have the functional index issue. Since
> the main advantage of Thick Index was Index Only Scans, the community
> preferred to wait for Visibility map
> 
> Heikki is working on the Visibility map and i think his observations might
> help on Thick Index project.

The common ground between thick indexes and visibility map based 
index-only-scans is the method to pass Datums from the index to the 
executor, and all the executor and planner changes to take advantage of 
that. In fact, even without thick indexes or visibility map, that would 
provide some gain: we could use the data from the index to filter rows 
before going to the heap to check visibility. For example if you have a 
wide table with a text column, and there's an index on the text column, 
it would be faster to do a full scan on the index for a predicate like 
"textcol LIKE '%foobar%'", than a sequential scan of the heap, assuming 
there's only few matches.

The thick index approach has a lot of limitations compared to using 
visibility map: How to deal with functional indexes? How to deal with 
datatypes with broken comparison functions? And it needs support from 
each indexam separately, and is outright impossible in something like an 
bitmap index. These have all been discussed before, but I believe the 
executor and indexam API changes required are similar to that with the 
visibility map. That part of the patch I'm very interested in, though I 
haven't looked at it at all yet.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is
intendedonly for the use of the individual or entity to which it is addressed. If you are not the intended recipient,
youare not authorized to read, retain, copy, print, distribute or use this message. If you have received this
communicationin error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not
acceptany liability for virus infected mails.
 


pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Extension of Thick Indexes
Next
From: Robert Treat
Date:
Subject: Re: small but useful patches for text search