New xindex.sgml - Mailing list pgsql-hackers

From D'Arcy" "J.M." Cain
Subject New xindex.sgml
Date
Msg-id m10n0fm-0000bIC@druid.net
Whole thread Raw
List pgsql-hackers
I have made a number of changes to xindex.sgml.  I reformatted it to make
the source a little easier to read.  I also added the bits that I recently
discovered making my own user-defined type work with indeces in where
clauses.  Can a few people pick at this and see if I am correct then
put it into the tree if it is OK?

And thanks for the help getting my type working.

<Chapter Id="xindex">
<Title>Interfacing Extensions To Indices</Title>

<Para>    The procedures described thus far let you define a new type, new    functions and new operators.  However, we
cannotyet define a secondary    index (such as a <Acronym>B-tree</Acronym>, <Acronym>R-tree</Acronym> or    hash access
method)over a new type or its operators.
 
</Para>

<Para>    Look back at    <XRef LinkEnd="EXTEND-CATALOGS" EndTerm="EXTEND-CATALOGS">.    The right half shows the
catalogs that we must modify in order to tell    <ProductName>Postgres</ProductName> how to use a user-defined type
and/or   user-defined  operators with an index (i.e., <FileName>pg_am, pg_amop,    pg_amproc, pg_operator</FileName>
and<FileName>pg_opclass</FileName>).    Unfortunately, there is no simple command to do this.  We will demonstrate
howto modify these catalogs through a running example:  a  new  operator    class for the <Acronym>B-tree</Acronym>
accessmethod that stores and    sorts complex numbers in ascending absolute value order.
 
</Para>

<Para>    The <FileName>pg_am</FileName> class contains one instance for every user    defined access method.  Support
forthe heap access method is built into    <ProductName>Postgres</ProductName>, but every other access method is
describedhere.  The schema is
 

<TABLE TOCENTRY="1">
<Title>Index Schema</Title>
<TitleAbbrev>Indices</TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row> <Entry>Attribute</Entry> <Entry>Description</Entry>
</Row>
</THead>
<TBody>
<Row> <Entry>amname</Entry> <Entry>name of the access method</Entry>
</Row>
<Row>
<Entry>amowner</Entry>
<Entry>object id of the owner's instance in pg_user</Entry>
</Row>
<Row>
<Entry>amkind</Entry>
<Entry>not used at present, but set to 'o' as a place holder</Entry>
</Row>
<Row>
<Entry>amstrategies</Entry>
<Entry>number of strategies for this access method (see below)</Entry>
</Row>
<Row>
<Entry>amsupport</Entry>
<Entry>number of support routines for this access method (see below)</Entry>
</Row>
<Row>
<Entry>amgettuple   aminsert   ...</Entry>

<Entry>procedure  identifiers  for  interface routines to the access   method.  For example, regproc ids for opening,
closing, and   getting instances from the access method appear here. </Entry>
 
</Row>
</TBody>
</TGroup>
</TABLE>
</Para>

<Para>    The <Acronym>object ID</Acronym> of the instance in    <FileName>pg_am</FileName> is used as a foreign key in
lotsof other    classes.  You  don't  need to  add a new instance to this class; all    you're interested in is the
<Acronym>objectID</Acronym> of the access    method instance you want to extend:
 

<ProgramListing>
SELECT oid FROM pg_am WHERE amname = 'btree';
        +----+        |oid |        +----+        |403 |        +----+
</ProgramListing>
</Para>

<Para>    We will use that select in a where clause later.
</Para>

<Para>    The <FileName>amstrategies</FileName> attribute exists to standardize    comparisons across data types.  For
example,<Acronym>B-tree</Acronym>s    impose a strict ordering on keys, lesser to greater.  Since
<ProductName>Postgres</ProductName>allows the user to define operators,    <ProductName>Postgres</ProductName> cannot
lookat the name of an operator    (eg, ">" or "<") and tell what kind of comparison it is.  In fact,    some  access
methodsdon't impose any ordering at all.  For example,    <Acronym>R-tree</Acronym>s express a rectangle-containment
relationship,   whereas a hashed data structure expresses only bitwise similarity based    on the value of a hash
function. <ProductName>Postgres</ProductName>    needs some consistent way of taking a qualification in your query,
lookingat the operator and then deciding if a usable index exists.  This    implies that
<ProductName>Postgres</ProductName>needs to know, for    example, that the  "<="  and  ">" operators partition a
<Acronym>B-tree</Acronym>. <ProductName>Postgres</ProductName>    uses strategies to express these relationships
between   operators and the way they can be used to scan indices.
 
</Para>

<Para>    Defining a new set of strategies is beyond the scope of this discussion,    but we'll explain how
<Acronym>B-tree</Acronym>strategies work because    you'll need to know that to add a new operator class. In the
<FileName>pg_am</FileName>class, the amstrategies attribute is the    number of strategies defined for this access
method.For    <Acronym>B-tree</Acronym>s, this number is 5.  These strategies    correspond to
 

<TABLE TOCENTRY="1">
<Title>B-tree Strategies</Title>
<TitleAbbrev>B-tree</TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row>
<Entry>Operation</Entry>
<Entry>Index</Entry>
</Row>
</THead>
<TBody>
<Row>
<Entry>less than</Entry>
<Entry>1</Entry>
</Row>
<Row>
<Entry>less than or equal</Entry>
<Entry>2</Entry>
</Row>
<Row>
<Entry>equal</Entry>
<Entry>3</Entry>
</Row>
<Row>
<Entry>greater than or equal</Entry>
<Entry>4</Entry>
</Row>
<Row>
<Entry>greater than</Entry>
<Entry>5</Entry>
</Row>
</TBody>
</TGroup>
</TABLE>
</Para>

<Para>    The idea is that you'll need to add procedures corresponding to the    comparisons above to the
<FileName>pg_amop</FileName>relation (see below).    The access method code can use these strategy numbers, regardless
ofdata    type, to figure out how to partition the <Acronym>B-tree</Acronym>,    compute selectivity, and so on.  Don't
worryabout the details of adding    procedures yet; just understand that there must be a set of these    procedures for
<FileName>int2,int4, oid,</FileName> and every other    data type on which a <Acronym>B-tree</Acronym> can operate.
 
    Sometimes, strategies aren't enough information for the system to figure    out how to use an index.  Some access
methodsrequire other support    routines in order to work. For example, the <Acronym>B-tree</Acronym>    access method
mustbe able to compare two keys and determine whether one    is greater than, equal to, or less than the other.
Similarly,the    <Acronym>R-tree</Acronym> access method must be able to compute    intersections,  unions, and sizes
ofrectangles.  These    operations do not correspond to user qualifications in    SQL queries;  they are administrative
routinesused by    the access methods, internally.
 
</Para>

<Para>    In order to manage diverse support routines consistently across all    <ProductName>Postgres</ProductName>
accessmethods,    <FileName>pg_am</FileName> includes an attribute called    <FileName>amsupport</FileName>.  This
attributerecords the number of    support routines used by an access method.  For <Acronym>B-tree</Acronym>s,    this
numberis one -- the routine to take two keys and return -1, 0, or    +1, depending on whether the first key is less
than,equal    to, or greater than the second.
 
<Note>
<Para>
Strictly  speaking, this routine can return a negative
number (< 0), 0, or a non-zero positive number (> 0).
</Para>
</Note>
</para>
<Para>    The <FileName>amstrategies</FileName> entry in pg_am is just the number    of strategies defined for the
accessmethod in question.  The procedures    for less than, less equal, and so on don't appear in
<FileName>pg_am</FileName>. Similarly, <FileName>amsupport</FileName>    is just the number of support routines
requiredby  the  access    method.  The actual routines are listed elsewhere.
 
</Para>

<Para>    The next class of interest is pg_opclass.  This class exists only to    associate a name and default type
withan oid.  In pg_amop, every    <Acronym>B-tree</Acronym> operator class has a set of procedures, one    through
five,above. Some existing opclasses are <FileName>int2_ops,    int4_ops, and oid_ops</FileName>.  You need to add an
instancewith your    opclass name (for example, <FileName>complex_abs_ops</FileName>) to
<FileName>pg_opclass</FileName>. The <FileName>oid</FileName> of    this instance is a foreign key in other classes.
 

<ProgramListing>
INSERT INTO pg_opclass (opcname, opcdeftype)   SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname =
'complex_abs';

SELECT oid, opcname, opcdeftype   FROM pg_opclass   WHERE opcname = 'complex_abs_ops';
        +------+-----------------+------------+        |oid   | opcname         | opcdeftype |
+------+-----------------+------------+       |17314 | complex_abs_ops |      29058 |
+------+-----------------+------------+
</ProgramListing>
    Note that the oid for your <FileName>pg_opclass</FileName> instance will    be different!  Don't worry about this
though. We'll get this number    from the system later just like we got the oid of the type here.
 
</Para>

<Para>    So now we have an access method and an operator  class.    We  still  need  a  set of operators; the
procedurefor    defining operators was discussed earlier in  this  manual.    For  the  complex_abs_ops  operator
classon Btrees,    the operators we require are:
 

<ProgramListing>       absolute value less-than       absolute value less-than-or-equal       absolute value equal
absolute value greater-than-or-equal       absolute value greater-than
 
</ProgramListing>
</Para>

<Para>    Suppose the code that implements the functions  defined    is stored in the file
<FileName>PGROOT/src/tutorial/complex.c</FileName>
</Para>

<Para>    Part of the code look like this: (note that we will only show the    equality operator for the rest of the
examples. The other four    operators are very similar.  Refer to <FileName>complex.c</FileName>    or
<FileName>complex.source</FileName>for the details.)
 

<ProgramListing>
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
        bool        complex_abs_eq(Complex *a, Complex *b)        {            double amag = Mag(a), bmag = Mag(b);
      return (amag==bmag);        }
 
</ProgramListing>
</Para>

<Para>    There are a couple of important things that are happening below.
</Para>

<Para>    First, note that operators for less-than, less-than-or equal, equal,    greater-than-or-equal, and
greater-thanfor <FileName>int4</FileName>    are being defined.  All of these operators are already defined for
<FileName>int4</FileName>under the names <, <=, =, >=,    and >. The new operators behave differently, of
course. In order    to guarantee that <ProductName>Postgres</ProductName> uses these    new operators rather than the
oldones, they need to be named differently    from the old ones.  This is a key point: you can overload operators in
<ProductName>Postgres</ProductName>,but only if the operator isn't    already defined for the argument types.  That is,
ifyou have <    defined for (int4, int4), you can't define it again.    <ProductName>Postgres</ProductName> does not
checkthis when you define    your operator, so be careful.  To avoid this problem, odd names will be    used for the
operators. If you get this wrong, the access methods    are likely to crash when you try to do scans.
 
</Para>

<Para>    The other important point is that all the operator functions return    Boolean values.  The access methods
relyon this fact.  (On the other    hand, the support function returns whatever the particular access method    expects
--in this case, a signed integer.) The final routine in the    file is the "support routine" mentioned when we
discussedthe amsupport    attribute of the <FileName>pg_am</FileName> class.  We will use this    later on.  For now,
ignoreit.
 
</Para>

<Para>
<ProgramListing>
CREATE FUNCTION complex_abs_eq(complex_abs, complex_abs)             RETURNS bool             AS
'PGROOT/tutorial/obj/complex.so'            LANGUAGE 'c';
 
</ProgramListing>
</Para>

<Para>    Now define the operators that use them.  As noted, the operator names    must be unique among all operators
thattake two <FileName>int4</FileName>    operands.  In order to see if the operator names listed below are taken,
wecan do a query  on <FileName>pg_operator</FileName>:
 

<ProgramListing>   /*    * this query uses the regular expression operator (~)    * to find three-character operator
namesthat end in    * the character &    */   SELECT *    FROM pg_operator    WHERE oprname ~ '^..&$'::text;
 
</ProgramListing>

</Para>

<Para>    to see if your name is taken for the types you want.  The important    things here are the procedure (which
arethe <Acronym>C</Acronym>    functions defined above) and the restriction and join selectivity    functions.  You
shouldjust use the ones used below--note that there    are different such functions for the less-than, equal, and
greater-than   cases.  These must be supplied, or the access method will crash when it    tries to use the operator.
Youshould copy the names for restrict and    join, but use the procedure names you defined in the last step.
 

<ProgramListing>
CREATE OPERATOR = (    leftarg = complex_abs, rightarg = complex_abs,    procedure = complex_abs_eq,    restrict =
eqsel,join = eqjoinsel        )
 
</ProgramListing>
</Para>

<Para>    Notice that five operators corresponding to less,  less equal, equal,    greater, and greater equal are
defined.
</Para>

<Para>    We're just about finished. the last thing we need to do is to update    the <FileName>pg_amop</FileName>
relation. To do this, we need the    following attributes:
 

<TABLE TOCENTRY="1">
<Title><FileName>pg_amproc</FileName> Schema</Title>
<TitleAbbrev><FileName>pg_amproc</FileName></TitleAbbrev>
<TGroup Cols="2">
<THead>
<Row>
<Entry>Attribute</Entry>
<Entry>Description</Entry>
</Row>
</THead>
<TBody>
<Row>
<Entry>amopid</Entry>
<Entry>the <FileName>oid</FileName> of the <FileName>pg_am</FileName> instancefor  B-tree (== 403, see above)</Entry>
</Row>
<Row>
<Entry>amopclaid</Entry>
<Entry>the <FileName>oid</FileName> of the
<FileName>pg_opclass</FileName>  instance for <FileName>complex_abs_ops</FileName>(== whatever you got instead  of
<FileName>17314</FileName>,see above)</Entry>
 
</Row>
<Row>
<Entry>amopopr</Entry>
<Entry>the <FileName>oid</FileName>s of the  operators  for the opclass (which we'll get in just a minute)</Entry>
</Row>
<Row>
<Entry>amopselect, amopnpages</Entry>
<Entry>cost functions</Entry>
</Row>
</TBody>
</TGroup>
</TABLE>
    The cost functions are used by the query optimizer to decide whether or    not to use a given index in a scan.
Fortunately,these already exist.    The two functions we'll use are <FileName>btreesel</FileName>, which    estimates
theselectivity of the <Acronym>B-tree</Acronym>, and    <FileName>btreenpage</FileName>, which estimates the number of
pagesa    search will touch in the tree.
 
</Para>

<Para>    So we need the <FileName>oid</FileName>s of the operators we just    defined.  We'll look up the names of all
theoperators that take    two <FileName>complex</FileName>es, and pick ours out:
 

<ProgramListing>   SELECT o.oid AS opoid, o.oprname    INTO TABLE complex_ops_tmp    FROM pg_operator o, pg_type t
WHEREo.oprleft = t.oid and o.oprright = t.oid     and t.typname = 'complex_abs';
 
        +------+---------+        |oid   | oprname |        +------+---------+        |17321 | <       |
+------+---------+       |17322 | <=      |        +------+---------+        |17323 |  =      |
+------+---------+       |17324 | >=      |        +------+---------+        |17325 | >       |
+------+---------+
</ProgramListing>
    (Again, some of your <FileName>oid</FileName> numbers will almost    certainly be different.)  The operators we are
interestedin are those    with <FileName>oid</FileName>s 17321 through 17325.  The values you    get will probably be
different,and you should substitute them for the    values below.  We will do this with a select statement.
 
</Para>

<Para>    Now we're ready to update <FileName>pg_amop</FileName> with our new    operator class.  The most important
thingin this entire discussion    is that the operators are ordered, from less equal through greater    equal, in
<FileName>pg_amop</FileName>. We add the instances we need:
 

<ProgramListing>   INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,               amopselect, amopnpages)
      SELECT am.oid, opcl.oid, c.opoid, 1,               'btreesel'::regproc, 'btreenpage'::regproc       FROM pg_am
am,pg_opclass opcl, complex_abs_ops_tmp c       WHERE amname = 'btree' AND           opcname = 'complex_abs_ops' AND
      c.oprname = '<';
 
</ProgramListing>
    Now do this for the other operators substituting for the "1" in the    third line above and the "<" in the last
line. Note the order:    "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater    than or equal" is 4,
and"greater than" is 5.
 
</Para>

<Para>    The next step is registration of the "support routine" previously    described in our discussion of
<FileName>pg_am</FileName>. The    <FileName>oid</FileName> of this support routine is stored in the
<FileName>pg_amproc</FileName>class, keyed by the access method    <FileName>oid</FileName> and the operator class
<FileName>oid</FileName>.   First, we need to register the function in    <ProductName>Postgres</ProductName> (recall
thatwe put the    <Acronym>C</Acronym> code that implements this routine in the bottom of    the file in which we
implementedthe operator routines):
 

<ProgramListing>   CREATE FUNCTION complex_abs_cmp(complex, complex)    RETURNS int4    AS
'PGROOT/tutorial/obj/complex.so'   LANGUAGE 'c';
 
   SELECT oid, proname FROM pg_proc    WHERE proname = 'complex_abs_cmp';
        +------+-----------------+        |oid   | proname         |        +------+-----------------+        |17328 |
complex_abs_cmp|        +------+-----------------+
 
</ProgramListing>
    (Again, your <FileName>oid</FileName> number will probably be different    and you should substitute the value you
seefor the value below.)    We can add the new instance as follows:
 

<ProgramListing>   INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)       SELECT a.oid, b.oid, c.oid, 1
   FROM pg_am a, pg_opclass b, pg_proc c           WHERE a.amname = 'btree' AND               b.opcname =
'complex_abs_ops'AND               c.proname = 'complex_abs_cmp';
 
</ProgramListing>
</Para>

<Para>    Now we need to add a hashing strategy to allow the type to be indexed.    We do this by using another type in
pg_ambut we reuse the sames ops.
 

<ProgramListing>   INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,               amopselect, amopnpages)
     SELECT am.oid, opcl.oid, c.opoid, 1,               'hashsel'::regproc, 'hashnpage'::regproc       FROM pg_am am,
pg_opclassopcl, complex_abs_ops_tmp c       WHERE amname = 'hash' AND           opcname = 'complex_abs_ops' AND
 c.oprname = '=';
 
</ProgramListing>
</Para>

<Para>   In order to use this index in a where clause, we need to modify the   <FileName>pg_operator</FileName> class
asfollows.
 

<ProgramListing>   UPDATE pg_operator       SET oprrest = 'eqsel'::regproc, oprjoin = 'eqjoinsel'       WHERE oprname =
'='AND           oprleft = oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
 UPDATE pg_operator       SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'       WHERE oprname = '<>' AND
  oprleft = oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');      UPDATE
pg_operator      SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'       WHERE oprname = '<>' AND
oprleft= oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');      UPDATE
pg_operator      SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'       WHERE oprname = '<' AND
oprleft= oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');      UPDATE
pg_operator      SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'       WHERE oprname = '<=' AND
oprleft= oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');      UPDATE
pg_operator      SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'       WHERE oprname = '>' AND
oprleft= oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');      UPDATE
pg_operator      SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'       WHERE oprname = '>=' AND
oprleft= oprright AND           oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
 
</ProgramListing> 
</Para>

<Para>  
And last (Finally!) we register a description of this type.

<ProgramListing>   INSERT INTO pg_description (objoid, description)    SELECT oid, 'Two part G/L account'    FROM
pg_typeWHERE typname = 'complex_abs';
 
</ProgramListing> 
</Para> 

</Chapter>

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Help: fmgr_info: function 0: cache lookup failed
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Uh-oh II - ecpg