Re: GIN documentation - Mailing list pgsql-hackers

From David Fuhry
Subject Re: GIN documentation
Msg-id 450CD4F4.6030406@cs.kent.edu
In response to GIN documentation  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Teodor,

    Attached is a diff -c against your original gindocs patch.  I did my
best not to change any of the semantics.  My changes no doubt overlap &
conflict with those Jeff Davis sent you earlier, so consider both of our
diffs.

Thanks,

Dave Fuhry

Teodor Sigaev wrote:
> Patch adds GIN documentation and slightly improves GiST docs.
>
> Somebody of native English speakers, pls, check the text... Thank you.
*** gindocs.orig    2006-09-17 00:21:38.000000000 -0400
--- gindocs    2006-09-17 00:57:12.000000000 -0400
***************
*** 22,28 ****
  !       </indexterm>
  !       <listitem>
  !        <para>
! !         Soft upper limit of the size of the returned set by GIN index. For more
  !         information see <xref linkend="gin-tips">.
  !        </para>
  !       </listitem>
--- 22,28 ----
  !       </indexterm>
  !       <listitem>
  !        <para>
! !         Soft upper limit of the size of the set returned by the GIN index. For more
  !         information see <xref linkend="gin-tips">.
  !        </para>
  !       </listitem>
***************
*** 88,95 ****
  +  <para>
  +    <acronym>GIN</acronym> stands for Generalized Inverted Index.  It is
  +    an index structure storing a set of (key, posting list) pairs, where
! +    'posting list' is a set of rows in which the key occurs. The
! +    row may contains a lot of keys.
  +  </para>
  +
  +  <para>
--- 88,95 ----
  +  <para>
  +    <acronym>GIN</acronym> stands for Generalized Inverted Index.  It is
  +    an index structure storing a set of (key, posting list) pairs, where
! +    'posting list' is a set of rows in which the key occurs. Each
! +    row may contain many keys.
  +  </para>
  +
  +  <para>
***************
*** 178,184 ****
  +      <listitem>
  +       <para>
  +        Returns an array of keys of the query to be executed. n contains
! +        strategy number of operation (see <xref linkend="xindex-strategies">).
  +        Depending on n, query may be different type.
  +       </para>
  +      </listitem>
--- 178,184 ----
  +      <listitem>
  +       <para>
  +        Returns an array of keys of the query to be executed. n contains
! +        the strategy number of the operation (see <xref linkend="xindex-strategies">).
  +        Depending on n, query may be different type.
  +       </para>
  +      </listitem>
***************
*** 188,196 ****
  +      <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  +      <listitem>
  +       <para>
! +        Returns TRUE if indexed value satisfies query qualifier with strategy n
  +        (or may satisfy in case of RECHECK mark in operator class).
! +        Each element of the check array is TRUE if indexed value has a
  +        corresponding key in the query: if (check[i] == TRUE ) the i-th key of
  +        the query is present in the indexed value.
  +       </para>
--- 188,196 ----
  +      <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  +      <listitem>
  +       <para>
! +        Returns TRUE if the indexed value satisfies the query qualifier with strategy n
  +        (or may satisfy in case of RECHECK mark in operator class).
! +        Each element of the check array is TRUE if the indexed value has a
  +        corresponding key in the query: if (check[i] == TRUE ) the i-th key of
  +        the query is present in the indexed value.
  +       </para>
***************
*** 209,218 ****
  +    <term>Create vs insert</term>
  +    <listitem>
  +     <para>
! +      In most cases, insertion into <acronym>GIN</acronym> index is slow enough
! +      due to a lot keys should be inserted per one value. So, for bulk upload
! +      data in table it will be useful to drop index and create it
! +      after finishing upload.
  +     </para>
  +    </listitem>
  +   </varlistentry>
--- 209,218 ----
  +    <term>Create vs insert</term>
  +    <listitem>
  +     <para>
! +      In most cases, insertion into a <acronym>GIN</acronym> index is slow
! +      due to the likelihood of many keys being inserted for each value. So, for bulk insertions into a
! +      table it is advisable to drop the GIN index and recreate it
! +      after finishing bulk insertion.
  +     </para>
  +    </listitem>
  +   </varlistentry>
***************
*** 221,227 ****
  +    <term>gin_fuzzy_search_limit</term>
  +    <listitem>
  +     <para>
! +      The primary goal of development <acronym>GIN</acronym> indices was
  +      support for highly scalable, full-text search in
  +      <productname>PostgreSQL</productname> and there are often situations when
  +      a full-text search returns a very large set of results.  Since reading
--- 221,227 ----
  +    <term>gin_fuzzy_search_limit</term>
  +    <listitem>
  +     <para>
! +      The primary goal of developing <acronym>GIN</acronym> indices was
  +      support for highly scalable, full-text search in
  +      <productname>PostgreSQL</productname> and there are often situations when
  +      a full-text search returns a very large set of results.  Since reading
***************
*** 232,238 ****
  +     <para>
  +      Such queries usually contain very frequent words, so the results are not
  +      very helpful. To facilitate execution of such queries
! +      <acronym>GIN</acronym> has a configurable  soft upper limit of the size
  +      of the returned set, determined by the
  +      <varname>gin_fuzzy_search_limit</varname> GUC variable.  It is set to 0 by
  +      default (no limit).
--- 232,238 ----
  +     <para>
  +      Such queries usually contain very frequent words, so the results are not
  +      very helpful. To facilitate execution of such queries
! +      <acronym>GIN</acronym> has a configurable soft upper limit of the size
  +      of the returned set, determined by the
  +      <varname>gin_fuzzy_search_limit</varname> GUC variable.  It is set to 0 by
  +      default (no limit).
***************
*** 256,271 ****
  +  <title>Limitations</title>
  +
  +  <para>
! +   <acronym>GIN</acronym> doesn't support full scan of index due to it's
! +   extremely inefficiency: because of a lot of keys per value,
  +   each heap pointer will returned several times.
  +  </para>
  +
  +  <para>
! +   When extractQuery returns zero number of keys, <acronym>GIN</acronym> will
! +   emit a error: for different opclass and strategy semantic meaning of void
! +   query may be different (for example, any array contains void array,
! +   but they aren't overlapped with void one), and <acronym>GIN</acronym> can't
  +   suggest reasonable answer.
  +  </para>
  +
--- 256,271 ----
  +  <title>Limitations</title>
  +
  +  <para>
! +   <acronym>GIN</acronym> doesn't support full index scans due to their
! +   extreme inefficiency: because there are often many keys per value,
  +   each heap pointer will returned several times.
  +  </para>
  +
  +  <para>
! +   When extractQuery returns zero keys, <acronym>GIN</acronym> will
! +   emit an error: for different opclasses and strategies the semantic meaning of a void
! +   query may be different (for example, any array contains the void array,
! +   but no array overlaps the void array), and <acronym>GIN</acronym> can't
  +   suggest reasonable answer.
  +  </para>
  +
***************
*** 340,346 ****
  +     <see>index</see>
  +    </indexterm>
  +    GIN is a inverted index and it's usable for values which have more
! +    than one key, arrays for example. Like to GiST, GIN may support
  +    many different user-defined indexing strategies and the particular
  +    operators with which a GIN index can be used vary depending on the
  +    indexing strategy.
--- 340,346 ----
  +     <see>index</see>
  +    </indexterm>
  +    GIN is a inverted index and it's usable for values which have more
! +    than one key, arrays for example. Like GiST, GIN may support
  +    many different user-defined indexing strategies and the particular
  +    operators with which a GIN index can be used vary depending on the
  +    indexing strategy.
***************
*** 358,364 ****
  +
  +    (See <xref linkend="functions-array"> for the meaning of
  +    these operators.)
! +    Another GIN operator classes are available in the <literal>contrib</>
  +    tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
      </para>
     </sect1>
--- 358,364 ----
  +
  +    (See <xref linkend="functions-array"> for the meaning of
  +    these operators.)
! +    Other GIN operator classes are available in the <literal>contrib</>
  +    tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
      </para>
     </sect1>
***************
*** 381,389 ****
  +        <para>
  +         Short-term share/exclusive page-level locks are used for
  +         read/write access. Locks are released immediately after each
! +         index row is fetched or inserted. But note, that GIN index
! +         usually requires produce several inserts per one row, so,
! +         GIN makes more work per one value's insertion.
  +        </para>
  +       </listitem>
  +      </varlistentry>
--- 381,390 ----
  +        <para>
  +         Short-term share/exclusive page-level locks are used for
  +         read/write access. Locks are released immediately after each
! +         index row is fetched or inserted. But note that inserting a
! +         GIN-indexed value usually produces several index key insertions
! +         per row, so GIN may do substantial work for a single value's
! +         insertion.
  +        </para>
  +       </listitem>
  +      </varlistentry>
***************
*** 436,443 ****
       </table>

      <para>
! +    GIN indexes are similar to GiST in flexibility: it hasn't a fixed set
! +    of strategies. Instead, the <quote>consistency</> support routine
  +    interprets the strategy numbers accordingly with operator class
  +    definition. As an example, strategies of operator class over arrays
  +    is shown in <xref linkend="xindex-gin-array-strat-table">.
--- 437,444 ----
       </table>

      <para>
! +    GIN indexes are similar to GiST indexes in flexibility: they don't have a fixed
! +    set of strategies. Instead, the <quote>consistency</> support routine
  +    interprets the strategy numbers accordingly with operator class
  +    definition. As an example, strategies of operator class over arrays
  +    is shown in <xref linkend="xindex-gin-array-strat-table">.
