Re: GIN documentation - Mailing list pgsql-hackers
From | David Fuhry |
---|---|
Subject | Re: GIN documentation |
Date | |
Msg-id | 450CD4F4.6030406@cs.kent.edu |
In response to | GIN documentation (Teodor Sigaev <teodor@sigaev.ru>) |
List | pgsql-hackers |
Teodor,

Attached is a diff -c against your original gindocs patch. I did my
best not to change any of the semantics. My changes no doubt overlap &
conflict with those Jeff Davis sent you earlier, so consider both of our
diffs.

Thanks,

Dave Fuhry

Teodor Sigaev wrote:
> Patch adds GIN documentation and slightly improves GiST docs.
>
> Somebody of native English speakers, pls, check the text... Thank you.
>
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

*** gindocs.orig 2006-09-17 00:21:38.000000000 -0400
--- gindocs 2006-09-17 00:57:12.000000000 -0400
***************
*** 22,28 ****
  ! </indexterm>
  ! <listitem>
  ! <para>
! ! Soft upper limit of the size of the returned set by GIN index. For more
  ! information see <xref linkend="gin-tips">.
  ! </para>
  ! </listitem>
--- 22,28 ----
  ! </indexterm>
  ! <listitem>
  ! <para>
! ! Soft upper limit of the size of the set returned by the GIN index. For more
  ! information see <xref linkend="gin-tips">.
  ! </para>
  ! </listitem>
***************
*** 88,95 ****
  + <para>
  + <acronym>GIN</acronym> stands for Generalized Inverted Index. It is
  + an index structure storing a set of (key, posting list) pairs, where
! + 'posting list' is a set of rows in which the key occurs. The
! + row may contains a lot of keys.
  + </para>
  +
  + <para>
--- 88,95 ----
  + <para>
  + <acronym>GIN</acronym> stands for Generalized Inverted Index. It is
  + an index structure storing a set of (key, posting list) pairs, where
! + 'posting list' is a set of rows in which the key occurs. Each
! + row may contain many keys.
  + </para>
  +
  + <para>
***************
*** 178,184 ****
  + <listitem>
  + <para>
  + Returns an array of keys of the query to be executed. n contains
! + strategy number of operation (see <xref linkend="xindex-strategies">).
  + Depending on n, query may be different type.
  + </para>
  + </listitem>
--- 178,184 ----
  + <listitem>
  + <para>
  + Returns an array of keys of the query to be executed. n contains
! + the strategy number of the operation (see <xref linkend="xindex-strategies">).
  + Depending on n, query may be different type.
  + </para>
  + </listitem>
***************
*** 188,196 ****
  + <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  + <listitem>
  + <para>
! + Returns TRUE if indexed value satisfies query qualifier with strategy n
  + (or may satisfy in case of RECHECK mark in operator class).
! + Each element of the check array is TRUE if indexed value has a
  + corresponding key in the query: if (check[i] == TRUE ) the i-th key of
  + the query is present in the indexed value.
  + </para>
--- 188,196 ----
  + <term>bool consistent( bool check[], StrategyNumber n, Datum query)</term>
  + <listitem>
  + <para>
! + Returns TRUE if the indexed value satisfies the query qualifier with strategy n
  + (or may satisfy in case of RECHECK mark in operator class).
! + Each element of the check array is TRUE if the indexed value has a
  + corresponding key in the query: if (check[i] == TRUE ) the i-th key of
  + the query is present in the indexed value.
  + </para>
***************
*** 209,218 ****
  + <term>Create vs insert</term>
  + <listitem>
  + <para>
! + In most cases, insertion into <acronym>GIN</acronym> index is slow enough
! + due to a lot keys should be inserted per one value. So, for bulk upload
! + data in table it will be useful to drop index and create it
! + after finishing upload.
  + </para>
  + </listitem>
  + </varlistentry>
--- 209,218 ----
  + <term>Create vs insert</term>
  + <listitem>
  + <para>
! + In most cases, insertion into a <acronym>GIN</acronym> index is slow
! + due to the likelihood of many keys being inserted for each value. So, for bulk insertions into a
! + table it is advisable to to drop the GIN index and recreate it
! + after finishing bulk insertion.
  + </para>
  + </listitem>
  + </varlistentry>
***************
*** 221,227 ****
  + <term>gin_fuzzy_search_limit</term>
  + <listitem>
  + <para>
! + The primary goal of development <acronym>GIN</acronym> indices was
  + support for highly scalable, full-text search in
  + <productname>PostgreSQL</productname> and there are often situations when
  + a full-text search returns a very large set of results. Since reading
--- 221,227 ----
  + <term>gin_fuzzy_search_limit</term>
  + <listitem>
  + <para>
! + The primary goal of developing <acronym>GIN</acronym> indices was
  + support for highly scalable, full-text search in
  + <productname>PostgreSQL</productname> and there are often situations when
  + a full-text search returns a very large set of results. Since reading
***************
*** 232,238 ****
  + <para>
  + Such queries usually contain very frequent words, so the results are not
  + very helpful. To facilitate execution of such queries
! + <acronym>GIN</acronym> has a configurable soft upper limit of the size
  + of the returned set, determined by the
  + <varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
  + default (no limit).
--- 232,238 ----
  + <para>
  + Such queries usually contain very frequent words, so the results are not
  + very helpful. To facilitate execution of such queries
! + <acronym>GIN</acronym> has a configurable soft upper limit of the size
  + of the returned set, determined by the
  + <varname>gin_fuzzy_search_limit</varname> GUC variable. It is set to 0 by
  + default (no limit).
***************
*** 256,271 ****
  + <title>Limitations</title>
  +
  + <para>
! + <acronym>GIN</acronym> doesn't support full scan of index due to it's
! + extremely inefficiency: because of a lot of keys per value,
  + each heap pointer will returned several times.
  + </para>
  +
  + <para>
! + When extractQuery returns zero number of keys, <acronym>GIN</acronym> will
! + emit a error: for different opclass and strategy semantic meaning of void
! + query may be different (for example, any array contains void array,
! + but they aren't overlapped with void one), and <acronym>GIN</acronym> can't
  + suggest reasonable answer.
  + </para>
  +
--- 256,271 ----
  + <title>Limitations</title>
  +
  + <para>
! + <acronym>GIN</acronym> doesn't support full index scans due to their
! + extremely inefficiency: because there are often many keys per value,
  + each heap pointer will returned several times.
  + </para>
  +
  + <para>
! + When extractQuery returns zero keys, <acronym>GIN</acronym> will
! + emit a error: for different opclasses and strategies the semantic meaning of a void
! + query may be different (for example, any array contains the void array,
! + but they don't overlap the void array), and <acronym>GIN</acronym> can't
  + suggest reasonable answer.
  + </para>
  +
***************
*** 340,346 ****
  + <see>index</see>
  + </indexterm>
  + GIN is a inverted index and it's usable for values which have more
! + than one key, arrays for example. Like to GiST, GIN may support
  + many different user-defined indexing strategies and the particular
  + operators with which a GIN index can be used vary depending on the
  + indexing strategy.
--- 340,346 ----
  + <see>index</see>
  + </indexterm>
  + GIN is a inverted index and it's usable for values which have more
! + than one key, arrays for example. Like GiST, GIN may support
  + many different user-defined indexing strategies and the particular
  + operators with which a GIN index can be used vary depending on the
  + indexing strategy.
***************
*** 358,364 ****
  +
  + (See <xref linkend="functions-array"> for the meaning of
  + these operators.)
! + Another GIN operator classes are available in the <literal>contrib</>
  + tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
  </para>
  </sect1>
--- 358,364 ----
  +
  + (See <xref linkend="functions-array"> for the meaning of
  + these operators.)
! + Other GIN operator classes are available in the <literal>contrib</>
  + tsearch2 and intarray modules. For more information see <xref linkend="GIN">.
  </para>
  </sect1>
***************
*** 381,389 ****
  + <para>
  + Short-term share/exclusive page-level locks are used for
  + read/write access. Locks are released immediately after each
! + index row is fetched or inserted. But note, that GIN index
! + usually requires produce several inserts per one row, so,
! + GIN makes more work per one value's insertion.
  + </para>
  + </listitem>
  + </varlistentry>
--- 381,390 ----
  + <para>
  + Short-term share/exclusive page-level locks are used for
  + read/write access. Locks are released immediately after each
! + index row is fetched or inserted. But note that a GIN-indexed
! + value insertion usually produces several index key insertions
! + per row, so GIN may do substantial work for a single value's
! + insertion.
  + </para>
  + </listitem>
  + </varlistentry>
***************
*** 436,443 ****
  </table>
  <para>
! + GIN indexes are similar to GiST in flexibility: it hasn't a fixed set
! + of strategies. Instead, the <quote>consistency</> support routine
  + interprets the strategy numbers accordingly with operator class
  + definition. As an example, strategies of operator class over arrays
  + is shown in <xref linkend="xindex-gin-array-strat-table">.
--- 437,444 ----
  </table>
  <para>
! + GIN indexes are similar to GiST's in flexibility: they don't have a fixed
! + set of strategies. Instead, the <quote>consistency</> support routine
  + interprets the strategy numbers accordingly with operator class
  + definition. As an example, strategies of operator class over arrays
  + is shown in <xref linkend="xindex-gin-array-strat-table">.
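
For readers following the patched text, the (key, posting list) structure and the role of the check array can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: build_index, search, and consistent_contains_all are hypothetical names, and the real consistent() also receives a strategy number and the query datum, which this sketch leaves out.

```python
from collections import defaultdict

def build_index(rows):
    """Map each key to its posting list: the set of row ids containing that key."""
    index = defaultdict(set)
    for row_id, keys in rows.items():
        for key in keys:          # one row usually contributes many keys
            index[key].add(row_id)
    return index

def consistent_contains_all(check):
    """Toy 'consistent' for a contains-all strategy: every query key must be present."""
    return all(check)

def search(index, query_keys, consistent):
    """Return the row ids whose check array satisfies the given consistent function."""
    if not query_keys:
        # Mirrors the limitation in the docs: a query with zero keys is an error.
        raise ValueError("query yields no keys")
    # Candidate rows are those appearing in at least one posting list.
    candidates = set().union(*(index.get(k, set()) for k in query_keys))
    result = []
    for row_id in sorted(candidates):
        # check[i] is True if the i-th query key is present in this row.
        check = [row_id in index.get(k, set()) for k in query_keys]
        if consistent(check):
            result.append(row_id)
    return result

rows = {1: ["a", "b"], 2: ["b", "c"], 3: ["a", "b", "c"]}
index = build_index(rows)
```

Passing a different function as consistent (for example Python's built-in any for an overlap-style test) changes which rows match without touching the index, which is roughly the flexibility the strategy-number paragraph describes.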