Re: [PATCH] Opclass parameters - Mailing list pgsql-hackers

From Nikolay Shaplov
Subject Re: [PATCH] Opclass parameters
Date
Msg-id 15077864.qMPJiTemUy@x200m
Whole thread Raw
In response to [PATCH] Opclass parameters  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: Re: [PATCH] Opclass parameters
Re: [PATCH] Opclass parameters
Re: [PATCH] Opclass parameters
List pgsql-hackers
В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал:

> I would like to present patch set implementing opclass parameters.
>
> This feature was recently presented at pgconf.ru:
> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
>
> A analogous work was already done by Nikolay Shaplov two years ago:
> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
> But this patches are not based on it, although they are very similar.
Hi!

You know, I am still working on this issue.

When I started to work on it, I found out that option code is not flexible at
all, and you can' reuse it for options that are not relOptions.

I gave your patch just a short glance for now, but as far as I can you start
deviding options into global and local one. I am afraid it will create grater
mess than it is now.

What I am doing right now, I am creating a new reloption internal API, that
will allow to create any option in any place using the very same code.

I think it should be done first, and then use it for opclass parameters and
any kind of options you like.

The big patch is here https://commitfest.postgresql.org/14/992/ (I am afraid
it will not apply to current master as it is, It is quite old. But you can get
the idea)

The smaller parts of the patch that in current commitfest are

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

You can help reviewing them. Then the whole thing will go faster. The second
patch is quite trivial. The first will also need attention of someone who is
really good in understanding postgres internals.

-----------

Concerning the patch that you've provided. I've just have a short look. But I
already have some question.

1. I've seen you've added a new attribute into pg_index. Why??!!
As far as I can get, if have index built on several columns (A1, A2, A3) you
can set, own opclass for each column. And set individual options for each
opclass if we are speaking about options. So I would expect to have these
options not in pg_index, but in pg_attribute. And we already have one there:
attoptions.I just do not get how you have come to per-index options. May be I
should read code more attentively...


2. Your patch does not provide any example of your new tool usage. In my
prototype patch I've shown the implementation of opclass options for intarray.
May be you should do the same. (Use my example it will be more easy then do it
from scratch). It will give more understanding of how this improvement can be
used.

3.

--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -150,3 +150,8 @@ vacuum btree_tall_tbl;
 -- need to insert some rows to cause the fast root page to split.
 insert into btree_tall_tbl (id, t)
   select g, repeat('x', 100) from generate_series(1, 500) g;
+-- Test unsupported btree opclass parameters
+create index on btree_tall_tbl (id int4_ops(foo=1));
+ERROR:  access method "btree" does not support opclass options
+create index on btree_tall_tbl (id default(foo=1));
+ERROR:  access method "btree" does not support opclass options

Are you sure we really need these in postgres???

---------------------------------

So my proposal is the following:
let's commit

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

I will provide a final patch for option engine refactoring in commit fest, and
you will write you implementation of opclass options on top of it. We can to
it simultaneously. Or I will write opclass options myself... I do not care who
will do oplcass options as long it is done using refactored options engine. If
you do it, I can review it.

>
>
> Opclass parameters can give user ability to:
>   * Define the values of the constants that are hardcoded now in the
> opclasses depending on the indexed data.
>   * Specify what to index for non-atomic data types (arrays, json[b],
> tsvector). Partial index can only filter whole rows.
>   * Specify what indexing algorithm to use depending on the indexed data.
>
>
> Description of patches:
>
> 1. Infrastructure for opclass parameters.
>
> SQL grammar is changed only for CREATE INDEX statement: parenthesized
> parameters in reloptions format are added after column's opclass name.
> Default opclass can be specified with DEFAULT keyword:
>
> CREATE INDEX idx ON tab USING am (
>     {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
> );
>
> Example for contrib/intarray:
>
> CREATE INDEX ON arrays USING gist (
>    arr gist__intbig_ops (siglen = 32),
>    arr DEFAULT (numranges = 100)
> );
>
> \d arrays
>                  Table "public.arrays"
>   Column |   Type    | Collation | Nullable | Default
> --------+-----------+-----------+----------+---------
>   arr    | integer[] |           |          |
> Indexes:
>      "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr
> gist__int_ops (numranges='100'))
>
>
> I decided to store parameters in text[] column pg_index.indoptions near to
> existing columns like indkey, indcollation, indclass, indoption. I-th
> element of indoptions[] is a text array of parameters of i-th index column
> serialized into a string.  Each parameter is stored as 'name=value' text
> string like ordinal reloptions.  There is another way to store opclass
> parameters: store them in the existing column pg_attribute.attoptions (as
> it was done by Nikolay Shaplov) and there will be no need to serialize
> reloptions to a text array element.
>
> Example query showing how parameters are stored:
>
> SELECT ARRAY(
>      SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name
>      FROM unnest(indclass::int[]) opcid
>    ) indclass, indoptions
> FROM pg_index
> WHERE indoptions IS NOT NULL;
>
>               indclass             |             indoptions
> ----------------------------------+------------------------------------
>   {gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"}
>   {jsonb_path_ops}                 | {"{projection=$.tags[*].term}"}
> (2 rows)
>
>
> Each access method supporting opclass parameters specifies amopclassoptions
> routine for transformation of text[] parameters datum into a binary bytea
> structure which will be cached in RelationData and IndexOptInfo structures:
>
> typedef bytea *(*amopclassoptions_function) (
>     Relation index, AttrNumber colnum, Datum indoptions, bool validate
> );
>
>
> If access method wants simply to delegate parameters processing to one of
> column opclass's support functions, then it can use
> index_opclass_options_generic() subroutine in its amopclassoptions
> implementation:
>
> bytea *index_opclass_options_generic(
>     Relation relation, AttrNumber attnum, uint16 procnum,
>     Datum indoptions, bool validate
> );
>
> This support functions must have the following signature:
> internal (options internal, validate bool).
> Opclass parameters are passed as a text[] reloptions datum, returned pointer
> to a bytea structure with parsed parameter values.
>
> Opclass can use new functions parseLocalRelOptions(),
> parseAndFillLocalRelOptions() for reloptions parsing.  This functions differ
> from the standard parseRelOptions() in that a local array of reloptions
> descriptions is passed here, not a global relopt_kind.  But it seems that
> reloptions processing still needs deeper refactoring like the one already
> done by Nikolay Shaplov
> (https://www.postgresql.org/message-id/flat/2146419.veIEZdk4E4%40x200m#2146
> 419.veIEZdk4E4@x200m).
>
>
> 2. Opclass parameters support in GiST indices.
>
> Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC)
> support function with the following signature:
> internal (options internal, validate bool)
>
> Returned parsed bytea pointer with parameters will be passed to all support
> functions in the last argument.
>
> 3. Opclass parameters support in GIN indices.
>
> Everything is the same as for GiST, except for the optional support
> function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here.
>
> 4. Opclass parameters for GiST tsvector_ops
> 5. Opclass parameters for contrib/intarray
> 6. Opclass parameters for contrib/ltree
> 7. Opclass parameters for contrib/pg_trgm
> 8. Opclass parameters for contrib/hstore
>
> This 5 patches for GiST opclasses are very similar: added optional 'siglen'
> parameter for specifying signature length.  Default signature length is left
> equal to the hardcoded value that was here before. Also added 'numranges'
> parameter for gist__int_ops.
>
>
>
> We also have two more complex unfinished patches for GIN opclasses which
> should be posted in separate threads:
>
>   * tsvector_ops: added parameter 'weights' for specification of indexed
>     lexeme's weight groups.  This parameter can reduce index size and its
>     build/update time and can also eliminate recheck.  By default, all
> weights are indexed within the same group.
>
>   * jsonb_ops: added jsonpath parameter 'projection' for specification of
>     indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch).
>     Analogically to tsvector_ops, this parameter can reduce index size and
> its build/update time, but can not eliminate recheck.

--
Do code for fun.
Attachment

pgsql-hackers by date:

Previous
From: Ivan Kartyshov
Date:
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept
Next
From: Peter Eisentraut
Date:
Subject: Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAPsupport