Thread: [PATCH] Opclass parameters
Hi hackers. I would like to present patch set implementing opclass parameters. This feature was recently presented at pgconf.ru: http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf A analogous work was already done by Nikolay Shaplov two years ago: https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64 But this patches are not based on it, although they are very similar. Opclass parameters can give user ability to: * Define the values of the constants that are hardcoded now in the opclasses depending on the indexed data. * Specify what to index for non-atomic data types (arrays, json[b], tsvector). Partial index can only filter whole rows. * Specify what indexing algorithm to use depending on the indexed data. Description of patches: 1. Infrastructure for opclass parameters. SQL grammar is changed only for CREATE INDEX statement: parenthesized parameters in reloptions format are added after column's opclass name. Default opclass can be specified with DEFAULT keyword: CREATE INDEX idx ON tab USING am ( {expr {opclass | DEFAULT} ({name=value} [,...])} [,...] ); Example for contrib/intarray: CREATE INDEX ON arrays USING gist ( arr gist__intbig_ops (siglen = 32), arr DEFAULT (numranges = 100) ); \d arrays Table "public.arrays" Column | Type | Collation | Nullable | Default --------+-----------+-----------+----------+--------- arr | integer[] | | | Indexes: "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr gist__int_ops (numranges='100')) I decided to store parameters in text[] column pg_index.indoptions near to existing columns like indkey, indcollation, indclass, indoption. I-th element of indoptions[] is a text array of parameters of i-th index column serialized into a string. Each parameter is stored as 'name=value' text string like ordinal reloptions. There is another way to store opclass parameters: store them in the existing column pg_attribute.attoptions (as it was done by Nikolay Shaplov) and there will be no need to serialize reloptions to a text array element. Example query showing how parameters are stored: SELECT ARRAY( SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name FROM unnest(indclass::int[]) opcid ) indclass, indoptions FROM pg_index WHERE indoptions IS NOT NULL; indclass | indoptions ----------------------------------+------------------------------------ {gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"} {jsonb_path_ops} | {"{projection=$.tags[*].term}"} (2 rows) Each access method supporting opclass parameters specifies amopclassoptions routine for transformation of text[] parameters datum into a binary bytea structure which will be cached in RelationData and IndexOptInfo structures: typedef bytea *(*amopclassoptions_function) ( Relation index, AttrNumber colnum, Datum indoptions, bool validate ); If access method wants simply to delegate parameters processing to one of column opclass's support functions, then it can use index_opclass_options_generic() subroutine in its amopclassoptions implementation: bytea *index_opclass_options_generic( Relation relation, AttrNumber attnum, uint16 procnum, Datum indoptions, bool validate ); This support functions must have the following signature: internal (options internal, validate bool). Opclass parameters are passed as a text[] reloptions datum, returned pointer to a bytea structure with parsed parameter values. Opclass can use new functions parseLocalRelOptions(), parseAndFillLocalRelOptions() for reloptions parsing. This functions differ from the standard parseRelOptions() in that a local array of reloptions descriptions is passed here, not a global relopt_kind. But it seems that reloptions processing still needs deeper refactoring like the one already done by Nikolay Shaplov (https://www.postgresql.org/message-id/flat/2146419.veIEZdk4E4%40x200m#2146419.veIEZdk4E4@x200m). 2. Opclass parameters support in GiST indices. Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC) support function with the following signature: internal (options internal, validate bool) Returned parsed bytea pointer with parameters will be passed to all support functions in the last argument. 3. Opclass parameters support in GIN indices. Everything is the same as for GiST, except for the optional support function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here. 4. Opclass parameters for GiST tsvector_ops 5. Opclass parameters for contrib/intarray 6. Opclass parameters for contrib/ltree 7. Opclass parameters for contrib/pg_trgm 8. Opclass parameters for contrib/hstore This 5 patches for GiST opclasses are very similar: added optional 'siglen' parameter for specifying signature length. Default signature length is left equal to the hardcoded value that was here before. Also added 'numranges' parameter for gist__int_ops. We also have two more complex unfinished patches for GIN opclasses which should be posted in separate threads: * tsvector_ops: added parameter 'weights' for specification of indexed lexeme's weight groups. This parameter can reduce index size and its build/update time and can also eliminate recheck. By default, all weights are indexed within the same group. * jsonb_ops: added jsonpath parameter 'projection' for specification of indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch). Analogically to tsvector_ops, this parameter can reduce index size and its build/update time, but can not eliminate recheck. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0001-opclass-parameters-v01.patch
- 0002-opclass-parameters-GiST-v01.patch
- 0003-opclass-parameters-GIN-v01.patch
- 0004-opclass-parameters-GiST-tsvector_ops-v01.patch
- 0005-opclass-parameters-contrib_intarray-v01.patch
- 0006-opclass-parameters-contrib_ltree-v01.patch
- 0007-opclass-parameters-contrib_pg_trgm-v01.patch
- 0008-opclass-parameters-contrib_hstore-v01.patch
В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал: > I would like to present patch set implementing opclass parameters. > > This feature was recently presented at pgconf.ru: > http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf > > A analogous work was already done by Nikolay Shaplov two years ago: > https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64 > But this patches are not based on it, although they are very similar. Hi! You know, I am still working on this issue. When I started to work on it, I found out that option code is not flexible at all, and you can' reuse it for options that are not relOptions. I gave your patch just a short glance for now, but as far as I can you start deviding options into global and local one. I am afraid it will create grater mess than it is now. What I am doing right now, I am creating a new reloption internal API, that will allow to create any option in any place using the very same code. I think it should be done first, and then use it for opclass parameters and any kind of options you like. The big patch is here https://commitfest.postgresql.org/14/992/ (I am afraid it will not apply to current master as it is, It is quite old. But you can get the idea) The smaller parts of the patch that in current commitfest are https://commitfest.postgresql.org/17/1536/ https://commitfest.postgresql.org/17/1489/ You can help reviewing them. Then the whole thing will go faster. The second patch is quite trivial. The first will also need attention of someone who is really good in understanding postgres internals. ----------- Concerning the patch that you've provided. I've just have a short look. But I already have some question. 1. I've seen you've added a new attribute into pg_index. Why??!! As far as I can get, if have index built on several columns (A1, A2, A3) you can set, own opclass for each column. And set individual options for each opclass if we are speaking about options. So I would expect to have these options not in pg_index, but in pg_attribute. And we already have one there: attoptions.I just do not get how you have come to per-index options. May be I should read code more attentively... 2. Your patch does not provide any example of your new tool usage. In my prototype patch I've shown the implementation of opclass options for intarray. May be you should do the same. (Use my example it will be more easy then do it from scratch). It will give more understanding of how this improvement can be used. 3. --- a/src/test/regress/expected/btree_index.out +++ b/src/test/regress/expected/btree_index.out @@ -150,3 +150,8 @@ vacuum btree_tall_tbl; -- need to insert some rows to cause the fast root page to split. insert into btree_tall_tbl (id, t) select g, repeat('x', 100) from generate_series(1, 500) g; +-- Test unsupported btree opclass parameters +create index on btree_tall_tbl (id int4_ops(foo=1)); +ERROR: access method "btree" does not support opclass options +create index on btree_tall_tbl (id default(foo=1)); +ERROR: access method "btree" does not support opclass options Are you sure we really need these in postgres??? --------------------------------- So my proposal is the following: let's commit https://commitfest.postgresql.org/17/1536/ https://commitfest.postgresql.org/17/1489/ I will provide a final patch for option engine refactoring in commit fest, and you will write you implementation of opclass options on top of it. We can to it simultaneously. Or I will write opclass options myself... I do not care who will do oplcass options as long it is done using refactored options engine. If you do it, I can review it. > > > Opclass parameters can give user ability to: > * Define the values of the constants that are hardcoded now in the > opclasses depending on the indexed data. > * Specify what to index for non-atomic data types (arrays, json[b], > tsvector). Partial index can only filter whole rows. > * Specify what indexing algorithm to use depending on the indexed data. > > > Description of patches: > > 1. Infrastructure for opclass parameters. > > SQL grammar is changed only for CREATE INDEX statement: parenthesized > parameters in reloptions format are added after column's opclass name. > Default opclass can be specified with DEFAULT keyword: > > CREATE INDEX idx ON tab USING am ( > {expr {opclass | DEFAULT} ({name=value} [,...])} [,...] > ); > > Example for contrib/intarray: > > CREATE INDEX ON arrays USING gist ( > arr gist__intbig_ops (siglen = 32), > arr DEFAULT (numranges = 100) > ); > > \d arrays > Table "public.arrays" > Column | Type | Collation | Nullable | Default > --------+-----------+-----------+----------+--------- > arr | integer[] | | | > Indexes: > "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr > gist__int_ops (numranges='100')) > > > I decided to store parameters in text[] column pg_index.indoptions near to > existing columns like indkey, indcollation, indclass, indoption. I-th > element of indoptions[] is a text array of parameters of i-th index column > serialized into a string. Each parameter is stored as 'name=value' text > string like ordinal reloptions. There is another way to store opclass > parameters: store them in the existing column pg_attribute.attoptions (as > it was done by Nikolay Shaplov) and there will be no need to serialize > reloptions to a text array element. > > Example query showing how parameters are stored: > > SELECT ARRAY( > SELECT (pg_identify_object('pg_opclass'::regclass, opcid, 0)).name > FROM unnest(indclass::int[]) opcid > ) indclass, indoptions > FROM pg_index > WHERE indoptions IS NOT NULL; > > indclass | indoptions > ----------------------------------+------------------------------------ > {gist__intbig_ops,gist__int_ops} | {"{siglen=32}","{numranges=100}"} > {jsonb_path_ops} | {"{projection=$.tags[*].term}"} > (2 rows) > > > Each access method supporting opclass parameters specifies amopclassoptions > routine for transformation of text[] parameters datum into a binary bytea > structure which will be cached in RelationData and IndexOptInfo structures: > > typedef bytea *(*amopclassoptions_function) ( > Relation index, AttrNumber colnum, Datum indoptions, bool validate > ); > > > If access method wants simply to delegate parameters processing to one of > column opclass's support functions, then it can use > index_opclass_options_generic() subroutine in its amopclassoptions > implementation: > > bytea *index_opclass_options_generic( > Relation relation, AttrNumber attnum, uint16 procnum, > Datum indoptions, bool validate > ); > > This support functions must have the following signature: > internal (options internal, validate bool). > Opclass parameters are passed as a text[] reloptions datum, returned pointer > to a bytea structure with parsed parameter values. > > Opclass can use new functions parseLocalRelOptions(), > parseAndFillLocalRelOptions() for reloptions parsing. This functions differ > from the standard parseRelOptions() in that a local array of reloptions > descriptions is passed here, not a global relopt_kind. But it seems that > reloptions processing still needs deeper refactoring like the one already > done by Nikolay Shaplov > (https://www.postgresql.org/message-id/flat/2146419.veIEZdk4E4%40x200m#2146 > 419.veIEZdk4E4@x200m). > > > 2. Opclass parameters support in GiST indices. > > Parametrized GiST opclass specifies optional 10th (GIST_OPCLASSOPT_PROC) > support function with the following signature: > internal (options internal, validate bool) > > Returned parsed bytea pointer with parameters will be passed to all support > functions in the last argument. > > 3. Opclass parameters support in GIN indices. > > Everything is the same as for GiST, except for the optional support > function number which is 7 (GIN_OPCLASSOPTIONS_PROC) here. > > 4. Opclass parameters for GiST tsvector_ops > 5. Opclass parameters for contrib/intarray > 6. Opclass parameters for contrib/ltree > 7. Opclass parameters for contrib/pg_trgm > 8. Opclass parameters for contrib/hstore > > This 5 patches for GiST opclasses are very similar: added optional 'siglen' > parameter for specifying signature length. Default signature length is left > equal to the hardcoded value that was here before. Also added 'numranges' > parameter for gist__int_ops. > > > > We also have two more complex unfinished patches for GIN opclasses which > should be posted in separate threads: > > * tsvector_ops: added parameter 'weights' for specification of indexed > lexeme's weight groups. This parameter can reduce index size and its > build/update time and can also eliminate recheck. By default, all > weights are indexed within the same group. > > * jsonb_ops: added jsonpath parameter 'projection' for specification of > indexed paths in jsonb (this patch depends on SQL/JSON jsonpath patch). > Analogically to tsvector_ops, this parameter can reduce index size and > its build/update time, but can not eliminate recheck. -- Do code for fun.
Attachment
Hi Nikita, On 2/28/18 9:46 AM, Nikolay Shaplov wrote: > В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал: > >> I would like to present patch set implementing opclass parameters. >> >> This feature was recently presented at pgconf.ru: >> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf >> >> A analogous work was already done by Nikolay Shaplov two years ago: >> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64 >> But this patches are not based on it, although they are very similar. > > You know, I am still working on this issue. This patch was submitted to the 2018-03 CF at the last moment with no prior discussion or review as far as I can tell. It appears to be non-trivial and therefore not a good fit for the last CF for PG11. In addition, based on Nikolay's response, I think the patch should be marked Returned with Feedback until it is reconciled with the existing patches. Any objects to marking this Returned with Feedback? Or, I can move it to the next CF as is. Regards, -- -David david@pgmasters.net
On Wed, Feb 28, 2018 at 5:46 PM, Nikolay Shaplov <dhyan@nataraj.su> wrote: > Concerning the patch that you've provided. I've just have a short look. But I > already have some question. > > 1. I've seen you've added a new attribute into pg_index. Why??!! > As far as I can get, if have index built on several columns (A1, A2, A3) you > can set, own opclass for each column. And set individual options for each > opclass if we are speaking about options. So I would expect to have these > options not in pg_index, but in pg_attribute. And we already have one there: > attoptions.I just do not get how you have come to per-index options. May be I > should read code more attentively... this is what we want to discuss. > > > 2. Your patch does not provide any example of your new tool usage. In my > prototype patch I've shown the implementation of opclass options for intarray. > May be you should do the same. (Use my example it will be more easy then do it > from scratch). It will give more understanding of how this improvement can be > used. Hey, look on patches, there are many examples ! > > 3. > > --- a/src/test/regress/expected/btree_index.out > +++ b/src/test/regress/expected/btree_index.out > @@ -150,3 +150,8 @@ vacuum btree_tall_tbl; > -- need to insert some rows to cause the fast root page to split. > insert into btree_tall_tbl (id, t) > select g, repeat('x', 100) from generate_series(1, 500) g; > +-- Test unsupported btree opclass parameters > +create index on btree_tall_tbl (id int4_ops(foo=1)); > +ERROR: access method "btree" does not support opclass options > +create index on btree_tall_tbl (id default(foo=1)); > +ERROR: access method "btree" does not support opclass options > > Are you sure we really need these in postgres??? Hey, this is a just a test to check unsupported opclass options !
On Thu, Mar 1, 2018 at 7:02 PM, David Steele <david@pgmasters.net> wrote: > Hi Nikita, > > On 2/28/18 9:46 AM, Nikolay Shaplov wrote: >> В письме от 28 февраля 2018 00:46:36 пользователь Nikita Glukhov написал: >> >>> I would like to present patch set implementing opclass parameters. >>> >>> This feature was recently presented at pgconf.ru: >>> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf >>> >>> A analogous work was already done by Nikolay Shaplov two years ago: >>> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64 >>> But this patches are not based on it, although they are very similar. >> >> You know, I am still working on this issue. > > This patch was submitted to the 2018-03 CF at the last moment with no > prior discussion or review as far as I can tell. It appears to be > non-trivial and therefore not a good fit for the last CF for PG11. the idea is simple, the main problem is where to store parameters. We hoped that we get a bright idea from developers. > > In addition, based on Nikolay's response, I think the patch should be > marked Returned with Feedback until it is reconciled with the existing > patches. We proposed something that works and could be useful for people, especially for people, who use complex json documents. It would require minimal changes if Nikolay's patch, which is quite invasive, will be committed in future. What we need to discuss is the user-visible features - the syntax changes. > > Any objects to marking this Returned with Feedback? Or, I can move it > to the next CF as is. I think that Returned with Feedback would be good. We will continue discussion in -hackers. > > Regards, > -- > -David > david@pgmasters.net
On 3/1/18 3:50 PM, Oleg Bartunov wrote: > On Thu, Mar 1, 2018 at 7:02 PM, David Steele <david@pgmasters.net> wrote: >> >> Any objections to marking this Returned with Feedback? Or, I can move it >> to the next CF as is. > > I think that Returned with Feedback would be good. We will continue > discussion in -hackers. Marked as Returned with Feedback. Hopefully we'll see this patch in the 2018-09 CF. -- -David david@pgmasters.net
В письме от 1 марта 2018 23:02:20 пользователь Oleg Bartunov написал: > > 2. Your patch does not provide any example of your new tool usage. In my > > prototype patch I've shown the implementation of opclass options for > > intarray. May be you should do the same. (Use my example it will be more > > easy then do it from scratch). It will give more understanding of how > > this improvement can be used. > > Hey, look on patches, there are many examples ! Oups...Sorry. I've looked at the patch from commitfest https://commitfest.postgresql.org/17/1559/ and it have shown only the first file. And When I read the letter I did not pay attention to attachments at all. So I was sure there is only one there. Yes. Now I see seven examples. But I think seven it is too many. For each case a reviewer should make consideration if this parameter worth moving to opclass options, or fixed definition in the C code is quite ok. Doing it for whole bunch, may make it messy. I think, it would be good to commit an implementation of opclass options, with a good example of usage. And then commit patches for all cases where these options can be used. But since it is now "Rejected with feedback", let's wait till autumn. -- Do code for fun.
Attachment
On 02.03.2018 19:12, Nikolay Shaplov wrote: > В письме от 1 марта 2018 23:02:20 пользователь Oleg Bartunov написал: >>> 2. Your patch does not provide any example of your new tool usage. In my >>> prototype patch I've shown the implementation of opclass options for >>> intarray. May be you should do the same. (Use my example it will be more >>> easy then do it from scratch). It will give more understanding of how >>> this improvement can be used. >> Hey, look on patches, there are many examples ! > Oups...Sorry. I've looked at the patch from commitfest > https://commitfest.postgresql.org/17/1559/ and it have shown only the first > file. And When I read the letter I did not pay attention to attachments at > all. So I was sure there is only one there. > > Yes. Now I see seven examples. But I think seven it is too many. > For each case a reviewer should make consideration if this parameter worth > moving to opclass options, or fixed definition in the C code is quite ok. > Doing it for whole bunch, may make it messy. I think, it would be good to > commit an implementation of opclass options, with a good example of usage. And > then commit patches for all cases where these options can be used. There are 5 examples for GiST opclasses, not 7, and they are almost identical -- in all of them added 'siglen' parameter for signature length specification. > But since it is now "Rejected with feedback", let's wait till autumn. We don't want to wait that long. But now we only need to сome to an agreement about CREATE INDEX syntax and where to store the opclass parameters. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 03.03.2018 1:32, Nikita Glukhov wrote: > On 02.03.2018 19:12, Nikolay Shaplov wrote: >> But since it is now "Rejected with feedback", let's wait till autumn. > > We don't want to wait that long. But now we only need to сome to an > agreement > about CREATE INDEX syntax and where to store the opclass parameters. Attached 2nd version of the patches. Nothing has changed since March, this is just a rebased version. CREATE INDEX syntax and parameters storage method still need discussion. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0001-Add-opclass-parameters-v02.patch
- 0002-Add-opclass-parameters-to-GiST-v02.patch
- 0003-Add-opclass-parameters-to-GIN-v02.patch
- 0004-Add-opclass-parameters-to-GiST-tsvector_ops-v02.patch
- 0005-Add-opclass-parameters-to-contrib-intarray-v02.patch
- 0006-Add-opclass-parameters-to-contrib-ltree-v02.patch
- 0007-Add-opclass-parameters-to-contrib-pg_trgm-v02.patch
- 0008-Add-opclass-parameters-to-contrib-hstore-v02.patch
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал: > >> But since it is now "Rejected with feedback", let's wait till autumn. > > > > We don't want to wait that long. But now we only need to сome to an > > agreement > > about CREATE INDEX syntax and where to store the opclass parameters. > > Attached 2nd version of the patches. Nothing has changed since March, > this is just a rebased version. > > CREATE INDEX syntax and parameters storage method still need discussion. Hi! I'd like to review your patch if you do not mind. Since I am familiar with the subject. I'll add it to the TOP 3 of my IT TODO list :-) But even without carefully reading the code I have several important questions: 1. I am sure that this patch should contain code for opclass praram generic implementation, and _one_ example of how it is used for certain opclass. From my point of view, we should not add all hardcoded constant as opclass params just because we can. It should be done only when it is really needed. And each case needs special consideration. This is not only about code complexity, this worries me less, but also about documentation complexity. Once option is added, it should be documented. This will make documentation more hard to understand (and completely unexperienced users reads documentation too). If this option is needed, this is price we all pay. But if nobody really use it, I see no reason to make things more complex for everyone. 2. The second question, is why you create new column in one of the table of pg_catalog, when we have attopitions in attribute description. And each indexed column in index is technically a relation attribute. Options of index column should be stored there, from my point of view (yes I know it is a hassle to pass it there, but I did in in my preview, it can be done) 2.1. I did not read the code, I'll do it some time late, but the main question I have is, how you will manage case, when you set different values for same options of different columns. like: CREATE INDEX ON arrays USING gist ( arr1 gist__intbig_ops (siglen = 32), arr2 gist__intbig_ops (siglen = 64) ); It is easitly solved when you store them in attoptions. But my question how do you manage it... (I'll get the answer from the code soon, I hope) -- Do code for fun.
Attachment
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал: > Attached 2nd version of the patches. Nothing has changed since March, > this is just a rebased version. > > CREATE INDEX syntax and parameters storage method still need discussion. I've played around a bit with you patch and come to some conclusions, I'd like to share. They are almost same as those before, but now there are more details. Again some issues about storing opclass options in pg_inedx: 1. Having both indoption and indoptions column in pg_index will make someone's brain explode for sure. If not, it will bring troubles when people start confusing them. 2. Now I found out how do you store option values for each opclass: text[] of indoptions in pg_index is not the same as text[] in reloptions in pg_catalog (and it brings more confusion). In reloption each member of the array is a single option. reloptions | {fillfactor=90,autovacuum_enabled=false} In indoptions, is a whole string of options for one of the indexed attributes, each array item has all options for one indexed attribute. And this string needs furthermore parsing, that differs from reloption parsing. indoptions | {"{numranges=150}","{numranges=160}"} This brings us to the following issues: 2a. pg_index stores properties of index in general. Properties of each indexed attributes is stored in pg_attribute table. If we follow this philosophy it is wrong to add any kind of per-attribute array values into pg_index. These values should be added to pg_attribute one per each pg_attribute entry. 2b. Since you've chosen method of storage that differs from one that is used in reloptions, that will lead to two verstions of code that processes the attributes. And from now on, if we accept this, we should support both of them and keep them in sync. (I see that you tried to reuse as much code as possible, but still you added some more that duplicate current reloptions functionality.) I know that relotions code is not really suitable for reusing. This was the reason why I started solving oplcass option task with rewriting reloptions code,to make it 100% reusable for any kind of options. So I would offer you again to join me as a reviewer of that code. This will make opclass code more simple and more sensible, if my option code is used... 3. Speaking of sensible code Datum g_int_options(PG_FUNCTION_ARGS) { Datum raw_options = PG_GETARG_DATUM(0); bool validate = PG_GETARG_BOOL(1); relopt_int siglen = { {"numranges", "number of ranges for compression", 0, 0, 9, RELOPT_TYPE_INT }, G_INT_NUMRANGES_DEFAULT, 1, G_INT_NUMRANGES_MAX }; relopt_gen *optgen[] = { &siglen.gen }; int offsets[] = { offsetof(IntArrayOptions, num_ranges) }; IntArrayOptions *options = parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1, sizeof(IntArrayOptions), validate); PG_RETURN_POINTER(options); } It seems to be not a very nice hack. What would you do if you would like to have both int, real and boolean options for one opclass? I do not know how to implement it using this code. We have only int opclass options for now, but more will come and we should be ready for it. 4. Now getting back to not adding opclass options wherever we can, just because we can: 4a. For inrarray there were no opclass options tests added. I am sure there should be one, at least just to make sure it still does not segfault when you try to set one. And in some cases more tests can be needed. To add and review them one should be familiar with this opclass internals. So it is good when different people do it for different opclasses 4b. When you add opclass options instead of hardcoded values, it comes to setting minimum and maximum value. Why do you choose 1000 as maximum for num_ranges in gist_int_ops in intarray? Why 1 as minimum? All these decisions needs careful considerations and can't be done for bunch of opclasses just in one review. 4c. Patch usually take a long path from prototype to final commit. Do you really want to update all these opclasses code each time when some changes in the main opclass option code is made? ;-) So I would suggest to work only with intarray and add other opclasses later. 5. You've been asking about SQL grammar > CREATE INDEX idx ON tab USING am ( > {expr {opclass | DEFAULT} ({name=value} [,...])} [,...] > ); As for me I do not really care about it. For me all the solutions is acceptable. But looking at is i came to one notion: I've never seen before DEFAULT keyword to be used in this way. There is logic in such usage, but I did not remember any practical usage case. If there are such usages (I can easily missed it) or if it is somehow recommended in SQL standard -- let it be. But if none above, I would suggest to use WITH keyword instead. As it is already used for reloptions. As far as I remember in my prototype I used "WITH OPTIONS" but did if just because did not find my way through yac with single "WITH". So ideal way for me would be create index ON test USING GIST (i WITH (sig_len_int=22)); But as I said it is not thing of importance for me. Just an observation. -- Do code for fun.
Attachment
On Wed, Feb 28, 2018 at 9:46 AM Nikolay Shaplov <dhyan@nataraj.su> wrote: > 1. I've seen you've added a new attribute into pg_index. Why??!! > As far as I can get, if have index built on several columns (A1, A2, A3) you > can set, own opclass for each column. And set individual options for each > opclass if we are speaking about options. So I would expect to have these > options not in pg_index, but in pg_attribute. And we already have one there: > attoptions.I just do not get how you have come to per-index options. May be I > should read code more attentively... It seems sensible to have both per-column options and per-index options. For example, we've got the fastupdate option for GIN, which is a property of the index as a whole, not any individual column. But you could also want to specify some column-specific options, which seems to be what this patch is about, since an opclass is associated with an individual column. And since an index can have more than one column, I agree that it seems more appropriate to store this information in pg_attribute than pg_index. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attached 3rd version of the patches.
On 20.11.2018 14:15, Nikolay Shaplov wrote:
On 20.11.2018 14:15, Nikolay Shaplov wrote:
В письме от 15 ноября 2018 18:26:43 пользователь Nikita Glukhov написал:Attached 2nd version of the patches. Nothing has changed since March, this is just a rebased version. CREATE INDEX syntax and parameters storage method still need discussion.I've played around a bit with you patch and come to some conclusions, I'd like to share. They are almost same as those before, but now there are more details.
Again some issues about storing opclass options in pg_inedx: 1. Having both indoption and indoptions column in pg_index will make someone's brain explode for sure. If not, it will bring troubles when people start confusing them. 2. Now I found out how do you store option values for each opclass: text[] of indoptions in pg_index is not the same as text[] in reloptions in pg_catalog (and it brings more confusion). In reloption each member of the array is a single option. reloptions | {fillfactor=90,autovacuum_enabled=false} In indoptions, is a whole string of options for one of the indexed attributes, each array item has all options for one indexed attribute. And this string needs furthermore parsing, that differs from reloption parsing. indoptions | {"{numranges=150}","{numranges=160}"} This brings us to the following issues: 2a. pg_index stores properties of index in general. Properties of each indexed attributes is stored in pg_attribute table. If we follow this philosophy it is wrong to add any kind of per-attribute array values into pg_index. These values should be added to pg_attribute one per each pg_attribute entry. 2b. Since you've chosen method of storage that differs from one that is used in reloptions, that will lead to two verstions of code that processes the attributes. And from now on, if we accept this, we should support both of them and keep them in sync. (I see that you tried to reuse as much code as possible, but still you added some more that duplicate current reloptions functionality.)
On 21.11.2018 18:04, Robert Haas wrote:
It seems sensible to have both per-column options and per-index options. For example, we've got the fastupdate option for GIN, which is a property of the index as a whole, not any individual column. But you could also want to specify some column-specific options, which seems to be what this patch is about, since an opclass is associated with an individual column. And since an index can have more than one column, I agree that it seems more appropriate to store this information in pg_attribute than pg_index.
Ok, I have switched from pg_index.indoptions to pg_attribute.attoptions. I agree that we should distinguish per-index and per-column options, but they can also be AM-specific and opclass-specific. 'fastupdate' option for GIN is an example of AM-specific per-index option. ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column options having special SQL syntax. "AM-class-specific" here means "specific only for the class of AMs that support ordering". Now they are stored as flags in pg_index.indoption[] but later can be moved to pg_attribute.attoptions. Don't know should per-column AM-specific and opclass-specific options be stored in the single column attoptions or have separate columns (like attamoptions and attopclassoptions). If we will store them together, we can run into name collisions, but this collisions can be easily resolved using autogenerated prefixes in option names ("am.foo=bar", "opclass.foo=baz"). And another problem is the options with default values. They may be not explicitly specified by the user, and in this case in current implementation nothing will be stored in the catalog because default values can be obtained from the code. But this will not allow changing of default values without compatibility issues. So I think it would be better to store both default and explicitly specified values of all opclass options, but this will require a major refactoring of current API. Also I have idea to define list of opclass parameters declaratively when opclass is created using syntax like the following: CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ] FOR TYPE ... AS ({ OPTIONS function_name ( arg_type [,...] ) /* reloptions => binary */ | OPERATOR ...} [,...] ) But if we remember about the min/max values etc. the syntax will definitely become more complicated, and it will require much more changes in the catalog (up to creation of new table pg_opclassparams). In any case, I think it would be nice to give somehow the user the ability to obtain a list of opclass parameters not only from the documentation. On 20.11.2018 14:15, Nikolay Shaplov wrote:
I know that relotions code is not really suitable for reusing. This was the reason why I started solving oplcass option task with rewriting reloptions code,to make it 100% reusable for any kind of options. So I would offer you again to join me as a reviewer of that code. This will make opclass code more simple and more sensible, if my option code is used...
I absolutely agree that reloptions code needs such refactoring, and I am planning to continue reviewing your patches.
3. Speaking of sensible code Datum g_int_options(PG_FUNCTION_ARGS) { Datum raw_options = PG_GETARG_DATUM(0); bool validate = PG_GETARG_BOOL(1); relopt_int siglen = { {"numranges", "number of ranges for compression", 0, 0, 9, RELOPT_TYPE_INT }, G_INT_NUMRANGES_DEFAULT, 1, G_INT_NUMRANGES_MAX }; relopt_gen *optgen[] = { &siglen.gen }; int offsets[] = { offsetof(IntArrayOptions, num_ranges) }; IntArrayOptions *options = parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1, sizeof(IntArrayOptions), validate); PG_RETURN_POINTER(options); } It seems to be not a very nice hack. What would you do if you would like to have both int, real and boolean options for one opclass? I do not know how to implement it using this code. We have only int opclass options for now, but more will come and we should be ready for it.
Don't understand where is hack. It's possible to parse options of different types, see the following example: typedef struct FooOptions { int int_opt; int str_opt_offset; } FooOptions; relopt_int int_option = { {"int_opt", "desc", 0, 0, 7, RELOPT_TYPE_INT }, INT_OPT_DEFAULT, INT_OPT_MIN, INT_OPT_MAX }; relopt_string str_option = { {"str_opt", "desc", 0, 0, 7, RELOPT_TYPE_STRING }, 0, true, validate_str_option, NULL }; relopt_gen *gen_options[] = { &int_option.gen, &str_option.gen }; int offsets[] = { offsetof(FooOptions, int_opt), offsetof(FooOptions, str_opt_offset) }; FooOptions *opts = parseAndFillLocalRelOptions(raw_options, gen_options, offsets, 2, sizeof(FooOptions), validate); int int_option_value = opts->int_opt; char *str_option_value = (char *) opts + opts->str_opt_offset; Also I can propose to combine this two arrays into the one: relopt_gen_offset options[] = { &int_option.gen, offsetof(FooOptions, int_opt), &str_option.gen, offsetof(FooOptions, str_opt_offset) };
4. Now getting back to not adding opclass options wherever we can, just because we can: 4a. For inrarray there were no opclass options tests added. I am sure there should be one, at least just to make sure it still does not segfault when you try to set one. And in some cases more tests can be needed. To add and review them one should be familiar with this opclass internals. So it is good when different people do it for different opclasses 4b. When you add opclass options instead of hardcoded values, it comes to setting minimum and maximum value. Why do you choose 1000 as maximum for num_ranges in gist_int_ops in intarray? Why 1 as minimum? All these decisions needs careful considerations and can't be done for bunch of opclasses just in one review. 4c. Patch usually take a long path from prototype to final commit. Do you really want to update all these opclasses code each time when some changes in the main opclass option code is made? ;-) So I would suggest to work only with intarray and add other opclasses later.
I decided to leave only GiST tsvector_ops as an example, because it is an in-core opclass, not an extension like intarray.
5. You've been asking about SQL grammarCREATE INDEX idx ON tab USING am ( {expr {opclass | DEFAULT} ({name=value} [,...])} [,...] );As for me I do not really care about it. For me all the solutions is acceptable. But looking at is i came to one notion: I've never seen before DEFAULT keyword to be used in this way. There is logic in such usage, but I did not remember any practical usage case. If there are such usages (I can easily missed it) or if it is somehow recommended in SQL standard -- let it be. But if none above, I would suggest to use WITH keyword instead. As it is already used for reloptions. As far as I remember in my prototype I used "WITH OPTIONS" but did if just because did not find my way through yac with single "WITH". So ideal way for me would be create index ON test USING GIST (i WITH (sig_len_int=22)); But as I said it is not thing of importance for me. Just an observation.
"[opclass] WITH OPTIONS (options)" looks too verbose, of course. "[opclass] WITH (options)" looks acceptable, but it seems to conflict with exclusion constraints syntax ("index_elem WITH operator"). "opclass (options)" looks the most natural to me. But we still need some keyword before the parentheses when the opclass is not specified since we can't distinguish "func_name (func_params)" and "col_name (opclass_options)" in grammar. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > I agree that we should distinguish per-index and per-column options, but they > can also be AM-specific and opclass-specific. True, but an index is bound to a single AM, and a column is bound to a single opclass which is bound to a single AM. So I'm not very worried about name collisions. Can't we just tell opclass authors to pick names that are unlikely to collide with names chose by the AM or names that are globally reserved? It's hard to imagine that we're ever going to have more than a dozen or so options that could possibly apply to a column, so splitting things up into different namespaces seems like an unnecessary burden on the user. > 'fastupdate' option for GIN is an example of AM-specific per-index option. > > ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column > options having special SQL syntax. "AM-class-specific" here means "specific > only for the class of AMs that support ordering". Now they are stored as flags > in pg_index.indoption[] but later can be moved to pg_attribute.attoptions. Or left where they are. > And another problem is the options with default values. They may be not > explicitly specified by the user, and in this case in current implementation > nothing will be stored in the catalog because default values can be obtained > from the code. But this will not allow changing of default values without > compatibility issues. So I think it would be better to store both default and > explicitly specified values of all opclass options, but this will require a > major refactoring of current API. Hmm. I think if the default ever changes, it will require a new major release, and we can compensate in pg_dump. That means that a dump taken with an old version will not preserve the options. However, using the pg_dump from the newest release is the recommended procedure, and if you don't do that, you might get outright failures. Failing to preserve an option value in the rare case that a default was changed seems less bad than that. > Also I have idea to define list of opclass parameters declaratively when opclass > is created using syntax like the following: > > CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ] > FOR TYPE ... AS ( > { OPTIONS function_name ( arg_type [,...] ) /* reloptions => binary */ > | OPERATOR ... > } [,...] > ) I'm not sure exposing SQL syntax for this is a very good idea. > "[opclass] WITH OPTIONS (options)" looks too verbose, of course. It's not that bad. > "[opclass] WITH (options)" looks acceptable, but it seems to conflict with > exclusion constraints syntax ("index_elem WITH operator"). Are you sure? The operator can't be ( But it might be confusing anyhow. > "opclass (options)" looks the most natural to me. But we still need some > keyword before the parentheses when the opclass is not specified since we > can't distinguish "func_name (func_params)" and "col_name (opclass_options)" > in grammar. Are you sure? What's the SQL syntax where there is actually a problem here? CREATE INDEX requires parentheses around a non-trivial expression. How about just OPTIONS (options) ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Dec 5, 2018 at 6:58 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >> "opclass (options)" looks the most natural to me. But we still need some >> keyword before the parentheses when the opclass is not specified since we >> can't distinguish "func_name (func_params)" and "col_name (opclass_options)" >> in grammar. > Are you sure? What's the SQL syntax where there is actually a problem > here? CREATE INDEX requires parentheses around a non-trivial > expression. Well, the reason we have to require parens around nontrivial expressions is mostly lack of forethought about making the syntax non-ambiguous :-( > How about just OPTIONS (options) ? That would require making OPTIONS a fully reserved word, I think, else it's ambiguous with an opclass name. How about saying that you must give an opclass name if you want to specify options, ie the syntax is [ opclass_name [ ( options... ) ] ] I'm not necessarily wedded to that, but it seems worth throwing out the idea. regards, tom lane
On Thu, Dec 6, 2018 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > How about saying that you must give an opclass name if you want to > specify options, ie the syntax is > > [ opclass_name [ ( options... ) ] ] > > I'm not necessarily wedded to that, but it seems worth throwing > out the idea. Agreed, that's not bad, certainly better than making OPTIONS more reserved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, while rebasing the patch series [1] adding bloom/multi-minmax BRIN opclasses, I've decided to also rebase it on top of this patch, because it needs the opclass parameters. So I had to rebase this too - it went mostly fine, with reasonably limited bitrot. The rebased patch series is attached. Using this patch series in [1] was mostly smooth, I only have two minor comments at this point: 1) We need a better infrastructure to parse opclass parameters. For example the gtsvector_options does this: Datum gtsvector_options(PG_FUNCTION_ARGS) { Datum raw_options = PG_GETARG_DATUM(0); bool validate = PG_GETARG_BOOL(1); relopt_int siglen = { {"siglen", "signature length", 0, 0, 6, RELOPT_TYPE_INT }, SIGLEN_DEFAULT, 1, SIGLEN_MAX }; relopt_gen *optgen[] = { &siglen.gen }; int offsets[] = { offsetof(GistTsVectorOptions, siglen) }; GistTsVectorOptions *options = parseAndFillLocalRelOptions(raw_options, optgen, offsets, 1, sizeof(GistTsVectorOptions), validate); PG_RETURN_POINTER(options); } So in other words, it builds all the various pieces (relopts, optgen, offsets, lengths etc.) manually, which is really error-prone and difficult to maintain. We need to make it simpler - ideally as simple as defining a custom GUC, or just an array of relopt_* structs. 2) The 0001 part does this in index_opclass_options_generic: get_opclass_name(opclass, InvalidOid, &str); ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("operator class \"%s\" has no options", opclassname.data))); But that's a bit broken, because get_opclass_name() appends the opclass name to 'str', but with a space at the beginning. So this produces messages like ERROR: operator class " int4_bloom_ops" has no options which is not right. I haven't checked if a better function already exists, or whether we need to implement it. regards https://www.postgresql.org/message-id/flat/c1138ead-7668-f0e1-0638-c3be3237e812%402ndquadrant.com -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-Jun-11, Tomas Vondra wrote: > 1) We need a better infrastructure to parse opclass parameters. For > example the gtsvector_options does this: I think this is part of what Nikolay's patch series was supposed to address. But that one has been going way too slow. I agree we need something better. > 2) The 0001 part does this in index_opclass_options_generic: > > get_opclass_name(opclass, InvalidOid, &str); > > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("operator class \"%s\" has no options", > opclassname.data))); > > But that's a bit broken, because get_opclass_name() appends the opclass > name to 'str', but with a space at the beginning. Yeah, I think just exporting get_opclass_name from ruleutils.c is a bad idea. Sounds like we need a (very small) new function in lsyscache.c that does the job of extracting the opclass name, and then the ruleutils function can call that one to avoid duplicated code. Anyway, this patchset doesn't apply anymore. Somebody (maybe its author this time?) please rebase. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04.09.2019 1:02, Alvaro Herrera wrote:
On 2019-Jun-11, Tomas Vondra wrote:1) We need a better infrastructure to parse opclass parameters. For example the gtsvector_options does this:I think this is part of what Nikolay's patch series was supposed to address. But that one has been going way too slow. I agree we need something better.
API was simplified in the new version of the patches (see below).
2) The 0001 part does this in index_opclass_options_generic: get_opclass_name(opclass, InvalidOid, &str); ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("operator class \"%s\" has no options", opclassname.data))); But that's a bit broken, because get_opclass_name() appends the opclass name to 'str', but with a space at the beginning.Yeah, I think just exporting get_opclass_name from ruleutils.c is a bad idea. Sounds like we need a (very small) new function in lsyscache.c that does the job of extracting the opclass name, and then the ruleutils function can call that one to avoid duplicated code.
I decided to add new function generate_opclass_name() like existing generate_collation_name(), and to reuse static get_opclass_name().
Anyway, this patchset doesn't apply anymore. Somebody (maybe its author this time?) please rebase.
New version of rebased patches is attached:
1. Opclass parameters infrastructure. API was completely refactored since the previous version: - API was generalized for all AMs. Previously, each AM should implement opclass options parsing/passing in its own way using its own support function numbers. Now each AMs uses 0th support function (discussable). Binary bytea values of parsed options are passed to support functions using special expression node initialized in FmgrInfo.fn_expr (see macro PG_GET_OPCLASS_OPTIONS(), get_fn_opclass_options(), set_fn_opclass_options). - Introduced new structure local_relopts (needs a better name, of course) with a set of functions for opclass/AM options definition. The parsing was moved into index_opclass_option(). That was necessary for mixing of AM- and opclass-specific options. Opclasses now extend the structure with AM's options adding their own options. See patch #4 for an example. 2. New AM method amattoptions(). amattoptions() is used to specify per-column AM-specific options. The example is signature length for bloom indexes (patch #3).
3. Use amattoptions() in contrib/bloom. This is an attempt to replace "col1"-"col32" AM reloptions with per-column option "bits" using new amattoptions API: -CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3); +CREATE INDEX bloomidx ON tst USING bloom (i DEFAULT (bits = 3), t); 4. An example of using new API in GiST tsvector_ops. Opclass options definition looks much simpler now: typedef struct GistTsVectorOptions { /* GistAttOptions am_att_options; (future) */ int siglen; } GistTsVectorOptions; Datum gtsvector_options(PG_FUNCTION_ARGS) { local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0); GistTsVectorOptions *options = NULL; extend_local_reloptions(relopts, options, sizeof(*options)); add_local_int_reloption(relopts, "siglen", "signature length", SIGLEN_DEFAULT, 1, SIGLEN_MAX, &options->siglen); PG_RETURN_VOID(); } 5. Remove pg_index.indoption column (experimental). pg_index.indoption (array of per-column ASC/DESC, NULLS FIRST/LAST flags) can be removed now, and these flags can be stored as special per-column AM-specific options "desc" and "nulls_first". -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote: >On 04.09.2019 1:02, Alvaro Herrera wrote: > >>On 2019-Jun-11, Tomas Vondra wrote: >> >>>1) We need a better infrastructure to parse opclass parameters. For >>>example the gtsvector_options does this: >>I think this is part of what Nikolay's patch series was supposed to >>address. But that one has been going way too slow. I agree we need >>something better. > >API was simplified in the new version of the patches (see below). > >>>2) The 0001 part does this in index_opclass_options_generic: >>> >>> get_opclass_name(opclass, InvalidOid, &str); >>> >>> ereport(ERROR, >>> (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >>> errmsg("operator class \"%s\" has no options", >>> opclassname.data))); >>> >>>But that's a bit broken, because get_opclass_name() appends the opclass >>>name to 'str', but with a space at the beginning. >>Yeah, I think just exporting get_opclass_name from ruleutils.c is a bad >>idea. Sounds like we need a (very small) new function in lsyscache.c >>that does the job of extracting the opclass name, and then the ruleutils >>function can call that one to avoid duplicated code. > >I decided to add new function generate_opclass_name() like existing >generate_collation_name(), and to reuse static get_opclass_name(). > >>Anyway, this patchset doesn't apply anymore. Somebody (maybe its >>author this time?) please rebase. > >New version of rebased patches is attached: > >1. Opclass parameters infrastructure. > > API was completely refactored since the previous version: > > - API was generalized for all AMs. Previously, each AM should implement > opclass options parsing/passing in its own way using its own support > function numbers. > Now each AMs uses 0th support function (discussable). Binary bytea values > of parsed options are passed to support functions using special expression > node initialized in FmgrInfo.fn_expr (see macro PG_GET_OPCLASS_OPTIONS(), > get_fn_opclass_options(), set_fn_opclass_options). > I very much doubt these changes are in the right direction. Firstly, using 0 as procnum is weird - AFAICS you picked 0 because after moving it from individual AMs to pg_amproc.h it's hard to guarantee the procnum does not clash with other am procs. But ISTM that's more a hint that the move to pg_amproc.h itself was a bad idea. I suggest we undo that move, instead of trying to fix the symptoms. That is, each AM should have a custom procnum. Also, looking at fn_expr definition in FmgrInfo, I see this fmNodePtr fn_expr; /* expression parse tree for call, or NULL */ it seems like a rather bad idea to reuse that to pass options when it's clearly not meant for that purpose. > - Introduced new structure local_relopts (needs a better name, of course) > with a set of functions for opclass/AM options definition. The parsing > was moved into index_opclass_option(). That was necessary for mixing of > AM- and opclass-specific options. Opclasses now extend the structure with > AM's options adding their own options. See patch #4 for an example. > OK. No opinion on this change yet. >2. New AM method amattoptions(). > > amattoptions() is used to specify per-column AM-specific options. > The example is signature length for bloom indexes (patch #3). > I'm somewhat confused how am I supposed to use this, considering the patch set only defines this for the contrib/bloom index AM. So let's say I want to create a custom BRIN opclass with per-attribute options (like the two BRIN opclasses I work on in the other thread). Clearly, I can't tweak the IndexAmRoutine from the extension. ISTM the patch series should modify all existing index AMs to have a valid amattoptions() implementation, calling the new amproc if defined. Or what is the correct way to define custom opclass for existing index AM (e.g. BRIN) with attribute options? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 11, 2019 at 12:03:58AM +0200, Tomas Vondra wrote: >On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote: >>On 04.09.2019 1:02, Alvaro Herrera wrote: >> >>>On 2019-Jun-11, Tomas Vondra wrote: >>> >>>>1) We need a better infrastructure to parse opclass parameters. For >>>>example the gtsvector_options does this: >>>I think this is part of what Nikolay's patch series was supposed to >>>address. But that one has been going way too slow. I agree we need >>>something better. >> >>API was simplified in the new version of the patches (see below). >> >>>>2) The 0001 part does this in index_opclass_options_generic: >>>> >>>> get_opclass_name(opclass, InvalidOid, &str); >>>> >>>> ereport(ERROR, >>>> (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >>>> errmsg("operator class \"%s\" has no options", >>>> opclassname.data))); >>>> >>>>But that's a bit broken, because get_opclass_name() appends the opclass >>>>name to 'str', but with a space at the beginning. >>>Yeah, I think just exporting get_opclass_name from ruleutils.c is a bad >>>idea. Sounds like we need a (very small) new function in lsyscache.c >>>that does the job of extracting the opclass name, and then the ruleutils >>>function can call that one to avoid duplicated code. >> >>I decided to add new function generate_opclass_name() like existing >>generate_collation_name(), and to reuse static get_opclass_name(). >> >>>Anyway, this patchset doesn't apply anymore. Somebody (maybe its >>>author this time?) please rebase. >> >>New version of rebased patches is attached: >> >>1. Opclass parameters infrastructure. >> >> API was completely refactored since the previous version: >> >> - API was generalized for all AMs. Previously, each AM should implement >> opclass options parsing/passing in its own way using its own support >> function numbers. >> Now each AMs uses 0th support function (discussable). Binary bytea values >> of parsed options are passed to support functions using special expression >> node initialized in FmgrInfo.fn_expr (see macro PG_GET_OPCLASS_OPTIONS(), >> get_fn_opclass_options(), set_fn_opclass_options). >> > >I very much doubt these changes are in the right direction. Firstly, using >0 as procnum is weird - AFAICS you picked 0 because after moving it from >individual AMs to pg_amproc.h it's hard to guarantee the procnum does not >clash with other am procs. But ISTM that's more a hint that the move to >pg_amproc.h itself was a bad idea. I suggest we undo that move, instead of >trying to fix the symptoms. That is, each AM should have a custom procnum. > >Also, looking at fn_expr definition in FmgrInfo, I see this > > fmNodePtr fn_expr; /* expression parse tree for call, or NULL */ > >it seems like a rather bad idea to reuse that to pass options when it's >clearly not meant for that purpose. > BTW, is there a place where we actually verify the signature of the new am proc? Because I only see code like this: + case OPCLASS_OPTIONS_PROC: + ok = true; + break; in all "validate" functions. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11.09.2019 1:14, Tomas Vondra wrote: > > BTW, is there a place where we actually verify the signature of the > new am > proc? Because I only see code like this: > > + case OPCLASS_OPTIONS_PROC: > + ok = true; > + break; > > in all "validate" functions. See assignProcTypes() at src/backend/commands/opclasscmds.c -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 11.09.2019 1:03, Tomas Vondra wrote:
On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote:2. New AM method amattoptions().
amattoptions() is used to specify per-column AM-specific options.
The example is signature length for bloom indexes (patch #3).
I'm somewhat confused how am I supposed to use this, considering the patch
set only defines this for the contrib/bloom index AM. So let's say I want
to create a custom BRIN opclass with per-attribute options (like the two
BRIN opclasses I work on in the other thread). Clearly, I can't tweak the
IndexAmRoutine from the extension. ISTM the patch series should modify all
existing index AMs to have a valid amattoptions() implementation, calling
the new amproc if defined.
Or what is the correct way to define custom opclass for existing index AM
(e.g. BRIN) with attribute options?
Per-attribute opclass options are implemented independently from per-attribute AM options. amattoptions() is optional and needs to be defined only if AM has per-attribute options. amproc #0 is called regardless of whether amattoptions is defined or not. That was the main reason why uniform procnum 0 was picked.
You should simply define function like that and use it as amproc #0:
Datum brin_bloom_options(PG_FUNCTION_ARGS) { local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0); BloomOptions *blopts = NULL; extend_local_reloptions(relopts, blopts, sizeof(*blopts)); add_local_real_reloption(relopts, "n_distinct_per_range", "desc", -0.1, -1.0, INT_MAX, &blopts->nDistinctPerRange); add_local_real_reloption(relopts, "false_positive_rate", "desc", 0.01, 0.001, 1.0, &blopts->falsePositiveRate); PG_RETURN_VOID(); }
On Wed, Sep 11, 2019 at 01:44:28AM +0300, Nikita Glukhov wrote: >On 11.09.2019 1:03, Tomas Vondra wrote: > >>On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote: >> >>>2. New AM method amattoptions(). >>> >>> amattoptions() is used to specify per-column AM-specific options. >>> The example is signature length for bloom indexes (patch #3). >>> >> >>I'm somewhat confused how am I supposed to use this, considering the >>patch >>set only defines this for the contrib/bloom index AM. So let's say I want >>to create a custom BRIN opclass with per-attribute options (like the two >>BRIN opclasses I work on in the other thread). Clearly, I can't tweak the >>IndexAmRoutine from the extension. ISTM the patch series should >>modify all >>existing index AMs to have a valid amattoptions() implementation, calling >>the new amproc if defined. >> >>Or what is the correct way to define custom opclass for existing index AM >>(e.g. BRIN) with attribute options? >> >Per-attribute opclass options are implemented independently from per-attribute >AM options. amattoptions() is optional and needs to be defined only if AM has >per-attribute options. OK, thanks for the explanation - so the per-attribute opclass options will work even when the AM does not have amattoptions() defined. Is there any practical reason why not to just define everything as opclass options and get rid of the amattoptions() entirely? > amproc #0 is called regardless of whether amattoptions >is defined or not. That was the main reason why uniform procnum 0 was >picked. > I still think using procnum 0 and passing the data through fn_expr are not the right solution. Firstly, traditionally the amprocs are either required or optional, with required procs having low procnums and optional starting at 11 or so. The 0 breaks this, because it's optional but it contradicts the procnum rule. Also, what happens if we need to add another optional amproc defined for all AMs? Surely we won't use -1. IMHO we should keep AM-specific procnum and pass it somehow to the AM machinery. FWIW there seems to be a bug in identify_opfamily_groups() which does this: /* Ignore strategy numbers outside supported range */ if (oprform->amopstrategy > 0 && oprform->amopstrategy < 64) thisgroup->operatorset |= ((uint64) 1) << oprform->amopstrategy; but then identify_opfamily_groups() computes allfuncs without any such restriction, i.e. it includes procnum 0. And then it fails on this check if (thisgroup->functionset != allfuncs) {...} None of the built-in brin opclasses defines the new amproc, so the code does not hit this issue. I only noticed this with the opclasses added in the other thread. As for the fn_expr, I still think this seems like a misuse of a field which was intended for something else. I wonder if it might be breaking some exising code - either in code or in some extension. It seems quite possible. It just seems like we're inventing a new way to novel way to pass data into a function while we already have parameters for that purpose. Adding parameters may require care so as not to break existing opclasses, but it seems like the right approach. >You should simply define function like that and use it as amproc #0: > >Datum >brin_bloom_options(PG_FUNCTION_ARGS) >{ > local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0); > BloomOptions *blopts = NULL; > > extend_local_reloptions(relopts, blopts, sizeof(*blopts)); > > add_local_real_reloption(relopts, "n_distinct_per_range", "desc", > -0.1, -1.0, INT_MAX, &blopts->nDistinctPerRange); > > add_local_real_reloption(relopts, "false_positive_rate", "desc", > 0.01, 0.001, 1.0, &blopts->falsePositiveRate); > > PG_RETURN_VOID(); >} > OK, this did the trick. Thanks. I don't have a clear opinion on the API, but it certainly looks like an improvement. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 12, 2019 at 02:16:34AM +0200, Tomas Vondra wrote: > I still think using procnum 0 and passing the data through fn_expr are not > the right solution. Firstly, traditionally the amprocs are either required > or optional, with required procs having low procnums and optional starting > at 11 or so. The 0 breaks this, because it's optional but it contradicts > the procnum rule. Also, what happens if we need to add another optional > amproc defined for all AMs? Surely we won't use -1. > > IMHO we should keep AM-specific procnum and pass it somehow to the AM > machinery. The latest review has not been addressed, and this was 7 weeks ago. So I am marking the patch as returned with feedback. -- Michael
Attachment
Attached new version of the patches. On 12.09.2019 3:16, Tomas Vondra wrote: > On Wed, Sep 11, 2019 at 01:44:28AM +0300, Nikita Glukhov wrote: >> On 11.09.2019 1:03, Tomas Vondra wrote: >> >>> On Tue, Sep 10, 2019 at 04:30:41AM +0300, Nikita Glukhov wrote: >>> >>>> 2. New AM method amattoptions(). >>>> >>>> amattoptions() is used to specify per-column AM-specific options. >>>> The example is signature length for bloom indexes (patch #3). >>> >>> I'm somewhat confused how am I supposed to use this, considering the >>> patch >>> set only defines this for the contrib/bloom index AM. So let's say I >>> want >>> to create a custom BRIN opclass with per-attribute options (like the >>> two >>> BRIN opclasses I work on in the other thread). Clearly, I can't >>> tweak the >>> IndexAmRoutine from the extension. ISTM the patch series should >>> modify all >>> existing index AMs to have a valid amattoptions() implementation, >>> calling >>> the new amproc if defined. >>> >>> Or what is the correct way to define custom opclass for existing >>> index AM >>> (e.g. BRIN) with attribute options? >>> >> Per-attribute opclass options are implemented independently from >> per-attribute >> AM options. amattoptions() is optional and needs to be defined only >> if AM has >> per-attribute options. > > OK, thanks for the explanation - so the per-attribute opclass options > will > work even when the AM does not have amattoptions() defined. Is there any > practical reason why not to just define everything as opclass options and > get rid of the amattoptions() entirely? The reason is that it would be no need to duplicate AM-specific per-attribute options in each opclass. For example, contrib/bloom AM has per-column options (signature length), but its opclasses have no options now. >> amproc #0 is called regardless of whether amattoptions >> is defined or not. That was the main reason why uniform procnum 0 was >> picked. >> > > I still think using procnum 0 and passing the data through fn_expr are > not > the right solution. Firstly, traditionally the amprocs are either > required > or optional, with required procs having low procnums and optional > starting > at 11 or so. The 0 breaks this, because it's optional but it contradicts > the procnum rule. Also, what happens if we need to add another optional > amproc defined for all AMs? Surely we won't use -1. > > IMHO we should keep AM-specific procnum and pass it somehow to the AM > machinery. As you suggested, I introduced AM-specific procnum IndexAmRoutine.attoptsprocnum, that has different values for each AM. This was not a problem, because there are only a few AMs now. > > FWIW there seems to be a bug in identify_opfamily_groups() which does > this: > /* Ignore strategy numbers outside supported range */ > if (oprform->amopstrategy > 0 && oprform->amopstrategy < 64) > thisgroup->operatorset |= ((uint64) 1) << oprform->amopstrategy; > > but then identify_opfamily_groups() computes allfuncs without any such > restriction, i.e. it includes procnum 0. And then it fails on this check > > if (thisgroup->functionset != allfuncs) {...} > > None of the built-in brin opclasses defines the new amproc, so the code > does not hit this issue. I only noticed this with the opclasses added in > the other thread. This really seems to be bug in previous patches, but now procnum can't be 0. > > As for the fn_expr, I still think this seems like a misuse of a field > which was intended for something else. I wonder if it might be breaking > some exising code - either in code or in some extension. It seems quite > possible. > > It just seems like we're inventing a new way to novel way to pass data > into a function while we already have parameters for that purpose. Adding > parameters may require care so as not to break existing opclasses, but it > seems like the right approach. fn_expr is used to pass metainformation about user function calls. It is not used In AM method calls, so it seems safe to reuse it. Opclass parameters can be considered here as some kind metainfo about AM calls. Of course, we could add special parameter to each support function of each AM. But it looks too complicated, and every external AM needs to do that to enable opclass parameters. So I think it is desirable to implement a more general solution. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0001-Introduce-opclass-parameters-20200229.patch.gz
- 0002-Introduce-amattoptions-20200229.patch.gz
- 0003-Use-amattoptions-in-contrib_bloom-20200229.patch.gz
- 0004-Use-opclass-parameters-in-GiST-tsvector_ops-20200229.patch.gz
- 0005-Use-opclass-parameters-in-contrib_intarray-20200229.patch.gz
- 0006-Use-opclass-parameters-in-contrib_ltree-20200229.patch.gz
- 0007-Use-opclass-parameters-in-contrib_pg_trgm-20200229.patch.gz
- 0008-Use-opclass-parameters-in-contrib_hstore-20200229.patch.gz
- 0009-Use-opclass-parameters-in-GIN-tsvector_ops-20200229.patch.gz
- 0010-Remove-pg_index.indoption-20200229.patch.gz
Hi! I took a look on this patchset. There is a first set of questions. * Patchset badly needs comments. I've to literally reverse engineer to get what's going on. But I still don't understand many things. * I'm curious about what local_relopts.base field means. void extend_local_reloptions(local_relopts *opts, void *base, Size base_size) { Assert(opts->base_size < base_size); opts->base = base; opts->base_size = base_size; } /* * add_local_reloption * Add an already-created custom reloption to the local list. */ static void add_local_reloption(local_relopts *relopts, relopt_gen *newoption, void *pval) { local_relopt *opt = palloc(sizeof(*opt)); opt->option = newoption; opt->offset = (char *) pval - (char *) relopts->base; relopts->options = lappend(relopts->options, opt); } Datum ghstore_options(PG_FUNCTION_ARGS) { local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0); GistHstoreOptions *options = NULL; extend_local_reloptions(relopts, options, sizeof(*options)); add_local_int_reloption(relopts, "siglen", "signature length in bytes", SIGLEN_DEFAULT, 1, SIGLEN_MAX, &options->siglen); PG_RETURN_VOID(); } It's not commented, but I guess it's used to calculate offsets from pointers passed to add_local_*_reloption(). Is it better to just pass offsets to add_local_*_reloption()? * It's generally unclear how does amattoptions and opclass options interact. As I get we now don't have an example where both amattoptions and opclass options involved. What is general benefit from putting both two kind of options into single bytea? Can opclass options method do something useful with amattoptions? For instance, some amattoptions can be calculated from opclass options? That would be some point for putting these options together, but it doesn't look like opclass options method can do this? * It current opclass code safe for introduction new atattoptions. For instace, would ghstore_*() work the same way expecting GistHstoreOptions struct to be passed as opclass options if gist would introduce own attoptions? I guess not. If I'm wrong, please clarify this. And patchset needs comment one could get this without guessing. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached new version of reordered patches. Questionable patches for AM-specific per-attribute options were moved to the end, so they can be skipped now.
On 16.03.2020 18:22, Alexander Korotkov wrote:
Hi! I took a look on this patchset. There is a first set of questions. * Patchset badly needs comments. I've to literally reverse engineer to get what's going on. But I still don't understand many things. * I'm curious about what local_relopts.base field means. void extend_local_reloptions(local_relopts *opts, void *base, Size base_size) { Assert(opts->base_size < base_size); opts->base = base; opts->base_size = base_size; } /** add_local_reloption* Add an already-created custom reloption to the local list.*/ static void add_local_reloption(local_relopts *relopts, relopt_gen *newoption, void *pval) { local_relopt *opt = palloc(sizeof(*opt)); opt->option = newoption; opt->offset = (char *) pval - (char *) relopts->base; relopts->options = lappend(relopts->options, opt); } Datum ghstore_options(PG_FUNCTION_ARGS) { local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0); GistHstoreOptions *options = NULL; extend_local_reloptions(relopts, options, sizeof(*options)); add_local_int_reloption(relopts, "siglen", "signature length in bytes", SIGLEN_DEFAULT, 1, SIGLEN_MAX, &options->siglen); PG_RETURN_VOID(); } It's not commented, but I guess it's used to calculate offsets from pointers passed to add_local_*_reloption(). Is it better to just pass offsets to add_local_*_reloption()?
Yes, 'base' field was used to calculate offsets. Now I started to pass offsets instead of pointers to the fields of template structure (that gave us additional type checking). Some comments were added.
* It's generally unclear how does amattoptions and opclass options interact. As I get we now don't have an example where both amattoptions and opclass options involved. What is general benefit from putting both two kind of options into single bytea? Can opclass options method do something useful with amattoptions? For instance, some amattoptions can be calculated from opclass options? That would be some point for putting these options together, but it doesn't look like opclass options method can do this?
There are no examples for AM and opclass options interaction now. But AM and opclass can register custom callbacks that will be called after parsing in their registration order. In these callbacks it is possible to post-process option values, check presence or absence of some options. The main benefit of putting both option into single bytea is that it does not require major modifications of reloption processing code. And it also does not require to split reloption list obtained from SQL into two separate lists for AM and opclass options.
* It current opclass code safe for introduction new atattoptions. For instace, would ghstore_*() work the same way expecting GistHstoreOptions struct to be passed as opclass options if gist would introduce own attoptions? I guess not. If I'm wrong, please clarify this. And patchset needs comment one could get this without guessing.
Yes, the code will be broken after introduction of GiST per-attribute options. GistHstoreOptions should include GistAttOptions which simply did not exist in the previous version of the patches. I added empty XxxAttOptions for all AMs in patch #7, and GistAttOptions and GinAttOptions now are included into corresponding structures for opclass options.--
Attachment
- 0001-Introduce-opclass-parameters-20200318.patch.gz
- 0002-Use-opclass-parameters-in-GiST-tsvector_ops-20200318.patch.gz
- 0003-Use-opclass-parameters-in-contrib_intarray-20200318.patch.gz
- 0004-Use-opclass-parameters-in-contrib_ltree-20200318.patch.gz
- 0005-Use-opclass-parameters-in-contrib_pg_trgm-20200318.patch.gz
- 0006-Use-opclass-parameters-in-contrib_hstore-20200318.patch.gz
- 0007-Introduce-amattoptions-20200318.patch.gz
- 0008-Use-amattoptions-in-contrib_bloom-20200318.patch.gz
- 0009-Remove-pg_index.indoption-20200318.patch.gz
On Wed, Mar 18, 2020 at 3:28 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached new version of reordered patches. Thank you, Nikita. I've merged 0001-0006 into single patch. I think this part is committable to pg13 and I don't see much point in splitting it into multiple patches. I've done a lot of adjustments to this patch. Major things are: * Now contribs can correctly work when extension is not yet upgraded to the new version. So, if no options is available, default values are used. This is important for pg_upgrade. * Regression tests are provided for each modified opclass. * Improvements to comment, documentation, coding style. I'm going to push this if no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sat, Mar 28, 2020 at 06:05:51PM +0300, Alexander Korotkov wrote: > On Wed, Mar 18, 2020 at 3:28 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > Attached new version of reordered patches. > > I'm going to push this if no objections. Find attached patch with editorial corrections to docs for this commit. --word-diff to follow. commit d3f077b813efa90b25a162bf8d227f3e4218c248 Author: Justin Pryzby <pryzbyj@telsasoft.com> Date: Mon Mar 30 20:55:06 2020 -0500 Doc review: Implement operator class parameters commit 911e70207703799605f5a0e8aad9f06cff067c63 Author: Alexander Korotkov <akorotkov@postgresql.org> diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml index f1f2b08cd7..b2e04d0815 100644 --- a/doc/src/sgml/hstore.sgml +++ b/doc/src/sgml/hstore.sgml @@ -468,13 +468,13 @@ CREATE INDEX hidx ON testhstore USING GIN (h); </programlisting> <para> <literal>gist_hstore_ops</literal> GiST opclass approximates {+a+} set of key/value pairs as a bitmap signature. [-Optional-]{+Its optional+} integer parameter <literal>siglen</literal>[-of <literal>gist_hstore_ops</literal>-] determines {+the+} signature length in bytes. [-Default signature-]{+The default+} length is 16 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures [-leads-]{+lead+} to {+a+} more precise search [-(scan less-]{+(scanning a smaller+} fraction of [-index, scan-] [- less-]{+the index and+} {+ fewer+} heap pages), [-but-]{+at the cost of a+} larger index. </para> <para> diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml index 72b4b23c15..7956a746a6 100644 --- a/doc/src/sgml/intarray.sgml +++ b/doc/src/sgml/intarray.sgml @@ -265,7 +265,7 @@ </para> <para> Two [-parametrized-]{+parameterized+} GiST index operator classes are provided: <literal>gist__int_ops</literal> (used by default) is suitable for small- to medium-size data sets, while <literal>gist__intbig_ops</literal> uses a larger signature and is more @@ -276,22 +276,23 @@ </para> <para> <literal>gist__int_ops</literal> approximates {+an+} integer set as an array of integer ranges. [-Optional-]{+Its optional+} integer parameter <literal>numranges</literal>[-of-] [- <literal>gist__int_ops</literal>-] determines {+the+} maximum number of ranges in one index key. [-Default-]{+The default+} value of <literal>numranges</literal> is 100. Valid values are between 1 and 253. Using larger arrays as GiST index keys leads to {+a+} more precise search [-(scan less-]{+(scaning a smaller+} fraction of [-index, scan less-]{+the indexand+} {+ fewer+} heap pages), [-but-]{+at the cost of a+} larger index. </para> <para> <literal>gist__intbig_ops</literal> approximates {+an+} integer set as a bitmap [-signature. Optional-]{+signature XXX. Its optional+} integer parameter <literal>siglen</literal>[-of-] [- <literal>gist__intbig_ops</literal>-] determines {+the+} signature length in bytes. [-Default-]{+The default+} signature length is 16 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures [-leads-]{+lead+} to {+a+} more precise search [-(scan less-]{+(scanning a smaller+} fraction of [-index, scan less-]{+the index and fewer+} heap pages), [-but-]{+at+} {+ the cost of a+} larger index. </para> <para> diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml index ae4b33ec85..4971b71524 100644 --- a/doc/src/sgml/ltree.sgml +++ b/doc/src/sgml/ltree.sgml @@ -506,16 +506,16 @@ Europe & Russia*@ & !Transportation <literal>@</literal>, <literal>~</literal>, <literal>?</literal> </para> <para> <literal>gist_ltree_ops</literal> GiST opclass approximates {+a+} set of path labels as a bitmap signature. [-Optional-]{+Its optional+} integer parameter <literal>siglen</literal>[-of <literal>gist_ltree_ops</literal>-] determines {+the+} signature length in bytes. [-Default-]{+The default+} signature length is 8 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures [-leads-]{+lead+} to {+a+} more precise search [-(scan less-]{+(scanning a smaller+} fraction of [-index,scan-] [- less-]{+the index and+} {+ fewer+} heap pages), [-but-]{+at the cost of a+} larger index. </para> <para> Example of creating such an index with [-a-]{+the+} default signature length of 8 bytes: </para> <programlisting> CREATE INDEX path_gist_idx ON test USING GIST (path); @@ -535,13 +535,13 @@ CREATE INDEX path_gist_idx ON test USING GIST (path gist_ltree_ops(siglen=100)); <literal>@</literal>, <literal>~</literal>, <literal>?</literal> </para> <para> <literal>gist__ltree_ops</literal> GiST opclass works [-similar-]{+similarly+} to <literal>gist_ltree_ops</literal> and also takes signature length as a parameter. [-Default-]{+The default+} value of <literal>siglen</literal> in <literal>gist__ltree_ops</literal> is 28 bytes. </para> <para> Example of creating such an index with [-a-]{+the+} default signature length of 28 bytes: </para> <programlisting> CREATE INDEX path_gist_idx ON test USING GIST (array_path); diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml index dde02634ae..97b3d13a88 100644 --- a/doc/src/sgml/pgtrgm.sgml +++ b/doc/src/sgml/pgtrgm.sgml @@ -391,13 +391,13 @@ CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops); </para> <para> <literal>gist_trgm_ops</literal> GiST opclass approximates {+a+} set of trigrams as a bitmap signature. [-Optional-]{+Its optional+} integer parameter <literal>siglen</literal>[-of <literal>gist_trgm_ops</literal>-] determines {+the+} signature length in bytes. [-Default signature-]{+The default+} length is 12 bytes. Valid values of signature length are between 1 and 2024 bytes. Longer signatures [-leads-]{+lead+} to {+a+} more precise search [-(scan less-]{+(scanning a smaller+} fraction of [-index, scan-] [- less-]{+the index and+} {+ fewer+} heap pages), [-but-]{+at the cost of a+} larger index. </para> <para> diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index 2217fcd6c2..0dc427289d 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -3670,17 +3670,17 @@ SELECT plainto_tsquery('supernovae stars'); to check the actual table row to eliminate such false matches. (<productname>PostgreSQL</productname> does this automatically when needed.) GiST indexes are lossy because each document is represented in the index by a fixed-length signature. [-Signature-]{+The signature+} length in bytes is determined by the value of the optional integer parameter <literal>siglen</literal>. [-Default-]{+The default+} signature length (when <literal>siglen</literal> is not [-specied)-]{+specified)+} is 124 bytes, [-maximal-]{+the maximum signature+} length is 2024 bytes. The signature is generated by hashing each word into a single bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct. Longer signatures [-leads-]{+lead+} to {+a+} more precise search [-(scan less-]{+(scanning a smaller+} fraction of [-index, scan less-]{+the index and fewer+} heap pages), [-but-]{+at the cost of a+} larger index. </para> <para>
Attachment
On Tue, Mar 31, 2020 at 5:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Sat, Mar 28, 2020 at 06:05:51PM +0300, Alexander Korotkov wrote: > > On Wed, Mar 18, 2020 at 3:28 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > > Attached new version of reordered patches. > > > > I'm going to push this if no objections. > > Find attached patch with editorial corrections to docs for this commit. Cool, thank you! > [-signature. Optional-]{+signature XXX. Its optional+} integer parameter <literal>siglen</literal>[-of-] What is XXX supposed to be? The rest of patch looks good to me. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Mar 31, 2020 at 12:15 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > What is XXX supposed to be? > > The rest of patch looks good to me. I've pushed the patch excepts XXX. Thank you. You're welcome to clarify XXX and/or do additional corrections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, Apr 01, 2020 at 02:53:41PM +0300, Alexander Korotkov wrote: > On Tue, Mar 31, 2020 at 12:15 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > What is XXX supposed to be? > > > > The rest of patch looks good to me. > > I've pushed the patch excepts XXX. Thank you. > You're welcome to clarify XXX and/or do additional corrections. XXX was a reminder to myself to check the accuracy of that description, rather than just its language. As you can see, I didn't manage to do that, but the language is at least consistent. Thanks for pushing the update. -- Justin
On Wed, Apr 1, 2020 at 2:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Wed, Apr 01, 2020 at 02:53:41PM +0300, Alexander Korotkov wrote: > > On Tue, Mar 31, 2020 at 12:15 PM Alexander Korotkov > > <a.korotkov@postgrespro.ru> wrote: > > > What is XXX supposed to be? > > > > > > The rest of patch looks good to me. > > > > I've pushed the patch excepts XXX. Thank you. > > You're welcome to clarify XXX and/or do additional corrections. > > XXX was a reminder to myself to check the accuracy of that description, rather > than just its language. As you can see, I didn't manage to do that, but the > language is at least consistent. > > Thanks for pushing the update. Thank you! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company