[PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options - Mailing list pgsql-hackers

From Nikolay Shaplov
Subject [PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options
Date
Msg-id 5213596.TqFRiqmCTe@nataraj-amd64
Whole thread Raw
Responses Re: [PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options
List pgsql-hackers
Hi!

There was an interesting idea. Now it is possible to add in runtime options
for relation, when you are creating a new extension with new access method.

But if you add a custom operator class, you have no tool to tune it's
behavior. But in some cases it would be nice to have such tool. For example in
intarray extension for gist__intbig_ops there is an option SIGLENINT that
defines the length of the signature that will be used while creating index. Now
it is #defined in the code, but theoretically one can tune index
size/performance by manually setting SIGLENINT value that is more suitable for
his data.

Moreover the authors of bloom extension have to create workaround for
customizing each column behavior: they sets number of bits per column by
setting reloptions col1 — col16. This is not handy. I would seven say a little
bit ugly.

So adding options for opclass seems to be really good idea.

As far as in postgres index is a relation, index column is an attribute in
this relation and each column might have only one opclass, this brings us to
conclusion that when we as speaking about options of an opclass in the index
column, then we can actually speak about an options of an attribute.

If we look into pg_attribute table, we will see, that there is already
attoptions attribute there. So we can use it. Just add a way to add them in
runtime and teach opclass to use it.

The general idea is the following:

Each access method should have amattoptions function, that works similar to
amoptions function (parses and validate text[] of attoptions into internal
data structure) but it takes an additional argument -- the number of an
attribute we are processing.

amattoptions do this job by calling a specific support function of an opclass,
that knows how to parse it (and do it in a similar way as amoptions function
of an access method)

the result of amattoptions should be cached as attoptions are cached now or
similar way. (In current version of prototype I do it wrong, but should change
it later)

All the support functions of the access method should accept parsed attoptions
in their argument lists, and use it if necessary. In the prototype I've
implemented if for gist opclass, and added attoptions as a last argument of
all support functions. So it is even backward compatible as all funtions I
have not touched just ignores this last argument.

The implementation of an access method should get attoptions from cache, if
they are there, or from am->amattoptions if not (in the prototype it is done
using get_index_attribute_options function) and pass it to the support
functions.

In the prototype I've added a member to GISTSTATE structure for
storing parsed attoptions, and fill in initGISTstate and used it all over the
code. In other implementations solutions might be different.

So... The prototype I wrote is in attachment. It is really dirty. Some things
are not done yet. Some things should be changed. But never the less work, and
it shows what I am going to do better than all explanations.

To see how it works you should do the following:

# create extension intarray ;
# create table test (i int[]);
# create table test2 (i int[]);
# create index ON test USING GIST (i USING gist__intbig_ops WITH OPTIONS
(sig_len_int=22) );
# create index ON test2 USING GIST (i USING gist__intbig_ops WITH OPTIONS
(sig_len_int=120) );

# select attoptions from pg_attribute where attrelid = 'test_i_idx'::regclass
OR attrelid = 'test2_i_idx'::regclass;

    attoptions
-------------------
 {sig_len_int=120}
 {sig_len_int=22}
(2 rows)

and if you uncomment development warning output from the code you will see
that these values are really used through the code.

Speaking of the syntax, I've added WITH OPTIONS keywords for specifying
attoptions, not just WITH, that most of us would expect.  I did it because
index_elem, the node that is used for definition of an index column is used not
only in CREATE INDEX expression, but also in CREATE TABLE CONSTRAINT EXCLUDE.
Because CREATE TABLE CONSTRAINT EXCLUDE actually creates an index, and one can
specify there full specification of index element there. But right after this
you should write "WITH operator" clause.

So if I will try to use WITH keyword for attoptions, a syntax parser will be
confused, as it cat not distinguish one WITH keyword from another. So decided
to use WITH OTPTIONS keywords for attoptions.

There might be some tricks that will allow to use WITH in both places but I
think "WITH OPTIONS" might make SQL code more readable, because an expression
with two "WITH" in a row will be hard to read for human too ;-)

So we come to the last part of this message, that is most important for me for
the moment:

One of the issues that I've solved while writing this patch, was the
following: attoptions and reloptions are should use almost the same code for
options parsing. For reloptions this code is written, it is good, but you
can't reuse it for attoptions as it is binded to relopt_kind and quite
centralized.

So first step to implement attoptions for indexes, will be rewriting reoptions
code to get rid of relopt_kind, make all access methods has its own option
descriptor catalogs and use reference to this catalog instead of relopt_kind.

This should be a separate patch and I think I will start another thread for
it. I will write another letter for relopt_kind remove issue...

--
Nikolay Shaplov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: pg_dump -j against standbys
Next
From: Jim Nasby
Date:
Subject: Re: Calling json_* functions with JSONB data