[HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM - Mailing list pgsql-hackers

From Nikolay Shaplov
Subject [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM
Date
Msg-id 2146419.veIEZdk4E4@x200m
Whole thread Raw
Responses Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM
Re: [HACKERS] [PATCH] Move all am-related reloption code intosrc/backend/access/[am-name] and get rid of relopt_kind for custom AM
List pgsql-hackers
(Please see Attribution notice at the bottom of the letter)

Hi! Some time ago I've send a proposal, about removing all am-related code 
from src/backend/access/common/reloptions.c and move it into somewhere inside 
am code. It was in 
https://www.postgresql.org/message-id/4636505.Bu9AKW1Kzc%40nataraj-amd64
thread.

Now I have a patch that is ready

Before explaining what have been done, I need to define some concepts, so it 
would be easy to speak about them in the further explanation.

Reloptions values in postgres actually exists in four representations: 
DefList*, TextArray[], Bytea and so called Values.

- DefList representation is a list of DefElem, it is the representation in 
which reloptions comes from syntax analyzer, when relation is created or 
altered, and also this representation that is needed for pg_dump to dump a 
create statement that creates this relation.

- TextArray[] representation is a way, how reloptions are stored in 
pg_catalog. It is TEXT[] attribute in the pg_catalog table and from the poinf 
o view of postgres code it is Datum array of TEXT Datums, when read into the 
memory

-  Bytea (or I will also call it Binary) representation, is a representation 
with which relation code really works. It has C-structure inside in which 
fixed-length options are sored, some space is reserved for storing varlen 
values after the structure, and certainly it has BYTEA header in front of all 
of it. It is cached, it is fast to access. It is a good stuff.

- Values (or in some cases RawValues) -- internal representation of option 
parser. Technically it is List of option_value structures (old name for it is 
relopt_value). This representation is used for parsing and validating options. 
Actually all the convertations between representations listed above are dove 
through Values representation. Values are called RawValues when Values List 
were already created, but it was not parsed yet, and all option values are in 
raw state: represented as a text.


DefList and TextArray representations are converted in both ways(always 
through Values representation): we need DefList -> TextArray to create or 
change entry in the pg_catalog, and need TextArray -> DefList for pg_dump

Bytea representation is converted from TextArray representation (also through 
Values representation). When relation code, tries to get cached 
Bytea representation for certain relation, and none is actually cached, then 
cache code calls extractRelOptions function that fetches entry from pg_catalog 
for that relation, gets reloptions  TextArray[], converts it into Bytea and 
cache it.


Each reloption has it's own definition. This definitions tells how this option 
should be parsed, validated and converted. Before my patch this information 
were stored in relopt_gen and relopt_parse_elt structures. In my patch all 
this information were moved into option_definition_basic structure (that is 
actually old relopt_gen + relopt_parse_elt)

The set of options definition I would call a options catalog. Before my patch 
there was one global options catalog (actually it was divided into parts by 
data types and has another part for dynamically added options, but 
nevertheless, it was global). After my patch there would be a separate options 
catalog for each relation type. For AM-relation this catalog would be 
available via AM-handler. for other relation types it would be still kept in 
reloption.c



Now I am ready to tell what actually have been done in this patch:

1. I moved options definitions from global lists from reloptions.c 
[type name]RelOpts to lists that are kept inside AM code. One list per AM.  
heap options, toast options and options for all other relation types that are 
not accessible through  AccessMethod, all were kept inside reloptions.c, but 
now it is one list for each relation type, not a global one

2. Each AccessMethod had amoptions function that was responsible for 
converting reloptions from TextArray[] representation into Bytea 
representaions. This function used functions from reloptions.c and also had
an array of relopt_parse_elt that defines how parsed reloption data should be 
packed inside Bytea chunk. I've moved data from relopt_parse_elt array into 
option definitions that are stored in the catalog (since catalog are  now 
defined inside the AccessMethod we have access to a structure of Bytea 
representation at the place where we define option catalog)

3. Speaking of amoptions function, I've completely removed it, since we do not 
need it. For developers of Access Methods who still want to do some custom 
action with just parsed data, I've added postprocess_fun to option catalog. If 
this function is defined, it is called by convertation function right after 
Bytea representation were created. In postprocess_fun developer can do some 
custom validation, or change just parsed data. This feature is used in bloom 
index.

4. Instead of amoptions function I've added amrelopt_catalog function to the 
Access Method. This function returns option definition catalog for an Access 
Method. The catalog is used for processing options.

5. As before, relation cache calls extractRelOptions when it needs options 
that were not previously cached.
Before my patch, it created Bytea representations for non-AM relations, and 
called amoptions AM function to get Bytea representation for AM relation.
In by patch, it now gets option catalog, (using local functions for non-AM 
relations, and  using amrelopt_catalog function for AM-relations), and then 
use this catalog to convert options from TextArray into Bytea representation.

6. I've added some more featues: 

- Now you can set OPTION_DEFINITION_FLAG_FORBID_ALTER flag in definition of the 
option, and then postgres will refuse to change option value using ALTER 
command. In many indexes, some options are used only for index creation. After 
this it's value is saved in MetaPage, and used from there. Nevertheless user 
is allowed to change option value, though it affects nothing, one need to 
recreate index to really change such value. Now using this flag we can prevent 
user from getting an illusion that he successfully changed option value.

- After applying my patch if you try to set toast. option for table that does 
not have toast, you will get an error. Before postgres will accept toast 
option for a table without a toast, but this value will be lost. This is bad 
behavior, so now postgres will throw an error in this case.

- I've noticed that all string relation options in postgres are technically 
enum options. So I've added enum option type. This will save some bytes of the 
memory and some tacts of the CPU. I also left string option type (through it 
is not used now). A. Korotkov said that it might be needed later for storing 
patterns, xml/json paths or something like that.

- Added some more tests that will trigger more options code. Though I am 
dreaming about more advanced test system, that will allow to check that 
certain value is received somewhere deep in the code.

7. And now the most controversial change. I've got rid of  StdRdOptions 
structure, in which options for heap, toast, nbtree, hash and spgist were 
stored in Bytea representation. In indexes only fillfactor value were actually 
used from the whole big structure. So I've created a separate structure for 
each relation type. HeapOptions and ToastOptions are very similar to each 
other. So common part of creating these catalogs were moved to 
add_autovacuum_options function that are called while creating each catalog.

Now to the controversial part: in src/include/utils/rel.h had a lot of 
Relation* macroses that used StdRdOptions structure. I've changed them into 
View* Heap* and Toast* analogues, and changed the code to use these macroses 
instead of Relation* one. But this part of code I least sure of. I'd like to 
ask experienced developers to double check it.

8. I've moved all options-abstract code into options.c and options.h file, and 
left in reloptions.c and reloptions.h only the code that concerns relation
options. Actually the main idea of this patch was to get this abstract code in 
order to use it for custom attribute options later. All the rest just a side 
effects :-)

So, before adding this to commitfest I want somebody to give a general look at 
it, if the code is OK.

Alvaro Herrera, you once said that you can review the patch...

The patch is available in the attachment.  It can be applied to current master

You can also see latest version on my github 
https://github.com/dhyannataraj/postgres/tree/reloption_fix
at the reloption_fix branch.


ATTRIBUTION NOTICE: I wrote this patch, when I was an  employee in Postgres 
Professional company. So this patch should be attributed as patch from 
Postgres Pro (or something like that). For some reasons I did not manage to 
commit this patch while I left that job. But I got a permission to commit it 
even if I left the company.
So my contribution to this patch as a independent developer was final code 
cleanup, and writing some more comments. All the rest was a work of Postgres 
Pro employee Nikolay Shaplov.

-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] pg_sequences bug ?
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.