[HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM - Mailing list pgsql-hackers
From | Nikolay Shaplov |
---|---|
Subject | [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM |
Date | |
Msg-id | 2146419.veIEZdk4E4@x200m Whole thread Raw |
Responses |
Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM
Re: [HACKERS] [PATCH] Move all am-related reloption code intosrc/backend/access/[am-name] and get rid of relopt_kind for custom AM |
List | pgsql-hackers |
(Please see Attribution notice at the bottom of the letter) Hi! Some time ago I've send a proposal, about removing all am-related code from src/backend/access/common/reloptions.c and move it into somewhere inside am code. It was in https://www.postgresql.org/message-id/4636505.Bu9AKW1Kzc%40nataraj-amd64 thread. Now I have a patch that is ready Before explaining what have been done, I need to define some concepts, so it would be easy to speak about them in the further explanation. Reloptions values in postgres actually exists in four representations: DefList*, TextArray[], Bytea and so called Values. - DefList representation is a list of DefElem, it is the representation in which reloptions comes from syntax analyzer, when relation is created or altered, and also this representation that is needed for pg_dump to dump a create statement that creates this relation. - TextArray[] representation is a way, how reloptions are stored in pg_catalog. It is TEXT[] attribute in the pg_catalog table and from the poinf o view of postgres code it is Datum array of TEXT Datums, when read into the memory - Bytea (or I will also call it Binary) representation, is a representation with which relation code really works. It has C-structure inside in which fixed-length options are sored, some space is reserved for storing varlen values after the structure, and certainly it has BYTEA header in front of all of it. It is cached, it is fast to access. It is a good stuff. - Values (or in some cases RawValues) -- internal representation of option parser. Technically it is List of option_value structures (old name for it is relopt_value). This representation is used for parsing and validating options. Actually all the convertations between representations listed above are dove through Values representation. Values are called RawValues when Values List were already created, but it was not parsed yet, and all option values are in raw state: represented as a text. DefList and TextArray representations are converted in both ways(always through Values representation): we need DefList -> TextArray to create or change entry in the pg_catalog, and need TextArray -> DefList for pg_dump Bytea representation is converted from TextArray representation (also through Values representation). When relation code, tries to get cached Bytea representation for certain relation, and none is actually cached, then cache code calls extractRelOptions function that fetches entry from pg_catalog for that relation, gets reloptions TextArray[], converts it into Bytea and cache it. Each reloption has it's own definition. This definitions tells how this option should be parsed, validated and converted. Before my patch this information were stored in relopt_gen and relopt_parse_elt structures. In my patch all this information were moved into option_definition_basic structure (that is actually old relopt_gen + relopt_parse_elt) The set of options definition I would call a options catalog. Before my patch there was one global options catalog (actually it was divided into parts by data types and has another part for dynamically added options, but nevertheless, it was global). After my patch there would be a separate options catalog for each relation type. For AM-relation this catalog would be available via AM-handler. for other relation types it would be still kept in reloption.c Now I am ready to tell what actually have been done in this patch: 1. I moved options definitions from global lists from reloptions.c [type name]RelOpts to lists that are kept inside AM code. One list per AM. heap options, toast options and options for all other relation types that are not accessible through AccessMethod, all were kept inside reloptions.c, but now it is one list for each relation type, not a global one 2. Each AccessMethod had amoptions function that was responsible for converting reloptions from TextArray[] representation into Bytea representaions. This function used functions from reloptions.c and also had an array of relopt_parse_elt that defines how parsed reloption data should be packed inside Bytea chunk. I've moved data from relopt_parse_elt array into option definitions that are stored in the catalog (since catalog are now defined inside the AccessMethod we have access to a structure of Bytea representation at the place where we define option catalog) 3. Speaking of amoptions function, I've completely removed it, since we do not need it. For developers of Access Methods who still want to do some custom action with just parsed data, I've added postprocess_fun to option catalog. If this function is defined, it is called by convertation function right after Bytea representation were created. In postprocess_fun developer can do some custom validation, or change just parsed data. This feature is used in bloom index. 4. Instead of amoptions function I've added amrelopt_catalog function to the Access Method. This function returns option definition catalog for an Access Method. The catalog is used for processing options. 5. As before, relation cache calls extractRelOptions when it needs options that were not previously cached. Before my patch, it created Bytea representations for non-AM relations, and called amoptions AM function to get Bytea representation for AM relation. In by patch, it now gets option catalog, (using local functions for non-AM relations, and using amrelopt_catalog function for AM-relations), and then use this catalog to convert options from TextArray into Bytea representation. 6. I've added some more featues: - Now you can set OPTION_DEFINITION_FLAG_FORBID_ALTER flag in definition of the option, and then postgres will refuse to change option value using ALTER command. In many indexes, some options are used only for index creation. After this it's value is saved in MetaPage, and used from there. Nevertheless user is allowed to change option value, though it affects nothing, one need to recreate index to really change such value. Now using this flag we can prevent user from getting an illusion that he successfully changed option value. - After applying my patch if you try to set toast. option for table that does not have toast, you will get an error. Before postgres will accept toast option for a table without a toast, but this value will be lost. This is bad behavior, so now postgres will throw an error in this case. - I've noticed that all string relation options in postgres are technically enum options. So I've added enum option type. This will save some bytes of the memory and some tacts of the CPU. I also left string option type (through it is not used now). A. Korotkov said that it might be needed later for storing patterns, xml/json paths or something like that. - Added some more tests that will trigger more options code. Though I am dreaming about more advanced test system, that will allow to check that certain value is received somewhere deep in the code. 7. And now the most controversial change. I've got rid of StdRdOptions structure, in which options for heap, toast, nbtree, hash and spgist were stored in Bytea representation. In indexes only fillfactor value were actually used from the whole big structure. So I've created a separate structure for each relation type. HeapOptions and ToastOptions are very similar to each other. So common part of creating these catalogs were moved to add_autovacuum_options function that are called while creating each catalog. Now to the controversial part: in src/include/utils/rel.h had a lot of Relation* macroses that used StdRdOptions structure. I've changed them into View* Heap* and Toast* analogues, and changed the code to use these macroses instead of Relation* one. But this part of code I least sure of. I'd like to ask experienced developers to double check it. 8. I've moved all options-abstract code into options.c and options.h file, and left in reloptions.c and reloptions.h only the code that concerns relation options. Actually the main idea of this patch was to get this abstract code in order to use it for custom attribute options later. All the rest just a side effects :-) So, before adding this to commitfest I want somebody to give a general look at it, if the code is OK. Alvaro Herrera, you once said that you can review the patch... The patch is available in the attachment. It can be applied to current master You can also see latest version on my github https://github.com/dhyannataraj/postgres/tree/reloption_fix at the reloption_fix branch. ATTRIBUTION NOTICE: I wrote this patch, when I was an employee in Postgres Professional company. So this patch should be attributed as patch from Postgres Pro (or something like that). For some reasons I did not manage to commit this patch while I left that job. But I got a permission to commit it even if I left the company. So my contribution to this patch as a independent developer was final code cleanup, and writing some more comments. All the rest was a work of Postgres Pro employee Nikolay Shaplov. -- Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: