Re: Refactoring of heapam code. - Mailing list pgsql-hackers
From | Thom Brown |
---|---|
Subject | Re: Refactoring of heapam code. |
Date | |
Msg-id | CAA-aLv4TS4-QUn=NXcJ2J0wdR4=y9X8etJ1kFWjQXBwpPsSVCQ@mail.gmail.com Whole thread Raw |
In response to | Refactoring of heapam code. (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
List | pgsql-hackers |
On 5 August 2016 at 08:54, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > Working on page compression and some other issues related to > access methods, I found out that the code related to heap > looks too complicated. Much more complicated, than it should be. > Since I anyway got into this area, I want to suggest a set of improvements. > > There is a number of problems I see in the existing code: > Problem 1. Heap != Relation. > > File heapam.c has a note: > * This file contains the heap_ routines which implement > * the POSTGRES heap access method used for all POSTGRES > * relations. > This statement is wrong, since we also have index relations, > that are implemented using other access methods. > > Also I think that it's quite strange to have a function heap_create(), that > is called > from index_create(). It has absolutely nothing to do with heap access > method. > > And so on, and so on. > > One more thing that requires refactoring is "RELKIND_RELATION" name. > We have a type of relation called "relation"... > > Problem 2. > Some functions are shared between heap and indexes access methods. > Maybe someday it used to be handy, but now, I think, it's time to separate > them, because IndexTuples and HeapTuples have really little in common. > > Problem 3. The word "heap" is used in many different contexts. > Heap - a storage where we have tuples and visibility information. > Generally speaking, it can be implemented over any data structure, > not just a plain table as it is done now. > > Heap - an access method, that provides a set of routines for plain table > storage, such as insert, delete, update, gettuple, vacuum and so on. > We use this access method not only for user's tables. > > There are many types of relations (pg_class.h): > #define RELKIND_RELATION 'r' /* ordinary table */ > #define RELKIND_INDEX 'i' /* secondary index */ > #define RELKIND_SEQUENCE 'S' /* sequence object */ > #define RELKIND_TOASTVALUE 't' /* for out-of-line > values */ > #define RELKIND_VIEW 'v' /* view */ > #define RELKIND_COMPOSITE_TYPE 'c' /* composite type */ > #define RELKIND_FOREIGN_TABLE 'f' /* foreign table */ > #define RELKIND_MATVIEW 'm' /* materialized view */ > > They can be logically separated into three categories: > "primary storage" - r, S, t, v. They store data and visibility information. > The only implementation is heapam.c > "secondary index" - i. They store some data and pointers to primary storage. > Various existing AMs and managed by AM API. > "virtual relations" - c, f, m. They have no physical storage, only entries > in caches and catalogs. > > Now, for some reasons, we sometimes use name "heap" for both > "primary storage" and "virual relations". See src/backend/catalog/heap.c. > But some functions work only with "primary storage". See pgstat_relation(). > And we constantly have to check relkind everywhere. > I'd like it would be clear from the code interface and naming. > > So there is a couple of patches. They do not cover all mentioned problems, > but I'd like to get a feedback before continuing. > > Patch 1: > There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item > from the given page. It is used for both heap and index tuples. > It doesn't seems a problem, until you are going to find anything in this > code. > > The first patch "PageGetItem_refactoring.patch" is intended to fix it. > It is just renaming: > (IndexTuple) PageGetItem ---> PageGetItemIndex > (HeapTupleHeader) PageGetItem ---> PageGetItemHeap > Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple, > SpGistDeadTuple) > still use PageGetItem. > > I also changed functions that do not access tuple's data, but only need > HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly. > > I do not insist on these particular names, by the way. > > Patch 2. > heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names > and outdated comments. The patch is intended to improve it. > Fun fact: I found several "check it later" comments unchanged since > "Postgres95 1.01 Distribution - Virgin Sources" commit. > > I wonder if we can wind better name relation_drop_with_catalog() since, > again, it's > not about all kinds of relations. We use another function to drop index > relations. > > I hope these changes will be useful for the future development. > Suggested patches are mostly about renaming and rearrangement of functions > between files, so, if nobody has conceptual objections, all I need from > reviewers > is an attentive look to find typos, grammar mistakes and overlooked areas. Could you add this to the commitfest? Thom
pgsql-hackers by date: