Re: Refactoring of heapam code. - Mailing list pgsql-hackers

From Thom Brown
Subject Re: Refactoring of heapam code.
Date
Msg-id CAA-aLv4TS4-QUn=NXcJ2J0wdR4=y9X8etJ1kFWjQXBwpPsSVCQ@mail.gmail.com
Whole thread Raw
In response to Refactoring of heapam code.  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On 5 August 2016 at 08:54, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> Working on page compression and some other issues related to
> access methods, I found out that the code related to heap
> looks too complicated. Much more complicated, than it should be.
> Since I anyway got into this area, I want to suggest a set of improvements.
>
> There is a number of problems I see in the existing code:
> Problem 1. Heap != Relation.
>
> File heapam.c has a note:
>  *      This file contains the heap_ routines which implement
>  *      the POSTGRES heap access method used for all POSTGRES
>  *      relations.
> This statement is wrong, since we also have index relations,
> that are implemented using other access methods.
>
> Also I think that it's quite strange to have a function heap_create(), that
> is called
> from index_create(). It has absolutely nothing to do with heap access
> method.
>
> And so on, and so on.
>
> One more thing that requires refactoring is "RELKIND_RELATION" name.
> We have a type of relation called "relation"...
>
> Problem 2.
> Some functions are shared between heap and indexes access methods.
> Maybe someday it used to be handy, but now, I think, it's time to separate
> them, because IndexTuples and HeapTuples have really little in common.
>
> Problem 3. The word "heap" is used in many different contexts.
> Heap - a storage where we have tuples and visibility information.
> Generally speaking, it can be implemented over any data structure,
> not just a plain table as it is done now.
>
> Heap - an access method, that provides a set of routines for plain table
> storage, such as insert, delete, update, gettuple, vacuum and so on.
> We use this access method not only for user's tables.
>
> There are many types of relations (pg_class.h):
> #define          RELKIND_RELATION        'r'        /* ordinary table */
> #define          RELKIND_INDEX           'i'        /* secondary index */
> #define          RELKIND_SEQUENCE        'S'        /* sequence object */
> #define          RELKIND_TOASTVALUE      't'        /* for out-of-line
> values */
> #define          RELKIND_VIEW            'v'        /* view */
> #define          RELKIND_COMPOSITE_TYPE  'c'        /* composite type */
> #define          RELKIND_FOREIGN_TABLE   'f'        /* foreign table */
> #define          RELKIND_MATVIEW         'm'        /* materialized view */
>
> They can be logically separated into three categories:
> "primary storage" - r, S, t, v. They store data and visibility information.
> The only implementation is heapam.c
> "secondary index" - i. They store some data and pointers to primary storage.
> Various existing AMs and managed by AM API.
> "virtual relations" - c, f, m. They have no physical storage, only entries
> in caches and catalogs.
>
> Now, for some reasons, we sometimes use name "heap" for both
> "primary storage" and "virual relations". See src/backend/catalog/heap.c.
> But some functions work only with "primary storage". See pgstat_relation().
> And we constantly have to check relkind everywhere.
> I'd like it would be clear from the code interface and naming.
>
> So there is a couple of patches. They do not cover all mentioned problems,
> but I'd like to get a feedback before continuing.
>
> Patch 1:
> There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
> from the given page. It is used for both heap and index tuples.
> It doesn't seems a problem, until you are going to find anything in this
> code.
>
> The first patch "PageGetItem_refactoring.patch" is intended to fix it.
> It is just renaming:
> (IndexTuple) PageGetItem ---> PageGetItemIndex
> (HeapTupleHeader) PageGetItem ---> PageGetItemHeap
> Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple,
> SpGistDeadTuple)
> still use PageGetItem.
>
> I also changed functions that do not access tuple's data, but only need
> HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.
>
> I do not insist on these particular names, by the way.
>
> Patch 2.
> heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
> and outdated comments. The patch is intended to improve it.
> Fun fact: I found several "check it later" comments unchanged since
>  "Postgres95 1.01 Distribution - Virgin Sources" commit.
>
> I wonder if we can wind better name relation_drop_with_catalog() since,
> again, it's
> not about all kinds of relations. We use another function to drop index
> relations.
>
> I hope these changes will be useful for the future development.
> Suggested patches are mostly about renaming and rearrangement of functions
> between files, so, if nobody has conceptual objections, all I need from
> reviewers
> is an attentive look to find typos, grammar mistakes and overlooked areas.

Could you add this to the commitfest?

Thom



pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Bogus ANALYZE results for an otherwise-unique column with many nulls
Next
From: Murat Tuncer
Date:
Subject: truncate trigger for foreign data wrappers