Re: Table AM Interface Enhancements - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Table AM Interface Enhancements
Date
Msg-id CAPpHfdvi5env4eqQMeE0wkV7k6a5Rc8BPiYMt0H4yEqSP03-Bw@mail.gmail.com
Whole thread Raw
In response to Re: Table AM Interface Enhancements  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi, Matthias!

On Fri, Nov 24, 2023 at 1:07 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Thu, 23 Nov 2023 at 13:43, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Hello PostgreSQL Hackers,
> >
> > I am pleased to submit a series of patches related to the Table Access
> > Method (AM) interface, which I initially announced during my talk at
> > PGCon 2023 [1]. These patches are primarily designed to support the
> > OrioleDB engine, but I believe they could be beneficial for other
> > table AM implementations as well.
> >
> > The focus of these patches is to introduce more flexibility and
> > capabilities into the Table AM interface. This is particularly
> > relevant for advanced use cases like index-organized tables,
> > alternative MVCC implementations, etc.
> >
> > Here's a brief overview of the patches included in this set:
>
> Note: no significant review of the patches, just a first response on
> the cover letters and oddities I noticed:
>
> Overall, this patchset adds significant API area to TableAmRoutine,
> without adding the relevant documentation on how it's expected to be
> used.

I have to note that, unlike documentation for index access methods,
our documentation for table access methods doesn't have an explanation
of API functions.  Instead, it just refers to tableam.h for details.
The patches touching tableam.h also revise relevant comments.  These
comments are for sure a target for improvements.

> With the overall size of the patchset also being very
> significant

I wouldn't say that volume is very significant.  It's just 2K lines,
not the great size of a patchset.  But it for sure represents changes
of great importance.

> I don't think this patch is reviewable as is; the goal
> isn't clear enough,

The goal is to revise table AM API so that new full-featured
implementations could exist.  AFAICS, the current API was designed
keeping zheap in mind, but even zheap was always shipped with the core
patch.  All other implementations of table AM, which I've seen, are
very limited.  Basically, there is still no real alternative and
functional OLTP table AM.  I believe API limitation is one of the
reasons for that.

> the APIs aren't well explained, and

As I mentioned before, the table AM API is documented by the comments
in tableam.h.  The comments in the patchset aren't perfect for sure,
but a subject for the incremental improvements.

> the interactions with the index API are left up in the air.

Right.  These patches bring more control on interactions with indexes
to table AMs without touching the index API.  In my PGCon 2016 talk I
proposed that table AM could have its own implementation of index AM.

As you mentioned before, this patchset isn't very small already.
Considering it all together with a patchset for index AM redesign
would make it a mess.  I propose we can consider here the patches,
which are usable by themselves even without index AM changes.  And the
patches tightly coupled with index AM API changes could be considered
later together with those changes.

> > 0001-Allow-locking-updated-tuples-in-tuple_update-and--v1.patch
> >
> > Optimizes the process of locking concurrently updated tuples during
> > update and delete operations. Helpful for table AMs where refinding
> > existing tuples is expensive.
>
> Is this essentially an optimized implementation of the "DELETE FROM
> ... RETURNING *" per-tuple primitive?

Not really.  The test for "DELETE FROM ... RETURNING *" was used just
to reproduce one of the bugs stopped in [2].  The general idea is to
avoid repeated calls for tuple lock.

> > 0003-Allow-table-AM-to-store-complex-data-structures-i-v1.patch
> >
> > Allows table AM to store complex data structure in rd_amcache rather
> > than a single chunk of memory.
>
> I don't think we should allow arbitrarily large and arbitrarily many
> chunks of data in the relcache or table caches.

Hmm.. It seems to be far out of control of API what and how large
PostgreSQL extensions could actually cache.

> Why isn't one chunk
> enough?

It's generally possible to fit everything into one chunk, but that's
extremely unhandy when your cache contains something at least as
complex as tuple slots and descriptors.  I think the reason that we
still have one chunk restriction is that we don't have a full-featured
implementation fitting API yet.  If we had it, I can't imagine there
would be one chunk for a cache.

> > 0004-Add-table-AM-tuple_is_current-method-v1.patch
> >
> > This allows us to abstract how/whether table AM uses transaction identifiers.
>
> I'm not a fan of the indirection here. Also, assuming that table slots
> don't outlive transactions, wouldn't this be a more appropriate fit
> with the table tuple slot API?

This is a good idea.  I will update the patch accordingly.

> > 0005-Generalize-relation-analyze-in-table-AM-interface-v1.patch
> >
> > Provides a more flexible API for sampling tuples, beneficial for
> > non-standard table types like index-organized tables.
> >
> > 0006-Generalize-table-AM-API-for-INSERT-.-ON-CONFLICT-v1.patch
> >
> > Provides a new table AM API method to encapsulate the whole INSERT ...
> > ON CONFLICT ... algorithm rather than just implementation of
> > speculative tokens.
>
> Does this not still require speculative inserts, with speculative
> tokens, for secondary indexes? Why make AMs implement that all over
> again?

The idea here is to generalize upsert and leave speculative tokens as
details of one particular implementation.  Imagine an index-organized
table and upsert on primary key.  For that you need to just locate the
relevant page in a tree and do insert or update.  Speculative tokens
would rather be an unreasonable complication for this case.

> > 0007-Allow-table-AM-tuple_insert-method-to-return-the--v1.patch
> >
> > This allows table AM to return a native tuple slot, which is aware of
> > table AM-specific system attributes.
>
> This seems reasonable.
>
> > 0008-Let-table-AM-insertion-methods-control-index-inse-v1.patch
> >
> > Allows table AM to skip index insertions in the executor and handle
> > those insertions itself.
>
> Who handles index tuple removal then?

Table AM implementation decides what actions to perform on tuple
update/delete.  The reason why it can't really care about updating
indexes is that the executor already does it.
The situation is different with deletes, because the executor doesn't
do something immediately about the corresponding index tuples.  They
are deleted later by vacuum, which is also controlled by table AM
implementation.

> I don't see a patch that describes index AM changes for this...

Yes, index AM should be revised for that.  See my comment about that earlier.

> > 0009-Custom-reloptions-for-table-AM-v1.patch
> >
> > Enables table AMs to define and override reloptions for tables and indexes.
> >
> > 0010-Notify-table-AM-about-index-creation-v1.patch
> >
> > Allows table AMs to prepare or update specific meta-information during
> > index creation.
>
> I don't think the described use case of this API is OK - a table AM
> cannot know about the internals of index AMs, and is definitely not
> allowed to overwrite the information of that index.
> If I ask for an index that uses the "btree" index, then that needs to
> be the AM actually used, or an error needs to be raised if it is
> somehow incompatible with the table AM used. It can't be that we
> silently update information and create an index that is explicitly not
> what the user asked to create.

I agree that this currently looks more like workarounds rather than
proper API changes.  I propose these two should be considered later
together with relevant index API changes.

> I also don't see updates in documentation, which I think is quite a
> shame as I have trouble understanding some parts.

Sorry for this.  I hope I gave some answers in this message and I'll
update the patchset comments and commit messages accordingly.  And I'm
open to answer any further questions.

> > 0012-Introduce-RowID-bytea-tuple-identifier-v1.patch
> >
> > `This patch introduces 'RowID', a new bytea tuple identifier, to
> > overcome the limitations of the current 32-bit block number and 16-bit
> > offset-based tuple identifier. This is particularly useful for
> > index-organized tables and other advanced use cases.
>
> We don't have any index methods that can handle anything but
> block+offset TIDs, and I don't see any changes to the IndexAM APIs to
> support these RowID tuples, so what's the plan here? I don't see any
> of that in the commit message, nor in the rest of this patchset.
>
> > Each commit message contains a detailed explanation of the changes and
> > their rationale. I believe these enhancements will significantly
> > improve the flexibility and capabilities of the PostgreSQL Table AM
> > interface.
>
> I've noticed there is not a lot of rationale for several of the
> changes as to why PostgreSQL needs these changes implemented like
> this, amongst which the index-related tableAM changes.
>
> I understand that index-organized tables can be quite useful, but I
> don't see design solutions to the more complex questions that would
> still be required before we could host such table AMs like OreoleDB's
> as a first-party citizen: For index-organized tables, you also need
> index AM APIs that support TIDS with more than 48 bits of data
> (assuming we actually want primary keys with >48 bits of usable
> space), and for undo-based logging you would probably need index APIs
> for retail index tuple deletion. Neither is supplied here, nor is
> described why these APIs were omitted.

As I mentioned before, I agree that index AM changes haven't been
presented yet.  And yes, for bytea rowID there is currently no way to
use the current index API.  However, I think this exact patch could be
useful even without index AM implements.  This allows table AMs to
identify rows by custom bytea, even though these tables couldn't be
indexed yet.  So, if we allow a custom table AM to implement an
index-organized table, that would have use cases even if secondary
indexes are not supported yet.

Links
1. https://pgconf.ru/media/2016/02/19/06_Korotkov%20Extendability.pdf




2. https://www.postgresql.org/message-id/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com

------
Regards,
Alexander Korotkov



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: RangeTblEntry.inh vs. RTE_SUBQUERY
Next
From: Alexander Korotkov
Date:
Subject: Re: Table AM Interface Enhancements