tableam vs. TOAST - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | tableam vs. TOAST |
Date | |
Msg-id | CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com |
Responses |
Re: tableam vs. TOAST
|
List | pgsql-hackers |
In a nearby thread[1], Ashwin Agrawal complained that there is no way for a table AM to get rid of the TOAST table that the core system thinks should be created. To that I added a litany of complaints of my own, including...

- the core system decides whether or not a TOAST table is needed based on criteria that are very much heap-specific,
- the code for reading and writing values stored in a TOAST table is heap-specific, and
- the core system assumes that you want to use the same table AM for the main table and the toast table, but you might not (e.g. you might want to use the regular old heap for the latter).

Attached is a series of patches which try to improve things in this area. Except possibly for 0001, this is v13 material; see discussion on the other thread. These likely need some additional work, but I've done enough with them that I thought it would be worth publishing them at this stage, because it seems that I'm not the only one thinking about the problems that exist in this general area. Here is an overview:

0001 moves the needs_toast_table() calculation below the table AM layer. That allows a table AM to decide for itself whether it wants a TOAST table. The most obvious way in which a table AM might want to be different from what core expects is to decide that the answer is always "no," which it can do if it has some other method of storing large values or doesn't wish to support them. Another possibility is that it wants logic that is basically similar to the heap, but with a different size threshold because its tuple format is different. There are probably other possibilities.

0002 breaks tuptoaster.c into three separate files. It just does code movement; no functional changes. The three pieces are detoast.c, which handles detoasting of toast values and inspection of the sizes of toasted datums; heaptoast.c, which keeps all the functions that are intrinsically heap-specific; and toast_internals.c, which is intended to have a very limited audience.
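The per-AM decision described for 0001 can be pictured with a toy model. This is only a sketch under stated assumptions, not code from the patches: every name here (ToyRelation, ToyTableAm, toy_relation_needs_toast, the 2000-byte threshold) is hypothetical and stands in for the real PostgreSQL structures, which are considerably richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a relation: only what the size check needs. */
typedef struct ToyRelation
{
    size_t max_tuple_size;      /* worst-case tuple width in bytes */
} ToyRelation;

/* Toy stand-in for a table AM's callback table (hypothetical layout). */
typedef struct ToyTableAm
{
    const char *name;
    bool (*needs_toast_table)(const ToyRelation *rel);
} ToyTableAm;

/* Illustrative threshold only; not a value taken from the patches. */
#define TOY_HEAP_TOAST_THRESHOLD 2000

/* Heap-like policy: want a TOAST table once tuples can get too wide. */
static bool
toy_heap_needs_toast(const ToyRelation *rel)
{
    return rel->max_tuple_size > TOY_HEAP_TOAST_THRESHOLD;
}

/* An AM that stores large values some other way always answers "no". */
static bool
toy_never_needs_toast(const ToyRelation *rel)
{
    (void) rel;
    return false;
}

const ToyTableAm toy_heap_am = {"toy_heap", toy_heap_needs_toast};
const ToyTableAm toy_columnar_am = {"toy_columnar", toy_never_needs_toast};

/* Core-side code no longer applies heap rules itself; it asks the AM. */
bool
toy_relation_needs_toast(const ToyTableAm *am, const ToyRelation *rel)
{
    assert(am != NULL && am->needs_toast_table != NULL);
    return am->needs_toast_table(rel);
}
```

With this shape, an AM whose tuple format differs from the heap's would supply its own callback with a different threshold, and the core code calling toy_relation_needs_toast() would not need to change.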
A nice fringe benefit of this stuff is that a lot of other files that currently have to include tuptoaster.h and thus htup_details.h no longer do.

0003 creates a new file toast_helper.c which is intended to help table AMs implement insertion and deletion of toast table rows. Most of the AM-independent logic from the functions remaining in heaptoast.c is moved to this file. This leaves about 600 of the original ~2400 lines from tuptoaster.c as heap-specific logic, but a new heap AM actually wouldn't need all of that stuff, because some of the logic here is in support of stuff like record types, which use HeapTuple internally and will continue to do so even if those record types are stored in some other kind of table.

0004 allows TOAST tables to be implemented using a table AM other than heap. In a certain sense this is the opposite of 0003. 0003 is intended to help people who are implementing a new kind of main table, whereas 0004 is intended to help people implementing a new kind of TOAST table. It teaches the code that inserts, deletes, and retrieves TOAST rows to use slots, and it makes some efficiency improvements in the hopes of offsetting any performance loss from so doing. See commit message and/or patch for full details.

I believe that with all of these changes it should be pretty straightforward for a table AM that wants to use itself to store TOAST data to do so, or to delegate that task back to, say, the regular heap. I haven't really validated that yet, but plan to do so.

In addition to what's in this patch set, I believe that we should probably rename some of these functions and macros, so that the heap-specific ones have heap-specific names and the generic ones don't, but I haven't gone through all of that yet. The existing patches try to choose good names for the new things they add, but they don't rename any of the existing stuff.
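To make the split between 0003 and 0004 a bit more concrete, here is a toy sketch of an AM-independent helper that cuts a large value into chunks and hands each one to whatever storage backs the TOAST table. It is an illustration under loud assumptions, not the code in the patches: ToyToastStore, toy_toast_save, toy_count_chunk, and the callback signature are all invented for this example, and real TOAST rows carry more than an id, a sequence number, and data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface to whichever AM stores the TOAST chunks. */
typedef struct ToyToastStore
{
    /* Store one chunk; returns 0 on success, nonzero on failure. */
    int (*insert_chunk)(void *state, unsigned value_id, int chunk_seq,
                        const char *data, size_t len);
    void *state;                /* private to the storing AM */
} ToyToastStore;

/*
 * AM-independent helper: split a value into chunks of at most chunk_size
 * bytes and pass each to the store. Returns the number of chunks written,
 * or -1 if the store reports a failure.
 */
int
toy_toast_save(const ToyToastStore *store, unsigned value_id,
               const char *data, size_t len, size_t chunk_size)
{
    int seq = 0;

    assert(store != NULL && store->insert_chunk != NULL && chunk_size > 0);
    while (len > 0)
    {
        size_t n = len < chunk_size ? len : chunk_size;

        if (store->insert_chunk(store->state, value_id, seq, data, n) != 0)
            return -1;
        data += n;
        len -= n;
        seq++;
    }
    return seq;
}

/* A trivial store that only counts what it is given. */
typedef struct ToyCounter
{
    int    chunks;
    size_t bytes;
} ToyCounter;

int
toy_count_chunk(void *state, unsigned value_id, int chunk_seq,
                const char *data, size_t len)
{
    ToyCounter *c = state;

    (void) value_id;
    (void) data;
    assert(chunk_seq == c->chunks);     /* chunks arrive in order */
    c->chunks++;
    c->bytes += len;
    return 0;
}
```

The main-table AM only ever calls the helper, and the store behind the callback could be the same AM or a different one such as the regular heap. That is the kind of delegation the paragraph above has in mind, reduced to its bare outline.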
I also think we should consider removing TOAST_MAX_CHUNK_SIZE from the control file, both because I'm not sure anybody's really using the ability to vary that for anything and because that solution doesn't seem entirely sensible in a world of multiple AMs. However, that is a debatable change, so maybe others will disagree.

[1] http://postgr.es/m/CALfoeitE+P8UGii8=BsGQLpHch2EZWJhq4M+D-jfaj8YCa_FSw@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company