Re: tableam vs. TOAST - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: tableam vs. TOAST |
Date | |
Msg-id | 20191106165546.gsjwm5e66epclehn@alap3.anarazel.de Whole thread Raw |
In response to | Re: tableam vs. TOAST (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Hi, On 2019-11-06 10:38:58 -0500, Robert Haas wrote: > Yeah. I've been nervous about trying to proceed with this patch > because Andres seemed confident there was a better approach than what > I did here, but as I wrote about back on September 12th, it doesn't > seem like his idea will work. I'm not clear whether I'm being stupid > and there's a way to salvage his idea, or whether he just made a > mistake. (still trying to get back to this) > One possible approach would be to move more of the logic below the > tableam layer. For example, toast_fetch_datum() could do this: > > toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock); > call_a_new_tableam_method_here(toast_rel, &toast_pointer); > table_close(toastrel, AccessShareLock); > ...and then it becomes the tableam's job to handle everything that > needs to be done in the middle. I think that's a good direction to go in, for more than just the the discussion we're having here about the fixed chunking size. I'm fairly sure that plenty AMs e.g. wouldn't want to actually store toasted datums in a separate relation. And this would at least go more in the direction of making that possible. And I think the above ought to not even increase the overhead compared to the patch, as the number of indirect function calls ought to stay the same or even be lower. The indirection would otherwise have to happen within toast_fetch_datum(), whereas with the above, it ought to be possible to avoid needing to do so processing toast chunks. It seems, unfortunately, that atm an AM not wanting to store toast datums in a separate file, would still need to create a toast relation, just to get a distinct toast oid, to make sure that toasted datums from the old and new relation are distinct. Which seems like it could be important e.g. for table rewrites. There's probably some massaging of table rewrites and cluster needed. I suspect the new callback ought to allow sliced and non-sliced access, perhaps by just allowing to specify slice offset / length to 0 and INT32_MAX respectively (or maybe just -1, -1?). That'd also allow an AM to make slicing possible in cases that's not possible for heap. And there seems little point in having two callbacks. > That might be better than what I've got now; it's certainly more > flexible. It does mean that an AM that just wants to reuse the > existing logic with a different chunk size has got to repeat some > code, but it's probably <~150 lines, so that's perhaps not a > catastrophe. Also seems that the relevant code can be made reusable (opting in into the current logic), in line with where you've been going with this code already. > Alternatively, we could (a) stick with the current approach, (b) use > the current approach but make the table AM member a callback rather > than a constant, or (c) something else entirely. A callback returning the chunk size does not seem like an improvement to me. > I don't want to give up on making the TOAST infrastructure pluggable; > requiring every AM to use the heap as its TOAST implementation seems > too constraining. +1 > > I think AMs are probably going to need a general mechanism to store > > pg_control-like data somewhere. There are going to be chunk sizes, > > block sizes, segment sizes, and so on. This one is just a particular > > case of that. > > That's an interesting point. I don't know for sure to what extent we > need that; I think that the toast chunk size is actually not very > interesting to vary, and the fact that we technically allow it to be > varied seems like it isn't buying us much. Whether I agree with that statement depends a bit on what you mean by varying the chunk size. If you mean that there's not much need for a value other than a adjusted computation of what's currently used, then I don't agree: We currently make toast a lot more expensive by quadrupling the number of separate heap fetches. And e.g. compressing chunks separately, to allow for sliced access even when compressed, would also be hard to do with the current sizes. Additionally, I *very* strongly suspect that, while it makes sense to use chunks sizes where multiple chunks fit a page for a heavily updated toast table full of small-ish values, that it makes no sense whatsoever to do so when toasting a 10MB value that's going to be appended to the toast relation, because there's no space available for reuse anyway. And smaller chunking isn't going to help with space reuse either, because the whole toast datum is going to be deleted together. I think the right way for toast creation to behave really would be to check whether there's free space available that'd benefit from using smaller chunks and do so if available, and otherwise use all the space in a page for each chunk. That'd obviously make sliced access harder, so it surely isn't a panacea. Greetings, Andres Freund
pgsql-hackers by date: