Home > mailing lists

Re: tableam vs. TOAST - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: tableam vs. TOAST
Date	November 6, 2019 19:55:46
Msg-id	20191106165546.gsjwm5e66epclehn@alap3.anarazel.de Whole thread Raw
In response to	Re: tableam vs. TOAST (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

Hi,

On 2019-11-06 10:38:58 -0500, Robert Haas wrote:
> Yeah. I've been nervous about trying to proceed with this patch
> because Andres seemed confident there was a better approach than what
> I did here, but as I wrote about back on September 12th, it doesn't
> seem like his idea will work. I'm not clear whether I'm being stupid
> and there's a way to salvage his idea, or whether he just made a
> mistake.

(still trying to get back to this)

> One possible approach would be to move more of the logic below the
> tableam layer. For example, toast_fetch_datum() could do this:
> 
> toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock);
> call_a_new_tableam_method_here(toast_rel, &toast_pointer);
> table_close(toastrel, AccessShareLock);

> ...and then it becomes the tableam's job to handle everything that
> needs to be done in the middle.

I think that's a good direction to go in, for more than just the the
discussion we're having here about the fixed chunking size.

I'm fairly sure that plenty AMs e.g. wouldn't want to actually store
toasted datums in a separate relation. And this would at least go more
in the direction of making that possible. And I think the above ought to
not even increase the overhead compared to the patch, as the number of
indirect function calls ought to stay the same or even be lower. The
indirection would otherwise have to happen within toast_fetch_datum(),
whereas with the above, it ought to be possible to avoid needing to do
so processing toast chunks.

It seems, unfortunately, that atm an AM not wanting to store toast
datums in a separate file, would still need to create a toast relation,
just to get a distinct toast oid, to make sure that toasted datums from
the old and new relation are distinct. Which seems like it could be
important e.g. for table rewrites. There's probably some massaging
of table rewrites and cluster needed.

I suspect the new callback ought to allow sliced and non-sliced access,
perhaps by just allowing to specify slice offset / length to 0 and
INT32_MAX respectively (or maybe just -1, -1?). That'd also allow an AM
to make slicing possible in cases that's not possible for heap. And
there seems little point in having two callbacks.

> That might be better than what I've got now; it's certainly more
> flexible. It does mean that an AM that just wants to reuse the
> existing logic with a different chunk size has got to repeat some
> code, but it's probably <~150 lines, so that's perhaps not a
> catastrophe.

Also seems that the relevant code can be made reusable (opting in into
the current logic), in line with where you've been going with this code
already.

> Alternatively, we could (a) stick with the current approach, (b) use
> the current approach but make the table AM member a callback rather
> than a constant, or (c) something else entirely.

A callback returning the chunk size does not seem like an improvement to
me.

> I don't want to give up on making the TOAST infrastructure pluggable;
> requiring every AM to use the heap as its TOAST implementation seems
> too constraining.

+1

> > I think AMs are probably going to need a general mechanism to store
> > pg_control-like data somewhere.  There are going to be chunk sizes,
> > block sizes, segment sizes, and so on.  This one is just a particular
> > case of that.
> 
> That's an interesting point. I don't know for sure to what extent we
> need that; I think that the toast chunk size is actually not very
> interesting to vary, and the fact that we technically allow it to be
> varied seems like it isn't buying us much.

Whether I agree with that statement depends a bit on what you mean by
varying the chunk size. If you mean that there's not much need for a
value other than a adjusted computation of what's currently used, then I
don't agree:

We currently make toast a lot more expensive by quadrupling the number
of separate heap fetches.

And e.g. compressing chunks separately, to allow for sliced access even
when compressed, would also be hard to do with the current sizes.

Additionally, I *very* strongly suspect that, while it makes sense to
use chunks sizes where multiple chunks fit a page for a heavily updated
toast table full of small-ish values, that it makes no sense whatsoever
to do so when toasting a 10MB value that's going to be appended to the
toast relation, because there's no space available for reuse anyway. And
smaller chunking isn't going to help with space reuse either, because
the whole toast datum is going to be deleted together.

I think the right way for toast creation to behave really would be to
check whether there's free space available that'd benefit from using
smaller chunks and do so if available, and otherwise use all the space
in a page for each chunk.

That'd obviously make sliced access harder, so it surely isn't a
panacea.

Greetings,

Andres Freund

pgsql-hackers by date:

From: Robert Haas
Date: 06 November 2019, 19:49:10
Subject: Re: tableam vs. TOAST

From: Andres Freund
Date: 06 November 2019, 20:00:18
Subject: Re: tableam vs. TOAST

Re: tableam vs. TOAST - Mailing list pgsql-hackers

Previous

Next