Re: [RFC] indirect toast tuple support - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [RFC] indirect toast tuple support
Date
Msg-id 20130219140055.GA4582@awork2.anarazel.de
In response to Re: [RFC] indirect toast tuple support  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [RFC] indirect toast tuple support  (Robert Haas <robertmhaas@gmail.com>)
Re: [RFC] indirect toast tuple support  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On 2013-02-19 08:48:05 -0500, Robert Haas wrote:
> On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Given that there have been wishes to support something like b) for quite
> > some time, independent from logical decoding, it seems like a good idea
> > to add support for it. It's e.g. useful for avoiding repeated detoasting
> > or decompression of tuples.
> >
> > The problem with b) is that there is no space in varlena's flag bits to
> > directly denote that a varlena points into memory instead of either
> > directly containing the data or a varattrib_1b_e containing a
> > varatt_external pointing to an on-disk toasted tuple.
> 
> So the other way that we could do this is to use something that's the
> same size as a TOAST pointer but has different content - the
> seemingly-obvious choice being  va_toastrelid == 0.

Unfortunately that would mean you need to copy the varatt_external (or
whatever it would be called) to aligned storage to check what it
is. That's why I went the other way.
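
To illustrate the alignment concern, here is a minimal sketch. It assumes
the pointer payload sits right after a 2-byte header and so may start at
an unaligned address. `ext_pointer` and `pointer_is_in_memory` are
hypothetical names; the struct only mirrors the field names of
varatt_external with simplified fixed-width types.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for varatt_external. Field names follow the
 * PostgreSQL struct, the types are simplified for this sketch.
 */
typedef struct ext_pointer
{
    int32_t  va_rawsize;    /* original data size */
    int32_t  va_extsize;    /* externally stored size */
    uint32_t va_valueid;    /* id of the value within the toast table */
    uint32_t va_toastrelid; /* toast table; 0 would mean "in memory" */
} ext_pointer;

/*
 * payload points at the bytes following the 2-byte header, so it may be
 * unaligned. Reading va_toastrelid through a cast would be an unaligned
 * access; the whole struct is copied to aligned storage first.
 */
bool
pointer_is_in_memory(const unsigned char *payload)
{
    ext_pointer aligned;

    memcpy(&aligned, payload, sizeof(aligned));
    return aligned.va_toastrelid == 0;
}
```

The copy is needed on every check, even when the caller only wants to
know which kind of pointer it is looking at, whereas a tag in the 1-byte
header field can be read in place.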

It's a bit sad that varattrib_1b_e only contains a length and not a type
byte. I would like to change the storage of existing toast types but
that's not going to work for pg_upgrade reasons...


>  I'd be a little
> reluctant to do it the way you propose because we might, at some
> point, want to try to reduce the size of toast pointers.   If you have
> a tuple with many attributes, the size of the TOAST pointers
> themselves starts to add up.  It would be nice to be able to have 8
> byte or even 4 byte toast pointers to handle those situations.  If we
> steal one or both of those lengths to mean "the data is cached in
> memory somewhere" then we can't use those lengths in a smaller on-disk
> representation, which would seem a shame.

I agree. As I said above, having the type overlaid onto the length was
and is a bad idea, I just haven't found a better one that's compatible
yet.
Except inventing typlen=-3 aka "toast2" or something. But even that
wouldn't help getting rid of existing pg_upgraded tables. Besides being
a maintenance nightmare.

The only reasonable thing I can see us doing is renaming
varattrib_1b_e.va_len_1be into va_type and redefining VARSIZE_1B_E into a
switch that maps types into lengths. But I think I would put this off,
except placing a comment somewhere, until it gets necessary.
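
A rough sketch of what that could look like, not a worked-out patch. All
`sketch_`/`SKETCH_` names are hypothetical. The value 18 assumes existing
on-disk pointers store their total size (2-byte header plus a 16-byte
pointer struct) in that byte, so keeping it as the type tag would leave
pg_upgraded data readable; the value 1 for an in-memory pointer is picked
arbitrarily.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the former length byte is reinterpreted as a type. */
typedef struct sketch_1b_e
{
    uint8_t va_header; /* marks the datum as a 1-byte-header external */
    uint8_t va_type;   /* was va_len_1be */
    char    va_data[]; /* type-specific payload */
} sketch_1b_e;

#define SKETCH_HDRSZ 2

/* Hypothetical type tags; 18 is assumed to match existing on-disk data. */
enum
{
    SKETCH_TYPE_INDIRECT = 1,
    SKETCH_TYPE_ONDISK = 18
};

/* Map the type byte to the total size of the datum, header included. */
size_t
sketch_varsize_1b_e(const void *ptr)
{
    const sketch_1b_e *v = ptr;

    switch (v->va_type)
    {
        case SKETCH_TYPE_INDIRECT:
            /* payload is a plain pointer to a varlena in memory */
            return SKETCH_HDRSZ + sizeof(void *);
        case SKETCH_TYPE_ONDISK:
            /* payload is the 16-byte on-disk toast pointer */
            return SKETCH_HDRSZ + 16;
        default:
            /* unknown type; real code would raise an error here */
            return 0;
    }
}
```

With such a mapping, new types no longer each consume one of the
possible lengths, so shorter on-disk pointer formats could still be
added later under their own type values.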

> But having said that, +1 on the general idea of getting something like
> this done.  We really need a better infrastructure to avoid copying
> large values around repeatedly in memory - a gigabyte is a lot of data
> to be slinging around.
> 
> Of course, you will not be surprised to hear that I think this is 9.4 material.

Yes, obviously. But I need time to actually propose a working patch (I
already found 2 bugs in what I had submitted), that's why I brought it up
now. No point in wasting time if there's an obviously better idea around.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


