Re: [HACKERS] GSOC - TOAST'ing in slices - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] GSOC - TOAST'ing in slices
Date
Msg-id 19235.1489586039@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] GSOC - TOAST'ing in slices  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] GSOC - TOAST'ing in slices  (George Papadrosou <gpapadrosou@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
> <gpapadrosou@gmail.com> wrote:
>> The project’s idea is implement different slicing approaches according to
>> the value’s datatype. For example a text field could be split upon character
>> boundaries while a JSON document would be split in a way that allows fast
>> access to it’s keys or values.

> Hmm.  So if you had a long text field containing multibyte characters,
> and you split it after, say, every 1024 characters rather than after
> every N bytes, then you could do substr() without detoasting the whole
> field.  On the other hand, my guess is that you'd waste a fair amount
> of space in the TOAST table, because it's unlikely that the chunks
> would be exactly the right size to fill every page of the table
> completely.  On balance it seems like you'd be worse off, because
> substr() probably isn't all that common an operation.

Keep in mind also that slicing on "interesting" boundaries rather than
with the current procrustean-bed approach could save you at most one or
two chunk fetches per access.  So the upside seems limited.  Moreover,
how are you going to know whether a given toast item has been stored
according to your newfangled approach?  I doubt we're going to accept
forcing a dump/reload for this.

IMO, the real problem here is to be able to predict which chunk(s) to
fetch at all, and I'd suggest focusing on that part of the problem rather
than changes to physical storage.  It's hard to see how to do anything
very smart for text (except in the single-byte-encoding case, which is
already solved).  But the JSONB format was designed with some thought
to this issue, so you might be able to get some traction there.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] GUC for cleanup indexes threshold.
Next
From: Stephen Frost
Date:
Subject: Re: [HACKERS] Allow pg_dumpall to work without pg_authid